The Stack
ToDo:
* make a diagram on how it all works
* Grafana + CloudWatch
Our main requirement when choosing a solution was to get as close as possible to a
single pane of glass.
Kube Prometheus Stack
Includes:
- Prometheus Operator
- Prometheus
- Alertmanager
- Node Exporter
- Kube State Metrics
- Grafana (for dashboards)
Grafana Loki
It is a fairly simple way to store and index logs (though not so simple to configure) while keeping a small footprint and low operational costs. This is achieved by using object storage (S3) by default.
Grafana Tempo
Just like Loki, but for spans and traces. It is directly integrated with our OpenTelemetry Java Agent.
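As a rough illustration of that integration, the fragment below sketches how a Java workload could point the OpenTelemetry Java Agent at Tempo using the agent's standard environment variables. The agent path, service name, and Tempo address are assumptions made for the example, not our actual values, and it assumes Tempo's OTLP gRPC receiver is enabled on port 4317.

```yaml
# Fragment of a Deployment's container spec.
# All values here are hypothetical placeholders.
env:
  # Attach the agent to the JVM; the jar path is an assumed location in the image.
  - name: JAVA_TOOL_OPTIONS
    value: "-javaagent:/otel/opentelemetry-javaagent.jar"
  # Name under which the traces show up in Grafana.
  - name: OTEL_SERVICE_NAME
    value: "my-app"
  # Export traces over OTLP; metrics and logs are handled by other parts of the stack.
  - name: OTEL_TRACES_EXPORTER
    value: "otlp"
  - name: OTEL_METRICS_EXPORTER
    value: "none"
  - name: OTEL_LOGS_EXPORTER
    value: "none"
  # Set the protocol explicitly, since the default differs between agent versions.
  - name: OTEL_EXPORTER_OTLP_PROTOCOL
    value: "grpc"
  # Assumed in-cluster address of the Tempo OTLP gRPC receiver.
  - name: OTEL_EXPORTER_OTLP_ENDPOINT
    value: "http://tempo.monitoring.svc:4317"
```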
Grafana Agent Operator
We mainly use it to have a simple and descriptive way of defining which logs to send to Loki and how to process them. Most solutions, like Filebeat or Promtail, require you to gather all the logs in the Kubernetes cluster and then maintain a central processing pipeline with rules for each application.
With this, we can embed both which logs to gather and how to process them right in the app itself (its Helm chart).
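To make this concrete, here is a minimal sketch of a PodLogs resource that an app's Helm chart could ship as one of its templates. The file name, the `instance: primary` label (which has to match whatever the LogsInstance selects on), the `app` pod label, and the pipeline stages are assumptions chosen for illustration and are not taken from our charts.

```yaml
# templates/podlogs.yaml (hypothetical file inside the app's Helm chart)
apiVersion: monitoring.grafana.com/v1alpha1
kind: PodLogs
metadata:
  name: {{ .Release.Name }}
  labels:
    # Assumed label; it must match the podLogsSelector of the LogsInstance.
    instance: primary
spec:
  # Collect logs only from this release's namespace.
  namespaceSelector:
    matchNames:
      - {{ .Release.Namespace }}
  # Collect logs only from this app's pods (assumed pod label).
  selector:
    matchLabels:
      app: {{ .Release.Name }}
  # App-specific processing, kept next to the app instead of in a central pipeline.
  pipelineStages:
    # Parse the container runtime (CRI) log format.
    - cri: {}
    # Extract the "level" field from JSON log lines.
    - json:
        expressions:
          level: level
    # Promote the extracted value to a Loki label.
    - labels:
        level: level
```

Because the resource lives in the chart, changing how an app's logs are parsed is a change to that app's release rather than to shared cluster configuration.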