15. Monitoring, Logging & Observability
15.1 Observability Stack
```mermaid
graph TD
    subgraph COLLECT["Data Collection"]
        direction LR
        M["Metrics<br/>Prometheus + Thanos"] ~~~ L["Logs<br/>Loki + Fluent Bit"] ~~~ T["Traces<br/>Jaeger / Tempo"]
    end
    COLLECT --> VIZ["Visualization: Grafana (unified dashboards)"]
    VIZ --> ALERT["Alerting: Alertmanager → Telegram / Email / PagerDuty"]
```

Note: Elasticsearch is retained solely as the backend for the Wazuh SIEM (security event monitoring), and Meilisearch serves app-level search. All application and infrastructure logs go to Loki.
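To make the Loki logging path more concrete, here is a minimal sketch of how a Python service might write one JSON object per line to stdout, assuming Fluent Bit tails the container output and forwards it to Loki. The field names (`ts`, `level`, `service`, `logger`, `msg`, `trace_id`) and the `checkout-api` service name are hypothetical. Carrying a `trace_id` in each line is what would let Grafana connect a log entry to its trace in Jaeger or Tempo.

```python
import json
import logging
import sys
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single-line JSON object (hypothetical schema)."""

    def __init__(self, service: str) -> None:
        super().__init__()
        self.service = service

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "ts": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname.lower(),
            "service": self.service,
            "logger": record.name,
            "msg": record.getMessage(),
        }
        # A trace ID passed via extra={"trace_id": ...} becomes a record
        # attribute; copying it into the JSON lets Grafana link log -> trace.
        trace_id = getattr(record, "trace_id", None)
        if trace_id is not None:
            entry["trace_id"] = trace_id
        if record.exc_info:
            entry["exc"] = self.formatException(record.exc_info)
        return json.dumps(entry, ensure_ascii=False)


def make_logger(service: str) -> logging.Logger:
    """Return a logger that writes JSON lines to stdout for Fluent Bit to tail."""
    logger = logging.getLogger(service)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(JsonFormatter(service))
        logger.addHandler(handler)
    logger.propagate = False  # keep lines out of the root logger's format
    return logger


if __name__ == "__main__":
    log = make_logger("checkout-api")  # hypothetical service name
    log.info("order placed", extra={"trace_id": "4bf92f3577b34da6a3ce929d0e0e4736"})
```

Keeping `trace_id` in the log body rather than promoting it to a Loki label is deliberate: Loki indexes labels, and every unique label set is a separate stream, so a per-request ID as a label would create a new stream for each request. Grafana's Loki data source can instead extract the ID with a derived field and render it as a link to the tracing backend.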
15.2 Tool Breakdown
| Tool | Purpose |
|---|---|
| Prometheus | Metrics collection (CPU, memory, request rates, SLA tracking) |
| Grafana | Unified dashboards for metrics, logs, traces |
| Loki | Log aggregation for all application and infrastructure logs |
| Fluent Bit | Log shipping from containers and servers |
| Jaeger / Tempo | Distributed tracing across microservices |
| Thanos | Long-term storage for Prometheus metrics and a global query view across clusters |
| Uptime Kuma | HTTP/TCP/DNS uptime monitoring |
| Alertmanager | Alert routing to Telegram groups, email, SMS, and PagerDuty |