42-transcendence

Prometheus Documentation

Introduction

Prometheus is an open-source systems monitoring and alerting toolkit. It collects and stores metrics as time-series data, which makes it well suited to monitoring containerized environments. In our project, Prometheus serves as the backbone for monitoring service health, performance metrics, and resource utilization.

Docker Compose Configuration

Our Prometheus instance is configured in Docker Compose as follows:

prometheus:
  <<: *common
  build: 
    context: ./src/grafana
    dockerfile: Dockerfile.prometheus
  profiles: ["grafanaprofile"]
  container_name: prometheus
  ports:
    - "9090:9090"
  volumes:
    - prometheus_data:/prometheus
  logging:
    driver: gelf
    options:
      gelf-address: "udp://${LOG_HOST}:12201"
      tag: "prometheus"
  healthcheck:
    test: ["CMD-SHELL", "curl -f http://localhost:9090/-/healthy || exit 1"]
    interval: 30s
    timeout: 10s
    retries: 5
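
The <<: *common entry merges shared settings from a YAML anchor defined elsewhere in docker-compose.yml into this service. As a rough sketch, such an anchor could look like the following (the keys shown are illustrative; our actual common block may define different ones):

# Illustrative only: the real anchor is defined at the top of docker-compose.yml.
x-common: &common
  restart: unless-stopped
  env_file: .env
  networks:
    - transcendence_network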

Key Configuration Aspects

Prometheus Configuration

Our Prometheus is configured to scrape metrics from various services in our stack:

global:
  scrape_interval:     15s
  evaluation_interval: 15s
  external_labels:
      monitor: 'transcendence'

# This ensures that Prometheus sends alerts to Alertmanager.
alerting:
  alertmanagers:
  - static_configs:
    - targets: ['alertmanager:9093'] 

rule_files:
  - /etc/prometheus/rules.yaml
  - /etc/prometheus/alerts.yaml
  - /etc/node_exporter_recording_rules.yml

# Prometheus scrapes metrics from the services.
scrape_configs:
  - job_name: 'caddy'
    static_configs:
      - targets: ['caddy:80'] 

  - job_name: 'backend'
    static_configs:
      - targets: ['backend:8000']

  - job_name: "node"
    static_configs:
      - targets: ["node-exporter:9100"]

  - job_name: 'postgresql'
    static_configs:
      - targets: ['postgres-exporter:9187']
  
  - job_name: 'alertmanager'
    static_configs:
      - targets: ['alertmanager:9093']
    
  - job_name: 'prometheus'
    scrape_interval: 5s
    scrape_timeout: 5s
    static_configs:
      - targets: ['prometheus:9090']

Configuration Breakdown

Global Settings

The global block sets scrape_interval and evaluation_interval to 15 seconds, so targets are scraped and rules are evaluated every 15 seconds. The external label monitor: 'transcendence' is attached to all time series and alerts when they are sent to external systems, identifying this Prometheus instance.

Alerting

The alerting block registers Alertmanager at alertmanager:9093 as the destination for firing alerts, which are then routed and turned into notifications there.

Rule Files

Prometheus loads three rule files: /etc/prometheus/rules.yaml, /etc/prometheus/alerts.yaml, and /etc/node_exporter_recording_rules.yml. Together they hold our recording and alerting rules, including pre-computed Node Exporter expressions.
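
For example, a recording rule in node_exporter_recording_rules.yml could pre-compute per-instance CPU utilization from Node Exporter metrics. The rule name and expression below are illustrative, not necessarily what our file contains:

groups:
  - name: node-exporter-recording-rules
    rules:
      # Average non-idle CPU fraction per instance over the last 5 minutes.
      - record: instance:node_cpu_utilisation:rate5m
        expr: 1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))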

Scrape Configurations

Prometheus collects metrics from the following services:

  1. Caddy: Web server metrics (port 80)
  2. Backend: Django application metrics (port 8000)
  3. Node Exporter: Host machine metrics (CPU, memory, disk, network) on port 9100
  4. PostgreSQL: Database metrics via postgres-exporter on port 9187
  5. Alertmanager: Alert handling metrics (port 9093)
  6. Prometheus: Self-monitoring metrics (port 9090)

Connected Services

1. Caddy (Reverse Proxy)

For more about this integration, see caddy_metrics.md.

2. Backend (Django Application)

See django_metrics.md for more details.

3. Node Exporter

See node_exporter.md

4. PostgreSQL (via postgres-exporter)

See postgres logs.md

5. Alertmanager

For more details, see alertmanager.md.

6. Prometheus (Self-monitoring)

See the Prometheus Self-Monitoring Dashboard section below.

Data Persistence

Prometheus data is persisted in a Docker volume (prometheus_data) so that metrics history is preserved across container restarts or rebuilds. This keeps historical queries, Grafana dashboards, and alert evaluation working after the stack is brought down and back up.

The storage location is configured via the --storage.tsdb.path=/prometheus parameter in the Dockerfile CMD.
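
For the named volume to be created, docker-compose.yml also needs a matching top-level declaration. A minimal sketch (other volumes used by the stack are omitted here):

# Top-level section of docker-compose.yml declaring the named volume.
volumes:
  prometheus_data: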

Accessing Prometheus

The Prometheus web interface is accessible at:

http://localhost:9090

Key pages:

  1. Graph (/graph): run ad-hoc PromQL queries against the collected metrics
  2. Status > Targets (/targets): check the scrape status of every configured target
  3. Alerts (/alerts): view pending and firing alerts
  4. Status > Rules (/rules): inspect the loaded recording and alerting rules

Node Exporter Integration

The Node Exporter is a key component that provides system-level metrics. To utilize these metrics in Grafana:

  1. Prometheus scrapes metrics from the Node Exporter on port 9100
  2. In Grafana, you can import the pre-made Node Exporter dashboard (ID: 13978); a file-based alternative to manual import is sketched after this list
  3. This dashboard provides visualizations for CPU, memory, disk, and network metrics
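
As an alternative to importing dashboards by hand, Grafana can load dashboard JSON files from disk at startup via file-based provisioning. A rough sketch of such a provisioning file; the folder name and paths are assumptions, not our actual setup:

apiVersion: 1

providers:
  - name: default
    folder: Monitoring
    type: file
    options:
      # Directory inside the Grafana container holding dashboard JSON,
      # e.g. the JSON exported for dashboard 13978.
      path: /var/lib/grafana/dashboards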

Integration with Grafana

Grafana uses Prometheus as a data source to create dashboards for the services described above:

  1. Host metrics from the Node Exporter (CPU, memory, disk, network)
  2. Django backend application metrics
  3. PostgreSQL database metrics
  4. Caddy web server metrics
  5. Prometheus self-monitoring metrics
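
Rather than adding the data source by hand, it can be provisioned from a file. A minimal sketch, assuming Grafana's file-based provisioning under /etc/grafana/provisioning/datasources/ (whether our Grafana image does this is an assumption):

apiVersion: 1

datasources:
  # Points Grafana at the Prometheus container over the Compose network.
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true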

Verifying Prometheus Setup

To verify that Prometheus is correctly scraping metrics from all targets:

  1. Access the Prometheus web UI at http://localhost:9090
  2. Navigate to Status > Targets
  3. Check that all targets show “UP” status
  4. If any target shows “DOWN”, check the target’s metrics endpoint directly

For example, to check if Caddy is exposing metrics correctly:

docker run --rm --network transcendence_network curlimages/curl curl http://caddy:80/metrics

Prometheus Self-Monitoring Dashboard

To monitor Prometheus itself, we’ve integrated a specialized Grafana dashboard that tracks the health and performance of our monitoring system. This “meta-monitoring” approach ensures we can quickly detect and resolve issues with our observability infrastructure.

Dashboard Setup

  1. In Grafana, go to Dashboards > Import
  2. Enter dashboard ID 3662 for the “Prometheus 2.0 Stats” dashboard
  3. Select your Prometheus data source
  4. Click Import

Key Metrics Monitored

The Prometheus self-monitoring dashboard provides visibility into:

  1. TSDB Performance
    • Head series, chunks, and sample counts
    • Storage compaction metrics
    • WAL operations and durations
  2. Resource Usage
    • Memory consumption (heap, stack, go memory)
    • Goroutine counts
    • CPU usage
  3. Scrape Performance
    • Scrape duration by target
    • Failed scrapes
    • Sample ingestion rate
  4. Rule Evaluation
    • Rule evaluation durations
    • Failed rule evaluations

This dashboard is especially valuable when troubleshooting performance issues or planning capacity for your monitoring infrastructure.

Alerting on Prometheus Health

Consider setting up alerts for:

  1. Any scrape target going down (up == 0)
  2. Failed or unusually slow scrapes
  3. Rule evaluation failures
  4. High memory usage or rapid TSDB growth on the Prometheus container
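
A sketch of what such rules could look like in /etc/prometheus/alerts.yaml follows; the alert names, thresholds, and durations are illustrative, not our actual rules:

groups:
  - name: prometheus-self-monitoring
    rules:
      # Fires when any scrape target has been unreachable for 5 minutes.
      - alert: TargetDown
        expr: up == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Target {{ $labels.job }} on {{ $labels.instance }} is down"
      # Fires when Prometheus fails to evaluate recording or alerting rules.
      - alert: PrometheusRuleEvaluationFailures
        expr: increase(prometheus_rule_evaluation_failures_total[5m]) > 0
        labels:
          severity: warning
        annotations:
          summary: "Prometheus reported rule evaluation failures in the last 5 minutes"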

References

Prometheus: Getting Started
https://prometheus.io/docs/prometheus/latest/getting_started/

Prometheus on GitHub
https://github.com/prometheus/prometheus

Monitoring Your Django Project with Prometheus and Grafana (Medium)
https://medium.com/@tommyraspati/monitoring-your-django-project-with-prometheus-and-grafana-b06a5ca78744

Node Exporter Quickstart and Dashboard (ID 13978)
https://grafana.com/grafana/dashboards/13978-node-exporter-quickstart-and-dashboard/

Django dashboard (ID 17658)
https://grafana.com/grafana/dashboards/17658-django/