42-transcendence

Caddy Metrics for Prometheus

Introduction

Caddy, our reverse proxy, exposes metrics in the Prometheus exposition format, which lets us monitor HTTP request volume, response times, and server health. We rely on these metrics to catch regressions and outages early.

Configuration

Enabling Metrics in Caddyfile

Caddy natively supports Prometheus metrics with minimal configuration. We’ve enabled metrics in our Caddyfile using the following directive:

metrics /metrics

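This line exposes Caddy's metrics at the /metrics path. From the host they are reachable at http://localhost:8080/metrics (assuming host port 8080 is mapped to the container's port 80), and from inside the Docker network at http://caddy:80/metrics. Note that recent Caddy releases (2.6 and later, as far as we know) collect per-request HTTP metrics only when metrics are also enabled in the global options block, so check the documentation for the deployed version if the caddy_http_* series are missing.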

Full Caddyfile Context

:80 {
    # Other directives omitted for brevity
    
    # Various handlers for api, static files, websockets, etc.
    
    # Expose Prometheus metrics
    metrics /metrics
}

Prometheus Configuration

In our prometheus.yml, we’ve configured Prometheus to scrape metrics from Caddy:

scrape_configs:
  - job_name: 'caddy'
    static_configs:
      - targets: ['caddy:80'] 

This configuration tells Prometheus to:

Name the job “caddy” for easy identification
Collect metrics from the Caddy service at its internal network address (port 80)
Use the default /metrics path (implicitly defined)
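
As a rough illustration of how those defaults combine, the sketch below assembles the URL Prometheus would scrape for a static target. The function name `scrape_url` is hypothetical; the `http` scheme and `/metrics` path mirror Prometheus's defaults for `scheme` and `metrics_path`.

```python
def scrape_url(target: str, scheme: str = "http", metrics_path: str = "/metrics") -> str:
    """Return the URL scraped for a static target such as 'caddy:80'.

    The defaults stand in for the values Prometheus applies when
    `scheme` and `metrics_path` are omitted from a scrape config.
    """
    # Normalize the path so both "metrics" and "/metrics" work.
    if not metrics_path.startswith("/"):
        metrics_path = "/" + metrics_path
    return f"{scheme}://{target}{metrics_path}"


if __name__ == "__main__":
    print(scrape_url("caddy:80"))
```

If the endpoint ever moves to a different path in the Caddyfile, `metrics_path` would need to be set explicitly in prometheus.yml to match.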

Available Metrics

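Caddy exposes the following key HTTP metrics. The names and labels below follow Caddy's documented metric set; label sets can vary between versions, so confirm them against the actual /metrics output:

Metric	Description	Example
`caddy_http_requests_total`	Counter of HTTP requests handled, labeled by server and handler	`caddy_http_requests_total{handler="reverse_proxy"}`
`caddy_http_request_duration_seconds`	Histogram of request durations, labeled by status code and method	`caddy_http_request_duration_seconds_count{code="200"}`
`caddy_http_response_size_bytes`	Histogram of response sizes in bytes	`caddy_http_response_size_bytes_sum`
`caddy_http_request_size_bytes`	Histogram of request sizes in bytes	`caddy_http_request_size_bytes_sum`
`caddy_http_request_errors_total`	Counter of errors encountered while handling requests	`caddy_http_request_errors_total{handler="reverse_proxy"}`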
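
To make the shape of these series concrete, here is a minimal sketch that parses a small excerpt of exposition text and totals a histogram's `_count` series by status code. The sample values, the `srv0` server name, and the helper `count_by_label` are illustrative assumptions, not output from our deployment; the parser handles only simple `name{labels} value` lines and skips anything else.

```python
import re
from collections import defaultdict

# Hypothetical excerpt in the Prometheus text exposition format.
SAMPLE = """\
# TYPE caddy_http_request_duration_seconds histogram
caddy_http_request_duration_seconds_count{code="200",handler="reverse_proxy",method="GET",server="srv0"} 120
caddy_http_request_duration_seconds_count{code="404",handler="file_server",method="GET",server="srv0"} 7
caddy_http_request_duration_seconds_count{code="200",handler="file_server",method="GET",server="srv0"} 30
caddy_http_requests_total{handler="reverse_proxy",server="srv0"} 127
"""

# metric name, optional {label="value",...} block, then the sample value
SERIES = re.compile(r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$')
LABEL = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def count_by_label(text: str, metric: str, label: str) -> dict:
    """Sum the values of `metric`, grouped by the value of `label`."""
    totals = defaultdict(float)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SERIES.match(line)
        if match is None or match.group("name") != metric:
            continue
        labels = dict(LABEL.findall(match.group("labels") or ""))
        totals[labels.get(label, "")] += float(match.group("value"))
    return dict(totals)


if __name__ == "__main__":
    print(count_by_label(SAMPLE, "caddy_http_request_duration_seconds_count", "code"))
```

For anything beyond a quick check, an established exposition-format parser is a safer choice than a hand-written regex.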

Logging and Metrics Correlation

Our Caddy configuration includes structured JSON logging, which complements metrics collection:

log {
    output file /var/log/caddy/access.log {
        roll_size 100MiB
        roll_keep 5
        roll_keep_for 100d
    }
    output stdout
    format json
    level INFO
}

These logs are collected by Logstash (via Docker’s GELF driver), allowing correlation between metrics anomalies and specific log events.
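
As a sketch of what that correlation can look like in practice, the snippet below filters JSON access-log lines for server errors inside the time window of a metrics anomaly. It assumes Caddy's default access-log fields (`ts` as Unix seconds, `status`, and `request.uri`); the sample entries, the URIs, and the helper name `errors_in_window` are hypothetical.

```python
import json

# Hypothetical access-log lines in Caddy's JSON format (fields trimmed).
SAMPLE_LOG = """\
{"level":"info","ts":1700000000.1,"logger":"http.log.access","msg":"handled request","request":{"method":"GET","uri":"/api/users"},"duration":0.012,"status":200}
{"level":"error","ts":1700000042.5,"logger":"http.log.access","msg":"handled request","request":{"method":"POST","uri":"/api/login"},"duration":1.8,"status":502}
{"level":"error","ts":1700000300.0,"logger":"http.log.access","msg":"handled request","request":{"method":"GET","uri":"/api/users"},"duration":0.4,"status":500}
"""


def errors_in_window(lines, start_ts, end_ts, min_status=500):
    """Return (ts, status, uri) for entries in [start_ts, end_ts] with status >= min_status."""
    hits = []
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            entry = json.loads(raw)
        except json.JSONDecodeError:
            continue  # ignore lines that are not valid JSON
        ts = entry.get("ts")
        status = entry.get("status")
        if not isinstance(ts, (int, float)) or not isinstance(status, int):
            continue  # not an access-log entry in the expected shape
        if start_ts <= ts <= end_ts and status >= min_status:
            hits.append((ts, status, entry.get("request", {}).get("uri", "")))
    return hits


if __name__ == "__main__":
    # Window taken from a hypothetical spike seen on a dashboard.
    for hit in errors_in_window(SAMPLE_LOG.splitlines(), 1700000000, 1700000060):
        print(hit)
```

In our setup the same filter would normally be expressed as a query against the log store fed by Logstash; the script only illustrates which fields tie a log entry to a metrics time range.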

Creating Grafana Dashboards

Using these metrics, we’ve created Grafana dashboards for monitoring:

Request volume by endpoint
HTTP status code distribution
Response time percentiles
Error rates
Resource consumption
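
The panels above could be backed by queries along the following lines. This is a sketch, not a copy of our dashboard definitions: the PromQL expressions, the 5-minute rate window, the panel keys, and the helper `instant_query_url` are assumptions. Caddy's HTTP metrics carry `server` and `handler` labels rather than a per-path label, so "by endpoint" is approximated here by handler.

```python
from urllib.parse import urlencode

# Hypothetical PromQL expressions, one per dashboard panel.
PANEL_QUERIES = {
    "request_volume_by_handler": 'sum by (handler) (rate(caddy_http_requests_total[5m]))',
    "status_code_distribution": 'sum by (code) (rate(caddy_http_request_duration_seconds_count[5m]))',
    "response_time_p95": 'histogram_quantile(0.95, sum by (le) (rate(caddy_http_request_duration_seconds_bucket[5m])))',
    "error_ratio_5xx": (
        'sum(rate(caddy_http_request_duration_seconds_count{code=~"5.."}[5m]))'
        ' / sum(rate(caddy_http_request_duration_seconds_count[5m]))'
    ),
    "resident_memory_bytes": 'process_resident_memory_bytes{job="caddy"}',
}


def instant_query_url(expr: str, base_url: str = "http://localhost:9090") -> str:
    """Build a URL for Prometheus's instant-query HTTP API endpoint."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': expr})}"


if __name__ == "__main__":
    for panel, expr in PANEL_QUERIES.items():
        print(panel, instant_query_url(expr))
```

Pasting an expression into the Prometheus expression browser before building a panel is a quick way to confirm that the metric and label names match what the running Caddy version exports.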

Troubleshooting

If Caddy metrics aren’t appearing in Prometheus:

Verify the metrics endpoint is working:
```
curl http://localhost:8080/metrics
```
Check Prometheus targets:
```
http://localhost:9090/targets
```

If the logs, rather than the metrics, are missing, ensure the Caddy container uses the GELF logging driver with the expected tag:

logging:
  driver: gelf
  options:
    gelf-address: "udp://${LOG_HOST}:12201"
    tag: "caddy"
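
The target check above can also be scripted against the JSON returned by Prometheus's /api/v1/targets endpoint. The sketch below works on an already-parsed response; the sample payload, the error text, and the helper name `unhealthy_targets` are hypothetical, and only a few of the fields Prometheus returns are shown.

```python
# Hypothetical, trimmed response from GET http://localhost:9090/api/v1/targets
SAMPLE_RESPONSE = {
    "status": "success",
    "data": {
        "activeTargets": [
            {
                "labels": {"job": "caddy", "instance": "caddy:80"},
                "scrapeUrl": "http://caddy:80/metrics",
                "health": "down",
                "lastError": "connection refused",
            },
            {
                "labels": {"job": "prometheus", "instance": "localhost:9090"},
                "scrapeUrl": "http://localhost:9090/metrics",
                "health": "up",
                "lastError": "",
            },
        ]
    },
}


def unhealthy_targets(targets_response: dict, job: str = "caddy") -> list:
    """Return (scrape_url, last_error) for targets of `job` whose health is not 'up'."""
    active = targets_response.get("data", {}).get("activeTargets", [])
    problems = []
    for target in active:
        if target.get("labels", {}).get("job") != job:
            continue  # only inspect the requested job
        if target.get("health") != "up":
            problems.append((target.get("scrapeUrl", ""), target.get("lastError", "")))
    return problems


if __name__ == "__main__":
    for url, error in unhealthy_targets(SAMPLE_RESPONSE):
        print(f"{url}: {error}")
```

An empty result for the caddy job while the series are still missing usually points away from scraping and toward the metrics configuration in Caddy itself.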

Additional Customization

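The metrics directive accepts few options of its own. The only subdirective we are aware of in Caddy's documentation is disable_openmetrics, which turns off OpenMetrics content negotiation on the endpoint:

metrics /metrics {
    disable_openmetrics
}

Additional labels are controlled elsewhere: recent Caddy releases document a per_host option under the metrics global option, which adds a host label to the HTTP metrics. This is useful for separating metrics by site, but it is version-dependent, so verify it against the documentation for the deployed Caddy version before relying on it.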