Prometheus Configuration: From Localhost to Production

The Configuration Mistake That Cost Me Hours

When I first deployed Prometheus to production, I used the same configuration I had in development: static targets hardcoded in prometheus.yml. Everything worked fine, until we scaled to 5 API instances.

Suddenly, I had to manually update the config file every time we deployed or scaled. New instance? Edit the file, reload Prometheus. Instance goes down? Edit the file again. It was unsustainable.

That's when I discovered service discovery. One configuration change, and Prometheus automatically found all instances in Kubernetes. No more manual updates. No more stale targets.

This article will save you from making the same mistake.

The prometheus.yml File Structure

The prometheus.yml file is the heart of Prometheus configuration. Here's the basic structure:

# Global configuration
global:
  scrape_interval: 15s          # How often to scrape targets
  evaluation_interval: 15s      # How often to evaluate rules
  scrape_timeout: 10s           # Timeout for scrape requests
  
  external_labels:
    cluster: 'production'       # Labels attached when data leaves this server (federation, remote write, alerts)
    environment: 'prod'

# Alerting configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - alertmanager:9093

# Rule files
rule_files:
  - 'alerts/*.yml'
  - 'recording_rules/*.yml'

# Scrape configurations
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  
  - job_name: 'my-api'
    static_configs:
      - targets:
          - 'api-1:3000'
          - 'api-2:3000'

Let me break down each section with real examples from my setups.

Global Configuration

Why 15 seconds?

  • Balance between granularity and overhead

  • Works well for most applications

  • Short enough to catch issues quickly

  • Long enough to not overwhelm targets

When to adjust:

  • Increase to 30s-60s: Low-priority metrics, cost reduction

  • Decrease to 5s-10s: Critical services, need fine granularity

Static Configuration: Simple Setup

Perfect for development and small deployments.

Docker Compose Configuration

My typical development setup:

Important: In Docker Compose, use service names (e.g., api:3000), not localhost.
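
A minimal sketch of such a Compose file, assuming a single application service named api that exposes metrics on port 3000. The service name, image name, and ports are placeholders rather than values from a real project:

```yaml
# docker-compose.yml (sketch; the "api" service and its image are hypothetical)
services:
  prometheus:
    image: prom/prometheus
    ports:
      - "9090:9090"
    volumes:
      # Mount the local config over the image's default config path
      - ./prometheus.yml:/etc/prometheus/prometheus.yml:ro

  api:
    image: my-api:latest        # hypothetical application image
    ports:
      - "3000:3000"
```

Both containers share the default Compose network, so a target of api:3000 in prometheus.yml resolves to the application container.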

Kubernetes Service Discovery

This changed everything for me. No more manual configuration!
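
A minimal sketch of pod discovery, assuming Prometheus runs inside the cluster with a service account that is allowed to list pods. The prometheus.io/scrape annotation is a widely used convention, not something Prometheus enforces, and the job name is arbitrary:

```yaml
scrape_configs:
  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod               # one target per declared container port of each pod
    relabel_configs:
      # Keep only pods annotated with prometheus.io/scrape: "true"
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
```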

Kubernetes Deployment with Annotations:
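
A sketch of a Deployment that opts in to scraping, assuming the conventional prometheus.io/* annotations. The application name, image, and port are hypothetical:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-api
spec:
  replicas: 5
  selector:
    matchLabels:
      app: my-api
  template:
    metadata:
      labels:
        app: my-api
      annotations:
        # Read by relabel rules in the scrape config, not by Kubernetes itself
        prometheus.io/scrape: "true"
        prometheus.io/port: "3000"
        prometheus.io/path: "/metrics"
    spec:
      containers:
        - name: my-api
          image: my-api:latest      # hypothetical image
          ports:
            - containerPort: 3000
```

The annotations only have an effect if the scrape configuration contains relabel rules that read them.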

Now when you deploy pods, Prometheus automatically discovers and scrapes them!

Relabeling: The Power Tool

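Relabeling rewrites labels before data is stored: relabel_configs acts on targets before the scrape, and metric_relabel_configs acts on scraped samples before ingestion. It's incredibly powerful.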

Common Relabeling Patterns

1. Rename Labels:
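
One way to do this, sketched for a Kubernetes pod job: copy a discovery label into a shorter name. The default action is replace, and __meta_* labels are discarded after relabeling, so the copy behaves like a rename:

```yaml
relabel_configs:
  # Copy the discovered pod name into a short "pod" label
  - source_labels: [__meta_kubernetes_pod_name]
    target_label: pod
```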

2. Drop Unnecessary Labels:
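
A sketch using the labeldrop action on scraped samples. The label name pod_template_hash is only an example; make sure the remaining labels still identify each series uniquely:

```yaml
metric_relabel_configs:
  # Remove the matching label from every scraped sample
  - action: labeldrop
    regex: pod_template_hash
```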

3. Keep Only Specific Targets:
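
A sketch with the keep action, assuming targets should be limited to two namespaces whose names are placeholders. Relabel regexes are anchored at both ends, so the pattern must match the whole value:

```yaml
relabel_configs:
  # Discard every target whose namespace is not listed here
  - source_labels: [__meta_kubernetes_namespace]
    action: keep
    regex: production|staging
```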

4. Drop Specific Targets:
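
The mirror image, sketched with the drop action. The app label and the debug- prefix are hypothetical:

```yaml
relabel_configs:
  # Discard targets whose "app" pod label starts with "debug-"
  - source_labels: [__meta_kubernetes_pod_label_app]
    action: drop
    regex: debug-.*
```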

5. Add Custom Labels:
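
A sketch that attaches a fixed label to every target of a job. With no source_labels, the default regex matches the empty source value and the replacement is written as-is. The label name and value are placeholders:

```yaml
relabel_configs:
  # Set team="platform" on every target in this job
  - target_label: team
    replacement: platform
```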

My Production Kubernetes Configuration

This is what I actually use in production:
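
A representative sketch of an annotation-driven pod job, modeled on the pattern commonly used with the prometheus.io/* annotation convention. It assumes in-cluster Prometheus with permission to list pods, and it is an illustration rather than an exact copy of any particular production file:

```yaml
scrape_configs:
  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Scrape only pods that opt in via annotation
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
      # Honor a custom metrics path if the pod declares one
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      # Rewrite the scrape address to use the annotated port
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__
      # Turn Kubernetes pod labels into Prometheus labels
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      # Record where each series came from
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod
```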

Scrape Interval Tuning

Different jobs need different intervals:
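
A sketch of per-job overrides, assuming one latency-critical service and one low-priority batch service. Both job names and targets are hypothetical. A job that sets no interval inherits the global value:

```yaml
global:
  scrape_interval: 15s          # default for jobs that do not override it

scrape_configs:
  - job_name: 'payments-api'    # critical: finer granularity
    scrape_interval: 5s
    scrape_timeout: 4s
    static_configs:
      - targets: ['payments-api:3000']

  - job_name: 'batch-reports'   # low priority: fewer samples, less load
    scrape_interval: 60s
    static_configs:
      - targets: ['batch-reports:3000']
```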

Configuration Best Practices from Experience

1. Use Meaningful Job Names

❌ Bad:
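
For illustration, a hypothetical job whose name says nothing about what is being scraped:

```yaml
scrape_configs:
  - job_name: 'job1'            # tells you nothing when it shows up in an alert
    static_configs:
      - targets: ['api-1:3000']
```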

✅ Good:
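
For illustration, the same hypothetical target under a name that identifies the service. The job name becomes the job label on every series, so it appears in queries, dashboards, and alerts:

```yaml
scrape_configs:
  - job_name: 'orders-api'      # names the service being scraped
    static_configs:
      - targets: ['api-1:3000']
```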

2. Set Scrape Timeouts Properly
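
A sketch for a slow endpoint, assuming a hypothetical exporter that needs several seconds to respond. The timeout must not exceed the interval, and leaving some headroom below the interval keeps one slow scrape from running into the next:

```yaml
scrape_configs:
  - job_name: 'slow-exporter'   # hypothetical target
    scrape_interval: 30s
    scrape_timeout: 20s         # must be <= scrape_interval
    static_configs:
      - targets: ['slow-exporter:9100']
```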

3. Use External Labels for Federated Setup
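
A sketch for one server in a federated setup, assuming each Prometheus gets its own label values. The values shown are placeholders:

```yaml
global:
  external_labels:
    cluster: 'production-eu'    # must differ between Prometheus servers
    environment: 'prod'
```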

These labels are crucial when federating multiple Prometheus servers.

4. Organize Rule Files
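
One possible layout, sketched with hypothetical directory names. Relative paths are resolved against the directory of the config file, and a glob is only allowed in the last path component:

```yaml
rule_files:
  - 'rules/alerts/*.yml'        # alerting rules, for example one file per service
  - 'rules/recording/*.yml'     # recording rules
```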

5. Enable Configuration Reload

Start Prometheus with --web.enable-lifecycle:
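
In a Docker Compose setup, this could look like the sketch below. Setting command replaces the image's default arguments, so the config file path has to be passed again:

```yaml
services:
  prometheus:
    image: prom/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--web.enable-lifecycle'    # enables the HTTP reload and quit endpoints
```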

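Then reload without a restart by sending an HTTP POST to the /-/reload endpoint, for example: curl -X POST http://localhost:9090/-/reload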

Monitoring Prometheus Itself

Always monitor your monitoring system:
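
The simplest form is a job in which Prometheus scrapes its own /metrics endpoint. This sketch assumes the default port 9090:

```yaml
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']   # Prometheus scraping itself
```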

Key Prometheus metrics to watch:

  • prometheus_tsdb_head_series - Number of time series

  • prometheus_target_scrapes_total - Total scrapes

  • scrape_duration_seconds - Scrape duration, recorded automatically for every target

  • prometheus_rule_evaluation_duration_seconds - Rule evaluation time

Validating Configuration

Before deploying, validate your config with promtool, which ships with Prometheus: promtool check config prometheus.yml

Configuration File Management

How I organize configurations:

Common Configuration Issues I've Fixed

Issue 1: Scrape Timeout Too Long
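
A sketch of the problem and one fix. Prometheus rejects a configuration whose scrape timeout is greater than its scrape interval, so the timeout has to come down or the interval has to go up:

```yaml
global:
  scrape_interval: 15s
  # scrape_timeout: 30s         # invalid: greater than scrape_interval
  scrape_timeout: 10s           # valid: fits inside the interval
```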

Issue 2: Wrong Target Address in Docker
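
A sketch of the fix, assuming the application runs as a Compose service named api (a placeholder). Inside the Prometheus container, localhost refers to that container itself, not to the host or to other services:

```yaml
scrape_configs:
  - job_name: 'my-api'
    static_configs:
      # Wrong inside a container: localhost is the Prometheus container itself
      # - targets: ['localhost:3000']
      # Right: the Compose service name resolves on the shared network
      - targets: ['api:3000']
```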

Issue 3: Forgetting to Reload

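After changing the config, tell Prometheus to reload it: send an HTTP POST to /-/reload when the lifecycle API is enabled, or send SIGHUP to the process. Edits to prometheus.yml are not picked up automatically.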

Key Takeaways

  1. Start simple - Static configs for development, service discovery for production

  2. Use Kubernetes annotations - Let Prometheus auto-discover targets

  3. Tune scrape intervals - Different intervals for different priorities

  4. Leverage relabeling - Clean up and organize labels

  5. Validate before deploying - Use promtool check config

  6. Enable lifecycle API - Reload without restart

  7. Monitor Prometheus - Your monitoring needs monitoring too

The configuration file seems complex at first, but once you understand the patterns, it becomes your most powerful tool for controlling exactly what and how Prometheus monitors.

In the next article, we'll set up alerting, turning these metrics into actionable notifications.

