Prometheus Configuration: From Localhost to Production
The Configuration Mistake That Cost Me Hours
The prometheus.yml File Structure
# Global configuration
global:
  scrape_interval: 15s       # How often to scrape targets
  evaluation_interval: 15s   # How often to evaluate rules
  scrape_timeout: 10s        # Timeout for scrape requests
  external_labels:
    cluster: 'production'    # Labels added to all metrics
    environment: 'prod'

# Alerting configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - alertmanager:9093

# Rule files
rule_files:
  - 'alerts/*.yml'
  - 'recording_rules/*.yml'

# Scrape configurations
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'my-api'
    static_configs:
      - targets:
          - 'api-1:3000'
          - 'api-2:3000'

Global Configuration
Static Configuration: Simple Setup
Docker Compose Configuration
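As a rough illustration, a Compose file for running Prometheus might look like the sketch below. It assumes that prometheus.yml sits next to the Compose file and that the official prom/prometheus image is used. The volume name and port mapping are illustrative choices, not taken from the original setup.

```yaml
# docker-compose.yml (hypothetical layout: prometheus.yml lives in the same directory)
services:
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"                        # expose the web UI and API on the host
    volumes:
      # Mount the config read-only at the image's default config path
      - ./prometheus.yml:/etc/prometheus/prometheus.yml:ro
      # Named volume so the TSDB survives container restarts
      - prometheus-data:/prometheus
    command:
      - "--config.file=/etc/prometheus/prometheus.yml"
      - "--storage.tsdb.path=/prometheus"

volumes:
  prometheus-data:
```

Setting command replaces the image's default arguments, so the config and storage paths are spelled out explicitly here.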
Kubernetes Service Discovery
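A minimal sketch of pod discovery, assuming Prometheus runs inside the cluster with a service account that is allowed to list and watch pods. The job name is hypothetical.

```yaml
scrape_configs:
  - job_name: 'kubernetes-pods'        # hypothetical job name
    kubernetes_sd_configs:
      - role: pod                      # other roles include node, service, endpoints, ingress
    relabel_configs:
      # Copy discovery metadata into regular labels so it is kept on the scraped series
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod
```

Labels that start with __meta_ are dropped after relabeling, which is why the namespace and pod name are copied into ordinary labels.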
Relabeling: The Power Tool
Common Relabeling Patterns
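The patterns below are a sketch of what is commonly seen in Kubernetes setups, not necessarily the exact rules used in this series. They assume pods carry the prometheus.io/scrape, prometheus.io/path, and prometheus.io/port annotations, which are a widely used convention rather than anything built into Prometheus.

```yaml
scrape_configs:
  - job_name: 'kubernetes-pods'        # hypothetical job name
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Pattern 1: keep only pods annotated with prometheus.io/scrape: "true"
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: 'true'
      # Pattern 2: override the metrics path when prometheus.io/path is set
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        regex: '(.+)'
        target_label: __metrics_path__
      # Pattern 3: rewrite the scrape address to use the port from prometheus.io/port
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: '([^:]+)(?::\d+)?;(\d+)'
        replacement: '$1:$2'
        target_label: __address__
      # Pattern 4: copy every Kubernetes pod label onto the scraped series
      - action: labelmap
        regex: '__meta_kubernetes_pod_label_(.+)'
```

In pattern 3 the source label values are joined with ";" before matching, so the regex captures the host from __address__ and the port from the annotation.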
My Production Kubernetes Configuration
Scrape Interval Tuning
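One way to tune intervals is to leave the global default alone and override it per job. The sketch below assumes a hypothetical batch-exporter job that changes slowly; the specific intervals are illustrative, not recommendations from measurements.

```yaml
global:
  scrape_interval: 15s                 # default for every job that does not override it

scrape_configs:
  - job_name: 'my-api'
    scrape_interval: 10s               # scraped more often than the default
    static_configs:
      - targets: ['api-1:3000', 'api-2:3000']

  - job_name: 'batch-exporter'         # hypothetical slow-moving job
    scrape_interval: 60s               # fewer samples, less storage and load
    static_configs:
      - targets: ['batch-exporter:9100']
```

Shorter intervals give finer resolution at the cost of more samples to store, so the override is usually reserved for the jobs that need it.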
Configuration Best Practices from Experience
1. Use Meaningful Job Names
2. Set Scrape Timeouts Properly
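Prometheus rejects a configuration in which a job's scrape_timeout is greater than its scrape_interval, so the two settings need to be chosen together. A minimal sketch, assuming a hypothetical slow exporter:

```yaml
scrape_configs:
  - job_name: 'slow-exporter'          # hypothetical job name
    scrape_interval: 30s
    scrape_timeout: 20s                # must not exceed scrape_interval
    static_configs:
      - targets: ['slow-exporter:9100']
```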
3. Use External Labels for Federated Setup
4. Organize Rule Files
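With the rule_files globs from the main configuration ('alerts/*.yml' and 'recording_rules/*.yml'), each file can hold one or more rule groups. The file name, group name, and alert below are hypothetical and only illustrate the layout.

```yaml
# alerts/availability.yml (hypothetical file matched by the 'alerts/*.yml' glob)
groups:
  - name: availability
    rules:
      - alert: TargetDown
        expr: up == 0                  # 'up' is 0 when the last scrape of a target failed
        for: 5m                        # condition must hold for 5 minutes before firing
        labels:
          severity: critical
        annotations:
          summary: "Target {{ $labels.instance }} of job {{ $labels.job }} is down"
```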
5. Enable Configuration Reload
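Reloading over HTTP is off by default and is turned on with the --web.enable-lifecycle flag. The fragment below sketches how that flag could be passed in a Compose service; the service layout is an assumption.

```yaml
services:
  prometheus:
    image: prom/prometheus:latest
    command:
      - "--config.file=/etc/prometheus/prometheus.yml"
      # Enables the HTTP lifecycle endpoints: POST /-/reload and POST /-/quit
      - "--web.enable-lifecycle"
```

With the flag set, an HTTP POST to /-/reload on the Prometheus port triggers a reload. Sending SIGHUP to the Prometheus process also reloads the configuration and does not need the flag.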
Monitoring Prometheus Itself
Validating Configuration
Configuration File Management
Common Configuration Issues I've Fixed
Issue 1: Scrape Timeout Too Long
Issue 2: Wrong Target Address in Docker
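Inside a container, localhost refers to that container, not to the host or to other services. The sketch below assumes Prometheus and the API run as Compose services on the same network and that the API service is named api-1, as in the main configuration.

```yaml
scrape_configs:
  - job_name: 'my-api'
    static_configs:
      # Wrong when Prometheus runs in a container: this points at the
      # Prometheus container itself, where nothing listens on port 3000.
      # - targets: ['localhost:3000']

      # Use the Compose service name, which Docker's internal DNS resolves
      # for containers on the same network.
      - targets: ['api-1:3000']
```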
Issue 3: Forgetting to Reload
Key Takeaways