Prometheus Best Practices: Lessons from Production
The Cardinality Explosion That Took Down Prometheus
requestCounter.inc({ user_id: req.user.id });The Cardinal Rule: Control Label Cardinality
What Is Cardinality?
http_requests_total{method="GET", endpoint="/api/users", status_code="200"}High-Cardinality Labels to AVOID
Low-Cardinality Labels to USE
How to Handle High-Cardinality Data
Metric Naming Conventions
The Standard Format
Rules I Always Follow
My Naming Patterns
Retention and Storage Management
Setting Appropriate Retention
Storage Sizing
Monitoring Prometheus Itself
Scraping Best Practices
Scrape Intervals
Scrape Timeout
Keep /metrics Fast
Scrape Only What You Need
Recording Rules: When and How
When to Use Recording Rules
My Production Recording Rules
Naming Convention for Recording Rules
Label Best Practices
Keep Label Sets Consistent
Use Meaningful Label Values
Don't Include Label Names in Values
Performance Optimization
Limit Label Cardinality Per Metric
Use rate() Correctly
rate() CorrectlyChoose Appropriate Time Ranges
Aggregate Before Histogram Quantiles
Security Best Practices
Don't Expose Sensitive Data
Protect the /metrics Endpoint
Common Mistakes and Fixes
Mistake 1: Using Gauges for Counters
Mistake 2: Not Using Histogram Buckets Wisely
Mistake 3: Instrumenting Everything
Mistake 4: Forgetting for in Alerts
for in AlertsMy Production Checklist
Key Takeaways
Last updated