Part 7: Real-World CloudWatch Query Patterns

Production-Proven Query Patterns

After years of operating AWS infrastructure, I've developed a library of query patterns that solve real problems. This final part shares the production patterns I rely on daily.

Golden Signals Monitoring

The four golden signals (from Google's SRE book) form the foundation of my monitoring strategy.

1. Latency - Response Time Tracking

Pattern: P50/P95/P99 Latency

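# Set the lookback (e.g., last 1 hour) with the console time picker or the
# StartQuery API's start/end time; Logs Insights has no ago() function.
# Assumes a numeric duration field in milliseconds.
fields @timestamp, duration
| filter ispresent(duration)
| filter duration > 0
| stats
    count() as requests,
    pct(duration, 50) as p50_ms,
    pct(duration, 95) as p95_ms,
    pct(duration, 99) as p99_ms
    by bin(5m) as interval
| sort interval desc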

Pattern: Latency by Endpoint
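
This is a minimal sketch. It assumes each request log carries a hypothetical `endpoint` field alongside the millisecond `duration` field used in the P50/P95/P99 query. Sorting by p99 brings the endpoints with the worst tail latency to the top.

```
# Assumes hypothetical fields: endpoint (string), duration (ms)
filter ispresent(duration) and ispresent(endpoint)
| stats
    count(*) as requests,
    pct(duration, 50) as p50_ms,
    pct(duration, 99) as p99_ms
    by endpoint
| sort p99_ms desc
| limit 20
```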

Pattern: Slow Request Deep Dive
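
One way to sketch this is to list the individual slowest requests so you can open their full logs. The 1000 ms threshold and the `requestId` and `endpoint` fields are hypothetical. Adjust them to your own schema and latency target.

```
# Assumes hypothetical fields: requestId, endpoint, duration (ms)
fields @timestamp, requestId, endpoint, duration, @logStream
| filter duration > 1000
| sort duration desc
| limit 50
```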

2. Traffic - Request Rate Monitoring

Pattern: Requests Per Second
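
This sketch counts requests per one-minute bin and divides by 60 to get an average per-second rate. It assumes that only request-completion events carry a `duration` field.

```
# Assumes request-completion events carry a duration field
filter ispresent(duration)
| stats count(*) as requests_per_min, requests_per_min / 60 as avg_rps by bin(1m)
```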

Pattern: Traffic by Endpoint
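
This sketch ranks endpoints by request volume. It assumes a hypothetical `endpoint` field on each request log.

```
# Assumes a hypothetical endpoint field on each request log
filter ispresent(endpoint)
| stats count(*) as requests by endpoint
| sort requests desc
| limit 20
```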

Pattern: Concurrent Users
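
Logs record events rather than open sessions, so this sketch uses distinct users per five-minute window as a proxy for concurrency. It assumes a hypothetical `userId` field. Note that `count_distinct` returns an approximation when a field has very high cardinality.

```
# Assumes a hypothetical userId field
# Result is distinct users per window, not true concurrency
filter ispresent(userId)
| stats count_distinct(userId) as active_users by bin(5m)
```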

3. Errors - Error Rate and Types

Pattern: Error Rate Percentage
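
This sketch assumes structured logs with a hypothetical `level` field that is set to `ERROR` on failed requests. `strcontains()` returns 1 when the string contains the search value and 0 otherwise, so summing it counts errors within the same `stats` pass.

```
# Assumes hypothetical fields: level ("ERROR" on failures), duration (marks request logs)
filter ispresent(duration)
| stats
    count(*) as total,
    sum(strcontains(level, "ERROR")) as errors,
    errors * 100 / total as error_rate_pct
    by bin(5m)
```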

Pattern: Top Error Messages
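
This sketch groups errors by message text. It assumes hypothetical `level` and `errorMessage` fields. For unstructured errors you could group by `@message` instead, although IDs embedded in messages will fragment the counts.

```
# Assumes hypothetical fields: level, errorMessage
filter level = "ERROR"
| stats count(*) as occurrences by errorMessage
| sort occurrences desc
| limit 10
```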

Pattern: First Occurrence of New Errors
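
`earliest()` returns a field's value from the event with the earliest timestamp. Applying it to `@timestamp` for each error message therefore gives a first-seen time. This sketch assumes hypothetical `level` and `errorMessage` fields. Widen the time range so it covers your baseline period.

```
# Assumes hypothetical fields: level, errorMessage
filter level = "ERROR"
| stats
    earliest(@timestamp) as first_seen,
    latest(@timestamp) as last_seen,
    count(*) as occurrences
    by errorMessage
| sort first_seen desc
| limit 20
```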

4. Saturation - Resource Utilization

Pattern: Lambda Memory Usage
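
Logs Insights automatically parses Lambda `REPORT` lines into fields such as `@memorySize` and `@maxMemoryUsed`, both reported in bytes. This sketch is modeled on the Lambda sample queries in the Logs Insights documentation for spotting over-provisioned memory.

```
# Lambda REPORT lines; byte values converted to MB
filter @type = "REPORT"
| stats
    max(@memorySize / 1000 / 1000) as provisioned_mb,
    avg(@maxMemoryUsed / 1000 / 1000) as avg_used_mb,
    max(@maxMemoryUsed / 1000 / 1000) as max_used_mb,
    provisioned_mb - max_used_mb as over_provisioned_mb
```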

Pattern: Database Connection Pool
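
Connection pools rarely log their state by default. This sketch assumes the application periodically emits hypothetical `poolActive` and `poolMax` fields, for example from a metrics callback.

```
# Assumes hypothetical fields emitted by the app: poolActive, poolMax
filter ispresent(poolActive)
| stats
    max(poolActive) as peak_active,
    max(poolMax) as pool_size,
    peak_active * 100 / pool_size as peak_utilization_pct
    by bin(1m)
```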

SLO Tracking Patterns

Service Level Objectives (SLOs) turn reliability into explicit, measurable targets. These queries measure actual behavior against them.

Availability SLO (99.9%)
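
This is a request-based availability sketch. It assumes failed requests are logged with a hypothetical `level` field set to `ERROR`. Against a 99.9% target, any window where `availability_pct` falls below 99.9 is consuming error budget faster than planned.

```
# Assumes hypothetical fields: level ("ERROR" on failures), duration (marks request logs)
filter ispresent(duration)
| stats
    count(*) as total,
    sum(strcontains(level, "ERROR")) as failed,
    (total - failed) * 100 / total as availability_pct
    by bin(1h)
```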

Latency SLO (95% < 200ms)
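
"95% of requests under 200 ms" holds when the 95th-percentile latency is at or below 200 ms, so a sketch only needs `pct()`. Any hour where `p95_ms` exceeds 200 is an SLO violation. The sketch reuses the millisecond `duration` field from the P50/P95/P99 query.

```
# Assumes duration is in milliseconds; SLO target is p95 <= 200 ms
filter ispresent(duration)
| stats count(*) as requests, pct(duration, 95) as p95_ms by bin(1h)
```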

Error Budget Calculation
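
With a 99.9% SLO, the error budget is 0.1% of requests. For time-based availability over a 30-day window, that is 0.001 × 43,200 minutes = 43.2 minutes of full downtime. The remaining budget as a percentage is (0.001 × total - failed) / (0.001 × total) × 100, which simplifies to 100 - failed × 100000 / total. This sketch computes it over the query's time range and assumes a hypothetical `level` field.

```
# Assumes hypothetical fields: level, duration; SLO = 99.9%, so budget = 0.1% of requests
filter ispresent(duration)
| stats
    count(*) as total,
    sum(strcontains(level, "ERROR")) as failed,
    100 - failed * 100000 / total as budget_remaining_pct
```

Set the time range to the SLO window (for example, 30 days). A negative value means the budget is exhausted.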

Incident Response Patterns

These are the queries I run during production incidents.

Pattern: Spike Detection
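
This sketch ranks one-minute windows by error count so the worst spikes appear first. It assumes a hypothetical `level` field.

```
# Assumes a hypothetical level field
filter level = "ERROR"
| stats count(*) as errors by bin(1m)
| sort errors desc
| limit 10
```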

Pattern: Correlation Analysis
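
Putting traffic, errors, and tail latency in one result makes it easier to see whether they move together. This sketch assumes hypothetical `level` and millisecond `duration` fields.

```
# Assumes hypothetical fields: level, duration (ms)
filter ispresent(duration)
| stats
    count(*) as requests,
    sum(strcontains(level, "ERROR")) as errors,
    pct(duration, 99) as p99_ms
    by bin(1m)
```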

Pattern: Upstream Dependency Failures
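
This sketch assumes that error logs for outbound calls carry a hypothetical `dependency` field naming the downstream service, plus a hypothetical `level` field.

```
# Assumes hypothetical fields: level, dependency
filter level = "ERROR" and ispresent(dependency)
| stats count(*) as failures by dependency, bin(5m)
| sort failures desc
```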

Pattern: Cascading Failures
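
When a query spans several log groups, the built-in `@log` field identifies which group each event came from. This sketch counts errors per log group per minute. The group whose errors rise first is a likely origin. Matching `error` case-insensitively is a simplifying assumption about your message format.

```
# Select all related services' log groups before running
filter @message like /(?i)error/
| stats count(*) as errors by @log, bin(1m)
```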

Pattern: Recent Deployments
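
For Lambda, each log stream name embeds the function version in brackets, for example `[$LATEST]` or `[7]`. A sketch can therefore recover when each version first served traffic. Functions invoked only through `$LATEST` will show a single value. On other platforms, a hypothetical `version` field logged at startup could play the same role.

```
# Lambda log streams look like 2024/01/15/[$LATEST]<id> or 2024/01/15/[7]<id>
filter @type = "REPORT"
| parse @logStream /\[(?<functionVersion>[^\]]+)\]/
| stats earliest(@timestamp) as first_seen, count(*) as invocations by functionVersion
| sort first_seen desc
```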

Performance Troubleshooting Patterns

Pattern: N+1 Query Detection
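
The N+1 signature is a single request issuing many queries that are mostly the same statement. This sketch assumes a hypothetical per-query log line with `requestId` and a normalized `sql` field. The threshold of 20 is arbitrary.

```
# Assumes hypothetical fields: requestId, sql (normalized statement text)
filter ispresent(sql)
| stats count(*) as queries, count_distinct(sql) as distinct_statements by requestId
| filter queries > 20
| sort queries desc
| limit 20
```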

Pattern: Memory Leak Detection
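
Each Lambda execution environment writes to its own log stream. Tracking peak memory per stream over time therefore shows whether a warm environment keeps growing. A sketch:

```
# A stream whose max_used_mb climbs bin after bin suggests a leak
filter @type = "REPORT"
| stats max(@maxMemoryUsed / 1000 / 1000) as max_used_mb by @logStream, bin(30m)
```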

Pattern: Cache Efficiency
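
This sketch assumes a hypothetical `cacheStatus` field with values such as `HIT` and `MISS`.

```
# Assumes a hypothetical cacheStatus field ("HIT" / "MISS")
filter ispresent(cacheStatus)
| stats
    count(*) as lookups,
    sum(strcontains(cacheStatus, "HIT")) as hits,
    hits * 100 / lookups as hit_rate_pct
    by bin(15m)
```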

Pattern: Bottleneck Identification
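
Ranking operations by total time rather than average time catches fast operations that run often enough to dominate. This sketch assumes hypothetical `operation` and millisecond `duration` fields on timed spans.

```
# Assumes hypothetical fields: operation, duration (ms)
filter ispresent(operation)
| stats
    count(*) as calls,
    avg(duration) as avg_ms,
    sum(duration) as total_ms
    by operation
| sort total_ms desc
| limit 10
```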

Security Monitoring Patterns

Pattern: Failed Authentication Attempts
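
When CloudTrail delivers to CloudWatch Logs, failed AWS console sign-ins appear as `ConsoleLogin` events with `responseElements.ConsoleLogin` set to `Failure`. This sketch counts them by source IP and user name. Application logins would need your own event fields instead.

```
# Run against the CloudTrail log group
filter eventName = "ConsoleLogin" and responseElements.ConsoleLogin = "Failure"
| stats count(*) as failures by sourceIPAddress, userIdentity.userName
| sort failures desc
| limit 20
```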

Pattern: Privilege Escalation Attempts
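
This sketch over CloudTrail lists IAM calls commonly involved in privilege escalation. The list of event names is illustrative, not exhaustive.

```
# Run against the CloudTrail log group; event list is illustrative
filter eventSource = "iam.amazonaws.com" and eventName in ["AttachUserPolicy", "AttachRolePolicy", "PutUserPolicy", "PutRolePolicy", "AddUserToGroup", "CreateAccessKey", "UpdateAssumeRolePolicy"]
| fields @timestamp, userIdentity.arn, eventName, errorCode
| sort @timestamp desc
```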

Pattern: Suspicious Data Access
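
S3 object reads appear in CloudTrail only when data-event logging is enabled on the trail. With that in place, this sketch ranks principals by `GetObject` volume per hour.

```
# Requires S3 data events on the trail
filter eventSource = "s3.amazonaws.com" and eventName = "GetObject"
| stats count(*) as reads by userIdentity.arn, bin(1h)
| sort reads desc
| limit 20
```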

Pattern: Geographic Anomalies
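
As far as I know, Logs Insights does not geolocate IP addresses itself. This sketch therefore uses the number of distinct source IPs per principal in CloudTrail as a proxy. Mapping IPs to locations would need an enrichment step outside the query. The threshold of 5 is arbitrary.

```
# Proxy for geographic spread: distinct source IPs per principal
filter ispresent(sourceIPAddress)
| stats count_distinct(sourceIPAddress) as distinct_ips by userIdentity.arn
| filter distinct_ips > 5
| sort distinct_ips desc
```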

Pattern: Unusual API Usage
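
This sketch surfaces principals that repeatedly hit authorization errors in CloudTrail. That pattern is a common signal of probing or misconfigured credentials.

```
# Run against the CloudTrail log group
filter errorCode like /AccessDenied|UnauthorizedOperation/
| stats count(*) as denied_calls by userIdentity.arn, eventSource, eventName
| sort denied_calls desc
| limit 20
```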

Cost Optimization Patterns

Pattern: Lambda Cost Analysis
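
Lambda compute is billed in GB-seconds: billed duration in seconds times configured memory in GB. This sketch derives GB-seconds per function from `REPORT` lines. It converts `@memorySize` from bytes to MB, then divides by 1024 to get GB. Multiply the result by your region's per-GB-second price. Per-request charges are billed separately.

```
# Select all Lambda log groups; gb_seconds excludes per-request charges
filter @type = "REPORT"
| stats
    count(*) as invocations,
    sum(@billedDuration / 1000 * @memorySize / 1000 / 1000 / 1024) as gb_seconds
    by @log
| sort gb_seconds desc
```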

Pattern: API Gateway Bandwidth Usage
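
This sketch runs over API Gateway access logs. It assumes your access log format writes `$context.resourcePath` and `$context.responseLength` as JSON fields named `resourcePath` and `responseLength`, with the length stored as a number. Those field names are assumptions about your log format.

```
# Assumes access log fields: resourcePath, responseLength (numeric bytes)
filter ispresent(responseLength)
| stats
    count(*) as requests,
    sum(responseLength) as response_bytes,
    response_bytes / 1024 / 1024 as response_mib
    by resourcePath
| sort response_bytes desc
```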

Pattern: Idle Resources
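
This sketch counts invocations per Lambda log group over a long time range. Functions that were never invoked produce no log events at all, so they are missing from the result rather than listed with zero. Compare the output against your function inventory.

```
# Select all Lambda log groups and a long time range (e.g., 30 days)
filter @type = "REPORT"
| stats count(*) as invocations, latest(@timestamp) as last_invoked by @log
| sort invocations asc
```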

Pattern: Most Expensive Endpoints

Business Metrics Patterns

Pattern: User Journey Tracking
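
This sketch replays one user's events in order. It assumes hypothetical `userId`, `event`, and `page` fields. Replace the placeholder `user-123` with a real ID.

```
# Assumes hypothetical fields: userId, event, page
filter userId = "user-123"
| fields @timestamp, event, page, @log
| sort @timestamp asc
| limit 500
```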

Pattern: Conversion Funnel
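
This sketch counts distinct users at each funnel step. It assumes a hypothetical `event` field with the illustrative step names shown. It does not enforce step order per user, so treat it as an approximate funnel.

```
# Assumes hypothetical fields: event, userId; step names are illustrative
filter event in ["view_product", "add_to_cart", "checkout", "purchase"]
| stats count_distinct(userId) as users by event
```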

Pattern: Feature Usage
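
This sketch assumes hypothetical `feature` and `userId` fields on usage events.

```
# Assumes hypothetical fields: feature, userId
filter ispresent(feature)
| stats count(*) as uses, count_distinct(userId) as users by feature
| sort users desc
```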

Pattern: Revenue Tracking
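
This sketch assumes purchase events carry a hypothetical numeric `amount` field in a single currency.

```
# Assumes hypothetical fields: event, amount (numeric, single currency)
filter event = "purchase" and ispresent(amount)
| stats
    count(*) as orders,
    sum(amount) as revenue,
    revenue / orders as avg_order_value
    by bin(1d)
```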

Anomaly Detection Patterns

Pattern: Statistical Anomaly Detection
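
This sketch computes the per-hour mean and standard deviation with `stddev()`. A window whose mean sits several standard deviations away from a typical hour is worth investigating. The sketch leaves the thresholding to you or to a dashboard, and it assumes the millisecond `duration` field from the P50/P95/P99 query.

```
# Assumes duration is in milliseconds
filter ispresent(duration)
| stats
    count(*) as requests,
    avg(duration) as mean_ms,
    stddev(duration) as stddev_ms
    by bin(1h)
```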

Pattern: Sudden Traffic Changes

Pattern: New Error Types
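
Recent versions of Logs Insights include a `pattern` command that clusters similar messages, so new error shapes stand out even when IDs and timestamps differ. This sketch assumes a hypothetical `level` field. One way to spot new types is to run it over a recent window and over a baseline window, then compare the patterns.

```
# Assumes a hypothetical level field
filter level = "ERROR"
| pattern @message
```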

Capacity Planning Patterns

Pattern: Peak Traffic Analysis
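
This sketch finds the busiest hours over a long time range, such as the last 30 days. It uses the presence of `duration` to select request logs.

```
# Set a long time range (e.g., 30 days)
filter ispresent(duration)
| stats count(*) as requests by bin(1h)
| sort requests desc
| limit 10
```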

Pattern: Resource Scaling Triggers
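
For Lambda, a cold start adds an `Init Duration` entry to the `REPORT` line. The share of cold starts therefore shows how often new execution environments are being created as load grows. A sketch:

```
# Cold starts include "Init Duration" in the REPORT line
filter @type = "REPORT"
| stats
    count(*) as invocations,
    sum(strcontains(@message, "Init Duration")) as cold_starts,
    cold_starts * 100 / invocations as cold_start_pct
    by bin(5m)
```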

Debugging Patterns

Pattern: Request Tracing
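
This sketch follows one request across every selected log group. It assumes services log a shared hypothetical `requestId` (or correlation ID) field. `REQUEST_ID` is a placeholder.

```
# Select every log group the request may pass through
filter requestId = "REQUEST_ID"
| fields @timestamp, @log, @message
| sort @timestamp asc
```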

Pattern: User Session Debugging

Pattern: Error Context
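
This is a two-step sketch. First find recent errors and the log stream each came from, then read that stream around the error time. `STREAM_NAME` is a placeholder, and `level` is a hypothetical field.

```
# Step 1: find recent errors and their log streams (level is hypothetical)
filter level = "ERROR"
| fields @timestamp, @logStream, @message
| sort @timestamp desc
| limit 20

# Step 2 (separate query, time range narrowed around the error):
# filter @logStream = "STREAM_NAME"
# | fields @timestamp, @message
# | sort @timestamp asc
```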

Compliance and Audit Patterns

Pattern: Data Access Audit
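
This sketch produces an audit listing of S3 object access from CloudTrail, which requires data events on the trail. `BUCKET_NAME` is a placeholder.

```
# Requires S3 data events on the trail
filter eventSource = "s3.amazonaws.com" and requestParameters.bucketName = "BUCKET_NAME"
| fields @timestamp, userIdentity.arn, eventName, requestParameters.key, sourceIPAddress
| sort @timestamp desc
| limit 1000
```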

Pattern: PII Access Tracking
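
CloudTrail cannot tell which records are sensitive, so this depends on the application tagging its own access logs. This sketch assumes hypothetical `dataClassification`, `userId`, and `resource` fields.

```
# Assumes hypothetical app fields: dataClassification, userId, resource
filter dataClassification = "PII"
| stats count(*) as accesses, count_distinct(resource) as distinct_records by userId
| sort accesses desc
```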

Pattern: Change Tracking
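
This sketch over CloudTrail lists mutating API calls by event-name prefix. The prefix list is a heuristic, not a complete definition of a change.

```
# Run against the CloudTrail log group; prefix list is a heuristic
filter eventName like /^(Create|Delete|Put|Update|Modify|Attach|Detach)/
| fields @timestamp, userIdentity.arn, eventSource, eventName, errorCode
| sort @timestamp desc
| limit 200
```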

Key Takeaways

  • Golden signals (latency, traffic, errors, saturation) form the foundation of monitoring

  • Track SLOs and error budgets systematically

  • Have incident response queries ready before incidents occur

  • Monitor security continuously with automated queries

  • Use cost analysis patterns to optimize spending

  • Track business metrics alongside technical metrics

  • Implement anomaly detection for proactive alerting

  • Plan capacity based on historical trend analysis

  • Maintain audit trails for compliance

  • Save and organize your query library

Closing Thoughts

CloudWatch Logs Insights has been invaluable in my AWS journey. The query patterns in this series come from real production experience: countless incidents debugged, systems optimized, and insights discovered.

Start with the basics, build your query library incrementally, and always optimize for readability and performance. Most importantly, share your queries with your team - collaborative query development leads to better observability.

Next Steps

  1. Build your query library: Start with the patterns from this series

  2. Create dashboards: Visualize your most important queries

  3. Set up alerts: Turn your key signals into CloudWatch alarms (for example, via metric filters)

  4. Document your queries: Add comments and context

  5. Share with your team: Collaborative monitoring is more effective

  6. Iterate and improve: Refine queries based on real incidents


Happy querying! 🚀


Part of the CloudWatch Query 101 Series.
