Part 5: Building Observability Dashboards with CloudWatch

From Queries to Dashboards

Running ad hoc queries solved immediate problems, but I needed persistent visibility into my systems. Building CloudWatch dashboards transformed how my team monitors production. This part covers what I learned about creating effective observability dashboards.

CloudWatch Dashboards Overview

CloudWatch Dashboards provide real-time visualization of metrics and logs.

Dashboard Capabilities

  • Widgets: Multiple visualization types

  • Auto-refresh: Real-time updates

  • Time range: Flexible time windows

  • Sharing: Cross-account and public dashboards

  • Layouts: Customizable grid layout

When I Use Dashboards vs Queries

Dashboards for:

  • Continuous monitoring

  • Team visibility

  • SLA/SLO tracking

  • High-level health checks

Queries for:

  • Incident investigation

  • Deep-dive analysis

  • Ad hoc exploration

  • Root cause analysis

Creating Your First Dashboard

Via AWS Console

  1. Navigate to CloudWatch → Dashboards

  2. Click "Create dashboard"

  3. Name your dashboard

  4. Add widgets

Via AWS CLI
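
The CLI route goes through `aws cloudwatch put-dashboard`, which takes a dashboard name and a JSON body. As a minimal sketch, the script below builds such a body and writes it to a file. The function name `my-function`, the region, and the file name are placeholders I picked for illustration, not values from a real setup.

```python
import json


def build_dashboard_body(function_name: str, region: str) -> dict:
    """Return a one-widget dashboard body (function_name and region are placeholders)."""
    return {
        "widgets": [
            {
                "type": "metric",
                "x": 0, "y": 0, "width": 12, "height": 6,
                "properties": {
                    "metrics": [["AWS/Lambda", "Errors", "FunctionName", function_name]],
                    "stat": "Sum",
                    "period": 300,
                    "region": region,
                    "title": "Lambda errors",
                },
            }
        ]
    }


def write_dashboard_file(path: str, body: dict) -> None:
    """Serialize the body so the CLI can read it with file://."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(body, fh, indent=2)


# Example: write_dashboard_file("dashboard.json", build_dashboard_body("my-function", "us-east-1"))
```

With the file in place, the dashboard can be created with `aws cloudwatch put-dashboard --dashboard-name my-first-dashboard --dashboard-body file://dashboard.json`.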

Via Terraform

Widget Types

1. Number Widget

Displays a single metric value, which makes it a good fit for KPIs.

I use number widgets for:

  • Error counts

  • Active users

  • Request counts

  • Success rates
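
A number widget is a metric widget whose view is set to `singleValue`. The helper below is a sketch of how one could be generated; the default region, size, and period are my assumptions, and `MyApp/Errors` / `ErrorCount` stand in for whatever metric you want to surface.

```python
def number_widget(title, namespace, metric, stat="Sum", period=300,
                  region="us-east-1", x=0, y=0, width=6, height=3):
    """Build a single-value (number) widget for one metric."""
    return {
        "type": "metric",
        "x": x, "y": y, "width": width, "height": height,
        "properties": {
            "view": "singleValue",          # renders one large number
            "metrics": [[namespace, metric]],
            "stat": stat,
            "period": period,
            "region": region,
            "title": title,
        },
    }


error_count = number_widget("Errors (5 min)", "MyApp/Errors", "ErrorCount")
```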

2. Line Graph Widget

Shows metrics over time. This is the most common visualization.

I use line graphs for:

  • Response times over time

  • Request rates

  • Error rates

  • Resource utilization
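
As an illustration, here is a sketch of a line graph that plots three latency percentiles of the same metric. The namespace and metric name passed in the example are hypothetical; each series overrides the statistic through the per-metric options object.

```python
def latency_line_graph(namespace, metric, region="us-east-1"):
    """Build a time-series widget with p50, p95, and p99 of one metric."""
    percentiles = ["p50", "p95", "p99"]
    return {
        "type": "metric",
        "x": 0, "y": 0, "width": 12, "height": 6,
        "properties": {
            "view": "timeSeries",
            "stacked": False,  # True would render the series as a stacked area chart
            "metrics": [[namespace, metric, {"stat": p, "label": p}] for p in percentiles],
            "period": 60,
            "region": region,
            "title": f"{metric} percentiles",
        },
    }


# "MyApp/API" and "Latency" are made-up names for this example.
latency = latency_line_graph("MyApp/API", "Latency")
```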

3. Stacked Area Chart

Shows contribution of components to a total.

4. Log Insights Widget

Displays results from CloudWatch Logs Insights queries.
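
A log widget carries the query text itself, prefixed with the log group it reads from. The sketch below assumes a log group called `/myapp/production` and a plain-text `ERROR` marker in the messages; both are placeholders.

```python
def log_widget(title, log_group, query, region="us-east-1",
               x=0, y=0, width=24, height=6):
    """Build a Logs Insights widget that shows query results as a table."""
    return {
        "type": "log",
        "x": x, "y": y, "width": width, "height": height,
        "properties": {
            # The SOURCE clause names the log group; the rest is the query itself.
            "query": f"SOURCE '{log_group}' | {query}",
            "region": region,
            "view": "table",
            "title": title,
        },
    }


recent_errors = log_widget(
    "Recent errors",
    "/myapp/production",
    "fields @timestamp, @message | filter @message like /ERROR/ "
    "| sort @timestamp desc | limit 20",
)
```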

5. Pie Chart Widget

Shows distribution of categorical data.

6. Bar Chart Widget

Compares values across categories.

Dashboard Design Principles

Layout Strategy

I organize dashboards using this hierarchy:

Top Row: Golden Signals

  • Request rate

  • Error rate

  • Latency (p50, p95, p99)

  • Saturation

Middle Rows: Service Health

  • Individual service metrics

  • Dependencies

  • Resource utilization

Bottom Rows: Detailed Logs

  • Recent errors

  • Slow requests

  • Anomalies

Example Layout (Grid Coordinates)
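
The dashboard grid is 24 columns wide, and every widget declares its own `x`, `y`, `width`, and `height`. A small helper can compute those coordinates from a row description. This is a sketch of the hierarchy above, with row heights chosen arbitrarily and widget names assumed to be unique.

```python
GRID_COLUMNS = 24  # CloudWatch dashboards use a 24-column grid


def layout_rows(rows):
    """rows: list of (height, [widget names]); returns name -> grid coordinates.

    Each row splits the 24 columns evenly between its widgets.
    """
    placed, y = {}, 0
    for height, names in rows:
        width = GRID_COLUMNS // len(names)
        for i, name in enumerate(names):
            placed[name] = {"x": i * width, "y": y, "width": width, "height": height}
        y += height
    return placed


layout = layout_rows([
    (3, ["Request rate", "Error rate", "Latency", "Saturation"]),      # golden signals
    (6, ["Service metrics", "Dependencies", "Resource utilization"]),  # service health
    (6, ["Recent errors"]),                                            # detailed logs
])
```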

Color Coding Strategy

  • Green: Healthy, normal operation

  • Yellow: Warning, approaching threshold

  • Red: Critical, requires attention

  • Blue: Informational metrics
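
One way to apply this scheme in a widget is through horizontal annotations for thresholds and a per-metric `color` option. The hex values below are my own picks, and the threshold numbers in the example are illustrative only.

```python
COLORS = {
    "healthy": "#2ca02c",   # green
    "warning": "#f2b01e",   # yellow
    "critical": "#d62728",  # red
    "info": "#1f77b4",      # blue
}


def threshold_annotations(warning, critical):
    """Build the 'annotations' property with warning and critical lines."""
    return {
        "horizontal": [
            {"label": "Warning", "value": warning, "color": COLORS["warning"]},
            {"label": "Critical", "value": critical, "color": COLORS["critical"],
             "fill": "above"},  # shade everything above the critical line
        ]
    }


def colored_metric(namespace, metric, level):
    """Return a metric entry drawn in the color assigned to the given level."""
    return [namespace, metric, {"color": COLORS[level]}]


annotations = threshold_annotations(warning=50, critical=100)
```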

Real-World Dashboard Examples

Dashboard 1: Application Health Overview

This is my go-to dashboard for application health.

Dashboard 2: API Performance Dashboard

Focused on API health and performance:

Widgets:

  1. Request Rate Over Time

  2. Error Rate Percentage

  3. Latency by Endpoint

  4. Top Error Messages

  5. Slowest Endpoints

Dashboard 3: Infrastructure Health

Monitoring underlying AWS services:

Dashboard 4: Customer Experience Dashboard

Focused on user-facing metrics:

Key Metrics:

  1. Availability (Success Rate)

  2. Apdex Score

  3. Active Users

  4. User Errors
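
For reference, availability and Apdex reduce to simple arithmetic. Apdex counts requests at or below a target threshold T as satisfied, requests between T and 4T as tolerating (worth half), and the rest as frustrated. The sketch below uses a 500 ms threshold purely as an example.

```python
def apdex(latencies_ms, threshold_ms):
    """Apdex = (satisfied + tolerating / 2) / total samples."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    satisfied = sum(1 for v in latencies_ms if v <= threshold_ms)
    tolerating = sum(1 for v in latencies_ms if threshold_ms < v <= 4 * threshold_ms)
    return (satisfied + tolerating / 2) / len(latencies_ms)


def availability_pct(successes, total):
    """Success rate as a percentage of all requests."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * successes / total


# Two satisfied, one tolerating, one frustrated request.
score = apdex([100, 200, 600, 3000], threshold_ms=500)  # 0.625
```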

Log-Based Metrics

Transform log queries into CloudWatch metrics for alerting.

Creating Log Metric Filters

  1. Go to Log Group → Metric Filters

  2. Create filter pattern

  3. Assign metric name and namespace

  4. Use in dashboards and alarms

Example: Error Rate Metric

Filter pattern:

Metric:

  • Namespace: MyApp/Errors

  • Metric name: ErrorCount

  • Value: 1
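
The same filter can be created through the API. The sketch below builds the parameters for the CloudWatch Logs `put_metric_filter` call. The filter pattern `{ $.level = "ERROR" }` assumes JSON logs with a `level` field, and the filter name and log group are hypothetical, so adjust them to your own log format.

```python
def error_metric_filter(log_group, pattern='{ $.level = "ERROR" }'):
    """Build put_metric_filter parameters for the ErrorCount metric.

    The default pattern is an assumption: JSON log lines with a "level" field.
    """
    return {
        "logGroupName": log_group,
        "filterName": "ErrorCount",       # hypothetical filter name
        "filterPattern": pattern,
        "metricTransformations": [
            {
                "metricNamespace": "MyApp/Errors",
                "metricName": "ErrorCount",
                "metricValue": "1",       # emit 1 for every matching log event
                "defaultValue": 0.0,      # report 0 when nothing matches
            }
        ],
    }


def create_filter(params):
    """Send the filter to CloudWatch Logs (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder works without AWS access
    boto3.client("logs").put_metric_filter(**params)


params = error_metric_filter("/myapp/production")
```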

Query Log-Based Metrics

Real Example: Business Metrics from Logs

Track Order Completions:

Filter pattern:

Metric:

  • Namespace: MyApp/Business

  • Metric name: OrdersCompleted

  • Value: $.order_value

Dashboard query:
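
Because the filter publishes `$.order_value` as the metric value, the `Sum` statistic gives total order value while `SampleCount` gives the number of completed orders. A widget along these lines can show both; the hourly period, the axis assignment, and the region are assumptions on my part.

```python
def orders_widget(region="us-east-1"):
    """Plot order value (Sum) and order count (SampleCount) from one metric."""
    metric = ["MyApp/Business", "OrdersCompleted"]
    return {
        "type": "metric",
        "x": 0, "y": 0, "width": 12, "height": 6,
        "properties": {
            "view": "timeSeries",
            "metrics": [
                metric + [{"stat": "Sum", "label": "Order value", "yAxis": "left"}],
                metric + [{"stat": "SampleCount", "label": "Orders", "yAxis": "right"}],
            ],
            "period": 3600,
            "region": region,
            "title": "Orders completed (hourly)",
        },
    }


orders = orders_widget()
```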

Advanced Dashboard Patterns

Pattern 1: RED Method Dashboard

Rate, Errors, Duration
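
As a sketch of a RED row, the helper below maps the three signals onto Application Load Balancer metrics: `RequestCount` for rate, `HTTPCode_Target_5XX_Count` for errors, and `TargetResponseTime` for duration. Using an ALB as the source is my assumption, and the load balancer identifier in the example is a placeholder.

```python
def red_widgets(load_balancer, region="us-east-1"):
    """Build three side-by-side widgets: Rate, Errors, Duration."""
    def widget(x, title, metric, stat):
        return {
            "type": "metric",
            "x": x, "y": 0, "width": 8, "height": 6,
            "properties": {
                "view": "timeSeries",
                "metrics": [["AWS/ApplicationELB", metric, "LoadBalancer", load_balancer]],
                "stat": stat,
                "period": 60,
                "region": region,
                "title": title,
            },
        }

    return [
        widget(0, "Rate", "RequestCount", "Sum"),
        widget(8, "Errors", "HTTPCode_Target_5XX_Count", "Sum"),
        widget(16, "Duration (p95)", "TargetResponseTime", "p95"),
    ]


# "app/my-alb/1234567890abcdef" is a placeholder load balancer dimension value.
red = red_widgets("app/my-alb/1234567890abcdef")
```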

Pattern 2: USE Method Dashboard

Utilization, Saturation, Errors

Pattern 3: Golden Signals Dashboard

Latency, Traffic, Errors, Saturation

  1. Latency Widget:

  2. Traffic Widget:

  3. Errors Widget:

  4. Saturation Widget:

Pattern 4: Drill-Down Dashboard

Create linked dashboards:

  • High-level overview

  • Service-specific details

  • Component-level deep dive

Dashboard Variables

Use variables for flexibility:

Time Range Variable

All widgets respect the dashboard's time range selector.

Region Variable

Dynamic Dimensions

Alarm Integration

Connect alarms to dashboard widgets:

Widget with Alarm

Alarm Status Widget
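
Both integrations reference alarms by ARN. A metric widget can display an alarm's metric and threshold through the `annotations.alarms` property, and an `alarm` widget shows the current state of several alarms at once. The ARN below is a placeholder, and the sizes are arbitrary.

```python
def alarm_graph_widget(alarm_arn, title, region="us-east-1"):
    """Metric widget that graphs the alarm's metric together with its threshold."""
    return {
        "type": "metric",
        "x": 0, "y": 0, "width": 12, "height": 6,
        "properties": {
            # When an alarm annotation is used, the alarm supplies the metric.
            "annotations": {"alarms": [alarm_arn]},
            "region": region,
            "title": title,
        },
    }


def alarm_status_widget(alarm_arns, title="Alarm status"):
    """Widget that lists the current state of each alarm."""
    return {
        "type": "alarm",
        "x": 0, "y": 6, "width": 24, "height": 2,
        "properties": {"title": title, "alarms": list(alarm_arns)},
    }


# Placeholder ARN for illustration only.
ARN = "arn:aws:cloudwatch:us-east-1:123456789012:alarm:HighErrorRate"
status = alarm_status_widget([ARN])
graph = alarm_graph_widget(ARN, "Error rate vs. alarm threshold")
```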

Dashboard Auto-Refresh

Configure refresh intervals:

  • 10 seconds

  • 1 minute (default)

  • 2 minutes

  • 5 minutes

  • 15 minutes

For production monitoring, I use 1-minute refresh.

Sharing Dashboards

Within AWS Account

Dashboards are accessible to users with CloudWatch permissions.

Cross-Account Dashboards

Public Dashboards

Make dashboards publicly accessible (useful for status pages):

  1. Go to dashboard settings

  2. Enable "Public dashboard"

  3. Get shareable URL

Dashboard Best Practices

From my experience building dozens of dashboards:

1. Start with Golden Signals

Every dashboard should show:

  • Request rate

  • Error rate

  • Response time

  • Resource saturation

2. Use Consistent Time Windows

  • Real-time monitoring: Last 1 hour

  • Trend analysis: Last 24 hours

  • Capacity planning: Last 7 days

3. Group Related Metrics

Don't mix unrelated services in one row.

4. Include Context

Add text widgets with:

  • Dashboard purpose

  • Alert thresholds

  • Runbook links
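
Text widgets accept Markdown, so the context block can be generated alongside the other widgets. This is a minimal sketch; the purpose text, threshold values, and runbook URL in the example are invented.

```python
def context_widget(purpose, thresholds, runbook_url):
    """Build a text widget with purpose, alert thresholds, and a runbook link."""
    lines = [f"## {purpose}", "", "Alert thresholds:"]
    lines += [f"* {name}: {value}" for name, value in thresholds.items()]
    lines += ["", f"[Runbook]({runbook_url})"]
    return {
        "type": "text",
        "x": 0, "y": 0, "width": 24, "height": 4,
        "properties": {"markdown": "\n".join(lines)},
    }


context = context_widget(
    "API health overview",
    {"Error rate": "> 1% for 5 min", "p99 latency": "> 2 s"},
    "https://example.com/runbooks/api",  # placeholder URL
)
```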

5. Optimize Query Performance

  • Use specific time ranges

  • Apply filters early

  • Limit result sets

  • Cache expensive queries

6. Test Dashboard Load Time

Large dashboards can be slow:

  • Limit to 20-30 widgets

  • Use metric math when possible

  • Consider separate dashboards for details
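
A quick check like the following can run before a dashboard is deployed. It is only a sketch: the 30-widget limit mirrors the guideline above, and the cap on log widgets is an arbitrary number I chose because Logs Insights widgets tend to be the slowest to load.

```python
MAX_WIDGETS = 30       # guideline from the list above
MAX_LOG_WIDGETS = 5    # arbitrary cap, since log queries are usually the slowest


def check_dashboard_size(body, max_widgets=MAX_WIDGETS, max_log_widgets=MAX_LOG_WIDGETS):
    """Return a list of warnings for a dashboard body; empty means it looks fine."""
    widgets = body.get("widgets", [])
    log_count = sum(1 for w in widgets if w.get("type") == "log")
    warnings = []
    if len(widgets) > max_widgets:
        warnings.append(f"{len(widgets)} widgets exceeds the limit of {max_widgets}")
    if log_count > max_log_widgets:
        warnings.append(f"{log_count} log widgets exceeds the limit of {max_log_widgets}")
    return warnings
```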

7. Include Recent Logs

Always add a log widget showing recent errors:

Mobile Dashboard Access

The AWS Console Mobile Application provides access to CloudWatch dashboards while on call:

  1. Download the AWS Console Mobile Application (iOS/Android)

  2. Connect to your AWS account

  3. View dashboards

  4. Receive alarm notifications

Dashboard as Code

Manage dashboards in version control:

Dashboard JSON in Git

Automated Deployment
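
A deployment step can be as small as reading every JSON file in a directory and pushing it with `put_dashboard`. The sketch below assumes one dashboard per file, with the file name (minus `.json`) used as the dashboard name; that convention is mine, not something CloudWatch requires.

```python
import json
from pathlib import Path


def load_dashboards(directory):
    """Map dashboard name (file stem) to its JSON body, validating each file."""
    dashboards = {}
    for path in sorted(Path(directory).glob("*.json")):
        body = json.loads(path.read_text(encoding="utf-8"))  # fails fast on bad JSON
        if "widgets" not in body:
            raise ValueError(f"{path.name}: missing 'widgets' key")
        dashboards[path.stem] = json.dumps(body)
    return dashboards


def deploy(directory):
    """Create or update every dashboard found (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so validation can run in CI without AWS access
    client = boto3.client("cloudwatch")
    for name, body in load_dashboards(directory).items():
        client.put_dashboard(DashboardName=name, DashboardBody=body)
```

Running `load_dashboards` on its own in CI catches malformed files before anything reaches AWS.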

Terraform Module

Cost Optimization

Dashboard costs are minimal but consider:

  • Log Insights queries: $0.005/GB scanned

  • Custom metrics: $0.30/metric/month

  • API requests: Usually within free tier
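
To get a feel for the numbers, here is a rough estimator built on the two rates above. It assumes every execution of a Logs Insights widget scans the same amount of data and ignores free-tier allowances, so treat the output as an order-of-magnitude figure.

```python
LOGS_INSIGHTS_USD_PER_GB = 0.005    # rate quoted above
CUSTOM_METRIC_USD_PER_MONTH = 0.30  # rate quoted above


def monthly_cost(gb_per_query, queries_per_day, custom_metrics, days=30):
    """Estimate monthly USD cost of log queries plus custom metrics."""
    query_cost = gb_per_query * queries_per_day * days * LOGS_INSIGHTS_USD_PER_GB
    metric_cost = custom_metrics * CUSTOM_METRIC_USD_PER_MONTH
    return round(query_cost + metric_cost, 2)


# 2 GB scanned per query, 100 query runs a day, 10 custom metrics:
# 2 * 100 * 30 * 0.005 = 30.00 for queries, 10 * 0.30 = 3.00 for metrics.
estimate = monthly_cost(gb_per_query=2, queries_per_day=100, custom_metrics=10)  # 33.0
```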

Tips:

  1. Use longer time intervals for less-critical metrics

  2. Aggregate data before visualization

  3. Archive old dashboards

  4. Use efficient queries

Key Takeaways

  • CloudWatch dashboards provide real-time visibility

  • Widget types: number, line graph, stacked area, logs, pie, bar

  • Follow golden signals: latency, traffic, errors, saturation

  • Log Insights widgets bring queries into dashboards

  • Use log-based metrics for business KPIs

  • Organize dashboards by audience and use case

  • Include alarms and runbook links

  • Manage dashboards as code for version control

  • Test dashboard load times with many widgets

  • Mobile app enables on-call access

In Part 6, we'll explore CloudWatch query best practices, performance optimization, cost management, and query patterns that scale to production workloads.
