Visualization with Grafana: Making Metrics Beautiful and Useful

The Dashboard That Changed Everything

For weeks after implementing Prometheus, I queried metrics through the Prometheus UI. It worked, but it was clunky. I had to remember complex PromQL queries. I couldn't see trends at a glance. My team couldn't easily check system health.

Then I spent an afternoon setting up Grafana. That evening, my team lead walked by my desk, saw the dashboard on my screen, and stopped. "Wait, is that our API? I can actually see what's happening. This is incredible."

Within a week, everyone on the team had the dashboard bookmarked. We caught issues faster. Stakeholders could see system health without asking me. And most importantly, I could understand system behavior at a glance instead of writing queries.

Good visualization turns raw metrics into understanding.

Setting Up Grafana with Prometheus

Installation with Docker Compose

# docker-compose.yml
version: '3.8'

services:
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus-data:/prometheus
  
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_USER=admin
      - GF_SECURITY_ADMIN_PASSWORD=admin
      - GF_USERS_ALLOW_SIGN_UP=false
    volumes:
      - grafana-data:/var/lib/grafana
      - ./grafana/provisioning:/etc/grafana/provisioning
    depends_on:
      - prometheus

volumes:
  prometheus-data:
  grafana-data:

Start everything:

Access Grafana at http://localhost:3000 (admin/admin).

Adding Prometheus as a Data Source

Manual Setup:

  1. Go to Configuration β†’ Data Sources

  2. Click "Add data source"

  3. Select "Prometheus"

  4. URL: http://prometheus:9090 (in Docker) or http://localhost:9090 (local)

  5. Click "Save & Test"

Automated Setup (Provisioning):

Now Grafana automatically connects to Prometheus on startup.

Building Your First Dashboard

Let's build a comprehensive API monitoring dashboard step by step.

Panel 1: Request Rate

What: Requests per second over time

Query:

Panel Configuration:

  • Visualization: Time series (Graph)

  • Title: "Requests per Second"

  • Y-axis label: "req/s"

  • Legend: "Total Requests"

Advanced: Show by endpoint:

Panel 2: Error Rate

What: Percentage of requests returning 5xx errors

Query:

Panel Configuration:

  • Visualization: Time series

  • Title: "Error Rate (%)"

  • Y-axis: Unit = "percent (0-100)"

  • Thresholds: Yellow at 1%, Red at 5%

  • Alert: > 5% for 5 minutes

Panel 3: Response Time Percentiles

What: P50, P95, P99 latency

Queries:

Panel Configuration:

  • Visualization: Time series

  • Title: "Response Time Percentiles"

  • Y-axis: Unit = "seconds (s)"

  • Legend: "P50", "P95", "P99"

  • Thresholds: Yellow at 0.5s, Red at 1s

Panel 4: Status Code Breakdown

What: Traffic distribution by status code

Query:

Panel Configuration:

  • Visualization: Bar gauge or Pie chart

  • Title: "Requests by Status Code"

  • Legend: Show values and percentages

Panel 5: Memory Usage

What: Node.js heap usage over time

Query:

Panel Configuration:

  • Visualization: Time series with fill

  • Title: "Memory Usage (%)"

  • Y-axis: Unit = "percent (0-100)", Min = 0, Max = 100

  • Thresholds: Yellow at 70%, Red at 90%

Panel 6: Active Database Connections

What: Current active database connections

Query:

Panel Configuration:

  • Visualization: Stat (single number)

  • Title: "Active DB Connections"

  • Thresholds: Green < 15, Yellow < 18, Red >= 18

My Production API Dashboard Layout

Here's how I organize panels:

Dashboard Variables for Flexibility

Variables make dashboards dynamic and reusable.

Example: Environment Variable

Dashboard Settings β†’ Variables β†’ Add Variable:

  • Name: environment

  • Type: Query

  • Data source: Prometheus

  • Query: label_values(http_requests_total, environment)

  • Multi-value: Enable

  • Include All: Enable

Use in queries:

Now you can switch between dev/staging/production!

Example: Endpoint Variable

  • Name: endpoint

  • Query: label_values(http_requests_total, route)

  • Multi-value: Enable

Use in panels:

Filter dashboards by specific endpoints.

Example: Time Range Variable

Use Grafana's built-in time picker:

$__rate_interval automatically adjusts based on time range.

Essential Dashboard Panels for TypeScript APIs

Traffic Panel

Error Tracking

Performance Metrics

Resource Usage

Database Metrics

Advanced Visualization Techniques

Heatmaps for Latency Distribution

Visualization: Heatmap Query:

Format: Heatmap What it shows: Latency distribution over time (darker = more requests at that latency)

Table for Top Endpoints

Visualization: Table Queries:

Metric
Query

Endpoint

label_values(http_requests_total, route)

Requests/s

sum(rate(http_requests_total[5m])) by (route)

Error %

sum(rate(http_requests_total{status_code=~"5.."}[5m])) by (route) / sum(rate(http_requests_total[5m])) by (route) * 100

P95 Latency

histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le, route))

Alert State Panel

Shows current firing alerts:

Visualization: Alert list Options: Show "Alerting" and "No Data" states

Dashboard Organization Strategy

I organize dashboards by audience and purpose:

For Developers (Detailed)

  • API Performance Dashboard - All HTTP metrics

  • Database Performance Dashboard - Query performance, connections

  • Application Health Dashboard - Memory, CPU, errors

  • Business Metrics Dashboard - Signups, purchases, active users

For Operations (Overview)

  • System Overview Dashboard - High-level health across all services

  • Infrastructure Dashboard - Host metrics, disk, network

  • SLA Dashboard - Uptime, error rates, latency

For Management (Executive)

  • Service Health Dashboard - Red/yellow/green indicators

  • Business KPIs Dashboard - User metrics, revenue, conversion

Dashboard JSON Export/Import

Export for version control:

  1. Dashboard β†’ Share β†’ Export

  2. Save JSON to grafana/dashboards/api-performance.json

  3. Commit to Git

Auto-provision dashboards:

Place JSON files in grafana/dashboards/, and they load automatically!

Dashboard Best Practices from Experience

1. Use Consistent Time Ranges

All panels should use the same time range:

Not:

2. Set Meaningful Y-Axis Ranges

❌ Auto-scale on percentage metrics (0-100% might show as 99.9-100%)

βœ… Fix range:

  • Min: 0

  • Max: 100

3. Use Units

Always set appropriate units:

  • Time: seconds (s), milliseconds (ms)

  • Data: bytes, kilobytes, megabytes

  • Rate: ops/sec, req/sec

  • Percentage: percent (0-100)

4. Color Code Thresholds

  • Green: Normal operation

  • Yellow: Warning level

  • Red: Critical level

Example:

  • 0-70%: Green

  • 70-90%: Yellow

  • 90-100%: Red

5. Add Panel Descriptions

Click panel title β†’ Edit β†’ Panel β†’ Description:

Add dashboard links at the top:

  • API Performance β†’ Database Performance

  • Database Performance β†’ Infrastructure

  • Infrastructure β†’ Alert History

7. Use Annotations for Deploys

Mark deployments on graphs:

Now you can correlate performance changes with deployments!

My Essential Dashboard Collection

1. API Performance Dashboard

Focus: Request rates, errors, latency

Panels:

  • Request rate (line graph)

  • Error rate % (line graph with threshold)

  • P50/P95/P99 latency (multi-line graph)

  • Status code distribution (bar chart)

  • Top 10 endpoints by traffic (table)

  • Top 5 slowest endpoints (table)

2. Application Health Dashboard

Focus: Resource usage, application metrics

Panels:

  • Memory usage % (area graph)

  • Heap size used vs total (dual line)

  • Event loop lag (line graph)

  • Active connections (stat)

  • Error log rate (line graph)

3. Database Dashboard

Focus: Database performance

Panels:

  • Query rate by operation (stacked area)

  • P99 query latency (line graph)

  • Connection pool usage (dual line: active + idle)

  • Slow query count (stat with threshold)

  • Top tables by query count (table)

4. Business Metrics Dashboard

Focus: User behavior and revenue

Panels:

  • Active users (stat)

  • Signups per hour (bar chart)

  • Revenue per hour (area graph)

  • Conversion rate (line graph)

  • Popular features (table)

Grafana Alerting

You can also configure alerts in Grafana (though I prefer Prometheus alerts).

Create Alert:

  1. Edit panel

  2. Alert tab

  3. Create Alert

  4. Condition: WHEN last() OF query(A, 5m, now) IS ABOVE 5

  5. Notification: Choose channel

Notification Channels:

  • Email

  • Slack

  • PagerDuty

  • Webhook

  • Microsoft Teams

Performance Tips

1. Limit Time Series

Too many series slow down dashboards:

2. Use Recording Rules

For complex queries used in multiple panels:

Then in Grafana:

Much faster!

3. Adjust Refresh Rate

  • Real-time monitoring: 5-10 seconds

  • Regular monitoring: 30-60 seconds

  • Historical analysis: Manual refresh

4. Use Dashboard Caching

Configure in grafana.ini:

Key Takeaways

  1. Start with the basics - Request rate, errors, latency

  2. Use variables - Make dashboards flexible and reusable

  3. Organize by audience - Different dashboards for different needs

  4. Set proper units and thresholds - Make interpretation easy

  5. Export dashboards - Version control in Git

  6. Link related dashboards - Easy navigation

  7. Add descriptions - Help future you understand panels

  8. Optimize queries - Limit time series, use recording rules

A good dashboard tells a story about your system's health. It should answer questions before they're asked and make problems obvious at a glance.

In the final article, we'll cover best practices learned from running Prometheus in production.


Previous: Alerting with Prometheus Next: Prometheus Best Practices

Last updated