Understanding Metrics and Data Model: The Building Blocks

The Metric That Saved My Weekend

I once spent an entire Saturday debugging why my API was "slow." Users reported sluggish response times, but when I checked, the average response time looked perfectly fine: around 100ms. Everything seemed normal.

The problem? I was tracking the wrong metric. I was looking at the average response time, which hid the fact that 5% of requests were taking 10+ seconds. The average was diluted by the 95% of fast requests.

Once I implemented a histogram metric and started tracking percentiles, the problem became obvious. One specific endpoint was occasionally hanging, causing terrible user experience for a small percentage of requests. I would have never found this with simple averages.

This taught me a critical lesson: choosing the right metric type is as important as collecting metrics at all.

What Is a Time Series?

Before diving into metric types, let's understand what Prometheus actually stores: time series.

A time series is a stream of timestamped values. Think of it as a spreadsheet where:

  • Each row is a measurement at a specific time

  • The columns are: timestamp, metric name, labels, and value

Example time series:

# Metric: http_requests_total
# Labels: method="GET", endpoint="/api/users", status="200"

Timestamp           | Value
--------------------|-------
2026-01-02 10:00:00 | 1234
2026-01-02 10:00:15 | 1245
2026-01-02 10:00:30 | 1261
2026-01-02 10:00:45 | 1278

Each unique combination of metric name + labels creates a separate time series. This is crucial to understand because it affects storage and performance.

The Four Metric Types

Prometheus has four core metric types, each designed for specific use cases. Let me walk you through each with TypeScript examples.

1. Counter: Counting Things That Only Go Up

A Counter is a cumulative metric that only increases (or resets to zero on restart). Think of it like a car's odometer: it only goes up.

When to use:

  • Number of requests served

  • Total errors encountered

  • Number of tasks completed

  • Bytes sent/received

When NOT to use:

  • Current temperature (can go up or down) → use a Gauge

  • Number of active connections (can increase or decrease) → use a Gauge

TypeScript Example:
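
A minimal sketch using the prom-client library (a commonly used Prometheus client for Node.js); the metric name and labels are illustrative:

import { Counter } from 'prom-client';

// Total number of HTTP requests, broken down by method, endpoint, and status.
const httpRequestsTotal = new Counter({
  name: 'http_requests_total',
  help: 'Total number of HTTP requests',
  labelNames: ['method', 'endpoint', 'status'],
});

// Increment by one for each request handled (typically inside your request handler).
httpRequestsTotal.inc({ method: 'GET', endpoint: '/api/users', status: '200' });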

Important: You never set a counter to a specific value; you only increment it, as in the .inc() call above.

Why counters are powerful: While the raw counter value isn't that useful (who cares that you've served 1,000,000 requests since startup?), you can use PromQL to calculate rates:
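
For example, with the counter defined above:

# Per-second request rate, averaged over the last 5 minutes
rate(http_requests_total[5m])

# Per-second rate of server errors for one endpoint
rate(http_requests_total{endpoint="/api/users", status="500"}[5m])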

2. Gauge: Measuring Things That Go Up and Down

A Gauge is a metric that can increase or decrease. Think of it like a thermometer.

When to use:

  • Memory usage

  • Number of active connections

  • Queue size

  • CPU usage

  • Number of users currently online

TypeScript Example:
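
A minimal sketch with prom-client; the names are illustrative:

import { Gauge } from 'prom-client';

// Number of connections currently open: goes up and down over time.
const activeConnections = new Gauge({
  name: 'active_connections',
  help: 'Number of currently active connections',
});

activeConnections.inc();   // a connection was opened
activeConnections.dec();   // a connection was closed
activeConnections.set(42); // or set an absolute value you measured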

Key methods (in prom-client):

  • set(value) - assign an absolute value you measured

  • inc() / dec() - move the value up or down

  • setToCurrentTime() - record the current Unix timestamp (useful for "last successful run" metrics)

3. Histogram: Tracking Distributions

A Histogram samples observations (like request durations or response sizes) and counts them in configurable buckets. This is what saved my weekend.

When to use:

  • Request/response times

  • Request/response sizes

  • Query durations

  • Any measurement where you need percentiles

Why it matters: Histograms let you answer questions like:

  • "What's the 95th percentile response time?" (95% of requests are faster than X)

  • "How many requests took longer than 1 second?"

  • "What's the median response time?"

TypeScript Example:
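
A sketch with prom-client; the buckets are spelled out explicitly so you can see what is being counted:

import { Histogram } from 'prom-client';

// Request duration in seconds, with buckets from 5ms up to 10s.
const httpRequestDuration = new Histogram({
  name: 'http_request_duration_seconds',
  help: 'HTTP request duration in seconds',
  labelNames: ['method', 'endpoint', 'status'],
  buckets: [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10],
});

// Option 1: observe a value you measured yourself (in seconds).
httpRequestDuration.observe({ method: 'GET', endpoint: '/api/users', status: '200' }, 0.087);

// Option 2: let the client time the operation for you.
const end = httpRequestDuration.startTimer({ method: 'GET', endpoint: '/api/users' });
// ... handle the request ...
end({ status: '200' });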

Understanding Buckets:

Buckets define the ranges you care about. The example above creates buckets for:

  • 5ms, 10ms, 25ms, 50ms, 100ms, 250ms, 500ms, 1s, 2.5s, 5s, 10s

Prometheus counts how many observations fall into each bucket. This allows calculating percentiles later.

Querying Histograms:
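
These are the standard PromQL patterns, using the metric defined above:

# 95th percentile request duration over the last 5 minutes
histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))

# Median (50th percentile) request duration
histogram_quantile(0.5, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))

# Fraction of requests that took longer than 1 second
1 - (
  sum(rate(http_request_duration_seconds_bucket{le="1"}[5m]))
  /
  sum(rate(http_request_duration_seconds_count[5m]))
)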

What gets stored:

A histogram actually creates multiple time series:

  • http_request_duration_seconds_bucket{le="0.005"} - Count of requests ≤ 5ms

  • http_request_duration_seconds_bucket{le="0.01"} - Count of requests ≤ 10ms

  • ... (one for each bucket)

  • http_request_duration_seconds_bucket{le="+Inf"} - Total count

  • http_request_duration_seconds_sum - Sum of all observed values

  • http_request_duration_seconds_count - Count of observations

4. Summary: Pre-Calculated Quantiles

A Summary is similar to a histogram but calculates quantiles on the client side (your application) instead of in Prometheus.

When to use:

  • When you need accurate quantiles

  • When you can't predict bucket sizes ahead of time

  • For simple percentile tracking without PromQL aggregation

When NOT to use:

  • When you need to aggregate across multiple instances

  • When you need flexible querying (histograms are better for this)

TypeScript Example:
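
A sketch with prom-client; note that the quantiles are computed inside your process:

import { Summary } from 'prom-client';

// The same request-duration measurement, expressed as a Summary with
// client-side 50th/90th/99th percentiles.
const requestDurationSummary = new Summary({
  name: 'http_request_duration_seconds',
  help: 'HTTP request duration in seconds (client-side quantiles)',
  percentiles: [0.5, 0.9, 0.99],
});

requestDurationSummary.observe(0.087);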

Histogram vs Summary:

Feature              | Histogram                              | Summary
---------------------|----------------------------------------|-------------------------------------------
Quantile calculation | Server-side (Prometheus)               | Client-side (your app)
Aggregation          | Can aggregate across instances         | Cannot aggregate
Flexibility          | Buckets fixed at instrumentation time  | Percentiles fixed at instrumentation time
Resource usage       | More storage (per bucket)              | More CPU (calculating quantiles)
Use case             | Most production scenarios              | Simple cases, single instance

My recommendation: Use Histograms for almost everything. They're more flexible and can be aggregated across multiple instances.

Labels: The Secret Sauce

Labels add dimensions to your metrics. This is where Prometheus becomes incredibly powerful.

Good Label Usage
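
A sketch of labels that stay useful: each label has a small, bounded set of possible values (HTTP methods, route templates, status codes):

import { Counter } from 'prom-client';

// One metric, a few bounded dimensions.
const httpRequestsTotal = new Counter({
  name: 'http_requests_total',
  help: 'Total number of HTTP requests',
  labelNames: ['method', 'route', 'status_code'],
});

// Use the route template, not the raw URL, so /api/users/123 and /api/users/456
// land in the same time series.
httpRequestsTotal.inc({ method: 'GET', route: '/api/users/:id', status_code: '200' });

With labels like these you can slice the data freely in PromQL, for example sum(rate(http_requests_total[5m])) by (route), or filter down to status_code="500".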

Label Cardinality: The Hidden Danger

Here's where you can shoot yourself in the foot. Each unique combination of labels creates a new time series.

Bad example (high cardinality):
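
A hypothetical sketch of what not to do: a label whose values grow without bound.

import { Counter } from 'prom-client';

// BAD: user_id is unbounded, so every user creates a brand-new time series.
const userLogins = new Counter({
  name: 'user_logins_total',
  help: 'Total user logins',
  labelNames: ['user_id'],
});

userLogins.inc({ user_id: 'user-48213' }); // a million users = a million time series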

Good approach:
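
The same signal with bounded cardinality (again a sketch): count logins in aggregate, and keep per-user detail in logs or traces instead.

import { Counter } from 'prom-client';

// GOOD: each label has a small, fixed set of possible values.
const userLogins = new Counter({
  name: 'user_logins_total',
  help: 'Total user logins',
  labelNames: ['method', 'result'], // e.g. 'password' | 'oauth', 'success' | 'failure'
});

userLogins.inc({ method: 'password', result: 'success' });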

Rule of thumb: Keep label cardinality low. Labels with values that grow unbounded (user IDs, session IDs, request IDs) will cause problems.

Metric Naming Conventions

Prometheus has conventions that make metrics easier to understand and query:

Basic Rules

  1. Use snake_case: http_requests_total not httpRequestsTotal

  2. Include the unit: _seconds, _bytes, _total

  3. Suffix counters with _total: http_requests_total, errors_total

  4. Base units: seconds (not milliseconds), bytes (not KB), etc.

Examples from My Projects
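
The exact names vary from project to project, but they all follow the same shape. Illustrative examples in that style (not a verbatim list):

# Counters: cumulative, suffixed with _total
http_requests_total
background_jobs_failed_total

# Gauges: current values
queue_jobs_in_flight
db_connection_pool_size

# Histograms: distributions, named after their base unit
http_request_duration_seconds
http_response_size_bytes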

Practical TypeScript Setup

Here's how I structure metrics in a real TypeScript application:
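
A minimal sketch of that structure, assuming prom-client and Express; file names, routes, and metric names are illustrative:

// metrics.ts: one shared registry and all metric definitions in a single module.
import { Registry, Counter, Histogram, Gauge, collectDefaultMetrics } from 'prom-client';

export const registry = new Registry();

// Node.js process metrics (event loop lag, heap usage, and so on).
collectDefaultMetrics({ register: registry });

export const httpRequestsTotal = new Counter({
  name: 'http_requests_total',
  help: 'Total number of HTTP requests',
  labelNames: ['method', 'route', 'status_code'],
  registers: [registry],
});

export const httpRequestDuration = new Histogram({
  name: 'http_request_duration_seconds',
  help: 'HTTP request duration in seconds',
  labelNames: ['method', 'route', 'status_code'],
  buckets: [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10],
  registers: [registry],
});

export const activeConnections = new Gauge({
  name: 'active_connections',
  help: 'Number of currently active connections',
  registers: [registry],
});

// server.ts: instrument request handlers and expose /metrics for Prometheus to scrape.
import express from 'express';
import { registry, httpRequestsTotal, httpRequestDuration } from './metrics';

const app = express();

app.get('/api/users', (_req, res) => {
  const end = httpRequestDuration.startTimer({ method: 'GET', route: '/api/users' });
  res.json([{ id: 1, name: 'Ada' }]);
  httpRequestsTotal.inc({ method: 'GET', route: '/api/users', status_code: '200' });
  end({ status_code: '200' });
});

app.get('/metrics', async (_req, res) => {
  res.set('Content-Type', registry.contentType);
  res.end(await registry.metrics());
});

app.listen(3000);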

Key Takeaways

  1. Choose the right metric type:

    • Counter for cumulative values

    • Gauge for values that go up and down

    • Histogram for distributions (response times, sizes)

    • Summary rarely (prefer Histogram)

  2. Understand time series:

    • Each unique metric + label combination = one time series

    • Keep label cardinality low

  3. Use labels wisely:

    • Make queries powerful

    • Avoid high-cardinality labels (user IDs, request IDs)

  4. Follow naming conventions:

    • snake_case

    • Include units

    • Suffix counters with _total

  5. Histograms > Summaries:

    • Use histograms for almost everything

    • They're more flexible and aggregatable

In the next article, we'll explore Prometheus architecture: how all the pieces fit together, from scraping to storage to alerting.

