Understanding Metrics and Data Model: The Building Blocks

The Metric That Saved My Weekend

I once spent an entire Saturday debugging why my API was "slow." Users reported sluggish response times, but when I checked, the average response time looked perfectly fine: around 100ms. Everything seemed normal.

The problem? I was tracking the wrong metric. I was looking at the average response time, which hid the fact that 5% of requests were taking 10+ seconds. The average was diluted by the 95% of fast requests.

Once I implemented a histogram metric and started tracking percentiles, the problem became obvious. One specific endpoint was occasionally hanging, causing terrible user experience for a small percentage of requests. I would have never found this with simple averages.

This taught me a critical lesson: choosing the right metric type is as important as collecting metrics at all.

What Is a Time Series?

Before diving into metric types, let's understand what Prometheus actually stores: time series.

A time series is a stream of timestamped values. Think of it as a spreadsheet where:

  • Each row is a measurement at a specific time

  • The columns are: timestamp, metric name, labels, and value

Example time series:

# Metric: http_requests_total
# Labels: method="GET", endpoint="/api/users", status="200"

Timestamp           | Value
--------------------|-------
2026-01-02 10:00:00 | 1234
2026-01-02 10:00:15 | 1245
2026-01-02 10:00:30 | 1261
2026-01-02 10:00:45 | 1278

Each unique combination of metric name + labels creates a separate time series. This is crucial to understand because it affects storage and performance.

The Four Metric Types

Prometheus has four core metric types, each designed for specific use cases. Let me walk you through each with TypeScript examples.

1. Counter: Counting Things That Only Go Up

A Counter is a cumulative metric that only increases (or resets to zero on restart). Think of it like a car's odometer: it only goes up.

When to use:

  • Number of requests served

  • Total errors encountered

  • Number of tasks completed

  • Bytes sent/received

When NOT to use:

  • Current temperature (can go up or down) → use a Gauge

  • Number of active connections (can increase or decrease) → use a Gauge

TypeScript Example:
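
A minimal sketch using the prom-client library (a commonly used Prometheus client for Node.js); the metric name and labels are illustrative:

import { Counter } from 'prom-client';

// Total number of HTTP requests, broken down by method, endpoint, and status.
const httpRequestsTotal = new Counter({
  name: 'http_requests_total',
  help: 'Total number of HTTP requests',
  labelNames: ['method', 'endpoint', 'status'],
});

// Increment by one for each request handled (typically inside your request handler).
httpRequestsTotal.inc({ method: 'GET', endpoint: '/api/users', status: '200' });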

Important: You never set a counter to a specific value; you only increment it, as in the .inc() call above.

Why counters are powerful: While the raw counter value isn't that useful (who cares that you've served 1,000,000 requests since startup?), you can use PromQL to calculate rates:
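
For example, with the counter defined above:

# Per-second request rate, averaged over the last 5 minutes
rate(http_requests_total[5m])

# Per-second rate of server errors for one endpoint
rate(http_requests_total{endpoint="/api/users", status="500"}[5m])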

2. Gauge: Measuring Things That Go Up and Down

A Gauge is a metric that can increase or decrease. Think of it like a thermometer.

When to use:

  • Memory usage

  • Number of active connections

  • Queue size

  • CPU usage

  • Number of users currently online

TypeScript Example:
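
A minimal sketch with prom-client; the names are illustrative:

import { Gauge } from 'prom-client';

// Number of connections currently open: goes up and down over time.
const activeConnections = new Gauge({
  name: 'active_connections',
  help: 'Number of currently active connections',
});

activeConnections.inc();   // a connection was opened
activeConnections.dec();   // a connection was closed
activeConnections.set(42); // or set an absolute value you measured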

Key methods (in prom-client):

  • set(value) - assign an absolute value you measured

  • inc() / dec() - move the value up or down

  • setToCurrentTime() - record the current Unix timestamp (useful for "last successful run" metrics)

3. Histogram: Tracking Distributions

A Histogram samples observations (like request durations or response sizes) and counts them in configurable buckets. This is what saved my weekend.

When to use:

  • Request/response times

  • Request/response sizes

  • Query durations

  • Any measurement where you need percentiles

Why it matters: Histograms let you answer questions like:

  • "What's the 95th percentile response time?" (95% of requests are faster than X)

  • "How many requests took longer than 1 second?"

  • "What's the median response time?"

TypeScript Example:
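
A sketch with prom-client; the buckets are spelled out explicitly so you can see what is being counted:

import { Histogram } from 'prom-client';

// Request duration in seconds, with buckets from 5ms up to 10s.
const httpRequestDuration = new Histogram({
  name: 'http_request_duration_seconds',
  help: 'HTTP request duration in seconds',
  labelNames: ['method', 'endpoint', 'status'],
  buckets: [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10],
});

// Option 1: observe a value you measured yourself (in seconds).
httpRequestDuration.observe({ method: 'GET', endpoint: '/api/users', status: '200' }, 0.087);

// Option 2: let the client time the operation for you.
const end = httpRequestDuration.startTimer({ method: 'GET', endpoint: '/api/users' });
// ... handle the request ...
end({ status: '200' });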

Understanding Buckets:

Buckets define the ranges you care about. The example above creates buckets for:

  • 5ms, 10ms, 25ms, 50ms, 100ms, 250ms, 500ms, 1s, 2.5s, 5s, 10s

Prometheus counts how many observations fall into each bucket. This allows calculating percentiles later.

Querying Histograms:
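
These are the standard PromQL patterns, using the metric defined above:

# 95th percentile request duration over the last 5 minutes
histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))

# Median (50th percentile) request duration
histogram_quantile(0.5, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))

# Fraction of requests that took longer than 1 second
1 - (
  sum(rate(http_request_duration_seconds_bucket{le="1"}[5m]))
  /
  sum(rate(http_request_duration_seconds_count[5m]))
)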

What gets stored:

A histogram actually creates multiple time series:

  • http_request_duration_seconds_bucket{le="0.005"} - Count of requests ≤ 5ms

  • http_request_duration_seconds_bucket{le="0.01"} - Count of requests ≤ 10ms

  • ... (one for each bucket)

  • http_request_duration_seconds_bucket{le="+Inf"} - Total count

  • http_request_duration_seconds_sum - Sum of all observed values

  • http_request_duration_seconds_count - Count of observations

4. Summary: Pre-Calculated Quantiles

A Summary is similar to a histogram but calculates quantiles on the client side (your application) instead of in Prometheus.

When to use:

  • When you need accurate quantiles

  • When you can't predict bucket sizes ahead of time

  • For simple percentile tracking without PromQL aggregation

When NOT to use:

  • When you need to aggregate across multiple instances

  • When you need flexible querying (histograms are better for this)

TypeScript Example:
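
A sketch with prom-client; note that the quantiles are computed inside your process:

import { Summary } from 'prom-client';

// The same request-duration measurement, expressed as a Summary with
// client-side 50th/90th/99th percentiles.
const requestDurationSummary = new Summary({
  name: 'http_request_duration_seconds',
  help: 'HTTP request duration in seconds (client-side quantiles)',
  percentiles: [0.5, 0.9, 0.99],
});

requestDurationSummary.observe(0.087);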

Histogram vs Summary:

Feature              | Histogram                              | Summary
---------------------|----------------------------------------|-------------------------------------------
Quantile calculation | Server-side (Prometheus)               | Client-side (your app)
Aggregation          | Can aggregate across instances         | Cannot aggregate
Flexibility          | Buckets fixed at instrumentation time  | Percentiles fixed at instrumentation time
Resource usage       | More storage (per bucket)              | More CPU (calculating quantiles)
Use case             | Most production scenarios              | Simple cases, single instance

My recommendation: Use Histograms for almost everything. They're more flexible and can be aggregated across multiple instances.

Labels: The Secret Sauce

Labels add dimensions to your metrics. This is where Prometheus becomes incredibly powerful.

Good Label Usage
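
A sketch of labels that stay useful: each label has a small, bounded set of possible values (HTTP methods, route templates, status codes):

import { Counter } from 'prom-client';

// One metric, a few bounded dimensions.
const httpRequestsTotal = new Counter({
  name: 'http_requests_total',
  help: 'Total number of HTTP requests',
  labelNames: ['method', 'route', 'status_code'],
});

// Use the route template, not the raw URL, so /api/users/123 and /api/users/456
// land in the same time series.
httpRequestsTotal.inc({ method: 'GET', route: '/api/users/:id', status_code: '200' });

With labels like these you can slice the data freely in PromQL, for example sum(rate(http_requests_total[5m])) by (route), or filter down to status_code="500".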

Label Cardinality: The Hidden Danger

Here's where you can shoot yourself in the foot. Each unique combination of labels creates a new time series.

Bad example (high cardinality):
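
A hypothetical sketch of what not to do: a label whose values grow without bound.

import { Counter } from 'prom-client';

// BAD: user_id is unbounded, so every user creates a brand-new time series.
const userLogins = new Counter({
  name: 'user_logins_total',
  help: 'Total user logins',
  labelNames: ['user_id'],
});

userLogins.inc({ user_id: 'user-48213' }); // a million users = a million time series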

Good approach:
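
The same signal with bounded cardinality (again a sketch): count logins in aggregate, and keep per-user detail in logs or traces instead.

import { Counter } from 'prom-client';

// GOOD: each label has a small, fixed set of possible values.
const userLogins = new Counter({
  name: 'user_logins_total',
  help: 'Total user logins',
  labelNames: ['method', 'result'], // e.g. 'password' | 'oauth', 'success' | 'failure'
});

userLogins.inc({ method: 'password', result: 'success' });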

Rule of thumb: Keep label cardinality low. Labels with values that grow unbounded (user IDs, session IDs, request IDs) will cause problems.

Metric Naming Conventions

Prometheus has conventions that make metrics easier to understand and query:

Basic Rules

  1. Use snake_case: http_requests_total not httpRequestsTotal

  2. Include the unit: _seconds, _bytes, _total

  3. Suffix counters with _total: http_requests_total, errors_total

  4. Base units: seconds (not milliseconds), bytes (not KB), etc.

Examples from My Projects
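
The exact names vary from project to project, but they all follow the same shape. Illustrative examples in that style (not a verbatim list):

# Counters: cumulative, suffixed with _total
http_requests_total
background_jobs_failed_total

# Gauges: current values
queue_jobs_in_flight
db_connection_pool_size

# Histograms: distributions, named after their base unit
http_request_duration_seconds
http_response_size_bytes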

Practical TypeScript Setup

Here's how I structure metrics in a real TypeScript application:
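
A minimal sketch of that structure, assuming prom-client and Express; file names, routes, and metric names are illustrative:

// metrics.ts: one shared registry and all metric definitions in a single module.
import { Registry, Counter, Histogram, Gauge, collectDefaultMetrics } from 'prom-client';

export const registry = new Registry();

// Node.js process metrics (event loop lag, heap usage, and so on).
collectDefaultMetrics({ register: registry });

export const httpRequestsTotal = new Counter({
  name: 'http_requests_total',
  help: 'Total number of HTTP requests',
  labelNames: ['method', 'route', 'status_code'],
  registers: [registry],
});

export const httpRequestDuration = new Histogram({
  name: 'http_request_duration_seconds',
  help: 'HTTP request duration in seconds',
  labelNames: ['method', 'route', 'status_code'],
  buckets: [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10],
  registers: [registry],
});

export const activeConnections = new Gauge({
  name: 'active_connections',
  help: 'Number of currently active connections',
  registers: [registry],
});

// server.ts: instrument request handlers and expose /metrics for Prometheus to scrape.
import express from 'express';
import { registry, httpRequestsTotal, httpRequestDuration } from './metrics';

const app = express();

app.get('/api/users', (_req, res) => {
  const end = httpRequestDuration.startTimer({ method: 'GET', route: '/api/users' });
  res.json([{ id: 1, name: 'Ada' }]);
  httpRequestsTotal.inc({ method: 'GET', route: '/api/users', status_code: '200' });
  end({ status_code: '200' });
});

app.get('/metrics', async (_req, res) => {
  res.set('Content-Type', registry.contentType);
  res.end(await registry.metrics());
});

app.listen(3000);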

Key Takeaways

  1. Choose the right metric type:

    • Counter for cumulative values

    • Gauge for values that go up and down

    • Histogram for distributions (response times, sizes)

    • Summary rarely (prefer Histogram)

  2. Understand time series:

    • Each unique metric + label combination = one time series

    • Keep label cardinality low

  3. Use labels wisely:

    • Make queries powerful

    • Avoid high-cardinality labels (user IDs, request IDs)

  4. Follow naming conventions:

    • snake_case

    • Include units

    • Suffix counters with _total

  5. Histograms > Summaries:

    • Use histograms for almost everything

    • They're more flexible and aggregatable

In the next article, we'll explore Prometheus architecture: how all the pieces fit together, from scraping to storage to alerting.

