Understanding Metrics and Data Model: The Building Blocks
The Metric That Saved My Weekend
I once spent an entire Saturday debugging why my API was "slow." Users reported sluggish response times, but when I checked, the average response time looked perfectly fine, around 100ms. Everything seemed normal.
The problem? I was tracking the wrong metric. I was looking at the average response time, which hid the fact that 5% of requests were taking 10+ seconds. The average was diluted by the 95% of fast requests.
Once I implemented a histogram metric and started tracking percentiles, the problem became obvious. One specific endpoint was occasionally hanging, causing terrible user experience for a small percentage of requests. I would have never found this with simple averages.
This taught me a critical lesson: choosing the right metric type is as important as collecting metrics at all.
What Is a Time Series?
Before diving into metric types, let's understand what Prometheus actually stores: time series.
A time series is a stream of timestamped values. Think of it as a spreadsheet where:
Each row is a measurement at a specific time
The columns are: timestamp, metric name, labels, and value
Example time series:
# Metric: http_requests_total
# Labels: method="GET", endpoint="/api/users", status="200"
Timestamp | Value
--------------------|-------
2026-01-02 10:00:00 | 1234
2026-01-02 10:00:15 | 1245
2026-01-02 10:00:30 | 1261
2026-01-02 10:00:45 | 1278

Each unique combination of metric name + labels creates a separate time series. This is crucial to understand because it affects storage and performance.
The Four Metric Types
Prometheus has four core metric types, each designed for specific use cases. Let me walk you through each with TypeScript examples.
1. Counter: Counting Things That Only Go Up
A Counter is a cumulative metric that only increases (or resets to zero on restart). Think of it like a car's odometer: it only goes up.
When to use:
Number of requests served
Total errors encountered
Number of tasks completed
Bytes sent/received
When NOT to use:
Current temperature (can go up or down) → use a Gauge
Number of active connections (can increase or decrease) → use a Gauge
TypeScript Example:
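A minimal sketch using the prom-client library; the metric and label names are illustrative:

```typescript
import { Counter } from 'prom-client';

// Total HTTP requests served, labeled by method, endpoint, and status code
const httpRequestsTotal = new Counter({
  name: 'http_requests_total',
  help: 'Total number of HTTP requests',
  labelNames: ['method', 'endpoint', 'status'],
});

// Call this once per handled request
httpRequestsTotal.inc({ method: 'GET', endpoint: '/api/users', status: '200' });
```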
Important: You never set a counter to a specific value. You only increment it:
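Continuing the sketch above:

```typescript
httpRequestsTotal.inc();  // increment by 1 (the default)
httpRequestsTotal.inc(5); // or by a specific amount

// There is no set() on prom-client counters; after a process restart the
// value starts from zero again, and PromQL's rate() handles those resets.
```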
Why counters are powerful: While the raw counter value isn't that useful (who cares that you've served 1,000,000 requests since startup?), you can use PromQL to calculate rates:
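For example, queries along these lines turn the raw counter into something actionable (the metric name matches the sketch above):

```promql
# Per-second request rate, averaged over the last 5 minutes
rate(http_requests_total[5m])

# Request rate broken down by endpoint
sum by (endpoint) (rate(http_requests_total[5m]))
```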
2. Gauge: Measuring Things That Go Up and Down
A Gauge is a metric that can increase or decrease. Think of it like a thermometer.
When to use:
Memory usage
Number of active connections
Queue size
CPU usage
Number of users currently online
TypeScript Example:
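Again a minimal sketch with prom-client; the connection-tracking gauge is illustrative:

```typescript
import { Gauge } from 'prom-client';

// Number of WebSocket connections currently open
const activeConnections = new Gauge({
  name: 'websocket_active_connections',
  help: 'Number of currently active WebSocket connections',
});

activeConnections.inc();  // a client connected
activeConnections.dec();  // a client disconnected
activeConnections.set(0); // or jump straight to an absolute value
```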
Key methods:
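In prom-client terms (continuing the gauge above; the backup gauge is a hypothetical second example):

```typescript
activeConnections.set(12); // set(value): assign an absolute value
activeConnections.inc(3);  // inc(n = 1): increase
activeConnections.dec();   // dec(n = 1): decrease

// setToCurrentTime() is handy for "timestamp of last event" gauges
const lastBackupTimestamp = new Gauge({
  name: 'last_backup_timestamp_seconds',
  help: 'Unix timestamp of the last successful backup',
});
lastBackupTimestamp.setToCurrentTime();
```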
3. Histogram: Tracking Distributions
A Histogram samples observations (like request durations or response sizes) and counts them in configurable buckets. This is what saved my weekend.
When to use:
Request/response times
Request/response sizes
Query durations
Any measurement where you need percentiles
Why it matters: Histograms let you answer questions like:
"What's the 95th percentile response time?" (95% of requests are faster than X)
"How many requests took longer than 1 second?"
"What's the median response time?"
TypeScript Example:
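A minimal sketch with prom-client; the buckets are spelled out explicitly so you can see exactly which ranges get counted:

```typescript
import { Histogram } from 'prom-client';

// Request duration in seconds, bucketed from 5ms up to 10s
const httpRequestDuration = new Histogram({
  name: 'http_request_duration_seconds',
  help: 'Duration of HTTP requests in seconds',
  labelNames: ['method', 'endpoint', 'status'],
  buckets: [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10],
});

// Option 1: observe an already-measured duration (in seconds)
httpRequestDuration.observe(
  { method: 'GET', endpoint: '/api/users', status: '200' },
  0.087,
);

// Option 2: let prom-client time the operation for you
const end = httpRequestDuration.startTimer({ method: 'GET', endpoint: '/api/users' });
// ... handle the request ...
end({ status: '200' }); // stops the timer and records the duration
```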
Understanding Buckets:
Buckets define the ranges you care about. The example above creates buckets for:
5ms, 10ms, 25ms, 50ms, 100ms, 250ms, 500ms, 1s, 2.5s, 5s, 10s
Prometheus counts how many observations fall into each bucket. This allows calculating percentiles later.
Querying Histograms:
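For example, queries along these lines answer the three questions above:

```promql
# 95th percentile request duration over the last 5 minutes
histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))

# Requests per second that took longer than 1 second
sum(rate(http_request_duration_seconds_count[5m]))
  - sum(rate(http_request_duration_seconds_bucket{le="1"}[5m]))

# Median (50th percentile) request duration
histogram_quantile(0.5, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))
```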
What gets stored:
A histogram actually creates multiple time series:
http_request_duration_seconds_bucket{le="0.005"} - Count of requests ≤ 5ms
http_request_duration_seconds_bucket{le="0.01"} - Count of requests ≤ 10ms
... (one for each bucket)
http_request_duration_seconds_bucket{le="+Inf"} - Total count
http_request_duration_seconds_sum - Sum of all observed values
http_request_duration_seconds_count - Count of observations
4. Summary: Pre-Calculated Quantiles
A Summary is similar to a histogram but calculates quantiles on the client side (your application) instead of in Prometheus.
When to use:
When you need accurate quantiles
When you can't predict bucket sizes ahead of time
For simple percentile tracking without PromQL aggregation
When NOT to use:
When you need to aggregate across multiple instances
When you need flexible querying (histograms are better for this)
TypeScript Example:
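A minimal sketch with prom-client; the percentiles option is what makes the quantile calculation happen inside your application:

```typescript
import { Summary } from 'prom-client';

// Database query duration with client-side quantiles
const dbQueryDuration = new Summary({
  name: 'db_query_duration_seconds',
  help: 'Duration of database queries in seconds',
  percentiles: [0.5, 0.9, 0.99], // quantiles computed in-process
  maxAgeSeconds: 600,            // sliding window the quantiles cover
  ageBuckets: 5,
});

dbQueryDuration.observe(0.042); // record one query that took 42ms
```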
Histogram vs Summary:

|                      | Histogram                              | Summary                                    |
|----------------------|----------------------------------------|--------------------------------------------|
| Quantile calculation | Server-side (Prometheus)               | Client-side (your app)                     |
| Aggregation          | Can aggregate across instances         | Cannot aggregate                           |
| Flexibility          | Buckets fixed at instrumentation time  | Percentiles fixed at instrumentation time  |
| Resource usage       | More storage (per bucket)              | More CPU (calculating quantiles)           |
| Use case             | Most production scenarios              | Simple cases, single instance              |
My recommendation: Use Histograms for almost everything. They're more flexible and can be aggregated across multiple instances.
Labels: The Secret Sauce
Labels add dimensions to your metrics. This is where Prometheus becomes incredibly powerful.
Good Label Usage
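For example, labels with a small, known set of values (method, endpoint, status) keep queries expressive; a sketch:

```typescript
import { Counter } from 'prom-client';

const httpRequestsTotal = new Counter({
  name: 'http_requests_total',
  help: 'Total number of HTTP requests',
  labelNames: ['method', 'endpoint', 'status'], // each has a bounded set of values
});

// In PromQL you can then slice by any dimension, e.g.
//   sum by (status) (rate(http_requests_total[5m]))
//   sum by (endpoint) (rate(http_requests_total{method="POST"}[5m]))
httpRequestsTotal.inc({ method: 'POST', endpoint: '/api/orders', status: '201' });
```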
Label Cardinality: The Hidden Danger
Here's where you can shoot yourself in the foot. Each unique combination of labels creates a new time series.
Bad example (high cardinality):
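A sketch of the anti-pattern; a user ID label creates one new time series per user:

```typescript
import { Counter } from 'prom-client';

// DON'T: unbounded label values mean unbounded time series
const requestsByUser = new Counter({
  name: 'http_requests_by_user_total',
  help: 'HTTP requests per user',
  labelNames: ['user_id'], // grows with every new user
});
requestsByUser.inc({ user_id: 'user-48211' }); // thousands of users = thousands of series
```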
Good approach:
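Instead, keep the label values bounded and push per-user detail into logs or traces; a sketch:

```typescript
import { Counter } from 'prom-client';

// DO: label by dimensions with a small, fixed set of values
const httpRequests = new Counter({
  name: 'http_requests_total',
  help: 'Total number of HTTP requests',
  labelNames: ['endpoint', 'plan'], // e.g. 'free' | 'pro' | 'enterprise'
});
httpRequests.inc({ endpoint: '/api/users', plan: 'pro' });

// Per-user detail belongs in structured logs or traces, not in metric labels
console.log(JSON.stringify({ userId: 'user-48211', endpoint: '/api/users', msg: 'request handled' }));
```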
Rule of thumb: Keep label cardinality low. Labels with values that grow unbounded (user IDs, session IDs, request IDs) will cause problems.
Metric Naming Conventions
Prometheus has conventions that make metrics easier to understand and query:
Basic Rules
Use snake_case: http_requests_total, not httpRequestsTotal
Include the unit: _seconds, _bytes, _total
Suffix counters with _total: http_requests_total, errors_total
Use base units: seconds (not milliseconds), bytes (not KB), etc.
Examples from My Projects
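The exact metrics aren't reproduced here; these are illustrative names in the same spirit, following the rules above:

```
payment_attempts_total            # counter: suffixed with _total
order_queue_size                  # gauge: a plain noun, no suffix needed
email_send_duration_seconds      # histogram: base unit is seconds
cache_memory_bytes                # gauge: base unit is bytes
background_jobs_failed_total      # counter: subject + _total
```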
Practical TypeScript Setup
Here's how I structure metrics in a real TypeScript application:
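The original module isn't reproduced here, so this is a sketch of one reasonable structure, assuming prom-client and an Express server: a single metrics module owns the registry and metric objects, and a small middleware records counts and durations for every request.

```typescript
// metrics.ts - one module owns the registry and all metric objects
import { Registry, Counter, Histogram, Gauge, collectDefaultMetrics } from 'prom-client';

export const register = new Registry();
collectDefaultMetrics({ register }); // CPU, memory, event loop lag, etc.

export const httpRequestsTotal = new Counter({
  name: 'http_requests_total',
  help: 'Total number of HTTP requests',
  labelNames: ['method', 'route', 'status'],
  registers: [register],
});

export const httpRequestDuration = new Histogram({
  name: 'http_request_duration_seconds',
  help: 'Duration of HTTP requests in seconds',
  labelNames: ['method', 'route', 'status'],
  buckets: [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10],
  registers: [register],
});

export const activeConnections = new Gauge({
  name: 'http_active_connections',
  help: 'Number of in-flight HTTP requests',
  registers: [register],
});
```

```typescript
// server.ts - wire the metrics into an Express app
import express from 'express';
import { register, httpRequestsTotal, httpRequestDuration, activeConnections } from './metrics';

const app = express();

// Record a counter increment and a duration observation for every request
app.use((req, res, next) => {
  activeConnections.inc();
  const end = httpRequestDuration.startTimer();
  res.on('finish', () => {
    const labels = {
      method: req.method,
      route: req.route?.path ?? req.path,
      status: String(res.statusCode),
    };
    httpRequestsTotal.inc(labels);
    end(labels);
    activeConnections.dec();
  });
  next();
});

// Expose everything for Prometheus to scrape
app.get('/metrics', async (_req, res) => {
  res.set('Content-Type', register.contentType);
  res.send(await register.metrics());
});

app.listen(3000);
```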
Key Takeaways
Choose the right metric type:
Counter for cumulative values
Gauge for values that go up and down
Histogram for distributions (response times, sizes)
Summary rarely (prefer Histogram)
Understand time series:
Each unique metric + label combination = one time series
Keep label cardinality low
Use labels wisely:
Make queries powerful
Avoid high-cardinality labels (user IDs, request IDs)
Follow naming conventions:
snake_case
Include units
Suffix counters with _total
Histograms > Summaries:
Use histograms for almost everything
They're more flexible and aggregatable
In the next article, we'll explore Prometheus architecture: how all the pieces fit together, from scraping to storage to alerting.
Previous: Introduction to Prometheus Next: Prometheus Architecture