Prometheus Architecture: How the Pieces Fit Together
The Day I Understood the Pull Model
When I first started with Prometheus, I was confused. Every other monitoring system I'd used required me to push metrics from my application to a central server. I had to configure where to send metrics, handle network failures, implement retry logic; it was a pain.
Then I discovered Prometheus's pull-based model, and it clicked. Instead of my application pushing metrics, Prometheus pulls them. My application just needs to expose an HTTP endpoint, and Prometheus does all the work of collecting metrics.
This inversion of control simplifies everything. My application doesn't need to know where Prometheus is, how many Prometheus servers exist, or what to do if Prometheus is down. It just exposes data and moves on.
Understanding this architecture changed how I think about monitoring.
The Core Components
Prometheus is more than just a metrics database. It's an ecosystem of components working together. Let's break down each piece.
1. Prometheus Server: The Heart
The Prometheus server is the core component. It:
Scrapes Metrics (Pulls Data):
Periodically fetches metrics from configured targets
Default interval: 15 seconds
Uses HTTP to pull from /metrics endpoints
Stores Time Series Data:
Local storage on disk (TSDB - Time Series Database)
Highly efficient storage format
Configurable retention (default: 15 days)
Evaluates Rules:
Recording rules: Pre-compute expensive queries
Alerting rules: Trigger alerts when conditions are met
Serves Queries:
HTTP API for querying data
PromQL query language
Powers Grafana dashboards and ad-hoc queries
2. Service Discovery
Prometheus needs to know what to scrape. You can configure targets in two ways:
Static Configuration:
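In its simplest form, you list targets by hand in prometheus.yml. A minimal sketch (the job name and addresses are placeholders):

```yaml
scrape_configs:
  - job_name: "api-service"            # becomes the job label on every scraped series
    static_configs:
      - targets: ["10.0.1.5:3000", "10.0.1.6:3000"]
```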
Dynamic Service Discovery: Prometheus integrates with:
Kubernetes (my most-used option)
Docker/Docker Swarm
AWS EC2
Azure
Consul
DNS
And many more
For my Kubernetes deployments, Prometheus automatically discovers new pods:
When I deploy a new pod with the annotation prometheus.io/scrape: "true", Prometheus automatically starts scraping it. No manual configuration needed.
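The scrape configuration behind that is the standard annotation-based pattern, roughly:

```yaml
scrape_configs:
  - job_name: "kubernetes-pods"
    kubernetes_sd_configs:
      - role: pod                      # discover every pod via the Kubernetes API
    relabel_configs:
      # Keep only pods that opt in with prometheus.io/scrape: "true"
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
      # Let pods override the metrics path with prometheus.io/path
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
```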
3. Pushgateway: The Exception to the Rule
The pull model works great for long-running services, but what about short-lived jobs? Batch jobs, cron jobs, or serverless functions might finish before Prometheus scrapes them.
That's where the Pushgateway comes in. It's an intermediary that allows short-lived jobs to push their metrics, which Prometheus then scrapes.
TypeScript Example:
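A minimal sketch with prom-client; the Pushgateway address, job name, and metric are illustrative:

```typescript
import { Pushgateway, Registry, Gauge } from "prom-client";

// Assumption: a Pushgateway is reachable at this address.
const registry = new Registry();
const gateway = new Pushgateway("http://pushgateway:9091", {}, registry);

const jobDuration = new Gauge({
  name: "batch_job_duration_seconds",
  help: "How long the nightly batch job took",
  registers: [registry],
});

async function runBatchJob(): Promise<void> {
  const start = Date.now();

  // ... the actual batch work goes here ...

  jobDuration.set((Date.now() - start) / 1000);

  // Push the collected metrics once, right before the process exits.
  await gateway.pushAdd({ jobName: "nightly-batch" });
}

runBatchJob().catch((err) => {
  console.error("batch job failed", err);
  process.exit(1);
});
```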
Important: The Pushgateway is meant for batch jobs, not as a replacement for the pull model. I learned this the hard way when I tried using it for regular services; it caused more problems than it solved.
4. Exporters: Monitoring Third-Party Systems
Exporters are small programs that expose metrics for systems that don't natively support Prometheus.
Common exporters I use:
Node Exporter: host metrics (CPU, memory, disk), default port 9100
PostgreSQL Exporter: database metrics, default port 9187
Redis Exporter: Redis metrics, default port 9121
NGINX Exporter: NGINX metrics, default port 9113
Blackbox Exporter: HTTP/TCP probing, default port 9115
Example: PostgreSQL Exporter
Instead of instrumenting PostgreSQL itself, I run the PostgreSQL exporter as a sidecar:
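A sketch of that sidecar layout in a Kubernetes pod spec; the image tags and connection string are placeholders:

```yaml
containers:
  - name: postgres
    image: postgres:16
  - name: postgres-exporter
    image: quay.io/prometheuscommunity/postgres-exporter:v0.15.0
    ports:
      - containerPort: 9187            # the port Prometheus scrapes
    env:
      - name: DATA_SOURCE_NAME         # how the exporter connects to PostgreSQL
        value: "postgresql://exporter:password@localhost:5432/postgres?sslmode=disable"
```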
Prometheus scrapes the exporter, which queries PostgreSQL and converts the data to Prometheus metrics.
5. Alertmanager: Intelligent Alert Routing
When an alert fires, you don't want to spam everyone. Alertmanager handles:
Grouping: Multiple similar alerts get grouped into one notification.
Inhibition: If a high-priority alert fires, suppress related low-priority alerts.
Silencing: Temporarily mute alerts during maintenance.
Routing: Send different alerts to different teams/channels.
My Alertmanager Configuration:
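A sketch of that routing; the receiver names, integration key, and webhook URL are placeholders:

```yaml
route:
  receiver: slack-warnings             # default: everything not matched below
  group_by: ["alertname", "service"]
  routes:
    - matchers:
        - severity="critical"
      receiver: pagerduty-critical     # critical alerts page me

receivers:
  - name: pagerduty-critical
    pagerduty_configs:
      - routing_key: "<pagerduty-integration-key>"
  - name: slack-warnings
    slack_configs:
      - api_url: "https://hooks.slack.com/services/..."
        channel: "#alerts"
```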
Critical alerts page me. Warnings go to Slack. This prevents alert fatigue while ensuring I don't miss critical issues.
The Pull Model: Deep Dive
Let's understand why the pull model is so powerful.
How Prometheus Discovers and Scrapes
1. Discovery Phase: Prometheus queries service discovery systems (Kubernetes API, Consul, etc.) to get a list of targets.
2. Scrape Phase: Every 15 seconds (configurable), Prometheus:
Makes an HTTP GET request to each target's /metrics endpoint
Parses the response (Prometheus text format)
Stores the time series in its TSDB
3. Your Application's Responsibility: Just expose metrics at /metrics. That's it.
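For a Node.js service, that endpoint is a few lines with prom-client and Express (a sketch, assuming Express; we'll cover instrumentation properly in the next article):

```typescript
import express from "express";
import { collectDefaultMetrics, register } from "prom-client";

// Collect standard Node.js process metrics (CPU, memory, event loop lag).
collectDefaultMetrics();

const app = express();

// The only monitoring-specific code the application needs: expose /metrics.
app.get("/metrics", async (_req, res) => {
  res.set("Content-Type", register.contentType);
  res.end(await register.metrics());
});

app.listen(3000, () => console.log("listening on :3000"));
```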
Pull vs Push: Why Pull Wins
I've worked with both models. Here's why I prefer pull:
Advantages of Pull:
Centralized Control:
Prometheus controls scrape frequency
Easy to adjust monitoring without changing apps
Failure Detection:
If Prometheus can't scrape a target, it knows the service is down
Push model: if an app stops pushing, did it crash or are metrics just delayed?
Simpler Application Code:
No need to configure where to send metrics
No retry logic needed
No network error handling
Easy Testing:
curl http://localhost:3000/metrics shows your metrics
No need to run the entire monitoring stack for development
Multiple Prometheus Servers:
Multiple Prometheus instances can scrape the same target
Useful for different regions or teams
When Push Makes Sense:
Short-lived jobs (use Pushgateway)
Behind firewalls (Prometheus can't reach the target)
Very high cardinality data that needs aggregation before storage
Data Storage: The Time Series Database
Prometheus stores metrics in a local time-series database (TSDB) on disk.
Storage Structure
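On disk, the data directory looks roughly like this (the block directory name is a generated ULID; this one is made up):

```
data/
├── wal/                  # write-ahead log for the most recent samples
├── chunks_head/          # memory-mapped chunks backing the head block
└── 01HFX3Y2Z.../         # one immutable 2-hour (or compacted) block
    ├── chunks/           # compressed sample data
    ├── index             # inverted index of series and labels
    ├── meta.json         # block time range and compaction history
    └── tombstones        # deletion markers
```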
How it works:
Write-Ahead Log (WAL):
New samples written here first
Prevents data loss on crashes
Head Block:
Recent data kept in memory for fast queries
Also persisted to disk
2-Hour Blocks:
Every 2 hours, the head block is compacted into an immutable block
Compressed and indexed for efficient storage and queries
Compaction:
Blocks are progressively compacted into larger blocks
Reduces storage and improves query performance
Retention and Sizing
Default Retention: 15 days
Configure Retention:
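Retention is controlled by command-line flags on the Prometheus server, for example:

```bash
# Keep 30 days of data, or cap block storage at 100GB, whichever limit is hit first.
prometheus \
  --config.file=/etc/prometheus/prometheus.yml \
  --storage.tsdb.path=/prometheus \
  --storage.tsdb.retention.time=30d \
  --storage.tsdb.retention.size=100GB
```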
Storage Estimation:
In my experience, storage usage depends on:
Number of time series (metric + label combinations)
Scrape interval
Retention period
Example calculation:
10,000 time series
15-second scrape interval
30-day retention
~12 bytes per sample (compressed)
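Working through those numbers: 86,400 / 15 = 5,760 samples per series per day, so 10,000 series over 30 days is about 1.7 billion samples, which at 12 bytes each comes to roughly 20 GB before index and WAL overhead.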
In practice, I've seen much better per-sample compression, often 2-4 bytes, though indexes and the write-ahead log add overhead on top of the raw samples. A typical installation with 10,000 time series uses 20-50 GB for 30 days.
The Query Engine
Prometheus's query engine powers everything:
Grafana dashboards
Alert rule evaluation
Ad-hoc queries via the UI
PromQL Query Flow:
The engine:
Parses your PromQL query
Determines which time series to fetch
Retrieves data from TSDB
Applies functions and aggregations
Returns results
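For example, an ad-hoc instant query against the HTTP API looks like this (the metric name is a placeholder):

```bash
# Instant query: current per-job request rate over the last 5 minutes
curl -G http://localhost:9090/api/v1/query \
  --data-urlencode 'query=sum(rate(http_requests_total[5m])) by (job)'
```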
We'll dive deep into PromQL in a dedicated article.
Recording Rules: Pre-Computing Expensive Queries
Some queries are expensive to run repeatedly. Recording rules let you pre-compute them.
Example:
Instead of running this expensive query every time:
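Suppose the expensive query is an aggregate request rate across every instance (http_requests_total is a placeholder metric):

```promql
sum(rate(http_requests_total[5m])) by (job)
```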
Create a recording rule:
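A recording rule for it might look like this (the rule name follows the level:metric:operation convention and is illustrative):

```yaml
groups:
  - name: http_aggregations
    interval: 30s                      # how often the expression is evaluated
    rules:
      - record: job:http_requests:rate5m
        expr: sum(rate(http_requests_total[5m])) by (job)
```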
Now query the pre-computed metric:
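Dashboards and alerts then reference the pre-computed series directly:

```promql
job:http_requests:rate5m
```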
This is much faster and reduces load on Prometheus.
Putting It All Together: My Production Setup
Here's how I structure Prometheus in production:
prometheus.yml:
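The overall shape is roughly this; addresses, job names, and file paths are placeholders:

```yaml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  - /etc/prometheus/rules/*.yml        # recording and alerting rules

alerting:
  alertmanagers:
    - static_configs:
        - targets: ["alertmanager:9093"]

scrape_configs:
  # Prometheus monitoring itself
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]

  # Application pods discovered through the Kubernetes API
  - job_name: "kubernetes-pods"
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"

  # Host metrics from node-exporter
  - job_name: "node-exporter"
    static_configs:
      - targets: ["node-exporter:9100"]
```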
Key Takeaways
Pull-based architecture simplifies your application code
Prometheus server handles scraping, storage, and queries
Service discovery makes dynamic environments manageable
Pushgateway is for batch jobs only
Exporters monitor third-party systems
Alertmanager provides intelligent alert routing
TSDB efficiently stores time series data
Recording rules pre-compute expensive queries
Understanding the architecture helps you:
Design better instrumentation
Troubleshoot monitoring issues
Scale Prometheus effectively
Choose the right components for your needs
In the next article, we'll get hands-on with instrumenting TypeScript applications using the prom-client library.