Introduction to Prometheus: The Wake-Up Call I Needed

When Monitoring Becomes Personal

I learned about Prometheus the hard way. Not from documentation, not from a tutorial, but from a 4 AM incident where I had absolutely no idea what was happening inside my production TypeScript application.

The API was responding slowly. Users were complaining. The server CPU looked fine. Memory seemed okay. But something was wrong, and I was flying blind. I had logs—thousands of lines of them—but logs only tell you what happened, not why it's happening or how bad things really are.

That's when I realized: I needed metrics, not just logs.

I needed to answer questions like:

  • How many requests per second am I actually handling?

  • What's the 95th percentile response time?

  • Which endpoints are slowing down?

  • Is my database connection pool exhausted?

  • How much memory is my Node.js process actually using over time?

Logs couldn't answer these questions in real time. I needed a time-series monitoring system. I needed Prometheus.

What Is Prometheus, Really?

Prometheus is an open-source monitoring and alerting toolkit that collects and stores metrics as time series data. But that definition doesn't capture what it actually does for you.

Here's what Prometheus really is: It's your application's vital signs monitor.

Think of it like this:

  • Logs tell you the story of what your application did

  • Traces show you the path of a single request

  • Metrics give you the health statistics of your entire system over time

Prometheus excels at metrics. It tracks numerical measurements over time, allowing you to answer questions like:

  • "How many users logged in during the last hour?"

  • "What's my API's average response time?"

  • "Is my error rate increasing?"

  • "How many database connections are active right now?"

Why Prometheus Specifically?

After that 4 AM incident, I evaluated several monitoring solutions. Here's why I chose Prometheus:

1. Pull-Based Architecture

Unlike systems where you push metrics to a central server, Prometheus pulls metrics from your application. This means:

  • Your application doesn't need to know where Prometheus is

  • No complicated client configuration for multiple monitoring servers

  • Prometheus controls the scraping frequency

  • Your app just exposes metrics; Prometheus does the rest

2. Multi-Dimensional Data Model

Prometheus uses labels to add dimensions to metrics. Instead of tracking separate metrics like:

  • api_requests_login

  • api_requests_checkout

  • api_requests_search

You track one metric with labels:
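For example, with the prom-client library (covered properly in a later article), a single labeled counter might look roughly like this. The metric and label names here are just illustrative:

```typescript
import { Counter } from 'prom-client';

// One counter for all API requests; the endpoint becomes a label, not a new metric.
const apiRequests = new Counter({
  name: 'api_requests_total',
  help: 'Total number of API requests, labeled by endpoint',
  labelNames: ['endpoint'],
});

// The same metric records every endpoint; the label carries the dimension.
apiRequests.inc({ endpoint: 'login' });
apiRequests.inc({ endpoint: 'checkout' });
apiRequests.inc({ endpoint: 'search' });
```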

This makes querying incredibly powerful.

3. Powerful Query Language (PromQL)

PromQL lets you slice and dice your data in real time:
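For instance, assuming the api_requests_total counter above and a typical request-duration histogram (the metric names are assumptions for illustration):

```promql
# Requests per second, broken down by endpoint, over the last 5 minutes
sum(rate(api_requests_total[5m])) by (endpoint)

# 95th percentile request duration, computed from histogram buckets
histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))
```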

4. No External Dependencies

Prometheus stores data locally on disk. No need for a separate database. This means it stays reliable even when other systems fail, which is exactly what you want during an outage.

5. Built for Cloud-Native

Prometheus was built at SoundCloud and donated to the Cloud Native Computing Foundation (CNCF). It's designed for dynamic environments where services come and go, making it perfect for containerized applications.

The Prometheus Mental Model

Understanding Prometheus requires a mental shift. Here's how I think about it:


Your Application's Job:

  1. Instrument your code (add metric counters, gauges, etc.)

  2. Expose metrics at an HTTP endpoint (usually /metrics)

  3. That's it. You're done.

Prometheus's Job:

  1. Scrape your /metrics endpoint on the schedule you configure (every 15 seconds is a common interval)

  2. Store the metrics as time series data

  3. Evaluate alert rules

  4. Provide a query interface for dashboards

This separation of concerns is elegant. Your application focuses on exposing data; Prometheus handles collection, storage, and alerting.
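To make the application side concrete, here is a minimal sketch using prom-client, assuming an Express app (Express is my assumption here, not a requirement):

```typescript
import express from 'express';
import { collectDefaultMetrics, register } from 'prom-client';

// Step 1: instrument. The default metrics cover memory, CPU, and event loop lag.
collectDefaultMetrics();

const app = express();

// Step 2: expose everything at /metrics in the Prometheus text format.
app.get('/metrics', async (_req, res) => {
  res.set('Content-Type', register.contentType);
  res.end(await register.metrics());
});

// Step 3: there is no step 3. Prometheus scrapes this endpoint on its own schedule.
app.listen(3000, () => console.log('Metrics exposed at http://localhost:3000/metrics'));
```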

What Prometheus Is NOT

Let me save you some time by clarifying what Prometheus isn't good for:

1. Not for Logs

Prometheus handles metrics, not logs. For logs, use:

  • ELK Stack (Elasticsearch, Logstash, Kibana)

  • Loki (Prometheus-like solution for logs)

  • Cloud provider logging (CloudWatch, Stackdriver)

2. Not for Distributed Tracing

For following a request through multiple services, use:

  • Jaeger

  • Zipkin

  • OpenTelemetry

3. Not for 100% Accuracy

Prometheus values reliability over perfect accuracy. If you need exact billing-level accuracy for every single event, Prometheus isn't the right choice. It's designed for monitoring trends and patterns, not accounting.

4. Not for Long-Term Storage

Prometheus is designed for recent data (weeks to months). For long-term storage, integrate with:

  • Thanos

  • Cortex

  • VictoriaMetrics

When Should You Use Prometheus?

Based on my experience, use Prometheus when:

You're running microservices or distributed systems

  • Prometheus excels at monitoring many services simultaneously

You need real-time alerting

  • CPU usage spikes, error rate increases, service down

You're in a dynamic environment

  • Containers, Kubernetes, auto-scaling

You need to answer "what's happening right now?"

  • Current request rate, active connections, response times

You're building TypeScript/Node.js applications

  • Excellent client libraries (prom-client) with TypeScript support

My First Prometheus Implementation

After that 4 AM wake-up call, I added Prometheus to my TypeScript API. Here's what I instrumented first (a rough sketch of the setup follows the list):

  1. HTTP request metrics - Count and duration of every request

  2. Error rates - How many 4xx and 5xx responses

  3. Database query times - How long database operations take

  4. Active connections - How many connections to the database

  5. Node.js metrics - Memory usage, event loop lag
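Condensed into a sketch, that first pass looked roughly like the following. The metric names, labels, and buckets are illustrative assumptions, not the exact code I ran:

```typescript
import { Counter, Gauge, Histogram, collectDefaultMetrics } from 'prom-client';

// 1. HTTP request count and duration, labeled by method, route, and status code
const httpRequestDuration = new Histogram({
  name: 'http_request_duration_seconds',
  help: 'Duration of HTTP requests in seconds',
  labelNames: ['method', 'route', 'status_code'],
  buckets: [0.05, 0.1, 0.25, 0.5, 1, 2.5, 5],
});

// 2. Explicit error counter for 4xx/5xx responses
const httpErrors = new Counter({
  name: 'http_errors_total',
  help: 'Total HTTP error responses',
  labelNames: ['status_code'],
});

// 3. Database query timing
const dbQueryDuration = new Histogram({
  name: 'db_query_duration_seconds',
  help: 'Duration of database queries in seconds',
  labelNames: ['operation'],
});

// 4. Active database connections
const dbConnectionsActive = new Gauge({
  name: 'db_connections_active',
  help: 'Number of active database connections',
});

// 5. Node.js process metrics (memory, event loop lag, GC) come for free
collectDefaultMetrics();
```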

Within a week, I:

  • Identified a slow endpoint I didn't know existed

  • Caught a memory leak before it became critical

  • Set up alerts for error rate increases

  • Created dashboards showing real-time traffic patterns

The difference was night and day. Instead of flying blind, I had visibility.

What's Next?

Now that you understand what Prometheus is and why it matters, the next articles will cover:

  • Understanding Metrics and Data Model - The different types of metrics and when to use each

  • Prometheus Architecture - How the pieces fit together

  • Instrumenting TypeScript Applications - Hands-on code with the prom-client library

  • PromQL Basics - Writing powerful queries

  • Configuration - Setting up Prometheus for your environment

  • Alerting - Creating meaningful alerts that actually help

  • Visualization - Building Grafana dashboards

  • Best Practices - Lessons learned from production

Key Takeaways

Before moving on, remember:

  1. Prometheus is for metrics, not logs or traces - Use the right tool for the right job

  2. Pull-based architecture - Your app exposes metrics; Prometheus scrapes them

  3. Time series data with labels - Powerful multi-dimensional data model

  4. Designed for reliability - Works when other systems are down

  5. Perfect for cloud-native applications - Built for dynamic, containerized environments

In the next article, we'll dive deep into the metrics and data model—understanding counters, gauges, histograms, and summaries, with practical TypeScript examples.

The goal isn't just to add monitoring. The goal is to sleep better at night knowing you'll understand what's happening when things go wrong.


Next: Understanding Metrics and Data Model
