Introduction to Prometheus: The Wake-Up Call I Needed

When Monitoring Becomes Personal

I learned about Prometheus the hard way. Not from documentation, not from a tutorial, but from a 4 AM incident where I had absolutely no idea what was happening inside my production TypeScript application.

The API was responding slowly. Users were complaining. The server CPU looked fine. Memory seemed okay. But something was wrong, and I was flying blind. I had logs—thousands of lines of them—but logs only tell you what happened, not why it's happening or how bad things really are.

That's when I realized: I needed metrics, not just logs.

I needed to answer questions like:

  • How many requests per second am I actually handling?

  • What's the 95th percentile response time?

  • Which endpoints are slowing down?

  • Is my database connection pool exhausted?

  • How much memory is my Node.js process actually using over time?

Logs couldn't answer these questions in real time. I needed a time-series monitoring system. I needed Prometheus.

What Is Prometheus, Really?

Prometheus is an open-source monitoring and alerting toolkit that collects and stores metrics as time series data. But that definition doesn't capture what it actually does for you.

Here's what Prometheus really is: It's your application's vital signs monitor.

Think of it like this:

  • Logs tell you the story of what your application did

  • Traces show you the path of a single request

  • Metrics give you the health statistics of your entire system over time

Prometheus excels at metrics. It tracks numerical measurements over time, allowing you to answer questions like:

  • "How many users logged in during the last hour?"

  • "What's my API's average response time?"

  • "Is my error rate increasing?"

  • "How many database connections are active right now?"

Why Prometheus Specifically?

After that 4 AM incident, I evaluated several monitoring solutions. Here's why I chose Prometheus:

1. Pull-Based Architecture

Unlike systems where you push metrics to a central server, Prometheus pulls metrics from your application. This means:

  • Your application doesn't need to know where Prometheus is

  • No complicated client configuration for multiple monitoring servers

  • Prometheus controls the scraping frequency

  • Your app just exposes metrics; Prometheus does the rest

2. Multi-Dimensional Data Model

Prometheus uses labels to add dimensions to metrics. Instead of tracking separate metrics like:

  • api_requests_login

  • api_requests_checkout

  • api_requests_search

You track one metric with labels:
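For example, with the prom-client library (covered properly in a later article), a single labeled counter might look roughly like this. The metric and label names here are just illustrative:

```typescript
import { Counter } from 'prom-client';

// One counter for all API requests; the endpoint becomes a label, not a new metric.
const apiRequests = new Counter({
  name: 'api_requests_total',
  help: 'Total number of API requests, labeled by endpoint',
  labelNames: ['endpoint'],
});

// The same metric records every endpoint; the label carries the dimension.
apiRequests.inc({ endpoint: 'login' });
apiRequests.inc({ endpoint: 'checkout' });
apiRequests.inc({ endpoint: 'search' });
```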

This makes querying incredibly powerful.

3. Powerful Query Language (PromQL)

PromQL lets you slice and dice your data in real time:
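For instance, assuming the api_requests_total counter above and a typical request-duration histogram (the metric names are assumptions for illustration):

```promql
# Requests per second, broken down by endpoint, over the last 5 minutes
sum(rate(api_requests_total[5m])) by (endpoint)

# 95th percentile request duration, computed from histogram buckets
histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))
```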

4. No External Dependencies

Prometheus stores data locally on disk. No need for a separate database. This means it stays reliable even when other systems fail, which is exactly what you want during an outage.

5. Built for Cloud-Native

Prometheus was built at SoundCloud and donated to the Cloud Native Computing Foundation (CNCF). It's designed for dynamic environments where services come and go, making it perfect for containerized applications.

The Prometheus Mental Model

Understanding Prometheus requires a mental shift. Here's how I think about it:


Your Application's Job:

  1. Instrument your code (add metric counters, gauges, etc.)

  2. Expose metrics at an HTTP endpoint (usually /metrics)

  3. That's it. You're done.

Prometheus's Job:

  1. Scrape your /metrics endpoint on the schedule you configure (every 15 seconds is a common interval)

  2. Store the metrics as time series data

  3. Evaluate alert rules

  4. Provide a query interface for dashboards

This separation of concerns is elegant. Your application focuses on exposing data; Prometheus handles collection, storage, and alerting.
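To make the application side concrete, here is a minimal sketch using prom-client, assuming an Express app (Express is my assumption here, not a requirement):

```typescript
import express from 'express';
import { collectDefaultMetrics, register } from 'prom-client';

// Step 1: instrument. The default metrics cover memory, CPU, and event loop lag.
collectDefaultMetrics();

const app = express();

// Step 2: expose everything at /metrics in the Prometheus text format.
app.get('/metrics', async (_req, res) => {
  res.set('Content-Type', register.contentType);
  res.end(await register.metrics());
});

// Step 3: there is no step 3. Prometheus scrapes this endpoint on its own schedule.
app.listen(3000, () => console.log('Metrics exposed at http://localhost:3000/metrics'));
```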

What Prometheus Is NOT

Let me save you some time by clarifying what Prometheus isn't good for:

1. Not for Logs

Prometheus handles metrics, not logs. For logs, use:

  • ELK Stack (Elasticsearch, Logstash, Kibana)

  • Loki (Prometheus-like solution for logs)

  • Cloud provider logging (CloudWatch, Stackdriver)

2. Not for Distributed Tracing

For following a request through multiple services, use:

  • Jaeger

  • Zipkin

  • OpenTelemetry

3. Not for 100% Accuracy

Prometheus values reliability over perfect accuracy. If you need exact billing-level accuracy for every single event, Prometheus isn't the right choice. It's designed for monitoring trends and patterns, not accounting.

4. Not for Long-Term Storage

Prometheus is designed for recent data (weeks to months). For long-term storage, integrate with:

  • Thanos

  • Cortex

  • VictoriaMetrics

When Should You Use Prometheus?

Based on my experience, use Prometheus when:

You're running microservices or distributed systems

  • Prometheus excels at monitoring many services simultaneously

You need real-time alerting

  • CPU usage spikes, error rate increases, service down

You're in a dynamic environment

  • Containers, Kubernetes, auto-scaling

You need to answer "what's happening right now?"

  • Current request rate, active connections, response times

You're building TypeScript/Node.js applications

  • Excellent client libraries (prom-client) with TypeScript support

My First Prometheus Implementation

After that 4 AM wake-up call, I added Prometheus to my TypeScript API. Here's what I instrumented first (a rough sketch of the setup follows the list):

  1. HTTP request metrics - Count and duration of every request

  2. Error rates - How many 4xx and 5xx responses

  3. Database query times - How long database operations take

  4. Active connections - How many connections to the database

  5. Node.js metrics - Memory usage, event loop lag
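Condensed into a sketch, that first pass looked roughly like the following. The metric names, labels, and buckets are illustrative assumptions, not the exact code I ran:

```typescript
import { Counter, Gauge, Histogram, collectDefaultMetrics } from 'prom-client';

// 1. HTTP request count and duration, labeled by method, route, and status code
const httpRequestDuration = new Histogram({
  name: 'http_request_duration_seconds',
  help: 'Duration of HTTP requests in seconds',
  labelNames: ['method', 'route', 'status_code'],
  buckets: [0.05, 0.1, 0.25, 0.5, 1, 2.5, 5],
});

// 2. Explicit error counter for 4xx/5xx responses
const httpErrors = new Counter({
  name: 'http_errors_total',
  help: 'Total HTTP error responses',
  labelNames: ['status_code'],
});

// 3. Database query timing
const dbQueryDuration = new Histogram({
  name: 'db_query_duration_seconds',
  help: 'Duration of database queries in seconds',
  labelNames: ['operation'],
});

// 4. Active database connections
const dbConnectionsActive = new Gauge({
  name: 'db_connections_active',
  help: 'Number of active database connections',
});

// 5. Node.js process metrics (memory, event loop lag, GC) come for free
collectDefaultMetrics();
```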

Within a week, I:

  • Identified a slow endpoint I didn't know existed

  • Caught a memory leak before it became critical

  • Set up alerts for error rate increases

  • Created dashboards showing real-time traffic patterns

The difference was night and day. Instead of flying blind, I had visibility.

What's Next?

Now that you understand what Prometheus is and why it matters, the next articles will cover:

  • Understanding Metrics and Data Model - The different types of metrics and when to use each

  • Prometheus Architecture - How the pieces fit together

  • Instrumenting TypeScript Applications - Hands-on code with the prom-client library

  • PromQL Basics - Writing powerful queries

  • Configuration - Setting up Prometheus for your environment

  • Alerting - Creating meaningful alerts that actually help

  • Visualization - Building Grafana dashboards

  • Best Practices - Lessons learned from production

Key Takeaways

Before moving on, remember:

  1. Prometheus is for metrics, not logs or traces - Use the right tool for the right job

  2. Pull-based architecture - Your app exposes metrics; Prometheus scrapes them

  3. Time series data with labels - Powerful multi-dimensional data model

  4. Designed for reliability - Works when other systems are down

  5. Perfect for cloud-native applications - Built for dynamic, containerized environments

In the next article, we'll dive deep into the metrics and data model—understanding counters, gauges, histograms, and summaries, with practical TypeScript examples.

The goal isn't just to add monitoring. The goal is to sleep better at night knowing you'll understand what's happening when things go wrong.


Next: Understanding Metrics and Data Model
