Introduction to Prometheus: The Wake-Up Call I Needed
When Monitoring Becomes Personal
I learned about Prometheus the hard way. Not from documentation, not from a tutorial, but from a 4 AM incident where I had absolutely no idea what was happening inside my production TypeScript application.
The API was responding slowly. Users were complaining. The server CPU looked fine. Memory seemed okay. But something was wrong, and I was flying blind. I had logs—thousands of lines of them—but logs only tell you what happened, not why it's happening or how bad things really are.
That's when I realized: I needed metrics, not just logs.
I needed to answer questions like:
How many requests per second am I actually handling?
What's the 95th percentile response time?
Which endpoints are slowing down?
Is my database connection pool exhausted?
How much memory is my Node.js process actually using over time?
Logs couldn't answer these questions in real time. I needed a time-series monitoring system. I needed Prometheus.
What Is Prometheus, Really?
Prometheus is an open-source monitoring and alerting toolkit that collects and stores metrics as time series data. But that definition doesn't capture what it actually does for you.
Here's what Prometheus really is: It's your application's vital signs monitor.
Think of it like this:
Logs tell you the story of what your application did
Traces show you the path of a single request
Metrics give you the health statistics of your entire system over time
Prometheus excels at metrics. It tracks numerical measurements over time, allowing you to answer questions like:
"How many users logged in during the last hour?"
"What's my API's average response time?"
"Is my error rate increasing?"
"How many database connections are active right now?"
Why Prometheus Specifically?
After that 4 AM incident, I evaluated several monitoring solutions. Here's why I chose Prometheus:
1. Pull-Based Architecture
Unlike systems where you push metrics to a central server, Prometheus pulls metrics from your application. This means:
Your application doesn't need to know where Prometheus is
No complicated client configuration for multiple monitoring servers
Prometheus controls the scraping frequency
Your app just exposes metrics; Prometheus does the rest
2. Multi-Dimensional Data Model
Prometheus uses labels to add dimensions to metrics. Instead of tracking separate metrics like:
api_requests_login
api_requests_checkout
api_requests_search
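You track one metric with labels, for example a single api_requests_total counter with an endpoint label whose value is login, checkout, or search, written as api_requests_total{endpoint="login"}.
This makes querying incredibly powerful: one query can aggregate across all endpoints or filter down to a single one.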
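To make the label idea concrete, here is a minimal, dependency-free TypeScript sketch of a counter that keeps one running total per label combination. It is not how Prometheus or prom-client is implemented; the class name LabeledCounter and the metric name api_requests_total are made up for illustration.

```typescript
// Illustrative only: a tiny counter that stores one running total per label set.
// `LabeledCounter` and `api_requests_total` are hypothetical names, not a real API.
type Labels = Record<string, string>;

class LabeledCounter {
  readonly metricName: string;
  private readonly series: Record<string, { labels: Labels; value: number }> = {};

  constructor(metricName: string) {
    this.metricName = metricName;
  }

  // Build a stable key so {a="1",b="2"} and {b="2",a="1"} map to the same series.
  private key(labels: Labels): string {
    return Object.keys(labels)
      .sort()
      .map((k) => `${k}="${labels[k]}"`)
      .join(",");
  }

  // Counters only go up: add `amount` to the series identified by `labels`.
  inc(labels: Labels, amount = 1): void {
    const k = this.key(labels);
    const existing = this.series[k];
    if (existing) {
      existing.value += amount;
    } else {
      this.series[k] = { labels: { ...labels }, value: amount };
    }
  }

  // Sum every series whose labels match the filter (an empty filter matches all).
  sum(filter: Labels = {}): number {
    let total = 0;
    Object.keys(this.series).forEach((k) => {
      const entry = this.series[k];
      const matches = Object.keys(filter).every((f) => entry.labels[f] === filter[f]);
      if (matches) total += entry.value;
    });
    return total;
  }
}

const apiRequests = new LabeledCounter("api_requests_total");
apiRequests.inc({ endpoint: "login", status: "200" });
apiRequests.inc({ endpoint: "login", status: "500" });
apiRequests.inc({ endpoint: "checkout", status: "200" }, 3);
```

With this data, apiRequests.sum() is 5, apiRequests.sum({ endpoint: "login" }) is 2, and apiRequests.sum({ status: "200" }) is 4. One metric answers all three questions, which is the point of labels.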
3. Powerful Query Language (PromQL)
PromQL lets you slice and dice your data in real time, for example by turning a raw request counter into a per-second rate and grouping it by endpoint.
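As a rough illustration of what a PromQL function like rate() gives you, the sketch below computes a per-second increase from raw counter samples in plain TypeScript. It is a simplification: the real rate() also extrapolates to the edges of the time window, which is omitted here. The Sample type, the simpleRate function, and the http_requests_total metric name in the comment are assumptions for illustration.

```typescript
// Illustrative only: roughly what a query such as
//   sum by (endpoint) (rate(http_requests_total[5m]))
// computes for a single series. The metric name is an assumption.
interface Sample {
  timestampSeconds: number; // when the value was scraped
  value: number; // cumulative counter value at that time
}

// Average per-second increase across the samples. A counter that drops is
// assumed to have restarted from zero, so the new value counts as the increase.
function simpleRate(samples: Sample[]): number {
  if (samples.length < 2) return 0;
  let increase = 0;
  for (let i = 1; i < samples.length; i++) {
    const prev = samples[i - 1].value;
    const curr = samples[i].value;
    increase += curr >= prev ? curr - prev : curr;
  }
  const elapsed =
    samples[samples.length - 1].timestampSeconds - samples[0].timestampSeconds;
  return elapsed > 0 ? increase / elapsed : 0;
}

const scraped: Sample[] = [
  { timestampSeconds: 0, value: 100 },
  { timestampSeconds: 15, value: 130 },
  { timestampSeconds: 30, value: 190 },
];
const requestsPerSecond = simpleRate(scraped);
```

Here the counter grew by 90 over 30 seconds, so requestsPerSecond is 3. Working from cumulative counters rather than pre-computed rates is what lets Prometheus pick the time window at query time.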
4. No External Dependencies
Prometheus stores data locally on disk, so there is no need for a separate database. This means it keeps working even when other systems fail, which is exactly what you want during an outage.
5. Built for Cloud-Native
Prometheus was originally built at SoundCloud and joined the Cloud Native Computing Foundation (CNCF) in 2016. It's designed for dynamic environments where services come and go, which makes it a good fit for containerized applications.
The Prometheus Mental Model
Understanding Prometheus requires a mental shift. Here's how I think about it:
Your Application's Job:
Instrument your code (add metric counters, gauges, etc.)
Expose metrics at an HTTP endpoint (usually /metrics)
That's it. You're done.
Prometheus's Job:
Scrape your /metrics endpoint periodically (every minute by default; 15 seconds is a commonly configured interval)
Store the metrics as time series data
Evaluate alert rules
Provide query interface for dashboards
This separation of concerns is elegant. Your application focuses on exposing data; Prometheus handles collection, storage, and alerting.
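To show what an application actually serves at /metrics, here is a small sketch that renders a counter in the Prometheus text exposition format. In a real TypeScript service a client library such as prom-client produces this output for you; the renderCounter helper and the sample metric below are hypothetical and only illustrate the shape of the data Prometheus scrapes.

```typescript
// Illustrative only: hand-rendering the text a /metrics endpoint returns.
// `renderCounter` is a hypothetical helper, not part of any library.
interface CounterSeries {
  labels: Record<string, string>;
  value: number;
}

// Label values escape backslash, double quote, and newline.
function escapeLabelValue(value: string): string {
  return value.replace(/\\/g, "\\\\").replace(/"/g, '\\"').replace(/\n/g, "\\n");
}

function renderCounter(
  metricName: string,
  help: string,
  series: CounterSeries[]
): string {
  // Each metric starts with a HELP line and a TYPE line.
  const lines = [`# HELP ${metricName} ${help}`, `# TYPE ${metricName} counter`];
  for (const s of series) {
    const labelText = Object.keys(s.labels)
      .map((k) => `${k}="${escapeLabelValue(s.labels[k])}"`)
      .join(",");
    // One line per series: name, optional {labels}, then the current value.
    lines.push(
      labelText ? `${metricName}{${labelText}} ${s.value}` : `${metricName} ${s.value}`
    );
  }
  return lines.join("\n") + "\n";
}

const metricsBody = renderCounter("http_requests_total", "Total HTTP requests.", [
  { labels: { method: "GET", route: "/users" }, value: 42 },
  { labels: { method: "POST", route: "/login" }, value: 7 },
]);
```

The resulting metricsBody contains a HELP line, a TYPE line, and then one line per series, such as http_requests_total{method="GET",route="/users"} 42. The application only reports current values; Prometheus attaches the scrape timestamp and builds the time series.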
What Prometheus Is NOT
Let me save you some time by clarifying what Prometheus isn't good for:
1. Not for Logs
Prometheus handles metrics, not logs. For logs, use:
ELK Stack (Elasticsearch, Logstash, Kibana)
Loki (Prometheus-like solution for logs)
Cloud provider logging (CloudWatch, Google Cloud Logging, formerly Stackdriver)
2. Not for Distributed Tracing
For following a request through multiple services, use:
Jaeger
Zipkin
OpenTelemetry
3. Not for 100% Accuracy
Prometheus values reliability over perfect accuracy. If you need exact billing-level accuracy for every single event, Prometheus isn't the right choice. It's designed for monitoring trends and patterns, not accounting.
4. Not for Long-Term Storage
Prometheus's local storage is designed for recent data: retention defaults to 15 days and is commonly extended to weeks or months. For long-term storage, integrate with:
Thanos
Cortex
VictoriaMetrics
When Should You Use Prometheus?
Based on my experience, use Prometheus when:
✅ You're running microservices or distributed systems
Prometheus excels at monitoring many services simultaneously
✅ You need real-time alerting
CPU usage spikes, error rate increases, service down
✅ You're in a dynamic environment
Containers, Kubernetes, auto-scaling
✅ You need to answer "what's happening right now?"
Current request rate, active connections, response times
✅ You're building TypeScript/Node.js applications
Excellent client libraries (prom-client) with TypeScript support
My First Prometheus Implementation
After that 4 AM wake-up call, I added Prometheus to my TypeScript API. Here's what I instrumented first:
HTTP request metrics - Count and duration of every request
Error rates - How many 4xx and 5xx responses
Database query times - How long database operations take
Active connections - How many connections to the database
Node.js metrics - Memory usage, event loop lag
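Request and query durations are typically recorded as histograms, so that percentiles such as the 95th can be estimated later. The sketch below shows the underlying idea of cumulative buckets in plain TypeScript. The DurationHistogram class, its method names, and the bucket boundaries are assumptions for illustration; a real service would normally use the histogram type from a client library such as prom-client.

```typescript
// Illustrative only: a cumulative-bucket histogram for request durations.
// `DurationHistogram` and the bucket boundaries are hypothetical choices.
class DurationHistogram {
  private readonly upperBounds: number[];
  private readonly bucketCounts: number[];
  private sum = 0;
  private count = 0;

  constructor(upperBoundsSeconds: number[]) {
    this.upperBounds = upperBoundsSeconds.slice().sort((a, b) => a - b);
    this.bucketCounts = this.upperBounds.map(() => 0);
  }

  // Record one observation. Buckets are cumulative: an observation counts
  // toward every bucket whose upper bound is >= the observed value.
  observe(seconds: number): void {
    this.sum += seconds;
    this.count += 1;
    this.upperBounds.forEach((bound, i) => {
      if (seconds <= bound) this.bucketCounts[i] += 1;
    });
  }

  // Snapshot in the shape Prometheus histograms expose: `le` buckets
  // (including +Inf, which always equals the total count), sum, and count.
  snapshot(): { buckets: Record<string, number>; sum: number; count: number } {
    const buckets: Record<string, number> = {};
    this.upperBounds.forEach((bound, i) => {
      buckets[String(bound)] = this.bucketCounts[i];
    });
    buckets["+Inf"] = this.count;
    return { buckets, sum: this.sum, count: this.count };
  }
}

const requestDuration = new DurationHistogram([0.1, 0.5, 1]);
[0.05, 0.3, 0.3, 2].forEach((seconds) => requestDuration.observe(seconds));
const durationSnapshot = requestDuration.snapshot();
```

With these four observations, the bucket for le=0.1 holds 1, le=0.5 holds 3, le=1 holds 3, and +Inf holds 4. The 2-second request only appears in +Inf, which is how a slow outlier shows up even when most requests are fast.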
Within a week, I:
Identified a slow endpoint I didn't know existed
Caught a memory leak before it became critical
Set up alerts for error rate increases
Created dashboards showing real-time traffic patterns
The difference was night and day. Instead of flying blind, I had visibility.
What's Next?
Now that you understand what Prometheus is and why it matters, the next articles will cover:
Understanding Metrics and Data Model - The different types of metrics and when to use each
Prometheus Architecture - How the pieces fit together
Instrumenting TypeScript Applications - Hands-on code with the prom-client library
PromQL Basics - Writing powerful queries
Configuration - Setting up Prometheus for your environment
Alerting - Creating meaningful alerts that actually help
Visualization - Building Grafana dashboards
Best Practices - Lessons learned from production
Key Takeaways
Before moving on, remember:
Prometheus is for metrics, not logs or traces - Use the right tool for the right job
Pull-based architecture - Your app exposes metrics; Prometheus scrapes them
Time series data with labels - Powerful multi-dimensional data model
Designed for reliability - Works when other systems are down
Perfect for cloud-native applications - Built for dynamic, containerized environments
In the next article, we'll dive deep into the metrics and data model—understanding counters, gauges, histograms, and summaries, with practical TypeScript examples.
The goal isn't just to add monitoring. The goal is to sleep better at night knowing you'll understand what's happening when things go wrong.