When I first started with OpenTelemetry, I tried to answer every question with traces. "How many orders per minute?" I'd count spans. "What's our error rate?" I'd filter failed spans. "Average response time?" Span duration aggregation.
This was a terrible idea. Traces are expensive: at high volume you can't afford to keep every one. I was sampling 10% of traffic, which meant my "metrics" were extrapolated from a tenth of the requests rather than counted from all of them.
Then I discovered proper metrics, and everything clicked. Traces are for debugging individual requests. Metrics are for understanding system behavior over time.
The Three Types of Metrics
1. Counter: Counting Events
Counters only go up. They track cumulative totals.
When to use:
Total requests processed
Total errors encountered
Total bytes sent
Total orders completed
import { metrics } from '@opentelemetry/api';
import { MeterProvider, PeriodicExportingMetricReader } from '@opentelemetry/sdk-metrics';
import { OTLPMetricExporter } from '@opentelemetry/exporter-metrics-otlp-http';
import { Resource } from '@opentelemetry/resources';
import { ATTR_SERVICE_NAME } from '@opentelemetry/semantic-conventions';
// Set up metrics
const metricExporter = new OTLPMetricExporter({
  url: 'http://localhost:4318/v1/metrics',
});
const meterProvider = new MeterProvider({
  // With @opentelemetry/resources 2.x, use resourceFromAttributes({...}) instead of new Resource(...)
  resource: new Resource({
    [ATTR_SERVICE_NAME]: 'order-service',
  }),
  readers: [
    new PeriodicExportingMetricReader({
      exporter: metricExporter,
      exportIntervalMillis: 10000, // Export every 10 seconds
    }),
  ],
});
metrics.setGlobalMeterProvider(meterProvider);
const meter = metrics.getMeter('order-service', '1.0.0');
// Create counters
const orderCounter = meter.createCounter('orders.created', {
  description: 'Total number of orders created',
  unit: '1',
});
const errorCounter = meter.createCounter('orders.errors', {
  description: 'Total number of order processing errors',
  unit: '1',
});
const revenueCounter = meter.createCounter('revenue.total', {
  description: 'Total revenue in USD',
  unit: 'USD',
});
// Using counters
export async function createOrder(userId: string, amount: number, items: any[]): Promise<Order> {
  try {
    const order = await saveOrderToDatabase(userId, amount, items);
    // Increment counter with attributes
    orderCounter.add(1, {
      'order.status': 'completed',
      'user.tier': await getUserTier(userId),
      'order.channel': 'web',
    });
    // Track revenue
    revenueCounter.add(amount, {
      'currency': 'USD',
      'payment.method': 'credit_card',
    });
    return order;
  } catch (error) {
    // Track errors
    errorCounter.add(1, {
      'error.type': (error as Error).name,
      'operation': 'createOrder',
    });
    throw error;
  }
}
async function saveOrderToDatabase(userId: string, amount: number, items: any[]): Promise<Order> {
  // Simulated
  return { id: `ORD-${Date.now()}`, userId, amount, items, status: 'completed', createdAt: new Date() };
}
async function getUserTier(userId: string): Promise<string> {
  return 'premium'; // Simulated
}
interface Order {
  id: string;
  userId: string;
  amount: number;
  items: any[];
  status: string;
  createdAt: Date;
}
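Because counters are cumulative, what you usually chart is their rate of change, which is how "orders per minute" falls out of a counter without any sampling. The sketch below illustrates how a backend turns raw cumulative samples into a per-second rate, treating a drop in value as a counter reset (for example after a process restart). It is a simplified model: Prometheus's `rate()` additionally extrapolates to the edges of the query window. The `CounterSample` shape and function name are hypothetical.

```typescript
// One exported/scraped sample of a cumulative counter (hypothetical shape).
export interface CounterSample {
  timestampSec: number;
  value: number;
}

// Per-second rate over a window of cumulative samples. A decrease is treated
// as a counter reset, so the post-reset value counts as that step's increase.
export function ratePerSecond(samples: CounterSample[]): number {
  if (samples.length < 2) return 0;
  let increase = 0;
  for (let i = 1; i < samples.length; i++) {
    const delta = samples[i].value - samples[i - 1].value;
    increase += delta >= 0 ? delta : samples[i].value;
  }
  const elapsed = samples[samples.length - 1].timestampSec - samples[0].timestampSec;
  return elapsed > 0 ? increase / elapsed : 0;
}
```

Multiplying the result by 60 gives orders per minute from an `orders.created`-style counter.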
2. Gauge: Measuring Current State
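Gauges represent a current value that can go up or down. In OpenTelemetry you usually report one with an ObservableGauge, whose callback the SDK reads at each collection, or with an UpDownCounter when you adjust the value as events happen.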
When to use:
Current memory usage
Active connections
Queue size
Items in cart
Current temperature
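A minimal sketch of both gauge styles, assuming the global `MeterProvider` from the counter example is already registered (without one, the API hands back no-op instruments). The metric names `app.memory.heap_used` and `app.connections.active` and the connection hooks are hypothetical.

```typescript
import { metrics } from '@opentelemetry/api';
import type { ObservableResult } from '@opentelemetry/api';

// Assumes a MeterProvider was registered globally, as in the counter example.
const meter = metrics.getMeter('order-service', '1.0.0');

// ObservableGauge: the SDK invokes the callback at each collection and
// records whatever value is observed at that moment (hypothetical name).
const heapGauge = meter.createObservableGauge('app.memory.heap_used', {
  description: 'V8 heap memory currently in use',
  unit: 'By',
});

export function observeHeapUsed(result: ObservableResult): void {
  result.observe(process.memoryUsage().heapUsed);
}
heapGauge.addCallback(observeHeapUsed);

// UpDownCounter: for a current value you only see change through events,
// such as open connections (hypothetical name and hooks).
const activeConnections = meter.createUpDownCounter('app.connections.active', {
  description: 'Connections currently open',
  unit: '{connection}',
});

export function onConnectionOpened(): void {
  activeConnections.add(1);
}

export function onConnectionClosed(): void {
  activeConnections.add(-1);
}
```

The callback form fits values you can read on demand (heap size, queue length); the UpDownCounter fits values you can only track as increments and decrements.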
3. Histogram: Distribution of Values
Histograms track the distribution of values over time, so you can ask for percentiles such as p95 or p99 instead of just an average.
When to use:
Request duration
Request payload size
Order value distribution
Database query duration
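A sketch of recording request duration with a histogram, assuming a globally registered `MeterProvider` as in the counter example. It records in seconds, following the OpenTelemetry HTTP semantic conventions for `http.server.request.duration`, and passes bucket boundaries through the `advice` option, which exists in recent `@opentelemetry/api` releases (check your version). The `timed` helper and the `request.outcome` attribute are hypothetical.

```typescript
import { metrics } from '@opentelemetry/api';

// Assumes a MeterProvider is registered globally, as in the counter example.
const meter = metrics.getMeter('order-service', '1.0.0');

const requestDuration = meter.createHistogram('http.server.request.duration', {
  description: 'Duration of HTTP server requests',
  unit: 's',
  // Boundaries in seconds. The SDK's default boundaries (0, 5, 10, 25, ... 10000)
  // would put nearly every sub-5-second request into a single bucket.
  advice: {
    explicitBucketBoundaries: [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10],
  },
});

// Hypothetical helper: runs a handler, records its duration with a
// low-cardinality route attribute, and passes the result or error through.
export async function timed<T>(route: string, handler: () => Promise<T>): Promise<T> {
  const start = performance.now();
  let outcome = 'success';
  try {
    return await handler();
  } catch (error) {
    outcome = 'error';
    throw error;
  } finally {
    requestDuration.record((performance.now() - start) / 1000, {
      'http.route': route,
      'request.outcome': outcome,
    });
  }
}
```

For example, `await timed('/orders', () => createOrder('u1', 49.99, []))` would time the counter example's `createOrder` while leaving its behavior unchanged.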
Real-World Metrics Dashboard
Here's the complete metrics setup I use in production:
Visualizing Metrics with Prometheus
Start Prometheus with Docker:
Visit http://localhost:9090 and query:
Production Learnings: The Metrics That Mattered
1. Error Budget Monitoring
I track error budgets using metrics, not traces:
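As a sketch of the arithmetic behind error-budget tracking, assume an availability SLO (a target such as 99.9%) evaluated over a fixed window, fed with how much a total-requests count and a failed-requests count increased over that window. With the counters from the earlier example, failures would come from `orders.errors`; since `orders.created` only counts successes, the total would be their sum (an assumption about how you define the SLO). The function and type names are hypothetical.

```typescript
// Error-budget status for one SLO window (hypothetical shape).
export interface ErrorBudgetStatus {
  allowedFailures: number;   // failures the SLO tolerates in this window
  failedRequests: number;    // failures actually observed
  remainingFraction: number; // 1 = untouched, 0 = exhausted, < 0 = overspent
}

export function errorBudget(
  sloTarget: number,
  totalRequests: number,
  failedRequests: number,
): ErrorBudgetStatus {
  if (!(sloTarget > 0 && sloTarget < 1)) {
    throw new RangeError('sloTarget must be strictly between 0 and 1, e.g. 0.999');
  }
  const allowedFailures = totalRequests * (1 - sloTarget);
  // With no traffic there is nothing to spend, so the budget is untouched.
  const remainingFraction = allowedFailures > 0 ? 1 - failedRequests / allowedFailures : 1;
  return { allowedFailures, failedRequests, remainingFraction };
}
```

For a 99.9% SLO with 1,000,000 requests and 250 failures, `errorBudget(0.999, 1_000_000, 250)` allows about 1,000 failures and leaves about 75% of the budget.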
2. Capacity Planning
Metrics revealed we were hitting PostgreSQL connection limits at 1000 req/s:
Alert when utilization > 80% → time to scale!
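The 80% rule can be sketched as a pair of small functions over a connection-pool snapshot. The `PoolStats` shape is hypothetical (map it from whatever your database client exposes), and in practice you would report the ratio through an ObservableGauge callback and let your alerting rule apply the threshold. The saturation estimate assumes connections in use grow linearly with request rate, which is only a rough approximation.

```typescript
// Snapshot of a connection pool (hypothetical shape).
export interface PoolStats {
  inUse: number; // connections currently checked out
  max: number;   // configured pool or server connection limit
}

export const SCALE_THRESHOLD = 0.8; // the 80% alert line

export function poolUtilization(stats: PoolStats): number {
  if (stats.max <= 0) throw new RangeError('max must be positive');
  return stats.inUse / stats.max;
}

export function shouldScale(stats: PoolStats, threshold: number = SCALE_THRESHOLD): boolean {
  return poolUtilization(stats) > threshold;
}

// Request rate at which the pool would be full, ASSUMING connections in use
// scale linearly with request rate.
export function estimatedSaturationRps(currentRps: number, stats: PoolStats): number {
  if (stats.inUse <= 0) return Infinity;
  return (currentRps * stats.max) / stats.inUse;
}
```

Under that linear assumption, a service using 80 of 100 connections at 800 req/s would saturate around 1,000 req/s.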
3. Business KPIs
Technical metrics don't tell the full story. Business metrics do:
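// Average order value (reuses `meter` from the counter example)
const avgOrderValue = meter.createObservableGauge('business.order.average_value', {
  description: 'Average order value since the last hourly reset',
  unit: 'USD',
});
let orderValues: number[] = [];
// Hypothetical helper: call it from createOrder() after a successful save,
// otherwise orderValues stays empty and the gauge never reports
export function recordOrderValue(amount: number): void {
  orderValues.push(amount);
}
avgOrderValue.addCallback((result) => {
  if (orderValues.length > 0) {
    const avg = orderValues.reduce((a, b) => a + b, 0) / orderValues.length;
    result.observe(avg);
  }
});
// Reset hourly
setInterval(() => { orderValues = []; }, 3600000);
Best Practices
1. Use Proper Metric Types
// ❌ Wrong: a counter only goes up, so it can't represent a current value
const activeUsersCounter = meter.createCounter('users.active'); // NO!
// ✅ Right: use a gauge for a current value
const activeUsers = meter.createObservableGauge('users.active');
2. Keep Cardinality Low
3. Namespace Your Metrics
// Good metric naming
'business.orders.created.total'
'business.revenue.total'
'http.server.request.duration'
'db.query.duration'
'cache.hits.total'
4. Export to Multiple Backends
Production systems often need both Prometheus (for alerting) and a cloud backend (for long-term storage). A MeterProvider can feed several readers at once:
import { PrometheusExporter } from '@opentelemetry/exporter-prometheus';
// MeterProvider, PeriodicExportingMetricReader and OTLPMetricExporter are imported as in the counter example
// PrometheusExporter is itself a metric reader; it serves /metrics on port 9464 by default
const prometheusExporter = new PrometheusExporter({ port: 9464 });
const otlpExporter = new OTLPMetricExporter({ url: 'http://localhost:4318/v1/metrics' });
const meterProvider = new MeterProvider({
  readers: [
    prometheusExporter, // Scraped by Prometheus for alerting
    new PeriodicExportingMetricReader({
      exporter: otlpExporter, // Pushed over OTLP for long-term storage and correlation with traces
    }),
  ],
});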
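To make the cardinality advice concrete: each unique combination of attribute values becomes its own time series, so the worst case for one metric is the product of the distinct values each attribute can take. The sketch below computes that product and shows one way to collapse an unbounded value into a small fixed set before using it as an attribute. The per-attribute value counts in the usage note are hypothetical, as are both function names.

```typescript
// Worst-case number of time series one metric can produce: the product of
// the number of distinct values each attribute can take.
export function worstCaseSeries(distinctValuesPerAttribute: Record<string, number>): number {
  return Object.values(distinctValuesPerAttribute).reduce((product, n) => product * n, 1);
}

// Collapse an unbounded value into a bounded set before using it as an
// attribute, e.g. HTTP status 404 -> '4xx' (hypothetical helper).
export function statusClass(statusCode: number): string {
  if (!Number.isInteger(statusCode) || statusCode < 100 || statusCode > 599) {
    return 'other';
  }
  return `${Math.floor(statusCode / 100)}xx`;
}
```

If `order.status`, `user.tier` and `order.channel` had 3, 3 and 2 possible values, `orders.created` would top out at 18 series; adding a `user.id` attribute with 100,000 users would multiply that to 1,800,000.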