Observability

Introduction

In distributed systems, debugging is fundamentally different from debugging a monolith. A single request might traverse dozens of services, any of which could be the source of a problem. From operating microservices in production, I've learned that without proper observability, you're flying blind.

This article covers the three pillars of observability: distributed tracing, centralized logging, and metrics collection, along with practical implementations using OpenTelemetry.

The Three Pillars

| Pillar | Purpose | Example Tools |
| --- | --- | --- |
| Traces | Follow request across services | Jaeger, Zipkin |
| Logs | Detailed event records | ELK Stack, Loki |
| Metrics | Aggregated measurements | Prometheus, Grafana |

OpenTelemetry Setup

Installation and Configuration

# requirements.txt
opentelemetry-api==1.21.0
opentelemetry-sdk==1.21.0
opentelemetry-exporter-otlp==1.21.0
opentelemetry-instrumentation-fastapi==0.42b0
opentelemetry-instrumentation-httpx==0.42b0
opentelemetry-instrumentation-sqlalchemy==0.42b0
opentelemetry-instrumentation-redis==0.42b0

Distributed Tracing

Trace Context Propagation
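
Context propagation means each outgoing call carries the trace ID of the request that caused it, usually in the W3C `traceparent` HTTP header. The sketch below shows the mechanics with the standard library only; it handles just version `00` of the header, and the names `TraceContext`, `inject`, `extract`, and `child_of` are illustrative rather than taken from any library.

```python
import re
import secrets
from dataclasses import dataclass
from typing import Dict, Optional

# traceparent layout (version 00): version-traceid-parentid-flags, all lowercase hex.
_TRACEPARENT_RE = re.compile(r"(00)-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


@dataclass(frozen=True)
class TraceContext:
    trace_id: str  # 32 hex chars, shared by every span in one request
    span_id: str   # 16 hex chars, identifies the calling span
    sampled: bool


def new_trace_context() -> TraceContext:
    """Start a new trace at the edge of the system."""
    return TraceContext(secrets.token_hex(16), secrets.token_hex(8), True)


def inject(ctx: TraceContext, headers: Dict[str, str]) -> None:
    """Write the context into outgoing request headers."""
    flags = "01" if ctx.sampled else "00"
    headers["traceparent"] = f"00-{ctx.trace_id}-{ctx.span_id}-{flags}"


def extract(headers: Dict[str, str]) -> Optional[TraceContext]:
    """Read the context from incoming headers; None if missing or malformed."""
    match = _TRACEPARENT_RE.fullmatch(headers.get("traceparent", ""))
    if match is None:
        return None
    _, trace_id, span_id, flags = match.groups()
    # All-zero IDs are defined as invalid by the W3C Trace Context spec.
    if trace_id == "0" * 32 or span_id == "0" * 16:
        return None
    return TraceContext(trace_id, span_id, bool(int(flags, 16) & 0x01))


def child_of(parent: TraceContext) -> TraceContext:
    """Same trace, new span ID: what a downstream service does on receipt."""
    return TraceContext(parent.trace_id, secrets.token_hex(8), parent.sampled)
```

In practice you rarely write this by hand: the httpx and FastAPI instrumentation packages inject and extract the header for you, and `opentelemetry.propagate.inject` / `extract` cover the manual cases.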


Span Events and Annotations

Centralized Logging

Structured Logging
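
Structured logging means emitting each log entry as a machine-parseable record instead of free text, so a log backend can filter on fields. One way to sketch this with only the standard library is a JSON formatter; the `fields` convention for extra data and the names `JsonFormatter` and `get_json_logger` are my own assumptions.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Extra structured fields are passed as extra={"fields": {...}}.
        payload.update(getattr(record, "fields", {}))
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload, default=str)


def get_json_logger(name: str, stream=None) -> logging.Logger:
    """Return a logger that writes JSON lines to `stream` (stderr by default)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate output through the root logger
    if not logger.handlers:
        handler = logging.StreamHandler(stream)
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    return logger
```

Usage would look like `get_json_logger("orders").info("order created", extra={"fields": {"order_id": "o-1"}})`, which produces one JSON line containing `message`, `level`, and `order_id`.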

Context-Aware Logger
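
A context-aware logger attaches the current trace ID to every log line automatically, so nobody has to pass it around by hand. The sketch below does this with `contextvars` and a logging filter; `trace_id_var`, `TraceContextFilter`, `configure_logging`, and `handle_request` are illustrative names, and the trace ID is set manually here rather than read from a tracer.

```python
import contextvars
import logging

# Trace ID of the request currently being handled; "-" means there is none.
trace_id_var = contextvars.ContextVar("trace_id", default="-")


class TraceContextFilter(logging.Filter):
    """Copy the current trace ID onto each record so formatters can print it."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.trace_id = trace_id_var.get()
        return True  # never drop the record, only enrich it


def configure_logging(stream=None) -> logging.Handler:
    """Attach a handler to the root logger that prints the trace ID on every line."""
    handler = logging.StreamHandler(stream)
    # The filter sits on the handler so it also applies to records from child loggers.
    handler.addFilter(TraceContextFilter())
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s trace_id=%(trace_id)s %(name)s: %(message)s")
    )
    root = logging.getLogger()
    root.addHandler(handler)
    root.setLevel(logging.INFO)
    return handler


def handle_request(trace_id: str) -> None:
    """Simulate a request handler: set the context, log, then restore it."""
    token = trace_id_var.set(trace_id)
    try:
        logging.getLogger("orders").info("order received")
    finally:
        trace_id_var.reset(token)
```

When OpenTelemetry tracing is active, the filter could instead read the ID from `trace.get_current_span().get_span_context().trace_id` and format it as 32 hex characters, which keeps logs and traces correlated without a separate context variable.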

Metrics Collection

Custom Metrics

RED Metrics (Rate, Errors, Duration)
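
To make the three RED signals concrete, here is a standard-library sketch that records requests per endpoint and derives rate, error ratio, and a p95 duration from them. It keeps raw durations in memory, which a real metrics backend would replace with histograms; `RedRecorder` and its method names are illustrative.

```python
import math
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class EndpointStats:
    requests: int = 0
    errors: int = 0
    durations_ms: List[float] = field(default_factory=list)


class RedRecorder:
    """Collect Rate, Errors, and Duration per endpoint."""

    def __init__(self) -> None:
        self._stats: Dict[str, EndpointStats] = defaultdict(EndpointStats)

    def observe(self, endpoint: str, duration_ms: float, is_error: bool) -> None:
        stats = self._stats[endpoint]
        stats.requests += 1
        stats.errors += int(is_error)
        stats.durations_ms.append(duration_ms)

    def summary(self, endpoint: str, window_s: float) -> Dict[str, float]:
        """Summarize everything observed so far, assuming it spans `window_s` seconds."""
        stats = self._stats[endpoint]
        if stats.requests == 0:
            return {"rate_per_s": 0.0, "error_ratio": 0.0, "p95_ms": 0.0}
        ordered = sorted(stats.durations_ms)
        # Nearest-rank percentile: the smallest value covering 95% of observations.
        rank = math.ceil(len(ordered) * 95 / 100)
        return {
            "rate_per_s": stats.requests / window_s,
            "error_ratio": stats.errors / stats.requests,
            "p95_ms": ordered[rank - 1],
        }
```

Reporting a high percentile rather than the mean matters here: a handful of very slow requests barely moves an average but shows up clearly in p95.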

Health Checks

Comprehensive Health Endpoint
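
A useful health endpoint checks each dependency separately and distinguishes "degraded" (a non-critical dependency is down) from "unhealthy" (a critical one is down). The sketch below shows that aggregation with `asyncio` only; `check_database` and `check_cache` are hypothetical stand-ins for real connectivity checks, and the status names are my own convention.

```python
import asyncio
import time
from typing import Any, Awaitable, Callable, Dict, Set

CheckFn = Callable[[], Awaitable[None]]


async def run_check(name: str, check: CheckFn, timeout_s: float) -> Dict[str, Any]:
    """Run one dependency check with a timeout and report its outcome."""
    started = time.perf_counter()
    try:
        await asyncio.wait_for(check(), timeout=timeout_s)
        status, detail = "healthy", None
    except asyncio.TimeoutError:
        status, detail = "unhealthy", f"timed out after {timeout_s}s"
    except Exception as exc:  # any failure marks the dependency unhealthy
        status, detail = "unhealthy", str(exc)
    latency_ms = round((time.perf_counter() - started) * 1000, 1)
    return {"name": name, "status": status, "detail": detail, "latency_ms": latency_ms}


async def health_report(
    checks: Dict[str, CheckFn], critical: Set[str], timeout_s: float = 1.0
) -> Dict[str, Any]:
    """Run all checks concurrently and derive an overall status."""
    results = await asyncio.gather(
        *(run_check(name, check, timeout_s) for name, check in checks.items())
    )
    failed = {r["name"] for r in results if r["status"] != "healthy"}
    if failed & critical:
        overall = "unhealthy"  # a dependency we cannot work without is down
    elif failed:
        overall = "degraded"   # still serving, but with reduced functionality
    else:
        overall = "healthy"
    return {"status": overall, "checks": list(results)}


# Hypothetical stand-ins; real checks would ping the database, cache, and so on.
async def check_database() -> None:
    return None


async def check_cache() -> None:
    raise ConnectionError("cache unreachable")
```

In a FastAPI service, a `/health` route could return this report and respond with HTTP 503 when the overall status is "unhealthy", so load balancers and orchestrators stop routing traffic to the instance.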

Alerting Rules
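
Alerting rules normally live in the monitoring system's own configuration (for example, a Prometheus rule with an `expr` and a `for` duration), but the logic is easy to show in Python. The following sketch models a symptom-based rule that fires only when a value stays above a threshold for a sustained period; `ThresholdRule`, the 5% threshold, and the 300-second window are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ThresholdRule:
    """Fire when a value stays above `threshold` for at least `for_seconds`."""

    name: str
    threshold: float
    for_seconds: float
    # Time at which the current breach began; None while the value is healthy.
    _breach_started: Optional[float] = field(default=None, init=False, repr=False)

    def evaluate(self, value: float, now: float) -> bool:
        """Return True if the alert should be firing at time `now` (in seconds)."""
        if value <= self.threshold:
            # Condition cleared: reset so a new breach must last the full duration.
            self._breach_started = None
            return False
        if self._breach_started is None:
            self._breach_started = now
        return now - self._breach_started >= self.for_seconds
```

For instance, `ThresholdRule("HighErrorRatio", threshold=0.05, for_seconds=300)` fed with the service's error ratio stays silent during a short spike and fires only once the ratio has been above 5% for five minutes, which keeps alerts tied to sustained user-facing impact.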

Dashboard Example

Key Takeaways

  1. Trace every request - Use distributed tracing with context propagation

  2. Structured logs with context - Include trace IDs for correlation

  3. RED metrics everywhere - Rate, Errors, Duration for every service

  4. Health checks for dependencies - Know when services are degraded

  5. Alerting on symptoms - Alert on user-facing issues, not internal metrics

What's Next?

With observability in place, we need to package and deploy our services. In Article 11: Containerization & Deployment, we'll cover Docker best practices and Docker Compose for local development.


This article is part of the Microservice Architecture 101 series.
