Observability-Driven Development

The Shift: Observability as a First-Class Concern

After two years of retrofitting OpenTelemetry into legacy systems, I learned the hard way: observability should be part of the design, not an afterthought.

Here's what changed when I started building systems with observability as a core requirement from day one.

Before vs. After

Before (Observability as Afterthought)

// Write code first
export async function processOrder(orderId: string) {
  const order = await db.query('SELECT * FROM orders WHERE id = $1', [orderId]);
  const inventory = await checkInventory(order.items);
  const payment = await processPayment(order.total);
  const shipping = await createShipment(order);
  return { success: true };
}

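// Add observability later (painful)
import { trace, SpanStatusCode } from '@opentelemetry/api';

// 'order-service' is an illustrative tracer name
const tracer = trace.getTracer('order-service');

export async function processOrder(orderId: string) {
  const span = tracer.startSpan('processOrder');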
  try {
    span.setAttribute('order.id', orderId);
    const order = await db.query('SELECT * FROM orders WHERE id = $1', [orderId]);
    span.setAttribute('order.total', order.total);
    
    // Forgot to add span for this!
    const inventory = await checkInventory(order.items);
    
    const paymentSpan = tracer.startSpan('processPayment');
    const payment = await processPayment(order.total);
    paymentSpan.end();
    
    // This needs its own span too...
    const shipping = await createShipment(order);
    
    span.setStatus({ code: SpanStatusCode.OK });
    return { success: true };
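  } catch (error) {
    // `error` is `unknown` under strict TypeScript, so narrow it before recording
    const err = error instanceof Error ? error : new Error(String(error));
    span.setStatus({ code: SpanStatusCode.ERROR, message: err.message });
    span.recordException(err);
    throw error;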
  } finally {
    span.end();
  }
}

After (Observability-First)
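
As a minimal sketch of the observability-first version of `processOrder`, the flow below gives every step its own span through one shared helper, so no step can be forgotten. To keep it runnable without dependencies, `SpanLike`, `StartSpan`, `OrderDeps`, and `withSpan` are hypothetical stand-ins; in a real service `withSpan` would likely wrap `tracer.startActiveSpan` from `@opentelemetry/api`, which also handles parent/child linking that this sketch does not model. The event names are illustrative.

```typescript
// Stand-in for the subset of the OpenTelemetry Span API used here (assumption).
interface SpanLike {
  setAttribute(key: string, value: string | number | boolean): void;
  addEvent(name: string): void;
  recordException(error: Error): void;
  end(): void;
}
type StartSpan = (name: string) => SpanLike;

interface OrderRecord { id: string; total: number; items: string[] }

// Hypothetical collaborators standing in for the db, inventory, payment,
// and shipping calls in the "before" version.
interface OrderDeps {
  loadOrder(id: string): Promise<OrderRecord>;
  checkInventory(items: string[]): Promise<void>;
  processPayment(total: number): Promise<void>;
  createShipment(order: OrderRecord): Promise<void>;
}

// Runs `fn` inside a span, records any failure, and always ends the span.
async function withSpan<T>(
  start: StartSpan,
  name: string,
  fn: (span: SpanLike) => Promise<T>,
): Promise<T> {
  const span = start(name);
  try {
    return await fn(span);
  } catch (err) {
    span.recordException(err instanceof Error ? err : new Error(String(err)));
    span.setAttribute('error', true);
    throw err;
  } finally {
    span.end();
  }
}

async function processOrder(start: StartSpan, deps: OrderDeps, orderId: string) {
  return withSpan(start, 'processOrder', async (span) => {
    span.setAttribute('order.id', orderId);
    const order = await withSpan(start, 'loadOrder', () => deps.loadOrder(orderId));
    span.setAttribute('order.total', order.total);

    await withSpan(start, 'checkInventory', () => deps.checkInventory(order.items));
    span.addEvent('inventory.reserved');

    await withSpan(start, 'processPayment', () => deps.processPayment(order.total));
    span.addEvent('payment.captured');

    await withSpan(start, 'createShipment', () => deps.createShipment(order));
    return { success: true };
  });
}
```

Because every step goes through `withSpan`, adding a new step without a span would look out of place in review, which is the point of designing the span structure up front.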

Key Differences:

  1. Spans are part of the function design - Not bolted on later

  2. Consistent attribute naming - Follows semantic conventions from day one

  3. Meaningful events - Business context built in

  4. Error context is rich - Failures tell the full story

Design Patterns for Observability

1. Observability Decorators
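
One way to sketch this pattern is a higher-order function that wraps any async function with timing and outcome recording. It uses a plain wrapper rather than `@decorator` syntax so it does not depend on compiler decorator settings. The names `traced`, `TraceRecord`, and `recorded` are hypothetical; a real implementation would start and end an OpenTelemetry span where this sketch pushes to an array.

```typescript
interface TraceRecord { name: string; durationMs: number; ok: boolean }

// Hypothetical sink for finished operations (stand-in for a span exporter).
const recorded: TraceRecord[] = [];

// Wraps `fn` so every call is timed and its outcome recorded,
// without touching the body of `fn` itself.
function traced<A extends unknown[], R>(
  name: string,
  fn: (...args: A) => Promise<R>,
): (...args: A) => Promise<R> {
  return async (...args: A): Promise<R> => {
    const startedAt = Date.now();
    let ok = false;
    try {
      const result = await fn(...args);
      ok = true;
      return result;
    } finally {
      // Runs on both success and failure, so no call goes unrecorded.
      recorded.push({ name, durationMs: Date.now() - startedAt, ok });
    }
  };
}

// Usage: the business function stays free of instrumentation code.
const checkInventory = traced('checkInventory', async (items: string[]) => items.length > 0);
```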

2. Observable Domain Models
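
A minimal sketch of this idea: the domain object itself knows how to describe itself as span attributes, so every call site reports the same keys. The `Order` class, the `toSpanAttributes` method, and the attribute names are all hypothetical and are not taken from the OpenTelemetry semantic conventions.

```typescript
type AttributeValue = string | number | boolean;

interface OrderItem { sku: string; quantity: number; unitPrice: number }

class Order {
  readonly id: string;
  readonly customerTier: 'free' | 'pro';
  readonly items: OrderItem[];

  constructor(id: string, customerTier: 'free' | 'pro', items: OrderItem[]) {
    this.id = id;
    this.customerTier = customerTier;
    this.items = items;
  }

  get total(): number {
    return this.items.reduce((sum, item) => sum + item.quantity * item.unitPrice, 0);
  }

  // Single source of truth for how an order appears in telemetry.
  toSpanAttributes(): Record<string, AttributeValue> {
    return {
      'order.id': this.id,
      'order.customer_tier': this.customerTier,
      'order.item_count': this.items.length,
      'order.total': this.total,
    };
  }
}
```

With the real API, the result could be passed to `span.setAttributes(order.toSpanAttributes())`, keeping attribute naming in one place.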

3. Context Propagation Helpers
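
To illustrate what such a helper does, the sketch below writes and reads the W3C Trace Context `traceparent` header by hand (version `00`, a 32-hex trace id, a 16-hex span id, and 2-hex flags). In practice `propagation.inject` and `propagation.extract` from `@opentelemetry/api` do this for you; `TraceContext`, `injectTraceparent`, and `extractTraceparent` are hypothetical names, and the sketch assumes header keys are already lowercased.

```typescript
interface TraceContext { traceId: string; spanId: string; sampled: boolean }

// Returns a copy of `headers` carrying the trace context for an outgoing call.
function injectTraceparent(
  ctx: TraceContext,
  headers: Record<string, string>,
): Record<string, string> {
  const flags = ctx.sampled ? '01' : '00';
  return { ...headers, traceparent: `00-${ctx.traceId}-${ctx.spanId}-${flags}` };
}

// Parses an incoming `traceparent` header; returns undefined if absent or malformed.
function extractTraceparent(
  headers: Record<string, string | undefined>,
): TraceContext | undefined {
  const value = headers['traceparent'] ?? '';
  if (!/^00-[0-9a-f]{32}-[0-9a-f]{16}-[0-9a-f]{2}$/.test(value)) return undefined;

  const traceId = value.slice(3, 35);
  const spanId = value.slice(36, 52);
  const flags = parseInt(value.slice(53, 55), 16);

  // All-zero ids are invalid in the W3C format.
  if (/^0+$/.test(traceId) || /^0+$/.test(spanId)) return undefined;

  return { traceId, spanId, sampled: (flags & 1) === 1 };
}
```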

Testing with Observability

In-Memory Span Exporter for Tests
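
The OpenTelemetry SDK ships an `InMemorySpanExporter` in `@opentelemetry/sdk-trace-base` for this purpose. Since its wiring differs between SDK versions, the sketch below shows only the underlying idea with a dependency-free stand-in: finished spans are kept in an array that a test can inspect and reset. `InMemorySpanRecorder`, `FinishedSpan`, and `recordSpan` are hypothetical names.

```typescript
type AttrValue = string | number | boolean;

interface FinishedSpan {
  name: string;
  attributes: Record<string, AttrValue>;
  status: 'ok' | 'error';
}

// Keeps finished spans in memory so tests can assert on them.
class InMemorySpanRecorder {
  private spans: FinishedSpan[] = [];

  record(span: FinishedSpan): void {
    this.spans.push(span);
  }

  // Returns a copy so callers cannot mutate the recorder's state.
  getFinishedSpans(): FinishedSpan[] {
    return [...this.spans];
  }

  // Call between tests so spans from one test do not leak into the next.
  reset(): void {
    this.spans = [];
  }
}

// Runs `fn` and records one finished span with the outcome.
function recordSpan<T>(
  recorder: InMemorySpanRecorder,
  name: string,
  attributes: Record<string, AttrValue>,
  fn: () => T,
): T {
  try {
    const result = fn();
    recorder.record({ name, attributes, status: 'ok' });
    return result;
  } catch (err) {
    recorder.record({ name, attributes, status: 'error' });
    throw err;
  }
}
```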

Trace Assertions
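
Once spans are captured in a test, small assertion helpers make the expected trace shape explicit. This is a sketch under the assumption that captured spans have been reduced to a simple `SpanSnapshot` shape; that type and the helpers `expectSpan` and `expectChildOf` are hypothetical and not part of any OpenTelemetry package.

```typescript
interface SpanSnapshot {
  name: string;
  parentName?: string;
  attributes: Record<string, string | number | boolean>;
}

// Asserts that a span with `name` exists and carries the expected attributes.
function expectSpan(
  spans: SpanSnapshot[],
  name: string,
  expected: Record<string, string | number | boolean> = {},
): SpanSnapshot {
  const span = spans.find((s) => s.name === name);
  if (!span) {
    const seen = spans.map((s) => s.name).join(', ');
    throw new Error(`expected span "${name}", but only saw [${seen}]`);
  }
  for (const key of Object.keys(expected)) {
    if (span.attributes[key] !== expected[key]) {
      throw new Error(
        `span "${name}": attribute ${key} expected ${String(expected[key])}, ` +
          `got ${String(span.attributes[key])}`,
      );
    }
  }
  return span;
}

// Asserts the parent/child relationship between two named spans.
function expectChildOf(spans: SpanSnapshot[], childName: string, parentName: string): void {
  const child = expectSpan(spans, childName);
  if (child.parentName !== parentName) {
    throw new Error(
      `span "${childName}" should be a child of "${parentName}", ` +
        `but its parent is ${String(child.parentName)}`,
    );
  }
}
```

Failure messages list what was actually recorded, so a broken trace shape is diagnosable from the test output alone.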

Documentation Through Observability

Self-Documenting Traces
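
As a rough illustration, business events attached to a span can read like a step-by-step description of the operation. In the real API this is `span.addEvent(name, attributes)`; the `StorySpan` class below is a hypothetical stand-in that stores events and renders them as text, and the event names are made up for the example.

```typescript
interface TraceEvent {
  name: string;
  attributes: Record<string, string | number | boolean>;
}

class StorySpan {
  readonly name: string;
  private events: TraceEvent[] = [];

  constructor(name: string) {
    this.name = name;
  }

  // Records a business-level event; returns `this` so calls can be chained.
  addEvent(name: string, attributes: Record<string, string | number | boolean> = {}): this {
    this.events.push({ name, attributes });
    return this;
  }

  // Renders the span and its events as numbered, human-readable lines.
  describe(): string[] {
    const lines = [this.name];
    this.events.forEach((event, index) => {
      const details = Object.keys(event.attributes)
        .map((key) => `${key}=${String(event.attributes[key])}`)
        .join(', ');
      lines.push(`  ${index + 1}. ${event.name}${details ? ` (${details})` : ''}`);
    });
    return lines;
  }
}

const story = new StorySpan('processOrder')
  .addEvent('inventory.reserved', { item_count: 3 })
  .addEvent('payment.captured', { amount: 42, currency: 'USD' })
  .addEvent('shipment.created');

// story.describe() returns:
//   processOrder
//     1. inventory.reserved (item_count=3)
//     2. payment.captured (amount=42, currency=USD)
//     3. shipment.created
```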

Result in Jaeger:

Performance Budgets

Setting SLOs with Metrics
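
A latency SLO usually takes the form "at least X% of requests finish within Y ms". The sketch below evaluates that statement over a list of measured durations. `LatencySlo`, `SloReport`, and `evaluateLatencySlo` are hypothetical names, and a production setup would normally compute this from histogram metrics in the backend rather than from raw samples in application code.

```typescript
interface LatencySlo {
  thresholdMs: number; // a request is "good" if it finishes within this time
  target: number;      // required fraction of good requests, e.g. 0.99
}

interface SloReport {
  total: number;
  withinBudget: number;
  compliance: number; // fraction of requests within the threshold
  met: boolean;
}

function evaluateLatencySlo(durationsMs: number[], slo: LatencySlo): SloReport {
  const total = durationsMs.length;
  const withinBudget = durationsMs.filter((d) => d <= slo.thresholdMs).length;
  // With no traffic there is nothing to violate, so treat the SLO as met.
  const compliance = total === 0 ? 1 : withinBudget / total;
  return { total, withinBudget, compliance, met: compliance >= slo.target };
}
```

For example, with durations of 120, 180, 250, and 900 ms and a 300 ms threshold, three of four requests are within budget, so a 75% target is met and a 90% target is not.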

Prometheus Alerting

Real Production Example

Here's a complete service designed with observability first:

Key Takeaways

  1. Design APIs with tracing in mind - Each operation should be a span

  2. Use decorators/wrappers - Don't duplicate instrumentation code

  3. Test your telemetry - Spans are part of your API contract

  4. Document with events - Let traces tell the story

  5. Set performance budgets - SLOs backed by metrics

  6. Observability is not optional - It's part of the feature

Conclusion

Building systems with observability from day one changed how I develop software:

  • Faster debugging: Instrumentation is already there when things break

  • Better designs: Thinking in traces leads to cleaner code boundaries

  • Living documentation: Traces show how the system actually works

  • Confidence in production: Comprehensive visibility from launch day

You've completed the OpenTelemetry 101 series! 🎉

You now have everything you need to:

  • ✅ Instrument TypeScript/Node.js services

  • ✅ Collect traces, metrics, and logs

  • ✅ Deploy to production at scale

  • ✅ Integrate with any backend

  • ✅ Build observable systems from the ground up

Go forth and make your systems observable!



Observability is not a feature, it's a philosophy.
