Manual Instrumentation Deep Dive

When Auto-Instrumentation Isn't Enough

Auto-instrumentation is excellent for the infrastructure layer: databases, HTTP calls, and message queues. But it can't understand your business logic. It doesn't know that processOrder() involves fraud detection, inventory checks, and loyalty point calculations.

In my e-commerce platform, I had perfect traces of database queries and HTTP requests, but I couldn't answer critical questions:

  • Why did this order take 3 seconds when most take 200ms?

  • Which validation step failed?

  • How long did fraud detection take?

  • Was the inventory check the bottleneck?

Manual instrumentation fills this gap by tracing your business logic with rich, domain-specific context.

Creating Custom Spans

Basic Custom Span

import { trace, SpanStatusCode } from '@opentelemetry/api';

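const tracer = trace.getTracer('order-service', '1.0.0');

// Order shape assumed by this example
interface Order {
  id: string;
  amount: number;
  userId: string;
}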

async function validateOrder(order: Order): Promise<boolean> {
  return await tracer.startActiveSpan('validateOrder', async (span) => {
    try {
      // Add business context
      span.setAttribute('order.id', order.id);
      span.setAttribute('order.amount', order.amount);
      span.setAttribute('order.user_id', order.userId);
      
      // Your business logic
      const isValid = order.amount > 0 && order.amount < 10000;
      
      span.setAttribute('validation.result', isValid);
      
      if (!isValid) {
        span.setStatus({
          code: SpanStatusCode.ERROR,
          message: 'Order validation failed'
        });
      }
      
      return isValid;
    } catch (error) {
      span.recordException(error as Error);
      span.setStatus({
        code: SpanStatusCode.ERROR,
        message: (error as Error).message
      });
      throw error;
    } finally {
      span.end();
    }
  });
}

Nested Spans for Complex Operations

Real business logic isn't linear—it has steps, branches, and parallel operations. Here's how I traced my order processing pipeline:
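A minimal sketch of such a pipeline: one parent span with two child spans that run in parallel. The step names (checkFraud, checkInventory), the placeholder rules, and the withSpan helper are assumptions for illustration. TracerLike and SpanLike are small stand-in types so the sketch runs without an SDK; in a real service you would pass the tracer from @opentelemetry/api.

```typescript
// Stand-in types (assumptions): just enough surface for this sketch.
interface SpanLike {
  setAttribute(key: string, value: string | number | boolean): unknown;
  end(): void;
}
interface TracerLike {
  startActiveSpan<T>(name: string, fn: (span: SpanLike) => Promise<T>): Promise<T>;
}

interface PipelineOrder {
  id: string;
  amount: number;
  itemCount: number;
}

// Hypothetical helper: runs fn inside a span and always ends the span.
async function withSpan<T>(
  tracer: TracerLike,
  name: string,
  fn: (span: SpanLike) => Promise<T>,
): Promise<T> {
  return tracer.startActiveSpan(name, async (span) => {
    try {
      return await fn(span);
    } finally {
      span.end();
    }
  });
}

// Parent span 'processOrder' with two children started in parallel.
async function processOrder(tracer: TracerLike, order: PipelineOrder): Promise<boolean> {
  return withSpan(tracer, 'processOrder', async (span) => {
    span.setAttribute('order.id', order.id);

    const [fraudOk, inStock] = await Promise.all([
      withSpan(tracer, 'checkFraud', async (child) => {
        const ok = order.amount < 10000; // placeholder rule
        child.setAttribute('fraud.passed', ok);
        return ok;
      }),
      withSpan(tracer, 'checkInventory', async (child) => {
        const ok = order.itemCount > 0; // placeholder rule
        child.setAttribute('inventory.available', ok);
        return ok;
      }),
    ]);

    const accepted = fraudOk && inStock;
    span.setAttribute('order.accepted', accepted);
    return accepted;
  });
}
```

Because each step gets its own span, the trace view shows which step dominated the total duration instead of one opaque processOrder bar.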

Span Attributes: Adding Rich Context

Semantic Conventions

Always follow OpenTelemetry semantic conventions when available:
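For example, an outgoing HTTP call can be described with the attribute keys from the stable HTTP semantic conventions. The helper below is a hypothetical sketch (httpClientAttributes is not part of any library); convention names have changed between versions, so check which version your backend expects.

```typescript
type HttpAttrs = Record<string, string | number | boolean>;

// Keys follow the stable HTTP semantic conventions. Older instrumentation
// used 'http.method', 'http.status_code' and 'http.url' instead.
function httpClientAttributes(method: string, url: string, status: number): HttpAttrs {
  return {
    'http.request.method': method.toUpperCase(),
    'http.response.status_code': status,
    'url.full': url,
  };
}
```

The returned object can be passed to span.setAttributes(), so standard keys are defined in one place rather than retyped at every call site.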

Custom Business Attributes

For domain-specific attributes, use namespacing:
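One way to keep namespacing consistent is a small helper that prefixes every key. This is a sketch; the namespaced function and the example keys are assumptions, apart from loyalty.api_calls, which appears in the debugging story later in this chapter.

```typescript
type BusinessAttrValue = string | number | boolean;

// Prefix every key with a domain namespace, e.g. 'loyalty' -> 'loyalty.tier'.
function namespaced(
  ns: string,
  values: Record<string, BusinessAttrValue>,
): Record<string, BusinessAttrValue> {
  const out: Record<string, BusinessAttrValue> = {};
  for (const key of Object.keys(values)) {
    out[`${ns}.${key}`] = values[key];
  }
  return out;
}

// namespaced('loyalty', { tier: 'gold', api_calls: 3 })
// yields the keys 'loyalty.tier' and 'loyalty.api_calls'
```

A shared prefix keeps business attributes from colliding with semantic-convention keys and makes them easy to filter in the tracing UI.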

Attribute Value Types
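Span attribute values are limited to strings, numbers, booleans, and arrays of a single one of those types; nested objects are not accepted. A minimal sketch of a coercion helper follows. The function name and the choice to serialize objects as JSON are assumptions, not something the OpenTelemetry API does for you.

```typescript
type SupportedAttrValue = string | number | boolean | string[] | number[] | boolean[];

// Coerce an arbitrary value into something a span attribute can hold.
// Returns undefined when the attribute should simply be skipped.
function toAttributeValue(value: unknown): SupportedAttrValue | undefined {
  if (value === null || value === undefined) return undefined;
  if (typeof value === 'string' || typeof value === 'number' || typeof value === 'boolean') {
    return value;
  }
  if (value instanceof Date) return value.toISOString();
  if (Array.isArray(value)) {
    const kind = typeof value[0];
    const homogeneous = value.every((v) => typeof v === kind);
    if (homogeneous && (kind === 'string' || kind === 'number' || kind === 'boolean')) {
      return value as string[] | number[] | boolean[];
    }
  }
  // Objects, mixed arrays and empty arrays fall back to JSON text.
  return JSON.stringify(value);
}
```

Serializing to JSON keeps the data visible in the trace, at the cost of it no longer being queryable by field.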

Span Events: Recording Significant Moments

Events are timestamped annotations within a span:
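A sketch of events marking retry attempts inside a single payment span. The event names, attribute keys, and chargeWithRetries function are hypothetical. EventSpan is a stand-in type covering only the addEvent(name, attributes) call used here.

```typescript
// Stand-in type (assumption): only the addEvent call this sketch needs.
interface EventSpan {
  addEvent(name: string, attributes?: Record<string, string | number | boolean>): unknown;
}

// Try a charge up to maxAttempts times, leaving one event per attempt.
function chargeWithRetries(
  span: EventSpan,
  attemptCharge: (attempt: number) => boolean,
  maxAttempts = 3,
): boolean {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    span.addEvent('payment.attempt', { 'payment.attempt_number': attempt });
    if (attemptCharge(attempt)) {
      span.addEvent('payment.succeeded', { 'payment.attempts_total': attempt });
      return true;
    }
  }
  span.addEvent('payment.gave_up', { 'payment.attempts_total': maxAttempts });
  return false;
}
```

Events suit moments like these because they carry a timestamp but no duration; anything whose duration matters is better modeled as a child span.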

Exception Handling

Recording Exceptions
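In TypeScript a caught value is not guaranteed to be an Error, so casting with `as Error` can hide thrown strings or objects. A minimal sketch of a helper that normalizes the value before recording it follows. The recordFailure name and the FailableSpan stand-in type are assumptions; 2 is the numeric value of SpanStatusCode.ERROR in @opentelemetry/api.

```typescript
// Stand-in type (assumption): the two span calls this sketch needs.
interface FailableSpan {
  recordException(error: Error): void;
  setStatus(status: { code: number; message?: string }): unknown;
}

const STATUS_ERROR = 2; // SpanStatusCode.ERROR

// Normalize whatever was thrown, record it, and mark the span as failed.
function recordFailure(span: FailableSpan, thrown: unknown): Error {
  const error = thrown instanceof Error ? thrown : new Error(String(thrown));
  span.recordException(error);
  span.setStatus({ code: STATUS_ERROR, message: error.message });
  return error;
}
```

In a catch block this becomes `throw recordFailure(span, error);`, which records the exception and rethrows it in one step.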

Partial Failures

Sometimes operations partially succeed—trace that too:
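A sketch of one way to represent a partial failure: record counts as attributes, add an event per failed item, and set the span status to ERROR only when nothing succeeded. That policy, the reserveItems function, and the inventory.* keys are all assumptions for illustration; your team may prefer to mark any failure as an error.

```typescript
// Stand-in type (assumption): the span calls this sketch needs.
interface PartialSpan {
  setAttribute(key: string, value: string | number | boolean): unknown;
  addEvent(name: string, attributes?: Record<string, string | number | boolean>): unknown;
  setStatus(status: { code: number; message?: string }): unknown;
}

// Reserve each SKU and return the ones that could not be reserved.
function reserveItems(
  span: PartialSpan,
  skus: string[],
  reserve: (sku: string) => boolean,
): string[] {
  const failed: string[] = [];
  for (const sku of skus) {
    if (!reserve(sku)) {
      failed.push(sku);
      span.addEvent('inventory.reserve_failed', { 'inventory.sku': sku });
    }
  }

  span.setAttribute('inventory.items_total', skus.length);
  span.setAttribute('inventory.items_failed', failed.length);
  span.setAttribute(
    'inventory.partial_failure',
    failed.length > 0 && failed.length < skus.length,
  );

  // Only a total failure marks the span as an error (2 = SpanStatusCode.ERROR).
  if (skus.length > 0 && failed.length === skus.length) {
    span.setStatus({ code: 2, message: 'no items could be reserved' });
  }
  return failed;
}
```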

Real Production Debugging Story

Problem: Premium users reported that checkout was slower for them than for regular users.

Investigation using custom spans:

  1. Added instrumentation to the loyalty calculation.

  2. Found the issue in Jaeger:

    • Regular users: No loyalty span (instant)

    • Premium users: calculateLoyaltyDiscount taking 800-1200ms

    • Span showed: loyalty.api_calls: 3 (one per item!)

  3. Root cause: N+1 query problem

    • Premium users got per-item discounts

    • Each item = separate API call to loyalty service

    • Cart with 10 items = 10 API calls

  4. Fix: Batch API call

  5. Result: Premium checkout went from 1.2s to 180ms
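The before and after can be sketched as follows. The LoyaltyClient interface and both of its methods are hypothetical, and the calls are synchronous only to keep the sketch short. The span attribute loyalty.api_calls is the one that exposed the problem in the trace.

```typescript
// Stand-in span type (assumption): only setAttribute is needed here.
interface CountingSpan {
  setAttribute(key: string, value: string | number | boolean): unknown;
}

// Hypothetical loyalty client; real calls would be asynchronous.
interface LoyaltyClient {
  getDiscount(itemId: string): number;
  getDiscounts(itemIds: string[]): number[];
}

// Before: one loyalty call per cart item (the N+1 pattern).
function loyaltyDiscountPerItem(
  span: CountingSpan,
  client: LoyaltyClient,
  itemIds: string[],
): number {
  let total = 0;
  let calls = 0;
  for (const id of itemIds) {
    total += client.getDiscount(id);
    calls++;
  }
  span.setAttribute('loyalty.api_calls', calls);
  return total;
}

// After: a single batched call for the whole cart.
function loyaltyDiscountBatched(
  span: CountingSpan,
  client: LoyaltyClient,
  itemIds: string[],
): number {
  const discounts = client.getDiscounts(itemIds);
  span.setAttribute('loyalty.api_calls', 1);
  return discounts.reduce((sum, d) => sum + d, 0);
}
```

Recording the call count as an attribute is what made the N+1 pattern visible: the span duration showed that something was slow, and the attribute showed why.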

Best Practices

1. Span Granularity

Too coarse:

Too fine:

Just right:
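A sketch of the "just right" level: one span per business step, with loop sizes recorded as attributes instead of a span per iteration. StartSpan is a hypothetical stand-in for span creation that returns a function ending the span; the step names are assumptions.

```typescript
// Stand-in (assumption): starts a span and returns a function that ends it.
type StartSpan = (name: string, attrs: Record<string, string | number>) => () => void;

interface CartItem {
  sku: string;
  price: number;
}

function inSpan<T>(
  startSpan: StartSpan,
  name: string,
  attrs: Record<string, string | number>,
  work: () => T,
): T {
  const end = startSpan(name, attrs);
  try {
    return work();
  } finally {
    end();
  }
}

// Too coarse: only the outer 'checkout' span.
// Too fine: an extra span for every item inside the reduce below.
// Here: one span per step, with the item count as an attribute.
function checkout(startSpan: StartSpan, items: CartItem[]): number {
  return inSpan(startSpan, 'checkout', { 'cart.item_count': items.length }, () => {
    const total = inSpan(startSpan, 'checkout.calculateTotal', {}, () =>
      items.reduce((sum, item) => sum + item.price, 0),
    );
    inSpan(startSpan, 'checkout.chargePayment', { 'payment.amount': total }, () => {
      // placeholder for the real payment call
    });
    return total;
  });
}
```

The number of spans stays constant no matter how large the cart is, which keeps traces readable and export volume predictable.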

2. Meaningful Names
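One aspect of a meaningful name is that it describes the operation rather than a single request: identifiers belong in attributes, not in the span name. A sketch of a hypothetical helper that builds such a name from an HTTP route, replacing numeric and UUID-like segments with a placeholder:

```typescript
// Replace ID-like path segments so the span name stays the same
// for every request to the same route.
function spanNameForRoute(method: string, path: string): string {
  const isId = (segment: string): boolean =>
    /^\d+$/.test(segment) || /^[0-9a-f]{8}-[0-9a-f-]{27}$/i.test(segment);

  const template = path
    .split('/')
    .map((segment) => (isId(segment) ? '{id}' : segment))
    .join('/');

  return `${method.toUpperCase()} ${template}`;
}

// spanNameForRoute('get', '/orders/12345/items') returns 'GET /orders/{id}/items'
```

With stable names, all executions of the same operation group together in the tracing UI, and the specific order can still be found through an attribute such as order.id.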

3. Always Use try/finally
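A span that is never ended may never be exported, so span.end() belongs in a finally block, as in validateOrder earlier. A minimal sketch of wrappers that enforce this follows. The function names and the EndableSpan stand-in type are assumptions.

```typescript
// Stand-in type (assumption): only end() matters here.
interface EndableSpan {
  end(): void;
}

// Synchronous work: the span ends whether work() returns or throws.
function runInSpan<T>(span: EndableSpan, work: () => T): T {
  try {
    return work();
  } finally {
    span.end();
  }
}

// Asynchronous work: await inside the try so the span ends only after
// the promise settles, not when it is merely created.
async function runInSpanAsync<T>(span: EndableSpan, work: () => Promise<T>): Promise<T> {
  try {
    return await work();
  } finally {
    span.end();
  }
}
```

Passing a promise-returning function to the synchronous version would end the span too early, which is why the async variant awaits inside the try block.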

4. Attribute Cardinality

What's Next

You've mastered custom instrumentation for business logic. Continue to Metrics Collection to learn:

  • Counters, gauges, and histograms

  • When to use metrics vs traces

  • Creating custom business metrics

  • Metric aggregation and analysis



Traces show you the journey. Attributes tell you the story.
