Auto-instrumentation is incredible for infrastructure layer—databases, HTTP calls, message queues. But it can't understand your business logic. It doesn't know that processOrder() involves fraud detection, inventory checks, and loyalty point calculations.
In my e-commerce platform, I had perfect traces of database queries and HTTP requests, but I couldn't answer critical questions:
Why did this order take 3 seconds when most take 200ms?
Which validation step failed?
How long did fraud detection take?
Was the inventory check the bottleneck?
Manual instrumentation fills this gap by tracing your business logic with rich, domain-specific context.
Creating Custom Spans
Basic Custom Span
import{trace,SpanStatusCode}from'@opentelemetry/api';consttracer=trace.getTracer('order-service','1.0.0');asyncfunctionvalidateOrder(order:Order):Promise<boolean>{returnawaittracer.startActiveSpan('validateOrder',async(span)=>{try{ // Add business contextspan.setAttribute('order.id',order.id);span.setAttribute('order.amount',order.amount);span.setAttribute('order.user_id',order.userId); // Your business logicconstisValid=order.amount>0&&order.amount<10000;span.setAttribute('validation.result',isValid);if (!isValid) {span.setStatus({code:SpanStatusCode.ERROR,message:'Order validation failed'});}returnisValid;}catch (error) {span.recordException(errorasError);span.setStatus({code:SpanStatusCode.ERROR,message: (errorasError).message});throwerror;}finally{span.end();}});}
Nested Spans for Complex Operations
Real business logic isn't linear—it has steps, branches, and parallel operations. Here's how I traced my order processing pipeline:
Span Attributes: Adding Rich Context
Semantic Conventions
Always follow OpenTelemetry semantic conventions when available:
Custom Business Attributes
For domain-specific attributes, use namespacing:
Attribute Value Types
Span Events: Recording Significant Moments
Events are timestamped annotations within a span:
Exception Handling
Recording Exceptions
Partial Failures
Sometimes operations partially succeed—trace that too:
Real Production Debugging Story
Problem: Premium users reported that checkout was slower than regular users.
await tracer.startActiveSpan('calculateLoyaltyDiscount', async (span) => {
const itemIds = items.map(i => i.id);
span.setAttribute('loyalty.items_count', itemIds.length);
// Single batch call instead of N calls
const discounts = await fetchBatchLoyaltyDiscounts(userId, itemIds);
span.setAttribute('loyalty.api_calls', 1); // Now just 1!
// ...
});
// Bad: One giant span
await tracer.startActiveSpan('handleRequest', async (span) => {
await validateInput();
await checkAuth();
await processBusinessLogic();
await saveToDatabase();
await sendNotification();
// Can't see which step is slow!
});
// Bad: Too many spans
await tracer.startActiveSpan('validateEmail', ...);
await tracer.startActiveSpan('validatePhone', ...);
await tracer.startActiveSpan('validateAddress', ...);
// Noisy, hard to analyze