# Manual Instrumentation Deep Dive

## When Auto-Instrumentation Isn't Enough

Auto-instrumentation is incredible for the infrastructure layer: databases, HTTP calls, message queues. But it can't understand your business logic. It doesn't know that `processOrder()` involves fraud detection, inventory checks, and loyalty point calculations.

In my e-commerce platform, I had perfect traces of database queries and HTTP requests, but I couldn't answer critical questions:

* Why did this order take 3 seconds when most take 200ms?
* Which validation step failed?
* How long did fraud detection take?
* Was the inventory check the bottleneck?

Manual instrumentation fills this gap by tracing your business logic with rich, domain-specific context.

## Creating Custom Spans

### Basic Custom Span

```typescript
import { trace, SpanStatusCode } from '@opentelemetry/api';

const tracer = trace.getTracer('order-service', '1.0.0');

async function validateOrder(order: Order): Promise<boolean> {
  return await tracer.startActiveSpan('validateOrder', async (span) => {
    try {
      // Add business context
      span.setAttribute('order.id', order.id);
      span.setAttribute('order.amount', order.amount);
      span.setAttribute('order.user_id', order.userId);
      
      // Your business logic
      const isValid = order.amount > 0 && order.amount < 10000;
      
      span.setAttribute('validation.result', isValid);
      
      if (!isValid) {
        span.setStatus({
          code: SpanStatusCode.ERROR,
          message: 'Order validation failed'
        });
      }
      
      return isValid;
    } catch (error) {
      span.recordException(error as Error);
      span.setStatus({
        code: SpanStatusCode.ERROR,
        message: (error as Error).message
      });
      throw error;
    } finally {
      span.end();
    }
  });
}
```

### Nested Spans for Complex Operations

Real business logic isn't linear—it has steps, branches, and parallel operations. Here's how I traced my order processing pipeline:

```typescript
interface OrderProcessingResult {
  success: boolean;
  orderId: string;
  fraudCheckPassed?: boolean;
  inventoryReserved?: boolean;
  paymentProcessed?: boolean;
  error?: string;
}

async function processCompleteOrder(orderData: {
  userId: string;
  items: Array<{ sku: string; quantity: number }>;
  amount: number;
}): Promise<OrderProcessingResult> {
  return await tracer.startActiveSpan('processCompleteOrder', async (parentSpan) => {
    try {
      parentSpan.setAttribute('order.user_id', orderData.userId);
      parentSpan.setAttribute('order.items_count', orderData.items.length);
      parentSpan.setAttribute('order.total_amount', orderData.amount);
      
      // Step 1: Fraud Detection
      const fraudCheckPassed = await tracer.startActiveSpan('fraudCheck', async (span) => {
        try {
          span.setAttribute('fraud.user_id', orderData.userId);
          span.setAttribute('fraud.amount', orderData.amount);
          
          // Simulate fraud score calculation
          const fraudScore = await calculateFraudScore(orderData);
          span.setAttribute('fraud.score', fraudScore);
          
          const passed = fraudScore < 0.7;
          span.setAttribute('fraud.passed', passed);
          
          if (!passed) {
            span.addEvent('Fraud detection failed', {
              'fraud.reason': 'High risk score',
              'fraud.threshold': 0.7
            });
            span.setStatus({ code: SpanStatusCode.ERROR, message: 'Fraud check failed' });
          }
          
          return passed;
        } finally {
          span.end();
        }
      });
      
      if (!fraudCheckPassed) {
        parentSpan.setStatus({ code: SpanStatusCode.ERROR, message: 'Fraud check failed' });
        return { success: false, orderId: '', fraudCheckPassed: false, error: 'Fraud detected' };
      }
      
      // Step 2: Inventory Check and Reservation
      const inventoryResult = await tracer.startActiveSpan('checkAndReserveInventory', async (span) => {
        try {
          span.setAttribute('inventory.items_count', orderData.items.length);
          
          const checks = await Promise.all(
            orderData.items.map((item, index) =>
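              // Keep the span name static; the item index goes in an attribute
              tracer.startActiveSpan('checkInventory.item', async (itemSpan) => {
                try {
                  itemSpan.setAttribute('inventory.item_index', index);
                  itemSpan.setAttribute('inventory.sku', item.sku);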
                  itemSpan.setAttribute('inventory.quantity_requested', item.quantity);
                  
                  const available = await checkInventory(item.sku, item.quantity);
                  itemSpan.setAttribute('inventory.available', available);
                  
                  if (!available) {
                    itemSpan.addEvent('Insufficient inventory', {
                      'inventory.sku': item.sku,
                      'inventory.quantity_requested': item.quantity
                    });
                  }
                  
                  return available;
                } finally {
                  itemSpan.end();
                }
              })
            )
          );
          
          const allAvailable = checks.every(c => c);
          span.setAttribute('inventory.all_available', allAvailable);
          
          if (allAvailable) {
            // Reserve inventory
            await reserveInventory(orderData.items);
            span.addEvent('Inventory reserved');
          } else {
            span.setStatus({ code: SpanStatusCode.ERROR, message: 'Insufficient inventory' });
          }
          
          return allAvailable;
        } finally {
          span.end();
        }
      });
      
      if (!inventoryResult) {
        parentSpan.setStatus({ code: SpanStatusCode.ERROR, message: 'Inventory check failed' });
        return { success: false, orderId: '', inventoryReserved: false, error: 'Out of stock' };
      }
      
      // Step 3: Payment Processing
      const paymentResult = await tracer.startActiveSpan('processPayment', async (span) => {
        try {
          span.setAttribute('payment.amount', orderData.amount);
          span.setAttribute('payment.currency', 'USD');
          span.setAttribute('payment.user_id', orderData.userId);
          
          const startTime = Date.now();
          const success = await chargePayment(orderData.userId, orderData.amount);
          const duration = Date.now() - startTime;
          
          span.setAttribute('payment.success', success);
          span.setAttribute('payment.duration_ms', duration);
          
          if (success) {
            span.addEvent('Payment successful', {
              'payment.transaction_id': `TXN-${Date.now()}`
            });
          } else {
            span.setStatus({ code: SpanStatusCode.ERROR, message: 'Payment failed' });
            span.addEvent('Payment declined');
          }
          
          return success;
        } finally {
          span.end();
        }
      });
      
      if (!paymentResult) {
        // Rollback inventory reservation
        await tracer.startActiveSpan('rollbackInventory', async (span) => {
          try {
            await releaseInventory(orderData.items);
            span.addEvent('Inventory reservation released');
          } finally {
            span.end();
          }
        });
        
        parentSpan.setStatus({ code: SpanStatusCode.ERROR, message: 'Payment failed' });
        return { success: false, orderId: '', paymentProcessed: false, error: 'Payment declined' };
      }
      
      // Step 4: Create Order Record
      const orderId = await tracer.startActiveSpan('createOrderRecord', async (span) => {
        try {
          const id = `ORD-${Date.now()}`;
          span.setAttribute('order.id', id);
          
          await saveOrder({
            id,
            userId: orderData.userId,
            items: orderData.items,
            amount: orderData.amount,
            status: 'completed'
          });
          
          span.addEvent('Order record created');
          return id;
        } finally {
          span.end();
        }
      });
      
      // Step 5: Apply Loyalty Points
      await tracer.startActiveSpan('applyLoyaltyPoints', async (span) => {
        try {
          const points = Math.floor(orderData.amount * 10); // 10 points per dollar
          span.setAttribute('loyalty.points_awarded', points);
          span.setAttribute('loyalty.user_id', orderData.userId);
          
          await awardLoyaltyPoints(orderData.userId, points);
          span.addEvent('Loyalty points awarded');
        } catch (error) {
          // Non-critical: don't fail the order if loyalty fails
          span.recordException(error as Error);
          span.setStatus({ code: SpanStatusCode.ERROR, message: 'Loyalty points failed' });
        } finally {
          span.end();
        }
      });
      
      parentSpan.setAttribute('order.final_id', orderId);
      parentSpan.setStatus({ code: SpanStatusCode.OK });
      
      return {
        success: true,
        orderId,
        fraudCheckPassed: true,
        inventoryReserved: true,
        paymentProcessed: true
      };
      
    } catch (error) {
      parentSpan.recordException(error as Error);
      parentSpan.setStatus({
        code: SpanStatusCode.ERROR,
        message: (error as Error).message
      });
      return { success: false, orderId: '', error: (error as Error).message };
    } finally {
      parentSpan.end();
    }
  });
}

// Helper functions (simulated)
async function calculateFraudScore(orderData: any): Promise<number> {
  await new Promise(r => setTimeout(r, 150)); // Simulate API call
  return Math.random();
}

async function checkInventory(sku: string, quantity: number): Promise<boolean> {
  await new Promise(r => setTimeout(r, 50));
  return Math.random() > 0.1; // 90% success rate
}

async function reserveInventory(items: any[]): Promise<void> {
  await new Promise(r => setTimeout(r, 100));
}

async function chargePayment(userId: string, amount: number): Promise<boolean> {
  await new Promise(r => setTimeout(r, 200));
  return Math.random() > 0.05; // 95% success rate
}

async function releaseInventory(items: any[]): Promise<void> {
  await new Promise(r => setTimeout(r, 50));
}

async function saveOrder(order: any): Promise<void> {
  await new Promise(r => setTimeout(r, 80));
}

async function awardLoyaltyPoints(userId: string, points: number): Promise<void> {
  await new Promise(r => setTimeout(r, 30));
}
```
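
To see why the nested `startActiveSpan` calls above show up as children of `processCompleteOrder`, it can help to model the "active span" idea in isolation. The toy tracer below is not the OpenTelemetry SDK: `ToyTracer`, `ToySpan`, and the stack-based bookkeeping are assumptions made purely for illustration (the real Node.js SDK tracks the active span through a context manager so that it also survives `await` and `Promise.all`). The sketch only shows that a span started while another one is active records that span as its parent.

```typescript
// Toy model of "active span" nesting. Not the OpenTelemetry SDK.
interface ToySpan {
  id: number;
  name: string;
  parentId: number | undefined;
  ended: boolean;
}

class ToyTracer {
  readonly spans: ToySpan[] = [];
  private readonly active: ToySpan[] = []; // stack of currently active spans

  // Synchronous analogue of tracer.startActiveSpan(name, fn).
  startActiveSpan<T>(name: string, fn: (span: ToySpan) => T): T {
    const parent: ToySpan | undefined = this.active[this.active.length - 1];
    const span: ToySpan = {
      id: this.spans.length + 1,
      name,
      parentId: parent?.id,
      ended: false,
    };
    this.spans.push(span);
    this.active.push(span);
    try {
      return fn(span);
    } finally {
      this.active.pop(); // restore the previously active span
      span.ended = true; // the toy ends spans itself; the real API does not
    }
  }
}

const toy = new ToyTracer();
toy.startActiveSpan('processCompleteOrder', () => {
  toy.startActiveSpan('fraudCheck', () => undefined);
  toy.startActiveSpan('checkAndReserveInventory', () => {
    toy.startActiveSpan('checkInventory.item', () => undefined);
  });
});

// Print each span together with the id of its parent.
for (const s of toy.spans) {
  console.log(`${s.id} ${s.name} parent=${s.parentId ?? 'none'}`);
}
```

In this toy run, `fraudCheck` and `checkAndReserveInventory` both point at the root span, while the item check points at `checkAndReserveInventory`. That is the same tree shape a trace viewer would draw for the pipeline above.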

## Span Attributes: Adding Rich Context

### Semantic Conventions

Always follow OpenTelemetry semantic conventions when available:

```typescript
import {
  ATTR_HTTP_REQUEST_METHOD,
  ATTR_HTTP_ROUTE,
  ATTR_HTTP_RESPONSE_STATUS_CODE
} from '@opentelemetry/semantic-conventions';

// Stable HTTP conventions: http.request.method, http.route, http.response.status_code
span.setAttribute(ATTR_HTTP_REQUEST_METHOD, 'POST');
span.setAttribute(ATTR_HTTP_ROUTE, '/api/orders');
span.setAttribute(ATTR_HTTP_RESPONSE_STATUS_CODE, 201);
```

### Custom Business Attributes

For domain-specific attributes, use namespacing:

```typescript
// Good: Namespaced attributes
span.setAttribute('order.id', orderId);
span.setAttribute('order.total_items', items.length);
span.setAttribute('order.payment_method', 'credit_card');
span.setAttribute('user.tier', 'premium');
span.setAttribute('fraud.risk_level', 'low');
span.setAttribute('inventory.warehouse', 'US-WEST-1');

// Avoid: Generic names
span.setAttribute('id', orderId);  // Too generic
span.setAttribute('count', items.length);  // What count?
```
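
One way to keep namespacing consistent is to build the attribute object in a single place instead of spelling out prefixes at every call site. The sketch below is only an illustration: `OrderSummary`, `namespaced`, and `orderAttributes` are hypothetical names, not part of OpenTelemetry.

```typescript
// Hypothetical order shape; adapt to your own domain type.
interface OrderSummary {
  id: string;
  itemCount: number;
  paymentMethod: string;
  userTier: string;
}

type AttributeMap = Record<string, string | number | boolean>;

// Prefix every key with a namespace so 'id' becomes 'order.id'.
function namespaced(prefix: string, values: AttributeMap): AttributeMap {
  const out: AttributeMap = {};
  for (const key of Object.keys(values)) {
    const value = values[key];
    if (value !== undefined) out[`${prefix}.${key}`] = value;
  }
  return out;
}

// Build all order-related attributes in one place.
function orderAttributes(order: OrderSummary): AttributeMap {
  return {
    ...namespaced('order', {
      id: order.id,
      total_items: order.itemCount,
      payment_method: order.paymentMethod,
    }),
    ...namespaced('user', { tier: order.userTier }),
  };
}

const attrs = orderAttributes({
  id: 'ORD-1',
  itemCount: 2,
  paymentMethod: 'credit_card',
  userTier: 'premium',
});
console.log(attrs);
```

The resulting object can be handed to `span.setAttributes(attrs)` in one call, which keeps key spelling out of the business logic.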

### Attribute Value Types

```typescript
// String
span.setAttribute('order.status', 'completed');

// Number
span.setAttribute('order.amount', 99.99);
span.setAttribute('order.items_count', 5);

// Boolean
span.setAttribute('order.express_shipping', true);
span.setAttribute('fraud.high_risk', false);

// Arrays
span.setAttribute('order.item_skus', ['SKU-123', 'SKU-456']);
span.setAttribute('order.categories', ['electronics', 'accessories']);
```
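
Attribute values are limited to the types shown above: strings, numbers, booleans, and arrays of a single primitive type. Objects, `null`, and mixed arrays are not valid values. A small coercion helper can make that explicit before calling `setAttribute`. This is a sketch under assumptions: `toAttributeValue` is a hypothetical helper, and falling back to a JSON string is one possible policy, not something OpenTelemetry prescribes.

```typescript
type AttrValue = string | number | boolean | string[] | number[] | boolean[];

function isPrimitiveKind(kind: string): boolean {
  return kind === 'string' || kind === 'number' || kind === 'boolean';
}

// Coerce an arbitrary value into something an attribute can hold.
// Returns undefined when the value should simply be skipped.
function toAttributeValue(value: unknown): AttrValue | undefined {
  if (value === null || value === undefined) return undefined;
  if (typeof value === 'string' || typeof value === 'boolean') return value;
  if (typeof value === 'number') return Number.isFinite(value) ? value : undefined;
  if (value instanceof Date) return value.toISOString();
  if (Array.isArray(value)) {
    const kinds = Array.from(new Set(value.map((v) => typeof v)));
    if (kinds.length === 0) return [] as string[];
    // A homogeneous array of primitives passes through unchanged.
    if (kinds.length === 1 && isPrimitiveKind(String(kinds[0]))) {
      return value as string[] | number[] | boolean[];
    }
  }
  // Objects and mixed arrays: fall back to a JSON string (assumed policy).
  return JSON.stringify(value);
}

console.log(toAttributeValue(99.99));
console.log(toAttributeValue(['SKU-123', 'SKU-456']));
console.log(toAttributeValue({ warehouse: 'US-WEST-1' }));
console.log(toAttributeValue(null));
```

Serializing whole objects can produce large attributes, so in practice it is usually better to pick the few fields you need and set them individually.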

## Span Events: Recording Significant Moments

Events are timestamped annotations within a span:

```typescript
async function processRefund(orderId: string, amount: number): Promise<void> {
  return await tracer.startActiveSpan('processRefund', async (span) => {
    try {
      span.setAttribute('refund.order_id', orderId);
      span.setAttribute('refund.amount', amount);
      
      // Event: Refund initiated
      span.addEvent('Refund initiated', {
        'refund.initiated_by': 'customer_support',
        'refund.reason': 'damaged_item'
      });
      
      // Check eligibility
      const eligible = await checkRefundEligibility(orderId);
      span.addEvent('Eligibility checked', {
        'refund.eligible': eligible
      });
      
      if (!eligible) {
        span.addEvent('Refund rejected', {
          'refund.rejection_reason': 'Outside return window'
        });
        throw new Error('Refund not eligible');
      }
      
      // Process refund
      await initiateRefund(orderId, amount);
      span.addEvent('Refund processed', {
        'refund.transaction_id': `REF-${Date.now()}`
      });
      
      // Update order status
      await updateOrderStatus(orderId, 'refunded');
      span.addEvent('Order status updated', {
        'order.new_status': 'refunded'
      });
      
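    } catch (error) {
      // Mark the span as failed before the error propagates
      span.recordException(error as Error);
      span.setStatus({ code: SpanStatusCode.ERROR, message: (error as Error).message });
      throw error;
    } finally {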
      span.end();
    }
  });
}

async function checkRefundEligibility(orderId: string): Promise<boolean> {
  await new Promise(r => setTimeout(r, 100));
  return true;
}

async function initiateRefund(orderId: string, amount: number): Promise<void> {
  await new Promise(r => setTimeout(r, 300));
}

async function updateOrderStatus(orderId: string, status: string): Promise<void> {
  await new Promise(r => setTimeout(r, 50));
}
```
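
Because every event carries a timestamp, the gaps between consecutive events show which step inside a single span was slow, without adding a child span per step. The sketch below assumes the events have already been read back from a trace as name and timestamp pairs. `RecordedEvent` and `eventGaps` are hypothetical names, and the timestamps are made-up values chosen to resemble the simulated delays above.

```typescript
interface RecordedEvent {
  name: string;
  timeMs: number; // epoch milliseconds; the values below are illustrative
}

interface EventGap {
  from: string;
  to: string;
  elapsedMs: number;
}

// Compute the elapsed time between each pair of consecutive events.
function eventGaps(events: RecordedEvent[]): EventGap[] {
  const sorted = [...events].sort((a, b) => a.timeMs - b.timeMs);
  const gaps: EventGap[] = [];
  for (let i = 1; i < sorted.length; i++) {
    const prev = sorted[i - 1];
    const curr = sorted[i];
    if (!prev || !curr) continue;
    gaps.push({ from: prev.name, to: curr.name, elapsedMs: curr.timeMs - prev.timeMs });
  }
  return gaps;
}

const refundEvents: RecordedEvent[] = [
  { name: 'Refund initiated', timeMs: 1000 },
  { name: 'Eligibility checked', timeMs: 1100 },
  { name: 'Refund processed', timeMs: 1400 },
  { name: 'Order status updated', timeMs: 1450 },
];

// Pick the largest gap: the step that dominated the span's duration.
const slowest = eventGaps(refundEvents).reduce((a, b) => (b.elapsedMs > a.elapsedMs ? b : a));
console.log(`${slowest.from} -> ${slowest.to}: ${slowest.elapsedMs}ms`);
```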

## Exception Handling

### Recording Exceptions

```typescript
async function riskyOperation(data: any): Promise<void> {
  return await tracer.startActiveSpan('riskyOperation', async (span) => {
    try {
      span.setAttribute('operation.data_size', JSON.stringify(data).length);
      
      // Risky operation
      if (!data.required_field) {
        throw new Error('Missing required field');
      }
      
      await performOperation(data);
      
    } catch (error) {
      // Record the exception with full stack trace
      span.recordException(error as Error);
      
      // Set span status to error
      span.setStatus({
        code: SpanStatusCode.ERROR,
        message: (error as Error).message
      });
      
      // Add contextual event
      span.addEvent('Operation failed', {
        'error.type': (error as Error).name,
        'error.handled': true
      });
      
      // Re-throw or handle
      throw error;
    } finally {
      span.end();
    }
  });
}

async function performOperation(data: any): Promise<void> {
  await new Promise(r => setTimeout(r, 100));
}
```
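
The `error as Error` cast only satisfies the compiler. JavaScript lets code throw any value, so `(error as Error).message` is `undefined` when a string or plain object is thrown. A small normalizer, sketched below, avoids that. `toError` is a hypothetical helper and not part of the OpenTelemetry API.

```typescript
// Convert any thrown value into an Error so message and stack are always present.
function toError(thrown: unknown): Error {
  if (thrown instanceof Error) return thrown;
  if (typeof thrown === 'string') return new Error(thrown);
  try {
    return new Error(JSON.stringify(thrown));
  } catch {
    // JSON.stringify throws on circular structures; fall back to String().
    return new Error(String(thrown));
  }
}

try {
  // Some libraries throw plain strings or objects instead of Error instances.
  throw 'payment gateway timeout';
} catch (e) {
  const err = toError(e);
  console.log(err.name, err.message);
}
```

Inside a `catch` block this would be used as `const err = toError(error)`, with `err` then passed to `span.recordException` and `err.message` to `span.setStatus`.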

### Partial Failures

Sometimes operations partially succeed—trace that too:

```typescript
async function batchProcessOrders(orderIds: string[]): Promise<void> {
  return await tracer.startActiveSpan('batchProcessOrders', async (span) => {
    try {
      span.setAttribute('batch.total_orders', orderIds.length);
      
      let successCount = 0;
      let failureCount = 0;
      const errors: string[] = [];
      
      for (const orderId of orderIds) {
        try {
          // Static span name; the order ID is recorded as an attribute below
          await tracer.startActiveSpan('processOrder', async (orderSpan) => {
            try {
              orderSpan.setAttribute('order.id', orderId);
              await processSingleOrder(orderId);
              successCount++;
              orderSpan.addEvent('Order processed successfully');
            } catch (error) {
              failureCount++;
              errors.push(orderId);
              orderSpan.recordException(error as Error);
              orderSpan.setStatus({ code: SpanStatusCode.ERROR, message: (error as Error).message });
              // Don't throw - continue with other orders
            } finally {
              orderSpan.end();
            }
          });
        } catch (error) {
          // Catch outer errors
          failureCount++;
        }
      }
      
      span.setAttribute('batch.success_count', successCount);
      span.setAttribute('batch.failure_count', failureCount);
      span.setAttribute('batch.failed_orders', errors);
      
      if (failureCount > 0) {
        span.addEvent('Batch completed with failures', {
          'batch.failure_rate': (failureCount / orderIds.length) * 100
        });
        
        if (failureCount === orderIds.length) {
          span.setStatus({ code: SpanStatusCode.ERROR, message: 'All orders failed' });
        }
      }
      
    } finally {
      span.end();
    }
  });
}

async function processSingleOrder(orderId: string): Promise<void> {
  await new Promise(r => setTimeout(r, 100));
  if (Math.random() < 0.1) {
    throw new Error('Processing failed');
  }
}
```
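
The status decision in the batch example (error only when every order fails) can be pulled into a pure function, which makes the policy easy to test on its own. This is a sketch: `summarizeBatch`, `BatchOutcome`, and the `'partial'` label are hypothetical names introduced here.

```typescript
type BatchOutcome = 'ok' | 'partial' | 'failed';

interface BatchSummary {
  outcome: BatchOutcome;
  successCount: number;
  failureCount: number;
  failureRatePercent: number;
}

// Classify a batch from its size and the list of failed order IDs.
function summarizeBatch(total: number, failedIds: string[]): BatchSummary {
  const failureCount = failedIds.length;
  const successCount = total - failureCount;
  const failureRatePercent = total === 0 ? 0 : (failureCount / total) * 100;

  let outcome: BatchOutcome = 'ok';
  if (failureCount > 0) {
    outcome = failureCount === total ? 'failed' : 'partial';
  }
  return { outcome, successCount, failureCount, failureRatePercent };
}

const summary = summarizeBatch(4, ['ORD-2']);
console.log(summary);
```

Following the example above, only the `'failed'` outcome would map to `SpanStatusCode.ERROR`. A `'partial'` outcome could be exposed as an attribute such as `batch.outcome` (a hypothetical key) so that it stays searchable without marking the whole span as an error.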

## Real Production Debugging Story

**Problem**: Premium users reported that checkout was slower for them than for regular users.

**Investigation using custom spans**:

1. Added instrumentation to loyalty calculation:

```typescript
await tracer.startActiveSpan('calculateLoyaltyDiscount', async (span) => {
  const start = Date.now();
  try {
    span.setAttribute('user.tier', userTier);
    const discount = await fetchLoyaltyDiscount(userId);
    span.setAttribute('loyalty.discount_amount', discount);
    span.setAttribute('loyalty.calculation_time_ms', Date.now() - start);
    // ...
  } finally {
    span.end();
  }
});
```

2. **Found the issue** in Jaeger:
   * Regular users: No loyalty span (instant)
   * Premium users: `calculateLoyaltyDiscount` taking 800-1200ms
   * Span showed: `loyalty.api_calls: 3` (one per item!)
3. **Root cause**: N+1 query problem
   * Premium users got per-item discounts
   * Each item = separate API call to loyalty service
   * Cart with 10 items = 10 API calls
4. **Fix**: Batch API call

```typescript
await tracer.startActiveSpan('calculateLoyaltyDiscount', async (span) => {
  try {
    const itemIds = items.map(i => i.id);
    span.setAttribute('loyalty.items_count', itemIds.length);

    // Single batch call instead of N calls
    const discounts = await fetchBatchLoyaltyDiscounts(userId, itemIds);
    span.setAttribute('loyalty.api_calls', 1); // Now just 1!
    // ...
  } finally {
    span.end();
  }
});
```

5. **Result**: Premium checkout went from 1.2s to 180ms

## Best Practices

### 1. Span Granularity

**Too coarse:**

```typescript
// Bad: One giant span
await tracer.startActiveSpan('handleRequest', async (span) => {
  await validateInput();
  await checkAuth();
  await processBusinessLogic();
  await saveToDatabase();
  await sendNotification();
  // Can't see which step is slow!
});
```

**Too fine:**

```typescript
// Bad: Too many spans
await tracer.startActiveSpan('validateEmail', ...);
await tracer.startActiveSpan('validatePhone', ...);
await tracer.startActiveSpan('validateAddress', ...);
// Noisy, hard to analyze
```

**Just right:**

```typescript
// Good: Logical grouping
await tracer.startActiveSpan('validateCustomerData', async (span) => {
  try {
    const emailValid = validateEmail(); // No span needed
    const phoneValid = validatePhone(); // No span needed
    span.setAttribute('validation.email_valid', emailValid);
    span.setAttribute('validation.phone_valid', phoneValid);
  } finally {
    span.end();
  }
});
```

### 2. Meaningful Names

```typescript
// Bad
await tracer.startActiveSpan('doStuff', ...);
await tracer.startActiveSpan('process', ...);

// Good
await tracer.startActiveSpan('validateOrderItems', ...);
await tracer.startActiveSpan('calculateShippingCost', ...);
await tracer.startActiveSpan('applyPromotionalDiscounts', ...);
```

### 3. Always Use try/finally

```typescript
// Always end spans, even on error
return await tracer.startActiveSpan('operation', async (span) => {
  try {
    // Your code
  } catch (error) {
    span.recordException(error as Error);
    span.setStatus({ code: SpanStatusCode.ERROR, message: (error as Error).message });
    throw error;
  } finally {
    span.end(); // CRITICAL!
  }
});
```
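
Since this try/catch/finally shape repeats in every example, one option is to wrap it once. The sketch below is an assumption-heavy illustration rather than an official helper. `withSpan`, `SpanLike`, and `TracerLike` are hypothetical names, and the two interfaces only mirror the small subset of the span and tracer API used in this article so that the block runs without any packages installed.

```typescript
// Minimal structural types so the sketch runs standalone (assumed shapes).
interface SpanLike {
  recordException(error: Error): void;
  setStatus(status: { code: number; message?: string }): void;
  end(): void;
}
interface TracerLike {
  startActiveSpan<T>(name: string, fn: (span: SpanLike) => T): T;
}

const STATUS_ERROR = 2; // numeric value of SpanStatusCode.ERROR in @opentelemetry/api

// Run fn inside a span; record failures and always end the span.
async function withSpan<T>(
  tracer: TracerLike,
  name: string,
  fn: (span: SpanLike) => Promise<T>,
): Promise<T> {
  return tracer.startActiveSpan(name, async (span) => {
    try {
      return await fn(span);
    } catch (error) {
      const err = error instanceof Error ? error : new Error(String(error));
      span.recordException(err);
      span.setStatus({ code: STATUS_ERROR, message: err.message });
      throw error;
    } finally {
      span.end(); // runs on success and on failure
    }
  });
}

// Fake tracer that logs what happens to the span, for demonstration only.
const calls: string[] = [];
const fakeTracer: TracerLike = {
  startActiveSpan(name, fn) {
    calls.push(`start:${name}`);
    return fn({
      recordException: (e) => { calls.push(`exception:${e.message}`); },
      setStatus: (s) => { calls.push(`status:${s.code}`); },
      end: () => { calls.push('end'); },
    });
  },
};

withSpan(fakeTracer, 'calculateShippingCost', async () => {
  throw new Error('carrier API down');
}).catch(() => {
  console.log(calls.join(' | '));
});
```

With the real API, the tracer returned by `trace.getTracer(...)` would take the place of `fakeTracer`. The structural types here are a guess at compatibility, so they may need a thin adapter depending on the installed `@opentelemetry/api` version.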

### 4. Attribute Cardinality

```typescript
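// Bad: unbounded values that are also sensitive or redundant
// (plain identifiers such as order.id are fine on spans; cardinality hurts
// most when an attribute is used for grouping or turned into a metric label)
span.setAttribute('user.email', email); // Unbounded unique values, and PII
span.setAttribute('order.timestamp', new Date().toISOString()); // Unbounded, and the span already records its timestamps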

// Good: Low cardinality (groupable values)
span.setAttribute('user.tier', tier); // Limited values: free, premium, enterprise
span.setAttribute('order.hour', new Date().getHours()); // 0-23
span.setAttribute('order.day_of_week', new Date().getDay()); // 0-6
```
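
When a value is continuous but you still want to group by it, one option is to record a bucket label next to (or instead of) the raw number. The helper below is a sketch: `amountBucket`, its thresholds, and the `order.amount_bucket` key mentioned afterwards are hypothetical and should be tuned to your own order distribution.

```typescript
// Map a continuous order amount onto a small, fixed set of labels.
function amountBucket(amount: number): string {
  if (amount < 50) return 'lt_50';
  if (amount < 200) return '50_to_200';
  if (amount < 1000) return '200_to_1000';
  return 'gte_1000';
}

for (const amount of [12.5, 99.99, 450, 2500]) {
  console.log(amount, amountBucket(amount));
}
```

A call such as `span.setAttribute('order.amount_bucket', amountBucket(order.amount))` then gives dashboards four groupable values regardless of how many distinct amounts appear.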

## What's Next

You've mastered custom instrumentation for business logic. Continue to [Metrics Collection](https://blog.htunnthuthu.com/devops-and-sre/opentelemetry-101/opentelemetry-101-metrics) to learn:

* Counters, gauges, and histograms
* When to use metrics vs traces
* Creating custom business metrics
* Metric aggregation and analysis

***

*Traces show you the journey. Attributes tell you the story.*
