Performance Optimization

The 2% CPU Mystery

After rolling out OpenTelemetry, I noticed our services were using 2-3% more CPU. For 100 EC2 instances running 24/7, that's ~$3,600/year in wasted compute.

The culprit? Inefficient tracing:

  • Creating too many spans

  • Over-sampling

  • Synchronous export blocking requests

  • No batching

After optimization: <0.5% overhead. Observability doesn't have to be expensive.

Measuring Overhead

First, benchmark your application without OpenTelemetry:

# Install autocannon for load testing
npm install -g autocannon

# Baseline test (no instrumentation)
autocannon -c 100 -d 30 http://localhost:3000/api/orders

Record the baseline throughput, latency, CPU, and memory.

Then, enable OpenTelemetry and run the same test twice: once with a naive configuration, and once after applying the optimizations below. The difference between those runs is your real instrumentation overhead; the Real-World Benchmark section later in this chapter shows what that gap looked like for my service.
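Assuming the SDK is bootstrapped from a separate file loaded before the application code (tracing.js and server.js below are placeholder names), the instrumented runs reuse the same load test:

# Load the OpenTelemetry bootstrap before the application code
node --require ./tracing.js server.js

# Run the identical load test against the instrumented process
autocannon -c 100 -d 30 http://localhost:3000/api/orders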

Optimization #1: Use Asynchronous Export

Bad: Synchronous export blocks requests

Every request waits for span export to complete before responding.
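A minimal sketch of that anti-pattern with the Node SDK and the OTLP/HTTP exporter (package names are the published OpenTelemetry JS ones; exact configuration options vary slightly between SDK versions):

const { NodeSDK } = require('@opentelemetry/sdk-node');
const { SimpleSpanProcessor } = require('@opentelemetry/sdk-trace-base');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-http');

// SimpleSpanProcessor exports each span as soon as it ends,
// putting a network round-trip on the hot path of every request.
const sdk = new NodeSDK({
  spanProcessor: new SimpleSpanProcessor(new OTLPTraceExporter()),
});
sdk.start();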

Good: Batch export asynchronously

Spans queue in memory and export in background batches.
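The same setup with a batch processor; the queue and delay values below are illustrative starting points, not tuned recommendations:

const { NodeSDK } = require('@opentelemetry/sdk-node');
const { BatchSpanProcessor } = require('@opentelemetry/sdk-trace-base');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-http');

// Finished spans are queued and exported off the request path.
const sdk = new NodeSDK({
  spanProcessor: new BatchSpanProcessor(new OTLPTraceExporter(), {
    maxQueueSize: 2048,         // spans buffered before the oldest are dropped
    maxExportBatchSize: 512,    // spans sent per export call
    scheduledDelayMillis: 5000, // flush the queue at most every 5 seconds
  }),
});
sdk.start();

If the queue fills faster than it drains, the processor drops spans rather than blocking requests, which is usually the right trade-off for production traffic.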

Performance impact: -15% latency overhead

Optimization #2: Aggressive Sampling

Bad: Sample everything

At 10,000 req/s × 15KB per trace = 150MB/s ≈ 13TB/day!
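Spelled out, sampling everything looks like this (it is also effectively what you get when no sampler is configured):

const { NodeSDK } = require('@opentelemetry/sdk-node');
const { AlwaysOnSampler } = require('@opentelemetry/sdk-trace-base');

// Every trace is recorded and exported - fine in development,
// 150MB/s of span data at 10,000 req/s in production.
const sdk = new NodeSDK({
  sampler: new AlwaysOnSampler(),
});
sdk.start();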

Good: Smart sampling

At 10,000 req/s × 1% sampling × 15KB = 1.5MB/s ≈ 130GB/day
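A sketch of head sampling at 1%, parent-based so that sampled traces stay complete across services:

const { NodeSDK } = require('@opentelemetry/sdk-node');
const {
  ParentBasedSampler,
  TraceIdRatioBasedSampler,
} = require('@opentelemetry/sdk-trace-base');

// Sample 1% of root traces; child spans follow their parent's decision.
const sdk = new NodeSDK({
  sampler: new ParentBasedSampler({
    root: new TraceIdRatioBasedSampler(0.01),
  }),
});
sdk.start();

If you also need to keep every error or slow trace, that decision is better made downstream in the Collector with tail-based sampling.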

Performance impact: -10% CPU overhead, -25% memory usage

Optimization #3: Limit Span Attributes

Bad: Huge attributes
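For example (the order payload and field names are made up), dumping whole objects onto the span:

const { trace } = require('@opentelemetry/api');

function recordOrder(order, response) {
  const span = trace.getActiveSpan();
  if (!span) return;
  // Entire payloads stay pinned in memory until the span is exported.
  span.setAttribute('order.payload', JSON.stringify(order));
  span.setAttribute('order.items', JSON.stringify(order.items));
  span.setAttribute('http.response.body', JSON.stringify(response));
}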

Good: Selective attributes
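The same function keeping only small, queryable scalars:

const { trace } = require('@opentelemetry/api');

function recordOrder(order) {
  const span = trace.getActiveSpan();
  if (!span) return;
  // A handful of scalar attributes is enough to filter and aggregate on.
  span.setAttribute('order.id', order.id);
  span.setAttribute('order.item_count', order.items.length);
  span.setAttribute('order.total_amount', order.total);
}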

Performance impact: -30% memory per span

Optimization #4: Reduce Span Creation

Bad: Excessive spans
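The shape of the problem, with hypothetical helper functions; every trivial step gets its own span:

const { trace } = require('@opentelemetry/api');
const tracer = trace.getTracer('orders');

// Five spans per request for steps that each take microseconds.
async function processOrder(order) {
  await tracer.startActiveSpan('validate', (span) => { validate(order); span.end(); });
  await tracer.startActiveSpan('normalize', (span) => { normalize(order); span.end(); });
  await tracer.startActiveSpan('calculateTax', (span) => { calculateTax(order); span.end(); });
  await tracer.startActiveSpan('calculateShipping', (span) => { calculateShipping(order); span.end(); });
  await tracer.startActiveSpan('persist', (span) => saveOrder(order).finally(() => span.end()));
}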

Good: Logical span grouping
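One span around the logical operation, with the cheap steps recorded as attributes and events (same hypothetical helpers as above):

const { trace } = require('@opentelemetry/api');
const tracer = trace.getTracer('orders');

async function processOrder(order) {
  return tracer.startActiveSpan('processOrder', async (span) => {
    try {
      validate(order);
      normalize(order);
      span.setAttribute('order.item_count', order.items.length);
      span.addEvent('totals calculated');
      // The DB call still gets its own span from the driver instrumentation.
      return await saveOrder(order);
    } catch (err) {
      span.recordException(err);
      throw err;
    } finally {
      span.end();
    }
  });
}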

Performance impact: -40% span creation overhead

Optimization #5: Conditional Instrumentation

Don't instrument low-value operations:
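One way to do this with the HTTP auto-instrumentation is to drop spans at the source; the hook below assumes health and metrics endpoints live at those paths:

const { getNodeAutoInstrumentations } = require('@opentelemetry/auto-instrumentations-node');

const instrumentations = getNodeAutoInstrumentations({
  '@opentelemetry/instrumentation-http': {
    // No spans for health checks and scrape endpoints.
    ignoreIncomingRequestHook: (req) => req.url === '/health' || req.url === '/metrics',
  },
});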

Better: Use a helper:
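A sketch of such a helper (the withSpan name and the "worth tracing" flag are hypothetical, not an OpenTelemetry API):

const { trace } = require('@opentelemetry/api');
const tracer = trace.getTracer('orders');

// Create a span only when the operation is expected to be slow or interesting.
async function withSpan(name, worthTracing, fn) {
  if (!worthTracing) return fn();
  return tracer.startActiveSpan(name, async (span) => {
    try {
      return await fn();
    } finally {
      span.end();
    }
  });
}

// In-memory cache lookups stay invisible; external calls get spans.
// await withSpan('cache.get', false, () => cache.get(key));
// await withSpan('payment.charge', true, () => chargeCard(order));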

Optimization #6: Lazy Context Propagation

Bad: Always inject context
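Roughly what "always inject" looks like when building headers for an outbound HTTP call:

const { context, propagation } = require('@opentelemetry/api');

// traceparent/tracestate headers are built on every outbound call,
// even when the current trace was never sampled.
function outboundHeaders() {
  const headers = {};
  propagation.inject(context.active(), headers);
  return headers;
}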

Good: Only inject if sampled
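The same helper, skipping the work when the active span was not sampled:

const { context, propagation, trace, TraceFlags } = require('@opentelemetry/api');

function outboundHeaders() {
  const headers = {};
  const span = trace.getActiveSpan();
  // Only pay for header construction when the trace will actually be exported.
  if (span && (span.spanContext().traceFlags & TraceFlags.SAMPLED)) {
    propagation.inject(context.active(), headers);
  }
  return headers;
}

The trade-off: unsampled requests no longer carry trace context downstream, so downstream services make their own sampling decisions for those requests.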

Performance impact: -5% CPU for high-volume external calls

Optimization #7: Memory-Efficient Attributes

Bad: Array of large objects
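For instance (the cart shape is illustrative), attaching the raw item objects:

const { trace } = require('@opentelemetry/api');

function recordCart(cart) {
  const span = trace.getActiveSpan();
  if (!span) return;
  // Every item object is serialized and held until export.
  span.setAttribute('cart.items', JSON.stringify(cart.items));
}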

Good: Compact representation
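The compact version keeps counts, totals, and at most a bounded list of identifiers:

const { trace } = require('@opentelemetry/api');

function recordCart(cart) {
  const span = trace.getActiveSpan();
  if (!span) return;
  span.setAttribute('cart.item_count', cart.items.length);
  span.setAttribute('cart.total_cents', cart.totalCents);
  // Arrays of primitives are valid attribute values; cap the length.
  span.setAttribute('cart.skus', cart.items.slice(0, 10).map((i) => i.sku));
}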

Optimization #8: Disable Auto-Instrumentation for Low-Value Libraries

Bad: Auto-instrument everything
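The default, everything-on setup:

const { getNodeAutoInstrumentations } = require('@opentelemetry/auto-instrumentations-node');

// Enables every supported instrumentation it finds; depending on the version
// this includes noisy low-level ones (fs, dns, net) that can emit thousands
// of short-lived spans per second.
const instrumentations = [getNodeAutoInstrumentations()];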

Good: Selective auto-instrumentation
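Explicitly turning off the low-value instrumentations (the keys are the instrumentation package names used by auto-instrumentations-node):

const { getNodeAutoInstrumentations } = require('@opentelemetry/auto-instrumentations-node');

// Keep http/express/database instrumentation; drop the noisy low-level ones.
const instrumentations = [
  getNodeAutoInstrumentations({
    '@opentelemetry/instrumentation-fs': { enabled: false },
    '@opentelemetry/instrumentation-dns': { enabled: false },
    '@opentelemetry/instrumentation-net': { enabled: false },
  }),
];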

Performance impact: -8% span creation overhead

Real-World Benchmark

I benchmarked a typical REST API endpoint:

Test setup:

Results:

Configuration         Req/s    p99 Latency   CPU   Memory
No OTel (baseline)    15,234   28ms          45%   380MB
OTel (unoptimized)    12,487   38ms          58%   520MB
OTel (optimized)      15,102   29ms          46%   395MB

Optimizations applied:

  1. Batch span processor (not simple)

  2. 1% sampling (not 100%)

  3. Minimal span attributes

  4. Disabled fs/dns/net auto-instrumentation

  5. Logical span grouping

Monitoring Telemetry Overhead

Track instrumentation impact:
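One low-tech way to do this with Node built-ins only: sample event-loop delay, heap, and CPU so you can compare runs with and without instrumentation enabled (the 10-second interval is arbitrary):

const { monitorEventLoopDelay } = require('perf_hooks');

const loopDelay = monitorEventLoopDelay({ resolution: 20 });
loopDelay.enable();

setInterval(() => {
  console.log({
    eventLoopDelayP99Ms: loopDelay.percentile(99) / 1e6, // histogram is in nanoseconds
    heapUsedMb: Math.round(process.memoryUsage().heapUsed / 1024 / 1024),
    cpuUserMicros: process.cpuUsage().user, // user CPU time since process start
  });
  loopDelay.reset();
}, 10000);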

Production Performance Checklist

  • Use a batch span processor; never export synchronously on the request path

  • Set an explicit sampling rate (the benchmark above used 1%) and revisit it as traffic grows

  • Keep span attributes few, small, and bounded

  • Disable auto-instrumentation for noisy low-level libraries (fs, dns, net)

  • Group spans around logical operations, not helper functions

  • Track CPU, memory, and event-loop delay before and after each instrumentation change

Common Performance Mistakes

Mistake #1: Too Many Custom Spans

Wrapping every helper function in its own span multiplies creation and export cost. Group spans around logical operations instead (see Optimization #4).

Mistake #2: Synchronous Export

A simple, synchronous span processor puts a network round-trip on every request. Always batch in production (see Optimization #1).

Mistake #3: High Cardinality Attributes

Attribute values that are unique per request (raw URLs, user IDs, full payloads) inflate memory and make backend indexes expensive. Prefer bounded values such as route templates.
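For example, prefer the route template over the raw path as an attribute value (route names illustrative):

const { trace } = require('@opentelemetry/api');

const span = trace.getActiveSpan();
if (span) {
  // Bad: one distinct value per order id.
  // span.setAttribute('http.target', '/api/orders/8f3c2a71');

  // Good: a bounded set of values.
  span.setAttribute('http.route', '/api/orders/:id');
}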

Extreme Optimization: Zero-Allocation Spans

For ultra-high performance (>100k req/s):
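In JavaScript this mostly means avoiding per-request object and string allocations for telemetry; a rough sketch of the idea (attribute names are illustrative):

const { trace } = require('@opentelemetry/api');

// Constant attributes are allocated once at module load, not per request.
const STATIC_ATTRS = { 'service.tier': 'api' };

function annotateRequest(order) {
  const span = trace.getActiveSpan();
  // Build attribute values only when the span will actually be recorded.
  if (span && span.isRecording()) {
    span.setAttributes(STATIC_ATTRS);
    span.setAttribute('order.item_count', order.items.length);
  }
}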

Warning: Only for extreme cases. Adds complexity.

What's Next

Continue to Security Best Practices to learn:

  • Redacting sensitive data

  • Securing exporter endpoints

  • PII handling

  • Compliance considerations


Previous: ← OpenTelemetry Collector | Next: Security Best Practices →

Observability should be invisible.
