Performance Optimization

The 2% CPU Mystery

After rolling out OpenTelemetry, I noticed our services were using 2-3% more CPU. For 100 EC2 instances running 24/7, that's ~$3,600/year in wasted compute.

The culprit? Inefficient tracing:

  • Creating too many spans

  • Over-sampling

  • Synchronous export blocking requests

  • No batching

After optimization: <0.5% overhead. Observability doesn't have to be expensive.

Measuring Overhead

First, benchmark your application without OpenTelemetry:

# Install autocannon for load testing
npm install -g autocannon

# Baseline test (no instrumentation)
autocannon -c 100 -d 30 http://localhost:3000/api/orders

Record the baseline throughput, latency, CPU, and memory.

Then, enable OpenTelemetry and run the same test twice: once with a naive configuration, and once after applying the optimizations below. The difference between those runs is your real instrumentation overhead; the Real-World Benchmark section later in this chapter shows what that gap looked like for my service.
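Assuming the SDK is bootstrapped from a separate file loaded before the application code (tracing.js and server.js below are placeholder names), the instrumented runs reuse the same load test:

# Load the OpenTelemetry bootstrap before the application code
node --require ./tracing.js server.js

# Run the identical load test against the instrumented process
autocannon -c 100 -d 30 http://localhost:3000/api/orders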

Optimization #1: Use Asynchronous Export

Bad: Synchronous export blocks requests

Every request waits for span export to complete before responding.
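A minimal sketch of that anti-pattern with the Node SDK and the OTLP/HTTP exporter (package names are the published OpenTelemetry JS ones; exact configuration options vary slightly between SDK versions):

const { NodeSDK } = require('@opentelemetry/sdk-node');
const { SimpleSpanProcessor } = require('@opentelemetry/sdk-trace-base');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-http');

// SimpleSpanProcessor exports each span as soon as it ends,
// putting a network round-trip on the hot path of every request.
const sdk = new NodeSDK({
  spanProcessor: new SimpleSpanProcessor(new OTLPTraceExporter()),
});
sdk.start();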

Good: Batch export asynchronously

Spans queue in memory and export in background batches.
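The same setup with a batch processor; the queue and delay values below are illustrative starting points, not tuned recommendations:

const { NodeSDK } = require('@opentelemetry/sdk-node');
const { BatchSpanProcessor } = require('@opentelemetry/sdk-trace-base');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-http');

// Finished spans are queued and exported off the request path.
const sdk = new NodeSDK({
  spanProcessor: new BatchSpanProcessor(new OTLPTraceExporter(), {
    maxQueueSize: 2048,         // spans buffered before the oldest are dropped
    maxExportBatchSize: 512,    // spans sent per export call
    scheduledDelayMillis: 5000, // flush the queue at most every 5 seconds
  }),
});
sdk.start();

If the queue fills faster than it drains, the processor drops spans rather than blocking requests, which is usually the right trade-off for production traffic.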

Performance impact: -15% latency overhead

Optimization #2: Aggressive Sampling

Bad: Sample everything

At 10,000 req/s × 15KB per trace = 150MB/s ≈ 13TB/day!
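Spelled out, sampling everything looks like this (it is also effectively what you get when no sampler is configured):

const { NodeSDK } = require('@opentelemetry/sdk-node');
const { AlwaysOnSampler } = require('@opentelemetry/sdk-trace-base');

// Every trace is recorded and exported - fine in development,
// 150MB/s of span data at 10,000 req/s in production.
const sdk = new NodeSDK({
  sampler: new AlwaysOnSampler(),
});
sdk.start();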

Good: Smart sampling

At 10,000 req/s × 1% sampling × 15KB = 1.5MB/s ≈ 130GB/day
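A sketch of head sampling at 1%, parent-based so that sampled traces stay complete across services:

const { NodeSDK } = require('@opentelemetry/sdk-node');
const {
  ParentBasedSampler,
  TraceIdRatioBasedSampler,
} = require('@opentelemetry/sdk-trace-base');

// Sample 1% of root traces; child spans follow their parent's decision.
const sdk = new NodeSDK({
  sampler: new ParentBasedSampler({
    root: new TraceIdRatioBasedSampler(0.01),
  }),
});
sdk.start();

If you also need to keep every error or slow trace, that decision is better made downstream in the Collector with tail-based sampling.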

Performance impact: -10% CPU overhead, -25% memory usage

Optimization #3: Limit Span Attributes

Bad: Huge attributes
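For example (the order payload and field names are made up), dumping whole objects onto the span:

const { trace } = require('@opentelemetry/api');

function recordOrder(order, response) {
  const span = trace.getActiveSpan();
  if (!span) return;
  // Entire payloads stay pinned in memory until the span is exported.
  span.setAttribute('order.payload', JSON.stringify(order));
  span.setAttribute('order.items', JSON.stringify(order.items));
  span.setAttribute('http.response.body', JSON.stringify(response));
}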

Good: Selective attributes
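The same function keeping only small, queryable scalars:

const { trace } = require('@opentelemetry/api');

function recordOrder(order) {
  const span = trace.getActiveSpan();
  if (!span) return;
  // A handful of scalar attributes is enough to filter and aggregate on.
  span.setAttribute('order.id', order.id);
  span.setAttribute('order.item_count', order.items.length);
  span.setAttribute('order.total_amount', order.total);
}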

Performance impact: -30% memory per span

Optimization #4: Reduce Span Creation

Bad: Excessive spans
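The shape of the problem, with hypothetical helper functions; every trivial step gets its own span:

const { trace } = require('@opentelemetry/api');
const tracer = trace.getTracer('orders');

// Five spans per request for steps that each take microseconds.
async function processOrder(order) {
  await tracer.startActiveSpan('validate', (span) => { validate(order); span.end(); });
  await tracer.startActiveSpan('normalize', (span) => { normalize(order); span.end(); });
  await tracer.startActiveSpan('calculateTax', (span) => { calculateTax(order); span.end(); });
  await tracer.startActiveSpan('calculateShipping', (span) => { calculateShipping(order); span.end(); });
  await tracer.startActiveSpan('persist', (span) => saveOrder(order).finally(() => span.end()));
}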

Good: Logical span grouping
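One span around the logical operation, with the cheap steps recorded as attributes and events (same hypothetical helpers as above):

const { trace } = require('@opentelemetry/api');
const tracer = trace.getTracer('orders');

async function processOrder(order) {
  return tracer.startActiveSpan('processOrder', async (span) => {
    try {
      validate(order);
      normalize(order);
      span.setAttribute('order.item_count', order.items.length);
      span.addEvent('totals calculated');
      // The DB call still gets its own span from the driver instrumentation.
      return await saveOrder(order);
    } catch (err) {
      span.recordException(err);
      throw err;
    } finally {
      span.end();
    }
  });
}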

Performance impact: -40% span creation overhead

Optimization #5: Conditional Instrumentation

Don't instrument low-value operations:
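One way to do this with the HTTP auto-instrumentation is to drop spans at the source; the hook below assumes health and metrics endpoints live at those paths:

const { getNodeAutoInstrumentations } = require('@opentelemetry/auto-instrumentations-node');

const instrumentations = getNodeAutoInstrumentations({
  '@opentelemetry/instrumentation-http': {
    // No spans for health checks and scrape endpoints.
    ignoreIncomingRequestHook: (req) => req.url === '/health' || req.url === '/metrics',
  },
});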

Better: Use a helper:
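A sketch of such a helper (the withSpan name and the "worth tracing" flag are hypothetical, not an OpenTelemetry API):

const { trace } = require('@opentelemetry/api');
const tracer = trace.getTracer('orders');

// Create a span only when the operation is expected to be slow or interesting.
async function withSpan(name, worthTracing, fn) {
  if (!worthTracing) return fn();
  return tracer.startActiveSpan(name, async (span) => {
    try {
      return await fn();
    } finally {
      span.end();
    }
  });
}

// In-memory cache lookups stay invisible; external calls get spans.
// await withSpan('cache.get', false, () => cache.get(key));
// await withSpan('payment.charge', true, () => chargeCard(order));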

Optimization #6: Lazy Context Propagation

Bad: Always inject context
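Roughly what "always inject" looks like when building headers for an outbound HTTP call:

const { context, propagation } = require('@opentelemetry/api');

// traceparent/tracestate headers are built on every outbound call,
// even when the current trace was never sampled.
function outboundHeaders() {
  const headers = {};
  propagation.inject(context.active(), headers);
  return headers;
}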

Good: Only inject if sampled
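The same helper, skipping the work when the active span was not sampled:

const { context, propagation, trace, TraceFlags } = require('@opentelemetry/api');

function outboundHeaders() {
  const headers = {};
  const span = trace.getActiveSpan();
  // Only pay for header construction when the trace will actually be exported.
  if (span && (span.spanContext().traceFlags & TraceFlags.SAMPLED)) {
    propagation.inject(context.active(), headers);
  }
  return headers;
}

The trade-off: unsampled requests no longer carry trace context downstream, so downstream services make their own sampling decisions for those requests.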

Performance impact: -5% CPU for high-volume external calls

Optimization #7: Memory-Efficient Attributes

Bad: Array of large objects
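For instance (the cart shape is illustrative), attaching the raw item objects:

const { trace } = require('@opentelemetry/api');

function recordCart(cart) {
  const span = trace.getActiveSpan();
  if (!span) return;
  // Every item object is serialized and held until export.
  span.setAttribute('cart.items', JSON.stringify(cart.items));
}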

Good: Compact representation
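The compact version keeps counts, totals, and at most a bounded list of identifiers:

const { trace } = require('@opentelemetry/api');

function recordCart(cart) {
  const span = trace.getActiveSpan();
  if (!span) return;
  span.setAttribute('cart.item_count', cart.items.length);
  span.setAttribute('cart.total_cents', cart.totalCents);
  // Arrays of primitives are valid attribute values; cap the length.
  span.setAttribute('cart.skus', cart.items.slice(0, 10).map((i) => i.sku));
}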

Optimization #8: Disable Auto-Instrumentation for Low-Value Libraries

Bad: Auto-instrument everything
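The default, everything-on setup:

const { getNodeAutoInstrumentations } = require('@opentelemetry/auto-instrumentations-node');

// Enables every supported instrumentation it finds; depending on the version
// this includes noisy low-level ones (fs, dns, net) that can emit thousands
// of short-lived spans per second.
const instrumentations = [getNodeAutoInstrumentations()];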

Good: Selective auto-instrumentation
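Explicitly turning off the low-value instrumentations (the keys are the instrumentation package names used by auto-instrumentations-node):

const { getNodeAutoInstrumentations } = require('@opentelemetry/auto-instrumentations-node');

// Keep http/express/database instrumentation; drop the noisy low-level ones.
const instrumentations = [
  getNodeAutoInstrumentations({
    '@opentelemetry/instrumentation-fs': { enabled: false },
    '@opentelemetry/instrumentation-dns': { enabled: false },
    '@opentelemetry/instrumentation-net': { enabled: false },
  }),
];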

Performance impact: -8% span creation overhead

Real-World Benchmark

I benchmarked a typical REST API endpoint:

Test setup:

Results:

Configuration         Req/s    p99 Latency   CPU   Memory
No OTel (baseline)    15,234   28ms          45%   380MB
OTel (unoptimized)    12,487   38ms          58%   520MB
OTel (optimized)      15,102   29ms          46%   395MB

Optimizations applied:

  1. Batch span processor (not simple)

  2. 1% sampling (not 100%)

  3. Minimal span attributes

  4. Disabled fs/dns/net auto-instrumentation

  5. Logical span grouping

Monitoring Telemetry Overhead

Track instrumentation impact:
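One low-tech way to do this with Node built-ins only: sample event-loop delay, heap, and CPU so you can compare runs with and without instrumentation enabled (the 10-second interval is arbitrary):

const { monitorEventLoopDelay } = require('perf_hooks');

const loopDelay = monitorEventLoopDelay({ resolution: 20 });
loopDelay.enable();

setInterval(() => {
  console.log({
    eventLoopDelayP99Ms: loopDelay.percentile(99) / 1e6, // histogram is in nanoseconds
    heapUsedMb: Math.round(process.memoryUsage().heapUsed / 1024 / 1024),
    cpuUserMicros: process.cpuUsage().user, // user CPU time since process start
  });
  loopDelay.reset();
}, 10000);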

Production Performance Checklist

  • Use a batch span processor; never export synchronously on the request path

  • Set an explicit sampling rate (the benchmark above used 1%) and revisit it as traffic grows

  • Keep span attributes few, small, and bounded

  • Disable auto-instrumentation for noisy low-level libraries (fs, dns, net)

  • Group spans around logical operations, not helper functions

  • Track CPU, memory, and event-loop delay before and after each instrumentation change

Common Performance Mistakes

Mistake #1: Too Many Custom Spans

Wrapping every helper function in its own span multiplies creation and export cost. Group spans around logical operations instead (see Optimization #4).

Mistake #2: Synchronous Export

A simple, synchronous span processor puts a network round-trip on every request. Always batch in production (see Optimization #1).

Mistake #3: High Cardinality Attributes

Attribute values that are unique per request (raw URLs, user IDs, full payloads) inflate memory and make backend indexes expensive. Prefer bounded values such as route templates.
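For example, prefer the route template over the raw path as an attribute value (route names illustrative):

const { trace } = require('@opentelemetry/api');

const span = trace.getActiveSpan();
if (span) {
  // Bad: one distinct value per order id.
  // span.setAttribute('http.target', '/api/orders/8f3c2a71');

  // Good: a bounded set of values.
  span.setAttribute('http.route', '/api/orders/:id');
}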

Extreme Optimization: Zero-Allocation Spans

For ultra-high performance (>100k req/s):
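In JavaScript this mostly means avoiding per-request object and string allocations for telemetry; a rough sketch of the idea (attribute names are illustrative):

const { trace } = require('@opentelemetry/api');

// Constant attributes are allocated once at module load, not per request.
const STATIC_ATTRS = { 'service.tier': 'api' };

function annotateRequest(order) {
  const span = trace.getActiveSpan();
  // Build attribute values only when the span will actually be recorded.
  if (span && span.isRecording()) {
    span.setAttributes(STATIC_ATTRS);
    span.setAttribute('order.item_count', order.items.length);
  }
}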

Warning: Only for extreme cases. Adds complexity.

What's Next

Continue to Security Best Practices to learn:

  • Redacting sensitive data

  • Securing exporter endpoints

  • PII handling

  • Compliance considerations


Previous: ← OpenTelemetry Collector | Next: Security Best Practices →

Observability should be invisible.
