Part 5: Capacity Planning and Performance - Growing Without Breaking

What You'll Learn: This article shares my journey from reactive "throw more servers at it" scaling to proactive capacity planning. You'll learn how to forecast capacity needs, load test Go applications with k6, understand and optimize resource utilization patterns, choose between horizontal and vertical scaling, use Go's pprof for performance profiling, and plan capacity in a cost-effective way. By the end, you'll know how to grow your services without breaking them or your budget.

The Day My API Fell Over

I'll never forget the morning my expense tracking API became unexpectedly popular. Someone posted about it on Reddit, and within an hour, I went from ~100 requests per day to ~5,000 requests per hour.

My DigitalOcean droplet (2 CPU cores, 4GB RAM) couldn't handle it:

11:23 - Response times climbing: P95 went from 100ms to 2s
11:27 - Out of memory errors starting
11:31 - Server completely unresponsive
11:34 - Manual restart required
11:36 - Crashes again after 2 minutes

In my panic, I did what every inexperienced engineer does: I upgraded to the biggest server available. 8 CPUs, 16GB RAM, $160/month (my original was $20/month).

The traffic spike lasted 3 days, then returned to normal. I was now paying 8x more for capacity I didn't need, and I had learned nothing about my actual capacity requirements.

That expensive lesson taught me the importance of capacity planning - understanding your current limits and predicting future needs.

Understanding Capacity Planning

Capacity planning answers three questions:

  1. How much load can my system handle today?

  2. When will I run out of capacity?

  3. How much will it cost to scale?

The Two Types of Capacity Problems

Problem 1: Organic Growth

Your service grows steadily over time. Traffic increases 20% per month. This is predictable and plannable.

Problem 2: Sudden Spikes

Unexpected traffic surges from marketing campaigns, viral posts, or DDoS attacks. This is harder to plan for but still manageable.

My Reddit experience was Problem 2, but I had no capacity plan for either scenario.

Step 1: Measure Current Capacity

Before you can plan for growth, you need to know your current limits. I use load testing to find the breaking point.

Load Testing with k6

I use k6 for load testing my Go APIs. It's simple, scriptable, and produces great metrics.

Basic Load Test Script
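Here's a minimal sketch of the kind of script I use; the URL, stages, and thresholds are placeholders rather than my exact production values:

  import http from 'k6/http';
  import { check, sleep } from 'k6';

  export const options = {
    // Ramp up gradually so the breaking point shows up clearly in the metrics.
    stages: [
      { duration: '2m', target: 50 },   // warm up to 50 virtual users
      { duration: '5m', target: 200 },  // push past the expected breaking point
      { duration: '2m', target: 0 },    // ramp down
    ],
    thresholds: {
      http_req_duration: ['p(95)<500'], // fail the test if p95 exceeds 500ms
    },
  };

  export default function () {
    const res = http.get('https://api.example.com/v1/expenses');
    check(res, { 'status is 200': (r) => r.status === 200 });
    sleep(1);
  }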

Run the load test:
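Assuming the script above is saved as load-test.js:

  k6 run load-test.js

k6 prints an end-of-test summary with request rates, error counts, and latency percentiles, which is exactly what the next section interprets.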

Interpreting Load Test Results

When I run this against my API, I watch for:

Breaking Point: When did response times spike?

At ~180 concurrent users, my p95 jumped from 389ms to 2.1s. That's my breaking point.

Resource Saturation: What resource hit 100% first?

I check my Grafana dashboards during the load test:

  • CPU: 85% (still headroom)

  • Memory: 72% (okay)

  • Database connections: 25/25 (saturated!)

Aha! Database connection pool is the bottleneck, not CPU or memory.

Spike Testing

Different from gradual load testing, spike tests simulate sudden traffic surges:
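In k6 this is just a different stage profile; the numbers here are illustrative:

  export const options = {
    stages: [
      { duration: '1m', target: 20 },   // normal baseline traffic
      { duration: '30s', target: 500 }, // sudden spike
      { duration: '3m', target: 500 },  // hold the spike
      { duration: '1m', target: 20 },   // back to baseline
    ],
  };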

This tests: Can my system handle sudden traffic spikes (like my Reddit incident)?

Soak Testing

Long-duration tests to find memory leaks and gradual degradation:
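Again just a different stage profile, held for hours instead of minutes (illustrative values):

  export const options = {
    stages: [
      { duration: '10m', target: 100 }, // ramp to a steady, realistic load
      { duration: '24h', target: 100 }, // hold for a full day
      { duration: '10m', target: 0 },
    ],
  };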

If response times or memory usage climb over 24 hours, you have a leak.

Step 2: Forecasting Future Capacity Needs

Once you know your current limits, forecast when you'll hit them.

Method 1: Linear Trend Analysis

I track request rate over time and project it forward.

If I'm growing 20% per month:
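Here's a back-of-the-envelope sketch of that projection in Go, using the 180 RPS capacity from my load test and a 50 RPS starting point:

  package main

  import "fmt"

  func main() {
      const (
          capacityRPS    = 180.0 // measured breaking point from load testing
          scaleThreshold = 0.70  // plan to scale at 70% utilization
          monthlyGrowth  = 1.20  // 20% growth per month
      )

      rps := 50.0 // current request rate
      for month := 1; month <= 12; month++ {
          rps *= monthlyGrowth
          utilization := rps / capacityRPS
          fmt.Printf("month %2d: %6.0f RPS, %3.0f%% utilization\n", month, rps, utilization*100)
          if utilization >= scaleThreshold {
              fmt.Printf("=> plan to add capacity before month %d\n", month)
              break
          }
      }
  }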

I plan to scale when I hit 70% utilization, giving me a buffer for spikes.

Method 2: Event-Based Planning

For predictable events (product launches, marketing campaigns), I plan capacity based on expected traffic:
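For example (illustrative numbers, using the same ~120 req/s per-pod figure as the launch story later in this article): if a campaign is expected to peak at 400 req/s, that's 400 / 120 ≈ 3.3 pods of raw capacity, so I'd provision 5 pods to keep utilization around 70% and leave room for the spike to overshoot the estimate.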

Capacity Planning Spreadsheet

I maintain a simple spreadsheet:

Month   Projected Users   Projected RPS   Current Capacity (RPS)   Utilization   Action Needed
Feb     1,000             50              180                      28%           None
Mar     1,200             60              180                      33%           None
Apr     1,440             72              180                      40%           None
May     1,728             86              180                      48%           None
Jun     2,074             103             180                      57%           Monitor
Jul     2,488             124             180                      69%           Plan scale
Aug     2,986             149             180                      83%           Scale now

Step 3: Horizontal vs. Vertical Scaling

When you need more capacity, you have two options:

Vertical Scaling (Scale Up)

Definition: Bigger servers (more CPU, RAM)

Pros:

  • Simple - just upgrade your instance

  • No code changes needed

  • Works for stateful services (databases)

Cons:

  • Limited ceiling (largest instance size)

  • Expensive (cost scales non-linearly)

  • Single point of failure

  • Requires downtime

When I use it:

  • Quick fix for sudden capacity needs

  • Databases (PostgreSQL, Redis)

  • Services that are hard to make stateless

Horizontal Scaling (Scale Out)

Definition: More servers (add instances)

Pros:

  • Nearly unlimited scaling

  • Cost-effective (linear cost scaling)

  • Built-in redundancy

  • No downtime during scaling

Cons:

  • Requires stateless design

  • Need load balancer

  • Distributed system complexity

  • Data consistency challenges

When I use it:

  • API servers

  • Worker processes

  • Any stateless service

My Go API: Designed for Horizontal Scaling

I built my Go APIs to be stateless, so they scale horizontally easily.

Key principles:

  1. No local state: Sessions stored in Redis, not memory

  2. No sticky sessions: Any instance can handle any request

  3. Idempotent operations: Safe to retry requests

  4. Shared database: All instances connect to same PostgreSQL

Example: Session management
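Here's a minimal sketch of what that looks like, assuming the go-redis client; the key names and Session type are illustrative, not my exact production code:

  package session

  import (
      "context"
      "encoding/json"
      "time"

      "github.com/redis/go-redis/v9"
  )

  // Session holds the per-user state that must survive across instances.
  type Session struct {
      UserID    string    `json:"user_id"`
      ExpiresAt time.Time `json:"expires_at"`
  }

  // Store keeps sessions in Redis so any API instance can serve any request.
  type Store struct {
      rdb *redis.Client
  }

  func NewStore(addr string) *Store {
      return &Store{rdb: redis.NewClient(&redis.Options{Addr: addr})}
  }

  // Save serializes the session and writes it to Redis with a TTL.
  func (s *Store) Save(ctx context.Context, id string, sess Session, ttl time.Duration) error {
      data, err := json.Marshal(sess)
      if err != nil {
          return err
      }
      return s.rdb.Set(ctx, "session:"+id, data, ttl).Err()
  }

  // Load fetches and deserializes the session; every instance sees the same data.
  func (s *Store) Load(ctx context.Context, id string) (Session, error) {
      var sess Session
      data, err := s.rdb.Get(ctx, "session:"+id).Bytes()
      if err != nil {
          return sess, err
      }
      err = json.Unmarshal(data, &sess)
      return sess, err
  }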

Step 4: Performance Optimization with pprof

Before adding more servers, optimize what you have. Go's pprof is amazing for finding performance bottlenecks.

Enabling pprof in Your Go Application
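The net/http/pprof package registers its handlers as a side effect of being imported. I expose them on a separate localhost-only port so they never face the public internet; a minimal sketch:

  package main

  import (
      "log"
      "net/http"
      _ "net/http/pprof" // registers /debug/pprof/* on http.DefaultServeMux
  )

  func main() {
      // Serve pprof on a private port, separate from the public API listener.
      go func() {
          log.Println(http.ListenAndServe("localhost:6060", nil))
      }()

      // ... start the real API server here; select{} just keeps this sketch alive.
      select {}
  }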

CPU Profiling

Find what's using CPU:
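With pprof enabled as above, I grab a 30-second CPU profile and explore it interactively:

  go tool pprof "http://localhost:6060/debug/pprof/profile?seconds=30"

Inside the interactive prompt, top shows the hottest functions and web renders a call graph.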

Real example from my expense API: I discovered JSON decoding was eating 32.5% of CPU! I optimized by reusing decoders.

Memory Profiling

Find memory leaks and high allocation:
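The heap endpoint works the same way; the default view shows memory currently in use, and -alloc_space shows cumulative allocations (useful for spotting GC pressure):

  go tool pprof "http://localhost:6060/debug/pprof/heap"
  go tool pprof -alloc_space "http://localhost:6060/debug/pprof/heap"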

Goroutine Profiling

Find goroutine leaks:
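A quick count is often enough. The debug view prints the total on its first line; if that number keeps climbing under steady load, goroutines are leaking:

  curl -s "http://localhost:6060/debug/pprof/goroutine?debug=1" | head -n 1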

Real Optimization Example

I found this code was allocating excessively:
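The shape of the problem was this pattern (a representative sketch with an illustrative Expense type, not the exact production code):

  package report

  import "fmt"

  // Expense is an illustrative type for this sketch.
  type Expense struct {
      Date     string
      Category string
      Amount   float64
  }

  // buildReport concatenates with +=, so every iteration allocates a new
  // string and copies everything written so far.
  func buildReport(expenses []Expense) string {
      report := ""
      for _, e := range expenses {
          report += fmt.Sprintf("%s,%s,%.2f\n", e.Date, e.Category, e.Amount)
      }
      return report
  }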

Optimized version using strings.Builder:
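The same function rewritten around strings.Builder (again a sketch, reusing the Expense type above and importing "strings" alongside "fmt"):

  // buildReport writes into a strings.Builder, which grows its buffer in
  // place instead of re-copying the whole string on every iteration.
  func buildReport(expenses []Expense) string {
      var b strings.Builder
      for _, e := range expenses {
          fmt.Fprintf(&b, "%s,%s,%.2f\n", e.Date, e.Category, e.Amount)
      }
      return b.String()
  }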

Result: 70% reduction in memory allocations, 40% faster.

Step 5: Auto-Scaling Strategy

Once your service can scale horizontally, automate it.

Kubernetes Horizontal Pod Autoscaler (HPA)

I use Kubernetes HPA to automatically scale my Go services:
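Here's the shape of the HPA manifest I use; the deployment name and thresholds are illustrative:

  apiVersion: autoscaling/v2
  kind: HorizontalPodAutoscaler
  metadata:
    name: expense-api
  spec:
    scaleTargetRef:
      apiVersion: apps/v1
      kind: Deployment
      name: expense-api
    minReplicas: 2    # keep redundancy even at idle
    maxReplicas: 10   # cap the spend during spikes
    metrics:
      - type: Resource
        resource:
          name: cpu
          target:
            type: Utilization
            averageUtilization: 70  # matches the 70% planning threshold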

Custom Metrics Scaling

CPU/memory aren't always the right metrics. I also scale based on request queue depth:
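With a custom metrics adapter installed (prometheus-adapter, for example), the metrics section of the HPA above can target a per-pod metric instead; the metric name here is an assumption:

    metrics:
      - type: Pods
        pods:
          metric:
            name: request_queue_depth
          target:
            type: AverageValue
            averageValue: "100"  # scale out when average queued requests per pod exceed 100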

Cost-Effective Capacity Planning

After my $160/month overprovisioning mistake, I learned to balance cost and capacity.

Strategy 1: Right-Size Your Baseline

Run capacity tests to find the smallest instance that handles normal load with 30% headroom:
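For example (illustrative numbers): if normal traffic peaks at 80 req/s and the 2 CPU / 4GB instance handled ~120 req/s in load testing, that instance runs at roughly 67% utilization at peak. That's right at the 70% line, so I'd either take the next size up or run two smaller instances behind a load balancer rather than one large one.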

Strategy 2: Auto-Scale for Spikes

Instead of provisioning for peak, provision a modest baseline and let auto-scaling absorb the spikes; the HPA configuration shown earlier does exactly this, growing from a small minimum to a capped maximum and shrinking back when the surge passes.

Strategy 3: Spot Instances for Non-Critical Work

For batch processing, I use AWS Spot instances, which can be up to 90% cheaper than on-demand pricing. Because Spot capacity can be reclaimed with little warning, it only suits work that can be interrupted and retried, which is exactly what batch jobs are.

Strategy 4: Reserved Instances for Baseline

For predictable baseline load, I buy reserved instances, which can be up to 75% cheaper in exchange for a one- or three-year commitment. The baseline identified in Strategy 1 is what I commit to; everything above it stays on-demand or auto-scaled.

My Capacity Planning Checklist

Before any major launch or campaign, I run through a short checklist: load test at 1.5x the expected peak, confirm where the real bottleneck is (not just CPU and memory), check database connection pool and other downstream limits, verify auto-scaling minimums and maximums, and estimate what peak capacity will cost.

Real-World Example: Planning for a Product Launch

Last year, I helped a friend launch their Go-based SaaS product. Here's how we planned capacity:

Expected traffic:

  • Launch day: 10,000 signups, 50,000 API requests

  • Week 1: 5,000 req/hour sustained

  • Week 2+: 2,000 req/hour sustained

Load test results:

  • 1 pod (2 CPU, 4GB RAM): 120 req/s capacity

  • Breaking point: Database connections at 300 req/s

Capacity plan:

Result:

  • Launch went smoothly, zero downtime

  • Peak: 15 pods, 98% of requests < 200ms

  • Avoided over-provisioning by $200+/month

Common Capacity Planning Mistakes

Mistake 1: Not Load Testing Before Launch

I used to deploy and hope for the best. Bad idea.

Fix: Always load test at 1.5x expected peak before any major launch.

Mistake 2: Scaling Based on CPU/Memory Alone

My API's CPU was at 40%, but response times were terrible because database connections were maxed out.

Fix: Identify your actual bottleneck through load testing.

Mistake 3: No Buffer for Spikes

Provisioning for exactly average load means any spike causes problems.

Fix: Always plan for 30-50% above baseline.

Mistake 4: Ignoring Database Capacity

I'd scale my API to 50 instances, then wonder why the database was dying.

Fix: Scale all layers of the stack, including database connection pools.

Key Takeaways

  1. Know your limits before you hit them. Load test regularly to understand capacity.

  2. Plan for 70% utilization. This leaves buffer for spikes and unexpected growth.

  3. Horizontal scaling beats vertical for stateless services. Build your Go apps to be stateless.

  4. Optimize before scaling. Use pprof to find low-hanging performance wins.

  5. Automate scaling. Auto-scaling saves money and prevents midnight pages.

  6. Cost is part of capacity planning. The cheapest solution that meets your SLOs wins.

What's Next

With capacity planning and performance optimization in your toolkit, the final piece is automation. In Part 6, we'll cover:

  • Identifying and measuring toil

  • Automating deployments with CI/CD

  • Building self-healing systems

  • When NOT to automate

Conclusion

That Reddit traffic spike taught me an expensive lesson: capacity planning isn't optional. By understanding my system's limits, forecasting growth, and planning scaling strategies, I now handle traffic spikes calmly instead of panicking.

The key is to be proactive: load test before you need to, monitor capacity utilization, and automate scaling decisions. Your future self (and your budget) will thank you.
