Part 5: Capacity Planning and Performance - Growing Without Breaking

What You'll Learn: This article shares my journey from reactive "throw more servers at it" scaling to proactive capacity planning. You'll learn how to forecast capacity needs, load test Go applications with k6, understand and optimize resource utilization patterns, choose between horizontal and vertical scaling, use Go's pprof for performance profiling, and plan capacity in a cost-effective way. By the end, you'll know how to grow your services without breaking them or your budget.

The Day My API Fell Over

I'll never forget the morning my expense tracking API became unexpectedly popular. Someone posted about it on Reddit, and within an hour, I went from ~100 requests per day to ~5,000 requests per hour.

My DigitalOcean droplet (2 CPU cores, 4GB RAM) couldn't handle it:

11:23 - Response times climbing: P95 went from 100ms to 2s
11:27 - Out of memory errors starting
11:31 - Server completely unresponsive
11:34 - Manual restart required
11:36 - Crashes again after 2 minutes

In my panic, I did what every inexperienced engineer does: I upgraded to the biggest server available. 8 CPUs, 16GB RAM, $160/month (my original was $20/month).

The traffic spike lasted 3 days, then returned to normal. I was now paying 8x more for capacity I didn't need, and I had learned nothing about my actual capacity requirements.

That expensive lesson taught me the importance of capacity planning - understanding your current limits and predicting future needs.

Understanding Capacity Planning

Capacity planning answers three questions:

  1. How much load can my system handle today?

  2. When will I run out of capacity?

  3. How much will it cost to scale?

The Two Types of Capacity Problems

Problem 1: Organic Growth

Your service grows steadily over time. Traffic increases 20% per month. This is predictable and plannable.

Problem 2: Sudden Spikes

Unexpected traffic surges from marketing campaigns, viral posts, or DDoS attacks. This is harder to plan for but still manageable.

My Reddit experience was Problem 2, but I had no capacity plan for either scenario.

Step 1: Measure Current Capacity

Before you can plan for growth, you need to know your current limits. I use load testing to find the breaking point.

Load Testing with k6

I use k6 for load testing my Go APIs. It's simple, scriptable, and produces great metrics.

Basic Load Test Script
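Here's a minimal sketch of the kind of script I use; the URL, stages, and thresholds are placeholders rather than my exact production values:

  import http from 'k6/http';
  import { check, sleep } from 'k6';

  export const options = {
    // Ramp up gradually so the breaking point shows up clearly in the metrics.
    stages: [
      { duration: '2m', target: 50 },   // warm up to 50 virtual users
      { duration: '5m', target: 200 },  // push past the expected breaking point
      { duration: '2m', target: 0 },    // ramp down
    ],
    thresholds: {
      http_req_duration: ['p(95)<500'], // fail the test if p95 exceeds 500ms
    },
  };

  export default function () {
    const res = http.get('https://api.example.com/v1/expenses');
    check(res, { 'status is 200': (r) => r.status === 200 });
    sleep(1);
  }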

Run the load test:
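Assuming the script above is saved as load-test.js:

  k6 run load-test.js

k6 prints an end-of-test summary with request rates, error counts, and latency percentiles, which is exactly what the next section interprets.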

Interpreting Load Test Results

When I run this against my API, I watch for:

Breaking Point: When did response times spike?

At ~180 concurrent users, my p95 jumped from 389ms to 2.1s. That's my breaking point.

Resource Saturation: What resource hit 100% first?

I check my Grafana dashboards during the load test:

  • CPU: 85% (still headroom)

  • Memory: 72% (okay)

  • Database connections: 25/25 (saturated!)

Aha! Database connection pool is the bottleneck, not CPU or memory.

Spike Testing

Different from gradual load testing, spike tests simulate sudden traffic surges:
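In k6 this is just a different stage profile; the numbers here are illustrative:

  export const options = {
    stages: [
      { duration: '1m', target: 20 },   // normal baseline traffic
      { duration: '30s', target: 500 }, // sudden spike
      { duration: '3m', target: 500 },  // hold the spike
      { duration: '1m', target: 20 },   // back to baseline
    ],
  };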

This tests: Can my system handle sudden traffic spikes (like my Reddit incident)?

Soak Testing

Long-duration tests to find memory leaks and gradual degradation:
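Again just a different stage profile, held for hours instead of minutes (illustrative values):

  export const options = {
    stages: [
      { duration: '10m', target: 100 }, // ramp to a steady, realistic load
      { duration: '24h', target: 100 }, // hold for a full day
      { duration: '10m', target: 0 },
    ],
  };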

If response times or memory usage climb over 24 hours, you have a leak.

Step 2: Forecasting Future Capacity Needs

Once you know your current limits, forecast when you'll hit them.

Method 1: Linear Trend Analysis

I track request rate over time and project it forward.

If I'm growing 20% per month:
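Here's a back-of-the-envelope sketch of that projection in Go, using the 180 RPS capacity from my load test and a 50 RPS starting point:

  package main

  import "fmt"

  func main() {
      const (
          capacityRPS    = 180.0 // measured breaking point from load testing
          scaleThreshold = 0.70  // plan to scale at 70% utilization
          monthlyGrowth  = 1.20  // 20% growth per month
      )

      rps := 50.0 // current request rate
      for month := 1; month <= 12; month++ {
          rps *= monthlyGrowth
          utilization := rps / capacityRPS
          fmt.Printf("month %2d: %6.0f RPS, %3.0f%% utilization\n", month, rps, utilization*100)
          if utilization >= scaleThreshold {
              fmt.Printf("=> plan to add capacity before month %d\n", month)
              break
          }
      }
  }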

I plan to scale when I hit 70% utilization, giving me a buffer for spikes.

Method 2: Event-Based Planning

For predictable events (product launches, marketing campaigns), I plan capacity based on expected traffic:
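For example (illustrative numbers, using the same ~120 req/s per-pod figure as the launch story later in this article): if a campaign is expected to peak at 400 req/s, that's 400 / 120 ≈ 3.3 pods of raw capacity, so I'd provision 5 pods to keep utilization around 70% and leave room for the spike to overshoot the estimate.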

Capacity Planning Spreadsheet

I maintain a simple spreadsheet:

Month   Projected Users   Projected RPS   Current Capacity (RPS)   Utilization   Action Needed
Feb     1,000             50              180                      28%           None
Mar     1,200             60              180                      33%           None
Apr     1,440             72              180                      40%           None
May     1,728             86              180                      48%           None
Jun     2,074             103             180                      57%           Monitor
Jul     2,488             124             180                      69%           Plan scale
Aug     2,986             149             180                      83%           Scale now

Step 3: Horizontal vs. Vertical Scaling

When you need more capacity, you have two options:

Vertical Scaling (Scale Up)

Definition: Bigger servers (more CPU, RAM)

Pros:

  • Simple - just upgrade your instance

  • No code changes needed

  • Works for stateful services (databases)

Cons:

  • Limited ceiling (largest instance size)

  • Expensive (cost scales non-linearly)

  • Single point of failure

  • Requires downtime

When I use it:

  • Quick fix for sudden capacity needs

  • Databases (PostgreSQL, Redis)

  • Services that are hard to make stateless

Horizontal Scaling (Scale Out)

Definition: More servers (add instances)

Pros:

  • Nearly unlimited scaling

  • Cost-effective (linear cost scaling)

  • Built-in redundancy

  • No downtime during scaling

Cons:

  • Requires stateless design

  • Need load balancer

  • Distributed system complexity

  • Data consistency challenges

When I use it:

  • API servers

  • Worker processes

  • Any stateless service

My Go API: Designed for Horizontal Scaling

I built my Go APIs to be stateless, so they scale horizontally easily.

Key principles:

  1. No local state: Sessions stored in Redis, not memory

  2. No sticky sessions: Any instance can handle any request

  3. Idempotent operations: Safe to retry requests

  4. Shared database: All instances connect to same PostgreSQL

Example: Session management
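Here's a minimal sketch of what that looks like, assuming the go-redis client; the key names and Session type are illustrative, not my exact production code:

  package session

  import (
      "context"
      "encoding/json"
      "time"

      "github.com/redis/go-redis/v9"
  )

  // Session holds the per-user state that must survive across instances.
  type Session struct {
      UserID    string    `json:"user_id"`
      ExpiresAt time.Time `json:"expires_at"`
  }

  // Store keeps sessions in Redis so any API instance can serve any request.
  type Store struct {
      rdb *redis.Client
  }

  func NewStore(addr string) *Store {
      return &Store{rdb: redis.NewClient(&redis.Options{Addr: addr})}
  }

  // Save serializes the session and writes it to Redis with a TTL.
  func (s *Store) Save(ctx context.Context, id string, sess Session, ttl time.Duration) error {
      data, err := json.Marshal(sess)
      if err != nil {
          return err
      }
      return s.rdb.Set(ctx, "session:"+id, data, ttl).Err()
  }

  // Load fetches and deserializes the session; every instance sees the same data.
  func (s *Store) Load(ctx context.Context, id string) (Session, error) {
      var sess Session
      data, err := s.rdb.Get(ctx, "session:"+id).Bytes()
      if err != nil {
          return sess, err
      }
      err = json.Unmarshal(data, &sess)
      return sess, err
  }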

Step 4: Performance Optimization with pprof

Before adding more servers, optimize what you have. Go's pprof is amazing for finding performance bottlenecks.

Enabling pprof in Your Go Application
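The net/http/pprof package registers its handlers as a side effect of being imported. I expose them on a separate localhost-only port so they never face the public internet; a minimal sketch:

  package main

  import (
      "log"
      "net/http"
      _ "net/http/pprof" // registers /debug/pprof/* on http.DefaultServeMux
  )

  func main() {
      // Serve pprof on a private port, separate from the public API listener.
      go func() {
          log.Println(http.ListenAndServe("localhost:6060", nil))
      }()

      // ... start the real API server here; select{} just keeps this sketch alive.
      select {}
  }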

CPU Profiling

Find what's using CPU:
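With pprof enabled as above, I grab a 30-second CPU profile and explore it interactively:

  go tool pprof "http://localhost:6060/debug/pprof/profile?seconds=30"

Inside the interactive prompt, top shows the hottest functions and web renders a call graph.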

Real example from my expense API: I discovered JSON decoding was eating 32.5% of CPU! I optimized by reusing decoders.

Memory Profiling

Find memory leaks and high allocation:
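The heap endpoint works the same way; the default view shows memory currently in use, and -alloc_space shows cumulative allocations (useful for spotting GC pressure):

  go tool pprof "http://localhost:6060/debug/pprof/heap"
  go tool pprof -alloc_space "http://localhost:6060/debug/pprof/heap"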

Goroutine Profiling

Find goroutine leaks:
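A quick count is often enough. The debug view prints the total on its first line; if that number keeps climbing under steady load, goroutines are leaking:

  curl -s "http://localhost:6060/debug/pprof/goroutine?debug=1" | head -n 1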

Real Optimization Example

I found this code was allocating excessively:
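The shape of the problem was this pattern (a representative sketch with an illustrative Expense type, not the exact production code):

  package report

  import "fmt"

  // Expense is an illustrative type for this sketch.
  type Expense struct {
      Date     string
      Category string
      Amount   float64
  }

  // buildReport concatenates with +=, so every iteration allocates a new
  // string and copies everything written so far.
  func buildReport(expenses []Expense) string {
      report := ""
      for _, e := range expenses {
          report += fmt.Sprintf("%s,%s,%.2f\n", e.Date, e.Category, e.Amount)
      }
      return report
  }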

Optimized version using strings.Builder:
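The same function rewritten around strings.Builder (again a sketch, reusing the Expense type above and importing "strings" alongside "fmt"):

  // buildReport writes into a strings.Builder, which grows its buffer in
  // place instead of re-copying the whole string on every iteration.
  func buildReport(expenses []Expense) string {
      var b strings.Builder
      for _, e := range expenses {
          fmt.Fprintf(&b, "%s,%s,%.2f\n", e.Date, e.Category, e.Amount)
      }
      return b.String()
  }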

Result: 70% reduction in memory allocations, 40% faster.

Step 5: Auto-Scaling Strategy

Once your service can scale horizontally, automate it.

Kubernetes Horizontal Pod Autoscaler (HPA)

I use Kubernetes HPA to automatically scale my Go services:
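Here's the shape of the HPA manifest I use; the deployment name and thresholds are illustrative:

  apiVersion: autoscaling/v2
  kind: HorizontalPodAutoscaler
  metadata:
    name: expense-api
  spec:
    scaleTargetRef:
      apiVersion: apps/v1
      kind: Deployment
      name: expense-api
    minReplicas: 2    # keep redundancy even at idle
    maxReplicas: 10   # cap the spend during spikes
    metrics:
      - type: Resource
        resource:
          name: cpu
          target:
            type: Utilization
            averageUtilization: 70  # matches the 70% planning threshold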

Custom Metrics Scaling

CPU/memory aren't always the right metrics. I also scale based on request queue depth:
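With a custom metrics adapter installed (prometheus-adapter, for example), the metrics section of the HPA above can target a per-pod metric instead; the metric name here is an assumption:

    metrics:
      - type: Pods
        pods:
          metric:
            name: request_queue_depth
          target:
            type: AverageValue
            averageValue: "100"  # scale out when average queued requests per pod exceed 100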

Cost-Effective Capacity Planning

After my $160/month overprovisioning mistake, I learned to balance cost and capacity.

Strategy 1: Right-Size Your Baseline

Run capacity tests to find the smallest instance that handles normal load with 30% headroom:
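For example (illustrative numbers): if normal traffic peaks at 80 req/s and the 2 CPU / 4GB instance handled ~120 req/s in load testing, that instance runs at roughly 67% utilization at peak. That's right at the 70% line, so I'd either take the next size up or run two smaller instances behind a load balancer rather than one large one.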

Strategy 2: Auto-Scale for Spikes

Instead of provisioning for peak, provision a modest baseline and let auto-scaling absorb the spikes; the HPA configuration shown earlier does exactly this, growing from a small minimum to a capped maximum and shrinking back when the surge passes.

Strategy 3: Spot Instances for Non-Critical Work

For batch processing, I use AWS Spot instances, which can be up to 90% cheaper than on-demand pricing. Because Spot capacity can be reclaimed with little warning, it only suits work that can be interrupted and retried, which is exactly what batch jobs are.

Strategy 4: Reserved Instances for Baseline

For predictable baseline load, I buy reserved instances, which can be up to 75% cheaper in exchange for a one- or three-year commitment. The baseline identified in Strategy 1 is what I commit to; everything above it stays on-demand or auto-scaled.

My Capacity Planning Checklist

Before any major launch or campaign, I run through a short checklist: load test at 1.5x the expected peak, confirm where the real bottleneck is (not just CPU and memory), check database connection pool and other downstream limits, verify auto-scaling minimums and maximums, and estimate what peak capacity will cost.

Real-World Example: Planning for a Product Launch

Last year, I helped a friend launch their Go-based SaaS product. Here's how we planned capacity:

Expected traffic:

  • Launch day: 10,000 signups, 50,000 API requests

  • Week 1: 5,000 req/hour sustained

  • Week 2+: 2,000 req/hour sustained

Load test results:

  • 1 pod (2 CPU, 4GB RAM): 120 req/s capacity

  • Breaking point: Database connections at 300 req/s

Capacity plan:

Result:

  • Launch went smoothly, zero downtime

  • Peak: 15 pods, 98% of requests < 200ms

  • Avoided over-provisioning by $200+/month

Common Capacity Planning Mistakes

Mistake 1: Not Load Testing Before Launch

I used to deploy and hope for the best. Bad idea.

Fix: Always load test at 1.5x expected peak before any major launch.

Mistake 2: Scaling Based on CPU/Memory Alone

My API's CPU was at 40%, but response times were terrible because database connections were maxed out.

Fix: Identify your actual bottleneck through load testing.

Mistake 3: No Buffer for Spikes

Provisioning for exactly average load means any spike causes problems.

Fix: Always plan for 30-50% above baseline.

Mistake 4: Ignoring Database Capacity

I'd scale my API to 50 instances, then wonder why the database was dying.

Fix: Scale all layers of the stack, including database connection pools.

Key Takeaways

  1. Know your limits before you hit them. Load test regularly to understand capacity.

  2. Plan for 70% utilization. This leaves buffer for spikes and unexpected growth.

  3. Horizontal scaling beats vertical for stateless services. Build your Go apps to be stateless.

  4. Optimize before scaling. Use pprof to find low-hanging performance wins.

  5. Automate scaling. Auto-scaling saves money and prevents midnight pages.

  6. Cost is part of capacity planning. The cheapest solution that meets your SLOs wins.

What's Next

With capacity planning and performance optimization in your toolkit, the final piece is automation. In Part 6, we'll cover:

  • Identifying and measuring toil

  • Automating deployments with CI/CD

  • Building self-healing systems

  • When NOT to automate

Conclusion

That Reddit traffic spike taught me an expensive lesson: capacity planning isn't optional. By understanding my system's limits, forecasting growth, and planning scaling strategies, I now handle traffic spikes calmly instead of panicking.

The key is to be proactive: load test before you need to, monitor capacity utilization, and automate scaling decisions. Your future self (and your budget) will thank you.
