Scalability Patterns


Understanding Scalability

Scalability is about handling growth (more users, more data, more requests) without degrading performance or requiring a complete system redesign. Through building systems that went from hundreds to millions of users, I've learned that scalability isn't just about adding more servers. It's about designing systems that can grow efficiently.

Horizontal vs Vertical Scaling

Vertical Scaling (Scale Up)

Adding more power to existing machines: more CPU, RAM, or disk space.

When I use vertical scaling:

  • PostgreSQL primary database (before considering read replicas)

  • Legacy monolithic applications that can't easily be distributed

  • In-memory caches that need fast access to large datasets

  • When operational simplicity is more important than unlimited scale

Example from a real project:

# Configuration for vertical scaling - PostgreSQL on larger instance
# docker-compose.yml for development
version: '3.8'
services:
  postgres:
    image: postgres:16
    environment:
      POSTGRES_DB: myapp
      POSTGRES_USER: app_user
      POSTGRES_PASSWORD: ${DB_PASSWORD}
    # Vertical scaling: Allocate more resources
    deploy:
      resources:
        limits:
          cpus: '4.0'      # Increased from 2.0
          memory: 16G      # Increased from 8G
        reservations:
          cpus: '2.0'
          memory: 8G
    volumes:
      - postgres_data:/var/lib/postgresql/data
    # Performance tuning for larger instance
    command: >
      postgres
      -c shared_buffers=4GB
      -c effective_cache_size=12GB
      -c maintenance_work_mem=1GB
      -c max_connections=200
      -c work_mem=20MB

volumes:
  postgres_data:

Limitations I've hit:

  • Hit AWS RDS instance size limits (we needed more than the largest available)

  • Cost grows much faster than capacity (the largest instances carry a steep price premium)

  • Still a single point of failure

  • Downtime required for upgrades

Horizontal Scaling (Scale Out)

Adding more machines to distribute the load.

When I use horizontal scaling:

  • Stateless API servers

  • Worker processes for background jobs

  • Read replicas for databases

  • Microservices architecture

Real implementation example: a Kubernetes Deployment for horizontal scaling.
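A minimal sketch of such a Deployment, assuming a hypothetical stateless API image (all names and resource numbers here are illustrative, not from a specific project):

```yaml
# Hypothetical Deployment: 3 identical replicas of a stateless API server.
# Horizontal scaling means raising `replicas`, not the per-pod resources.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
        - name: api
          image: myorg/api:1.0.0   # illustrative image name
          ports:
            - containerPort: 8080
          resources:
            requests:
              cpu: 250m
              memory: 256Mi
            limits:
              cpu: "1"
              memory: 512Mi
---
# Service that load-balances across whatever replicas exist.
apiVersion: v1
kind: Service
metadata:
  name: api
spec:
  selector:
    app: api
  ports:
    - port: 80
      targetPort: 8080
```

Because the pods are interchangeable, scaling out is a one-line change (or an autoscaler's decision) rather than a migration.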

Load Balancing Strategies

Load balancers distribute traffic across multiple servers. I've used different strategies depending on the use case:

1. Round Robin

Distributes requests evenly across all servers.

When I use it:

  • All servers have equal capacity

  • Requests have similar processing time

  • Simple setup is preferred

2. Least Connections

Routes to the server with the fewest active connections.

When I use it:

  • Requests have varying processing times

  • Some requests are long-running (WebSockets, file uploads)

3. IP Hash / Sticky Sessions

Routes requests from the same client to the same server.

When I use it (sparingly):

  • Legacy applications that store session state in memory

  • WebSocket connections that need to maintain state

⚠️ Warning: I try to avoid sticky sessions. They make scaling harder and create issues when a server fails. Better to use external session storage.

4. Weighted Load Balancing

Distributes more traffic to more powerful servers.

When I use it:

  • Servers have different capacities

  • During gradual rollouts (canary deployments)
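As a concrete illustration, all four strategies map directly onto nginx upstream configuration (`least_conn`, `ip_hash`, and `weight` are standard nginx directives; the server addresses are hypothetical):

```nginx
# Round robin is nginx's default when no balancing directive is given.
upstream api_round_robin {
    server 10.0.0.1:8080;
    server 10.0.0.2:8080;
}

# Least connections: prefer the server with the fewest active connections.
upstream api_least_conn {
    least_conn;
    server 10.0.0.1:8080;
    server 10.0.0.2:8080;
}

# IP hash: the same client IP always reaches the same server (sticky).
upstream api_sticky {
    ip_hash;
    server 10.0.0.1:8080;
    server 10.0.0.2:8080;
}

# Weighted: the first server receives roughly 3x the traffic.
upstream api_weighted {
    server 10.0.0.1:8080 weight=3;
    server 10.0.0.2:8080 weight=1;
}
```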

Stateless vs Stateful Services

The most important scalability decision is whether your service maintains state.

Stateless Services

No session data stored on the server. Each request contains all necessary information.

Example of stateless design:
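A minimal sketch of the idea (not the exact scheme from any real project): instead of keeping the session in server memory, a signed token carries everything the handler needs, so any replica behind the load balancer can serve the request.

```python
import hashlib
import hmac
import json

# Illustrative shared secret; in practice this comes from a secrets manager
# and is the same on every server instance.
SECRET = b"shared-secret"

def issue_token(user_id: str) -> str:
    """Issue a signed token that carries all session state the server needs."""
    payload = json.dumps({"user_id": user_id})
    sig = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}|{sig}"

def handle_request(token: str) -> dict:
    """Any server instance can serve this request: the token is self-contained."""
    payload, sig = token.rsplit("|", 1)
    expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        raise PermissionError("invalid token")
    user = json.loads(payload)
    return {"user_id": user["user_id"], "status": "ok"}
```

Because no server holds anything between requests, adding or removing instances requires no session migration.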

Benefits I've experienced:

  • Easy to scale horizontally

  • No session synchronization needed

  • Servers can be added/removed freely

  • Simple load balancing

Stateful Services

Maintain session or connection state on the server.

When I need stateful services:

  • WebSocket connections

  • Real-time collaboration (like Google Docs)

  • Gaming servers

  • Video streaming sessions

Example: Stateful WebSocket server with Redis for session sharing:
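A self-contained sketch of the pattern follows. To keep it runnable here, an in-memory class stands in for Redis with the same get/set semantics; in a real deployment it would be a Redis client, which is what lets every server instance see the same session data.

```python
import json

class SessionStore:
    """Stand-in for Redis (same get/set shape). In production this would be
    a shared Redis instance reachable from every WebSocket server."""
    def __init__(self):
        self._data = {}

    def set(self, key: str, value: dict) -> None:
        self._data[key] = json.dumps(value)   # Redis stores strings/bytes

    def get(self, key: str):
        raw = self._data.get(key)
        return json.loads(raw) if raw is not None else None

class WebSocketServer:
    """Each server keeps live sockets locally but persists session state
    externally, so a reconnect can land on any instance and resume."""
    def __init__(self, name: str, store: SessionStore):
        self.name = name
        self.store = store

    def on_connect(self, session_id: str, user_id: str) -> None:
        self.store.set(session_id, {"user_id": user_id, "server": self.name})

    def resume(self, session_id: str):
        return self.store.get(session_id)
```

The live socket is unavoidably stateful, but keeping the *session* in shared storage means losing a server only drops connections, not sessions.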

Auto-Scaling Patterns

Auto-scaling automatically adjusts the number of instances based on demand.

Metrics I Use for Auto-Scaling

  1. CPU Utilization: Scale when average CPU > 70%

  2. Memory Utilization: Scale when average memory > 80%

  3. Request Queue Depth: Scale when queue > 100 pending requests

  4. Custom Metrics: Application-specific (e.g., active WebSocket connections)

Auto-Scaling Configuration (AWS)
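As a sketch, a target-tracking policy that keeps average CPU near the 70% threshold above can be attached to an Auto Scaling group with the AWS CLI (the group and policy names are illustrative):

```shell
# config.json: keep average CPU across the group at roughly 70%
cat > config.json <<'EOF'
{
  "PredefinedMetricSpecification": {
    "PredefinedMetricType": "ASGAverageCPUUtilization"
  },
  "TargetValue": 70.0
}
EOF

aws autoscaling put-scaling-policy \
  --auto-scaling-group-name app-asg \
  --policy-name cpu-target-70 \
  --policy-type TargetTrackingScaling \
  --target-tracking-configuration file://config.json
```

Target tracking handles both scale-up and scale-down around the target value, which pairs well with the longer scale-down cool-downs discussed below.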

Lessons Learned from Auto-Scaling

What worked:

  • Set minimum instances to handle baseline load

  • Use longer cool-down periods for scale-down (5-10 minutes)

  • Scale up aggressively, scale down conservatively

  • Monitor the metrics that matter to your application

What didn't work:

  • Scaling based on a single metric (use composite metrics)

  • Too aggressive scale-down (caused flapping)

  • Not accounting for instance startup time

  • Ignoring the cost of constantly spinning up/down instances

Database Scalability

Scaling databases requires different strategies than scaling application servers.

Read Replicas
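Read replicas let you fan read traffic out across copies of the data while all writes still go to the primary. A minimal routing sketch (connection handling elided; the query classification here is deliberately naive, since real drivers and ORMs expose explicit read-only routing):

```python
import itertools

class ReplicaRouter:
    """Send writes to the primary, rotate reads across replicas."""
    def __init__(self, primary: str, replicas: list):
        self.primary = primary
        self._replicas = itertools.cycle(replicas)

    def route(self, query: str) -> str:
        # Naive classification for illustration: SELECTs go to replicas,
        # everything else goes to the primary.
        if query.lstrip().upper().startswith("SELECT"):
            return next(self._replicas)
        return self.primary
```

One caveat worth keeping in mind: replicas lag the primary slightly, so read-your-own-writes flows may need to read from the primary.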

Real-World Scaling Journey

Let me share how I scaled a real application from 1,000 to 1,000,000 users:

Phase 1: Single Server (0-10K users)

  • One server running everything (app + database)

  • Vertical scaling when needed

  • Cost: $50/month

Phase 2: Separate Database (10K-50K users)

  • Moved database to dedicated server

  • App server can now scale independently

  • Cost: $200/month

Phase 3: Horizontal Scaling (50K-200K users)

  • Multiple app servers behind load balancer

  • Database read replicas

  • Redis for caching and sessions

  • Cost: $800/month

Phase 4: Full Distribution (200K-1M users)

  • 10-20 app servers with auto-scaling

  • Database sharding by user ID

  • CDN for static assets

  • Message queue for async processing

  • Cost: $3,000/month
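The shard-by-user-ID step above can be sketched as a stable hash mapping each user to one of N database shards (the shard names are illustrative):

```python
import hashlib

# Illustrative shard identifiers; in practice these would be connection strings.
SHARDS = ["shard-0.db", "shard-1.db", "shard-2.db", "shard-3.db"]

def shard_for(user_id: str) -> str:
    """Stable mapping: the same user always lands on the same shard.
    Uses a cryptographic hash rather than Python's hash() so the mapping
    is identical across processes and restarts."""
    digest = hashlib.sha256(user_id.encode()).digest()
    index = int.from_bytes(digest[:8], "big") % len(SHARDS)
    return SHARDS[index]
```

A plain modulo mapping like this makes adding shards painful, because most keys move; consistent hashing is the usual mitigation when the shard count is expected to change.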

Key insight: We scaled gradually, adding complexity only when needed. Starting with microservices would have been premature optimization.

What's Next

Now that you understand scalability patterns, let's explore how caching can dramatically improve performance.

