Database Design


Introduction

Choosing the right database and designing it properly is one of the most critical decisions in system architecture. I've made both good and poor database choices in my career, and the consequences of a wrong decision can haunt you for years.

This article covers the database design patterns I've used in production systems—the trade-offs, the pain points, and the solutions that actually worked.

SQL vs NoSQL: The Real Trade-offs

The SQL vs NoSQL debate isn't about which is "better"—it's about which fits your use case.

When I Choose SQL (PostgreSQL, MySQL)

Use cases:

  • Financial transactions requiring ACID guarantees

  • Complex queries with joins across multiple tables

  • Data with clear relationships and schema

  • When data integrity is non-negotiable

Example: E-commerce order system

-- PostgreSQL schema for order management
CREATE TABLE users (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    email VARCHAR(255) UNIQUE NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE products (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name VARCHAR(255) NOT NULL,
    price DECIMAL(10, 2) NOT NULL,
    stock INT NOT NULL DEFAULT 0,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    CONSTRAINT positive_price CHECK (price > 0),
    CONSTRAINT non_negative_stock CHECK (stock >= 0)
);

CREATE TABLE orders (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id UUID NOT NULL REFERENCES users(id),
    status VARCHAR(50) NOT NULL DEFAULT 'pending',
    total DECIMAL(10, 2) NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    CONSTRAINT valid_status CHECK (status IN ('pending', 'paid', 'shipped', 'delivered', 'cancelled'))
);

CREATE TABLE order_items (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    order_id UUID NOT NULL REFERENCES orders(id) ON DELETE CASCADE,
    product_id UUID NOT NULL REFERENCES products(id),
    quantity INT NOT NULL,
    price DECIMAL(10, 2) NOT NULL,
    CONSTRAINT positive_quantity CHECK (quantity > 0)
);

-- Indexes for common queries
CREATE INDEX idx_orders_user_id ON orders(user_id);
CREATE INDEX idx_orders_status ON orders(status);
CREATE INDEX idx_order_items_order_id ON order_items(order_id);
CREATE INDEX idx_order_items_product_id ON order_items(product_id);

Python implementation with transaction:
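A minimal sketch of what that looks like with psycopg2 against the schema above; the function shape, error handling, and connection management are illustrative rather than a drop-in implementation:

import uuid

import psycopg2


def create_order(conn, user_id, items):
    """Atomically create an order. items is a list of (product_id, quantity)."""
    with conn:  # commit on success, roll back on any exception
        with conn.cursor() as cur:
            order_id = str(uuid.uuid4())
            cur.execute(
                "INSERT INTO orders (id, user_id, total) VALUES (%s, %s, 0)",
                (order_id, user_id),
            )
            total = 0
            for product_id, quantity in items:
                # Lock the product row so concurrent orders can't oversell stock
                cur.execute(
                    "SELECT price, stock FROM products WHERE id = %s FOR UPDATE",
                    (product_id,),
                )
                price, stock = cur.fetchone()
                if stock < quantity:
                    raise ValueError(f"Insufficient stock for product {product_id}")
                cur.execute(
                    "UPDATE products SET stock = stock - %s WHERE id = %s",
                    (quantity, product_id),
                )
                cur.execute(
                    "INSERT INTO order_items (order_id, product_id, quantity, price) "
                    "VALUES (%s, %s, %s, %s)",
                    (order_id, product_id, quantity, price),
                )
                total += price * quantity
            cur.execute(
                "UPDATE orders SET total = %s WHERE id = %s",
                (total, order_id),
            )
    return order_id

The FOR UPDATE lock is what makes the stock check safe under concurrency: without it, two requests could both see stock = 1 and both decrement it.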

When I Choose NoSQL (MongoDB, DynamoDB)

Use cases:

  • Flexible schema that evolves frequently

  • Massive scale (millions of writes/sec)

  • Document-oriented data (user profiles, product catalogs)

  • When eventual consistency is acceptable

Example: User profile system (MongoDB)
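A rough sketch with pymongo showing why the document model fits profiles; the database name, connection string, and field names are all illustrative:

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
profiles = client.app.user_profiles

# Documents in the same collection can carry different fields as the schema evolves
profiles.insert_one({
    "user_id": "u_123",
    "display_name": "Ada",
    "preferences": {"theme": "dark", "newsletter": True},
    "addresses": [
        {"label": "home", "city": "Berlin", "country": "DE"},
    ],
})

# Index the fields you actually query on
profiles.create_index("user_id", unique=True)

# Update nested fields and grow the document without a migration
profiles.update_one(
    {"user_id": "u_123"},
    {"$set": {"preferences.theme": "light"}, "$push": {"interests": "databases"}},
)

The flip side is that the application owns the schema: every reader has to tolerate documents written by older versions of the code.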

My Decision Matrix

Factor              | SQL          | NoSQL
--------------------|--------------|-------------
ACID Transactions   | ✅ Strong    | ❌ Limited
Schema Flexibility  | ❌ Rigid     | ✅ Flexible
Horizontal Scaling  | ⚠️ Complex   | ✅ Easy
Complex Joins       | ✅ Excellent | ❌ Limited
Write Performance   | ⚠️ Good      | ✅ Excellent
Consistency         | ✅ Strong    | ⚠️ Eventual
Tooling/Ecosystem   | ✅ Mature    | ⚠️ Growing

Database Replication

Replication improves availability and read performance by maintaining copies of data across multiple servers.

Master-Slave Replication

One primary (master) handles all writes, while multiple replicas (slaves) handle reads.

PostgreSQL streaming replication setup:
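The exact steps depend on the PostgreSQL version; on 12+ the core of it looks roughly like this (hostnames, the replication user, and paths are placeholders):

# On the primary (postgresql.conf)
wal_level = replica
max_wal_senders = 10

# On the primary (pg_hba.conf): allow the replica to connect for replication
host    replication    replicator    10.0.0.0/24    scram-sha-256

# On the replica: clone the primary and start it as a standby
pg_basebackup -h primary.internal -U replicator -D /var/lib/postgresql/data -P -R
# -R writes standby.signal plus primary_conninfo, so the replica streams WAL
# from the primary as soon as it starts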

Challenges I've faced:

  • Replication lag: Replicas can be seconds behind the master

  • Read-after-write consistency: A user updates their profile, then immediately reads stale data from a replica

  • Failover complexity: What happens when the master fails?

My solutions:
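For read-after-write in particular, one common fix is to pin a user's reads to the primary for a few seconds after they write, which hides replication lag from the user who caused the write. A minimal sketch; the class name, pin window, and replica-picking logic are illustrative:

import time


class ReadYourWritesRouter:
    """Pin a user's reads to the primary briefly after they write."""

    def __init__(self, primary, replicas, pin_seconds=5):
        self.primary = primary
        self.replicas = replicas
        self.pin_seconds = pin_seconds
        self._last_write = {}  # user_id -> timestamp of their last write

    def for_write(self, user_id):
        self._last_write[user_id] = time.monotonic()
        return self.primary

    def for_read(self, user_id):
        wrote_at = self._last_write.get(user_id)
        if wrote_at and time.monotonic() - wrote_at < self.pin_seconds:
            return self.primary  # still inside the replication-lag window
        return self.replicas[hash(user_id) % len(self.replicas)]

Failover itself is better handled by dedicated tooling such as Patroni (used in the setup later in this article) than by hand-rolled logic.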

Multi-Master Replication

Multiple nodes can accept writes. This is more complex to operate, but it eliminates the single point of failure.
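The complexity shows up when two nodes accept conflicting writes to the same row before they have synced. The simplest resolution is last-write-wins, sketched below (the record shape is illustrative); it is easy to reason about but silently drops the losing write, so it only suits fields where that is acceptable:

def resolve_conflict(version_a: dict, version_b: dict) -> dict:
    """Last-write-wins: keep whichever version carries the newer timestamp."""
    # Each replica stamps its writes with a timestamp (wall clock or hybrid
    # logical clock). LWW discards the losing write entirely, which is fine
    # for something like last_seen_at and dangerous for a balance or counter.
    if version_a["updated_at"] >= version_b["updated_at"]:
        return version_a
    return version_b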

When I use multi-master:

  • Global applications with users in different regions

  • High write throughput requirements

  • When downtime is absolutely unacceptable

Trade-offs:

  • ✅ No single point of failure

  • ✅ Better write performance

  • ✅ Lower latency for geographically distributed users

  • ❌ Conflict resolution complexity

  • ❌ Eventual consistency (not strong consistency)

  • ❌ More complex operations

Database Sharding

Sharding splits data across multiple databases to handle massive scale.

Sharding Strategies

1. Range-based sharding:
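A sketch of the idea, routing by numeric user ID ranges (the boundaries and shard names are illustrative):

# Each shard owns a contiguous range of user IDs
SHARD_RANGES = [
    (0,         1_000_000, "shard1"),
    (1_000_000, 2_000_000, "shard2"),
    (2_000_000, 3_000_000, "shard3"),
]


def shard_for_user(user_id: int) -> str:
    for low, high, shard in SHARD_RANGES:
        if low <= user_id < high:
            return shard
    raise ValueError(f"No shard configured for user_id {user_id}")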

Problems I've hit:

  • Uneven distribution (shard1 has way more users than shard2)

  • Hard to rebalance

  • Hot shards (one shard gets disproportionate traffic)

2. Hash-based sharding (my preferred method):
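A sketch: hash the user ID so keys spread evenly regardless of how IDs were assigned (the shard count and naming are illustrative):

import hashlib

NUM_SHARDS = 8


def shard_for_user(user_id: str) -> str:
    # Use a stable hash, not Python's built-in hash(), which varies per process
    digest = hashlib.md5(user_id.encode("utf-8")).hexdigest()
    return f"shard{int(digest, 16) % NUM_SHARDS}"

The trade-off is resharding: changing NUM_SHARDS remaps almost every key, which is why consistent hashing (or a fixed set of virtual buckets mapped onto physical shards) is the usual refinement.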

3. Geographic sharding:
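A sketch: each user's data lives in a shard for their home region, which keeps queries local and helps with data-residency requirements (regions and connection strings are illustrative):

REGION_SHARDS = {
    "eu": "postgresql://db-eu.internal:5432/app",
    "us": "postgresql://db-us.internal:5432/app",
    "ap": "postgresql://db-ap.internal:5432/app",
}


def shard_for_region(region: str) -> str:
    return REGION_SHARDS[region]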

Sharding Challenges

Cross-shard queries:
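Anything that spans shards has to fan out to every shard and merge the results in application code, which is slower and easier to get wrong. A rough sketch of what a simple aggregate turns into (assuming one open connection per shard):

def total_paid_revenue(shard_connections):
    """Sum revenue across every shard — one round trip per shard."""
    total = 0
    for conn in shard_connections:
        with conn.cursor() as cur:
            cur.execute(
                "SELECT COALESCE(SUM(total), 0) FROM orders WHERE status = 'paid'"
            )
            total += cur.fetchone()[0]
    return total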

My approach to minimizing cross-shard queries:

  1. Denormalize data when necessary

  2. Use a separate analytics database

  3. Design shard key to keep related data together

  4. Accept eventual consistency for aggregations

Database Partitioning

Partitioning splits large tables within a single database.
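In PostgreSQL this is typically declarative range partitioning, for example splitting an append-heavy table by month (the table and columns here are illustrative, not part of the earlier schema):

-- Partitioned parent table; the partition key must be part of the primary key
CREATE TABLE events (
    id BIGSERIAL,
    user_id UUID NOT NULL,
    payload JSONB,
    created_at TIMESTAMP NOT NULL,
    PRIMARY KEY (id, created_at)
) PARTITION BY RANGE (created_at);

-- One partition per month
CREATE TABLE events_2024_01 PARTITION OF events
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
CREATE TABLE events_2024_02 PARTITION OF events
    FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');

Queries that filter on created_at scan only the matching partitions, and retiring old data becomes a cheap DROP of the old partition instead of a massive DELETE.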

Indexing Strategies

Indexes are critical for query performance.

Indexing rules I follow:
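The core of it is matching indexes to real query shapes rather than indexing everything. As an illustration against the orders schema above (the queries and index names are purely illustrative):

-- A composite index should match the query's filter and sort:
--   SELECT * FROM orders WHERE user_id = $1 ORDER BY created_at DESC LIMIT 20;
CREATE INDEX idx_orders_user_created ON orders (user_id, created_at DESC);

-- A partial index covers only the rows a hot query touches, and stays small:
CREATE INDEX idx_orders_pending_created ON orders (created_at)
    WHERE status = 'pending';

-- Verify with EXPLAIN ANALYZE that the planner actually uses the index.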

Real-World Database Architecture

Here's a production setup I've used:

Tier 1: Application Database (PostgreSQL)

  • Master: Handles all writes

  • 3 read replicas: Distribute read load

  • Connection pooling: PgBouncer

  • Automatic failover: Patroni + etcd

Tier 2: Cache Layer (Redis)

  • Reduces database load by 85%

  • Stores session data, frequently accessed records

  • Redis Cluster for high availability

Tier 3: Analytics Database (ClickHouse)

  • Separate from transactional database

  • Optimized for aggregations and reporting

  • Data replicated from PostgreSQL via Kafka

Tier 4: Search Engine (Elasticsearch)

  • Full-text search capabilities

  • Product catalog search

  • Log aggregation and analysis
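Tying the tiers together, a simplified sketch of how a read and a write flow through them (the client objects, cache keys, and TTL are illustrative; the analytics and search tiers are fed asynchronously, never in the request path):

import json

CACHE_TTL_SECONDS = 300


def get_product(product_id, redis_client, replica_conn):
    """Read path: Redis first, then a PostgreSQL read replica."""
    key = f"product:{product_id}"
    cached = redis_client.get(key)
    if cached:  # Tier 2 cache hit
        return json.loads(cached)
    with replica_conn.cursor() as cur:  # Tier 1 read replica
        cur.execute(
            "SELECT name, price, stock FROM products WHERE id = %s",
            (product_id,),
        )
        name, price, stock = cur.fetchone()
    product = {"name": name, "price": str(price), "stock": stock}
    redis_client.setex(key, CACHE_TTL_SECONDS, json.dumps(product))
    return product


def update_price(product_id, new_price, primary_conn, redis_client):
    """Write path: primary only, then invalidate the cached entry."""
    with primary_conn:
        with primary_conn.cursor() as cur:
            cur.execute(
                "UPDATE products SET price = %s WHERE id = %s",
                (new_price, product_id),
            )
    redis_client.delete(f"product:{product_id}")
    # ClickHouse and Elasticsearch pick up the change asynchronously
    # (e.g. via Kafka/CDC), not synchronously inside the request.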

Lessons Learned

What worked:

  1. Start with a single database, scale when you have real metrics

  2. Use read replicas before sharding

  3. PostgreSQL handles way more than you think

  4. Monitor query performance from day one

  5. Cache aggressively, invalidate carefully

What didn't work:

  1. Premature sharding (added complexity without benefit)

  2. Choosing NoSQL for everything (missed SQL's strengths)

  3. Not setting up proper indexes (queries became unbearably slow)

  4. Ignoring replication lag (caused user-facing bugs)

  5. Over-normalizing data (too many joins killed performance)

What's Next

With database design covered, let's move on to asynchronous processing with message queues in the next article.

