# Database Design Best Practices

*Pulling it all together — patterns, anti-patterns, and production realities*

***

## Table of Contents

1. [Introduction: The Evolution of My Blog Schema](#introduction-the-evolution-of-my-blog-schema)
2. [Schema Design Principles](#schema-design-principles)
3. [Naming Conventions That Save You Pain](#naming-conventions-that-save-you-pain)
4. [Data Types: Choosing Correctly the First Time](#data-types-choosing-correctly-the-first-time)
5. [Schema Evolution and Migrations](#schema-evolution-and-migrations)
6. [PostgreSQL Advanced Features in Practice](#postgresql-advanced-features-in-practice)
7. [Connection Pooling and Resource Management](#connection-pooling-and-resource-management)
8. [Monitoring and Observability](#monitoring-and-observability)
9. [Common Anti-Patterns to Avoid](#common-anti-patterns-to-avoid)
10. [The Production-Ready Blog Schema](#the-production-ready-blog-schema)
11. [Bridging to ORMs](#bridging-to-orms)
12. [What I Learned Designing Databases](#what-i-learned-designing-databases)
13. [Your Database Journey Continues](#your-database-journey-continues)

***

## Introduction: The Evolution of My Blog Schema

When I started this series, I showed you a single `posts` table with five columns. By now, working through relationships, indexes, transactions, views, security, and backups, that humble table has grown into a production-grade schema.

But databases don't spring into existence fully formed. They **evolve** — and how you manage that evolution is just as important as how you design the initial schema.

This final article is about the design decisions, naming conventions, advanced PostgreSQL features, and monitoring practices that separate a hobby project database from a production system. It's a synthesis of everything we've covered, plus the operational wisdom I accumulated through years of running real systems.

***

## Schema Design Principles

### 1. Design for Queries, Not Just Storage

The most common mistake is designing a schema that "makes sense" conceptually but performs poorly for actual queries. Before finalising your schema, list the top 10 queries your application will run. Design around them.

```sql
-- If your most common query is:
SELECT p.*, a.name FROM posts p JOIN authors a ON p.author_id = a.id
WHERE p.status = 'published' ORDER BY p.published_at DESC LIMIT 20;

-- Your schema needs:
-- • status column with an index (or partial index WHERE status='published')
-- • published_at with a descending index
-- • author_id with a foreign key index
-- • Fast author name lookup
```

### 2. Stable Primary Keys

Use surrogate keys (SERIAL, BIGSERIAL, UUID) as primary keys, not natural keys (email, username, slug). Natural keys change; surrogate keys don't.

```sql
-- Good: surrogate primary key
CREATE TABLE authors (
    id      BIGSERIAL PRIMARY KEY,   -- stable, never changes
    email   TEXT UNIQUE NOT NULL,    -- natural key — unique but not the PK
    slug    TEXT UNIQUE NOT NULL     -- another natural key
);

-- Avoid: natural key as PK (email changes cause cascading FK updates)
CREATE TABLE authors (
    email   TEXT PRIMARY KEY         -- ❌ painful when email changes
);
```

### 3. UUID vs BIGSERIAL

```sql
-- BIGSERIAL: simple, fast, sequential (but reveals row count)
id BIGSERIAL PRIMARY KEY

-- UUID v4: globally unique, no ordering guarantee, larger (16 bytes)
id UUID DEFAULT gen_random_uuid() PRIMARY KEY

-- UUID v7: time-ordered UUID (native uuidv7() arrives in PostgreSQL 18;
-- on earlier versions, generate it in the application or via an extension)
-- Best of both: globally unique + sortable
COMMENT ON COLUMN posts.id IS 'UUID v7: ordered for index efficiency';
```

**Recommendation**: Use `BIGSERIAL` for single-server applications. Use UUID v7 for distributed systems where IDs must be globally unique.
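The UUID v7 layout is simple enough to sketch: a 48-bit Unix-millisecond timestamp in the top bits, then the version and variant fields, then random bits. This is an illustrative generator (the name `uuidv7_sketch` is ours), not the exact algorithm any particular library uses:

```python
import os
import time
import uuid

def uuidv7_sketch() -> uuid.UUID:
    """Illustrative UUID v7: 48-bit ms timestamp + version/variant + random bits."""
    ts_ms = time.time_ns() // 1_000_000           # Unix time in milliseconds
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits
    value = (ts_ms & (2**48 - 1)) << 80 | rand    # timestamp in the top 48 bits
    value &= ~(0xF << 76); value |= 0x7 << 76     # set version field = 7
    value &= ~(0x3 << 62); value |= 0x2 << 62     # set variant = RFC 4122
    return uuid.UUID(int=value)

a, b = uuidv7_sketch(), uuidv7_sketch()
assert a.version == 7
# Time-ordered: a sorts before b, or they share the same millisecond prefix
assert a.bytes < b.bytes or a.bytes[:6] == b.bytes[:6]
```

Because the timestamp occupies the most significant bytes, newly generated IDs sort after older ones, which is exactly what keeps B-tree index insertion efficient.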

### 4. Soft Deletes vs Hard Deletes

```sql
-- Soft delete pattern
ALTER TABLE posts ADD COLUMN deleted_at TIMESTAMPTZ DEFAULT NULL;

-- Partial index: only index non-deleted rows (most queries ignore deleted)
CREATE INDEX idx_posts_active ON posts(published_at DESC)
    WHERE deleted_at IS NULL AND status = 'published';

-- View for application to use (hides deleted rows by default)
CREATE VIEW active_posts AS
SELECT * FROM posts WHERE deleted_at IS NULL;
```

Soft deletes are essential when:

* You need an audit trail of deletions
* Users may want to "undo" a deletion
* Regulatory compliance requires data retention

***

## Naming Conventions That Save You Pain

Consistent naming is invisible when you get it right and excruciating when you get it wrong.

### Tables and Columns

```sql
-- Tables: lowercase, plural, snake_case
posts, authors, post_categories, activity_logs

-- Columns: lowercase, snake_case, explicit
author_id     -- not: author, authorId, AuthorID
published_at  -- not: publishDate, publish_date, publishedAt
is_featured   -- not: featured, isFeatured, FEATURED

-- Standard columns on every table
id          BIGSERIAL PRIMARY KEY,
created_at  TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at  TIMESTAMPTZ NOT NULL DEFAULT NOW()

-- Booleans: use is_ or has_ prefix
is_published  BOOLEAN NOT NULL DEFAULT FALSE
has_comments  BOOLEAN NOT NULL DEFAULT TRUE
```

### Indexes and Constraints

```sql
-- Indexes: idx_{table}_{column(s)}
idx_posts_author_id
idx_posts_status_published_at
idx_comments_post_id

-- Unique constraints: uq_{table}_{column(s)}
uq_authors_email
uq_posts_slug

-- Foreign keys (PostgreSQL names them automatically, but for explicitness):
-- fk_{table}_{referenced_table}
CONSTRAINT fk_posts_authors FOREIGN KEY (author_id) REFERENCES authors(id)

-- Check constraints: chk_{table}_{description}
CONSTRAINT chk_posts_status CHECK (status IN ('draft', 'published', 'archived'))
```

***

## Data Types: Choosing Correctly the First Time

Poor type choices are expensive to fix later. Get them right upfront.

### Text Columns

```sql
-- TEXT: preferred for variable-length strings (no padding, no limits to worry about)
title   TEXT NOT NULL
content TEXT NOT NULL
email   TEXT NOT NULL

-- VARCHAR(n): use only when you have a hard business rule for max length
-- (e.g., "our API only accepts slugs up to 255 characters")
slug    VARCHAR(255) UNIQUE NOT NULL

-- CHAR(n): almost never use — pads with spaces, wastes space
-- status CHAR(10)  -- ❌ avoid
```

### Numeric Columns

```sql
-- BIGINT: IDs, counts (safer than INT for anything that might grow large)
id          BIGINT
view_count  BIGINT DEFAULT 0

-- INTEGER: acceptable for columns guaranteed to stay under 2 billion
age         INTEGER

-- NUMERIC(p, s): exact decimal arithmetic (money, measurements)
price       NUMERIC(10, 2)  -- up to 99,999,999.99
rating      NUMERIC(3, 1)   -- e.g. 4.5

-- FLOAT / DOUBLE: scientific calculations only (imprecise for money!)
-- NEVER use FLOAT for financial values
```

### Date and Time

```sql
-- TIMESTAMPTZ: almost always the right choice
-- Stores in UTC, displays in session timezone
created_at  TIMESTAMPTZ NOT NULL DEFAULT NOW()
published_at TIMESTAMPTZ

-- TIMESTAMP (without timezone): only for truly timezone-agnostic data
-- (e.g., "open at 9:00 AM local time, wherever that is")

-- DATE: for date-only values (birthdays, event dates)
birth_date  DATE

-- INTERVAL: for durations
reading_time INTERVAL  -- e.g. '5 minutes'
```

### JSONB: When to Use It

```sql
-- Bad use of JSONB (schema-less data that should be columnar)
-- post_metadata JSONB  -- contains title, author_id, status → should be columns

-- Good use of JSONB (genuinely flexible, extensible metadata)
CREATE TABLE posts (
    id              BIGSERIAL PRIMARY KEY,
    title           TEXT NOT NULL,            -- structured → column
    status          TEXT NOT NULL,            -- structured → column
    extra_metadata  JSONB DEFAULT '{}'::JSONB  -- flexible, extensible → JSONB
);

-- Index JSONB for query performance
CREATE INDEX idx_posts_metadata ON posts USING GIN(extra_metadata);

-- Query JSONB
SELECT * FROM posts WHERE extra_metadata @> '{"featured": true}';
SELECT * FROM posts WHERE extra_metadata->>'canonical_url' IS NOT NULL;
SELECT extra_metadata->>'reading_time' AS reading_time FROM posts WHERE id = 42;
```

***

## Schema Evolution and Migrations

Your schema will change. Managing migrations safely is crucial.

### Safe Migration Patterns

```sql
-- ✅ SAFE: Adding a nullable column (no table lock)
ALTER TABLE posts ADD COLUMN subtitle TEXT;

-- ✅ SAFE: Adding a column with a default (PostgreSQL 11+)
ALTER TABLE posts ADD COLUMN is_featured BOOLEAN NOT NULL DEFAULT FALSE;

-- ⚠️ RISKY: Adding a NOT NULL column without a default on large tables
-- (rewrites entire table — causes long lock)
-- Do it in steps:
ALTER TABLE posts ADD COLUMN slug TEXT;                    -- step 1: add nullable
UPDATE posts SET slug = generate_slug(title) WHERE slug IS NULL;  -- step 2: populate
ALTER TABLE posts ALTER COLUMN slug SET NOT NULL;          -- step 3: add constraint

-- ✅ SAFE: Renaming via view (zero downtime)
-- 1. Add new column
ALTER TABLE posts ADD COLUMN body TEXT;
-- 2. Create trigger to keep both in sync
-- 3. Backfill: UPDATE posts SET body = content;
-- 4. Update app to use new column name
-- 5. Drop old column after all references updated
```

### Zero-Downtime Index Creation

```sql
-- Standard CREATE INDEX takes a lock that blocks writes
-- For production: use CONCURRENTLY
CREATE INDEX CONCURRENTLY idx_posts_new_column ON posts(new_column);

-- Note: CONCURRENTLY is slower and cannot run inside a transaction block
-- Check for invalid indexes after creation:
SELECT c.relname AS index_name
FROM pg_index i
JOIN pg_class c     ON c.oid = i.indexrelid
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = 'public' AND NOT i.indisvalid;
```

### Migration Tooling

Structure SQL migrations as numbered, sequential files:

```
migrations/
├── 001_initial_schema.sql
├── 002_add_post_views.sql
├── 003_add_categories.sql
├── 004_add_search_vector.sql
└── 005_add_soft_delete.sql
```

Use a migration tool to track which migrations have already run:

* **Flyway** or **Liquibase** for Java ecosystems
* **Alembic** for Python
* **Prisma Migrate** for Node.js/TypeScript (see ORM 101 series)
* **golang-migrate** for Go
* **dbmate** for a database-agnostic choice
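Under the hood, all of these tools do the same thing: record each applied migration's name in a tracking table, and on every run apply only the files not yet recorded, in order, each inside its own transaction. A minimal sketch of that idea (sqlite3 stands in for PostgreSQL; `migrate` and the file names are illustrative):

```python
import sqlite3

def migrate(conn: sqlite3.Connection, migrations: dict[str, str]) -> list[str]:
    """Apply migrations not yet recorded in schema_migrations, in name order."""
    conn.execute("CREATE TABLE IF NOT EXISTS schema_migrations (name TEXT PRIMARY KEY)")
    applied = {row[0] for row in conn.execute("SELECT name FROM schema_migrations")}
    ran = []
    for name in sorted(migrations):               # 001_, 002_, ... run in order
        if name in applied:
            continue
        with conn:                                # each migration in its own transaction
            conn.executescript(migrations[name])
            conn.execute("INSERT INTO schema_migrations (name) VALUES (?)", (name,))
        ran.append(name)
    return ran

conn = sqlite3.connect(":memory:")
files = {
    "001_initial_schema.sql": "CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT);",
    "002_add_post_views.sql": "ALTER TABLE posts ADD COLUMN views INTEGER DEFAULT 0;",
}
assert migrate(conn, files) == ["001_initial_schema.sql", "002_add_post_views.sql"]
assert migrate(conn, files) == []                 # second run is a no-op
```

The tracking table is why you should never hand-edit an applied migration file: the tool only looks at the name, so edits to already-run files silently never reach the database.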

***

## PostgreSQL Advanced Features in Practice

### Generated Columns

```sql
-- Computed column, always derived from other columns
ALTER TABLE posts ADD COLUMN word_count INT
    GENERATED ALWAYS AS (
        ARRAY_LENGTH(STRING_TO_ARRAY(TRIM(content), ' '), 1)
    ) STORED;

-- Full-text search vector, always kept up to date
ALTER TABLE posts ADD COLUMN search_vector TSVECTOR
    GENERATED ALWAYS AS (
        TO_TSVECTOR('english', COALESCE(title, '') || ' ' || COALESCE(content, ''))
    ) STORED;

CREATE INDEX idx_posts_search ON posts USING GIN(search_vector);

-- Query immediately available
SELECT * FROM posts WHERE search_vector @@ TO_TSQUERY('english', 'postgresql & performance');
```

### Table Partitioning

For very large tables (tens of millions of rows or more), partitioning splits the physical storage:

```sql
-- Range partition posts by year
CREATE TABLE posts (
    id          BIGSERIAL,
    title       TEXT NOT NULL,
    published_at TIMESTAMPTZ,
    -- other columns
    PRIMARY KEY (id, published_at)
) PARTITION BY RANGE (published_at);

CREATE TABLE posts_2024 PARTITION OF posts
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');

CREATE TABLE posts_2025 PARTITION OF posts
    FOR VALUES FROM ('2025-01-01') TO ('2026-01-01');

CREATE TABLE posts_2026 PARTITION OF posts
    FOR VALUES FROM ('2026-01-01') TO ('2027-01-01');

-- Queries that include a date range filter only scan relevant partitions
SELECT * FROM posts
WHERE published_at BETWEEN '2025-01-01' AND '2025-12-31';
-- Only scans posts_2025 partition!
```

### CTEs with `RETURNING`

```sql
-- Insert a post and log it in one statement
WITH new_post AS (
    INSERT INTO posts (title, content, author_id, status)
    VALUES ('Advanced PostgreSQL', 'Content here...', 5, 'draft')
    RETURNING id, title, author_id
),
logged AS (
    INSERT INTO activity_log (action, entity_type, entity_id)
    SELECT 'post_created', 'post', id FROM new_post
)
SELECT id, title FROM new_post;
```

### `UPSERT` with `ON CONFLICT`

```sql
-- Insert or update post view count atomically
INSERT INTO post_view_counts (post_id, view_count, last_viewed_at)
VALUES (42, 1, NOW())
ON CONFLICT (post_id) DO UPDATE
    SET view_count    = post_view_counts.view_count + EXCLUDED.view_count,
        last_viewed_at = EXCLUDED.last_viewed_at;

-- Insert or ignore (e.g., deduplication)
INSERT INTO post_tags (post_id, tag_id)
VALUES (42, 7)
ON CONFLICT (post_id, tag_id) DO NOTHING;
```

***

## Connection Pooling and Resource Management

Each PostgreSQL connection is an OS process consuming roughly 5–10 MB of RAM. Connection pooling is essential for web applications.

```mermaid
graph LR
    App1["App Instance 1<br/>(10 threads)"] -->|up to 10| Pool
    App2["App Instance 2<br/>(10 threads)"] -->|up to 10| Pool
    App3["App Instance 3<br/>(10 threads)"] -->|up to 10| Pool
    Pool["PgBouncer<br/>Connection Pool<br/>(20 connections)"] -->|pooled 20| DB["PostgreSQL<br/>(max 100 connections)"]
```

### PgBouncer Configuration

```ini
# pgbouncer.ini
[databases]
blog_db = host=127.0.0.1 port=5432 dbname=blog_db

[pgbouncer]
pool_mode = transaction          ; transaction pooling (most efficient)
max_client_conn = 1000           ; max connections from applications
default_pool_size = 25           ; connections to PostgreSQL per database
min_pool_size = 5
reserve_pool_size = 5
server_lifetime = 3600
server_idle_timeout = 600
log_connections = 0
log_disconnections = 0
listen_port = 6432
```

### Configure postgresql.conf for Your Workload

```ini
# postgresql.conf tuning for a typical web application
max_connections = 100           # Use PgBouncer to handle more clients
shared_buffers = 256MB          # 25% of RAM (start here)
effective_cache_size = 1GB      # 75% of RAM (planner hint)
work_mem = 4MB                  # Per-sort/hash operation
maintenance_work_mem = 64MB     # For VACUUM, CREATE INDEX
wal_buffers = 16MB
checkpoint_completion_target = 0.9
random_page_cost = 1.1          # For SSD storage (default 4.0 is for HDD)
effective_io_concurrency = 200  # For SSD storage
```

Use [PGTune](https://pgtune.leopard.in.ua/) to generate settings tuned to your hardware.
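The pooling idea itself is simple: keep a fixed set of open connections, hand one out per request, and return it afterwards, so client count can far exceed backend count. A toy check-out/check-in pool makes PgBouncer's job concrete (sqlite3 stands in for PostgreSQL; `ConnectionPool` is our illustrative name, not a real library API):

```python
import queue
import sqlite3
from contextlib import contextmanager

class ConnectionPool:
    """Fixed-size pool: clients block until a connection is free, like PgBouncer."""

    def __init__(self, size: int):
        self._pool: queue.Queue = queue.Queue(maxsize=size)
        for _ in range(size):
            # Open all backend connections up front
            self._pool.put(sqlite3.connect(":memory:", check_same_thread=False))

    @contextmanager
    def connection(self, timeout: float = 5.0):
        conn = self._pool.get(timeout=timeout)   # wait for a free connection
        try:
            yield conn
        finally:
            self._pool.put(conn)                 # always return it to the pool

pool = ConnectionPool(size=2)
with pool.connection() as conn:
    assert conn.execute("SELECT 1").fetchone() == (1,)
assert pool._pool.qsize() == 2                   # connection was returned
```

In practice you would use a battle-tested pool (your driver's built-in one, or PgBouncer in front of the server), but the mechanics are exactly this: a bounded queue of live connections.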

***

## Monitoring and Observability

### Key Metrics to Track

```sql
-- 1. Slow queries
SELECT
    LEFT(query, 80)                              AS query,
    calls,
    ROUND(mean_exec_time::NUMERIC, 2)            AS avg_ms,
    ROUND(total_exec_time::NUMERIC / 1000, 1)    AS total_seconds
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;

-- 2. Table bloat (dead tuples that need VACUUM)
SELECT
    relname               AS table_name,
    n_live_tup            AS live_rows,
    n_dead_tup            AS dead_rows,
    ROUND(100.0 * n_dead_tup / NULLIF(n_live_tup + n_dead_tup, 0), 1) AS dead_pct,
    last_autovacuum
FROM pg_stat_user_tables
WHERE n_dead_tup > 1000
ORDER BY n_dead_tup DESC;

-- 3. Cache hit ratio (should be > 99%)
SELECT
    SUM(heap_blks_hit) AS heap_hits,
    SUM(heap_blks_read) AS heap_reads,
    ROUND(100.0 * SUM(heap_blks_hit) /
        NULLIF(SUM(heap_blks_hit) + SUM(heap_blks_read), 0), 2) AS hit_ratio_pct
FROM pg_statio_user_tables;

-- 4. Lock waits
SELECT
    pid,
    wait_event_type,
    wait_event,
    LEFT(query, 60) AS query
FROM pg_stat_activity
WHERE wait_event_type = 'Lock'
ORDER BY query_start;

-- 5. Index usage efficiency
SELECT
    indexrelname,
    idx_scan                                     AS scans,
    idx_tup_read                                 AS rows_read,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
ORDER BY idx_scan DESC;
```

### Autovacuum Tuning

```sql
-- Per-table autovacuum settings for high-traffic tables
ALTER TABLE posts SET (
    autovacuum_vacuum_scale_factor = 0.01,    -- vacuum after 1% dead tuples
    autovacuum_analyze_scale_factor = 0.005,  -- analyze after 0.5% changes
    autovacuum_vacuum_cost_delay = 2           -- less aggressive throttling
);

-- Check autovacuum activity
SELECT
    relname,
    last_vacuum,
    last_autovacuum,
    last_analyze,
    last_autoanalyze,
    n_dead_tup
FROM pg_stat_user_tables
WHERE relname = 'posts';
```

***

## Common Anti-Patterns to Avoid

### 1. The EAV (Entity-Attribute-Value) Trap

```sql
-- ❌ EAV: flexible but catastrophically slow to query
CREATE TABLE post_properties (
    post_id     INTEGER,
    prop_name   TEXT,
    prop_value  TEXT
);
-- Querying "posts where author = X AND status = published" requires multiple self-joins
-- Use JSONB instead if you truly need flexible attributes

-- ✅ Better: JSONB with appropriate indexes
ALTER TABLE posts ADD COLUMN extra JSONB DEFAULT '{}';
```

### 2. Storing Comma-Separated Values

```sql
-- ❌ Never do this
ALTER TABLE posts ADD COLUMN tag_names TEXT;  -- 'postgresql,tutorial,backend'

-- ✅ Use a proper junction table
CREATE TABLE post_tags (
    post_id  BIGINT REFERENCES posts(id) ON DELETE CASCADE,
    tag_id   BIGINT REFERENCES tags(id)  ON DELETE CASCADE,
    PRIMARY KEY (post_id, tag_id)
);

-- Or PostgreSQL arrays with GIN index (acceptable for simple lookups)
ALTER TABLE posts ADD COLUMN tag_ids BIGINT[];
CREATE INDEX idx_posts_tag_ids ON posts USING GIN(tag_ids);
```

### 3. SELECT \* in Application Code

```sql
-- ❌ Fetching all columns including large TEXT fields
SELECT * FROM posts WHERE status = 'published';

-- ✅ Only fetch what you need
SELECT id, title, slug, published_at, author_id
FROM posts WHERE status = 'published';
```

### 4. Missing `updated_at`

```sql
-- Every mutable table should have updated_at
-- Apply the trigger from the Views article to all tables:
CREATE TRIGGER trg_posts_updated_at
BEFORE UPDATE ON posts
FOR EACH ROW EXECUTE FUNCTION set_updated_at();

CREATE TRIGGER trg_authors_updated_at
BEFORE UPDATE ON authors
FOR EACH ROW EXECUTE FUNCTION set_updated_at();

-- Repeat for every table that will be updated
```

### 5. Not Using Transactions for Multi-Step Operations

This was covered in the Transactions article, but it bears repeating: skipping transactions is the #1 source of data corruption in hobby applications. Any operation that touches more than one table **must** be wrapped in `BEGIN ... COMMIT`.
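The same discipline applies from application code: group the statements and let an exception roll everything back. A sketch with Python's sqlite3, whose connection context manager commits on success and rolls back on error (`create_post` and the tables are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE activity_log (action TEXT NOT NULL, entity_id INTEGER NOT NULL);
""")

def create_post(title):
    with conn:  # BEGIN ... COMMIT, or ROLLBACK if anything raises
        cur = conn.execute("INSERT INTO posts (title) VALUES (?)", (title,))
        conn.execute("INSERT INTO activity_log VALUES ('post_created', ?)",
                     (cur.lastrowid,))

create_post("Hello")
try:
    create_post(None)   # NOT NULL violation: the whole operation is rolled back
except sqlite3.IntegrityError:
    pass

# The failed call left nothing behind in either table
assert conn.execute("SELECT COUNT(*) FROM posts").fetchone() == (1,)
assert conn.execute("SELECT COUNT(*) FROM activity_log").fetchone() == (1,)
```

Without the transaction, a crash between the two inserts would leave a post with no log entry (or vice versa): exactly the partial-write corruption the Transactions article warned about.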

***

## The Production-Ready Blog Schema

Here is the complete, production-ready blog schema incorporating everything from this series:

```sql
-- ============================================
-- BLOG SYSTEM: Production-Ready Schema
-- Last updated: 2026
-- ============================================

-- Trigger function for updated_at (applied to all tables)
CREATE OR REPLACE FUNCTION set_updated_at()
RETURNS TRIGGER AS $$
BEGIN NEW.updated_at := NOW(); RETURN NEW; END;
$$ LANGUAGE plpgsql;

-- AUTHORS
CREATE TABLE authors (
    id              BIGSERIAL PRIMARY KEY,
    name            TEXT        NOT NULL,
    email           TEXT        NOT NULL,
    bio             TEXT,
    slug            TEXT        NOT NULL,
    avatar_url      TEXT,
    published_post_count INTEGER NOT NULL DEFAULT 0,
    is_active       BOOLEAN     NOT NULL DEFAULT TRUE,
    created_at      TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at      TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    deleted_at      TIMESTAMPTZ,

    CONSTRAINT uq_authors_email UNIQUE (email),
    CONSTRAINT uq_authors_slug  UNIQUE (slug)
);
CREATE INDEX idx_authors_active ON authors(slug) WHERE deleted_at IS NULL AND is_active;
CREATE TRIGGER trg_authors_ua BEFORE UPDATE ON authors FOR EACH ROW EXECUTE FUNCTION set_updated_at();

-- CATEGORIES
CREATE TABLE categories (
    id          BIGSERIAL PRIMARY KEY,
    name        TEXT    NOT NULL,
    slug        TEXT    NOT NULL,
    description TEXT,
    parent_id   BIGINT  REFERENCES categories(id),
    created_at  TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at  TIMESTAMPTZ NOT NULL DEFAULT NOW(),

    CONSTRAINT uq_categories_slug UNIQUE (slug)
);
CREATE TRIGGER trg_categories_ua BEFORE UPDATE ON categories FOR EACH ROW EXECUTE FUNCTION set_updated_at();

-- TAGS
CREATE TABLE tags (
    id         BIGSERIAL PRIMARY KEY,
    name       TEXT NOT NULL,
    slug       TEXT NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),

    CONSTRAINT uq_tags_slug UNIQUE (slug)
);

-- POSTS
CREATE TABLE posts (
    id              BIGSERIAL PRIMARY KEY,
    author_id       BIGINT      NOT NULL REFERENCES authors(id),
    title           TEXT        NOT NULL,
    slug            TEXT        NOT NULL,
    excerpt         TEXT,
    content         TEXT,
    status          TEXT        NOT NULL DEFAULT 'draft',
    views           BIGINT      NOT NULL DEFAULT 0,
    is_featured     BOOLEAN     NOT NULL DEFAULT FALSE,
    extra           JSONB       NOT NULL DEFAULT '{}',
    search_vector   TSVECTOR    GENERATED ALWAYS AS (
                        TO_TSVECTOR('english',
                            COALESCE(title, '') || ' ' || COALESCE(content, ''))
                    ) STORED,
    published_at    TIMESTAMPTZ,
    created_at      TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at      TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    deleted_at      TIMESTAMPTZ,

    CONSTRAINT uq_posts_slug   UNIQUE (slug),
    CONSTRAINT chk_post_status CHECK (status IN ('draft','published','archived','scheduled'))
);
CREATE INDEX idx_posts_author     ON posts(author_id);
CREATE INDEX idx_posts_published  ON posts(published_at DESC) WHERE status = 'published' AND deleted_at IS NULL;
CREATE INDEX idx_posts_featured   ON posts(published_at DESC) WHERE is_featured AND status = 'published';
CREATE INDEX idx_posts_search     ON posts USING GIN(search_vector);
CREATE INDEX idx_posts_extra      ON posts USING GIN(extra);
CREATE TRIGGER trg_posts_ua BEFORE UPDATE ON posts FOR EACH ROW EXECUTE FUNCTION set_updated_at();

-- JUNCTION TABLES
CREATE TABLE post_categories (
    post_id     BIGINT NOT NULL REFERENCES posts(id)      ON DELETE CASCADE,
    category_id BIGINT NOT NULL REFERENCES categories(id) ON DELETE CASCADE,
    PRIMARY KEY (post_id, category_id)
);
CREATE INDEX idx_post_categories_cat ON post_categories(category_id);

CREATE TABLE post_tags (
    post_id BIGINT NOT NULL REFERENCES posts(id) ON DELETE CASCADE,
    tag_id  BIGINT NOT NULL REFERENCES tags(id)  ON DELETE CASCADE,
    PRIMARY KEY (post_id, tag_id)
);
CREATE INDEX idx_post_tags_tag ON post_tags(tag_id);

-- COMMENTS
CREATE TABLE comments (
    id          BIGSERIAL PRIMARY KEY,
    post_id     BIGINT  NOT NULL REFERENCES posts(id) ON DELETE CASCADE,
    parent_id   BIGINT  REFERENCES comments(id) ON DELETE CASCADE,
    author_name TEXT    NOT NULL,
    author_email TEXT,
    body        TEXT    NOT NULL,
    is_approved BOOLEAN NOT NULL DEFAULT FALSE,
    created_at  TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at  TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_comments_post     ON comments(post_id) WHERE is_approved;
CREATE INDEX idx_comments_parent   ON comments(parent_id);
CREATE TRIGGER trg_comments_ua BEFORE UPDATE ON comments FOR EACH ROW EXECUTE FUNCTION set_updated_at();
```

***

## Bridging to ORMs

At this point, you understand what's happening "under the hood" when an ORM like Prisma generates SQL.

```typescript
// Prisma Query
const posts = await prisma.post.findMany({
  where: { status: 'published' },
  include: { author: true },
  orderBy: { publishedAt: 'desc' },
  take: 20
});

// Maps exactly to:
// SELECT p.*, a.*
// FROM posts p
// JOIN authors a ON p.author_id = a.id
// WHERE p.status = 'published'
// ORDER BY p.published_at DESC
// LIMIT 20;
```

Knowing the SQL behind ORM calls means you can:

* Spot N+1 query problems in ORM code
* Know when to use raw SQL for complex queries
* Understand and design migrations correctly
* Debug performance issues intelligently

The ORM 101 series covers Prisma in depth — you're now fully prepared for it.

***

## What I Learned Designing Databases

Looking back at that first five-column `posts` table, the distance travelled is remarkable. Here are the lessons that mattered most:

1. **Design for your queries first** — EXPLAIN ANALYZE before you ship any query
2. **Name things consistently** — you'll thank yourself in 6 months
3. **Choose types precisely** — `TIMESTAMPTZ` not `TIMESTAMP`, `NUMERIC` not `FLOAT` for money
4. **Migrations are not optional** — every schema change needs a migration script
5. **Transactions for multi-step operations** — no exceptions
6. **Index foreign keys and WHERE columns** — lean on `pg_stat_statements` to find what else needs indexing
7. **JSONB for genuinely flexible data** — not as a replacement for proper columns
8. **Test your backups** — a backup you haven't tested doesn't exist
9. **Least privilege always** — your application user should never be superuser
10. **Monitor, then optimise** — never guess; measure

***

## Your Database Journey Continues

You've completed the Database 101 series. Here's where to go next:

### Immediate Next Steps

* [**ORM 101 →**](https://blog.htunnthuthu.com/getting-started/programming/orm-101) — Prisma with PostgreSQL: type-safe database access from Node.js/TypeScript
* **Advanced PostgreSQL** — Window functions at scale, logical replication, pg\_partman for partition management

### Intermediate Topics

* **Database Administration**: `pgAdmin`, `pg_activity`, connection pooling at scale
* **Performance Tuning**: `auto_explain`, `pg_stat_statements` dashboards, Grafana + Prometheus for PostgreSQL metrics
* **High Availability**: Patroni for automated failover, pgBackRest for enterprise backup management

### Advanced Topics

* **PostgreSQL Logical Replication**: replicate specific tables to downstream systems
* **Time-Series Data**: TimescaleDB as a PostgreSQL extension
* **Multi-Tenant Architectures**: Schema-per-tenant vs RLS-based tenancy
* **NoSQL Comparison**: When MongoDB, Redis, or Cassandra make more sense than PostgreSQL

***

*You started with a spreadsheet. You now have the knowledge to build production-grade data systems. That's a big deal.*

**←** [**Previous: Backup and Recovery**](https://blog.htunnthuthu.com/getting-started/programming/database-101/database-101-backup-and-recovery)

***

*Part of the* [*Database 101 Series*](https://blog.htunnthuthu.com/getting-started/programming/database-101)
