Indexes and Performance

My journey from 45-second page loads to sub-100ms responses

Introduction: The Day My Blog Became Unusable

It was a Saturday morning. I had just crossed 50,000 rows in my posts table after three years of writing. I opened my blog's admin panel to review comments, and nothing loaded. After 30 seconds, a timeout error.

I restarted the server. Same thing. Cleared the application cache. Still broken.

I connected directly to the database and ran the exact query my admin panel used:

SELECT p.*, a.name AS author_name, COUNT(c.id) AS comment_count
FROM posts p
JOIN authors a ON p.author_id = a.id
LEFT JOIN comments c ON p.id = c.post_id
WHERE p.status = 'published'
GROUP BY p.id, a.name
ORDER BY p.published_at DESC
LIMIT 20;

It returned… in 43 seconds.

The same query had been running in 80ms when I had 5,000 posts. Nobody told me databases slow down as they grow. Nobody told me I needed to design for scale.

Two hours later, after adding the right indexes, the same query ran in 4ms.

This article is the guide I wish had existed that Saturday morning.

What Is an Index?

An index is a separate data structure that PostgreSQL maintains alongside your table. It's like the index at the back of a textbook—instead of reading every page to find "ACID properties", you look it up in the index and jump directly to page 347.

Trade-off: Indexes speed up reads but slow down writes (INSERT, UPDATE, DELETE), because PostgreSQL must update the index alongside the table. Choose indexes that serve your actual query patterns.

How PostgreSQL Finds Data Without an Index

Without an index, PostgreSQL performs a Sequential Scan (seq scan): it reads every single row in the table from start to finish, checks if the row matches the WHERE condition, and either includes or discards it.

For a table with 50,000 posts:

Sequential scan: read all 50,000 rows
Your query matches 12,000 published posts
PostgreSQL discarded 38,000 rows it didn't need

With an index on status:

Index lookup: jump directly to the 12,000 matching rows
No wasted reads

Sequential scans aren't always bad—for small tables or when you're fetching most rows, they can be faster than index scans. PostgreSQL's query planner decides which to use.

B-Tree Indexes: The Default Workhorse

The default index type in PostgreSQL is a B-Tree (Balanced Tree). It handles:

=, <, >, <=, >= comparisons
BETWEEN ranges
IN lists
LIKE 'prefix%' (prefix matches only, not '%suffix')
IS NULL, IS NOT NULL
ORDER BY without a separate sort

-- Creating indexes on the blog schema
CREATE INDEX idx_posts_status ON posts(status);
CREATE INDEX idx_posts_author_id ON posts(author_id);
CREATE INDEX idx_posts_published_at ON posts(published_at);
CREATE INDEX idx_comments_post_id ON comments(post_id);
CREATE INDEX idx_post_categories_post_id ON post_categories(post_id);
CREATE INDEX idx_post_categories_category_id ON post_categories(category_id);

💡 Foreign keys are not automatically indexed in PostgreSQL. Always add indexes on foreign key columns—they're used in almost every JOIN.

Naming Convention

-- Pattern: idx_{table}_{column(s)}
CREATE INDEX idx_posts_status         ON posts(status);
CREATE INDEX idx_posts_author_status  ON posts(author_id, status);

-- Unique indexes
CREATE UNIQUE INDEX uq_authors_email  ON authors(email);

Index Strategies for Real Queries

Always Index Foreign Keys

-- These should exist on every blog schema
CREATE INDEX idx_posts_author_id      ON posts(author_id);
CREATE INDEX idx_comments_post_id     ON comments(post_id);
CREATE INDEX idx_post_categories_post ON post_categories(post_id);
CREATE INDEX idx_post_categories_cat  ON post_categories(category_id);
CREATE INDEX idx_post_tags_post       ON post_tags(post_id);
CREATE INDEX idx_post_tags_tag        ON post_tags(tag_id);

Index Columns Used in WHERE Clauses

Identify the most common filter conditions:

-- We frequently filter by status and sort by published_at
CREATE INDEX idx_posts_status_published ON posts(status, published_at DESC);

Index Columns Used in ORDER BY

Without an index, ORDER BY requires a full sort. With the right index, results arrive pre-sorted:

-- Dashboard: latest published posts (very common query)
-- This index serves both the WHERE and ORDER BY
CREATE INDEX idx_posts_published_at_desc ON posts(published_at DESC)
    WHERE status = 'published';

EXPLAIN and EXPLAIN ANALYZE

EXPLAIN shows the query plan. EXPLAIN ANALYZE actually executes the query and shows real timing.

-- Check query plan without executing
EXPLAIN SELECT * FROM posts WHERE status = 'published';

-- Execute and show real timing
EXPLAIN ANALYZE SELECT * FROM posts WHERE status = 'published';

-- Full verbose output
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT p.title, a.name
FROM posts p
JOIN authors a ON p.author_id = a.id
WHERE p.status = 'published'
ORDER BY p.published_at DESC
LIMIT 20;

Reading EXPLAIN Output

Limit (cost=0.29..1.23 rows=20 width=136) (actual time=0.042..0.198 rows=20 loops=1)
  -> Index Scan using idx_posts_published_at_desc on posts p
       (cost=0.29..2847.13 rows=60521 width=136)
       (actual time=0.040..0.180 rows=20 loops=1)
       Filter: ((status)::text = 'published'::text)
     -> Index Scan using authors_pkey on authors a
          (cost=0.15..0.17 rows=1 width=52) (actual time=0.004..0.004 rows=1 loops=20)
          Index Cond: (id = p.author_id)
Planning Time: 0.312 ms
Execution Time: 0.248 ms

Key things to look for:

Term

Meaning

Seq Scan

⚠️ Reading every row—may need an index

Index Scan

✅ Using an index

Index Only Scan

✅✅ Reading only from index (fastest)

Bitmap Heap Scan

✅ Multiple index matches, then heap lookup

Hash Join

Joining using a hash table

Nested Loop

Joining row by row (good for small inner sets)

Merge Join

Joining pre-sorted data

cost=X..Y

Estimated startup..total cost (arbitrary units)

actual time=X..Y

Real execution time in milliseconds

rows=N

Estimated vs actual row count

When estimated rows ≠ actual rows — your table statistics are stale. Run:

ANALYZE posts;
-- Or update all tables:
ANALYZE;

Partial Indexes: Index Only What You Need

A partial index only includes rows matching a WHERE condition. Smaller, faster, and perfectly targeted:

-- 90% of queries only look at published posts
-- Index only those rows
CREATE INDEX idx_posts_pub_date ON posts(published_at DESC)
    WHERE status = 'published';

-- Only index non-deleted records
CREATE INDEX idx_authors_active ON authors(name)
    WHERE deleted_at IS NULL;

-- Index only recent posts (last 2 years) for the homepage
CREATE INDEX idx_posts_recent ON posts(published_at DESC)
    WHERE status = 'published'
      AND published_at > NOW() - INTERVAL '2 years';

Partial indexes can be 10-100x smaller than full-table indexes while covering 99% of real queries.

Composite Indexes and Column Order

A composite (multi-column) index can serve multiple query patterns, but column order matters enormously.

-- This index:
CREATE INDEX idx_posts_author_status ON posts(author_id, status);

-- Serves these queries:
--   WHERE author_id = 5                        ✅ (leading column match)
--   WHERE author_id = 5 AND status = 'published' ✅ (full index match)

-- But NOT these:
--   WHERE status = 'published'                 ❌ (non-leading column)

The "Equality First, Range Last" Rule

Put equality conditions (=) before range conditions (<, >, BETWEEN) in composite indexes:

-- Query: WHERE status = 'published' AND published_at > '2024-01-01'
-- Good index:
CREATE INDEX idx_posts_status_date ON posts(status, published_at);
--  status uses equality → put first
--  published_at uses range → put second

-- Inefficient index (range column first):
CREATE INDEX idx_posts_date_status ON posts(published_at, status);
-- PostgreSQL can use the range on published_at but can't narrow by status efficiently

Index-Only Scans and Covering Indexes

An index-only scan retrieves data solely from the index without touching the main table. This is extremely fast:

-- Query: SELECT title, published_at FROM posts WHERE status = 'published'
-- If index contains: (status, title, published_at), PostgreSQL never touches the heap

CREATE INDEX idx_posts_cover ON posts(status, published_at DESC)
    INCLUDE (title, author_id);
--  status, published_at in the index key (for seeking)
--  title, author_id INCLUDED (available for index-only scans)

INCLUDE columns (PostgreSQL 11+) don't affect sort order but allow index-only scans for those columns.

The Cost of Indexes

Indexes aren't free. Every index:

Slows down writes: INSERT, UPDATE, DELETE must update each index
Uses disk space: An index can be as large as the table itself
Needs maintenance: VACUUM and AUTOVACUUM spend time on indexes

-- Check index sizes
SELECT
    indexname,
    pg_size_pretty(pg_relation_size(indexname::regclass)) AS size,
    idx_scan        AS times_used,
    idx_tup_read    AS rows_returned
FROM pg_stat_user_indexes
WHERE relname = 'posts'
ORDER BY pg_relation_size(indexname::regclass) DESC;

When NOT to Add an Index

Tables with fewer than ~1,000 rows (seq scan is faster)
Columns with very low cardinality (e.g., boolean or status with 2 values in a small table)
Columns rarely used in WHERE, JOIN, or ORDER BY
Write-heavy tables where index maintenance outweighs read benefit

Query Optimisation Techniques

Avoid Functions on Indexed Columns in WHERE

-- WRONG: function on left side prevents index use
SELECT * FROM posts WHERE LOWER(title) = 'postgresql tips';

-- RIGHT option 1: use a functional index
CREATE INDEX idx_posts_title_lower ON posts(LOWER(title));
SELECT * FROM posts WHERE LOWER(title) = 'postgresql tips'; -- now uses index

-- RIGHT option 2: store data normalised
-- Store title always in lowercase at insert time

Use LIMIT Aggressively

-- Without LIMIT: sorts ALL published posts, returns 20
SELECT * FROM posts WHERE status = 'published' ORDER BY published_at DESC;

-- With LIMIT: PostgreSQL stops after finding 20
SELECT * FROM posts WHERE status = 'published' ORDER BY published_at DESC LIMIT 20;

Avoid `SELECT *` in Production

-- Pulls all columns (including large content TEXT)
SELECT * FROM posts WHERE status = 'published';

-- Only fetch what you need
SELECT id, title, published_at, views FROM posts WHERE status = 'published';

Use Connection Pooling

Individual connections are expensive. Use PgBouncer or application-level pooling:

-- Check current connections
SELECT count(*) FROM pg_stat_activity;
SELECT max_conn, used, res_available
FROM (
    SELECT COUNT(*) used FROM pg_stat_activity
) t,
(
    SELECT setting::int res_available FROM pg_settings WHERE name = 'max_connections'
) s,
(SELECT setting::int max_conn FROM pg_settings WHERE name = 'max_connections') mc;

Monitoring Index Usage

Find Unused Indexes (Index Bloat)

-- Indexes that have never been used since last stats reset
SELECT
    schemaname,
    tablename,
    indexname,
    pg_size_pretty(pg_relation_size(indexname::regclass)) AS size,
    idx_scan AS times_used
FROM pg_stat_user_indexes
WHERE idx_scan = 0
  AND NOT indisprimary              -- not primary key
  AND NOT indisunique               -- not unique constraint
ORDER BY pg_relation_size(indexname::regclass) DESC;

Find Slow Queries

-- Enable pg_stat_statements extension
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Top 10 slowest queries
SELECT
    LEFT(query, 80)          AS query_snippet,
    calls,
    ROUND(mean_exec_time::numeric, 2)  AS avg_ms,
    ROUND(total_exec_time::numeric, 2) AS total_ms
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;

Find Missing Indexes (Sequential Scans on Large Tables)

SELECT
    relname            AS table_name,
    seq_scan,
    seq_tup_read,
    idx_scan,
    n_live_tup         AS row_count
FROM pg_stat_user_tables
WHERE seq_scan > 0
  AND n_live_tup > 10000           -- only tables with significant data
ORDER BY seq_tup_read DESC
LIMIT 20;

High seq_tup_read on a large table = potential missing index.

What I Learned About Performance

The performance incident that broke my blog was entirely avoidable. The checklist I now follow for every new table:

When creating a table:

Add indexes on all foreign key columns
Add indexes on columns used in the most common WHERE clauses
Add composite index if queries frequently filter by 2+ columns together
Create partial index if only a subset of rows is queried

When a query feels slow:

Run EXPLAIN ANALYZE first—never guess
Look for Seq Scan on large tables
Check if estimated vs actual row counts diverge significantly (run ANALYZE)
Add the targeted index
Verify improvement with EXPLAIN ANALYZE again

Golden rules:

✅ Always index foreign keys
✅ Index columns in WHERE, JOIN ON, and ORDER BY
✅ Partial indexes for filtered queries
✅ Use INCLUDE for covering indexes (PostgreSQL 11+)
✅ Monitor with pg_stat_user_indexes and pg_stat_statements
❌ Don't add indexes "just in case"—measure first

Next Steps

With indexes and performance mastered, the next topic is ensuring your data stays consistent even when things go wrong: Transactions and ACID.

Part of the Database 101 Series

PreviousAdvanced Queries and Joins NextTransactions and ACID

Last updated 1 month ago

Indexes and Performance

Table of Contents

Introduction: The Day My Blog Became Unusable

What Is an Index?

How PostgreSQL Finds Data Without an Index

B-Tree Indexes: The Default Workhorse

Naming Convention

Other Index Types in PostgreSQL

Hash Index

GIN Index (Generalised Inverted Index)

GiST Index (Generalised Search Tree)

BRIN Index (Block Range Index)

Index Strategies for Real Queries

Always Index Foreign Keys

Index Columns Used in WHERE Clauses

Index Columns Used in ORDER BY

EXPLAIN and EXPLAIN ANALYZE

Reading EXPLAIN Output

Partial Indexes: Index Only What You Need

Composite Indexes and Column Order

The "Equality First, Range Last" Rule

Index-Only Scans and Covering Indexes

The Cost of Indexes

When NOT to Add an Index

Query Optimisation Techniques

Avoid Functions on Indexed Columns in WHERE

Use LIMIT Aggressively

Avoid `SELECT *` in Production

Use Connection Pooling

Monitoring Index Usage

Find Unused Indexes (Index Bloat)

Find Slow Queries

Find Missing Indexes (Sequential Scans on Large Tables)

What I Learned About Performance

Next Steps

hashtagTable of Contents

hashtagIntroduction: The Day My Blog Became Unusable

hashtagWhat Is an Index?

hashtagHow PostgreSQL Finds Data Without an Index

hashtagB-Tree Indexes: The Default Workhorse

hashtagNaming Convention

hashtagOther Index Types in PostgreSQL

hashtagHash Index

hashtagGIN Index (Generalised Inverted Index)

hashtagGiST Index (Generalised Search Tree)

hashtagBRIN Index (Block Range Index)

hashtagIndex Strategies for Real Queries

hashtagAlways Index Foreign Keys

hashtagIndex Columns Used in WHERE Clauses

hashtagIndex Columns Used in ORDER BY

hashtagEXPLAIN and EXPLAIN ANALYZE

hashtagReading EXPLAIN Output

hashtagPartial Indexes: Index Only What You Need

hashtagComposite Indexes and Column Order

hashtagThe "Equality First, Range Last" Rule

hashtagIndex-Only Scans and Covering Indexes

hashtagThe Cost of Indexes

hashtagWhen NOT to Add an Index

hashtagQuery Optimisation Techniques

hashtagAvoid Functions on Indexed Columns in WHERE

hashtagUse LIMIT Aggressively

hashtagAvoid SELECT * in Production

hashtagUse Connection Pooling

hashtagMonitoring Index Usage

hashtagFind Unused Indexes (Index Bloat)

hashtagFind Slow Queries

hashtagFind Missing Indexes (Sequential Scans on Large Tables)

hashtagWhat I Learned About Performance

hashtagNext Steps

Table of Contents

Introduction: The Day My Blog Became Unusable

What Is an Index?

How PostgreSQL Finds Data Without an Index

B-Tree Indexes: The Default Workhorse

Naming Convention

Other Index Types in PostgreSQL

Hash Index

GIN Index (Generalised Inverted Index)

GiST Index (Generalised Search Tree)

BRIN Index (Block Range Index)

Index Strategies for Real Queries

Always Index Foreign Keys

Index Columns Used in WHERE Clauses

Index Columns Used in ORDER BY

EXPLAIN and EXPLAIN ANALYZE

Reading EXPLAIN Output

Partial Indexes: Index Only What You Need

Composite Indexes and Column Order

The "Equality First, Range Last" Rule

Index-Only Scans and Covering Indexes

The Cost of Indexes

When NOT to Add an Index

Query Optimisation Techniques

Avoid Functions on Indexed Columns in WHERE

Use LIMIT Aggressively

Avoid `SELECT *` in Production

Use Connection Pooling

Monitoring Index Usage

Find Unused Indexes (Index Bloat)

Find Slow Queries

Find Missing Indexes (Sequential Scans on Large Tables)

What I Learned About Performance

Next Steps