Advanced Queries and Joins

My journey from simple SELECTs to powerful multi-table mastery


Table of Contents


Introduction: The Query That Made Me Realise I Knew Nothing

I thought I had SQL figured out. I could do SELECT, INSERT, UPDATE, DELETE. I knew WHERE and GROUP BY. My blog was running, queries were returning data. Life was good.

Then a user asked: "Can you show me the top 5 most commented posts from the last 30 days, but only from authors who have published at least 3 posts this year, ranked by a combination of comments and views?"

I sat down to write the query and froze. I could see the data in my headβ€”it was all there in the databaseβ€”but I had no idea how to express that question in SQL.

I cobbled together something with five separate queries, pulled the results into Python, and filtered them manually. It worked. It was also 200 lines of code and took 3 seconds to run.

A senior engineer on my team looked at it and rewrote it as a single, elegant 20-line SQL query using CTEs and window functions. It ran in 12 milliseconds.

That was the day I committed to learning advanced SQL. This article is everything I wish someone had shown me back then.


Recap: JOIN Types Refresher

Before diving in, let's quickly visualise the JOIN types we covered in the previous article:

spinner

We have the following blog tables to work with throughout this article:


Advanced JOIN Techniques

Self JOIN

A self join joins a table to itself. Useful when rows have parent-child relationships.

Imagine our authors table has a mentor_id column pointing to another author:

Bob is a mentor but has no mentor himself (senior author).

Multiple JOINs in One Query

Real queries often join 3, 4, or more tables:

Key insight: Use LEFT JOIN for optional relationships (a post may have no categories or no comments). Use INNER JOIN when the relationship is mandatory.

LATERAL JOIN

LATERAL allows a subquery on the right-hand side to reference columns from tables on the left. Think of it as a correlated subquery in a JOIN position:

Without LATERAL, this would require a complex window function or multiple queries. LATERAL makes it elegant.


Subqueries: Queries Within Queries

Scalar Subquery

Returns a single value, usable anywhere an expression is expected:

⚠️ Performance note: Correlated scalar subqueries execute once per row. For large tables, use JOINs or CTEs instead.

IN and EXISTS Subqueries

EXISTS is generally faster than IN because it short-circuits on the first match.

Derived Tables (Subquery in FROM)

Derived tables are great but can only be referenced once. CTEs are better for reuse.


Common Table Expressions (CTEs)

CTEs (introduced with WITH) make complex queries readable and allow reuse of intermediate results.

Basic CTE

Chained CTEs

You can reference earlier CTEs in later ones:

Recursive CTEs

Powerful for hierarchical data (categories with subcategories, comment threads, org charts):


Window Functions

Window functions perform calculations across a set of rows related to the current rowβ€”without collapsing them into groups like GROUP BY does.

spinner

RANK and ROW_NUMBER

Difference: RANK() gives ties the same rank (1, 1, 3...). ROW_NUMBER() always gives unique sequential numbers (1, 2, 3...). DENSE_RANK() gives ties the same rank without gaps (1, 1, 2...).

Running Totals and Moving Averages

LAG and LEAD

Access values from previous or next rows:

FIRST_VALUE and LAST_VALUE


UNION, INTERSECT, and EXCEPT

These set operators combine results from multiple SELECT statements.

Rules for set operators:

  • Same number of columns in both SELECT statements

  • Corresponding columns must have compatible data types

  • Column names come from the first SELECT


Advanced Filtering and Aggregation

FILTER Clause

Apply aggregate functions conditionally without pivoting or CASE expressions:

GROUPING SETS, ROLLUP, CUBE

Generate multiple levels of aggregation in one query:

ROLLUP(a, b) generates: (a, b), (a), () β€” subtotals and grand total. CUBE(a, b) generates all combinations: (a, b), (a), (b), ().


Full-Text Search in PostgreSQL

PostgreSQL has built-in full-text search that's far more powerful than LIKE:


Query Patterns for the Blog System

Pattern 1: Dashboard Overview

Pattern 2: Author Leaderboard


Common Mistakes and How to Avoid Them

1. N+1 Query Problem

❌ Wrong: Fetching posts then querying comments for each post individually (N+1 queries).

βœ… Right: Use a JOIN or a single IN query:

2. Forgetting NULL Handling

3. HAVING vs WHERE Confusion

4. Implicit vs Explicit JOINs


Performance Implications

Technique
Performance Notes

EXISTS vs IN

EXISTS is usually faster for large subqueries

CTE vs Subquery

PostgreSQL 12+ optimises CTEs like subqueries by default

Window Functions

Efficient but require full sort; add appropriate indexes

UNION ALL vs UNION

UNION ALL is faster (skips deduplication)

Recursive CTE

Can be slow for deep hierarchies; add depth limits

Full-text search

Requires GIN index; LIKE '%text%' doesn't use B-tree indexes

Always verify with EXPLAIN ANALYZE:


What I Learned About Advanced Queries

Looking back at that original complex question that stumped me, I now know exactly how to approach it:

  1. Break it down into CTEs β€” each CTE answers one sub-question

  2. Use window functions for ranking and running calculations without collapsing rows

  3. Combine set operations for union/difference logic

  4. Validate with EXPLAIN ANALYZE before deploying to production

The biggest shift was realising that SQL is a declarative languageβ€”you describe what you want, not how to get it. Once I stopped thinking in loops and started thinking in sets and relations, complex queries became intuitive.

Key principles:

  • βœ… Favour JOINs over correlated subqueries for performance

  • βœ… Use CTEs to make queries readable and maintainable

  • βœ… EXISTS beats IN for large datasets

  • βœ… Window functions replace many application-level calculations

  • βœ… Always test with EXPLAIN ANALYZE


Next Steps

With advanced queries under your belt, the next frontier is Indexes and Performanceβ€”understanding how PostgreSQL finds your data and how to make it dramatically faster.


Part of the Database 101 Series

Last updated