# Transactions and ACID

*My journey from data corruption disasters to bulletproof consistency*

***

## Table of Contents

1. [Introduction: The Partial Update That Cost Me a Weekend](#introduction-the-partial-update-that-cost-me-a-weekend)
2. [What Is a Transaction?](#what-is-a-transaction)
3. [ACID Properties Explained](#acid-properties-explained)
4. [Basic Transaction Syntax](#basic-transaction-syntax)
5. [Savepoints: Partial Rollbacks](#savepoints-partial-rollbacks)
6. [Transaction Isolation Levels](#transaction-isolation-levels)
7. [Concurrency Problems and How Isolation Levels Solve Them](#concurrency-problems-and-how-isolation-levels-solve-them)
8. [Locking in PostgreSQL](#locking-in-postgresql)
9. [Deadlocks: What They Are and How to Prevent Them](#deadlocks-what-they-are-and-how-to-prevent-them)
10. [Practical Transaction Patterns](#practical-transaction-patterns)
11. [Advisory Locks for Application-Level Coordination](#advisory-locks-for-application-level-coordination)
12. [Two-Phase Commit (Distributed Transactions)](#two-phase-commit-distributed-transactions)
13. [Common Transaction Mistakes](#common-transaction-mistakes)
14. [What I Learned About Transactions](#what-i-learned-about-transactions)
15. [Next Steps](#next-steps)

***

## Introduction: The Partial Update That Cost Me a Weekend

My blog had a "publish post" workflow: set `status = 'published'`, update `published_at`, and send a notification email. I'd written it as three separate database calls in my Python code:

```python
# My original code - DO NOT DO THIS
db.execute("UPDATE posts SET status = 'published' WHERE id = %s", [post_id])
db.execute("UPDATE posts SET published_at = NOW() WHERE id = %s", [post_id])
send_notification_email(post_id)  # This failed halfway through one night
```

One night, the email server was unavailable. The `send_notification_email` call threw an exception. The post was now `published` with a `published_at` timestamp, but no notification was sent. Fine—I could resend the email.

But then I noticed something worse: the email server failure had caused my code to exit before updating a `post_metadata` table. Twenty posts had their `status` as published in one table but their metadata was out of sync. My homepage query joined both tables and returned broken results.

I spent a weekend writing a repair script to find and fix all the inconsistent rows.

The fix was simple: **transactions**. All related operations should either **all succeed** or **all fail**. There's no acceptable in-between.

***

## What Is a Transaction?

A transaction is a sequence of SQL operations treated as a single unit of work. It either commits (all changes applied) or rolls back (all changes undone).

{% @mermaid/diagram content="sequenceDiagram
participant App
participant DB

```
App->>DB: BEGIN
App->>DB: UPDATE posts SET status='published'
App->>DB: UPDATE post_metadata SET ...
App->>DB: INSERT INTO notifications ...

alt All succeed
    App->>DB: COMMIT
    DB-->>App: All changes permanent
else Any failure
    App->>DB: ROLLBACK
    DB-->>App: All changes undone
end" %}
```

In PostgreSQL, **every SQL statement is implicitly wrapped in a transaction** unless you use explicit `BEGIN`/`COMMIT`. A single `UPDATE` without `BEGIN` is auto-committed immediately.

***

## ACID Properties Explained

ACID is the gold standard for reliable database transactions. Each property addresses a specific failure scenario.

### Atomicity: All or Nothing

Either all operations in a transaction succeed, or none of them do.

```sql
BEGIN;

UPDATE posts SET status = 'published', published_at = NOW()
WHERE id = 42;

UPDATE authors SET post_count = post_count + 1
WHERE id = (SELECT author_id FROM posts WHERE id = 42);

INSERT INTO activity_log (action, entity_id, entity_type, created_at)
VALUES ('post_published', 42, 'post', NOW());

COMMIT;  -- All three happen together, or none do
```

If any statement fails between `BEGIN` and `COMMIT`, PostgreSQL automatically rolls back all prior changes in that transaction.

### Consistency: Valid State to Valid State

A transaction moves the database from one valid state to another. Constraints (NOT NULL, FOREIGN KEY, UNIQUE, CHECK) are enforced at commit time.

```sql
BEGIN;

-- This would violate a NOT NULL constraint on published_at
UPDATE posts SET published_at = NULL WHERE id = 42;

COMMIT;  -- ERROR: null value in column "published_at" violates NOT NULL constraint
-- Entire transaction rolled back
```

### Isolation: Concurrent Transactions Don't See Each Other's In-Progress Work

Two concurrent users editing different posts won't interfere with each other. The database behaves as if transactions run serially (to the degree specified by the isolation level).

### Durability: Committed Data Survives Crashes

Once PostgreSQL returns "COMMIT", the data is persisted to disk. A power failure immediately after won't lose committed data—PostgreSQL uses a Write-Ahead Log (WAL) to guarantee this.

***

## Basic Transaction Syntax

```sql
-- Explicit transaction
BEGIN;
    -- Your SQL statements here
COMMIT;

-- Equivalent syntax
START TRANSACTION;
    -- Your SQL statements here
COMMIT;

-- Rollback on error
BEGIN;
    UPDATE posts SET views = views + 1 WHERE id = 42;
    UPDATE analytics SET total_views = total_views + 1;
    -- Simulated error
    SELECT 1/0;  -- Division by zero
ROLLBACK;  -- Or auto-rolled back on error in most clients
```

### Transaction in psql

```sql
-- In psql, check if you're in a transaction
\echo :AUTOCOMMIT

-- Disable autocommit
\set AUTOCOMMIT off

-- All statements now require explicit COMMIT
UPDATE posts SET views = views + 1 WHERE id = 1;
-- <nothing saved yet>
COMMIT;
-- <saved now>
```

***

## Savepoints: Partial Rollbacks

Savepoints let you roll back to a specific point within a transaction without losing earlier work:

```sql
BEGIN;

-- Step 1: publish the post
UPDATE posts SET status = 'published', published_at = NOW() WHERE id = 42;

SAVEPOINT post_published;

-- Step 2: try to send notification (might fail)
INSERT INTO notifications (user_id, message, post_id)
SELECT follower_id, 'New post published', 42
FROM followers
WHERE author_id = (SELECT author_id FROM posts WHERE id = 42);

-- If step 2 fails, roll back only to the savepoint (keep step 1)
-- ROLLBACK TO SAVEPOINT post_published;

SAVEPOINT notifications_sent;

-- Step 3: update analytics
UPDATE post_stats SET status_changed_at = NOW() WHERE post_id = 42;

COMMIT;  -- All three steps committed
```

This is useful in batch processing—you can retry individual items without losing the entire batch.

***

## Transaction Isolation Levels

PostgreSQL supports four isolation levels, each with different trade-offs between consistency and concurrency:

| Isolation Level          | Dirty Read  | Non-Repeatable Read | Phantom Read  |
| ------------------------ | ----------- | ------------------- | ------------- |
| READ UNCOMMITTED         | ✅ Possible  | ✅ Possible          | ✅ Possible    |
| READ COMMITTED (default) | ❌ Prevented | ✅ Possible          | ✅ Possible    |
| REPEATABLE READ          | ❌ Prevented | ❌ Prevented         | ❌ Prevented\* |
| SERIALIZABLE             | ❌ Prevented | ❌ Prevented         | ❌ Prevented   |

\*PostgreSQL's MVCC prevents phantom reads at REPEATABLE READ level.

### Setting Isolation Level

```sql
-- For a single transaction
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
    SELECT * FROM posts WHERE status = 'published';
    -- ... do work ...
COMMIT;

-- For the current session
SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL SERIALIZABLE;
```

### When to Use Each Level

* **READ COMMITTED** (default): Suitable for most OLTP workloads
* **REPEATABLE READ**: Reporting/analytics queries that must see a consistent snapshot
* **SERIALIZABLE**: Financial operations, booking systems, anything requiring strict correctness

***

## Concurrency Problems and How Isolation Levels Solve Them

### Dirty Read

Reading uncommitted data from another transaction:

```sql
-- Transaction A (not yet committed)
BEGIN;
UPDATE posts SET views = 9999 WHERE id = 1;
-- Not committed yet

-- Transaction B (READ UNCOMMITTED would see 9999)
SELECT views FROM posts WHERE id = 1;
-- READ COMMITTED sees the original value (PostgreSQL's default)
```

PostgreSQL's MVCC prevents dirty reads even at READ UNCOMMITTED—they're effectively impossible in PostgreSQL.

### Non-Repeatable Read

Reading the same row twice in one transaction and getting different values:

```sql
-- Transaction A (READ COMMITTED level)
BEGIN;
SELECT views FROM posts WHERE id = 1;  -- Returns 100

-- Transaction B commits a change
UPDATE posts SET views = 200 WHERE id = 1;
COMMIT;

-- Transaction A reads again
SELECT views FROM posts WHERE id = 1;  -- Now returns 200 (non-repeatable!)
COMMIT;

-- Solution: Use REPEATABLE READ or SERIALIZABLE
```

### Phantom Read

A re-query returns new rows that didn't exist before:

```sql
-- Transaction A (REPEATABLE READ or lower)
BEGIN;
SELECT COUNT(*) FROM posts WHERE status = 'published';  -- Returns 50

-- Transaction B inserts a new published post and commits
INSERT INTO posts (...) VALUES (...);
COMMIT;

-- Transaction A counts again
SELECT COUNT(*) FROM posts WHERE status = 'published';  -- Still 50 with REPEATABLE READ
-- With READ COMMITTED would show 51 (phantom!)
COMMIT;
```

***

## Locking in PostgreSQL

PostgreSQL uses MVCC (Multi-Version Concurrency Control) to allow readers and writers to coexist without blocking each other. But explicit locks are sometimes needed.

### Row-Level Locks

```sql
-- SELECT FOR UPDATE: lock rows for update (block other updates)
BEGIN;
SELECT * FROM posts WHERE id = 42 FOR UPDATE;
-- Other transactions trying to update post 42 will wait
UPDATE posts SET views = views + 1 WHERE id = 42;
COMMIT;

-- SELECT FOR UPDATE SKIP LOCKED: skip rows locked by others
-- Useful for job queues
BEGIN;
SELECT id, title
FROM posts
WHERE status = 'pending_review'
ORDER BY created_at
LIMIT 1
FOR UPDATE SKIP LOCKED;  -- Skip posts being reviewed by someone else
COMMIT;

-- SELECT FOR SHARE: allow other readers but block writers
BEGIN;
SELECT * FROM posts WHERE id = 42 FOR SHARE;
COMMIT;
```

### Table-Level Locks

```sql
-- Explicit table lock (rarely needed; MVCC handles most cases)
BEGIN;
LOCK TABLE posts IN SHARE ROW EXCLUSIVE MODE;
-- ... operations ...
COMMIT;
```

PostgreSQL automatically acquires appropriate locks for `INSERT`, `UPDATE`, `DELETE`.

***

## Deadlocks: What They Are and How to Prevent Them

A deadlock occurs when two transactions each hold a lock the other needs:

{% @mermaid/diagram content="sequenceDiagram
participant T1 as Transaction 1
participant PostA as posts (id=1)
participant PostB as posts (id=2)
participant T2 as Transaction 2

```
T1->>PostA: LOCK (id=1)
T2->>PostB: LOCK (id=2)
T1->>PostB: Wait for lock...
T2->>PostA: Wait for lock...
Note over T1,T2: DEADLOCK! PostgreSQL detects and kills one transaction" %}
```

```sql
-- Transaction 1                    -- Transaction 2
BEGIN;                               BEGIN;
UPDATE posts SET views=1 WHERE id=1; UPDATE posts SET views=2 WHERE id=2;
-- holds lock on id=1               -- holds lock on id=2
UPDATE posts SET views=1 WHERE id=2; UPDATE posts SET views=2 WHERE id=1;
-- waiting for id=2...              -- waiting for id=1...
-- DEADLOCK DETECTED
-- PostgreSQL kills one transaction with:
-- ERROR: deadlock detected
```

### Preventing Deadlocks

1. **Always acquire locks in the same order** across your application:

```sql
-- Always lock posts in ascending ID order
BEGIN;
SELECT * FROM posts WHERE id IN (1, 2) ORDER BY id FOR UPDATE;
-- Process both
COMMIT;
```

2. **Use `NOWAIT` to fail immediately** rather than waiting:

```sql
BEGIN;
SELECT * FROM posts WHERE id = 42 FOR UPDATE NOWAIT;
-- ERROR: could not obtain lock on row in relation "posts"
-- Handle the error in application code, retry later
COMMIT;
```

3. **Keep transactions short** to reduce lock contention windows.

***

## Practical Transaction Patterns

### Publishing a Post (Atomically)

```sql
BEGIN;

-- Update post status
UPDATE posts
SET status = 'published', published_at = NOW()
WHERE id = $1 AND status = 'draft'
RETURNING id, author_id, title;

-- Update author's published post count
UPDATE authors
SET published_post_count = published_post_count + 1
WHERE id = (SELECT author_id FROM posts WHERE id = $1);

-- Create activity log entry
INSERT INTO activity_log (action, entity_type, entity_id, created_at)
VALUES ('post_published', 'post', $1, NOW());

COMMIT;
```

### Safe Counter Increment (No Race Condition)

```sql
-- Wrong: read-modify-write in application code (race condition!)
-- value = SELECT views FROM posts WHERE id = 42;  -- 100
-- UPDATE posts SET views = value + 1 WHERE id = 42;  -- might conflict

-- Right: let the database do it atomically
UPDATE posts SET views = views + 1 WHERE id = 42;
-- PostgreSQL serialises concurrent updates to the same row automatically
```

### Transfer/Move Between Entities

```sql
BEGIN;

-- Move a post to a different author
INSERT INTO post_history (post_id, old_author_id, new_author_id, changed_at)
SELECT id, author_id, $new_author_id, NOW()
FROM posts
WHERE id = $post_id;

UPDATE posts
SET author_id = $new_author_id
WHERE id = $post_id;

UPDATE authors SET post_count = post_count - 1 WHERE id = $old_author_id;
UPDATE authors SET post_count = post_count + 1 WHERE id = $new_author_id;

COMMIT;
```

***

## Advisory Locks for Application-Level Coordination

Advisory locks are PostgreSQL locks that your application controls directly—not tied to any specific table row:

```sql
-- Acquire a session-level advisory lock (number is arbitrary application key)
SELECT pg_advisory_lock(42);

-- ... exclusive operation (e.g., scheduled job) ...

-- Release it
SELECT pg_advisory_unlock(42);

-- Try-lock (non-blocking): returns true if locked, false if already taken
SELECT pg_try_advisory_lock(42);

-- Transaction-level (auto-released on commit/rollback)
BEGIN;
SELECT pg_advisory_xact_lock(42);
-- ... work ...
COMMIT;  -- lock released automatically
```

Useful for distributed scheduled jobs where only one instance should run at a time.

***

## Two-Phase Commit (Distributed Transactions)

For operations spanning multiple databases or servers:

```sql
-- Phase 1: Prepare (writes to WAL but doesn't commit)
BEGIN;
UPDATE posts SET status = 'published' WHERE id = 42;
PREPARE TRANSACTION 'publish_post_42';

-- Phase 2: Commit from coordinator
COMMIT PREPARED 'publish_post_42';

-- Or rollback if something went wrong on another system
ROLLBACK PREPARED 'publish_post_42';

-- View pending prepared transactions
SELECT * FROM pg_prepared_xacts;
```

Two-phase commit is rarely needed in single-database applications but is essential for distributed systems.

***

## Common Transaction Mistakes

### 1. Transactions That Are Too Long

```sql
-- WRONG: opening a transaction, then doing slow operations
BEGIN;
SELECT * FROM posts;  -- fast
-- ... send emails, call external APIs, wait for user input ...
-- 30 seconds later...
COMMIT;
-- Held locks for 30 seconds, blocking other transactions
```

✅ Keep transactions as short as possible. Prepare data outside the transaction, then use a short transaction just for the database writes.

### 2. Forgetting to Handle Errors

If an error occurs inside a transaction in PostgreSQL, the transaction is **aborted**. Any further statements will fail until you `ROLLBACK`:

```sql
BEGIN;
UPDATE posts SET views = views + 1 WHERE id = 42;
SELECT 1/0;  -- ERROR: division by zero
-- Transaction is now ABORTED
UPDATE posts SET views = views + 1 WHERE id = 43;
-- ERROR: current transaction is aborted, commands ignored until end of transaction block
ROLLBACK;  -- Must rollback before doing anything else
```

### 3. Auto-Commit Confusion

In `psql` and most clients, `AUTOCOMMIT` is on by default. This means every statement is its own transaction:

```sql
-- With AUTOCOMMIT=on (default in psql)
UPDATE posts SET status = 'published' WHERE id = 42;
-- Committed immediately! No chance to rollback.

-- To use explicit transactions, either:
-- 1. Use BEGIN...COMMIT
-- 2. Or set \set AUTOCOMMIT off in psql
```

***

## What I Learned About Transactions

The weekend I spent fixing my inconsistent database taught me that **transactions are not optional**. Any operation that modifies more than one row or more than one table must be wrapped in a transaction.

My mental checklist now:

**Always use a transaction when:**

* [ ] Updating multiple related tables
* [ ] The failure of step 2 would leave step 1's changes in an invalid state
* [ ] Processing a batch of records where partial success is worse than total failure
* [ ] Implementing a workflow with multiple steps

**Best practices:**

* ✅ Keep transactions short—do work outside, commit quickly
* ✅ Use SAVEPOINTS for complex batch operations with partial rollback needs
* ✅ Use `SELECT ... FOR UPDATE` to prevent lost updates
* ✅ Use `SKIP LOCKED` for queue processing patterns
* ✅ Always acquire locks in a consistent order to prevent deadlocks
* ❌ Never hold a transaction open waiting for user input or external API calls

***

## Next Steps

With data integrity secured through transactions, let's explore **Views, Functions, and Stored Procedures**—powerful tools for encapsulating and reusing database logic.

* [**→ Next: Views, Functions, and Stored Procedures**](https://blog.htunnthuthu.com/getting-started/programming/database-101/database-101-views-functions-stored-procedures)
* [**← Previous: Indexes and Performance**](https://blog.htunnthuthu.com/getting-started/programming/database-101/database-101-indexes-and-performance)

***

*Part of the* [*Database 101 Series*](https://blog.htunnthuthu.com/getting-started/programming/database-101)
