Article 2: pgvector on PostgreSQL — Setup, Vector Types, and Indexes
Introduction
pgvector is a PostgreSQL extension that adds a native vector data type and similarity search operators. It's the difference between needing a separate vector database service and storing your embeddings in the same PostgreSQL instance you're already running.
This article covers installing pgvector, the vector column type, the two index types (HNSW and IVFFlat), and the SQLAlchemy 2 async schema I use in the RAG service, along with which index I chose and why.
Installing pgvector
Docker Compose (what I run)
The easiest setup is pgvector/pgvector:pg16 — a pre-built image with the extension already installed:
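A minimal Compose file looks something like this. The service name, credentials, and volume name are placeholders, not the values from my actual setup:

```yaml
services:
  postgres:
    image: pgvector/pgvector:pg16   # PostgreSQL 16 with pgvector pre-installed
    environment:
      POSTGRES_USER: rag
      POSTGRES_PASSWORD: rag
      POSTGRES_DB: rag
    ports:
      - "5432:5432"
    volumes:
      - pgdata:/var/lib/postgresql/data

volumes:
  pgdata:
```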
Once the container starts, enable the extension in the target database:
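Enabling the extension is a single statement, run as a superuser (or a role with the CREATE privilege) in the target database:

```sql
CREATE EXTENSION IF NOT EXISTS vector;
```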
Alembic handles this in the initial migration (covered below), so I don't run it manually.
On an existing PostgreSQL instance
If you're adding pgvector to an existing PostgreSQL 16 installation on Debian/Ubuntu:
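Assuming the PGDG apt repository is already configured, the package install is:

```bash
sudo apt install postgresql-16-pgvector
```

After that, the same CREATE EXTENSION statement enables it per database.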
The vector Data Type
vector(n) stores a fixed-dimension floating-point array. The dimension must match the embedding model you use:
| Model | Dimension |
| --- | --- |
| all-MiniLM-L6-v2 (sentence-transformers) | 384 |
| text-embedding-3-small (OpenAI / GitHub Models) | 1536 |
| text-embedding-3-large (OpenAI / GitHub Models) | 3072 |
| togethercomputer/m2-bert-80M-8k-retrieval | 768 |
The dimension is fixed at column definition time. In the RAG service I use 384 (local model) by default and make it configurable via an environment variable for when I want to switch to an API-based model.
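Declaring a vector column is ordinary DDL; the table name here is just for illustration. Inserting a vector with the wrong number of dimensions raises an error rather than silently truncating:

```sql
CREATE TABLE example_embeddings (
    id bigserial PRIMARY KEY,
    embedding vector(384)  -- must match the embedding model's output dimension
);
```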
Similarity Operators
pgvector provides three distance operators:
| Operator | Distance | Use when |
| --- | --- | --- |
| `<=>` | Cosine distance | Most embedding models (normalized vectors) |
| `<->` | Euclidean (L2) distance | Vectors are not normalized |
| `<#>` | Negative inner product | Using dot-product similarity |
For sentence-transformer embeddings I always use cosine distance (<=>). The models produce normalized vectors, and cosine similarity correctly captures directional closeness (same meaning) vs. magnitude.
1 - cosine_distance converts to cosine similarity (higher = more similar). I return this as a score in the API response so callers can apply a minimum-similarity threshold.
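A search query along these lines illustrates the conversion; :query_embedding stands in for the bound query vector, and the chunks table is described in the schema section below:

```sql
SELECT id,
       content,
       1 - (embedding <=> :query_embedding) AS score  -- cosine similarity
FROM chunks
ORDER BY embedding <=> :query_embedding               -- ascending distance = best first
LIMIT 5;
```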
Index Types: HNSW vs IVFFlat
Without an index, pgvector does an exact sequential scan of every row. For a few thousand chunks this is fast enough, but with tens of thousands of chunks (or more) you want an approximate nearest-neighbor (ANN) index.
HNSW (Hierarchical Navigable Small World)
Builds a multi-layer graph. Query traverses from the top layer down to approximate neighbors.
Pros: High recall, fast queries, no need to specify the number of clusters up front
Cons: Higher memory usage during build; INSERT is slower than IVFFlat because the graph must be updated on every new row
Best for: Real-time ingestion with frequent inserts
IVFFlat (Inverted File with Flat Quantization)
Clusters vectors into lists Voronoi cells at build time; queries search probes of those cells.
Pros: Faster build, lower memory usage
Cons: Must be rebuilt when the data distribution changes significantly; needs existing data to build useful centroids
Best for: Batch-ingested, relatively static corpora
I use HNSW. My ingestion pipeline runs incrementally — new articles are added continuously. HNSW handles inserts cleanly without requiring a full index rebuild. The memory overhead is not a concern on any machine running PostgreSQL.
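The DDL for both index types, with the defaults spelled out (index names are my own placeholders):

```sql
-- HNSW with cosine distance; m and ef_construction shown at their defaults
CREATE INDEX idx_chunks_embedding_hnsw
    ON chunks USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64);

-- IVFFlat equivalent; needs existing rows to compute useful centroids
-- CREATE INDEX idx_chunks_embedding_ivfflat
--     ON chunks USING ivfflat (embedding vector_cosine_ops)
--     WITH (lists = 100);
```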
Database Schema for RAG
Three tables: documents, chunks, and an ingestion_jobs tracking table.
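In plain SQL the shape is roughly the following; the exact column list is a sketch, since the authoritative definitions live in the SQLAlchemy models:

```sql
CREATE TABLE documents (
    id         bigserial PRIMARY KEY,
    path       text NOT NULL UNIQUE,
    file_hash  text NOT NULL,          -- SHA-256 of the file content
    created_at timestamptz NOT NULL DEFAULT now()
);

CREATE TABLE chunks (
    id          bigserial PRIMARY KEY,
    document_id bigint NOT NULL REFERENCES documents(id) ON DELETE CASCADE,
    content     text NOT NULL,
    embedding   vector(384)
);

CREATE TABLE ingestion_jobs (
    id          bigserial PRIMARY KEY,
    status      text NOT NULL,
    started_at  timestamptz,
    finished_at timestamptz
);
```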
The file_hash column on documents is how I detect changes. Before re-ingesting a file, the pipeline computes SHA-256 of the current content and compares it against the stored hash. If they match, the file hasn't changed and re-ingestion is skipped.
SQLAlchemy 2 Async Setup
pgvector.sqlalchemy.Vector is the SQLAlchemy column type provided by the pgvector Python package. It maps directly to the vector(n) PostgreSQL type.
Alembic Migration
The initial migration creates the extension and all tables:
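The key detail is ordering: the extension must exist before any vector column is created. A condensed sketch of the upgrade function (column lists abbreviated):

```python
from alembic import op
import sqlalchemy as sa
import pgvector.sqlalchemy


def upgrade() -> None:
    # create the extension before any vector(n) columns reference the type
    op.execute("CREATE EXTENSION IF NOT EXISTS vector")
    op.create_table(
        "documents",
        sa.Column("id", sa.BigInteger, primary_key=True),
        sa.Column("path", sa.Text, nullable=False, unique=True),
        sa.Column("file_hash", sa.Text, nullable=False),
    )
    op.create_table(
        "chunks",
        sa.Column("id", sa.BigInteger, primary_key=True),
        sa.Column("document_id", sa.BigInteger,
                  sa.ForeignKey("documents.id", ondelete="CASCADE")),
        sa.Column("content", sa.Text, nullable=False),
        sa.Column("embedding", pgvector.sqlalchemy.Vector(384)),
    )
```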
Run migrations:
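```bash
alembic upgrade head
```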
Verifying the Setup
After running migrations and starting the PostgreSQL container, verify everything is working:
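A few sanity checks from psql cover the essentials:

```sql
-- extension is installed and at the expected version
SELECT extname, extversion FROM pg_extension WHERE extname = 'vector';

-- tables were created by the migration
SELECT tablename FROM pg_tables WHERE schemaname = 'public';

-- round-trip a vector: cosine distance of identical vectors is 0
SELECT '[1,2,3]'::vector <=> '[1,2,3]'::vector AS distance;
```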
What I Learned
The HNSW index updates automatically on insert, but its build parameters are worth understanding. The defaults m=16, ef_construction=64 are a reasonable starting point. m controls the number of bidirectional links per node: higher means better recall but more memory. ef_construction controls build-time search effort: higher means a better-quality index but slower inserts. For a personal knowledge base I haven't needed to touch these defaults.
ef_search at query time is separate from ef_construction. You can tune recall at query time without rebuilding the index:
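It's a plain configuration parameter, settable per session or per transaction:

```sql
SET hnsw.ef_search = 100;  -- session-level; pgvector's default is 40
```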
Higher ef_search = higher recall at the cost of query latency. I run with 100 because I care more about not missing relevant chunks than about shaving milliseconds off the search.
The full-text index deserves as much thought as the vector index. The gin(to_tsvector('english', content)) index is what makes hybrid search possible. I set it up from the start rather than adding it later, because adding a GIN index to a large table is slow and locks the table briefly.
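The DDL, with an index name of my own choosing:

```sql
CREATE INDEX idx_chunks_content_fts
    ON chunks USING gin (to_tsvector('english', content));
```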
Dimension is not a free variable. When I considered switching from all-MiniLM-L6-v2 (384 dims) to text-embedding-3-small (1536 dims), it required a migration to change the vector(384) column to vector(1536) and re-embed everything. Choosing the embedding model early and sticking with it saves this pain.