Article 2: pgvector on PostgreSQL — Setup, Vector Types, and Indexes
Introduction
pgvector is a PostgreSQL extension that adds a native vector data type and similarity search operators. It's the difference between needing a separate vector database service and storing your embeddings in the same PostgreSQL instance you're already running.
This article covers installing pgvector, the vector column type, the two index types (HNSW and IVFFlat), and the SQLAlchemy 2 async schema I use in the RAG service, along with which index I chose and why.
Installing pgvector
Docker Compose (what I run)
The easiest setup is pgvector/pgvector:pg16 — a pre-built image with the extension already installed:
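A minimal Compose file looks something like this. The service name, credentials, and volume name are placeholders, not the values from my actual setup:

```yaml
services:
  postgres:
    image: pgvector/pgvector:pg16   # PostgreSQL 16 with pgvector pre-installed
    environment:
      POSTGRES_USER: rag
      POSTGRES_PASSWORD: rag
      POSTGRES_DB: rag
    ports:
      - "5432:5432"
    volumes:
      - pgdata:/var/lib/postgresql/data

volumes:
  pgdata:
```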
Once the container starts, enable the extension in the target database:
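Enabling the extension is a single statement, run as a superuser (or a role with the CREATE privilege) in the target database:

```sql
CREATE EXTENSION IF NOT EXISTS vector;
```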
Alembic handles this in the initial migration (covered below), so I don't run it manually.
On an existing PostgreSQL instance
If you're adding pgvector to an existing PostgreSQL 16 installation on Debian/Ubuntu:
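Assuming the PGDG apt repository is already configured, the package install is:

```bash
sudo apt install postgresql-16-pgvector
```

After that, the same CREATE EXTENSION statement enables it per database.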
The vector Data Type
vector(n) stores a fixed-dimension floating-point array. The dimension must match the embedding model you use:
| Model | Dimension |
| --- | --- |
| all-MiniLM-L6-v2 (sentence-transformers) | 384 |
| text-embedding-3-small (OpenAI / GitHub Models) | 1536 |
| text-embedding-3-large (OpenAI / GitHub Models) | 3072 |
| togethercomputer/m2-bert-80M-8k-retrieval | 768 |
The dimension is fixed at column definition time. In the RAG service I use 384 (local model) by default and make it configurable via an environment variable for when I want to switch to an API-based model.
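Declaring a vector column is ordinary DDL; the table name here is just for illustration. Inserting a vector with the wrong number of dimensions raises an error rather than silently truncating:

```sql
CREATE TABLE example_embeddings (
    id bigserial PRIMARY KEY,
    embedding vector(384)  -- must match the embedding model's output dimension
);
```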
Similarity Operators
pgvector provides three distance operators:
| Operator | Distance | Use when |
| --- | --- | --- |
| `<=>` | Cosine distance | Most embedding models (normalized vectors) |
| `<->` | Euclidean (L2) distance | Vectors are not normalized |
| `<#>` | Negative inner product | Using dot-product similarity |
For sentence-transformer embeddings I always use cosine distance (<=>). The models produce normalized vectors, and cosine similarity correctly captures directional closeness (same meaning) vs. magnitude.
1 - cosine_distance converts to cosine similarity (higher = more similar). I return this as a score in the API response so callers can apply a minimum-similarity threshold.
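A search query along these lines illustrates the conversion; :query_embedding stands in for the bound query vector, and the chunks table is described in the schema section below:

```sql
SELECT id,
       content,
       1 - (embedding <=> :query_embedding) AS score  -- cosine similarity
FROM chunks
ORDER BY embedding <=> :query_embedding               -- ascending distance = best first
LIMIT 5;
```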
Index Types: HNSW vs IVFFlat
Without an index, pgvector does an exact sequential scan of every row. For a few thousand chunks this is fast enough, but with tens of thousands of chunks (or more) you want an approximate nearest-neighbor (ANN) index.
HNSW (Hierarchical Navigable Small World)
Builds a multi-layer graph. Query traverses from the top layer down to approximate neighbors.
Pros: High recall, fast queries, no need to specify the number of clusters up front
Cons: Higher memory usage during build; INSERT is slower than IVFFlat because the graph must be updated on every new row
Best for: Real-time ingestion with frequent inserts
IVFFlat (Inverted File with Flat Quantization)
Clusters vectors into lists Voronoi cells at build time; queries search probes of those cells.
Pros: Faster build, lower memory usage
Cons: Must be rebuilt when the data distribution changes significantly; needs existing data to build useful centroids
Best for: Batch-ingested, relatively static corpora
I use HNSW. My ingestion pipeline runs incrementally — new articles are added continuously. HNSW handles inserts cleanly without requiring a full index rebuild. The memory overhead is not a concern on any machine running PostgreSQL.
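The DDL for both index types, with the defaults spelled out (index names are my own placeholders):

```sql
-- HNSW with cosine distance; m and ef_construction shown at their defaults
CREATE INDEX idx_chunks_embedding_hnsw
    ON chunks USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64);

-- IVFFlat equivalent; needs existing rows to compute useful centroids
-- CREATE INDEX idx_chunks_embedding_ivfflat
--     ON chunks USING ivfflat (embedding vector_cosine_ops)
--     WITH (lists = 100);
```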
Database Schema for RAG
Three tables: documents, chunks, and an ingestion_jobs tracking table.
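In plain SQL the shape is roughly the following; the exact column list is a sketch, since the authoritative definitions live in the SQLAlchemy models:

```sql
CREATE TABLE documents (
    id         bigserial PRIMARY KEY,
    path       text NOT NULL UNIQUE,
    file_hash  text NOT NULL,          -- SHA-256 of the file content
    created_at timestamptz NOT NULL DEFAULT now()
);

CREATE TABLE chunks (
    id          bigserial PRIMARY KEY,
    document_id bigint NOT NULL REFERENCES documents(id) ON DELETE CASCADE,
    content     text NOT NULL,
    embedding   vector(384)
);

CREATE TABLE ingestion_jobs (
    id          bigserial PRIMARY KEY,
    status      text NOT NULL,
    started_at  timestamptz,
    finished_at timestamptz
);
```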
The file_hash column on documents is how I detect changes. Before re-ingesting a file, the pipeline computes SHA-256 of the current content and compares it against the stored hash. If they match, the file hasn't changed and re-ingestion is skipped.
SQLAlchemy 2 Async Setup
pgvector.sqlalchemy.Vector is the SQLAlchemy column type provided by the pgvector Python package. It maps directly to the vector(n) PostgreSQL type.
Alembic Migration
The initial migration creates the extension and all tables:
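The key detail is ordering: the extension must exist before any vector column is created. A condensed sketch of the upgrade function (column lists abbreviated):

```python
from alembic import op
import sqlalchemy as sa
import pgvector.sqlalchemy


def upgrade() -> None:
    # create the extension before any vector(n) columns reference the type
    op.execute("CREATE EXTENSION IF NOT EXISTS vector")
    op.create_table(
        "documents",
        sa.Column("id", sa.BigInteger, primary_key=True),
        sa.Column("path", sa.Text, nullable=False, unique=True),
        sa.Column("file_hash", sa.Text, nullable=False),
    )
    op.create_table(
        "chunks",
        sa.Column("id", sa.BigInteger, primary_key=True),
        sa.Column("document_id", sa.BigInteger,
                  sa.ForeignKey("documents.id", ondelete="CASCADE")),
        sa.Column("content", sa.Text, nullable=False),
        sa.Column("embedding", pgvector.sqlalchemy.Vector(384)),
    )
```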
Run migrations:
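```bash
alembic upgrade head
```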
Verifying the Setup
After running migrations and starting the PostgreSQL container, verify everything is working:
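A few sanity checks from psql cover the essentials:

```sql
-- extension is installed and at the expected version
SELECT extname, extversion FROM pg_extension WHERE extname = 'vector';

-- tables were created by the migration
SELECT tablename FROM pg_tables WHERE schemaname = 'public';

-- round-trip a vector: cosine distance of identical vectors is 0
SELECT '[1,2,3]'::vector <=> '[1,2,3]'::vector AS distance;
```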
What I Learned
The HNSW index updates automatically on insert, but its build parameters are worth understanding. The defaults m=16, ef_construction=64 are a reasonable starting point. m controls the number of bidirectional links per node: higher means better recall but more memory. ef_construction controls build-time search effort: higher means a better-quality index but slower inserts. For a personal knowledge base I haven't needed to touch these defaults.
ef_search at query time is separate from ef_construction. You can tune recall at query time without rebuilding the index:
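It's a plain configuration parameter, settable per session or per transaction:

```sql
SET hnsw.ef_search = 100;  -- session-level; pgvector's default is 40
```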
Higher ef_search = higher recall at the cost of query latency. I run with 100 because I care more about not missing relevant chunks than about shaving milliseconds off the search.
The full-text index deserves as much thought as the vector index. The gin(to_tsvector('english', content)) index is what makes hybrid search possible. I set it up from the start rather than adding it later, because adding a GIN index to a large table is slow and locks the table briefly.
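The DDL, with an index name of my own choosing:

```sql
CREATE INDEX idx_chunks_content_fts
    ON chunks USING gin (to_tsvector('english', content));
```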
Dimension is not a free variable. When I considered switching from all-MiniLM-L6-v2 (384 dims) to text-embedding-3-small (1536 dims), it required a migration to change the vector(384) column to vector(1536) and re-embed everything. Choosing the embedding model early and sticking with it saves this pain.