A practical series on building a production-grade RAG system using Python 3.12, FastAPI, and PostgreSQL with pgvector.
I've been maintaining this git-book for a while: hundreds of markdown articles across Kubernetes, architecture, DevOps, AI, and more. At some point it became difficult to find things I'd already written. A vector search over my own knowledge base was the obvious solution. This series documents exactly how I built it: from understanding what RAG actually is, to a running FastAPI service that answers questions against my personal documentation.
No fake product scenarios. No contrived "imagine you have 10 million documents" examples. This is a real system I built for a real personal need.
## The Project
Goal: a self-hosted RAG service that:

- Ingests markdown files from this git-book (or any text corpus)
- Embeds them into pgvector using sentence-transformers or the GitHub Models API
- Answers natural-language questions by retrieving relevant chunks and calling an LLM
- Exposes a REST API via FastAPI
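The heart of the pipeline above is the retrieval step: embed the question, compare it against stored chunk embeddings, and keep the closest matches. Here is a minimal in-process sketch of that shape; the toy bag-of-words `embed` function and the sample corpus are illustrative stand-ins (the real system uses a sentence-transformers model and lets pgvector do the similarity search server-side):

```python
import math
from collections import Counter

def embed(text: str) -> dict[str, float]:
    # Toy bag-of-words "embedding" standing in for a real model
    # such as all-MiniLM-L6-v2, which produces 384-dim vectors.
    return dict(Counter(text.lower().split()))

def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    # Cosine similarity over sparse term vectors.
    dot = sum(v * b.get(t, 0.0) for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def top_k_chunks(question: str, chunks: list[str], k: int = 2) -> list[str]:
    # Rank every chunk by similarity to the question and keep the
    # top k. pgvector performs the same ranking inside PostgreSQL
    # with a distance operator and an index instead of a full scan.
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

chunks = [
    "kubectl rollout restart restarts a deployment",
    "pgvector stores embeddings in PostgreSQL",
    "FastAPI exposes async REST endpoints",
]
print(top_k_chunks("how do I store embeddings in postgres?", chunks, k=1))
```

The retrieved chunks are then pasted into the LLM prompt as context, which is all "augmented generation" really means.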
## Stack
| Layer | Technology |
|---|---|
| Language | Python 3.12 |
| API Framework | FastAPI (async) |
| Vector Store | PostgreSQL 16 + pgvector |
| ORM / Migrations | SQLAlchemy 2 async + Alembic |
| Embedding Models | sentence-transformers (all-MiniLM-L6-v2) / GitHub Models API |
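To make the vector-store row of the table concrete, here is a sketch of the kind of schema this stack implies. The table and column names are my assumptions, not the project's actual migrations; 384 is the output dimension of all-MiniLM-L6-v2:

```sql
-- Enable the pgvector extension once per database.
CREATE EXTENSION IF NOT EXISTS vector;

-- One row per markdown chunk; the embedding column holds the
-- 384-dim vector produced by all-MiniLM-L6-v2.
CREATE TABLE chunks (
    id         bigserial PRIMARY KEY,
    source     text NOT NULL,          -- originating markdown file
    content    text NOT NULL,          -- the chunk text itself
    embedding  vector(384) NOT NULL
);

-- Nearest-neighbor lookup by cosine distance (the <=> operator).
-- SELECT content FROM chunks ORDER BY embedding <=> $1 LIMIT 5;
```

In the Python layer, SQLAlchemy 2 maps this table to a model and Alembic owns the migration; the embedding column type comes from the pgvector Python package.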