A practical series on building a production-grade RAG system using Python 3.12, FastAPI, and PostgreSQL with pgvector.
I've been maintaining this git-book for a while: hundreds of markdown articles across Kubernetes, architecture, DevOps, AI, and more. At some point it became difficult to find things I'd already written. A vector search over my own knowledge base was the obvious solution. This series documents exactly how I built it: from understanding what RAG actually is, to a running FastAPI service that answers questions against my personal documentation.
No fake product scenarios. No contrived "imagine you have 10 million documents" examples. This is a real system I built for a real personal need.
## The Project
Goal: a self-hosted RAG service that:

- Ingests markdown files from this git-book (or any text corpus)
- Embeds them into pgvector using sentence-transformers or the GitHub Models API
- Answers natural-language questions by retrieving relevant chunks and calling an LLM
- Exposes a REST API via FastAPI
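The heart of the pipeline above is the retrieval step: embed the question, compare it against stored chunk embeddings, and keep the closest matches. Here is a minimal in-process sketch of that shape; the toy bag-of-words `embed` function and the sample corpus are illustrative stand-ins (the real system uses a sentence-transformers model and lets pgvector do the similarity search server-side):

```python
import math
from collections import Counter

def embed(text: str) -> dict[str, float]:
    # Toy bag-of-words "embedding" standing in for a real model
    # such as all-MiniLM-L6-v2, which produces 384-dim vectors.
    return dict(Counter(text.lower().split()))

def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    # Cosine similarity over sparse term vectors.
    dot = sum(v * b.get(t, 0.0) for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def top_k_chunks(question: str, chunks: list[str], k: int = 2) -> list[str]:
    # Rank every chunk by similarity to the question and keep the
    # top k. pgvector performs the same ranking inside PostgreSQL
    # with a distance operator and an index instead of a full scan.
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

chunks = [
    "kubectl rollout restart restarts a deployment",
    "pgvector stores embeddings in PostgreSQL",
    "FastAPI exposes async REST endpoints",
]
print(top_k_chunks("how do I store embeddings in postgres?", chunks, k=1))
```

The retrieved chunks are then pasted into the LLM prompt as context, which is all "augmented generation" really means.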
## Stack
| Layer | Technology |
|---|---|
| Language | Python 3.12 |
| API Framework | FastAPI (async) |
| Vector Store | PostgreSQL 16 + pgvector |
| ORM / Migrations | SQLAlchemy 2 async + Alembic |
| Embedding Models | sentence-transformers (all-MiniLM-L6-v2) / GitHub Models API |
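To make the vector-store row of the table concrete, here is a sketch of the kind of schema this stack implies. The table and column names are my assumptions, not the project's actual migrations; 384 is the output dimension of all-MiniLM-L6-v2:

```sql
-- Enable the pgvector extension once per database.
CREATE EXTENSION IF NOT EXISTS vector;

-- One row per markdown chunk; the embedding column holds the
-- 384-dim vector produced by all-MiniLM-L6-v2.
CREATE TABLE chunks (
    id         bigserial PRIMARY KEY,
    source     text NOT NULL,          -- originating markdown file
    content    text NOT NULL,          -- the chunk text itself
    embedding  vector(384) NOT NULL
);

-- Nearest-neighbor lookup by cosine distance (the <=> operator).
-- SELECT content FROM chunks ORDER BY embedding <=> $1 LIMIT 5;
```

In the Python layer, SQLAlchemy 2 maps this table to a model and Alembic owns the migration; the embedding column type comes from the pgvector Python package.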