# RAG 101

A practical series on building a production-grade RAG system using **Python 3.12**, **FastAPI**, and **PostgreSQL with pgvector**.

I've been maintaining this git-book for a while — hundreds of markdown articles across Kubernetes, architecture, DevOps, AI, and more. At some point it became difficult to find things I'd already written. A vector search over my own knowledge base was the obvious solution. This series documents exactly how I built it: from understanding what RAG actually is, to a running FastAPI service that answers questions against my personal documentation.

No fake product scenarios. No contrived "imagine you have 10 million documents" examples. This is a real system I built for a real personal need.

***

## The Project

**Goal**: A self-hosted RAG service that:

* Ingests markdown files from this git-book (or any text corpus)
* Embeds them into pgvector using sentence-transformers or GitHub Models API
* Answers natural-language questions by retrieving relevant chunks and calling an LLM
* Exposes a REST API via FastAPI
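The four goals above compose into a single query path: embed the question, retrieve nearby chunks, hand both to an LLM. A minimal sketch of that flow with stubbed stand-ins for the real retrieval and generation calls (all function names here are hypothetical illustrations, not the project's actual API):

```python
# Hypothetical sketch of the RAG query flow: embed -> retrieve -> generate.
# The real service does this with pgvector and the GitHub Models API;
# the stages are stubbed here so only the shape of the pipeline shows.

def embed(text: str) -> list[float]:
    # Stand-in for sentence-transformers; real vectors are 384-dimensional.
    return [float(len(text))]

def retrieve(query_vec: list[float], corpus: dict[str, list[float]], k: int = 2) -> list[str]:
    # Nearest chunks by (stubbed) distance; pgvector does this step in SQL.
    ranked = sorted(corpus, key=lambda doc: abs(corpus[doc][0] - query_vec[0]))
    return ranked[:k]

def generate(question: str, context: list[str]) -> str:
    # Stand-in for the LLM call: real code sends context + question as a prompt.
    return f"Answer to {question!r} using {len(context)} chunks"

corpus = {"chunk-a": embed("kubernetes ingress"), "chunk-b": embed("pgvector indexes")}
answer = generate("how do I index vectors?", retrieve(embed("vector indexes"), corpus))
```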

**Stack**

| Layer            | Technology                                                     |
| ---------------- | -------------------------------------------------------------- |
| Language         | Python 3.12                                                    |
| API Framework    | FastAPI (async)                                                |
| Vector Store     | PostgreSQL 16 + pgvector                                       |
| ORM / Migrations | SQLAlchemy 2 async + Alembic                                   |
| Embedding Models | sentence-transformers (`all-MiniLM-L6-v2`) / GitHub Models API |
| LLM              | GitHub Models API (GPT-4o)                                     |
| Infra            | Docker Compose (single VM)                                     |

***

## Series Structure

### Phase 1 — Foundations

| Article                                                                                                                  | Topic                                                     |
| ------------------------------------------------------------------------------------------------------------------------ | --------------------------------------------------------- |
| [Article 1](https://blog.htunnthuthu.com/ai-and-machine-learning/artificial-intelligence/rag-101/rag-101-what-is-rag)    | What is RAG and Why I Built One                           |
| [Article 2](https://blog.htunnthuthu.com/ai-and-machine-learning/artificial-intelligence/rag-101/rag-101-pgvector-setup) | pgvector on PostgreSQL — Setup, Vector Types, and Indexes |

### Phase 2 — Ingestion Pipeline

| Article                                                                                                              | Topic                                         |
| -------------------------------------------------------------------------------------------------------------------- | --------------------------------------------- |
| [Article 3](https://blog.htunnthuthu.com/ai-and-machine-learning/artificial-intelligence/rag-101/rag-101-chunking)   | Document Loading and Chunking Strategies      |
| [Article 4](https://blog.htunnthuthu.com/ai-and-machine-learning/artificial-intelligence/rag-101/rag-101-embeddings) | Generating and Storing Embeddings in pgvector |

### Phase 3 — Retrieval and Generation

| Article                                                                                                              | Topic                                                    |
| -------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------- |
| [Article 5](https://blog.htunnthuthu.com/ai-and-machine-learning/artificial-intelligence/rag-101/rag-101-retrieval)  | Semantic Search, Cosine Similarity, and Hybrid Retrieval |
| [Article 6](https://blog.htunnthuthu.com/ai-and-machine-learning/artificial-intelligence/rag-101/rag-101-generation) | Prompt Construction and the Generation Layer             |

### Phase 4 — Production Service

| Article                                                                                                                   | Topic                                    |
| ------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------- |
| [Article 7](https://blog.htunnthuthu.com/ai-and-machine-learning/artificial-intelligence/rag-101/rag-101-fastapi-service) | Wrapping Everything in a FastAPI Service |

***

## Project File Tree

```
rag-service/
├── src/
│   ├── main.py                  # FastAPI app, lifespan
│   ├── core/
│   │   ├── config.py            # Settings from env
│   │   └── logging.py           # structlog setup
│   ├── db/
│   │   ├── base.py              # SQLAlchemy async engine
│   │   └── models.py            # Document, Chunk, Embedding tables
│   ├── ingestion/
│   │   ├── loader.py            # File reader (markdown, txt, pdf)
│   │   ├── chunker.py           # Chunking strategies
│   │   └── pipeline.py          # Orchestrates load → chunk → embed → store
│   ├── embeddings/
│   │   ├── base.py              # EmbeddingProvider ABC
│   │   ├── local.py             # sentence-transformers
│   │   └── github_models.py     # GitHub Models API text-embedding
│   ├── retrieval/
│   │   ├── vector_search.py     # pgvector cosine similarity
│   │   └── hybrid_search.py     # Vector + full-text BM25-style
│   ├── generation/
│   │   ├── prompt_builder.py    # Assemble context + question
│   │   └── llm_client.py        # GitHub Models API chat completion
│   └── api/
│       ├── ingest.py            # POST /ingest endpoints
│       └── query.py             # POST /query endpoint
├── alembic/
│   └── versions/
├── config/
│   └── docker-compose.yml
├── pyproject.toml
└── Dockerfile
```
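The `embeddings/` package in the tree above centers on a provider abstraction so the local sentence-transformers backend and the GitHub Models backend are interchangeable. A hedged sketch of what `src/embeddings/base.py` might look like (the actual interface in the series may differ; `FakeProvider` is a toy for illustration only):

```python
from abc import ABC, abstractmethod

class EmbeddingProvider(ABC):
    """Common interface for local and API-backed embedding backends."""

    @abstractmethod
    def embed_texts(self, texts: list[str]) -> list[list[float]]:
        """Return one embedding vector per input text."""

    @property
    @abstractmethod
    def dimension(self) -> int:
        """Vector width, e.g. 384 for all-MiniLM-L6-v2."""

class FakeProvider(EmbeddingProvider):
    # Toy implementation; real ones would live in local.py / github_models.py.
    def embed_texts(self, texts: list[str]) -> list[list[float]]:
        return [[float(len(t))] * self.dimension for t in texts]

    @property
    def dimension(self) -> int:
        return 4

vectors = FakeProvider().embed_texts(["hello", "rag"])
```

Keeping the vector dimension on the provider lets the ingestion pipeline validate that stored embeddings match the pgvector column width before writing.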

***

## How to Follow Along

Each article is self-contained. You can read in order or jump to any topic. Code snippets reference actual file paths from the project tree above.

Dependencies covered across the series:

```toml
# pyproject.toml
[tool.poetry.dependencies]
python = "^3.12"
fastapi = "^0.115"
uvicorn = {extras = ["standard"], version = "^0.32"}
sqlalchemy = {extras = ["asyncio"], version = "^2.0"}
asyncpg = "^0.30"
alembic = "^1.14"
pgvector = "^0.3"
sentence-transformers = "^3.3"
openai = "^1.59"          # GitHub Models API uses OpenAI SDK
structlog = "^24.4"
pydantic-settings = "^2.6"
```
