Part 2: Python Tooling for AI Engineers

Why Python Won

I write TypeScript for web services and Rust when I need performance. But for AI engineering, Python is the default — and after building several AI systems, I understand why.

Every major AI library is Python-first. PyTorch, Hugging Face Transformers, sentence-transformers, LangChain, LlamaIndex, the OpenAI SDK, the Anthropic SDK — they all ship Python as the primary interface. Some have TypeScript or Rust bindings, but the documentation, examples, and community assume Python.

More importantly, the ecosystem fits together. FastAPI for async APIs, Pydantic for data validation, httpx for HTTP calls, SQLAlchemy for database access — these libraries work well individually and compose cleanly. When I need to go from "I have an idea" to "I have a running prototype," Python gets me there faster than anything else.

That said, Python has real limitations. It's slow for CPU-bound work, the packaging story has historically been painful, and type safety is opt-in rather than enforced. This article covers how I set up Python projects to minimize those pain points.


Project Structure

Every AI project I build follows the same structure. It took me a few projects to settle on this, and it's served me well:

ai-engineer-project/
├── pyproject.toml           # Dependencies, metadata, tool config
├── README.md
├── .env.example             # Template for required environment variables
├── .gitignore
├── docker-compose.yml       # PostgreSQL + pgvector for local dev
├── src/
│   └── ai_engineer/
│       ├── __init__.py
│       ├── main.py          # FastAPI app entry point
│       ├── config.py        # Settings from environment variables
│       ├── models.py        # Pydantic request/response models
│       ├── llm/
│       │   ├── __init__.py
│       │   ├── base.py      # LLMProvider protocol
│       │   ├── openai.py    # OpenAI implementation
│       │   └── github.py    # GitHub Models implementation
│       ├── embeddings/
│       │   ├── __init__.py
│       │   ├── base.py      # EmbeddingProvider protocol
│       │   └── local.py     # sentence-transformers
│       └── db/
│           ├── __init__.py
│           ├── engine.py    # SQLAlchemy async engine
│           └── models.py    # Database table models
├── tests/
│   ├── conftest.py
│   ├── test_config.py
│   ├── test_llm.py
│   └── test_embeddings.py
└── scripts/
    └── seed.py              # Data loading / one-off scripts

A few intentional decisions here:

  • src/ layout: The src/ai_engineer/ layout prevents accidental imports from the project root. I've been bitten by this in the past β€” running pytest from the project root and importing the wrong module because Python found a local file before the installed package.

  • Separate llm/ and embeddings/: These are distinct concerns. LLM providers generate text. Embedding providers generate vectors. They share nothing except being AI-related.

  • config.py at the top level: Every module imports settings from one place. No scattered os.getenv() calls.


Package Management with uv

I switched from pip to uv about a year ago and haven't looked back. It's written in Rust, resolves dependencies in seconds instead of minutes, and handles virtual environments automatically.

The uv.lock file pins exact versions, so every team member and CI run gets identical dependencies. No more "works on my machine" from transitive dependency drift.


pyproject.toml — The Single Source of Truth

Here's the pyproject.toml I start every AI project with:
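A representative sketch follows; the dependency list and tool settings are illustrative, matching the choices discussed below rather than a definitive template:

```toml
[project]
name = "ai-engineer"
version = "0.1.0"
requires-python = ">=3.12"
dependencies = [
    "fastapi",
    "pydantic",
    "pydantic-settings",
    "httpx",
    "sqlalchemy[asyncio]",
    "sentence-transformers",
]

[dependency-groups]
dev = ["pytest", "pytest-asyncio", "mypy", "ruff"]

[tool.ruff]
line-length = 100

[tool.ruff.lint]
select = ["E", "F", "I", "N", "UP", "B", "SIM", "TCH"]

[tool.mypy]
strict = true

[tool.pytest.ini_options]
asyncio_mode = "auto"
```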

Key points:

  • requires-python = ">=3.12": I use Python 3.12 features β€” better error messages, type statement, performance improvements. No reason to support older versions in a new project.

  • Ruff for linting and formatting: It replaced both flake8 and black in my workflow. One tool, runs in milliseconds, configured in pyproject.toml.

  • mypy in strict mode: Catches type errors before runtime. AI code often passes complex objects between functions β€” type hints make this maintainable.

  • pytest with asyncio_mode = "auto": Every test can be async without decorating each one.


Configuration with Pydantic Settings

I learned the hard way not to scatter os.getenv() calls through the codebase. Pydantic Settings validates all configuration at startup and gives you typed access:

And the .env.example:
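Something along these lines; the variable names and values here are placeholders, not the project's actual configuration:

```bash
# Required
OPENAI_API_KEY=sk-...
DATABASE_URL=postgresql+asyncpg://postgres:postgres@localhost:5432/ai_engineer

# Optional (defaults shown)
EMBEDDING_MODEL=all-MiniLM-L6-v2
REQUEST_TIMEOUT=30.0
```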

If a required variable is missing or has the wrong type, the application crashes immediately at startup with a clear error message. This is vastly better than discovering a KeyError at 3 AM when a user hits a code path you didn't test.


The Essential Library Stack

After building several projects, I've settled on a core set of libraries. Here's what I use and why:

FastAPI — Async API Framework

FastAPI gives me:

  • Async by default β€” essential when every request waits on an LLM API call

  • Automatic OpenAPI docs β€” my endpoints are self-documenting

  • Pydantic integration β€” request/response validation with zero extra code

  • Dependency injection β€” clean way to pass database sessions and API clients

Pydantic — Data Validation

Every boundary in my AI systems uses Pydantic models. User input, LLM responses, database records:
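For example, a response model might look like this (the schema is illustrative; the API shown is Pydantic v2):

```python
from pydantic import BaseModel, Field, ValidationError


class ChatResponse(BaseModel):
    """Schema every LLM answer must satisfy before reaching the user."""

    answer: str = Field(description="The model's answer, as plain text")
    sources: list[str] = Field(
        default_factory=list, description="Document IDs used as context"
    )
    confidence: float = Field(
        ge=0.0, le=1.0, description="Self-reported confidence score"
    )


# A malformed LLM response is rejected at the boundary, not downstream:
try:
    ChatResponse.model_validate({"answer": "42", "confidence": 1.7})
except ValidationError as exc:
    print(f"{exc.error_count()} validation error(s)")
```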

Why I'm strict about this:

  • LLM outputs are unpredictable. Pydantic catches malformed responses before they reach the user.

  • When I change a response schema, mypy tells me every callsite that needs updating.

  • The Field descriptions flow into OpenAPI docs automatically.

httpx — Async HTTP Client

I use httpx over requests because:

  • Native async support β€” no thread pool workarounds

  • Connection pooling with AsyncClient as a context manager

  • Timeout configuration that actually works

sentence-transformers — Local Embeddings

I prefer local embeddings for development:

  • No API calls β€” faster iteration

  • No cost β€” I can re-embed my entire corpus without worrying about billing

  • Deterministic β€” same input always produces the same output

For production, I'll switch to an API-based embedding model when I need higher quality. The provider abstraction (covered in Part 4) makes this a config change, not a code change.


Development Environment Setup

Here's how I bootstrap a new AI project from scratch:
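Roughly the following, with uv doing the heavy lifting; the project name and package list are illustrative:

```shell
# Create the project with a src/ layout
uv init --lib ai-engineer-project
cd ai-engineer-project

# Add runtime and dev dependencies (writes pyproject.toml and uv.lock)
uv add fastapi pydantic-settings httpx
uv add --dev pytest pytest-asyncio mypy ruff

# Start the local database (docker-compose.yml with pgvector)
docker compose up -d

# Run checks and tests inside the project's virtual environment
uv run ruff check .
uv run mypy src/
uv run pytest
```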


Type Safety Strategy

Python's type system is opt-in, which means you have to be disciplined about it. Here's my approach:

Use Protocols Instead of Abstract Base Classes

Protocols work with structural typing — any class that has the right methods satisfies the protocol, no inheritance required. This makes testing easy (just pass a fake object with the right methods) and keeps coupling loose.
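A minimal sketch of the idea, using an LLMProvider protocol like the one in llm/base.py (the method signature here is an assumption):

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class LLMProvider(Protocol):
    """Any object with a matching complete() method satisfies this."""

    async def complete(self, prompt: str) -> str: ...


class FakeProvider:
    """Note: no inheritance from LLMProvider, yet it satisfies the protocol."""

    async def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"


# Structural typing: runtime_checkable even allows an isinstance() check.
assert isinstance(FakeProvider(), LLMProvider)
```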

Run mypy in CI

When mypy catches an error, it's almost always a real bug — usually a case where I'm passing a string where I need a list, or where an optional value could be None.
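A minimal GitHub Actions job for this might look as follows; the workflow layout and action versions are illustrative:

```yaml
# .github/workflows/ci.yml (sketch)
name: ci
on: [push, pull_request]

jobs:
  typecheck:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: astral-sh/setup-uv@v5
      - run: uv sync
      - run: uv run mypy src/
```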


Linting and Formatting with Ruff

Ruff replaces flake8, isort, and black with a single tool that runs in milliseconds:
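The whole lint-and-format pass is two commands:

```shell
uv run ruff check . --fix   # lint, autofixing what's safe
uv run ruff format .        # format (the black replacement)
```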

My ruff rules (configured in pyproject.toml above) catch:

  • E/F: Standard Python errors (undefined names, unused imports)

  • I: Import sorting

  • N: Naming conventions

  • UP: Unnecessary Python 2 compatibility

  • B: Common bugs (mutable default arguments, etc.)

  • SIM: Code simplification

  • TCH: Type checking imports (move type-only imports behind TYPE_CHECKING)


Testing Setup

AI systems have both deterministic and non-deterministic components. I test them differently:

Deterministic components (config, validation, data transformation) get standard pytest tests. Non-deterministic components (LLM output quality, retrieval relevance) get evaluation tests — covered in Part 7.
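For the deterministic side, a plain pytest test with a fake provider is enough; with asyncio_mode = "auto", the async test needs no marker. The names here are invented for illustration:

```python
class FakeLLM:
    """Stands in for a real provider; satisfies the same interface."""

    async def complete(self, prompt: str) -> str:
        return "stub answer"


async def answer_question(llm, question: str) -> str:
    # The unit under test: deterministic wiring around the LLM call.
    return await llm.complete(f"Q: {question}")


async def test_answer_question_returns_provider_output():
    result = await answer_question(FakeLLM(), "What is a vector index?")
    assert result == "stub answer"
```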


VS Code Configuration

I spend most of my time in VS Code. Here are the workspace settings I use for AI projects:
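The core of my .vscode/settings.json, trimmed to the parts that matter here (the formatter ID comes from the official Ruff extension):

```json
{
  "[python]": {
    "editor.defaultFormatter": "charliermarsh.ruff",
    "editor.formatOnSave": true,
    "editor.codeActionsOnSave": {
      "source.organizeImports": "explicit"
    }
  },
  "python.testing.pytestEnabled": true
}
```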

With GitHub Copilot handling code generation and Ruff handling formatting, the edit-save-test cycle is fast.


What's Next

With the tooling in place, we're ready to understand the AI components. Part 3 covers how LLMs actually work — not at the research level, but at the level an AI engineer needs to debug token limits, choose parameters, and understand why a model gives different answers to the same question.


Previous: Part 1 — What is an AI Engineer?

Next: Part 3 — How LLMs Work — What AI Engineers Need to Know
