Part 2: Python Tooling for AI Engineers

Why Python Won

I write TypeScript for web services and Rust when I need performance. But for AI engineering, Python is the default — and after building several AI systems, I understand why.

Every major AI library is Python-first. PyTorch, Hugging Face Transformers, sentence-transformers, LangChain, LlamaIndex, the OpenAI SDK, the Anthropic SDK — they all ship Python as the primary interface. Some have TypeScript or Rust bindings, but the documentation, examples, and community assume Python.

More importantly, the ecosystem fits together. FastAPI for async APIs, Pydantic for data validation, httpx for HTTP calls, SQLAlchemy for database access — these libraries work well individually and compose cleanly. When I need to go from "I have an idea" to "I have a running prototype," Python gets me there faster than anything else.

That said, Python has real limitations. It's slow for CPU-bound work, the packaging story has historically been painful, and type safety is opt-in rather than enforced. This article covers how I set up Python projects to minimize those pain points.


Project Structure

Every AI project I build follows the same structure. It took me a few projects to settle on this, and it's served me well:

ai-engineer-project/
├── pyproject.toml           # Dependencies, metadata, tool config
├── README.md
├── .env.example             # Template for required environment variables
├── .gitignore
├── docker-compose.yml       # PostgreSQL + pgvector for local dev
├── src/
│   └── ai_engineer/
│       ├── __init__.py
│       ├── main.py          # FastAPI app entry point
│       ├── config.py        # Settings from environment variables
│       ├── models.py        # Pydantic request/response models
│       ├── llm/
│       │   ├── __init__.py
│       │   ├── base.py      # LLMProvider protocol
│       │   ├── openai.py    # OpenAI implementation
│       │   └── github.py    # GitHub Models implementation
│       ├── embeddings/
│       │   ├── __init__.py
│       │   ├── base.py      # EmbeddingProvider protocol
│       │   └── local.py     # sentence-transformers
│       └── db/
│           ├── __init__.py
│           ├── engine.py    # SQLAlchemy async engine
│           └── models.py    # Database table models
├── tests/
│   ├── conftest.py
│   ├── test_config.py
│   ├── test_llm.py
│   └── test_embeddings.py
└── scripts/
    └── seed.py              # Data loading / one-off scripts

A few intentional decisions here:

  • src/ layout: The src/ai_engineer/ layout prevents accidental imports from the project root. I've been bitten by this in the past β€” running pytest from the project root and importing the wrong module because Python found a local file before the installed package.

  • Separate llm/ and embeddings/: These are distinct concerns. LLM providers generate text. Embedding providers generate vectors. They share nothing except being AI-related.

  • config.py at the top level: Every module imports settings from one place. No scattered os.getenv() calls.


Package Management with uv

I switched from pip to uv about a year ago and haven't looked back. It's written in Rust, resolves dependencies in seconds instead of minutes, and handles virtual environments automatically.

The uv.lock file pins exact versions, so every team member and CI run gets identical dependencies. No more "works on my machine" from transitive dependency drift.


pyproject.toml — The Single Source of Truth

Here's the pyproject.toml I start every AI project with:
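A representative sketch follows; the dependency list and tool settings are illustrative, matching the choices discussed below rather than a definitive template:

```toml
[project]
name = "ai-engineer"
version = "0.1.0"
requires-python = ">=3.12"
dependencies = [
    "fastapi",
    "pydantic",
    "pydantic-settings",
    "httpx",
    "sqlalchemy[asyncio]",
    "sentence-transformers",
]

[dependency-groups]
dev = ["pytest", "pytest-asyncio", "mypy", "ruff"]

[tool.ruff]
line-length = 100

[tool.ruff.lint]
select = ["E", "F", "I", "N", "UP", "B", "SIM", "TCH"]

[tool.mypy]
strict = true

[tool.pytest.ini_options]
asyncio_mode = "auto"
```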

Key points:

  • requires-python = ">=3.12": I use Python 3.12 features β€” better error messages, type statement, performance improvements. No reason to support older versions in a new project.

  • Ruff for linting and formatting: It replaced both flake8 and black in my workflow. One tool, runs in milliseconds, configured in pyproject.toml.

  • mypy in strict mode: Catches type errors before runtime. AI code often passes complex objects between functions β€” type hints make this maintainable.

  • pytest with asyncio_mode = "auto": Every test can be async without decorating each one.


Configuration with Pydantic Settings

I learned the hard way not to scatter os.getenv() calls through the codebase. Pydantic Settings validates all configuration at startup and gives you typed access:

And the .env.example:
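Something along these lines; the variable names and values here are placeholders, not the project's actual configuration:

```bash
# Required
OPENAI_API_KEY=sk-...
DATABASE_URL=postgresql+asyncpg://postgres:postgres@localhost:5432/ai_engineer

# Optional (defaults shown)
EMBEDDING_MODEL=all-MiniLM-L6-v2
REQUEST_TIMEOUT=30.0
```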

If a required variable is missing or has the wrong type, the application crashes immediately at startup with a clear error message. This is vastly better than discovering a KeyError at 3 AM when a user hits a code path you didn't test.


The Essential Library Stack

After building several projects, I've settled on a core set of libraries. Here's what I use and why:

FastAPI — Async API Framework

FastAPI gives me:

  • Async by default β€” essential when every request waits on an LLM API call

  • Automatic OpenAPI docs β€” my endpoints are self-documenting

  • Pydantic integration β€” request/response validation with zero extra code

  • Dependency injection β€” clean way to pass database sessions and API clients

Pydantic — Data Validation

Every boundary in my AI systems uses Pydantic models. User input, LLM responses, database records:
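For example, a response model might look like this (the schema is illustrative; the API shown is Pydantic v2):

```python
from pydantic import BaseModel, Field, ValidationError


class ChatResponse(BaseModel):
    """Schema every LLM answer must satisfy before reaching the user."""

    answer: str = Field(description="The model's answer, as plain text")
    sources: list[str] = Field(
        default_factory=list, description="Document IDs used as context"
    )
    confidence: float = Field(
        ge=0.0, le=1.0, description="Self-reported confidence score"
    )


# A malformed LLM response is rejected at the boundary, not downstream:
try:
    ChatResponse.model_validate({"answer": "42", "confidence": 1.7})
except ValidationError as exc:
    print(f"{exc.error_count()} validation error(s)")
```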

Why I'm strict about this:

  • LLM outputs are unpredictable. Pydantic catches malformed responses before they reach the user.

  • When I change a response schema, mypy tells me every callsite that needs updating.

  • The Field descriptions flow into OpenAPI docs automatically.

httpx — Async HTTP Client

I use httpx over requests because:

  • Native async support β€” no thread pool workarounds

  • Connection pooling with AsyncClient as a context manager

  • Timeout configuration that actually works

sentence-transformers — Local Embeddings

I prefer local embeddings for development:

  • No API calls β€” faster iteration

  • No cost β€” I can re-embed my entire corpus without worrying about billing

  • Deterministic β€” same input always produces the same output

For production, I'll switch to an API-based embedding model when I need higher quality. The provider abstraction (covered in Part 4) makes this a config change, not a code change.


Development Environment Setup

Here's how I bootstrap a new AI project from scratch:
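Roughly the following, with uv doing the heavy lifting; the project name and package list are illustrative:

```shell
# Create the project with a src/ layout
uv init --lib ai-engineer-project
cd ai-engineer-project

# Add runtime and dev dependencies (writes pyproject.toml and uv.lock)
uv add fastapi pydantic-settings httpx
uv add --dev pytest pytest-asyncio mypy ruff

# Start the local database (docker-compose.yml with pgvector)
docker compose up -d

# Run checks and tests inside the project's virtual environment
uv run ruff check .
uv run mypy src/
uv run pytest
```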


Type Safety Strategy

Python's type system is opt-in, which means you have to be disciplined about it. Here's my approach:

Use Protocols Instead of Abstract Base Classes

Protocols work with structural typing — any class that has the right methods satisfies the protocol, no inheritance required. This makes testing easy (just pass a fake object with the right methods) and keeps coupling loose.
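A minimal sketch of the idea, using an LLMProvider protocol like the one in llm/base.py (the method signature here is an assumption):

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class LLMProvider(Protocol):
    """Any object with a matching complete() method satisfies this."""

    async def complete(self, prompt: str) -> str: ...


class FakeProvider:
    """Note: no inheritance from LLMProvider, yet it satisfies the protocol."""

    async def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"


# Structural typing: runtime_checkable even allows an isinstance() check.
assert isinstance(FakeProvider(), LLMProvider)
```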

Run mypy in CI

When mypy catches an error, it's almost always a real bug — usually a case where I'm passing a string where I need a list, or where an optional value could be None.
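A minimal GitHub Actions job for this might look as follows; the workflow layout and action versions are illustrative:

```yaml
# .github/workflows/ci.yml (sketch)
name: ci
on: [push, pull_request]

jobs:
  typecheck:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: astral-sh/setup-uv@v5
      - run: uv sync
      - run: uv run mypy src/
```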


Linting and Formatting with Ruff

Ruff replaces flake8, isort, and black with a single tool that runs in milliseconds:
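The whole lint-and-format pass is two commands:

```shell
uv run ruff check . --fix   # lint, autofixing what's safe
uv run ruff format .        # format (the black replacement)
```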

My ruff rules (configured in pyproject.toml above) catch:

  • E/F: Standard Python errors (undefined names, unused imports)

  • I: Import sorting

  • N: Naming conventions

  • UP: Unnecessary Python 2 compatibility

  • B: Common bugs (mutable default arguments, etc.)

  • SIM: Code simplification

  • TCH: Type checking imports (move type-only imports behind TYPE_CHECKING)


Testing Setup

AI systems have both deterministic and non-deterministic components. I test them differently:

Deterministic components (config, validation, data transformation) get standard pytest tests. Non-deterministic components (LLM output quality, retrieval relevance) get evaluation tests — covered in Part 7.
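For the deterministic side, a plain pytest test with a fake provider is enough; with asyncio_mode = "auto", the async test needs no marker. The names here are invented for illustration:

```python
class FakeLLM:
    """Stands in for a real provider; satisfies the same interface."""

    async def complete(self, prompt: str) -> str:
        return "stub answer"


async def answer_question(llm, question: str) -> str:
    # The unit under test: deterministic wiring around the LLM call.
    return await llm.complete(f"Q: {question}")


async def test_answer_question_returns_provider_output():
    result = await answer_question(FakeLLM(), "What is a vector index?")
    assert result == "stub answer"
```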


VS Code Configuration

I spend most of my time in VS Code. Here are the workspace settings I use for AI projects:
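The core of my .vscode/settings.json, trimmed to the parts that matter here (the formatter ID comes from the official Ruff extension):

```json
{
  "[python]": {
    "editor.defaultFormatter": "charliermarsh.ruff",
    "editor.formatOnSave": true,
    "editor.codeActionsOnSave": {
      "source.organizeImports": "explicit"
    }
  },
  "python.testing.pytestEnabled": true
}
```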

With GitHub Copilot handling code generation and Ruff handling formatting, the edit-save-test cycle is fast.


What's Next

With the tooling in place, we're ready to understand the AI components. Part 3 covers how LLMs actually work — not at the research level, but at the level an AI engineer needs to debug token limits, choose parameters, and understand why a model gives different answers to the same question.


Previous: Part 1 — What is an AI Engineer?

Next: Part 3 — How LLMs Work — What AI Engineers Need to Know
