Part 1: What is an AI Engineer?
The Role That Didn't Exist Five Years Ago
When I started my career, the path was clear: you were either a software engineer or a data scientist. Software engineers built APIs, deployed services, and wrote infrastructure code. Data scientists trained models, ran experiments, and wrote notebooks. The two worlds occasionally overlapped, but they had different tools, different workflows, and different definitions of "done."
Then large language models happened.
Suddenly, you could build intelligent systems without training a single model. The bottleneck shifted from "can we build a model that does this?" to "can we build a reliable system around a model that already does this?" That shift created the AI engineer role: someone who bridges software engineering discipline with enough AI understanding to build production systems that use language models, embeddings, and retrieval effectively.
I came to this from the software engineering side. I was building backend services, DevOps automation, and Kubernetes infrastructure. When I started integrating LLMs into my projects (a RAG service over my personal knowledge base, an LLM-powered monitoring agent, automation tools that use natural language), I realized I needed a different skill set than what I had.
Not entirely different. Maybe 60% of what makes a good AI engineer is just being a good software engineer. But that other 40% (understanding tokens, embeddings, prompt design, evaluation) makes the difference between a demo that works and a system you can actually rely on.
AI Engineer vs ML Engineer vs Data Scientist
These roles overlap, but the core focus is different. Here's how I think about it after working across all three areas:
| | Data Scientist | ML Engineer | AI Engineer |
|---|---|---|---|
| Primary output | Insights, models, experiments | Trained models, training pipelines | AI-powered applications and services |
| Core skill | Statistics, experimentation | Model training, MLOps | Software engineering + AI integration |
| Typical tools | Jupyter, pandas, scikit-learn | PyTorch, Kubeflow, MLflow | FastAPI, LLM APIs, vector databases |
| Trains models? | Yes, often from scratch | Yes, at scale | Rarely; uses pre-trained models and APIs |
| Writes production code? | Sometimes | Yes | Always |
| Cares about latency? | Not usually | For inference, yes | For every request, yes |
| Evaluation approach | Accuracy, F1, loss curves | Model performance metrics | End-to-end system quality, user experience |
The boundaries are blurry. I've done work that falls into all three columns. But when I'm building an AI-powered API that takes a user question, retrieves relevant context from a vector store, constructs a prompt, calls an LLM, and returns a structured response, that's AI engineering.
The AI Engineer Skills Map
Through building my own projects, I've found the skills fall into four categories:
1. Software Engineering Fundamentals (the 60%)
This is the foundation. Without solid software engineering, your AI system will be a pile of notebooks that only works on your laptop.
What matters here:
API design: REST endpoints, request/response schemas, error handling
Async programming: LLM calls are I/O-bound; you need concurrency
Type safety: Pydantic models for every input and output
Testing: pytest for deterministic parts, evaluation for non-deterministic parts
Deployment: Docker, CI/CD, observability
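To make the async point concrete, here's a minimal sketch of the pattern: because LLM calls are I/O-bound, independent requests should run concurrently with `asyncio.gather` rather than one after another. The `call_llm` function and `Answer` type below are hypothetical stand-ins; a real version would use an async HTTP client such as httpx, with timeouts and retries.

```python
import asyncio
from dataclasses import dataclass

@dataclass
class Answer:
    question: str
    text: str

async def call_llm(question: str) -> Answer:
    # Hypothetical stub for a real LLM API call; the sleep simulates
    # network latency, which is where concurrency pays off.
    await asyncio.sleep(0.01)
    return Answer(question=question, text=f"stub answer for: {question}")

async def answer_all(questions: list[str]) -> list[Answer]:
    # Fan out all calls at once instead of awaiting them serially.
    return list(await asyncio.gather(*(call_llm(q) for q in questions)))

answers = asyncio.run(answer_all(["What is RAG?", "What is a token?"]))
print([a.question for a in answers])
```

The same shape carries over to production code: typed inputs and outputs (here a dataclass; Pydantic in a real service) plus concurrency for every external call.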
2. LLM Understanding (the core AI knowledge)
You don't need to train models from scratch. But you need to understand how they work well enough to debug problems.
What I've found essential:
Tokenization: knowing that "ChatGPT" becomes multiple tokens, and why that matters for context windows
Context windows: understanding that a 128k context window doesn't mean you should stuff 128k tokens into every request
Temperature and sampling: knowing when to use temperature=0 (structured extraction) vs temperature=0.7 (creative generation)
Model capabilities: understanding what GPT-4o is good at vs what Claude Sonnet handles better
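Temperature is easier to reason about once you see what it does mathematically: it divides the model's next-token scores (logits) before the softmax, so low values sharpen the distribution toward the top token and high values flatten it. This is a toy illustration with made-up logits; real APIs apply this inside the model server.

```python
import math

def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    # Scale logits by 1/temperature, then apply a numerically stable softmax.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # hypothetical next-token scores
cold = softmax_with_temperature(logits, 0.1)  # near-deterministic: top token dominates
warm = softmax_with_temperature(logits, 1.0)  # more exploratory: probability spreads out
```

This is why temperature=0 behaves almost deterministically for structured extraction, while higher temperatures produce more varied completions.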
3. Retrieval and Embeddings
Most useful AI systems retrieve information before generating. The pipeline: ingest documents, split them into chunks, embed each chunk, store the vectors; then at query time, embed the question, retrieve the most similar chunks, and pass them to the LLM as context. Understanding this pipeline changed how I build everything.
This is the RAG pattern, and it's the backbone of most AI engineering work I've done.
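A minimal sketch of the retrieve-then-generate flow. To stay self-contained it uses a toy bag-of-words "embedding" with cosine similarity; a real system would swap in a learned embedding model (e.g. sentence-transformers) and a vector database (e.g. pgvector), but the retrieval logic around it is the same. The documents and query are made up for illustration.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy stand-in for an embedding model: a word-count vector.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

docs = [
    "pgvector stores embeddings inside postgres",
    "kubernetes schedules containers across nodes",
    "chunking splits documents before embedding",
]
index = [(d, embed(d)) for d in docs]  # stand-in for a vector store

def retrieve(query: str, k: int = 2) -> list[str]:
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [d for d, _ in ranked[:k]]

context = retrieve("how are embeddings stored in postgres")
ctx = "\n".join(context)
prompt = f"Answer using only this context:\n{ctx}\n\nQuestion: how are embeddings stored in postgres"
```

Everything downstream (prompt construction, the LLM call, response parsing) hangs off this retrieval step, which is why retrieval quality dominates end-to-end quality.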
4. Evaluation and Quality
This is the hardest part. When your system returns a natural-language answer, how do you know if it's good?
I've learned to think about evaluation at three levels:
Component-level: Does the retrieval return relevant documents? (measurable with precision/recall)
Output-level: Is the generated answer accurate and helpful? (requires human judgment or LLM-as-judge)
System-level: Does the user get value? (requires logging, feedback, and iteration)
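The component level is the easiest place to start because it's directly measurable. A sketch of retrieval precision and recall against a hand-labeled query, with hypothetical doc ids: precision asks "of what we retrieved, how much was relevant?" and recall asks "of what was relevant, how much did we retrieve?"

```python
def precision_recall(retrieved: list[str], relevant: set[str]) -> tuple[float, float]:
    # Count retrieved docs that a human labeled relevant for this query.
    hits = sum(1 for d in retrieved if d in relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

retrieved = ["doc1", "doc4", "doc7"]   # what the retriever returned
relevant = {"doc1", "doc2", "doc7"}    # human-labeled ground truth
p, r = precision_recall(retrieved, relevant)
```

Here 2 of 3 retrieved docs are relevant and 2 of 3 relevant docs were found, so both metrics are 2/3. Output- and system-level evaluation need heavier machinery (LLM-as-judge, user feedback), but a small labeled set like this catches most retrieval regressions cheaply.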
Where AI Engineers Fit in a Team
In my experience, AI engineers sit at the intersection of three concerns:
From product: "We need the system to answer customer questions about their orders"
From ML/data science: "Here's a fine-tuned model" or "Use this embedding model"
The AI engineer's job: Build a reliable service that takes user questions, retrieves relevant order data, calls the model through a well-designed prompt, evaluates the output quality, and serves it through an API with proper error handling, caching, and observability
My Path to AI Engineering
I didn't plan to become an AI engineer. I was building backend services and DevOps tooling. Here's roughly how the transition happened:
Phase 1 (Curiosity): I started calling the OpenAI API from a Python script to summarize log files. No architecture, no error handling, just httpx.post() and print().
Phase 2 (First real project): I built a RAG service over my own git-book, the knowledge base you're reading right now. This forced me to learn about embeddings, vector search, chunking strategies, and prompt construction. Suddenly I needed pgvector, sentence-transformers, and proper async code.
Phase 3 (Production thinking): I built an LLM-powered DevOps monitoring agent. This required observability (tracking token usage and latency), guardrails (preventing the LLM from executing dangerous commands), and evaluation (making sure the agent's suggestions were actually correct).
Phase 4 (System design): Now I think about AI systems the same way I think about any distributed system, with contracts, failure modes, testing strategies, and deployment pipelines. The AI part is a component, not the whole system.
What This Series Covers
Each part builds on the previous one. By the end, you'll have built a working AI-powered system from scratch.
The code is real. The patterns come from my own projects. The problems I discuss are problems I actually hit.
Prerequisites
Before starting this series, you should be comfortable with:
Python 3.11+: functions, classes, type hints, async/await
REST APIs: HTTP methods, request/response patterns, JSON
Command line: pip/uv, virtual environments, running scripts
Git: version control basics
If you're not there yet, start with Python 101 and REST API 101 in this git-book.
What You'll Need
We'll set up the full development environment in Part 2.