Part 5: RAG, Fine-Tuning, and Prompt Engineering

Part of the AI Fundamentals 101 Series

The Problem: Foundation Models Don't Know Your Stuff

Foundation models are impressive — but out of the box, they have three fundamental gaps:

  1. They don't know your data. Claude has never seen your internal runbooks, your architecture diagrams, or your Kubernetes cluster configuration.

  2. Their knowledge is frozen. Whatever was in the training data is all they know. They can't tell you about last night's incident.

  3. They're general-purpose. They know a little about everything, but they're not specialized in your domain.

There are three strategies for bridging these gaps, and choosing among them is one of the most important decisions in AI engineering. I've used all three in my own projects, and each has a sweet spot.


The Three Strategies at a Glance

customization_strategies = {
    "Prompt Engineering": {
        "what": "Craft better instructions and provide examples in the prompt",
        "when": "Always — this is your first tool",
        "cost": "Free (no training, no infrastructure)",
        "data_needed": "None to a few examples",
        "latency_impact": "Minimal (slightly longer prompts)",
        "best_for": "Formatting, tone, task definition, few-shot learning"
    },
    "RAG (Retrieval-Augmented Generation)": {
        "what": "Retrieve relevant documents and include them in the prompt",
        "when": "Model needs access to your specific data or current information",
        "cost": "Moderate (vector DB, embedding compute)",
        "data_needed": "Your documents/knowledge base",
        "latency_impact": "Moderate (retrieval step adds 100-500ms)",
        "best_for": "Q&A over docs, knowledge bases, current data access"
    },
    "Fine-Tuning": {
        "what": "Further train the model on your task-specific data",
        "when": "Need consistent behavior that prompting can't achieve",
        "cost": "High (GPU compute, labeled data, ongoing maintenance)",
        "data_needed": "Hundreds to thousands of labeled examples",
        "latency_impact": "None (model runs at same speed)",
        "best_for": "Specialized tone, domain-specific patterns, consistent formatting"
    }
}

for strategy, details in customization_strategies.items():
    print(f"\n{'='*50}")
    print(f"  {strategy}")
    print(f"  When: {details['when']}")
    print(f"  Cost: {details['cost']}")
    print(f"  Best for: {details['best_for']}")

Strategy 1: Prompt Engineering

Prompt engineering is the practice of designing effective inputs to get better outputs from an LLM. It's not guesswork — it's a systematic approach to communicating with the model.

Basic Techniques

Few-Shot Prompting

Provide examples of input→output pairs so the model understands the pattern:
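As an illustration, here is a minimal few-shot prompt builder for a made-up task (classifying log lines by severity); the example log lines are invented:

```python
# Three worked examples teach the model the input→output pattern;
# the final line is the new input we want classified.
examples = [
    ("Disk usage at 95% on /var", "WARNING"),
    ("Connection refused: db-primary:5432", "ERROR"),
    ("Health check passed for api-gateway", "INFO"),
]

def build_few_shot_prompt(new_input: str) -> str:
    lines = ["Classify each log line as INFO, WARNING, or ERROR.\n"]
    for text, label in examples:
        lines.append(f"Log: {text}\nSeverity: {label}\n")
    # End with the new input and an open slot for the model to fill.
    lines.append(f"Log: {new_input}\nSeverity:")
    return "\n".join(lines)

prompt = build_few_shot_prompt("OOMKilled: pod payments-7f9c restarted")
print(prompt)
```

The model completes the pattern it has just seen, which is usually far more reliable than describing the format in prose alone.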

Chain-of-Thought Prompting

Ask the model to reason step-by-step:
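A sketch of what that looks like in practice, using an invented capacity-planning question:

```python
# Chain-of-thought: asking the model to show intermediate reasoning
# before the final answer improves accuracy on multi-step problems.
question = (
    "A node has 16 GB of RAM. Each pod requests 512 MB and the system "
    "reserves 2 GB. How many pods fit?"
)

cot_prompt = (
    f"{question}\n\n"
    "Think step by step:\n"
    "1. Subtract the system reservation from total memory.\n"
    "2. Divide the remainder by the per-pod request.\n"
    "Then state the final answer on its own line."
)
print(cot_prompt)
```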

System Prompts

The system prompt defines the model's behavior for an entire conversation:
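The exact field names vary by provider, but the shape is roughly this (the model name and the SRE persona here are illustrative):

```python
# The system prompt is passed separately from user turns and applies
# to the whole conversation, not just one message.
system_prompt = (
    "You are an SRE assistant. Answer concisely, cite the runbook "
    "section you used, and say 'I don't know' rather than guess."
)

request = {
    "model": "claude-sonnet-4",   # hypothetical model id
    "system": system_prompt,      # governs every turn below
    "messages": [
        {"role": "user", "content": "Why is etcd alerting on disk latency?"}
    ],
}
print(request["system"][:40])
```

Because the system prompt persists across turns, it is the right place for role, tone, and safety rules, rather than repeating them in every user message.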


Strategy 2: RAG (Retrieval-Augmented Generation)

RAG combines information retrieval with text generation. Instead of relying on what the model learned during training, you retrieve relevant documents from your own data and include them in the prompt.

How RAG Works

The RAG Pipeline Step by Step
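Here is a toy end-to-end sketch. The retriever is deliberately simplified to keyword overlap, and the runbook snippets are invented; a production system would use embeddings and a vector database:

```python
from collections import Counter
import math

# Step 1: a small "knowledge base" of invented runbook snippets.
docs = {
    "runbook-etcd": "If etcd disk latency alerts fire, check fsync times and move to SSD-backed storage.",
    "runbook-dns": "CoreDNS timeouts are usually caused by upstream resolver failures.",
    "runbook-certs": "Renew cluster certificates with the rotate-certs job before expiry.",
}

# Step 2: score each document against the query (toy keyword overlap,
# length-normalized; real RAG uses embedding similarity).
def score(query: str, doc: str) -> float:
    q, d = Counter(query.lower().split()), Counter(doc.lower().split())
    overlap = sum((q & d).values())
    return overlap / math.sqrt(len(doc.split()) or 1)

def retrieve(query: str, k: int = 1) -> list[str]:
    ranked = sorted(docs, key=lambda name: score(query, docs[name]), reverse=True)
    return ranked[:k]

# Step 3: stuff the retrieved text into the prompt as grounding context.
query = "etcd disk latency alert"
top = retrieve(query)[0]
prompt = (
    f"Context:\n{docs[top]}\n\n"
    f"Question: {query}\nAnswer using only the context above."
)
print(top)
```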


The key insight: The LLM now has your specific documentation as context. It will answer based on your runbooks, not generic knowledge.

Multimodal RAG

Standard RAG works with text. Multimodal RAG extends this to images, tables, diagrams, and more.
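Conceptually, the trick is embedding every modality into one shared vector space so a single query searches all of it. The sketch below uses hypothetical stub encoders; a real system would use a CLIP-style multimodal model:

```python
# embed_text and embed_image are placeholder stubs standing in for a
# real multimodal encoder; the vectors they return are not meaningful.
def embed_text(text: str) -> list[float]:
    return [1.0, float(len(text))]    # placeholder, not a real embedding

def embed_image(path: str) -> list[float]:
    return [1.0, float(len(path))]    # placeholder

# One index holds both modalities side by side.
index = [
    ("text",  "etcd tuning runbook",   embed_text("etcd tuning runbook")),
    ("image", "diagrams/cluster.png",  embed_image("diagrams/cluster.png")),
]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = lambda v: sum(x * x for x in v) ** 0.5
    return dot / (norm(a) * norm(b))

# A single text query is ranked against every item, image or not.
query = embed_text("cluster architecture")
ranked = sorted(index, key=lambda item: cosine(query, item[2]), reverse=True)
```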


Strategy 3: Fine-Tuning

Fine-tuning means further training a pre-trained model on your task-specific data. The model's weights are actually updated — it learns new patterns specific to your domain.

When Fine-Tuning Makes Sense

Fine-Tuning Conceptual Example
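To make this concrete, here is what training data might look like in the chat-style JSONL format several hosted fine-tuning APIs accept. The incident summaries are invented, and a real dataset would need hundreds to thousands of such examples:

```python
import json

# Each example pairs an input with the exact output style we want the
# model to internalize; fine-tuning bakes this pattern into the weights.
training_examples = [
    {
        "messages": [
            {"role": "system", "content": "Summarize incidents in our standard format."},
            {"role": "user", "content": "Pod restarts spiked on payments after the 14:00 deploy."},
            {"role": "assistant", "content": "IMPACT: payments\nCAUSE: 14:00 deploy\nACTION: rollback"},
        ]
    },
    # ... hundreds more pairs with the same structure
]

# Fine-tuning APIs typically ingest one JSON object per line.
with open("train.jsonl", "w") as f:
    for example in training_examples:
        f.write(json.dumps(example) + "\n")
```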

Fine-Tuning vs RAG: The Decision
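A rule of thumb I find useful: facts belong in retrieval, behavior belongs in weights. Sketched as a decision helper (the 500-example threshold is illustrative, not a hard rule):

```python
def choose_strategy(needs_your_data: bool, data_changes_often: bool,
                    needs_consistent_style: bool, labeled_examples: int) -> str:
    if needs_your_data or data_changes_often:
        return "RAG"             # facts live in retrieval, easy to update
    if needs_consistent_style and labeled_examples >= 500:
        return "fine-tuning"     # behavior lives in the weights
    return "prompt engineering"  # the default starting point

print(choose_strategy(needs_your_data=True, data_changes_often=True,
                      needs_consistent_style=False, labeled_examples=0))
```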


Comparing All Three: Same Task, Three Approaches

Let's see how each strategy handles the same task: analyzing infrastructure health.

Prompt Engineering Approach
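With prompting alone, everything the model needs (role, output format, raw metrics) has to fit in the prompt itself. The metric values here are invented samples:

```python
metrics = {"cpu_pct": 91, "mem_pct": 78, "pod_restarts_1h": 14}

# All context is inlined: no retrieval, no custom weights.
prompt = (
    "You are an infrastructure health analyst.\n"
    "Given the metrics below, output exactly three lines:\n"
    "STATUS: <healthy|degraded|critical>\n"
    "REASON: <one sentence>\n"
    "ACTION: <one sentence>\n\n"
    f"Metrics: {metrics}"
)
print(prompt)
```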

RAG Approach
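The RAG version of the same task retrieves a relevant runbook snippet first and grounds the prompt in it. The retriever is reduced to a dictionary lookup here, and the runbook text is invented:

```python
runbooks = {
    "high_cpu": "If CPU exceeds 90%, check the batch-jobs namespace for runaway cronjobs.",
    "restarts": "Frequent pod restarts usually indicate OOMKills; raise memory limits.",
}

def retrieve_runbook(alert: str) -> str:
    # Stand-in for a vector search over the real knowledge base.
    return runbooks.get(alert, "No runbook found.")

context = retrieve_runbook("high_cpu")
prompt = (
    f"Context from our runbooks:\n{context}\n\n"
    "Analyze infrastructure health for the current high-CPU alert, "
    "using only the context above."
)
print(prompt)
```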

Fine-Tuning Approach
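With a fine-tuned model, the format and domain conventions live in the weights, so the prompt collapses to just the data. The `ft:infra-analyst-v1` model id is hypothetical:

```python
request = {
    "model": "ft:infra-analyst-v1",   # hypothetical fine-tuned model
    "messages": [{"role": "user",
                  "content": "cpu_pct=91 mem_pct=78 pod_restarts_1h=14"}],
}
# The model replies in the trained STATUS/REASON/ACTION format without
# being told to, because that behavior was learned during fine-tuning.
print(request["messages"][0]["content"])
```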

Side-by-Side Comparison


Combining Strategies: The Practical Approach

In my projects, I rarely use just one strategy. Here's how I combine them:
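The layering can be sketched as: prompt engineering always on, retrieval switched in when the question needs internal data. Both helpers below are illustrative stubs:

```python
def needs_internal_data(question: str) -> bool:
    # Toy router: a real one might classify the question with a model.
    return any(word in question.lower() for word in ("runbook", "our", "cluster"))

def answer(question: str) -> str:
    system = "You are a concise SRE assistant."   # layer 1: prompt engineering
    context = ""
    if needs_internal_data(question):             # layer 2: RAG, only when needed
        context = "Context:\n<retrieved runbook snippets>\n\n"
    return f"{system}\n\n{context}Question: {question}"

print("Context" in answer("Why is our cluster slow?"))
```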

The takeaway: Start with prompt engineering. Add RAG when the model needs your data. Consider fine-tuning only when the other two aren't enough. Most projects never need fine-tuning.


What's Next

Now that you understand how to customize AI models, we'll explore the most exciting frontier: AI Agents — systems that don't just generate text, but take actions. We'll cover agent architectures, communication protocols (MCP, A2A, gRPC), and human-in-the-loop patterns.


Next: Part 6 — AI Agents and Communication Protocols


