Part 2: Machine Learning, Deep Learning, and Foundation Models

Part of the AI Fundamentals 101 Series

Why These Three Matter

In Part 1, we mapped the AI landscape. Now we're going to zoom into the three layers that power almost every AI system you'll encounter as an engineer:

  1. Machine Learning — algorithms that learn from data

  2. Deep Learning — neural networks with many layers

  3. Foundation Models — massive pre-trained models that changed the industry

Understanding the differences between these isn't academic — it determines which tool you reach for when solving a real problem. I've wasted weeks using an LLM for a task that scikit-learn could handle in 10 lines. And I've wasted days trying to train a classifier when I should have just prompted Claude. The distinction matters.


Machine Learning: Learning Rules from Data

Machine learning is the approach where, instead of programming explicit rules, you give an algorithm data and let it learn the patterns.

The Three Types of Machine Learning

1. Supervised Learning — "Learn from Labeled Examples"

You provide input-output pairs. The algorithm learns the mapping.

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score

# Load a dataset with labels
iris = load_iris()
X, y = iris.data, iris.target

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train: the model learns the relationship between features and labels
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Predict on data the model has never seen
predictions = model.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, predictions):.2%}")
# Accuracy: 100.00%

Real use cases I've encountered:

  • Predicting whether a server alert is a real issue or noise (classification)

  • Estimating build times for CI/CD pipelines based on code change size (regression)

  • Detecting anomalous network traffic patterns (classification)

2. Unsupervised Learning — "Find Structure in Unlabeled Data"

No labels. The algorithm discovers patterns on its own.

Real use cases:

  • Grouping similar error log messages together (clustering)

  • Reducing high-dimensional monitoring data for visualization (dimensionality reduction)

  • Detecting unusual patterns in server behavior (anomaly detection)
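
Clustering like this takes only a few lines with scikit-learn. A minimal sketch using KMeans on synthetic data (the blobs stand in for real feature vectors such as log-message embeddings or server metrics):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Generate 300 unlabeled 2-D points scattered around 3 hidden centers
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

# KMeans discovers the grouping without ever seeing a label
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)

print(labels[:10])                    # cluster id (0-2) for the first 10 points
print(kmeans.cluster_centers_.shape)  # (3, 2): one 2-D center per cluster
```

The only thing you choose up front is the number of clusters; the algorithm finds the groups on its own.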

3. Reinforcement Learning — "Learn by Trial and Reward"

The agent takes actions in an environment and receives rewards or penalties. It learns the best strategy over time.

Real use cases:

  • Game AI (AlphaGo, game bots)

  • Robotics (learning to walk, grasp objects)

  • Resource optimization (dynamic scaling, cache policies)
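
The trial-and-reward loop can be sketched with the simplest RL setting: a multi-armed bandit. In this toy example (the payout probabilities are invented), an epsilon-greedy agent mostly exploits the best-known arm and occasionally explores:

```python
import random

# A 3-armed bandit: each arm pays 1 with a hidden probability (invented values)
true_probs = [0.2, 0.5, 0.8]
estimates = [0.0] * 3   # the agent's running reward estimate per arm
counts = [0] * 3
epsilon = 0.1           # explore 10% of the time, exploit 90%
random.seed(0)

for step in range(5000):
    if random.random() < epsilon:
        arm = random.randrange(3)                            # explore
    else:
        arm = max(range(3), key=lambda a: estimates[a])      # exploit
    reward = 1 if random.random() < true_probs[arm] else 0
    counts[arm] += 1
    # Incremental average: nudge the estimate toward the observed reward
    estimates[arm] += (reward - estimates[arm]) / counts[arm]

print(max(range(3), key=lambda a: estimates[a]))  # prints 2, the best-paying arm
```

No one labels any data here: the agent discovers the best strategy purely from the reward signal, which is exactly the dynamic behind game AI and scaling policies, just at a much larger scale.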

When to Use Which Type

| Type | You Need | You Get | Example |
| --- | --- | --- | --- |
| Supervised | Labeled data (input → output) | Predictions on new data | "Is this server going to crash?" |
| Unsupervised | Unlabeled data | Discovered patterns | "What groups do these servers fall into?" |
| Reinforcement | An environment + reward signal | An optimal strategy | "What's the best scaling policy?" |


Deep Learning: Neural Networks with Many Layers

Deep learning is a subset of machine learning that uses neural networks with multiple layers to learn increasingly abstract representations of data.

What Makes It "Deep"?

A shallow model (like logistic regression) learns a single transformation: input → output. A deep model learns a hierarchy of transformations, each one building on the last.

For example, in image recognition:

  • Layer 1 detects edges

  • Layer 2 combines edges into shapes

  • Layer 3 combines shapes into parts (eyes, wheels)

  • Layer 4 combines parts into objects (face, car)

Each layer builds on the previous one, learning more complex and abstract features.

A Simple Neural Network in Python
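
A minimal sketch of the contrast, using scikit-learn's MLPClassifier as the deep model and logistic regression as the shallow baseline, on the classic two-moons dataset (synthetic data chosen because it is not linearly separable):

```python
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Two interleaved half-moons: impossible to separate with a straight line
X, y = make_moons(n_samples=1000, noise=0.2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Shallow model: a single linear transformation
shallow = LogisticRegression().fit(X_train, y_train)

# Deep model: two hidden layers learn a curved decision boundary
deep = MLPClassifier(hidden_layer_sizes=(16, 16), max_iter=2000,
                     random_state=42).fit(X_train, y_train)

print(f"Shallow accuracy: {shallow.score(X_test, y_test):.2%}")
print(f"Deep accuracy:    {deep.score(X_test, y_test):.2%}")
```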

The deep model captures the non-linear decision boundary that the shallow model, limited to a single linear transformation, simply cannot represent.

Key Deep Learning Architectures

| Architecture | Best For | How It Works |
| --- | --- | --- |
| CNN (Convolutional Neural Network) | Images, spatial data | Slides filters across input to detect patterns |
| RNN (Recurrent Neural Network) | Sequences, time series | Processes input one step at a time, maintaining hidden state |
| LSTM (Long Short-Term Memory) | Long sequences | RNN with memory gates that can remember/forget over long distances |
| Transformer | Language, multimodal | Processes entire sequences in parallel using an attention mechanism |
| GAN (Generative Adversarial Network) | Image generation | Two networks compete: a generator creates, a discriminator judges |
| Autoencoder | Compression, anomaly detection | Learns to compress and reconstruct data |

When Deep Learning Beats Traditional ML

The rule of thumb:

  • < 10,000 samples, < 50 features? Start with traditional ML (Random Forest, XGBoost)

  • Images, text, audio, or massive tabular data? Deep learning

  • Need interpretability? Traditional ML (decision trees, logistic regression)

  • Need maximum accuracy and have compute? Deep learning


Foundation Models: The Paradigm Shift

Foundation models are the biggest shift in AI engineering in the last decade. Understanding them is essential.

What is a Foundation Model?

A foundation model is a large model trained on broad data that can be adapted to many downstream tasks. The "foundation" metaphor is deliberate — it's a base you build on.

Examples of foundation models:

  • GPT-4, Claude, LLaMA — language understanding and generation

  • DALL-E, Stable Diffusion — image generation

  • Whisper — speech recognition

  • CLIP — connecting images and text

  • Codex/Copilot — code generation

What Changed

Before foundation models, every task required its own model trained on task-specific data: one model for sentiment, another for summarization, another for translation.

With foundation models, one model handles many tasks through different prompts.
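
The shift is easiest to see in code. Here `llm()` is a stub standing in for a real foundation-model API call (the function, its output, and the log line are all hypothetical); the point is that the task lives in the prompt, not in the model weights:

```python
def llm(prompt: str) -> str:
    """Stub for a real foundation-model API call (hypothetical)."""
    return f"<model response to: {prompt[:30]}...>"

log_line = "ERR disk /dev/sda1 98% full"

# One model, many tasks: only the prompt changes
tasks = {
    "classify":  f"Is this log line an error or routine info? {log_line}",
    "summarize": f"Summarize this log line in plain English: {log_line}",
    "extract":   f"Extract the device name from this log line: {log_line}",
}
for name, prompt in tasks.items():
    print(f"{name}: {llm(prompt)}")
```

In the old paradigm, each entry in that dictionary would have been a separate model with its own labeled training set.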

Why Foundation Models Work

Three ingredients came together:

  1. Scale — GPT-3 was trained on ~300 billion tokens of internet text. The model saw enough language to learn grammar, facts, reasoning patterns, and coding conventions.

  2. Self-supervised learning — The model learns by predicting the next word in a sequence. No human labels needed for training data.

  3. Transfer learning — Knowledge learned in one context transfers to another. A model that learned to summarize news articles can also summarize error logs — the underlying skill (compression, key point extraction) is the same.

The Foundation Model Stack

The stack runs from pre-training at the bottom, through adaptation (fine-tuning) in the middle, to applications (prompting) at the top. As an engineer, you typically work in the top two layers. You don't train foundation models — you use and adapt them.


Comparing All Three: ML vs DL vs Foundation Models

Decision Framework

| Factor | Traditional ML | Deep Learning | Foundation Model |
| --- | --- | --- | --- |
| Training data | Hundreds–thousands | Thousands–millions | Zero (few-shot) |
| Training time | Minutes | Hours–days | None (pre-trained) |
| Inference cost | Near-zero | Low | $0.001–0.10+ per call |
| Interpretability | High | Low | Very low |
| Customization | Full control | Full control | Prompt/fine-tune only |
| Accuracy (small data) | Good | Poor | Good (via in-context learning) |
| Accuracy (large data) | Good | Excellent | Excellent |
| Handles new patterns | No (needs retraining) | No (needs retraining) | Yes (generalization) |

My personal rule:

  1. Can I solve this with 10 lines of scikit-learn? → Traditional ML

  2. Do I have images, audio, or need to learn complex patterns from massive data? → Deep Learning

  3. Do I need language understanding, generation, or zero-shot capability? → Foundation Model

  4. Am I unsure? → Start with traditional ML, upgrade if it's not enough


Ten Real-World ML Use Cases You Encounter Daily

Understanding these helps you recognize AI in the wild.


From My Own Experience: Choosing the Right Level

When I started my home server monitoring project, I went through all three levels:

Attempt 1 — Traditional ML: I collected CPU, memory, and disk metrics over two weeks, labeled them (alert/no-alert), and trained a Random Forest. It worked well — 91% accuracy — but it couldn't handle new types of issues it hadn't seen in training.

Attempt 2 — Deep Learning: I tried an LSTM to capture time-series patterns. It improved accuracy to 94% but required significantly more data and compute for marginal gains on my small dataset. Overkill for my use case.

Attempt 3 — Foundation Model (current): I added an LLM layer that takes the prediction from my Random Forest and the raw metrics, then generates a human-readable diagnosis. The ML model handles the fast, cheap detection. The LLM handles the nuanced explanation.

The sweet spot is often a hybrid: traditional ML for the fast, repeatable classification plus a foundation model for the parts that need language understanding.
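
That hybrid pattern can be sketched roughly like this (the metrics, the alert rule, and the prompt are all invented for illustration, and the LLM call is left as a stub):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Toy training data: cpu%, mem%, disk% with a simple invented alert rule
rng = np.random.default_rng(42)
X = rng.uniform(0, 100, size=(500, 3))
y = (X.max(axis=1) > 90).astype(int)   # "alert" if any metric exceeds 90%
detector = RandomForestClassifier(random_state=42).fit(X, y)

def diagnose(cpu, mem, disk):
    # Fast, cheap path: the ML model screens every sample
    if detector.predict([[cpu, mem, disk]])[0] == 0:
        return "ok"
    # Slow, expensive path: only flagged samples reach the LLM (stubbed here,
    # where a real system would call a model API for the diagnosis)
    return f"LLM prompt: explain alert for cpu={cpu} mem={mem} disk={disk}"

print(diagnose(10, 20, 30))   # healthy sample never touches the LLM
print(diagnose(99, 20, 30))   # alert is routed to the LLM for explanation
```

The key design choice: the expensive language model only sees the small fraction of traffic the cheap classifier flags, which keeps both latency and API cost low.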


What's Next

Now that you understand the three layers of modern AI (ML → DL → Foundation Models), we'll explore one of the most practically useful branches: Natural Language Processing — the technology that lets machines understand, process, and generate human language.


Next: Part 3 — Natural Language Processing: NLP, NLU, and NLG

