# Part 1: What is Artificial Intelligence?

*Part of the* [*AI Fundamentals 101 Series*](https://blog.htunnthuthu.com/ai-and-machine-learning/artificial-intelligence/ai-fundamentals-101)

## The Question That Tripped Me Up

A colleague once asked me: "You keep saying you're building AI systems — but is it actually AI?" And honestly, I paused. I had built a monitoring tool that used an LLM to analyze Kubernetes logs. An automation system that used ML to predict server failures. A chatbot that answered questions about my documentation.

**Were any of those "real AI"?**

The answer is yes — but understanding *why* requires us to go back to the beginning and define what AI actually means. And it turns out, the definition has changed dramatically over the decades.

***

## What is Artificial Intelligence?

**Artificial Intelligence (AI)** is the field of computer science focused on creating systems that can perform tasks that typically require human intelligence — things like understanding language, recognizing patterns, making decisions, and learning from experience.

That's the textbook definition. Here's the practical one:

> **AI is any system that takes in data, processes it in a way that mimics some aspect of human cognition, and produces an output that would normally require a human to generate.**

When I built a system that reads Prometheus alerts and writes a root cause analysis in natural language — that's AI. The system understands context (alert metrics + cluster state), reasons about what went wrong, and generates a human-readable explanation.

When I wrote a Python script that sends an email when CPU > 90% — that's not AI. It's a conditional check with no learning, reasoning, or adaptation.

The line between "AI" and "just code" comes down to: **does the system learn, reason, or adapt based on data?**
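
The distinction is easiest to see side by side. A minimal sketch with made-up numbers, purely for illustration: the first function is a hard-coded threshold that never changes; the second learns where its boundary should be from labeled history, and that boundary would shift if the history changed.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# "Just code": a fixed threshold, no learning, no adaptation
def rule_based_alert(cpu: float) -> bool:
    return cpu > 90.0

# AI: a model that learns the boundary from labeled history (toy data)
history_cpu = np.array([[50.0], [60.0], [70.0], [85.0], [92.0], [95.0], [98.0]])
had_incident = np.array([0, 0, 0, 0, 1, 1, 1])  # did an incident follow each reading?

model = LogisticRegression().fit(history_cpu, had_incident)

print(rule_based_alert(95.0))       # hard-coded answer
print(model.predict([[99.0]])[0])   # learned answer; retraining on new data changes it
```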

***

## A Brief History of AI

Understanding where AI came from helps you understand why the field looks the way it does today.

### 1950s–1960s: The Birth of AI (Symbolic AI)

AI started with a bold idea: encode human knowledge as rules and let computers reason with them.

```
IF patient has fever AND cough AND fatigue
THEN diagnosis = flu (confidence: 0.8)
```

These were called **expert systems** — hand-coded if-else trees written by domain experts. They worked for narrow problems but fell apart the moment they encountered a situation nobody had written a rule for.

I've actually hit this exact limitation myself. Before I learned ML, I tried writing rule-based alerting for my home server:

```python
# My first attempt at "intelligent" monitoring (2019)
def diagnose_alert(cpu: float, memory: float, disk_io: float) -> str:
    if cpu > 90 and memory > 80:
        return "Resource exhaustion — consider scaling"
    elif cpu > 90 and disk_io > 70:
        return "IO-bound process — check database queries"
    elif memory > 90:
        return "Memory leak — check container limits"
    else:
        return "Investigate manually"
```

This covered maybe 30% of real scenarios. Every new pattern required a new rule. **The symbolic AI approach doesn't scale.**

### 1970s–1980s: AI Winters

The hype outpaced reality. Governments and corporations invested billions, systems underdelivered, and funding dried up. Two major "AI winters" followed, in the 1970s and the late 1980s.

**The lesson:** AI progress is not linear. Hype cycles are real.

### 1990s–2000s: Machine Learning Takes Over

Instead of hand-coding rules, researchers asked: **what if the computer could learn the rules from data?**

Machine learning algorithms like decision trees, support vector machines, and random forests could learn patterns from labeled examples. You didn't need to be a domain expert to build a predictive model — you needed data.

```python
from sklearn.ensemble import RandomForestClassifier

# Instead of writing rules, let the algorithm learn from data
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)  # Learn patterns from historical data
predictions = clf.predict(X_test)  # Apply learned patterns to new data
```

**The shift:** From "program the rules" to "program the learning algorithm and let it find the rules."

### 2010s: Deep Learning Breakthrough

Neural networks had existed since the 1950s, but they needed three things to work at scale:

1. **Massive datasets** (ImageNet, Common Crawl, Wikipedia)
2. **GPU compute** (NVIDIA CUDA made parallel computation practical)
3. **Better architectures** (CNNs for images, RNNs for sequences, then transformers)

In 2012, AlexNet won the ImageNet challenge by a huge margin using a deep convolutional neural network. That moment is widely considered the start of the deep learning revolution.

### 2017: Transformers Change Everything

Google published "Attention Is All You Need" — the paper that introduced the **transformer architecture**. Transformers process entire sequences in parallel (not token-by-token like RNNs), making them dramatically faster to train and better at capturing long-range relationships in text.

Every major AI system you use today — GPT, Claude, Gemini, LLaMA — is built on transformers.
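
The mechanism behind that parallelism is attention: every token scores its relevance to every other token in a single matrix multiplication, with no sequential loop over positions. A toy NumPy sketch of scaled dot-product attention (random toy shapes, nowhere near a real model's scale):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """One attention pass: every position attends to every other, all at once."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # (seq, seq): each token's similarity to every token
    # Numerically stable softmax over the key dimension
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V  # each output row is a weighted mix of ALL positions

# Toy example: 4 tokens, 8-dimensional embeddings
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (4, 8)
```

An RNN would need 4 sequential steps to touch 4 tokens; here the whole sequence is handled in one batched operation, which is what makes GPU training of transformers so efficient.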

### 2020s: Generative AI and the Agentic Era

GPT-3 (2020) showed that sufficiently large transformer models could generate coherent text, write code, and answer questions. ChatGPT (late 2022) put that capability in front of millions of users.

Now we're in the **agentic AI** era — systems that don't just generate text, but take actions: browse the web, execute code, call APIs, and orchestrate multi-step workflows.

```
Symbolic AI (1950s) → Machine Learning (1990s) → Deep Learning (2010s) 
→ Generative AI (2020s) → Agentic AI (2024+)
```

***

## The 7 Types of AI

AI is categorized in two different ways: **by capability** (what it can do) and **by functionality** (how it works internally).

### By Capability

#### 1. Artificial Narrow Intelligence (ANI) — Weak AI

**Every AI system that exists today is ANI.** It excels at one specific task but can't generalize to others.

Examples:

* GPT-4 generates excellent text, but can't drive a car
* AlphaGo beat the world champion at Go, but can't play chess without being retrained
* My home server anomaly detector predicts CPU spikes, but knows nothing about network issues

```python
# This is ANI — it does ONE thing well
from sklearn.linear_model import LogisticRegression

# Model trained specifically to detect spam
spam_detector = LogisticRegression()
spam_detector.fit(email_features, spam_labels)

# It can tell you if an email is spam.
# It cannot book your flights, write your code, or diagnose your server.
```

#### 2. Artificial General Intelligence (AGI) — Strong AI

A system that can perform **any** intellectual task a human can, across domains, with the ability to transfer knowledge between them. **AGI does not exist yet.** Despite the hype, no current system — including GPT-4, Claude, or Gemini — is AGI.

The key test: Can it learn a completely new domain (say, molecular biology) with the same efficiency a human can, without being specifically trained on biology data? Current models can't.

#### 3. Artificial Super Intelligence (ASI)

A hypothetical system that surpasses human intelligence in every domain. This is purely theoretical and the subject of much philosophical debate. Not relevant to engineering today.

### By Functionality

#### 4. Reactive Machines

The simplest form. They respond to inputs with no memory of past interactions.

**Example:** IBM's Deep Blue chess engine. It evaluates the current board position and picks the best move. It doesn't remember previous games or learn from them.

```python
# Reactive: no memory, just input → output
def reactive_classifier(features: list[float]) -> str:
    """Classify based on current input only."""
    model = load_pretrained_model()
    return model.predict([features])[0]

# Every call is independent — no history, no adaptation
```

#### 5. Limited Memory

Most modern AI systems fall here. They use historical data during training and can maintain context within a session (like an LLM's context window), but they don't continuously learn from new interactions.

**Example:** An LLM that remembers our conversation within a session but starts fresh next time.

```python
# Limited memory: maintains context within a session
class ConversationAgent:
    def __init__(self):
        self.history: list[dict] = []  # Short-term memory

    def chat(self, user_message: str) -> str:
        self.history.append({"role": "user", "content": user_message})
        # Model sees the full history for this session
        response = call_llm(messages=self.history)
        self.history.append({"role": "assistant", "content": response})
        return response

    # But when this object is destroyed, the memory is gone
```

#### 6. Theory of Mind

A system that understands emotions, beliefs, and intentions. Current AI can simulate this (an LLM can say "I understand you're frustrated") but doesn't truly comprehend emotional states. This type remains an open research problem.

#### 7. Self-Aware

A system with consciousness and self-awareness. This is purely theoretical and raises deep philosophical questions about what consciousness even means.

***

## The Modern AI Landscape — A Map

Here's how the major AI concepts relate to each other:

```
Artificial Intelligence (the whole field)
├── Machine Learning (learning from data)
│   ├── Supervised Learning (labeled data → predictions)
│   ├── Unsupervised Learning (find patterns in unlabeled data)
│   ├── Reinforcement Learning (learn by trial and reward)
│   └── Deep Learning (neural networks with many layers)
│       ├── CNNs (images, spatial data)
│       ├── RNNs/LSTMs (sequences, time series)
│       └── Transformers (language, multimodal)
│           ├── Large Language Models (GPT, Claude, LLaMA)
│           └── Vision Transformers (image understanding)
├── Natural Language Processing (understanding human language)
│   ├── NLU — Natural Language Understanding
│   ├── NLG — Natural Language Generation
│   └── Machine Translation
├── Computer Vision (understanding images/video)
├── Robotics (physical world interaction)
└── Generative AI (creating new content)
    ├── Text generation (LLMs)
    ├── Image generation (DALL-E, Stable Diffusion)
    ├── Code generation (Copilot, Cursor)
    └── Multimodal (text + image + audio)
```

This tree is what I wish someone had shown me when I started. Every AI tool, framework, and paper sits somewhere in this tree.

***

## Key Terminology Every Engineer Needs

Before going deeper, let's nail down the terms you'll see throughout this series and every AI discussion.

### Model

A mathematical function that maps inputs to outputs. When someone says "the model," they mean the trained artifact — the weights and architecture that represent learned patterns.

```python
from sklearn.tree import DecisionTreeClassifier

model = DecisionTreeClassifier()
model.fit(X_train, y_train)  # Train: learn patterns from data

# Now 'model' contains learned decision rules
# It IS the model — the trained artifact
prediction = model.predict(X_new)  # Inference: apply learned patterns
```

### Training vs Inference

* **Training:** The process of showing data to an algorithm so it learns patterns. Expensive, slow, done once (or periodically).
* **Inference:** Using a trained model to make predictions on new data. Fast, cheap, done millions of times.

```python
# Training (expensive — minutes to hours)
model.fit(X_train, y_train)

# Inference (cheap — milliseconds)
result = model.predict(new_data)
```

When you call the OpenAI API, you're doing inference. OpenAI already did the training.

### Parameters and Weights

**Parameters** (or **weights**) are the numbers inside a model that get adjusted during training. When you hear "GPT-4 has \~1.8 trillion parameters," it means 1.8 trillion numbers that were tuned to minimize prediction errors.

```python
from sklearn.linear_model import LinearRegression
import numpy as np

X = np.array([[1], [2], [3], [4], [5]])
y = np.array([2, 4, 6, 8, 10])

model = LinearRegression()
model.fit(X, y)

# These ARE the learned parameters
print(f"Weight (slope): {model.coef_[0]:.2f}")     # 2.00
print(f"Bias (intercept): {model.intercept_:.2f}")  # 0.00

# The model learned: y = 2x + 0
```

For a linear regression, there are just 2 parameters. For GPT-4, there are trillions. The concept is the same — numbers that get adjusted to fit the data.

### Hyperparameters

Settings you choose *before* training that control *how* the model learns. They're not learned from data — you set them.

```python
from sklearn.ensemble import RandomForestClassifier

# These are hyperparameters — I chose them
model = RandomForestClassifier(
    n_estimators=100,    # How many trees
    max_depth=10,        # How deep each tree can grow
    min_samples_split=5, # Minimum samples to split a node
    random_state=42
)

# These are learned parameters (set during .fit())
model.fit(X_train, y_train)
# model.estimators_ — the actual tree structures, learned from data
```

### Features and Labels

* **Features:** The input variables (what you feed the model)
* **Labels:** The output variable (what you want the model to predict)

```python
import pandas as pd

data = pd.DataFrame({
    "cpu_usage": [45, 92, 30, 88, 55],
    "memory_pct": [60, 85, 40, 90, 50],
    "disk_io": [20, 75, 15, 80, 30],
    "alert": [0, 1, 0, 1, 0]  # 0 = no alert, 1 = alert
})

features = data[["cpu_usage", "memory_pct", "disk_io"]]  # Input (X)
labels = data["alert"]                                     # Output (y)
```

### Epoch, Batch, and Iteration

* **Epoch:** One complete pass through the entire training dataset
* **Batch:** A subset of the training data processed in one forward/backward pass
* **Iteration:** One update of the model's weights (one batch processed)

```python
# If you have 1,000 training samples and batch_size=100:
# - 1 epoch = 10 iterations (1000 / 100)
# - Each iteration processes 100 samples
# - After 1 epoch, the model has seen ALL 1,000 samples once
```
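
That arithmetic is easy to verify with a small loop (a toy sketch; a real training loop would update the model's weights once per iteration, inside the inner loop):

```python
import numpy as np

X = np.arange(1000).reshape(1000, 1)  # 1,000 training samples
batch_size, n_epochs = 100, 3
iterations = 0

for epoch in range(n_epochs):              # one epoch = one full pass over X
    for start in range(0, len(X), batch_size):
        batch = X[start:start + batch_size]  # one batch -> one weight update
        iterations += 1                      # one iteration

print(iterations)  # 30: (1000 / 100) iterations per epoch x 3 epochs
```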

### Overfitting and Underfitting

* **Overfitting:** The model memorized the training data but can't generalize to new data. Like studying only past exam answers and failing when questions change.
* **Underfitting:** The model is too simple to capture the patterns. Like trying to fit a straight line to a curve.

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=500, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Overfitting: no constraints, memorizes training data
overfit_model = DecisionTreeClassifier(max_depth=None)
overfit_model.fit(X_train, y_train)
print(f"Train: {overfit_model.score(X_train, y_train):.3f}")  # ~1.000 (perfect!)
print(f"Test:  {overfit_model.score(X_test, y_test):.3f}")    # ~0.850 (much worse)

# Underfitting: too constrained, can't learn patterns
underfit_model = DecisionTreeClassifier(max_depth=1)
underfit_model.fit(X_train, y_train)
print(f"Train: {underfit_model.score(X_train, y_train):.3f}")  # ~0.750
print(f"Test:  {underfit_model.score(X_test, y_test):.3f}")    # ~0.730

# Good fit: balanced constraints
balanced_model = DecisionTreeClassifier(max_depth=5)
balanced_model.fit(X_train, y_train)
print(f"Train: {balanced_model.score(X_train, y_train):.3f}")  # ~0.950
print(f"Test:  {balanced_model.score(X_test, y_test):.3f}")    # ~0.910
```

***

## AI vs Machine Learning — Getting the Relationship Right

This is one of the most common sources of confusion, so let me be precise:

* **AI** is the broad field of making machines exhibit intelligent behavior
* **Machine Learning** is a subset of AI — a specific approach that learns from data
* **Deep Learning** is a subset of ML — neural networks with multiple layers
* **Generative AI** is a subset of DL — models that generate new content

```
AI ⊃ ML ⊃ Deep Learning ⊃ Generative AI
```

Not all AI is ML. A rule-based expert system is AI but not ML. A chess engine with hand-coded evaluation functions is AI but not ML.

But today, nearly all practical AI uses ML in some form. When people say "AI," they almost always mean ML or LLMs.
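
To see the "AI but not ML" case concretely, here's a toy hand-coded evaluation function in the spirit of classic chess engines — a material count only, whereas real engines add many more hand-tuned terms. The intelligence is entirely encoded as fixed rules; nothing is learned from data.

```python
# Hand-coded chess evaluation: AI behavior from fixed rules, zero learning
PIECE_VALUES = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9, "K": 0}

def evaluate(pieces: str) -> int:
    """Material balance from White's perspective.

    Uppercase letters are White's pieces, lowercase are Black's.
    """
    score = 0
    for ch in pieces:
        if ch.upper() in PIECE_VALUES:
            value = PIECE_VALUES[ch.upper()]
            score += value if ch.isupper() else -value
    return score

print(evaluate("RNBQKBNRrnbqkbnr"))  # 0 -> balanced material
print(evaluate("QRRk"))              # 19 -> White up a queen and two rooks
```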

***

## Why This Matters for Engineers

You might be thinking: "I just want to build things. Why do I need to know the history and taxonomy?"

Here's why: **the taxonomy tells you what tool to reach for.**

When I encounter a problem, the first question is: what *type* of AI problem is this?

| Problem                                | Type                           | Tool                       |
| -------------------------------------- | ------------------------------ | -------------------------- |
| "Predict if this server will crash"    | Supervised ML (classification) | scikit-learn, XGBoost      |
| "Group similar error logs together"    | Unsupervised ML (clustering)   | scikit-learn, HDBSCAN      |
| "Summarize this incident report"       | NLG (generative AI)            | LLM (Claude, GPT)          |
| "Extract entities from log messages"   | NLP (NER)                      | spaCy, transformers        |
| "Generate a Kubernetes manifest"       | Code generation                | LLM with structured output |
| "Auto-remediate infrastructure issues" | Agentic AI                     | Agent framework + tools    |

Knowing the landscape means you don't bring an LLM to a problem that scikit-learn solves in 5 lines. And you don't try to train a classifier when you need natural language generation.
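
To make that concrete, here's the kind of few-line scikit-learn baseline worth trying before reaching for an LLM. The data is synthetic stand-in data; swap in your own features and labels.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for "will this server crash?" labeled metrics
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

clf = RandomForestClassifier(random_state=42).fit(X_train, y_train)
print(f"Test accuracy: {clf.score(X_test, y_test):.2f}")
```

If a baseline like this hits acceptable accuracy, you've solved the problem for pennies per million predictions; an LLM would be slower, costlier, and no more accurate at tabular classification.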

***

## Practical Example: Classifying the AI in My Own Projects

Let me classify the AI components in projects I've actually built:

```python
my_projects = {
    "Home Server Anomaly Detector": {
        "ai_type": "Supervised ML (classification)",
        "model": "RandomForestClassifier",
        "what_it_does": "Predicts CPU spikes from historical metrics",
        "not_ai": "The Prometheus alerting rules — those are just thresholds",
    },
    "DevOps Knowledge Base RAG": {
        "ai_type": "Generative AI + Information Retrieval",
        "model": "Embeddings (sentence-transformers) + LLM (Claude)",
        "what_it_does": "Answers questions about my documentation using retrieved context",
        "not_ai": "The PostgreSQL queries — those are just database operations",
    },
    "Kubernetes Log Analyzer": {
        "ai_type": "NLP + NLG",
        "model": "LLM (Claude via API)",
        "what_it_does": "Reads error logs, identifies root cause, writes human-readable report",
        "not_ai": "The log collection pipeline — that's just Fluentd config",
    },
    "Ansible Automation with TinyLlama": {
        "ai_type": "Agentic AI (simple)",
        "model": "TinyLlama (local) + tool execution",
        "what_it_does": "Interprets natural language commands and generates playbooks",
        "not_ai": "The Ansible execution itself — the LLM generates it, Ansible runs it",
    }
}

for name, details in my_projects.items():
    print(f"\n{'='*50}")
    print(f"Project: {name}")
    print(f"  AI Type: {details['ai_type']}")
    print(f"  Model: {details['model']}")
    print(f"  AI Part: {details['what_it_does']}")
    print(f"  Not AI: {details['not_ai']}")
```

**The pattern:** In every project, the AI component is surrounded by non-AI code — data pipelines, API endpoints, database queries, configuration. Understanding where AI starts and ends in your system is fundamental.

***

## Common Misconceptions

### "AI will replace all programmers"

AI is a tool. LLMs generate code, but they can't architect systems, debug production issues with full context, or make trade-off decisions based on business requirements. They're incredibly useful assistants, not replacements.

### "More data always means better AI"

Quality matters more than quantity. I've gotten better results from 1,000 clean, well-labeled examples than from 100,000 noisy ones. Data cleaning is often 70% of an ML project.

### "AI understands what it's doing"

Current AI systems are sophisticated pattern matchers, not conscious entities. When Claude writes a helpful explanation, it's generating text based on patterns in its training data — not "understanding" the concept the way you do.

### "You need a PhD to work with AI"

You need to understand the fundamentals (which this series covers), but you don't need to derive backpropagation by hand. The tools — scikit-learn, transformers, LLM APIs — abstract away the math. Focus on knowing when and how to use each tool.

### "AI is new"

The field is 70+ years old. What's new is that compute, data, and architectures have converged to make AI practically useful at scale. The ideas behind neural networks date back to 1943.

***

## What's Next

Now that you understand what AI is, its history, and the major categories, we'll dive into the most impactful subset: **Machine Learning, Deep Learning, and Foundation Models** — the technologies that power everything from spam filters to ChatGPT.

***

*Next:* [*Part 2 — Machine Learning, Deep Learning, and Foundation Models*](https://blog.htunnthuthu.com/ai-and-machine-learning/artificial-intelligence/ai-fundamentals-101/part-2-ml-dl-foundation-models)

***

[← Back to Series Overview](https://blog.htunnthuthu.com/ai-and-machine-learning/artificial-intelligence/ai-fundamentals-101) · [Next →](https://blog.htunnthuthu.com/ai-and-machine-learning/artificial-intelligence/ai-fundamentals-101/part-2-ml-dl-foundation-models)
