Part 1: What is Artificial Intelligence?
Part of the AI Fundamentals 101 Series
The Question That Tripped Me Up
A colleague once asked me: "You keep saying you're building AI systems, but is it actually AI?" And honestly, I paused. I had built a monitoring tool that used an LLM to analyze Kubernetes logs. An automation system that used ML to predict server failures. A chatbot that answered questions about my documentation.
Were any of those "real AI"?
The answer is yes, but understanding why requires us to go back to the beginning and define what AI actually means. And it turns out, the definition has changed dramatically over the decades.
What is Artificial Intelligence?
Artificial Intelligence (AI) is the field of computer science focused on creating systems that can perform tasks that typically require human intelligence: things like understanding language, recognizing patterns, making decisions, and learning from experience.
That's the textbook definition. Here's the practical one:
AI is any system that takes in data, processes it in a way that mimics some aspect of human cognition, and produces an output that would normally require a human to generate.
When I built a system that reads Prometheus alerts and writes a root cause analysis in natural language, that's AI. The system understands context (alert metrics + cluster state), reasons about what went wrong, and generates a human-readable explanation.
When I wrote a Python script that sends an email when CPU > 90%, that's not AI. It's a conditional check with no learning, reasoning, or adaptation.
The line between "AI" and "just code" comes down to: does the system learn, reason, or adapt based on data?
A Brief History of AI
Understanding where AI came from helps you understand why the field looks the way it does today.
1950sβ1960s: The Birth of AI (Symbolic AI)
AI started with a bold idea: encode human knowledge as rules and let computers reason with them.
These were called expert systems: hand-coded if-else trees written by domain experts. They worked for narrow problems but fell apart the moment they hit a situation nobody had written a rule for.
I've actually hit this exact limitation myself. Before I learned ML, I tried writing rule-based alerting for my home server:
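A minimal sketch of that kind of rule set; the thresholds and metric names here are illustrative, not my real config:

```python
# Hand-written alerting rules: every scenario needs its own branch,
# and anything nobody anticipated falls through to "unknown".
def classify_alert(metrics):
    if metrics["cpu"] > 90 and metrics["mem"] > 85:
        return "resource exhaustion: check for runaway processes"
    elif metrics["disk_io_wait"] > 30:
        return "disk bottleneck: check for heavy writes"
    elif metrics["net_errors"] > 100:
        return "network issue: check NIC / cabling"
    else:
        return "unknown: no rule matches"  # where most real incidents land

print(classify_alert({"cpu": 95, "mem": 90, "disk_io_wait": 5, "net_errors": 0}))
```

Every new failure pattern means another elif, written after the fact.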
This covered maybe 30% of real scenarios. Every new pattern required a new rule. The symbolic AI approach doesn't scale.
1970sβ1980s: AI Winters
The hype outpaced reality. Governments and corporations invested billions, systems underdelivered, funding dried up. Two major "AI winters" followed in the 1970s and late 1980s.
The lesson: AI progress is not linear. Hype cycles are real.
1990sβ2000s: Machine Learning Takes Over
Instead of hand-coding rules, researchers asked: what if the computer could learn the rules from data?
Machine learning algorithms like decision trees, support vector machines, and random forests could learn patterns from labeled examples. You didn't need to be a domain expert to build a predictive model; you needed data.
The shift: From "program the rules" to "program the learning algorithm and let it find the rules."
2010s: Deep Learning Breakthrough
Neural networks had existed since the 1950s, but they needed three things to work at scale:
Massive datasets (ImageNet, Common Crawl, Wikipedia)
GPU compute (NVIDIA CUDA made parallel computation practical)
Better architectures (CNNs for images, RNNs for sequences, then transformers)
In 2012, AlexNet won the ImageNet challenge by a huge margin using a deep convolutional neural network. That moment is widely considered the start of the deep learning revolution.
2017: Transformers Change Everything
Google published "Attention Is All You Need", the paper that introduced the transformer architecture. This architecture could process entire sequences of text in parallel (not word-by-word like RNNs), making it dramatically faster and more effective.
Every major AI system you use today (GPT, Claude, Gemini, LLaMA) is built on transformers.
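To make "process the whole sequence at once" concrete, here is a toy scaled dot-product attention, the core operation of a transformer, in plain Python. The token vectors are made up, and real transformers add learned projections, multiple heads, and far larger dimensions:

```python
import math

def softmax(scores):
    # Turn raw similarity scores into weights that sum to 1
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    d = len(keys[0])  # key dimension, used to scale the dot products
    out = []
    for q in queries:
        # Every query scores against every key: no sequential dependency,
        # so all positions can be computed in parallel on a GPU
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)
        # Output is a weighted blend of all value vectors
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 toy token embeddings
result = attention(tokens, tokens, tokens)      # self-attention
print(len(result), len(result[0]))              # 3 tokens out, dim 2
```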
2020s: Generative AI and the Agentic Era
GPT-3 (2020) showed that sufficiently large transformer models could generate coherent text, write code, and answer questions. ChatGPT (late 2022) put that capability in front of millions of users.
Now we're in the agentic AI era β systems that don't just generate text, but take actions: browse the web, execute code, call APIs, and orchestrate multi-step workflows.
The 7 Types of AI
AI is categorized in two different ways: by capability (what it can do) and by functionality (how it works internally).
By Capability
1. Artificial Narrow Intelligence (ANI), or Weak AI
Every AI system that exists today is ANI. It excels at one specific task but can't generalize to others.
Examples:
GPT-4 generates excellent text, but can't drive a car
AlphaGo beat the world champion at Go, but can't play chess without being retrained
My home server anomaly detector predicts CPU spikes, but knows nothing about network issues
2. Artificial General Intelligence (AGI), or Strong AI
A system that can perform any intellectual task a human can, across domains, with the ability to transfer knowledge between them. AGI does not exist yet. Despite the hype, no current system (including GPT-4, Claude, or Gemini) is AGI.
The key test: Can it learn a completely new domain (say, molecular biology) with the same efficiency a human can, without being specifically trained on biology data? Current models can't.
3. Artificial Super Intelligence (ASI)
A hypothetical system that surpasses human intelligence in every domain. This is purely theoretical and the subject of much philosophical debate. Not relevant to engineering today.
By Functionality
4. Reactive Machines
The simplest form. They respond to inputs with no memory of past interactions.
Example: IBM's Deep Blue chess engine. It evaluates the current board position and picks the best move. It doesn't remember previous games or learn from them.
5. Limited Memory
Most modern AI systems fall here. They use historical data during training and can maintain context within a session (like an LLM's context window), but they don't continuously learn from new interactions.
Example: An LLM that remembers our conversation within a session but starts fresh next time.
6. Theory of Mind
A system that understands emotions, beliefs, and intentions. Current AI can simulate this (an LLM can say "I understand you're frustrated") but doesn't truly comprehend emotional states. This type is still in research.
7. Self-Aware
A system with consciousness and self-awareness. This is purely theoretical and raises deep philosophical questions about what consciousness even means.
The Modern AI Landscape: A Map
Here's how the major AI concepts relate to each other:
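One way to sketch the hierarchy, using only the categories covered in this article:

```
Artificial Intelligence (AI)
├── Symbolic AI (rules, expert systems)
└── Machine Learning (ML)
    ├── Classical ML (decision trees, SVMs, random forests)
    └── Deep Learning (DL)
        ├── CNNs (images), RNNs (sequences)
        └── Transformers
            └── Generative AI (LLMs: GPT, Claude, Gemini, LLaMA)
                └── Agentic AI (LLMs + tools + actions)
```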
This tree is what I wish someone had shown me when I started. Every AI tool, framework, and paper sits somewhere in this tree.
Key Terminology Every Engineer Needs
Before going deeper, let's nail down the terms you'll see throughout this series and every AI discussion.
Model
A mathematical function that maps inputs to outputs. When someone says "the model," they mean the trained artifact: the weights and architecture that represent learned patterns.
Training vs Inference
Training: The process of showing data to an algorithm so it learns patterns. Expensive, slow, done once (or periodically).
Inference: Using a trained model to make predictions on new data. Fast, cheap, done millions of times.
When you call the OpenAI API, you're doing inference. OpenAI already did the training.
Parameters and Weights
Parameters (or weights) are the numbers inside a model that get adjusted during training. When you hear "GPT-4 has ~1.8 trillion parameters," it means 1.8 trillion numbers that were tuned to minimize prediction errors.
For a linear regression, there are just 2 parameters. For GPT-4, there are trillions. The concept is the same: numbers that get adjusted to fit the data.
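The two-parameter case also makes training vs inference concrete. A sketch with invented data points, using the closed-form least-squares fit rather than an iterative learner:

```python
# Fit y = w*x + b. "Training" finds the two parameters (w, b);
# "inference" just reuses them on new inputs.
xs = [1.0, 2.0, 3.0, 4.0]   # features
ys = [2.1, 3.9, 6.2, 8.1]   # labels (roughly y = 2x)

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Training: done once, the expensive part at scale
w = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
    sum((x - mean_x) ** 2 for x in xs)
b = mean_y - w * mean_x

# Inference: cheap, repeatable, applied to data the model never saw
def predict(x):
    return w * x + b

print(round(w, 2), round(b, 2))      # the 2 learned parameters
print(round(predict(5.0), 2))        # prediction on a new input
```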
Hyperparameters
Settings you choose before training that control how the model learns. They're not learned from data β you set them.
Features and Labels
Features: The input variables (what you feed the model)
Labels: The output variable (what you want the model to predict)
Epoch, Batch, and Iteration
Epoch: One complete pass through the entire training dataset
Batch: A subset of the training data processed in one forward/backward pass
Iteration: One update of the model's weights (one batch processed)
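The three terms fit together in one hypothetical training loop (sizes chosen arbitrarily; the actual weight update is elided):

```python
# learning_rate, epochs, batch_size are hyperparameters:
# chosen by you before training, never learned from the data.
learning_rate = 0.01
epochs = 3
batch_size = 100

dataset_size = 1000                              # 1,000 (feature, label) pairs
batches_per_epoch = dataset_size // batch_size   # 10 batches per pass

iterations = 0
for epoch in range(epochs):                  # one epoch = full pass over data
    for batch in range(batches_per_epoch):   # one batch = one weight update
        iterations += 1                      # one iteration = one update

print(iterations)  # 3 epochs x 10 batches = 30 iterations
```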
Overfitting and Underfitting
Overfitting: The model memorized the training data but can't generalize to new data. Like studying only past exam answers and failing when questions change.
Underfitting: The model is too simple to capture the patterns. Like trying to fit a straight line to a curve.
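An exaggerated sketch of the overfitting failure mode, assuming we let a "model" simply memorize its training pairs:

```python
# An intentionally absurd model: a lookup table of training examples.
# Training error is zero, but it cannot generalize at all.
train_data = {1: 2, 2: 4, 3: 6}   # inputs -> labels; the pattern is y = 2x

def memorizer(x):
    # Perfect recall on seen inputs, no answer for unseen ones
    return train_data.get(x)

print(memorizer(2))    # flawless on the training set
print(memorizer(10))   # None: the pattern y = 2x was never actually learned
```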
AI vs Machine Learning β Getting the Relationship Right
This is one of the most common sources of confusion, so let me be precise:
AI is the broad field of making machines exhibit intelligent behavior
Machine Learning is a subset of AI: a specific approach that learns from data
Deep Learning is a subset of ML: neural networks with multiple layers
Generative AI is a subset of DL: models that generate new content
Not all AI is ML. A rule-based expert system is AI but not ML. A chess engine with hand-coded evaluation functions is AI but not ML.
But today, nearly all practical AI uses ML in some form. When people say "AI," they almost always mean ML or LLMs.
Why This Matters for Engineers
You might be thinking: "I just want to build things. Why do I need to know the history and taxonomy?"
Here's why: the taxonomy tells you what tool to reach for.
When I encounter a problem, the first question is: what type of AI problem is this?
"Predict if this server will crash" → Supervised ML (classification) → scikit-learn, XGBoost
"Group similar error logs together" → Unsupervised ML (clustering) → scikit-learn, HDBSCAN
"Summarize this incident report" → NLG (generative AI) → LLM (Claude, GPT)
"Extract entities from log messages" → NLP (NER) → spaCy, transformers
"Generate a Kubernetes manifest" → Code generation → LLM with structured output
"Auto-remediate infrastructure issues" → Agentic AI → Agent framework + tools
Knowing the landscape means you don't bring an LLM to a problem that scikit-learn solves in 5 lines. And you don't try to train a classifier when you need natural language generation.
Practical Example: Classifying the AI in My Own Projects
Let me classify the AI components in the projects I mentioned at the start:
Kubernetes log monitor → an LLM reads logs and writes explanations → Generative AI (ANI, limited memory)
Server failure predictor → a model learns from historical metrics → Supervised ML (ANI, limited memory)
Documentation chatbot → an LLM answers questions about my docs → Generative AI (ANI, limited memory)
The pattern: In every project, the AI component is surrounded by non-AI code β data pipelines, API endpoints, database queries, configuration. Understanding where AI starts and ends in your system is fundamental.
Common Misconceptions
"AI will replace all programmers"
AI is a tool. LLMs generate code, but they can't architect systems, debug production issues with full context, or make trade-off decisions based on business requirements. They're incredibly useful assistants, not replacements.
"More data always means better AI"
Quality matters more than quantity. I've gotten better results from 1,000 clean, well-labeled examples than from 100,000 noisy ones. Data cleaning is often 70% of an ML project.
"AI understands what it's doing"
Current AI systems are sophisticated pattern matchers, not conscious entities. When Claude writes a helpful explanation, it's generating text based on patterns in its training data, not "understanding" the concept the way you do.
"You need a PhD to work with AI"
You need to understand the fundamentals (which this series covers), but you don't need to derive backpropagation by hand. The tools (scikit-learn, transformers, LLM APIs) abstract away the math. Focus on knowing when and how to use each tool.
"AI is new"
The field is 70+ years old. What's new is that compute, data, and architectures have converged to make AI practically useful at scale. The ideas behind neural networks date back to 1943.
What's Next
Now that you understand what AI is, its history, and the major categories, we'll dive into the most impactful subset: Machine Learning, Deep Learning, and Foundation Models, the technologies that power everything from spam filters to ChatGPT.
Next up is Part 2: Machine Learning, Deep Learning, and Foundation Models