Architecture and Components
Introduction
Working with containerized applications across multiple cloud providers taught me an important lesson: understanding Kubernetes architecture isn't just about memorizing component names—it's about grasping how these pieces work together to create a resilient, self-healing system. When I first encountered production issues where pods weren't scheduling correctly, or services weren't routing traffic as expected, I realized that surface-level knowledge wasn't enough. I needed to understand the control plane, the worker nodes, and how the reconciliation loop maintains desired state.
In this article, I'll share what I've learned about Kubernetes architecture from troubleshooting real cluster issues, scaling microservices workloads, and designing reliable container orchestration solutions. We'll explore how the control plane components coordinate to manage your cluster, how worker nodes execute workloads, and how the entire system maintains your application's desired state even when things fail.
High-Level Architecture Overview
Kubernetes follows a master-worker architecture pattern, though the terminology has evolved to use "control plane" and "worker nodes" instead. The architecture is designed around a declarative model where you specify the desired state of your application, and Kubernetes continuously works to maintain that state.
Core Architectural Principles
Declarative Configuration: You describe what you want (desired state), not how to achieve it. Kubernetes controllers handle the implementation details.
Controller Pattern: Independent controllers watch for changes and work to reconcile current state with desired state. This creates a self-healing system.
API-Driven: Everything in Kubernetes is an API object. The API server is the central communication hub for all components.
Distributed System: Components are loosely coupled and communicate through the API server, making the system resilient to individual component failures.
Control Plane Components
The control plane makes global decisions about the cluster (like scheduling) and detects and responds to cluster events. Control plane components can run on any machine in the cluster, but typically run on dedicated master nodes that don't execute user workloads.
API Server (kube-apiserver)
The API server is the front end for the Kubernetes control plane. It's the only component that directly interacts with etcd and serves as the central communication hub for all other components.
Key Responsibilities:
Validates and processes REST operations
Authenticates and authorizes requests
Provides the only interface to etcd
Serves as the communication hub for all components
Implements admission controllers for policy enforcement
How It Works:
When you run kubectl apply -f deployment.yaml, here's what happens:
1. kubectl sends an HTTP POST request to the API server
2. API server authenticates the request (using certificates, tokens, etc.)
3. API server authorizes the request (RBAC checks)
4. Admission controllers process the request (mutating, then validating)
5. API server validates the object schema
6. API server writes the object to etcd
7. API server returns the response to kubectl
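You can watch this flow yourself by raising kubectl's verbosity, which prints the underlying HTTP calls:

```bash
# -v=6 logs request URLs and response codes; -v=8 adds request/response bodies
kubectl apply -f deployment.yaml -v=6
```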
API Server Watch Mechanism:
Components don't poll the API server; they establish long-lived watch connections. This is efficient because:
The API server pushes changes the moment they are written, so components react in near real time
Clients avoid repeatedly re-listing entire resource collections
A disconnected client can resume from its last-seen resourceVersion instead of starting over
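The same mechanism is available from the command line:

```bash
# Stream pod changes as they happen instead of polling:
kubectl get pods --watch

# The raw equivalent: one long-lived HTTP request that streams change events
kubectl get --raw "/api/v1/namespaces/default/pods?watch=true"
```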
etcd
etcd is a distributed, consistent key-value store that serves as Kubernetes' backing store for all cluster data. It's the single source of truth for your cluster's state.
Key Responsibilities:
Store all cluster state data
Provide consistency guarantees (using Raft consensus)
Support watch operations for event notification
Handle leader election and distributed locking
Data Structure in etcd:
Everything in Kubernetes is stored in etcd under specific key prefixes:
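You can see the layout directly with etcdctl, assuming access to an etcd member and kubeadm-style certificate paths:

```bash
ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  get /registry --prefix --keys-only | head

# Typical keys:
#   /registry/pods/<namespace>/<pod-name>
#   /registry/deployments/<namespace>/<deployment-name>
#   /registry/services/specs/<namespace>/<service-name>
```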
etcd Cluster Configuration:
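As an illustration, one member of a three-node cluster might be started with flags like these (names, IPs, and the token are placeholders):

```bash
etcd --name etcd-1 \
  --initial-advertise-peer-urls https://10.0.0.1:2380 \
  --listen-peer-urls https://10.0.0.1:2380 \
  --listen-client-urls https://10.0.0.1:2379,https://127.0.0.1:2379 \
  --advertise-client-urls https://10.0.0.1:2379 \
  --initial-cluster etcd-1=https://10.0.0.1:2380,etcd-2=https://10.0.0.2:2380,etcd-3=https://10.0.0.3:2380 \
  --initial-cluster-state new \
  --initial-cluster-token my-etcd-cluster
```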
etcd Best Practices:
Always run etcd in a cluster (3 or 5 nodes for production)
Regular backups are critical
Monitor etcd performance (disk I/O is crucial)
Use dedicated disks (SSDs recommended)
Secure communication with TLS
Backing up etcd:
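A snapshot-based backup looks like this (kubeadm-style certificate paths assumed):

```bash
ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  snapshot save /backup/etcd-snapshot-$(date +%Y%m%d).db

# Verify the snapshot:
ETCDCTL_API=3 etcdctl snapshot status /backup/etcd-snapshot-$(date +%Y%m%d).db --write-out=table
```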
Scheduler (kube-scheduler)
The scheduler watches for newly created pods with no assigned node and selects a node for them to run on based on resource requirements, constraints, and policies.
Key Responsibilities:
Watch for unscheduled pods
Find feasible nodes (filtering phase)
Score nodes to find the best fit (scoring phase)
Bind pods to nodes
Scheduling Process:
For each pod in its queue, the scheduler filters out nodes that cannot run the pod, scores the feasible nodes that remain, picks the highest-scoring node, and posts a Binding back to the API server. The kubelet on the chosen node then sees the assignment and starts the containers.
Filtering Predicates:
The scheduler applies predicates to filter nodes (in recent Kubernetes releases these run as filter plugins in the scheduling framework, but the logic is the same):
PodFitsResources: Node has enough CPU/memory
PodFitsHostPorts: Required ports are available
MatchNodeSelector: Node matches pod's nodeSelector
CheckNodeTaints: Pod tolerates node taints
CheckVolumeBinding: Required volumes can be mounted
Scoring Functions:
After filtering, the scheduler scores remaining nodes:
LeastRequestedPriority: Prefers nodes with fewer requested resources
BalancedResourceAllocation: Balances CPU and memory usage
SelectorSpreadPriority: Spreads pods across nodes
NodeAffinityPriority: Prefers nodes matching affinity rules
Advanced Scheduling Example:
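As a sketch, here is a hypothetical pod that combines node affinity, pod anti-affinity, and a toleration (the labels and the taint key are assumptions):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web-frontend
  labels:
    app: web
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: disktype
            operator: In
            values: ["ssd"]      # only nodes labeled disktype=ssd pass filtering
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app: web
          topologyKey: kubernetes.io/hostname   # prefer spreading replicas across nodes
  tolerations:
  - key: dedicated               # allows nodes tainted dedicated=frontend:NoSchedule
    operator: Equal
    value: frontend
    effect: NoSchedule
  containers:
  - name: web
    image: nginx:1.25
    resources:
      requests:
        cpu: 250m
        memory: 256Mi
```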
Custom Scheduler:
You can write custom schedulers for specific requirements:
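The simplest integration point is the pod's schedulerName field: a custom scheduler watches for pods that carry its name and creates Binding objects for them. The scheduler name below is hypothetical:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: custom-scheduled-pod
spec:
  schedulerName: my-custom-scheduler   # pods without this field use the default scheduler
  containers:
  - name: app
    image: nginx:1.25
```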
Controller Manager (kube-controller-manager)
The controller manager runs multiple controllers as a single process. Each controller is a control loop that watches the shared state of the cluster through the API server and makes changes to move the current state toward the desired state.
Built-in Controllers:
Node Controller: Monitors node health, marks nodes as NotReady
ReplicaSet Controller: Maintains the correct number of pods for each ReplicaSet
Endpoints Controller: Populates Endpoints objects (joins Services and Pods)
Service Account Controller: Creates default ServiceAccounts for namespaces
Namespace Controller: Deletes all resources when a namespace is deleted
PersistentVolume Controller: Binds PVs to PVCs
Job Controller: Creates pods for Jobs
CronJob Controller: Creates Jobs on a schedule
Deployment Controller: Manages ReplicaSets for Deployments
StatefulSet Controller: Manages StatefulSets
Controller Pattern:
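Here is a minimal, runnable sketch of the idea in Go. Real controllers are built on client-go informers and work queues rather than a polling loop like this:

```go
package main

import (
	"fmt"
	"time"
)

// reconcile compares desired and observed replica counts and acts on the
// difference, the same way the ReplicaSet controller does.
func reconcile(desired, observed int) int {
	switch {
	case observed < desired:
		fmt.Printf("creating %d pod(s)\n", desired-observed)
	case observed > desired:
		fmt.Printf("deleting %d pod(s)\n", observed-desired)
	default:
		fmt.Println("in sync, nothing to do")
	}
	return desired
}

func main() {
	desired, observed := 3, 1 // desired state vs. what the watch reported
	for i := 0; i < 3; i++ {
		observed = reconcile(desired, observed)
		time.Sleep(time.Second) // real controllers block on watch events instead
	}
}
```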
Controller Manager Configuration:
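An illustrative subset of its flags, usually set in the static pod manifest on kubeadm clusters:

```bash
kube-controller-manager \
  --leader-elect=true \
  --controllers='*,bootstrapsigner,tokencleaner' \
  --node-monitor-period=5s \
  --node-monitor-grace-period=40s \
  --kubeconfig=/etc/kubernetes/controller-manager.conf
```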
Cloud Controller Manager
The cloud controller manager runs controllers specific to cloud providers. It allows cloud vendors to integrate with Kubernetes without modifying core Kubernetes code.
Cloud-Specific Controllers:
Node Controller: Checks cloud provider to determine if node has been deleted
Route Controller: Sets up routes in cloud infrastructure
Service Controller: Creates/updates/deletes cloud load balancers
Volume Controller: Creates/attaches/mounts cloud volumes
AWS Cloud Controller Example:
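For example, creating a Service of type LoadBalancer on AWS causes the service controller to provision a load balancer; the annotation below is the classic in-tree way to request a Network Load Balancer instead of the default Classic ELB:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
spec:
  type: LoadBalancer
  selector:
    app: web
  ports:
  - port: 80
    targetPort: 8080
```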
Worker Node Components
Worker nodes run the containerized applications. Each node contains the services necessary to run pods and is managed by the control plane.
kubelet
The kubelet is the primary node agent that runs on each node. It ensures containers are running in a pod as specified.
Key Responsibilities:
Register node with API server
Watch for pod assignments to its node
Pull container images
Start and stop containers
Report pod and node status
Execute liveness and readiness probes
Mount volumes
How kubelet Works:
The kubelet watches the API server for pods bound to its node (and reads static pod manifests from local disk), directs the container runtime through the CRI to pull images and start or stop containers, runs the configured liveness and readiness probes, and continuously reports pod and node status back to the API server.
kubelet Configuration:
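A sketch of a KubeletConfiguration file (typically /var/lib/kubelet/config.yaml; the values here are illustrative, not recommendations):

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
maxPods: 110
evictionHard:
  memory.available: "200Mi"   # evict pods when free memory drops below this
  nodefs.available: "10%"
systemReserved:               # capacity held back for OS daemons
  cpu: "500m"
  memory: "512Mi"
serializeImagePulls: false    # pull images in parallel
```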
Static Pods:
kubelet can manage static pods directly without the API server:
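For example, a manifest dropped into the kubelet's staticPodPath (default /etc/kubernetes/manifests on kubeadm clusters):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: static-web
spec:
  containers:
  - name: web
    image: nginx:1.25
    ports:
    - containerPort: 80
```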
The kubelet automatically creates and manages this pod. A read-only mirror pod appears in the API server so you can see it with kubectl, but the pod can only be deleted by removing the file from the node.
kube-proxy
kube-proxy maintains network rules on nodes, implementing part of the Kubernetes Service concept. It enables the Service abstraction by maintaining network rules and performing connection forwarding.
Key Responsibilities:
Maintain network rules for Services
Implement Service load balancing
Handle iptables/ipvs rules
Enable pod-to-service communication
Proxy Modes:
1. iptables Mode (default):
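In this mode kube-proxy programs NAT rules that pick a backend pod for each new connection. You can inspect them on a node (chain names carry per-cluster hashes, so yours will differ):

```bash
# Service-level chains created by kube-proxy:
sudo iptables -t nat -L KUBE-SERVICES | head

# Per-service load-balancing rules:
sudo iptables-save -t nat | grep KUBE-SVC
```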
2. IPVS Mode (more scalable):
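IPVS mode uses the kernel's IP Virtual Server, which keeps lookups fast even with thousands of Services. It is enabled through the kube-proxy configuration and inspected with ipvsadm:

```bash
# In the KubeProxyConfiguration (kubeproxy.config.k8s.io/v1alpha1), set:
#   mode: "ipvs"

# List the virtual servers and their backend pods:
sudo ipvsadm -Ln
```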
kube-proxy DaemonSet:
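kube-proxy ships as a DaemonSet so exactly one instance runs on every node; on kubeadm-based clusters you can inspect it with:

```bash
kubectl get daemonset kube-proxy -n kube-system
kubectl get pods -n kube-system -l k8s-app=kube-proxy -o wide
```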
Container Runtime
The container runtime is responsible for running containers. Kubernetes supports several runtimes through the Container Runtime Interface (CRI).
Supported Runtimes:
containerd (most common, CNCF project)
CRI-O (lightweight, OCI-focused)
Docker Engine (via cri-dockerd shim)
Container Runtime Interface:
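The CRI is a gRPC API served over a local socket; crictl is the standard CLI for talking to any CRI-compatible runtime:

```bash
# Point crictl at the runtime's socket (containerd shown here):
sudo crictl --runtime-endpoint unix:///run/containerd/containerd.sock ps
sudo crictl pods
sudo crictl images
```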
containerd Configuration:
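A fragment of /etc/containerd/config.toml relevant to Kubernetes (config format version 2; the pause image tag varies by release):

```toml
version = 2

[plugins."io.containerd.grpc.v1.cri"]
  sandbox_image = "registry.k8s.io/pause:3.9"

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
  runtime_type = "io.containerd.runc.v2"

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
  SystemdCgroup = true   # match the kubelet's systemd cgroup driver
```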
The Kubernetes Control Loop
The control loop is the core of Kubernetes' self-healing nature. Understanding this loop is essential to understanding how Kubernetes works.
Reconciliation Loop
Every controller runs the same basic cycle: observe the current state, compare it against the desired state recorded through the API server, act to close any gap, and repeat.
Example: Deployment Controller Flow
Let's trace what happens when you create a Deployment:
Step-by-Step Process:
1. kubectl posts the Deployment to the API server, which validates it and persists it to etcd
2. The Deployment controller sees the new Deployment and creates a matching ReplicaSet
3. The ReplicaSet controller sees the new ReplicaSet and creates the specified number of Pod objects
4. The scheduler sees the unscheduled pods and binds each one to a node
5. The kubelet on each chosen node sees its assignment, pulls the images, and starts the containers
6. Status flows back through the API server, and the Deployment's observed state converges on the desired state
Continuous Reconciliation:
Controllers continuously reconcile:
Deployment Controller ensures correct ReplicaSet exists
ReplicaSet Controller ensures correct number of pods
Node Controller monitors node health
Endpoints Controller updates Service endpoints
If a pod dies, the kubelet reports the change, the ReplicaSet controller notices that the observed replica count no longer matches the desired count, and it creates a replacement pod, which the scheduler binds and a kubelet starts. No human intervention is required.
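You can watch this happen yourself (the label and pod name below are illustrative):

```bash
# Stream pod changes in the background, then kill one replica:
kubectl get pods -l app=web --watch &
kubectl delete pod web-7d4b9c6f5-abcde

# Within seconds a replacement appears: the ReplicaSet controller saw
# observed replicas < desired replicas and created a new pod.
```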
Communication Patterns
Understanding how components communicate is crucial for troubleshooting.
Communication Flow
Key Communication Rules:
Only the API server talks to etcd - All state changes go through the API
Components use watches, not polling - Efficient event-driven architecture
All communication is authenticated - Mutual TLS between components
API server is the communication hub - Components never talk to each other directly; they coordinate through API objects
Network Policies for Control Plane
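Control plane components usually run with host networking, where NetworkPolicy does not apply, so master nodes are typically protected with node-level firewalls instead. For cluster-admin tooling that runs as ordinary pods, an illustrative policy looks like this (names and labels are assumptions):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: restrict-cluster-tools
  namespace: ops
spec:
  podSelector:
    matchLabels:
      app: cluster-tools      # the pods being protected
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          role: admin         # only admin-labeled pods may connect
    ports:
    - protocol: TCP
      port: 443
```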
High Availability Architecture
Production clusters require high availability for the control plane.
HA Control Plane
HA Considerations:
API Server: All instances are active (load balanced)
etcd: Cluster with quorum (3 or 5 nodes)
Scheduler: One active, others on standby (leader election)
Controller Manager: One active, others on standby (leader election)
Leader Election Configuration:
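Leader election is controlled with flags on the scheduler and controller manager (the values shown are the upstream defaults), and the current leader is recorded in a Lease object:

```bash
kube-scheduler \
  --leader-elect=true \
  --leader-elect-lease-duration=15s \
  --leader-elect-renew-deadline=10s \
  --leader-elect-retry-period=2s

# See which instance currently holds the lock:
kubectl get lease -n kube-system kube-scheduler -o yaml
```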
Stacked vs External etcd
Stacked etcd topology: each control plane node runs its own etcd member alongside the API server, scheduler, and controller manager. This is simpler to set up and uses fewer machines, but losing one node removes both a control plane instance and an etcd member.
External etcd topology: etcd runs on dedicated hosts separate from the control plane nodes. This requires more machines but isolates etcd from control plane failures and resource contention.
Architecture Best Practices
Control Plane
Run at least 3 master nodes for production
Use external etcd for large clusters (>100 nodes)
Monitor etcd performance - it's the most critical component
Regular etcd backups - automated and tested
Separate master and worker nodes - don't schedule workloads on masters
Resource reservations for control plane components
Worker Nodes
Right-size nodes - balance between too many small nodes and few large nodes
Use node pools for different workload types
Configure resource reservations for system components
Enable swap accounting for better resource management
Monitor node resources and set up autoscaling
Networking
Choose the right CNI for your use case (Calico, Cilium, Flannel)
Plan IP address spaces carefully
Implement Network Policies for security
Use appropriate Service types for different scenarios
Security
Enable RBAC and follow principle of least privilege
Use Pod Security Standards (Baseline, Restricted)
Encrypt etcd data at rest
Rotate certificates regularly
Enable audit logging
Common Architecture Issues
Issue 1: etcd Performance Degradation
Symptoms:
Slow API responses
Controller delays
Watch events delayed
Diagnosis:
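Start by checking member health and disk latency (kubeadm-style certificate paths and metrics port assumed):

```bash
ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  endpoint status --write-out=table

# Sustained p99 fsync latency above ~10ms usually means the disk is too slow
# (kubeadm exposes etcd metrics on http://127.0.0.1:2381 by default):
curl -s http://127.0.0.1:2381/metrics | grep etcd_disk_wal_fsync_duration
```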
Solutions:
Use SSDs for etcd
Defragment etcd database (see the commands after this list)
Compact etcd history
Consider scaling etcd cluster
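The defragment and compact steps look like this in practice (certificate flags omitted for brevity; they match the backup example above):

```bash
# Find the current revision, compact history up to it, then defragment:
rev=$(ETCDCTL_API=3 etcdctl endpoint status --write-out=json | egrep -o '"revision":[0-9]*' | egrep -o '[0-9].*')
ETCDCTL_API=3 etcdctl compact "$rev"
ETCDCTL_API=3 etcdctl defrag
```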
Issue 2: Scheduler Not Scheduling Pods
Symptoms:
Pods stuck in Pending state
No node assignment
Diagnosis:
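The pod's events usually state exactly which filter rejected each node:

```bash
# The Events section explains why scheduling failed:
kubectl describe pod <pod-name>

# Compare allocatable capacity with what is already requested per node:
kubectl describe nodes | grep -A 5 "Allocated resources"

# Check for taints the pod may not tolerate:
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.taints}{"\n"}{end}'
```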
Common Causes:
Insufficient resources
Node taints without tolerations
Node affinity not satisfied
Volume binding issues
Issue 3: Control Plane Communication Issues
Symptoms:
Components can't reach API server
Certificate errors
Authentication failures
Diagnosis:
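Useful first checks (the control plane address is a placeholder; kubeadm paths assumed):

```bash
# Is the API server reachable and healthy?
kubectl get --raw /healthz
curl -k https://<control-plane-ip>:6443/healthz

# Check certificate expiry on kubeadm clusters:
kubeadm certs check-expiration
openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -dates
```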
Solutions:
Renew expired certificates (on kubeadm clusters: kubeadm certs renew all, then restart the control plane pods)
Verify that the load balancer or DNS name in front of the API server resolves and forwards port 6443
Confirm each component's kubeconfig points at the correct API server endpoint
Ensure the required ports are open between nodes (6443 for the API server, 2379-2380 for etcd, 10250 for the kubelet)
Issue 4: Worker Node NotReady
Symptoms:
Nodes show NotReady status
Pods evicted from node
Diagnosis:
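Check the node's conditions first, then the kubelet on the affected host:

```bash
# Conditions show memory, disk, or PID pressure and the kubelet's last heartbeat:
kubectl describe node <node-name>

# On the node itself:
systemctl status kubelet
journalctl -u kubelet --since "30 min ago" | tail -n 50
```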
Common Causes:
kubelet crashed
Network plugin issues
Resource pressure (disk, memory)
Certificate problems
What I Learned
Understanding Kubernetes architecture transformed how I approach container orchestration challenges:
Start with the API Server: Everything in Kubernetes flows through the API server. When troubleshooting, check API server logs first, then work backward to other components.
etcd is Critical: The health of your cluster depends on etcd performance. I learned to monitor etcd metrics closely and ensure it runs on fast SSDs with dedicated resources.
Controllers are Independent: Each controller works independently, watching for its specific resources. This design makes Kubernetes resilient but also means you need to understand the reconciliation loop to debug issues effectively.
The Declarative Model Works: Specifying desired state and letting controllers reconcile it is more reliable than imperative commands. Trust the control loop—it will eventually converge to desired state.
Component Communication Matters: Understanding that only the API server talks to etcd, and all other components use watches (not polling), helps explain why certain operations are fast and others are slow.
High Availability Requires Planning: Don't wait until production to think about HA. Design for it from the start—etcd quorum, leader election, and load balancing all need careful consideration.
Security is Built-In: Kubernetes' architecture includes security by design—mutual TLS, RBAC, admission controllers. Use these features; don't work around them.
The architecture of Kubernetes reflects years of experience running distributed systems at scale. Understanding these components and how they interact gives you the foundation to build reliable, scalable applications on Kubernetes. In the next articles, we'll build on this architectural knowledge to explore practical implementation patterns and best practices.