Architecture and Components

Introduction

Working with containerized applications across multiple cloud providers taught me an important lesson: understanding Kubernetes architecture isn't just about memorizing component names—it's about grasping how these pieces work together to create a resilient, self-healing system. When I first encountered production issues where pods weren't scheduling correctly, or services weren't routing traffic as expected, I realized that surface-level knowledge wasn't enough. I needed to understand the control plane, the worker nodes, and how the reconciliation loop maintains desired state.

In this article, I'll share what I've learned about Kubernetes architecture from troubleshooting real cluster issues, scaling microservices workloads, and designing reliable container orchestration solutions. We'll explore how the control plane components coordinate to manage your cluster, how worker nodes execute workloads, and how the entire system maintains your application's desired state even when things fail.

Table of Contents

  • High-Level Architecture Overview
  • Control Plane Components
  • Worker Node Components
  • The Kubernetes Control Loop
  • Communication Patterns
  • High Availability Architecture
  • Architecture Best Practices
  • Common Architecture Issues
  • What I Learned

High-Level Architecture Overview

Kubernetes follows a master-worker architecture pattern, though the terminology has evolved to use "control plane" and "worker nodes" instead. The architecture is designed around a declarative model where you specify the desired state of your application, and Kubernetes continuously works to maintain that state.

Core Architectural Principles

Declarative Configuration: You describe what you want (desired state), not how to achieve it. Kubernetes controllers handle the implementation details.

Controller Pattern: Independent controllers watch for changes and work to reconcile current state with desired state. This creates a self-healing system.

API-Driven: Everything in Kubernetes is an API object. The API server is the central communication hub for all components.

Distributed System: Components are loosely coupled and communicate through the API server, making the system resilient to individual component failures.
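
As a concrete illustration of the declarative model, here is a minimal Deployment manifest (names and image tag are illustrative): you state that three replicas should exist, and the controllers work out how to get there and keep it that way.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                 # illustrative name
spec:
  replicas: 3               # desired state: three pods, always
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: nginx
          image: nginx:1.25     # assumed image tag
          ports:
            - containerPort: 80
```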

Architecture Diagram

At a high level: the control plane (API server, etcd, scheduler, controller manager) manages the cluster, each worker node runs a kubelet, kube-proxy, and a container runtime, and every component communicates through the API server.

Control Plane Components

The control plane makes global decisions about the cluster (like scheduling) and detects and responds to cluster events. Control plane components can run on any machine in the cluster, but typically run on dedicated master nodes that don't execute user workloads.

API Server (kube-apiserver)

The API server is the front end for the Kubernetes control plane. It's the only component that directly interacts with etcd and serves as the central communication hub for all other components.

Key Responsibilities:

  • Validates and processes REST operations

  • Authenticates and authorizes requests

  • Provides the only interface to etcd

  • Serves as the communication hub for all components

  • Implements admission controllers for policy enforcement

How It Works:

When you run kubectl apply -f deployment.yaml, here's what happens (you can trace these steps yourself with the command shown after the list):

  1. kubectl sends an HTTP POST request to the API server

  2. API server authenticates the request (using certificates, tokens, etc.)

  3. API server authorizes the request (RBAC checks)

  4. Admission controllers process the request (mutating then validating)

  5. API server validates the object schema

  6. API server writes to etcd

  7. API server returns the response to kubectl
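
You can watch this flow from the client side by raising kubectl's verbosity; at -v=8 kubectl prints the HTTP requests and responses it exchanges with the API server. The second command assumes a kubeadm-style cluster where the API server runs as a static pod in kube-system.

```bash
# Show the HTTP calls kubectl makes to the API server
kubectl apply -f deployment.yaml -v=8

# On kubeadm clusters, see which admission plugins the API server was started with
kubectl -n kube-system get pod -l component=kube-apiserver -o yaml | grep enable-admission-plugins
```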

API Server Watch Mechanism:

Components don't poll the API server; they establish watch connections. This is efficient because the API server pushes incremental change events over a single long-lived connection instead of each component repeatedly listing every object, and resource versions let a client resume a watch where it left off after a disconnect.
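
A quick way to see watch semantics in action, using kubectl and the raw API through a local proxy:

```bash
# Stream changes to pods instead of polling
kubectl get pods --watch

# The same thing at the API level: open a local proxy, then request a watch
kubectl proxy --port=8001 &
curl "http://127.0.0.1:8001/api/v1/namespaces/default/pods?watch=true"
```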

etcd

etcd is a distributed, consistent key-value store that serves as Kubernetes' backing store for all cluster data. It's the single source of truth for your cluster's state.

Key Responsibilities:

  • Store all cluster state data

  • Provide consistency guarantees (using Raft consensus)

  • Support watch operations for event notification

  • Handle leader election and distributed locking

Data Structure in etcd:

Everything in Kubernetes is stored in etcd under specific key prefixes:
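
Objects live under the /registry prefix, keyed by resource type, namespace, and name. If you have etcdctl access to a cluster member (the certificate paths below are the kubeadm defaults; adjust for your setup), you can list the keys:

```bash
# List the keys Kubernetes has written (kubeadm default cert paths assumed)
ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  get /registry --prefix --keys-only | head -20

# Typical keys look like:
#   /registry/pods/default/web-5d78cf9d6b-abcde
#   /registry/deployments/default/web
#   /registry/services/specs/default/kubernetes
```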

etcd Cluster Configuration:
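
A sketch of the flags a three-member etcd cluster is typically started with (host names, IPs, and paths are placeholders); kubeadm generates an equivalent static pod manifest at /etc/kubernetes/manifests/etcd.yaml:

```bash
etcd \
  --name etcd-1 \
  --data-dir /var/lib/etcd \
  --listen-client-urls https://10.0.0.1:2379,https://127.0.0.1:2379 \
  --advertise-client-urls https://10.0.0.1:2379 \
  --listen-peer-urls https://10.0.0.1:2380 \
  --initial-advertise-peer-urls https://10.0.0.1:2380 \
  --initial-cluster etcd-1=https://10.0.0.1:2380,etcd-2=https://10.0.0.2:2380,etcd-3=https://10.0.0.3:2380 \
  --initial-cluster-state new \
  --client-cert-auth --trusted-ca-file /etc/kubernetes/pki/etcd/ca.crt \
  --cert-file /etc/kubernetes/pki/etcd/server.crt \
  --key-file /etc/kubernetes/pki/etcd/server.key \
  --peer-client-cert-auth --peer-trusted-ca-file /etc/kubernetes/pki/etcd/ca.crt \
  --peer-cert-file /etc/kubernetes/pki/etcd/peer.crt \
  --peer-key-file /etc/kubernetes/pki/etcd/peer.key
```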

etcd Best Practices:

  1. Always run etcd in a cluster (3 or 5 nodes for production)

  2. Regular backups are critical

  3. Monitor etcd performance (disk I/O is crucial)

  4. Use dedicated disks (SSDs recommended)

  5. Secure communication with TLS

Backing up etcd:
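
A minimal backup with etcdctl, using the kubeadm default certificate paths (adjust endpoints and paths for your cluster):

```bash
# Take a snapshot of the etcd keyspace
ETCDCTL_API=3 etcdctl snapshot save /backup/etcd-$(date +%F).db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

# Verify the snapshot
ETCDCTL_API=3 etcdctl snapshot status /backup/etcd-$(date +%F).db --write-out=table
```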

Scheduler (kube-scheduler)

The scheduler watches for newly created pods with no assigned node and selects a node for them to run on based on resource requirements, constraints, and policies.

Key Responsibilities:

  • Watch for unscheduled pods

  • Find feasible nodes (filtering phase)

  • Score nodes to find the best fit (scoring phase)

  • Bind pods to nodes

Scheduling Process:

The scheduler works in two phases: filtering removes nodes that cannot run the pod, then scoring ranks the remaining nodes and the pod is bound to the highest-scoring one.

Filtering Predicates:

The scheduler applies predicates to filter nodes:

  1. PodFitsResources: Node has enough CPU/memory

  2. PodFitsHostPorts: Required ports are available

  3. MatchNodeSelector: Node matches pod's nodeSelector

  4. CheckNodeTaints: Pod tolerates node taints

  5. CheckVolumeBinding: Required volumes can be mounted

Scoring Functions:

After filtering, the scheduler scores remaining nodes:

  1. LeastRequestedPriority: Prefers nodes with fewer requested resources

  2. BalancedResourceAllocation: Balances CPU and memory usage

  3. SelectorSpreadPriority: Spreads pods across nodes

  4. NodeAffinityPriority: Prefers nodes matching affinity rules

Advanced Scheduling Example:
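
A sketch of a pod that combines several of these mechanisms — a required node affinity, a toleration for a dedicated node pool, and a topology spread constraint. All label keys, taint values, and names here are illustrative assumptions.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: analytics-worker              # illustrative name
  labels:
    app: analytics
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: disktype          # assumes nodes are labeled disktype=ssd
                operator: In
                values: ["ssd"]
  tolerations:
    - key: "dedicated"                 # assumes a taint dedicated=analytics:NoSchedule
      operator: "Equal"
      value: "analytics"
      effect: "NoSchedule"
  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: kubernetes.io/hostname
      whenUnsatisfiable: ScheduleAnyway
      labelSelector:
        matchLabels:
          app: analytics
  containers:
    - name: worker
      image: busybox:1.36
      command: ["sleep", "3600"]
      resources:
        requests:
          cpu: "500m"
          memory: "256Mi"
```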

Custom Scheduler:

You can write custom schedulers for specific requirements:
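
A full scheduler implementation is beyond a short example, but the pod-side wiring is simple: set spec.schedulerName and the default scheduler ignores the pod, leaving it for yours (the scheduler name below is hypothetical).

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: custom-scheduled-pod
spec:
  schedulerName: my-custom-scheduler   # hypothetical custom scheduler
  containers:
    - name: app
      image: nginx:1.25
```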

Controller Manager (kube-controller-manager)

The controller manager runs multiple controllers as a single process. Each controller is a control loop that watches the shared state of the cluster through the API server and makes changes to move the current state toward the desired state.

Built-in Controllers:

  1. Node Controller: Monitors node health, marks nodes as NotReady

  2. ReplicaSet Controller: Maintains the correct number of pods for each ReplicaSet

  3. Endpoints Controller: Populates Endpoints objects (joins Services and Pods)

  4. Service Account Controller: Creates default ServiceAccounts for namespaces

  5. Namespace Controller: Deletes all resources when a namespace is deleted

  6. PersistentVolume Controller: Binds PVs to PVCs

  7. Job Controller: Creates pods for Jobs

  8. CronJob Controller: Creates Jobs on a schedule

  9. Deployment Controller: Manages ReplicaSets for Deployments

  10. StatefulSet Controller: Manages StatefulSets

Controller Pattern:

Every controller follows the same loop: watch its resources through the API server, compare the observed state with the desired state in each object's spec, take whatever action is needed to converge the two, and record the result in the object's status.

Controller Manager Configuration:
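
A hedged sketch of flags commonly set on kube-controller-manager (kubeadm writes these into the static pod manifest at /etc/kubernetes/manifests/kube-controller-manager.yaml; exact values are deployment-specific):

```bash
kube-controller-manager \
  --leader-elect=true \
  --controllers=*,bootstrapsigner,tokencleaner \
  --node-monitor-period=5s \
  --node-monitor-grace-period=40s \
  --cluster-cidr=10.244.0.0/16 \
  --service-cluster-ip-range=10.96.0.0/12 \
  --use-service-account-credentials=true
```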

Cloud Controller Manager

The cloud controller manager runs controllers specific to cloud providers. It allows cloud vendors to integrate with Kubernetes without modifying core Kubernetes code.

Cloud-Specific Controllers:

  1. Node Controller: Checks cloud provider to determine if node has been deleted

  2. Route Controller: Sets up routes in cloud infrastructure

  3. Service Controller: Creates/updates/deletes cloud load balancers

  4. Volume Controller: Creates/attaches/mounts cloud volumes

AWS Cloud Controller Example:
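
With the AWS cloud controller manager installed, creating a Service of type LoadBalancer makes its service controller provision an AWS load balancer; annotations steer the details. A minimal sketch, assuming the AWS provider:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web
  annotations:
    # Ask the AWS service controller for a Network Load Balancer
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
spec:
  type: LoadBalancer
  selector:
    app: web
  ports:
    - port: 80
      targetPort: 80
```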

Worker Node Components

Worker nodes run the containerized applications. Each node contains the services necessary to run pods and is managed by the control plane.

kubelet

The kubelet is the primary node agent that runs on each node. It ensures containers are running in a pod as specified.

Key Responsibilities:

  • Register node with API server

  • Watch for pod assignments to its node

  • Pull container images

  • Start and stop containers

  • Report pod and node status

  • Execute liveness and readiness probes

  • Mount volumes

How kubelet Works:

The kubelet watches the API server for pods bound to its node, instructs the container runtime (through the CRI) to pull images and start containers, runs liveness and readiness probes, and continuously reports pod and node status back to the API server.

kubelet Configuration:
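
A hedged sketch of a KubeletConfiguration file (on kubeadm nodes this lives at /var/lib/kubelet/config.yaml); field values are illustrative:

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: systemd
maxPods: 110
staticPodPath: /etc/kubernetes/manifests
clusterDNS:
  - 10.96.0.10
clusterDomain: cluster.local
evictionHard:
  memory.available: "100Mi"
  nodefs.available: "10%"
systemReserved:
  cpu: "500m"
  memory: "512Mi"
```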

Static Pods:

kubelet can manage static pods directly without the API server:
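
Drop a manifest into the kubelet's staticPodPath (by default /etc/kubernetes/manifests on kubeadm nodes) and the kubelet runs it on its own; a minimal sketch:

```yaml
# /etc/kubernetes/manifests/static-web.yaml  (path matches the kubelet's staticPodPath)
apiVersion: v1
kind: Pod
metadata:
  name: static-web
spec:
  containers:
    - name: web
      image: nginx:1.25
      ports:
        - containerPort: 80
```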

The kubelet automatically creates and manages this pod. It appears in the API server but can only be deleted by removing the file.

kube-proxy

kube-proxy maintains network rules on nodes, implementing part of the Kubernetes Service concept. It enables the Service abstraction by maintaining network rules and performing connection forwarding.

Key Responsibilities:

  • Maintain network rules for Services

  • Implement Service load balancing

  • Handle iptables/ipvs rules

  • Enable pod-to-service communication

Proxy Modes:

1. iptables Mode (default): kube-proxy programs iptables rules so that each new connection to a Service IP is forwarded to a randomly chosen backend pod. Simple and battle-tested, but rules are evaluated linearly, so clusters with very many Services pay a growing per-packet cost.

2. IPVS Mode (more scalable): kube-proxy programs kernel IPVS hash tables instead, which handle thousands of Services efficiently and support several load-balancing algorithms (round robin, least connections, and more).
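
The mode is chosen in the kube-proxy configuration (on kubeadm clusters this lives in the kube-proxy ConfigMap in kube-system); a minimal sketch:

```yaml
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "ipvs"        # empty or "iptables" for the default; "ipvs" requires the IPVS kernel modules
ipvs:
  scheduler: "rr"   # round robin; others such as lc (least connections) are available
```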

kube-proxy DaemonSet:
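
kube-proxy normally runs as a DaemonSet in kube-system so every node gets a copy; a quick way to inspect it and its configuration:

```bash
# One kube-proxy pod per node
kubectl -n kube-system get daemonset kube-proxy
kubectl -n kube-system get pods -l k8s-app=kube-proxy -o wide

# The configuration it runs with (kubeadm stores it in a ConfigMap)
kubectl -n kube-system get configmap kube-proxy -o yaml
```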

Container Runtime

The container runtime is responsible for running containers. Kubernetes supports several runtimes through the Container Runtime Interface (CRI).

Supported Runtimes:

  • containerd (most common, CNCF project)

  • CRI-O (lightweight, OCI-focused)

  • Docker Engine (via cri-dockerd shim)

Container Runtime Interface:

The kubelet talks to the container runtime over the CRI, a gRPC API that standardizes image management and container lifecycle operations, so any CRI-compliant runtime can plug into Kubernetes without kubelet changes.

containerd Configuration:
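
containerd reads /etc/containerd/config.toml; the setting that most often trips people up on systemd hosts is the cgroup driver. A hedged sketch of generating the default config and switching it:

```bash
# Write out containerd's default configuration
containerd config default | sudo tee /etc/containerd/config.toml

# kubelet (with cgroupDriver: systemd) expects the runtime to use systemd cgroups too
sudo sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml

sudo systemctl restart containerd
```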

The Kubernetes Control Loop

The control loop is the core of Kubernetes' self-healing nature. Understanding this loop is essential to understanding how Kubernetes works.

Reconciliation Loop

Every controller runs the same basic loop: observe the current state through the API server, compare it with the desired state, take action to close any gap, and repeat.

Example: Deployment Controller Flow

Let's trace what happens when you create a Deployment:

Step-by-Step Process:

  1. kubectl writes the Deployment object to the API server, which persists it in etcd

  2. The Deployment controller notices the new Deployment and creates a ReplicaSet

  3. The ReplicaSet controller creates the requested number of Pod objects

  4. The scheduler assigns each pending pod to a node

  5. The kubelet on each chosen node pulls the image and starts the containers

  6. The kubelet reports pod status back to the API server

Continuous Reconciliation:

Controllers continuously reconcile:

  1. Deployment Controller ensures correct ReplicaSet exists

  2. ReplicaSet Controller ensures correct number of pods

  3. Node Controller monitors node health

  4. Endpoints Controller updates Service endpoints

If a pod dies:

The ReplicaSet controller notices that the number of running pods has dropped below the desired count and creates a replacement Pod object; the scheduler assigns it to a node, the kubelet on that node starts the containers, and the Endpoints controller updates the Service endpoints to include the new pod.
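
You can watch this happen with any Deployment-managed pod (names below are illustrative):

```bash
# In one terminal, watch the pods for the deployment
kubectl get pods -l app=web --watch

# In another, delete one of them
kubectl delete pod web-5d78cf9d6b-abcde

# Within seconds the ReplicaSet controller creates a replacement,
# and the deployment's replica count is restored
kubectl get deployment web
```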

Communication Patterns

Understanding how components communicate is crucial for troubleshooting.

Communication Flow

Every component — scheduler, controller manager, kubelet, kube-proxy — connects to the API server; the API server alone reads from and writes to etcd.

Key Communication Rules:

  1. Only the API server talks to etcd - All state changes go through the API

  2. Components use watches, not polling - Efficient event-driven architecture

  3. All communication is authenticated - Mutual TLS between components

  4. API server is the single source of truth - No direct component-to-component communication

Network Policies for Control Plane
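
Control plane components usually run on the host network, so protecting them is largely a matter of restricting who can reach the well-known control plane ports. A hedged sketch using iptables (the source CIDRs are placeholders; the port numbers are the kubeadm defaults):

```bash
# 6443       - kube-apiserver          (reachable by nodes and admins)
# 2379-2380  - etcd client/peer        (control plane nodes only)
# 10250      - kubelet API             (control plane -> nodes)
# 10257      - kube-controller-manager (metrics/health)
# 10259      - kube-scheduler          (metrics/health)

iptables -A INPUT -p tcp --dport 6443 -s 10.0.0.0/16 -j ACCEPT        # cluster network (placeholder CIDR)
iptables -A INPUT -p tcp --dport 2379:2380 -s 10.0.1.0/24 -j ACCEPT   # control plane subnet (placeholder)
iptables -A INPUT -p tcp --dport 2379:2380 -j DROP
```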

High Availability Architecture

Production clusters require high availability for the control plane.

HA Control Plane

In an HA setup, multiple API server instances sit behind a load balancer, etcd runs as a cluster with quorum, and the scheduler and controller manager run on every control plane node but use leader election so only one instance of each is active at a time.

HA Considerations:

  1. API Server: All instances are active (load balanced)

  2. etcd: Cluster with quorum (3 or 5 nodes)

  3. Scheduler: One active, others on standby (leader election)

  4. Controller Manager: One active, others on standby (leader election)

Leader Election Configuration:
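
Leader election is controlled by flags on the scheduler and controller manager; the values below are the upstream defaults and rarely need changing:

```bash
kube-scheduler \
  --leader-elect=true \
  --leader-elect-lease-duration=15s \
  --leader-elect-renew-deadline=10s \
  --leader-elect-retry-period=2s

# The current leader is recorded in a Lease object you can inspect:
kubectl -n kube-system get lease kube-scheduler -o yaml
```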

Stacked vs External etcd

Stacked etcd topology: etcd runs as a static pod on the same nodes as the other control plane components. This is the kubeadm default and needs fewer machines, but losing a node takes out both an etcd member and a control plane instance.

External etcd topology: etcd runs on its own dedicated hosts, separate from the control plane nodes. This decouples the failure domains and suits larger clusters, at the cost of additional machines.
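
With kubeadm, the external topology is selected by pointing the ClusterConfiguration at your etcd endpoints (endpoints and certificate paths below are placeholders):

```yaml
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
etcd:
  external:
    endpoints:
      - https://10.0.1.10:2379   # placeholder etcd hosts
      - https://10.0.1.11:2379
      - https://10.0.1.12:2379
    caFile: /etc/kubernetes/pki/etcd/ca.crt
    certFile: /etc/kubernetes/pki/apiserver-etcd-client.crt
    keyFile: /etc/kubernetes/pki/apiserver-etcd-client.key
```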

Architecture Best Practices

Control Plane

  1. Run at least 3 master nodes for production

  2. Use external etcd for large clusters (>100 nodes)

  3. Monitor etcd performance - it's the most critical component

  4. Regular etcd backups - automated and tested

  5. Separate master and worker nodes - don't schedule workloads on masters

  6. Resource reservations for control plane components

Worker Nodes

  1. Right-size nodes - balance between too many small nodes and few large nodes

  2. Use node pools for different workload types

  3. Configure resource reservations for system components

  4. Enable swap accounting for better resource management

  5. Monitor node resources and set up autoscaling

Networking

  1. Choose the right CNI for your use case (Calico, Cilium, Flannel)

  2. Plan IP address spaces carefully

  3. Implement Network Policies for security

  4. Use appropriate Service types for different scenarios

Security

  1. Enable RBAC and follow principle of least privilege

  2. Use Pod Security Standards (Baseline, Restricted)

  3. Encrypt etcd data at rest

  4. Rotate certificates regularly

  5. Enable audit logging

Common Architecture Issues

Issue 1: etcd Performance Degradation

Symptoms:

  • Slow API responses

  • Controller delays

  • Watch events delayed

Diagnosis:
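
Useful starting points, assuming etcdctl access with the kubeadm default certificates:

```bash
# Cluster health and per-member status (DB size, leader, raft term)
ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  endpoint status --write-out=table

# Look for slow disk syncs in the metrics (high fsync latency means slow disks)
curl -s https://127.0.0.1:2379/metrics \
  --cacert /etc/kubernetes/pki/etcd/ca.crt \
  --cert /etc/kubernetes/pki/etcd/server.crt \
  --key /etc/kubernetes/pki/etcd/server.key | grep wal_fsync_duration
```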

Solutions:

  • Use SSDs for etcd

  • Defragment etcd database (see the commands after this list)

  • Compact etcd history

  • Consider scaling etcd cluster
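
Compaction and defragmentation can be run with etcdctl (the same certificate flags shown above apply and are omitted here for brevity); defragment one member at a time:

```bash
# Find the current revision, compact history up to it, then defragment
rev=$(ETCDCTL_API=3 etcdctl endpoint status --write-out=json | grep -o '"revision":[0-9]*' | cut -d: -f2)
ETCDCTL_API=3 etcdctl compact "$rev"
ETCDCTL_API=3 etcdctl defrag --endpoints=https://127.0.0.1:2379
```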

Issue 2: Scheduler Not Scheduling Pods

Symptoms:

  • Pods stuck in Pending state

  • No node assignment

Diagnosis:
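
The scheduler records why it could not place a pod in the pod's events; a typical investigation (pod and node names are placeholders):

```bash
# The Events section shows FailedScheduling messages (insufficient cpu, untolerated taints, ...)
kubectl describe pod <pod-name>

# Check capacity, taints, and allocatable resources on the nodes
kubectl describe nodes | grep -A5 -E "Taints|Allocated resources"
kubectl top nodes

# Make sure the scheduler itself is running and healthy
kubectl -n kube-system get pods -l component=kube-scheduler
```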

Common Causes:

  • Insufficient resources

  • Node taints without tolerations

  • Node affinity not satisfied

  • Volume binding issues

Issue 3: Control Plane Communication Issues

Symptoms:

  • Components can't reach API server

  • Certificate errors

  • Authentication failures

Diagnosis:
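
A few checks that cover the usual suspects — API server reachability, certificate expiry, and component logs. The kubeadm command below assumes a kubeadm-built cluster; managed clusters differ.

```bash
# Is the API server answering at all?
kubectl get --raw /healthz

# Are any control plane certificates expired or about to expire?
kubeadm certs check-expiration

# Component logs (static pods on kubeadm clusters, plus the kubelet itself)
kubectl -n kube-system logs kube-apiserver-<node-name>
journalctl -u kubelet --since "10 min ago"
```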

Solutions:

  • Renew expired certificates (on kubeadm clusters, kubeadm certs renew all) and restart the affected static pods

  • Verify the API server endpoint or load balancer is reachable from every node

  • Check that system clocks are synchronized; TLS validation fails with large clock skew

Issue 4: Worker Node NotReady

Symptoms:

  • Nodes show NotReady status

  • Pods evicted from node

Diagnosis:
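
The node's conditions and the kubelet's logs usually point at the cause (node name is a placeholder):

```bash
# Conditions show MemoryPressure, DiskPressure, and the reason for NotReady
kubectl describe node <node-name>

# On the node itself: is the kubelet running, and what is it complaining about?
systemctl status kubelet
journalctl -u kubelet -f

# Check the network plugin pods and local resource pressure
kubectl -n kube-system get pods -o wide | grep <node-name>
df -h /var/lib/kubelet && free -m
```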

Common Causes:

  • kubelet crashed

  • Network plugin issues

  • Resource pressure (disk, memory)

  • Certificate problems

What I Learned

Understanding Kubernetes architecture transformed how I approach container orchestration challenges:

Start with the API Server: Everything in Kubernetes flows through the API server. When troubleshooting, check API server logs first, then work backward to other components.

etcd is Critical: The health of your cluster depends on etcd performance. I learned to monitor etcd metrics closely and ensure it runs on fast SSDs with dedicated resources.

Controllers are Independent: Each controller works independently, watching for its specific resources. This design makes Kubernetes resilient but also means you need to understand the reconciliation loop to debug issues effectively.

The Declarative Model Works: Specifying desired state and letting controllers reconcile it is more reliable than imperative commands. Trust the control loop—it will eventually converge to desired state.

Component Communication Matters: Understanding that only the API server talks to etcd, and all other components use watches (not polling), helps explain why certain operations are fast and others are slow.

High Availability Requires Planning: Don't wait until production to think about HA. Design for it from the start—etcd quorum, leader election, and load balancing all need careful consideration.

Security is Built-In: Kubernetes' architecture includes security by design—mutual TLS, RBAC, admission controllers. Use these features; don't work around them.

The architecture of Kubernetes reflects years of experience running distributed systems at scale. Understanding these components and how they interact gives you the foundation to build reliable, scalable applications on Kubernetes. In the next articles, we'll build on this architectural knowledge to explore practical implementation patterns and best practices.
