# Kubernetes Operators and CRDs

> **CNPA Domain:** Platform APIs and Provisioning Infrastructure (12%)
>
> **Topics:** Kubernetes Reconciliation Loop, APIs for Self-Service Platforms (CRDs), Kubernetes Operator Pattern for Integration, Infrastructure Provisioning with Kubernetes

## Overview

Kubernetes operators and Custom Resource Definitions (CRDs) are how platform teams extend Kubernetes with domain-specific automation. Instead of relying on runbooks and manual procedures, an operator encodes operational knowledge in code and continuously reconciles a system toward its desired state.

***

## The Kubernetes Reconciliation Loop

Every Kubernetes controller operates on the same fundamental loop:

```
┌──────────────────────────────────────────────────┐
│              Reconciliation Loop                 │
│                                                  │
│  1. Watch API server for resource changes        │
│  2. Read current state from cluster              │
│  3. Compute diff: desired state vs actual state  │
│  4. Take action to close the gap                 │
│  5. Update resource status                       │
│  6. Back to step 1 (with backoff on errors)      │
└──────────────────────────────────────────────────┘
```

This is how built-in controllers work (Deployment controller, Service controller) and it's exactly how custom operators work too.

```go
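// Simplified reconciler structure (controller-runtime)
import (
    "context"
    "time"

    ctrl "sigs.k8s.io/controller-runtime"
    "sigs.k8s.io/controller-runtime/pkg/client"

    myplatformv1 "github.com/example/platform-operator/api/v1"
)

// RDSClient is a hypothetical wrapper around the cloud provider SDK.
func (r *DatabaseReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    // 1. Fetch the desired state (your CRD instance)
    var db myplatformv1.Database
    if err := r.Get(ctx, req.NamespacedName, &db); err != nil {
        // A NotFound error means the object was deleted: nothing left to do
        return ctrl.Result{}, client.IgnoreNotFound(err)
    }

    // 2. Check actual state (does the DB exist in AWS RDS?)
    exists, err := r.RDSClient.DatabaseExists(db.Name)
    if err != nil {
        return ctrl.Result{}, err
    }

    // 3. Reconcile
    if !exists {
        if err := r.RDSClient.CreateDatabase(db.Spec); err != nil {
            return ctrl.Result{}, err
        }
    }

    // 4. Update status (returning the error triggers a retry with backoff)
    db.Status.Endpoint = r.RDSClient.GetEndpoint(db.Name)
    db.Status.Ready = true
    if err := r.Status().Update(ctx, &db); err != nil {
        return ctrl.Result{}, err
    }

    // Requeue periodically so drift is detected even without watch events
    return ctrl.Result{RequeueAfter: 5 * time.Minute}, nil
}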
```

***

## Custom Resource Definitions (CRDs)

CRDs extend the Kubernetes API with your own resource types. They enable **platform APIs**: developers interact with Kubernetes objects that represent high-level platform concepts rather than provider-specific details.

```yaml
# Without CRDs: developers must know all the details
# - RDS instance class, VPC, subnet group, parameter group...

# With CRDs: developers declare what they need
apiVersion: platform.example.com/v1
kind: Database
metadata:
  name: payment-db
  namespace: payments-production
spec:
  engine: postgres
  version: "15"
  size: medium          # Platform abstracts instance types
  storage: 100Gi
  backups:
    enabled: true
    retentionDays: 30
```

The platform operator translates this into the actual cloud provider calls.
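
As a rough illustration of that translation step, the sketch below maps the abstract `size` field to a provider instance class. The function name `instanceClassFor` and the specific size-to-class mapping are assumptions made for this example; a real platform would choose its own values.

```go
package main

import "fmt"

// sizeToInstanceClass holds an illustrative mapping from platform sizes to
// RDS instance classes. The chosen classes are assumptions, not a recommendation.
var sizeToInstanceClass = map[string]string{
    "small":  "db.t3.small",
    "medium": "db.m5.large",
    "large":  "db.m5.2xlarge",
}

// instanceClassFor resolves an abstract size to a concrete instance class.
func instanceClassFor(size string) (string, error) {
    class, ok := sizeToInstanceClass[size]
    if !ok {
        return "", fmt.Errorf("unsupported size %q", size)
    }
    return class, nil
}

func main() {
    class, err := instanceClassFor("medium")
    if err != nil {
        fmt.Println("error:", err)
        return
    }
    fmt.Println(class)
}
```

Keeping the mapping in one place lets the platform team change instance types later without touching any developer manifest.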

### Defining a CRD

```yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: databases.platform.example.com
spec:
  group: platform.example.com
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              required: [engine, size]
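              properties:
                engine:
                  type: string
                  enum: [postgres, mysql, redis]
                version:
                  type: string
                size:
                  type: string
                  enum: [small, medium, large]
                storage:
                  type: string
                backups:
                  type: object
                  properties:
                    enabled:
                      type: boolean
                    retentionDays:
                      type: integer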
            status:
              type: object
              properties:
                ready:
                  type: boolean
                endpoint:
                  type: string
      subresources:
        status: {}
  scope: Namespaced
  names:
    plural: databases
    singular: database
    kind: Database
```
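
The API server enforces this schema on every create and update, so the operator never sees an invalid `engine` or `size`. To make the `required` and `enum` rules concrete, here is a small sketch that applies the same checks in plain Go. The `DatabaseSpec` struct and `validate` function are illustrative only; in a real cluster this validation is done by Kubernetes from the schema, not by code you write.

```go
package main

import (
    "encoding/json"
    "fmt"
)

// DatabaseSpec mirrors the engine, size, and storage fields of the CRD schema.
type DatabaseSpec struct {
    Engine  string `json:"engine"`
    Size    string `json:"size"`
    Storage string `json:"storage,omitempty"`
}

var (
    allowedEngines = map[string]bool{"postgres": true, "mysql": true, "redis": true}
    allowedSizes   = map[string]bool{"small": true, "medium": true, "large": true}
)

// validate applies the same required/enum rules as the openAPIV3Schema.
// An empty value fails the enum check, which also covers "required".
func validate(s DatabaseSpec) error {
    if !allowedEngines[s.Engine] {
        return fmt.Errorf("spec.engine: unsupported value %q", s.Engine)
    }
    if !allowedSizes[s.Size] {
        return fmt.Errorf("spec.size: unsupported value %q", s.Size)
    }
    return nil
}

func main() {
    raw := []byte(`{"engine":"postgres","size":"medium","storage":"100Gi"}`)
    var spec DatabaseSpec
    if err := json.Unmarshal(raw, &spec); err != nil {
        fmt.Println("decode error:", err)
        return
    }
    fmt.Println("valid spec:", validate(spec) == nil)
    fmt.Println("invalid spec:", validate(DatabaseSpec{Engine: "oracle", Size: "medium"}))
}
```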

***

## The Operator Pattern

An **operator** = CRD + controller that automates operational tasks for a specific domain.

### What Problem Do Operators Solve?

| Manual Operations                  | With an Operator                                      |
| ---------------------------------- | ----------------------------------------------------- |
| Click through console to create DB | `kubectl apply -f database.yaml`                      |
| Write runbook for failover         | Operator detects failure, runs failover automatically |
| Manual backup scheduling           | Operator manages backup CronJobs                      |
| Complex upgrade procedures         | `kubectl edit database payment-db` → `version: "16"`  |

### Operator Maturity Levels (OperatorHub Model)

```
Level 1: Basic Install
  → Automated application installation
Level 2: Seamless Upgrades
  → Upgrade without data loss
Level 3: Full Lifecycle
  → Backup, restore, failure recovery
Level 4: Deep Insights
  → Metrics, alerts, log processing
Level 5: Auto Pilot
  → Horizontal scaling, auto-tuning, anomaly detection
```

***

## Crossplane: Infrastructure Provisioning via CRDs

[Crossplane](https://docs.crossplane.io/) takes the operator pattern to cloud infrastructure. It allows platform teams to expose cloud resources (AWS RDS, GCP Cloud SQL, Azure Service Bus) as Kubernetes CRDs.

### Crossplane Architecture

```
Developer creates:
  kubectl apply -f database-claim.yaml

Crossplane interprets:
  Claim → Composite Resource → Managed Resources → Cloud API calls

Actual resources created:
  AWS: RDS instance + subnet group + parameter group + security group
```
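
One way to picture the fan-out from a single claim to several cloud resources is the sketch below. It is a mental model only: in Crossplane this mapping is declared in a Composition rather than written as Go by the platform team, and the `claim` and `managedResource` types, the `expand` function, and the resource kind names are all hypothetical simplifications.

```go
package main

import (
    "fmt"
    "strings"
)

// claim is a hypothetical, simplified view of a developer's database claim.
type claim struct {
    Name string
    Size string
}

// managedResource is a simplified stand-in for a Crossplane managed resource.
type managedResource struct {
    Kind string
    Name string
}

// expand lists the managed resources that might be created for one claim.
// The kinds are illustrative and do not match real provider API kinds.
func expand(c claim) []managedResource {
    kinds := []string{"SubnetGroup", "ParameterGroup", "SecurityGroup", "RDSInstance"}
    out := make([]managedResource, 0, len(kinds))
    for _, k := range kinds {
        out = append(out, managedResource{
            Kind: k,
            Name: c.Name + "-" + strings.ToLower(k),
        })
    }
    return out
}

func main() {
    for _, mr := range expand(claim{Name: "payments-db", Size: "medium"}) {
        fmt.Printf("%s/%s\n", mr.Kind, mr.Name)
    }
}
```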

### Composite Resource Definition (XRD)

```yaml
# Platform team defines the platform API
apiVersion: apiextensions.crossplane.io/v1
kind: CompositeResourceDefinition
metadata:
  name: xpostgresdatabases.platform.example.com
spec:
  group: platform.example.com
  names:
    kind: XPostgresDatabase
    plural: xpostgresdatabases
  claimNames:
    kind: PostgresDatabase     # ← What developers use
    plural: postgresdatabases
  versions:
    - name: v1alpha1
      served: true
      referenceable: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                size:
                  type: string
                  enum: [small, medium, large]
```

```yaml
# Developer creates a claim (no cloud details needed)
apiVersion: platform.example.com/v1alpha1
kind: PostgresDatabase
metadata:
  name: payments-db
  namespace: payments-production
spec:
  size: medium
```

***

## Building Operators: Tooling

| Tool                   | Language          | Notes                                         |
| ---------------------- | ----------------- | --------------------------------------------- |
| **Operator SDK**       | Go, Ansible, Helm | CNCF project, batteries-included              |
| **Kubebuilder**        | Go                | Lower-level; maintained by SIG API Machinery  |
| **controller-runtime** | Go                | Library used by both above                    |
| **kopf**               | Python            | Kubernetes Operator Pythonic Framework        |
| **KUDO**               | YAML              | Declarative operator framework                |

### Minimum Operator Project (Kubebuilder)

```bash
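# Scaffold a new operator (the API group becomes <group>.<domain>)
kubebuilder init --domain example.com --repo github.com/example/platform-operator

# Add a new API/CRD: group "platform" + domain "example.com" = platform.example.com
kubebuilder create api --group platform --version v1 --kind Database

# Generate manifests and code
make manifests generate

# Install the CRDs into the cluster from the current kubeconfig
make install

# Run the controller locally (against the current kubeconfig)
make run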
```

***

## CRDs for Self-Service Platform APIs

CRDs are the foundation of **self-service platform APIs**: developers use `kubectl` (or Backstage or another portal) to request platform capabilities:

```yaml
# Self-service: request a message queue
apiVersion: platform.example.com/v1
kind: MessageQueue
metadata:
  name: payment-events
spec:
  type: kafka
  partitions: 12
  replication: 3
  retention: 7d

---
# Self-service: request environment provisioning
apiVersion: platform.example.com/v1
kind: Environment
metadata:
  name: feature-new-checkout
spec:
  team: payments
  template: nodejs-service
  ttl: 7d
  services:
    - name: checkout-service
      image: checkout-service:pr-103
```
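
Fields such as `ttl: 7d` only have meaning once the operator behind the resource interprets them. As a sketch of what that could look like, the code below parses a day-based TTL and checks whether an environment has expired. The `parseTTL` and `expired` helpers are hypothetical; the `d` suffix is handled by hand because Go's `time.ParseDuration` does not accept days.

```go
package main

import (
    "fmt"
    "strconv"
    "strings"
    "time"
)

// parseTTL converts values such as "7d" or "12h" into a time.Duration.
// A trailing "d" is treated as whole days; anything else is passed to
// time.ParseDuration.
func parseTTL(s string) (time.Duration, error) {
    if strings.HasSuffix(s, "d") {
        n, err := strconv.Atoi(strings.TrimSuffix(s, "d"))
        if err != nil || n < 0 {
            return 0, fmt.Errorf("invalid ttl %q", s)
        }
        return time.Duration(n) * 24 * time.Hour, nil
    }
    return time.ParseDuration(s)
}

// expired reports whether a resource created at the given time has outlived ttl.
func expired(created time.Time, ttl time.Duration, now time.Time) bool {
    return now.After(created.Add(ttl))
}

func main() {
    ttl, err := parseTTL("7d")
    if err != nil {
        fmt.Println("error:", err)
        return
    }
    created := time.Date(2024, time.January, 1, 0, 0, 0, 0, time.UTC)
    now := created.Add(8 * 24 * time.Hour)
    fmt.Println(ttl, expired(created, ttl, now))
}
```

A reconciler could run a check like this on each pass and delete the `Environment` once it has expired, requeueing itself for the remaining time otherwise.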

***

## Key Takeaways

* The **reconciliation loop** is the core Kubernetes pattern: watch → diff → act → repeat
* **CRDs** extend the Kubernetes API with domain-specific resource types, enabling platform self-service APIs
* **Operators** encode operational knowledge (install, upgrade, backup, failover) into running code
* **Crossplane** uses the operator pattern to provision cloud infrastructure via declarative Kubernetes manifests
* Developer-facing **claims**, backed by Composite Resources (XRDs + Compositions), hide cloud provider complexity behind clean APIs
* Use **Operator SDK** or **Kubebuilder** to build Go-based operators; **kopf** for Python teams

***

## Further Reading

* [Kubernetes Operator Pattern](https://kubernetes.io/docs/concepts/extend-kubernetes/operator/)
* [Kubebuilder Book](https://book.kubebuilder.io/)
* [Crossplane Documentation](https://docs.crossplane.io/)
* [OperatorHub.io](https://operatorhub.io/)
* → Next: [Security and Compliance](https://blog.htunnthuthu.com/getting-started/fundamentals/platform-engineering-101/platform-engineering-101-security-compliance)
