# Policy Engines for Platform Governance

> **CNPA Domain:** Platform Observability, Security, and Conformance (20%)\
> **Topic:** Policy Engines for Platform Governance

## Overview

Platform governance requires enforcing organizational standards consistently across every workload, namespace, and team. **Policy engines** automate this enforcement by intercepting Kubernetes API requests and validating (or mutating) them against defined rules before the resources are persisted to the cluster.

Without policy engines, platform teams rely on documentation, code reviews, and hope. With them, guardrails are enforced at the API layer.

***

## Kubernetes Admission Control

Policy engines plug into the Kubernetes API server via **admission webhooks**:

```
kubectl apply -f deployment.yaml
     ↓
Kubernetes API Server
     ↓
[Authentication] → [Authorization (RBAC)] → [Admission Control]
                                                     ↓
                                         ┌───────────────────────┐
                                         │  Mutating Webhooks    │ ← Policy Engine
                                         │          ↓            │
                                         │  Schema Validation    │
                                         │          ↓            │
                                         │  Validating Webhooks  │ ← Policy Engine
                                         └───────────────────────┘
                                                     ↓
                                            etcd (persisted)
```

**Mutating webhooks** run first and can modify resources (e.g., inject sidecars, set defaults).\
**Validating webhooks** run after mutation, see the final object, and accept or reject it based on rules.
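
Gatekeeper and Kyverno register these webhooks for you when they are installed. As a rough sketch of what that registration looks like, here is a minimal `ValidatingWebhookConfiguration`. The configuration name, webhook name, Service, namespace, and path are hypothetical placeholders, and real installs also inject a `caBundle` so the API server can trust the webhook's TLS certificate.

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: example-policy-engine            # hypothetical name
webhooks:
  - name: validate.policy.example.com    # hypothetical; must be a fully qualified name
    admissionReviewVersions: ["v1"]
    sideEffects: None
    # Fail closed: reject matching requests if the policy engine is unreachable
    failurePolicy: Fail
    timeoutSeconds: 10
    clientConfig:
      service:
        name: policy-engine-webhook      # hypothetical Service
        namespace: policy-system         # hypothetical namespace
        path: /validate                  # hypothetical path
        port: 443
    # Only CREATE/UPDATE requests for Deployments are sent to this webhook
    rules:
      - apiGroups: ["apps"]
        apiVersions: ["v1"]
        operations: ["CREATE", "UPDATE"]
        resources: ["deployments"]
        scope: "Namespaced"
```

`failurePolicy: Fail` trades availability for safety: if the webhook is down, matching requests are rejected. `Ignore` makes the opposite trade and lets requests through unchecked.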

***

## OPA / Gatekeeper

[Open Policy Agent (OPA)](https://www.openpolicyagent.org/) is a general-purpose policy engine. [OPA Gatekeeper](https://open-policy-agent.github.io/gatekeeper/) is its Kubernetes-native integration.

Policies are written in **Rego** — a purpose-built policy language.

### Gatekeeper Architecture

```
ConstraintTemplate  →  Defines the policy logic (Rego) and creates a new Constraint kind (CRD)
Constraint          →  Instantiates that kind with match rules and parameters
```

### Example: Require Resource Limits on All Containers

```yaml
# ConstraintTemplate: defines the policy
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8srequiredresourcelimits
spec:
  crd:
    spec:
      names:
        kind: K8sRequiredResourceLimits
  targets:
    - target: admission.k8s.gatekeeper.sh
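      rego: |
        package k8srequiredresourcelimits

        # Collect containers from bare Pods and from workload pod templates
        # (Deployments and StatefulSets nest them under spec.template.spec)
        input_containers[c] {
          c := input.review.object.spec.containers[_]
        }

        input_containers[c] {
          c := input.review.object.spec.template.spec.containers[_]
        }

        violation[{"msg": msg}] {
          container := input_containers[_]
          not container.resources.limits.cpu
          msg := sprintf("Container '%s' must have CPU limits", [container.name])
        }

        violation[{"msg": msg}] {
          container := input_containers[_]
          not container.resources.limits.memory
          msg := sprintf("Container '%s' must have memory limits", [container.name])
        }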
```

```yaml
# Constraint: enforce the policy in specific namespaces
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredResourceLimits
metadata:
  name: require-resource-limits
spec:
  match:
    kinds:
      - apiGroups: ["apps"]
        kinds: ["Deployment", "StatefulSet"]
    namespaceSelector:
      matchLabels:
        policy.example.com/enforce: "true"
```
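
The constraint above takes no parameters. To illustrate the parameter half of the ConstraintTemplate/Constraint split, here is a sketch modeled on the required-labels example in the Gatekeeper documentation: the template declares a parameter schema under `validation.openAPIV3Schema`, and each Constraint supplies its own values through `spec.parameters`. The `require-team-label` name and the `team` label are hypothetical.

```yaml
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8srequiredlabels
spec:
  crd:
    spec:
      names:
        kind: K8sRequiredLabels
      validation:
        # Schema for the parameters a Constraint may pass in
        openAPIV3Schema:
          type: object
          properties:
            labels:
              type: array
              items:
                type: string
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8srequiredlabels

        violation[{"msg": msg, "details": {"missing_labels": missing}}] {
          provided := {label | input.review.object.metadata.labels[label]}
          required := {label | label := input.parameters.labels[_]}
          missing := required - provided
          count(missing) > 0
          msg := sprintf("you must provide labels: %v", [missing])
        }
---
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: require-team-label          # hypothetical name
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Namespace"]
  parameters:
    labels: ["team"]                # hypothetical required label
```

The same template can back many Constraints, for example one requiring `team` on Namespaces and another requiring a different label set on Deployments.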

***

## Kyverno

[Kyverno](https://kyverno.io/) is a Kubernetes-native policy engine that uses YAML-based policies — no Rego required.

### Policy Types

| Type             | Use Case                                                             |
| ---------------- | -------------------------------------------------------------------- |
| **Validate**     | Reject non-compliant resources                                       |
| **Mutate**       | Auto-patch resources (add labels, set defaults)                      |
| **Generate**     | Create related resources (e.g., NetworkPolicy on Namespace creation) |
| **VerifyImages** | Enforce container image signatures                                   |
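
To illustrate the **Mutate** type, here is a minimal sketch that adds a default label to Pods. The policy name and the `owner: platform-team` label are hypothetical. The `+(...)` add anchor tells Kyverno to set the label only when the key is not already present, so teams can still supply their own value.

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: add-default-owner-label     # hypothetical name
spec:
  rules:
    - name: add-owner-label
      match:
        any:
          - resources:
              kinds: [Pod]
      mutate:
        patchStrategicMerge:
          metadata:
            labels:
              # +() add anchor: only add the label if the key is missing
              +(owner): platform-team
```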

### Example: Disallow `latest` Image Tag

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disallow-latest-tag
  annotations:
    policies.kyverno.io/description: >-
      Require container images to specify a tag other than 'latest'.
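spec:
  validationFailureAction: Enforce
  rules:
    # Rule 1: every image must carry an explicit tag (a bare "nginx" implies latest)
    - name: require-image-tag
      match:
        any:
          - resources:
              kinds: [Pod]
      validate:
        message: "An image tag is required."
        pattern:
          spec:
            # =() is an equality anchor: check initContainers only if the field exists
            =(initContainers):
              - image: "*:*"
            containers:
              - image: "*:*"
    # Rule 2: the explicit tag must not be 'latest'
    - name: validate-image-tag
      match:
        any:
          - resources:
              kinds: [Pod]
      validate:
        message: "Image tag 'latest' is not allowed. Use a specific version."
        pattern:
          spec:
            =(initContainers):
              - image: "!*:latest"
            containers:
              - image: "!*:latest"
```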

### Example: Auto-Generate NetworkPolicy for New Namespaces

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: add-default-networkpolicy
spec:
  rules:
    - name: default-deny-ingress
      match:
        any:
          - resources:
              kinds: [Namespace]
      generate:
        apiVersion: networking.k8s.io/v1
        kind: NetworkPolicy
        name: default-deny-ingress
        namespace: "{{request.object.metadata.name}}"
        data:
          spec:
            podSelector: {}
            policyTypes: [Ingress]
```
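
The same Generate mechanism can stamp out other per-namespace defaults, such as a ResourceQuota. This sketch uses hypothetical policy and quota names and hypothetical quota sizes. `synchronize: true` asks Kyverno to keep the generated object aligned with the policy, so drift in the namespace's copy (edits or deletion) is reverted.

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: add-default-resourcequota   # hypothetical name
spec:
  rules:
    - name: default-resourcequota
      match:
        any:
          - resources:
              kinds: [Namespace]
      generate:
        apiVersion: v1
        kind: ResourceQuota
        name: default-quota         # hypothetical name
        namespace: "{{request.object.metadata.name}}"
        # Keep the generated quota in sync with this policy definition
        synchronize: true
        data:
          spec:
            hard:
              # Hypothetical sizes; tune per environment
              requests.cpu: "4"
              requests.memory: 8Gi
              limits.cpu: "8"
              limits.memory: 16Gi
```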

***

## OPA Gatekeeper vs Kyverno

| Aspect             | OPA Gatekeeper                                 | Kyverno                   |
| ------------------ | ---------------------------------------------- | ------------------------- |
| Policy language    | Rego                                           | YAML                      |
| Learning curve     | Steep (Rego)                                   | Low (YAML)                |
| Flexibility        | Very high                                      | High                      |
| Mutation           | Limited (separate mutation CRDs, e.g., Assign) | Strong (built into rules) |
| Image verification | Via external data providers (e.g., Ratify)     | Built-in (`verifyImages`) |
| Audit mode         | ✅                                             | ✅                        |
| Community          | CNCF graduated                                 | CNCF graduated            |

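**Guidance:** Kyverno is usually the simpler choice for Kubernetes-only governance. OPA Gatekeeper fits better when policies need complex logic, or when the same Rego policies must also govern systems outside Kubernetes (for example, CI pipelines or API gateways).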
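
To make the Mutation row concrete: Gatekeeper handles mutation through dedicated CRDs rather than inside ConstraintTemplates. Below is a minimal sketch of an `AssignMetadata` resource that adds a default `owner` label to Pods, assuming the mutation feature is enabled in your Gatekeeper install; the resource name and label value are hypothetical. `AssignMetadata` only adds labels or annotations that are not already set.

```yaml
apiVersion: mutations.gatekeeper.sh/v1
kind: AssignMetadata
metadata:
  name: add-default-owner-label     # hypothetical name
spec:
  match:
    scope: Namespaced
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
  # AssignMetadata may only target metadata.labels.* or metadata.annotations.*
  location: "metadata.labels.owner"
  parameters:
    assign:
      value: "platform-team"        # hypothetical value
```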

***

## Policy-as-Code Patterns

### Audit Mode vs Enforce Mode

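```yaml
# Kyverno: set on the ClusterPolicy/Policy spec
validationFailureAction: Audit      # Log violations in policy reports, don't block (default)
# validationFailureAction: Enforce  # Block non-compliant resources
---
# Gatekeeper: set on each Constraint spec
enforcementAction: dryrun           # Record violations in audit results only
# enforcementAction: warn           # Warn the client but allow the request
# enforcementAction: deny           # Block non-compliant resources (default)
```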

**Best practice:** Start in `Audit` (Kyverno) or `dryrun`/`warn` (Gatekeeper) mode to find existing violations, then switch to `Enforce`/`deny` once policies are refined.
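
One way to stage that rollout in Kyverno is to audit cluster-wide while enforcing only where you are ready. The sketch below uses `validationFailureActionOverrides`; the policy name and the `production` namespace are hypothetical, and newer Kyverno releases deprecate these spec-level fields in favor of per-rule settings, so check the documentation for your version.

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disallow-latest-tag-staged  # hypothetical name
spec:
  # Default for all namespaces: report violations but allow the request
  validationFailureAction: Audit
  # Exception: block violations in the listed namespaces
  validationFailureActionOverrides:
    - action: Enforce
      namespaces:
        - production                # hypothetical namespace
  rules:
    - name: validate-image-tag
      match:
        any:
          - resources:
              kinds: [Pod]
      validate:
        message: "Image tag 'latest' is not allowed. Use a specific version."
        pattern:
          spec:
            containers:
              - image: "!*:latest"
```

As confidence grows, more namespaces move into the override list until the default itself can flip to `Enforce`.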

### Policy Library

Platform teams maintain a **policy library** in Git:

```
platform-policies/
├── security/
│   ├── disallow-privileged-containers.yaml
│   ├── require-pod-security-labels.yaml
│   └── disallow-latest-tag.yaml
├── networking/
│   ├── require-network-policy.yaml
│   └── disallow-host-network.yaml
├── resources/
│   ├── require-resource-limits.yaml
│   └── require-labels.yaml
└── images/
    ├── require-image-signing.yaml
    └── allow-registry-list.yaml
```

Policies are deployed via GitOps alongside application workloads.
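
A minimal sketch of how that can work, assuming the layout above: a hypothetical `kustomization.yaml` at the repository root lists every policy file, so a GitOps controller (or `kubectl apply -k`) can apply the whole library as one unit.

```yaml
# platform-policies/kustomization.yaml (hypothetical)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  # Security baselines
  - security/disallow-privileged-containers.yaml
  - security/require-pod-security-labels.yaml
  - security/disallow-latest-tag.yaml
  # Network isolation
  - networking/require-network-policy.yaml
  - networking/disallow-host-network.yaml
  # Resource hygiene
  - resources/require-resource-limits.yaml
  - resources/require-labels.yaml
  # Supply chain
  - images/require-image-signing.yaml
  - images/allow-registry-list.yaml
```

If the library contains Gatekeeper policies, note that a Constraint can only be applied after Gatekeeper has created the CRD from its ConstraintTemplate, so the first sync may need a retry or explicit ordering.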

***

## Key Takeaways

* Policy engines plug into **admission webhooks** to validate/mutate resources before they're accepted into the cluster
* **OPA Gatekeeper** uses Rego for flexible, powerful policy logic suited for complex rules
* **Kyverno** uses YAML policies — lower barrier to entry, strong mutation and generate capabilities
* Start policies in **Audit mode** (Gatekeeper: `dryrun` or `warn`) to measure compliance before enforcing
* Treat policies as code — version-controlled, reviewed, deployed via GitOps
* **Generate policies** in Kyverno can automatically create companion resources (NetworkPolicies, ResourceQuotas) when namespaces are created

***

## Further Reading

* [OPA Gatekeeper Documentation](https://open-policy-agent.github.io/gatekeeper/)
* [Kyverno Documentation](https://kyverno.io/docs/)
* [CNCF Policy Working Group](https://github.com/cncf/tag-security/tree/main/policy)
* [Kyverno Policy Library](https://kyverno.io/policies/)
* → Next: [Kubernetes Security Essentials](https://blog.htunnthuthu.com/getting-started/fundamentals/platform-engineering-101/platform-engineering-101-kubernetes-security)
