Status, Events, and Observability


Introduction

A controller that works but is invisible is a liability. When something goes wrong at 2am, the only useful signal is what the operator left behind: status conditions, events, logs, and metrics. This article covers how to make an operator observable in the same way built-in Kubernetes controllers are.

The patterns here are what I apply to every controller in appstack-operator. They make the difference between "I need to look at controller logs to understand what happened" and "I ran kubectl describe appstack api-service and got the full picture in 10 seconds".


Status Subresource Mechanics

AppStack uses the status subresource (enabled by // +kubebuilder:subresource:status on the type). This means:

  • Updating .spec uses r.Update(ctx, appStack)

  • Updating .status uses r.Status().Update(ctx, appStack)

These are separate API calls with separate RBAC. If you call r.Update() on an object with a modified .status, the API server silently ignores the status changes (they're controlled by the subresource). I spent an afternoon debugging why my status never updated before I understood this.
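
To make the split concrete, here's a minimal sketch of the two write paths (the reconciler receiver and the Spec/Status field names are assumptions about the AppStack type):

```go
// Writing .spec goes through the main resource endpoint.
appStack.Spec.Replicas = 3
if err := r.Update(ctx, appStack); err != nil {
	return ctrl.Result{}, err
}

// Writing .status goes through the /status subresource. Calling r.Update
// with a modified .status would silently drop the status change.
appStack.Status.ReadyReplicas = 3
if err := r.Status().Update(ctx, appStack); err != nil {
	return ctrl.Result{}, err
}
```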

Optimistic Locking and Conflict Errors

The Kubernetes API uses resource versioning (metadata.resourceVersion) for optimistic locking. If you read an object, another process updates it, and then you try to update it, you get a conflict error:
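
The message from the API server looks like this (the group and resource names here are illustrative):

```
Operation cannot be fulfilled on appstacks.apps.example.com "api-service": the object has been modified; please apply your changes to the latest version and try again
```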

Handle conflicts by requeueing:
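
A sketch of the standard pattern, using apierrors from k8s.io/apimachinery/pkg/api/errors:

```go
if err := r.Status().Update(ctx, appStack); err != nil {
	if apierrors.IsConflict(err) {
		// Another writer got there first. Requeue so the next reconcile
		// starts from a fresh read instead of retrying the stale object.
		return ctrl.Result{Requeue: true}, nil
	}
	return ctrl.Result{}, err
}
```

Conflicts aren't failures worth alerting on; they're the optimistic locking model working as intended.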


Writing Status Conditions

meta.SetStatusCondition from k8s.io/apimachinery/pkg/api/meta is the correct way to set conditions. It handles the LastTransitionTime automatically, only updating it when Status actually changes (not on every reconcile).

Always set ObservedGeneration: appStack.Generation. This tells consumers (including tooling like ArgoCD) that this condition reflects the current spec version, not a stale observation.
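
Putting both together (the condition type and reason are names I've assumed for illustration):

```go
import (
	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

meta.SetStatusCondition(&appStack.Status.Conditions, metav1.Condition{
	Type:               "Ready",
	Status:             metav1.ConditionTrue,
	Reason:             "DeploymentAvailable",
	Message:            "all replicas are available",
	ObservedGeneration: appStack.Generation, // ties the condition to the spec it observed
})
```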

The Full Status Update Helper

I consolidate all status updates into one function that evaluates the Deployment state and writes all conditions atomically:
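
A sketch of what that helper can look like; the stackv1alpha1 alias, the status fields, and the condition names are assumptions, and appsv1 is k8s.io/api/apps/v1:

```go
func (r *AppStackReconciler) updateStatus(ctx context.Context, appStack *stackv1alpha1.AppStack, deploy *appsv1.Deployment) error {
	desired := int32(1)
	if deploy.Spec.Replicas != nil {
		desired = *deploy.Spec.Replicas
	}
	available := deploy.Status.AvailableReplicas

	cond := metav1.Condition{
		Type:               "Ready",
		Status:             metav1.ConditionFalse,
		Reason:             "DeploymentProgressing",
		Message:            fmt.Sprintf("%d/%d replicas available", available, desired),
		ObservedGeneration: appStack.Generation,
	}
	if available == desired {
		cond.Status = metav1.ConditionTrue
		cond.Reason = "DeploymentAvailable"
	}
	meta.SetStatusCondition(&appStack.Status.Conditions, cond)
	appStack.Status.ReadyReplicas = deploy.Status.ReadyReplicas

	// One Status().Update call persists every condition together.
	return r.Status().Update(ctx, appStack)
}
```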

What kubectl describe Looks Like

After updateStatus runs, kubectl describe appstack api-service shows:
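
Something like this, with illustrative values:

```
Name:         api-service
Namespace:    default
API Version:  apps.example.com/v1alpha1
Kind:         AppStack
...
Status:
  Conditions:
    Last Transition Time:  2025-01-14T09:12:33Z
    Message:               3/3 replicas available
    Observed Generation:   2
    Reason:                DeploymentAvailable
    Status:                True
    Type:                  Ready
  Ready Replicas:          3
```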


Kubernetes Events

Events are the operator's audit log. They appear in kubectl get events and kubectl describe output. They're scoped to a namespace and retained for ~1 hour by default (configurable in the cluster).

The EventRecorder you set up in main.go records events against a specific object:
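
A sketch of the wiring and a call (the Recorder field name follows the common kubebuilder scaffold; the event reason is an assumption):

```go
// main.go: give the reconciler a named recorder from the manager.
reconciler := &AppStackReconciler{
	Client:   mgr.GetClient(),
	Scheme:   mgr.GetScheme(),
	Recorder: mgr.GetEventRecorderFor("appstack-controller"),
}

// In the reconciler: the first argument is the object the event attaches to.
r.Recorder.Eventf(appStack, corev1.EventTypeNormal, "DeploymentCreated",
	"created Deployment %s/%s", deploy.Namespace, deploy.Name)
```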

Event types: corev1.EventTypeNormal ("Normal") or corev1.EventTypeWarning ("Warning").

Reason: CamelCase, no spaces. Should be a stable identifier; don't put variable data here.

Message: Human-readable description. Can include variable data.

Events in Action

After a deployment is created:
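
kubectl describe appstack api-service would show something like this (illustrative):

```
Events:
  Type    Reason             Age   From                 Message
  ----    ------             ----  ----                 -------
  Normal  DeploymentCreated  12s   appstack-controller  created Deployment default/api-service
```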

After an image update:
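
Again illustrative, with an assumed image tag:

```
Events:
  Type    Reason             Age   From                 Message
  ----    ------             ----  ----                 -------
  Normal  DeploymentCreated  5m    appstack-controller  created Deployment default/api-service
  Normal  DeploymentUpdated  9s    appstack-controller  updated Deployment default/api-service to image ghcr.io/example/api:v1.2.0
```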


Structured Logging with logr

controller-runtime's logging uses logr, a structured logging interface. The log.FromContext(ctx) call returns a logger that's pre-seeded with the reconciler's name and the object being reconciled.
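
In practice that looks like this sketch, assuming log is imported from sigs.k8s.io/controller-runtime/pkg/log:

```go
func (r *AppStackReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// Already carries the controller name and the object's namespace/name.
	logger := log.FromContext(ctx)
	logger.Info("reconciling")
	// ...
	return ctrl.Result{}, nil
}
```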

Key Logging Practices

Always prefer structured fields over string formatting:
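
For example (variable names assumed):

```go
// Structured: each field is independently searchable in a log aggregator.
logger.Info("scaling deployment", "deployment", deploy.Name, "from", oldReplicas, "to", newReplicas)

// Avoid: the same data trapped inside one opaque string.
logger.Info(fmt.Sprintf("scaling deployment %s from %d to %d", deploy.Name, oldReplicas, newReplicas))
```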

Log important decisions, not implementation steps:
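
A sketch of the distinction:

```go
// Worth logging: a decision that changed cluster state.
logger.Info("deployment drifted from spec, updating", "image", desiredImage)

// Noise at Info level: plumbing that runs on every reconcile.
// logger.Info("fetched deployment")
// logger.Info("built desired deployment")
```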

Log at the right level:
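
With logr's verbosity levels, a reasonable split looks like:

```go
logger.Info("created deployment")            // V(0): state changes and decisions
logger.V(1).Info("deployment diff",          // V(1)+: debug detail, hidden by default
	"changedFields", changedFields)
logger.Error(err, "failed to update status") // errors: always emitted, carries the error
```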


Exposing Prometheus Metrics

controller-runtime registers default metrics on the /metrics endpoint automatically:

| Metric | What it measures |
| --- | --- |
| controller_runtime_reconcile_total | Total reconcile calls, by result (success/error) |
| controller_runtime_reconcile_errors_total | Failed reconcile calls |
| controller_runtime_reconcile_time_seconds | Reconcile duration histogram |
| controller_runtime_active_workers | Reconcile workers currently busy, per controller |

These are immediately available with no extra code once your operator is running with the metrics server enabled (default port :8080).

Custom Metrics

For domain-specific metrics, use the prometheus/client_golang package:
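
A sketch: define the metric and register it with controller-runtime's registry so it's served on the same /metrics endpoint (the metric name is an assumption):

```go
import (
	"github.com/prometheus/client_golang/prometheus"
	"sigs.k8s.io/controller-runtime/pkg/metrics"
)

var appStackReadyReplicas = prometheus.NewGaugeVec(
	prometheus.GaugeOpts{
		Name: "appstack_ready_replicas",
		Help: "Number of ready replicas per AppStack.",
	},
	[]string{"namespace", "name"},
)

func init() {
	// controller-runtime's Registry backs the /metrics endpoint.
	metrics.Registry.MustRegister(appStackReadyReplicas)
}
```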

Record metrics in the reconciler:
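
For example, at the end of updateStatus:

```go
appStackReadyReplicas.
	WithLabelValues(appStack.Namespace, appStack.Name).
	Set(float64(appStack.Status.ReadyReplicas))
```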

Scraping with Prometheus

The controller metrics endpoint is typically not scraped automatically. Add a ServiceMonitor (if using the Prometheus Operator):
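
A sketch, assuming the kubebuilder-style control-plane: controller-manager label and a metrics Service with a port named metrics:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: appstack-operator
  namespace: appstack-system
spec:
  selector:
    matchLabels:
      control-plane: controller-manager
  endpoints:
    - port: metrics   # must match the port name on the metrics Service
      path: /metrics
```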


What Good Observability Looks Like in Practice

When api-service is stuck and not rolling out, a well-instrumented operator lets you diagnose it without touching kubectl exec or controller logs:
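
One command:

```
kubectl describe appstack api-service
```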

Output:
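
An illustrative reconstruction (the condition names, timestamps, and the broken image tag are made up):

```
Status:
  Conditions:
    Last Transition Time:  2025-01-14T10:02:11Z
    Message:               0/3 replicas available: ImagePullBackOff for image ghcr.io/example/api:v1.3-typo
    Observed Generation:   3
    Reason:                DeploymentUnavailable
    Status:                False
    Type:                  Ready
Events:
  Type    Reason             Age   From                 Message
  ----    ------             ----  ----                 -------
  Normal  DeploymentUpdated  6m    appstack-controller  updated Deployment default/api-service to image ghcr.io/example/api:v1.3-typo
```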

No log diving needed. The condition says the image is wrong. The event shows when the bad update was applied. Fix the image, kubectl apply, watch the status flip back to Running.


Next: Testing Operators with envtest →
