Status, Events, and Observability
Introduction
A controller that works but is invisible is a liability. When something goes wrong at 2am, the only useful signal is what the operator left behind: status conditions, events, logs, and metrics. This article covers how to make an operator observable in the same way built-in Kubernetes controllers are.
The patterns here are what I apply to every controller in appstack-operator. They make the difference between "I need to look at controller logs to understand what happened" and "I ran kubectl describe appstack api-service and got the full picture in 10 seconds".
Status Subresource Mechanics
AppStack uses the status subresource (enabled by // +kubebuilder:subresource:status on the type). This means:
Updating .spec uses r.Update(ctx, appStack).
Updating .status uses r.Status().Update(ctx, appStack).
These are separate API calls with separate RBAC. If you call r.Update() on an object with a modified .status, the API server silently ignores the status changes (they're controlled by the subresource). I spent an afternoon debugging why my status never updated before I understood this.
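To make the "silently ignored" behavior concrete, here is a toy stdlib-only model of the API server's subresource split. The AppStack and fakeServer types are illustrative stand-ins, not controller-runtime APIs:

```go
package main

import "fmt"

// Toy model of the status subresource split. AppStack and fakeServer
// are illustrative stand-ins, not real controller-runtime types.
type AppStack struct {
	Spec   string
	Status string
}

type fakeServer struct{ stored AppStack }

// Update mirrors r.Update: spec is persisted, status changes are dropped.
func (s *fakeServer) Update(obj AppStack) {
	s.stored.Spec = obj.Spec // any status change in obj is silently ignored
}

// StatusUpdate mirrors r.Status().Update: only status is persisted.
func (s *fakeServer) StatusUpdate(obj AppStack) {
	s.stored.Status = obj.Status
}

func main() {
	srv := &fakeServer{stored: AppStack{Spec: "v1", Status: "Pending"}}

	obj := srv.stored
	obj.Spec = "v2"
	obj.Status = "Running" // lost: wrong endpoint for status writes
	srv.Update(obj)
	fmt.Println(srv.stored.Status) // still "Pending"

	srv.StatusUpdate(AppStack{Status: "Running"})
	fmt.Println(srv.stored) // {v2 Running}
}
```

This is exactly the afternoon-long bug from above in miniature: the write "succeeds", but the status field never moves.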
Optimistic Locking and Conflict Errors
The Kubernetes API uses resource versioning (metadata.resourceVersion) for optimistic locking. If you read an object, another process updates it, and then you try to update it, you get a conflict error:
Handle conflicts by requeueing:
Writing Status Conditions
meta.SetStatusCondition from k8s.io/apimachinery/pkg/api/meta is the correct way to set conditions. It handles the LastTransitionTime automatically, only updating it when Status actually changes (not on every reconcile).
Always set ObservedGeneration: appStack.Generation. This tells consumers (including tooling like ArgoCD) that this condition reflects the current spec version, not a stale observation.
The Full Status Update Helper
I consolidate all status updates into one function that evaluates the Deployment state and writes all conditions atomically:
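The heart of such a helper is a pure decision function that maps observed Deployment counts to a condition. The reason strings below are illustrative, not the ones appstack-operator necessarily uses; the point is that keeping the logic in a pure function makes it unit-testable without a cluster:

```go
package main

import "fmt"

// readiness is the decision: which Ready condition to write.
type readiness struct {
	Status string
	Reason string
}

// evaluate maps observed Deployment counts to a condition. In the real
// helper these numbers come from deployment.Status; reasons are examples.
func evaluate(desired, ready, unavailable int32) readiness {
	switch {
	case desired == 0:
		return readiness{"False", "ScaledToZero"}
	case ready == desired:
		return readiness{"True", "AllReplicasReady"}
	case unavailable > 0 && ready == 0:
		return readiness{"False", "NoReplicasReady"}
	default:
		return readiness{"False", "RolloutInProgress"}
	}
}

func main() {
	fmt.Println(evaluate(3, 3, 0)) // {True AllReplicasReady}
	fmt.Println(evaluate(3, 1, 2)) // {False RolloutInProgress}
	fmt.Println(evaluate(3, 0, 3)) // {False NoReplicasReady}
}
```

The surrounding helper then feeds the result into the condition setter and issues a single r.Status().Update, so all conditions land in one API call.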
What kubectl describe Looks Like
After updateStatus runs, kubectl describe appstack api-service shows:
Kubernetes Events
Events are the operator's audit log. They appear in kubectl get events and kubectl describe output. They're scoped to a namespace and retained for ~1 hour by default (configurable in the cluster).
The EventRecorder you set up in main.go records events against a specific object:
Event types: corev1.EventTypeNormal ("Normal") or corev1.EventTypeWarning ("Warning").
Reason: CamelCase, no spaces. Should be a stable identifier; don't put variable data here.
Message: Human-readable description. Can include variable data.
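The recorder contract from the three points above can be sketched with a fake recorder. The interface and event strings here are illustrative stand-ins for the client-go EventRecorder wired up in main.go:

```go
package main

import "fmt"

// Recorder captures the contract: a fixed type, a stable CamelCase reason,
// and a free-form message. Illustrative, not the client-go interface.
type Recorder interface {
	Event(objName, eventType, reason, message string)
}

type fakeRecorder struct{ events []string }

func (r *fakeRecorder) Event(objName, eventType, reason, message string) {
	r.events = append(r.events, fmt.Sprintf("%s %s %s: %s", eventType, reason, objName, message))
}

func main() {
	rec := &fakeRecorder{}
	// Variable data (the image tag) lives in the message, never the reason.
	rec.Event("api-service", "Normal", "DeploymentCreated", "created Deployment api-service")
	rec.Event("api-service", "Warning", "ImagePullFailed", "image api:v9 not found")
	for _, e := range rec.events {
		fmt.Println(e)
	}
}
```

A fake recorder like this is also how I unit-test that a reconcile path emitted the events it should have.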
Events in Action
After a deployment is created:
After an image update:
Structured Logging with logr
controller-runtime's logging uses logr, a structured logging interface. The log.FromContext(ctx) call returns a logger that's pre-seeded with the reconciler's name and the object being reconciled.
Key Logging Practices
Prefer structured key/value fields over string formatting:
Log important decisions, not implementation steps:
Log at the right level:
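These practices hinge on fields composing: values attached once follow every later log line. Here is a minimal logr-flavored logger that demonstrates this; real code gets its logger from log.FromContext(ctx), and this stand-in is not the controller-runtime API:

```go
package main

import (
	"fmt"
	"sort"
)

// Logger is a tiny logr-style logger: immutable, fields accumulate.
type Logger struct{ kv map[string]any }

// WithValues returns a child logger carrying the parent's fields plus
// the given key/value pairs, without mutating the parent.
func (l Logger) WithValues(pairs ...any) Logger {
	kv := map[string]any{}
	for k, v := range l.kv {
		kv[k] = v
	}
	for i := 0; i+1 < len(pairs); i += 2 {
		kv[pairs[i].(string)] = pairs[i+1]
	}
	return Logger{kv: kv}
}

func (l Logger) Info(msg string, pairs ...any) {
	out := l.WithValues(pairs...)
	keys := make([]string, 0, len(out.kv))
	for k := range out.kv {
		keys = append(keys, k)
	}
	sort.Strings(keys) // deterministic field order for the example
	fmt.Print(msg)
	for _, k := range keys {
		fmt.Printf(" %s=%v", k, out.kv[k])
	}
	fmt.Println()
}

func main() {
	// Seed once with the object being reconciled; every line inherits it.
	log := Logger{}.WithValues("appstack", "api-service", "namespace", "prod")
	// Log the decision ("updating deployment"), not each implementation step.
	log.Info("updating deployment", "oldImage", "api:v1", "newImage", "api:v2")
}
```

With string formatting you would have to re-interpolate appstack and namespace into every message, and downstream tooling could no longer filter on them as fields.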
Exposing Prometheus Metrics
controller-runtime registers default metrics on the /metrics endpoint automatically:
controller_runtime_reconcile_total: total reconcile calls, by result (success/error)
controller_runtime_reconcile_errors_total: failed reconcile calls
controller_runtime_reconcile_time_seconds: reconcile duration histogram
controller_runtime_active_workers: number of active worker goroutines per controller
These are immediately available with no extra code once your operator is running with the metrics server enabled (default port :8080).
Custom Metrics
For domain-specific metrics, use the prometheus/client_golang package:
Record metrics in the reconciler:
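The data shape behind that pattern can be sketched without the external dependency. The gaugeVec below is a stdlib stand-in for a prometheus GaugeVec, and the metric name appstack_ready_replicas is an assumption for illustration; real code would build a prometheus.NewGaugeVec and register it with controller-runtime's metrics.Registry so it appears on the same /metrics endpoint:

```go
package main

import "fmt"

// gaugeVec is a toy stand-in for prometheus.GaugeVec: one float per
// combination of label values. Illustrative only, not the real API.
type gaugeVec struct {
	name   string
	values map[string]float64
}

func (g *gaugeVec) set(v float64, labels ...string) {
	g.values[fmt.Sprint(labels)] = v // key by label values, like a label set
}

func main() {
	// Hypothetical domain metric: ready replicas per AppStack.
	readyReplicas := &gaugeVec{name: "appstack_ready_replicas", values: map[string]float64{}}

	// In the reconciler, after reading the Deployment's status:
	readyReplicas.set(3, "api-service", "prod")
	fmt.Println(readyReplicas.name, readyReplicas.values["[api-service prod]"])
}
```

Setting the gauge on every reconcile (rather than incrementing) keeps it correct even when reconciles are retried or deduplicated.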
Scraping with Prometheus
The controller metrics endpoint is typically not scraped automatically. Add a ServiceMonitor (if using the Prometheus Operator):
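A minimal ServiceMonitor might look like the following. This is a sketch: it assumes the operator's metrics Service carries the label control-plane: controller-manager and exposes a port named metrics, both of which you should adjust to match your actual manifests.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: appstack-operator-metrics   # illustrative name
  namespace: appstack-system        # illustrative namespace
spec:
  selector:
    matchLabels:
      control-plane: controller-manager  # must match the metrics Service's labels
  endpoints:
    - port: metrics   # must match the named port on the Service
      interval: 30s
```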
What Good Observability Looks Like in Practice
When api-service is stuck mid-rollout, a well-instrumented operator lets you diagnose it without touching kubectl exec or controller logs:
Output:
No log diving needed. The condition says the image is wrong. The event shows when the bad update was applied. Fix the image, kubectl apply, watch the status flip back to Running.