Status, Events, and Observability


Introduction

A controller that works but is invisible is a liability. When something goes wrong at 2am, the only useful signal is what the operator left behind: status conditions, events, logs, and metrics. This article covers how to make an operator observable in the same way built-in Kubernetes controllers are.

The patterns here are what I apply to every controller in appstack-operator. They make the difference between "I need to look at controller logs to understand what happened" and "I ran kubectl describe appstack api-service and got the full picture in 10 seconds".


Status Subresource Mechanics

AppStack uses the status subresource (enabled by // +kubebuilder:subresource:status on the type). This means:

  • Updating .spec uses r.Update(ctx, appStack)

  • Updating .status uses r.Status().Update(ctx, appStack)

These are separate API calls with separate RBAC. If you call r.Update() on an object with a modified .status, the API server silently ignores the status changes (they're controlled by the subresource). I spent an afternoon debugging why my status never updated before I understood this.
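
To make the split concrete, here's a minimal sketch of the two write paths (the reconciler receiver and the Spec/Status field names are assumptions about the AppStack type):

```go
// Writing .spec goes through the main resource endpoint.
appStack.Spec.Replicas = 3
if err := r.Update(ctx, appStack); err != nil {
	return ctrl.Result{}, err
}

// Writing .status goes through the /status subresource. Calling r.Update
// with a modified .status would silently drop the status change.
appStack.Status.ReadyReplicas = 3
if err := r.Status().Update(ctx, appStack); err != nil {
	return ctrl.Result{}, err
}
```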

Optimistic Locking and Conflict Errors

The Kubernetes API uses resource versioning (metadata.resourceVersion) for optimistic locking. If you read an object, another process updates it, and then you try to update it, you get a conflict error:
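
The message from the API server looks like this (the group and resource names here are illustrative):

```
Operation cannot be fulfilled on appstacks.apps.example.com "api-service": the object has been modified; please apply your changes to the latest version and try again
```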

Handle conflicts by requeueing:
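
A sketch of the standard pattern, using apierrors from k8s.io/apimachinery/pkg/api/errors:

```go
if err := r.Status().Update(ctx, appStack); err != nil {
	if apierrors.IsConflict(err) {
		// Another writer got there first. Requeue so the next reconcile
		// starts from a fresh read instead of retrying the stale object.
		return ctrl.Result{Requeue: true}, nil
	}
	return ctrl.Result{}, err
}
```

Conflicts aren't failures worth alerting on; they're the optimistic locking model working as intended.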


Writing Status Conditions

meta.SetStatusCondition from k8s.io/apimachinery/pkg/api/meta is the correct way to set conditions. It handles the LastTransitionTime automatically, only updating it when Status actually changes (not on every reconcile).

Always set ObservedGeneration: appStack.Generation. This tells consumers (including tooling like ArgoCD) that this condition reflects the current spec version, not a stale observation.
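
Putting both together (the condition type and reason are names I've assumed for illustration):

```go
import (
	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

meta.SetStatusCondition(&appStack.Status.Conditions, metav1.Condition{
	Type:               "Ready",
	Status:             metav1.ConditionTrue,
	Reason:             "DeploymentAvailable",
	Message:            "all replicas are available",
	ObservedGeneration: appStack.Generation, // ties the condition to the spec it observed
})
```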

The Full Status Update Helper

I consolidate all status updates into one function that evaluates the Deployment state and writes all conditions atomically:
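
A sketch of what that helper can look like; the stackv1alpha1 alias, the status fields, and the condition names are assumptions, and appsv1 is k8s.io/api/apps/v1:

```go
func (r *AppStackReconciler) updateStatus(ctx context.Context, appStack *stackv1alpha1.AppStack, deploy *appsv1.Deployment) error {
	desired := int32(1)
	if deploy.Spec.Replicas != nil {
		desired = *deploy.Spec.Replicas
	}
	available := deploy.Status.AvailableReplicas

	cond := metav1.Condition{
		Type:               "Ready",
		Status:             metav1.ConditionFalse,
		Reason:             "DeploymentProgressing",
		Message:            fmt.Sprintf("%d/%d replicas available", available, desired),
		ObservedGeneration: appStack.Generation,
	}
	if available == desired {
		cond.Status = metav1.ConditionTrue
		cond.Reason = "DeploymentAvailable"
	}
	meta.SetStatusCondition(&appStack.Status.Conditions, cond)
	appStack.Status.ReadyReplicas = deploy.Status.ReadyReplicas

	// One Status().Update call persists every condition together.
	return r.Status().Update(ctx, appStack)
}
```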

What kubectl describe Looks Like

After updateStatus runs, kubectl describe appstack api-service shows:
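
Something like this, with illustrative values:

```
Name:         api-service
Namespace:    default
API Version:  apps.example.com/v1alpha1
Kind:         AppStack
...
Status:
  Conditions:
    Last Transition Time:  2025-01-14T09:12:33Z
    Message:               3/3 replicas available
    Observed Generation:   2
    Reason:                DeploymentAvailable
    Status:                True
    Type:                  Ready
  Ready Replicas:          3
```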


Kubernetes Events

Events are the operator's audit log. They appear in kubectl get events and kubectl describe output. They're scoped to a namespace and retained for ~1 hour by default (configurable in the cluster).

The EventRecorder you set up in main.go records events against a specific object:
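
A sketch of the wiring and a call (the Recorder field name follows the common kubebuilder scaffold; the event reason is an assumption):

```go
// main.go: give the reconciler a named recorder from the manager.
reconciler := &AppStackReconciler{
	Client:   mgr.GetClient(),
	Scheme:   mgr.GetScheme(),
	Recorder: mgr.GetEventRecorderFor("appstack-controller"),
}

// In the reconciler: the first argument is the object the event attaches to.
r.Recorder.Eventf(appStack, corev1.EventTypeNormal, "DeploymentCreated",
	"created Deployment %s/%s", deploy.Namespace, deploy.Name)
```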

Event types: corev1.EventTypeNormal ("Normal") or corev1.EventTypeWarning ("Warning").

Reason: CamelCase, no spaces. Should be a stable identifier; don't put variable data here.

Message: Human-readable description. Can include variable data.

Events in Action

After a deployment is created:
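
kubectl describe appstack api-service would show something like this (illustrative):

```
Events:
  Type    Reason             Age   From                 Message
  ----    ------             ----  ----                 -------
  Normal  DeploymentCreated  12s   appstack-controller  created Deployment default/api-service
```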

After an image update:
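
Again illustrative, with an assumed image tag:

```
Events:
  Type    Reason             Age   From                 Message
  ----    ------             ----  ----                 -------
  Normal  DeploymentCreated  5m    appstack-controller  created Deployment default/api-service
  Normal  DeploymentUpdated  9s    appstack-controller  updated Deployment default/api-service to image ghcr.io/example/api:v1.2.0
```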


Structured Logging with logr

controller-runtime's logging uses logr, a structured logging interface. The log.FromContext(ctx) call returns a logger that's pre-seeded with the reconciler's name and the object being reconciled.
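
In practice that looks like this sketch, assuming log is imported from sigs.k8s.io/controller-runtime/pkg/log:

```go
func (r *AppStackReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// Already carries the controller name and the object's namespace/name.
	logger := log.FromContext(ctx)
	logger.Info("reconciling")
	// ...
	return ctrl.Result{}, nil
}
```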

Key Logging Practices

Always prefer structured fields over string formatting:
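
For example (variable names assumed):

```go
// Structured: each field is independently searchable in a log aggregator.
logger.Info("scaling deployment", "deployment", deploy.Name, "from", oldReplicas, "to", newReplicas)

// Avoid: the same data trapped inside one opaque string.
logger.Info(fmt.Sprintf("scaling deployment %s from %d to %d", deploy.Name, oldReplicas, newReplicas))
```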

Log important decisions, not implementation steps:
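
A sketch of the distinction:

```go
// Worth logging: a decision that changed cluster state.
logger.Info("deployment drifted from spec, updating", "image", desiredImage)

// Noise at Info level: plumbing that runs on every reconcile.
// logger.Info("fetched deployment")
// logger.Info("built desired deployment")
```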

Log at the right level:
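
With logr's verbosity levels, a reasonable split looks like:

```go
logger.Info("created deployment")            // V(0): state changes and decisions
logger.V(1).Info("deployment diff",          // V(1)+: debug detail, hidden by default
	"changedFields", changedFields)
logger.Error(err, "failed to update status") // errors: always emitted, carries the error
```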


Exposing Prometheus Metrics

controller-runtime registers default metrics on the /metrics endpoint automatically:

| Metric | What it measures |
| --- | --- |
| controller_runtime_reconcile_total | Total reconcile calls, by result (success/error) |
| controller_runtime_reconcile_errors_total | Failed reconcile calls |
| controller_runtime_reconcile_time_seconds | Reconcile duration histogram |
| controller_runtime_active_workers | Reconcile workers currently busy, per controller |

These are immediately available with no extra code once your operator is running with the metrics server enabled (default port :8080).

Custom Metrics

For domain-specific metrics, use the prometheus/client_golang package:
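
A sketch: define the metric and register it with controller-runtime's registry so it's served on the same /metrics endpoint (the metric name is an assumption):

```go
import (
	"github.com/prometheus/client_golang/prometheus"
	"sigs.k8s.io/controller-runtime/pkg/metrics"
)

var appStackReadyReplicas = prometheus.NewGaugeVec(
	prometheus.GaugeOpts{
		Name: "appstack_ready_replicas",
		Help: "Number of ready replicas per AppStack.",
	},
	[]string{"namespace", "name"},
)

func init() {
	// controller-runtime's Registry backs the /metrics endpoint.
	metrics.Registry.MustRegister(appStackReadyReplicas)
}
```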

Record metrics in the reconciler:
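
For example, at the end of updateStatus:

```go
appStackReadyReplicas.
	WithLabelValues(appStack.Namespace, appStack.Name).
	Set(float64(appStack.Status.ReadyReplicas))
```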

Scraping with Prometheus

The controller metrics endpoint is typically not scraped automatically. Add a ServiceMonitor (if using the Prometheus Operator):
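
A sketch, assuming the kubebuilder-style control-plane: controller-manager label and a metrics Service with a port named metrics:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: appstack-operator
  namespace: appstack-system
spec:
  selector:
    matchLabels:
      control-plane: controller-manager
  endpoints:
    - port: metrics   # must match the port name on the metrics Service
      path: /metrics
```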


What Good Observability Looks Like in Practice

When api-service is stuck and not rolling out, a well-instrumented operator lets you diagnose it without touching kubectl exec or controller logs:
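
One command:

```
kubectl describe appstack api-service
```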

Output:
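
An illustrative reconstruction (the condition names, timestamps, and the broken image tag are made up):

```
Status:
  Conditions:
    Last Transition Time:  2025-01-14T10:02:11Z
    Message:               0/3 replicas available: ImagePullBackOff for image ghcr.io/example/api:v1.3-typo
    Observed Generation:   3
    Reason:                DeploymentUnavailable
    Status:                False
    Type:                  Ready
Events:
  Type    Reason             Age   From                 Message
  ----    ------             ----  ----                 -------
  Normal  DeploymentUpdated  6m    appstack-controller  updated Deployment default/api-service to image ghcr.io/example/api:v1.3-typo
```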

No log diving needed. The condition says the image is wrong. The event shows when the bad update was applied. Fix the image, kubectl apply, watch the status flip back to Running.


Next: Testing Operators with envtest →
