Part 8: MLOps with KubeFlow — Training Pipelines on Kubernetes

Part of the SRE Playbook series

What You'll Learn: This article covers how I added ML operations to the GoReliable platform — deploying KubeFlow on the same Kubernetes cluster, building a training pipeline for a recommendation model, using Katib for hyperparameter tuning, deploying the trained model with KServe, and building a Go prediction gateway that routes inference requests. I also show how I apply SRE principles to ML workloads — they need SLIs too.

Why MLOps Belongs in an SRE Playbook

When I first added a recommendation model to the GoReliable platform, I treated it differently from the other services. I'd SSH into a VM, train a model manually, copy the artifact somewhere, and update a config file with a new model path. That worked for the first iteration.

Then I needed to retrain. And retrain again when my training data grew. And I wanted to try different hyperparameters. Within two weeks, I had no idea which model version was running in production, where the training code was, or how to reproduce the training run that produced it.

MLOps is SRE applied to model pipelines. The same principles apply: reproducibility, observability, automated delivery, and reliability. KubeFlow provides the platform; the same GitOps workflow from Part 3 manages it.

For MLOps fundamentals, see the MLOps 101 series. This article focuses on the platform integration.

Deploying KubeFlow via ArgoCD

I deploy KubeFlow into a dedicated kubeflow namespace on the same cluster. Since KubeFlow has many components, I use ArgoCD's sync waves to order the deployment.

# argocd/appsets/kubeflow.yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: kubeflow
  namespace: argocd
spec:
  generators:
    - list:
        elements:
          - name: kubeflow-cert-manager
            wave: "1"
          - name: kubeflow-istio
            wave: "2"
          - name: kubeflow-dex
            wave: "3"
          - name: kubeflow-pipelines
            wave: "4"
          - name: kubeflow-katib
            wave: "5"
          - name: kserve
            wave: "6"
  template:
    metadata:
      name: "kubeflow-{{name}}"
      annotations:
        argocd.argoproj.io/sync-wave: "{{wave}}"
    spec:
      project: go-reliable
      source:
        repoURL: https://github.com/htunn/go-reliable-gitops.git
        targetRevision: main
        path: infrastructure/kubeflow/{{name}}
      destination:
        server: https://kubernetes.default.svc
        namespace: kubeflow
      syncPolicy:
        automated:
          prune: false           # Don't auto-prune KubeFlow components
          selfHeal: true
        syncOptions:
          - CreateNamespace=true
          - ServerSideApply=true  # KubeFlow CRDs require server-side apply

KubeFlow uses Istio for its internal service mesh. I configure Istio not to intercept traffic in my application namespaces — I don't want KubeFlow's Istio installation interfering with the go-reliable-production namespace.
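One way to keep the mesh out of an application namespace is to label it so Istio's sidecar injector skips it. A sketch (the namespace name matches the one used elsewhere in this series):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: go-reliable-production
  labels:
    istio-injection: disabled   # Istio's sidecar injector skips this namespace
```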

The Recommendation Model

The model I trained is a simple order-based recommendation: given a user's order history, recommend what they're likely to order next. It's not a state-of-the-art deep learning model — it's a gradient boosting classifier trained on purchase sequences.

What matters is that it runs reliably in production and has the same SRE treatment as any other service.

Training Pipeline

I define the training pipeline using the KubeFlow Pipelines Python SDK v2:

Submitting the Pipeline

I trigger this pipeline via a scheduled Kubernetes CronJob (weekly retraining) and also manually when I want to experiment with hyperparameters.
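A sketch of what that CronJob might look like, assuming a small image that bundles the compiled pipeline spec and the kfp client (the image name and schedule are illustrative; `ml-pipeline.kubeflow:8888` is the standard in-cluster KFP API endpoint):

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: recommendation-retrain
  namespace: kubeflow
spec:
  schedule: "0 3 * * 0"            # weekly, Sunday 03:00
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: submit
              image: ghcr.io/example/pipeline-submitter:latest  # illustrative
              command: ["python", "-c"]
              args:
                - |
                  import kfp
                  client = kfp.Client(host="http://ml-pipeline.kubeflow:8888")
                  client.create_run_from_pipeline_package(
                      "recommendation_pipeline.yaml",
                      arguments={},
                      run_name="weekly-retrain")
```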

Hyperparameter Tuning with Katib

Instead of manually trying hyperparameter combinations, I use Katib to run a structured search:
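A trimmed version of such a Katib Experiment, assuming the training job reports an `accuracy` metric (the image, parameter ranges, and job template details are illustrative):

```yaml
apiVersion: kubeflow.org/v1beta1
kind: Experiment
metadata:
  name: recommendation-tuning
  namespace: kubeflow
spec:
  objective:
    type: maximize
    objectiveMetricName: accuracy
  algorithm:
    algorithmName: bayesianoptimization
  maxTrialCount: 20        # the 20-trial search described below
  parallelTrialCount: 3
  parameters:
    - name: learning_rate
      parameterType: double
      feasibleSpace:
        min: "0.01"
        max: "0.3"
    - name: n_estimators
      parameterType: int
      feasibleSpace:
        min: "100"
        max: "500"
  trialTemplate:
    primaryContainerName: training
    trialParameters:
      - name: learningRate
        reference: learning_rate
      - name: nEstimators
        reference: n_estimators
    trialSpec:
      apiVersion: batch/v1
      kind: Job
      spec:
        template:
          spec:
            restartPolicy: Never
            containers:
              - name: training
                image: ghcr.io/example/recommendation-train:latest  # illustrative
                args:
                  - "--learning-rate=${trialParameters.learningRate}"
                  - "--n-estimators=${trialParameters.nEstimators}"
```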

The Bayesian optimization algorithm uses results from previous trials to make informed choices about which hyperparameters to try next. My 20-trial search found parameters that improved accuracy from 0.79 to 0.84 — meaningfully better than my initial manual guess.

Model Serving with KServe

After training, I deploy the model as a KServe InferenceService. KServe handles the serving infrastructure — I define what model to serve and it handles scaling, load balancing, and the prediction API.
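A minimal InferenceService for this model might look like the following sketch. The storageUri is an illustrative placeholder; the sklearn model format and v2 protocol match the gradient boosting classifier and the `/v2/models/.../infer` endpoint described here:

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: recommendation-model
  namespace: kubeflow
spec:
  predictor:
    minReplicas: 1
    scaleMetric: concurrency   # scale on in-flight requests
    scaleTarget: 10
    model:
      modelFormat:
        name: sklearn
      protocolVersion: v2      # serves /v2/models/recommendation-model/infer
      storageUri: s3://example-bucket/models/recommendation/latest  # illustrative
```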

KServe automatically:

  • Downloads the model from S3 on startup

  • Exposes a prediction REST API at /v2/models/recommendation-model/infer

  • Scales based on concurrency

  • Logs prediction requests to the observability stack

The Go ML Inference Gateway

Models don't know about authentication, routing, or the GoReliable API contract. The ML Inference Gateway is the Go service that bridges them.

SLIs for Model Serving

The ML inference endpoint gets the same SRE treatment as the other services. I define two additional SLIs:

Model serving availability:
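A sketch of how this SLI might be expressed in PromQL, assuming the gateway exposes a standard request counter (the metric name is illustrative):

```promql
sum(rate(inference_requests_total{status!~"5.."}[5m]))
/
sum(rate(inference_requests_total[5m]))
```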

Inference latency (p99 < 200ms):
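A sketch of the corresponding PromQL, assuming a standard latency histogram on the gateway (the metric name is illustrative; 0.2 seconds is the 200ms target):

```promql
histogram_quantile(
  0.99,
  sum(rate(inference_request_duration_seconds_bucket[5m])) by (le)
) < 0.2
```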

The 200ms latency threshold for inference is tighter than the 300ms I use for the order API. Recommendations are called in the user-facing path and perceptibly slow the page if they take too long.

In Part 9, I set up MLFlow as the experiment tracking and model registry layer — the system of record for what model is trained, what parameters produced which accuracy, and which version is approved for production.
