Part 8: MLOps with KubeFlow — Training Pipelines on Kubernetes
Why MLOps Belongs in an SRE Playbook
Deploying KubeFlow via ArgoCD
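As an orientation aid, here is roughly what the ApplicationSet in this section renders for one list element, assuming the default `{{name}}` and `{{wave}}` parameter substitution. The element shown is `kubeflow-pipelines`. Fields that the ApplicationSet controller adds on its own (owner references, finalizers) are omitted, so treat this as a sketch rather than exact controller output.

```yaml
# Sketch: the Application generated for the element
#   name: kubeflow-pipelines, wave: "4"
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  # "kubeflow-{{name}}" with name=kubeflow-pipelines; the prefix repeats
  # because the element name already starts with "kubeflow-".
  name: kubeflow-kubeflow-pipelines
  namespace: argocd
  annotations:
    argocd.argoproj.io/sync-wave: "4"
spec:
  project: go-reliable
  source:
    repoURL: https://github.com/htunn/go-reliable-gitops.git
    targetRevision: main
    path: infrastructure/kubeflow/kubeflow-pipelines
  destination:
    server: https://kubernetes.default.svc
    namespace: kubeflow
  syncPolicy:
    automated:
      prune: false
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
      - ServerSideApply=true
```

Two details are worth checking against your own setup. The rendered name doubles the `kubeflow-` prefix for every element except `kserve`. The sync-wave annotation orders resources within a single sync operation, so whether it actually sequences these Applications depends on how they are synced (for example, through a parent Application).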
# argocd/appsets/kubeflow.yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: kubeflow
  namespace: argocd
spec:
  generators:
    - list:
        elements:
          - name: kubeflow-cert-manager
            wave: "1"
          - name: kubeflow-istio
            wave: "2"
          - name: kubeflow-dex
            wave: "3"
          - name: kubeflow-pipelines
            wave: "4"
          - name: kubeflow-katib
            wave: "5"
          - name: kserve
            wave: "6"
  template:
    metadata:
      name: "kubeflow-{{name}}"
      annotations:
        argocd.argoproj.io/sync-wave: "{{wave}}"
    spec:
      project: go-reliable
      source:
        repoURL: https://github.com/htunn/go-reliable-gitops.git
        targetRevision: main
        path: infrastructure/kubeflow/{{name}}
      destination:
        server: https://kubernetes.default.svc
        namespace: kubeflow
      syncPolicy:
        automated:
          prune: false # Don't auto-prune KubeFlow components
          selfHeal: true
        syncOptions:
          - CreateNamespace=true
          - ServerSideApply=true # KubeFlow CRDs require server-side apply

The Recommendation Model
Training Pipeline
Submitting the Pipeline
Hyperparameter Tuning with Katib
Model Serving with KServe
The Go ML Inference Gateway
SLIs for Model Serving