Part 3: GitOps with ArgoCD and Continuous Delivery

Part of the SRE Playbook series

What You'll Learn: This article covers how I set up GitOps for the GoReliable platform using ArgoCD. You'll see my actual GitOps repository structure, how I use the app-of-apps pattern with ApplicationSets for multi-environment management, how my GitHub Actions CI pipeline builds images and updates Helm values, and how I manage secrets with External Secrets Operator. By the end, every commit to the application code automatically propagates to the cluster without manual helm upgrade commands.

Why Manual Deploys Stopped Working

After getting the Helm charts right in Part 2, I deployed everything manually. For a few weeks this was fine. Then I hit the problem everyone eventually hits: I couldn't remember what version of each service was running in staging versus production.

I had four services, three environments, and no single source of truth for "what is deployed where." When something went wrong, I'd run helm list and kubectl get deployments and try to reconstruct the state from command output and my memory. That's not a system; that's anxiety.

GitOps fixes this by making the Git repository the source of truth. The cluster state is defined declaratively in Git. ArgoCD continuously reconciles the cluster to match that state. "What's running in production?" is answered by reading a file in a repository, not by querying the cluster.

For GitOps and ArgoCD fundamentals, see the GitOps 101 series. This article focuses on the patterns specific to the GoReliable platform.

Repository Structure

I maintain two repositories:

  1. go-reliable β€” Application code (what we built in Part 1)

  2. go-reliable-gitops β€” Kubernetes manifests, Helm values, ArgoCD configuration

Keeping them separate is intentional. Application code changes frequently; infrastructure configuration changes less often and requires a different review process. Mixing them creates PRs where a single-line config change is buried in a diff of application code.

go-reliable-gitops/
├── argocd/
│   ├── apps/
│   │   ├── root-app.yaml           # The root app-of-apps
│   │   └── project.yaml
│   └── appsets/
│       ├── microservices.yaml      # ApplicationSet for Go services
│       └── infrastructure.yaml     # ApplicationSet for platform infra
├── environments/
│   ├── staging/
│   │   ├── api-gateway/
│   │   │   └── values.yaml         # Staging-specific overrides
│   │   ├── order-service/
│   │   │   └── values.yaml
│   │   ├── notification-worker/
│   │   │   └── values.yaml
│   │   └── ml-gateway/
│   │       └── values.yaml
│   └── production/
│       ├── api-gateway/
│       │   └── values.yaml
│       ├── order-service/
│       │   └── values.yaml
│       ├── notification-worker/
│       │   └── values.yaml
│       └── ml-gateway/
│           └── values.yaml
└── clusters/
    ├── staging/
    │   └── cluster.yaml            # Cluster connection config
    └── production/
        └── cluster.yaml

The environments/ directory holds only the override values; the base values.yaml from the application chart (in the go-reliable repository) provides defaults.

App-of-Apps Pattern

I use the app-of-apps pattern. One "root" ArgoCD Application manages child Applications. Adding a new service means adding one file; ArgoCD picks it up automatically.
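The root Application can be sketched as follows (the repo URL is a placeholder, and pointing the root at the argocd/appsets directory is an assumption based on the layout above):

```yaml
# Sketch of the root app-of-apps Application. ArgoCD syncs this one
# Application, which in turn creates the ApplicationSets it finds in Git.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: root
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/go-reliable-gitops.git
    targetRevision: main
    path: argocd/appsets        # directory of child ApplicationSet manifests
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated:
      prune: true               # remove children deleted from Git
      selfHeal: true            # undo manual drift in the cluster
```

With prune and selfHeal enabled, deleting a child manifest from Git removes the corresponding Application, so the repository stays the single source of truth.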

ApplicationSet for Microservices

Instead of creating four separate ArgoCD Applications (one per service), I use an ApplicationSet with a Git generator. It scans a directory and creates one Application per subdirectory found.

The matrix generator creates a cross-product: staging × [api-gateway, order-service, notification-worker, ml-gateway] and production × the same four services, yielding eight Applications from one manifest. When I add a fifth service, I create the values directory; ArgoCD creates the Applications automatically.
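A sketch of that ApplicationSet, assuming ArgoCD 2.6+ multi-source support so the chart can live in go-reliable while the values live in go-reliable-gitops (repo URLs, the chart path, and namespace naming are illustrative):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: microservices
  namespace: argocd
spec:
  generators:
    - matrix:
        generators:
          - list:                       # one element per environment
              elements:
                - env: staging
                - env: production
          - git:                        # one element per service directory
              repoURL: https://github.com/example/go-reliable-gitops.git
              revision: main
              directories:
                - path: environments/staging/*
  template:
    metadata:
      name: '{{path.basename}}-{{env}}'
    spec:
      project: go-reliable
      sources:
        - repoURL: https://github.com/example/go-reliable.git
          targetRevision: main
          path: charts/go-service       # assumed base chart location
          helm:
            valueFiles:
              - '$values/environments/{{env}}/{{path.basename}}/values.yaml'
        - repoURL: https://github.com/example/go-reliable-gitops.git
          targetRevision: main
          ref: values                   # referenced as $values above
      destination:
        server: https://kubernetes.default.svc
        namespace: '{{env}}'
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
```

The git generator scans the staging service directories for names, and the matrix crosses those names with the environment list, which is what makes "add a directory, get an Application" work.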

CI Pipeline: Build β†’ Update β†’ Sync

The GitHub Actions workflow in go-reliable handles building and publishing the image. A separate workflow in go-reliable-gitops handles the sync notification.

This pipeline:

  1. Detects which services actually changed (no rebuilds for unchanged services)

  2. Builds only the affected Docker images

  3. Pushes to GitHub Container Registry

  4. Updates the staging values.yaml in the GitOps repo with the new image tag

ArgoCD detects the GitOps repo change and syncs staging automatically within 3 minutes (the default polling interval). Production requires a manual PR from staging values to production values, a deliberate gate.
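The build-and-update steps can be sketched as a condensed, single-service workflow. Everything here is illustrative: the repo names, image path, GITOPS_TOKEN secret, and the yq edit are assumptions, and the real pipeline adds change detection and a matrix over services:

```yaml
name: build-and-push
on:
  push:
    branches: [main]

jobs:
  api-gateway:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
    steps:
      - uses: actions/checkout@v4
      - uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Build and push the image
        run: |
          IMAGE=ghcr.io/example/go-reliable/api-gateway
          docker build -t "$IMAGE:$GITHUB_SHA" services/api-gateway
          docker push "$IMAGE:$GITHUB_SHA"
      - name: Bump the staging image tag in the GitOps repo
        run: |
          git clone "https://x-access-token:${{ secrets.GITOPS_TOKEN }}@github.com/example/go-reliable-gitops.git"
          cd go-reliable-gitops
          yq -i '.image.tag = strenv(GITHUB_SHA)' \
            environments/staging/api-gateway/values.yaml
          git -c user.name=ci -c user.email=ci@example.com \
            commit -am "api-gateway: deploy ${GITHUB_SHA::7} to staging"
          git push
```

Note that CI never talks to the cluster; it only writes to Git, and ArgoCD does the rest.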

Promoting to Production

Promotion from staging to production is a pull request in go-reliable-gitops. I copy the image tag from the staging values file into the production values file and merge.

This gives me:

  • A Git history of every production deployment

  • Author and timestamp of every promotion

  • The ability to revert by reverting the PR

  • A review step before touching production
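The promotion edit itself is mechanical. A sketch of it against a throwaway copy of the layout (the service name, tag values, and the image.tag key are assumptions):

```shell
set -e
cd "$(mktemp -d)"
# throwaway files mirroring environments/<env>/<service>/values.yaml
mkdir -p environments/staging/api-gateway environments/production/api-gateway
printf 'image:\n  tag: v1.5.0\n' > environments/staging/api-gateway/values.yaml
printf 'image:\n  tag: v1.4.0\n' > environments/production/api-gateway/values.yaml

# copy the staging tag into the production values file
tag=$(awk '/^  tag:/ {print $2}' environments/staging/api-gateway/values.yaml)
sed -i "s/^  tag: .*/  tag: $tag/" environments/production/api-gateway/values.yaml

grep 'tag:' environments/production/api-gateway/values.yaml   # tag: v1.5.0
```

In the real flow this change lands on a branch and merges through a reviewed PR rather than being pushed directly.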

Secret Management with External Secrets Operator

I do not store secrets in the GitOps repository. They're managed in the cluster using External Secrets Operator, which pulls from a secrets manager (I use AWS Secrets Manager for this project).

The Deployment references envFrom: secretRef: api-gateway-secrets (set up in Part 2). External Secrets Operator keeps the Kubernetes Secret synchronized with AWS Secrets Manager. When I rotate a secret in AWS, the Secret in Kubernetes updates within the refreshInterval.
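A sketch of the ExternalSecret behind that Secret (the store name, AWS secret path, and refresh interval are illustrative):

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: api-gateway
  namespace: production
spec:
  refreshInterval: 1h           # how often ESO re-reads the backing store
  secretStoreRef:
    name: aws-secrets-manager   # assumed ClusterSecretStore name
    kind: ClusterSecretStore
  target:
    name: api-gateway-secrets   # the Secret the Deployment's envFrom references
  dataFrom:
    - extract:
        key: go-reliable/production/api-gateway   # assumed path in AWS
```

Only this manifest lives in Git; the secret values themselves exist solely in AWS Secrets Manager and in the cluster.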

Sync Waves for Ordered Deployment

When ArgoCD syncs the full application, I need infrastructure to be ready before services start. Database migrations need to run before the Order Service starts. I control this with sync waves.

Wave ordering ensures I never have the API Gateway serving requests to an Order Service that's in the middle of a migration that might include table structure changes.
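Sync waves are just annotations; lower waves must sync and report healthy before higher waves start. A fragment sketch of the ordering described above (resource names and wave numbers are illustrative, and the full specs are omitted):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: order-service-migrate
  annotations:
    argocd.argoproj.io/sync-wave: "0"   # migrations run first
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: order-service
  annotations:
    argocd.argoproj.io/sync-wave: "1"   # then the service itself
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-gateway
  annotations:
    argocd.argoproj.io/sync-wave: "2"   # the gateway comes up last
```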

Handling Rollbacks

When something goes wrong in staging, ArgoCD makes rollback immediate: in the GitOps model, a "rollback" is just reverting the values file to the previous image tag and letting ArgoCD sync. I prefer the Git revert approach for production because it maintains the audit trail: the history shows both the bad deploy and the revert, which is useful for post-incident analysis.
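The whole revert flow can be sketched end to end in a throwaway repository (file contents and tags are illustrative):

```shell
set -e
cd "$(mktemp -d)"
git init -q
git config user.email ci@example.com
git config user.name ci

printf 'image:\n  tag: v1.4.0\n' > values.yaml
git add values.yaml && git commit -qm "deploy v1.4.0"
printf 'image:\n  tag: v1.5.0\n' > values.yaml
git commit -qam "deploy v1.5.0"           # the bad deploy

git revert --no-edit HEAD >/dev/null      # rollback commit; history keeps both
grep 'tag:' values.yaml                   # tag: v1.4.0
```

After the revert lands on main, ArgoCD sees the old tag in Git and reconciles the cluster back to it; no kubectl or helm commands are involved.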

What This Enables

After setting up the pipeline:

  • Every push to main in the application repo automatically deploys to staging within ~5 minutes (build time + ArgoCD sync)

  • Production deploys require a PR in the GitOps repo, providing a review gate

  • Cluster state is fully reproducible from the Git history; I can recreate any historical state

  • Manual kubectl apply or helm upgrade is unnecessary for normal operations

  • The answer to "what's running in production" is a specific git commit SHA

In Part 4, I instrument the Go services with Prometheus metrics, OpenTelemetry distributed tracing, and structured logging, then deploy the observability stack itself via ArgoCD using these same patterns.
