Part 3: GitOps with ArgoCD and Continuous Delivery

Part of the SRE Playbook series

What You'll Learn: This article covers how I set up GitOps for the GoReliable platform using ArgoCD. You'll see my actual GitOps repository structure, how I use the app-of-apps pattern with ApplicationSets for multi-environment management, how my GitHub Actions CI pipeline builds images and updates Helm values, and how I manage secrets with External Secrets Operator. By the end, every commit to the application code automatically propagates to the cluster without manual helm upgrade commands.

Why Manual Deploys Stopped Working

After getting the Helm charts right in Part 2, I deployed everything manually. For a few weeks this was fine. Then I hit the problem everyone eventually hits: I couldn't remember what version of each service was running in staging versus production.

I had four services, three environments, and no single source of truth for "what is deployed where." When something went wrong, I'd run helm list and kubectl get deployments and try to reconstruct the state from command output and my memory. That's not a system; that's anxiety.

GitOps fixes this by making the Git repository the source of truth. The cluster state is defined declaratively in Git. ArgoCD continuously reconciles the cluster to match that state. "What's running in production?" is answered by reading a file in a repository, not by querying the cluster.

For GitOps and ArgoCD fundamentals, see the GitOps 101 series. This article focuses on the patterns specific to the GoReliable platform.

Repository Structure

I maintain two repositories:

  1. go-reliable β€” Application code (what we built in Part 1)

  2. go-reliable-gitops β€” Kubernetes manifests, Helm values, ArgoCD configuration

Keeping them separate is intentional. Application code changes frequently; infrastructure configuration changes less often and requires a different review process. Mixing them creates PRs where a single-line config change is buried in a diff of application code.

go-reliable-gitops/
├── argocd/
│   ├── apps/
│   │   ├── root-app.yaml           # The root app-of-apps
│   │   └── project.yaml
│   └── appsets/
│       ├── microservices.yaml      # ApplicationSet for Go services
│       └── infrastructure.yaml     # ApplicationSet for platform infra
├── environments/
│   ├── staging/
│   │   ├── api-gateway/
│   │   │   └── values.yaml         # Staging-specific overrides
│   │   ├── order-service/
│   │   │   └── values.yaml
│   │   ├── notification-worker/
│   │   │   └── values.yaml
│   │   └── ml-gateway/
│   │       └── values.yaml
│   └── production/
│       ├── api-gateway/
│       │   └── values.yaml
│       ├── order-service/
│       │   └── values.yaml
│       ├── notification-worker/
│       │   └── values.yaml
│       └── ml-gateway/
│           └── values.yaml
└── clusters/
    ├── staging/
    │   └── cluster.yaml            # Cluster connection config
    └── production/
        └── cluster.yaml

The environments/ directory holds only the override values; the base values.yaml from the application chart (in the go-reliable repository) provides defaults.

App-of-Apps Pattern

I use the app-of-apps pattern. One "root" ArgoCD Application manages child Applications. Adding a new service means adding one file; ArgoCD picks it up automatically.
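The root Application can be sketched as follows (the repo URL is a placeholder, and pointing the root at the argocd/appsets directory is an assumption based on the layout above):

```yaml
# Sketch of the root app-of-apps Application. ArgoCD syncs this one
# Application, which in turn creates the ApplicationSets it finds in Git.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: root
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/go-reliable-gitops.git
    targetRevision: main
    path: argocd/appsets        # directory of child ApplicationSet manifests
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated:
      prune: true               # remove children deleted from Git
      selfHeal: true            # undo manual drift in the cluster
```

With prune and selfHeal enabled, deleting a child manifest from Git removes the corresponding Application, so the repository stays the single source of truth.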

ApplicationSet for Microservices

Instead of creating four separate ArgoCD Applications (one per service), I use an ApplicationSet with a Git generator. It scans a directory and creates one Application per subdirectory found.

The matrix generator creates a cross-product: staging × [api-gateway, order-service, notification-worker, ml-gateway] and production × the same four services, yielding eight Applications from one manifest. When I add a fifth service, I create the values directory; ArgoCD creates the Applications automatically.
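A sketch of that ApplicationSet, assuming ArgoCD 2.6+ multi-source support so the chart can live in go-reliable while the values live in go-reliable-gitops (repo URLs, the chart path, and namespace naming are illustrative):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: microservices
  namespace: argocd
spec:
  generators:
    - matrix:
        generators:
          - list:                       # one element per environment
              elements:
                - env: staging
                - env: production
          - git:                        # one element per service directory
              repoURL: https://github.com/example/go-reliable-gitops.git
              revision: main
              directories:
                - path: environments/staging/*
  template:
    metadata:
      name: '{{path.basename}}-{{env}}'
    spec:
      project: go-reliable
      sources:
        - repoURL: https://github.com/example/go-reliable.git
          targetRevision: main
          path: charts/go-service       # assumed base chart location
          helm:
            valueFiles:
              - '$values/environments/{{env}}/{{path.basename}}/values.yaml'
        - repoURL: https://github.com/example/go-reliable-gitops.git
          targetRevision: main
          ref: values                   # referenced as $values above
      destination:
        server: https://kubernetes.default.svc
        namespace: '{{env}}'
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
```

The git generator scans the staging service directories for names, and the matrix crosses those names with the environment list, which is what makes "add a directory, get an Application" work.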

CI Pipeline: Build β†’ Update β†’ Sync

The GitHub Actions workflow in go-reliable handles building and publishing the image. A separate workflow in go-reliable-gitops handles the sync notification.

This pipeline:

  1. Detects which services actually changed (no rebuilds for unchanged services)

  2. Builds only the affected Docker images

  3. Pushes to GitHub Container Registry

  4. Updates the staging values.yaml in the GitOps repo with the new image tag

ArgoCD detects the GitOps repo change and syncs staging automatically within 3 minutes (the default polling interval). Production requires a manual PR from staging values to production values, a deliberate gate.
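The build-and-update steps can be sketched as a condensed, single-service workflow. Everything here is illustrative: the repo names, image path, GITOPS_TOKEN secret, and the yq edit are assumptions, and the real pipeline adds change detection and a matrix over services:

```yaml
name: build-and-push
on:
  push:
    branches: [main]

jobs:
  api-gateway:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
    steps:
      - uses: actions/checkout@v4
      - uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Build and push the image
        run: |
          IMAGE=ghcr.io/example/go-reliable/api-gateway
          docker build -t "$IMAGE:$GITHUB_SHA" services/api-gateway
          docker push "$IMAGE:$GITHUB_SHA"
      - name: Bump the staging image tag in the GitOps repo
        run: |
          git clone "https://x-access-token:${{ secrets.GITOPS_TOKEN }}@github.com/example/go-reliable-gitops.git"
          cd go-reliable-gitops
          yq -i '.image.tag = strenv(GITHUB_SHA)' \
            environments/staging/api-gateway/values.yaml
          git -c user.name=ci -c user.email=ci@example.com \
            commit -am "api-gateway: deploy ${GITHUB_SHA::7} to staging"
          git push
```

Note that CI never talks to the cluster; it only writes to Git, and ArgoCD does the rest.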

Promoting to Production

Promotion from staging to production is a pull request in go-reliable-gitops. I copy the image tag from the staging values file into the production values file and merge.

This gives me:

  • A Git history of every production deployment

  • Author and timestamp of every promotion

  • The ability to revert by reverting the PR

  • A review step before touching production
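The promotion edit itself is mechanical. A sketch of it against a throwaway copy of the layout (the service name, tag values, and the image.tag key are assumptions):

```shell
set -e
cd "$(mktemp -d)"
# throwaway files mirroring environments/<env>/<service>/values.yaml
mkdir -p environments/staging/api-gateway environments/production/api-gateway
printf 'image:\n  tag: v1.5.0\n' > environments/staging/api-gateway/values.yaml
printf 'image:\n  tag: v1.4.0\n' > environments/production/api-gateway/values.yaml

# copy the staging tag into the production values file
tag=$(awk '/^  tag:/ {print $2}' environments/staging/api-gateway/values.yaml)
sed -i "s/^  tag: .*/  tag: $tag/" environments/production/api-gateway/values.yaml

grep 'tag:' environments/production/api-gateway/values.yaml   # tag: v1.5.0
```

In the real flow this change lands on a branch and merges through a reviewed PR rather than being pushed directly.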

Secret Management with External Secrets Operator

I do not store secrets in the GitOps repository. They're managed in the cluster using External Secrets Operator, which pulls from a secrets manager (I use AWS Secrets Manager for this project).

The Deployment references envFrom: secretRef: api-gateway-secrets (set up in Part 2). External Secrets Operator keeps the Kubernetes Secret synchronized with AWS Secrets Manager. When I rotate a secret in AWS, the Secret in Kubernetes updates within the refreshInterval.
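A sketch of the ExternalSecret behind that Secret (the store name, AWS secret path, and refresh interval are illustrative):

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: api-gateway
  namespace: production
spec:
  refreshInterval: 1h           # how often ESO re-reads the backing store
  secretStoreRef:
    name: aws-secrets-manager   # assumed ClusterSecretStore name
    kind: ClusterSecretStore
  target:
    name: api-gateway-secrets   # the Secret the Deployment's envFrom references
  dataFrom:
    - extract:
        key: go-reliable/production/api-gateway   # assumed path in AWS
```

Only this manifest lives in Git; the secret values themselves exist solely in AWS Secrets Manager and in the cluster.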

Sync Waves for Ordered Deployment

When ArgoCD syncs the full application, I need infrastructure to be ready before services start. Database migrations need to run before the Order Service starts. I control this with sync waves.

Wave ordering ensures I never have the API Gateway serving requests to an Order Service that's in the middle of a migration that might include table structure changes.
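Sync waves are just annotations; lower waves must sync and report healthy before higher waves start. A fragment sketch of the ordering described above (resource names and wave numbers are illustrative, and the full specs are omitted):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: order-service-migrate
  annotations:
    argocd.argoproj.io/sync-wave: "0"   # migrations run first
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: order-service
  annotations:
    argocd.argoproj.io/sync-wave: "1"   # then the service itself
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-gateway
  annotations:
    argocd.argoproj.io/sync-wave: "2"   # the gateway comes up last
```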

Handling Rollbacks

When something goes wrong in staging, ArgoCD makes rollback immediate: in the GitOps model, a "rollback" is just reverting the values file to the previous image tag and letting ArgoCD sync. I prefer the Git revert approach for production because it maintains the audit trail: the history shows both the bad deploy and the revert, which is useful for post-incident analysis.
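The whole revert flow can be sketched end to end in a throwaway repository (file contents and tags are illustrative):

```shell
set -e
cd "$(mktemp -d)"
git init -q
git config user.email ci@example.com
git config user.name ci

printf 'image:\n  tag: v1.4.0\n' > values.yaml
git add values.yaml && git commit -qm "deploy v1.4.0"
printf 'image:\n  tag: v1.5.0\n' > values.yaml
git commit -qam "deploy v1.5.0"           # the bad deploy

git revert --no-edit HEAD >/dev/null      # rollback commit; history keeps both
grep 'tag:' values.yaml                   # tag: v1.4.0
```

After the revert lands on main, ArgoCD sees the old tag in Git and reconciles the cluster back to it; no kubectl or helm commands are involved.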

What This Enables

After setting up the pipeline:

  • Every push to main in the application repo automatically deploys to staging within ~5 minutes (build time + ArgoCD sync)

  • Production deploys require a PR in the GitOps repo, providing a review gate

  • Cluster state is fully reproducible from the Git history; I can recreate any historical state

  • Manual kubectl apply or helm upgrade is unnecessary for normal operations

  • The answer to "what's running in production" is a specific git commit SHA

In Part 4, I instrument the Go services with Prometheus metrics, OpenTelemetry distributed tracing, and structured logging, then deploy the observability stack itself via ArgoCD using these same patterns.
