Production Workflows and Team Collaboration

The day we deployed to production without approval taught me that workflows prevent disasters




Introduction: The Accidental Production Deploy

It was a Friday afternoon. I was testing a change in our development environment. I ran terraform apply and confirmed.

30 seconds later, Slack exploded:

ALERT: Production database deleted
ALERT: Production services down
ALERT: All production traffic failing

My heart stopped. I looked at my terminal.

I was in the production directory.

I had just destroyed production infrastructure. On a Friday. At 4:45 PM.

The damage:

  • 2 hours of complete downtime

  • Data loss (fortunately, backups worked)

  • $50,000 in revenue loss

  • Weekend recovery effort

  • Very angry customers

The root cause:

  • No workflow guardrails

  • Manual terraform commands

  • No approval process

  • Easy to make mistakes

We spent the next month implementing production workflows:

  • GitOps: All changes through Git

  • Automation: No manual terraform commands

  • Approvals: Required reviews before deploy

  • Safeguards: Multiple safety checks

That incident taught me: workflows prevent human error.

This article is everything I learned about building safe, automated production workflows for Terraform.


Why Production Workflows Matter

Manual Process Problems

Manual terraform workflow:

Risks:

  • Wrong directory

  • Wrong environment

  • Wrong credentials

  • No peer review

  • No audit trail

  • No rollback plan

Automated Workflow Benefits

GitOps workflow:

Benefits:

  • ✅ Peer review required

  • ✅ Automated validation

  • ✅ Audit trail in Git

  • ✅ Consistent process

  • ✅ Easy rollback

  • ✅ No local terraform commands

Workflow Impact


GitOps Principles

Core Principles

1. Git is the source of truth

  • All infrastructure defined in Git

  • Deployed state matches Git state

  • Changes only through Git commits

2. Declarative configuration

  • Describe desired state, not steps

  • Terraform handles how to get there

3. Automated deployment

  • No manual terraform commands

  • CI/CD applies changes

  • Humans approve, machines execute

4. Continuous reconciliation

  • Detect drift

  • Auto-remediate

  • Alert on discrepancies

GitOps Workflow


Repository Structure


Pull Request Workflow

PR Template

.github/pull_request_template.md:

PR Validation Workflow

.github/workflows/pr-plan.yml:


Automated Plan Comments

Post plan output to PR for review.

Plan Comment Action

.github/workflows/post-plan-comment.yml:

Example Plan Comment

create: 3 update: 1 delete: 0

⚠️ Review Checklist
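An added example, not part of the original article or its workflow files: one way a CI job could derive the summary line shown in the example comment from the JSON form of a plan (`terraform show -json`) and refuse to continue when a production plan deletes or replaces resources. The function names, the `allow_destroy` override and the sample plan are assumptions constructed for illustration.

```python
import json
from collections import Counter

# Constructed sample of `terraform show -json tfplan` output, trimmed to the fields used here.
SAMPLE_PLAN = json.dumps({
    "resource_changes": [
        {"address": "aws_s3_bucket.logs", "change": {"actions": ["create"]}},
        {"address": "aws_iam_role.app", "change": {"actions": ["create"]}},
        {"address": "aws_sqs_queue.jobs", "change": {"actions": ["create"]}},
        {"address": "aws_instance.web", "change": {"actions": ["update"]}},
        {"address": "aws_vpc.main", "change": {"actions": ["no-op"]}},
    ]
})

def summarize_plan(plan_json):
    """Return (counts, destructive_addresses) for a plan in Terraform's JSON format."""
    counts = Counter(create=0, update=0, delete=0, replace=0)
    destructive = []
    for rc in json.loads(plan_json).get("resource_changes", []):
        actions = rc["change"]["actions"]
        if actions in (["no-op"], ["read"]):
            continue  # nothing is changed in the real infrastructure
        if "delete" in actions and "create" in actions:
            kind = "replace"  # destroy-and-recreate, in either order
        else:
            kind = actions[0]
        counts[kind] += 1
        if kind in ("delete", "replace"):
            destructive.append(rc["address"])
    return dict(counts), destructive

def gate(plan_json, environment, allow_destroy=False):
    """Build the PR comment line and decide whether the pipeline must stop."""
    counts, destructive = summarize_plan(plan_json)
    comment = "create: {create} update: {update} delete: {delete}".format(**counts)
    if counts["replace"]:
        comment += " replace: {}".format(counts["replace"])
    # Hypothetical policy: destructive changes to production need an explicit override.
    blocked = environment == "production" and bool(destructive) and not allow_destroy
    return comment, blocked, destructive

if __name__ == "__main__":
    comment, blocked, destructive = gate(SAMPLE_PLAN, "production")
    print(comment)
    print("BLOCKED: " + ", ".join(destructive) if blocked else "OK to request approval")
```

A replace is counted separately from a delete because the plan lists it as a delete plus a create; treating it as an update would hide the destruction of a stateful resource such as a database.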

CODEOWNERS File

.github/CODEOWNERS:

Manual Approval Step

.github/workflows/deploy-production.yml:

GitHub Environment Protection:

  • Required reviewers: @infrastructure-lead, @security-team

  • Wait timer: 5 minutes (thinking time)

  • Deployment branches: main only


Deployment Strategies

Strategy 1: Blue-Green Deployment

Deploy new version alongside old, then switch.

Deployment process:

Strategy 2: Canary Deployment

Gradually shift traffic to new version.

Gradual rollout:

Strategy 3: Rolling Deployment

Update instances one at a time.

Strategy 4: Terraform Workspaces

Use workspaces for environment isolation.

Workspace-aware configuration:


Rollback Procedures

Rollback via Git Revert

Rollback via State

Rollback Automation

.github/workflows/rollback.yml:

Trigger rollback:

  1. Go to Actions → Rollback Production

  2. Click "Run workflow"

  3. Enter the commit SHA to roll back to

  4. Select environment

  5. Confirm


Real-World Example: Complete CI/CD Pipeline

Full production-ready pipeline.

Repository Structure

Complete Deploy Workflow

.github/workflows/deploy.yml:

Drift Detection

.github/workflows/drift-detection.yml:


Environment Promotion

Promotion Strategy


Promotion Workflow

.github/workflows/promote.yml:


Disaster Recovery

Backup Strategy

scripts/backup-state.sh:

Recovery Procedure

scripts/recover-state.sh:

Automated Backups

.github/workflows/backup-state.yml:


Monitoring Deployments

Deployment Metrics

scripts/monitor-deployment.sh:

Health Checks

scripts/smoke-test.sh:


Common Workflow Patterns

Pattern 1: Feature Branch Workflow

Pattern 2: GitFlow

Pattern 3: Trunk-Based

Pattern 4: Environment Branches


Production Best Practices

1. Never Run Terraform Manually

  • ✅ All changes through CI/CD

  • ❌ No terraform apply from laptops

2. Require Reviews

  • ✅ Peer review for all changes

  • ✅ CODEOWNERS for ownership

  • ✅ Approval gates for production

3. Test Before Deploy

  • ✅ Validate in PR

  • ✅ Plan before apply

  • ✅ Test in lower environments

4. Monitor Everything

  • ✅ Deployment metrics

  • ✅ Drift detection

  • ✅ Health checks

  • ✅ Alert on failures

5. Automate Rollback

  • ✅ Fast rollback procedure

  • ✅ Tested rollback path

  • ✅ Automated health checks

  • ✅ Circuit breakers

6. Document Procedures

  • ✅ Deployment runbooks

  • ✅ Rollback procedures

  • ✅ Disaster recovery plans

  • ✅ On-call guides


What I Learned About Production Workflows

1. Workflows Prevent Human Error

That Friday afternoon production incident taught me: humans make mistakes, workflows prevent them.

Automation removes risk.

2. Approvals Save Production

Never deploy without approval.

Peer review catches what you missed.

3. Rollback Must Be Fast

When things break, every second counts.

Automate rollback procedures.

4. Testing is Non-Negotiable

Test in dev and staging before production.

Automated tests catch issues early.

5. Monitoring Enables Confidence

You can't deploy what you can't monitor.

Metrics tell you when to roll back.

6. Documentation Saves Time

When production is down, there's no time to figure it out.

Document procedures beforehand.

7. GitOps Provides Audit Trail

Git history shows what changed and why.

Essential for troubleshooting and compliance.


Next Steps

Congratulations! You've mastered production workflows:

  • ✅ GitOps principles

  • ✅ Pull request workflow

  • ✅ Automated plan comments

  • ✅ Approval gates

  • ✅ Deployment strategies

  • ✅ Rollback procedures

  • ✅ Complete CI/CD pipeline

  • ✅ Disaster recovery

Practice Exercises

Exercise 1: Build CI/CD Pipeline

Exercise 2: Implement GitOps

Exercise 3: Test Disaster Recovery

Coming Up Next

In Article 12: Real-World Production Example - Putting It All Together, we'll build:

  • Complete production infrastructure

  • All concepts integrated

  • Multi-environment setup

  • Full CI/CD pipeline

  • Monitoring and observability

  • Production-ready example

This is where everything comes together!


This Week's Challenge: Implement automated Terraform workflows for your projects. Move from manual deploys to GitOps.

See you in the final article! 🚀


"That Friday afternoon accident—deploying to production instead of dev—cost us $50k and taught me that workflows aren't overhead, they're essential. Now every change goes through automated CI/CD. Zero manual deploys." - Me, GitOps believer
