Infrastructure as Code for Landing Zones

Table of Contents


Introduction

Managing cloud infrastructure at scale has taught me that manual deployments simply don't scale beyond a handful of accounts.

Working with organizations rapidly growing their cloud footprint, I've seen the limitations of manual console-based configuration:

  • Inconsistent configurations across accounts

  • No audit trail of infrastructure changes

  • Changes deployed directly to production without validation

  • Difficult rollback when mistakes occur

  • Slow provisioning times (hours or days per account)

  • Configuration drift over time

  • Compliance audit failures due to inconsistencies

The root problem was treating infrastructure as something to click through rather than code to manage. What works for 5 accounts becomes unmanageable at 50+ accounts.

Through implementing Infrastructure as Code (IaC) across various landing zone projects, I've learned that IaC is the only sustainable way to manage cloud infrastructure at scale. It provides consistency, auditability, testability, and automation that manual processes cannot achieve.

This article shares the IaC patterns and practices I've developed - covering Terraform module architecture, state management strategies, CI/CD pipelines for infrastructure, testing approaches, and how to maintain configuration consistency across many accounts.


Why IaC for Landing Zones

The Problems IaC Solves

Problem
Manual Approach
IaC Approach

Consistency

Each account different

Identical configuration

Auditability

No change history

Git commit history

Testing

Test in production

Validate before deploy

Rollback

Manual, error-prone

git revert + redeploy

Scale

Linear effort (40h × N)

Constant effort (15min regardless of N)

Documentation

Outdated wikis

Code is documentation

Collaboration

Ticket-driven

Pull request workflow

Compliance

Manual audits

Automated validation

IaC Benefits for Landing Zones

1. Consistency at Scale

2. Version Control

3. Automated Testing

4. Rapid Rollback


Terraform Module Architecture

Modular Structure

Root Module Pattern

Account Baseline Module


State Management Strategies

Remote State with S3 + DynamoDB

State Isolation Strategies

Strategy 1: Separate State Files per Account

Strategy 2: Workspaces

Strategy 3: Separate Backends per Layer


CI/CD Pipelines for Infrastructure

GitHub Actions Pipeline

Atlantis for Pull Request Automation


Testing Infrastructure Code

Terraform Validate and Plan

TFLint for Best Practices

Checkov for Security

Terratest for Integration Testing

Run Tests:


What I Learned About IaC

After that $280K manual deployment disaster and dozens of IaC implementations:

Lesson 1: IaC is Non-Negotiable at Scale

Manual deployments don't scale beyond 10-15 accounts.

Action: Terraform for all infrastructure, no exceptions.

Lesson 2: Modular Architecture Enables Reuse

Copy-paste Terraform code creates technical debt.

Action: Build reusable modules, version them, test them independently.

Lesson 3: State Management Makes or Breaks IaC

Lost state files = disaster. Corrupted state = hours of recovery.

Action: Remote state (S3), state locking (DynamoDB), versioning enabled, backups automated.

Lesson 4: CI/CD Prevents Production Incidents

Manual terraform apply in production = eventual disaster.

Action: All changes via pull requests, automated testing, approval requirements, automated deployment.

Lesson 5: Testing Infrastructure Code is Critical

Untested infrastructure changes cause outages.

Action: terraform validate, TFLint, Checkov, Terratest before production deployment.

Lesson 6: Versioning Modules Enables Safe Updates

Updating modules without versioning breaks everything.

Action: Semantic versioning for modules, module registry, changelog for breaking changes.

Lesson 7: Documentation in Code is Best Documentation

Wiki documentation goes stale immediately.

Action: Terraform code with comments, module READMEs, examples directory, ADRs in Git.

Lesson 8: Rollback Strategy Must Exist

Failed deployments without rollback plan = extended outages.

Action: Git-based rollback (git revert), tested rollback procedures, automated rollback for critical failures.


Next Up: Subscription/Account Vending

In Article 8, we'll cover account vending machines, self-service portals, automated provisioning workflows, and approval processes.

Ready to automate account creation? Let's go! 🏗️

Last updated