Migration and Workload Onboarding

Article 11 of 12 in the Cloud Landing Zone Series

Introduction

Building a landing zone is only half the challenge; the real test is migrating existing workloads into it.

Across migration projects moving applications from on-premises data centers and brownfield cloud accounts into proper landing zones, I've seen the same challenges come up again and again:

  • Underestimating application complexity and dependencies

  • Discovering undocumented dependencies during migration

  • Missing cutover windows due to unforeseen issues

  • Emergency rollbacks when migrations don't go as planned

  • Insufficient testing before production cutover

Migration is where technical plans meet operational reality. What looked straightforward on paper becomes complex when dealing with legacy applications, hidden dependencies, and business continuity requirements.

This article shares the migration patterns and strategies I've learned through hands-on experience, covering discovery and assessment, dependency mapping, migration patterns, wave planning, testing strategies, and how to execute successful cutovers with minimal risk.


Discovery and Assessment

Application Discovery

# scripts/discover_applications.py
import boto3
import json
from collections import defaultdict

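def discover_aws_applications():
    """
    Discover applications in the account and region of the current boto3
    credentials. Run it once per account/region (for example with an
    assumed role per account) to inventory a multi-account estate.
    """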
    
    ec2 = boto3.client('ec2')
    rds = boto3.client('rds')
    elbv2 = boto3.client('elbv2')
    
    applications = defaultdict(lambda: {
        'compute': [],
        'databases': [],
        'load_balancers': [],
        'storage': [],
        'dependencies': []
    })
    
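    # Discover EC2 instances grouped by Application tag
    # (paginate: a single describe_instances call may not return every instance)
    paginator = ec2.get_paginator('describe_instances')
    reservations = [r for page in paginator.paginate() for r in page['Reservations']]
    for reservation in reservations:
        for instance in reservation['Instances']: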
            app_name = get_tag(instance, 'Application') or 'untagged'
            
            applications[app_name]['compute'].append({
                'type': 'EC2',
                'id': instance['InstanceId'],
                'instance_type': instance['InstanceType'],
                'vpc_id': instance.get('VpcId'),
                'subnet_id': instance.get('SubnetId'),
                'security_groups': [sg['GroupId'] for sg in instance.get('SecurityGroups', [])],
                'private_ip': instance.get('PrivateIpAddress'),
                'public_ip': instance.get('PublicIpAddress'),
                'tags': instance.get('Tags', [])
            })
    
    # Discover RDS databases (paginated)
    db_pages = rds.get_paginator('describe_db_instances').paginate()
    for db in [d for page in db_pages for d in page['DBInstances']]:
        app_name = get_tag_rds(db, 'Application') or 'untagged'
        
        applications[app_name]['databases'].append({
            'type': 'RDS',
            'id': db['DBInstanceIdentifier'],
            'engine': db['Engine'],
            'size': db['DBInstanceClass'],
            'multi_az': db['MultiAZ'],
            'storage_encrypted': db['StorageEncrypted']
        })
    
    # Discover load balancers (paginated)
    lb_pages = elbv2.get_paginator('describe_load_balancers').paginate()
    for lb in [item for page in lb_pages for item in page['LoadBalancers']]:
        app_name = get_tag_elb(lb, 'Application') or 'untagged'
        
        # Find target instances
        target_groups = elbv2.describe_target_groups(
            LoadBalancerArn=lb['LoadBalancerArn']
        )
        
        targets = []
        for tg in target_groups['TargetGroups']:
            health = elbv2.describe_target_health(
                TargetGroupArn=tg['TargetGroupArn']
            )
            targets.extend([t['Target']['Id'] for t in health['TargetHealthDescriptions']])
        
        applications[app_name]['load_balancers'].append({
            'type': {'application': 'ALB', 'network': 'NLB', 'gateway': 'GWLB'}.get(lb['Type'], lb['Type']),
            'arn': lb['LoadBalancerArn'],
            'dns_name': lb['DNSName'],
            'targets': targets
        })
    
    # Generate dependency map
    for app_name, app in applications.items():
        app['dependencies'] = discover_dependencies(app)
    
    return dict(applications)

def discover_dependencies(app):
    """Placeholder: analyze network traffic to discover application dependencies.

    A real implementation would query VPC Flow Logs for traffic to and from
    this application's IPs and map the peer IPs back to other applications.
    """
    return []

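def get_tag(resource, key):
    """Get a tag value from an EC2-style resource ('Tags' list)"""
    for tag in resource.get('Tags', []):
        if tag['Key'] == key:
            return tag['Value']
    return None

def get_tag_rds(db, key):
    """RDS returns tags in 'TagList' on describe_db_instances results"""
    for tag in db.get('TagList', []):
        if tag['Key'] == key:
            return tag['Value']
    return None

def get_tag_elb(lb, key, elbv2=None):
    """ELBv2 omits tags from describe_load_balancers, so fetch them separately"""
    elbv2 = elbv2 or boto3.client('elbv2')
    response = elbv2.describe_tags(ResourceArns=[lb['LoadBalancerArn']])
    for description in response['TagDescriptions']:
        for tag in description.get('Tags', []):
            if tag['Key'] == key:
                return tag['Value']
    return None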

if __name__ == '__main__':
    # Export to JSON for migration planning
    applications = discover_aws_applications()
    with open('application_inventory.json', 'w') as f:
        json.dump(applications, f, indent=2)

    print(f"Discovered {len(applications)} applications")

Dependency Mapping
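VPC Flow Logs are the raw material for dependency mapping. The sketch below assumes records in the default version 2 format (space-separated fields: version, account-id, interface-id, srcaddr, dstaddr, srcport, dstport, protocol, packets, bytes, start, end, action, log-status) and an IP-to-application map built from the `private_ip` values in `application_inventory.json`. The function name `map_dependencies` is illustrative, and the rule for deciding which side is the server is a heuristic, not something the flow log records tell you directly.

```python
from collections import Counter

# Field order of the default VPC Flow Logs format (version 2).
FIELDS = ["version", "account_id", "interface_id", "srcaddr", "dstaddr",
          "srcport", "dstport", "protocol", "packets", "bytes",
          "start", "end", "action", "log_status"]


def map_dependencies(records, ip_to_app):
    """Count (client_app, server_app, service_port) edges from flow log lines.

    Heuristic (assumption): the side using the lower port number is the
    server, because clients usually connect from a high ephemeral port.
    """
    edges = Counter()
    for line in records:
        parts = line.split()
        if len(parts) != len(FIELDS):
            continue  # skip malformed lines
        rec = dict(zip(FIELDS, parts))
        if rec["action"] != "ACCEPT":
            continue  # skips REJECT, header lines, and NODATA/SKIPDATA ('-')
        src_port, dst_port = int(rec["srcport"]), int(rec["dstport"])
        if src_port <= dst_port:
            server_ip, client_ip, port = rec["srcaddr"], rec["dstaddr"], src_port
        else:
            server_ip, client_ip, port = rec["dstaddr"], rec["srcaddr"], dst_port
        client = ip_to_app.get(client_ip, f"external:{client_ip}")
        server = ip_to_app.get(server_ip, f"external:{server_ip}")
        if client != server:  # ignore traffic inside one application
            edges[(client, server, port)] += 1
    return edges
```

Because each flow usually appears once in each direction, treat the counts as relative traffic weights rather than exact connection counts. Any `external:` peers are exactly the undocumented dependencies worth chasing down before a wave is scheduled.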


Migration Patterns

Pattern 1: Lift-and-Shift (Rehost)

Use case: Legacy applications or tight migration timelines

Pattern 2: Refactor (Re-architect)

Use case: Modernizing to serverless or containers

Pattern 3: Replatform

Use case: Move to managed services (RDS, ElastiCache, etc.)
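Choosing a pattern per application can be made explicit with a few ordered rules. The sketch below is one possible heuristic, not a standard: the attributes on `AppProfile` are hypothetical fields you would add to your own assessment data, and the 8-week threshold is a placeholder to tune.

```python
from dataclasses import dataclass


@dataclass
class AppProfile:
    # Hypothetical inventory attributes; adapt to your own assessment data.
    name: str
    deadline_weeks: int          # time left before the source environment must be exited
    uses_self_managed_db: bool   # e.g. PostgreSQL running on an EC2 instance
    modernization_budget: bool   # funding and team capacity to re-architect
    legacy_stack: bool           # hard to change (vendor binary, no source code)


def choose_pattern(app: AppProfile) -> str:
    """Return 'rehost', 'replatform', or 'refactor' using simple, ordered rules."""
    # Tight timeline or unchangeable code: move as-is first, modernize later.
    if app.legacy_stack or app.deadline_weeks < 8:
        return "rehost"
    # Funded modernization: re-architect toward serverless or containers.
    if app.modernization_budget:
        return "refactor"
    # Otherwise swap self-managed components for managed services (RDS, ElastiCache).
    if app.uses_self_managed_db:
        return "replatform"
    return "rehost"
```

Putting the rules in code keeps the decision reviewable: when an application lands in the "wrong" pattern, the disagreement is about a specific rule rather than a gut feeling.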


Migration Wave Planning

Wave Strategy
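One way to turn a dependency map into waves is to migrate dependencies before their consumers (a topological ordering by level) and cap each wave's size, in line with the "migrate 5 at a time" guidance in Lesson 3. A minimal sketch, assuming `depends_on` maps each application to the set of applications it calls (for example, derived from flow-log edges); `plan_waves` and `max_wave_size` are illustrative names. Cycles are reported rather than resolved, since mutually dependent applications usually have to move together.

```python
def plan_waves(depends_on, max_wave_size=5):
    """Group apps into waves so every app's dependencies land in earlier waves.

    depends_on: dict mapping app name -> set of app names it depends on.
    Returns a list of waves, each a sorted list of at most max_wave_size apps.
    """
    # Every app mentioned anywhere, including dependencies with no entry of their own.
    apps = set(depends_on) | {d for deps in depends_on.values() for d in deps}
    remaining = {app: set(depends_on.get(app, ())) for app in apps}
    waves = []
    while remaining:
        # Apps whose dependencies have all been scheduled already.
        ready = sorted(app for app, deps in remaining.items() if not deps)
        if not ready:
            raise ValueError(f"Dependency cycle among: {sorted(remaining)}")
        # Split one dependency level into size-capped waves.
        for i in range(0, len(ready), max_wave_size):
            waves.append(ready[i:i + max_wave_size])
        for app in ready:
            del remaining[app]
        for deps in remaining.values():
            deps.difference_update(ready)
    return waves
```

For example, `plan_waves({"web": {"api"}, "api": {"db", "cache"}, "reports": {"db"}})` schedules `cache` and `db` first, then `api` and `reports`, then `web`.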

Cutover Checklist
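A cutover checklist is most useful when it gates an explicit go/no-go decision. A minimal sketch: `CheckItem`, `go_no_go`, the owner names, and the checklist items are hypothetical examples drawn from this article's themes (tested rollback, caught-up data sync, smoke tests, stakeholder communication), not a complete list.

```python
from dataclasses import dataclass


@dataclass
class CheckItem:
    description: str
    owner: str
    done: bool = False
    blocking: bool = True  # non-blocking items are tracked but never stop cutover


def go_no_go(items):
    """Return ("GO" or "NO-GO", descriptions of open blocking items)."""
    open_blocking = [i.description for i in items if i.blocking and not i.done]
    return ("GO" if not open_blocking else "NO-GO"), open_blocking


# Example checklist (illustrative items and owners).
checklist = [
    CheckItem("Rollback procedure rehearsed within the time budget", "ops-lead", done=True),
    CheckItem("Replication lag near zero, ready for final sync", "dba", done=True),
    CheckItem("Smoke tests pass against the new environment", "qa"),
    CheckItem("Stakeholders notified of the cutover window", "pm", done=True),
    CheckItem("Post-cutover dashboard shared with support", "sre", blocking=False),
]
```

Here `go_no_go(checklist)` returns `("NO-GO", ["Smoke tests pass against the new environment"])` until QA marks that item done.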


Rollback Strategy

Automated Rollback
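Automated rollback needs two parts: a trigger that decides the new environment is unhealthy, and an action that moves traffic back. A minimal sketch, assuming traffic is split with Route 53 weighted CNAME records on a non-apex name (one record per environment, distinguished by `SetIdentifier`; alias records pointing at load balancers would use `AliasTarget` instead). `should_roll_back` compares observed metrics with thresholds you choose, and `rollback_change_batch` builds the `ChangeBatch` argument for boto3's `route53.change_resource_record_sets`. The thresholds and the identifiers `legacy` and `landing-zone` are assumptions.

```python
def should_roll_back(error_rate, p99_latency_ms, max_error_rate=0.02, max_p99_ms=800):
    """Return the list of breached thresholds; an empty list means stay."""
    breaches = []
    if error_rate > max_error_rate:
        breaches.append(f"error rate {error_rate:.1%} > {max_error_rate:.1%}")
    if p99_latency_ms > max_p99_ms:
        breaches.append(f"p99 latency {p99_latency_ms}ms > {max_p99_ms}ms")
    return breaches


def rollback_change_batch(record_name, old_target, new_target, ttl=60):
    """Build a ChangeBatch that sends 100% of weighted traffic to the old environment."""
    def weighted(set_id, target, weight):
        return {
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": record_name,
                "Type": "CNAME",
                "SetIdentifier": set_id,
                "Weight": weight,
                "TTL": ttl,
                "ResourceRecords": [{"Value": target}],
            },
        }
    return {
        "Comment": "Automated migration rollback",
        "Changes": [
            weighted("legacy", old_target, 100),
            weighted("landing-zone", new_target, 0),
        ],
    }
```

A scheduled job would feed CloudWatch metrics into `should_roll_back` and, on any breach, pass the batch to `change_resource_record_sets(HostedZoneId=..., ChangeBatch=batch)`. Keeping the request as plain data makes the logic testable without AWS credentials, and a low TTL set well before cutover lets clients pick up the change inside the rollback window.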


What I Learned About Migration

Lesson 1: Discovery Always Takes Longer Than Planned

Budget 2-3x your estimated discovery time. Unknown dependencies appear mid-migration.

Action: Automated discovery tools, network traffic analysis, 4 weeks minimum for discovery.

Lesson 2: Dependencies Are Never Fully Documented

Documentation lies. Network traffic analysis reveals true dependencies.

Action: VPC Flow Log analysis, packet capture, dependency mapping tools.

Lesson 3: Migrate in Small Batches

Migrating 50 applications at once = guaranteed failure. Migrate 5 at a time.

Action: Wave-based migration, 1-2 week waves, validate each wave before next.

Lesson 4: Always Have a Rollback Plan

If you can't roll back in under 15 minutes, don't start the migration.

Action: Documented rollback procedures, automated rollback scripts, tested before cutover.
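The 15-minute rule is easy to check before cutover day if every rollback step carries a rehearsed duration. A minimal sketch; `check_rollback_budget` is a hypothetical helper, and the step names and minutes are illustrative placeholders that should come from an actual rehearsal rather than estimates.

```python
ROLLBACK_BUDGET_MINUTES = 15


def check_rollback_budget(steps, budget=ROLLBACK_BUDGET_MINUTES):
    """steps: list of (name, rehearsed_minutes). Returns (total_minutes, fits_budget)."""
    total = sum(minutes for _, minutes in steps)
    return total, total <= budget


# Illustrative rehearsal results.
steps = [
    ("Shift DNS weights back to legacy", 2),
    ("Wait for DNS TTL to expire", 1),
    ("Re-enable writes on legacy database", 3),
    ("Run smoke tests against legacy", 5),
]
total, fits = check_rollback_budget(steps)
print(f"Rollback takes {total} min; within budget: {fits}")
```

If a step such as restoring a database snapshot pushes the total past the budget, that is the signal to redesign the rollback (for example, keep reverse replication running) before starting.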

Lesson 5: Communication Prevents Panic

Stakeholders panic when migrations exceed maintenance windows without updates.

Action: Real-time status updates, a dedicated Slack channel for the migration, hourly updates during cutover.

Lesson 6: Data Migration is Always the Bottleneck

Application migration: hours. Data migration: days/weeks.

Action: AWS DMS with change data capture (CDC), parallel data sync, and downtime limited to the final sync.
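In AWS DMS, limiting downtime to the final sync corresponds to a replication task with `MigrationType` set to `full-load-and-cdc`: a bulk copy followed by ongoing change capture until cutover. A minimal sketch that builds the keyword arguments for boto3's `dms.create_replication_task`; `dms_task_kwargs` is a hypothetical helper, the ARNs are placeholders, and the single selection rule includes every table in one schema.

```python
import json


def dms_task_kwargs(task_id, source_arn, target_arn, instance_arn, schema):
    """Build create_replication_task arguments for full load plus CDC."""
    table_mappings = {
        "rules": [{
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "1",
            "object-locator": {"schema-name": schema, "table-name": "%"},
            "rule-action": "include",
        }]
    }
    return {
        "ReplicationTaskIdentifier": task_id,
        "SourceEndpointArn": source_arn,
        "TargetEndpointArn": target_arn,
        "ReplicationInstanceArn": instance_arn,
        # Bulk copy first, then replicate ongoing changes until cutover.
        "MigrationType": "full-load-and-cdc",
        "TableMappings": json.dumps(table_mappings),  # DMS expects a JSON string
    }


kwargs = dms_task_kwargs(
    task_id="orders-db-migration",
    source_arn="<source-endpoint-arn>",
    target_arn="<target-endpoint-arn>",
    instance_arn="<replication-instance-arn>",
    schema="orders",
)
# boto3.client("dms").create_replication_task(**kwargs)
```

At cutover, stop writes on the source, wait for CDC to catch up, then switch the application; the outage covers only that final catch-up instead of the full copy.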

Lesson 7: Testing After Migration is Critical

"It works in staging" ≠ "It works in production with real traffic"

Action: Smoke tests, performance tests, gradual traffic shifting (canary deployments).
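Gradual traffic shifting can be expressed as a fixed sequence of weights with a health gate after each step. A minimal sketch: `canary_shift`, the step percentages, and the two callbacks are assumptions. In practice `is_healthy` would read metrics such as error rate after the step has had time to soak, and `set_weight` would update weighted DNS records or load balancer target group weights.

```python
from typing import Callable, Sequence


def canary_shift(set_weight: Callable[[int], None],
                 is_healthy: Callable[[int], bool],
                 steps: Sequence[int] = (5, 25, 50, 100)) -> int:
    """Move traffic to the new environment step by step.

    Returns the last step's percentage on success, or 0 if a health
    check failed and all traffic was sent back to the old environment.
    """
    for pct in steps:
        set_weight(pct)
        if not is_healthy(pct):
            set_weight(0)  # roll back: all traffic to the old environment
            return 0
    return steps[-1]
```

For example, with a health check that fails at 50%, the weights applied are 5, 25, 50, then 0, so real production traffic exposes the problem while most users are still on the old environment.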


Next: Real-World Production Example. A complete end-to-end landing zone implementation with full Terraform code, architecture diagrams, and lessons learned.
