Part 3: CI/CD Pipeline Best Practices

The Day Our Pipeline Saved Us from Disaster

A developer on my team once pushed a commit that accidentally deleted a critical database index migration. The code compiled, the unit tests passed, but deploying it would have caused massive performance degradation. Our CI/CD pipeline caught it—the integration tests failed because queries took 10+ seconds instead of milliseconds.

That pipeline didn't happen by accident. It was built through years of learning from failures, understanding what gates actually matter, and balancing speed with safety. This part covers the testing gates, quality checks, and promotion flows I use to keep bad code from reaching production.

The Anatomy of a Good CI/CD Pipeline

A robust CI/CD pipeline is a series of automated gates. Each gate represents a quality check. Code must pass all gates before progressing to the next environment. Here's the pipeline architecture I use:

Developer Push → GitHub
    ↓
[Build Stage]
    ↓
[Unit Tests Gate]  ← Fast feedback (< 2 min)
    ↓
[Code Quality Gate]  ← Linting, formatting, security scans
    ↓
[Integration Tests Gate]  ← Test with dependencies (< 5 min)
    ↓
[Container Build & Scan]  ← Build image, scan vulnerabilities
    ↓
[Deploy to Dev Environment]  ← Automatic deployment
    ↓
[Smoke Tests Gate]  ← Validate basic functionality
    ↓
[Deploy to Staging Environment]  ← Automatic deployment
    ↓
[End-to-End Tests Gate]  ← Full user journey tests
    ↓
[Performance Tests Gate]  ← Load and stress tests
    ↓
[Manual Approval Gate]  ← Product owner review (optional)
    ↓
[Deploy to Production]  ← Canary or blue/green
    ↓
[Production Validation]  ← Monitor metrics, automated checks

Not every project needs all these gates. Start with the essentials and add more based on failures you experience.

Building Blocks: Unit Tests Gate

Unit tests are your first line of defense. They're fast, isolated, and catch logic errors early. I require unit test coverage of at least 80% for new code (not the whole codebase—that's unrealistic for existing projects).

My GitHub Actions Unit Test Configuration

name: CI Pipeline

on:
  push:
    branches: [ main, develop ]
  pull_request:
    branches: [ main, develop ]

jobs:
  unit-tests:
    name: Unit Tests
    runs-on: ubuntu-latest
    timeout-minutes: 10
    
    steps:
    - name: Checkout code
      uses: actions/checkout@v4
      
    - name: Set up Node.js
      uses: actions/setup-node@v4
      with:
        node-version: '20'
        cache: 'npm'
    
    - name: Install dependencies
      run: npm ci
      
    - name: Run unit tests
      run: npm run test:unit -- --coverage
      
    - name: Check coverage thresholds
      run: |
        npm run test:coverage:check
        
    - name: Upload coverage to Codecov
      uses: codecov/codecov-action@v3
      with:
        files: ./coverage/coverage-final.json
        flags: unittests
        fail_ci_if_error: true

Key decisions:

timeout-minutes: 10: If tests hang, fail fast
npm ci instead of npm install: Ensures reproducible builds
Coverage check: Fails if new code doesn't meet threshold
Upload to Codecov: Track coverage over time

Unit Test Strategy

I write unit tests that:

Test business logic, not implementation details: Don't test that a function was called—test that the right outcome happened
Are fast: Unit tests should complete in milliseconds, not seconds
Are isolated: No database, no network, no file system
Have clear names: shouldReturnErrorWhenUserNotFound() not testFindUser()

Example from a payment service:

describe('PaymentProcessor', () => {
  describe('processPayment', () => {
    it('should successfully process valid payment and return transaction ID', async () => {
      // Arrange
      const mockPaymentGateway = {
        charge: jest.fn().mockResolvedValue({ transactionId: 'txn_123', status: 'succeeded' })
      };
      const processor = new PaymentProcessor(mockPaymentGateway);
      const payment = { amount: 10000, currency: 'USD', customerId: 'cust_123' };
      
      // Act
      const result = await processor.processPayment(payment);
      
      // Assert
      expect(result.status).toBe('succeeded');
      expect(result.transactionId).toBe('txn_123');
      expect(mockPaymentGateway.charge).toHaveBeenCalledWith(payment);
    });
    
    it('should return error when payment amount is negative', async () => {
      // Arrange
      const processor = new PaymentProcessor(mockPaymentGateway);
      const payment = { amount: -100, currency: 'USD', customerId: 'cust_123' };
      
      // Act & Assert
      await expect(processor.processPayment(payment))
        .rejects
        .toThrow('Payment amount must be positive');
    });
  });
});

Code Quality Gate: Catching Issues Before Review

Code quality gates catch issues that tests might miss: security vulnerabilities, code smells, formatting inconsistencies, and complexity problems.

My Code Quality Pipeline

  code-quality:
    name: Code Quality Checks
    runs-on: ubuntu-latest
    timeout-minutes: 10
    
    steps:
    - name: Checkout code
      uses: actions/checkout@v4
      with:
        fetch-depth: 0  # Full history for SonarQube
      
    - name: Run ESLint
      run: |
        npm ci
        npm run lint -- --max-warnings=0
        
    - name: Run Prettier check
      run: npm run format:check
      
    - name: Run TypeScript compiler
      run: npm run type-check
      
    - name: Security audit
      run: npm audit --audit-level=moderate
      
    - name: Dependency vulnerabilities scan
      uses: snyk/actions/node@master
      env:
        SNYK_TOKEN: ${{ secrets.SNYK_TOKEN }}
      with:
        args: --severity-threshold=high
        
    - name: SonarQube Scan
      uses: sonarsource/sonarqube-scan-action@master
      env:
        SONAR_TOKEN: ${{ secrets.SONAR_TOKEN }}
        SONAR_HOST_URL: ${{ secrets.SONAR_HOST_URL }}

What each check does:

ESLint: Catches common code issues and enforces style
Prettier: Ensures consistent formatting
TypeScript: Catches type errors at compile time
npm audit: Finds known vulnerabilities in dependencies
Snyk: More comprehensive dependency vulnerability scanning
SonarQube: Detects bugs, code smells, security issues, and tracks technical debt

Quality Gate Configuration

I configure SonarQube quality gates to fail the build if:

New code coverage < 80%
Duplicated code > 3%
Critical or blocker issues > 0
Security hotspots > 0
Maintainability rating < A

These aren't arbitrary—they're based on incidents we've had from skipping these checks.

Integration Tests Gate: Testing with Real Dependencies

Integration tests validate that components work together correctly. Unlike unit tests, these tests use real databases, message queues, and other services (usually through Docker containers).

Docker Compose for Integration Tests

# docker-compose.test.yml
version: '3.8'

services:
  postgres:
    image: postgres:15-alpine
    environment:
      POSTGRES_DB: testdb
      POSTGRES_USER: testuser
      POSTGRES_PASSWORD: testpass
    ports:
      - "5432:5432"
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U testuser"]
      interval: 5s
      timeout: 5s
      retries: 5
      
  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 3s
      retries: 5
      
  rabbitmq:
    image: rabbitmq:3-management-alpine
    ports:
      - "5672:5672"
      - "15672:15672"
    healthcheck:
      test: ["CMD", "rabbitmq-diagnostics", "ping"]
      interval: 10s
      timeout: 10s
      retries: 5

GitHub Actions Integration Tests

  integration-tests:
    name: Integration Tests
    runs-on: ubuntu-latest
    timeout-minutes: 15
    needs: [unit-tests, code-quality]  # Run only after unit tests pass
    
    services:
      postgres:
        image: postgres:15-alpine
        env:
          POSTGRES_DB: testdb
          POSTGRES_USER: testuser
          POSTGRES_PASSWORD: testpass
        ports:
          - 5432:5432
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
          
      redis:
        image: redis:7-alpine
        ports:
          - 6379:6379
        options: >-
          --health-cmd "redis-cli ping"
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
    
    steps:
    - name: Checkout code
      uses: actions/checkout@v4
      
    - name: Set up Node.js
      uses: actions/setup-node@v4
      with:
        node-version: '20'
        cache: 'npm'
    
    - name: Install dependencies
      run: npm ci
      
    - name: Run database migrations
      env:
        DATABASE_URL: postgresql://testuser:testpass@localhost:5432/testdb
      run: npm run db:migrate
      
    - name: Run integration tests
      env:
        DATABASE_URL: postgresql://testuser:testpass@localhost:5432/testdb
        REDIS_URL: redis://localhost:6379
      run: npm run test:integration
      
    - name: Upload test results
      if: always()
      uses: actions/upload-artifact@v3
      with:
        name: integration-test-results
        path: test-results/

Integration Test Example

describe('UserRepository Integration Tests', () => {
  let db: Database;
  let userRepo: UserRepository;
  
  beforeAll(async () => {
    // Connect to real test database
    db = await Database.connect(process.env.DATABASE_URL!);
    userRepo = new UserRepository(db);
  });
  
  afterAll(async () => {
    await db.close();
  });
  
  beforeEach(async () => {
    // Clean database before each test
    await db.query('TRUNCATE TABLE users CASCADE');
  });
  
  it('should create user and retrieve by email', async () => {
    // Arrange
    const userData = {
      email: '[email protected]',
      name: 'Test User',
      password: 'hashedpassword123'
    };
    
    // Act
    const createdUser = await userRepo.create(userData);
    const retrievedUser = await userRepo.findByEmail(userData.email);
    
    // Assert
    expect(retrievedUser).toBeDefined();
    expect(retrievedUser!.email).toBe(userData.email);
    expect(retrievedUser!.id).toBe(createdUser.id);
    expect(retrievedUser!.createdAt).toBeInstanceOf(Date);
  });
  
  it('should handle unique email constraint violation', async () => {
    // Arrange
    const userData = {
      email: '[email protected]',
      name: 'User One',
      password: 'pass123'
    };
    
    await userRepo.create(userData);
    
    // Act & Assert
    await expect(userRepo.create(userData))
      .rejects
      .toThrow('Email already exists');
  });
});

Container Build and Security Scanning

After tests pass, I build the container image and scan it for vulnerabilities. A vulnerable base image has caused production issues for me before—now I catch them in CI.

Multi-Stage Docker Build

# Build stage
FROM node:20-alpine AS builder

WORKDIR /app

COPY package*.json ./
RUN npm ci --only=production

COPY . .
RUN npm run build

# Production stage
FROM node:20-alpine

# Run as non-root user
RUN addgroup -g 1001 -S nodejs && adduser -S nodejs -u 1001

WORKDIR /app

# Copy only necessary files
COPY --from=builder --chown=nodejs:nodejs /app/dist ./dist
COPY --from=builder --chown=nodejs:nodejs /app/node_modules ./node_modules
COPY --chown=nodejs:nodejs package*.json ./

USER nodejs

EXPOSE 8080

CMD ["node", "dist/index.js"]

Container Build and Scan Pipeline

  build-and-scan:
    name: Build and Scan Container
    runs-on: ubuntu-latest
    needs: [unit-tests, code-quality, integration-tests]
    
    steps:
    - name: Checkout code
      uses: actions/checkout@v4
      
    - name: Set up Docker Buildx
      uses: docker/setup-buildx-action@v3
      
    - name: Log in to Container Registry
      uses: docker/login-action@v3
      with:
        registry: ghcr.io
        username: ${{ github.actor }}
        password: ${{ secrets.GITHUB_TOKEN }}
        
    - name: Extract metadata
      id: meta
      uses: docker/metadata-action@v5
      with:
        images: ghcr.io/${{ github.repository }}
        tags: |
          type=ref,event=branch
          type=ref,event=pr
          type=semver,pattern={{version}}
          type=sha,prefix={{branch}}-
          
    - name: Build container image
      uses: docker/build-push-action@v5
      with:
        context: .
        push: true
        tags: ${{ steps.meta.outputs.tags }}
        labels: ${{ steps.meta.outputs.labels }}
        cache-from: type=gha
        cache-to: type=gha,mode=max
        
    - name: Run Trivy vulnerability scanner
      uses: aquasecurity/trivy-action@master
      with:
        image-ref: ghcr.io/${{ github.repository }}:${{ github.sha }}
        format: 'sarif'
        output: 'trivy-results.sarif'
        severity: 'CRITICAL,HIGH'
        
    - name: Upload Trivy results to GitHub Security
      uses: github/codeql-action/upload-sarif@v2
      with:
        sarif_file: 'trivy-results.sarif'
        
    - name: Fail on critical vulnerabilities
      uses: aquasecurity/trivy-action@master
      with:
        image-ref: ghcr.io/${{ github.repository }}:${{ github.sha }}
        format: 'table'
        exit-code: '1'
        severity: 'CRITICAL'

This fails the build if critical vulnerabilities are found. High-severity vulnerabilities create alerts but don't block deployments (we review these in weekly security meetings).

Environment Promotion Flows

Code progresses through environments: Dev → Staging → Production. Each environment has different promotion criteria.

Development Environment

Trigger: Automatic on merge to develop branch Requirements: All tests and quality gates pass Purpose: Validate changes in environment closest to production

  deploy-dev:
    name: Deploy to Development
    runs-on: ubuntu-latest
    needs: [build-and-scan]
    if: github.ref == 'refs/heads/develop'
    
    environment:
      name: development
      url: https://dev.myapp.com
      
    steps:
    - name: Checkout GitOps repo
      uses: actions/checkout@v4
      with:
        repository: myorg/gitops-repo
        token: ${{ secrets.GITOPS_TOKEN }}
        
    - name: Update image tag in dev overlay
      run: |
        cd overlays/development
        kustomize edit set image myapp=ghcr.io/${{ github.repository }}:${{ github.sha }}
        
    - name: Commit and push changes
      run: |
        git config user.name "GitHub Actions"
        git config user.email "[email protected]"
        git add overlays/development/kustomization.yaml
        git commit -m "Update dev image to ${{ github.sha }}"
        git push
        
    - name: Wait for ArgoCD sync
      run: |
        argocd app wait myapp-dev --timeout 300
        
    - name: Run smoke tests
      run: |
        npm run test:smoke -- --baseUrl=https://dev.myapp.com

Staging Environment

Trigger: Manual approval or automatic on merge to main Requirements: All dev tests pass + end-to-end tests pass Purpose: Final validation before production with production-like data

  deploy-staging:
    name: Deploy to Staging
    runs-on: ubuntu-latest
    needs: [deploy-dev]
    if: github.ref == 'refs/heads/main'
    
    environment:
      name: staging
      url: https://staging.myapp.com
      
    steps:
    - name: Checkout GitOps repo
      uses: actions/checkout@v4
      with:
        repository: myorg/gitops-repo
        token: ${{ secrets.GITOPS_TOKEN }}
        
    - name: Update image tag in staging overlay
      run: |
        cd overlays/staging
        kustomize edit set image myapp=ghcr.io/${{ github.repository }}:${{ github.sha }}
        
    - name: Commit and push changes
      run: |
        git config user.name "GitHub Actions"
        git config user.email "[email protected]"
        git add overlays/staging/kustomization.yaml
        git commit -m "Update staging image to ${{ github.sha }}"
        git push
        
    - name: Wait for ArgoCD sync
      run: |
        argocd app wait myapp-staging --timeout 300
        
    - name: Run end-to-end tests
      run: |
        npm run test:e2e -- --baseUrl=https://staging.myapp.com
        
    - name: Run performance tests
      run: |
        k6 run tests/load-tests/standard-load.js

Production Environment

Trigger: Manual approval required Requirements: All staging tests pass + security review (for sensitive changes) Purpose: Serve real users

  deploy-production:
    name: Deploy to Production
    runs-on: ubuntu-latest
    needs: [deploy-staging]
    if: github.ref == 'refs/heads/main'
    
    environment:
      name: production
      url: https://myapp.com
      
    steps:
    - name: Checkout GitOps repo
      uses: actions/checkout@v4
      with:
        repository: myorg/gitops-repo
        token: ${{ secrets.GITOPS_TOKEN }}
        
    - name: Update image tag in production overlay
      run: |
        cd overlays/production
        kustomize edit set image myapp=ghcr.io/${{ github.repository }}:${{ github.sha }}
        
    - name: Commit and push changes
      run: |
        git config user.name "GitHub Actions"
        git config user.email "[email protected]"
        git add overlays/production/kustomization.yaml
        git commit -m "Update production image to ${{ github.sha }}"
        git push
        
    - name: Wait for ArgoCD canary rollout
      run: |
        argocd app wait myapp-production --timeout 600
        
    - name: Monitor error rates
      run: |
        ./scripts/monitor-deployment.sh --duration=15m --threshold=1%
        
    - name: Notify team
      uses: 8398a7/action-slack@v3
      with:
        status: ${{ job.status }}
        text: 'Production deployment ${{ github.sha }} completed'
        webhook_url: ${{ secrets.SLACK_WEBHOOK }}

Manual Approval Gates

For production deployments, I require manual approval. This gives stakeholders visibility and control.

GitHub Environment Protection Rules

In GitHub repository settings → Environments → production:

✅ Required reviewers: @tech-lead, @product-owner
✅ Wait timer: 5 minutes (time to review staging)
✅ Deployment branches: main only

environment:
  name: production
  url: https://myapp.com
  # GitHub automatically pauses here for approval

Fast Feedback: Parallel Execution

Run independent checks in parallel to get faster feedback:

name: CI Pipeline

on:
  pull_request:
    branches: [ main ]

jobs:
  # These run in parallel
  unit-tests:
    runs-on: ubuntu-latest
    steps: [...]
    
  code-quality:
    runs-on: ubuntu-latest
    steps: [...]
    
  lint:
    runs-on: ubuntu-latest
    steps: [...]
    
  # This waits for all parallel jobs
  integration-tests:
    needs: [unit-tests, code-quality, lint]
    runs-on: ubuntu-latest
    steps: [...]

With parallelization, my pipeline went from 15 minutes to 6 minutes—fast enough that developers don't context-switch while waiting.

Lessons Learned

Lesson 1: Fail Fast and Loud

Put fastest checks first. If linting fails, don't waste time running integration tests. Order gates by:

Speed (fastest first)
Likelihood of failure (most likely first)

Lesson 2: Flaky Tests Are Worse Than No Tests

I once had a test that passed 95% of the time. Developers started re-running CI until it passed. Now I have a zero-tolerance policy for flaky tests—fix them immediately or delete them.

Lesson 3: Don't Over-Gate

Early in my career, I added 12 quality gates. CI took 45 minutes. Developers bypassed it with --no-verify. Less is more—focus on gates that catch real issues.

Lesson 4: Monitor Pipeline Performance

Track metrics:

Average pipeline duration
Pass/fail rate by gate
Most common failure reasons
Time to fix failures

I use these metrics to optimize the pipeline quarterly.

Key Takeaways

Build a progressive pipeline: Fast checks first, slow checks later
Each gate must provide value: If a gate never catches issues, remove it
Parallel execution reduces feedback time: Run independent checks simultaneously
Security scanning is non-negotiable: Catch vulnerabilities in CI, not production
Environment promotion should be automatic with manual production gate: Reduces toil while maintaining control
Monitor your pipeline like you monitor production: Slow or flaky pipelines kill productivity

In the next part, we'll integrate these pipelines with real tools: Jira for tracking, GitHub Actions for CI/CD, ArgoCD for GitOps, and Kubernetes for deployment orchestration.

Previous: Part 2: Deployment Strategies - Blue/Green, Canary, and Rollbacks Next: Part 4: Release Management with Modern Tools

PreviousPart 2: Deployment Strategies NextPart 4: Release Management with Modern Tools

Last updated 19 hours ago

hashtagThe Day Our Pipeline Saved Us from Disaster

hashtagThe Anatomy of a Good CI/CD Pipeline

hashtagBuilding Blocks: Unit Tests Gate

hashtagMy GitHub Actions Unit Test Configuration

hashtagUnit Test Strategy

hashtagCode Quality Gate: Catching Issues Before Review

hashtagMy Code Quality Pipeline

hashtagQuality Gate Configuration

hashtagIntegration Tests Gate: Testing with Real Dependencies

hashtagDocker Compose for Integration Tests

hashtagGitHub Actions Integration Tests

hashtagIntegration Test Example

hashtagContainer Build and Security Scanning

hashtagMulti-Stage Docker Build

hashtagContainer Build and Scan Pipeline

hashtagEnvironment Promotion Flows

hashtagDevelopment Environment

hashtagStaging Environment

hashtagProduction Environment

hashtagManual Approval Gates

hashtagGitHub Environment Protection Rules

hashtagFast Feedback: Parallel Execution

hashtagLessons Learned

hashtagLesson 1: Fail Fast and Loud

hashtagLesson 2: Flaky Tests Are Worse Than No Tests

hashtagLesson 3: Don't Over-Gate

hashtagLesson 4: Monitor Pipeline Performance

hashtagKey Takeaways