Part 3: CI/CD Pipeline Best Practices

The Day Our Pipeline Saved Us from Disaster

A developer on my team once pushed a commit that accidentally deleted a critical database index migration. The code compiled, the unit tests passed, but deploying it would have caused massive performance degradation. Our CI/CD pipeline caught it: the integration tests failed because queries took 10+ seconds instead of milliseconds.

That pipeline didn't happen by accident. It was built through years of learning from failures, understanding what gates actually matter, and balancing speed with safety. This part covers the testing gates, quality checks, and promotion flows I use to keep bad code from reaching production.

The Anatomy of a Good CI/CD Pipeline

A robust CI/CD pipeline is a series of automated gates. Each gate represents a quality check. Code must pass all gates before progressing to the next environment. Here's the pipeline architecture I use:

Developer Push → GitHub
    ↓
[Build Stage]
    ↓
[Unit Tests Gate]  ← Fast feedback (< 2 min)
    ↓
[Code Quality Gate]  ← Linting, formatting, security scans
    ↓
[Integration Tests Gate]  ← Test with dependencies (< 5 min)
    ↓
[Container Build & Scan]  ← Build image, scan vulnerabilities
    ↓
[Deploy to Dev Environment]  ← Automatic deployment
    ↓
[Smoke Tests Gate]  ← Validate basic functionality
    ↓
[Deploy to Staging Environment]  ← Automatic deployment
    ↓
[End-to-End Tests Gate]  ← Full user journey tests
    ↓
[Performance Tests Gate]  ← Load and stress tests
    ↓
[Manual Approval Gate]  ← Product owner review (optional)
    ↓
[Deploy to Production]  ← Canary or blue/green
    ↓
[Production Validation]  ← Monitor metrics, automated checks

Not every project needs all these gates. Start with the essentials and add more based on failures you experience.

Building Blocks: Unit Tests Gate

Unit tests are your first line of defense. They're fast, isolated, and catch logic errors early. I require unit test coverage of at least 80% for new code (not the whole codebase; that's unrealistic for existing projects).

My GitHub Actions Unit Test Configuration

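A minimal sketch of the shape this workflow takes, assuming a Node project with Jest; action versions, script names, and the Codecov token secret are illustrative rather than the exact configuration:

```yaml
# .github/workflows/unit-tests.yml (illustrative sketch)
name: Unit Tests

on: [push, pull_request]

jobs:
  unit-tests:
    runs-on: ubuntu-latest
    timeout-minutes: 10        # if tests hang, fail fast
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: npm
      - run: npm ci            # reproducible installs from the lockfile
      - run: npm test -- --coverage   # collect coverage; the 80% new-code threshold is enforced on this report
      - uses: codecov/codecov-action@v4   # upload so coverage is tracked over time
        with:
          token: ${{ secrets.CODECOV_TOKEN }}
```
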
Key decisions:

  • timeout-minutes: 10: If tests hang, fail fast

  • npm ci instead of npm install: Ensures reproducible builds

  • Coverage check: Fails if new code doesn't meet threshold

  • Upload to Codecov: Track coverage over time

Unit Test Strategy

I write unit tests that:

  1. Test business logic, not implementation details: Don't test that a function was called; test that the right outcome happened

  2. Are fast: Unit tests should complete in milliseconds, not seconds

  3. Are isolated: No database, no network, no file system

  4. Have clear names: shouldReturnErrorWhenUserNotFound() not testFindUser()

Example from a payment service:
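
A minimal Jest sketch in that spirit; processRefund, UserRepo, and the error code are hypothetical stand-ins, inlined so the test is self-contained:

```typescript
// Hypothetical payment-domain example (all names are illustrative)
type RefundResult =
  | { ok: true; refunded: number }
  | { ok: false; error: string };

interface UserRepo {
  findById(id: string): Promise<{ id: string } | null>;
}

async function processRefund(
  req: { userId: string; amount: number },
  users: UserRepo
): Promise<RefundResult> {
  const user = await users.findById(req.userId);
  if (!user) return { ok: false, error: "USER_NOT_FOUND" };
  if (req.amount <= 0) return { ok: false, error: "INVALID_AMOUNT" };
  return { ok: true, refunded: req.amount };
}

describe("processRefund", () => {
  it("shouldReturnErrorWhenUserNotFound", async () => {
    // Isolated: an in-memory stub stands in for the database
    const users: UserRepo = { findById: async () => null };

    const result = await processRefund({ userId: "u-404", amount: 50 }, users);

    // Assert the outcome, not which internal functions were called
    expect(result).toEqual({ ok: false, error: "USER_NOT_FOUND" });
  });
});
```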

Code Quality Gate: Catching Issues Before Review

Code quality gates catch issues that tests might miss: security vulnerabilities, code smells, formatting inconsistencies, and complexity problems.

My Code Quality Pipeline

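A sketch of how these checks line up in a single job; the tool invocations below are typical defaults, not the exact configuration:

```yaml
# .github/workflows/code-quality.yml (illustrative sketch)
name: Code Quality

on: pull_request

jobs:
  quality:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: npm
      - run: npm ci
      - run: npx eslint .                   # common code issues and style
      - run: npx prettier --check .         # consistent formatting
      - run: npx tsc --noEmit               # type errors at compile time
      - run: npm audit --audit-level=high   # known vulnerable dependencies
      - run: npx snyk test                  # deeper dependency scanning
        env:
          SNYK_TOKEN: ${{ secrets.SNYK_TOKEN }}
      # SonarQube analysis runs last, reporting to the quality gate described below
```
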
What each check does:

  • ESLint: Catches common code issues and enforces style

  • Prettier: Ensures consistent formatting

  • TypeScript: Catches type errors at compile time

  • npm audit: Finds known vulnerabilities in dependencies

  • Snyk: More comprehensive dependency vulnerability scanning

  • SonarQube: Detects bugs, code smells, security issues, and tracks technical debt

Quality Gate Configuration

I configure SonarQube quality gates to fail the build if:

  • New code coverage < 80%

  • Duplicated code > 3%

  • Critical or blocker issues > 0

  • Security hotspots > 0

  • Maintainability rating < A

These aren't arbitrary; they're based on incidents we've had from skipping these checks.

Integration Tests Gate: Testing with Real Dependencies

Integration tests validate that components work together correctly. Unlike unit tests, these tests use real databases, message queues, and other services (usually through Docker containers).

Docker Compose for Integration Tests

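Something like the following brings up real dependencies for the test run; image versions, credentials, and service names are illustrative:

```yaml
# docker-compose.test.yml (illustrative sketch)
services:
  postgres:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: test
      POSTGRES_DB: app_test
    ports:
      - "5432:5432"
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 2s
      retries: 15

  rabbitmq:
    image: rabbitmq:3
    ports:
      - "5672:5672"
```
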
GitHub Actions Integration Tests

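The CI job starts those containers, waits for their health checks, runs the suite, and tears everything down. A sketch assuming the compose file above and a test:integration npm script:

```yaml
# .github/workflows/integration-tests.yml (illustrative sketch)
name: Integration Tests

on: pull_request

jobs:
  integration-tests:
    runs-on: ubuntu-latest
    timeout-minutes: 10
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: npm
      - run: npm ci
      - run: docker compose -f docker-compose.test.yml up -d --wait   # block until healthy
      - run: npm run test:integration
      - run: docker compose -f docker-compose.test.yml down -v
        if: always()   # tear down even when tests fail
```
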
Integration Test Example

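A sketch of what one of these tests can look like against the real Postgres container; the table and test names are hypothetical:

```typescript
// user-repository.integration.test.ts (illustrative; expects the
// docker-compose.test.yml Postgres container to be running)
import { Client } from "pg";

describe("users table (integration)", () => {
  const db = new Client({
    host: "localhost",
    port: 5432,
    user: "postgres",
    password: "test",
    database: "app_test",
  });

  beforeAll(async () => {
    await db.connect();
    await db.query(
      "CREATE TABLE IF NOT EXISTS users (id SERIAL PRIMARY KEY, email TEXT UNIQUE)"
    );
  });

  afterAll(async () => {
    await db.end();
  });

  it("shouldPersistAndFindUserByEmail", async () => {
    await db.query("INSERT INTO users (email) VALUES ($1)", ["a@example.com"]);

    const res = await db.query("SELECT * FROM users WHERE email = $1", [
      "a@example.com",
    ]);

    // Exercises the real database driver and SQL, not a mock
    expect(res.rows).toHaveLength(1);
  });
});
```
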
Container Build and Security Scanning

After tests pass, I build the container image and scan it for vulnerabilities. A vulnerable base image has caused production issues for me before; now I catch them in CI.

Multi-Stage Docker Build

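The pattern: build with the full toolchain in one stage, then copy only the artifacts into a slim runtime image, which shrinks both the image and its attack surface. A sketch for a Node service; base images and paths are illustrative:

```dockerfile
# Stage 1: build with the full toolchain
FROM node:20 AS build
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

# Stage 2: slim runtime image with production dependencies only
FROM node:20-slim
WORKDIR /app
ENV NODE_ENV=production
COPY package*.json ./
RUN npm ci --omit=dev
COPY --from=build /app/dist ./dist
USER node                      # don't run as root
CMD ["node", "dist/server.js"]
```
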
Container Build and Scan Pipeline

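The pipeline doesn't depend on a particular scanner; with Trivy standing in as one common choice, the severity split works roughly like this (image name and workflow details are illustrative):

```yaml
# .github/workflows/build-and-scan.yml (illustrative; Trivy stands in for
# whichever image scanner you use)
name: Container Build & Scan

on: push

jobs:
  build-and-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: docker build -t myapp:${{ github.sha }} .
      - uses: aquasecurity/trivy-action@master
        with:
          image-ref: myapp:${{ github.sha }}
          severity: CRITICAL
          exit-code: "1"   # critical findings fail the build
      - uses: aquasecurity/trivy-action@master
        with:
          image-ref: myapp:${{ github.sha }}
          severity: HIGH
          exit-code: "0"   # high findings are reported but don't block
```
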
This fails the build if critical vulnerabilities are found. High-severity vulnerabilities create alerts but don't block deployments (we review these in weekly security meetings).

Environment Promotion Flows

Code progresses through environments: Dev → Staging → Production. Each environment has different promotion criteria.

Development Environment

  • Trigger: Automatic on merge to the develop branch

  • Requirements: All tests and quality gates pass

  • Purpose: Validate changes in the first deployed, shared environment

Staging Environment

  • Trigger: Manual approval, or automatic on merge to main

  • Requirements: All dev tests pass + end-to-end tests pass

  • Purpose: Final validation before production, with production-like data

Production Environment

  • Trigger: Manual approval required

  • Requirements: All staging tests pass + security review (for sensitive changes)

  • Purpose: Serve real users

Manual Approval Gates

For production deployments, I require manual approval. This gives stakeholders visibility and control.

GitHub Environment Protection Rules

In GitHub repository settings → Environments → production:

  • ✅ Required reviewers: @tech-lead, @product-owner

  • ✅ Wait timer: 5 minutes (time to review staging)

  • ✅ Deployment branches: main only

Fast Feedback: Parallel Execution

Run independent checks in parallel to get faster feedback:

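In GitHub Actions, jobs with no needs edge between them run concurrently; only downstream jobs wait. A sketch (job contents are illustrative):

```yaml
# Independent jobs run in parallel by default
jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci && npx eslint .
  unit-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci && npm test
  integration-tests:
    needs: [lint, unit-tests]   # only this stage waits for the fast checks
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci && npm run test:integration
```
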
With parallelization, my pipeline went from 15 minutes to 6 minutes, fast enough that developers don't context-switch while waiting.

Lessons Learned

Lesson 1: Fail Fast and Loud

Put fastest checks first. If linting fails, don't waste time running integration tests. Order gates by:

  1. Speed (fastest first)

  2. Likelihood of failure (most likely first)

Lesson 2: Flaky Tests Are Worse Than No Tests

I once had a test that passed 95% of the time. Developers started re-running CI until it passed. Now I have a zero-tolerance policy for flaky tests: fix them immediately or delete them.

Lesson 3: Don't Over-Gate

Early in my career, I added 12 quality gates. CI took 45 minutes. Developers bypassed it with --no-verify. Less is more; focus on gates that catch real issues.

Lesson 4: Monitor Pipeline Performance

Track metrics:

  • Average pipeline duration

  • Pass/fail rate by gate

  • Most common failure reasons

  • Time to fix failures

I use these metrics to optimize the pipeline quarterly.

Key Takeaways

  1. Build a progressive pipeline: Fast checks first, slow checks later

  2. Each gate must provide value: If a gate never catches issues, remove it

  3. Parallel execution reduces feedback time: Run independent checks simultaneously

  4. Security scanning is non-negotiable: Catch vulnerabilities in CI, not production

  5. Environment promotion should be automatic with manual production gate: Reduces toil while maintaining control

  6. Monitor your pipeline like you monitor production: Slow or flaky pipelines kill productivity

In the next part, we'll integrate these pipelines with real tools: Jira for tracking, GitHub Actions for CI/CD, ArgoCD for GitOps, and Kubernetes for deployment orchestration.

