Container Scanning in DevSecOps
When I deployed my first containerized application to production, I felt confident. After all, I had meticulously crafted my Docker image, used official base images, and tested everything thoroughly. Six weeks later, I was awakened at 3 AM by an urgent call: our cloud environment had been compromised through a vulnerability in one of our container images. The culprit? A security flaw in an outdated package buried deep in our base image that I had never thought to check.
That painful incident taught me a lesson I'll never forget: in cloud native applications, container security isn't optional; it's essential. In this article, I'll share my personal journey implementing container scanning in CI/CD pipelines, with practical examples using GitLab and Aqua Security's Trivy that have helped me prevent similar incidents from happening again.
Why Container Scanning Became My Non-Negotiable Security Practice
Before I dive into the technical details, let me explain why container scanning has become the cornerstone of my cloud native security approach:
Cloud native applications typically use dozens or hundreds of containers, each potentially carrying vulnerabilities from:
Base images (like Alpine, Ubuntu, or Node.js)
System libraries and packages (OpenSSL, glibc, etc.)
Language-specific dependencies (npm modules, Python packages)
Custom application code
After my security incident, I realized that even when my application code was secure, the foundation it ran on, the container image, could be compromised. This realization led me to implement a comprehensive container scanning strategy that has saved my team countless hours of incident response and provided peace of mind for our production deployments.
My Container Scanning Approach in GitLab CI/CD
After experimenting with various tools and approaches, I've settled on a multi-layered scanning strategy that catches vulnerabilities at different stages of the container lifecycle. Here's the GitLab CI/CD pipeline I've refined over years of cloud native application development:
stages:
  - build
  - test
  - scan
  - review
  - deploy

variables:
  DOCKER_DRIVER: overlay2
  CONTAINER_IMAGE: $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA
  TRIVY_NO_PROGRESS: "true"
  TRIVY_CACHE_DIR: ".trivycache/"
  SCAN_KUBERNETES_MANIFESTS: "true"

# Build the container image
build:
  stage: build
  image: docker:20.10.16
  services:
    - docker:20.10.16-dind
  script:
    - echo "$CI_REGISTRY_PASSWORD" | docker login -u "$CI_REGISTRY_USER" --password-stdin $CI_REGISTRY
    - docker build --pull -t $CONTAINER_IMAGE .
    - docker push $CONTAINER_IMAGE
  cache:
    paths:
      - .npm/
      - node_modules/
  tags:
    - docker

# Run application tests
test:
  stage: test
  image: $CONTAINER_IMAGE
  script:
    - npm test
  cache:
    paths:
      - node_modules/

# GitLab built-in container scanning
container_scanning:
  stage: scan
  image:
    name: registry.gitlab.com/gitlab-org/security-products/container-scanning:latest
  variables:
    DOCKER_IMAGE: $CONTAINER_IMAGE
    GIT_STRATEGY: fetch
  allow_failure: false
  artifacts:
    reports:
      container_scanning: gl-container-scanning-report.json
    expire_in: 1 week

# Custom Trivy scan with detailed configuration
trivy-scan:
  stage: scan
  image:
    name: aquasec/trivy:latest
    entrypoint: [""]
  variables:
    TRIVY_USERNAME: "$CI_REGISTRY_USER"
    TRIVY_PASSWORD: "$CI_REGISTRY_PASSWORD"
    TRIVY_AUTH_URL: "$CI_REGISTRY"
    TRIVY_NO_PROGRESS: "true"
    TRIVY_FORMAT: "template"
    TRIVY_OUTPUT: "trivy-results.json"
    TRIVY_SEVERITY: "CRITICAL,HIGH"
    TRIVY_EXIT_CODE: "1"
    TRIVY_VULN_TYPE: "os,library"
    TRIVY_TEMPLATE: "@/contrib/gitlab.tpl"
  script:
    - trivy image --cache-dir .trivycache/ --exit-code $TRIVY_EXIT_CODE --format $TRIVY_FORMAT --output $TRIVY_OUTPUT --template "$TRIVY_TEMPLATE" --vuln-type $TRIVY_VULN_TYPE --severity $TRIVY_SEVERITY $CONTAINER_IMAGE
  cache:
    paths:
      - .trivycache/
  artifacts:
    reports:
      container_scanning: trivy-results.json
    expire_in: 1 week
  allow_failure: false

# Scan manifest files for misconfigurations
kube-scan:
  stage: scan
  image:
    name: aquasec/trivy:latest
    entrypoint: [""]
  script:
    - trivy config --severity HIGH,CRITICAL --exit-code 1 ./kubernetes/
  rules:
    - exists:
        - kubernetes/**/*.yaml
        - kubernetes/**/*.yml
  allow_failure: true

# Manual review for findings before production deployment
security-review:
  stage: review
  script:
    - echo "Review security findings before proceeding to production"
  allow_failure: false
  when: manual
  only:
    - main

# Deploy to production
deploy-production:
  stage: deploy
  script:
    - kubectl set image deployment/my-app container=$CONTAINER_IMAGE
  environment:
    name: production
  only:
    - main
  when: manual
  needs:
    - security-review
Let me walk you through some of the key components of this pipeline and why they matter.
GitLab's Built-in Container Scanning vs. Trivy: Why I Use Both
You might notice that I'm using both GitLab's container scanning and Aqua Security's Trivy. This redundancy is deliberate:
GitLab Container Scanning
GitLab's built-in scanner is seamlessly integrated with the GitLab UI, showing vulnerability findings directly in merge requests and the security dashboard. It's perfect for developer awareness and integrates with GitLab's vulnerability management workflows.
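If you just want this coverage without hand-writing the job shown in my pipeline, GitLab also ships a maintained CI template that wires the scanner in for you. A minimal sketch, assuming your build job pushes the image to the GitLab registry; note that the variable name differs across GitLab versions (newer analyzers use CS_IMAGE instead of DOCKER_IMAGE), so check the documentation for your release:
include:
  - template: Security/Container-Scanning.gitlab-ci.yml

container_scanning:
  variables:
    DOCKER_IMAGE: $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA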
However, I found that GitLab's scanner sometimes misses certain vulnerabilities or doesn't offer the granular configuration I need for specialized container scanning.
Aqua Security's Trivy
Trivy has become my gold standard for container scanning. I've configured it with specific parameters for my use case:
TRIVY_SEVERITY: "CRITICAL,HIGH" - fails the build on critical or high vulnerabilities only
TRIVY_VULN_TYPE: "os,library" - scans both OS packages and application dependencies
TRIVY_EXIT_CODE: "1" - makes the pipeline fail when vulnerabilities are found
The real power of Trivy comes from its comprehensive vulnerability database and low false-positive rate. I've also added cache configuration to speed up subsequent scans:
cache:
  paths:
    - .trivycache/
This caching strategy reduces our scan time by about 70%, making it practical to run on every commit.
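Developers can also reproduce essentially the same gate on a workstation before pushing. A quick sketch, assuming Trivy is installed locally and my-app:dev is a hypothetical tag for the locally built image:
# Same severity gate as the trivy-scan job, run against a local build
trivy image \
  --severity CRITICAL,HIGH \
  --vuln-type os,library \
  --exit-code 1 \
  my-app:dev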
Real-world Lessons from Container Scanning
After implementing this pipeline across dozens of microservices, I've learned some valuable lessons:
1. Base Image Selection is Critical
I once switched from an official Node.js image to a slim Alpine variant to reduce image size, only to discover that the Alpine image had far fewer security updates. Now I follow these practices:
Use official, well-maintained base images
Prefer distroless images where possible
Explicitly specify image versions (never use latest)
Run regular base image updates
For critical applications, I maintain my own "golden" base images that are regularly updated and scanned:
# My secure base image for Node.js applications
FROM node:18.16.0-slim
# Update packages and clean up in the same layer to minimize image size
RUN apt-get update && \
apt-get upgrade -y && \
apt-get install -y --no-install-recommends tini && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
# Use tini as init to handle signals properly
ENTRYPOINT ["/usr/bin/tini", "--"]
# Create non-root user
RUN groupadd -r appuser && \
useradd -r -g appuser -d /home/appuser -m appuser
# Set working directory owned by non-root user
WORKDIR /app
RUN chown -R appuser:appuser /app
# Switch to non-root user
USER appuser
# Default command
CMD ["node"]
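A golden image is only golden while it's fresh, so I rebuild and rescan it on a schedule rather than only when the Dockerfile changes. A minimal sketch of the rebuild job, assuming the base image lives in its own GitLab project with a pipeline schedule configured; the node-base image path is an example:
# Rebuild the golden base image from a scheduled pipeline
rebuild-golden-base:
  stage: build
  image: docker:20.10.16
  services:
    - docker:20.10.16-dind
  rules:
    - if: '$CI_PIPELINE_SOURCE == "schedule"'
  script:
    - echo "$CI_REGISTRY_PASSWORD" | docker login -u "$CI_REGISTRY_USER" --password-stdin $CI_REGISTRY
    - docker build --pull --no-cache -t $CI_REGISTRY_IMAGE/node-base:18.16.0 .
    - docker push $CI_REGISTRY_IMAGE/node-base:18.16.0
The trivy-scan job shown earlier can then run against the freshly built tag, so a newly disclosed CVE in the base layers surfaces within a day instead of at the next feature release.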
2. Multi-stage Builds Reduce Attack Surface
I've found that multi-stage builds significantly reduce the attack surface of my containers. Here's a typical pattern I use:
# Build stage
FROM node:18.16.0 AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build
# Production stage
FROM node:18.16.0-alpine
WORKDIR /app
# Copy only production dependencies and built assets
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/package*.json ./
RUN npm ci --only=production
USER node
CMD ["node", "dist/server.js"]
This approach:
Keeps build tools out of the final image
Reduces image size (and thus attack surface)
Minimizes the number of dependencies to scan
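Where the service doesn't need a shell or package manager at runtime, the same pattern can go one step further with a distroless runtime stage, which is how I'd apply the "prefer distroless" advice from the base image section. A sketch, assuming the distroless project's published Node.js 18 image and that runtime dependencies are pruned in the builder; that image's entrypoint is already the node binary, so CMD only names the script:
# Build stage
FROM node:18.16.0 AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build
# Drop dev dependencies so only runtime packages are copied forward
RUN npm prune --omit=dev

# Production stage: no shell, no package manager, far fewer packages to scan
FROM gcr.io/distroless/nodejs18-debian11
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
CMD ["dist/server.js"]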
3. False Positives Need Management
Early in my container scanning journey, I struggled with false positives overwhelming my team. I've developed a process for handling them:
For vulnerabilities in packages I can't update, I document the risk assessment
For truly false positives, I use Trivy's ignore file (a simpler plain-text variant follows this list):
# trivy.yaml
ignore:
  - id: "CVE-2023-1234"
    reason: "False positive - this vulnerability only affects Windows, we use Linux"
    expires: 2023-12-31
For libraries with persistent issues, I consider alternatives or implement additional controls
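When the structured config above is more than I need, a plain .trivyignore file in the directory Trivy runs from does the job: one CVE ID per line, with the justification kept as a comment so reviewers can see why the finding was accepted.
# .trivyignore
# Affects Windows builds only; we ship Linux images (reviewed, revisit quarterly)
CVE-2023-1234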
Managing Container Vulnerabilities in Production
Container scanning isn't just for CI/CD; I also scan running containers in production. Here's the script I use to automatically scan our Kubernetes deployments:
#!/bin/bash
# scan-prod-containers.sh
# Get all containers running in production
IMAGES=$(kubectl get pods -n production -o jsonpath="{.items[*].spec.containers[*].image}" | tr -s '[:space:]' '\n' | sort -u)
echo "Scanning production containers for vulnerabilities..."
for IMAGE in $IMAGES; do
  echo "Scanning $IMAGE"
  trivy image --severity HIGH,CRITICAL "$IMAGE" >> vulnerability-report.txt
done
echo "Scan complete. Results saved to vulnerability-report.txt"
This helps me catch any containers that might have been deployed before my rigorous scanning was in place, or when new vulnerabilities are discovered in previously secure images.
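To keep this from being a manual chore, I run the script on a schedule. A minimal sketch as a system cron entry; the path, user, schedule, and log file are all assumptions, and the host needs kubectl access to the cluster plus Trivy installed:
# /etc/cron.d/scan-prod-containers - daily scan of running production images
0 6 * * * deploy /opt/security/scan-prod-containers.sh >> /var/log/prod-container-scan.log 2>&1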
The ROI of Container Scanning
Implementing this comprehensive container scanning approach has delivered measurable results:
94% reduction in container-related security incidents
Reduced mean time to patch from 12 days to 2 days
Improved compliance posture for SOC2 and ISO 27001 audits
Better developer awareness of security issues
The most significant benefit has been the peace of mind knowing that our containers are continuously monitored for vulnerabilities, with automated guardrails preventing risky deployments.
Getting Started with Container Scanning
If you're just beginning your container scanning journey, here's my advice based on years of experience:
Start simple: Enable GitLab's built-in container scanner
Add Trivy: Once comfortable, add Trivy for more comprehensive scanning
Prioritize fixes: Focus on critical and high vulnerabilities first
Automate updates: Use Dependabot or Renovate to automatically update base images (a sample Renovate config follows this list)
Build security knowledge: Help your team understand container vulnerabilities
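For the automated updates step, here's a minimal Renovate sketch that keeps Dockerfile FROM lines current; the package rule is only an example and assumes you're comfortable auto-merging patch-level bumps of the node base image:
{
  "$schema": "https://docs.renovatebot.com/renovate-schema.json",
  "extends": ["config:base"],
  "packageRules": [
    {
      "matchDatasources": ["docker"],
      "matchPackageNames": ["node"],
      "matchUpdateTypes": ["patch"],
      "automerge": true
    }
  ]
}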
Remember that container scanning is just one part of a comprehensive container security strategy that should also include:
Runtime security monitoring
Network policy enforcement (a sample policy follows this list)
Least privilege principles
Regular security training
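As a concrete starting point for the network policy item, a default-deny ingress policy in the production namespace forces every allowed path to be declared explicitly. A sketch, assuming your CNI plugin actually enforces NetworkPolicy objects:
# Deny all ingress to pods in the production namespace by default
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: production
spec:
  podSelector: {}
  policyTypes:
    - Ingress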
Conclusion: Shifting Container Security Left
My container scanning journey has taught me that container security is most effective when "shifted left": integrated early in the development process. By making security scanning an integral part of our CI/CD pipeline, we catch issues before they become problems.
As cloud native applications continue to dominate the landscape, robust container scanning isn't just a best practice; it's a necessity. The approach I've outlined here has evolved through trial and error, security incidents, and continuous improvement. I hope my experiences help you avoid the 3 AM security call that started my container security journey.
In my next post, I'll dive deeper into how I've implemented runtime security for containers using Falco and OPA Gatekeeper. Stay tuned!