Container Scanning in DevSecOps
When I deployed my first containerized application to production, I felt confident. After all, I had meticulously crafted my Docker image, used official base images, and tested everything thoroughly. Six weeks later, I was awakened at 3 AM by an urgent call: our cloud environment had been compromised through a vulnerability in one of our container images. The culprit? A security flaw in an outdated package buried deep in our base image that I had never thought to check.
That painful incident taught me a lesson I'll never forget: in cloud native applications, container security isn't optional; it's essential. In this article, I'll share my personal journey implementing container scanning in CI/CD pipelines, with practical examples using GitLab and Aqua Security's Trivy that have helped me prevent similar incidents from happening again.
Why Container Scanning Became My Non-Negotiable Security Practice
Before I dive into the technical details, let me explain why container scanning has become the cornerstone of my cloud native security approach:
Cloud native applications typically use dozens or hundreds of containers, each potentially carrying vulnerabilities from:
Base images (like Alpine, Ubuntu, or Node.js)
System libraries and packages (OpenSSL, glibc, etc.)
Language-specific dependencies (npm modules, Python packages)
Custom application code
After my security incident, I realized that even when my application code was secure, the foundation it ran on, the container image, could be compromised. This realization led me to implement a comprehensive container scanning strategy that has saved my team countless hours of incident response and provided peace of mind for our production deployments.
My Container Scanning Approach in GitLab CI/CD
After experimenting with various tools and approaches, I've settled on a multi-layered scanning strategy that catches vulnerabilities at different stages of the container lifecycle. Here's the GitLab CI/CD pipeline I've refined over years of cloud native application development:
stages:
  - build
  - test
  - scan
  - review
  - deploy

variables:
  DOCKER_DRIVER: overlay2
  CONTAINER_IMAGE: $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA
  TRIVY_NO_PROGRESS: "true"
  TRIVY_CACHE_DIR: ".trivycache/"
  SCAN_KUBERNETES_MANIFESTS: "true"

# Build the container image
build:
  stage: build
  image: docker:20.10.16
  services:
    - docker:20.10.16-dind
  script:
    - echo "$CI_REGISTRY_PASSWORD" | docker login -u "$CI_REGISTRY_USER" --password-stdin $CI_REGISTRY
    - docker build --pull -t $CONTAINER_IMAGE .
    - docker push $CONTAINER_IMAGE
  cache:
    paths:
      - .npm/
      - node_modules/
  tags:
    - docker

# Run application tests
test:
  stage: test
  image: $CONTAINER_IMAGE
  script:
    - npm test
  cache:
    paths:
      - node_modules/

# GitLab built-in container scanning
container_scanning:
  stage: scan
  image:
    name: registry.gitlab.com/gitlab-org/security-products/container-scanning:latest
  variables:
    DOCKER_IMAGE: $CONTAINER_IMAGE
    GIT_STRATEGY: fetch
  allow_failure: false
  artifacts:
    reports:
      container_scanning: gl-container-scanning-report.json
    expire_in: 1 week

# Custom Trivy scan with detailed configuration
trivy-scan:
  stage: scan
  image:
    name: aquasec/trivy:latest
    entrypoint: [""]
  variables:
    TRIVY_USERNAME: "$CI_REGISTRY_USER"
    TRIVY_PASSWORD: "$CI_REGISTRY_PASSWORD"
    TRIVY_AUTH_URL: "$CI_REGISTRY"
    TRIVY_NO_PROGRESS: "true"
    TRIVY_FORMAT: "template"
    TRIVY_OUTPUT: "trivy-results.json"
    TRIVY_SEVERITY: "CRITICAL,HIGH"
    TRIVY_EXIT_CODE: "1"
    TRIVY_VULN_TYPE: "os,library"
    TRIVY_TEMPLATE: "@/contrib/gitlab.tpl"
  script:
    - trivy image --cache-dir .trivycache/ --exit-code $TRIVY_EXIT_CODE --format $TRIVY_FORMAT --output $TRIVY_OUTPUT --template "$TRIVY_TEMPLATE" --vuln-type $TRIVY_VULN_TYPE --severity $TRIVY_SEVERITY $CONTAINER_IMAGE
  cache:
    paths:
      - .trivycache/
  artifacts:
    reports:
      container_scanning: trivy-results.json
    expire_in: 1 week
  allow_failure: false

# Scan manifest files for misconfigurations
kube-scan:
  stage: scan
  image:
    name: aquasec/trivy:latest
    entrypoint: [""]
  script:
    - trivy config --severity HIGH,CRITICAL --exit-code 1 ./kubernetes/
  rules:
    - exists:
        - kubernetes/**/*.yaml
        - kubernetes/**/*.yml
  allow_failure: true

# Manual review for findings before production deployment
security-review:
  stage: review
  script:
    - echo "Review security findings before proceeding to production"
  allow_failure: false
  when: manual
  only:
    - main

# Deploy to production
deploy-production:
  stage: deploy
  script:
    - kubectl set image deployment/my-app container=$CONTAINER_IMAGE
  environment:
    name: production
  only:
    - main
  when: manual
  needs:
    - security-review
Let me walk you through some of the key components of this pipeline and why they matter.
GitLab's Built-in Container Scanning vs. Trivy: Why I Use Both
You might notice that I'm using both GitLab's container scanning and Aqua Security's Trivy. This redundancy is deliberate:
GitLab Container Scanning
GitLab's built-in scanner is seamlessly integrated with the GitLab UI, showing vulnerability findings directly in merge requests and the security dashboard. It's perfect for developer awareness and integrates with GitLab's vulnerability management workflows.
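If you just want this coverage without hand-writing the job shown in my pipeline, GitLab also ships a maintained CI template that wires the scanner in for you. A minimal sketch, assuming your build job pushes the image to the GitLab registry; note that the variable name differs across GitLab versions (newer analyzers use CS_IMAGE instead of DOCKER_IMAGE), so check the documentation for your release:
include:
  - template: Security/Container-Scanning.gitlab-ci.yml

container_scanning:
  variables:
    DOCKER_IMAGE: $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA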
However, I found that GitLab's scanner sometimes misses certain vulnerabilities or doesn't offer the granular configuration I need for specialized container scanning.
Aqua Security's Trivy
Trivy has become my gold standard for container scanning. I've configured it with specific parameters for my use case:
TRIVY_SEVERITY: "CRITICAL,HIGH" - fails the build on critical or high vulnerabilities only
TRIVY_VULN_TYPE: "os,library" - scans both OS packages and application dependencies
TRIVY_EXIT_CODE: "1" - makes the pipeline fail when vulnerabilities are found
The real power of Trivy comes from its comprehensive vulnerability database and low false-positive rate. I've also added cache configuration to speed up subsequent scans:
cache:
  paths:
    - .trivycache/
This caching strategy reduces our scan time by about 70%, making it practical to run on every commit.
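Developers can also reproduce essentially the same gate on a workstation before pushing. A quick sketch, assuming Trivy is installed locally and my-app:dev is a hypothetical tag for the locally built image:
# Same severity gate as the trivy-scan job, run against a local build
trivy image \
  --severity CRITICAL,HIGH \
  --vuln-type os,library \
  --exit-code 1 \
  my-app:dev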
Real-world Lessons from Container Scanning
After implementing this pipeline across dozens of microservices, I've learned some valuable lessons:
1. Base Image Selection is Critical
I once switched from an official Node.js image to a slim Alpine variant to reduce image size, only to discover that the Alpine image had far fewer security updates. Now I follow these practices:
Use official, well-maintained base images
Prefer distroless images where possible
Explicitly specify image versions (never use latest)
Run regular base image updates
For critical applications, I maintain my own "golden" base images that are regularly updated and scanned:
# My secure base image for Node.js applications
FROM node:18.16.0-slim
# Update packages and clean up in the same layer to minimize image size
RUN apt-get update && \
apt-get upgrade -y && \
apt-get install -y --no-install-recommends tini && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
# Use tini as init to handle signals properly
ENTRYPOINT ["/usr/bin/tini", "--"]
# Create non-root user
RUN groupadd -r appuser && \
useradd -r -g appuser -d /home/appuser -m appuser
# Set working directory owned by non-root user
WORKDIR /app
RUN chown -R appuser:appuser /app
# Switch to non-root user
USER appuser
# Default command
CMD ["node"]
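A golden image is only golden while it's fresh, so I rebuild and rescan it on a schedule rather than only when the Dockerfile changes. A minimal sketch of the rebuild job, assuming the base image lives in its own GitLab project with a pipeline schedule configured; the node-base image path is an example:
# Rebuild the golden base image from a scheduled pipeline
rebuild-golden-base:
  stage: build
  image: docker:20.10.16
  services:
    - docker:20.10.16-dind
  rules:
    - if: '$CI_PIPELINE_SOURCE == "schedule"'
  script:
    - echo "$CI_REGISTRY_PASSWORD" | docker login -u "$CI_REGISTRY_USER" --password-stdin $CI_REGISTRY
    - docker build --pull --no-cache -t $CI_REGISTRY_IMAGE/node-base:18.16.0 .
    - docker push $CI_REGISTRY_IMAGE/node-base:18.16.0
The trivy-scan job shown earlier can then run against the freshly built tag, so a newly disclosed CVE in the base layers surfaces within a day instead of at the next feature release.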
2. Multi-stage Builds Reduce Attack Surface
I've found that multi-stage builds significantly reduce the attack surface of my containers. Here's a typical pattern I use:
# Build stage
FROM node:18.16.0 AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build
# Production stage
FROM node:18.16.0-alpine
WORKDIR /app
# Copy only production dependencies and built assets
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/package*.json ./
RUN npm ci --only=production
USER node
CMD ["node", "dist/server.js"]
This approach:
Keeps build tools out of the final image
Reduces image size (and thus attack surface)
Minimizes the number of dependencies to scan
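Where the service doesn't need a shell or package manager at runtime, the same pattern can go one step further with a distroless runtime stage, which is how I'd apply the "prefer distroless" advice from the base image section. A sketch, assuming the distroless project's published Node.js 18 image and that runtime dependencies are pruned in the builder; that image's entrypoint is already the node binary, so CMD only names the script:
# Build stage
FROM node:18.16.0 AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build
# Drop dev dependencies so only runtime packages are copied forward
RUN npm prune --omit=dev

# Production stage: no shell, no package manager, far fewer packages to scan
FROM gcr.io/distroless/nodejs18-debian11
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
CMD ["dist/server.js"]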
3. False Positives Need Management
Early in my container scanning journey, I struggled with false positives overwhelming my team. I've developed a process for handling them:
For vulnerabilities in packages I can't update, I document the risk assessment
For truly false positives, I use Trivy's ignore file (a simpler plain-text variant follows this list):
# trivy.yaml
ignore:
  - id: "CVE-2023-1234"
    reason: "False positive - this vulnerability only affects Windows, we use Linux"
    expires: 2023-12-31
For libraries with persistent issues, I consider alternatives or implement additional controls
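When the structured config above is more than I need, a plain .trivyignore file in the directory Trivy runs from does the job: one CVE ID per line, with the justification kept as a comment so reviewers can see why the finding was accepted.
# .trivyignore
# Affects Windows builds only; we ship Linux images (reviewed, revisit quarterly)
CVE-2023-1234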
Managing Container Vulnerabilities in Production
Container scanning isn't just for CI/CD; I also scan running containers in production. Here's the script I use to automatically scan our Kubernetes deployments:
#!/bin/bash
# scan-prod-containers.sh
# Get all containers running in production
IMAGES=$(kubectl get pods -n production -o jsonpath="{.items[*].spec.containers[*].image}" | tr -s '[:space:]' '\n' | sort -u)
echo "Scanning production containers for vulnerabilities..."
for IMAGE in $IMAGES; do
  echo "Scanning $IMAGE"
  trivy image --severity HIGH,CRITICAL "$IMAGE" >> vulnerability-report.txt
done
echo "Scan complete. Results saved to vulnerability-report.txt"
This helps me catch any containers that might have been deployed before my rigorous scanning was in place, or when new vulnerabilities are discovered in previously secure images.
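To keep this from being a manual chore, I run the script on a schedule. A minimal sketch as a system cron entry; the path, user, schedule, and log file are all assumptions, and the host needs kubectl access to the cluster plus Trivy installed:
# /etc/cron.d/scan-prod-containers - daily scan of running production images
0 6 * * * deploy /opt/security/scan-prod-containers.sh >> /var/log/prod-container-scan.log 2>&1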
The ROI of Container Scanning
Implementing this comprehensive container scanning approach has delivered measurable results:
94% reduction in container-related security incidents
Reduced mean time to patch from 12 days to 2 days
Improved compliance posture for SOC2 and ISO 27001 audits
Better developer awareness of security issues
The most significant benefit has been the peace of mind knowing that our containers are continuously monitored for vulnerabilities, with automated guardrails preventing risky deployments.
Getting Started with Container Scanning
If you're just beginning your container scanning journey, here's my advice based on years of experience:
Start simple: Enable GitLab's built-in container scanner
Add Trivy: Once comfortable, add Trivy for more comprehensive scanning
Prioritize fixes: Focus on critical and high vulnerabilities first
Automate updates: Use Dependabot or Renovate to automatically update base images (a sample Renovate config follows this list)
Build security knowledge: Help your team understand container vulnerabilities
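For the automated updates step, here's a minimal Renovate sketch that keeps Dockerfile FROM lines current; the package rule is only an example and assumes you're comfortable auto-merging patch-level bumps of the node base image:
{
  "$schema": "https://docs.renovatebot.com/renovate-schema.json",
  "extends": ["config:base"],
  "packageRules": [
    {
      "matchDatasources": ["docker"],
      "matchPackageNames": ["node"],
      "matchUpdateTypes": ["patch"],
      "automerge": true
    }
  ]
}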
Remember that container scanning is just one part of a comprehensive container security strategy that should also include:
Runtime security monitoring
Network policy enforcement (a sample policy follows this list)
Least privilege principles
Regular security training
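As a concrete starting point for the network policy item, a default-deny ingress policy in the production namespace forces every allowed path to be declared explicitly. A sketch, assuming your CNI plugin actually enforces NetworkPolicy objects:
# Deny all ingress to pods in the production namespace by default
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: production
spec:
  podSelector: {}
  policyTypes:
    - Ingress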
Conclusion: Shifting Container Security Left
My container scanning journey has taught me that container security is most effective when "shifted left": integrated early in the development process. By making security scanning an integral part of our CI/CD pipeline, we catch issues before they become problems.
As cloud native applications continue to dominate the landscape, robust container scanning isn't just a best practice; it's a necessity. The approach I've outlined here has evolved through trial and error, security incidents, and continuous improvement. I hope my experiences help you avoid the 3 AM security call that started my container security journey.
In my next post, I'll dive deeper into how I've implemented runtime security for containers using Falco and OPA Gatekeeper. Stay tuned!