Container Security Best Practices

A comprehensive guide to container security covering minimal base images, non-root users, image scanning, secret management, read-only filesystems, and Kubernetes security contexts.

Overview

Container security is not a feature you bolt on after deployment — it is a discipline that starts with your base image choice, runs through every line of your Dockerfile, and extends into your orchestration layer and CI/CD pipeline. Most container breaches exploit preventable misconfigurations: running as root, bloated images with known CVEs, leaked secrets in build layers, and wide-open network policies. This guide covers the full attack surface and gives you concrete, production-tested techniques for hardening Docker containers, with a focus on Node.js workloads.

Prerequisites

  • Working knowledge of Docker (building images, running containers, docker-compose)
  • Familiarity with Node.js and npm
  • Basic understanding of Linux user permissions and file systems
  • Docker Engine 20.10+ recommended
  • Optional: Kubernetes cluster for the security context sections
  • Optional: Trivy or Snyk CLI installed for scanning examples

The Container Attack Surface

Before you can defend your containers, you need to understand where attackers get in. The attack surface breaks down into three layers.

Image Vulnerabilities

Your base image is someone else's Linux distribution. When you pull node:20, you are pulling Debian Bookworm with over 900 packages — most of which your application never touches. Each one of those packages is a potential CVE. A scan of a stock node:20 image typically reveals 200+ known vulnerabilities, with a handful rated critical.

$ trivy image node:20
2026-02-08T10:15:32.441Z  INFO  Vulnerability scanning...

node:20 (debian 12.4)
=====================
Total: 287 (UNKNOWN: 2, LOW: 172, MEDIUM: 81, HIGH: 27, CRITICAL: 5)

┌──────────────┬────────────────┬──────────┬────────────────────┬──────────────────┬─────────────────────────────────────┐
│   Library    │ Vulnerability  │ Severity │ Installed Version  │  Fixed Version   │                Title                │
├──────────────┼────────────────┼──────────┼────────────────────┼──────────────────┼─────────────────────────────────────┤
│ libssl3      │ CVE-2024-5535  │ CRITICAL │ 3.0.13-1~deb12u1   │ 3.0.14-1~deb12u1 │ openssl: SSL_select_next_proto...   │
│ libexpat1    │ CVE-2024-45490 │ CRITICAL │ 2.5.0-1            │ 2.5.0-1+deb12u1  │ libexpat: Negative Length Parsing   │
│ perl-base    │ CVE-2023-47038 │ HIGH     │ 5.36.0-7+deb12u1   │                  │ perl: Write past buffer end via...  │
│ zlib1g       │ CVE-2023-45853 │ HIGH     │ 1:1.2.13.dfsg-1    │                  │ MiniZip: integer overflow in...     │
└──────────────┴────────────────┴──────────┴────────────────────┴──────────────────┴─────────────────────────────────────┘

Every vulnerability you ship is a vulnerability an attacker can exploit. The single most effective mitigation is reducing the number of packages in your image.
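
If you want to see the gap for yourself, counting installed OS packages in each image makes the point quickly (exact counts vary by tag and release date):

# Debian-based image: roughly 900 packages
$ docker run --rm node:20 dpkg -l | grep -c '^ii'

# Alpine variant: usually a few dozen
$ docker run --rm node:20-alpine apk info | wc -l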

Runtime Exploits

If your container runs as root (the default), a container escape vulnerability gives an attacker root on the host. Escape flaws like CVE-2024-21626 (Leaky Vessels, a runc vulnerability) demonstrated that container isolation is not absolute. Running as root also means a compromised application can modify binaries, install tools, and pivot to other containers on the same network.
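
You can confirm the default yourself: unless an image sets a USER, Docker starts the process as UID 0.

# The stock node image runs as root
$ docker run --rm node:20 id
uid=0(root) gid=0(root) groups=0(root)

# An empty Config.User means no USER directive was set
$ docker inspect --format '{{.Config.User}}' node:20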

Orchestration Misconfiguration

In Kubernetes, running pods with default security settings means privileged mode is one YAML typo away. Missing network policies allow lateral movement between pods. Unprotected API servers, overly permissive RBAC, and mounted service account tokens give attackers a clear path from a single compromised container to the entire cluster.
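
To make that concrete, here is an abbreviated, hypothetical pod spec showing the kind of configuration to reject in review; a single privileged: true flag hands the container the keys to the host:

# What NOT to ship
apiVersion: v1
kind: Pod
metadata:
  name: careless-pod
spec:
  containers:
    - name: app
      image: myapp:latest
      securityContext:
        privileged: true   # grants all capabilities and access to host devices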


Minimal Base Images

The fewer packages in your image, the fewer vulnerabilities you ship. Here are your options, ranked from most to fewest packages.

Alpine Linux

Alpine uses musl libc instead of glibc and weighs in at about 5 MB. The node:20-alpine image is roughly 130 MB — an 85% reduction from node:20.

FROM node:20-alpine

Alpine is my default recommendation for most Node.js workloads. It has a package manager (apk) for when you need to install additional system libraries, and the community support is excellent. The musl libc compatibility issues that plagued earlier versions are largely resolved for Node.js.
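
When a native module does need extra system libraries, apk keeps the additions explicit and small. The package names below are illustrative; check what your specific dependencies actually require:

FROM node:20-alpine

# Install only the system libraries your native modules need
RUN apk add --no-cache libc6-compat vips-dev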

Distroless

Google's distroless images strip out everything — no shell, no package manager, no coreutils. You get the language runtime and nothing else.

FROM gcr.io/distroless/nodejs20-debian12

Distroless images are roughly 120 MB for Node.js. The lack of a shell makes debugging harder, but it also makes exploitation harder. An attacker who gains code execution cannot spawn a shell, install tools, or run arbitrary commands. For high-security workloads, this is worth the trade-off.
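
Two practical notes if you adopt distroless, based on how Google publishes these images: the Node.js variants already use node as the entrypoint, so CMD takes just the script path, and the :debug tags include a busybox shell for troubleshooting (keep those out of production). A minimal sketch, assuming a builder stage like the multi-stage examples later in this guide:

FROM gcr.io/distroless/nodejs20-debian12
WORKDIR /app
COPY --from=builder /app ./

# The entrypoint is already the node binary, so pass only the script
CMD ["server.js"]

# For local debugging only: the :debug variant adds a busybox shell
# FROM gcr.io/distroless/nodejs20-debian12:debug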

Scratch

The scratch image is literally empty — zero bytes. You can use it for Go or Rust binaries compiled as static executables, but it is not practical for Node.js since you need the V8 runtime.

# Only viable for statically compiled languages
FROM scratch
COPY myapp /myapp
CMD ["/myapp"]

My recommendation: Use node:20-alpine for most Node.js applications. Switch to distroless if you are building security-critical services and can accept the debugging limitations.


Running as Non-Root

This is the single most important security hardening step, and I am consistently surprised by how many production containers still run as root.

The USER Directive

FROM node:20-alpine

WORKDIR /app

COPY package*.json ./
RUN npm ci --only=production

COPY . .

# Create a non-root user with a specific UID
RUN addgroup -g 1001 -S appgroup && \
    adduser -u 1001 -S appuser -G appgroup

# Change ownership of the app directory
RUN chown -R appuser:appgroup /app

# Switch to the non-root user
USER appuser

EXPOSE 3000
CMD ["node", "server.js"]

Why Numeric IDs Matter

Always use numeric user and group IDs, not names. Kubernetes security policies match on numeric IDs, and some container runtimes resolve names differently. Using USER 1001 is more portable and more auditable than USER appuser.

# Prefer this
USER 1001

# Over this
USER appuser

What Breaks When You Drop Root

If your application tries to bind to port 80 or 443, it will fail without root privileges. The fix is simple: bind to a high port (3000, 8080) and let your load balancer or ingress controller handle the port mapping.

# This fails as non-root
Error: listen EACCES: permission denied 0.0.0.0:80

# Fix: bind to a high port instead
PORT=3000 node server.js

If your app writes to /tmp or creates log files, make sure those directories are writable by your non-root user before switching to it.


Multi-Stage Builds for Smaller Attack Surface

Multi-stage builds are a security tool, not just an optimization. By separating your build environment from your runtime environment, you ensure that build tools, source code, and development dependencies never ship to production.

# Stage 1: Build
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build
RUN npm prune --production

# Stage 2: Production
FROM node:20-alpine AS production
WORKDIR /app

RUN addgroup -g 1001 -S appgroup && \
    adduser -u 1001 -S appuser -G appgroup

# Copy only production artifacts
COPY --from=builder --chown=appuser:appgroup /app/dist ./dist
COPY --from=builder --chown=appuser:appgroup /app/node_modules ./node_modules
COPY --from=builder --chown=appuser:appgroup /app/package.json ./

USER 1001

EXPOSE 3000
CMD ["node", "dist/server.js"]

The build stage ran npm, TypeScript, webpack, and all your devDependencies. None of that build tooling or those devDependencies makes it into the production image; npm itself still ships with the node base image, though nothing at runtime needs to invoke it. Fewer binaries means fewer things an attacker can exploit.
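
If you want to trim further, you can also delete the npm and npx CLIs from the runtime stage, since the application only needs the node binary. The paths below assume the layout of the official node Alpine images; verify them against your base image before relying on this:

# Optional hardening in the production stage: remove the npm/npx CLIs
RUN rm -rf /usr/local/lib/node_modules/npm \
           /usr/local/bin/npm \
           /usr/local/bin/npx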


Image Scanning with Trivy and Snyk

Scanning your images for known vulnerabilities should be as routine as running your test suite. Two tools dominate this space.

Trivy

Trivy is open source, fast, and scans images, filesystems, and IaC configs. I run it locally during development and in CI.

# Install Trivy
$ brew install trivy

# Scan an image
$ trivy image myapp:latest

# Scan with severity filter
$ trivy image --severity HIGH,CRITICAL myapp:latest

# Scan and fail the build on critical findings
$ trivy image --exit-code 1 --severity CRITICAL myapp:latest

# Scan a Dockerfile for misconfigurations
$ trivy config Dockerfile

Trivy also detects secrets accidentally baked into images:

$ trivy image --scanners vuln,secret myapp:latest

2026-02-08T10:30:15.221Z  INFO  Secret scanning enabled

myapp:latest (alpine 3.19.1)
============================
Total: 3 (HIGH: 2, CRITICAL: 1)

Secrets:
========
┌──────────┬───────────────────────┬────────────────────────────────────┐
│ Severity │        Rule           │              Match                 │
├──────────┼───────────────────────┼────────────────────────────────────┤
│ CRITICAL │ AWS Access Key ID     │ app/config.js:14                   │
│ HIGH     │ Private Key           │ app/certs/server.key:1             │
└──────────┴───────────────────────┴────────────────────────────────────┘

Snyk

Snyk offers deeper vulnerability intelligence and fix suggestions, but requires an account. The free tier covers open source projects.

# Authenticate
$ snyk auth

# Scan a Docker image
$ snyk container test myapp:latest

# Scan with Dockerfile context for upgrade recommendations
$ snyk container test myapp:latest --file=Dockerfile

Testing myapp:latest...

✗ High severity vulnerability found in openssl/libssl3
  Description: Buffer Overflow
  Introduced through: openssl/libssl3@3.0.13-1~deb12u1
  Fixed in: 3.0.14-1~deb12u1
  Recommendation: Upgrade base image to node:20.11.1-alpine

Organization:   my-org
Package manager: apk
Target file:    Dockerfile
Project name:   docker-image|myapp
Docker image:   myapp:latest

Tested 45 dependencies for known issues.
Found 12 issues (3 critical, 4 high, 5 medium).

Pinning Image Versions

Never use :latest in production. It is non-deterministic, unreproducible, and makes it impossible to audit what you deployed.

# Bad - what version is this? Nobody knows.
FROM node:latest

# Better - pinned to major version
FROM node:20-alpine

# Best - pinned to exact version with SHA digest
FROM node:20.11.1-alpine3.19@sha256:a1b2c3d4e5f6...

Pin to the SHA256 digest in production. This guarantees byte-for-byte reproducibility. Even if someone pushes a malicious image to the same tag, the digest will not match and your build will fail.

# Get the digest for an image
$ docker inspect --format='{{index .RepoDigests 0}}' node:20-alpine
node@sha256:a1f1b2c3d4e5f67890abcdef1234567890abcdef1234567890abcdef12345678

Update your pinned versions on a regular schedule — monthly at minimum — and re-scan after each update.
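
One way to keep that schedule without relying on memory is to let a bot open the update pull requests. A minimal Dependabot configuration for Docker base images (assuming a GitHub-hosted repository) looks like this; Renovate offers equivalent digest updates elsewhere:

# .github/dependabot.yml
version: 2
updates:
  - package-ecosystem: "docker"
    directory: "/"            # location of the Dockerfile
    schedule:
      interval: "monthly"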


Managing Secrets

Secrets in containers are a minefield. Here is how to navigate it.

What NOT To Do

# NEVER do this - secrets are baked into image layers permanently
ENV DATABASE_URL=postgres://admin:password123@db:5432/myapp
ARG API_KEY=sk-1234567890abcdef

# NEVER do this - .env gets copied into the image
COPY . .

Even if you delete the environment variable in a later layer, docker history will reveal it. Build args are visible in the image metadata. Image layers are append-only; you cannot truly delete anything.

# Anyone with the image can see your secrets
$ docker history myapp:latest
IMAGE          CREATED        CREATED BY                                      SIZE
a1b2c3d4e5f6   2 hours ago   ARG API_KEY=sk-1234567890abcdef                  0B

.dockerignore

The first line of defense. Create a .dockerignore that excludes sensitive files from the build context entirely.

.env
.env.*
*.pem
*.key
credentials.json
docker-compose*.yml
.git
node_modules
.npm
*.log

Runtime Secrets

Pass secrets at runtime via environment variables or mounted files. Never bake them into the image.

# Pass at runtime via environment variable
$ docker run -e DATABASE_URL="postgres://admin:pass@db:5432/myapp" myapp:latest

# Pass via env file (not baked into the image)
$ docker run --env-file .env myapp:latest

# Mount a secret file
$ docker run -v /run/secrets/db-password:/app/secrets/db-password:ro myapp:latest

Docker Secrets (Swarm / Compose)

Docker Swarm has built-in secret management. Secrets are encrypted at rest, transmitted over TLS, and mounted as tmpfs files inside the container.

# docker-compose.yml
version: "3.8"
services:
  app:
    image: myapp:latest
    secrets:
      - db_password
      - api_key

secrets:
  db_password:
    file: ./secrets/db_password.txt
  api_key:
    file: ./secrets/api_key.txt

Access secrets in your Node.js application:

var fs = require("fs");
var path = require("path");

function readSecret(name) {
  var secretPath = path.join("/run/secrets", name);
  try {
    return fs.readFileSync(secretPath, "utf8").trim();
  } catch (err) {
    console.error("Failed to read secret:", name, err.message);
    return process.env[name.toUpperCase()] || null;
  }
}

var dbPassword = readSecret("db_password");
var apiKey = readSecret("api_key");

Read-Only File Systems

If your application does not need to write to the filesystem at runtime, mount it read-only. This prevents attackers from modifying binaries, dropping malware, or altering configuration files.

# Run with read-only root filesystem
$ docker run --read-only myapp:latest

Most Node.js applications need to write to at least /tmp for temporary files. Mount a tmpfs volume for writable directories:

$ docker run --read-only \
  --tmpfs /tmp:rw,noexec,nosuid,size=64m \
  --tmpfs /app/logs:rw,noexec,nosuid,size=32m \
  myapp:latest

The noexec flag prevents executing binaries from the tmpfs mount, and nosuid prevents setuid binaries. The size flag caps how much data can be written.

In Docker Compose:

services:
  app:
    image: myapp:latest
    read_only: true
    tmpfs:
      - /tmp:size=64m,noexec,nosuid
      - /app/logs:size=32m,noexec,nosuid
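
A quick way to convince yourself the noexec flag is doing its job, assuming the image still has a shell as the Alpine-based ones do:

# Copy a binary into the tmpfs mount and try to execute it
$ docker exec -it <container_id> sh -c "cp /bin/busybox /tmp/bb && /tmp/bb"
sh: /tmp/bb: Permission denied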

Resource Limits

An unrestricted container can consume all available memory, CPU, and PIDs on the host — either through a bug or a deliberate denial-of-service attack. Always set limits.

# Set memory and CPU limits
$ docker run \
  --memory=512m \
  --memory-swap=512m \
  --cpus=1.0 \
  --pids-limit=256 \
  myapp:latest

  • --memory=512m — hard memory cap. The container is killed (OOMKilled) if it exceeds this.
  • --memory-swap=512m — setting swap equal to memory effectively disables swap, preventing the container from using disk as memory.
  • --cpus=1.0 — limits to one CPU core.
  • --pids-limit=256 — prevents fork bombs. No container needs thousands of processes.

In Docker Compose:

services:
  app:
    image: myapp:latest
    deploy:
      resources:
        limits:
          memory: 512M
          cpus: "1.0"
          pids: 256
        reservations:
          memory: 256M
          cpus: "0.5"

Network Security

Don't Expose Unnecessary Ports

Every exposed port is an attack vector. Only expose what your application actually needs.

# Only expose the application port
EXPOSE 3000

# Do NOT expose debugging ports, database ports, etc.
# EXPOSE 9229  <- Node.js debugger
# EXPOSE 5432  <- PostgreSQL
# EXPOSE 27017 <- MongoDB

Use Internal Networks

In Docker Compose, create internal networks that cannot reach the internet. Backend services like databases should never have external network access.

services:
  app:
    image: myapp:latest
    networks:
      - frontend
      - backend
    ports:
      - "3000:3000"

  db:
    image: postgres:16-alpine
    networks:
      - backend
    # No ports exposed to host

networks:
  frontend:
    driver: bridge
  backend:
    driver: bridge
    internal: true  # No internet access

The internal: true flag on the backend network means the database container cannot make outbound connections to the internet, preventing data exfiltration even if the database is compromised.
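
You can sanity-check the isolation from inside the database container. Busybox's wget ships with the Alpine-based postgres image, and any outbound request should fail because the backend network has no route to the outside:

$ docker compose exec db wget -T 5 -q -O- https://example.com || echo "egress blocked"
egress blocked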


Docker Content Trust

Docker Content Trust (DCT) uses digital signatures to verify that images have not been tampered with. When enabled, Docker will only pull signed images.

# Enable content trust
$ export DOCKER_CONTENT_TRUST=1

# Now pulls will fail for unsigned images
$ docker pull unsigned-image:latest
Error: remote trust data does not exist for docker.io/library/unsigned-image

# Sign and push your own images
$ docker trust sign myregistry/myapp:1.0.0

# Inspect trust data
$ docker trust inspect --pretty myregistry/myapp:1.0.0

For CI/CD pipelines, set DOCKER_CONTENT_TRUST=1 as an environment variable. This ensures that every image pulled during the build is verified.


Security Scanning in CI/CD Pipelines

Scanning in CI is where security stops being optional and becomes enforced. Here is a GitHub Actions workflow that scans on every pull request and blocks merges with critical vulnerabilities.

# .github/workflows/security-scan.yml
name: Container Security Scan

on:
  pull_request:
    branches: [main, master]
  push:
    branches: [main, master]

jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Build Docker image
        run: docker build -t myapp:${{ github.sha }} .

      - name: Run Trivy vulnerability scanner
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: myapp:${{ github.sha }}
          format: table
          exit-code: 1
          severity: CRITICAL,HIGH
          ignore-unfixed: true

      - name: Run Trivy config scanner
        uses: aquasecurity/trivy-action@master
        with:
          scan-type: config
          scan-ref: .
          exit-code: 1
          severity: CRITICAL,HIGH

      - name: Run Dockle linter
        uses: erzz/dockle-action@v1
        with:
          image: myapp:${{ github.sha }}
          exit-code: 1
          failure-threshold: WARN

      - name: Generate SBOM
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: myapp:${{ github.sha }}
          format: cyclonedx
          output: sbom.json

      - name: Upload SBOM artifact
        uses: actions/upload-artifact@v4
        with:
          name: sbom
          path: sbom.json

This pipeline does four things: scans the built image for known CVEs, checks the Dockerfile and other configs for misconfigurations, lints the image against container best practices with Dockle, and generates a Software Bill of Materials. Any critical or high vulnerability fails the build.


Kubernetes Security Contexts

In Kubernetes, the security context is where you enforce everything discussed above at the pod and container level. These settings override anything in the Dockerfile, which is exactly the point — you enforce security policy at the orchestration layer, not just the image layer.

Pod Security Context

apiVersion: v1
kind: Pod
metadata:
  name: secure-app
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 1001
    runAsGroup: 1001
    fsGroup: 1001
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: app
      image: myapp:1.0.0@sha256:a1b2c3d4e5f6...
      securityContext:
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true
        capabilities:
          drop:
            - ALL
        runAsNonRoot: true
        runAsUser: 1001
      resources:
        limits:
          memory: "512Mi"
          cpu: "1000m"
          ephemeral-storage: "128Mi"
        requests:
          memory: "256Mi"
          cpu: "250m"
      volumeMounts:
        - name: tmp
          mountPath: /tmp
        - name: logs
          mountPath: /app/logs
  volumes:
    - name: tmp
      emptyDir:
        medium: Memory
        sizeLimit: 64Mi
    - name: logs
      emptyDir:
        medium: Memory
        sizeLimit: 32Mi

Key settings explained:

  • runAsNonRoot: true — Kubernetes rejects the pod if the image tries to run as root.
  • allowPrivilegeEscalation: false — prevents processes from gaining more privileges than their parent.
  • readOnlyRootFilesystem: true — the container filesystem is read-only. Writable directories must be explicitly mounted.
  • capabilities.drop: ALL — drops all Linux capabilities. Most applications need zero capabilities. If you need to bind to a low port, add NET_BIND_SERVICE back selectively.
  • seccompProfile.type: RuntimeDefault — applies the container runtime's default seccomp profile, which blocks dangerous syscalls.

Network Policies

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: app-network-policy
spec:
  podSelector:
    matchLabels:
      app: myapp
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: ingress-controller
      ports:
        - protocol: TCP
          port: 3000
  egress:
    - to:
        - podSelector:
            matchLabels:
              app: postgres
      ports:
        - protocol: TCP
          port: 5432
    - to:  # Allow DNS resolution
        - namespaceSelector: {}
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53

This policy restricts the app pod to only accept traffic from the ingress controller on port 3000, and only connect outbound to the PostgreSQL pod on port 5432 and DNS. Everything else is denied.
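
Policies like this work best on top of a namespace-wide default-deny baseline, so that anything you have not explicitly allowed is dropped. A minimal version:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
spec:
  podSelector: {}        # selects every pod in the namespace
  policyTypes:
    - Ingress
    - Egress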


Seccomp and AppArmor Profiles

Seccomp

Seccomp (Secure Computing Mode) filters which system calls a container can make. The default Docker seccomp profile blocks about 44 of the 300+ Linux syscalls, including potentially dangerous ones like reboot, mount, and clock_settime.

For high-security workloads, create a custom profile that only allows the syscalls your application actually uses:

{
  "defaultAction": "SCMP_ACT_ERRNO",
  "architectures": ["SCMP_ARCH_X86_64"],
  "syscalls": [
    {
      "names": [
        "read", "write", "open", "close", "stat", "fstat",
        "mmap", "mprotect", "munmap", "brk", "access",
        "pipe", "select", "sched_yield", "clone", "execve",
        "exit", "wait4", "kill", "fcntl", "getdents",
        "getcwd", "chdir", "rename", "mkdir", "rmdir",
        "socket", "connect", "accept", "sendto", "recvfrom",
        "bind", "listen", "epoll_create", "epoll_ctl", "epoll_wait",
        "futex", "set_robust_list", "nanosleep", "clock_gettime",
        "getpid", "getuid", "getgid", "gettid", "arch_prctl",
        "set_tid_address", "exit_group", "openat", "newfstatat",
        "ioctl", "pread64", "pwrite64", "getrandom"
      ],
      "action": "SCMP_ACT_ALLOW"
    }
  ]
}

Apply it at runtime:

$ docker run --security-opt seccomp=./seccomp-profile.json myapp:latest
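
Building that allowlist by hand is tedious. One practical approach is to trace the application under realistic load during development and start from the syscalls it actually makes, for example with strace on a Linux host:

# Summarize the syscalls made during a test run (follow child processes too)
$ strace -f -c -o syscall-summary.txt node server.js

Exercise every code path you care about before reading the summary; anything you miss will surface later as a permission-style error at runtime.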

AppArmor

AppArmor provides mandatory access control, restricting what files and capabilities a container can access. Docker applies a default AppArmor profile (docker-default) automatically on systems that support it.

# Check if AppArmor is active
$ docker inspect --format='{{.AppArmorProfile}}' <container_id>
docker-default

# Run with a custom AppArmor profile
$ docker run --security-opt apparmor=my-custom-profile myapp:latest

Supply Chain Security

Verifying Base Images

Only pull base images from trusted registries. Use official images from Docker Hub, Google Container Registry, or Amazon ECR Public.

# Verify image provenance with cosign
$ cosign verify --key cosign.pub myregistry/myapp:1.0.0

Verification for myregistry/myapp:1.0.0 --
The following checks were performed on each of these signatures:
  - The cosign claims were validated
  - The signatures were verified against the specified public key

[{"critical":{"identity":{"docker-reference":"myregistry/myapp"},...}]

SBOM Generation

A Software Bill of Materials documents every component in your image. This is increasingly required for compliance and makes vulnerability response faster — when the next Log4Shell drops, you can instantly check whether you are affected.

# Generate SBOM with Trivy
$ trivy image --format cyclonedx --output sbom.json myapp:latest

# Generate SBOM with Syft
$ syft myapp:latest -o cyclonedx-json > sbom.json

# Scan an existing SBOM for vulnerabilities
$ trivy sbom sbom.json

Store SBOMs alongside your container images in your registry. Most modern registries support attaching SBOMs as OCI artifacts.
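
If your registry supports OCI artifacts, cosign can attach the SBOM to the image as a signed attestation. The registry path and key names below are placeholders:

# Attach the CycloneDX SBOM as a signed attestation
$ cosign attest --predicate sbom.json --type cyclonedx --key cosign.key myregistry/myapp:1.0.0

# Verify and retrieve it later
$ cosign verify-attestation --type cyclonedx --key cosign.pub myregistry/myapp:1.0.0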


Complete Working Example

Here is a hardened Dockerfile for a Node.js Express application, a docker-compose.yml with full security options, and a health check endpoint.

Application Code

// server.js
var express = require("express");
var fs = require("fs");
var path = require("path");
var os = require("os");

var app = express();
var PORT = process.env.PORT || 3000;

// Health check endpoint
app.get("/health", function(req, res) {
  var health = {
    status: "healthy",
    uptime: process.uptime(),
    timestamp: new Date().toISOString(),
    pid: process.pid,
    user: os.userInfo().username,
    memoryUsage: process.memoryUsage().rss
  };
  res.status(200).json(health);
});

// Read secret from Docker secrets mount
function getSecret(name) {
  var secretPath = path.join("/run/secrets", name);
  try {
    return fs.readFileSync(secretPath, "utf8").trim();
  } catch (err) {
    return process.env[name.toUpperCase().replace(/-/g, "_")] || null;
  }
}

var dbPassword = getSecret("db-password");

app.get("/", function(req, res) {
  res.json({ message: "Secure container running", user: os.userInfo().uid });
});

app.listen(PORT, "0.0.0.0", function() {
  console.log("Server running on port " + PORT + " as UID " + process.getuid());
});

Hardened Dockerfile

# =============================================================================
# Stage 1: Build dependencies
# =============================================================================
FROM node:20.11.1-alpine3.19@sha256:bf77dc26e48ea95fca9d1aceb5acfa69d2e546b765ec2abfb502975f1a2bf7be AS builder

WORKDIR /app

# Copy dependency manifests first for layer caching
COPY package.json package-lock.json ./

# Install all dependencies (including devDependencies for build)
RUN npm ci --no-audit --no-fund

# Copy source code
COPY . .

# If you have a build step, run it here
# RUN npm run build

# Remove devDependencies for production
RUN npm prune --production

# =============================================================================
# Stage 2: Production image
# =============================================================================
FROM node:20.11.1-alpine3.19@sha256:bf77dc26e48ea95fca9d1aceb5acfa69d2e546b765ec2abfb502975f1a2bf7be AS production

# Install dumb-init for proper PID 1 signal handling
RUN apk add --no-cache dumb-init

# Set NODE_ENV
ENV NODE_ENV=production

WORKDIR /app

# Create non-root user with specific UID/GID
RUN addgroup -g 1001 -S appgroup && \
    adduser -u 1001 -S appuser -G appgroup -h /app -s /sbin/nologin

# Copy production dependencies and application code
COPY --from=builder --chown=1001:1001 /app/node_modules ./node_modules
COPY --from=builder --chown=1001:1001 /app/package.json ./
COPY --from=builder --chown=1001:1001 /app/server.js ./

# Create writable directories needed at runtime
RUN mkdir -p /app/logs /tmp && \
    chown -R 1001:1001 /app/logs /tmp

# Switch to non-root user
USER 1001

# Expose only the application port
EXPOSE 3000

# Health check - curl is not available in alpine by default, use node
HEALTHCHECK --interval=30s --timeout=5s --start-period=10s --retries=3 \
  CMD ["node", "-e", "var http = require('http'); var options = { hostname: '127.0.0.1', port: 3000, path: '/health', timeout: 3000 }; var req = http.request(options, function(res) { process.exit(res.statusCode === 200 ? 0 : 1); }); req.on('error', function() { process.exit(1); }); req.end();"]

# Use dumb-init to handle signals properly
ENTRYPOINT ["dumb-init", "--"]

CMD ["node", "server.js"]

Docker Compose with Security Options

# docker-compose.yml
version: "3.8"

services:
  app:
    build:
      context: .
      dockerfile: Dockerfile
    image: myapp:latest
    container_name: secure-app
    restart: unless-stopped

    # Read-only root filesystem
    read_only: true

    # Writable temp directories
    tmpfs:
      - /tmp:size=64m,noexec,nosuid,nodev
      - /app/logs:size=32m,noexec,nosuid,nodev

    # Security options
    security_opt:
      - no-new-privileges:true
      - seccomp:./seccomp-profile.json
    cap_drop:
      - ALL

    # Resource limits
    deploy:
      resources:
        limits:
          memory: 512M
          cpus: "1.0"
          pids: 128
        reservations:
          memory: 128M
          cpus: "0.25"

    # Network configuration
    ports:
      - "3000:3000"
    networks:
      - frontend
      - backend

    # Secrets
    secrets:
      - db-password
      - api-key

    # Environment variables (non-sensitive only)
    environment:
      - NODE_ENV=production
      - PORT=3000
      - LOG_LEVEL=info

    # Health check
    healthcheck:
      test: ["CMD", "node", "-e", "var h=require('http');h.get('http://127.0.0.1:3000/health',function(r){process.exit(r.statusCode===200?0:1)}).on('error',function(){process.exit(1)})"]
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 15s

    depends_on:
      db:
        condition: service_healthy

  db:
    image: postgres:16.2-alpine3.19@sha256:1234abcd...
    container_name: secure-db
    restart: unless-stopped

    # Database on internal network only
    networks:
      - backend

    # Security hardening
    read_only: true
    security_opt:
      - no-new-privileges:true
    cap_drop:
      - ALL
    cap_add:
      - CHOWN
      - DAC_OVERRIDE
      - FOWNER
      - SETGID
      - SETUID

    tmpfs:
      - /tmp:size=128m,noexec,nosuid,nodev
      - /run/postgresql:size=16m,noexec,nosuid,nodev

    volumes:
      - pgdata:/var/lib/postgresql/data

    secrets:
      - db-password

    environment:
      - POSTGRES_DB=myapp
      - POSTGRES_USER=appuser
      - POSTGRES_PASSWORD_FILE=/run/secrets/db-password

    deploy:
      resources:
        limits:
          memory: 1G
          cpus: "2.0"
          pids: 256

    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U appuser -d myapp"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 30s

networks:
  frontend:
    driver: bridge
  backend:
    driver: bridge
    internal: true  # No internet access for backend services

volumes:
  pgdata:
    driver: local

secrets:
  db-password:
    file: ./secrets/db-password.txt
  api-key:
    file: ./secrets/api-key.txt

CI Security Pipeline

# .github/workflows/container-security.yml
name: Container Security

on:
  pull_request:
  push:
    branches: [main]

env:
  IMAGE_NAME: myapp
  DOCKER_CONTENT_TRUST: 1

jobs:
  build-and-scan:
    runs-on: ubuntu-latest
    permissions:
      security-events: write
      contents: read

    steps:
      - uses: actions/checkout@v4

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      - name: Build image
        run: |
          docker build \
            --no-cache \
            --tag ${{ env.IMAGE_NAME }}:${{ github.sha }} \
            .

      - name: Trivy vulnerability scan
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: ${{ env.IMAGE_NAME }}:${{ github.sha }}
          format: sarif
          output: trivy-results.sarif
          severity: CRITICAL,HIGH
          exit-code: 1
          ignore-unfixed: true

      - name: Upload scan results to GitHub Security
        uses: github/codeql-action/upload-sarif@v3
        if: always()
        with:
          sarif_file: trivy-results.sarif

      - name: Dockerfile lint with Hadolint
        uses: hadolint/hadolint-action@v3.1.0
        with:
          dockerfile: Dockerfile
          failure-threshold: warning

      - name: Check for secrets in image
        run: |
          docker run --rm \
            -v /var/run/docker.sock:/var/run/docker.sock \
            aquasec/trivy:latest image \
            --scanners secret \
            --exit-code 1 \
            ${{ env.IMAGE_NAME }}:${{ github.sha }}

      - name: Verify non-root user
        run: |
          USER=$(docker inspect --format='{{.Config.User}}' ${{ env.IMAGE_NAME }}:${{ github.sha }})
          if [ "$USER" = "" ] || [ "$USER" = "root" ] || [ "$USER" = "0" ]; then
            echo "ERROR: Container runs as root!"
            exit 1
          fi
          echo "Container runs as user: $USER"

      - name: Generate SBOM
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: ${{ env.IMAGE_NAME }}:${{ github.sha }}
          format: cyclonedx
          output: sbom.json

      - name: Upload SBOM
        uses: actions/upload-artifact@v4
        with:
          name: sbom-${{ github.sha }}
          path: sbom.json
          retention-days: 90

Common Issues & Troubleshooting

1. Permission Denied When Running as Non-Root

Error: EACCES: permission denied, open '/app/logs/app.log'

Cause: The application is trying to write to a directory owned by root. When you switch to USER 1001, the process cannot write to directories it does not own.

Fix: Set ownership before switching users:

RUN mkdir -p /app/logs && chown -R 1001:1001 /app/logs
USER 1001

Or mount the directory as a tmpfs volume in your compose file:

tmpfs:
  - /app/logs:size=32m,noexec,nosuid

2. Read-Only Filesystem Breaks npm or Node.js Internals

Error: EROFS: read-only file system, mkdir '/app/.npm'
Error: EROFS: read-only file system, open '/app/node_modules/.cache/...'

Cause: Node.js and npm attempt to write cache files to the application directory. A read-only filesystem blocks these writes.

Fix: Mount tmpfs volumes for cache directories and set npm cache location:

# Dockerfile
ENV NPM_CONFIG_CACHE=/tmp/.npm

# docker-compose.yml
tmpfs:
  - /tmp:size=64m,noexec,nosuid

3. Health Check Fails Because curl Is Not Installed

OCI runtime exec failed: exec failed: unable to start container process:
exec: "curl": executable file not found in $PATH: unknown

Cause: Alpine and distroless images do not include curl. A health check using curl will fail.

Fix: Use Node.js for health checks instead of curl:

HEALTHCHECK --interval=30s --timeout=5s --retries=3 \
  CMD ["node", "-e", "var h=require('http');h.get('http://127.0.0.1:3000/health',function(r){process.exit(r.statusCode===200?0:1)}).on('error',function(){process.exit(1)})"]

4. Trivy Scan Reports Vulnerabilities in the Base Image You Cannot Fix

Total: 14 (HIGH: 3, CRITICAL: 1)

┌──────────┬───────────────┬──────────┬──────────────────┬──────────────┐
│ Library  │ Vulnerability │ Severity │ Installed Version│ Fixed Version│
├──────────┼───────────────┼──────────┼──────────────────┼──────────────┤
│ busybox  │ CVE-2023-XXXX │ HIGH     │ 1.36.1-r2        │              │
└──────────┴───────────────┴──────────┴──────────────────┴──────────────┘

Cause: Some vulnerabilities in the base image have no fix available yet (the "Fixed Version" column is empty). Your CI pipeline fails even though there is nothing you can do about it.

Fix: Use --ignore-unfixed to skip vulnerabilities with no available fix, and maintain a .trivyignore file for acknowledged risks:

$ trivy image --ignore-unfixed --ignorefile .trivyignore myapp:latest

# .trivyignore
# Accepted risk: busybox CVE with no fix available, no exploit in the wild
CVE-2023-XXXX

# Accepted risk: low-severity zlib issue, mitigated by WAF
CVE-2023-45853

5. Container Killed with OOMKilled After Setting Memory Limits

State: {"Status":"exited","ExitCode":137,"OOMKilled":true}

Cause: The memory limit is set too low for the workload. Node.js reserves memory for the V8 heap, and the default --max-old-space-size may exceed your container limit.

Fix: Align Node.js heap size with your container memory limit. A good rule is to set --max-old-space-size to about 75% of the container memory limit:

# Container has 512MB limit, so set heap to ~384MB
CMD ["node", "--max-old-space-size=384", "server.js"]

6. Capabilities Dropped Too Aggressively, Application Fails to Start

Error: listen EACCES: permission denied 0.0.0.0:80

Cause: You dropped ALL capabilities but your application needs NET_BIND_SERVICE to bind to ports below 1024.

Fix: Either bind to a high port (recommended) or selectively add back the capability you need:

cap_drop:
  - ALL
cap_add:
  - NET_BIND_SERVICE  # Only if you must bind to ports < 1024

The better solution is to use a high port and let your reverse proxy or load balancer handle port 80/443.


Best Practices

  • Use minimal base images. Start with Alpine or distroless. Every package you do not ship is a vulnerability you do not ship. The stock node:20 scan earlier in this guide reported 287 known vulnerabilities; node:20-alpine typically reports fewer than ten.

  • Never run as root. Use the USER directive with a numeric UID. Enforce runAsNonRoot: true in Kubernetes security contexts. This is non-negotiable for production workloads.

  • Pin image versions with SHA digests. Tags are mutable. Digests are not. Pin your base images to SHA256 digests and update them on a monthly schedule with fresh vulnerability scans.

  • Scan images in CI, not just locally. Integrate Trivy, Snyk, or Grype into your CI pipeline with exit-code: 1 on critical findings. If the scan fails, the build fails. No exceptions.

  • Keep secrets out of the image. Never use ENV or ARG for secrets in Dockerfiles. Use Docker secrets, mounted files, or a secrets manager like HashiCorp Vault. Always have a .dockerignore that excludes .env, *.pem, and credentials.json.

  • Enable read-only filesystems. Mount the root filesystem as read-only and provide tmpfs volumes for the specific directories that need write access. Use noexec and nosuid flags on writable mounts.

  • Set resource limits on every container. Memory, CPU, and PID limits prevent denial-of-service attacks and contain the blast radius of runaway processes. A fork bomb in an unlimited container takes down the entire host.

  • Drop all Linux capabilities, then add back selectively. Most applications need zero capabilities. Start with cap_drop: ALL and only add back what fails. Document why each capability is needed.

  • Generate and store SBOMs. When the next critical vulnerability is disclosed, you need to know within minutes whether any of your deployed containers are affected. SBOMs make this possible.

  • Apply network policies. In Kubernetes, use NetworkPolicy resources to restrict pod-to-pod communication. In Docker Compose, use internal networks for backend services. Default-deny is the only sane network policy.

