Health Checks in Docker and Kubernetes
Implement robust health checks for Node.js applications in Docker and Kubernetes, covering liveness probes, readiness probes, graceful shutdown, and comprehensive health check endpoint design.
A container running does not mean your application is working. Your Node.js process might be alive but stuck in an infinite loop, unable to connect to the database, or out of memory. Health checks are how orchestrators detect these conditions and take corrective action — restarting unhealthy containers, removing them from load balancer rotation, or delaying traffic until startup completes. This guide covers health check implementation from simple Docker HEALTHCHECK instructions to comprehensive Kubernetes probe configurations.
Prerequisites
- Docker Desktop v4.0+ or Docker Engine
- Docker Compose v2
- kubectl and a Kubernetes cluster (minikube, kind, or cloud-managed)
- Node.js 18+ and Express.js
- Basic familiarity with Docker and Kubernetes concepts
Docker HEALTHCHECK Instruction
The HEALTHCHECK instruction in a Dockerfile tells Docker how to test whether a container is still working.
FROM node:20-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY . .
EXPOSE 3000
HEALTHCHECK --interval=30s --timeout=5s --start-period=10s --retries=3 \
CMD wget --no-verbose --tries=1 --spider http://localhost:3000/health || exit 1
CMD ["node", "app.js"]
The parameters:
- --interval=30s: Check every 30 seconds
- --timeout=5s: Fail if the check takes longer than 5 seconds
- --start-period=10s: Grace period for container startup (checks during this period do not count toward retries)
- --retries=3: Mark unhealthy after 3 consecutive failures
Check container health status:
docker ps
# CONTAINER ID IMAGE STATUS PORTS
# abc123 myapp Up 2 min (healthy) 0.0.0.0:3000->3000/tcp
# def456 myapp Up 30s (health: starting) 0.0.0.0:3001->3000/tcp
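To see why a check is failing, inspect the container's health log, which records the output of recent checks (the container name myapp is illustrative):
docker inspect --format='{{json .State.Health}}' myapp
# Returns Status, FailingStreak, and a Log array with the output of recent checks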
Using curl vs wget for health checks:
# wget (available in Alpine by default)
HEALTHCHECK CMD wget --no-verbose --tries=1 --spider http://localhost:3000/health || exit 1
# curl (requires installing curl in Alpine)
HEALTHCHECK CMD curl -f http://localhost:3000/health || exit 1
# Node.js script (no external dependencies)
HEALTHCHECK CMD node -e "var http = require('http'); http.get('http://localhost:3000/health', function(res) { process.exit(res.statusCode === 200 ? 0 : 1); }).on('error', function() { process.exit(1); });"
I prefer wget on Alpine images because it is built in. The Node.js option works everywhere, but it spawns a full Node.js process (roughly 30-50MB of memory) for every check.
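If you do want the Node.js option, one way to keep the Dockerfile readable is to ship the check as a small script and call it from HEALTHCHECK. A sketch; the healthcheck.js filename and the 2-second timeout are choices for this example, not part of the setup above.
// healthcheck.js - exits 0 when /health returns 200, 1 otherwise
var http = require('http');
var request = http.get({ host: 'localhost', port: 3000, path: '/health', timeout: 2000 }, function(res) {
  process.exit(res.statusCode === 200 ? 0 : 1);
});
request.on('error', function() { process.exit(1); });
request.on('timeout', function() { request.destroy(); process.exit(1); });
The Dockerfile line then becomes HEALTHCHECK CMD node healthcheck.js.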
Implementing Health Check Endpoints in Express.js
Shallow Health Check
A shallow check verifies the application process is running and can handle HTTP requests. It does not test dependencies.
var express = require('express');
var app = express();
app.get('/health', function(req, res) {
res.status(200).json({
status: 'healthy',
uptime: process.uptime(),
timestamp: Date.now()
});
});
This is fast (sub-millisecond) and always succeeds unless the process is completely stuck. Use it for Kubernetes liveness probes.
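If you want the shallow check to also notice an event loop that is badly delayed rather than completely frozen, Node's built-in perf_hooks module can report loop delay. A variant of the /health handler above, with a hypothetical 1-second threshold:
var perfHooks = require('perf_hooks');

// Samples event loop delay in the background; mean is reported in nanoseconds
var loopDelay = perfHooks.monitorEventLoopDelay({ resolution: 20 });
loopDelay.enable();

app.get('/health', function(req, res) {
  var delayMs = loopDelay.mean / 1e6;
  var blocked = delayMs > 1000; // hypothetical threshold
  res.status(blocked ? 503 : 200).json({
    status: blocked ? 'unhealthy' : 'healthy',
    eventLoopDelay: Math.round(delayMs) + 'ms',
    uptime: process.uptime()
  });
});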
Deep Health Check
A deep check verifies all critical dependencies — database, cache, external services.
var pg = require('pg');
var redis = require('redis');
var pool = new pg.Pool({
connectionString: process.env.DATABASE_URL
});
var redisClient = redis.createClient({
url: process.env.REDIS_URL
});
redisClient.connect();
app.get('/health/ready', function(req, res) {
var checks = {
database: 'unknown',
redis: 'unknown',
memory: 'unknown'
};
var healthy = true;
// Check database
pool.query('SELECT 1', function(dbErr) {
checks.database = dbErr ? 'unhealthy' : 'healthy';
if (dbErr) healthy = false;
// Check Redis
redisClient.ping().then(function() {
checks.redis = 'healthy';
}).catch(function(redisErr) {
checks.redis = 'unhealthy';
healthy = false;
}).finally(function() {
// Check memory
var memUsage = process.memoryUsage();
var heapUsedMB = Math.round(memUsage.heapUsed / 1024 / 1024);
var heapTotalMB = Math.round(memUsage.heapTotal / 1024 / 1024);
var heapPercent = Math.round((memUsage.heapUsed / memUsage.heapTotal) * 100);
checks.memory = heapPercent > 90 ? 'warning' : 'healthy';
if (heapPercent > 95) {
checks.memory = 'unhealthy';
healthy = false;
}
var statusCode = healthy ? 200 : 503;
res.status(statusCode).json({
status: healthy ? 'healthy' : 'unhealthy',
checks: checks,
memory: {
heapUsed: heapUsedMB + 'MB',
heapTotal: heapTotalMB + 'MB',
heapPercent: heapPercent + '%'
},
uptime: Math.round(process.uptime()) + 's',
timestamp: new Date().toISOString()
});
});
});
});
Response when healthy:
{
"status": "healthy",
"checks": {
"database": "healthy",
"redis": "healthy",
"memory": "healthy"
},
"memory": {
"heapUsed": "45MB",
"heapTotal": "78MB",
"heapPercent": "57%"
},
"uptime": "3842s",
"timestamp": "2026-02-13T14:30:00.000Z"
}
Response when unhealthy (HTTP 503):
{
"status": "unhealthy",
"checks": {
"database": "unhealthy",
"redis": "healthy",
"memory": "healthy"
},
"memory": {
"heapUsed": "52MB",
"heapTotal": "78MB",
"heapPercent": "66%"
},
"uptime": "3842s",
"timestamp": "2026-02-13T14:30:00.000Z"
}
Health Check Module
Extract health checks into a reusable module.
// health/index.js
var os = require('os');
function HealthChecker(options) {
this.checks = {};
this.timeout = (options && options.timeout) || 5000;
}
HealthChecker.prototype.addCheck = function(name, checkFn) {
this.checks[name] = checkFn;
};
HealthChecker.prototype.run = function(callback) {
var results = {};
var healthy = true;
var checkNames = Object.keys(this.checks);
var completed = 0;
var finished = false;
var timeout = this.timeout;
if (checkNames.length === 0) {
return callback(null, { status: 'healthy', checks: {} });
}
// If the timeout fires first, report unhealthy once and ignore late check results
var timer = setTimeout(function() {
if (!finished) {
finished = true;
callback(null, {
status: 'unhealthy',
checks: results,
error: 'Health check timed out after ' + timeout + 'ms'
});
}
}, timeout);
checkNames.forEach(function(name) {
var checkFn = this.checks[name];
var startTime = Date.now();
checkFn(function(err) {
var duration = Date.now() - startTime;
results[name] = {
status: err ? 'unhealthy' : 'healthy',
duration: duration + 'ms'
};
if (err) {
results[name].error = err.message;
healthy = false;
}
completed++;
if (completed === checkNames.length && !finished) {
finished = true;
clearTimeout(timer);
callback(null, {
status: healthy ? 'healthy' : 'unhealthy',
checks: results,
system: {
uptime: Math.round(process.uptime()) + 's',
memory: {
used: Math.round(process.memoryUsage().heapUsed / 1024 / 1024) + 'MB',
total: Math.round(os.totalmem() / 1024 / 1024) + 'MB'
},
cpu: os.loadavg()
},
timestamp: new Date().toISOString()
});
}
});
}.bind(this));
};
module.exports = HealthChecker;
// app.js
var express = require('express');
var pg = require('pg');
var redis = require('redis');
var HealthChecker = require('./health');
var app = express();
var pool = new pg.Pool({ connectionString: process.env.DATABASE_URL });
var redisClient = redis.createClient({ url: process.env.REDIS_URL });
redisClient.connect();
var healthChecker = new HealthChecker({ timeout: 5000 });
healthChecker.addCheck('database', function(callback) {
pool.query('SELECT 1', function(err) {
callback(err);
});
});
healthChecker.addCheck('redis', function(callback) {
redisClient.ping().then(function() {
callback(null);
}).catch(function(err) {
callback(err);
});
});
// Shallow check for liveness
app.get('/health', function(req, res) {
res.status(200).json({ status: 'healthy' });
});
// Deep check for readiness
app.get('/health/ready', function(req, res) {
healthChecker.run(function(err, result) {
var statusCode = result.status === 'healthy' ? 200 : 503;
res.status(statusCode).json(result);
});
});
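The heap check from the earlier deep endpoint fits the same pattern; here it is registered as another check, using the same 95% threshold:
healthChecker.addCheck('memory', function(callback) {
  var memUsage = process.memoryUsage();
  var heapPercent = Math.round((memUsage.heapUsed / memUsage.heapTotal) * 100);
  // Fail the check above the 95% threshold used in the deep check earlier
  if (heapPercent > 95) {
    return callback(new Error('Heap usage at ' + heapPercent + '%'));
  }
  callback(null);
});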
Docker Compose Health Checks
Use health checks with depends_on to orchestrate startup order.
services:
api:
build: .
ports:
- "3000:3000"
environment:
- DATABASE_URL=postgresql://appuser:secret@postgres:5432/myapp
- REDIS_URL=redis://redis:6379
depends_on:
postgres:
condition: service_healthy
redis:
condition: service_healthy
healthcheck:
test: ["CMD", "wget", "--no-verbose", "--tries=1", "--spider", "http://localhost:3000/health"]
interval: 15s
timeout: 5s
start_period: 30s
retries: 3
postgres:
image: postgres:16-alpine
environment:
POSTGRES_USER: appuser
POSTGRES_PASSWORD: secret
POSTGRES_DB: myapp
healthcheck:
test: ["CMD-SHELL", "pg_isready -U appuser -d myapp"]
interval: 5s
timeout: 5s
retries: 5
redis:
image: redis:7-alpine
healthcheck:
test: ["CMD", "redis-cli", "ping"]
interval: 5s
timeout: 3s
retries: 5
The condition: service_healthy ensures the API container does not start until PostgreSQL and Redis are both healthy. This eliminates the "connection refused on startup" race condition.
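To watch the ordering, bring the stack up and check the status column; the commands assume the compose file above:
docker compose up -d
# The api container is not started until postgres and redis report healthy
docker compose ps
# The STATUS column shows (healthy), (unhealthy), or (health: starting) for each service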
Kubernetes Liveness Probes
Liveness probes determine if a container should be restarted. If the probe fails, Kubernetes kills the container and creates a new one.
apiVersion: apps/v1
kind: Deployment
metadata:
name: api
spec:
replicas: 3
selector:
matchLabels:
app: api
template:
metadata:
labels:
app: api
spec:
containers:
- name: api
image: myapp:latest
ports:
- containerPort: 3000
livenessProbe:
httpGet:
path: /health
port: 3000
initialDelaySeconds: 15
periodSeconds: 20
timeoutSeconds: 5
failureThreshold: 3
successThreshold: 1
The liveness probe should be a shallow check. Do not include database connectivity in liveness probes — if the database is down, restarting your API container will not fix the database. It will just create a restart loop.
Probe Types
# HTTP GET probe
livenessProbe:
httpGet:
path: /health
port: 3000
httpHeaders:
- name: X-Health-Check
value: kubernetes
# TCP socket probe
livenessProbe:
tcpSocket:
port: 3000
# Command execution probe
livenessProbe:
exec:
command:
- node
- -e
- "var http = require('http'); http.get('http://localhost:3000/health', function(r) { process.exit(r.statusCode === 200 ? 0 : 1); }).on('error', function() { process.exit(1); });"
HTTP GET is the most common for web applications. TCP socket is lighter but only checks if the port is open. Exec runs a command inside the container.
Kubernetes Readiness Probes
Readiness probes determine if a container should receive traffic. A container that fails readiness is removed from the Service's endpoint list but is NOT restarted.
readinessProbe:
httpGet:
path: /health/ready
port: 3000
initialDelaySeconds: 5
periodSeconds: 10
timeoutSeconds: 5
failureThreshold: 3
successThreshold: 1
The readiness probe should be a deep check that verifies database connectivity and critical dependencies. When the database goes down:
- Readiness probe fails → pod removed from Service endpoints
- No new traffic routes to this pod
- Existing connections drain
- When the database recovers, readiness probe passes → pod re-added to endpoints
This prevents users from hitting pods that cannot serve requests.
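You can watch this happen during an outage by following the Service's endpoints (the Service name api matches the manifests later in this guide):
# Pod IPs disappear from the list when readiness fails and return when it passes
kubectl get endpoints api --watch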
Startup Probes
Startup probes handle slow-starting containers. While the startup probe is running, liveness and readiness probes are disabled.
startupProbe:
httpGet:
path: /health
port: 3000
initialDelaySeconds: 5
periodSeconds: 5
failureThreshold: 30 # 30 * 5s = 150s maximum startup time
Use startup probes when your application needs time to:
- Load large models or datasets
- Run database migrations
- Warm caches
- Establish connection pools
Without a startup probe, slow-starting apps get killed by the liveness probe before they finish initializing.
// Track startup completion
var isReady = false;
function initialize(callback) {
console.log('Starting initialization...');
// Run migrations
console.log('Running migrations...');
// ...
// Warm cache
console.log('Warming cache...');
// ...
// Establish connection pools
console.log('Connecting to database...');
pool.query('SELECT 1', function(err) {
if (err) return callback(err);
console.log('Initialization complete');
isReady = true;
callback(null);
});
}
app.get('/health', function(req, res) {
// Startup and liveness - just check if process is responding
res.status(200).json({ status: 'alive' });
});
app.get('/health/ready', function(req, res) {
if (!isReady) {
return res.status(503).json({ status: 'not ready', reason: 'initializing' });
}
healthChecker.run(function(err, result) {
var statusCode = result.status === 'healthy' ? 200 : 503;
res.status(statusCode).json(result);
});
});
initialize(function(err) {
if (err) {
console.error('Initialization failed:', err.message);
process.exit(1);
}
});
Graceful Shutdown
Health checks work hand-in-hand with graceful shutdown. When Kubernetes terminates a pod, it sends SIGTERM and waits for terminationGracePeriodSeconds (default: 30 seconds) before sending SIGKILL.
var http = require('http');
var app = require('./app');
var server = http.createServer(app);
var isShuttingDown = false;
server.listen(3000, function() {
console.log('Server listening on port 3000');
});
function shutdown(signal) {
console.log(signal + ' received. Starting graceful shutdown...');
isShuttingDown = true;
// Stop accepting new connections
server.close(function() {
console.log('HTTP server closed');
// Close database connections
pool.end(function() {
console.log('Database pool closed');
// Close Redis connection
redisClient.quit().then(function() {
console.log('Redis connection closed');
console.log('Graceful shutdown complete');
process.exit(0);
});
});
});
// Force exit if graceful shutdown takes too long
setTimeout(function() {
console.error('Forced shutdown after timeout');
process.exit(1);
}, 25000); // Leave 5s buffer before SIGKILL
}
process.on('SIGTERM', function() { shutdown('SIGTERM'); });
process.on('SIGINT', function() { shutdown('SIGINT'); });
// Middleware to reject requests during shutdown.
// Register it before your routes (as in the complete example below) so it runs for every request.
app.use(function(req, res, next) {
if (isShuttingDown) {
res.set('Connection', 'close');
return res.status(503).json({ error: 'Server is shutting down' });
}
next();
});
The shutdown sequence:
- SIGTERM received
- Set isShuttingDown = true → health checks fail → no new traffic is routed
- server.close() waits for in-flight requests to complete
- Close database and cache connections
- Exit process
# Kubernetes deployment with graceful shutdown
spec:
terminationGracePeriodSeconds: 30
containers:
- name: api
lifecycle:
preStop:
exec:
command: ["/bin/sh", "-c", "sleep 5"]
The preStop hook with sleep 5 gives the Kubernetes networking layer time to update iptables rules and remove the pod from the Service before the application starts rejecting requests. Without this, some requests may route to a terminating pod.
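You can exercise the same shutdown path locally, since docker stop also sends SIGTERM and then SIGKILL after a timeout (the container name myapp is illustrative):
# -t sets how long Docker waits after SIGTERM before sending SIGKILL (default 10s)
docker stop -t 30 myapp
docker logs myapp
# The log should end with the graceful shutdown messages from the handler above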
Complete Working Example
# k8s/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: api
labels:
app: api
spec:
replicas: 3
selector:
matchLabels:
app: api
template:
metadata:
labels:
app: api
spec:
terminationGracePeriodSeconds: 30
containers:
- name: api
image: myapp:latest
ports:
- containerPort: 3000
env:
- name: DATABASE_URL
valueFrom:
secretKeyRef:
name: api-secrets
key: database-url
- name: REDIS_URL
valueFrom:
secretKeyRef:
name: api-secrets
key: redis-url
resources:
requests:
memory: "128Mi"
cpu: "100m"
limits:
memory: "256Mi"
cpu: "500m"
startupProbe:
httpGet:
path: /health
port: 3000
initialDelaySeconds: 5
periodSeconds: 5
failureThreshold: 30
livenessProbe:
httpGet:
path: /health
port: 3000
periodSeconds: 20
timeoutSeconds: 5
failureThreshold: 3
readinessProbe:
httpGet:
path: /health/ready
port: 3000
periodSeconds: 10
timeoutSeconds: 5
failureThreshold: 3
lifecycle:
preStop:
exec:
command: ["/bin/sh", "-c", "sleep 5"]
---
apiVersion: v1
kind: Service
metadata:
name: api
spec:
selector:
app: api
ports:
- port: 80
targetPort: 3000
type: ClusterIP
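# Dockerfile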
FROM node:20-alpine AS production
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY . .
EXPOSE 3000
HEALTHCHECK --interval=30s --timeout=5s --start-period=15s --retries=3 \
CMD wget --no-verbose --tries=1 --spider http://localhost:3000/health || exit 1
USER node
CMD ["node", "app.js"]
// app.js - Complete application with health checks and graceful shutdown
var express = require('express');
var http = require('http');
var pg = require('pg');
var redis = require('redis');
var HealthChecker = require('./health');
var app = express();
var isShuttingDown = false;
var isReady = false;
app.use(express.json());
// Reject requests during shutdown
app.use(function(req, res, next) {
if (isShuttingDown) {
res.set('Connection', 'close');
return res.status(503).json({ error: 'Server is shutting down' });
}
next();
});
// Database
var pool = new pg.Pool({
connectionString: process.env.DATABASE_URL,
max: 20,
idleTimeoutMillis: 30000,
connectionTimeoutMillis: 5000
});
// Redis
var redisClient = redis.createClient({ url: process.env.REDIS_URL });
redisClient.on('error', function(err) {
console.error('Redis error:', err.message);
});
redisClient.connect();
// Health checker
var healthChecker = new HealthChecker({ timeout: 5000 });
healthChecker.addCheck('database', function(callback) {
pool.query('SELECT 1', function(err) { callback(err); });
});
healthChecker.addCheck('redis', function(callback) {
redisClient.ping().then(function() { callback(null); })
.catch(function(err) { callback(err); });
});
// Health endpoints
app.get('/health', function(req, res) {
res.status(200).json({ status: 'alive', uptime: process.uptime() });
});
app.get('/health/ready', function(req, res) {
if (!isReady) {
return res.status(503).json({ status: 'initializing' });
}
healthChecker.run(function(err, result) {
res.status(result.status === 'healthy' ? 200 : 503).json(result);
});
});
// Application routes
app.get('/api/status', function(req, res) {
res.json({ version: '1.0.0', ready: isReady });
});
// Initialize
function initialize(callback) {
pool.query('SELECT NOW()', function(err) {
if (err) return callback(err);
console.log('Database connected');
isReady = true;
callback(null);
});
}
// Start server
var server = http.createServer(app);
server.listen(3000, function() {
console.log('Server listening on port 3000');
initialize(function(err) {
if (err) {
console.error('Initialization failed:', err.message);
process.exit(1);
}
console.log('Application ready');
});
});
// Graceful shutdown
function shutdown(signal) {
console.log(signal + ' received');
isShuttingDown = true;
server.close(function() {
console.log('HTTP server closed');
pool.end(function() {
redisClient.quit().then(function() {
console.log('Shutdown complete');
process.exit(0);
});
});
});
setTimeout(function() {
console.error('Forced shutdown');
process.exit(1);
}, 25000);
}
process.on('SIGTERM', function() { shutdown('SIGTERM'); });
process.on('SIGINT', function() { shutdown('SIGINT'); });
Common Issues and Troubleshooting
1. Container Stuck in CrashLoopBackOff
NAME READY STATUS RESTARTS AGE
api-1 0/1 CrashLoopBackOff 5 3m
The liveness probe is killing the container before it finishes starting. Add a startup probe or increase initialDelaySeconds:
startupProbe:
httpGet:
path: /health
port: 3000
failureThreshold: 30
periodSeconds: 5
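To confirm the liveness probe caused the restarts, check the pod's recent events:
kubectl describe pod api-1 | grep -A10 Events
# Look for "Liveness probe failed" entries followed by container restarts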
2. Pod Not Receiving Traffic
kubectl get endpoints api
# NAME ENDPOINTS
# api <none>
No endpoints means all pods are failing readiness probes. Check the probe:
kubectl describe pod api-abc123 | grep -A10 Readiness
# Readiness probe failed: HTTP probe failed with statuscode: 503
kubectl logs api-abc123
# Error: connect ECONNREFUSED 10.96.0.5:5432
The database is unreachable. Fix the database connection, not the health check.
3. Health Check Timeouts
Warning Unhealthy Pod/api-1 Liveness probe failed: Get "http://10.244.0.5:3000/health": context deadline exceeded
The health check endpoint is too slow. Common causes:
- Health check queries a slow database view
- Network latency between nodes
- Container is CPU-starved
Fix: increase timeoutSeconds, simplify the liveness check, or increase CPU limits.
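For example, a less aggressive liveness configuration (the values are illustrative):
livenessProbe:
  httpGet:
    path: /health
    port: 3000
  periodSeconds: 20
  timeoutSeconds: 10   # raised from 5s to tolerate brief slowdowns
  failureThreshold: 3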
4. Docker Health Check Shows "unhealthy" but App Works
docker ps
# STATUS: Up 5 min (unhealthy)
curl http://localhost:3000/health
# {"status":"healthy"}
The health check command inside the container cannot reach the endpoint. Common causes:
- wget or curl not installed in the image
- Health check URL is wrong
- The app binds to 127.0.0.1 but the health check uses a different interface
Fix: exec into the container and run the health check manually to debug:
docker exec -it myapp sh
wget --spider http://localhost:3000/health
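If the bind address is the problem, listen on all interfaces instead of only the loopback; a minimal sketch:
// Binding to 0.0.0.0 makes the app reachable on every interface inside the container
server.listen(3000, '0.0.0.0', function() {
  console.log('Server listening on 0.0.0.0:3000');
});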
Best Practices
- Separate liveness from readiness. Liveness checks the process, readiness checks dependencies. Mixing them causes unnecessary restarts.
- Keep liveness probes fast and simple. A 200 OK from /health is sufficient. Never include database queries in liveness probes.
- Use startup probes for slow-starting apps. They prevent premature kills during initialization without requiring inflated initialDelaySeconds.
- Always implement graceful shutdown. Handle SIGTERM, drain connections, and close database pools. Set a forced exit timeout shorter than terminationGracePeriodSeconds.
- Add a preStop sleep in Kubernetes. A 5-second delay gives the network time to update before your app starts rejecting requests.
- Return structured JSON from health endpoints. Include check names, durations, and error details. This makes debugging production issues dramatically easier.
- Set appropriate timeouts. Health check timeouts should be shorter than the probe interval. A 5-second timeout with a 10-second interval means you detect issues within 15 seconds.
- Do not expose health endpoints publicly. They reveal internal architecture details. Restrict them to internal networks or use authentication.