Node.js Clustering for Multi-Core Systems
A practical guide to Node.js clustering for multi-core utilization, covering the cluster module, Express.js clustering, graceful restarts, sticky sessions, worker health monitoring, and PM2 comparison.
Node.js runs on a single thread by default, which means a 16-core production server is sitting at 6% utilization under load while your application chokes on requests. The built-in cluster module solves this by forking multiple worker processes that share the same server port, distributing incoming connections across all available CPU cores. If you are running Node.js in production on anything larger than a single-core VM and you are not clustering, you are leaving performance on the table.
Prerequisites
- Node.js v16+ installed (v18+ recommended for production)
- Basic familiarity with Express.js
- Redis installed locally or accessible (for session sharing)
- npm packages: express, ioredis, connect-redis, express-session, autocannon
npm install express ioredis connect-redis express-session
npm install -g autocannon
Why Node.js Is Single-Threaded
Node.js is built on V8 and libuv. V8 compiles and executes JavaScript on a single thread. Libuv provides an event loop and a thread pool for I/O operations (file system, DNS, some crypto), but your JavaScript code — your route handlers, your business logic, your JSON serialization — all executes on one thread.
This means one CPU core handles all of your application logic. If you have an 8-core machine, seven cores sit idle while your single-threaded process handles every request.
The event loop model is brilliant for I/O-bound workloads. A single thread can handle tens of thousands of concurrent connections because it never blocks waiting for a database response or a file read. But the moment you have CPU-intensive work — parsing large JSON payloads, image processing, heavy computation, even just handling high request throughput — that single thread becomes your bottleneck.
Here is a simple demonstration of the problem:
var http = require("http");
var os = require("os");
var server = http.createServer(function (req, res) {
// Simulate CPU-bound work
var sum = 0;
for (var i = 0; i < 1e7; i++) {
sum += i;
}
res.writeHead(200);
res.end("Process " + process.pid + " handled this request on core count: " + os.cpus().length);
});
server.listen(3000, function () {
console.log("Server running on PID " + process.pid);
console.log("Available CPU cores: " + os.cpus().length);
});
Run this on an 8-core machine and load test it — you will see one core pinned at 100% while the other seven are idle.
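To see it for yourself, start the server above, then drive load from a second terminal using the autocannon install from the prerequisites while watching per-core usage (for example with top, pressing 1 on Linux, or htop):
autocannon -c 100 -d 10 http://localhost:3000/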
The Cluster Module Basics
The cluster module is built into Node.js. No npm install required. It works on a master/worker pattern: one process acts as the master (or "primary" in newer Node.js terminology) and forks child processes called workers. Each worker is a full copy of your application, running in its own V8 isolate with its own memory space.
The key insight is that all workers can share the same server port. By default on Linux and macOS, the primary process accepts incoming connections and hands them to workers in round-robin fashion; on Windows, the primary instead passes the listening socket to the workers and lets the operating system decide which worker accepts each connection.
var cluster = require("cluster");
var http = require("http");
var os = require("os");
var numCPUs = os.cpus().length;
if (cluster.isMaster) {
console.log("Master process " + process.pid + " is running");
console.log("Forking " + numCPUs + " workers...");
for (var i = 0; i < numCPUs; i++) {
cluster.fork();
}
cluster.on("exit", function (worker, code, signal) {
console.log("Worker " + worker.process.pid + " died (code: " + code + ", signal: " + signal + ")");
console.log("Forking a replacement worker...");
cluster.fork();
});
} else {
http.createServer(function (req, res) {
res.writeHead(200);
res.end("Handled by worker " + process.pid + "\n");
}).listen(3000);
console.log("Worker " + process.pid + " started");
}
Running this on an 8-core machine produces:
Master process 12345 is running
Forking 8 workers...
Worker 12346 started
Worker 12347 started
Worker 12348 started
Worker 12349 started
Worker 12350 started
Worker 12351 started
Worker 12352 started
Worker 12353 started
The master process does not handle HTTP requests. Its sole job is to manage the lifecycle of workers — fork them, monitor them, and replace them when they crash. The workers handle all incoming traffic.
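The distribution behavior is configurable. A minimal sketch of switching the scheduling policy (this must happen before the first fork() call):
var cluster = require("cluster");

// SCHED_RR: the primary accepts connections and hands them to workers round-robin
// (the default on every platform except Windows). SCHED_NONE: workers accept
// connections themselves and the OS decides who gets each one.
cluster.schedulingPolicy = cluster.SCHED_RR; // or cluster.SCHED_NONE

// The same switch is available without a code change:
//   NODE_CLUSTER_SCHED_POLICY=rr node server.js
//   NODE_CLUSTER_SCHED_POLICY=none node server.js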
Building a Clustered Express.js Server
In real applications, you are not using raw http.createServer. You are using Express. The pattern stays the same, but you split your code into two concerns: cluster management in the entry point, and application logic in a separate module.
app.js — the Express application:
var express = require("express");
function createApp() {
var app = express();
app.get("/", function (req, res) {
res.json({
pid: process.pid,
uptime: process.uptime(),
memory: Math.round(process.memoryUsage().rss / 1024 / 1024) + "MB"
});
});
app.get("/health", function (req, res) {
res.status(200).json({ status: "ok", pid: process.pid });
});
app.get("/heavy", function (req, res) {
// Simulate CPU-bound work
var result = 0;
for (var i = 0; i < 5e7; i++) {
result += Math.sqrt(i);
}
res.json({ pid: process.pid, result: result });
});
return app;
}
module.exports = createApp;
server.js — the cluster manager:
var cluster = require("cluster");
var os = require("os");
var numCPUs = os.cpus().length;
var PORT = process.env.PORT || 3000;
if (cluster.isMaster) {
console.log("Master " + process.pid + " starting " + numCPUs + " workers on port " + PORT);
for (var i = 0; i < numCPUs; i++) {
cluster.fork();
}
cluster.on("exit", function (worker, code, signal) {
if (!worker.exitedAfterDisconnect) {
console.log("Worker " + worker.process.pid + " crashed. Restarting...");
cluster.fork();
}
});
cluster.on("online", function (worker) {
console.log("Worker " + worker.process.pid + " is online");
});
} else {
var createApp = require("./app");
var app = createApp();
app.listen(PORT, function () {
console.log("Worker " + process.pid + " listening on port " + PORT);
});
}
The createApp factory function pattern is important. It ensures each worker gets a fresh Express instance. If you export an already-constructed app, require caches that single instance and shares it everywhere it is imported, which invites shared-state bugs, especially during testing.
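To illustrate, here is a minimal test sketch assuming supertest (which is not in this article's dependency list): the factory gives each test its own isolated app instance without ever touching the cluster code.
var request = require("supertest");
var createApp = require("./app");

// Each call to createApp() returns a fresh Express instance with no shared state
request(createApp())
  .get("/health")
  .expect(200)
  .then(function (res) {
    console.log("health check served by PID " + res.body.pid);
  });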
Graceful Restart and Zero-Downtime Reloads
The naive approach of killing all workers and starting new ones creates a window where no workers are available to handle requests. Zero-downtime restart means you replace workers one at a time, waiting for each new worker to come online before killing the next old one.
var cluster = require("cluster");
var os = require("os");
var numCPUs = os.cpus().length;
var workers = [];
if (cluster.isMaster) {
// Fork initial workers
for (var i = 0; i < numCPUs; i++) {
workers.push(cluster.fork());
}
// Zero-downtime restart on SIGUSR2
process.on("SIGUSR2", function () {
console.log("Received SIGUSR2 - starting zero-downtime restart");
var workerIds = Object.keys(cluster.workers);
var index = 0;
function restartNext() {
if (index >= workerIds.length) {
console.log("All workers restarted");
return;
}
var id = workerIds[index];
var worker = cluster.workers[id];
if (!worker) {
index++;
restartNext();
return;
}
console.log("Restarting worker " + worker.process.pid);
// Fork a replacement first
var newWorker = cluster.fork();
newWorker.on("listening", function () {
console.log("New worker " + newWorker.process.pid + " is listening. Disconnecting old worker " + worker.process.pid);
worker.disconnect();
// Force kill if worker does not exit within 5 seconds
var killTimer = setTimeout(function () {
console.log("Force killing worker " + worker.process.pid);
worker.kill();
}, 5000);
worker.on("exit", function () {
clearTimeout(killTimer);
index++;
restartNext();
});
});
}
restartNext();
});
cluster.on("exit", function (worker, code, signal) {
if (!worker.exitedAfterDisconnect) {
console.log("Worker " + worker.process.pid + " crashed unexpectedly. Forking replacement...");
cluster.fork();
}
});
}
Trigger a zero-downtime restart by sending SIGUSR2 to the master process:
# Find the master PID
ps aux | grep "node server.js" | grep -v grep
# Send restart signal
kill -SIGUSR2 12345
The output during restart looks like this:
Received SIGUSR2 - starting zero-downtime restart
Restarting worker 12346
New worker 12354 is listening. Disconnecting old worker 12346
Restarting worker 12347
New worker 12355 is listening. Disconnecting old worker 12347
Restarting worker 12348
New worker 12356 is listening. Disconnecting old worker 12348
All workers restarted
At no point during this sequence are zero workers available. The new worker starts listening before the old one disconnects.
Sticky Sessions for Stateful Connections
The default round-robin distribution works for stateless REST APIs. But if you are using WebSockets, long polling, or server-side sessions stored in memory, you have a problem: a client's second request might land on a different worker than their first, and that worker has no knowledge of the session.
Sticky sessions ensure that all requests from the same client are routed to the same worker. The @socket.io/sticky package handles this for Socket.IO, but for generic HTTP sticky sessions, you can implement IP-based routing in the master process:
var cluster = require("cluster");
var net = require("net");
var os = require("os");
var numCPUs = os.cpus().length;
var PORT = 3000;
if (cluster.isMaster) {
var workers = [];
for (var i = 0; i < numCPUs; i++) {
workers.push(cluster.fork());
}
// Create a raw TCP server for sticky session routing
var server = net.createServer({ pauseOnConnect: true }, function (connection) {
var remoteAddress = connection.remoteAddress || "";
var workerIndex = hashIP(remoteAddress) % workers.length;
var worker = workers[workerIndex];
worker.send("sticky-session:connection", connection);
});
server.listen(PORT, function () {
console.log("Master listening on port " + PORT + " with sticky sessions");
});
} else {
var createApp = require("./app");
var app = createApp();
var http = require("http");
var server = http.createServer(app);
// Bind to a random localhost-only port so the worker is not reachable directly;
// real client connections arrive from the master via IPC
server.listen(0, "localhost");
process.on("message", function (message, connection) {
if (message === "sticky-session:connection") {
server.emit("connection", connection);
connection.resume();
}
});
}
function hashIP(ip) {
var hash = 0;
for (var i = 0; i < ip.length; i++) {
var char = ip.charCodeAt(i);
hash = ((hash << 5) - hash) + char;
hash = hash & hash; // Convert to 32-bit integer
}
return Math.abs(hash);
}
The trade-off with sticky sessions is that you lose even distribution. If many clients share the same IP (corporate NAT, for example), one worker can become overloaded while others sit idle. This is why the better approach for most applications is to externalize state entirely.
Shared State with Redis
The cleanest solution to cross-worker state is to stop storing state in worker memory. Use Redis as a shared session store and cache layer that all workers can access:
var express = require("express");
var session = require("express-session");
var RedisStore = require("connect-redis").default;
var Redis = require("ioredis");
function createApp() {
var app = express();
var redisClient = new Redis({
host: process.env.REDIS_HOST || "127.0.0.1",
port: process.env.REDIS_PORT || 6379,
retryDelayOnFailover: 100,
maxRetriesPerRequest: 3
});
redisClient.on("error", function (err) {
console.error("Redis connection error:", err.message);
});
redisClient.on("connect", function () {
console.log("Worker " + process.pid + " connected to Redis");
});
var store = new RedisStore({ client: redisClient });
app.use(session({
store: store,
secret: process.env.SESSION_SECRET || "your-secret-key",
resave: false,
saveUninitialized: false,
cookie: {
secure: process.env.NODE_ENV === "production",
httpOnly: true,
maxAge: 1000 * 60 * 60 * 24 // 24 hours
}
}));
app.get("/login", function (req, res) {
req.session.userId = "user_12345";
req.session.loginTime = Date.now();
res.json({ message: "Logged in", worker: process.pid });
});
app.get("/profile", function (req, res) {
if (!req.session.userId) {
return res.status(401).json({ error: "Not authenticated" });
}
res.json({
userId: req.session.userId,
loginTime: req.session.loginTime,
servedBy: process.pid
});
});
return app;
}
module.exports = createApp;
With Redis-backed sessions, it does not matter which worker handles a given request. The session data is stored externally and accessible from any worker. This pattern also survives worker restarts — session data persists even when workers are cycled.
The performance overhead is minimal. A Redis session lookup on localhost adds roughly 0.1-0.3ms per request. Across a network, it is typically 0.5-2ms. This is negligible compared to any real route handler logic.
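If you want to check that number in your own environment, a rough probe against the same ioredis client is enough; the key name below is arbitrary and does not need to exist:
var Redis = require("ioredis");
var redis = new Redis({ host: process.env.REDIS_HOST || "127.0.0.1" });

// Issue sequential GETs and report the average round-trip time
async function probe(samples) {
  var start = process.hrtime.bigint();
  for (var i = 0; i < samples; i++) {
    await redis.get("session:probe");
  }
  var elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
  console.log("avg GET latency: " + (elapsedMs / samples).toFixed(3) + "ms over " + samples + " samples");
  redis.quit();
}

probe(1000).catch(function (err) {
  console.error("Redis probe failed: " + err.message);
  process.exit(1);
});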
Worker Communication via IPC
Workers in a cluster do not share memory. Each worker has its own V8 heap. If you need workers to communicate — broadcasting a cache invalidation event, coordinating a distributed operation, collecting metrics — you use Inter-Process Communication (IPC) through the master process.
var cluster = require("cluster");
var os = require("os");
if (cluster.isMaster) {
var numCPUs = os.cpus().length;
for (var i = 0; i < numCPUs; i++) {
var worker = cluster.fork();
worker.on("message", function (msg) {
if (msg.type === "broadcast") {
// Forward to all other workers
Object.keys(cluster.workers).forEach(function (id) {
if (cluster.workers[id].process.pid !== msg.fromPid) {
cluster.workers[id].send({
type: msg.type,
data: msg.data,
fromPid: msg.fromPid
});
}
});
}
if (msg.type === "metrics") {
// Collect metrics from workers
console.log("Worker " + msg.fromPid + " metrics:", JSON.stringify(msg.data));
}
});
}
// Send config updates to all workers
setInterval(function () {
Object.keys(cluster.workers).forEach(function (id) {
cluster.workers[id].send({
type: "config-update",
data: { featureFlag: true, maxRequestSize: "10mb" }
});
});
}, 60000);
} else {
// Worker side
var requestCount = 0;
process.on("message", function (msg) {
if (msg.type === "broadcast") {
console.log("Worker " + process.pid + " received broadcast from " + msg.fromPid + ": " + JSON.stringify(msg.data));
}
if (msg.type === "config-update") {
console.log("Worker " + process.pid + " received config update: " + JSON.stringify(msg.data));
}
});
// Send metrics to master every 30 seconds
setInterval(function () {
process.send({
type: "metrics",
fromPid: process.pid,
data: {
requestCount: requestCount,
memory: process.memoryUsage().rss,
uptime: process.uptime()
}
});
}, 30000);
// Broadcast a cache invalidation to other workers
function invalidateCache(key) {
process.send({
type: "broadcast",
fromPid: process.pid,
data: { action: "invalidate-cache", key: key }
});
}
}
IPC is fast — it uses Unix domain sockets on Linux/macOS and named pipes on Windows. Message serialization is the main cost. Keep IPC messages small and avoid sending them in hot loops.
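To get a feel for the cost, a rough ping-pong round trip between the master and a single worker (run as its own script) is enough:
var cluster = require("cluster");

if (cluster.isMaster) {
  var worker = cluster.fork();
  worker.on("message", function (msg) {
    if (msg.type === "pong") {
      console.log("IPC round trip took " + (Date.now() - msg.sentAt) + "ms");
      worker.kill();
    }
  });
  worker.on("online", function () {
    worker.send({ type: "ping", sentAt: Date.now() });
  });
} else {
  process.on("message", function (msg) {
    if (msg.type === "ping") {
      // Echo the original timestamp back so the master can measure the full round trip
      process.send({ type: "pong", sentAt: msg.sentAt });
    }
  });
}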
Monitoring Worker Health and Auto-Restart
A production cluster needs more than just restarting crashed workers. You need to detect workers that are alive but unhealthy — memory leaks, event loop starvation, or deadlocked I/O.
var cluster = require("cluster");
var os = require("os");
var WORKER_TIMEOUT = 30000; // 30 seconds without a heartbeat = dead
var HEARTBEAT_INTERVAL = 10000; // Workers send heartbeats every 10 seconds
var MAX_MEMORY_MB = 512; // Kill workers exceeding 512MB RSS
if (cluster.isMaster) {
var workerHealth = {};
for (var i = 0; i < os.cpus().length; i++) {
spawnWorker();
}
function spawnWorker() {
var worker = cluster.fork();
workerHealth[worker.id] = {
lastHeartbeat: Date.now(),
pid: worker.process.pid
};
worker.on("message", function (msg) {
if (msg.type === "heartbeat") {
workerHealth[worker.id] = {
lastHeartbeat: Date.now(),
pid: worker.process.pid,
memory: msg.memory,
eventLoopLag: msg.eventLoopLag,
activeRequests: msg.activeRequests
};
}
});
}
// Health check loop
setInterval(function () {
var now = Date.now();
Object.keys(cluster.workers).forEach(function (id) {
var worker = cluster.workers[id];
var health = workerHealth[id];
if (!health) return;
// Check heartbeat timeout
var timeSinceHeartbeat = now - health.lastHeartbeat;
if (timeSinceHeartbeat > WORKER_TIMEOUT) {
console.error("Worker " + health.pid + " missed heartbeat for " + Math.round(timeSinceHeartbeat / 1000) + "s. Killing...");
worker.kill("SIGKILL");
delete workerHealth[id];
return;
}
// Check memory usage
if (health.memory && health.memory > MAX_MEMORY_MB * 1024 * 1024) {
console.error("Worker " + health.pid + " exceeded memory limit (" + Math.round(health.memory / 1024 / 1024) + "MB). Restarting...");
worker.disconnect();
setTimeout(function () {
if (!worker.isDead()) {
worker.kill("SIGKILL");
}
}, 5000);
delete workerHealth[id];
// disconnect() sets exitedAfterDisconnect, so the exit handler below will not respawn this worker
spawnWorker();
}
// Check event loop lag
if (health.eventLoopLag && health.eventLoopLag > 1000) {
console.warn("Worker " + health.pid + " event loop lag: " + health.eventLoopLag + "ms");
}
});
// Log cluster status
var totalMemory = 0;
var workerCount = 0;
Object.keys(workerHealth).forEach(function (id) {
if (workerHealth[id].memory) {
totalMemory += workerHealth[id].memory;
workerCount++;
}
});
if (workerCount > 0) {
console.log("Cluster health: " + workerCount + " workers, " + Math.round(totalMemory / 1024 / 1024) + "MB total RSS");
}
}, 5000);
cluster.on("exit", function (worker, code, signal) {
console.log("Worker " + worker.process.pid + " exited (code: " + code + ", signal: " + signal + ")");
delete workerHealth[worker.id];
if (!worker.exitedAfterDisconnect) {
console.log("Unexpected exit. Spawning replacement...");
spawnWorker();
}
});
}
On the worker side, send heartbeats with diagnostic information:
// Worker heartbeat
var activeRequests = 0;
function sendHeartbeat() {
var loopStart = Date.now();
setImmediate(function () {
var eventLoopLag = Date.now() - loopStart;
try {
process.send({
type: "heartbeat",
memory: process.memoryUsage().rss,
eventLoopLag: eventLoopLag,
activeRequests: activeRequests
});
} catch (err) {
// IPC channel might be closed during shutdown
}
});
}
setInterval(sendHeartbeat, 10000);
// Track active requests in middleware (register this before your routes so it runs for every request)
app.use(function (req, res, next) {
activeRequests++;
res.on("finish", function () {
activeRequests--;
});
next();
});
Output from the health monitor during normal operation:
Cluster health: 8 workers, 423MB total RSS
Cluster health: 8 workers, 427MB total RSS
Worker 12348 event loop lag: 1245ms
Worker 12348 exceeded memory limit (538MB). Restarting...
Worker 12348 exited (code: 0, signal: null)
Worker 12358 is online
Cluster health: 8 workers, 391MB total RSS
Cluster Module vs PM2 Cluster Mode
PM2 is a process manager that provides clustering out of the box with a single command:
# PM2 cluster mode - starts one worker per CPU core
pm2 start app.js -i max
# Or with an ecosystem file
pm2 start ecosystem.config.js
// ecosystem.config.js
module.exports = {
apps: [{
name: "my-api",
script: "./app.js",
instances: "max",
exec_mode: "cluster",
max_memory_restart: "500M",
watch: false,
env_production: {
NODE_ENV: "production",
PORT: 3000
}
}]
};
Here is when to use each:
| Feature | Native cluster | PM2 |
|---|---|---|
| Zero-downtime restart | Manual implementation | pm2 reload |
| Log management | DIY | Built-in with rotation |
| Monitoring | DIY | pm2 monit |
| Process management | DIY | Auto-restart, max memory restart |
| Startup scripts | DIY | pm2 startup |
| Deployment | DIY | pm2 deploy |
| Overhead | None | ~30MB for PM2 daemon |
| Customization | Full control | Configuration-based |
| Container environments | Preferred | Not recommended |
My recommendation: use PM2 in traditional VM/bare-metal deployments where you want a full process manager. Use the native cluster module (or just single-process) in containerized environments (Docker, Kubernetes) where the orchestrator handles process management, scaling, and restarts.
In a Kubernetes deployment, running PM2 inside a container is redundant. Kubernetes already handles restarts, health checks, and horizontal scaling. Running multiple workers in a single container also makes resource limits unpredictable. One worker per container with Kubernetes HPA (Horizontal Pod Autoscaler) is the cleaner pattern.
Cluster vs Worker Threads
Worker threads (worker_threads module) and clustering (cluster module) solve different problems. Understanding when to use each is critical.
Cluster module forks separate processes. Each process has its own V8 isolate, its own memory space, and its own event loop. Processes communicate via IPC (serialized messages). This is for scaling HTTP server throughput across cores.
Worker threads spawn threads within the same process. Threads can share memory via SharedArrayBuffer and transfer data via MessagePort. This is for offloading CPU-intensive tasks without blocking the main event loop.
// worker_threads for CPU-intensive tasks
var { Worker, isMainThread, parentPort } = require("worker_threads");
if (isMainThread) {
var express = require("express");
var app = express();
app.get("/fibonacci/:n", function (req, res) {
var worker = new Worker(__filename, {
workerData: { n: parseInt(req.params.n) }
});
worker.on("message", function (result) {
res.json({ result: result, pid: process.pid });
});
worker.on("error", function (err) {
res.status(500).json({ error: err.message });
});
});
app.listen(3000);
} else {
var { workerData } = require("worker_threads");
function fibonacci(n) {
if (n <= 1) return n;
return fibonacci(n - 1) + fibonacci(n - 2);
}
var result = fibonacci(workerData.n);
parentPort.postMessage(result);
}
In practice, use both together: cluster to scale across cores for handling HTTP connections, and worker threads within each worker process to offload heavy computation without blocking the event loop.
Master Process
/ | \
Worker 1 Worker 2 Worker 3 <-- cluster (one per core)
/ \
Thread A Thread B <-- worker_threads (for CPU tasks)
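A compressed sketch of that combination follows. For brevity it spawns one thread per request from an inline eval'd script; a real service would keep a small reusable thread pool (for example via a library such as piscina):
var cluster = require("cluster");
var os = require("os");

if (cluster.isMaster) {
  // Layer 1: one process per core for connection handling
  for (var i = 0; i < os.cpus().length; i++) {
    cluster.fork();
  }
} else {
  var { Worker } = require("worker_threads");
  var express = require("express");
  var app = express();

  app.get("/hash/:input", function (req, res) {
    // Layer 2: push the CPU-heavy loop onto a worker thread so this
    // process's event loop stays free to accept other requests
    var thread = new Worker(
      "var { parentPort, workerData } = require('worker_threads');" +
      "var crypto = require('crypto');" +
      "var out = workerData.input;" +
      "for (var i = 0; i < 1e5; i++) { out = crypto.createHash('sha256').update(out).digest('hex'); }" +
      "parentPort.postMessage(out);",
      { eval: true, workerData: { input: req.params.input } }
    );
    thread.on("message", function (digest) {
      res.json({ digest: digest, pid: process.pid });
    });
    thread.on("error", function (err) {
      res.status(500).json({ error: err.message });
    });
  });

  app.listen(3000);
}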
Load Testing Clustered Apps with Autocannon
You should never deploy a cluster configuration without benchmarking it first. Autocannon is the tool I use for HTTP benchmarking from Node.js:
# Install autocannon globally
npm install -g autocannon
# Benchmark single-process server
autocannon -c 100 -d 10 http://localhost:3000/
# Benchmark clustered server (same endpoint)
autocannon -c 100 -d 10 http://localhost:3000/
Here are real numbers from a 4-core machine running the clustered Express server from earlier:
Single process (no clustering):
┌─────────┬──────┬──────┬───────┬──────┬─────────┬─────────┬──────────┐
│ Stat │ 2.5% │ 50% │ 97.5% │ 99% │ Avg │ Stdev │ Max │
├─────────┼──────┼──────┼───────┼──────┼─────────┼─────────┼──────────┤
│ Latency │ 2 ms │ 4 ms │ 12 ms │ 18 ms│ 4.82 ms │ 3.41 ms │ 89 ms │
└─────────┴──────┴──────┴───────┴──────┴─────────┴─────────┴──────────┘
┌───────────┬─────────┬─────────┬─────────┬─────────┬──────────┬─────────┐
│ Stat │ 1% │ 2.5% │ 50% │ 97.5% │ Avg │ Stdev │
├───────────┼─────────┼─────────┼─────────┼─────────┼──────────┼─────────┤
│ Req/Sec │ 15,231 │ 16,445 │ 19,823 │ 21,112 │ 19,445 │ 1,823 │
└───────────┴─────────┴─────────┴─────────┴─────────┴──────────┴─────────┘
194k requests in 10s, 28.5 MB read
Clustered (4 workers):
┌─────────┬──────┬──────┬───────┬──────┬─────────┬─────────┬──────────┐
│ Stat │ 2.5% │ 50% │ 97.5% │ 99% │ Avg │ Stdev │ Max │
├─────────┼──────┼──────┼───────┼──────┼─────────┼─────────┼──────────┤
│ Latency │ 0 ms │ 1 ms │ 4 ms │ 7 ms │ 1.42 ms │ 1.89 ms │ 42 ms │
└─────────┴──────┴──────┴───────┴──────┴─────────┴─────────┴──────────┘
┌───────────┬─────────┬─────────┬─────────┬─────────┬──────────┬─────────┐
│ Stat │ 1% │ 2.5% │ 50% │ 97.5% │ Avg │ Stdev │
├───────────┼─────────┼─────────┼─────────┼─────────┼──────────┼─────────┤
│ Req/Sec │ 52,891 │ 55,234 │ 68,912 │ 72,445 │ 66,789 │ 5,234 │
└───────────┴─────────┴─────────┴─────────┴─────────┴──────────┴─────────┘
667k requests in 10s, 98.2 MB read
That is a 3.4x throughput improvement on 4 cores, with 70% lower average latency. You will not see a perfect 4x linear scaling because of IPC overhead, OS scheduling, and shared resources (memory bus, network stack). In my experience, expect 80-90% scaling efficiency per core for I/O-bound workloads and 70-80% for CPU-bound workloads.
Production Deployment Considerations
How Many Workers?
The conventional wisdom is "one worker per CPU core." This is a reasonable default, but not always optimal.
var os = require("os");
// Default: one worker per core
var numWorkers = os.cpus().length;
// For memory-constrained environments, leave headroom
// Each Node.js worker uses 50-100MB base + your app's heap
var availableMemoryMB = os.totalmem() / 1024 / 1024;
var memoryPerWorkerMB = 150; // Estimate for your app
var maxByMemory = Math.floor(availableMemoryMB * 0.8 / memoryPerWorkerMB);
numWorkers = Math.min(numWorkers, maxByMemory);
// For I/O-heavy apps (database queries, API calls), you can over-provision
// because workers spend most of their time waiting
// numWorkers = os.cpus().length * 2;
console.log("Starting " + numWorkers + " workers");
console.log("Available memory: " + Math.round(availableMemoryMB) + "MB");
console.log("Estimated per-worker: " + memoryPerWorkerMB + "MB");
Graceful Shutdown
Workers should finish in-flight requests before exiting. This matters during deployments and when receiving SIGTERM from orchestrators:
// In each worker
var server = app.listen(PORT);
var isShuttingDown = false;
function gracefulShutdown(signal) {
if (isShuttingDown) return;
isShuttingDown = true;
console.log("Worker " + process.pid + " received " + signal + ". Shutting down gracefully...");
// Stop accepting new connections
server.close(function () {
console.log("Worker " + process.pid + " closed all connections");
// Close database pools, Redis connections, etc.
// redisClient.quit();
// pool.end();
process.exit(0);
});
// Force shutdown after timeout
setTimeout(function () {
console.error("Worker " + process.pid + " could not close connections in time. Forcing shutdown...");
process.exit(1);
}, 10000);
}
process.on("SIGTERM", function () { gracefulShutdown("SIGTERM"); });
process.on("SIGINT", function () { gracefulShutdown("SIGINT"); });
Docker Considerations
When running clustered Node.js in Docker, keep these points in mind:
FROM node:20-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci --only=production
COPY . .
# Use dumb-init or tini to handle signals properly
RUN apk add --no-cache tini
ENTRYPOINT ["/sbin/tini", "--"]
# Node.js does not see container CPU limits, so set the worker count via env,
# and size --max-old-space-size to fit the container memory limit
CMD ["node", "--max-old-space-size=384", "server.js"]
EXPOSE 3000
Use tini or dumb-init as PID 1 in your container. Without it, Node.js runs as PID 1, where the kernel applies no default signal handlers: a SIGTERM your code does not explicitly handle is simply ignored, and exited child processes are never reaped.
Set --max-old-space-size to match your container's memory limit, minus overhead for the OS and non-heap memory. For a 512MB container, 384MB for the V8 heap is a reasonable starting point.
Be aware that os.cpus().length inside a Docker container reports the host's CPU count, not the container's cgroup CPU limit. On a 64-core host with a 2-CPU container limit, you will fork 64 workers and immediately OOM. Use the WORKERS environment variable pattern instead:
var numWorkers = parseInt(process.env.WORKERS) || os.cpus().length;
# docker-compose.yml
services:
api:
build: .
environment:
- WORKERS=2
deploy:
resources:
limits:
cpus: "2"
memory: 1G
Complete Working Example
Here is a production-ready clustered Express.js application that combines everything covered above: graceful shutdown, worker health monitoring, zero-downtime restarts, and Redis-backed sessions.
package.json:
{
"name": "clustered-express-app",
"version": "1.0.0",
"scripts": {
"start": "node server.js",
"benchmark": "autocannon -c 200 -d 10 -p 10 http://localhost:3000/"
},
"dependencies": {
"connect-redis": "^7.1.0",
"express": "^4.18.2",
"express-session": "^1.17.3",
"ioredis": "^5.3.2"
},
"devDependencies": {
"autocannon": "^7.14.0"
}
}
server.js:
var cluster = require("cluster");
var os = require("os");
var NUM_WORKERS = parseInt(process.env.WORKERS) || os.cpus().length;
var PORT = process.env.PORT || 3000;
var WORKER_TIMEOUT_MS = 30000;
var MAX_MEMORY_MB = parseInt(process.env.MAX_WORKER_MEMORY_MB) || 512;
if (cluster.isMaster) {
masterProcess();
} else {
workerProcess();
}
// ─── Master Process ─────────────────────────────────────────
function masterProcess() {
console.log("=== Cluster Master ===");
console.log("PID: " + process.pid);
console.log("Workers: " + NUM_WORKERS);
console.log("Port: " + PORT);
console.log("Max worker memory: " + MAX_MEMORY_MB + "MB");
console.log("");
var workerHealth = {};
// Fork workers
for (var i = 0; i < NUM_WORKERS; i++) {
spawnWorker();
}
function spawnWorker() {
var worker = cluster.fork();
workerHealth[worker.id] = {
pid: worker.process.pid,
lastHeartbeat: Date.now(),
memory: 0,
eventLoopLag: 0,
requestsHandled: 0
};
worker.on("message", function (msg) {
if (msg.type === "heartbeat") {
workerHealth[worker.id] = {
pid: worker.process.pid,
lastHeartbeat: Date.now(),
memory: msg.memory,
eventLoopLag: msg.eventLoopLag,
requestsHandled: msg.requestsHandled
};
}
});
return worker;
}
// Health monitoring
setInterval(function () {
var now = Date.now();
var totalMemory = 0;
var totalRequests = 0;
var healthyWorkers = 0;
Object.keys(cluster.workers).forEach(function (id) {
var worker = cluster.workers[id];
var health = workerHealth[id];
if (!health) return;
// Heartbeat check
if (now - health.lastHeartbeat > WORKER_TIMEOUT_MS) {
console.error("[HEALTH] Worker " + health.pid + " unresponsive for " +
Math.round((now - health.lastHeartbeat) / 1000) + "s. Killing.");
worker.kill("SIGKILL");
delete workerHealth[id];
return;
}
// Memory check
var memMB = Math.round(health.memory / 1024 / 1024);
if (memMB > MAX_MEMORY_MB) {
console.error("[HEALTH] Worker " + health.pid + " using " + memMB + "MB (limit: " + MAX_MEMORY_MB + "MB). Restarting.");
worker.disconnect();
setTimeout(function () {
if (!worker.isDead()) worker.kill("SIGKILL");
}, 5000);
delete workerHealth[id];
// disconnect() marks this as a graceful exit, so the exit handler below will not respawn it
spawnWorker();
return;
}
totalMemory += health.memory;
totalRequests += health.requestsHandled;
healthyWorkers++;
});
console.log("[CLUSTER] " + healthyWorkers + " healthy workers | " +
Math.round(totalMemory / 1024 / 1024) + "MB total RSS | " +
totalRequests + " total requests handled");
}, 15000);
// Replace crashed workers. Workers that exit after disconnect() are already being
// replaced elsewhere (the memory check above and the SIGUSR2 handler below).
cluster.on("exit", function (worker, code, signal) {
delete workerHealth[worker.id];
if (worker.exitedAfterDisconnect) {
console.log("[CLUSTER] Worker " + worker.process.pid + " disconnected gracefully.");
return;
}
console.error("[CLUSTER] Worker " + worker.process.pid + " crashed (code: " + code + ", signal: " + signal + "). Spawning replacement.");
spawnWorker();
});
// Zero-downtime restart on SIGUSR2
process.on("SIGUSR2", function () {
console.log("[CLUSTER] Zero-downtime restart initiated");
var workerIds = Object.keys(cluster.workers);
var idx = 0;
function restartNext() {
if (idx >= workerIds.length) {
console.log("[CLUSTER] All workers restarted successfully");
return;
}
var id = workerIds[idx];
var oldWorker = cluster.workers[id];
if (!oldWorker) {
idx++;
restartNext();
return;
}
var oldPid = oldWorker.process.pid;
console.log("[CLUSTER] Replacing worker " + oldPid);
var newWorker = spawnWorker();
newWorker.on("listening", function () {
console.log("[CLUSTER] New worker " + newWorker.process.pid + " online. Disconnecting " + oldPid);
oldWorker.disconnect();
var forceKillTimer = setTimeout(function () {
if (!oldWorker.isDead()) {
console.warn("[CLUSTER] Force killing worker " + oldPid);
oldWorker.kill("SIGKILL");
}
}, 10000);
oldWorker.on("exit", function () {
clearTimeout(forceKillTimer);
idx++;
restartNext();
});
});
}
restartNext();
});
// Graceful master shutdown
process.on("SIGTERM", function () {
console.log("[CLUSTER] SIGTERM received. Shutting down all workers...");
Object.keys(cluster.workers).forEach(function (id) {
cluster.workers[id].send({ type: "shutdown" });
cluster.workers[id].disconnect();
});
setTimeout(function () {
console.log("[CLUSTER] Forcing exit");
process.exit(0);
}, 15000);
});
}
// ─── Worker Process ─────────────────────────────────────────
function workerProcess() {
var createApp = require("./app");
var app = createApp();
var requestsHandled = 0;
var isShuttingDown = false;
var server = app.listen(PORT, function () {
console.log("[WORKER " + process.pid + "] Listening on port " + PORT);
});
// Track requests for health reporting. createApp() has already registered its routes,
// so count on the HTTP server itself; middleware appended after the routes would never run.
server.on("request", function () {
requestsHandled++;
});
// Heartbeat
setInterval(function () {
if (isShuttingDown) return;
var lagStart = Date.now();
setImmediate(function () {
try {
process.send({
type: "heartbeat",
memory: process.memoryUsage().rss,
eventLoopLag: Date.now() - lagStart,
requestsHandled: requestsHandled
});
} catch (err) {
// IPC channel closed
}
});
}, 10000);
// Shutdown handler
function shutdown(reason) {
if (isShuttingDown) return;
isShuttingDown = true;
console.log("[WORKER " + process.pid + "] Shutting down: " + reason);
server.close(function () {
console.log("[WORKER " + process.pid + "] All connections drained");
process.exit(0);
});
setTimeout(function () {
console.error("[WORKER " + process.pid + "] Forced shutdown after timeout");
process.exit(1);
}, 10000);
}
process.on("SIGTERM", function () { shutdown("SIGTERM"); });
process.on("SIGINT", function () { shutdown("SIGINT"); });
process.on("message", function (msg) {
if (msg.type === "shutdown") {
shutdown("master-requested");
}
});
}
app.js:
var express = require("express");
var session = require("express-session");
var RedisStore = require("connect-redis").default;
var Redis = require("ioredis");
function createApp() {
var app = express();
// Redis session store (shared across all workers)
var redisClient = new Redis({
host: process.env.REDIS_HOST || "127.0.0.1",
port: parseInt(process.env.REDIS_PORT) || 6379,
retryDelayOnFailover: 100,
maxRetriesPerRequest: 3,
lazyConnect: true
});
redisClient.on("error", function (err) {
console.error("[WORKER " + process.pid + "] Redis error: " + err.message);
});
redisClient.connect().catch(function (err) {
console.warn("[WORKER " + process.pid + "] Redis not available. Sessions will use memory store.");
});
var sessionConfig = {
secret: process.env.SESSION_SECRET || "change-me-in-production",
resave: false,
saveUninitialized: false,
cookie: {
secure: process.env.NODE_ENV === "production",
httpOnly: true,
maxAge: 1000 * 60 * 60 * 24
}
};
// Only use Redis store if Redis is available
if (redisClient.status === "ready" || redisClient.status === "connecting") {
sessionConfig.store = new RedisStore({ client: redisClient });
}
app.use(session(sessionConfig));
app.use(express.json());
// Routes
app.get("/", function (req, res) {
res.json({
message: "Clustered Express.js Server",
worker: process.pid,
uptime: Math.round(process.uptime()) + "s",
memory: Math.round(process.memoryUsage().rss / 1024 / 1024) + "MB"
});
});
app.get("/health", function (req, res) {
res.json({
status: "ok",
worker: process.pid,
uptime: process.uptime(),
memory: process.memoryUsage()
});
});
app.post("/login", function (req, res) {
req.session.user = { id: "user_" + Date.now(), role: "admin" };
res.json({ message: "Logged in", worker: process.pid, session: req.session.id });
});
app.get("/session", function (req, res) {
res.json({
worker: process.pid,
sessionId: req.session.id,
user: req.session.user || null
});
});
app.get("/heavy", function (req, res) {
var n = parseInt(req.query.n) || 1000000;
var sum = 0;
for (var i = 0; i < n; i++) {
sum += Math.sqrt(i);
}
res.json({ result: sum, worker: process.pid });
});
return app;
}
module.exports = createApp;
Run it:
# Start the clustered server
WORKERS=4 node server.js
# In another terminal, load test it
autocannon -c 200 -d 10 http://localhost:3000/
# Test session persistence across workers
curl -c cookies.txt http://localhost:3000/login -X POST
curl -b cookies.txt http://localhost:3000/session
curl -b cookies.txt http://localhost:3000/session
curl -b cookies.txt http://localhost:3000/session
# Each request may hit a different worker, but session data is consistent
# Trigger zero-downtime restart (Linux/macOS)
kill -SIGUSR2 $(pgrep -f "node server.js" | head -1)
Common Issues and Troubleshooting
1. Port Already in Use After Worker Crash
Error: listen EADDRINUSE: address already in use :::3000
at Server.setupListenHandle [as _listen2] (net.js:1334:16)
This happens when a crashed worker did not release its socket. The master still has the listening socket, but the replacement worker tries to bind independently. Fix: make sure workers use the shared master socket (the default cluster behavior). If you see this, you have likely called server.listen(PORT) in both master and worker code paths.
2. Worker Crash Loop (Exit Code 12)
[CLUSTER] Worker 15234 crashed (code: 12, signal: null). Spawning replacement.
[CLUSTER] Worker 15240 crashed (code: 12, signal: null). Spawning replacement.
Exit code 12 is Node's "invalid debug argument" code: it appears when --inspect or --inspect-brk is set but the chosen inspector port is invalid or unavailable, which is easy to hit when forking many workers with debugging enabled. Whatever the underlying cause, a worker that dies immediately on startup creates a crash loop where the master endlessly respawns workers that immediately die. Add backoff logic:
var recentCrashes = [];
var MAX_RESTARTS = 5;
var RESTART_WINDOW = 60000;
cluster.on("exit", function (worker, code, signal) {
if (worker.exitedAfterDisconnect) return;
var now = Date.now();
// Track crash timestamps in one rolling list. Keying the counter by PID never
// accumulates, because every replacement worker gets a brand-new PID.
recentCrashes.push(now);
recentCrashes = recentCrashes.filter(function (t) {
return now - t < RESTART_WINDOW;
});
if (recentCrashes.length >= MAX_RESTARTS) {
console.error("[CLUSTER] Worker crash loop detected. " + MAX_RESTARTS + " crashes in " + (RESTART_WINDOW / 1000) + "s. Not restarting.");
return;
}
cluster.fork();
});
3. SIGTERM Not Reaching Workers in Docker
npm ERR! code ELIFECYCLE
npm ERR! errno 137
Exit code 137 means the process was killed by SIGKILL (128 + 9). This happens when Docker sends SIGTERM, the signal never reaches your cluster master (npm or a shell wrapper swallows it), and after the 10-second grace period Docker sends SIGKILL. Fix: use tini as PID 1 and run node directly, and make sure the master forwards shutdown signals to its workers.
# Bad - npm swallows signals
CMD ["npm", "start"]
# Good - tini forwards signals correctly
ENTRYPOINT ["/sbin/tini", "--"]
CMD ["node", "server.js"]
4. Sessions Lost After Restart
GET /profile 401 Unauthorized
# User was logged in, but session disappeared after worker restart
This happens when you use the default in-memory session store with clustering. Each worker has its own MemoryStore, so a session created on worker A does not exist on worker B. Even worse, when a worker restarts, all its sessions are gone. The fix is always Redis-backed sessions in clustered environments, as shown in the complete example above.
5. os.cpus().length Returns Host CPU Count in Containers
[CLUSTER] Starting 64 workers...
FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory
Inside a Docker container limited to 2 CPUs, os.cpus().length still returns the host machine's CPU count. Spawning 64 workers in a container with 1GB of memory exhausts it immediately. Always use an environment variable to control worker count in containerized deployments.
Best Practices
- Always set an explicit worker count in containers. Never rely on os.cpus().length inside Docker or Kubernetes. Use an environment variable like WORKERS=2 and fall back to os.cpus().length only for bare-metal deployments.
- Externalize all shared state to Redis or a database. Do not use in-memory caches, in-memory sessions, or module-level variables for state that must be shared across workers. Each worker is a separate process with its own memory space. In-memory state will be inconsistent across workers and lost on worker restart.
- Implement graceful shutdown with a timeout. Workers must stop accepting new connections, drain in-flight requests, close database pools, and then exit. Always set a hard timeout (10-15 seconds) after which the worker force-exits, or orchestrators will SIGKILL you.
- Add crash loop detection. A worker that crashes on startup will be restarted by the master, crash again, and create an infinite fork-crash loop that consumes all system resources. Track restart frequency and stop respawning after a threshold (e.g., 5 crashes in 60 seconds).
- Use worker threads for CPU-bound tasks, not more cluster workers. If your workload mixes I/O (database queries, API calls) with CPU-bound operations (image processing, encryption, heavy computation), use the cluster module for I/O scaling and worker threads for offloading CPU tasks. Over-provisioning cluster workers beyond your core count causes context switching overhead that hurts performance.
- Monitor worker health actively, not just process status. A worker process can be alive but unhealthy: stuck in a long synchronous operation, leaking memory, or experiencing event loop starvation. Send periodic heartbeats from workers to the master and kill unresponsive workers after a timeout.
- Set --max-old-space-size per worker. Each worker gets its own V8 heap. On a machine with 4GB of RAM and 4 workers, setting --max-old-space-size=768 per worker gives each worker 768MB of heap and leaves headroom for the OS and non-heap memory.
- Prefer single-process-per-container in Kubernetes. Kubernetes handles horizontal scaling, restarts, and health checks. Running a cluster inside a container makes resource limits unpredictable and duplicates orchestration logic. If you need more throughput, increase the replica count in your Deployment, not the worker count inside the container.
- Test with realistic load before deploying cluster changes. Use autocannon or wrk to benchmark before and after clustering. Measure latency percentiles (p99, p95), not just average throughput. A misconfigured cluster can actually perform worse than a single process due to IPC overhead and memory pressure.
