Node.js Clustering for Multi-Core Systems
A practical guide to Node.js clustering for multi-core utilization, covering the cluster module, Express.js clustering, graceful restarts, sticky sessions, worker health monitoring, and PM2 comparison.
Node.js runs on a single thread by default, which means a 16-core production server is sitting at 6% utilization under load while your application chokes on requests. The built-in cluster module solves this by forking multiple worker processes that share the same server port, distributing incoming connections across all available CPU cores. If you are running Node.js in production on anything larger than a single-core VM and you are not clustering, you are leaving performance on the table.
Prerequisites
- Node.js v16+ installed (v18+ recommended for production)
- Basic familiarity with Express.js
- Redis installed locally or accessible (for session sharing)
- npm packages: express, ioredis, connect-redis, express-session, autocannon
npm install express ioredis connect-redis express-session
npm install -g autocannon
Why Node.js Is Single-Threaded
Node.js is built on V8 and libuv. V8 compiles and executes JavaScript on a single thread. Libuv provides an event loop and a thread pool for I/O operations (file system, DNS, some crypto), but your JavaScript code — your route handlers, your business logic, your JSON serialization — all executes on one thread.
This means one CPU core handles all of your application logic. If you have an 8-core machine, seven cores sit idle while your single-threaded process handles every request.
The event loop model is brilliant for I/O-bound workloads. A single thread can handle tens of thousands of concurrent connections because it never blocks waiting for a database response or a file read. But the moment you have CPU-intensive work — parsing large JSON payloads, image processing, heavy computation, even just handling high request throughput — that single thread becomes your bottleneck.
Here is a simple demonstration of the problem:
var http = require("http");
var os = require("os");
var server = http.createServer(function (req, res) {
// Simulate CPU-bound work
var sum = 0;
for (var i = 0; i < 1e7; i++) {
sum += i;
}
res.writeHead(200);
res.end("Process " + process.pid + " handled this request on core count: " + os.cpus().length);
});
server.listen(3000, function () {
console.log("Server running on PID " + process.pid);
console.log("Available CPU cores: " + os.cpus().length);
});
Run this on an 8-core machine and load test it — you will see one core pinned at 100% while the other seven are idle.
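To see it for yourself, start the server above, then drive load from a second terminal using the autocannon install from the prerequisites while watching per-core usage (for example with top, pressing 1 on Linux, or htop):
autocannon -c 100 -d 10 http://localhost:3000/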
The Cluster Module Basics
The cluster module is built into Node.js. No npm install required. It works on a master/worker pattern: one process acts as the master (or "primary" in newer Node.js terminology) and forks child processes called workers. Each worker is a full copy of your application, running in its own V8 isolate with its own memory space.
The key insight is that all workers can share the same server port. By default on Linux and macOS, the primary process accepts incoming connections and hands them to workers in round-robin fashion; on Windows, the primary instead passes the listening socket to the workers and lets the operating system decide which worker accepts each connection.
var cluster = require("cluster");
var http = require("http");
var os = require("os");
var numCPUs = os.cpus().length;
if (cluster.isMaster) {
console.log("Master process " + process.pid + " is running");
console.log("Forking " + numCPUs + " workers...");
for (var i = 0; i < numCPUs; i++) {
cluster.fork();
}
cluster.on("exit", function (worker, code, signal) {
console.log("Worker " + worker.process.pid + " died (code: " + code + ", signal: " + signal + ")");
console.log("Forking a replacement worker...");
cluster.fork();
});
} else {
http.createServer(function (req, res) {
res.writeHead(200);
res.end("Handled by worker " + process.pid + "\n");
}).listen(3000);
console.log("Worker " + process.pid + " started");
}
Running this on an 8-core machine produces:
Master process 12345 is running
Forking 8 workers...
Worker 12346 started
Worker 12347 started
Worker 12348 started
Worker 12349 started
Worker 12350 started
Worker 12351 started
Worker 12352 started
Worker 12353 started
The master process does not handle HTTP requests. Its sole job is to manage the lifecycle of workers — fork them, monitor them, and replace them when they crash. The workers handle all incoming traffic.
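The distribution behavior is configurable. A minimal sketch of switching the scheduling policy (this must happen before the first fork() call):
var cluster = require("cluster");

// SCHED_RR: the primary accepts connections and hands them to workers round-robin
// (the default on every platform except Windows). SCHED_NONE: workers accept
// connections themselves and the OS decides who gets each one.
cluster.schedulingPolicy = cluster.SCHED_RR; // or cluster.SCHED_NONE

// The same switch is available without a code change:
//   NODE_CLUSTER_SCHED_POLICY=rr node server.js
//   NODE_CLUSTER_SCHED_POLICY=none node server.js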
Building a Clustered Express.js Server
In real applications, you are not using raw http.createServer. You are using Express. The pattern stays the same, but you split your code into two concerns: cluster management in the entry point, and application logic in a separate module.
app.js — the Express application:
var express = require("express");
function createApp() {
var app = express();
app.get("/", function (req, res) {
res.json({
pid: process.pid,
uptime: process.uptime(),
memory: Math.round(process.memoryUsage().rss / 1024 / 1024) + "MB"
});
});
app.get("/health", function (req, res) {
res.status(200).json({ status: "ok", pid: process.pid });
});
app.get("/heavy", function (req, res) {
// Simulate CPU-bound work
var result = 0;
for (var i = 0; i < 5e7; i++) {
result += Math.sqrt(i);
}
res.json({ pid: process.pid, result: result });
});
return app;
}
module.exports = createApp;
server.js — the cluster manager:
var cluster = require("cluster");
var os = require("os");
var numCPUs = os.cpus().length;
var PORT = process.env.PORT || 3000;
if (cluster.isMaster) {
console.log("Master " + process.pid + " starting " + numCPUs + " workers on port " + PORT);
for (var i = 0; i < numCPUs; i++) {
cluster.fork();
}
cluster.on("exit", function (worker, code, signal) {
if (!worker.exitedAfterDisconnect) {
console.log("Worker " + worker.process.pid + " crashed. Restarting...");
cluster.fork();
}
});
cluster.on("online", function (worker) {
console.log("Worker " + worker.process.pid + " is online");
});
} else {
var createApp = require("./app");
var app = createApp();
app.listen(PORT, function () {
console.log("Worker " + process.pid + " listening on port " + PORT);
});
}
The createApp factory function pattern is important. It ensures each worker gets a fresh Express instance. If you export an already-constructed app, require caches that single instance and shares it everywhere it is imported, which invites shared-state bugs, especially during testing.
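To illustrate, here is a minimal test sketch assuming supertest (which is not in this article's dependency list): the factory gives each test its own isolated app instance without ever touching the cluster code.
var request = require("supertest");
var createApp = require("./app");

// Each call to createApp() returns a fresh Express instance with no shared state
request(createApp())
  .get("/health")
  .expect(200)
  .then(function (res) {
    console.log("health check served by PID " + res.body.pid);
  });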
Graceful Restart and Zero-Downtime Reloads
The naive approach of killing all workers and starting new ones creates a window where no workers are available to handle requests. Zero-downtime restart means you replace workers one at a time, waiting for each new worker to come online before killing the next old one.
var cluster = require("cluster");
var os = require("os");
var numCPUs = os.cpus().length;
var workers = [];
if (cluster.isMaster) {
// Fork initial workers
for (var i = 0; i < numCPUs; i++) {
workers.push(cluster.fork());
}
// Zero-downtime restart on SIGUSR2
process.on("SIGUSR2", function () {
console.log("Received SIGUSR2 - starting zero-downtime restart");
var workerIds = Object.keys(cluster.workers);
var index = 0;
function restartNext() {
if (index >= workerIds.length) {
console.log("All workers restarted");
return;
}
var id = workerIds[index];
var worker = cluster.workers[id];
if (!worker) {
index++;
restartNext();
return;
}
console.log("Restarting worker " + worker.process.pid);
// Fork a replacement first
var newWorker = cluster.fork();
newWorker.on("listening", function () {
console.log("New worker " + newWorker.process.pid + " is listening. Disconnecting old worker " + worker.process.pid);
worker.disconnect();
// Force kill if worker does not exit within 5 seconds
var killTimer = setTimeout(function () {
console.log("Force killing worker " + worker.process.pid);
worker.kill();
}, 5000);
worker.on("exit", function () {
clearTimeout(killTimer);
index++;
restartNext();
});
});
}
restartNext();
});
cluster.on("exit", function (worker, code, signal) {
if (!worker.exitedAfterDisconnect) {
console.log("Worker " + worker.process.pid + " crashed unexpectedly. Forking replacement...");
cluster.fork();
}
});
}
Trigger a zero-downtime restart by sending SIGUSR2 to the master process:
# Find the master PID
ps aux | grep "node server.js" | grep -v grep
# Send restart signal
kill -SIGUSR2 12345
The output during restart looks like this:
Received SIGUSR2 - starting zero-downtime restart
Restarting worker 12346
New worker 12354 is listening. Disconnecting old worker 12346
Restarting worker 12347
New worker 12355 is listening. Disconnecting old worker 12347
Restarting worker 12348
New worker 12356 is listening. Disconnecting old worker 12348
All workers restarted
At no point during this sequence are zero workers available. The new worker starts listening before the old one disconnects.
Sticky Sessions for Stateful Connections
The default round-robin distribution works for stateless REST APIs. But if you are using WebSockets, long polling, or server-side sessions stored in memory, you have a problem: a client's second request might land on a different worker than their first, and that worker has no knowledge of the session.
Sticky sessions ensure that all requests from the same client are routed to the same worker. The @socket.io/sticky package handles this for Socket.IO, but for generic HTTP sticky sessions, you can implement IP-based routing in the master process:
var cluster = require("cluster");
var net = require("net");
var os = require("os");
var numCPUs = os.cpus().length;
var PORT = 3000;
if (cluster.isMaster) {
var workers = [];
for (var i = 0; i < numCPUs; i++) {
workers.push(cluster.fork());
}
// Create a raw TCP server for sticky session routing
var server = net.createServer({ pauseOnConnect: true }, function (connection) {
var remoteAddress = connection.remoteAddress || "";
var workerIndex = hashIP(remoteAddress) % workers.length;
var worker = workers[workerIndex];
worker.send("sticky-session:connection", connection);
});
server.listen(PORT, function () {
console.log("Master listening on port " + PORT + " with sticky sessions");
});
} else {
var createApp = require("./app");
var app = createApp();
var http = require("http");
var server = http.createServer(app);
// Bind to a random localhost-only port so the worker is not reachable directly;
// real client connections arrive from the master via IPC
server.listen(0, "localhost");
process.on("message", function (message, connection) {
if (message === "sticky-session:connection") {
server.emit("connection", connection);
connection.resume();
}
});
}
function hashIP(ip) {
var hash = 0;
for (var i = 0; i < ip.length; i++) {
var char = ip.charCodeAt(i);
hash = ((hash << 5) - hash) + char;
hash = hash & hash; // Convert to 32-bit integer
}
return Math.abs(hash);
}
The trade-off with sticky sessions is that you lose even distribution. If many clients share the same IP (corporate NAT, for example), one worker can become overloaded while others sit idle. This is why the better approach for most applications is to externalize state entirely.
Shared State with Redis
The cleanest solution to cross-worker state is to stop storing state in worker memory. Use Redis as a shared session store and cache layer that all workers can access:
var express = require("express");
var session = require("express-session");
var RedisStore = require("connect-redis").default;
var Redis = require("ioredis");
function createApp() {
var app = express();
var redisClient = new Redis({
host: process.env.REDIS_HOST || "127.0.0.1",
port: process.env.REDIS_PORT || 6379,
retryDelayOnFailover: 100,
maxRetriesPerRequest: 3
});
redisClient.on("error", function (err) {
console.error("Redis connection error:", err.message);
});
redisClient.on("connect", function () {
console.log("Worker " + process.pid + " connected to Redis");
});
var store = new RedisStore({ client: redisClient });
app.use(session({
store: store,
secret: process.env.SESSION_SECRET || "your-secret-key",
resave: false,
saveUninitialized: false,
cookie: {
secure: process.env.NODE_ENV === "production",
httpOnly: true,
maxAge: 1000 * 60 * 60 * 24 // 24 hours
}
}));
app.get("/login", function (req, res) {
req.session.userId = "user_12345";
req.session.loginTime = Date.now();
res.json({ message: "Logged in", worker: process.pid });
});
app.get("/profile", function (req, res) {
if (!req.session.userId) {
return res.status(401).json({ error: "Not authenticated" });
}
res.json({
userId: req.session.userId,
loginTime: req.session.loginTime,
servedBy: process.pid
});
});
return app;
}
module.exports = createApp;
With Redis-backed sessions, it does not matter which worker handles a given request. The session data is stored externally and accessible from any worker. This pattern also survives worker restarts — session data persists even when workers are cycled.
The performance overhead is minimal. A Redis session lookup on localhost adds roughly 0.1-0.3ms per request. Across a network, it is typically 0.5-2ms. This is negligible compared to any real route handler logic.
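If you want to check that number in your own environment, a rough probe against the same ioredis client is enough; the key name below is arbitrary and does not need to exist:
var Redis = require("ioredis");
var redis = new Redis({ host: process.env.REDIS_HOST || "127.0.0.1" });

// Issue sequential GETs and report the average round-trip time
async function probe(samples) {
  var start = process.hrtime.bigint();
  for (var i = 0; i < samples; i++) {
    await redis.get("session:probe");
  }
  var elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
  console.log("avg GET latency: " + (elapsedMs / samples).toFixed(3) + "ms over " + samples + " samples");
  redis.quit();
}

probe(1000).catch(function (err) {
  console.error("Redis probe failed: " + err.message);
  process.exit(1);
});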
Worker Communication via IPC
Workers in a cluster do not share memory. Each worker has its own V8 heap. If you need workers to communicate — broadcasting a cache invalidation event, coordinating a distributed operation, collecting metrics — you use Inter-Process Communication (IPC) through the master process.
var cluster = require("cluster");
var os = require("os");
if (cluster.isMaster) {
var numCPUs = os.cpus().length;
for (var i = 0; i < numCPUs; i++) {
var worker = cluster.fork();
worker.on("message", function (msg) {
if (msg.type === "broadcast") {
// Forward to all other workers
Object.keys(cluster.workers).forEach(function (id) {
if (cluster.workers[id].process.pid !== msg.fromPid) {
cluster.workers[id].send({
type: msg.type,
data: msg.data,
fromPid: msg.fromPid
});
}
});
}
if (msg.type === "metrics") {
// Collect metrics from workers
console.log("Worker " + msg.fromPid + " metrics:", JSON.stringify(msg.data));
}
});
}
// Send config updates to all workers
setInterval(function () {
Object.keys(cluster.workers).forEach(function (id) {
cluster.workers[id].send({
type: "config-update",
data: { featureFlag: true, maxRequestSize: "10mb" }
});
});
}, 60000);
} else {
// Worker side
var requestCount = 0;
process.on("message", function (msg) {
if (msg.type === "broadcast") {
console.log("Worker " + process.pid + " received broadcast from " + msg.fromPid + ": " + JSON.stringify(msg.data));
}
if (msg.type === "config-update") {
console.log("Worker " + process.pid + " received config update: " + JSON.stringify(msg.data));
}
});
// Send metrics to master every 30 seconds
setInterval(function () {
process.send({
type: "metrics",
fromPid: process.pid,
data: {
requestCount: requestCount,
memory: process.memoryUsage().rss,
uptime: process.uptime()
}
});
}, 30000);
// Broadcast a cache invalidation to other workers
function invalidateCache(key) {
process.send({
type: "broadcast",
fromPid: process.pid,
data: { action: "invalidate-cache", key: key }
});
}
}
IPC is fast — it uses Unix domain sockets on Linux/macOS and named pipes on Windows. Message serialization is the main cost. Keep IPC messages small and avoid sending them in hot loops.
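To get a feel for the cost, a rough ping-pong round trip between the master and a single worker (run as its own script) is enough:
var cluster = require("cluster");

if (cluster.isMaster) {
  var worker = cluster.fork();
  worker.on("message", function (msg) {
    if (msg.type === "pong") {
      console.log("IPC round trip took " + (Date.now() - msg.sentAt) + "ms");
      worker.kill();
    }
  });
  worker.on("online", function () {
    worker.send({ type: "ping", sentAt: Date.now() });
  });
} else {
  process.on("message", function (msg) {
    if (msg.type === "ping") {
      // Echo the original timestamp back so the master can measure the full round trip
      process.send({ type: "pong", sentAt: msg.sentAt });
    }
  });
}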
Monitoring Worker Health and Auto-Restart
A production cluster needs more than just restarting crashed workers. You need to detect workers that are alive but unhealthy — memory leaks, event loop starvation, or deadlocked I/O.
var cluster = require("cluster");
var os = require("os");
var WORKER_TIMEOUT = 30000; // 30 seconds without a heartbeat = dead
var HEARTBEAT_INTERVAL = 10000; // Workers send heartbeats every 10 seconds
var MAX_MEMORY_MB = 512; // Kill workers exceeding 512MB RSS
if (cluster.isMaster) {
var workerHealth = {};
for (var i = 0; i < os.cpus().length; i++) {
spawnWorker();
}
function spawnWorker() {
var worker = cluster.fork();
workerHealth[worker.id] = {
lastHeartbeat: Date.now(),
pid: worker.process.pid
};
worker.on("message", function (msg) {
if (msg.type === "heartbeat") {
workerHealth[worker.id] = {
lastHeartbeat: Date.now(),
pid: worker.process.pid,
memory: msg.memory,
eventLoopLag: msg.eventLoopLag,
activeRequests: msg.activeRequests
};
}
});
}
// Health check loop
setInterval(function () {
var now = Date.now();
Object.keys(cluster.workers).forEach(function (id) {
var worker = cluster.workers[id];
var health = workerHealth[id];
if (!health) return;
// Check heartbeat timeout
var timeSinceHeartbeat = now - health.lastHeartbeat;
if (timeSinceHeartbeat > WORKER_TIMEOUT) {
console.error("Worker " + health.pid + " missed heartbeat for " + Math.round(timeSinceHeartbeat / 1000) + "s. Killing...");
worker.kill("SIGKILL");
delete workerHealth[id];
return;
}
// Check memory usage
if (health.memory && health.memory > MAX_MEMORY_MB * 1024 * 1024) {
console.error("Worker " + health.pid + " exceeded memory limit (" + Math.round(health.memory / 1024 / 1024) + "MB). Restarting...");
worker.disconnect();
setTimeout(function () {
if (!worker.isDead()) {
worker.kill("SIGKILL");
}
}, 5000);
delete workerHealth[id];
// disconnect() sets exitedAfterDisconnect, so the exit handler below will not respawn this worker
spawnWorker();
}
// Check event loop lag
if (health.eventLoopLag && health.eventLoopLag > 1000) {
console.warn("Worker " + health.pid + " event loop lag: " + health.eventLoopLag + "ms");
}
});
// Log cluster status
var totalMemory = 0;
var workerCount = 0;
Object.keys(workerHealth).forEach(function (id) {
if (workerHealth[id].memory) {
totalMemory += workerHealth[id].memory;
workerCount++;
}
});
if (workerCount > 0) {
console.log("Cluster health: " + workerCount + " workers, " + Math.round(totalMemory / 1024 / 1024) + "MB total RSS");
}
}, 5000);
cluster.on("exit", function (worker, code, signal) {
console.log("Worker " + worker.process.pid + " exited (code: " + code + ", signal: " + signal + ")");
delete workerHealth[worker.id];
if (!worker.exitedAfterDisconnect) {
console.log("Unexpected exit. Spawning replacement...");
spawnWorker();
}
});
}
On the worker side, send heartbeats with diagnostic information:
// Worker heartbeat
var activeRequests = 0;
function sendHeartbeat() {
var loopStart = Date.now();
setImmediate(function () {
var eventLoopLag = Date.now() - loopStart;
try {
process.send({
type: "heartbeat",
memory: process.memoryUsage().rss,
eventLoopLag: eventLoopLag,
activeRequests: activeRequests
});
} catch (err) {
// IPC channel might be closed during shutdown
}
});
}
setInterval(sendHeartbeat, 10000);
// Track active requests in middleware (register this before your routes so it runs for every request)
app.use(function (req, res, next) {
activeRequests++;
res.on("finish", function () {
activeRequests--;
});
next();
});
Output from the health monitor during normal operation:
Cluster health: 8 workers, 423MB total RSS
Cluster health: 8 workers, 427MB total RSS
Worker 12348 event loop lag: 1245ms
Worker 12348 exceeded memory limit (538MB). Restarting...
Worker 12348 exited (code: 0, signal: null)
Worker 12358 is online
Cluster health: 8 workers, 391MB total RSS
Cluster Module vs PM2 Cluster Mode
PM2 is a process manager that provides clustering out of the box with a single command:
# PM2 cluster mode - starts one worker per CPU core
pm2 start app.js -i max
# Or with an ecosystem file
pm2 start ecosystem.config.js
// ecosystem.config.js
module.exports = {
apps: [{
name: "my-api",
script: "./app.js",
instances: "max",
exec_mode: "cluster",
max_memory_restart: "500M",
watch: false,
env_production: {
NODE_ENV: "production",
PORT: 3000
}
}]
};
Here is when to use each:
| Feature | Native cluster | PM2 |
|---|---|---|
| Zero-downtime restart | Manual implementation | pm2 reload |
| Log management | DIY | Built-in with rotation |
| Monitoring | DIY | pm2 monit |
| Process management | DIY | Auto-restart, max memory restart |
| Startup scripts | DIY | pm2 startup |
| Deployment | DIY | pm2 deploy |
| Overhead | None | ~30MB for PM2 daemon |
| Customization | Full control | Configuration-based |
| Container environments | Preferred | Not recommended |
My recommendation: use PM2 in traditional VM/bare-metal deployments where you want a full process manager. Use the native cluster module (or just single-process) in containerized environments (Docker, Kubernetes) where the orchestrator handles process management, scaling, and restarts.
In a Kubernetes deployment, running PM2 inside a container is redundant. Kubernetes already handles restarts, health checks, and horizontal scaling. Running multiple workers in a single container also makes resource limits unpredictable. One worker per container with Kubernetes HPA (Horizontal Pod Autoscaler) is the cleaner pattern.
Cluster vs Worker Threads
Worker threads (worker_threads module) and clustering (cluster module) solve different problems. Understanding when to use each is critical.
Cluster module forks separate processes. Each process has its own V8 isolate, its own memory space, and its own event loop. Processes communicate via IPC (serialized messages). This is for scaling HTTP server throughput across cores.
Worker threads spawn threads within the same process. Threads can share memory via SharedArrayBuffer and transfer data via MessagePort. This is for offloading CPU-intensive tasks without blocking the main event loop.
// worker_threads for CPU-intensive tasks
var { Worker, isMainThread, parentPort } = require("worker_threads");
if (isMainThread) {
var express = require("express");
var app = express();
app.get("/fibonacci/:n", function (req, res) {
var worker = new Worker(__filename, {
workerData: { n: parseInt(req.params.n) }
});
worker.on("message", function (result) {
res.json({ result: result, pid: process.pid });
});
worker.on("error", function (err) {
res.status(500).json({ error: err.message });
});
});
app.listen(3000);
} else {
var { workerData } = require("worker_threads");
function fibonacci(n) {
if (n <= 1) return n;
return fibonacci(n - 1) + fibonacci(n - 2);
}
var result = fibonacci(workerData.n);
parentPort.postMessage(result);
}
In practice, use both together: cluster to scale across cores for handling HTTP connections, and worker threads within each worker process to offload heavy computation without blocking the event loop.
Master Process
/ | \
Worker 1 Worker 2 Worker 3 <-- cluster (one per core)
/ \
Thread A Thread B <-- worker_threads (for CPU tasks)
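A compressed sketch of that combination follows. For brevity it spawns one thread per request from an inline eval'd script; a real service would keep a small reusable thread pool (for example via a library such as piscina):
var cluster = require("cluster");
var os = require("os");

if (cluster.isMaster) {
  // Layer 1: one process per core for connection handling
  for (var i = 0; i < os.cpus().length; i++) {
    cluster.fork();
  }
} else {
  var { Worker } = require("worker_threads");
  var express = require("express");
  var app = express();

  app.get("/hash/:input", function (req, res) {
    // Layer 2: push the CPU-heavy loop onto a worker thread so this
    // process's event loop stays free to accept other requests
    var thread = new Worker(
      "var { parentPort, workerData } = require('worker_threads');" +
      "var crypto = require('crypto');" +
      "var out = workerData.input;" +
      "for (var i = 0; i < 1e5; i++) { out = crypto.createHash('sha256').update(out).digest('hex'); }" +
      "parentPort.postMessage(out);",
      { eval: true, workerData: { input: req.params.input } }
    );
    thread.on("message", function (digest) {
      res.json({ digest: digest, pid: process.pid });
    });
    thread.on("error", function (err) {
      res.status(500).json({ error: err.message });
    });
  });

  app.listen(3000);
}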
Load Testing Clustered Apps with Autocannon
You should never deploy a cluster configuration without benchmarking it first. Autocannon is the tool I use for HTTP benchmarking from Node.js:
# Install autocannon globally
npm install -g autocannon
# Benchmark single-process server
autocannon -c 100 -d 10 http://localhost:3000/
# Benchmark clustered server (same endpoint)
autocannon -c 100 -d 10 http://localhost:3000/
Here are real numbers from a 4-core machine running the clustered Express server from earlier:
Single process (no clustering):
┌─────────┬──────┬──────┬───────┬──────┬─────────┬─────────┬──────────┐
│ Stat │ 2.5% │ 50% │ 97.5% │ 99% │ Avg │ Stdev │ Max │
├─────────┼──────┼──────┼───────┼──────┼─────────┼─────────┼──────────┤
│ Latency │ 2 ms │ 4 ms │ 12 ms │ 18 ms│ 4.82 ms │ 3.41 ms │ 89 ms │
└─────────┴──────┴──────┴───────┴──────┴─────────┴─────────┴──────────┘
┌───────────┬─────────┬─────────┬─────────┬─────────┬──────────┬─────────┐
│ Stat │ 1% │ 2.5% │ 50% │ 97.5% │ Avg │ Stdev │
├───────────┼─────────┼─────────┼─────────┼─────────┼──────────┼─────────┤
│ Req/Sec │ 15,231 │ 16,445 │ 19,823 │ 21,112 │ 19,445 │ 1,823 │
└───────────┴─────────┴─────────┴─────────┴─────────┴──────────┴─────────┘
194k requests in 10s, 28.5 MB read
Clustered (4 workers):
┌─────────┬──────┬──────┬───────┬──────┬─────────┬─────────┬──────────┐
│ Stat │ 2.5% │ 50% │ 97.5% │ 99% │ Avg │ Stdev │ Max │
├─────────┼──────┼──────┼───────┼──────┼─────────┼─────────┼──────────┤
│ Latency │ 0 ms │ 1 ms │ 4 ms │ 7 ms │ 1.42 ms │ 1.89 ms │ 42 ms │
└─────────┴──────┴──────┴───────┴──────┴─────────┴─────────┴──────────┘
┌───────────┬─────────┬─────────┬─────────┬─────────┬──────────┬─────────┐
│ Stat │ 1% │ 2.5% │ 50% │ 97.5% │ Avg │ Stdev │
├───────────┼─────────┼─────────┼─────────┼─────────┼──────────┼─────────┤
│ Req/Sec │ 52,891 │ 55,234 │ 68,912 │ 72,445 │ 66,789 │ 5,234 │
└───────────┴─────────┴─────────┴─────────┴─────────┴──────────┴─────────┘
667k requests in 10s, 98.2 MB read
That is a 3.4x throughput improvement on 4 cores, with 70% lower average latency. You will not see a perfect 4x linear scaling because of IPC overhead, OS scheduling, and shared resources (memory bus, network stack). In my experience, expect 80-90% scaling efficiency per core for I/O-bound workloads and 70-80% for CPU-bound workloads.
Production Deployment Considerations
How Many Workers?
The conventional wisdom is "one worker per CPU core." This is a reasonable default, but not always optimal.
var os = require("os");
// Default: one worker per core
var numWorkers = os.cpus().length;
// For memory-constrained environments, leave headroom
// Each Node.js worker uses 50-100MB base + your app's heap
var availableMemoryMB = os.totalmem() / 1024 / 1024;
var memoryPerWorkerMB = 150; // Estimate for your app
var maxByMemory = Math.floor(availableMemoryMB * 0.8 / memoryPerWorkerMB);
numWorkers = Math.min(numWorkers, maxByMemory);
// For I/O-heavy apps (database queries, API calls), you can over-provision
// because workers spend most of their time waiting
// numWorkers = os.cpus().length * 2;
console.log("Starting " + numWorkers + " workers");
console.log("Available memory: " + Math.round(availableMemoryMB) + "MB");
console.log("Estimated per-worker: " + memoryPerWorkerMB + "MB");
Graceful Shutdown
Workers should finish in-flight requests before exiting. This matters during deployments and when receiving SIGTERM from orchestrators:
// In each worker
var server = app.listen(PORT);
var isShuttingDown = false;
function gracefulShutdown(signal) {
if (isShuttingDown) return;
isShuttingDown = true;
console.log("Worker " + process.pid + " received " + signal + ". Shutting down gracefully...");
// Stop accepting new connections
server.close(function () {
console.log("Worker " + process.pid + " closed all connections");
// Close database pools, Redis connections, etc.
// redisClient.quit();
// pool.end();
process.exit(0);
});
// Force shutdown after timeout
setTimeout(function () {
console.error("Worker " + process.pid + " could not close connections in time. Forcing shutdown...");
process.exit(1);
}, 10000);
}
process.on("SIGTERM", function () { gracefulShutdown("SIGTERM"); });
process.on("SIGINT", function () { gracefulShutdown("SIGINT"); });
Docker Considerations
When running clustered Node.js in Docker, keep these points in mind:
FROM node:20-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci --only=production
COPY . .
# Use dumb-init or tini to handle signals properly
RUN apk add --no-cache tini
ENTRYPOINT ["/sbin/tini", "--"]
# Node.js does not see container CPU limits, so set the worker count via env,
# and size --max-old-space-size to fit the container memory limit
CMD ["node", "--max-old-space-size=384", "server.js"]
EXPOSE 3000
Use tini or dumb-init as PID 1 in your container. Without it, Node.js runs as PID 1, where the kernel applies no default signal handlers: a SIGTERM your code does not explicitly handle is simply ignored, and exited child processes are never reaped.
Set --max-old-space-size to match your container's memory limit, minus overhead for the OS and non-heap memory. For a 512MB container, 384MB for the V8 heap is a reasonable starting point.
Be aware that os.cpus().length inside a Docker container reports the host's CPU count, not the container's cgroup CPU limit. On a 64-core host with a 2-CPU container limit, you will fork 64 workers and immediately OOM. Use the WORKERS environment variable pattern instead:
var numWorkers = parseInt(process.env.WORKERS) || os.cpus().length;
# docker-compose.yml
services:
api:
build: .
environment:
- WORKERS=2
deploy:
resources:
limits:
cpus: "2"
memory: 1G
Complete Working Example
Here is a production-ready clustered Express.js application that combines everything covered above: graceful shutdown, worker health monitoring, zero-downtime restarts, and Redis-backed sessions.
package.json:
{
"name": "clustered-express-app",
"version": "1.0.0",
"scripts": {
"start": "node server.js",
"benchmark": "autocannon -c 200 -d 10 -p 10 http://localhost:3000/"
},
"dependencies": {
"connect-redis": "^7.1.0",
"express": "^4.18.2",
"express-session": "^1.17.3",
"ioredis": "^5.3.2"
},
"devDependencies": {
"autocannon": "^7.14.0"
}
}
server.js:
var cluster = require("cluster");
var os = require("os");
var NUM_WORKERS = parseInt(process.env.WORKERS) || os.cpus().length;
var PORT = process.env.PORT || 3000;
var WORKER_TIMEOUT_MS = 30000;
var MAX_MEMORY_MB = parseInt(process.env.MAX_WORKER_MEMORY_MB) || 512;
if (cluster.isMaster) {
masterProcess();
} else {
workerProcess();
}
// ─── Master Process ─────────────────────────────────────────
function masterProcess() {
console.log("=== Cluster Master ===");
console.log("PID: " + process.pid);
console.log("Workers: " + NUM_WORKERS);
console.log("Port: " + PORT);
console.log("Max worker memory: " + MAX_MEMORY_MB + "MB");
console.log("");
var workerHealth = {};
// Fork workers
for (var i = 0; i < NUM_WORKERS; i++) {
spawnWorker();
}
function spawnWorker() {
var worker = cluster.fork();
workerHealth[worker.id] = {
pid: worker.process.pid,
lastHeartbeat: Date.now(),
memory: 0,
eventLoopLag: 0,
requestsHandled: 0
};
worker.on("message", function (msg) {
if (msg.type === "heartbeat") {
workerHealth[worker.id] = {
pid: worker.process.pid,
lastHeartbeat: Date.now(),
memory: msg.memory,
eventLoopLag: msg.eventLoopLag,
requestsHandled: msg.requestsHandled
};
}
});
return worker;
}
// Health monitoring
setInterval(function () {
var now = Date.now();
var totalMemory = 0;
var totalRequests = 0;
var healthyWorkers = 0;
Object.keys(cluster.workers).forEach(function (id) {
var worker = cluster.workers[id];
var health = workerHealth[id];
if (!health) return;
// Heartbeat check
if (now - health.lastHeartbeat > WORKER_TIMEOUT_MS) {
console.error("[HEALTH] Worker " + health.pid + " unresponsive for " +
Math.round((now - health.lastHeartbeat) / 1000) + "s. Killing.");
worker.kill("SIGKILL");
delete workerHealth[id];
return;
}
// Memory check
var memMB = Math.round(health.memory / 1024 / 1024);
if (memMB > MAX_MEMORY_MB) {
console.error("[HEALTH] Worker " + health.pid + " using " + memMB + "MB (limit: " + MAX_MEMORY_MB + "MB). Restarting.");
worker.disconnect();
setTimeout(function () {
if (!worker.isDead()) worker.kill("SIGKILL");
}, 5000);
delete workerHealth[id];
// disconnect() marks this as a graceful exit, so the exit handler below will not respawn it
spawnWorker();
return;
}
totalMemory += health.memory;
totalRequests += health.requestsHandled;
healthyWorkers++;
});
console.log("[CLUSTER] " + healthyWorkers + " healthy workers | " +
Math.round(totalMemory / 1024 / 1024) + "MB total RSS | " +
totalRequests + " total requests handled");
}, 15000);
// Replace crashed workers. Workers that exit after disconnect() are already being
// replaced elsewhere (the memory check above and the SIGUSR2 handler below).
cluster.on("exit", function (worker, code, signal) {
delete workerHealth[worker.id];
if (worker.exitedAfterDisconnect) {
console.log("[CLUSTER] Worker " + worker.process.pid + " disconnected gracefully.");
return;
}
console.error("[CLUSTER] Worker " + worker.process.pid + " crashed (code: " + code + ", signal: " + signal + "). Spawning replacement.");
spawnWorker();
});
// Zero-downtime restart on SIGUSR2
process.on("SIGUSR2", function () {
console.log("[CLUSTER] Zero-downtime restart initiated");
var workerIds = Object.keys(cluster.workers);
var idx = 0;
function restartNext() {
if (idx >= workerIds.length) {
console.log("[CLUSTER] All workers restarted successfully");
return;
}
var id = workerIds[idx];
var oldWorker = cluster.workers[id];
if (!oldWorker) {
idx++;
restartNext();
return;
}
var oldPid = oldWorker.process.pid;
console.log("[CLUSTER] Replacing worker " + oldPid);
var newWorker = spawnWorker();
newWorker.on("listening", function () {
console.log("[CLUSTER] New worker " + newWorker.process.pid + " online. Disconnecting " + oldPid);
oldWorker.disconnect();
var forceKillTimer = setTimeout(function () {
if (!oldWorker.isDead()) {
console.warn("[CLUSTER] Force killing worker " + oldPid);
oldWorker.kill("SIGKILL");
}
}, 10000);
oldWorker.on("exit", function () {
clearTimeout(forceKillTimer);
idx++;
restartNext();
});
});
}
restartNext();
});
// Graceful master shutdown
process.on("SIGTERM", function () {
console.log("[CLUSTER] SIGTERM received. Shutting down all workers...");
Object.keys(cluster.workers).forEach(function (id) {
cluster.workers[id].send({ type: "shutdown" });
cluster.workers[id].disconnect();
});
setTimeout(function () {
console.log("[CLUSTER] Forcing exit");
process.exit(0);
}, 15000);
});
}
// ─── Worker Process ─────────────────────────────────────────
function workerProcess() {
var createApp = require("./app");
var app = createApp();
var requestsHandled = 0;
var isShuttingDown = false;
var server = app.listen(PORT, function () {
console.log("[WORKER " + process.pid + "] Listening on port " + PORT);
});
// Track requests for health reporting. createApp() has already registered its routes,
// so count on the HTTP server itself; middleware appended after the routes would never run.
server.on("request", function () {
requestsHandled++;
});
// Heartbeat
setInterval(function () {
if (isShuttingDown) return;
var lagStart = Date.now();
setImmediate(function () {
try {
process.send({
type: "heartbeat",
memory: process.memoryUsage().rss,
eventLoopLag: Date.now() - lagStart,
requestsHandled: requestsHandled
});
} catch (err) {
// IPC channel closed
}
});
}, 10000);
// Shutdown handler
function shutdown(reason) {
if (isShuttingDown) return;
isShuttingDown = true;
console.log("[WORKER " + process.pid + "] Shutting down: " + reason);
server.close(function () {
console.log("[WORKER " + process.pid + "] All connections drained");
process.exit(0);
});
setTimeout(function () {
console.error("[WORKER " + process.pid + "] Forced shutdown after timeout");
process.exit(1);
}, 10000);
}
process.on("SIGTERM", function () { shutdown("SIGTERM"); });
process.on("SIGINT", function () { shutdown("SIGINT"); });
process.on("message", function (msg) {
if (msg.type === "shutdown") {
shutdown("master-requested");
}
});
}
app.js:
var express = require("express");
var session = require("express-session");
var RedisStore = require("connect-redis").default;
var Redis = require("ioredis");
function createApp() {
var app = express();
// Redis session store (shared across all workers)
var redisClient = new Redis({
host: process.env.REDIS_HOST || "127.0.0.1",
port: parseInt(process.env.REDIS_PORT) || 6379,
retryDelayOnFailover: 100,
maxRetriesPerRequest: 3,
lazyConnect: true
});
redisClient.on("error", function (err) {
console.error("[WORKER " + process.pid + "] Redis error: " + err.message);
});
redisClient.connect().catch(function (err) {
console.warn("[WORKER " + process.pid + "] Redis not available. Sessions will use memory store.");
});
var sessionConfig = {
secret: process.env.SESSION_SECRET || "change-me-in-production",
resave: false,
saveUninitialized: false,
cookie: {
secure: process.env.NODE_ENV === "production",
httpOnly: true,
maxAge: 1000 * 60 * 60 * 24
}
};
// Only use Redis store if Redis is available
if (redisClient.status === "ready" || redisClient.status === "connecting") {
sessionConfig.store = new RedisStore({ client: redisClient });
}
app.use(session(sessionConfig));
app.use(express.json());
// Routes
app.get("/", function (req, res) {
res.json({
message: "Clustered Express.js Server",
worker: process.pid,
uptime: Math.round(process.uptime()) + "s",
memory: Math.round(process.memoryUsage().rss / 1024 / 1024) + "MB"
});
});
app.get("/health", function (req, res) {
res.json({
status: "ok",
worker: process.pid,
uptime: process.uptime(),
memory: process.memoryUsage()
});
});
app.post("/login", function (req, res) {
req.session.user = { id: "user_" + Date.now(), role: "admin" };
res.json({ message: "Logged in", worker: process.pid, session: req.session.id });
});
app.get("/session", function (req, res) {
res.json({
worker: process.pid,
sessionId: req.session.id,
user: req.session.user || null
});
});
app.get("/heavy", function (req, res) {
var n = parseInt(req.query.n) || 1000000;
var sum = 0;
for (var i = 0; i < n; i++) {
sum += Math.sqrt(i);
}
res.json({ result: sum, worker: process.pid });
});
return app;
}
module.exports = createApp;
Run it:
# Start the clustered server
WORKERS=4 node server.js
# In another terminal, load test it
autocannon -c 200 -d 10 http://localhost:3000/
# Test session persistence across workers
curl -c cookies.txt http://localhost:3000/login -X POST
curl -b cookies.txt http://localhost:3000/session
curl -b cookies.txt http://localhost:3000/session
curl -b cookies.txt http://localhost:3000/session
# Each request may hit a different worker, but session data is consistent
# Trigger zero-downtime restart (Linux/macOS)
kill -SIGUSR2 $(pgrep -f "node server.js" | head -1)
Common Issues and Troubleshooting
1. Port Already in Use After Worker Crash
Error: listen EADDRINUSE: address already in use :::3000
at Server.setupListenHandle [as _listen2] (net.js:1334:16)
This happens when a crashed worker did not release its socket. The master still has the listening socket, but the replacement worker tries to bind independently. Fix: make sure workers use the shared master socket (the default cluster behavior). If you see this, you have likely called server.listen(PORT) in both master and worker code paths.
2. Worker Crash Loop (Exit Code 12)
[CLUSTER] Worker 15234 crashed (code: 12, signal: null). Spawning replacement.
[CLUSTER] Worker 15240 crashed (code: 12, signal: null). Spawning replacement.
Exit code 12 is Node's "invalid debug argument" code: it appears when --inspect or --inspect-brk is set but the chosen inspector port is invalid or unavailable, which is easy to hit when forking many workers with debugging enabled. Whatever the underlying cause, a worker that dies immediately on startup creates a crash loop where the master endlessly respawns workers that immediately die. Add backoff logic:
var recentCrashes = [];
var MAX_RESTARTS = 5;
var RESTART_WINDOW = 60000;
cluster.on("exit", function (worker, code, signal) {
if (worker.exitedAfterDisconnect) return;
var now = Date.now();
// Track crash timestamps in one rolling list. Keying the counter by PID never
// accumulates, because every replacement worker gets a brand-new PID.
recentCrashes.push(now);
recentCrashes = recentCrashes.filter(function (t) {
return now - t < RESTART_WINDOW;
});
if (recentCrashes.length >= MAX_RESTARTS) {
console.error("[CLUSTER] Worker crash loop detected. " + MAX_RESTARTS + " crashes in " + (RESTART_WINDOW / 1000) + "s. Not restarting.");
return;
}
cluster.fork();
});
3. SIGTERM Not Reaching Workers in Docker
npm ERR! code ELIFECYCLE
npm ERR! errno 137
Exit code 137 means the process was killed by SIGKILL (128 + 9). This happens when Docker sends SIGTERM, the signal never reaches your cluster master (npm or a shell wrapper swallows it), and after the 10-second grace period Docker sends SIGKILL. Fix: use tini as PID 1 and run node directly, and make sure the master forwards shutdown signals to its workers.
# Bad - npm swallows signals
CMD ["npm", "start"]
# Good - tini forwards signals correctly
ENTRYPOINT ["/sbin/tini", "--"]
CMD ["node", "server.js"]
4. Sessions Lost After Restart
GET /profile 401 Unauthorized
# User was logged in, but session disappeared after worker restart
This happens when you use the default in-memory session store with clustering. Each worker has its own MemoryStore, so a session created on worker A does not exist on worker B. Even worse, when a worker restarts, all its sessions are gone. The fix is always Redis-backed sessions in clustered environments, as shown in the complete example above.
5. os.cpus().length Returns Host CPU Count in Containers
[CLUSTER] Starting 64 workers...
FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory
Inside a Docker container limited to 2 CPUs, os.cpus().length still returns the host machine's CPU count. Spawning 64 workers in a container with 1GB of memory exhausts it immediately. Always use an environment variable to control worker count in containerized deployments.
Best Practices
- Always set an explicit worker count in containers. Never rely on os.cpus().length inside Docker or Kubernetes. Use an environment variable like WORKERS=2 and fall back to os.cpus().length only for bare-metal deployments.
- Externalize all shared state to Redis or a database. Do not use in-memory caches, in-memory sessions, or module-level variables for state that must be shared across workers. Each worker is a separate process with its own memory space. In-memory state will be inconsistent across workers and lost on worker restart.
- Implement graceful shutdown with a timeout. Workers must stop accepting new connections, drain in-flight requests, close database pools, and then exit. Always set a hard timeout (10-15 seconds) after which the worker force-exits, or orchestrators will SIGKILL you.
- Add crash loop detection. A worker that crashes on startup will be restarted by the master, crash again, and create an infinite fork-crash loop that consumes all system resources. Track restart frequency and stop respawning after a threshold (e.g., 5 crashes in 60 seconds).
- Use worker threads for CPU-bound tasks, not more cluster workers. If your workload mixes I/O (database queries, API calls) with CPU-bound operations (image processing, encryption, heavy computation), use the cluster module for I/O scaling and worker threads for offloading CPU tasks. Over-provisioning cluster workers beyond your core count causes context switching overhead that hurts performance.
- Monitor worker health actively, not just process status. A worker process can be alive but unhealthy: stuck in a long synchronous operation, leaking memory, or experiencing event loop starvation. Send periodic heartbeats from workers to the master and kill unresponsive workers after a timeout.
- Set --max-old-space-size per worker. Each worker gets its own V8 heap. On a machine with 4GB of RAM and 4 workers, setting --max-old-space-size=768 per worker gives each worker 768MB of heap and leaves headroom for the OS and non-heap memory.
- Prefer single-process-per-container in Kubernetes. Kubernetes handles horizontal scaling, restarts, and health checks. Running a cluster inside a container makes resource limits unpredictable and duplicates orchestration logic. If you need more throughput, increase the replica count in your Deployment, not the worker count inside the container.
- Test with realistic load before deploying cluster changes. Use autocannon or wrk to benchmark before and after clustering. Measure latency percentiles (p99, p95), not just average throughput. A misconfigured cluster can actually perform worse than a single process due to IPC overhead and memory pressure.
