Stop PM2 Crash Loops: The Unstable Restarts Fix That Actually Works

You deploy your Node.js app, run pm2 start, and within seconds your terminal fills with this:

[PM2] Script app.js had too many unstable restarts (16). Stopped. "errored"

PM2 just gave up on your application. It tried to restart it, watched it crash repeatedly, and decided to stop trying rather than hammer your server in an infinite loop. This is PM2 doing exactly what it should do. The problem is not PM2. The problem is that your application is dying faster than PM2 considers "stable."

This article breaks down the exact mechanism behind unstable restarts, walks through a systematic diagnosis workflow, and shows you how to configure PM2 to handle crash scenarios intelligently.

What "Unstable Restart" Actually Means

PM2 tracks two numbers for every managed process: how long it has been running since its last start, and how many times it has restarted in a row without staying up. An "unstable restart" happens when your process crashes before reaching the min_uptime threshold.

Here is the logic:

  1. PM2 starts your process
  2. Your process crashes (exits with any code)
  3. PM2 checks: did the process run for at least min_uptime milliseconds?
  4. If no, PM2 increments the unstable restart counter
  5. If yes, PM2 resets the counter to zero (the process was "stable" before it died)
  6. If the unstable restart counter hits max_restarts, PM2 stops the process and marks it errored
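The decision procedure above can be sketched in a few lines of JavaScript. This is a simplified model of PM2's bookkeeping for illustration, not its actual source:

```javascript
// Simplified model of PM2's unstable-restart accounting.
// minUptime and maxRestarts mirror the min_uptime / max_restarts options.
function simulateRestarts(uptimesMs, minUptime = 1000, maxRestarts = 16) {
  let unstable = 0;
  for (const uptime of uptimesMs) {
    if (uptime < minUptime) {
      unstable++;                 // crashed too fast: count it
      if (unstable >= maxRestarts) return "errored";
    } else {
      unstable = 0;               // ran long enough: counter resets
    }
  }
  return "online";
}

// 16 crashes in a row, each under 1 second: PM2 gives up
console.log(simulateRestarts(new Array(16).fill(500)));    // "errored"
// Crashing every 2 seconds never trips the counter
console.log(simulateRestarts(new Array(100).fill(2000)));  // "online"
```

The second call is the counterintuitive case: a process that crashes constantly but slowly never reaches the errored state.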

The defaults tell the story:

min_uptime: 1000    // 1 second
max_restarts: 16    // 16 consecutive unstable restarts

With these defaults, your process has to crash 16 times in a row, each time within 1 second of starting, before PM2 gives up. If your app runs for 2 seconds before crashing, the counter resets every time and PM2 will restart it forever. The "unstable" designation specifically targets processes that cannot even start successfully.

Reading the Error Message

The error message contains useful information if you know how to parse it:

[PM2] Script app.js had too many unstable restarts (16). Stopped. "errored"

The number in parentheses is your max_restarts value (default 16). The "errored" status means PM2 has given up and will not attempt further restarts until you intervene manually.

Check the current state of your processes:

pm2 list
┌────┬──────────┬─────────┬─────────┬──────────┬────────┬──────┬───────────┐
│ id │ name     │ mode    │ pid     │ uptime   │ ↺      │ cpu  │ status    │
├────┼──────────┼─────────┼─────────┼──────────┼────────┼──────┼───────────┤
│ 0  │ my-api   │ cluster │ 0       │ 0        │ 16     │ 0%   │ errored   │
└────┴──────────┴─────────┴─────────┴──────────┴────────┴──────┴───────────┘

The ↺ column shows the restart count. The status column reads errored. The pid is 0 because no process is running. The uptime is 0 because PM2 stopped trying.

Step-by-Step Diagnosis

When you hit "too many unstable restarts," resist the urge to adjust PM2 configuration first. The configuration is not the problem. Your application is crashing on startup, and you need to find out why.

Step 1: Check the Error Logs

pm2 logs my-api --lines 100 --err

This shows the last 100 lines of stderr output. Most startup crashes leave a stack trace here. Common things you will see:

Error: Cannot find module './config/database'
Error: listen EADDRINUSE: address already in use :::3000
TypeError: Cannot read properties of undefined (reading 'DB_HOST')

If pm2 logs shows nothing useful, your app might be crashing before it writes any output. Try running it directly:

node app.js

If it crashes with node directly, the problem is your code or environment, not PM2.

Step 2: Check Exit Codes

pm2 describe my-api

Look for the exit code field in the output. Common exit codes and what they mean:

Exit Code   Meaning
0           Clean exit (script finished, not a crash)
1           Uncaught exception or general error
2           Misuse of shell command
8           Uncaught exception (legacy Node.js)
9           Invalid argument
137         Killed by SIGKILL (OOM killer or kill -9)
139         Segmentation fault

Exit code 0 is a special case. Your script ran to completion and exited normally. PM2 sees this as a crash because it expects long-running processes. If your script is not a server (it is a one-time job, a migration, or a build step), PM2 will restart it every time it finishes, burning through max_restarts rapidly. The fix for this case is autorestart: false in your ecosystem file, or use --no-autorestart on the command line.

Step 3: Check Environment Variables

Missing environment variables are the single most common cause of startup crashes that produce confusing error messages. Your app works when you run node app.js because your shell has the variables loaded. PM2 does not inherit your shell environment the same way, especially when started via a startup script.

pm2 env 0

This dumps every environment variable PM2 is passing to process ID 0. Look for missing DATABASE_URL, PORT, NODE_ENV, API_KEY, or whatever your app requires.

If variables are missing, define them in your ecosystem file:

module.exports = {
  apps: [{
    name: "my-api",
    script: "./app.js",
    env_production: {
      NODE_ENV: "production",
      PORT: 8080,
      DATABASE_URL: "postgresql://user:pass@localhost:5432/mydb"
    }
  }]
};

Then start with the correct environment:

pm2 delete my-api
pm2 start ecosystem.config.js --env production

Step 4: Check for Port Conflicts

If your app binds to a port and another process is already using it, the app crashes instantly on startup:

lsof -i :3000

If something is already bound to your port, either stop that process or change your app's port. A common trap: a previous PM2-managed instance is still running on the same port after a failed deploy. Run pm2 delete all and start fresh.

Step 5: Check for Missing Dependencies

A missing node_modules directory or a failed npm install produces instant crashes:

ls node_modules/.package-lock.json

If node_modules is missing or incomplete:

rm -rf node_modules package-lock.json
npm install

Then try starting again.

Step 6: Check Working Directory

PM2 resolves the script path relative to where you run the command, not where the script lives. If you start PM2 from /home/deploy but your app lives in /home/deploy/my-api, PM2 may not find your script or your app may not find its config files.

Set cwd explicitly in your ecosystem file:

{
  name: "my-api",
  script: "./app.js",
  cwd: "/home/deploy/my-api"
}

The Six Most Common Root Causes

After debugging dozens of "too many unstable restarts" incidents on production servers, these are the causes I see over and over, ranked by frequency.

1. Missing or Wrong Environment Variables

Your app expects DATABASE_URL or REDIS_URL and the variable is not set in PM2's environment. The app throws on the first database query (or on startup if you validate config eagerly) and exits. This is especially common when using pm2 startup and pm2 save, because the startup script runs in a clean environment without your .bashrc or .env file loaded.

Fix: Define all required variables in env_production inside your ecosystem file. Never rely on shell-level .env files for PM2 processes.

2. Port Already in Use

Another process (or a zombie from a previous deploy) holds the port. Your app calls server.listen(), gets EADDRINUSE, and dies.

Fix: lsof -i :PORT to find the culprit. Kill it or change your port.

3. Script Exits Cleanly (Exit Code 0)

Your script is not a long-running server. It runs, does its work, exits with code 0, and PM2 restarts it because PM2 expects processes to stay alive. If the script finishes within min_uptime (1 second by default), each clean exit counts as an unstable restart.

Fix: Use autorestart: false or --no-autorestart for scripts that are supposed to exit.

4. Module Not Found or Syntax Error

A missing dependency, a typo in a require() path, or a syntax error in your code. The process starts, hits the bad line, crashes. Node.js loads modules synchronously at startup, so this happens within milliseconds.

Fix: Run node app.js directly to see the full error. Fix the import, install the missing module, or correct the syntax.

5. Database Connection Failure on Startup

Your app tries to connect to a database during initialization, the connection fails (wrong host, wrong credentials, database not running), and the app exits. If you have eager connection validation, this crashes the process before it even starts listening.

Fix: Make sure your database is reachable. Consider making the database connection non-blocking so the app can start and retry the connection, rather than crashing.
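A retry wrapper is one way to make startup resilient to a briefly unavailable database. In this sketch, connectFn stands in for whatever your client exposes (pg's client.connect(), mongoose.connect(), and so on); the retry counts and delays are assumptions to tune:

```javascript
// Retry an async connect on startup instead of crashing immediately.
// connectFn: placeholder for your real database client's connect call.
async function connectWithRetry(connectFn, { retries = 5, delayMs = 1000 } = {}) {
  for (let attempt = 1; attempt <= retries; attempt++) {
    try {
      return await connectFn();
    } catch (err) {
      if (attempt === retries) throw err;  // out of attempts: let PM2 take over
      console.error("DB connect failed (attempt " + attempt +
                    "), retrying in " + delayMs + "ms");
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```

If all retries fail, the error still propagates and the process exits, so genuinely broken credentials or hosts are not silently hidden.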

6. File Permission Issues

PM2 cannot write to log files, the script is not executable, or the app cannot read config files. This is common on servers where the deploy user and the PM2 startup user are different.

Fix: Check ownership and permissions on your app directory, log directory, and any config files. ls -la is your friend.

Configuring PM2 to Handle Crashes Better

Once you have fixed the root cause, configure PM2 to be smarter about restart behavior. These ecosystem settings make a real difference in production.

Tune min_uptime and max_restarts

{
  name: "my-api",
  script: "./app.js",
  min_uptime: "10s",
  max_restarts: 10
}

Setting min_uptime to 10 seconds means PM2 only counts a restart as "unstable" if the process dies within the first 10 seconds. If your app runs for 30 seconds before crashing, PM2 will restart it indefinitely (which is usually what you want for transient errors). Setting max_restarts to 10 reduces the crash loop window from 16 attempts to 10.

Add a Restart Delay

{
  restart_delay: 4000
}

This puts a 4-second pause between restarts. Without it, PM2 restarts your app immediately after each crash, which can hammer a database server or external API that is already under load. The delay gives downstream services time to recover.

Use Exponential Backoff

{
  exp_backoff_restart_delay: 100
}

This is the smarter version of restart_delay. Instead of a fixed delay, PM2 increases the wait time exponentially: 100ms, 150ms, 225ms, and so on, up to a maximum of 15 seconds. When the app finally stays up for more than 30 seconds, the delay resets to zero.

This is the best option for applications that crash due to external dependency failures (database down, API rate limited, DNS timeout). The backoff reduces pressure on the failing service while still attempting recovery.

Start it from the command line:

pm2 start app.js --exp-backoff-restart-delay=100

Or in the ecosystem file:

module.exports = {
  apps: [{
    name: "my-api",
    script: "./app.js",
    instances: "max",
    exec_mode: "cluster",
    exp_backoff_restart_delay: 100,
    max_memory_restart: "500M",
    min_uptime: "10s",
    max_restarts: 10,
    kill_timeout: 5000,
    listen_timeout: 8000
  }]
};
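To see why the backoff is gentle at first and patient later, here is the delay sequence implied by the numbers above: a 1.5x growth factor capped at 15 seconds, matching the 100 → 150 → 225 progression. This models the documented behavior, not PM2's internals:

```javascript
// Model of the exponential backoff sequence: each delay is 1.5x the
// previous one, starting from the configured value, capped at 15000ms.
function backoffDelays(initialMs, restarts, capMs = 15000) {
  const delays = [];
  let delay = initialMs;
  for (let i = 0; i < restarts; i++) {
    delays.push(Math.min(Math.round(delay), capMs));
    delay *= 1.5;
  }
  return delays;
}

console.log(backoffDelays(100, 6));
// [ 100, 150, 225, 338, 506, 759 ]
```

After roughly a dozen consecutive crashes the delay pins at the 15-second ceiling, so a persistently failing app retries every 15 seconds rather than hammering its dependencies.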

Set a Memory Ceiling

{
  max_memory_restart: "500M"
}

This is not directly related to unstable restarts, but it prevents a different kind of crash loop. Memory leaks cause your process to grow until the OS kills it with SIGKILL (exit code 137). PM2's memory restart triggers a graceful restart before the OS gets involved.

Recovering from the "errored" State

Once PM2 marks your process as errored, you have to explicitly tell it to try again. PM2 will not auto-recover from this state.

Option 1: Delete and restart (cleanest approach):

pm2 delete my-api
pm2 start ecosystem.config.js --env production

Option 2: Restart the errored process:

pm2 restart my-api

This resets the unstable restart counter and tries again. If the underlying problem is not fixed, you will hit the same wall.

Option 3: Reset the restart counter without deleting:

pm2 reset my-api
pm2 restart my-api

The reset command zeros out all counters (restarts, uptime) for the process.

After recovery, verify:

pm2 list
pm2 logs my-api --lines 20

Make sure the status shows online and the restart counter is not climbing.

Preventing Unstable Restarts in Your Application Code

The best defense against crash loops is an application that handles startup failures gracefully.

Validate Configuration Before Starting

var requiredVars = ["DATABASE_URL", "PORT", "SESSION_SECRET"];
var missing = requiredVars.filter(function(v) {
  return !process.env[v];
});

if (missing.length > 0) {
  console.error("Missing required environment variables: " + missing.join(", "));
  console.error("Set them in ecosystem.config.js env_production block");
  process.exit(1);
}

This produces a clear error message instead of a cryptic crash trace three layers deep in a database driver.

Handle the Graceful Shutdown Signal

var server = app.listen(PORT, function() {
  console.log("Worker " + process.pid + " listening on port " + PORT);

  // Tell PM2 this process is ready
  if (process.send) {
    process.send("ready");
  }
});

process.on("SIGINT", function() {
  console.log("Worker " + process.pid + " shutting down...");

  server.close(function() {
    console.log("Worker " + process.pid + " closed all connections");
    process.exit(0);
  });

  // Force shutdown after timeout
  setTimeout(function() {
    console.error("Worker " + process.pid + " forced exit after timeout");
    process.exit(1);
  }, 10000);
});

Combined with wait_ready: true and listen_timeout: 10000 in your ecosystem file, this ensures PM2 does not route traffic to a process that has not finished initializing, and does not kill a process that is in the middle of draining connections.

Catch Unhandled Rejections

Unhandled promise rejections terminate the process by default since Node.js 15. Catch them to at least get a useful log entry before the crash:

process.on("unhandledRejection", function(reason, promise) {
  console.error("Unhandled Rejection at:", promise, "reason:", reason);
  // Let the process crash — PM2 will restart it
  // But now you have a log entry to debug
  process.exit(1);
});
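The synchronous counterpart is uncaughtException. Registering both handlers gives you a log line for either failure mode before PM2 restarts the process; the log format here is illustrative:

```javascript
process.on("uncaughtException", function(err) {
  console.error("Uncaught Exception:", err.stack || err);
  // Exit deliberately: process state is unknown after an uncaught throw,
  // and PM2 will start a fresh instance.
  process.exit(1);
});
```

Do not swallow the error and continue; after an uncaught exception the safe move is to log, exit, and let PM2's restart logic do its job.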

Quick Reference: Ecosystem Config for Crash Resilience

Here is a production ecosystem file with every restart-related setting annotated:

module.exports = {
  apps: [{
    name: "my-api",
    script: "./app.js",
    cwd: "/home/deploy/my-api",

    // Cluster across all cores
    instances: "max",
    exec_mode: "cluster",

    // Restart behavior
    min_uptime: "10s",           // Process must run 10s to count as "stable"
    max_restarts: 10,            // Give up after 10 unstable restarts
    restart_delay: 4000,         // Wait 4s between restarts
    exp_backoff_restart_delay: 100, // Or use exponential backoff (overrides restart_delay)
    autorestart: true,           // Set to false for one-shot scripts

    // Graceful shutdown
    kill_timeout: 15000,         // 15s to shut down before SIGKILL
    listen_timeout: 10000,       // 10s to start before considered failed
    wait_ready: true,            // Wait for process.send('ready') signal
    shutdown_with_message: false,

    // Memory safety net
    max_memory_restart: "500M",
    node_args: "--max-old-space-size=512",

    // Logging
    error_file: "./logs/error.log",
    out_file: "./logs/out.log",
    merge_logs: true,
    log_date_format: "YYYY-MM-DD HH:mm:ss.SSS",

    // Environment
    env_production: {
      NODE_ENV: "production",
      PORT: 8080
    }
  }]
};

Start it:

pm2 delete my-api
pm2 start ecosystem.config.js --env production
pm2 save

When Unstable Restarts Are Actually Correct Behavior

Not every "too many unstable restarts" is a bug. PM2 is doing the right thing in these scenarios:

Your script is a one-time job. Migration scripts, build scripts, and cron-like tasks exit after completing. Use autorestart: false or run them with pm2 start script.js --no-autorestart.

Your server is out of resources. If the OS is killing your process with SIGKILL because it ran out of memory or hit a process limit, PM2 restarts will keep failing. Fix the resource constraint first.

An upstream service is down. If your app requires a database connection on startup and the database is unreachable, every restart attempt will fail. The exponential backoff strategy handles this gracefully. The fixed restart_delay approach also works but is less adaptive.

The goal is never to hide crashes. The goal is to distinguish between transient failures that PM2 should retry and fundamental startup errors that require human intervention. min_uptime, max_restarts, and exp_backoff_restart_delay are the three knobs that draw that line.