Stop PM2 Crash Loops: The Unstable Restarts Fix That Actually Works
You deploy your Node.js app, run pm2 start, and within seconds your terminal fills with this:
[PM2] Script app.js had too many unstable restarts (16). Stopped. "errored"
PM2 just gave up on your application. It tried to restart it, watched it crash repeatedly, and decided to stop trying rather than hammer your server in an infinite loop. This is PM2 doing exactly what it should do. The problem is not PM2. The problem is that your application is dying faster than PM2 considers "stable."
This article breaks down the exact mechanism behind unstable restarts, walks through a systematic diagnosis workflow, and shows you how to configure PM2 to handle crash scenarios intelligently.
What "Unstable Restart" Actually Means
PM2 tracks two numbers for every managed process: how long it has been running since its last start, and how many times it has restarted in a row without staying up. An "unstable restart" happens when your process crashes before reaching the min_uptime threshold.
Here is the logic:
- PM2 starts your process
- Your process crashes (exits with any code)
- PM2 checks: did the process run for at least min_uptime milliseconds?
- If no, PM2 increments the unstable restart counter
- If yes, PM2 resets the counter to zero (the process was "stable" before it died)
- If the unstable restart counter hits max_restarts, PM2 stops the process and marks it errored
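The bookkeeping above can be sketched in a few lines of JavaScript (a simplified model for illustration, not PM2's actual source):

```javascript
// Simplified model of PM2's unstable-restart accounting (illustration only,
// not PM2's actual source code).
function makeRestartTracker(minUptimeMs, maxRestarts) {
  var unstableRestarts = 0;
  return {
    // Called each time the process exits; uptimeMs is how long it ran.
    onExit: function (uptimeMs) {
      if (uptimeMs >= minUptimeMs) {
        unstableRestarts = 0; // stable run: counter resets
        return "restart";
      }
      unstableRestarts += 1; // crashed before min_uptime
      return unstableRestarts >= maxRestarts ? "errored" : "restart";
    },
    count: function () { return unstableRestarts; }
  };
}

// With the defaults (min_uptime 1000ms, max_restarts 16):
var tracker = makeRestartTracker(1000, 16);
for (var i = 0; i < 15; i++) tracker.onExit(200); // 15 fast crashes
console.log(tracker.count());      // 15 — one more fast crash would error
console.log(tracker.onExit(2000)); // ran 2s: counter resets, returns "restart"
```

Note how a single run longer than min_uptime wipes out the counter entirely, which is why an app that crashes every 2 seconds loops forever under the defaults.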
The defaults tell the story:
min_uptime: 1000 // 1 second
max_restarts: 16 // 16 consecutive unstable restarts
With these defaults, your process has to crash 16 times in a row, each time within 1 second of starting, before PM2 gives up. If your app runs for 2 seconds before crashing, the counter resets every time and PM2 will restart it forever. The "unstable" designation specifically targets processes that cannot even start successfully.
Reading the Error Message
The error message contains useful information if you know how to parse it:
[PM2] Script app.js had too many unstable restarts (16). Stopped. "errored"
The number in parentheses is your max_restarts value (default 16). The "errored" status means PM2 has given up and will not attempt further restarts until you intervene manually.
Check the current state of your processes:
pm2 list
┌────┬──────────┬─────────┬─────────┬──────────┬────────┬──────┬───────────┐
│ id │ name │ mode │ pid │ uptime │ ↺ │ cpu │ status │
├────┼──────────┼─────────┼─────────┼──────────┼────────┼──────┼───────────┤
│ 0 │ my-api │ cluster │ 0 │ 0 │ 16 │ 0% │ errored │
└────┴──────────┴─────────┴─────────┴──────────┴────────┴──────┴───────────┘
The ↺ column shows the restart count. The status column reads errored. The pid is 0 because no process is running. The uptime is 0 because PM2 stopped trying.
Step-by-Step Diagnosis
When you hit "too many unstable restarts," resist the urge to adjust PM2 configuration first. The configuration is not the problem. Your application is crashing on startup, and you need to find out why.
Step 1: Check the Error Logs
pm2 logs my-api --lines 100 --err
This shows the last 100 lines of stderr output. Most startup crashes leave a stack trace here. Common things you will see:
Error: Cannot find module './config/database'
Error: listen EADDRINUSE: address already in use :::3000
TypeError: Cannot read properties of undefined (reading 'DB_HOST')
If pm2 logs shows nothing useful, your app might be crashing before it writes any output. Try running it directly:
node app.js
If it crashes with node directly, the problem is your code or environment, not PM2.
Step 2: Check Exit Codes
pm2 describe my-api
Look for the exit code field in the output. Common exit codes and what they mean:
| Exit Code | Meaning |
|---|---|
| 0 | Clean exit (script finished, not a crash) |
| 1 | Uncaught exception or general error |
| 2 | Misuse of shell command |
| 8 | Uncaught exception (legacy Node.js) |
| 9 | Invalid argument |
| 137 | Killed by SIGKILL (OOM killer or kill -9) |
| 139 | Segmentation fault |
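The codes above 128 follow the Unix convention of 128 plus the signal number: SIGKILL is signal 9, so a SIGKILL'd process reports 137. You can verify this in any shell:

```shell
# A process killed by SIGKILL (signal 9) exits with 128 + 9 = 137
sh -c 'kill -9 $$' || status=$?
echo "$status"   # prints 137
```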
Exit code 0 is a special case. Your script ran to completion and exited normally. PM2 sees this as a crash because it expects long-running processes. If your script is not a server (it is a one-time job, a migration, or a build step), PM2 will restart it every time it finishes, burning through max_restarts rapidly. The fix for this case is autorestart: false in your ecosystem file, or --no-autorestart on the command line.
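For a one-shot script, the ecosystem entry looks like this (the migrate.js name is just an example):

```javascript
// Ecosystem entry for a script that is supposed to exit (migration, cron-like job)
{
  name: "migrate",
  script: "./migrate.js",
  autorestart: false   // a clean exit stays exited instead of triggering a restart
}
```

Or, on the command line: pm2 start migrate.js --no-autorestart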
Step 3: Check Environment Variables
Missing environment variables are the single most common cause of startup crashes that produce confusing error messages. Your app works when you run node app.js because your shell has the variables loaded. PM2 does not inherit your shell environment the same way, especially when started via a startup script.
pm2 env 0
This dumps every environment variable PM2 is passing to process ID 0. Look for missing DATABASE_URL, PORT, NODE_ENV, API_KEY, or whatever your app requires.
If variables are missing, define them in your ecosystem file:
module.exports = {
apps: [{
name: "my-api",
script: "./app.js",
env_production: {
NODE_ENV: "production",
PORT: 8080,
DATABASE_URL: "postgresql://user:pass@localhost:5432/mydb"
}
}]
};
Then start with the correct environment:
pm2 delete my-api
pm2 start ecosystem.config.js --env production
Step 4: Check for Port Conflicts
If your app binds to a port and another process is already using it, the app crashes instantly on startup:
lsof -i :3000
If something is already bound to your port, either stop that process or change your app's port. A common trap: a previous PM2-managed instance is still running on the same port after a failed deploy. Run pm2 delete all and start fresh.
Step 5: Check for Missing Dependencies
A missing node_modules directory or a failed npm install produces instant crashes:
ls node_modules/.package-lock.json
If node_modules is missing or incomplete:
rm -rf node_modules package-lock.json
npm install
Then try starting again.
Step 6: Check Working Directory
PM2 resolves the script path relative to where you run the command, not where the script lives. If you start PM2 from /home/deploy but your app lives in /home/deploy/my-api, PM2 may not find your script or your app may not find its config files.
Set cwd explicitly in your ecosystem file:
{
name: "my-api",
script: "./app.js",
cwd: "/home/deploy/my-api"
}
The Six Most Common Root Causes
After debugging dozens of "too many unstable restarts" incidents on production servers, these are the causes I see over and over, ranked by frequency.
1. Missing or Wrong Environment Variables
Your app expects DATABASE_URL or REDIS_URL and the variable is not set in PM2's environment. The app throws on the first database query (or on startup if you validate config eagerly) and exits. This is especially common when using pm2 startup and pm2 save, because the startup script runs in a clean environment without your .bashrc or .env file loaded.
Fix: Define all required variables in env_production inside your ecosystem file. Never rely on shell-level .env files for PM2 processes.
2. Port Already in Use
Another process (or a zombie from a previous deploy) holds the port. Your app calls server.listen(), gets EADDRINUSE, and dies.
Fix: lsof -i :PORT to find the culprit. Kill it or change your port.
3. Script Exits Cleanly (Exit Code 0)
Your script is not a long-running server. It runs, does its work, exits with code 0, and PM2 restarts it because PM2 expects processes to stay alive. Each clean exit counts as unstable because it happens in under 1 second.
Fix: Use autorestart: false or --no-autorestart for scripts that are supposed to exit.
4. Module Not Found or Syntax Error
A missing dependency, a typo in a require() path, or a syntax error in your code. The process starts, hits the bad line, crashes. Node.js loads modules synchronously at startup, so this happens within milliseconds.
Fix: Run node app.js directly to see the full error. Fix the import, install the missing module, or correct the syntax.
5. Database Connection Failure on Startup
Your app tries to connect to a database during initialization, the connection fails (wrong host, wrong credentials, database not running), and the app exits. If you have eager connection validation, this crashes the process before it even starts listening.
Fix: Make sure your database is reachable. Consider making the database connection non-blocking so the app can start and retry the connection, rather than crashing.
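One way to make the connection non-blocking is a retry loop with backoff. A sketch, where connectToDb stands in for your driver's connect call and the delay parameters are illustrative:

```javascript
// Sketch: retry the database connection with backoff instead of crashing on startup.
// connectToDb is a stand-in for your driver's connect call (returns a Promise).
function connectWithRetry(connectToDb, opts) {
  opts = opts || {};
  var base = opts.baseDelayMs || 1000;  // first retry after 1s
  var cap = opts.maxDelayMs || 30000;   // never wait longer than 30s
  function attempt(n) {
    return connectToDb().catch(function (err) {
      var delay = Math.min(base * Math.pow(2, n - 1), cap);
      console.error("DB connect failed (attempt " + n + "), retrying in " + delay + "ms: " + err.message);
      return new Promise(function (resolve) {
        setTimeout(function () { resolve(attempt(n + 1)); }, delay);
      });
    });
  }
  return attempt(1);
}
```

With this pattern the app can start its HTTP listener immediately and report itself unhealthy until the connection resolves, instead of exiting and burning through max_restarts.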
6. File Permission Issues
PM2 cannot write to log files, the script is not executable, or the app cannot read config files. This is common on servers where the deploy user and the PM2 startup user are different.
Fix: Check ownership and permissions on your app directory, log directory, and any config files. ls -la is your friend.
Configuring PM2 to Handle Crashes Better
Once you have fixed the root cause, configure PM2 to be smarter about restart behavior. These ecosystem settings make a real difference in production.
Tune min_uptime and max_restarts
{
name: "my-api",
script: "./app.js",
min_uptime: "10s",
max_restarts: 10
}
Setting min_uptime to 10 seconds means PM2 only counts a restart as "unstable" if the process dies within the first 10 seconds. If your app runs for 30 seconds before crashing, PM2 will restart it indefinitely (which is usually what you want for transient errors). Setting max_restarts to 10 reduces the crash loop window from 16 attempts to 10.
Add a Restart Delay
{
restart_delay: 4000
}
This puts a 4-second pause between restarts. Without it, PM2 restarts your app immediately after each crash, which can hammer a database server or external API that is already under load. The delay gives downstream services time to recover.
Use Exponential Backoff
{
exp_backoff_restart_delay: 100
}
This is the smarter version of restart_delay. Instead of a fixed delay, PM2 increases the wait time exponentially: 100ms, 150ms, 225ms, and so on, up to a maximum of 15 seconds. When the app finally stays up for more than 30 seconds, the delay resets to zero.
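As a rough model of that schedule (the 1.5x growth factor matches the sequence above; this is an approximation, not PM2's exact source):

```javascript
// Rough model of the exponential backoff schedule: 1.5x growth, capped at 15s.
function backoffDelays(initialMs, restarts) {
  var delays = [];
  var d = initialMs;
  for (var i = 0; i < restarts; i++) {
    delays.push(Math.min(Math.round(d), 15000));
    d *= 1.5;
  }
  return delays;
}

console.log(backoffDelays(100, 5)); // [ 100, 150, 225, 338, 506 ]
```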
This is the best option for applications that crash due to external dependency failures (database down, API rate limited, DNS timeout). The backoff reduces pressure on the failing service while still attempting recovery.
Start it from the command line:
pm2 start app.js --exp-backoff-restart-delay=100
Or in the ecosystem file:
module.exports = {
apps: [{
name: "my-api",
script: "./app.js",
instances: "max",
exec_mode: "cluster",
exp_backoff_restart_delay: 100,
max_memory_restart: "500M",
min_uptime: "10s",
max_restarts: 10,
kill_timeout: 5000,
listen_timeout: 8000
}]
};
Set a Memory Ceiling
{
max_memory_restart: "500M"
}
This is not directly related to unstable restarts, but it prevents a different kind of crash loop. Memory leaks cause your process to grow until the OS kills it with SIGKILL (exit code 137). PM2's memory restart triggers a graceful restart before the OS gets involved.
Recovering from the "errored" State
Once PM2 marks your process as errored, you have to explicitly tell it to try again. PM2 will not auto-recover from this state.
Option 1: Delete and restart (cleanest approach):
pm2 delete my-api
pm2 start ecosystem.config.js --env production
Option 2: Restart the errored process:
pm2 restart my-api
This resets the unstable restart counter and tries again. If the underlying problem is not fixed, you will hit the same wall.
Option 3: Reset the restart counter without deleting:
pm2 reset my-api
pm2 restart my-api
The reset command zeros out all counters (restarts, uptime) for the process.
After recovery, verify:
pm2 list
pm2 logs my-api --lines 20
Make sure the status shows online and the restart counter is not climbing.
Preventing Unstable Restarts in Your Application Code
The best defense against crash loops is an application that handles startup failures gracefully.
Validate Configuration Before Starting
var requiredVars = ["DATABASE_URL", "PORT", "SESSION_SECRET"];
var missing = requiredVars.filter(function(v) {
return !process.env[v];
});
if (missing.length > 0) {
console.error("Missing required environment variables: " + missing.join(", "));
console.error("Set them in ecosystem.config.js env_production block");
process.exit(1);
}
This produces a clear error message instead of a cryptic crash trace three layers deep in a database driver.
Handle the Graceful Shutdown Signal
var server = app.listen(PORT, function() {
console.log("Worker " + process.pid + " listening on port " + PORT);
// Tell PM2 this process is ready
if (process.send) {
process.send("ready");
}
});
process.on("SIGINT", function() {
console.log("Worker " + process.pid + " shutting down...");
server.close(function() {
console.log("Worker " + process.pid + " closed all connections");
process.exit(0);
});
// Force shutdown after timeout
setTimeout(function() {
console.error("Worker " + process.pid + " forced exit after timeout");
process.exit(1);
}, 10000);
});
Combined with wait_ready: true and listen_timeout: 10000 in your ecosystem file, this ensures PM2 does not route traffic to a process that has not finished initializing, and does not kill a process that is in the middle of draining connections.
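The matching ecosystem settings might look like this (the kill_timeout value is a suggestion, chosen to exceed the handler's 10-second force-exit so draining is never cut short):

```javascript
{
  wait_ready: true,       // wait for process.send('ready') before marking the process online
  listen_timeout: 10000,  // fail the start if 'ready' does not arrive within 10s
  kill_timeout: 12000     // give the SIGINT handler time to drain before SIGKILL
}
```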
Catch Unhandled Rejections
Unhandled promise rejections terminate the process in Node.js v16+. Catch them to at least get a useful log entry before the crash:
process.on("unhandledRejection", function(reason, promise) {
console.error("Unhandled Rejection at:", promise, "reason:", reason);
// Let the process crash — PM2 will restart it
// But now you have a log entry to debug
process.exit(1);
});
Quick Reference: Ecosystem Config for Crash Resilience
Here is a production ecosystem file with every restart-related setting annotated:
module.exports = {
apps: [{
name: "my-api",
script: "./app.js",
cwd: "/home/deploy/my-api",
// Cluster across all cores
instances: "max",
exec_mode: "cluster",
// Restart behavior
min_uptime: "10s", // Process must run 10s to count as "stable"
max_restarts: 10, // Give up after 10 unstable restarts
restart_delay: 4000, // Wait 4s between restarts
exp_backoff_restart_delay: 100, // Or use exponential backoff (overrides restart_delay)
autorestart: true, // Set to false for one-shot scripts
// Graceful shutdown
kill_timeout: 15000, // 15s to shut down before SIGKILL
listen_timeout: 10000, // 10s to start before considered failed
wait_ready: true, // Wait for process.send('ready') signal
shutdown_with_message: false,
// Memory safety net
max_memory_restart: "500M",
node_args: "--max-old-space-size=512",
// Logging
error_file: "./logs/error.log",
out_file: "./logs/out.log",
merge_logs: true,
log_date_format: "YYYY-MM-DD HH:mm:ss.SSS",
// Environment
env_production: {
NODE_ENV: "production",
PORT: 8080
}
}]
};
Start it:
pm2 delete my-api
pm2 start ecosystem.config.js --env production
pm2 save
When Unstable Restarts Are Actually Correct Behavior
Not every "too many unstable restarts" is a bug. PM2 is doing the right thing in these scenarios:
Your script is a one-time job. Migration scripts, build scripts, and cron-like tasks exit after completing. Use autorestart: false or run them with pm2 start script.js --no-autorestart.
Your server is out of resources. If the OS is killing your process with SIGKILL because it ran out of memory or hit a process limit, PM2 restarts will keep failing. Fix the resource constraint first.
An upstream service is down. If your app requires a database connection on startup and the database is unreachable, every restart attempt will fail. The exponential backoff strategy handles this gracefully. The fixed restart_delay approach also works but is less adaptive.
The goal is never to hide crashes. The goal is to distinguish between transient failures that PM2 should retry and fundamental startup errors that require human intervention. min_uptime, max_restarts, and exp_backoff_restart_delay are the three knobs that draw that line.