PostgreSQL Performance Tuning for Application Developers

A developer-focused guide to PostgreSQL performance tuning, covering configuration settings, pg_stat_statements, vacuum tuning, partitioning, WAL optimization, and application-level batch operations.

Overview

PostgreSQL ships with conservative default settings designed to run on minimal hardware. If you deploy an application against those defaults on a server with 16 GB of RAM and fast SSDs, you are leaving enormous performance on the table. This guide covers the configuration knobs, diagnostic tools, and application-level patterns that matter most for production Node.js workloads — the things I have tuned repeatedly across dozens of projects over the last decade.

Prerequisites

  • PostgreSQL 14 or later (15+ preferred for some features discussed here)
  • Node.js 18+ with the pg driver installed
  • Superuser or rds_superuser access to modify postgresql.conf (or parameter groups on managed services)
  • Basic SQL proficiency and familiarity with EXPLAIN output
  • A staging environment to test configuration changes before production

Key postgresql.conf Settings for Application Workloads

The default postgresql.conf is built for a machine with 512 MB of RAM. The first thing you should do after provisioning a database server is adjust these core memory and planner settings.

shared_buffers

This is PostgreSQL's internal page cache. The default is typically 128MB, which is absurdly low for any production workload.

# Check current value
psql -c "SHOW shared_buffers;"
#  shared_buffers
# ----------------
#  128MB

Rule of thumb: Set shared_buffers to 25% of total system RAM. On a 64 GB server, that is 16GB. On a 16 GB server, 4GB. Going beyond 25% rarely helps because PostgreSQL also relies on the operating system's filesystem cache.

# postgresql.conf
shared_buffers = 4GB

After changing shared_buffers, you must restart PostgreSQL — a reload is not sufficient.

work_mem

This controls how much memory a single sort or hash operation can use before spilling to disk. The default is 4MB. When you see Sort Method: external merge  Disk: ... in EXPLAIN ANALYZE output, your work_mem is too low for that query.
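
Here is roughly what that looks like (a hypothetical query and abridged output for illustration):

EXPLAIN ANALYZE SELECT * FROM orders ORDER BY created_at;

-- Sort  (cost=... rows=... width=...) (actual time=... rows=... loops=1)
--   Sort Key: created_at
--   Sort Method: external merge  Disk: 112640kB    <-- spilled to disk; raise work_mem
-- With enough work_mem the same line reads: Sort Method: quicksort  Memory: ...kB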

# postgresql.conf
work_mem = 64MB

Be careful here. This is per operation, per connection. A complex query with five hash joins across 100 connections could theoretically allocate 5 * 100 * 64MB = 32GB. I usually set the global default conservatively — maybe 32MB to 64MB — and then raise it at the session level for specific batch operations:

SET work_mem = '256MB';
-- Run your heavy analytical query here
RESET work_mem;

effective_cache_size

This is not an allocation — it is a hint to the query planner about how much memory is available for caching in total, counting both shared_buffers and the OS page cache. Set it to roughly 75% of total RAM.

# postgresql.conf
effective_cache_size = 12GB   # on a 16GB server

When this value is too low, the planner avoids index scans in favor of sequential scans because it assumes disk access is expensive. Setting it correctly encourages the planner to use indexes.

maintenance_work_mem

Controls memory for maintenance operations: VACUUM, CREATE INDEX, ALTER TABLE ADD FOREIGN KEY. The default 64MB is painfully slow for vacuuming large tables.

# postgresql.conf
maintenance_work_mem = 1GB

On a 16 GB server I typically set this to 1GB. It only matters during maintenance operations, so there is no risk of sustained high memory use.

A Starter Configuration Block

Here is what I typically set on a dedicated 16 GB PostgreSQL server:

# postgresql.conf — 16GB dedicated database server
shared_buffers = 4GB
effective_cache_size = 12GB
work_mem = 64MB
maintenance_work_mem = 1GB
random_page_cost = 1.1          # SSD storage (default 4.0 assumes spinning disk)
effective_io_concurrency = 200  # SSD storage (default 1)
min_wal_size = 1GB
max_wal_size = 4GB
checkpoint_completion_target = 0.9
wal_buffers = 64MB
default_statistics_target = 100

The random_page_cost change is critical if you are running on SSDs. The default value of 4.0 tells the planner that random I/O is four times more expensive than sequential I/O — true for spinning disks, wildly wrong for NVMe. Setting it to 1.1 dramatically improves index usage on SSD-backed servers.

Connection Tuning

max_connections

The default is 100, and many teams crank this to 500 or 1000 thinking it will handle more traffic. It will not. Each PostgreSQL connection is a dedicated server process that consumes roughly 5-10 MB of RAM before it does any real work. At 500 connections on a 16 GB server, you are burning 2.5-5 GB just on connection overhead.

# postgresql.conf
max_connections = 100   # Keep it low

Instead of raising max_connections, use a connection pooler like PgBouncer:

# pgbouncer.ini
[databases]
myapp = host=127.0.0.1 port=5432 dbname=myapp

[pgbouncer]
listen_port = 6432
pool_mode = transaction
default_pool_size = 20
max_client_conn = 300

In transaction pooling mode, PgBouncer multiplexes 300 application connections across 20 real PostgreSQL connections. This is how you scale to thousands of concurrent users without melting your database.
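
On the application side, the only change is pointing the pg pool at PgBouncer's port instead of PostgreSQL's. A minimal sketch (host, database, and credentials are placeholders):

var pg = require('pg');

// Connect through PgBouncer on port 6432 instead of PostgreSQL on 5432
var pool = new pg.Pool({
    host: '127.0.0.1',
    port: 6432,
    database: 'myapp',
    user: 'app_user',               // placeholder credentials
    password: process.env.DB_PASSWORD,
    max: 20                         // per-process pool; PgBouncer does the server-side multiplexing
});

Keep in mind that in transaction mode, session state (SET without LOCAL, LISTEN/NOTIFY, advisory locks held across transactions) does not reliably carry over from one query to the next.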

statement_timeout and idle_in_transaction_session_timeout

These two settings prevent runaway queries and abandoned transactions from holding locks indefinitely.

# postgresql.conf
statement_timeout = 30000                        # 30 seconds
idle_in_transaction_session_timeout = 60000       # 60 seconds

A common production disaster: an application opens a transaction, does some work, then makes an HTTP call that hangs. The transaction holds locks for minutes or hours. idle_in_transaction_session_timeout kills these sessions automatically.

-- You can also set per-connection from your application
SET statement_timeout = '10s';

Query Performance Analysis with pg_stat_statements

pg_stat_statements is the single most valuable performance tool in PostgreSQL. It records execution statistics for every distinct query your application runs.

Enabling It

# postgresql.conf
shared_preload_libraries = 'pg_stat_statements'
pg_stat_statements.max = 10000
pg_stat_statements.track = all

Restart PostgreSQL, then create the extension:

CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

Finding Your Slowest Queries

-- Top 10 queries by total execution time
SELECT
    round(total_exec_time::numeric, 2) AS total_time_ms,
    calls,
    round(mean_exec_time::numeric, 2) AS mean_time_ms,
    round((100 * total_exec_time / sum(total_exec_time) OVER ())::numeric, 2) AS percent_total,
    query
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;

Typical output:

 total_time_ms |  calls  | mean_time_ms | percent_total |                        query
---------------+---------+--------------+---------------+-----------------------------------------------------
    4823017.32 |  182451 |        26.43 |         38.21 | SELECT * FROM orders WHERE customer_id = $1 AND ...
    1987234.11 |   94201 |        21.09 |         15.74 | UPDATE inventory SET quantity = quantity - $1 ...
     892341.55 | 1203847 |         0.74 |          7.07 | SELECT id, name FROM products WHERE id = $1
     654892.01 |    3201 |       204.59 |          5.19 | SELECT o.*, c.name FROM orders o JOIN customers ...

That first query is consuming 38% of your total database time. Fix that one query and you free up a third of your server capacity.
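
For a query shaped like that first one, a filter on customer_id plus another column with an ORDER BY created_at DESC, the usual fix is a composite index that matches both the filter and the sort. The exact columns depend on the full query text, so treat this as illustrative:

-- Hypothetical index for a customer_id + status filter ordered by created_at DESC
CREATE INDEX CONCURRENTLY idx_orders_customer_status_created
    ON orders (customer_id, status, created_at DESC);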

Finding I/O-Heavy Queries

-- Queries causing the most disk reads
SELECT
    query,
    calls,
    shared_blks_hit,
    shared_blks_read,
    round(100.0 * shared_blks_hit / nullif(shared_blks_hit + shared_blks_read, 0), 2) AS cache_hit_pct
FROM pg_stat_statements
WHERE shared_blks_read > 1000
ORDER BY shared_blks_read DESC
LIMIT 10;

If cache_hit_pct drops below 95% for a frequent query, you either need more shared_buffers, a better index, or you are scanning too much data.

Resetting Statistics

SELECT pg_stat_statements_reset();

I reset stats after every deployment or configuration change so I get clean measurements.

Identifying Slow Queries and Lock Contention

Finding Active Locks

-- Show all queries waiting on locks
SELECT
    blocked.pid AS blocked_pid,
    blocked.query AS blocked_query,
    blocking.pid AS blocking_pid,
    blocking.query AS blocking_query,
    now() - blocked.query_start AS blocked_duration
FROM pg_stat_activity blocked
JOIN pg_locks bl ON bl.pid = blocked.pid AND NOT bl.granted
JOIN pg_locks gl ON gl.locktype = bl.locktype
    AND gl.database IS NOT DISTINCT FROM bl.database
    AND gl.relation IS NOT DISTINCT FROM bl.relation
    AND gl.page IS NOT DISTINCT FROM bl.page
    AND gl.tuple IS NOT DISTINCT FROM bl.tuple
    AND gl.transactionid IS NOT DISTINCT FROM bl.transactionid
    AND gl.classid IS NOT DISTINCT FROM bl.classid
    AND gl.objid IS NOT DISTINCT FROM bl.objid
    AND gl.objsubid IS NOT DISTINCT FROM bl.objsubid
    AND gl.pid != bl.pid
    AND gl.granted
JOIN pg_stat_activity blocking ON blocking.pid = gl.pid
WHERE blocked.wait_event_type = 'Lock';

Killing Long-Running Queries

-- Cancel the query (graceful)
SELECT pg_cancel_backend(12345);

-- Terminate the connection (forceful)
SELECT pg_terminate_backend(12345);

Vacuum and Autovacuum Tuning

PostgreSQL uses MVCC (Multi-Version Concurrency Control). Every UPDATE and DELETE creates dead tuples — old row versions that are no longer visible to any transaction. VACUUM reclaims that space. If autovacuum falls behind, tables bloat, indexes grow, and queries slow to a crawl.

Checking Table Bloat

-- Dead tuples and last vacuum time
SELECT
    schemaname,
    relname,
    n_live_tup,
    n_dead_tup,
    round(100.0 * n_dead_tup / nullif(n_live_tup + n_dead_tup, 0), 2) AS dead_pct,
    last_vacuum,
    last_autovacuum,
    last_analyze,
    last_autoanalyze
FROM pg_stat_user_tables
WHERE n_dead_tup > 10000
ORDER BY n_dead_tup DESC;

If dead_pct is above 10-20%, autovacuum is not keeping up.
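
To see whether autovacuum is actively working right now, and how far along it is, check pg_stat_progress_vacuum:

-- Vacuum operations currently in progress
SELECT p.pid, c.relname, p.phase, p.heap_blks_scanned, p.heap_blks_total
FROM pg_stat_progress_vacuum p
JOIN pg_class c ON c.oid = p.relid;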

Tuning Autovacuum

# postgresql.conf
autovacuum_max_workers = 4                  # default 3
autovacuum_naptime = 30s                    # default 1min — check more often
autovacuum_vacuum_threshold = 50            # default 50
autovacuum_vacuum_scale_factor = 0.05       # default 0.2 — vacuum at 5% dead rows
autovacuum_analyze_threshold = 50           # default 50
autovacuum_analyze_scale_factor = 0.02      # default 0.1
autovacuum_vacuum_cost_limit = 1000         # default -1 (falls back to vacuum_cost_limit = 200); let vacuum work faster

The autovacuum_vacuum_scale_factor is the key setting. The default 0.2 means autovacuum kicks in when 20% of a table's rows are dead. On a 100-million-row table, that is 20 million dead rows before vacuuming starts. By that point, the table is massively bloated.

For high-churn tables, set per-table overrides:

ALTER TABLE orders SET (
    autovacuum_vacuum_scale_factor = 0.01,
    autovacuum_analyze_scale_factor = 0.005,
    autovacuum_vacuum_cost_limit = 2000
);

Table Statistics and ANALYZE

The query planner makes decisions based on table statistics — row counts, value distributions, most common values, histogram buckets. Stale statistics lead to bad plans.

-- Force statistics refresh
ANALYZE orders;

-- Increase statistics granularity for a specific column
ALTER TABLE orders ALTER COLUMN status SET STATISTICS 500;
ANALYZE orders;

-- Check current statistics
SELECT
    attname,
    n_distinct,
    most_common_vals,
    most_common_freqs
FROM pg_stats
WHERE tablename = 'orders' AND attname = 'status';

The default_statistics_target (default 100) controls how many histogram buckets are computed. For columns with highly skewed distributions — like a status column where 90% of rows are 'completed' — increasing it to 500 or 1000 helps the planner generate better estimates.

Partitioning Strategies for Large Tables

When a table grows beyond tens of millions of rows, sequential operations become expensive even with proper indexing. Partitioning splits a table into smaller physical chunks while presenting a single logical table to your application.

Range Partitioning by Date

The most common pattern for time-series or event data:

CREATE TABLE events (
    id          BIGSERIAL,
    event_type  TEXT NOT NULL,
    payload     JSONB,
    created_at  TIMESTAMPTZ NOT NULL DEFAULT NOW()
) PARTITION BY RANGE (created_at);

-- Monthly partitions
CREATE TABLE events_2025_01 PARTITION OF events
    FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');
CREATE TABLE events_2025_02 PARTITION OF events
    FOR VALUES FROM ('2025-02-01') TO ('2025-03-01');
CREATE TABLE events_2025_03 PARTITION OF events
    FOR VALUES FROM ('2025-03-01') TO ('2025-04-01');

-- Indexes are created per partition (or create one on the parent; PostgreSQL 11+ propagates it to every partition automatically)
CREATE INDEX idx_events_2025_01_type ON events_2025_01 (event_type);
CREATE INDEX idx_events_2025_02_type ON events_2025_02 (event_type);
CREATE INDEX idx_events_2025_03_type ON events_2025_03 (event_type);

When you query with a WHERE created_at >= '2025-02-01' filter, PostgreSQL prunes irrelevant partitions entirely. Instead of scanning 500 million rows, it scans only the partitions that match.
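
You can confirm pruning with a plain EXPLAIN; only the matching partitions appear in the plan (abridged, illustrative output):

EXPLAIN SELECT count(*) FROM events WHERE created_at >= '2025-02-01';

-- Aggregate
--   ->  Append
--         ->  Seq Scan on events_2025_02 events_1
--               Filter: (created_at >= '2025-02-01 00:00:00+00'::timestamp with time zone)
--         ->  Seq Scan on events_2025_03 events_2
--               Filter: (created_at >= '2025-02-01 00:00:00+00'::timestamp with time zone)
-- events_2025_01 does not appear: it was pruned at plan time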

Automating Partition Creation

-- Create a month's partition ahead of time. With declarative partitioning,
-- a row trigger cannot create the partition on demand (tuple routing fails
-- before any trigger fires), so call this from a scheduled job such as cron
-- or pg_cron shortly before each month begins.
CREATE OR REPLACE FUNCTION create_monthly_partition(target_date DATE)
RETURNS void AS $$
DECLARE
    partition_name TEXT;
    start_date DATE;
    end_date DATE;
BEGIN
    start_date := date_trunc('month', target_date);
    end_date := start_date + INTERVAL '1 month';
    partition_name := 'events_' || to_char(start_date, 'YYYY_MM');

    IF NOT EXISTS (
        SELECT 1 FROM pg_class WHERE relname = partition_name
    ) THEN
        EXECUTE format(
            'CREATE TABLE %I PARTITION OF events FOR VALUES FROM (%L) TO (%L)',
            partition_name, start_date, end_date
        );
        EXECUTE format(
            'CREATE INDEX %I ON %I (event_type)',
            'idx_' || partition_name || '_type', partition_name
        );
    END IF;
END;
$$ LANGUAGE plpgsql;

-- Example: create next month's partition from a scheduled job
SELECT create_monthly_partition((date_trunc('month', now()) + INTERVAL '1 month')::date);

Dropping Old Data

One of the biggest wins with partitioning is dropping old data:

-- Instead of DELETE FROM events WHERE created_at < '2024-01-01' (which creates millions of dead tuples)
DROP TABLE events_2023_01;
DROP TABLE events_2023_02;
-- Instant, no vacuum needed
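
On PostgreSQL 14 and later you can also detach a partition without blocking concurrent queries, then archive or drop it on your own schedule:

-- Detach without a blocking lock (cannot run inside a transaction block)
ALTER TABLE events DETACH PARTITION events_2023_03 CONCURRENTLY;
DROP TABLE events_2023_03;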

Using EXPLAIN ANALYZE Effectively

Everyone knows EXPLAIN ANALYZE. Most people read only the top-level timing. Here is how to actually use it.

Reading the Full Output

EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT o.id, o.total, c.name
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.status = 'pending'
  AND o.created_at > NOW() - INTERVAL '7 days';

Output:

Nested Loop  (cost=0.85..1247.32 rows=42 width=52) (actual time=0.089..145.231 rows=8521 loops=1)
  Buffers: shared hit=24891 read=3201
  ->  Index Scan using idx_orders_status_created on orders o  (cost=0.43..892.10 rows=42 width=24) (actual time=0.051..98.432 rows=8521 loops=1)
        Index Cond: ((status = 'pending') AND (created_at > ...))
        Buffers: shared hit=18234 read=2890
  ->  Index Scan using customers_pkey on customers c  (cost=0.42..8.44 rows=1 width=36) (actual time=0.004..0.005 rows=1 loops=8521)
        Index Cond: (id = o.customer_id)
        Buffers: shared hit=6657 read=311
Planning Time: 0.412 ms
Execution Time: 147.891 ms

Key things to look for:

  1. rows=42 estimated vs rows=8521 actual — The planner estimated 42 rows but got 8,521. This 200x misestimate means stale statistics or missing stats. Run ANALYZE orders;.

  2. Buffers: shared read=3201 — Those 3,201 blocks came from disk. If this number is high relative to shared hit, you need more shared_buffers or a better index so the query touches fewer pages.

  3. loops=8521 — The inner index scan on customers executed 8,521 times. Each one is fast (0.005ms), but at scale this adds up.

Spotting Sequential Scans on Large Tables

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE customer_email = '[email protected]';

-- Bad output:
-- Seq Scan on orders  (cost=0.00..412893.00 rows=1 width=128) (actual time=2847.123..2847.124 rows=1 loops=1)
--   Filter: (customer_email = '[email protected]')
--   Rows Removed by Filter: 12000000
--   Buffers: shared hit=89012 read=123881

Scanning 12 million rows to find one. The fix:

CREATE INDEX CONCURRENTLY idx_orders_email ON orders (customer_email);

Always use CONCURRENTLY in production: a plain CREATE INDEX blocks all writes to the table for the duration of the build, while CONCURRENTLY lets writes continue (it takes longer and cannot run inside a transaction block).

The SETTINGS Option (PostgreSQL 12+)

EXPLAIN (ANALYZE, BUFFERS, SETTINGS)
SELECT ...

This shows non-default configuration settings that affected the plan, making it easier to understand why a plan was chosen.

Write-Ahead Log (WAL) Tuning for Write-Heavy Workloads

The WAL is PostgreSQL's durability mechanism. Every data change is written to the WAL before being applied to data files. For write-heavy workloads — high-volume inserts, frequent updates — WAL settings have a massive impact on throughput.

Key WAL Settings

# postgresql.conf
wal_level = replica                     # default — supports replication
max_wal_size = 4GB                      # default 1GB — larger reduces checkpoint frequency
min_wal_size = 1GB                      # default 80MB
checkpoint_completion_target = 0.9      # default 0.9 — spread checkpoint I/O
wal_buffers = 64MB                      # default -1 (auto, usually 16MB)
wal_compression = on                    # reduces full-page-image WAL volume; PG 15+ adds lz4/zstd methods

Understanding Checkpoints

PostgreSQL periodically flushes dirty pages from memory to disk — this is a checkpoint. During a checkpoint, write performance can dip significantly. You want checkpoints to happen infrequently and to be spread over time.

-- Check checkpoint frequency
SELECT * FROM pg_stat_bgwriter;

If checkpoints_req (requested checkpoints — triggered by hitting max_wal_size) is high relative to checkpoints_timed, your max_wal_size is too low. Increase it.
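
A quick way to check that ratio (PostgreSQL 16 and earlier; version 17 moves these counters to pg_stat_checkpointer):

-- Requested checkpoints should be a small fraction of timed ones
SELECT checkpoints_timed,
       checkpoints_req,
       round(100.0 * checkpoints_req / nullif(checkpoints_timed + checkpoints_req, 0), 2) AS pct_requested
FROM pg_stat_bgwriter;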

Synchronous Commit Tradeoffs

For workloads where you can tolerate losing the last few hundred milliseconds of transactions on a crash (log tables, analytics events):

-- Per-session: don't wait for WAL flush
SET synchronous_commit = off;

-- Insert millions of rows much faster
INSERT INTO event_log ...

This does not risk data corruption — only data loss of recently committed transactions if the server crashes before the WAL is flushed. For most logging and analytics use cases, that is an acceptable tradeoff.
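
You can also relax it for a single transaction with SET LOCAL, which reverts automatically at commit or rollback:

BEGIN;
SET LOCAL synchronous_commit = off;   -- applies only to this transaction
INSERT INTO event_log ...
COMMIT;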

Monitoring with pg_stat_user_tables and pg_stat_activity

Table-Level Monitoring

-- Tables with poor index usage
SELECT
    schemaname,
    relname,
    seq_scan,
    idx_scan,
    CASE WHEN seq_scan + idx_scan > 0
        THEN round(100.0 * idx_scan / (seq_scan + idx_scan), 2)
        ELSE 0
    END AS idx_scan_pct,
    n_tup_ins,
    n_tup_upd,
    n_tup_del
FROM pg_stat_user_tables
WHERE seq_scan > 100
ORDER BY seq_scan DESC;

If a table with millions of rows has idx_scan_pct below 90%, you are missing indexes.

Active Session Monitoring

-- Long-running queries
SELECT
    pid,
    now() - query_start AS duration,
    state,
    wait_event_type,
    wait_event,
    left(query, 100) AS query
FROM pg_stat_activity
WHERE state != 'idle'
  AND query NOT LIKE '%pg_stat_activity%'
ORDER BY query_start ASC;

Cache Hit Ratio

-- Overall cache hit ratio — should be > 99%
SELECT
    sum(blks_hit) AS hits,
    sum(blks_read) AS reads,
    round(100.0 * sum(blks_hit) / nullif(sum(blks_hit) + sum(blks_read), 0), 2) AS hit_ratio
FROM pg_stat_database
WHERE datname = current_database();

If your cache hit ratio drops below 99%, you need more RAM or are scanning too much data.

Application-Level Optimizations

Batch Inserts

Inserting rows one at a time is the slowest way to load data:

// BAD — 10,000 round trips
var pg = require('pg');
var pool = new pg.Pool({ connectionString: process.env.DATABASE_URL });

function insertOneByOne(rows, callback) {
    var completed = 0;
    rows.forEach(function(row) {
        pool.query(
            'INSERT INTO events (type, payload) VALUES ($1, $2)',
            [row.type, row.payload],
            function(err) {
                if (err) return callback(err);
                completed++;
                if (completed === rows.length) callback(null);
            }
        );
    });
}
// 10,000 rows: ~8,200ms

Use multi-row INSERT with parameterized values:

// GOOD — single round trip with multi-row VALUES
var pg = require('pg');
var pool = new pg.Pool({ connectionString: process.env.DATABASE_URL });

function batchInsert(rows, callback) {
    var values = [];
    var params = [];
    var paramIndex = 1;

    rows.forEach(function(row) {
        values.push('($' + paramIndex + ', $' + (paramIndex + 1) + ')');
        params.push(row.type, row.payload);
        paramIndex += 2;
    });

    var sql = 'INSERT INTO events (type, payload) VALUES ' + values.join(', ');

    pool.query(sql, params, function(err, result) {
        if (err) return callback(err);
        callback(null, result.rowCount);
    });
}
// 10,000 rows: ~320ms (25x faster)

The COPY Protocol

For truly large data loads (hundreds of thousands to millions of rows), COPY is the fastest option. It streams rows to the server in a single operation, skipping per-row parsing, planning, and network round trips.

var pg = require('pg');
var { from: copyFrom } = require('pg-copy-streams');
var pool = new pg.Pool({ connectionString: process.env.DATABASE_URL });

function bulkLoad(rows, callback) {
    pool.connect(function(err, client, release) {
        if (err) return callback(err);

        var stream = client.query(copyFrom(
            "COPY events (type, payload) FROM STDIN WITH (FORMAT csv)"
        ));

        var buffer = '';
        rows.forEach(function(row) {
            // Escape CSV values properly
            var escapedPayload = '"' + row.payload.replace(/"/g, '""') + '"';
            buffer += row.type + ',' + escapedPayload + '\n';
        });

        stream.write(buffer);

        stream.on('finish', function() {
            release();
            callback(null, rows.length);
        });

        stream.on('error', function(err) {
            release();
            callback(err);
        });

        stream.end();
    });
}
// 10,000 rows: ~45ms (180x faster than one-by-one)

Prepared Statements

The pg driver supports prepared statements, which let PostgreSQL parse and plan a query once, then execute it many times with different parameters:

var pg = require('pg');
var pool = new pg.Pool({ connectionString: process.env.DATABASE_URL });

// The pg driver auto-prepares queries with a name
function findOrdersByCustomer(customerId, callback) {
    var query = {
        name: 'find-orders-by-customer',
        text: 'SELECT id, total, status FROM orders WHERE customer_id = $1 ORDER BY created_at DESC LIMIT 50',
        values: [customerId]
    };

    pool.query(query, function(err, result) {
        if (err) return callback(err);
        callback(null, result.rows);
    });
}
// First call: ~2.1ms (parse + plan + execute)
// Subsequent calls: ~0.8ms (execute only)

Note that named prepared statements traditionally do not survive PgBouncer's transaction pooling mode, because consecutive statements can land on different server connections. PgBouncer 1.21 and newer can track protocol-level prepared statements, which is what the pg driver's named queries use.
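
If you are on PgBouncer 1.21 or newer, enabling that support is a one-line change in pgbouncer.ini (a sketch; tune the limit to how many distinct named statements your application prepares):

# pgbouncer.ini (PgBouncer 1.21+)
[pgbouncer]
pool_mode = transaction
max_prepared_statements = 100   # 0 disables prepared statement tracking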

Complete Working Example

Here is a Node.js application that connects to PostgreSQL, analyzes performance bottlenecks, and demonstrates the before/after impact of tuning.

// perf-analyzer.js
var pg = require('pg');

var pool = new pg.Pool({
    connectionString: process.env.DATABASE_URL,
    max: 20,
    idleTimeoutMillis: 30000,
    connectionTimeoutMillis: 5000
});

// Step 1: Check overall database health
function checkDatabaseHealth(callback) {
    console.log('=== Database Health Check ===\n');

    var queries = {
        cacheHitRatio: `
            SELECT round(100.0 * sum(blks_hit) / nullif(sum(blks_hit) + sum(blks_read), 0), 2) AS hit_ratio
            FROM pg_stat_database WHERE datname = current_database()
        `,
        tableStats: `
            SELECT relname,
                   seq_scan, idx_scan,
                   n_live_tup, n_dead_tup,
                   last_autovacuum
            FROM pg_stat_user_tables
            WHERE n_live_tup > 1000
            ORDER BY seq_scan DESC LIMIT 10
        `,
        longRunningQueries: `
            SELECT pid, now() - query_start AS duration, left(query, 80) AS query
            FROM pg_stat_activity
            WHERE state = 'active' AND query_start < now() - interval '5 seconds'
            ORDER BY query_start ASC
        `
    };

    var results = {};
    var keys = Object.keys(queries);
    var completed = 0;

    keys.forEach(function(key) {
        pool.query(queries[key], function(err, result) {
            if (err) {
                console.error('Error running ' + key + ':', err.message);
                results[key] = null;
            } else {
                results[key] = result.rows;
            }
            completed++;
            if (completed === keys.length) {
                printHealthReport(results);
                callback(null, results);
            }
        });
    });
}

function printHealthReport(results) {
    if (results.cacheHitRatio && results.cacheHitRatio[0]) {
        var ratio = results.cacheHitRatio[0].hit_ratio;
        console.log('Cache Hit Ratio: ' + ratio + '%' + (ratio < 99 ? ' [WARNING: Below 99%]' : ' [OK]'));
    }

    console.log('\nTop tables by sequential scans:');
    if (results.tableStats) {
        results.tableStats.forEach(function(table) {
            var totalScans = Number(table.seq_scan) + Number(table.idx_scan);  // bigint counters arrive as strings
            var idxPct = totalScans > 0 ? (100 * table.idx_scan / totalScans).toFixed(1) : 'N/A';
            var deadPct = table.n_live_tup > 0 ? (100 * table.n_dead_tup / table.n_live_tup).toFixed(1) : '0';
            console.log(
                '  ' + table.relname +
                ' — seq_scan: ' + table.seq_scan +
                ', idx_scan: ' + table.idx_scan +
                ' (' + idxPct + '% indexed)' +
                ', dead_tuples: ' + deadPct + '%' +
                (table.last_autovacuum ? ', last_vacuum: ' + table.last_autovacuum : ', [NEVER VACUUMED]')
            );
        });
    }

    if (results.longRunningQueries && results.longRunningQueries.length > 0) {
        console.log('\n[ALERT] Long-running queries:');
        results.longRunningQueries.forEach(function(q) {
            console.log('  PID ' + q.pid + ' (' + q.duration + '): ' + q.query);
        });
    }
}

// Step 2: Analyze slow queries via pg_stat_statements
function analyzeSlowQueries(callback) {
    console.log('\n=== Slow Query Analysis (pg_stat_statements) ===\n');

    var sql = `
        SELECT
            round(total_exec_time::numeric, 2) AS total_ms,
            calls,
            round(mean_exec_time::numeric, 2) AS mean_ms,
            round(max_exec_time::numeric, 2) AS max_ms,
            round((100 * total_exec_time / sum(total_exec_time) OVER ())::numeric, 2) AS pct,
            rows,
            left(query, 100) AS query
        FROM pg_stat_statements
        WHERE calls > 10
        ORDER BY total_exec_time DESC
        LIMIT 10
    `;

    pool.query(sql, function(err, result) {
        if (err) {
            if (err.message.indexOf('pg_stat_statements') > -1) {
                console.log('pg_stat_statements not enabled. Enable it in postgresql.conf:');
                console.log("  shared_preload_libraries = 'pg_stat_statements'");
                return callback(null, []);
            }
            return callback(err);
        }

        result.rows.forEach(function(row, i) {
            console.log((i + 1) + '. [' + row.pct + '% of total time]');
            console.log('   Total: ' + row.total_ms + 'ms | Calls: ' + row.calls +
                        ' | Mean: ' + row.mean_ms + 'ms | Max: ' + row.max_ms + 'ms');
            console.log('   Query: ' + row.query);
            console.log('');
        });

        callback(null, result.rows);
    });
}

// Step 3: Run EXPLAIN ANALYZE on a specific query
function explainQuery(sql, params, callback) {
    console.log('\n=== EXPLAIN ANALYZE ===\n');

    var explainSql = 'EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) ' + sql;

    pool.query(explainSql, params, function(err, result) {
        if (err) return callback(err);

        var plan = result.rows[0]['QUERY PLAN'][0];
        var planNode = plan.Plan;

        console.log('Planning Time: ' + plan['Planning Time'] + 'ms');
        console.log('Execution Time: ' + plan['Execution Time'] + 'ms');
        console.log('Node Type: ' + planNode['Node Type']);
        console.log('Actual Rows: ' + planNode['Actual Rows']);
        console.log('Planned Rows: ' + planNode['Plan Rows']);

        var rowEstimateRatio = planNode['Actual Rows'] / Math.max(planNode['Plan Rows'], 1);
        if (rowEstimateRatio > 10 || rowEstimateRatio < 0.1) {
            console.log('[WARNING] Row estimate is off by ' + rowEstimateRatio.toFixed(1) +
                        'x — run ANALYZE on the table');
        }

        if (planNode['Shared Read Blocks'] && planNode['Shared Read Blocks'] > 1000) {
            console.log('[WARNING] ' + planNode['Shared Read Blocks'] +
                        ' blocks read from disk — consider increasing shared_buffers');
        }

        callback(null, plan);
    });
}

// Step 4: Benchmark batch insert performance
function benchmarkInserts(callback) {
    console.log('\n=== Insert Benchmark ===\n');

    var setupSql = `
        CREATE TABLE IF NOT EXISTS bench_events (
            id SERIAL PRIMARY KEY,
            event_type TEXT NOT NULL,
            payload JSONB,
            created_at TIMESTAMPTZ DEFAULT NOW()
        )
    `;

    pool.query(setupSql, function(err) {
        if (err) return callback(err);

        var rows = [];
        for (var i = 0; i < 5000; i++) {
            rows.push({
                type: 'benchmark_event',
                payload: JSON.stringify({ index: i, data: 'test payload ' + i })
            });
        }

        // Method 1: Individual inserts
        var startOne = Date.now();
        var completed = 0;
        var errors = 0;

        rows.forEach(function(row) {
            pool.query(
                'INSERT INTO bench_events (event_type, payload) VALUES ($1, $2)',
                [row.type, row.payload],
                function(err) {
                    if (err) errors++;
                    completed++;
                    if (completed === rows.length) {
                        var elapsed1 = Date.now() - startOne;
                        console.log('Individual inserts (5000 rows): ' + elapsed1 + 'ms');

                        // Method 2: Multi-row batch
                        var values = [];
                        var params = [];
                        var idx = 1;
                        rows.forEach(function(r) {
                            values.push('($' + idx + ', $' + (idx + 1) + ')');
                            params.push(r.type, r.payload);
                            idx += 2;
                        });

                        var batchSql = 'INSERT INTO bench_events (event_type, payload) VALUES ' + values.join(', ');
                        var startBatch = Date.now();

                        pool.query(batchSql, params, function(err) {
                            if (err) return callback(err);
                            var elapsed2 = Date.now() - startBatch;
                            console.log('Batch insert (5000 rows): ' + elapsed2 + 'ms');
                            console.log('Speedup: ' + (elapsed1 / elapsed2).toFixed(1) + 'x');

                            // Cleanup
                            pool.query('DROP TABLE bench_events', function() {
                                callback(null, { individual: elapsed1, batch: elapsed2 });
                            });
                        });
                    }
                }
            );
        });
    });
}

// Main execution
function main() {
    console.log('PostgreSQL Performance Analyzer');
    console.log('================================\n');

    checkDatabaseHealth(function(err) {
        if (err) {
            console.error('Health check failed:', err.message);
            process.exit(1);
        }

        analyzeSlowQueries(function(err, slowQueries) {
            if (err) {
                console.error('Slow query analysis failed:', err.message);
            }

            // Example: explain a common query pattern
            explainQuery(
                'SELECT * FROM pg_stat_user_tables WHERE n_live_tup > $1',
                [0],
                function(err) {
                    if (err) console.error('EXPLAIN failed:', err.message);

                    benchmarkInserts(function(err, benchResults) {
                        if (err) console.error('Benchmark failed:', err.message);

                        console.log('\n=== Analysis Complete ===');
                        pool.end();
                    });
                }
            );
        });
    });
}

main();

Run it:

$ npm install pg pg-copy-streams
$ DATABASE_URL=postgresql://user:pass@localhost:5432/mydb node perf-analyzer.js

PostgreSQL Performance Analyzer
================================

=== Database Health Check ===

Cache Hit Ratio: 99.71% [OK]

Top tables by sequential scans:
  orders — seq_scan: 48201, idx_scan: 982341 (95.3% indexed), dead_tuples: 3.2%, last_vacuum: 2026-02-07 14:32:11
  events — seq_scan: 12092, idx_scan: 2341891 (99.5% indexed), dead_tuples: 0.8%, last_vacuum: 2026-02-08 01:15:44
  customers — seq_scan: 8921, idx_scan: 412093 (97.9% indexed), dead_tuples: 1.1%, last_vacuum: 2026-02-07 22:08:33

=== Slow Query Analysis (pg_stat_statements) ===

1. [34.21% of total time]
   Total: 2841023.12ms | Calls: 98201 | Mean: 28.93ms | Max: 1204.32ms
   Query: SELECT * FROM orders WHERE customer_id = $1 AND status = $2 ORDER BY created_at DESC

2. [18.43% of total time]
   Total: 1530891.44ms | Calls: 204812 | Mean: 7.47ms | Max: 892.11ms
   Query: UPDATE inventory SET quantity = quantity - $1 WHERE product_id = $2 AND quantity >= $1

=== Insert Benchmark ===

Individual inserts (5000 rows): 4218ms
Batch insert (5000 rows): 187ms
Speedup: 22.6x

=== Analysis Complete ===

Common Issues & Troubleshooting

1. "FATAL: sorry, too many clients already"

Error: FATAL: sorry, too many clients already
    at Parser.parseErrorMessage (/app/node_modules/pg-protocol/dist/parser.js:287:98)

You have hit max_connections. This almost always means you are not using connection pooling, or your pool size is too large. Check how many connections you actually have:

SELECT count(*) FROM pg_stat_activity;
SELECT count(*), state FROM pg_stat_activity GROUP BY state;

Fix: Use PgBouncer or reduce your application pool size. If you have 10 Node.js instances each with max: 20, that is 200 connections. Your max_connections needs to accommodate that plus superuser connections.

2. "canceling statement due to statement timeout"

Error: canceling statement due to statement timeout
    at Parser.parseErrorMessage (/app/node_modules/pg-protocol/dist/parser.js:287:98)

A query exceeded statement_timeout. This is usually a good thing — the timeout is protecting you from a runaway query. But if legitimate queries are timing out:

-- Check what the timeout is set to
SHOW statement_timeout;

-- Raise it for a specific session if needed
SET statement_timeout = '120s';

Alternatively, optimize the query so it finishes within the timeout. Use EXPLAIN ANALYZE to understand why it is slow.
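
You can also apply timeouts from the application side. The pg driver accepts statement_timeout and query_timeout options on the pool, so every connection it opens gets them (a sketch; check the option names against the pg version you run):

var pg = require('pg');

var pool = new pg.Pool({
    connectionString: process.env.DATABASE_URL,
    statement_timeout: 30000,   // ms, applied server-side on each connection
    query_timeout: 35000        // ms, client-side safety net slightly above it
});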

3. "deadlock detected"

Error: deadlock detected
  Detail: Process 18432 waits for ShareLock on transaction 984312; blocked by process 18445.
          Process 18445 waits for ShareLock on transaction 984310; blocked by process 18432.
  Hint: See server log for query details.

Two transactions are waiting on each other. PostgreSQL automatically detects this and kills one. The fix is to ensure your application always acquires locks in a consistent order:

// BAD — can deadlock if two requests update the same items in different order
// (assume `client` is a dedicated client from pool.connect(); a transaction must
//  run on a single client, never through pool.query, which may use a different
//  connection for each statement)
function transferInventory(client, fromId, toId, quantity, callback) {
    client.query('BEGIN', function(err) {
        if (err) return callback(err);
        client.query('UPDATE inventory SET qty = qty - $1 WHERE id = $2', [quantity, fromId], function(err) {
            if (err) return callback(err);
            client.query('UPDATE inventory SET qty = qty + $1 WHERE id = $2', [quantity, toId], function(err) {
                if (err) return callback(err);
                client.query('COMMIT', callback);
            });
        });
    });
}

// GOOD — always lock in ascending ID order
function transferInventory(client, fromId, toId, quantity, callback) {
    var first = Math.min(fromId, toId);
    var second = Math.max(fromId, toId);

    client.query('BEGIN', function(err) {
        if (err) return callback(err);
        client.query('SELECT * FROM inventory WHERE id IN ($1, $2) ORDER BY id FOR UPDATE', [first, second], function(err) {
            if (err) return callback(err);
            client.query('UPDATE inventory SET qty = qty - $1 WHERE id = $2', [quantity, fromId], function(err) {
                if (err) return callback(err);
                client.query('UPDATE inventory SET qty = qty + $1 WHERE id = $2', [quantity, toId], function(err) {
                    if (err) return callback(err);
                    client.query('COMMIT', callback);
                });
            });
        });
    });
}

4. "could not extend file ... No space left on device"

ERROR: could not extend file "base/16384/298471": No space left on device
HINT: Check free disk space.

Your disk is full, often because autovacuum could not keep up with bloat, or WAL files accumulated. Emergency steps:

# Check disk usage
df -h /var/lib/postgresql

# Check WAL directory size
du -sh /var/lib/postgresql/15/main/pg_wal/

# If WAL is consuming too much space, force a checkpoint
psql -c "CHECKPOINT;"

# Check for bloated tables
psql -c "SELECT relname, pg_size_pretty(pg_total_relation_size(oid)) FROM pg_class WHERE relkind = 'r' ORDER BY pg_total_relation_size(oid) DESC LIMIT 10;"

5. "could not serialize access due to concurrent update"

Error: could not serialize access due to concurrent update

This happens with SERIALIZABLE transaction isolation or SELECT ... FOR UPDATE conflicts. If you see this, your application must be prepared to retry:

function queryWithRetry(sql, params, maxRetries, callback) {
    var attempts = 0;

    function attempt() {
        attempts++;
        pool.query(sql, params, function(err, result) {
            if (err && err.code === '40001' && attempts < maxRetries) {
                // Serialization failure — retry after short delay
                var delay = Math.pow(2, attempts) * 10 + Math.random() * 50;
                console.log('Serialization conflict, retrying in ' + Math.round(delay) + 'ms (attempt ' + attempts + ')');
                setTimeout(attempt, delay);
            } else if (err) {
                callback(err);
            } else {
                callback(null, result);
            }
        });
    }

    attempt();
}

Best Practices

  • Measure before tuning. Enable pg_stat_statements on day one. Never guess which queries are slow — let the data tell you. The query you think is the bottleneck rarely is.

  • Use connection pooling in every environment. PgBouncer in transaction mode, or at minimum, the pg driver's built-in pool with a sane max value (10-20 per Node.js process, not 100). Every idle connection consumes kernel and PostgreSQL resources.

  • Run ANALYZE after bulk data operations. After importing thousands of rows, table statistics are stale. The planner will generate terrible plans until you run ANALYZE. Automate this in your import scripts.

  • Set random_page_cost = 1.1 on SSD storage. This single change often eliminates the most common class of bad query plans — unnecessary sequential scans on indexed tables. The default value of 4.0 assumes spinning disks.

  • Monitor dead tuple ratios continuously. If any table exceeds 10% dead tuples, autovacuum is not keeping up. Tune per-table autovacuum_vacuum_scale_factor downward for high-churn tables. A bloated table does not just waste space — it makes every query scan more pages.

  • Use EXPLAIN (ANALYZE, BUFFERS) during development, not just EXPLAIN. Without ANALYZE, you see the planner's estimates, which can be wildly wrong. Without BUFFERS, you do not know whether data came from cache or disk. Both details are essential for diagnosing performance problems.

  • Batch your writes. Multi-row INSERT statements are 20-50x faster than individual inserts. COPY is another 5-10x faster than multi-row inserts. If you are inserting more than a few hundred rows, batch them. The network round-trip cost of individual statements dominates at scale.

  • Use partial indexes for common query patterns. If 95% of your queries filter on WHERE status = 'active', a partial index is smaller and faster: CREATE INDEX idx_orders_active ON orders (customer_id) WHERE status = 'active'.

  • Never run CREATE INDEX without CONCURRENTLY on production tables, and treat ALTER TABLE with the same caution. A standard CREATE INDEX takes a SHARE lock that blocks all writes (reads continue) until it finishes, and many ALTER TABLE forms take an ACCESS EXCLUSIVE lock that blocks reads as well. On a large table, that can be minutes of downtime. CREATE INDEX CONCURRENTLY avoids blocking writes at the cost of taking longer.

  • Set idle_in_transaction_session_timeout. Abandoned transactions are a silent killer in production. They hold locks, prevent autovacuum from cleaning dead tuples, and can cause cascading performance degradation. A 60-second timeout catches these before they cause real damage.
