PostgreSQL Indexing Strategies for Node.js Applications
A practical guide to PostgreSQL indexing for Node.js developers, covering B-tree, GIN, partial, and expression indexes with EXPLAIN ANALYZE examples and real performance benchmarks.
Overview
Indexes are the single most impactful performance lever you have in PostgreSQL, and most Node.js developers either create too few of them or create the wrong ones. A missing index turns a 2ms lookup into a 4-second sequential scan. A poorly designed index wastes disk space, slows down writes, and gives you a false sense of security -- right up until you check pg_stat_user_indexes and discover it was never used. This guide covers every index type that matters in production, shows you how to read EXPLAIN ANALYZE output, and walks through a complete Node.js application with before-and-after benchmarks.
Prerequisites
- PostgreSQL 14+ installed and running
- Node.js 18+ with the `pg` driver
- Basic SQL knowledge (CREATE TABLE, SELECT, INSERT, WHERE)
- Familiarity with Express.js routing
- A database with enough rows to see the difference (we will generate them)
npm install pg express
How Indexes Work: B-tree Internals Simplified
Every time you create an index in PostgreSQL without specifying a type, you get a B-tree. Understanding what happens under the hood saves you from cargo-culting index creation.
A B-tree is a balanced tree structure where:
- The root node contains pointers to child pages and boundary keys
- Internal nodes narrow the search space by comparing your lookup value against stored keys
- Leaf nodes contain the actual indexed values and pointers (CTIDs) to the heap tuples on disk
When you run SELECT * FROM users WHERE email = '[email protected]', PostgreSQL walks the B-tree from root to leaf in O(log n) time. For a table with 10 million rows, that is roughly 3-4 page reads instead of scanning all 10 million rows.
The key insight: a B-tree is only useful if your query's WHERE clause, JOIN condition, or ORDER BY clause matches the indexed columns. An index on (last_name, first_name) does nothing for WHERE first_name = 'Shane' by itself.
            Root Node: [Garcia | Martinez | Williams]
           /                  |                   \
    Internal Nodes      Internal Nodes      Internal Nodes
           |                  |                   |
      Leaf Nodes          Leaf Nodes          Leaf Nodes
       (CTIDs)             (CTIDs)             (CTIDs)
Each leaf node also has a pointer to the next leaf node, forming a linked list. This is why range scans (WHERE created_at > '2025-01-01') are fast -- PostgreSQL walks the tree once and then follows the leaf chain.
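One practical payoff of that leaf chain is keyset pagination: instead of OFFSET, which walks and discards rows, you seek straight to the last key the client saw and read forward. A minimal sketch with node-postgres, assuming an index on orders (created_at DESC) like the one in the next section:

```javascript
var { Pool } = require('pg');
var pool = new Pool({ connectionString: process.env.DATABASE_URL });

// Seek directly past the last row the client saw, then follow the
// leaf chain -- the cost does not grow with page depth the way OFFSET does.
async function nextPage(lastSeenCreatedAt) {
  var result = await pool.query(
    'SELECT id, created_at FROM orders WHERE created_at < $1 ORDER BY created_at DESC LIMIT 20',
    [lastSeenCreatedAt]
  );
  return result.rows;
}
```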
The Default B-tree Index: When It Is Enough
B-tree indexes handle equality and range operations on scalar types. That covers the vast majority of queries in a typical Node.js application.
-- Equality lookups
CREATE INDEX idx_users_email ON users (email);
-- Supports: WHERE email = '[email protected]'
-- Range queries
CREATE INDEX idx_orders_created ON orders (created_at);
-- Supports: WHERE created_at > '2025-01-01'
-- Supports: WHERE created_at BETWEEN '2025-01-01' AND '2025-06-01'
-- Supports: ORDER BY created_at DESC LIMIT 20
-- Sorting
CREATE INDEX idx_articles_published ON articles (published_at DESC);
-- Supports: ORDER BY published_at DESC LIMIT 50
B-tree is enough when:
- You query by exact value or range on a single column
- You sort by a single column
- Your data types are standard scalars (integer, text, timestamp, uuid)
B-tree is not enough when you need to search inside JSONB, arrays, or full-text search vectors.
Composite Indexes and Column Ordering
Composite indexes index multiple columns together. The order of columns matters enormously because of the leftmost prefix rule: PostgreSQL can use a composite index for any query that matches a left prefix of the indexed columns.
CREATE INDEX idx_orders_user_status_created
ON orders (user_id, status, created_at DESC);
This single index supports all of these queries:
-- Uses full index
SELECT * FROM orders WHERE user_id = 42 AND status = 'shipped' ORDER BY created_at DESC;
-- Uses first two columns
SELECT * FROM orders WHERE user_id = 42 AND status = 'pending';
-- Uses first column only
SELECT * FROM orders WHERE user_id = 42;
But it does not help with:
-- Cannot use the index (skips user_id)
SELECT * FROM orders WHERE status = 'shipped';
-- Cannot use the index (skips user_id and status)
SELECT * FROM orders WHERE created_at > '2025-01-01';
Column ordering rules of thumb:
- Equality columns first. Columns compared with `=` go before columns compared with `>`, `<`, or `BETWEEN`.
- High-selectivity columns first. A column with 100,000 distinct values (`user_id`) should precede a column with 5 distinct values (`status`).
- Range or sort column last. If you sort by `created_at DESC`, put it at the end, as in the sketch below.
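A sketch applying all three rules at once, for a hypothetical `events` table queried with equality on `tenant_id` and `severity` plus a sort on `created_at` (the table and index names are illustrative):

```javascript
var { Pool } = require('pg');
var pool = new Pool({ connectionString: process.env.DATABASE_URL });

// Rules applied: equality columns (tenant_id, severity) lead;
// the sort column (created_at) goes last.
async function createEventsIndex() {
  await pool.query(
    'CREATE INDEX IF NOT EXISTS idx_events_tenant_severity_created ' +
    'ON events (tenant_id, severity, created_at DESC)'
  );
}
```

This single index then serves `WHERE tenant_id = $1 AND severity = $2 ORDER BY created_at DESC` with no separate sort step.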
Partial Indexes for Filtering Hot Paths
A partial index only indexes rows that match a WHERE clause. This is one of the most underused features in PostgreSQL.
Consider: your orders table has 50 million rows, but 95% have status = 'completed'. Your dashboard only queries active orders.
-- Full index: indexes all 50 million rows
CREATE INDEX idx_orders_status ON orders (status);
-- Partial index: indexes only ~2.5 million active rows
CREATE INDEX idx_orders_active ON orders (user_id, created_at DESC)
WHERE status IN ('pending', 'processing', 'shipped');
The partial index is 20x smaller, faster to scan, faster to maintain, and uses less memory.
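You can check the size difference directly; a quick sketch, assuming both indexes above exist:

```javascript
var { Pool } = require('pg');
var pool = new Pool({ connectionString: process.env.DATABASE_URL });

// Compare the on-disk sizes of the full and partial indexes.
async function compareIndexSizes() {
  var result = await pool.query(
    "SELECT relname, pg_size_pretty(pg_relation_size(oid)) AS size " +
    "FROM pg_class WHERE relname IN ('idx_orders_status', 'idx_orders_active')"
  );
  console.table(result.rows);
}
```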
Your query must include a WHERE clause that matches or is more restrictive than the index predicate:
-- Uses the partial index
SELECT * FROM orders
WHERE status = 'pending' AND user_id = 42
ORDER BY created_at DESC;
-- Does NOT use the partial index (status = 'completed' is excluded)
SELECT * FROM orders
WHERE status = 'completed' AND user_id = 42;
Partial indexes are ideal for:
- Soft deletes: `WHERE deleted_at IS NULL`
- Active records: `WHERE is_active = true`
- Published content: `WHERE published_at IS NOT NULL`
- Unprocessed queues: `WHERE processed = false`
GIN Indexes for JSONB and Full-Text Search
GIN (Generalized Inverted Index) maps each value inside a composite type to the rows that contain it. Think of it as the index at the back of a textbook: each keyword points to the pages where it appears.
JSONB Indexing
CREATE TABLE products (
id SERIAL PRIMARY KEY,
name TEXT NOT NULL,
metadata JSONB DEFAULT '{}'
);
-- Index all keys and values in the JSONB column
CREATE INDEX idx_products_metadata ON products USING GIN (metadata);
This GIN index supports containment queries:
-- Find products with a specific attribute
SELECT * FROM products WHERE metadata @> '{"color": "red"}';
-- Check if a key exists
SELECT * FROM products WHERE metadata ? 'warranty';
-- Check if any of these keys exist
SELECT * FROM products WHERE metadata ?| array['warranty', 'guarantee'];
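From Node, pass the containment document as an ordinary parameter; a minimal sketch that stringifies the object explicitly and casts to jsonb:

```javascript
var { Pool } = require('pg');
var pool = new Pool({ connectionString: process.env.DATABASE_URL });

// Find products whose metadata contains all of the given key/value pairs.
async function findByAttributes(attrs) {
  var result = await pool.query(
    'SELECT id, name FROM products WHERE metadata @> $1::jsonb',
    [JSON.stringify(attrs)]
  );
  return result.rows;
}

// Usage: findByAttributes({ color: 'red' })
```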
For queries on a specific JSONB path, use an expression index instead (covered below).
Full-Text Search Indexing
CREATE INDEX idx_articles_search ON articles USING GIN (search_vector);
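The search_vector column here is assumed to be a tsvector you maintain. One way to set that up, sketched with a generated column (PostgreSQL 12+); the column definition is illustrative, not part of the schema used later in this guide:

```javascript
var { Pool } = require('pg');
var pool = new Pool({ connectionString: process.env.DATABASE_URL });

// Keep search_vector in sync automatically via a generated column.
async function addSearchVector() {
  await pool.query(
    "ALTER TABLE articles ADD COLUMN search_vector tsvector " +
    "GENERATED ALWAYS AS (to_tsvector('english', title || ' ' || body)) STORED"
  );
}

// The GIN index serves the @@ match operator.
async function search(terms) {
  var result = await pool.query(
    "SELECT id, title FROM articles WHERE search_vector @@ plainto_tsquery('english', $1)",
    [terms]
  );
  return result.rows;
}
```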
GIN indexes are optimized for read-heavy workloads. The tradeoff: writes are slower because PostgreSQL must update the inverted index for every insert or update. For most Node.js applications -- which are read-heavy APIs serving content -- this tradeoff is worth it.
Expression Indexes for Computed Lookups
An expression index indexes the result of a function or expression, not the raw column value. This is essential when your queries transform data before comparison.
-- Index lowercase email for case-insensitive lookups
CREATE INDEX idx_users_email_lower ON users (LOWER(email));
-- Query MUST use the same expression
SELECT * FROM users WHERE LOWER(email) = LOWER('[email protected]');
Common use cases:
-- Index a specific JSONB path
CREATE INDEX idx_products_color ON products ((metadata->>'color'));
-- Supports: WHERE metadata->>'color' = 'red'
-- Index extracted date from timestamp
CREATE INDEX idx_orders_date ON orders ((created_at::date));
-- Supports: WHERE created_at::date = '2025-06-15'
-- Index string length (find empty or very long values)
CREATE INDEX idx_articles_body_length ON articles (LENGTH(body))
WHERE LENGTH(body) < 100;
The critical rule: the expression in your WHERE clause must exactly match the expression in the index definition. WHERE LOWER(email) = 'test' uses the index. WHERE email ILIKE 'test' does not.
Unique Indexes and Constraints
A unique index enforces uniqueness at the database level. Every UNIQUE constraint creates a unique B-tree index under the hood.
-- These are equivalent:
CREATE UNIQUE INDEX idx_users_email_unique ON users (email);
ALTER TABLE users ADD CONSTRAINT users_email_unique UNIQUE (email);
Unique indexes have a useful property: PostgreSQL knows that at most one row matches an equality lookup, so it stops scanning immediately after finding it. The lookup cost grows only with the depth of the B-tree, which is logarithmic in table size.
Partial unique indexes are powerful for conditional uniqueness:
-- Each user can have only one active subscription
CREATE UNIQUE INDEX idx_subscriptions_active_unique
ON subscriptions (user_id)
WHERE cancelled_at IS NULL;
This allows multiple cancelled subscriptions per user but only one active one. Trying to insert a second active subscription for the same user raises a unique violation error.
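In Node.js, let the index enforce the rule and catch the violation, rather than doing a racy check-then-insert. A sketch, assuming a hypothetical plan column; 23505 is PostgreSQL's unique_violation error code, which node-postgres exposes as err.code:

```javascript
var { Pool } = require('pg');
var pool = new Pool({ connectionString: process.env.DATABASE_URL });

// Insert-and-catch beats check-then-insert: the unique index closes the race.
async function createSubscription(userId, plan) {
  try {
    var result = await pool.query(
      'INSERT INTO subscriptions (user_id, plan) VALUES ($1, $2) RETURNING id',
      [userId, plan]
    );
    return result.rows[0];
  } catch (err) {
    if (err.code === '23505') {
      // unique_violation raised by idx_subscriptions_active_unique
      throw new Error('User already has an active subscription');
    }
    throw err;
  }
}
```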
Covering Indexes (INCLUDE)
A covering index stores additional columns in the leaf nodes without including them in the search key. This enables index-only scans -- queries that can be answered entirely from the index without touching the heap.
CREATE INDEX idx_orders_user_covering
ON orders (user_id, status)
INCLUDE (total_amount, created_at);
This index supports:
-- Index-only scan: all needed columns are in the index
SELECT status, total_amount, created_at
FROM orders
WHERE user_id = 42;
Without the INCLUDE, PostgreSQL would find the matching rows using the index but then need to fetch total_amount and created_at from the heap -- a random I/O operation for each row.
When to use INCLUDE:
- High-frequency queries that select a small, consistent set of columns
- Columns that are selected but never searched or sorted on
- When you see "Heap Fetches" in EXPLAIN ANALYZE and want to eliminate them
When NOT to use INCLUDE:
- Do not include large columns (TEXT, BYTEA) -- they bloat the index
- Do not add every column -- that is just duplicating the table
Index-Only Scans
An index-only scan is the fastest possible query path. PostgreSQL reads the answer directly from the index B-tree without touching the table heap at all.
Requirements for an index-only scan:
- All columns in SELECT, WHERE, and ORDER BY must be in the index (key + INCLUDE)
- The visibility map must indicate that the pages are all-visible (run VACUUM)
- The query planner must estimate that an index-only scan is cheaper
EXPLAIN (ANALYZE, BUFFERS) SELECT user_id, status FROM orders WHERE user_id = 42;
Index Only Scan using idx_orders_user_covering on orders
Index Cond: (user_id = 42)
Heap Fetches: 0
Buffers: shared hit=4
Planning Time: 0.085 ms
Execution Time: 0.031 ms
The Heap Fetches: 0 confirms a true index-only scan. If you see Heap Fetches: 847, run VACUUM orders; to update the visibility map.
EXPLAIN ANALYZE Reading Guide
EXPLAIN ANALYZE is your primary diagnostic tool. Here is how to read the output.
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT o.id, o.status, o.total_amount
FROM orders o
WHERE o.user_id = 42
AND o.status = 'pending'
ORDER BY o.created_at DESC
LIMIT 20;
Sample output:
Limit (cost=0.56..45.23 rows=20 width=24) (actual time=0.031..0.089 rows=20 loops=1)
-> Index Scan using idx_orders_user_status_created on orders o
(cost=0.56..892.34 rows=398 width=24) (actual time=0.029..0.083 rows=20 loops=1)
Index Cond: ((user_id = 42) AND (status = 'pending'::text))
Scan Direction: Forward
Buffers: shared hit=24
Planning Time: 0.112 ms
Execution Time: 0.109 ms
Key fields to read:
| Field | What It Tells You |
|---|---|
| `cost=0.56..45.23` | Estimated startup cost..total cost (in arbitrary units) |
| `rows=20` | Estimated (left parenthetical) vs actual (right parenthetical) row count |
| `actual time=0.031..0.089` | Real wall-clock time in milliseconds |
| `Buffers: shared hit=24` | Pages read from the shared buffer cache (no disk I/O) |
| `Buffers: shared read=150` | Pages read from disk (slow) |
| `Index Scan` | Used an index, then fetched rows from the heap |
| `Index Only Scan` | Answered entirely from the index |
| `Seq Scan` | Full table scan -- usually bad for large tables |
| `Bitmap Index Scan` | Read the index, built a bitmap, then fetched matching heap pages |
| `Heap Fetches` | Number of heap pages accessed during an index-only scan |
Red flags in EXPLAIN output:
- Seq Scan on a large table -- missing index
- `rows=1` estimated but `rows=50000` actual -- stale statistics, run `ANALYZE tablename;`
- `Buffers: shared read=5000` -- heavy disk I/O, table or index not in cache
- Nested Loop with Seq Scan -- missing join index
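These checks can be automated from Node. A diagnostic sketch -- pass it only literal SQL you wrote, never user input (it concatenates, and ANALYZE actually executes the statement):

```javascript
var { Pool } = require('pg');
var pool = new Pool({ connectionString: process.env.DATABASE_URL });

// FORMAT JSON returns the plan as a single row with a 'QUERY PLAN' column
// of type json, which node-postgres parses into a JavaScript object.
async function explain(sql) {
  var result = await pool.query('EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) ' + sql);
  var plan = result.rows[0]['QUERY PLAN'][0].Plan;
  warnOnSeqScan(plan);
  return plan;
}

// Recursively flag sequential scans anywhere in the plan tree.
function warnOnSeqScan(node) {
  if (node['Node Type'] === 'Seq Scan') {
    console.warn('Seq Scan on ' + node['Relation Name'] + ', actual rows: ' + node['Actual Rows']);
  }
  (node.Plans || []).forEach(warnOnSeqScan);
}
```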
Identifying Missing Indexes from Slow Query Logs
Enable slow query logging in postgresql.conf:
log_min_duration_statement = 100 # Log queries slower than 100ms
log_statement = 'none'
Then query pg_stat_user_tables to find tables with excessive sequential scans:
SELECT
schemaname,
relname AS table_name,
seq_scan,
seq_tup_read,
idx_scan,
idx_tup_fetch,
n_live_tup,
CASE WHEN seq_scan > 0
THEN round(seq_tup_read::numeric / seq_scan, 0)
ELSE 0
END AS avg_rows_per_seq_scan
FROM pg_stat_user_tables
WHERE seq_scan > 100
ORDER BY seq_tup_read DESC
LIMIT 20;
A table with 5 million rows, 10,000 sequential scans, and 50 billion seq_tup_read is screaming for an index. Cross-reference with pg_stat_statements (requires the extension) to find the exact queries:
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
SELECT
calls,
round(mean_exec_time::numeric, 2) AS avg_ms,
round(total_exec_time::numeric, 2) AS total_ms,
query
FROM pg_stat_statements
WHERE mean_exec_time > 100
ORDER BY total_exec_time DESC
LIMIT 20;
Index Bloat and Maintenance
Indexes accumulate dead tuples over time. PostgreSQL's MVCC architecture marks old row versions as dead but does not immediately reclaim the space. This is called index bloat.
Detecting Bloat
SELECT
indexrelname AS index_name,
pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
idx_scan AS times_used,
idx_tup_read,
idx_tup_fetch
FROM pg_stat_user_indexes
WHERE schemaname = 'public'
ORDER BY pg_relation_size(indexrelid) DESC
LIMIT 20;
Look for indexes that are large but rarely used (idx_scan close to 0). Those are candidates for removal.
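A sketch that turns that check into candidate DROP statements for manual review -- never blind execution. It skips unique indexes, which enforce constraints even when idx_scan is zero:

```javascript
var { Pool } = require('pg');
var pool = new Pool({ connectionString: process.env.DATABASE_URL });

// idx_scan counters reset with pg_stat_reset(), so make sure the stats
// cover a representative traffic window before trusting a zero.
async function listUnusedIndexes() {
  var result = await pool.query(
    "SELECT format('DROP INDEX CONCURRENTLY %I.%I;', s.schemaname, s.indexrelname) AS stmt " +
    "FROM pg_stat_user_indexes s " +
    "JOIN pg_index i ON i.indexrelid = s.indexrelid " +
    "WHERE s.idx_scan = 0 AND NOT i.indisunique AND s.schemaname = 'public'"
  );
  result.rows.forEach(function(row) { console.log(row.stmt); });
}
```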
Fixing Bloat
-- Rebuild a single index without blocking writes (PostgreSQL 12+)
REINDEX INDEX CONCURRENTLY idx_orders_user_status_created;
-- Or rebuild all indexes on a table
REINDEX TABLE CONCURRENTLY orders;
The CONCURRENTLY option (PostgreSQL 12+) rebuilds the index without locking the table for writes. Always use it in production.
Automated Maintenance
Autovacuum handles routine cleanup, but heavily updated tables may need tuning:
ALTER TABLE orders SET (
autovacuum_vacuum_scale_factor = 0.02, -- Vacuum after 2% dead tuples (default 20%)
autovacuum_analyze_scale_factor = 0.01 -- Analyze after 1% changes (default 10%)
);
The Cost of Too Many Indexes (Write Amplification)
Every index on a table must be updated for every INSERT, UPDATE, or DELETE. This is write amplification.
A table with 8 indexes means every single INSERT results in 9 writes: one to the heap plus one to each index. For update-heavy tables, this can be devastating.
-- See how many indexes each table has
SELECT
tablename,
COUNT(*) AS index_count
FROM pg_indexes
WHERE schemaname = 'public'
GROUP BY tablename
ORDER BY index_count DESC;
Symptoms of over-indexing:
- INSERT/UPDATE latency climbing over time
- WAL generation rate increasing
- Replication lag growing
- Autovacuum falling behind
Rules for index discipline:
- Remove unused indexes. If `pg_stat_user_indexes.idx_scan = 0` after a month of production traffic, drop it.
- Consolidate overlapping indexes. If you have `(user_id)` and `(user_id, status)`, drop the single-column index -- the composite handles both.
- Benchmark writes, not just reads. A 2ms read improvement that costs 10ms per write is a net loss on a write-heavy table (see the sketch below).
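A rough single-connection write benchmark, as promised above: run it before and after adding an index and compare. Illustrative only -- reach for pgbench when you need rigorous numbers. It assumes the orders table accepts rows with just user_id and status:

```javascript
var { Pool } = require('pg');
var pool = new Pool({ connectionString: process.env.DATABASE_URL });

// Time N sequential inserts and report per-row latency.
async function timeInserts(n) {
  var start = process.hrtime.bigint();
  for (var i = 0; i < n; i++) {
    await pool.query(
      'INSERT INTO orders (user_id, status) VALUES ($1, $2)',
      [i % 1000, 'pending']
    );
  }
  var elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
  console.log(n + ' inserts: ' + (elapsedMs / n).toFixed(3) + 'ms per row');
}
```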
Using node-postgres (pg) with Parameterized Queries
Parameterized queries are non-negotiable. They prevent SQL injection, and they also help PostgreSQL reuse query plans.
var { Pool } = require('pg');
var pool = new Pool({
connectionString: process.env.DATABASE_URL,
max: 20
});
// CORRECT: parameterized query -- plan can be cached, index is used
async function getOrdersByUser(userId, status) {
var result = await pool.query(
'SELECT id, status, total_amount, created_at FROM orders WHERE user_id = $1 AND status = $2 ORDER BY created_at DESC LIMIT 20',
[userId, status]
);
return result.rows;
}
// WRONG: string interpolation -- SQL injection risk, no plan caching
async function getOrdersByUserUnsafe(userId, status) {
var result = await pool.query(
"SELECT * FROM orders WHERE user_id = " + userId + " AND status = '" + status + "'"
);
return result.rows;
}
Parameterized Queries and Type Casting
PostgreSQL infers parameter types from context. Sometimes it gets the type wrong, causing the query planner to skip an index:
// This can cause a Seq Scan if 'id' is an integer column but $1 is sent as text
var result = await pool.query('SELECT * FROM users WHERE id = $1', [req.params.id]);
// Fix: cast explicitly
var result = await pool.query('SELECT * FROM users WHERE id = $1::integer', [req.params.id]);
// Or convert in JavaScript
var result = await pool.query('SELECT * FROM users WHERE id = $1', [parseInt(req.params.id, 10)]);
Prepared Statements for Repeated Queries
For queries executed thousands of times per second, use named prepared statements:
var GET_USER_ORDERS = {
name: 'get-user-orders',
text: 'SELECT id, status, total_amount FROM orders WHERE user_id = $1 AND status = $2 ORDER BY created_at DESC LIMIT $3',
values: null
};
async function getUserOrders(userId, status, limit) {
var query = Object.assign({}, GET_USER_ORDERS, {
values: [userId, status, limit || 20]
});
var result = await pool.query(query);
return result.rows;
}
Named queries are parsed and planned once per pooled connection, then reused for subsequent executions on that connection. This shaves 0.1-0.5ms per query, which adds up at scale.
Complete Working Example
Here is a complete Node.js Express application demonstrating indexing strategies with an articles table. We will create the schema, seed data, add indexes, and compare EXPLAIN ANALYZE output.
Schema
CREATE TABLE articles (
id SERIAL PRIMARY KEY,
title VARCHAR(500) NOT NULL,
slug VARCHAR(500) NOT NULL,
body TEXT NOT NULL,
category VARCHAR(100) NOT NULL,
status VARCHAR(20) NOT NULL DEFAULT 'draft',
author_id INTEGER NOT NULL,
view_count INTEGER DEFAULT 0,
metadata JSONB DEFAULT '{}',
tags TEXT[] DEFAULT '{}',
published_at TIMESTAMPTZ,
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW()
);
Seed Script
var { Pool } = require('pg');
var pool = new Pool({ connectionString: process.env.DATABASE_URL });
var categories = ['javascript', 'python', 'devops', 'databases', 'architecture', 'ai-ml'];
var statuses = ['draft', 'review', 'published', 'archived'];
var tagOptions = ['nodejs', 'express', 'react', 'postgresql', 'docker', 'kubernetes', 'api', 'rest', 'graphql', 'testing'];
async function seed() {
var batchSize = 1000;
var totalRows = 500000;
var batches = totalRows / batchSize;
for (var b = 0; b < batches; b++) {
var values = [];
var params = [];
var paramIndex = 1;
for (var i = 0; i < batchSize; i++) {
var rowNum = b * batchSize + i;
var category = categories[rowNum % categories.length];
var status = rowNum % 10 < 7 ? 'published' : statuses[rowNum % statuses.length];
var authorId = (rowNum % 200) + 1;
var viewCount = Math.floor(Math.random() * 10000);
var tags = '{' + tagOptions.slice(0, (rowNum % 4) + 1).join(',') + '}';
var publishedAt = status === 'published' ? new Date(2023, 0, 1 + (rowNum % 730)).toISOString() : null;
var metadata = JSON.stringify({
reading_time: Math.floor(Math.random() * 20) + 1,
difficulty: ['beginner', 'intermediate', 'advanced'][rowNum % 3],
featured: rowNum % 50 === 0
});
var placeholders = [];
for (var c = 0; c < 10; c++) placeholders.push('$' + paramIndex++);
values.push('(' + placeholders.join(', ') + ')');
params.push(
'Article Title Number ' + rowNum,
'article-title-number-' + rowNum,
'This is the body content of article ' + rowNum + '. ' + ('It contains detailed technical information about ' + category + ' topics. ').repeat(20),
category,
status,
authorId,
viewCount,
metadata,
tags,
publishedAt
);
}
await pool.query(
'INSERT INTO articles (title, slug, body, category, status, author_id, view_count, metadata, tags, published_at) VALUES ' + values.join(', '),
params
);
if ((b + 1) % 50 === 0) {
console.log('Inserted ' + ((b + 1) * batchSize) + ' rows');
}
}
console.log('Seeding complete: ' + totalRows + ' rows');
await pool.end();
}
seed().catch(function(err) {
console.error(err);
process.exit(1);
});
node seed.js
# Inserted 50000 rows
# Inserted 100000 rows
# ...
# Seeding complete: 500000 rows
Before Indexes: Baseline Performance
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, title, category, view_count, published_at
FROM articles
WHERE author_id = 42 AND status = 'published'
ORDER BY published_at DESC
LIMIT 20;
Without indexes:
Limit (cost=24891.56..24891.61 rows=20 width=68) (actual time=312.445..312.461 rows=20 loops=1)
-> Sort (cost=24891.56..24896.18 rows=1847 width=68) (actual time=312.443..312.454 rows=20 loops=1)
Sort Key: published_at DESC
Sort Method: top-N heapsort Memory: 28kB
-> Seq Scan on articles (cost=0.00..24841.00 rows=1847 width=68) (actual time=0.029..305.128 rows=1750 loops=1)
Filter: ((author_id = 42) AND (status = 'published'::text))
Rows Removed by Filter: 498250
Buffers: shared read=14841
Planning Time: 0.198 ms
Execution Time: 312.509 ms
312ms. Sequential scan reading 14,841 pages from disk. Filtering out 498,250 rows to find 1,750 matches.
Creating the Indexes
-- Composite index for the most common query pattern
CREATE INDEX idx_articles_author_status_published
ON articles (author_id, status, published_at DESC);
-- Partial index for published articles only
CREATE INDEX idx_articles_published_category
ON articles (category, published_at DESC)
WHERE status = 'published';
-- GIN index for JSONB metadata queries
CREATE INDEX idx_articles_metadata ON articles USING GIN (metadata);
-- Expression index for case-insensitive slug lookup
CREATE INDEX idx_articles_slug_lower ON articles (LOWER(slug));
-- Covering index for listing queries
CREATE INDEX idx_articles_listing
ON articles (category, published_at DESC)
INCLUDE (title, slug, author_id, view_count)
WHERE status = 'published';
-- Update statistics
ANALYZE articles;
After Indexes: Improved Performance
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, title, category, view_count, published_at
FROM articles
WHERE author_id = 42 AND status = 'published'
ORDER BY published_at DESC
LIMIT 20;
With the composite index:
Limit (cost=0.42..18.76 rows=20 width=68) (actual time=0.039..0.078 rows=20 loops=1)
-> Index Scan using idx_articles_author_status_published on articles
(cost=0.42..1694.12 rows=1847 width=68) (actual time=0.037..0.072 rows=20 loops=1)
Index Cond: ((author_id = 42) AND (status = 'published'::text))
Buffers: shared hit=24
Planning Time: 0.215 ms
Execution Time: 0.098 ms
0.098ms. That is a 3,188x improvement. 24 buffer hits (all from cache), zero disk reads, zero rows filtered out. The index delivered exactly the rows we needed in the order we needed them.
The Express Application
var express = require('express');
var { Pool } = require('pg');
var app = express();
app.use(express.json());
var pool = new Pool({
connectionString: process.env.DATABASE_URL,
max: 20
});
// List published articles by category with pagination
app.get('/api/articles', async function(req, res) {
var category = req.query.category;
var page = Math.max(1, parseInt(req.query.page, 10) || 1);
var limit = Math.min(50, parseInt(req.query.limit, 10) || 20);
var offset = (page - 1) * limit;
try {
var sql = 'SELECT id, title, slug, category, author_id, view_count, published_at FROM articles WHERE status = $1';
var params = ['published'];
var paramIndex = 2;
if (category) {
sql += ' AND category = $' + paramIndex++;
params.push(category);
}
sql += ' ORDER BY published_at DESC LIMIT $' + paramIndex++ + ' OFFSET $' + paramIndex++;
params.push(limit, offset);
var result = await pool.query(sql, params);
res.json({ articles: result.rows, page: page, limit: limit });
} catch (err) {
console.error('List error:', err.message);
res.status(500).json({ error: 'Internal server error' });
}
});
// Get articles by author
app.get('/api/authors/:authorId/articles', async function(req, res) {
var authorId = parseInt(req.params.authorId, 10);
if (isNaN(authorId)) {
return res.status(400).json({ error: 'Invalid author ID' });
}
try {
var result = await pool.query(
'SELECT id, title, slug, status, view_count, published_at FROM articles WHERE author_id = $1 AND status = $2 ORDER BY published_at DESC LIMIT 50',
[authorId, 'published']
);
res.json({ articles: result.rows });
} catch (err) {
console.error('Author articles error:', err.message);
res.status(500).json({ error: 'Internal server error' });
}
});
// Find articles by metadata (JSONB query)
app.get('/api/articles/featured', async function(req, res) {
try {
var result = await pool.query(
"SELECT id, title, slug, category, view_count FROM articles WHERE metadata @> '{\"featured\": true}' AND status = 'published' ORDER BY view_count DESC LIMIT 20"
);
res.json({ articles: result.rows });
} catch (err) {
console.error('Featured articles error:', err.message);
res.status(500).json({ error: 'Internal server error' });
}
});
// Get article by slug (case-insensitive)
app.get('/api/articles/:slug', async function(req, res) {
var slug = req.params.slug.toLowerCase();
try {
var result = await pool.query(
'SELECT id, title, slug, body, category, author_id, view_count, metadata, tags, published_at FROM articles WHERE LOWER(slug) = $1 AND status = $2',
[slug, 'published']
);
if (result.rows.length === 0) {
return res.status(404).json({ error: 'Article not found' });
}
// Increment view count without blocking the response
pool.query('UPDATE articles SET view_count = view_count + 1 WHERE id = $1', [result.rows[0].id])
.catch(function(err) { console.error('View count update failed:', err.message); });
res.json({ article: result.rows[0] });
} catch (err) {
console.error('Get article error:', err.message);
res.status(500).json({ error: 'Internal server error' });
}
});
var PORT = process.env.PORT || 3000;
app.listen(PORT, function() {
console.log('Articles API running on port ' + PORT);
});
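A quick smoke test against the running API. Node 18+ ships a global fetch, so no extra dependency is needed; the endpoint and port match the server above:

```javascript
// Assumes the server above is listening on localhost:3000.
async function smokeTest() {
  var res = await fetch('http://localhost:3000/api/articles?category=javascript&limit=5');
  var data = await res.json();
  console.log('Got ' + data.articles.length + ' articles (page ' + data.page + ')');
}

smokeTest().catch(function(err) {
  console.error('Smoke test failed:', err.message);
  process.exit(1);
});
```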
Common Issues and Troubleshooting
1. Index exists but PostgreSQL uses a Seq Scan
EXPLAIN SELECT * FROM articles WHERE author_id = 42;
Seq Scan on articles (cost=0.00..24841.00 rows=2500 width=890)
Filter: (author_id = 42)
Cause: PostgreSQL's query planner estimated that a Seq Scan is cheaper. This happens when:
- The table is small (< 1000 rows) -- a Seq Scan on a few pages is faster than index traversal
- The query returns a large percentage of rows (e.g., 30%+) -- sequential reads beat random I/O
- Statistics are stale -- run `ANALYZE articles;`
- `random_page_cost` is too high for SSD storage
Fix for SSD storage:
-- Default is 4.0 (tuned for spinning disks). Set to 1.1-1.5 for SSD.
SET random_page_cost = 1.1;
-- Or set permanently in postgresql.conf
-- random_page_cost = 1.1
After adjusting:
Index Scan using idx_articles_author_status_published on articles (cost=0.42..892.34 rows=2500 width=890)
Index Cond: (author_id = 42)
2. Composite index not used when skipping leading columns
-- Index: (author_id, status, published_at DESC)
EXPLAIN ANALYZE SELECT * FROM articles WHERE status = 'published' ORDER BY published_at DESC LIMIT 20;
Limit (cost=22458.91..22458.96 rows=20 width=890) (actual time=287.445..287.456 rows=20 loops=1)
-> Sort (cost=22458.91..23333.91 rows=350000 width=890) (actual time=287.443..287.450 rows=20 loops=1)
Sort Key: published_at DESC
Sort Method: top-N heapsort Memory: 36kB
-> Seq Scan on articles (cost=0.00..15841.00 rows=350000 width=890) (actual time=0.012..198.312 rows=350000 loops=1)
Filter: (status = 'published'::text)
Rows Removed by Filter: 150000
Fix: The composite index (author_id, status, published_at) cannot be used when author_id is not in the WHERE clause. Create a separate index:
CREATE INDEX idx_articles_status_published ON articles (status, published_at DESC);
3. GIN index not used for JSONB ->> operator
EXPLAIN SELECT * FROM articles WHERE metadata->>'difficulty' = 'advanced';
Seq Scan on articles (cost=0.00..26091.00 rows=2500 width=890)
Filter: ((metadata ->> 'difficulty'::text) = 'advanced'::text)
Cause: The default GIN index with jsonb_ops supports containment (@>), existence (?), and key/value pair queries. It does not support the ->> text extraction operator.
Fix option 1: Rewrite the query to use containment:
SELECT * FROM articles WHERE metadata @> '{"difficulty": "advanced"}';
Fix option 2: Create an expression index for the specific path:
CREATE INDEX idx_articles_difficulty ON articles ((metadata->>'difficulty'));
4. Covering index not producing index-only scans
EXPLAIN (ANALYZE, BUFFERS)
SELECT title, slug, view_count FROM articles
WHERE category = 'javascript' AND status = 'published'
ORDER BY published_at DESC LIMIT 20;
Index Only Scan using idx_articles_listing on articles (cost=0.42..52.18 rows=20 width=64) (actual time=0.038..0.091 rows=20 loops=1)
Index Cond: (category = 'javascript'::text)
Heap Fetches: 20
Buffers: shared hit=44
Cause: Heap Fetches: 20 means PostgreSQL had to visit the heap despite having a covering index. This happens when the visibility map is outdated -- dead tuples exist on those pages.
Fix:
VACUUM articles;
After vacuuming:
Index Only Scan using idx_articles_listing on articles (cost=0.42..18.23 rows=20 width=64) (actual time=0.029..0.053 rows=20 loops=1)
Index Cond: (category = 'javascript'::text)
Heap Fetches: 0
Buffers: shared hit=4
Heap Fetches dropped to 0 and buffer reads dropped from 44 to 4.
5. Query performance degrades after bulk UPDATE
After running a large batch update:
UPDATE articles SET view_count = view_count + 1 WHERE category = 'javascript';
-- Updated 83,000 rows
Subsequent queries slow down noticeably. EXPLAIN shows increased buffer reads.
Cause: The bulk update created 83,000 dead tuples. Indexes now contain pointers to both the old and new row versions. Autovacuum may not have kicked in yet.
Fix:
VACUUM (VERBOSE) articles;
-- INFO: "articles": removed 83000 dead row versions in 5841 pages
-- INFO: index "idx_articles_author_status_published": pages: 2841 rows: 583000, removed 83000
For tables with heavy update traffic, consider tuning autovacuum thresholds as shown in the maintenance section above.
Best Practices
- Always run EXPLAIN ANALYZE before and after adding an index. Intuition about query plans is wrong more often than it is right. Trust the planner output.
- Create indexes to match your query patterns, not your schema. Do not index every column. Look at your application's actual queries, especially the ones in your hot paths (listing endpoints, dashboards, search).
- Use partial indexes aggressively. If 90% of your queries filter on `WHERE status = 'published'` or `WHERE deleted_at IS NULL`, a partial index is smaller, faster, and cheaper to maintain.
- Run ANALYZE after bulk data loads. PostgreSQL's query planner relies on table statistics. After inserting or updating large batches, run `ANALYZE tablename;` so the planner has accurate row counts and value distributions.
- Monitor unused indexes with pg_stat_user_indexes. Check `idx_scan` monthly. Any index with zero scans after a full month of production traffic is dead weight -- drop it.
- Consolidate redundant indexes. An index on `(user_id, status)` makes a separate index on `(user_id)` redundant for equality lookups. Fewer indexes mean faster writes and less bloat.
- Use INCLUDE columns instead of wider composite keys. If a column appears in SELECT but never in WHERE or ORDER BY, add it with INCLUDE rather than making it part of the index key. This keeps the B-tree narrower for searches while still enabling index-only scans.
- Set random_page_cost appropriately for your storage. On SSD (which is almost everything in 2026), set it to 1.1. The default 4.0 was designed for spinning disks and causes PostgreSQL to prefer Seq Scans when an Index Scan would be faster.
- Parameterize all queries in your Node.js application. Beyond preventing SQL injection, parameterized queries help PostgreSQL cache and reuse query plans, which improves performance for repeated queries.
- Rebuild bloated indexes with REINDEX CONCURRENTLY. Tables with heavy UPDATE or DELETE traffic accumulate dead index entries. Schedule periodic `REINDEX TABLE CONCURRENTLY tablename;` during low-traffic windows.