PostgreSQL on DigitalOcean: Setup and Optimization
A practical guide to running PostgreSQL on DigitalOcean covering self-hosted and managed options, Node.js integration, query optimization, indexing, and production hardening.
PostgreSQL is one of the most common databases behind Node.js production applications on DigitalOcean. Whether you run it on a Droplet alongside your application or use DigitalOcean's Managed Database service, the fundamentals of setup, optimization, and Node.js integration are the same.
This guide covers both approaches — self-hosted PostgreSQL on a Droplet and Managed PostgreSQL — along with the query optimization, indexing, and configuration tuning that makes either option perform well in production.
Self-Hosted vs Managed PostgreSQL
Self-Hosted on a Droplet
Install and manage PostgreSQL yourself on a Droplet.
Advantages:
- No additional cost — PostgreSQL runs on the same Droplet as your application
- Full configuration control — tune any PostgreSQL parameter
- Lower latency — no network hop between application and database
Disadvantages:
- You manage backups, updates, and security patches
- No automatic failover
- Database competes for CPU and memory with your application
Best for: Side projects, low-traffic applications, tight budgets, development environments.
Managed PostgreSQL
DigitalOcean runs PostgreSQL for you.
Advantages:
- Automated daily backups with point-in-time recovery
- Automatic failover with standby nodes
- Security patches applied automatically
- Monitoring dashboard included
- Read replicas available
Disadvantages:
- Starts at $15/month on top of Droplet costs
- Limited configuration options (many parameters are locked)
- Network latency between Droplet and database (typically < 1ms in same region)
Best for: Production applications, teams without a DBA, applications that need high availability.
Self-Hosted PostgreSQL Setup
Installing on Ubuntu
# Add the PostgreSQL apt repository (apt-key is deprecated on recent Ubuntu; use a keyring file)
sudo sh -c 'echo "deb [signed-by=/usr/share/keyrings/pgdg.gpg] http://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list'
wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo gpg --dearmor -o /usr/share/keyrings/pgdg.gpg
sudo apt update
sudo apt install postgresql-15 -y
# Verify installation
sudo -u postgres psql -c "SELECT version();"
Creating a Database and User
sudo -u postgres psql
-- Create the application database
CREATE DATABASE myapp;
-- Create the application user with a strong password
CREATE USER myapp_user WITH PASSWORD 'strong-password-here';
-- Grant permissions
GRANT ALL PRIVILEGES ON DATABASE myapp TO myapp_user;
\c myapp
GRANT ALL ON SCHEMA public TO myapp_user;
GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO myapp_user;
GRANT ALL PRIVILEGES ON ALL SEQUENCES IN SCHEMA public TO myapp_user;
-- Set default privileges for future tables
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT ALL PRIVILEGES ON TABLES TO myapp_user;
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT ALL PRIVILEGES ON SEQUENCES TO myapp_user;
Configuring Remote Access
By default, PostgreSQL only accepts local connections. If your application runs on a different server:
# Edit pg_hba.conf to allow connections from your VPC
sudo nano /etc/postgresql/15/main/pg_hba.conf
Add a line for your VPC subnet:
# Allow connections from VPC
host myapp myapp_user 10.10.10.0/24 scram-sha-256
# Edit postgresql.conf to listen on all interfaces
sudo nano /etc/postgresql/15/main/postgresql.conf
listen_addresses = '*'
sudo systemctl restart postgresql
For connections over the public internet, always use SSL. For same-Droplet connections, use the Unix socket (fastest, no network overhead).
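For the same-Droplet case, only the connection config changes on the Node.js side. A minimal sketch, assuming the standard pg library; the socket directory is the Debian/Ubuntu default, and the remote host address is a placeholder:

```javascript
// Returns a config object for the pg library's Pool constructor.
// sameHost: true when Node.js and PostgreSQL share a Droplet.
function connectionConfig(sameHost) {
  if (sameHost) {
    // Unix socket: host is a directory path, not a hostname.
    // /var/run/postgresql is the default socket dir on Debian/Ubuntu.
    return { host: "/var/run/postgresql", database: "myapp", user: "myapp_user" };
  }
  // Remote TCP connection; require SSL for anything beyond the VPC.
  return {
    host: "10.10.10.5", // placeholder VPC address
    port: 5432,
    database: "myapp",
    user: "myapp_user",
    ssl: { rejectUnauthorized: true }
  };
}
```

Pass the result to new Pool(connectionConfig(true)) in place of a connection string.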
Performance Configuration
Edit /etc/postgresql/15/main/postgresql.conf:
# Memory — adjust based on your Droplet size
# For a 4GB Droplet running PostgreSQL and Node.js
shared_buffers = 1GB # 25% of RAM
effective_cache_size = 2GB # 50% of RAM
work_mem = 16MB # Per-operation sort memory
maintenance_work_mem = 256MB # For VACUUM, CREATE INDEX
# WAL (Write-Ahead Logging)
wal_buffers = 16MB
checkpoint_completion_target = 0.9
max_wal_size = 2GB
# Query Planning
random_page_cost = 1.1 # SSD storage (default 4.0 is for HDD)
effective_io_concurrency = 200 # SSD can handle concurrent I/O
# Connections
max_connections = 100 # Adjust based on application needs
# Logging
log_min_duration_statement = 500 # Log queries taking > 500ms
log_statement = 'none' # Don't log all queries in production
sudo systemctl restart postgresql
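The memory values above follow simple ratios, so they can be computed for any Droplet size. A rough sizing helper using the same 25%/50% rule of thumb; a sketch, not a substitute for workload-specific tuning:

```javascript
// Suggest shared_buffers and effective_cache_size from total RAM in MB,
// following the 25% / 50% guideline in the config above.
function suggestMemorySettings(totalRamMb) {
  return {
    shared_buffers: Math.floor(totalRamMb * 0.25) + "MB",
    effective_cache_size: Math.floor(totalRamMb * 0.5) + "MB"
  };
}
```

For a 4GB (4096MB) Droplet this yields 1024MB and 2048MB, matching the values above.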
Setting Up Backups
# Create backup script
cat > /var/www/myapp/scripts/backup-postgres.sh << 'SCRIPT'
#!/bin/bash
BACKUP_DIR="/var/backups/postgres"
DATE=$(date +%Y%m%d_%H%M%S)
mkdir -p "$BACKUP_DIR"
# Connect over TCP so password auth applies; store the password in ~/.pgpass for non-interactive runs
pg_dump -h 127.0.0.1 -U myapp_user -d myapp | gzip > "${BACKUP_DIR}/myapp_${DATE}.sql.gz"
# Keep last 7 days
find "$BACKUP_DIR" -name "myapp_*.sql.gz" -mtime +7 -delete
SCRIPT
chmod +x /var/www/myapp/scripts/backup-postgres.sh
# Schedule daily at 2 AM
(crontab -l 2>/dev/null; echo "0 2 * * * /var/www/myapp/scripts/backup-postgres.sh") | crontab -
Managed PostgreSQL Setup
Creating the Database
doctl databases create my-postgres \
--engine pg \
--version 15 \
--size db-s-1vcpu-1gb \
--region nyc3 \
--num-nodes 1
Getting the Connection String
doctl databases connection my-postgres --format Host,Port,User,Password,Database,URI
The connection string looks like:
postgresql://doadmin:<password>@<your-cluster-host>.db.ondigitalocean.com:25060/defaultdb?sslmode=require
Creating an Application Database
doctl databases db create my-postgres myapp
Creating an Application User
doctl databases user create my-postgres myapp_user
Configuring Trusted Sources
Restrict access to your Droplet only:
doctl databases firewalls append my-postgres \
--rule droplet:YOUR_DROPLET_ID
Connecting from Node.js
Basic Connection
npm install pg
// db.js
var { Pool } = require("pg");
var pool = new Pool({
connectionString: process.env.DATABASE_URL,
ssl: process.env.NODE_ENV === "production" ? {
rejectUnauthorized: true,
ca: process.env.DATABASE_CA_CERT || undefined
} : false,
max: 20,
idleTimeoutMillis: 30000,
connectionTimeoutMillis: 5000
});
pool.on("error", function(err) {
console.error("Unexpected database pool error:", err);
});
pool.on("connect", function() {
console.log("New database connection established");
});
module.exports = {
query: function(text, params) {
return pool.query(text, params);
},
getClient: function() {
return pool.connect();
},
end: function() {
return pool.end();
},
pool: pool
};
Transaction Support
// db.js — add transaction helper
function transaction(callback) {
var client;
return pool.connect()
.then(function(c) {
client = c;
return client.query("BEGIN");
})
.then(function() {
return callback(client);
})
.then(function(result) {
return client.query("COMMIT").then(function() {
return result;
});
})
.catch(function(err) {
return client.query("ROLLBACK").then(function() {
throw err;
});
})
.finally(function() {
if (client) client.release();
});
}
module.exports.transaction = transaction;
// Usage
var db = require("./db");
function transferFunds(fromId, toId, amount) {
return db.transaction(function(client) {
return client.query(
"UPDATE accounts SET balance = balance - $1 WHERE id = $2 RETURNING balance",
[amount, fromId]
).then(function(result) {
if (result.rowCount === 0) {
throw new Error("Account not found");
}
if (result.rows[0].balance < 0) {
throw new Error("Insufficient funds");
}
return client.query(
"UPDATE accounts SET balance = balance + $1 WHERE id = $2",
[amount, toId]
);
});
});
}
Parameterized Queries
Always use parameterized queries to prevent SQL injection:
// DANGEROUS — SQL injection vulnerability
app.get("/api/users", function(req, res) {
var name = req.query.name;
db.query("SELECT * FROM users WHERE name = '" + name + "'")
.then(function(result) { res.json(result.rows); });
});
// SAFE — parameterized query
app.get("/api/users", function(req, res) {
db.query("SELECT * FROM users WHERE name = $1", [req.query.name])
.then(function(result) { res.json(result.rows); });
});
Schema Design
Creating Tables
// scripts/migrate.js
var db = require("../db");
var SCHEMA = [
'CREATE TABLE IF NOT EXISTS users (' +
' id SERIAL PRIMARY KEY,' +
' email VARCHAR(255) UNIQUE NOT NULL,' +
' name VARCHAR(255) NOT NULL,' +
' password_hash VARCHAR(255) NOT NULL,' +
' created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),' +
' updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()' +
')',
'CREATE TABLE IF NOT EXISTS articles (' +
' id SERIAL PRIMARY KEY,' +
' title VARCHAR(500) NOT NULL,' +
' slug VARCHAR(500) UNIQUE NOT NULL,' +
' content TEXT NOT NULL,' +
' author_id INTEGER REFERENCES users(id),' +
' status VARCHAR(50) DEFAULT \'draft\',' +
' published_at TIMESTAMP WITH TIME ZONE,' +
' created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),' +
' updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()' +
')',
'CREATE INDEX IF NOT EXISTS idx_articles_slug ON articles (slug)',
'CREATE INDEX IF NOT EXISTS idx_articles_status ON articles (status)',
'CREATE INDEX IF NOT EXISTS idx_articles_author ON articles (author_id)',
'CREATE INDEX IF NOT EXISTS idx_articles_published ON articles (published_at DESC) WHERE status = \'published\''
];
function migrate() {
var promise = Promise.resolve();
SCHEMA.forEach(function(sql) {
promise = promise.then(function() {
console.log("Running:", sql.substring(0, 60) + "...");
return db.query(sql);
});
});
return promise
.then(function() { console.log("Migration complete"); })
.catch(function(err) { console.error("Migration failed:", err); })
.finally(function() { return db.end(); });
}
migrate();
Choosing the Right Data Types
-- Use the most specific type available
CREATE TABLE products (
id SERIAL PRIMARY KEY, -- Auto-incrementing integer
uuid UUID DEFAULT gen_random_uuid(), -- Globally unique identifier
name VARCHAR(255) NOT NULL, -- Variable length string with limit
description TEXT, -- Unlimited length text
price NUMERIC(10, 2) NOT NULL, -- Exact decimal (for money)
weight REAL, -- Floating point (for measurements)
tags TEXT[], -- Array of strings
metadata JSONB, -- Structured JSON data
is_active BOOLEAN DEFAULT true, -- Boolean flag
created_at TIMESTAMPTZ DEFAULT NOW() -- Timestamp with time zone
);
Key rules:
- Use TIMESTAMPTZ (with time zone) instead of TIMESTAMP — avoids timezone confusion
- Use NUMERIC for money, not FLOAT — floating point introduces rounding errors
- Use JSONB (binary JSON) instead of JSON — supports indexing and is faster to query
- Use TEXT[] for simple tag lists instead of a separate join table
Query Optimization
EXPLAIN ANALYZE
The most important tool for query optimization:
function explainQuery(text, params) {
return db.query("EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) " + text, params)
.then(function(result) {
var plan = result.rows[0]["QUERY PLAN"][0];
console.log("Execution time:", plan["Execution Time"] + "ms");
console.log("Planning time:", plan["Planning Time"] + "ms");
console.log(JSON.stringify(plan.Plan, null, 2));
return plan;
});
}
// Usage
explainQuery(
"SELECT * FROM articles WHERE status = $1 ORDER BY published_at DESC LIMIT 20",
["published"]
);
What to look for:
- Seq Scan on large tables — needs an index
- Nested Loop with many rows — consider a different join strategy
- Sort without index — add an index that covers the ORDER BY
- High cost estimates — the query planner expects slow execution
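These checks can also be automated. A sketch that walks the JSON plan returned by EXPLAIN (FORMAT JSON) and collects nodes of a given type, for example to flag sequential scans programmatically; findPlanNodes is a hypothetical helper, not part of any library:

```javascript
// Recursively walk an EXPLAIN (FORMAT JSON) plan tree and collect
// all nodes matching the given "Node Type" (e.g. "Seq Scan").
function findPlanNodes(plan, nodeType) {
  var found = [];
  (function walk(node) {
    if (!node) return;
    if (node["Node Type"] === nodeType) found.push(node);
    (node.Plans || []).forEach(walk);
  })(plan);
  return found;
}
```

Feed it plan.Plan from the explainQuery helper above and alert if any Seq Scan node touches a large relation.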
Index Strategies
-- Single column index for WHERE clauses
CREATE INDEX idx_users_email ON users (email);
-- Composite index for multi-column queries
CREATE INDEX idx_articles_status_date ON articles (status, published_at DESC);
-- Partial index — only indexes matching rows
CREATE INDEX idx_articles_published ON articles (published_at DESC)
WHERE status = 'published';
-- GIN index for JSONB columns
CREATE INDEX idx_products_metadata ON products USING GIN (metadata);
-- GIN index for array columns
CREATE INDEX idx_products_tags ON products USING GIN (tags);
-- Expression index for case-insensitive search
CREATE INDEX idx_users_email_lower ON users (LOWER(email));
Common Query Patterns
Pagination with cursor:
// Offset-based pagination (slow for large offsets)
function getArticlesOffset(page, perPage) {
var offset = (page - 1) * perPage;
return db.query(
"SELECT * FROM articles WHERE status = 'published' ORDER BY published_at DESC LIMIT $1 OFFSET $2",
[perPage, offset]
);
}
// Cursor-based pagination (fast at any depth)
function getArticlesCursor(cursor, perPage) {
if (cursor) {
return db.query(
"SELECT * FROM articles WHERE status = 'published' AND published_at < $1 ORDER BY published_at DESC LIMIT $2",
[cursor, perPage]
);
}
return db.query(
"SELECT * FROM articles WHERE status = 'published' ORDER BY published_at DESC LIMIT $1",
[perPage]
);
}
Cursor-based pagination uses the index directly. Offset-based pagination must scan and skip rows, getting slower as the offset increases.
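In an API, the cursor is usually handed to clients as an opaque token rather than a raw timestamp, so the format can change without breaking callers. A minimal sketch using base64; the encoding choice is an assumption, any reversible format works:

```javascript
// Encode the last row's published_at as an opaque base64 token.
function encodeCursor(date) {
  return Buffer.from(date.toISOString()).toString("base64");
}

// Decode a client-supplied token back into a Date, rejecting garbage.
function decodeCursor(cursor) {
  var iso = Buffer.from(cursor, "base64").toString("utf8");
  var date = new Date(iso);
  if (isNaN(date.getTime())) throw new Error("Invalid cursor");
  return date;
}
```

The decoded Date plugs straight into the $1 parameter of getArticlesCursor above.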
Full-text search:
-- Add a tsvector column
ALTER TABLE articles ADD COLUMN search_vector tsvector;
-- Populate it
UPDATE articles SET search_vector =
to_tsvector('english', coalesce(title, '') || ' ' || coalesce(content, ''));
-- Create a GIN index
CREATE INDEX idx_articles_search ON articles USING GIN (search_vector);
-- Keep it updated with a trigger
CREATE OR REPLACE FUNCTION articles_search_trigger()
RETURNS trigger AS $$
BEGIN
NEW.search_vector := to_tsvector('english', coalesce(NEW.title, '') || ' ' || coalesce(NEW.content, ''));
RETURN NEW;
END
$$ LANGUAGE plpgsql;
CREATE TRIGGER update_search_vector
BEFORE INSERT OR UPDATE ON articles
FOR EACH ROW EXECUTE FUNCTION articles_search_trigger();
function searchArticles(query) {
return db.query(
"SELECT id, title, ts_rank(search_vector, plainto_tsquery('english', $1)) as rank " +
"FROM articles " +
"WHERE search_vector @@ plainto_tsquery('english', $1) " +
"AND status = 'published' " +
"ORDER BY rank DESC " +
"LIMIT 20",
[query]
);
}
Batch inserts:
function insertArticles(articles) {
var values = [];
var params = [];
var index = 1;
articles.forEach(function(article) {
values.push("($" + index++ + ", $" + index++ + ", $" + index++ + ")");
params.push(article.title, article.slug, article.content);
});
return db.query(
"INSERT INTO articles (title, slug, content) VALUES " + values.join(", ") + " RETURNING id",
params
);
}
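The placeholder arithmetic above generalizes to any row and column count. A standalone version of the same pattern, with buildValuesClause as a hypothetical helper name:

```javascript
// Build a VALUES clause of numbered placeholders for a multi-row insert:
// rowCount rows of colCount columns become ($1, $2), ($3, $4), ...
function buildValuesClause(rowCount, colCount) {
  var groups = [];
  var n = 1;
  for (var r = 0; r < rowCount; r++) {
    var cols = [];
    for (var c = 0; c < colCount; c++) {
      cols.push("$" + n++);
    }
    groups.push("(" + cols.join(", ") + ")");
  }
  return groups.join(", ");
}
```

The flattened parameter array must then list values in the same row-major order.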
Monitoring PostgreSQL Performance
Key Metrics to Track
// scripts/db-stats.js
var db = require("../db");
function getDatabaseStats() {
var queries = [
// Database size
"SELECT pg_size_pretty(pg_database_size(current_database())) as db_size",
// Table sizes
"SELECT tablename, pg_size_pretty(pg_total_relation_size(tablename::text)) as size " +
"FROM pg_tables WHERE schemaname = 'public' ORDER BY pg_total_relation_size(tablename::text) DESC LIMIT 10",
// Active connections
"SELECT count(*) as active_connections FROM pg_stat_activity WHERE state = 'active'",
// Slow queries (if pg_stat_statements is enabled)
"SELECT query, calls, mean_exec_time, total_exec_time " +
"FROM pg_stat_statements ORDER BY mean_exec_time DESC LIMIT 5",
// Cache hit ratio
"SELECT sum(heap_blks_hit) / (sum(heap_blks_hit) + sum(heap_blks_read)) as cache_hit_ratio " +
"FROM pg_statio_all_tables WHERE heap_blks_hit + heap_blks_read > 0",
// Index usage
"SELECT schemaname, tablename, " +
" idx_scan as index_scans, seq_scan as sequential_scans, " +
" CASE WHEN idx_scan + seq_scan > 0 THEN " +
" round(100.0 * idx_scan / (idx_scan + seq_scan), 1) " +
" ELSE 0 END as index_usage_percent " +
"FROM pg_stat_user_tables ORDER BY seq_scan DESC LIMIT 10"
];
var promise = Promise.resolve();
queries.forEach(function(sql) {
promise = promise.then(function() {
return db.query(sql)
.then(function(result) {
console.log("\n" + sql.substring(0, 60) + "...");
console.table(result.rows);
})
.catch(function(err) {
console.log("\n" + sql.substring(0, 60) + "... [skipped: " + err.message + "]");
});
});
});
return promise.finally(function() { db.end(); });
}
getDatabaseStats();
Cache Hit Ratio
The cache hit ratio should be above 99% for most applications. If it drops below 95%, your dataset is larger than available memory:
- For self-hosted: increase shared_buffers and effective_cache_size
- For managed: upgrade to a larger database plan with more RAM
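The ratio itself comes straight from the two counters in pg_statio_all_tables, which makes it easy to recompute in a health check. A sketch:

```javascript
// Cache hit ratio from pg_statio_all_tables counters.
// Returns null until any heap I/O has been recorded.
function cacheHitRatio(heapBlksHit, heapBlksRead) {
  var total = heapBlksHit + heapBlksRead;
  if (total === 0) return null;
  return heapBlksHit / total;
}
```

A value of 0.99 or higher is healthy; below 0.95, plan for more memory.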
Index Usage
If a table shows high sequential scans and low index scans, queries on that table are not using indexes. Check your WHERE clauses and add appropriate indexes.
Common Issues and Troubleshooting
Connection refused on managed database
The Droplet is not in the trusted sources list:
Fix: Add your Droplet to the database firewall with doctl databases firewalls append. Verify the connection string uses the correct port (25060 for direct, 25061 for pooled).
Slow queries after data load
PostgreSQL has not analyzed the new data:
Fix: Run ANALYZE on all tables after bulk data imports. The query planner needs statistics to choose efficient execution plans.
"too many connections" error
Application opens more connections than PostgreSQL allows:
Fix: Use connection pooling. For managed databases, enable PgBouncer connection pools. Set the max pool size in your Node.js Pool configuration to stay within limits.
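A safe per-instance pool size can be derived from the server limit. A sketch that leaves headroom for superuser and maintenance connections; the reserved count is an assumption to adjust for your setup:

```javascript
// Split max_connections across application instances, keeping some
// connections reserved for superuser sessions and maintenance tools.
function poolSizePerInstance(maxConnections, reservedConnections, instanceCount) {
  var available = maxConnections - reservedConnections;
  return Math.max(1, Math.floor(available / instanceCount));
}
```

With max_connections = 100, 10 reserved, and 4 Node.js processes, each Pool should use max: 22.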
Database disk full
WAL files or table bloat consuming disk space:
Fix: For self-hosted, run VACUUM FULL on bloated tables and check WAL retention settings. For managed databases, resize the storage or contact DigitalOcean support.
Query runs fast locally but slow in production
Different data volumes produce different query plans:
Fix: Test with production-like data volumes. Run EXPLAIN ANALYZE in production to see the actual query plan. Indexes that work for 1,000 rows may not work for 1,000,000 rows.
Best Practices
- Use connection pooling. Every application connecting to PostgreSQL should use a connection pool. The pg library's Pool class handles this automatically.
- Always use parameterized queries. Never concatenate user input into SQL strings. Parameterized queries prevent SQL injection and improve plan caching.
- Index columns used in WHERE, JOIN, and ORDER BY. Without indexes, PostgreSQL scans entire tables. Check pg_stat_user_tables for tables with high sequential scan counts.
- Use TIMESTAMPTZ for all timestamps. Store timestamps with time zone information. This prevents timezone-related bugs when your application and database are in different zones.
- Run ANALYZE after bulk data changes. The query planner relies on statistics. After importing data, creating tables, or deleting large amounts of data, run ANALYZE.
- Monitor the cache hit ratio. Keep it above 99%. Below 95% means your database needs more memory.
- Use partial indexes for filtered queries. If you frequently query WHERE status = 'published', a partial index on that condition is smaller and faster than a full index.
- Back up daily and test restores monthly. A backup strategy is incomplete without verified restore procedures. Schedule automated restore tests.