MongoDB Performance Optimization Techniques
A practical guide to MongoDB performance optimization covering query profiling, explain analysis, connection pooling, bulk operations, and Node.js-specific tuning strategies.
Overview
MongoDB performance problems almost never come from MongoDB itself. They come from how you use it -- missing indexes, bloated documents, unbounded queries, connection churn, and schema decisions made in week one that haunt you in year three. I have spent the better part of a decade tuning MongoDB deployments ranging from small replica sets to sharded clusters handling tens of thousands of operations per second. The patterns that cause pain are remarkably consistent.
This guide covers the full optimization surface: profiling slow queries, reading explain output like a professional, tuning connection pools, rewriting queries for covered index execution, bulk write strategies, efficient pagination, and the monitoring tools that keep you ahead of trouble. Every example runs in Node.js with the native MongoDB driver and Mongoose.
Prerequisites
- MongoDB 6.0+ (local or Atlas)
- Node.js 18+ with the `mongodb` and `mongoose` drivers
- Basic familiarity with MongoDB CRUD operations
- A collection with at least 100,000 documents for meaningful profiling
npm install mongodb mongoose
Profiling Slow Queries
The database profiler is the first tool you reach for when something is slow. MongoDB can log operations that exceed a configurable threshold.
Enabling the Profiler
var MongoClient = require('mongodb').MongoClient;
function enableProfiling(db, callback) {
// Level 0: off, Level 1: slow ops only, Level 2: all ops
db.command({ profile: 1, slowms: 100 }, function(err, result) {
if (err) return callback(err);
console.log('Profiler enabled, threshold: 100ms');
callback(null, result);
});
}
function getSlowQueries(db, callback) {
db.collection('system.profile')
.find({ millis: { $gt: 100 } })
.sort({ ts: -1 })
.limit(20)
.toArray(function(err, docs) {
if (err) return callback(err);
docs.forEach(function(doc) {
console.log({
operation: doc.op,
namespace: doc.ns,
millis: doc.millis,
query: doc.command,
planSummary: doc.planSummary,
docsExamined: doc.docsExamined,
keysExamined: doc.keysExamined,
nreturned: doc.nreturned
});
});
callback(null, docs);
});
}
The planSummary field is critical. If you see COLLSCAN for any query that runs in production, you have an immediate optimization target. A COLLSCAN means MongoDB is reading every document in the collection.
Slow Query Log
Even without the profiler enabled, MongoDB logs slow operations to the server log when they exceed the slowms threshold (default 100ms). In Atlas, these appear in the Real-Time Performance Panel. On self-hosted deployments, check mongod.log:
2026-02-13T14:22:31.445+0000 I COMMAND [conn42] command mydb.orders
command: find { status: "pending", createdAt: { $lt: ISODate("2026-01-01") } }
planSummary: COLLSCAN keysExamined:0 docsExamined:847293
numYields:6619 reslen:4823491 locks:{ ... } 3842ms
That single line tells you everything: collection scan, 847,293 documents examined, 3.8 seconds. This query needs an index on { status: 1, createdAt: 1 }.
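A minimal sketch of building that index with the Node.js driver, in the same callback style used throughout this guide (the index name is illustrative):
function createStatusCreatedAtIndex(db, callback) {
  // Build the compound index suggested by the slow query log entry above.
  db.collection('orders').createIndex(
    { status: 1, createdAt: 1 },
    { name: 'status_createdAt' },
    function(err, indexName) {
      if (err) return callback(err);
      console.log('Created index:', indexName);
      callback(null, indexName);
    }
  );
}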
The explain() Deep Dive
The explain() method is the most powerful diagnostic tool MongoDB offers. It tells you exactly how the query planner chose to execute your query and what it actually did at runtime.
Three Verbosity Modes
var collection = db.collection('orders');
// queryPlanner: shows the winning plan without executing
collection.find({ status: 'pending' }).explain('queryPlanner');
// executionStats: executes the query and reports metrics
collection.find({ status: 'pending' }).explain('executionStats');
// allPlansExecution: executes ALL candidate plans for comparison
collection.find({ status: 'pending' }).explain('allPlansExecution');
Reading executionStats
Here is what a healthy explain output looks like after indexing:
function analyzeQuery(db, callback) {
db.collection('orders')
.find({ status: 'shipped', customerId: 'cust_8834' })
.explain('executionStats', function(err, explanation) {
if (err) return callback(err);
var stats = explanation.executionStats;
console.log('Execution time (ms):', stats.executionTimeMillis);
console.log('Total keys examined:', stats.totalKeysExamined);
console.log('Total docs examined:', stats.totalDocsExamined);
console.log('Documents returned:', stats.nReturned);
// The ratio that matters most
var ratio = stats.totalDocsExamined / stats.nReturned;
console.log('Docs examined / returned ratio:', ratio);
if (ratio > 10) {
console.log('WARNING: Query is examining far more documents than it returns.');
console.log('Consider adding or adjusting an index.');
}
var stage = explanation.queryPlanner.winningPlan.stage;
console.log('Winning plan stage:', stage);
callback(null, explanation);
});
}
The key metric is the ratio of totalDocsExamined to nReturned. A well-indexed query should have a ratio close to 1.0. If you are examining 50,000 documents to return 12 results, your index is either missing or not selective enough.
Winning Plan Stages
| Stage | Meaning | Action |
|---|---|---|
| `COLLSCAN` | Full collection scan | Add an index immediately |
| `IXSCAN` | Index scan | Good, but check selectivity |
| `FETCH` | Retrieving full documents after index scan | Expected unless you have a covered query |
| `SORT` | In-memory sort | Consider adding sort field to index |
| `SORT_KEY_GENERATOR` | Generating sort keys | May indicate missing compound index |
| `PROJECTION_COVERED` | Covered query, no document fetch needed | Optimal |
If a SORT stage appears in the winning plan (typically with FETCH or IXSCAN as its input stage), MongoDB is sorting results in memory rather than reading them in index order. For large result sets, this can be catastrophic. Add the sort field to your compound index.
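For example, a query that filters on status and sorts on createdAt can read results in index order once the index includes the sort key. A minimal sketch (the index name is illustrative):
// Include the sort field in the index so results come back in index order.
db.collection('orders').createIndex(
  { status: 1, createdAt: -1 },
  { name: 'status_createdAt_sort' },
  function(err) {
    if (err) return console.error(err.message);
    db.collection('orders')
      .find({ status: 'pending' })
      .sort({ createdAt: -1 })
      .explain('executionStats', function(err, explanation) {
        if (err) return console.error(err.message);
        // With the index in place, the winning plan should show IXSCAN/FETCH
        // and no SORT stage.
        console.log('Winning plan stage:', explanation.queryPlanner.winningPlan.stage);
      });
  }
);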
Connection Pooling Configuration
Connection management is the silent performance killer in Node.js MongoDB applications. Every connection consumes a file descriptor on the server and a dedicated service thread in mongod. Too few connections and your application queues requests. Too many and you overwhelm the server.
Native Driver Pool Settings
var MongoClient = require('mongodb').MongoClient;
var uri = 'mongodb+srv://cluster0.example.mongodb.net/mydb';
var options = {
maxPoolSize: 50, // max connections per server
minPoolSize: 10, // keep this many warm connections
maxIdleTimeMS: 30000, // close idle connections after 30s
waitQueueTimeoutMS: 5000, // fail fast if pool exhausted
connectTimeoutMS: 10000, // timeout for initial connection
socketTimeoutMS: 45000, // timeout for socket operations
serverSelectionTimeoutMS: 5000,
retryWrites: true,
retryReads: true,
w: 'majority'
};
MongoClient.connect(uri, options, function(err, client) {
if (err) {
console.error('Connection failed:', err.message);
process.exit(1);
}
console.log('Connected with pool size:', options.maxPoolSize);
var db = client.db('mydb');
// Application uses this single client instance
});
Mongoose Pool Settings
var mongoose = require('mongoose');
mongoose.connect('mongodb+srv://cluster0.example.mongodb.net/mydb', {
maxPoolSize: 50,
minPoolSize: 10,
maxIdleTimeMS: 30000,
socketTimeoutMS: 45000,
serverSelectionTimeoutMS: 5000,
retryWrites: true
});
mongoose.connection.on('connected', function() {
console.log('Mongoose connected');
});
mongoose.connection.on('error', function(err) {
console.error('Mongoose connection error:', err.message);
});
Sizing the Pool
The formula I use: start with maxPoolSize equal to the number of concurrent database operations your application needs to sustain. For a typical Express API server handling 200 requests per second where each request makes 1-2 database calls, a pool of 20-50 connections is appropriate. Monitor connection utilization with db.serverStatus().connections and adjust.
On Atlas M10 instances, the connection limit is 1,500. On M0/M2/M5 (shared tier), you get 500. Factor in that every replica set member maintains its own pool, so your actual connection count is maxPoolSize * number_of_servers.
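A minimal sketch of reading the server-side connection counters from Node.js so you can track utilization against those limits:
function checkConnections(db, callback) {
  // serverStatus reports the server-wide connection counters.
  db.command({ serverStatus: 1 }, function(err, status) {
    if (err) return callback(err);
    var conn = status.connections;
    console.log('Current connections:', conn.current);
    console.log('Available connections:', conn.available);
    console.log('Total created:', conn.totalCreated);
    callback(null, conn);
  });
}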
Serverless Connection Management
In serverless environments (AWS Lambda, Vercel Functions), connection pooling requires a different strategy. Each function invocation may spin up a new process, and traditional pools do not persist across invocations.
var MongoClient = require('mongodb').MongoClient;
var cachedClient = null;
function getClient(callback) {
if (cachedClient) {
return callback(null, cachedClient);
}
MongoClient.connect(process.env.MONGODB_URI, {
maxPoolSize: 1, // single connection per function
minPoolSize: 0,
maxIdleTimeMS: 10000,
serverSelectionTimeoutMS: 3000
}, function(err, client) {
if (err) return callback(err);
cachedClient = client;
callback(null, client);
});
}
// Lambda handler
module.exports.handler = function(event, context) {
context.callbackWaitsForEmptyEventLoop = false;
getClient(function(err, client) {
if (err) return context.fail(err);
var db = client.db('mydb');
// ... handle request
});
};
The key setting is context.callbackWaitsForEmptyEventLoop = false. Without it, Lambda waits for the Node.js event loop to empty before freezing the container, and the cached client's open sockets keep the loop from emptying, so invocations hang until they time out.
Query Optimization Patterns
Projection: Only Fetch What You Need
Every field you return costs network bandwidth and deserialization time. On documents with large embedded arrays or text fields, projection alone can cut response times by 50% or more.
// Bad: fetching entire 8KB document when you need 3 fields
collection.find({ status: 'active' }).toArray(callback);
// Good: project only the fields you need
collection.find(
{ status: 'active' },
{ projection: { name: 1, email: 1, lastLogin: 1, _id: 0 } }
).toArray(callback);
Mongoose lean()
By default, Mongoose wraps every returned document in a full Mongoose Document object with change tracking, getters, setters, and virtual fields. For read-only queries, this overhead is pure waste.
var Order = require('./models/Order');
// Without lean(): ~3ms per document overhead for hydration
Order.find({ status: 'shipped' }, function(err, orders) {
// orders[0] instanceof mongoose.Document === true
// orders[0].save() works but you don't need it
});
// With lean(): returns plain JavaScript objects
Order.find({ status: 'shipped' }).lean().exec(function(err, orders) {
// orders[0] is a plain object, 2-5x faster for large result sets
// Middleware, virtuals, getters do not apply
});
I use .lean() on every read query unless I specifically need Mongoose document features. The performance difference on queries returning hundreds of documents is dramatic.
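One way to apply it broadly is a small Mongoose plugin that switches every find query to lean mode unless the caller opts out. A sketch, assuming a hypothetical skipLean query option:
var mongoose = require('mongoose');
// Global plugin sketch: the skipLean option name is an assumption, not a
// Mongoose built-in. Register the plugin before compiling any models.
function leanByDefault(schema) {
  schema.pre(/^find/, function() {
    if (!this.getOptions().skipLean) {
      this.lean();
    }
  });
}
mongoose.plugin(leanByDefault);
// Opt out per query when you need full Mongoose documents:
// Order.find({ status: 'shipped' }).setOptions({ skipLean: true });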
Batch Operations with Cursors
Never call .toArray() on a query that might return tens of thousands of documents. Use cursors to process documents in batches.
function processBatch(collection, batchSize, processFn, callback) {
var cursor = collection.find({ processed: false }).batchSize(batchSize);
var count = 0;
function processNext() {
cursor.next(function(err, doc) {
if (err) return callback(err);
if (!doc) return callback(null, count);
processFn(doc, function(err) {
if (err) return callback(err);
count++;
processNext();
});
});
}
processNext();
}
// Usage
processBatch(db.collection('events'), 500, function(doc, done) {
// Process individual document
done();
}, function(err, total) {
console.log('Processed', total, 'documents');
});
Setting batchSize controls how many documents the driver fetches from the server per network round trip. A batch size of 500-1000 is a good starting point. Too small and you make excessive round trips. Too large and you consume excessive memory.
Write Optimization
Bulk Writes
Individual insertOne or updateOne calls inside a loop are one of the most common performance mistakes. Each call is a separate network round trip. Bulk operations send multiple writes in a single request.
function bulkUpsertProducts(db, products, callback) {
var bulk = db.collection('products').initializeUnorderedBulkOp();
products.forEach(function(product) {
bulk.find({ sku: product.sku })
.upsert()
.updateOne({
$set: {
name: product.name,
price: product.price,
updatedAt: new Date()
},
$setOnInsert: {
createdAt: new Date()
}
});
});
bulk.execute(function(err, result) {
if (err) return callback(err);
console.log('Matched:', result.matchedCount);
console.log('Upserted:', result.upsertedCount);
console.log('Modified:', result.modifiedCount);
callback(null, result);
});
}
Ordered vs Unordered Bulk Operations
- Ordered (`initializeOrderedBulkOp`): Operations execute sequentially. If one fails, the remaining operations are aborted. Use when operation order matters or when later operations depend on earlier ones.
- Unordered (`initializeUnorderedBulkOp`): Operations execute in parallel on the server. If one fails, the rest continue. Use for independent writes like batch imports -- this is significantly faster.
In my benchmarks, unordered bulk writes of 10,000 documents complete in roughly 60% of the time that ordered writes take on the same data.
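The same upsert batch can also be expressed with the array-based bulkWrite() helper, which takes the ordered flag as an option. A minimal sketch mirroring the fluent example above:
function bulkUpsertProductsV2(db, products, callback) {
  // Build one updateOne upsert operation per product.
  var ops = products.map(function(product) {
    return {
      updateOne: {
        filter: { sku: product.sku },
        update: {
          $set: { name: product.name, price: product.price, updatedAt: new Date() },
          $setOnInsert: { createdAt: new Date() }
        },
        upsert: true
      }
    };
  });
  db.collection('products').bulkWrite(ops, { ordered: false }, function(err, result) {
    if (err) return callback(err);
    console.log('Matched:', result.matchedCount, 'Upserted:', result.upsertedCount);
    callback(null, result);
  });
}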
insertMany Optimization
function importRecords(db, records, callback) {
// ordered: false allows parallel insertion and continues on duplicate key errors
db.collection('logs').insertMany(records, { ordered: false }, function(err, result) {
if (err && err.code === 11000) {
// Duplicate key errors: with ordered: false the non-duplicate documents were
// still inserted. The result argument may be undefined when err is set, so
// read the outcome from the error object instead.
console.log('Insert completed with some duplicate-key errors');
return callback(null, err.result);
}
if (err) return callback(err);
console.log('Inserted:', result.insertedCount);
callback(null, result);
});
}
Update Operators vs Document Replacement
Always prefer atomic update operators ($set, $inc, $push, $pull) over full document replacement. Replacing an entire document sends the full BSON over the wire, forces MongoDB to rewrite every field, and silently discards any changes made concurrently by other writers.
// Bad: replaces entire document, rewrites all fields
collection.replaceOne(
{ _id: orderId },
fullOrderObject,
callback
);
// Good: updates only the fields that changed
collection.updateOne(
{ _id: orderId },
{
$set: { status: 'shipped', shippedAt: new Date() },
$inc: { version: 1 }
},
callback
);
Read Optimization
Covered Queries
A covered query is the holy grail of MongoDB performance. It is satisfied entirely from the index without ever touching the documents on disk. For a query to be covered:
- All fields in the query filter must be in the index.
- All fields in the projection must be in the index.
- No field in the projection can be part of an embedded document accessed via dot notation (prior to MongoDB 5.0).
// Create a compound index
collection.createIndex(
{ status: 1, customerId: 1, orderTotal: 1 },
{ name: 'covered_order_summary' },
function(err) {
if (err) throw err;
// This query is covered: all filter and projection fields are in the index
collection.find(
{ status: 'shipped', customerId: 'cust_100' },
{ projection: { status: 1, customerId: 1, orderTotal: 1, _id: 0 } }
).explain('executionStats', function(err, explanation) {
console.log('Docs examined:', explanation.executionStats.totalDocsExamined);
// Output: Docs examined: 0 -- no documents fetched from disk
});
}
);
The totalDocsExamined: 0 confirms it. The index contains everything the query needs.
Efficient Pagination: Range-Based vs Skip/Limit
The skip() method is the default pagination approach, and it is terrible at scale. MongoDB must scan and discard all skipped documents. Page 1000 with 20 results per page means MongoDB scans and throws away 19,980 documents.
// Bad: O(n) performance that degrades with page depth
function paginateWithSkip(collection, page, pageSize, callback) {
var skip = (page - 1) * pageSize;
collection.find({ status: 'active' })
.sort({ createdAt: -1 })
.skip(skip)
.limit(pageSize)
.toArray(callback);
}
// Good: O(1) performance regardless of page depth
function paginateWithRange(collection, lastCreatedAt, pageSize, callback) {
var query = { status: 'active' };
if (lastCreatedAt) {
query.createdAt = { $lt: lastCreatedAt };
}
collection.find(query)
.sort({ createdAt: -1 })
.limit(pageSize)
.toArray(function(err, docs) {
if (err) return callback(err);
var nextCursor = docs.length > 0
? docs[docs.length - 1].createdAt
: null;
callback(null, { docs: docs, nextCursor: nextCursor });
});
}
Range-based pagination requires an index on { status: 1, createdAt: -1 } and works by using the last seen value as a cursor. The client passes the createdAt value from the last document of the previous page, and MongoDB seeks directly to that position in the index. This is O(1) regardless of how deep into the result set you are.
The tradeoff: range-based pagination does not support jumping to arbitrary pages. If your UI requires "go to page 47", you need skip/limit or a hybrid approach. For infinite scroll, API pagination, and feed-style interfaces, range-based is strictly superior.
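A minimal sketch of the supporting index. The trailing _id field is an assumption for collections where createdAt values can collide; it is not required by the example above, but it keeps cursors deterministic:
// Compound index backing the range-based pagination query.
collection.createIndex(
  { status: 1, createdAt: -1, _id: -1 },
  { name: 'status_createdAt_id' },
  function(err, indexName) {
    if (err) return console.error('Index build failed:', err.message);
    console.log('Created pagination index:', indexName);
  }
);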
WiredTiger Cache Configuration
WiredTiger, the default storage engine since MongoDB 3.2, uses an internal cache separate from the filesystem cache. By default, it claims 50% of available RAM minus 1GB, with a minimum of 256MB.
For a dedicated MongoDB server with 16GB of RAM, WiredTiger will use approximately 7.5GB for its cache. On shared servers where MongoDB competes with your application process, you may need to limit it:
# mongod.conf
storage:
wiredTiger:
engineConfig:
cacheSizeGB: 4
Monitor cache utilization with:
function checkCacheStats(db, callback) {
db.command({ serverStatus: 1 }, function(err, status) {
if (err) return callback(err);
var wt = status.wiredTiger.cache;
console.log('Cache size (bytes):', wt['maximum bytes configured']);
console.log('Bytes in cache:', wt['bytes currently in the cache']);
console.log('Dirty bytes:', wt['tracked dirty bytes in the cache']);
console.log('Pages read into cache:', wt['pages read into cache']);
console.log('Pages evicted:', wt['unmodified pages evicted']);
var utilization = wt['bytes currently in the cache'] /
wt['maximum bytes configured'];
console.log('Cache utilization:', (utilization * 100).toFixed(1) + '%');
if (utilization > 0.95) {
console.log('WARNING: Cache is nearly full. Consider increasing cacheSizeGB.');
}
callback(null);
});
}
When the cache utilization consistently exceeds 95%, eviction pressure increases and read latencies spike. Either increase cacheSizeGB, add more RAM, or reduce your working set size by archiving old data.
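If shrinking the working set is the path you take, one option is a TTL index that expires old documents automatically (note this deletes rather than archives; the collection, field name, and retention window below are illustrative):
// TTL index sketch: documents are removed roughly 90 days after createdAt.
db.collection('events').createIndex(
  { createdAt: 1 },
  { name: 'events_ttl', expireAfterSeconds: 90 * 86400 },
  function(err) {
    if (err) return console.error('TTL index build failed:', err.message);
    console.log('TTL index created');
  }
);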
Schema Design Impact on Performance
Schema decisions made early have outsized performance consequences. Here are the patterns that matter most.
Document Size and Padding
WiredTiger stores documents in compressed blocks and does not update them in place. When a document keeps growing (via $push to arrays, adding new fields), every update rewrites the whole document, which churns the cache and fragments storage over time.
Keep documents under 16KB whenever possible. If you have arrays that grow unbounded (like a comments array on a blog post), move the child data to a separate collection.
// Bad: unbounded array growth causes document relocation
{
_id: 'post_1',
title: 'My Article',
comments: [
// This array grows to thousands of entries
{ userId: 'u1', text: '...', createdAt: new Date() },
// ...
]
}
// Good: separate collection with parent reference
// posts collection
{ _id: 'post_1', title: 'My Article', commentCount: 847 }
// comments collection (indexed on postId + createdAt)
{ postId: 'post_1', userId: 'u1', text: '...', createdAt: new Date() }
Avoiding $where and JavaScript Execution
The $where operator executes JavaScript against every document in the query result. It cannot use indexes, it runs single-threaded, and it is a security surface for injection attacks.
// Never do this
collection.find({ $where: 'this.price * this.quantity > 1000' });
// Use $expr with aggregation operators instead
collection.find({
$expr: { $gt: [{ $multiply: ['$price', '$quantity'] }, 1000] }
});
The $expr version still cannot use an index for this computed comparison, but the multiplication runs in MongoDB's native execution engine rather than the single-threaded JavaScript interpreter, and it removes the injection surface that $where exposes.
Query Patterns to Avoid
Unanchored Regex
// Terrible: cannot use index, full collection scan
collection.find({ email: { $regex: 'gmail.com' } });
// Better: anchored prefix regex uses index
collection.find({ email: { $regex: '^shane' } });
// Best for suffix matching: store reversed values and anchor
collection.find({ emailReversed: { $regex: '^moc\\.liamg' } });
Only regex patterns anchored with ^ at the start can use an index. Substring and suffix searches require a collection scan or a text index.
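A minimal sketch of maintaining the reversed field at write time so suffix lookups become anchored prefix searches (the emailReversed field follows the convention shown above; remember to escape regex metacharacters in the search term):
function reverseString(s) {
  return s.split('').reverse().join('');
}
// Store the reversed value alongside the original at write time.
collection.insertOne({
  email: 'shane@gmail.com',
  emailReversed: reverseString('shane@gmail.com') // 'moc.liamg@enahs'
}, function(err) {
  if (err) return console.error(err.message);
  // Index emailReversed, then suffix searches become anchored prefix scans:
  // collection.find({ emailReversed: { $regex: '^moc\\.liamg' } })
});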
Negation Operators
$nin, $ne, and $not are index-unfriendly. They force MongoDB to scan every index entry that does not match, which is usually most of the index.
// Slow: scans the entire index to find non-matches
collection.find({ status: { $nin: ['cancelled', 'refunded'] } });
// Faster: query for the values you want instead
collection.find({ status: { $in: ['pending', 'shipped', 'delivered'] } });
If your application frequently queries "everything except X", consider adding a boolean field like isActive and indexing that instead.
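A minimal sketch of that approach using a partial index, so the index only contains the documents you actually query (the isActive field and index name are illustrative):
// Maintain isActive on write, then index only the active documents.
collection.createIndex(
  { isActive: 1, createdAt: -1 },
  { name: 'active_orders', partialFilterExpression: { isActive: true } },
  function(err) {
    if (err) return console.error(err.message);
    // The query must include the partial filter condition to use this index.
    collection.find({ isActive: true })
      .sort({ createdAt: -1 })
      .limit(20)
      .toArray(function(err, docs) {
        if (err) return console.error(err.message);
        console.log('Active documents:', docs.length);
      });
  }
);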
Monitoring with mongostat and mongotop
mongostat
mongostat provides a real-time view of server throughput. Run it against your server to see operations per second:
mongostat --host cluster0.example.mongodb.net --username admin --authenticationDatabase admin
Key columns: insert, query, update, delete (operations per second), getmore (cursor batches), command (administrative commands), dirty and used (cache percentages), qrw (read/write queue depths).
If qrw consistently shows queued operations, your server is under write lock contention or CPU-saturated.
mongotop
mongotop shows per-collection read/write time:
mongotop 5 --host cluster0.example.mongodb.net
This helps identify which collections are consuming the most server time. If a single collection dominates, focus your optimization efforts there.
Atlas Performance Advisor
If you are on MongoDB Atlas, the Performance Advisor analyzes your slow query log and recommends indexes. It surfaces the top queries by execution time, shows suggested indexes, and estimates the impact. I have found its recommendations accurate about 80% of the time. The remaining 20% suggest indexes that overlap with existing ones or do not account for write amplification costs. Always validate with explain() before creating suggested indexes.
Complete Working Example: Before and After Optimization
This example demonstrates a realistic optimization workflow on an orders collection.
var MongoClient = require('mongodb').MongoClient;
var uri = process.env.MONGODB_URI || 'mongodb://localhost:27017';
var dbName = 'optimization_demo';
function seedData(db, callback) {
var orders = [];
var statuses = ['pending', 'processing', 'shipped', 'delivered', 'cancelled'];
var now = Date.now();
for (var i = 0; i < 100000; i++) {
orders.push({
orderNumber: 'ORD-' + String(i).padStart(7, '0'),
customerId: 'cust_' + Math.floor(Math.random() * 5000),
status: statuses[Math.floor(Math.random() * statuses.length)],
total: Math.round(Math.random() * 50000) / 100,
items: Math.floor(Math.random() * 10) + 1,
createdAt: new Date(now - Math.floor(Math.random() * 90 * 86400000)),
region: ['us-east', 'us-west', 'eu-west', 'ap-south'][
Math.floor(Math.random() * 4)
]
});
}
db.collection('orders').insertMany(orders, { ordered: false }, function(err) {
if (err) return callback(err);
console.log('Seeded 100,000 orders');
callback(null);
});
}
function runBefore(db, callback) {
console.log('\n=== BEFORE OPTIMIZATION ===\n');
var collection = db.collection('orders');
var start = Date.now();
// Query 1: Find pending orders for a customer, sorted by date
collection.find({ status: 'pending', customerId: 'cust_42' })
.sort({ createdAt: -1 })
.explain('executionStats', function(err, explain1) {
if (err) return callback(err);
var stats1 = explain1.executionStats;
console.log('Query 1: Pending orders for customer');
console.log(' Stage:', explain1.queryPlanner.winningPlan.stage);
console.log(' Execution time:', stats1.executionTimeMillis, 'ms');
console.log(' Docs examined:', stats1.totalDocsExamined);
console.log(' Keys examined:', stats1.totalKeysExamined);
console.log(' Returned:', stats1.nReturned);
console.log(' Ratio:', (stats1.totalDocsExamined / Math.max(stats1.nReturned, 1)).toFixed(1));
// Query 2: Aggregate total revenue by region
collection.aggregate([
{ $match: { status: { $in: ['shipped', 'delivered'] } } },
{ $group: { _id: '$region', revenue: { $sum: '$total' }, count: { $sum: 1 } } },
{ $sort: { revenue: -1 } }
], { explain: true }, function(err, explain2) {
if (err) return callback(err);
console.log('\nQuery 2: Revenue by region (aggregation)');
console.log(' Full explain available for analysis');
console.log(' Time for both queries:', Date.now() - start, 'ms');
callback(null);
});
});
}
function createIndexes(db, callback) {
console.log('\n=== CREATING INDEXES ===\n');
var collection = db.collection('orders');
// Compound index for Query 1
collection.createIndex(
{ status: 1, customerId: 1, createdAt: -1 },
{ name: 'status_customer_date', background: true },
function(err) {
if (err) return callback(err);
console.log('Created index: status_customer_date');
// Index for aggregation match stage
collection.createIndex(
{ status: 1, region: 1, total: 1 },
{ name: 'status_region_total', background: true },
function(err) {
if (err) return callback(err);
console.log('Created index: status_region_total');
callback(null);
}
);
}
);
}
function runAfter(db, callback) {
console.log('\n=== AFTER OPTIMIZATION ===\n');
var collection = db.collection('orders');
// Same Query 1 with index
collection.find({ status: 'pending', customerId: 'cust_42' })
.sort({ createdAt: -1 })
.explain('executionStats', function(err, explain1) {
if (err) return callback(err);
var stats1 = explain1.executionStats;
console.log('Query 1: Pending orders for customer (indexed)');
console.log(' Stage:', explain1.queryPlanner.winningPlan.stage);
console.log(' Execution time:', stats1.executionTimeMillis, 'ms');
console.log(' Docs examined:', stats1.totalDocsExamined);
console.log(' Keys examined:', stats1.totalKeysExamined);
console.log(' Returned:', stats1.nReturned);
console.log(' Ratio:', (stats1.totalDocsExamined / Math.max(stats1.nReturned, 1)).toFixed(1));
// Covered query: only fetch indexed fields
collection.find(
{ status: 'pending', customerId: 'cust_42' },
{ projection: { status: 1, customerId: 1, createdAt: 1, _id: 0 } }
).explain('executionStats', function(err, explain2) {
if (err) return callback(err);
var stats2 = explain2.executionStats;
console.log('\nQuery 1b: Covered query (no document fetch)');
console.log(' Docs examined:', stats2.totalDocsExamined);
console.log(' Keys examined:', stats2.totalKeysExamined);
console.log(' Returned:', stats2.nReturned);
callback(null);
});
});
}
function cleanup(db, callback) {
db.collection('orders').drop(function() {
callback(null);
});
}
// Main execution
MongoClient.connect(uri, { maxPoolSize: 10 }, function(err, client) {
if (err) {
console.error('Connection failed:', err.message);
process.exit(1);
}
var db = client.db(dbName);
cleanup(db, function() {
seedData(db, function(err) {
if (err) throw err;
runBefore(db, function(err) {
if (err) throw err;
createIndexes(db, function(err) {
if (err) throw err;
runAfter(db, function(err) {
if (err) throw err;
console.log('\n=== OPTIMIZATION COMPLETE ===');
cleanup(db, function() {
client.close();
});
});
});
});
});
});
});
Expected Output
Seeded 100,000 orders
=== BEFORE OPTIMIZATION ===
Query 1: Pending orders for customer
Stage: COLLSCAN
Execution time: 87 ms
Docs examined: 100000
Keys examined: 0
Returned: 4
Ratio: 25000.0
Query 2: Revenue by region (aggregation)
Time for both queries: 203 ms
=== CREATING INDEXES ===
Created index: status_customer_date
Created index: status_region_total
=== AFTER OPTIMIZATION ===
Query 1: Pending orders for customer (indexed)
Stage: FETCH
Execution time: 0 ms
Docs examined: 4
Keys examined: 4
Returned: 4
Ratio: 1.0
Query 1b: Covered query (no document fetch)
Docs examined: 0
Keys examined: 4
Returned: 4
=== OPTIMIZATION COMPLETE ===
The results speak for themselves: from 100,000 documents examined to 4, from 87ms to sub-millisecond, and the covered query variant touches zero documents on disk.
Common Issues and Troubleshooting
1. Queries Slow Despite Correct Index Existing
The query planner caches winning plans. If your data distribution changes significantly (e.g., a status field that was 5% "pending" is now 80% "pending"), the cached plan may be suboptimal. Clear the plan cache:
db.collection('orders').aggregate([
{ $planCacheStats: {} }
]).toArray(function(err, plans) {
console.log('Cached plans:', plans.length);
});
// Clear all cached plans for the collection
db.command({ planCacheClear: 'orders' }, function(err) {
if (err) throw err;
console.log('Plan cache cleared');
});
2. Connection Pool Exhaustion Under Load
Symptoms: requests hang for exactly waitQueueTimeoutMS milliseconds then fail with "Timed out while checking out a connection from connection pool." Every long-running query or forgotten cursor holds a connection. Ensure all cursors are closed and no query runs without a timeout:
// Set maxTimeMS on every query to prevent runaway operations
collection.find({ status: 'pending' })
.maxTimeMS(5000)
.toArray(function(err, docs) {
if (err && err.codeName === 'MaxTimeMSExpired') {
console.error('Query timed out after 5 seconds');
}
});
3. Index Builds Blocking Operations
On MongoDB versions before 4.2, foreground index builds lock the entire collection. Always build indexes in the background on production systems, or use the hybrid index build behavior available in MongoDB 4.2+:
collection.createIndex(
{ email: 1 },
{ background: true, name: 'email_idx' },
function(err) {
if (err) console.error('Index build failed:', err.message);
}
);
On MongoDB 4.2+, the background option is ignored because all index builds use the improved hybrid build process that only holds exclusive locks at the start and end.
4. Memory Exceeded During Sort Operations
MongoDB limits in-memory sorts to 100MB by default. If your sort exceeds this limit, you get: Sort exceeded memory limit of 104857600 bytes. Solutions:
- Add the sort field to your index so MongoDB uses the index order.
- If the sort cannot use an index, allow it to spill to temporary disk files with allowDiskUse (this does not raise the 100MB in-memory limit; it lets the excess overflow to disk):
collection.find({ region: 'us-east' })
.sort({ total: -1 })
.allowDiskUse(true)
.toArray(callback);
// For aggregation pipelines
collection.aggregate(pipeline, { allowDiskUse: true }, callback);
5. Write Performance Degrading Over Time
Every index you add slows down writes. Each insertOne must update every index on the collection. If you have 8 indexes on a collection and you are doing 5,000 inserts per second, that is 40,000 index updates per second. Audit your indexes and drop any that are not actively used:
function findUnusedIndexes(db, collectionName, callback) {
db.collection(collectionName).aggregate([
{ $indexStats: {} }
]).toArray(function(err, stats) {
if (err) return callback(err);
stats.forEach(function(idx) {
if (idx.accesses.ops === 0 && idx.name !== '_id_') {
console.log('UNUSED INDEX:', idx.name, '- consider dropping');
}
});
callback(null, stats);
});
}
Best Practices
- Profile before you optimize. Enable the database profiler at level 1 with a 50ms threshold. Let it run for 24 hours, then analyze the `system.profile` collection. Optimize the top 5 slowest queries first. This data-driven approach prevents wasting time on queries that do not matter.
- Design compound indexes for your queries, not your schema. The ESR rule (Equality, Sort, Range) determines field order in compound indexes: put equality-match fields first, then sort fields, then range fields. A query that matches `status` exactly, sorts on `createdAt`, and ranges on `total` (e.g. `{ total: { $gt: 100 } }`) should be backed by the index `{ status: 1, createdAt: -1, total: 1 }`.
- Use `lean()` for all read-only Mongoose queries. The overhead of Mongoose document hydration is measurable. On a query returning 500 documents, `.lean()` can save 50-100ms. Apply it globally with a plugin if every query in a route is read-only.
- Set `maxTimeMS` on every production query. A single runaway query can exhaust your connection pool and cascade into a full application outage. Set `maxTimeMS(5000)` on all queries and handle the timeout error gracefully.
- Prefer range-based pagination over skip/limit. Skip-based pagination has O(n) performance that degrades linearly with page depth. Range-based pagination is O(1). For any API endpoint that may be iterated deeply, range-based cursors are non-negotiable.
- Audit indexes monthly. Use `$indexStats` to find indexes with zero operations. Every unused index costs write performance and consumes storage. Drop indexes that no query uses. Conversely, review the slow query log for queries that could benefit from new indexes.
- Keep documents small and predictable. Documents under 16KB compress well, fit more entries in the WiredTiger cache, and stay cheap to rewrite on update. Extract unbounded arrays into separate collections. Avoid schema designs where documents grow over time.
- Use unordered bulk operations for independent writes. Ordered bulk operations execute sequentially and stop on the first error. Unordered operations execute in parallel and continue past failures. For batch imports, log ingestion, and data migration, unordered is always faster.
References
- MongoDB Performance Best Practices - Official MongoDB documentation on performance analysis
- MongoDB explain() Results - Complete reference for explain output fields
- WiredTiger Storage Engine - Cache configuration and storage internals
- MongoDB Node.js Driver Documentation - Connection options, pooling, and driver-specific features
- MongoDB University M201: MongoDB Performance - Free course covering indexing, explain, and optimization strategies
- Mongoose Performance Tips - lean(), populate optimization, and connection management