
MongoDB Performance Optimization Techniques

A practical guide to MongoDB performance optimization covering query profiling, explain analysis, connection pooling, bulk operations, and Node.js-specific tuning strategies.


Overview

MongoDB performance problems almost never come from MongoDB itself. They come from how you use it -- missing indexes, bloated documents, unbounded queries, connection churn, and schema decisions made in week one that haunt you in year three. I have spent the better part of a decade tuning MongoDB deployments ranging from small replica sets to sharded clusters handling tens of thousands of operations per second. The patterns that cause pain are remarkably consistent.

This guide covers the full optimization surface: profiling slow queries, reading explain output like a professional, tuning connection pools, rewriting queries for covered index execution, bulk write strategies, efficient pagination, and the monitoring tools that keep you ahead of trouble. Every example runs in Node.js with the native MongoDB driver and Mongoose.

Prerequisites

  • MongoDB 6.0+ (local or Atlas)
  • Node.js 18+ with mongodb and mongoose drivers
  • Basic familiarity with MongoDB CRUD operations
  • A collection with at least 100,000 documents for meaningful profiling

npm install mongodb mongoose

Profiling Slow Queries

The database profiler is the first tool you reach for when something is slow. MongoDB can log operations that exceed a configurable threshold.

Enabling the Profiler

var MongoClient = require('mongodb').MongoClient;

function enableProfiling(db, callback) {
  // Level 0: off, Level 1: slow ops only, Level 2: all ops
  db.command({ profile: 1, slowms: 100 }, function(err, result) {
    if (err) return callback(err);
    console.log('Profiler enabled, threshold: 100ms');
    callback(null, result);
  });
}

function getSlowQueries(db, callback) {
  db.collection('system.profile')
    .find({ millis: { $gt: 100 } })
    .sort({ ts: -1 })
    .limit(20)
    .toArray(function(err, docs) {
      if (err) return callback(err);
      docs.forEach(function(doc) {
        console.log({
          operation: doc.op,
          namespace: doc.ns,
          millis: doc.millis,
          query: doc.command,
          planSummary: doc.planSummary,
          docsExamined: doc.docsExamined,
          keysExamined: doc.keysExamined,
          nreturned: doc.nreturned
        });
      });
      callback(null, docs);
    });
}

The planSummary field is critical. If you see COLLSCAN for any query that runs in production, you have an immediate optimization target. A COLLSCAN means MongoDB is reading every document in the collection.

Slow Query Log

Even without the profiler enabled, MongoDB logs slow operations to the server log when they exceed the slowms threshold (default 100ms). In Atlas, these appear in the Real-Time Performance Panel. On self-hosted deployments, check mongod.log:

2026-02-13T14:22:31.445+0000 I COMMAND [conn42] command mydb.orders
  command: find { status: "pending", createdAt: { $lt: ISODate("2026-01-01") } }
  planSummary: COLLSCAN keysExamined:0 docsExamined:847293
  numYields:6619 reslen:4823491 locks:{ ... } 3842ms

That single line tells you everything: collection scan, 847,293 documents examined, 3.8 seconds. This query needs an index on { status: 1, createdAt: 1 }.
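Creating that index from the Node driver is a one-liner. A minimal sketch, assuming the same orders collection and a connected db handle as in the other examples:

db.collection('orders').createIndex(
  { status: 1, createdAt: 1 },
  { name: 'status_createdAt' },
  function(err, indexName) {
    if (err) return console.error('Index build failed:', err.message);
    // The callback receives the name of the created index
    console.log('Created index:', indexName);
  }
);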


The explain() Deep Dive

The explain() method is the most powerful diagnostic tool MongoDB offers. It tells you exactly how the query planner chose to execute your query and what it actually did at runtime.

Three Verbosity Modes

var collection = db.collection('orders');

// queryPlanner: shows the winning plan without executing
collection.find({ status: 'pending' }).explain('queryPlanner');

// executionStats: executes the query and reports metrics
collection.find({ status: 'pending' }).explain('executionStats');

// allPlansExecution: executes ALL candidate plans for comparison
collection.find({ status: 'pending' }).explain('allPlansExecution');

Reading executionStats

Here is what a healthy explain output looks like after indexing:

function analyzeQuery(db, callback) {
  db.collection('orders')
    .find({ status: 'shipped', customerId: 'cust_8834' })
    .explain('executionStats', function(err, explanation) {
      if (err) return callback(err);

      var stats = explanation.executionStats;
      console.log('Execution time (ms):', stats.executionTimeMillis);
      console.log('Total keys examined:', stats.totalKeysExamined);
      console.log('Total docs examined:', stats.totalDocsExamined);
      console.log('Documents returned:', stats.nReturned);

      // The ratio that matters most
      var ratio = stats.totalDocsExamined / Math.max(stats.nReturned, 1);
      console.log('Docs examined / returned ratio:', ratio);

      if (ratio > 10) {
        console.log('WARNING: Query is examining far more documents than it returns.');
        console.log('Consider adding or adjusting an index.');
      }

      var stage = explanation.queryPlanner.winningPlan.stage;
      console.log('Winning plan stage:', stage);

      callback(null, explanation);
    });
}

The key metric is the ratio of totalDocsExamined to nReturned. A well-indexed query should have a ratio close to 1.0. If you are examining 50,000 documents to return 12 results, your index is either missing or not selective enough.

Winning Plan Stages

Stage              | Meaning                                     | Action
COLLSCAN           | Full collection scan                        | Add an index immediately
IXSCAN             | Index scan                                  | Good, but check selectivity
FETCH              | Retrieving full documents after index scan  | Expected unless you have a covered query
SORT               | In-memory sort                              | Consider adding the sort field to the index
SORT_KEY_GENERATOR | Generating sort keys                        | May indicate a missing compound index
PROJECTION_COVERED | Covered query, no document fetch needed     | Optimal

If the winning plan contains a SORT stage, MongoDB is sorting results in memory rather than returning them in index order. For large result sets, this can be catastrophic. Add the sort field to your compound index.
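A minimal sketch of the fix, assuming an orders collection queried by status and sorted by createdAt: once the index covers both the filter and the sort, the SORT stage disappears from the winning plan.

var collection = db.collection('orders');

// Trailing index field matches the sort, so results stream out in index order
collection.createIndex({ status: 1, createdAt: -1 }, function(err) {
  if (err) throw err;

  collection.find({ status: 'pending' })
    .sort({ createdAt: -1 })
    .explain('executionStats', function(err, explanation) {
      if (err) throw err;
      // Winning plan is now FETCH -> IXSCAN with no SORT stage
      console.log(JSON.stringify(explanation.queryPlanner.winningPlan, null, 2));
    });
});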


Connection Pooling Configuration

Connection management is the silent performance killer in Node.js MongoDB applications. Every connection consumes a file descriptor and a dedicated thread on the mongod server, plus roughly 1MB of RAM. Too few connections and your application queues requests. Too many and you overwhelm the server.

Native Driver Pool Settings

var MongoClient = require('mongodb').MongoClient;

var uri = 'mongodb+srv://cluster0.example.mongodb.net/mydb';

var options = {
  maxPoolSize: 50,           // max connections per server
  minPoolSize: 10,           // keep this many warm connections
  maxIdleTimeMS: 30000,      // close idle connections after 30s
  waitQueueTimeoutMS: 5000,  // fail fast if pool exhausted
  connectTimeoutMS: 10000,   // timeout for initial connection
  socketTimeoutMS: 45000,    // timeout for socket operations
  serverSelectionTimeoutMS: 5000,
  retryWrites: true,
  retryReads: true,
  w: 'majority'
};

MongoClient.connect(uri, options, function(err, client) {
  if (err) {
    console.error('Connection failed:', err.message);
    process.exit(1);
  }
  console.log('Connected with pool size:', options.maxPoolSize);
  var db = client.db('mydb');
  // Application uses this single client instance
});

Mongoose Pool Settings

var mongoose = require('mongoose');

mongoose.connect('mongodb+srv://cluster0.example.mongodb.net/mydb', {
  maxPoolSize: 50,
  minPoolSize: 10,
  maxIdleTimeMS: 30000,
  socketTimeoutMS: 45000,
  serverSelectionTimeoutMS: 5000,
  retryWrites: true
});

mongoose.connection.on('connected', function() {
  console.log('Mongoose connected');
});

mongoose.connection.on('error', function(err) {
  console.error('Mongoose connection error:', err.message);
});

Sizing the Pool

The formula I use: start with maxPoolSize equal to the number of concurrent database operations your application needs to sustain. For a typical Express API server handling 200 requests per second where each request makes 1-2 database calls, a pool of 20-50 connections is appropriate. Monitor connection utilization with db.serverStatus().connections and adjust.
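A quick way to check those numbers from Node; a sketch using the same serverStatus command shown later for cache statistics:

function checkConnectionStats(db, callback) {
  db.command({ serverStatus: 1 }, function(err, status) {
    if (err) return callback(err);
    var conn = status.connections;
    console.log('Current connections:', conn.current);
    console.log('Available connections:', conn.available);
    console.log('Total created:', conn.totalCreated);
    callback(null, conn);
  });
}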

On Atlas M10 instances, the connection limit is 1,500. On M0/M2/M5 (shared tier), you get 500. Factor in that the driver maintains a separate pool for each server it connects to, so a single application instance can open up to maxPoolSize * number_of_servers connections against a replica set.

Serverless Connection Management

In serverless environments (AWS Lambda, Vercel Functions), connection pooling requires a different strategy. Each function invocation may spin up a new process, and traditional pools do not persist across invocations.

var MongoClient = require('mongodb').MongoClient;

var cachedClient = null;

function getClient(callback) {
  if (cachedClient) {
    return callback(null, cachedClient);
  }

  MongoClient.connect(process.env.MONGODB_URI, {
    maxPoolSize: 1,            // single connection per function
    minPoolSize: 0,
    maxIdleTimeMS: 10000,
    serverSelectionTimeoutMS: 3000
  }, function(err, client) {
    if (err) return callback(err);
    cachedClient = client;
    callback(null, client);
  });
}

// Lambda handler
module.exports.handler = function(event, context) {
  context.callbackWaitsForEmptyEventLoop = false;
  getClient(function(err, client) {
    if (err) return context.fail(err);
    var db = client.db('mydb');
    // ... handle request
  });
};

The key setting is context.callbackWaitsForEmptyEventLoop = false. Without it, Lambda waits for the event loop to empty before returning the response, and the cached connection keeps the loop busy, so invocations hang until they time out.


Query Optimization Patterns

Projection: Only Fetch What You Need

Every field you return costs network bandwidth and deserialization time. On documents with large embedded arrays or text fields, projection alone can cut response times by 50% or more.

// Bad: fetching entire 8KB document when you need 3 fields
collection.find({ status: 'active' }).toArray(callback);

// Good: project only the fields you need
collection.find(
  { status: 'active' },
  { projection: { name: 1, email: 1, lastLogin: 1, _id: 0 } }
).toArray(callback);

Mongoose lean()

By default, Mongoose wraps every returned document in a full Mongoose Document object with change tracking, getters, setters, and virtual fields. For read-only queries, this overhead is pure waste.

var Order = require('./models/Order');

// Without lean(): ~3ms per document overhead for hydration
Order.find({ status: 'shipped' }, function(err, orders) {
  // orders[0] instanceof mongoose.Document === true
  // orders[0].save() works but you don't need it
});

// With lean(): returns plain JavaScript objects
Order.find({ status: 'shipped' }).lean().exec(function(err, orders) {
  // orders[0] is a plain object, 2-5x faster for large result sets
  // Middleware, virtuals, getters do not apply
});

I use .lean() on every read query unless I specifically need Mongoose document features. The performance difference on queries returning hundreds of documents is dramatic.
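If nearly every query in a service is read-only, one option is a small global Mongoose plugin that makes lean the default. This is an illustrative sketch, not a Mongoose built-in: it disables hydration for every find-family query on schemas it applies to, so skip it wherever you rely on virtuals or document middleware.

var mongoose = require('mongoose');

// Illustrative plugin: find, findOne, findOneAndUpdate, etc. return plain objects
function leanByDefault(schema) {
  schema.pre(/^find/, function() {
    this.lean();
  });
}

// Register before compiling any models so every schema picks it up
mongoose.plugin(leanByDefault);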

Batch Operations with Cursors

Never call .toArray() on a query that might return tens of thousands of documents. Use cursors to process documents in batches.

function processBatch(collection, batchSize, processFn, callback) {
  var cursor = collection.find({ processed: false }).batchSize(batchSize);
  var count = 0;

  function processNext() {
    cursor.next(function(err, doc) {
      if (err) return callback(err);
      if (!doc) return callback(null, count);

      processFn(doc, function(err) {
        if (err) return callback(err);
        count++;
        processNext();
      });
    });
  }

  processNext();
}

// Usage
processBatch(db.collection('events'), 500, function(doc, done) {
  // Process individual document
  done();
}, function(err, total) {
  console.log('Processed', total, 'documents');
});

Setting batchSize controls how many documents the driver fetches from the server per network round trip. A batch size of 500-1000 is a good starting point. Too small and you make excessive round trips. Too large and you consume excessive memory.


Write Optimization

Bulk Writes

Individual insertOne or updateOne calls inside a loop are one of the most common performance mistakes. Each call is a separate network round trip. Bulk operations send multiple writes in a single request.

function bulkUpsertProducts(db, products, callback) {
  var bulk = db.collection('products').initializeUnorderedBulkOp();

  products.forEach(function(product) {
    bulk.find({ sku: product.sku })
      .upsert()
      .updateOne({
        $set: {
          name: product.name,
          price: product.price,
          updatedAt: new Date()
        },
        $setOnInsert: {
          createdAt: new Date()
        }
      });
  });

  bulk.execute(function(err, result) {
    if (err) return callback(err);
    console.log('Matched:', result.matchedCount);
    console.log('Upserted:', result.upsertedCount);
    console.log('Modified:', result.modifiedCount);
    callback(null, result);
  });
}

Ordered vs Unordered Bulk Operations

  • Ordered (initializeOrderedBulkOp): Operations execute sequentially. If one fails, the remaining operations are aborted. Use when operation order matters or when later operations depend on earlier ones.
  • Unordered (initializeUnorderedBulkOp): Operations execute in parallel on the server. If one fails, the rest continue. Use for independent writes like batch imports -- this is significantly faster.

In my benchmarks, unordered bulk writes of 10,000 documents complete in roughly 60% of the time that ordered writes take on the same data.
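The same upsert batch can also be expressed with the driver's collection.bulkWrite() API, which takes the ordered flag directly. A sketch of the unordered form:

function bulkUpsertWithBulkWrite(db, products, callback) {
  var ops = products.map(function(product) {
    return {
      updateOne: {
        filter: { sku: product.sku },
        update: {
          $set: { name: product.name, price: product.price, updatedAt: new Date() },
          $setOnInsert: { createdAt: new Date() }
        },
        upsert: true
      }
    };
  });

  // ordered: false lets independent writes proceed even if some fail
  db.collection('products').bulkWrite(ops, { ordered: false }, function(err, result) {
    if (err) return callback(err);
    console.log('Upserted:', result.upsertedCount, 'Modified:', result.modifiedCount);
    callback(null, result);
  });
}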

insertMany Optimization

function importRecords(db, records, callback) {
  // ordered: false allows parallel insertion and continues past duplicate key errors
  db.collection('logs').insertMany(records, { ordered: false }, function(err, result) {
    if (err && err.code === 11000) {
      // Some duplicates were skipped, but the rest were inserted.
      // On a bulk write error the counts live on the error object, not the result argument.
      console.log('Inserted with some duplicates:', err.result && err.result.insertedCount);
      return callback(null, err.result);
    }
    if (err) return callback(err);
    console.log('Inserted:', result.insertedCount);
    callback(null, result);
  });
}

Update Operators vs Document Replacement

Always prefer atomic update operators ($set, $inc, $push, $pull) over full document replacement. Replacing an entire document forces MongoDB to rewrite the full BSON, potentially causing document relocation if the new document is larger.

// Bad: replaces entire document, rewrites all fields
collection.replaceOne(
  { _id: orderId },
  fullOrderObject,
  callback
);

// Good: updates only the fields that changed
collection.updateOne(
  { _id: orderId },
  {
    $set: { status: 'shipped', shippedAt: new Date() },
    $inc: { version: 1 }
  },
  callback
);

Read Optimization

Covered Queries

A covered query is the holy grail of MongoDB performance. It is satisfied entirely from the index without ever touching the documents on disk. For a query to be covered:

  1. All fields in the query filter must be in the index.
  2. All fields in the projection must be in the index.
  3. No field in the projection can be part of an embedded document accessed via dot notation (prior to MongoDB 5.0).

// Create a compound index
collection.createIndex(
  { status: 1, customerId: 1, orderTotal: 1 },
  { name: 'covered_order_summary' },
  function(err) {
    if (err) throw err;

    // This query is covered: all filter and projection fields are in the index
    collection.find(
      { status: 'shipped', customerId: 'cust_100' },
      { projection: { status: 1, customerId: 1, orderTotal: 1, _id: 0 } }
    ).explain('executionStats', function(err, explanation) {
      console.log('Docs examined:', explanation.executionStats.totalDocsExamined);
      // Output: Docs examined: 0  -- no documents fetched from disk
    });
  }
);

The totalDocsExamined: 0 confirms it. The index contains everything the query needs.

Efficient Pagination: Range-Based vs Skip/Limit

The skip() method is the most common pagination approach, and it is terrible at scale. MongoDB must scan and discard all skipped documents. Page 1000 with 20 results per page means MongoDB scans and throws away 19,980 documents.

// Bad: O(n) performance that degrades with page depth
function paginateWithSkip(collection, page, pageSize, callback) {
  var skip = (page - 1) * pageSize;
  collection.find({ status: 'active' })
    .sort({ createdAt: -1 })
    .skip(skip)
    .limit(pageSize)
    .toArray(callback);
}

// Good: O(1) performance regardless of page depth
function paginateWithRange(collection, lastCreatedAt, pageSize, callback) {
  var query = { status: 'active' };
  if (lastCreatedAt) {
    query.createdAt = { $lt: lastCreatedAt };
  }

  collection.find(query)
    .sort({ createdAt: -1 })
    .limit(pageSize)
    .toArray(function(err, docs) {
      if (err) return callback(err);
      var nextCursor = docs.length > 0
        ? docs[docs.length - 1].createdAt
        : null;
      callback(null, { docs: docs, nextCursor: nextCursor });
    });
}

Range-based pagination requires an index on { status: 1, createdAt: -1 } and works by using the last seen value as a cursor. The client passes the createdAt value from the last document of the previous page, and MongoDB seeks directly to that position in the index. This is O(1) regardless of how deep into the result set you are.

The tradeoff: range-based pagination does not support jumping to arbitrary pages. If your UI requires "go to page 47", you need skip/limit or a hybrid approach. For infinite scroll, API pagination, and feed-style interfaces, range-based is strictly superior.


WiredTiger Cache Configuration

WiredTiger, the default storage engine since MongoDB 3.2, uses an internal cache separate from the filesystem cache. By default, it claims 50% of (system RAM minus 1GB), with a floor of 256MB.

For a dedicated MongoDB server with 16GB of RAM, WiredTiger will use approximately 7.5GB for its cache. On shared servers where MongoDB competes with your application process, you may need to limit it:

# mongod.conf
storage:
  wiredTiger:
    engineConfig:
      cacheSizeGB: 4

Monitor cache utilization with:

function checkCacheStats(db, callback) {
  db.command({ serverStatus: 1 }, function(err, status) {
    if (err) return callback(err);
    var wt = status.wiredTiger.cache;
    console.log('Cache size (bytes):', wt['maximum bytes configured']);
    console.log('Bytes in cache:', wt['bytes currently in the cache']);
    console.log('Dirty bytes:', wt['tracked dirty bytes in the cache']);
    console.log('Pages read into cache:', wt['pages read into cache']);
    console.log('Pages evicted:', wt['unmodified pages evicted']);

    var utilization = wt['bytes currently in the cache'] /
      wt['maximum bytes configured'];
    console.log('Cache utilization:', (utilization * 100).toFixed(1) + '%');

    if (utilization > 0.95) {
      console.log('WARNING: Cache is nearly full. Consider increasing cacheSizeGB.');
    }
    callback(null);
  });
}

When the cache utilization consistently exceeds 95%, eviction pressure increases and read latencies spike. Either increase cacheSizeGB, add more RAM, or reduce your working set size by archiving old data.


Schema Design Impact on Performance

Schema decisions made early have outsized performance consequences. Here are the patterns that matter most.

Document Size and Padding

WiredTiger stores documents in compressed blocks. When a document grows beyond its allocated space (via $push to arrays, adding new fields), MongoDB must rewrite it to a new location. This creates fragmentation.

Keep documents under 16KB whenever possible. If you have arrays that grow unbounded (like a comments array on a blog post), move the child data to a separate collection.

// Bad: unbounded array growth causes document relocation
{
  _id: 'post_1',
  title: 'My Article',
  comments: [
    // This array grows to thousands of entries
    { userId: 'u1', text: '...', createdAt: new Date() },
    // ...
  ]
}

// Good: separate collection with parent reference
// posts collection
{ _id: 'post_1', title: 'My Article', commentCount: 847 }

// comments collection (indexed on postId + createdAt)
{ postId: 'post_1', userId: 'u1', text: '...', createdAt: new Date() }
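A sketch of the supporting index and a typical page read for the child collection, using the field names from the example above:

var comments = db.collection('comments');

// Compound index so a post's comments can be read newest-first straight from the index
comments.createIndex({ postId: 1, createdAt: -1 }, function(err) {
  if (err) throw err;

  comments.find({ postId: 'post_1' })
    .sort({ createdAt: -1 })
    .limit(20)
    .toArray(function(err, page) {
      if (err) throw err;
      console.log('Loaded', page.length, 'comments for post_1');
    });
});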

Avoiding $where and JavaScript Execution

The $where operator executes JavaScript against every document in the query result. It cannot use indexes, it runs single-threaded, and it is a security surface for injection attacks.

// Never do this
collection.find({ $where: 'this.price * this.quantity > 1000' });

// Use $expr with aggregation operators instead
collection.find({
  $expr: { $gt: [{ $multiply: ['$price', '$quantity'] }, 1000] }
});

The $expr version still cannot use an index for this computed comparison ($expr only uses indexes when comparing a single field against a constant), but the multiplication runs in the native aggregation engine rather than the JavaScript interpreter, and it closes the injection surface that $where opens.


Query Patterns to Avoid

Unanchored Regex

// Terrible: cannot use index, full collection scan
collection.find({ email: { $regex: 'gmail.com' } });

// Better: anchored prefix regex uses index
collection.find({ email: { $regex: '^shane' } });

// Best for suffix matching: store reversed values and anchor
collection.find({ emailReversed: { $regex: '^moc\\.liamg' } });

Only case-sensitive regex patterns anchored with ^ at the start can be executed as an index range scan. Substring and suffix searches require a collection scan or a text index.
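The reversed-field trick only works if the reversed value is written alongside the original. A minimal sketch, assuming a users collection and an emailReversed field kept in sync at write time:

function saveUserEmail(db, userId, email, callback) {
  // Store the reversed string so suffix searches become anchored prefix scans
  var reversed = email.split('').reverse().join('');
  db.collection('users').updateOne(
    { _id: userId },
    { $set: { email: email, emailReversed: reversed } },
    { upsert: true },
    callback
  );
}

// Index the reversed field once so the anchored regex can use it
db.collection('users').createIndex({ emailReversed: 1 }, function(err) {
  if (err) throw err;
});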

Negation Operators

$nin, $ne, and $not are index-unfriendly. They force MongoDB to scan every index entry that does not match, which is usually most of the index.

// Slow: scans the entire index to find non-matches
collection.find({ status: { $nin: ['cancelled', 'refunded'] } });

// Faster: query for the values you want instead
collection.find({ status: { $in: ['pending', 'shipped', 'delivered'] } });

If your application frequently queries "everything except X", consider adding a boolean field like isActive and indexing that instead.
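One way to set that up, reusing the status values from the example above: backfill the flag, keep it in sync in your write paths, and index it. A partial index keeps the index small. This is a sketch, not a drop-in migration:

var orders = db.collection('orders');

// One-time backfill; application code must maintain the flag on future writes
orders.updateMany(
  { status: { $nin: ['cancelled', 'refunded'] } },
  { $set: { isActive: true } },
  function(err, result) {
    if (err) throw err;
    console.log('Flagged active orders:', result.modifiedCount);

    // Partial index: only documents with isActive: true are indexed
    orders.createIndex(
      { isActive: 1 },
      { partialFilterExpression: { isActive: true } },
      function(err) {
        if (err) throw err;
        // Queries now match the positive case instead of negating
        orders.countDocuments({ isActive: true }, function(err, n) {
          if (err) throw err;
          console.log('Active orders:', n);
        });
      }
    );
  }
);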


Monitoring with mongostat and mongotop

mongostat

mongostat provides a real-time view of server throughput. Run it against your server to see operations per second:

mongostat --host cluster0.example.mongodb.net --username admin --authenticationDatabase admin

Key columns: insert, query, update, delete (operations per second), getmore (cursor batches), command (administrative commands), dirty and used (cache percentages), qrw (read/write queue depths).

If qrw consistently shows queued operations, your server is under write lock contention or CPU-saturated.

mongotop

mongotop shows per-collection read/write time:

mongotop 5 --host cluster0.example.mongodb.net

This helps identify which collections are consuming the most server time. If a single collection dominates, focus your optimization efforts there.

Atlas Performance Advisor

If you are on MongoDB Atlas, the Performance Advisor analyzes your slow query log and recommends indexes. It surfaces the top queries by execution time, shows suggested indexes, and estimates the impact. I have found its recommendations accurate about 80% of the time. The remaining 20% suggest indexes that overlap with existing ones or do not account for write amplification costs. Always validate with explain() before creating suggested indexes.


Complete Working Example: Before and After Optimization

This example demonstrates a realistic optimization workflow on an orders collection.

var MongoClient = require('mongodb').MongoClient;

var uri = process.env.MONGODB_URI || 'mongodb://localhost:27017';
var dbName = 'optimization_demo';

function seedData(db, callback) {
  var orders = [];
  var statuses = ['pending', 'processing', 'shipped', 'delivered', 'cancelled'];
  var now = Date.now();

  for (var i = 0; i < 100000; i++) {
    orders.push({
      orderNumber: 'ORD-' + String(i).padStart(7, '0'),
      customerId: 'cust_' + Math.floor(Math.random() * 5000),
      status: statuses[Math.floor(Math.random() * statuses.length)],
      total: Math.round(Math.random() * 50000) / 100,
      items: Math.floor(Math.random() * 10) + 1,
      createdAt: new Date(now - Math.floor(Math.random() * 90 * 86400000)),
      region: ['us-east', 'us-west', 'eu-west', 'ap-south'][
        Math.floor(Math.random() * 4)
      ]
    });
  }

  db.collection('orders').insertMany(orders, { ordered: false }, function(err) {
    if (err) return callback(err);
    console.log('Seeded 100,000 orders');
    callback(null);
  });
}

function runBefore(db, callback) {
  console.log('\n=== BEFORE OPTIMIZATION ===\n');
  var collection = db.collection('orders');
  var start = Date.now();

  // Query 1: Find pending orders for a customer, sorted by date
  collection.find({ status: 'pending', customerId: 'cust_42' })
    .sort({ createdAt: -1 })
    .explain('executionStats', function(err, explain1) {
      if (err) return callback(err);

      var stats1 = explain1.executionStats;
      console.log('Query 1: Pending orders for customer');
      console.log('  Stage:', explain1.queryPlanner.winningPlan.stage);
      console.log('  Execution time:', stats1.executionTimeMillis, 'ms');
      console.log('  Docs examined:', stats1.totalDocsExamined);
      console.log('  Keys examined:', stats1.totalKeysExamined);
      console.log('  Returned:', stats1.nReturned);
      console.log('  Ratio:', (stats1.totalDocsExamined / Math.max(stats1.nReturned, 1)).toFixed(1));

      // Query 2: Aggregate total revenue by region
      collection.aggregate([
        { $match: { status: { $in: ['shipped', 'delivered'] } } },
        { $group: { _id: '$region', revenue: { $sum: '$total' }, count: { $sum: 1 } } },
        { $sort: { revenue: -1 } }
      ], { explain: true }, function(err, explain2) {
        if (err) return callback(err);

        console.log('\nQuery 2: Revenue by region (aggregation)');
        console.log('  Full explain available for analysis');
        console.log('  Time for both queries:', Date.now() - start, 'ms');

        callback(null);
      });
    });
}

function createIndexes(db, callback) {
  console.log('\n=== CREATING INDEXES ===\n');
  var collection = db.collection('orders');

  // Compound index for Query 1
  collection.createIndex(
    { status: 1, customerId: 1, createdAt: -1 },
    { name: 'status_customer_date', background: true },
    function(err) {
      if (err) return callback(err);
      console.log('Created index: status_customer_date');

      // Index for aggregation match stage
      collection.createIndex(
        { status: 1, region: 1, total: 1 },
        { name: 'status_region_total', background: true },
        function(err) {
          if (err) return callback(err);
          console.log('Created index: status_region_total');
          callback(null);
        }
      );
    }
  );
}

function runAfter(db, callback) {
  console.log('\n=== AFTER OPTIMIZATION ===\n');
  var collection = db.collection('orders');

  // Same Query 1 with index
  collection.find({ status: 'pending', customerId: 'cust_42' })
    .sort({ createdAt: -1 })
    .explain('executionStats', function(err, explain1) {
      if (err) return callback(err);

      var stats1 = explain1.executionStats;
      console.log('Query 1: Pending orders for customer (indexed)');
      console.log('  Stage:', explain1.queryPlanner.winningPlan.stage);
      console.log('  Execution time:', stats1.executionTimeMillis, 'ms');
      console.log('  Docs examined:', stats1.totalDocsExamined);
      console.log('  Keys examined:', stats1.totalKeysExamined);
      console.log('  Returned:', stats1.nReturned);
      console.log('  Ratio:', (stats1.totalDocsExamined / Math.max(stats1.nReturned, 1)).toFixed(1));

      // Covered query: only fetch indexed fields
      collection.find(
        { status: 'pending', customerId: 'cust_42' },
        { projection: { status: 1, customerId: 1, createdAt: 1, _id: 0 } }
      ).explain('executionStats', function(err, explain2) {
        if (err) return callback(err);

        var stats2 = explain2.executionStats;
        console.log('\nQuery 1b: Covered query (no document fetch)');
        console.log('  Docs examined:', stats2.totalDocsExamined);
        console.log('  Keys examined:', stats2.totalKeysExamined);
        console.log('  Returned:', stats2.nReturned);

        callback(null);
      });
    });
}

function cleanup(db, callback) {
  db.collection('orders').drop(function() {
    callback(null);
  });
}

// Main execution
MongoClient.connect(uri, { maxPoolSize: 10 }, function(err, client) {
  if (err) {
    console.error('Connection failed:', err.message);
    process.exit(1);
  }

  var db = client.db(dbName);

  cleanup(db, function() {
    seedData(db, function(err) {
      if (err) throw err;
      runBefore(db, function(err) {
        if (err) throw err;
        createIndexes(db, function(err) {
          if (err) throw err;
          runAfter(db, function(err) {
            if (err) throw err;
            console.log('\n=== OPTIMIZATION COMPLETE ===');
            cleanup(db, function() {
              client.close();
            });
          });
        });
      });
    });
  });
});

Expected Output

Seeded 100,000 orders

=== BEFORE OPTIMIZATION ===

Query 1: Pending orders for customer
  Stage: COLLSCAN
  Execution time: 87 ms
  Docs examined: 100000
  Keys examined: 0
  Returned: 4
  Ratio: 25000.0

Query 2: Revenue by region (aggregation)
  Time for both queries: 203 ms

=== CREATING INDEXES ===

Created index: status_customer_date
Created index: status_region_total

=== AFTER OPTIMIZATION ===

Query 1: Pending orders for customer (indexed)
  Stage: FETCH
  Execution time: 0 ms
  Docs examined: 4
  Keys examined: 4
  Returned: 4
  Ratio: 1.0

Query 1b: Covered query (no document fetch)
  Docs examined: 0
  Keys examined: 4
  Returned: 4

=== OPTIMIZATION COMPLETE ===

The results speak for themselves: from 100,000 documents examined to 4, from 87ms to sub-millisecond, and the covered query variant touches zero documents on disk.


Common Issues and Troubleshooting

1. Queries Slow Despite Correct Index Existing

The query planner caches winning plans. If your data distribution changes significantly (e.g., a status field that was 5% "pending" is now 80% "pending"), the cached plan may be suboptimal. Clear the plan cache:

db.collection('orders').aggregate([
  { $planCacheStats: {} }
]).toArray(function(err, plans) {
  console.log('Cached plans:', plans.length);
});

// Clear all cached plans for the collection
db.command({ planCacheClear: 'orders' }, function(err) {
  if (err) throw err;
  console.log('Plan cache cleared');
});

2. Connection Pool Exhaustion Under Load

Symptoms: requests hang for exactly waitQueueTimeoutMS milliseconds then fail with "Timed out while checking out a connection from connection pool." Every long-running query or forgotten cursor holds a connection. Ensure all cursors are closed and no query runs without a timeout:

// Set maxTimeMS on every query to prevent runaway operations
collection.find({ status: 'pending' })
  .maxTimeMS(5000)
  .toArray(function(err, docs) {
    if (err && err.codeName === 'MaxTimeMSExpired') {
      console.error('Query timed out after 5 seconds');
    }
  });
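Forgotten cursors are the other common culprit. If you stop iterating before a cursor is exhausted, close it explicitly so the server-side cursor is released immediately instead of lingering until its idle timeout. A sketch:

var cursor = collection.find({ status: 'pending' }).maxTimeMS(5000);

// Example: only the first matching document is needed, so the cursor
// would otherwise sit open on the server after this callback returns
cursor.next(function(err, doc) {
  if (err) return console.error(err.message);
  if (doc) {
    console.log('First pending order:', doc._id);
  }
  // Release the server-side cursor and its resources right away
  cursor.close(function(closeErr) {
    if (closeErr) console.error('Cursor close failed:', closeErr.message);
  });
});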

3. Index Builds Blocking Operations

On MongoDB versions before 4.2, foreground index builds lock the entire collection, so always build indexes in the background on production systems; MongoDB 4.2+ uses an improved hybrid build automatically:

collection.createIndex(
  { email: 1 },
  { background: true, name: 'email_idx' },
  function(err) {
    if (err) console.error('Index build failed:', err.message);
  }
);

On MongoDB 4.2+, the background option is ignored because all index builds use the improved hybrid build process that only holds exclusive locks at the start and end.

4. Memory Exceeded During Sort Operations

MongoDB limits in-memory sorts to 100MB by default. If your sort exceeds this limit, you get: Sort exceeded memory limit of 104857600 bytes. Solutions:

  • Add the sort field to your index so MongoDB uses the index order.
  • If the sort cannot use an index, allow it to spill to disk with allowDiskUse:

collection.find({ region: 'us-east' })
  .sort({ total: -1 })
  .allowDiskUse(true)
  .toArray(callback);

// For aggregation pipelines
collection.aggregate(pipeline, { allowDiskUse: true }, callback);

5. Write Performance Degrading Over Time

Every index you add slows down writes. Each insertOne must update every index on the collection. If you have 8 indexes on a collection and you are doing 5,000 inserts per second, that is 40,000 index updates per second. Audit your indexes and drop any that are not actively used:

function findUnusedIndexes(db, collectionName, callback) {
  db.collection(collectionName).aggregate([
    { $indexStats: {} }
  ]).toArray(function(err, stats) {
    if (err) return callback(err);
    stats.forEach(function(idx) {
      if (idx.accesses.ops === 0 && idx.name !== '_id_') {
        console.log('UNUSED INDEX:', idx.name, '- consider dropping');
      }
    });
    callback(null, stats);
  });
}

Best Practices

  1. Profile before you optimize. Enable the database profiler at level 1 with a 50ms threshold. Let it run for 24 hours, then analyze the system.profile collection. Optimize the top 5 slowest queries first. This data-driven approach prevents wasting time on queries that do not matter.

  2. Design compound indexes for your queries, not your schema. The ESR rule (Equality, Sort, Range) determines field order in compound indexes. Put equality-match fields first, then sort fields, then range fields. A query that filters on status equality, sorts on createdAt, and applies a range filter such as total: { $gt: 100 } should use the index { status: 1, createdAt: -1, total: 1 }.

  3. Use lean() for all read-only Mongoose queries. The overhead of Mongoose document hydration is measurable. On a query returning 500 documents, .lean() can save 50-100ms. Apply it globally with a plugin if every query in a route is read-only.

  4. Set maxTimeMS on every production query. A single runaway query can exhaust your connection pool and cascade into a full application outage. Set maxTimeMS(5000) on all queries and handle the timeout error gracefully.

  5. Prefer range-based pagination over skip/limit. Skip-based pagination has O(n) performance that degrades linearly with page depth. Range-based pagination is O(1). For any API endpoint that may be iterated deeply, range-based cursors are non-negotiable.

  6. Audit indexes monthly. Use $indexStats to find indexes with zero operations. Every unused index costs write performance and consumes storage. Drop indexes that no query uses. Conversely, review the slow query log for queries that could benefit from new indexes.

  7. Keep documents small and predictable. Documents under 16KB compress well, fit more entries in the WiredTiger cache, and avoid relocation on updates. Extract unbounded arrays into separate collections. Avoid schema designs where documents grow over time.

  8. Use unordered bulk operations for independent writes. Ordered bulk operations execute sequentially and stop on first error. Unordered operations execute in parallel and continue past failures. For batch imports, log ingestion, and data migration, unordered is always faster.

