
MongoDB Indexing Best Practices

A comprehensive guide to MongoDB indexing covering compound indexes, the ESR rule, covering queries, partial indexes, explain analysis, and Mongoose index management.


If your MongoDB queries are slow, the answer is almost always indexes. I have seen production databases go from multi-second query times to sub-millisecond responses just by adding the right indexes. But "the right indexes" is the key phrase. Bad indexes waste memory, slow down writes, and give you a false sense of security. This guide covers everything you need to know about MongoDB indexing, from the fundamentals to advanced strategies, with real Node.js code you can use today.

How MongoDB Indexes Work

MongoDB indexes use a B-tree data structure. Think of it like a phone book — instead of scanning every page to find a name, you jump to the right section using the alphabetical tabs. Without an index, MongoDB performs a collection scan (COLLSCAN), reading every single document to find matches. With an index, it performs an index scan (IXSCAN), walking the B-tree to find matching entries and then fetching only the relevant documents.

Every collection has a default index on the _id field. Beyond that, you need to create indexes that match your query patterns. The critical thing to understand is that indexes are not free. Each index consumes memory, and every write operation (insert, update, delete) must update all relevant indexes. The goal is to create the minimum number of indexes that cover your most important queries.
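
You can check which indexes a collection currently has with getIndexes(). On a fresh collection it shows only the default _id index (output shape abbreviated here):

// In MongoDB shell
db.products.getIndexes();
// [ { v: 2, key: { _id: 1 }, name: "_id_" } ]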

Single Field Indexes

The simplest index type covers a single field:

// In MongoDB shell
db.products.createIndex({ category: 1 });

// In Mongoose
var productSchema = new mongoose.Schema({
  name: String,
  category: String,
  price: Number
});

productSchema.index({ category: 1 });

The 1 means ascending order, -1 means descending. For single field indexes, the direction does not matter for queries — MongoDB can traverse the B-tree in either direction. Direction only matters for compound indexes when sorting on multiple fields.
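
For example, a query that sorts on two fields in opposite directions can only use an index whose directions match the sort (or its exact mirror image):

// Sort by category ascending, newest first within each category
db.products.find().sort({ category: 1, createdAt: -1 });

// This index (or its mirror { category: -1, createdAt: 1 }) supports the sort
db.products.createIndex({ category: 1, createdAt: -1 });

// An index on { category: 1, createdAt: 1 } would force an in-memory sort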

Compound Indexes and the ESR Rule

Compound indexes are where most developers get things wrong. The order of fields in a compound index matters enormously, and getting it wrong can make the index useless.

The ESR Rule (Equality, Sort, Range) tells you the optimal field order:

  1. Equality fields first — fields tested with exact match ({ category: "electronics" })
  2. Sort fields second — fields used in .sort() operations
  3. Range fields last — fields tested with $gt, $lt, $in, $gte, $lte

// Query: find products in a category, sorted by price, within a rating range
db.products.find({
  category: "electronics",    // Equality
  rating: { $gte: 4 }         // Range
}).sort({ price: 1 });         // Sort

// CORRECT index order (ESR):
db.products.createIndex({ category: 1, price: 1, rating: 1 });

// WRONG index order:
db.products.createIndex({ rating: 1, category: 1, price: 1 });

Why does this matter? When MongoDB hits a range condition, it can no longer use the remaining index fields efficiently for sorting. By placing equality fields first, MongoDB narrows down to an exact subset. The sort fields come next so MongoDB can return results in order without an in-memory sort. Range fields come last because they break the sorted order of subsequent fields.

The Compound Index Prefix Rule

A compound index can support queries on any prefix of its fields. Given this index:

db.products.createIndex({ category: 1, brand: 1, price: 1 });

This single index supports queries on:

  • { category: 1 } — uses the first field prefix
  • { category: 1, brand: 1 } — uses the first two fields
  • { category: 1, brand: 1, price: 1 } — uses all three fields

But it does not efficiently support:

  • { brand: 1 } — skips the first field
  • { price: 1 } — skips the first two fields
  • { brand: 1, price: 1 } — skips the first field

This is why index design requires understanding your query patterns. A well-designed compound index can eliminate the need for multiple single-field indexes.
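
For example, if a standalone { category: 1 } index also exists on the collection, the compound index above makes it redundant — category-only queries are served by the prefix — and it can usually be dropped (verify with $indexStats first):

// The compound index already covers category-only queries via its prefix
db.products.dropIndex({ category: 1 });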

Multikey Indexes

When you index a field that contains an array, MongoDB creates a multikey index — one index entry for each element in the array:

var productSchema = new mongoose.Schema({
  name: String,
  tags: [String],
  variants: [{
    color: String,
    size: String
  }]
});

// Index on array of strings
productSchema.index({ tags: 1 });

// Index on nested field within array of objects
productSchema.index({ "variants.color": 1 });

One important restriction: a compound index can include at most one array field. MongoDB cannot create a multikey index on two array fields in the same compound index because the cross-product would explode in size.
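
Queries that match individual array elements use the multikey index transparently:

// Both queries can use the multikey index on tags
db.products.find({ tags: "wireless" });
db.products.find({ tags: { $in: ["wireless", "portable"] } });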

Text Indexes

For full-text search without an external search engine, MongoDB provides text indexes:

productSchema.index({
  name: "text",
  description: "text"
}, {
  weights: {
    name: 10,
    description: 5
  },
  name: "product_text_search"
});

// Query using text search
db.products.find({ $text: { $search: "wireless bluetooth headphones" } });

Text indexes support stemming, stop words, and weighted fields. They are useful for basic search functionality, but for anything serious — faceted search, fuzzy matching, autocomplete — you should use MongoDB Atlas Search or Elasticsearch. A collection can only have one text index.

TTL Indexes

Time-to-Live indexes automatically delete documents after a specified duration. They are essential for session data, logs, temporary tokens, and anything with a natural expiration:

var sessionSchema = new mongoose.Schema({
  userId: mongoose.Schema.Types.ObjectId,
  token: String,
  createdAt: { type: Date, default: Date.now }
});

// Documents expire 24 hours after createdAt
sessionSchema.index({ createdAt: 1 }, { expireAfterSeconds: 86400 });

TTL indexes only work on fields containing Date values. MongoDB runs a background task every 60 seconds to remove expired documents, so deletion is not instantaneous — there can be up to a 60-second lag.
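
When each document needs its own expiration time rather than a fixed duration, a common pattern is to store an explicit expiry date and set expireAfterSeconds to 0 (the expiresAt field here is illustrative, not part of the schema above):

var tokenSchema = new mongoose.Schema({
  userId: mongoose.Schema.Types.ObjectId,
  token: String,
  expiresAt: Date  // each document carries its own expiration timestamp
});

// Documents are removed once expiresAt is in the past
tokenSchema.index({ expiresAt: 1 }, { expireAfterSeconds: 0 });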

Unique Indexes

Unique indexes enforce that no two documents share the same value for the indexed field:

var userSchema = new mongoose.Schema({
  email: { type: String, required: true },
  username: { type: String, required: true }
});

userSchema.index({ email: 1 }, { unique: true });
userSchema.index({ username: 1 }, { unique: true });

One gotcha: if the field is missing from a document, MongoDB treats it as null. Multiple documents with a missing field will violate the unique constraint because they all have the same null value. Use a partial index to avoid this.
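
A minimal sketch of that fix, using an optional phone field as an example — uniqueness is only enforced on documents where the field actually exists:

userSchema.index(
  { phone: 1 },
  { unique: true, partialFilterExpression: { phone: { $exists: true } } }
);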

Partial Indexes

Partial indexes only index documents that match a filter expression. They are smaller, use less memory, and are more efficient than full indexes when you only query a subset of documents:

// Only index active products
productSchema.index(
  { category: 1, price: 1 },
  { partialFilterExpression: { status: "active" } }
);

// Only index orders above $100
orderSchema.index(
  { customerId: 1, orderDate: -1 },
  { partialFilterExpression: { total: { $gt: 100 } } }
);

For the partial index to be used, your query must include the filter expression as part of its conditions. If you query without { status: "active" }, MongoDB cannot use the partial index because it does not contain all matching documents.
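
For example, given the active-products index above:

// Can use the partial index — the query constrains status to "active"
db.products.find({ status: "active", category: "electronics" }).sort({ price: 1 });

// Cannot use the partial index — it does not contain draft or archived products
db.products.find({ category: "electronics" }).sort({ price: 1 });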

Sparse Indexes

Sparse indexes only include documents where the indexed field exists. They are an older feature — partial indexes are more flexible and generally preferred:

// Only index documents that have a phone field
userSchema.index({ phone: 1 }, { sparse: true });

Sparse indexes solve the unique index problem with missing fields. A sparse unique index allows multiple documents without the field while still enforcing uniqueness among documents that have it.
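
In code, that sparse unique variant looks like this:

// Unique, but documents without a phone field are simply not indexed
userSchema.index({ phone: 1 }, { unique: true, sparse: true });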

Wildcard Indexes

When your documents have unpredictable or highly variable field names, wildcard indexes can help:

// Index all fields in the metadata subdocument
productSchema.index({ "metadata.$**": 1 });

// Index all fields in the entire document
productSchema.index({ "$**": 1 });

Wildcard indexes are useful for schema-less data patterns, but they are not a substitute for targeted compound indexes. They cannot support efficient compound queries or sorting. Use them for ad-hoc queries on dynamic fields, not for your primary query patterns.

Covering Queries

A covering query is the holy grail of index optimization. When an index contains all the fields needed to satisfy a query — both the filter fields and the projected (returned) fields — MongoDB can return results entirely from the index without touching the documents at all. This is called a covered query and shows totalDocsExamined: 0 in the explain output.

// Create an index that covers the query
db.products.createIndex({ category: 1, price: 1, name: 1 });

// This query is "covered" — only needs fields in the index
db.products.find(
  { category: "electronics" },
  { price: 1, name: 1, _id: 0 }    // projection must exclude _id
).sort({ price: 1 });

The _id field is included in query results by default, so you must explicitly exclude it with _id: 0 unless _id is part of your index. Covered queries are significantly faster because they avoid the random I/O of fetching full documents from disk.

Using explain() for Index Analysis

The explain() method is your primary diagnostic tool. It tells you exactly how MongoDB executes a query:

// In MongoDB shell
db.products.find({ category: "electronics" }).explain("executionStats");

// In Mongoose
var Product = require("./models/product");

function analyzeQuery(callback) {
  Product.find({ category: "electronics" })
    .explain("executionStats")
    .then(function(explanation) {
      var stats = explanation[0].executionStats;
      console.log("Execution time (ms):", stats.executionTimeMillis);
      console.log("Total docs examined:", stats.totalDocsExamined);
      console.log("Total keys examined:", stats.totalKeysExamined);
      console.log("Docs returned:", stats.nReturned);

      var plan = explanation[0].queryPlanner.winningPlan;
      var stage = plan.inputStage || plan; // COLLSCAN plans have no inputStage
      console.log("Index used:", stage.indexName || "NONE (COLLSCAN)");
      console.log("Scan type:", stage.stage);
      callback(null, stats);
    })
    .catch(function(err) {
      callback(err);
    });
}

Key metrics to watch:

  • COLLSCAN vs IXSCAN: COLLSCAN means no index is being used. You almost always want IXSCAN.
  • totalDocsExamined / nReturned ratio: Ideally close to 1.0. If you examine 10,000 documents but return 10, your index is not selective enough.
  • totalKeysExamined / nReturned ratio: Same idea. High ratios mean the index is not well-matched to the query.
  • executionTimeMillis: The actual wall-clock time. Compare before and after adding indexes.

Index Hints

When MongoDB chooses the wrong index (it happens), you can force a specific index:

// MongoDB shell
db.products.find({ category: "electronics", price: { $lt: 50 } })
  .hint({ category: 1, price: 1 });

// Mongoose
Product.find({ category: "electronics", price: { $lt: 50 } })
  .hint({ category: 1, price: 1 })
  .exec(function(err, products) {
    // results
  });

Use hints sparingly. If you find yourself needing them often, your index design probably needs rethinking. The query planner is usually right.

Index Intersection

MongoDB can sometimes combine multiple single-field indexes to satisfy a query, a feature called index intersection. However, do not rely on this. In practice, a proper compound index almost always outperforms index intersection. I have never seen a production case where index intersection was the better solution over a compound index.
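
If you want to check whether intersection is happening, look at the winning plan in the explain output — intersection shows up as an AND_SORTED or AND_HASH stage. A sketch, using the orders collection as an example:

// Two single-field indexes the planner might try to intersect
db.orders.createIndex({ customerId: 1 });
db.orders.createIndex({ status: 1 });

db.orders
  .find({ customerId: ObjectId("5f4d7a2e9b1e8a3d4c2b1a00"), status: "shipped" })
  .explain("executionStats");
// An AND_SORTED or AND_HASH stage in the winning plan means intersection was used.
// A compound index turns the same query into a single, faster IXSCAN:
db.orders.createIndex({ customerId: 1, status: 1 });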

Monitoring Index Usage

Unused indexes consume memory and slow down writes. Monitor your indexes and drop the ones that are not being used:

// Check index usage statistics
db.products.aggregate([{ $indexStats: {} }]);

// This returns for each index:
// - accesses.ops: number of times the index was used
// - accesses.since: when tracking started

In a Node.js application, you can check this programmatically:

var mongoose = require("mongoose");

function checkIndexUsage(callback) {
  var db = mongoose.connection.db;
  db.collection("products").aggregate([
    { $indexStats: {} }
  ]).toArray(function(err, stats) {
    if (err) return callback(err);

    stats.forEach(function(index) {
      console.log("Index:", index.name);
      console.log("  Operations:", index.accesses.ops);
      console.log("  Since:", index.accesses.since);

      if (index.accesses.ops === 0 && index.name !== "_id_") {
        console.log("  WARNING: Unused index — consider dropping");
      }
    });

    callback(null, stats);
  });
}

Index Build Impact

Creating indexes on large collections can impact performance. MongoDB has used three build strategies across versions:

  • Foreground builds (pre-4.2 default): Lock the collection, faster to build but blocks all reads and writes.
  • Background builds (deprecated in 4.2+): Allow operations during build but slower and use more memory.
  • Hybrid builds (4.2+ default): The best of both worlds. They do not block reads or writes for the majority of the build process, only holding an exclusive lock at the start and end.

// In Mongoose, indexes are created when the connection opens
// For production, consider building indexes separately
productSchema.set("autoIndex", false); // disable auto-index in production

// Build indexes manually during maintenance
Product.createIndexes(function(err) {
  if (err) console.error("Index build failed:", err);
  else console.log("Indexes built successfully");
});

For production deployments, disable auto-indexing in Mongoose and run index builds during maintenance windows or use rolling index builds on replica sets.

Index Size and Memory Considerations

Indexes must fit in RAM for optimal performance. When indexes exceed available memory, MongoDB pages them to disk, and performance degrades dramatically.

// Check index sizes
db.products.stats().indexSizes;

// Check total index size
db.products.totalIndexSize();

Strategies for managing index size:

  • Use partial indexes to reduce the number of indexed documents
  • Avoid indexing large string fields unless necessary
  • Use hashed indexes for equality-only queries on large fields (they are smaller than B-tree indexes) — see the sketch after this list
  • Remove unused indexes
  • Consider the impact of multikey indexes on arrays — the index grows with array size
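
A sketch of the hashed-index option, assuming a hypothetical apiKey field that is only ever looked up by exact value:

// Hashed indexes store a hash of the value, so they stay compact even for long strings,
// but they support only equality matches — no range queries, no sorted results
userSchema.index({ apiKey: "hashed" });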

Complete Working Example

Here is a full Node.js application with Mongoose that creates optimal indexes for an e-commerce product catalog. It demonstrates search, filtering, sorting, pagination, and explain analysis.

var mongoose = require("mongoose");
var Schema = mongoose.Schema;

// --- Schema Definition ---

var productSchema = new Schema({
  name: { type: String, required: true },
  slug: { type: String, required: true },
  description: String,
  category: { type: String, required: true },
  subcategory: String,
  brand: String,
  price: { type: Number, required: true },
  salePrice: Number,
  rating: { type: Number, default: 0 },
  reviewCount: { type: Number, default: 0 },
  tags: [String],
  status: { type: String, enum: ["active", "draft", "archived"], default: "active" },
  createdAt: { type: Date, default: Date.now },
  updatedAt: { type: Date, default: Date.now }
});

// --- Index Definitions ---

// 1. Unique slug for URL routing
productSchema.index({ slug: 1 }, { unique: true });

// 2. Category + price sort + rating range (ESR rule)
//    Supports: filter by category, sort by price, filter by rating
productSchema.index(
  { category: 1, price: 1, rating: 1 },
  { partialFilterExpression: { status: "active" } }
);

// 3. Category + newest first (for default listing)
productSchema.index(
  { category: 1, createdAt: -1 },
  { partialFilterExpression: { status: "active" } }
);

// 4. Text search on name and description
productSchema.index(
  { name: "text", description: "text" },
  { weights: { name: 10, description: 3 }, name: "product_search" }
);

// 5. Tags multikey index for tag-based browsing
productSchema.index({ tags: 1, status: 1 });

// 6. Brand + category for brand pages
productSchema.index({ brand: 1, category: 1 });

// Disable auto-index for production
productSchema.set("autoIndex", false);

var Product = mongoose.model("Product", productSchema);

// --- Database Connection ---

function connectDB(callback) {
  var uri = process.env.MONGODB_URI || "mongodb://localhost:27017/ecommerce";
  mongoose.connect(uri);

  mongoose.connection.on("connected", function() {
    console.log("Connected to MongoDB");
    callback(null);
  });

  mongoose.connection.on("error", function(err) {
    callback(err);
  });
}

// --- Build Indexes ---

function buildIndexes(callback) {
  console.log("Building indexes...");
  var startTime = Date.now();

  Product.createIndexes(function(err) {
    if (err) return callback(err);

    var elapsed = Date.now() - startTime;
    console.log("Indexes built in " + elapsed + "ms");
    callback(null);
  });
}

// --- Seed Sample Data ---

function seedProducts(count, callback) {
  var categories = ["electronics", "clothing", "home", "sports", "books"];
  var brands = ["Acme", "Globex", "Initech", "Umbrella", "Wonka"];
  var tagSets = [
    ["wireless", "bluetooth", "portable"],
    ["organic", "eco-friendly", "sustainable"],
    ["premium", "luxury", "handmade"],
    ["budget", "value", "sale"],
    ["trending", "new-arrival", "bestseller"]
  ];

  var products = [];
  for (var i = 0; i < count; i++) {
    var catIndex = i % categories.length;
    products.push({
      name: "Product " + i + " " + brands[i % brands.length],
      slug: "product-" + i,
      description: "A detailed description for product number " + i,
      category: categories[catIndex],
      subcategory: categories[catIndex] + "-sub-" + (i % 3),
      brand: brands[i % brands.length],
      price: Math.round((Math.random() * 500 + 5) * 100) / 100,
      rating: Math.round((Math.random() * 4 + 1) * 10) / 10,
      reviewCount: Math.floor(Math.random() * 500),
      tags: tagSets[i % tagSets.length],
      status: i % 20 === 0 ? "draft" : "active",
      createdAt: new Date(Date.now() - Math.random() * 90 * 24 * 60 * 60 * 1000)
    });
  }

  Product.insertMany(products, function(err) {
    if (err) return callback(err);
    console.log("Inserted " + count + " products");
    callback(null);
  });
}

// --- Query Functions ---

// Browse by category with price sort and pagination
function browseCategory(category, page, limit, sortField, callback) {
  var skip = (page - 1) * limit;
  var sort = {};
  sort[sortField || "price"] = 1;

  var query = Product.find({ category: category, status: "active" })
    .select("name price rating slug -_id")
    .sort(sort)
    .skip(skip)
    .limit(limit);

  query.exec(function(err, products) {
    if (err) return callback(err);
    callback(null, products);
  });
}

// Search products by text
function searchProducts(searchTerm, callback) {
  Product.find(
    { $text: { $search: searchTerm } },
    { score: { $meta: "textScore" } }
  )
    .sort({ score: { $meta: "textScore" } })
    .limit(20)
    .exec(function(err, products) {
      if (err) return callback(err);
      callback(null, products);
    });
}

// Filter by category, price range, and minimum rating
function filterProducts(filters, callback) {
  var query = { status: "active" };

  if (filters.category) query.category = filters.category;
  if (filters.minPrice || filters.maxPrice) {
    query.price = {};
    if (filters.minPrice) query.price.$gte = filters.minPrice;
    if (filters.maxPrice) query.price.$lte = filters.maxPrice;
  }
  if (filters.minRating) {
    query.rating = { $gte: filters.minRating };
  }

  var sort = {};
  sort[filters.sortBy || "price"] = filters.sortOrder || 1;

  Product.find(query)
    .sort(sort)
    .skip(filters.skip || 0)
    .limit(filters.limit || 20)
    .exec(function(err, products) {
      if (err) return callback(err);
      callback(null, products);
    });
}

// --- Explain Analysis ---

function explainQuery(label, query, callback) {
  console.log("\n=== " + label + " ===");

  query.explain("executionStats").then(function(explanation) {
    var stats = explanation[0].executionStats;
    var plan = explanation[0].queryPlanner.winningPlan;

    // Walk the plan to find the input stage
    var stage = plan.inputStage || plan;
    while (stage.inputStage) {
      stage = stage.inputStage;
    }

    console.log("Scan type:          ", stage.stage);
    console.log("Index used:         ", stage.indexName || "NONE");
    console.log("Docs examined:      ", stats.totalDocsExamined);
    console.log("Keys examined:      ", stats.totalKeysExamined);
    console.log("Docs returned:      ", stats.nReturned);
    console.log("Execution time (ms):", stats.executionTimeMillis);

    var efficiency = stats.totalDocsExamined > 0
      ? (stats.nReturned / stats.totalDocsExamined).toFixed(2)
      : "N/A";
    console.log("Scan efficiency:    ", efficiency);

    callback(null, stats);
  }).catch(function(err) {
    callback(err);
  });
}

// --- Run Demo ---

function runDemo() {
  connectDB(function(err) {
    if (err) {
      console.error("Connection failed:", err);
      process.exit(1);
    }

    // Drop existing data
    Product.deleteMany({}, function() {

      // Seed data first WITHOUT indexes to show COLLSCAN
      seedProducts(5000, function(err) {
        if (err) {
          console.error("Seed failed:", err);
          process.exit(1);
        }

        console.log("\n--- BEFORE INDEXING (expect COLLSCAN) ---");

        var beforeQuery = Product.find({
          category: "electronics",
          status: "active",
          rating: { $gte: 4 }
        }).sort({ price: 1 }).limit(10);

        explainQuery("Category browse WITHOUT index", beforeQuery, function(err) {
          if (err) console.error(err);

          // Now build indexes
          buildIndexes(function(err) {
            if (err) {
              console.error("Index build failed:", err);
              process.exit(1);
            }

            console.log("\n--- AFTER INDEXING (expect IXSCAN) ---");

            var afterQuery = Product.find({
              category: "electronics",
              status: "active",
              rating: { $gte: 4 }
            }).sort({ price: 1 }).limit(10);

            explainQuery("Category browse WITH index", afterQuery, function(err) {
              if (err) console.error(err);

              // Covering query example
              var coverQuery = Product.find(
                { category: "electronics", status: "active" },
                { price: 1, rating: 1, _id: 0 }
              ).sort({ price: 1 }).limit(10);

              explainQuery("Covering query", coverQuery, function(err) {
                if (err) console.error(err);

                // Check index sizes
                var db = mongoose.connection.db;
                db.collection("products").stats(function(err, stats) {
                  if (err) console.error(err);
                  else {
                    console.log("\n=== Index Sizes ===");
                    var sizes = stats.indexSizes;
                    Object.keys(sizes).forEach(function(name) {
                      var sizeMB = (sizes[name] / 1024 / 1024).toFixed(2);
                      console.log("  " + name + ": " + sizeMB + " MB");
                    });
                    console.log("  Total: " +
                      (stats.totalIndexSize / 1024 / 1024).toFixed(2) + " MB");
                  }

                  mongoose.connection.close(function() {
                    console.log("\nDone.");
                  });
                });
              });
            });
          });
        });
      });
    });
  });
}

runDemo();

Sample output (5,000 products):

--- BEFORE INDEXING (expect COLLSCAN) ---

=== Category browse WITHOUT index ===
Scan type:           COLLSCAN
Index used:          NONE
Docs examined:       5000
Keys examined:       0
Docs returned:       10
Execution time (ms): 12
Scan efficiency:     0.00

Building indexes...
Indexes built in 87ms

--- AFTER INDEXING (expect IXSCAN) ---

=== Category browse WITH index ===
Scan type:           IXSCAN
Index used:          category_1_price_1_rating_1
Docs examined:       10
Keys examined:       14
Docs returned:       10
Execution time (ms): 0
Scan efficiency:     1.00

=== Covering query ===
Scan type:           IXSCAN
Index used:          category_1_price_1_rating_1
Docs examined:       0
Keys examined:       10
Docs returned:       10
Execution time (ms): 0
Scan efficiency:     N/A

Notice the dramatic difference. Without indexes, MongoDB examined all 5,000 documents to return 10 results. With the index, it examined 14 index keys and only 10 documents. The covering query examined zero documents because all the data came from the index itself.

Common Issues and Troubleshooting

1. Index not being used despite existing. The most common cause is that your query does not match the index prefix. If your compound index is { category: 1, brand: 1, price: 1 } but your query only filters on brand, the index is skipped. Always check with explain(). Also verify that partial index filter expressions match your query conditions.

2. Slow writes after adding too many indexes. Every insert or update must modify all indexes on the collection. If you have 15 indexes on a collection, each write does 15 index operations. Audit your indexes with $indexStats and drop any that show zero usage. I recommend keeping no more than 8-10 indexes per collection.

3. Memory pressure from oversized indexes. Run db.collection.totalIndexSize() regularly. If your indexes exceed available RAM, performance will degrade as MongoDB pages index data to disk. Partial indexes, removing unused indexes, and avoiding indexing large string fields can help reduce size.

4. Unique index errors on null values. When multiple documents lack the unique-indexed field, MongoDB treats them all as having a null value, which violates the unique constraint. Use a partial index with { partialFilterExpression: { field: { $exists: true } } } to only enforce uniqueness on documents where the field exists.

5. Regex queries not using indexes efficiently. Only anchored regex patterns (starting with ^) can use an index. The query { name: /^Wireless/ } uses the index, but { name: /wireless/i } does a full scan. For case-insensitive search, use a text index or store a normalized lowercase version of the field with its own index.
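
One way to implement the normalized-field approach from issue 5 (a sketch — nameLower is a field added just for this purpose):

// Keep a lowercase copy of name alongside the original, maintained on save
productSchema.add({ nameLower: String });
productSchema.index({ nameLower: 1 });

productSchema.pre("save", function(next) {
  this.nameLower = this.name.toLowerCase();
  next();
});

// Anchored regex against the normalized field can use its index
Product.find({ nameLower: /^wireless/ }).exec(function(err, products) {
  // case-insensitive prefix match served by the nameLower index
});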

Best Practices

  1. Design indexes for your queries, not your schema. List your most common queries first, then design indexes to support them. Do not blindly index every field.

  2. Follow the ESR rule for compound indexes. Equality fields first, then sort fields, then range fields. This single rule will save you more performance headaches than anything else.

  3. Use partial indexes whenever possible. If you only query active records, add { partialFilterExpression: { status: "active" } }. Smaller indexes are faster indexes.

  4. Measure with explain(), not intuition. Always verify your assumptions. The query planner sometimes does unexpected things, and explain output is the only way to know for sure.

  5. Disable autoIndex in production Mongoose apps. Set productSchema.set("autoIndex", false) and build indexes as part of your deployment pipeline, not at application startup. Auto-indexing can cause surprising downtime on large collections.

  6. Monitor and prune unused indexes quarterly. Use $indexStats aggregation to find indexes with zero operations. Every unused index wastes memory and slows down writes for no benefit.

  7. Prefer compound indexes over index intersection. A single compound index almost always outperforms MongoDB's ability to combine multiple single-field indexes. Design your compound indexes using the prefix rule to cover multiple query patterns.

  8. Use covered queries for high-throughput read paths. If a query runs thousands of times per second, design an index that covers all projected fields so MongoDB never touches the document data.
