MongoDB Replication for High Availability

A practical guide to MongoDB replication covering replica set architecture, failover, read/write concerns, monitoring lag, and Node.js application configuration.

Running a single MongoDB instance in production is asking for trouble. One disk failure, one kernel panic, one botched upgrade, and your entire application is down with potential data loss. MongoDB replication solves this by maintaining identical copies of your data across multiple servers, providing automatic failover and the ability to survive hardware failures without downtime. If you are running MongoDB in production and you do not have replication configured, stop what you are doing and fix that today.

Prerequisites

  • MongoDB 6.0 or later installed (we will use Docker for local testing)
  • Node.js v18+ with npm
  • Basic familiarity with MongoDB operations (CRUD, connection strings)
  • Docker and Docker Compose installed for the local replica set
  • Mongoose 6.x (the examples use its callback API) or the native MongoDB Node.js driver

Why Replication Matters

There are three reasons you replicate a MongoDB deployment, and all three matter in production.

Data redundancy. Every write that hits your primary node is replicated to secondary nodes. If the primary's disk dies, your data still exists on the secondaries. Without replication, a hardware failure means you are restoring from your last backup — assuming you have one — and losing every write since that backup was taken.

High availability. When a primary node goes down, the remaining members of the replica set hold an election and promote a new primary. This happens automatically, typically within 10-12 seconds. Your application reconnects and continues operating. No pager alerts at 3 AM, no manual intervention.

Read scaling. You can direct read operations to secondary nodes, spreading the load across multiple servers. This is not a silver bullet — there are consistency tradeoffs — but for read-heavy workloads where slightly stale data is acceptable, it is a significant performance lever.

Replica Set Architecture

A MongoDB replica set is a group of mongod processes that maintain the same data set. The architecture has three member roles.

Primary. The single node that accepts all write operations. There is exactly one primary at any time. All writes go to the primary's oplog, which secondaries tail to replicate changes.

Secondary. Nodes that replicate data from the primary by continuously reading the primary's oplog and applying operations locally. Secondaries can serve read operations if your read preference allows it. You typically run two or more secondaries.

Arbiter. A lightweight member that participates in elections but holds no data. Arbiters exist solely to break ties in elections when you have an even number of data-bearing members. I generally avoid arbiters in production — they save a bit of hardware cost but introduce risk. In a two-data-node-plus-arbiter setup, losing either data-bearing node leaves you with a single copy of your data, and majority write concern can no longer be satisfied because the arbiter cannot acknowledge writes. A three-data-node set is more resilient.

The minimum recommended production topology is three data-bearing members: one primary and two secondaries. This tolerates the loss of any single member while maintaining a majority for elections and write acknowledgment.

The Oplog and How Replication Works

The oplog (operations log) is a special capped collection (local.oplog.rs) that every data-bearing member maintains, recording each write operation in the order it was applied. Secondaries keep a tailable cursor on their sync source's oplog (usually the primary) and continuously pull new operations, applying them locally in order.

Each oplog entry is idempotent — applying the same entry multiple times produces the same result. This is critical because replication must survive network interruptions and retries.

# Connect to the primary and inspect the oplog
mongosh --eval "db.getSiblingDB('local').oplog.rs.find().sort({ts: -1}).limit(3).pretty()"
{
  "ts": Timestamp(1707840000, 1),
  "t": NumberLong(1),
  "h": NumberLong("5747328747234234"),
  "v": 2,
  "op": "i",
  "ns": "myapp.users",
  "o": {
    "_id": ObjectId("65c4a1b2e3f4a5b6c7d8e9f0"),
    "name": "Shane Larson",
    "email": "[email protected]"
  }
}

The op field tells you the operation type: i for insert, u for update, d for delete, c for commands (like createIndex), and n for no-ops. The ns field is the namespace (database.collection).

The oplog is a capped collection, meaning it has a fixed size and old entries are automatically removed as new ones are added. The oplog window — the time span covered by the oplog — determines how long a secondary can be offline and still catch up. On a busy production system, you want an oplog window of at least 24-72 hours. You can resize it without downtime:

# Check current oplog size and window
mongosh --eval "db.getSiblingDB('local').oplog.rs.stats().maxSize"

# Resize the oplog to 4GB
mongosh --eval "db.adminCommand({replSetResizeOplog: 1, size: 4096})"
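
To see the window itself rather than just the size, rs.printReplicationInfo() reports the time span between the oldest and newest oplog entries:

# Print configured oplog size, used space, and the time range it covers
mongosh --eval "rs.printReplicationInfo()"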

Initiating a Replica Set

To start a replica set, each mongod instance must be launched with the --replSet flag specifying the same set name. Then you initiate the set from one member.

# Start three mongod instances with the same replica set name
mongod --replSet myrs --port 27017 --dbpath /data/rs0
mongod --replSet myrs --port 27018 --dbpath /data/rs1
mongod --replSet myrs --port 27019 --dbpath /data/rs2
// Connect to one member and initiate
rs.initiate({
  _id: "myrs",
  members: [
    { _id: 0, host: "localhost:27017" },
    { _id: 1, host: "localhost:27018" },
    { _id: 2, host: "localhost:27019" }
  ]
});

After initiation, the members hold an election and one becomes primary. You can check the status with rs.status().
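
For a quick scripted check of which node is currently primary, db.hello() returns the topology as the connected member sees it:

// Run from any member
var hello = db.hello();
print("set: " + hello.setName);
print("primary: " + hello.primary);
print("connected to the primary: " + hello.isWritablePrimary);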

Automatic Failover and Elections

When a primary becomes unreachable, the remaining members detect this through heartbeats (sent every 2 seconds) and initiate an election after the electionTimeoutMillis period (default: 10 seconds).

Several factors influence which member wins an election:

Priority. Members with higher priority values are preferred as primary. The default priority is 1. Setting a member's priority to 0 prevents it from ever becoming primary — useful for members in a distant data center that you only want for disaster recovery.

Votes. Each member can have 0 or 1 votes. A member must receive a majority of votes from voting members to win. A replica set can have at most 7 voting members. Non-voting members still replicate data but do not participate in elections.

Data freshness. A candidate's oplog must be at least as up to date as a majority of the voting members. A member that has fallen significantly behind cannot win an election, which ensures that acknowledged majority writes survive a failover.

// Configure member priorities
var config = rs.conf();
config.members[0].priority = 10;  // Strongly prefer this node as primary
config.members[1].priority = 5;
config.members[2].priority = 1;   // Least preferred
rs.reconfig(config);
// Prevent a member from becoming primary (DR replica)
var config = rs.conf();
config.members[2].priority = 0;
rs.reconfig(config);

The typical failover time in a well-configured replica set is 10-12 seconds. You can reduce electionTimeoutMillis to speed this up, but setting it too low risks unnecessary elections during brief network hiccups.

Read Preferences

Read preferences control which members of a replica set serve read operations. Choosing the right read preference is a tradeoff between consistency and performance.

Read Preference    | Behavior                                                                 | Use Case
primary            | All reads go to the primary                                              | Default; guarantees the latest data
primaryPreferred   | Reads from the primary; falls back to a secondary if it is unavailable   | Maximize availability while preferring consistency
secondary          | All reads go to secondaries                                              | Offload read traffic from the primary
secondaryPreferred | Reads from secondaries; falls back to the primary if none are available  | Best-effort read distribution
nearest            | Reads from the member with the lowest network latency                    | Geo-distributed deployments

A word of warning: reading from secondaries means you may read stale data. Replication lag, even if small, means a secondary might not have the latest write. For operations where consistency matters — user login, payment processing, inventory checks — stick with primary. For analytics dashboards, reporting queries, or search — secondaryPreferred or nearest works well.
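
If you do read from secondaries, you can bound how stale those reads may be with the maxStalenessSeconds read preference option (the minimum allowed value is 90 seconds). For example, in the connection string used later in this guide:

# Route reads to secondaries, but never to one more than 120 seconds behind the primary
mongodb://mongo1:27017,mongo2:27018,mongo3:27019/myapp?replicaSet=myrs&readPreference=secondaryPreferred&maxStalenessSeconds=120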

Write Concerns

Write concern specifies the level of acknowledgment requested from MongoDB for write operations. Getting this right is the difference between "we think the data was saved" and "the data was saved."

w: 1 — The write was acknowledged by the primary alone. The data exists on one node; if that node dies before replication occurs, the write is lost. (This was the long-time default. Since MongoDB 5.0 the implicit default is w: "majority" for most replica set topologies, though arbiters can reduce it to w: 1 — do not rely on the implicit default for critical data.)

w: "majority" — The write was acknowledged by a majority of data-bearing, voting members. This is what you should use for any data you cannot afford to lose. With a 3-member set, this means the write is on at least 2 nodes before the acknowledgment returns.

w: 0 — Fire and forget. No acknowledgment at all. Only use this for non-critical telemetry data where throughput matters more than durability.

j: true — The write was committed to the journal (on-disk WAL) before acknowledgment. Combine with w: "majority" for maximum durability.

// Write with majority concern and journal acknowledgment
db.orders.insertOne(
  { customerId: "cust_123", total: 99.95, status: "pending" },
  { writeConcern: { w: "majority", j: true, wtimeout: 5000 } }
);

The wtimeout parameter prevents your application from hanging indefinitely if a majority of members are unreachable. Set it to a reasonable value like 5000ms — if the write cannot be majority-acknowledged in that time, something is seriously wrong.
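
Rather than repeating the write concern on every operation, MongoDB 4.4+ lets you set a cluster-wide default. A short mongosh sketch:

// Make majority-acknowledged, journaled writes the default for the deployment
db.adminCommand({
  setDefaultRWConcern: 1,
  defaultWriteConcern: { w: "majority", j: true }
});

// Verify the current defaults
db.adminCommand({ getDefaultRWConcern: 1 });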

Read Concerns

Read concern controls the consistency and isolation properties of the data read from replica set members.

local — Returns the most recent data available on the queried member. No guarantee that the data has been written to a majority of members. This is the default.

available — Similar to local but specifically designed for sharded clusters. On unsharded collections, it behaves identically to local.

majority — Returns only data that has been acknowledged by a majority of members. This guarantees you will not read data that could be rolled back. Use this when you need to be certain the data you read is durable.

linearizable — The strongest guarantee. Reads reflect all successful majority-acknowledged writes that completed before the read began. This involves additional coordination and higher latency. Use sparingly and only when you need it.

For most applications, local reads with majority writes provides a good balance. If you need stronger guarantees — for example, reading a document immediately after writing it — use majority read concern or read from the primary.
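
If you need read-your-own-writes semantics while still allowing secondary reads, a causally consistent session combined with majority read and write concerns provides it. A minimal mongosh sketch, using the same database and collection names as the examples in this guide:

// Start a causally consistent session
var session = db.getMongo().startSession({ causalConsistency: true });
var orders = session.getDatabase("myapp").orders;

// Majority-acknowledged write within the session
orders.insertOne(
  { orderId: "ORD-042", status: "pending" },
  { writeConcern: { w: "majority" } }
);

// This read observes the write above even if it is routed to a secondary
orders.find({ orderId: "ORD-042" })
  .readConcern("majority")
  .readPref("secondaryPreferred")
  .toArray();

session.endSession();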

Replica Set Configuration Options

Beyond priority and votes, several configuration options let you tune replica set behavior.

var config = rs.conf();

// Set an election timeout (milliseconds)
config.settings.electionTimeoutMillis = 10000;

// Set heartbeat interval (default 2 seconds)
config.settings.heartbeatIntervalMillis = 2000;

// Set heartbeat timeout for detecting member failure
config.settings.heartbeatTimeoutSecs = 10;

// Enable chained replication (secondaries can sync from other secondaries)
config.settings.chainingAllowed = true;

rs.reconfig(config);

Chained replication can reduce load on the primary by having some secondaries replicate from other secondaries instead of directly from the primary. It is enabled by default and generally a good idea.
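
You can see which member each secondary is actually syncing from (directly from the primary, or chained through another secondary) via the syncSourceHost field reported by rs.status():

// List each member and its current sync source
rs.status().members.forEach(function(m) {
  print(m.name + " (" + m.stateStr + ") syncs from: " + (m.syncSourceHost || "n/a"));
});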

Monitoring Replication Lag

Replication lag is the time between a write hitting the primary and that write being applied on a secondary. In a healthy replica set, lag should be under 1 second. Sustained lag above a few seconds warrants investigation.

# Check replication status and lag
mongosh --eval "rs.printSecondaryReplicationInfo()"

Output:

source: mongo2:27018
    syncedTo: Thu Feb 13 2026 14:22:35 GMT+0000
    0 secs (0 hrs) behind the primary
source: mongo3:27019
    syncedTo: Thu Feb 13 2026 14:22:34 GMT+0000
    1 secs (0 hrs) behind the primary

You can also monitor programmatically:

var status = rs.status();
var primary = status.members.filter(function(m) { return m.stateStr === "PRIMARY"; })[0];
var secondaries = status.members.filter(function(m) { return m.stateStr === "SECONDARY"; });

secondaries.forEach(function(sec) {
  var lagMs = primary.optimeDate - sec.optimeDate;
  print(sec.name + " lag: " + lagMs + "ms");
});

Common causes of replication lag include: heavy write workloads, slow disks on secondaries, network bandwidth constraints, long-running operations that block the oplog application, and building indexes on secondaries.
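
When lag climbs, check the lagging secondary for long-running operations. A quick mongosh sketch using db.currentOp (the 10-second threshold is arbitrary):

// Run against the lagging member: active operations running longer than 10 seconds
db.currentOp({ active: true, secs_running: { $gt: 10 } }).inprog.forEach(function(op) {
  print(op.opid + "  " + op.op + "  " + op.ns + "  " + op.secs_running + "s");
});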

Adding and Removing Members

Adding a new member to a running replica set requires no downtime.

// Add a new member
rs.add("mongo4:27020");

// Add with specific configuration
rs.add({
  host: "mongo4:27020",
  priority: 5,
  votes: 1
});

// Remove a member
rs.remove("mongo4:27020");

When adding a new member, it performs an initial sync — copying all data from an existing member. For large data sets, this can take hours and generate significant network traffic. Plan accordingly and consider performing the initial sync during off-peak hours.
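
When the data set is large, a common pattern is to add the member with priority 0 and votes 0 so the long initial sync cannot affect elections or the majority calculation, then restore its vote and priority once it has caught up. A sketch (the member index assumes the new member was added last):

// Add the member as a non-voting, non-electable secondary
rs.add({ host: "mongo4:27020", priority: 0, votes: 0 });

// Once it reaches SECONDARY state, give it back a vote and a priority
var config = rs.conf();
var idx = config.members.length - 1;  // assumes the new member is the last entry
config.members[idx].priority = 1;
config.members[idx].votes = 1;
rs.reconfig(config);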

Hidden Members and Delayed Replicas

Hidden members are replica set members that are invisible to client applications. They do not receive read traffic from driver read preference routing. They are useful for dedicated backup nodes or reporting workloads that you connect to directly.

Delayed replicas maintain a copy of the data set that is intentionally behind the primary by a configurable time window. They are your insurance policy against human error — accidental drops, bad migrations, errant bulk deletes.

// Configure a hidden, delayed member (1 hour delay)
var config = rs.conf();
config.members[3].priority = 0;       // Cannot become primary
config.members[3].hidden = true;       // Invisible to drivers
config.members[3].secondaryDelaySecs = 3600;  // 1 hour behind
rs.reconfig(config);

I run a 1-hour delayed replica on every production MongoDB deployment. When someone accidentally drops a collection (it happens more often than you think), that delayed replica gives you a 1-hour window to recover the data without touching backups.
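
Recovering data from the delayed member means connecting to it directly, since it is hidden from driver routing, and dumping the affected collection before the delay window expires. A sketch with illustrative host names and paths:

# Dump the dropped collection straight from the hidden, delayed member
mongodump --host mongo4:27020 --db myapp --collection users --out /backup/recovered

# Restore it into the replica set through the current primary
mongorestore --uri "mongodb://mongo1:27017,mongo2:27018,mongo3:27019/?replicaSet=myrs" /backup/recovered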

Replica Set Tags for Geographic Distribution

Tags let you label members by their physical location, hardware profile, or any other attribute. Combined with tag-aware read preferences, tags enable sophisticated routing of read operations.

var config = rs.conf();
config.members[0].tags = { dc: "us-east", role: "primary" };
config.members[1].tags = { dc: "us-east", role: "analytics" };
config.members[2].tags = { dc: "us-west", role: "dr" };
rs.reconfig(config);
// Read from the analytics member
db.orders.find({ status: "completed" }).readPref("secondary", [{ role: "analytics" }]);

You can also use tags with write concerns to ensure writes are replicated to specific data centers before acknowledgment:

// Custom write concern requiring replication to both data centers
var config = rs.conf();
config.settings.getLastErrorModes = {
  multiDC: { dc: 2 }  // Must replicate to members in 2 distinct "dc" tag values
};
rs.reconfig(config);

// Use the custom write concern
db.orders.insertOne(
  { orderId: "ORD-001" },
  { writeConcern: { w: "multiDC", wtimeout: 10000 } }
);

Complete Working Example: Node.js with Replica Set

Let us build a complete example with a local 3-node replica set using Docker and a Node.js application with proper configuration.

Docker Compose for a Local Replica Set

# docker-compose.yml
version: "3.8"

services:
  mongo1:
    image: mongo:7
    container_name: mongo1
    command: mongod --replSet myrs --port 27017 --bind_ip_all
    ports:
      - "27017:27017"
    volumes:
      - mongo1_data:/data/db

  mongo2:
    image: mongo:7
    container_name: mongo2
    command: mongod --replSet myrs --port 27018 --bind_ip_all
    ports:
      - "27018:27018"
    volumes:
      - mongo2_data:/data/db

  mongo3:
    image: mongo:7
    container_name: mongo3
    command: mongod --replSet myrs --port 27019 --bind_ip_all
    ports:
      - "27019:27019"
    volumes:
      - mongo3_data:/data/db

volumes:
  mongo1_data:
  mongo2_data:
  mongo3_data:
# Start the containers
docker-compose up -d

# Initialize the replica set
docker exec -it mongo1 mongosh --eval '
rs.initiate({
  _id: "myrs",
  members: [
    { _id: 0, host: "mongo1:27017" },
    { _id: 1, host: "mongo2:27018" },
    { _id: 2, host: "mongo3:27019" }
  ]
});
'

# Verify the replica set is healthy
docker exec -it mongo1 mongosh --eval "rs.status().members.map(function(m) { return m.name + ': ' + m.stateStr; })"

Expected output:

[ 'mongo1:27017: PRIMARY', 'mongo2:27018: SECONDARY', 'mongo3:27019: SECONDARY' ]

Node.js Application

// app.js
var mongoose = require("mongoose");

// Connection string lists all replica set members. The driver discovers
// members by the hostnames stored in the replica set config (mongo1, mongo2,
// mongo3), so those names must resolve from the machine running this app.
// For the local Docker setup, map them to 127.0.0.1 in /etc/hosts.
var connectionString = "mongodb://mongo1:27017,mongo2:27018,mongo3:27019/myapp?replicaSet=myrs";

var connectionOptions = {
  // Write concern: majority with journal for durability
  w: "majority",
  journal: true,
  wtimeoutMS: 5000,

  // Read preference: primary for consistency
  readPreference: "primary",

  // Read concern: majority for durable reads
  readConcern: { level: "majority" },

  // Retry writes on transient failures (network blips, failover)
  retryWrites: true,
  retryReads: true,

  // Connection pool settings
  maxPoolSize: 50,
  minPoolSize: 5,

  // Server selection timeout — how long the driver waits
  // to find a suitable server before erroring
  serverSelectionTimeoutMS: 15000,

  // Heartbeat frequency for topology monitoring
  heartbeatFrequencyMS: 10000,

  // Socket timeout
  socketTimeoutMS: 45000
};

// Define a schema
var OrderSchema = new mongoose.Schema({
  orderId: { type: String, required: true, unique: true },
  customerId: String,
  items: [{
    name: String,
    quantity: Number,
    price: Number
  }],
  total: Number,
  status: { type: String, default: "pending" },
  createdAt: { type: Date, default: Date.now }
});

var Order = mongoose.model("Order", OrderSchema);

// Connect with event handlers for failover awareness
function connectToDatabase(callback) {
  mongoose.connect(connectionString, connectionOptions);

  var db = mongoose.connection;

  db.on("error", function(err) {
    console.error("MongoDB connection error:", err.message);
  });

  db.on("connected", function() {
    console.log("Connected to MongoDB replica set");
  });

  db.on("disconnected", function() {
    console.warn("Disconnected from MongoDB — driver will attempt reconnection");
  });

  db.on("reconnected", function() {
    console.log("Reconnected to MongoDB replica set");
  });

  // Topology events for replica set changes
  db.on("close", function() {
    console.warn("MongoDB connection closed");
  });

  db.once("open", function() {
    console.log("MongoDB connection open and ready");
    if (callback) callback();
  });
}

// Create an order with proper error handling for failover
function createOrder(orderData, callback) {
  var order = new Order(orderData);
  order.save(function(err, savedOrder) {
    if (err) {
      // Check for retryable error during failover
      if (err.hasErrorLabel && err.hasErrorLabel("RetryableWriteError")) {
        console.warn("Retryable write error — driver should handle automatically");
      }
      return callback(err);
    }
    callback(null, savedOrder);
  });
}

// Read with explicit read preference for analytics queries
function getOrderAnalytics(callback) {
  Order.find({ status: "completed" })
    .read("secondaryPreferred")  // Offload to secondaries
    .lean()
    .exec(function(err, orders) {
      if (err) return callback(err);

      var totalRevenue = orders.reduce(function(sum, order) {
        return sum + order.total;
      }, 0);

      callback(null, {
        count: orders.length,
        totalRevenue: totalRevenue,
        averageOrder: orders.length > 0 ? totalRevenue / orders.length : 0
      });
    });
}

// Read with majority concern when consistency is critical
function getOrderForPayment(orderId, callback) {
  Order.findOne({ orderId: orderId })
    .read("primary")
    .setOptions({ readConcern: { level: "majority" } })
    .exec(function(err, order) {
      if (err) return callback(err);
      callback(null, order);
    });
}

// Monitor replica set health
function checkReplicaSetHealth(callback) {
  var admin = mongoose.connection.db.admin();
  admin.command({ replSetGetStatus: 1 }, function(err, status) {
    if (err) return callback(err);

    var primary = null;
    var secondaries = [];

    status.members.forEach(function(member) {
      if (member.stateStr === "PRIMARY") {
        primary = member;
      } else if (member.stateStr === "SECONDARY") {
        secondaries.push(member);
      }
    });

    var health = {
      setName: status.set,
      primary: primary ? primary.name : "NONE",
      secondaries: secondaries.map(function(s) {
        var lagMs = primary ? (primary.optimeDate - s.optimeDate) : 0;
        return {
          name: s.name,
          lagMs: lagMs,
          healthy: s.health === 1
        };
      }),
      allHealthy: status.members.every(function(m) { return m.health === 1; })
    };

    callback(null, health);
  });
}

// Main entry point
connectToDatabase(function() {
  // Create a test order
  createOrder({
    orderId: "ORD-" + Date.now(),
    customerId: "CUST-001",
    items: [
      { name: "Widget", quantity: 3, price: 9.99 },
      { name: "Gadget", quantity: 1, price: 24.99 }
    ],
    total: 54.96,
    status: "pending"
  }, function(err, order) {
    if (err) {
      console.error("Failed to create order:", err.message);
      return;
    }
    console.log("Order created:", order.orderId);

    // Check replica set health
    checkReplicaSetHealth(function(err, health) {
      if (err) {
        console.error("Health check failed:", err.message);
        return;
      }
      console.log("Replica set health:", JSON.stringify(health, null, 2));

      // Run analytics query against secondaries
      getOrderAnalytics(function(err, analytics) {
        if (err) {
          console.error("Analytics failed:", err.message);
          return;
        }
        console.log("Order analytics:", analytics);

        mongoose.disconnect();
      });
    });
  });
});
# Install dependencies and run
npm init -y
npm install mongoose@6
node app.js

Expected output:

Connected to MongoDB replica set
MongoDB connection open and ready
Order created: ORD-1707840000000
Replica set health: {
  "setName": "myrs",
  "primary": "mongo1:27017",
  "secondaries": [
    { "name": "mongo2:27018", "lagMs": 0, "healthy": true },
    { "name": "mongo3:27019", "lagMs": 0, "healthy": true }
  ],
  "allHealthy": true
}
Order analytics: { count: 0, totalRevenue: 0, averageOrder: 0 }

Testing Failover

# Stop the primary to trigger an election
docker stop mongo1

# Watch the application logs — you should see:
# "Disconnected from MongoDB — driver will attempt reconnection"
# Then after ~12 seconds:
# "Reconnected to MongoDB replica set"

# Verify a new primary was elected
docker exec -it mongo2 mongosh --port 27018 --eval "rs.status().members.map(function(m) { return m.name + ': ' + m.stateStr; })"

# Restart the original primary (it comes back as secondary)
docker start mongo1

Handling Failover in Application Code

The MongoDB driver handles most failover scenarios automatically when retryWrites and retryReads are enabled. However, there are edge cases your application needs to handle.

Retryable writes cover most single-document operations (insertOne, updateOne, deleteOne, findOneAndUpdate). Multi-document operations like updateMany and deleteMany are not retryable because the driver cannot know how many documents were already modified.
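
For multi-document writes, the application owns the retry logic. Below is a hedged sketch in the same callback style as the example app; it is only safe to retry because the $set update is idempotent, and in production you would narrow the retry condition to transient errors (network failures, "not primary") rather than retrying every error:

// Retry wrapper for a non-retryable multi-document write (illustrative)
function markCustomerOrdersShipped(customerId, callback, attempt) {
  attempt = attempt || 1;
  Order.updateMany(
    { customerId: customerId, status: "pending" },
    { $set: { status: "shipped" } },
    function(err, result) {
      if (err && attempt < 3) {
        // Safe to retry only because the update is idempotent
        return setTimeout(function() {
          markCustomerOrdersShipped(customerId, callback, attempt + 1);
        }, 1000 * attempt);
      }
      if (err) return callback(err);
      callback(null, result);
    }
  );
}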

Buffering is Mongoose's behavior of queuing operations when the connection is lost. By default, Mongoose buffers operations for up to 10 seconds (bufferTimeoutMS), giving the driver time to reconnect after a failover. If the reconnection does not happen within that window, buffered operations throw a MongooseServerSelectionError.

// Configure buffering and server selection for your failover expectations
mongoose.set("bufferTimeoutMS", 15000);  // Buffer operations for 15s during failover
mongoose.connect(connectionString, {
  serverSelectionTimeoutMS: 15000
});

For critical write operations during failover, implement application-level idempotency:

function createOrderIdempotent(orderData, callback) {
  var order = new Order(orderData);
  order.save(function(err, savedOrder) {
    if (err && err.code === 11000) {
      // Duplicate key — the order was already created (retry hit)
      Order.findOne({ orderId: orderData.orderId }, function(findErr, existing) {
        if (findErr) return callback(findErr);
        callback(null, existing);
      });
      return;
    }
    if (err) return callback(err);
    callback(null, savedOrder);
  });
}

Common Issues and Troubleshooting

1. "MongoServerSelectionError: connection timed out" After Failover

MongoServerSelectionError: connection <monitor> to 192.168.1.10:27017 timed out

This happens when your serverSelectionTimeoutMS is shorter than the failover time. The driver cannot find a suitable server before the timeout expires. Increase serverSelectionTimeoutMS to at least 15000ms (15 seconds) to accommodate the default election timeout of 10 seconds plus network overhead.

2. Replica Set Member Stuck in RECOVERING State

"stateStr": "RECOVERING"

A member enters RECOVERING when it cannot keep up with replication — usually because the oplog on its sync source rolled past the point where the member last synced, leaving it too stale to resume. First check whether the oplog window is too small for your write volume and resize it. The member then needs a full resync. The resync command was removed in MongoDB 4.2, so on modern versions the procedure is to stop the member, delete the contents of its data directory, and restart it so it performs a fresh initial sync from another member.
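
On a typical systemd-managed member, the steps look like this (the service name and data path are illustrative; in the Docker setup from this guide you would stop the container and clear its volume instead):

# Run these on the stale member only
systemctl stop mongod
rm -rf /data/rs2/*           # wipe this member's data directory
systemctl start mongod       # on restart it rejoins the set and runs an initial sync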

3. Write Concern Timeout with w:majority

WriteConcernError: waiting for replication timed out. w: majority, wtimeout: 5000

This means the write was applied on the primary but could not be replicated to a majority of members within your wtimeout window. Common causes: a secondary is down or unreachable, replication lag is severe, or network issues between members. Check rs.status() immediately — if only one secondary is reachable in a 3-member set, majority writes will fail because you need 2 out of 3.

4. "NotPrimaryError" During Failover

MongoServerError: not primary

Your application sent a write to a node that is no longer primary. With retryWrites: true, the driver retries automatically on a new primary. If you see this error propagating to your application, verify that retryWrites is enabled in your connection options and that you are using a supported write operation (single-document writes).

5. Reads Returning Stale Data from Secondaries

If you are using secondary or secondaryPreferred read preference and getting stale data, check replication lag with rs.printSecondaryReplicationInfo(). If lag is consistently above acceptable thresholds, investigate the secondary's hardware (CPU, disk I/O) and network bandwidth. Consider switching to majority read concern to avoid reading data that might be rolled back.

Best Practices

  • Always use w: "majority" for critical writes. A w: 1 write only guarantees the data exists on one node; a failover before replication means data loss. Even though MongoDB 5.0+ defaults to w: "majority" in most topologies, state it explicitly. The latency cost of w: "majority" is minimal in a healthy replica set — typically under 10ms additional.

  • Enable retryWrites and retryReads in every connection string. These options handle transient failures during elections and failover automatically. There is no good reason to leave them disabled.

  • Run at least three data-bearing members. Avoid the arbiter-plus-two-data-nodes topology. Three data-bearing members gives you the ability to lose one member while maintaining both majority write concern and a quorum for elections.

  • Size your oplog for at least 72 hours of operations. This gives you a comfortable window for maintenance, initial syncs, and recovering from extended outages. Monitor the oplog window regularly and resize if it shrinks below your threshold.

  • Use a delayed replica as a safety net. A 1-hour delayed, hidden member protects against accidental data destruction. It costs one additional server but can save your entire business when someone runs a bad migration.

  • Never expose MongoDB directly to the internet. Replica set members communicate on internal networks. Use authentication (--keyFile or x.509) between members and enable access control for client connections; a minimal keyFile setup is sketched after this list. MongoDB without authentication is a data breach waiting to happen.

  • Monitor replication lag continuously. Set up alerts for lag exceeding 10 seconds. Sustained lag degrades your read scaling capability and increases the window for data loss during failover. Use MongoDB's free monitoring, Atlas alerts, or your own monitoring stack.

  • Use connection string DNS seedlist format (mongodb+srv://) when possible. This allows the driver to discover all replica set members through DNS SRV records, simplifying configuration and handling topology changes without application redeployment.
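
A minimal keyFile setup for internal authentication between members; the paths are illustrative, and the same key file must be copied to every member:

# Generate a shared key and lock down its permissions
openssl rand -base64 756 > /etc/mongo-keyfile
chmod 400 /etc/mongo-keyfile

# Start every member with the key file; this also enforces client access control
mongod --replSet myrs --keyFile /etc/mongo-keyfile --port 27017 --dbpath /data/rs0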
