MongoDB Schema Validation and Data Integrity
A practical guide to MongoDB schema validation covering native JSON Schema validators, Mongoose validation, custom validators, and data integrity strategies for Node.js.
MongoDB's reputation as a "schemaless" database is one of the most dangerous half-truths in our industry. Yes, MongoDB does not enforce a fixed schema by default. But "schemaless" does not mean "no schema." Every application has a schema — the question is whether you enforce it in your application code, in the database, or not at all. If you choose "not at all," you will pay for it. I have inherited production databases where the same collection had documents with email, Email, emailAddress, and user_email fields. That is the cost of ignoring schema validation.
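In this article, I will walk through both native MongoDB JSON Schema validation and Mongoose-level validation, show you when to use each, and build a complete working example that enforces real data integrity. The code samples use callback-style APIs, so they assume version 4.x or earlier of the MongoDB Node.js driver and Mongoose 6.x or earlier; later releases of both dropped callbacks in favor of promises.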
Schema-Free vs Schema-on-Read vs Schema-on-Write
Before touching any code, let's clarify the three approaches:
Schema-free means you store whatever you want and hope for the best. This is the default MongoDB behavior. It is fine for prototyping and terrible for production.
Schema-on-read means your data has no enforced structure at write time, but your application imposes structure when it reads documents. This is fragile because it pushes validation logic into every read path and trusts every writer to get it right.
Schema-on-write means you validate documents before they are persisted. This is what we want. MongoDB supports this natively through JSON Schema validators, and Mongoose adds another validation layer on top. In a well-designed system, you use both.
Native MongoDB JSON Schema Validation
Starting with MongoDB 3.6, you can attach a JSON Schema validator to any collection using the $jsonSchema operator. This runs inside the database engine itself — no application code required.
Creating a Collection with Validation
var MongoClient = require("mongodb").MongoClient;
var uri = "mongodb://localhost:27017";
var dbName = "myapp";
function createUsersCollection(callback) {
MongoClient.connect(uri, function(err, client) {
if (err) return callback(err);
var db = client.db(dbName);
db.createCollection("users", {
validator: {
$jsonSchema: {
bsonType: "object",
required: ["email", "password", "role", "createdAt"],
properties: {
// _id must be listed because additionalProperties is false
_id: { bsonType: "objectId" },
email: {
bsonType: "string",
pattern: "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$",
description: "Must be a valid email address"
},
password: {
bsonType: "string",
minLength: 8,
description: "Must be at least 8 characters"
},
name: {
bsonType: "object",
required: ["first", "last"],
properties: {
first: { bsonType: "string", minLength: 1 },
last: { bsonType: "string", minLength: 1 }
}
},
age: {
bsonType: "int",
minimum: 13,
maximum: 150,
description: "Must be between 13 and 150"
},
role: {
bsonType: "string",
enum: ["user", "editor", "admin", "superadmin"],
description: "Must be a valid role"
},
address: {
bsonType: "object",
properties: {
street: { bsonType: "string" },
city: { bsonType: "string" },
state: { bsonType: "string", minLength: 2, maxLength: 2 },
zip: { bsonType: "string", pattern: "^[0-9]{5}(-[0-9]{4})?$" }
}
},
tags: {
bsonType: "array",
items: { bsonType: "string" },
maxItems: 10
},
createdAt: { bsonType: "date" },
updatedAt: { bsonType: "date" }
},
additionalProperties: false
}
},
validationLevel: "strict",
validationAction: "error"
}, function(err, result) {
client.close();
callback(err, result);
});
});
}
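A few things to notice here. The bsonType keyword uses BSON type names, not JSON Schema ones, so you write "int" instead of "integer" and "bool" instead of "boolean". The additionalProperties: false setting is aggressive but effective: it rejects any field not listed in properties. That includes _id, which every stored document carries, so properties has to list _id or every insert fails. If Mongoose also writes to the collection, its __v version key and any defaulted fields need entries too. I use this setting on collections where data discipline matters.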
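To preview what additionalProperties: false would reject before turning it on, a small check like the one below can help. This is a minimal sketch: findUnexpectedFields is a hypothetical helper, it only looks at top-level fields, and the sample document is made up to resemble something written through Mongoose.

```javascript
// Hypothetical helper: lists top-level fields that a schema with
// additionalProperties: false would reject. Nested objects are not inspected.
function findUnexpectedFields(doc, jsonSchema) {
  var allowed = Object.keys(jsonSchema.properties || {});
  return Object.keys(doc).filter(function(field) {
    return allowed.indexOf(field) === -1;
  });
}

var schema = {
  bsonType: "object",
  properties: {
    _id: { bsonType: "objectId" },
    email: { bsonType: "string" },
    role: { bsonType: "string" }
  },
  additionalProperties: false
};

// Made-up document shaped like one saved through Mongoose (note __v).
var doc = {
  _id: "placeholder-id",
  email: "a@example.com",
  role: "user",
  __v: 0,
  isActive: true
};

console.log(findUnexpectedFields(doc, schema)); // [ '__v', 'isActive' ]
```

Running this against a sample of real documents is a cheap way to find fields the schema forgot.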
Validation Levels: Strict vs Moderate
The validationLevel option controls which documents the validator applies to:
strict (default): Validates all inserts and updates. Every document must pass.
moderate: Validates inserts, and updates to documents that already satisfy the validator. If a document was inserted before validation was added and does not conform, updates to that document skip validation.
moderate is your friend when adding validation to an existing collection with messy data. It lets you enforce rules going forward without blocking updates to legacy documents.
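The rules for the two levels can be written down as a plain decision function. This is a toy model of the documented behavior, not server code; isValidated and its parameters are names I made up for illustration.

```javascript
// Toy model of the validationLevel rules; not MongoDB server code.
// level: "off" | "moderate" | "strict"; operation: "insert" | "update".
function isValidated(level, operation, existingDocIsValid) {
  if (level === "off") return false;
  if (level === "strict") return true;
  // moderate: inserts are always checked; updates only when the stored
  // document already satisfies the validator.
  if (operation === "insert") return true;
  return existingDocIsValid === true;
}

console.log(isValidated("moderate", "update", false)); // false: legacy document
console.log(isValidated("moderate", "insert", false)); // true
console.log(isValidated("strict", "update", false));   // true
```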
Validation Actions: Error vs Warn
The validationAction option controls what happens when validation fails:
error (default): The write operation is rejected.
warn: The write succeeds but a warning is logged to the MongoDB log.
Use warn during a migration period when you want visibility into violations without breaking production. Switch to error once you have cleaned up the data.
Modifying Validators on Existing Collections
You can add or change validators on an existing collection using the collMod command:
function addValidation(db, callback) {
db.command({
collMod: "users",
validator: {
$jsonSchema: {
bsonType: "object",
required: ["email", "role"],
properties: {
email: {
bsonType: "string",
pattern: "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$"
},
role: {
bsonType: "string",
enum: ["user", "editor", "admin", "superadmin"]
}
}
}
},
validationLevel: "moderate",
validationAction: "warn"
}, callback);
}
This is the pattern I use when retrofitting validation onto production collections. Start with moderate and warn, review the logs, fix the data, then switch to strict and error.
Mongoose Schema Validation
If you are using Mongoose (and most Node.js applications are), you get a second layer of validation that runs in your application process before the document even reaches MongoDB.
Defining a Mongoose Schema with Validators
var mongoose = require("mongoose");
var Schema = mongoose.Schema;
var addressSchema = new Schema({
street: { type: String, trim: true },
city: { type: String, trim: true },
state: {
type: String,
uppercase: true,
minlength: [2, "State code must be exactly 2 characters"],
maxlength: [2, "State code must be exactly 2 characters"]
},
zip: {
type: String,
match: [/^[0-9]{5}(-[0-9]{4})?$/, "Invalid ZIP code format"]
}
}, { _id: false });
var userSchema = new Schema({
email: {
type: String,
required: [true, "Email is required"],
unique: true,
lowercase: true,
trim: true,
match: [
/^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/,
"Please enter a valid email address"
]
},
password: {
type: String,
required: [true, "Password is required"],
minlength: [8, "Password must be at least 8 characters"],
validate: {
validator: function(v) {
return /[A-Z]/.test(v) && /[a-z]/.test(v) && /[0-9]/.test(v);
},
message: "Password must contain uppercase, lowercase, and a number"
}
},
name: {
first: {
type: String,
required: [true, "First name is required"],
trim: true,
minlength: 1
},
last: {
type: String,
required: [true, "Last name is required"],
trim: true,
minlength: 1
}
},
age: {
type: Number,
min: [13, "Must be at least 13 years old"],
max: [150, "Age cannot exceed 150"]
},
role: {
type: String,
enum: {
values: ["user", "editor", "admin", "superadmin"],
message: "{VALUE} is not a valid role"
},
default: "user"
},
address: addressSchema,
tags: {
type: [String],
validate: {
validator: function(v) {
return v.length <= 10;
},
message: "Cannot have more than 10 tags"
}
},
isActive: { type: Boolean, default: true },
createdAt: { type: Date, default: Date.now },
updatedAt: { type: Date, default: Date.now }
});
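Notice that Mongoose validators give you custom error messages per field. This is a significant advantage over native MongoDB validation, which reports a generic "Document failed validation" error. Before MongoDB 5.0 that error gave no indication of which field caused the problem; from 5.0 on, the failing rules are listed in a nested errInfo document, but you still have to dig them out yourself.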
Async Validators
Some validation requires external calls — checking for duplicate emails, verifying data against another service, or querying related documents. Mongoose handles this with async validators:
userSchema.path("email").validate({
validator: function(value) {
var User = mongoose.model("User");
return User.countDocuments({ email: value, _id: { $ne: this._id } })
.then(function(count) {
return count === 0;
});
},
message: "This email address is already registered"
});
A word of caution: async validators introduce a race condition. Two requests can validate simultaneously, both pass the uniqueness check, and both insert. Always back up async uniqueness validators with a unique index in MongoDB. The validator gives you a friendly error message; the index gives you actual safety.
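To make the race concrete, here is a toy in-memory simulation. It is not MongoDB code: createStore and simulateRace are hypothetical stand-ins, and the interleaving (both checks run before either insert) is forced by hand.

```javascript
// Toy in-memory stand-in for a collection; not MongoDB code.
function createStore(hasUniqueIndex) {
  var docs = [];
  return {
    countByEmail: function(email) {
      return docs.filter(function(d) { return d.email === email; }).length;
    },
    insert: function(doc) {
      // Mimics a unique index rejecting the second write.
      if (hasUniqueIndex && this.countByEmail(doc.email) > 0) {
        var err = new Error("E11000 duplicate key error");
        err.code = 11000;
        throw err;
      }
      docs.push(doc);
    },
    size: function() { return docs.length; }
  };
}

// Two requests interleave: both run the uniqueness check before either inserts.
function simulateRace(store) {
  var email = "a@example.com";
  var passedA = store.countByEmail(email) === 0; // request A validates
  var passedB = store.countByEmail(email) === 0; // request B validates
  var rejected = 0;
  [passedA, passedB].forEach(function(passed) {
    if (!passed) return;
    try {
      store.insert({ email: email });
    } catch (e) {
      if (e.code !== 11000) throw e;
      rejected++;
    }
  });
  return { stored: store.size(), rejected: rejected };
}

console.log(simulateRace(createStore(false))); // { stored: 2, rejected: 0 }
console.log(simulateRace(createStore(true)));  // { stored: 1, rejected: 1 }
```

With the validator alone, both requests pass the check and the duplicate is stored. Only the index stops the second write.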
Conditional Validation
Sometimes a field is required only under certain conditions. Mongoose supports this through function-based required:
// Attach the rule to address itself so that `this` is the user document
userSchema.path("address").required(function() {
return this.role === "admin" || this.role === "superadmin";
}, "Address is required for admin users");
This enforces that admin users must have an address while regular users do not. Conditional validation like this is difficult to express in native MongoDB JSON Schema, which is another reason to layer Mongoose validation on top.
Pre-Save Middleware for Complex Validation
For validation logic that spans multiple fields, use pre-save middleware:
userSchema.pre("save", function(next) {
if (this.role === "superadmin" && this.age < 21) {
return next(new Error("Superadmin users must be at least 21 years old"));
}
if (this.isModified("password") && this.password === this.email) {
return next(new Error("Password cannot be the same as email"));
}
this.updatedAt = new Date();
next();
});
Pre-save middleware runs after schema-level validators, giving you a place for cross-field checks that individual validators cannot express.
Error Handling and Custom Error Messages
Mongoose validation errors have a structured format you should take advantage of:
var User = mongoose.model("User", userSchema);
function createUser(userData, callback) {
var user = new User(userData);
user.save(function(err, savedUser) {
if (err) {
if (err.name === "ValidationError") {
var messages = {};
Object.keys(err.errors).forEach(function(field) {
messages[field] = err.errors[field].message;
});
return callback({
status: 400,
error: "Validation failed",
details: messages
});
}
if (err.code === 11000) {
return callback({
status: 409,
error: "Duplicate entry",
details: { email: "This email is already registered" }
});
}
return callback({ status: 500, error: "Internal server error" });
}
callback(null, savedUser);
});
}
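Native MongoDB validation errors (when using the driver directly) are less descriptive. You get an error with code: 121 and a generic message; on MongoDB 5.0 and later the error also carries an errInfo document naming the rules that failed, but nothing as convenient as per-field messages. This is one of the strongest arguments for using Mongoose validation as your primary layer and native validation as a safety net.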
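On MongoDB 5.0 and later, a code 121 error includes an errInfo.details document, and field names can be pulled out of it. The sketch below assumes the layout shown in the MongoDB documentation (schemaRulesNotSatisfied entries with propertiesNotSatisfied or missingProperties). failedFields is a hypothetical helper, and the sample error is hand-written, not real server output.

```javascript
// Hypothetical helper: pulls field names out of a code-121 error.
// Assumes the MongoDB 5.0+ errInfo.details layout; returns [] when absent.
function failedFields(err) {
  if (!err || err.code !== 121) return [];
  var details = err.errInfo && err.errInfo.details;
  var rules = (details && details.schemaRulesNotSatisfied) || [];
  var fields = [];
  rules.forEach(function(rule) {
    // Fields present but violating a rule (type, pattern, enum, ...).
    (rule.propertiesNotSatisfied || []).forEach(function(prop) {
      fields.push(prop.propertyName);
    });
    // Fields listed under "required" that are missing entirely.
    (rule.missingProperties || []).forEach(function(name) {
      fields.push(name);
    });
  });
  return fields;
}

// Hand-written sample in the assumed shape, not real server output.
var sampleError = {
  code: 121,
  errInfo: {
    details: {
      operatorName: "$jsonSchema",
      schemaRulesNotSatisfied: [
        {
          operatorName: "properties",
          propertiesNotSatisfied: [{ propertyName: "role", details: [] }]
        },
        { operatorName: "required", missingProperties: ["email"] }
      ]
    }
  }
};

console.log(failedFields(sampleError)); // [ 'role', 'email' ]
```

On older servers the helper simply returns an empty array, so it is safe to call from a shared error handler.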
Unique Constraints and Compound Indexes
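The unique option in Mongoose creates a MongoDB unique index, but it is not a validator: it is enforced at the database level, so a violation surfaces as a duplicate key error (code 11000) rather than a ValidationError. This distinction matters. The three definitions below are alternatives, not a set: the first and third both index email alone and would collide under the default index name, and organizationId stands for a field your schema would need to define.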
// Single field unique index
userSchema.index({ email: 1 }, { unique: true });
// Compound unique index - one account per organization
userSchema.index({ email: 1, organizationId: 1 }, { unique: true });
// Partial unique index - unique email only among active users
userSchema.index(
{ email: 1 },
{ unique: true, partialFilterExpression: { isActive: true } }
);
Partial unique indexes are powerful. They let you soft-delete users (set isActive: false) without blocking new users from registering with the same email. This is a common requirement that most developers solve with awkward workarounds because they do not know partial indexes exist.
Complete Working Example
Here is a full user management module that combines native MongoDB validation with Mongoose validation:
var mongoose = require("mongoose");
var Schema = mongoose.Schema;
// --- Mongoose Schema Definition ---
var addressSchema = new Schema({
street: { type: String, trim: true },
city: { type: String, trim: true },
state: {
type: String,
uppercase: true,
minlength: [2, "State must be 2 characters"],
maxlength: [2, "State must be 2 characters"]
},
zip: {
type: String,
match: [/^[0-9]{5}(-[0-9]{4})?$/, "Invalid ZIP code"]
}
}, { _id: false });
var userSchema = new Schema({
email: {
type: String,
required: [true, "Email is required"],
lowercase: true,
trim: true,
match: [/^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/, "Invalid email"]
},
password: {
type: String,
required: [true, "Password is required"],
minlength: [8, "Password must be at least 8 characters"],
validate: {
validator: function(v) {
return /[A-Z]/.test(v) && /[a-z]/.test(v) && /[0-9]/.test(v);
},
message: "Password needs uppercase, lowercase, and a number"
}
},
name: {
first: { type: String, required: true, trim: true },
last: { type: String, required: true, trim: true }
},
age: {
type: Number,
min: [13, "Must be at least 13"],
max: [150, "Invalid age"]
},
role: {
type: String,
enum: ["user", "editor", "admin", "superadmin"],
default: "user"
},
address: addressSchema,
tags: {
type: [String],
validate: [function(v) { return v.length <= 10; }, "Max 10 tags"]
},
isActive: { type: Boolean, default: true },
createdAt: { type: Date, default: Date.now },
updatedAt: { type: Date, default: Date.now }
});
// Indexes
userSchema.index({ email: 1 }, { unique: true });
userSchema.index({ role: 1 });
userSchema.index({ createdAt: -1 });
// Async email uniqueness validator
userSchema.path("email").validate({
validator: function(value) {
var self = this;
return mongoose.model("User").countDocuments({
email: value,
_id: { $ne: self._id }
}).then(function(count) {
return count === 0;
});
},
message: "Email already registered"
});
// Pre-save middleware
userSchema.pre("save", function(next) {
this.updatedAt = new Date();
next();
});
var User = mongoose.model("User", userSchema);
// --- API Layer ---
function formatValidationError(err) {
if (err.name === "ValidationError") {
var details = {};
Object.keys(err.errors).forEach(function(key) {
details[key] = err.errors[key].message;
});
return { status: 400, error: "Validation failed", details: details };
}
if (err.code === 11000) {
return { status: 409, error: "Duplicate", details: { email: "Already exists" } };
}
return { status: 500, error: "Server error" };
}
function registerUser(data, callback) {
var user = new User(data);
user.save(function(err, doc) {
if (err) return callback(formatValidationError(err));
callback(null, {
id: doc._id,
email: doc.email,
name: doc.name,
role: doc.role
});
});
}
function updateUser(userId, updates, callback) {
User.findById(userId, function(err, user) {
if (err) return callback({ status: 500, error: "Server error" });
if (!user) return callback({ status: 404, error: "User not found" });
Object.keys(updates).forEach(function(key) {
user[key] = updates[key];
});
user.save(function(saveErr, doc) {
if (saveErr) return callback(formatValidationError(saveErr));
callback(null, doc);
});
});
}
// --- Native Validator Setup (run once during initialization; collMod fails if the users collection does not exist yet) ---
function ensureNativeValidation(callback) {
var db = mongoose.connection.db;
db.command({
collMod: "users",
validator: {
$jsonSchema: {
bsonType: "object",
required: ["email", "password", "role"],
properties: {
email: { bsonType: "string" },
password: { bsonType: "string", minLength: 8 },
role: {
bsonType: "string",
enum: ["user", "editor", "admin", "superadmin"]
}
}
}
},
validationLevel: "strict",
validationAction: "error"
}, function(err) {
if (err) console.error("Failed to set native validator:", err.message);
else console.log("Native MongoDB validation active");
callback(err);
});
}
// --- Usage ---
mongoose.connect("mongodb://localhost:27017/myapp", function(err) {
if (err) throw err;
ensureNativeValidation(function() {
registerUser({
email: "user@example.com",
password: "SecurePass1",
name: { first: "Shane", last: "Larson" },
age: 35,
role: "admin",
address: { city: "Berkeley", state: "CA", zip: "94704" },
tags: ["nodejs", "mongodb"]
}, function(err, user) {
if (err) {
console.error("Registration failed:", JSON.stringify(err, null, 2));
} else {
console.log("User created:", user);
}
mongoose.disconnect();
});
});
});
module.exports = {
User: User,
registerUser: registerUser,
updateUser: updateUser,
ensureNativeValidation: ensureNativeValidation
};
Testing Validation Rules
Validation logic should be tested like any other business logic. Here is a pattern using a simple test harness:
var mongoose = require("mongoose");
var assert = require("assert");
var User = require("./userModel").User;
function runTests(callback) {
var tests = [
{ name: "reject missing email", data: { password: "Test1234", name: { first: "A", last: "B" } }, shouldFail: true },
{ name: "reject short password", data: { email: "test@example.com", password: "short", name: { first: "A", last: "B" } }, shouldFail: true },
{ name: "reject weak password", data: { email: "test@example.com", password: "alllowercase1", name: { first: "A", last: "B" } }, shouldFail: true },
{ name: "reject invalid role", data: { email: "test@example.com", password: "Test1234", role: "hacker", name: { first: "A", last: "B" } }, shouldFail: true },
{ name: "reject bad email", data: { email: "notanemail", password: "Test1234", name: { first: "A", last: "B" } }, shouldFail: true },
{ name: "reject underage", data: { email: "test@example.com", password: "Test1234", age: 10, name: { first: "A", last: "B" } }, shouldFail: true },
{ name: "accept valid user", data: { email: "test@example.com", password: "Valid123", name: { first: "Test", last: "User" }, age: 25 }, shouldFail: false }
];
var passed = 0;
var failed = 0;
function runNext(index) {
if (index >= tests.length) {
console.log("\nResults: " + passed + " passed, " + failed + " failed");
return callback();
}
var test = tests[index];
var user = new User(test.data);
user.validate(function(err) {
var didFail = !!err;
if (didFail === test.shouldFail) {
console.log("PASS: " + test.name);
passed++;
} else {
console.log("FAIL: " + test.name + " (expected " +
(test.shouldFail ? "failure" : "success") + ")");
failed++;
}
runNext(index + 1);
});
}
runNext(0);
}
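Notice we use user.validate() instead of user.save(). This lets you test validation rules without writing to the database: fast, isolated, and deterministic. One caveat: async validators that query the database, such as the email uniqueness check in the complete example, still run during validate(), so tests against that model need a connection or a stubbed validator.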
Migration Strategies for Existing Data
Adding validation to an existing collection is a three-phase process:
Phase 1: Audit. Find documents that violate your planned schema.
function auditCollection(db, callback) {
db.collection("users").find({
$or: [
{ email: { $exists: false } },
{ email: { $not: { $type: "string" } } },
{ role: { $nin: ["user", "editor", "admin", "superadmin"] } },
{ password: { $exists: false } }
]
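}).toArray(function(err, violations) {
// violations is undefined when the query fails, so check err first
if (err) return callback(err);
console.log("Found " + violations.length + " non-conforming documents");
callback(null, violations);
});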
}
Phase 2: Fix. Update or remove non-conforming documents.
function fixMissingRoles(db, callback) {
db.collection("users").updateMany(
{ role: { $exists: false } },
{ $set: { role: "user" } },
function(err, result) {
// result is undefined when the update fails, so check err first
if (err) return callback(err);
console.log("Fixed " + result.modifiedCount + " documents");
callback(null);
}
);
}
Phase 3: Enforce. Apply the validator with moderate level first, monitor, then switch to strict.
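The audit and enforce phases can share one schema object. A minimal sketch, assuming a hypothetical buildRolloutPlan helper: it returns a filter that matches non-conforming documents (the $nor plus $jsonSchema query form) and the two collMod command documents for the staged rollout. Nothing here talks to a database; it only builds plain objects.

```javascript
// Hypothetical helper: given a collection name and a $jsonSchema body,
// returns the audit filter and the two collMod commands for a staged rollout.
function buildRolloutPlan(collectionName, jsonSchema) {
  var validator = { $jsonSchema: jsonSchema };
  function collMod(level, action) {
    return {
      collMod: collectionName,
      validator: validator,
      validationLevel: level,
      validationAction: action
    };
  }
  return {
    // Matches every document that does NOT satisfy the schema.
    auditFilter: { $nor: [validator] },
    // Step 1: log violations without rejecting writes.
    observe: collMod("moderate", "warn"),
    // Step 2: reject every non-conforming insert and update.
    enforce: collMod("strict", "error")
  };
}

var plan = buildRolloutPlan("users", {
  bsonType: "object",
  required: ["email", "role"],
  properties: {
    email: { bsonType: "string" },
    role: { bsonType: "string", enum: ["user", "editor", "admin", "superadmin"] }
  }
});

console.log(plan.observe.validationLevel, plan.observe.validationAction);
console.log(plan.enforce.validationLevel, plan.enforce.validationAction);
```

In use, plan.auditFilter would go to db.collection("users").find(...), and plan.observe and plan.enforce to db.command(...), in that order and with time to review the logs in between.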
Native Validation vs Mongoose Validation
| Aspect | Native MongoDB | Mongoose |
|---|---|---|
| Runs where | Database engine | Application process |
| Bypassed by direct DB access | No | Yes |
| Custom error messages | Limited (description text in errInfo, MongoDB 5.0+) | Yes |
| Async validators | No | Yes |
| Conditional logic | Limited | Full |
| Cross-field validation | Limited (query operators such as $expr) | Yes (middleware) |
| Performance overhead | Minimal | Some |
| Works without ODM | Yes | No |
My recommendation: use both. Mongoose validation is your primary layer — it gives you rich error messages and complex logic. Native validation is your safety net — it catches writes that bypass your application (scripts, admin tools, other services).
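Using both layers means the same rule lives in two places. One way to keep them from drifting apart is to derive both definitions from a shared constant. This is a sketch only: ROLES, mongooseRoleField, and nativeRoleProperty are names I made up for illustration.

```javascript
// Shared source of truth for the allowed roles.
var ROLES = ["user", "editor", "admin", "superadmin"];

// Field definition to drop into the Mongoose schema.
function mongooseRoleField() {
  return { type: String, enum: ROLES.slice(), default: ROLES[0] };
}

// Matching property to drop into the native $jsonSchema validator.
function nativeRoleProperty() {
  return { bsonType: "string", enum: ROLES.slice() };
}

var sameRoles =
  mongooseRoleField().enum.join(",") === nativeRoleProperty().enum.join(",");
console.log(sameRoles); // true
```

Adding a role then becomes a one-line change, followed by re-running the collMod command.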
Common Issues and Troubleshooting
1. "Document failed validation" with no details. Before MongoDB 5.0, native validation errors do not tell you which field failed, which makes debugging painful. On 5.0 and later, inspect the errInfo property of the error. Either way, validate with Mongoose first, and only rely on native validation as a backstop. When debugging, temporarily set validationAction: "warn" and check the MongoDB logs.
2. Unique index errors during bulk inserts. If you are importing data and hit duplicate key errors, use insertMany with ordered: false. This lets the rest of the batch succeed even if some documents violate uniqueness. Collect the errors and process them separately.
3. update() and findOneAndUpdate() skip Mongoose validators by default. This catches everyone. Mongoose only runs validators on save() by default. For update operations, pass { runValidators: true }:
User.findOneAndUpdate(
{ _id: userId },
{ $set: { role: "admin" } },
{ runValidators: true, new: true },
callback
);
4. Validation on nested arrays of objects. Mongoose validates array elements when they have a schema, but not when they are declared as untyped objects (for example type: [Object] or Mixed). Always use a proper Schema for array elements that need validation.
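5. Sparse indexes and null values. A unique index treats a missing field as null, so only one document may omit it. If a field is optional but unique when present, create a sparse unique index, which skips documents that lack the field entirely (an explicit null is still indexed):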
userSchema.index({ phone: 1 }, { unique: true, sparse: true });
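For issue 2, "collect the errors and process them separately" could look like the sketch below. It assumes the error thrown by an unordered insertMany exposes a writeErrors list whose entries carry code and index, as the driver's bulk write errors do. partitionWriteErrors is a hypothetical helper and the sample error is hand-written.

```javascript
// Hypothetical helper: splits the write errors of a failed unordered
// insertMany into duplicate-key failures and everything else.
// Assumes each entry exposes `code` and `index`.
function partitionWriteErrors(err) {
  // Accept a single error object as well as an array.
  var writeErrors = [].concat((err && err.writeErrors) || []);
  var result = { duplicates: [], others: [] };
  writeErrors.forEach(function(writeError) {
    var entry = { index: writeError.index, code: writeError.code };
    if (writeError.code === 11000) result.duplicates.push(entry);
    else result.others.push(entry);
  });
  return result;
}

// Hand-written sample error, not real driver output.
var sample = {
  writeErrors: [
    { index: 1, code: 11000 },
    { index: 4, code: 121 }
  ]
};

console.log(partitionWriteErrors(sample));
```

The index values point back into the array passed to insertMany, so duplicates can be skipped or merged while the other failures get a closer look.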
Best Practices
Validate at both layers. Mongoose gives you developer-friendly errors. Native validation protects against bypasses. Use both.
Always set runValidators: true on updates. Better yet, create a wrapper function that enforces this so individual developers cannot forget.
Use validate() before save() in tests. It is faster and does not require database setup. Save database integration tests for your CI pipeline.
Start with moderate validation when retrofitting. Do not set strict on a collection with dirty data. You will block legitimate updates on existing documents.
Version your validators. When your schema changes, update both the Mongoose schema and the native validator. Keep them in sync. A mismatch causes confusing errors.
Prefer enum over regex for fixed sets. Regex validation on role fields is harder to maintain than a simple enum array. Save regex for genuinely variable patterns like emails and zip codes.
Index what you constrain. If you are enforcing uniqueness, make sure you have an index for it. Without the index, the constraint does not exist at the database level; you just have an application-level check with race conditions.
Document your validation rules. Your schema definition IS documentation, but a brief comment explaining WHY a constraint exists (not just WHAT it is) saves future developers hours of archeology.
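As a sketch of the wrapper idea from the runValidators practice: the function below takes a model as a parameter and forces runValidators: true no matter what the caller passes. safeFindOneAndUpdate is a hypothetical name, and FakeModel is a stub so the example runs without Mongoose or a database.

```javascript
// Hypothetical wrapper: forwards to Model.findOneAndUpdate but always forces
// runValidators: true, whatever the caller passes.
function safeFindOneAndUpdate(Model, filter, update, options) {
  var merged = {};
  Object.keys(options || {}).forEach(function(key) {
    merged[key] = options[key];
  });
  merged.runValidators = true;
  return Model.findOneAndUpdate(filter, update, merged);
}

// Stub standing in for a Mongoose model so the sketch runs without a database.
var calls = [];
var FakeModel = {
  findOneAndUpdate: function(filter, update, options) {
    calls.push(options);
    return Promise.resolve(null);
  }
};

safeFindOneAndUpdate(
  FakeModel,
  { _id: 1 },
  { $set: { role: "admin" } },
  { new: true, runValidators: false }
);

console.log(calls[0]); // { new: true, runValidators: true }
```

With a real model, the returned query runs once you await it or call .then() on it.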