Vector Database Selection Guide for Node.js Developers
Overview
Vector databases store and query high-dimensional embeddings, the foundational layer beneath every semantic search, recommendation engine, and RAG pipeline you will build. Choosing the wrong one means rewriting your data layer six months from now when your prototype hits production scale. This guide walks through the five major options available to Node.js developers, with real benchmarks, working code, and the hard-won opinions you need to make a decision you will not regret.
Prerequisites
- Node.js v18+ installed
- Basic understanding of embeddings and what they represent
- Familiarity with PostgreSQL (for pgvector sections)
- An OpenAI API key (for generating embeddings in examples)
- npm packages: pg, pgvector, @pinecone-database/pinecone, openai
What Vector Databases Do and Why You Need One
Traditional databases index rows by exact values. You query for WHERE name = 'Shane' and the database uses a B-tree index to find the match in logarithmic time. Vector databases solve a fundamentally different problem: finding the closest neighbors in high-dimensional space.
When you generate an embedding from text using a model like OpenAI's text-embedding-3-small, you get a 1536-dimensional float array. Two pieces of text that mean similar things will have vectors that are close together in that space. A vector database lets you ask: "give me the 10 vectors closest to this query vector," and it does so efficiently even across millions of records.
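To make "closest neighbors" concrete, here is a minimal brute-force sketch in plain JavaScript: score every stored vector against the query with cosine similarity and keep the top k. This is the O(n) scan that a vector database's approximate index exists to avoid.

```javascript
// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a, b) {
  var dot = 0;
  var normA = 0;
  var normB = 0;
  for (var i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Brute-force k-nearest-neighbor search: score every record, sort, keep top k.
function bruteForceKnn(queryVector, records, k) {
  return records
    .map(function (r) {
      return { id: r.id, score: cosineSimilarity(queryVector, r.embedding) };
    })
    .sort(function (a, b) { return b.score - a.score; })
    .slice(0, k);
}
```

Running this over millions of 1536-dimensional vectors on every query is exactly what becomes too slow, which is why the indexes discussed later exist.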
You need a vector database when:
- You are building semantic search (search by meaning, not keywords)
- You are implementing RAG (Retrieval-Augmented Generation) for an LLM
- You need recommendation engines based on content similarity
- You are doing duplicate or near-duplicate detection
- You want to classify documents against a known set of categories
You do not need a vector database when simple full-text search or exact-match queries solve your problem. Do not add infrastructure complexity because embeddings sound impressive.
pgvector: When Your Existing Postgres Is Enough
pgvector is a PostgreSQL extension that adds vector column types and similarity search operators directly to your existing database. If you already run Postgres, this is where you should start.
Setup
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE documents (
id SERIAL PRIMARY KEY,
title TEXT NOT NULL,
content TEXT NOT NULL,
embedding vector(1536),
created_at TIMESTAMP DEFAULT NOW()
);
-- Build this index after the table is populated; IVFFlat clusters built
-- on an empty table are meaningless (see Indexing Strategies below)
CREATE INDEX ON documents USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);
Node.js Integration
var pg = require("pg");
var pgvector = require("pgvector/pg");
var pool = new pg.Pool({
connectionString: process.env.DATABASE_URL
});
function initPgvector() {
return pool.query("CREATE EXTENSION IF NOT EXISTS vector").then(function () {
return pgvector.registerType(pool);
});
}
function insertDocument(title, content, embedding) {
var sql = "INSERT INTO documents (title, content, embedding) VALUES ($1, $2, $3) RETURNING id";
return pool.query(sql, [title, content, pgvector.toSql(embedding)]);
}
function searchSimilar(queryEmbedding, limit) {
var sql = [
"SELECT id, title, content,",
" 1 - (embedding <=> $1) AS similarity",
"FROM documents",
"ORDER BY embedding <=> $1",
"LIMIT $2"
].join("\n");
return pool.query(sql, [pgvector.toSql(queryEmbedding), limit || 10]);
}
When pgvector Is the Right Choice
pgvector shines when you have fewer than 5 million vectors, you already run PostgreSQL, and you want to join vector search results with relational data in a single query. The ability to do WHERE category = 'engineering' ORDER BY embedding <=> $1 in one SQL statement is genuinely powerful. You do not need a second database, a second backup strategy, or a second monitoring stack.
The tradeoff is performance at scale. pgvector is slower than purpose-built vector databases once you pass a few million records, and its indexing options are more limited.
Pinecone: When to Pay for Managed
Pinecone is a fully managed vector database. You do not run servers, tune indexes, or worry about replication. You send vectors in, you query vectors out, and Pinecone handles everything in between.
Node.js Integration
var Pinecone = require("@pinecone-database/pinecone").Pinecone;
var pinecone = new Pinecone({
apiKey: process.env.PINECONE_API_KEY
});
function upsertVectors(indexName, vectors) {
var index = pinecone.Index(indexName);
var records = vectors.map(function (v) {
return {
id: v.id,
values: v.embedding,
metadata: {
title: v.title,
category: v.category,
created_at: v.createdAt
}
};
});
return index.upsert(records);
}
function querySimilar(indexName, queryEmbedding, filter, topK) {
var index = pinecone.Index(indexName);
return index.query({
vector: queryEmbedding,
topK: topK || 10,
includeMetadata: true,
filter: filter || {}
});
}
// Example with metadata filtering
function searchByCategory(indexName, queryEmbedding, category) {
return querySimilar(indexName, queryEmbedding, {
category: { "$eq": category }
}, 10);
}
When to Pay for Pinecone
Pay for Pinecone when your team is small and you cannot afford to operate infrastructure, when you need to scale past 10 million vectors without performance tuning, or when your SLA demands the kind of uptime guarantees that come with a managed service. Pinecone's serverless tier has also made it cost-competitive for smaller workloads.
Do not pay for Pinecone when you have a 50,000-document knowledge base that fits comfortably in pgvector. That is paying a premium for convenience you do not need.
Weaviate: Open-Source with Hybrid Search
Weaviate is an open-source vector database that natively supports hybrid search, combining vector similarity with BM25 keyword search in a single query. This is a significant differentiator.
var weaviate = require("weaviate-ts-client").default;
var client = weaviate.client({
scheme: "http",
host: "localhost:8080"
});
function hybridSearch(query, queryEmbedding, limit) {
return client.graphql
.get()
.withClassName("Document")
.withHybrid({
query: query,
vector: queryEmbedding,
alpha: 0.75 // 0 = pure keyword, 1 = pure vector
})
.withFields("title content _additional { score }")
.withLimit(limit || 10)
.do();
}
The alpha parameter lets you tune the balance between semantic and keyword relevance. In practice, an alpha of 0.7 to 0.8 works well for most search use cases. Pure vector search misses exact keyword matches; pure keyword search misses semantic meaning. Hybrid search gets you both.
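The blending that alpha performs can be sketched in a few lines. This is an illustrative simplification, assuming both scores are already normalized to the 0-1 range; Weaviate's internal fusion is more involved than a plain weighted sum.

```javascript
// Blend a vector (semantic) score and a keyword (BM25-style) score:
// alpha = 1 is pure vector, alpha = 0 is pure keyword.
function hybridScore(vectorScore, keywordScore, alpha) {
  return alpha * vectorScore + (1 - alpha) * keywordScore;
}

// Rank candidates that carry both scores.
function rankHybrid(candidates, alpha) {
  return candidates
    .map(function (c) {
      return { id: c.id, score: hybridScore(c.vectorScore, c.keywordScore, alpha) };
    })
    .sort(function (a, b) { return b.score - a.score; });
}
```

Sliding alpha from 0 to 1 smoothly shifts the ranking from keyword-dominated to semantics-dominated, which is why tuning it against real queries is worthwhile.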
Weaviate also supports built-in vectorization modules, so you can send raw text and let Weaviate generate embeddings for you. This simplifies your pipeline but locks you into their module ecosystem.
Qdrant: Open-Source with Rich Filtering
Qdrant is purpose-built for vector search with an emphasis on filtering performance. Where other databases treat metadata filtering as an afterthought, Qdrant indexes metadata natively and filters before the vector search, which means filtered queries are fast even on large datasets.
var QdrantClient = require("@qdrant/js-client-rest").QdrantClient;
var qdrant = new QdrantClient({ host: "localhost", port: 6333 });
function createCollection(name, vectorSize) {
return qdrant.createCollection(name, {
vectors: {
size: vectorSize,
distance: "Cosine"
}
});
}
function searchWithFilters(collectionName, queryEmbedding, filters) {
return qdrant.search(collectionName, {
vector: queryEmbedding,
limit: 10,
filter: {
must: [
{ key: "category", match: { value: filters.category } },
{ key: "published", match: { value: true } }
],
must_not: [
{ key: "status", match: { value: "archived" } }
]
},
with_payload: true
});
}
Qdrant's filtering language is more expressive than most competitors. You can combine must, should, and must_not clauses, filter on nested objects, and use range filters on numeric fields. If your application needs complex metadata queries alongside vector search, Qdrant handles it well.
Chroma: Lightweight for Prototyping
Chroma is the SQLite of vector databases. It runs in-process, requires no infrastructure, and gets you from zero to working prototype in minutes.
var ChromaClient = require("chromadb").ChromaClient;
var chroma = new ChromaClient();
function setupAndSearch() {
var collection;
return chroma.createCollection({ name: "documents" })
.then(function (col) {
collection = col;
return collection.add({
ids: ["doc1", "doc2", "doc3"],
embeddings: [/* embedding arrays */],
metadatas: [
{ source: "wiki" },
{ source: "docs" },
{ source: "blog" }
],
documents: [
"First document text",
"Second document text",
"Third document text"
]
});
})
.then(function () {
return collection.query({
queryEmbeddings: [/* query embedding */],
nResults: 5,
where: { source: "docs" }
});
});
}
Chroma is excellent for local development, proof-of-concept work, and applications with fewer than 100,000 vectors. It is not suitable for production workloads that need replication, backups, or multi-tenant isolation. Use it to validate that your embedding strategy works, then migrate to a production database.
Comparison Matrix
| Feature | pgvector | Pinecone | Weaviate | Qdrant | Chroma |
|---|---|---|---|---|---|
| Scalability | Millions | Billions | Billions | Billions | Hundreds of thousands |
| Query Speed (1M vectors) | 15-50ms | 5-15ms | 8-20ms | 5-15ms | 20-80ms |
| Filtering | SQL WHERE | Basic metadata | Moderate | Rich & fast | Basic |
| Cost (1M vectors/mo) | $0 (self-hosted) | $70+ | $0 (self-hosted) | $0 (self-hosted) | $0 |
| Node.js SDK Quality | Good (via pg) | Excellent | Good | Good | Adequate |
| Hybrid Search | Manual (with tsvector) | Sparse-dense vectors | Native | Sparse vectors | No |
| Managed Option | Supabase, Neon | Yes (only) | Weaviate Cloud | Qdrant Cloud | No |
| Operational Burden | Low (existing PG) | None | Moderate | Moderate | None |
| ACID Transactions | Yes | No | No | No | No |
The numbers in this table are approximate and depend heavily on hardware, index configuration, and query complexity. Benchmark against your own data before making a decision.
Indexing Strategies: IVFFlat vs HNSW
Vector databases use specialized index structures to avoid brute-force scanning every vector on every query. The two dominant algorithms are IVFFlat and HNSW.
IVFFlat (Inverted File with Flat Compression)
IVFFlat partitions your vectors into clusters (called "lists") using k-means clustering. At query time, it searches only the nearest clusters rather than every vector.
-- pgvector IVFFlat index
CREATE INDEX ON documents USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
The lists parameter controls how many clusters to create. A good starting point is sqrt(number_of_vectors). More lists means faster queries but slower index builds and potentially lower recall (the index might miss relevant results in adjacent clusters).
IVFFlat requires your data to be present before you build the index. If you create the index on an empty table and then insert data, the clusters will be meaningless.
Trade-off: Faster index creation, lower memory usage, but lower recall at high speed.
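The sqrt rule of thumb can be wrapped in a tiny helper. This is illustrative only, not an official pgvector formula; benchmark recall before settling on a value.

```javascript
// Rule-of-thumb starting point for the IVFFlat "lists" parameter:
// the square root of the row count, clamped to at least 1.
function suggestedLists(rowCount) {
  return Math.max(1, Math.round(Math.sqrt(rowCount)));
}
```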
HNSW (Hierarchical Navigable Small World)
HNSW builds a multi-layer graph where each node connects to its approximate nearest neighbors. Queries traverse the graph from top to bottom, narrowing the search at each layer.
-- pgvector HNSW index
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
The m parameter controls how many connections each node has (higher means better recall but more memory). The ef_construction parameter controls index build quality (higher means better recall but slower builds).
HNSW can be built incrementally. You can insert data after creating the index and it remains effective. This makes it the better choice for datasets that grow over time.
Trade-off: Higher recall, works with incremental inserts, but uses more memory and takes longer to build.
Which to Choose
Use HNSW unless you have a specific reason not to. It gives better recall at comparable speed, handles incremental data well, and is the default choice for Pinecone, Qdrant, and Weaviate. Use IVFFlat when memory is constrained or when you rebuild your index regularly from a complete dataset.
When to Use pgvector vs a Dedicated Vector DB
Use pgvector when:
- You have fewer than 5 million vectors
- You already run PostgreSQL
- You need to join vector results with relational data
- You want ACID transactions on your vector data
- Your team does not want to operate additional infrastructure
- Your query patterns combine metadata filters with vector search
Use a dedicated vector database when:
- You have more than 10 million vectors
- You need sub-10ms query latency at scale
- You need advanced features like hybrid search or sparse vectors
- Your vector workload would overwhelm your application database
- You need multi-tenancy with strong isolation
The 5-10 million range is a gray zone. At that scale, benchmark both options against your actual query patterns before committing.
Hybrid Search Capabilities
Hybrid search combines dense vector similarity (semantic meaning) with sparse vector or keyword matching (exact terms). This is critical for production search applications because neither approach alone is sufficient.
// Manual hybrid search with pgvector
function hybridSearchPgvector(pool, queryEmbedding, queryText, limit) {
var sql = [
"WITH vector_results AS (",
" SELECT id, title, content,",
" 1 - (embedding <=> $1) AS vector_score",
" FROM documents",
" ORDER BY embedding <=> $1",
" LIMIT $3",
"),",
"keyword_results AS (",
" SELECT id, title, content,",
" ts_rank(to_tsvector('english', content), plainto_tsquery('english', $2)) AS keyword_score",
" FROM documents",
" WHERE to_tsvector('english', content) @@ plainto_tsquery('english', $2)",
" LIMIT $3",
")",
"SELECT COALESCE(v.id, k.id) AS id,",
" COALESCE(v.title, k.title) AS title,",
" COALESCE(v.content, k.content) AS content,",
" COALESCE(v.vector_score, 0) * 0.7 + COALESCE(k.keyword_score, 0) * 0.3 AS combined_score",
"FROM vector_results v",
"FULL OUTER JOIN keyword_results k ON v.id = k.id",
"ORDER BY combined_score DESC",
"LIMIT $3"
].join("\n");
return pool.query(sql, [pgvector.toSql(queryEmbedding), queryText, limit || 10]);
}
This manual approach works but is clunky. Weaviate handles this natively with its hybrid search endpoint, which is one of the strongest reasons to choose it for search-heavy applications.
Metadata Filtering Across Databases
Every vector database supports metadata filtering, but the implementations differ significantly in expressiveness and performance.
// Pinecone filtering
var pineconeFilter = {
"$and": [
{ "category": { "$eq": "engineering" } },
{ "published_year": { "$gte": 2024 } },
{ "tags": { "$in": ["nodejs", "javascript"] } }
]
};
// Qdrant filtering
var qdrantFilter = {
must: [
{ key: "category", match: { value: "engineering" } },
{ key: "published_year", range: { gte: 2024 } }
],
should: [
{ key: "tags", match: { value: "nodejs" } },
{ key: "tags", match: { value: "javascript" } }
]
};
// pgvector filtering (standard SQL)
var pgvectorFilter = "WHERE category = $2 AND published_year >= $3 AND tags && $4";
pgvector wins on filtering expressiveness because it uses full SQL. Qdrant wins on filtering performance at scale because it indexes metadata natively. Pinecone's filtering is adequate for simple cases but struggles with deeply nested or complex boolean logic.
Operational Considerations
Backups
- pgvector: Use your existing PostgreSQL backup strategy (pg_dump, WAL archiving, managed backups). Nothing changes.
- Pinecone: Managed by Pinecone. You can create collections, which are static snapshots of an index. Limited control over backup schedules.
- Weaviate / Qdrant: Both support snapshot-based backups to S3 or local storage. You are responsible for scheduling and monitoring.
Scaling
- pgvector: Vertical scaling (bigger machine) or read replicas. No built-in horizontal sharding for vector indexes.
- Pinecone: Automatic scaling on the serverless tier. Pod-based tier requires manual shard configuration.
- Weaviate / Qdrant: Both support sharding and replication in cluster mode. Configuration is your responsibility.
Monitoring
Build dashboards for these metrics regardless of which database you choose:
- Query latency (p50, p95, p99)
- Recall quality (sample queries against ground truth, measured periodically)
- Index size and memory usage
- Insert throughput (vectors per second)
- Error rates on query and insert operations
// Simple latency tracking wrapper
function timedQuery(queryFn) {
return function () {
var args = Array.prototype.slice.call(arguments);
var start = process.hrtime.bigint();
return queryFn.apply(null, args).then(function (result) {
var elapsed = Number(process.hrtime.bigint() - start) / 1e6;
console.log("Query latency: " + elapsed.toFixed(2) + "ms");
return result;
});
};
}
var timedSearch = timedQuery(searchSimilar);
Migration Strategies Between Vector Databases
Migrating vector databases is painful but sometimes necessary. Here is a practical approach:
function migrateVectors(sourceDb, targetDb, batchSize) {
var offset = 0;
var totalMigrated = 0;
function processBatch(callback) {
sourceDb.fetchBatch(offset, batchSize).then(function (batch) {
if (batch.length === 0) {
return callback(null, totalMigrated);
}
var records = batch.map(function (record) {
return {
id: record.id,
values: record.embedding,
metadata: record.metadata
};
});
return targetDb.upsertBatch(records).then(function () {
totalMigrated += records.length;
offset += batchSize;
console.log("Migrated " + totalMigrated + " vectors");
processBatch(callback);
});
}).catch(function (err) {
callback(err);
});
}
return new Promise(function (resolve, reject) {
processBatch(function (err, total) {
if (err) return reject(err);
resolve(total);
});
});
}
Key migration principles:
- Export embeddings, not just text. Re-generating embeddings is expensive and may produce different vectors if the model has been updated.
- Migrate in batches. Most vector databases have upsert rate limits or optimal batch sizes (typically 100-500 vectors per request).
- Run both databases in parallel during migration. Route reads to the old database while backfilling the new one, then switch over.
- Validate with sample queries. Run the same queries against both databases and compare results to catch index configuration issues.
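One way to sketch the validation step: run the same query against both databases and measure how much the top-k ID sets overlap. A low overlap usually points at an index configuration mismatch. The arrays-of-IDs shape here is an assumption; adapt it to each SDK's actual response format.

```javascript
// Fraction of the old database's top-k results that also appear in the
// new database's top-k results for the same query.
function topKOverlap(oldIds, newIds) {
  var seen = {};
  newIds.forEach(function (id) { seen[id] = true; });
  var shared = oldIds.filter(function (id) { return seen[id] === true; }).length;
  return oldIds.length === 0 ? 1 : shared / oldIds.length;
}
```

Exact ranking parity is unrealistic with approximate indexes, but overlap well below 0.9 on a representative query set deserves investigation.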
Benchmarking Vector Search Performance
Do not trust published benchmarks. Benchmark against your own data, your own query patterns, and your own hardware.
var OpenAI = require("openai");
var openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
function generateEmbedding(text) {
return openai.embeddings.create({
model: "text-embedding-3-small",
input: text
}).then(function (response) {
return response.data[0].embedding;
});
}
function benchmarkSearch(searchFn, queries, runs) {
var results = {
latencies: [],
p50: 0,
p95: 0,
p99: 0,
mean: 0
};
var allLatencies = [];
function runQuery(queryIndex, runIndex) {
if (queryIndex >= queries.length) {
return Promise.resolve();
}
if (runIndex >= runs) {
return runQuery(queryIndex + 1, 0);
}
var start = process.hrtime.bigint();
return searchFn(queries[queryIndex]).then(function () {
var elapsed = Number(process.hrtime.bigint() - start) / 1e6;
allLatencies.push(elapsed);
return runQuery(queryIndex, runIndex + 1);
});
}
return runQuery(0, 0).then(function () {
allLatencies.sort(function (a, b) { return a - b; });
var len = allLatencies.length;
results.latencies = allLatencies;
results.p50 = allLatencies[Math.floor(len * 0.5)];
results.p95 = allLatencies[Math.floor(len * 0.95)];
results.p99 = allLatencies[Math.floor(len * 0.99)];
results.mean = allLatencies.reduce(function (sum, l) { return sum + l; }, 0) / len;
console.log("Benchmark Results:");
console.log(" Mean: " + results.mean.toFixed(2) + "ms");
console.log(" P50: " + results.p50.toFixed(2) + "ms");
console.log(" P95: " + results.p95.toFixed(2) + "ms");
console.log(" P99: " + results.p99.toFixed(2) + "ms");
return results;
});
}
Complete Working Example: pgvector vs Pinecone Side-by-Side
This application indexes the same documents into both pgvector and Pinecone, then runs identical searches against both to compare performance and developer experience.
var pg = require("pg");
var pgvector = require("pgvector/pg");
var PineconeLib = require("@pinecone-database/pinecone");
var OpenAI = require("openai");
var Pinecone = PineconeLib.Pinecone;
var openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
// ============================================================
// Configuration
// ============================================================
var PGVECTOR_URL = process.env.DATABASE_URL;
var PINECONE_API_KEY = process.env.PINECONE_API_KEY;
var PINECONE_INDEX = "article-search";
var VECTOR_DIM = 1536;
// ============================================================
// Sample documents
// ============================================================
var documents = [
{ id: "doc-1", title: "Getting Started with Express.js", content: "Express.js is a minimal web framework for Node.js that provides robust features for building web and mobile applications. It handles routing, middleware, and HTTP utilities." },
{ id: "doc-2", title: "PostgreSQL Performance Tuning", content: "Optimizing PostgreSQL involves configuring shared_buffers, work_mem, and effective_cache_size. Proper indexing and query analysis with EXPLAIN ANALYZE are essential for production databases." },
{ id: "doc-3", title: "Building REST APIs with Node.js", content: "RESTful APIs follow resource-oriented design patterns. Use proper HTTP methods, status codes, and content negotiation. Validate input, handle errors gracefully, and implement rate limiting." },
{ id: "doc-4", title: "Docker Container Orchestration", content: "Container orchestration with Docker Compose and Kubernetes manages multi-container deployments. Define services, networks, and volumes in declarative configuration files." },
{ id: "doc-5", title: "Machine Learning Model Deployment", content: "Deploying ML models requires serving infrastructure, model versioning, A/B testing capabilities, and monitoring for data drift. Use ONNX or TensorFlow Serving for production inference." },
{ id: "doc-6", title: "Authentication with JWT Tokens", content: "JSON Web Tokens provide stateless authentication for APIs. Implement access and refresh token patterns, store tokens securely, and handle token expiration and rotation." },
{ id: "doc-7", title: "MongoDB Aggregation Pipelines", content: "MongoDB aggregation framework processes data through pipeline stages including match, group, project, sort, and lookup. Use indexes to optimize pipeline performance." },
{ id: "doc-8", title: "WebSocket Real-Time Communication", content: "WebSockets enable full-duplex communication between client and server. Use Socket.IO or ws library in Node.js for real-time features like chat, notifications, and live updates." }
];
// ============================================================
// Embedding generation
// ============================================================
function getEmbedding(text) {
return openai.embeddings.create({
model: "text-embedding-3-small",
input: text
}).then(function (res) {
return res.data[0].embedding;
});
}
function getEmbeddings(texts) {
return openai.embeddings.create({
model: "text-embedding-3-small",
input: texts
}).then(function (res) {
return res.data.map(function (d) { return d.embedding; });
});
}
// ============================================================
// pgvector setup and operations
// ============================================================
var pool = new pg.Pool({ connectionString: PGVECTOR_URL });
function setupPgvector() {
console.log("\n--- Setting up pgvector ---");
return pool.query("CREATE EXTENSION IF NOT EXISTS vector")
.then(function () {
return pgvector.registerType(pool);
})
.then(function () {
return pool.query("DROP TABLE IF EXISTS bench_documents");
})
.then(function () {
return pool.query([
"CREATE TABLE bench_documents (",
" id TEXT PRIMARY KEY,",
" title TEXT NOT NULL,",
" content TEXT NOT NULL,",
" embedding vector(" + VECTOR_DIM + ")",
")"
].join("\n"));
})
.then(function () {
console.log("pgvector table created");
});
}
function indexPgvector(docs, embeddings) {
var inserts = docs.map(function (doc, i) {
return pool.query(
"INSERT INTO bench_documents (id, title, content, embedding) VALUES ($1, $2, $3, $4)",
[doc.id, doc.title, doc.content, pgvector.toSql(embeddings[i])]
);
});
return Promise.all(inserts)
.then(function () {
return pool.query(
"CREATE INDEX ON bench_documents USING hnsw (embedding vector_cosine_ops)"
);
})
.then(function () {
console.log("pgvector: " + docs.length + " documents indexed with HNSW");
});
}
function searchPgvector(queryEmbedding, topK) {
var sql = [
"SELECT id, title, content,",
" 1 - (embedding <=> $1) AS score",
"FROM bench_documents",
"ORDER BY embedding <=> $1",
"LIMIT $2"
].join("\n");
return pool.query(sql, [pgvector.toSql(queryEmbedding), topK]);
}
// ============================================================
// Pinecone setup and operations
// ============================================================
var pinecone = new Pinecone({ apiKey: PINECONE_API_KEY });
function setupPinecone() {
console.log("\n--- Setting up Pinecone ---");
return pinecone.listIndexes().then(function (indexes) {
var exists = indexes.indexes && indexes.indexes.some(function (idx) {
return idx.name === PINECONE_INDEX;
});
if (exists) {
var index = pinecone.Index(PINECONE_INDEX);
return index.deleteAll().then(function () {
console.log("Pinecone index cleared");
});
}
return pinecone.createIndex({
name: PINECONE_INDEX,
dimension: VECTOR_DIM,
metric: "cosine",
spec: { serverless: { cloud: "aws", region: "us-east-1" } }
}).then(function () {
console.log("Pinecone index created");
// Wait for index to be ready
return new Promise(function (resolve) { setTimeout(resolve, 30000); });
});
});
}
function indexPinecone(docs, embeddings) {
var index = pinecone.Index(PINECONE_INDEX);
var records = docs.map(function (doc, i) {
return {
id: doc.id,
values: embeddings[i],
metadata: { title: doc.title, content: doc.content }
};
});
return index.upsert(records).then(function () {
console.log("Pinecone: " + docs.length + " documents indexed");
});
}
function searchPinecone(queryEmbedding, topK) {
var index = pinecone.Index(PINECONE_INDEX);
return index.query({
vector: queryEmbedding,
topK: topK,
includeMetadata: true
});
}
// ============================================================
// Benchmark runner
// ============================================================
function runBenchmark() {
var testQueries = [
"How do I build a web server in Node.js?",
"What is the best way to optimize database queries?",
"How to deploy machine learning models?",
"Real-time communication between browser and server"
];
var embeddings;
var queryEmbeddings;
console.log("=== Vector Database Benchmark: pgvector vs Pinecone ===\n");
console.log("Generating embeddings for " + documents.length + " documents...");
var contentTexts = documents.map(function (d) { return d.title + " " + d.content; });
return getEmbeddings(contentTexts)
.then(function (emb) {
embeddings = emb;
console.log("Document embeddings generated");
return setupPgvector();
})
.then(function () {
return setupPinecone();
})
.then(function () {
return indexPgvector(documents, embeddings);
})
.then(function () {
return indexPinecone(documents, embeddings);
})
.then(function () {
return getEmbeddings(testQueries);
})
.then(function (qEmb) {
queryEmbeddings = qEmb;
console.log("\n=== Running search benchmarks ===\n");
var queryIndex = 0;
function runNextQuery() {
if (queryIndex >= testQueries.length) return Promise.resolve();
var q = testQueries[queryIndex];
var qe = queryEmbeddings[queryIndex];
queryIndex++;
console.log('Query: "' + q + '"');
var pgStart, pgEnd, pcStart, pcEnd;
pgStart = process.hrtime.bigint();
return searchPgvector(qe, 3)
.then(function (pgResults) {
pgEnd = process.hrtime.bigint();
var pgLatency = Number(pgEnd - pgStart) / 1e6;
pcStart = process.hrtime.bigint();
return searchPinecone(qe, 3).then(function (pcResults) {
pcEnd = process.hrtime.bigint();
var pcLatency = Number(pcEnd - pcStart) / 1e6;
console.log(" pgvector (" + pgLatency.toFixed(1) + "ms):");
pgResults.rows.forEach(function (r) {
console.log(" - " + r.title + " (score: " + parseFloat(r.score).toFixed(4) + ")");
});
console.log(" Pinecone (" + pcLatency.toFixed(1) + "ms):");
pcResults.matches.forEach(function (m) {
console.log(" - " + m.metadata.title + " (score: " + m.score.toFixed(4) + ")");
});
console.log("");
return runNextQuery();
});
});
}
return runNextQuery();
})
.then(function () {
console.log("=== Benchmark complete ===");
return pool.end();
})
.catch(function (err) {
console.error("Benchmark failed:", err);
return pool.end();
});
}
runBenchmark();
What You Will Observe
Running this benchmark reveals several practical truths:
- Similarity scores are nearly identical. Both databases use cosine similarity, so the ranking is the same. The scores may differ slightly due to floating-point precision.
- pgvector latency includes network round-trip to Postgres. If Postgres is local, pgvector is often faster than Pinecone for small datasets because there is no internet latency.
- Pinecone latency includes internet round-trip. For a serverless index in us-east-1, expect 30-80ms per query from outside AWS.
- Index build time matters. pgvector's HNSW index builds in seconds for 8 documents. At 1 million documents, expect minutes. Pinecone handles indexing transparently.
Common Issues and Troubleshooting
1. pgvector: Wrong Vector Dimensions
ERROR: expected 1536 dimensions, not 768
This happens when you change embedding models without updating your table schema. If you switch from text-embedding-3-small (1536 dimensions) to a different model, you must alter the column or recreate the table:
ALTER TABLE documents ALTER COLUMN embedding TYPE vector(768);
-- You must also drop and recreate indexes after changing dimensions
DROP INDEX IF EXISTS documents_embedding_idx;
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);
2. Pinecone: Upsert Timeout on Large Batches
PineconeConnectionError: Request timed out after 60000ms
Pinecone's API has a maximum payload size and recommended batch size of 100 vectors per upsert call. Split large imports into batches:
function batchUpsert(index, records, batchSize) {
var batches = [];
for (var i = 0; i < records.length; i += batchSize) {
batches.push(records.slice(i, i + batchSize));
}
var current = 0;
function processNext() {
if (current >= batches.length) return Promise.resolve();
var batch = batches[current];
current++;
console.log("Upserting batch " + current + "/" + batches.length);
return index.upsert(batch).then(function () {
return processNext();
});
}
return processNext();
}
3. pgvector: IVFFlat Index Built on Empty Table
WARNING: ivfflat index created with no data
DETAIL: Index will have poor query performance until table is populated
IVFFlat indexes require representative data to build effective clusters. If you create the index before inserting data, every query will scan most or all of your vectors, defeating the purpose of the index entirely. Insert your data first, then create the IVFFlat index. Alternatively, use HNSW, which handles incremental inserts correctly.
4. Qdrant: Collection Not Found After Restart
QdrantError: Collection "documents" not found
If you are running Qdrant with Docker and did not mount a persistent volume, your data disappears when the container stops. Always mount a volume:
docker run -p 6333:6333 -v $(pwd)/qdrant_data:/qdrant/storage qdrant/qdrant
5. Embedding Model Mismatch Between Index and Query
Results are returned but relevance is completely wrong; similarity scores are near 0.0
This is the most insidious bug because there is no error message. If you index documents with text-embedding-ada-002 but query with text-embedding-3-small, the vectors live in completely different spaces. Every similarity score will be near zero. Always store which embedding model was used alongside your vectors, and validate at query time:
function validateEmbeddingModel(expectedModel, actualModel) {
if (expectedModel !== actualModel) {
throw new Error(
"Embedding model mismatch: index uses " + expectedModel +
" but query uses " + actualModel
);
}
}
Best Practices
Start with pgvector if you already run PostgreSQL. Adding a vector column to an existing table is trivial. You can always migrate later if you outgrow it. Premature infrastructure is as dangerous as premature optimization.
Store the embedding model identifier with every vector. When you inevitably upgrade models, you need to know which vectors need re-embedding. Add a model_version column or metadata field to every record.
Use HNSW indexes by default. Unless you have a specific reason to use IVFFlat (memory constraints, full-rebuild workflow), HNSW gives better recall and handles incremental inserts. The memory overhead is worth it.
Batch your embedding API calls. OpenAI's embedding endpoint accepts up to 2048 inputs per request. Sending one document at a time wastes API calls and adds latency. Group documents into batches of 100-500.
Implement embedding caching. Cache generated embeddings to avoid regenerating them on every application restart or redeployment. Store them in your database alongside the source content.
Normalize your vectors before storage if your database does not do it automatically. Cosine similarity on normalized vectors is equivalent to dot product, which is faster to compute. pgvector handles this automatically with vector_cosine_ops, but check your database's documentation.
Monitor recall quality, not just latency. A fast search that returns irrelevant results is worse than a slow search that returns good results. Build a test set of queries with known relevant documents and measure recall periodically.
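The normalization point above can be sketched as follows: L2-normalize once at write time, and cosine similarity between stored vectors reduces to a plain dot product.

```javascript
// L2-normalize a vector so its length is 1.
function normalize(vector) {
  var norm = Math.sqrt(vector.reduce(function (sum, x) { return sum + x * x; }, 0));
  return vector.map(function (x) { return x / norm; });
}

// Dot product of two equal-length vectors; on normalized vectors this
// equals their cosine similarity.
function dot(a, b) {
  var sum = 0;
  for (var i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}
```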
Set appropriate ef_search or probes parameters at query time. Higher values improve recall at the cost of latency. Tune these based on your application's tolerance for each:
-- pgvector: increase probes for better IVFFlat recall
SET ivfflat.probes = 10;
-- pgvector: increase ef_search for better HNSW recall
SET hnsw.ef_search = 100;
Plan for re-indexing. Embedding models improve regularly. When a new model produces better embeddings, you will want to re-embed your entire corpus. Build your pipeline to support this from day one with a background job that can re-process documents in batches.
Use metadata filtering to reduce the search space before vector comparison. Filtering by category, date range, or tenant ID before computing similarity is far more efficient than filtering after. Qdrant and pgvector both support this pattern natively.
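A minimal in-memory sketch of the filter-then-search pattern follows. The predicate and record shape are illustrative; Qdrant's pre-filtering and a SQL WHERE clause in pgvector perform the equivalent narrowing server-side before any distance computation.

```javascript
// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a, b) {
  var dot = 0;
  var normA = 0;
  var normB = 0;
  for (var i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Filter-then-search: narrow candidates on metadata first, then score
// only the survivors against the query vector.
function filteredSearch(records, predicate, queryVector, k) {
  return records
    .filter(predicate)
    .map(function (r) {
      return { id: r.id, score: cosineSimilarity(queryVector, r.embedding) };
    })
    .sort(function (a, b) { return b.score - a.score; })
    .slice(0, k);
}
```

The payoff is that similarity is computed over the filtered subset only, which is why pre-filtering beats post-filtering as the candidate set grows.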
References
- pgvector GitHub Repository - Official pgvector documentation and benchmarks
- Pinecone Documentation - Pinecone's official Node.js SDK guide
- Weaviate Documentation - Hybrid search and module configuration
- Qdrant Documentation - Filtering, indexing, and cluster setup
- Chroma Documentation - Getting started with Chroma
- OpenAI Embeddings Guide - Embedding model selection and best practices
- HNSW Algorithm Paper - Original HNSW research paper by Malkov and Yashunin
- ANN Benchmarks - Independent approximate nearest neighbor benchmarks across algorithms and implementations