pgvector: Vector Search in PostgreSQL

Complete guide to pgvector for vector search in PostgreSQL including setup, indexing, distance functions, and optimization with Node.js.

Overview

pgvector is a PostgreSQL extension that adds vector similarity search directly to the database you already know and trust. Instead of bolting on a separate vector database like Pinecone or Weaviate alongside your PostgreSQL instance, pgvector lets you store embeddings in the same tables as your application data, query them with standard SQL, and join them against your existing relational schema. If you are already running PostgreSQL in production, this is the shortest path to production-grade vector search without adding operational complexity.

Prerequisites

  • PostgreSQL 13 or later (15+ recommended for best pgvector support)
  • Node.js 18 or later
  • The pg npm module for PostgreSQL connectivity
  • Basic familiarity with SQL and embedding concepts
  • An embedding source (OpenAI API, Cohere, or a local model like sentence-transformers)

What pgvector Is and Why It Matters

Every vector database pitch starts the same way: you need a specialized system to store and search embeddings. And for a while, that was true. But pgvector has changed the calculus entirely.

pgvector is an open-source PostgreSQL extension maintained by Andrew Kane. It adds a vector data type, distance operators, and approximate nearest neighbor (ANN) indexes to PostgreSQL. The extension is written in C, integrates with the PostgreSQL query planner, and supports ACID transactions. Your vectors participate in the same commits, rollbacks, and backups as the rest of your data.

The real power is not just storing vectors — it is querying them alongside relational data. You can write a single SQL query that finds the 10 most similar documents to a query embedding, filters by category, joins against a users table, and orders by a combination of semantic similarity and recency. Try doing that across two separate databases.
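
A hedged sketch of what that single query can look like, assuming a hypothetical author_id column, a users table, and an illustrative recency weight (note that a composite ORDER BY expression like this runs as an exact scan; the plain vector ORDER BY shown later is what the ANN indexes accelerate):

SELECT d.id, d.title, u.name AS author,
       d.embedding <=> $1::vector AS distance
FROM documents d
JOIN users u ON u.id = d.author_id
WHERE d.category = 'api-design'
ORDER BY (d.embedding <=> $1::vector)
         + 0.05 * EXTRACT(EPOCH FROM NOW() - d.created_at) / 86400.0  -- small penalty per day of age
LIMIT 10;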

I have shipped pgvector in production for semantic search, recommendation engines, and RAG (retrieval-augmented generation) pipelines. For datasets under 10 million vectors, it handles everything I have thrown at it. Beyond that scale, you start evaluating dedicated solutions. But most applications never get there.

Installing pgvector

From Source on Linux/macOS

cd /tmp
git clone --branch v0.8.0 https://github.com/pgvector/pgvector.git
cd pgvector
make
make install

You need the PostgreSQL development headers installed (postgresql-server-dev-15 on Debian/Ubuntu or postgresql@15 via Homebrew on macOS), and make install may need sudo depending on where PostgreSQL lives on your system.

Docker (Fastest for Development)

The pgvector/pgvector Docker image ships PostgreSQL with the extension pre-installed:

docker run -d \
  --name pgvector-dev \
  -e POSTGRES_PASSWORD=devpassword \
  -e POSTGRES_DB=vectordb \
  -p 5432:5432 \
  pgvector/pgvector:pg16

Enabling the Extension

Once installed, enable it in your database:

CREATE EXTENSION IF NOT EXISTS vector;

Verify the installation:

SELECT extversion FROM pg_extension WHERE extname = 'vector';
 extversion
------------
 0.8.0
(1 row)

Creating Vector Columns and Choosing Dimensions

Vectors are stored in columns with the vector type. You specify the dimensionality at column definition time:

CREATE TABLE documents (
  id SERIAL PRIMARY KEY,
  title TEXT NOT NULL,
  content TEXT,
  category TEXT,
  embedding vector(1536),
  created_at TIMESTAMPTZ DEFAULT NOW()
);

The number in parentheses is the vector dimension count. Common choices:

Model                                     Dimensions
OpenAI text-embedding-3-small             1536
OpenAI text-embedding-3-large             3072
Cohere embed-english-v3                   1024
sentence-transformers/all-MiniLM-L6-v2    384

Pick the dimension that matches your embedding model. You cannot mix dimensions in a single column. If you change models later, you are re-embedding everything. Plan accordingly.

pgvector's vector type supports up to 16,000 dimensions, but HNSW and IVFFlat indexes only support vectors with up to 2,000 dimensions. For higher-dimensional embeddings, reduce dimensions, leave the column unindexed, or use the half-precision halfvec type (added in 0.7.0), which can be indexed up to 4,000 dimensions. For most use cases, 1536 or fewer is the sweet spot.
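
For example, to index OpenAI's 3072-dimension text-embedding-3-large, one option (a sketch, assuming pgvector 0.7.0+ with the halfvec type) is a half-precision column:

CREATE TABLE documents_large (
  id SERIAL PRIMARY KEY,
  embedding halfvec(3072)  -- half precision: indexable beyond the 2,000-dim vector limit
);

CREATE INDEX ON documents_large
USING hnsw (embedding halfvec_cosine_ops);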

Inserting Vectors from Node.js

The pg module handles vector insertion cleanly. pgvector accepts vectors as string literals in the format [0.1, 0.2, 0.3, ...]:

var { Pool } = require("pg");

var pool = new Pool({
  connectionString: process.env.DATABASE_URL
});

function insertDocument(title, content, category, embedding) {
  var query = `
    INSERT INTO documents (title, content, category, embedding)
    VALUES ($1, $2, $3, $4)
    RETURNING id
  `;
  var vectorString = "[" + embedding.join(",") + "]";
  return pool.query(query, [title, content, category, vectorString]);
}

// Usage with an embedding from OpenAI
var embedding = [0.0023, -0.0142, 0.0312, /* ... 1536 floats total */];

insertDocument(
  "Understanding API Rate Limiting",
  "Rate limiting is a critical pattern...",
  "api-design",
  embedding
).then(function(result) {
  console.log("Inserted document ID:", result.rows[0].id);
}).catch(function(err) {
  console.error("Insert failed:", err.message);
});
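
For reference, here is a sketch of producing that embedding with the official openai npm package (v4-style SDK; the model choice and exact response shape are assumptions worth checking against the current SDK docs):

var OpenAI = require("openai");
var openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

function embedText(text) {
  return openai.embeddings.create({
    model: "text-embedding-3-small", // 1536 dimensions, matching the column definition
    input: text
  }).then(function(response) {
    return response.data[0].embedding; // array of 1536 floats
  });
}

embedText("Rate limiting is a critical pattern...")
  .then(function(embedding) {
    return insertDocument(
      "Understanding API Rate Limiting",
      "Rate limiting is a critical pattern...",
      "api-design",
      embedding
    );
  });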

For bulk inserts, use COPY or batch INSERT statements. Inserting 10,000 vectors one at a time is painfully slow. Batch them:

function bulkInsertDocuments(documents) {
  var values = [];
  var params = [];
  var paramIndex = 1;

  documents.forEach(function(doc) {
    var vectorString = "[" + doc.embedding.join(",") + "]";
    values.push(
      "($" + paramIndex++ +
      ", $" + paramIndex++ +
      ", $" + paramIndex++ +
      ", $" + paramIndex++ + ")"
    );
    params.push(doc.title, doc.content, doc.category, vectorString);
  });

  var query = "INSERT INTO documents (title, content, category, embedding) VALUES " +
    values.join(", ");

  return pool.query(query, params);
}
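
One caveat: PostgreSQL caps a single statement at 65,535 bind parameters, so at four parameters per row, batches beyond roughly 16,000 rows will fail. A chunking sketch around that limit (the chunk size is an illustrative choice):

function bulkInsertChunked(documents, chunkSize) {
  chunkSize = chunkSize || 5000; // 4 params/row keeps each statement well under the 65,535 cap
  var chunks = [];
  for (var i = 0; i < documents.length; i += chunkSize) {
    chunks.push(documents.slice(i, i + chunkSize));
  }
  // Run chunks sequentially so a single bulk load does not exhaust the pool
  return chunks.reduce(function(promise, chunk) {
    return promise.then(function() { return bulkInsertDocuments(chunk); });
  }, Promise.resolve());
}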

Distance Functions

pgvector supports three distance operators. Understanding when to use each one matters more than most tutorials let on.

Cosine Distance (<=>)

SELECT id, title, embedding <=> $1 AS distance
FROM documents
ORDER BY embedding <=> $1
LIMIT 10;

Cosine distance measures the angle between two vectors, ignoring magnitude. A distance of 0 means identical direction; 2 means opposite directions. This is the default choice for text embeddings because most embedding models normalize their output. If your embeddings come from OpenAI, Cohere, or sentence-transformers, use cosine distance.
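
A quick way to build intuition is to evaluate the operator on tiny hand-written vectors in psql:

SELECT '[1,0]'::vector <=> '[2,0]'::vector;  -- 0: same direction, magnitude ignored
SELECT '[1,0]'::vector <=> '[0,1]'::vector;  -- 1: orthogonal
SELECT '[1,0]'::vector <=> '[-1,0]'::vector; -- 2: opposite directions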

L2 (Euclidean) Distance (<->)

SELECT id, title, embedding <-> $1 AS distance
FROM documents
ORDER BY embedding <-> $1
LIMIT 10;

L2 distance measures the straight-line distance between two points in vector space. Use this when magnitude matters — for example, if your vectors represent physical measurements, sensor readings, or unnormalized feature vectors. For normalized text embeddings, L2 and cosine produce equivalent rankings, but cosine is conventional.

Inner Product (<#>)

SELECT id, title, (embedding <#> $1) * -1 AS similarity
FROM documents
ORDER BY embedding <#> $1
LIMIT 10;

The inner product operator returns the negative inner product (so that ORDER BY ascending works correctly for nearest-neighbor search). Higher inner product means more similar. Use this when you want to factor in vector magnitude as a signal — for example, in recommendation systems where a larger magnitude indicates stronger confidence.

My recommendation: Start with cosine distance (<=>). Switch to L2 or inner product only if you have a specific reason. I have used cosine for every production text search system I have built.

Basic Similarity Search Queries

A straightforward similarity search from Node.js:

function searchSimilar(queryEmbedding, limit) {
  var vectorString = "[" + queryEmbedding.join(",") + "]";
  var query = `
    SELECT id, title, category,
           embedding <=> $1::vector AS distance
    FROM documents
    ORDER BY embedding <=> $1::vector
    LIMIT $2
  `;
  return pool.query(query, [vectorString, limit || 10]);
}

searchSimilar(queryEmbedding, 5).then(function(result) {
  result.rows.forEach(function(row) {
    console.log(row.title, "- distance:", row.distance.toFixed(4));
  });
});

Output:

Understanding API Rate Limiting - distance: 0.0821
Building Resilient API Gateways - distance: 0.1247
REST API Versioning Strategies - distance: 0.1583
GraphQL vs REST Performance - distance: 0.2001
API Authentication Patterns - distance: 0.2134

Creating IVFFlat Indexes

Without an index, pgvector performs an exact nearest-neighbor scan — it compares your query vector against every single row. This is fine for a few thousand rows. At 100,000+ rows, you need an approximate index.

IVFFlat (Inverted File with Flat compression) partitions vectors into lists, then searches only the most relevant lists at query time:

CREATE INDEX idx_documents_embedding_ivfflat
ON documents
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);

The lists parameter controls how many partitions are created. The pgvector documentation recommends lists = rows / 1000 for up to 1 million rows, and sqrt(rows) for larger datasets.

Row Count      Recommended Lists
10,000         10
100,000        100
1,000,000      1,000
10,000,000     3,162

Critical rule: Build IVFFlat indexes after you have loaded your data, not before. IVFFlat clusters vectors into lists using k-means during index creation. If you create the index on an empty table, the clusters will be meaningless and recall will be terrible. Load your data first, then create the index.

The vector_cosine_ops operator class must match the distance function you use in queries. Options:

  • vector_cosine_ops for <=>
  • vector_l2_ops for <->
  • vector_ip_ops for <#>
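
IVFFlat has a query-time tuning knob as well: ivfflat.probes controls how many lists are searched (the default is 1). Raising it improves recall at the cost of latency; the pgvector docs suggest sqrt(lists) as a starting point:

SET ivfflat.probes = 10;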

Creating HNSW Indexes

HNSW (Hierarchical Navigable Small World) is the newer index type, available since pgvector 0.5.0. It builds a multi-layer graph structure that generally provides better recall than IVFFlat at the cost of higher memory usage and slower index builds:

CREATE INDEX idx_documents_embedding_hnsw
ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);

Parameters:

  • m: Maximum number of connections per node per layer. Higher values improve recall but increase memory usage and build time. Default is 16. Range 2-100. I use 16 for most workloads and bump to 32 for recall-critical applications.
  • ef_construction: Size of the dynamic candidate list during index construction. Higher values produce a better graph but take longer to build. Default is 64. Range 4-1000. I use 64 for development and 128-200 for production.

At query time, you can also tune ef_search per session:

SET hnsw.ef_search = 100;

Higher ef_search values improve recall at the cost of query latency. The default is 40.

HNSW vs IVFFlat: HNSW generally wins on recall quality and does not require rebuilding after inserts. IVFFlat builds faster and uses less memory. For most new projects, I recommend HNSW unless you have a specific memory constraint. HNSW can be built on an empty table and maintains quality as data is added — unlike IVFFlat.

Index Maintenance

After Bulk Inserts

After loading a large batch of data, run VACUUM ANALYZE to reclaim dead tuples and refresh table statistics so the query planner can make informed decisions:

VACUUM ANALYZE documents;

For IVFFlat indexes, if your data distribution has changed significantly since index creation, rebuild the index:

REINDEX INDEX idx_documents_embedding_ivfflat;

This re-clusters the vectors. HNSW indexes do not need this — they update incrementally.

Monitoring Index Size

SELECT pg_size_pretty(pg_relation_size('idx_documents_embedding_hnsw')) AS index_size;
 index_size
------------
 3268 MB
(1 row)

For 500,000 vectors at 1536 dimensions with HNSW (m=16), expect roughly 3-3.5 GB of index: the index stores a copy of every vector (500,000 × 1536 × 4 bytes ≈ 2.9 GB) plus its neighbor links. Plan your memory accordingly — the index should fit in shared_buffers or at minimum in the OS page cache for optimal performance.

Combining Vector Search with WHERE Clauses

This is where pgvector in PostgreSQL really shines over standalone vector databases. You can filter by any column alongside the vector search:

SELECT id, title, embedding <=> $1::vector AS distance
FROM documents
WHERE category = 'api-design'
  AND created_at > NOW() - INTERVAL '90 days'
ORDER BY embedding <=> $1::vector
LIMIT 10;

For filtered queries to use the vector index efficiently, you want a partial index:

CREATE INDEX idx_docs_api_embedding_hnsw
ON documents
USING hnsw (embedding vector_cosine_ops)
WHERE category = 'api-design';

If your filter cardinality is high (many distinct categories), partial indexes become impractical. In that case, rely on PostgreSQL's ability to combine a regular B-tree index scan with a sequential vector comparison. It is not as fast as a pure vector index scan, but it is correct and often fast enough.
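
pgvector 0.8.0 added another option for filtered queries: iterative index scans. When a filter discards too many of the index's candidates, the scan keeps pulling more until the LIMIT is satisfied. A sketch of enabling it, based on the 0.8.0 release notes (verify the setting names against the docs for your version):

SET hnsw.iterative_scan = relaxed_order;    -- off (default), strict_order, or relaxed_order
SET ivfflat.iterative_scan = relaxed_order; -- off (default) or relaxed_order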

Using pgvector with Connection Pooling

pgvector works fine with connection poolers like PgBouncer and pgpool-II in transaction mode. There is one catch: session-level settings like SET hnsw.ef_search = 100 do not persist across pooled connections.

Two extra pitfalls make the obvious fix fail: SET does not accept bind parameters, and the pg driver rejects multi-statement queries once parameters are involved. The reliable pattern is SET LOCAL, which scopes the setting to the current transaction, issued as its own statement inside an explicit transaction:

function searchInTransaction(queryEmbedding, efSearch, limit) {
  var vectorString = "[" + queryEmbedding.join(",") + "]";

  return pool.connect().then(function(client) {
    return client.query("BEGIN")
      .then(function() {
        return client.query("SET LOCAL hnsw.ef_search = " + parseInt(efSearch));
      })
      .then(function() {
        return client.query(
          "SELECT id, title, embedding <=> $1::vector AS distance " +
          "FROM documents ORDER BY embedding <=> $1::vector LIMIT $2",
          [vectorString, limit]
        );
      })
      .then(function(result) {
        return client.query("COMMIT").then(function() {
          client.release();
          return result;
        });
      })
      .catch(function(err) {
        return client.query("ROLLBACK").then(function() {
          client.release();
          throw err;
        });
      });
  });
}

Monitoring Vector Query Performance

Always check your query plans:

EXPLAIN (ANALYZE, BUFFERS)
SELECT id, title, embedding <=> '[0.01, -0.02, ...]'::vector AS distance
FROM documents
ORDER BY embedding <=> '[0.01, -0.02, ...]'::vector
LIMIT 10;

What you want to see:

 Limit  (cost=467.53..470.12 rows=10 width=48) (actual time=12.341..12.389 rows=10 loops=1)
   ->  Index Scan using idx_documents_embedding_hnsw on documents
         (cost=467.53..35241.53 rows=134217 width=48)
         (actual time=12.340..12.386 rows=10 loops=1)
         Order By: (embedding <=> '[0.01, -0.02, ...]'::vector)
         Buffers: shared hit=1847
 Planning Time: 0.182 ms
 Execution Time: 12.437 ms

What you do not want to see: Seq Scan or Sort. A sequential scan means the index is not being used. Common causes:

  1. No index exists on the embedding column
  2. The query uses a different distance operator than the index was built with
  3. The table is too small and the planner decides a sequential scan is cheaper (this is actually fine for small tables)
  4. The WHERE clause prevents index usage
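
A quick way to rule out cause 2 is to inspect the index definitions and confirm the operator class matches the operator in your query:

SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'documents';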

pgvector Limitations and When to Outgrow It

pgvector is excellent, but it is not infinite. Know the boundaries:

  • Scale ceiling: Performance degrades noticeably above 5-10 million vectors depending on dimensionality and hardware. Dedicated vector databases are optimized for billions of vectors.
  • No built-in sharding: pgvector runs on a single PostgreSQL instance. You can use Citus for distributed PostgreSQL, but that adds significant complexity.
  • Memory pressure: HNSW indexes must fit in memory for optimal performance. A 10-million-row table with 1536-dimension vectors and HNSW indexing will consume 60+ GB of RAM for the index alone, since the index stores a copy of every vector (10 million × 1536 dimensions × 4 bytes is about 61 GB before the graph's neighbor links).
  • No GPU acceleration: All computation is CPU-bound. Dedicated vector databases like Milvus can leverage GPU for distance calculations.
  • Index build time: Building an HNSW index on 1 million 1536-dimensional vectors takes 15-30 minutes depending on hardware. Plan for maintenance windows.

If you find yourself hitting these limits, Pinecone, Weaviate, Qdrant, and Milvus are the usual next steps. But do not jump to them prematurely. Most teams I have worked with are well under 1 million vectors.

Upgrading pgvector Versions

Upgrading pgvector is straightforward:

# Install the new version (from source)
cd /tmp/pgvector
git fetch origin
git checkout v0.8.0
make
make install

# In PostgreSQL
ALTER EXTENSION vector UPDATE;

After upgrading, rebuild your indexes to take advantage of new optimizations:

REINDEX INDEX idx_documents_embedding_hnsw;

Check the pgvector changelog before upgrading. Major versions occasionally change index formats or default parameters. Version 0.5.0 introduced HNSW. Version 0.7.0 added the half-precision halfvec type along with bit and sparse vector support.

pgvector on Managed PostgreSQL

DigitalOcean Managed Databases

DigitalOcean supports pgvector on managed PostgreSQL. Enable it from the database dashboard or via SQL:

CREATE EXTENSION vector;

Available on PostgreSQL 13+ managed databases. No additional configuration needed.

AWS RDS

RDS for PostgreSQL supports pgvector starting with PostgreSQL 15.2. Enable the extension the same way:

CREATE EXTENSION vector;

For Aurora PostgreSQL, pgvector is available on version 15.3+ and 14.8+.

Supabase

Supabase ships with pgvector pre-installed. It is already enabled by default — you can start creating vector columns immediately. Supabase also provides helper functions in their JavaScript client library, but the raw SQL approach works identically.

Neon

Neon supports pgvector out of the box on all plans. Given Neon's serverless autoscaling architecture, it handles bursty vector search workloads particularly well.

Complete Working Example

Here is a full Express.js application that sets up pgvector, stores document embeddings, creates optimized indexes, and serves similarity search queries with category filtering.

var express = require("express");
var { Pool } = require("pg");

var app = express();
app.use(express.json({ limit: "2mb" }));

var pool = new Pool({
  connectionString: process.env.DATABASE_URL,
  max: 20
});

// Initialize database schema and indexes
function initializeDatabase() {
  return pool.query("CREATE EXTENSION IF NOT EXISTS vector")
    .then(function() {
      return pool.query(`
        CREATE TABLE IF NOT EXISTS documents (
          id SERIAL PRIMARY KEY,
          title TEXT NOT NULL,
          content TEXT,
          category TEXT,
          embedding vector(1536),
          created_at TIMESTAMPTZ DEFAULT NOW(),
          updated_at TIMESTAMPTZ DEFAULT NOW()
        )
      `);
    })
    .then(function() {
      return pool.query(`
        CREATE INDEX IF NOT EXISTS idx_documents_category
        ON documents (category)
      `);
    })
    .then(function() {
      console.log("Database schema initialized");
    });
}

// Build HNSW index (call after loading initial data)
function buildSearchIndex() {
  console.log("Building HNSW index... this may take a few minutes");
  var startTime = Date.now();

  return pool.query(`
    DROP INDEX IF EXISTS idx_documents_embedding_hnsw
  `).then(function() {
    return pool.query(`
      CREATE INDEX idx_documents_embedding_hnsw
      ON documents
      USING hnsw (embedding vector_cosine_ops)
      WITH (m = 16, ef_construction = 128)
    `);
  }).then(function() {
    return pool.query("VACUUM ANALYZE documents");
  }).then(function() {
    var elapsed = ((Date.now() - startTime) / 1000).toFixed(1);
    console.log("HNSW index built in " + elapsed + "s");
  });
}

// Insert a single document
app.post("/api/documents", function(req, res) {
  var body = req.body;

  if (!body.title || !body.embedding) {
    return res.status(400).json({ error: "title and embedding are required" });
  }

  if (body.embedding.length !== 1536) {
    return res.status(400).json({
      error: "embedding must have 1536 dimensions, got " + body.embedding.length
    });
  }

  var vectorString = "[" + body.embedding.join(",") + "]";
  var query = `
    INSERT INTO documents (title, content, category, embedding)
    VALUES ($1, $2, $3, $4)
    RETURNING id, title, category, created_at
  `;
  var params = [body.title, body.content || null, body.category || null, vectorString];

  pool.query(query, params)
    .then(function(result) {
      res.status(201).json(result.rows[0]);
    })
    .catch(function(err) {
      console.error("Insert error:", err.message);
      res.status(500).json({ error: "Failed to insert document" });
    });
});

// Bulk insert documents
app.post("/api/documents/bulk", function(req, res) {
  var documents = req.body.documents;

  if (!Array.isArray(documents) || documents.length === 0) {
    return res.status(400).json({ error: "documents array is required" });
  }

  var values = [];
  var params = [];
  var paramIndex = 1;

  documents.forEach(function(doc) {
    var vectorString = "[" + doc.embedding.join(",") + "]";
    values.push(
      "($" + paramIndex++ +
      ", $" + paramIndex++ +
      ", $" + paramIndex++ +
      ", $" + paramIndex++ + ")"
    );
    params.push(doc.title, doc.content || null, doc.category || null, vectorString);
  });

  var query = "INSERT INTO documents (title, content, category, embedding) VALUES " +
    values.join(", ") + " RETURNING id, title";

  pool.query(query, params)
    .then(function(result) {
      res.status(201).json({
        inserted: result.rowCount,
        documents: result.rows
      });
    })
    .catch(function(err) {
      console.error("Bulk insert error:", err.message);
      res.status(500).json({ error: "Bulk insert failed" });
    });
});

// Similarity search with optional category filter
app.post("/api/search", function(req, res) {
  var body = req.body;

  if (!body.embedding) {
    return res.status(400).json({ error: "embedding is required" });
  }

  var vectorString = "[" + body.embedding.join(",") + "]";
  var limit = body.limit || 10;
  var category = body.category || null;
  var maxDistance = body.maxDistance || 1.0;

  var query;
  var params;

  if (category) {
    query = `
      SELECT id, title, content, category,
             embedding <=> $1::vector AS distance,
             created_at
      FROM documents
      WHERE category = $2
        AND embedding <=> $1::vector < $3
      ORDER BY embedding <=> $1::vector
      LIMIT $4
    `;
    params = [vectorString, category, maxDistance, limit];
  } else {
    query = `
      SELECT id, title, content, category,
             embedding <=> $1::vector AS distance,
             created_at
      FROM documents
      WHERE embedding <=> $1::vector < $2
      ORDER BY embedding <=> $1::vector
      LIMIT $3
    `;
    params = [vectorString, maxDistance, limit];
  }

  var startTime = Date.now();

  pool.query(query, params)
    .then(function(result) {
      var elapsed = Date.now() - startTime;
      res.json({
        results: result.rows.map(function(row) {
          return {
            id: row.id,
            title: row.title,
            content: row.content,
            category: row.category,
            distance: parseFloat(row.distance.toFixed(4)),
            created_at: row.created_at
          };
        }),
        count: result.rowCount,
        query_time_ms: elapsed
      });
    })
    .catch(function(err) {
      console.error("Search error:", err.message);
      res.status(500).json({ error: "Search failed" });
    });
});

// Build/rebuild index endpoint
app.post("/api/admin/build-index", function(req, res) {
  buildSearchIndex()
    .then(function() {
      res.json({ status: "Index built successfully" });
    })
    .catch(function(err) {
      console.error("Index build error:", err.message);
      res.status(500).json({ error: "Index build failed" });
    });
});

// Table statistics endpoint
app.get("/api/admin/stats", function(req, res) {
  var queries = [
    pool.query("SELECT COUNT(*) AS total FROM documents"),
    pool.query(`
      SELECT indexname, pg_size_pretty(pg_relation_size(indexname::regclass)) AS size
      FROM pg_indexes
      WHERE tablename = 'documents'
    `),
    pool.query("SELECT pg_size_pretty(pg_total_relation_size('documents')) AS table_size")
  ];

  Promise.all(queries)
    .then(function(results) {
      res.json({
        document_count: parseInt(results[0].rows[0].total),
        indexes: results[1].rows,
        total_table_size: results[2].rows[0].table_size
      });
    })
    .catch(function(err) {
      console.error("Stats error:", err.message);
      res.status(500).json({ error: "Failed to get stats" });
    });
});

var PORT = process.env.PORT || 3000;

initializeDatabase().then(function() {
  app.listen(PORT, function() {
    console.log("Vector search server running on port " + PORT);
  });
}).catch(function(err) {
  console.error("Failed to initialize:", err.message);
  process.exit(1);
});

Save this as server.js, install dependencies, and run:

npm init -y
npm install express pg
DATABASE_URL=postgresql://postgres:devpassword@localhost:5432/vectordb node server.js

Test with curl:

# Insert a document (embedding truncated for readability)
curl -X POST http://localhost:3000/api/documents \
  -H "Content-Type: application/json" \
  -d '{
    "title": "API Rate Limiting Strategies",
    "content": "Rate limiting protects your API from abuse...",
    "category": "api-design",
    "embedding": [0.0023, -0.0142, 0.0312, ...]
  }'

# Search for similar documents
curl -X POST http://localhost:3000/api/search \
  -H "Content-Type: application/json" \
  -d '{
    "embedding": [0.0019, -0.0138, 0.0287, ...],
    "category": "api-design",
    "limit": 5,
    "maxDistance": 0.5
  }'

Example search response:

{
  "results": [
    {
      "id": 1,
      "title": "API Rate Limiting Strategies",
      "content": "Rate limiting protects your API from abuse...",
      "category": "api-design",
      "distance": 0.0821,
      "created_at": "2026-02-10T14:30:00.000Z"
    },
    {
      "id": 14,
      "title": "Building Resilient API Gateways",
      "content": "An API gateway serves as the entry point...",
      "category": "api-design",
      "distance": 0.1247,
      "created_at": "2026-02-08T09:15:00.000Z"
    }
  ],
  "count": 2,
  "query_time_ms": 14
}

Common Issues and Troubleshooting

1. Extension Not Found

ERROR: could not open extension control file
"/usr/share/postgresql/16/extension/vector.control": No such file or directory

pgvector is not installed on your PostgreSQL server. The extension must be compiled and installed at the system level before you can CREATE EXTENSION. If using Docker, make sure you are using the pgvector/pgvector image, not the plain postgres image.

2. Wrong Dimension Count

ERROR: expected 1536 dimensions, not 384

You tried to insert a vector with the wrong number of dimensions for the column. This happens when you switch embedding models without updating your schema. The column was defined as vector(1536) but you are inserting a 384-dimension vector. Either re-create the column with the correct dimension or re-embed your content with the matching model.

3. Index Not Being Used

EXPLAIN shows: Seq Scan on documents

Run EXPLAIN ANALYZE on your query. If you see a sequential scan on a large table, check these causes in order:

  • The distance operator in your query does not match the operator class of your index (e.g., using <-> with a vector_cosine_ops index)
  • Your table has fewer than a few thousand rows and the planner determines a sequential scan is faster
  • You need to run ANALYZE documents so the planner has accurate statistics
  • For IVFFlat: you created the index on an empty table. Rebuild it with REINDEX INDEX index_name

4. Out of Memory During Index Build

ERROR: could not resize shared memory segment: No space left on device

HNSW index builds are memory-intensive. In Docker, this particular shared-memory error often means the container's /dev/shm (64 MB by default) is too small for a parallel index build; raise it with docker run --shm-size=1g. For large tables, also increase maintenance_work_mem before building:

SET maintenance_work_mem = '2GB';
CREATE INDEX idx_documents_embedding_hnsw
ON documents USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);

On managed databases where you cannot modify system settings, reduce ef_construction or m to lower memory requirements, or build the index in a maintenance window with elevated resource limits.

5. Slow Queries After Bulk Insert

Query time jumps from 15ms to 800ms after inserting 100,000 rows

After large bulk inserts, the table statistics and index are stale. Run:

VACUUM ANALYZE documents;

For IVFFlat indexes, the cluster boundaries may no longer reflect the data distribution. Rebuild:

REINDEX INDEX idx_documents_embedding_ivfflat;

HNSW indexes handle incremental inserts better, but VACUUM ANALYZE is still important for the query planner.

6. Connection Pool Exhaustion During Index Builds

Index creation acquires a SHARE lock on the table, which blocks writes. If your application has a write-heavy workload and a limited connection pool, index builds will cause connection timeouts.

Use CREATE INDEX CONCURRENTLY to build the index without blocking writes:

CREATE INDEX CONCURRENTLY idx_documents_embedding_hnsw
ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 128);

This takes longer but does not block DML operations. Note that CONCURRENTLY cannot be used inside a transaction block.

Best Practices

  • Match your operator class to your distance function. If you use <=> (cosine) in queries, your index must use vector_cosine_ops. Mismatches silently fall back to sequential scans, and your queries will be orders of magnitude slower on large tables.

  • Use HNSW for new projects. HNSW provides better recall out of the box, handles incremental inserts gracefully, and does not require rebuilding after data loads. IVFFlat still makes sense when memory is constrained or you need faster index builds on very large tables.

  • Store embeddings alongside the data they represent. Do not create a separate embeddings table with foreign keys. Keeping the vector in the same row as the source document eliminates joins and simplifies your queries. PostgreSQL handles wide rows efficiently.

  • Set a distance threshold, not just a limit. Returning the top 10 results is meaningless if results 6-10 have cosine distances above 0.5. Always include a WHERE embedding <=> query < threshold clause to filter out irrelevant matches.

  • Run VACUUM ANALYZE after bulk operations. The PostgreSQL query planner relies on table statistics to choose efficient query plans. Stale statistics after large inserts lead to poor plan choices and degraded performance.

  • Monitor index memory usage. HNSW indexes can grow large. A table with 1 million 1536-dimension vectors and HNSW (m=16) will have an index around 5-8 GB. Ensure your shared_buffers and available RAM can accommodate this. Use pg_relation_size() to track growth.

  • Build indexes with CONCURRENTLY in production. Standard CREATE INDEX blocks writes. For tables that serve live traffic, always use CREATE INDEX CONCURRENTLY to avoid downtime, even though it takes longer.

  • Normalize your vectors before insertion if your model does not. Cosine distance itself ignores magnitude, but with normalized vectors the cheaper inner product operator (<#>) produces the same rankings, and L2 and cosine become interchangeable. Most modern embedding APIs (OpenAI, Cohere) return normalized vectors. If you are using a local model, verify normalization or add it in your insertion pipeline.

  • Test recall, not just latency. Approximate nearest neighbor indexes trade accuracy for speed. After building an index, compare your ANN results against exact results (a sequential scan) on a sample of queries; a measurement sketch follows this list. Aim for 95%+ recall at your target latency.

  • Pin your pgvector version. Different pgvector versions can have different index formats and behaviors. Pin the version in your Docker images and infrastructure-as-code. Test upgrades in staging before applying to production.
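
A minimal recall-measurement sketch under the schema from this guide: it runs the same query twice, once through the ANN index and once as an exact scan (by disabling index scans inside a transaction), then reports the overlap. The function name and the enable_indexscan trick are illustrative choices, not part of pgvector itself; client is a dedicated pg client from pool.connect(), since SET LOCAL must run on the same connection as the query.

function measureRecall(client, queryEmbedding, k) {
  var vectorString = "[" + queryEmbedding.join(",") + "]";
  var sql = "SELECT id FROM documents ORDER BY embedding <=> $1::vector LIMIT $2";
  var approxRows;

  // Approximate results: served by the HNSW/IVFFlat index
  return client.query(sql, [vectorString, k])
    .then(function(approx) {
      approxRows = approx.rows;
      // Exact results: force a sequential scan for this transaction only
      return client.query("BEGIN");
    })
    .then(function() {
      return client.query("SET LOCAL enable_indexscan = off");
    })
    .then(function() {
      return client.query(sql, [vectorString, k]);
    })
    .then(function(exact) {
      return client.query("COMMIT").then(function() {
        var exactIds = {};
        exact.rows.forEach(function(r) { exactIds[r.id] = true; });
        var hits = approxRows.filter(function(r) { return exactIds[r.id]; }).length;
        return hits / k; // recall@k: fraction of true neighbors the index found
      });
    });
}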
