Semantic Search Implementation from Scratch
Build a semantic search engine from scratch with embeddings, pgvector, ranking, and an Express.js API in Node.js.
Semantic search finds results based on meaning rather than exact keyword matches, and it has become practical to build from scratch thanks to affordable embedding APIs and pgvector. This article walks through every piece of the pipeline — from generating embeddings and storing vectors to ranking results and exposing a search API — so you can build a production-grade semantic search engine in Node.js without relying on a managed vector database.
Prerequisites
- Node.js 18 or later installed
- PostgreSQL 15+ with the pgvector extension enabled
- An OpenAI API key (for generating embeddings)
- Familiarity with Express.js and SQL
- Basic understanding of what vector embeddings are (arrays of floats that represent meaning)
Install the required packages:
npm install express pg openai body-parser uuid
Enable pgvector in your PostgreSQL instance:
CREATE EXTENSION IF NOT EXISTS vector;
How Semantic Search Differs from Keyword Search
Traditional keyword search (full-text search, LIKE queries, even Elasticsearch) works by matching tokens. If a user searches for "how to deploy containers," keyword search looks for documents containing those exact words. It misses documents that talk about "shipping Docker images to production" even though the meaning is identical.
Semantic search converts both the query and every document into high-dimensional vectors (embeddings) that capture meaning. Two pieces of text with similar meaning end up close together in vector space, regardless of whether they share any words. The search becomes a nearest-neighbor lookup in that space.
The tradeoff is cost: you need to generate embeddings for every document upfront and for every query at search time. But for most applications, the quality improvement is dramatic. Users find what they actually meant, not just what they literally typed.
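To make "close together in vector space" concrete, here is a minimal sketch of cosine similarity in plain JavaScript. pgvector computes the same measure natively, so you will not write this yourself in production, but it shows what the database is doing under the hood:
function cosineSimilarity(a, b) {
  var dot = 0, normA = 0, normB = 0;
  for (var i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy 3-dimensional vectors; real embeddings have hundreds of dimensions
console.log(cosineSimilarity([0.1, 0.9, 0.2], [0.15, 0.85, 0.25])); // ~0.996, very similar
console.log(cosineSimilarity([0.1, 0.9, 0.2], [0.9, 0.05, 0.1]));   // ~0.18, dissimilar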
The Semantic Search Pipeline
Every semantic search system follows the same core pipeline:
- Index time: Take each document, generate an embedding vector, store it alongside the document metadata.
- Query time: Take the user's query, generate an embedding vector, find the nearest vectors in the database, rank and return results.
That is the entire architecture. Everything else — filtering, re-ranking, analytics — layers on top of this loop.
Generating Embeddings for Your Document Corpus
The first step is turning your documents into vectors. OpenAI's text-embedding-3-small model is a solid choice: it produces 1536-dimensional vectors, costs fractions of a cent per call, and handles up to 8191 tokens of input.
var OpenAI = require("openai");
var openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
function generateEmbedding(text, callback) {
openai.embeddings.create({
model: "text-embedding-3-small",
input: text.substring(0, 8000) // rough character cap; the model accepts at most 8191 tokens
}).then(function(response) {
var embedding = response.data[0].embedding;
callback(null, embedding);
}).catch(function(err) {
callback(err);
});
}
For documents with multiple fields (title, body, tags), you want to create a combined text representation before embedding:
function buildDocumentText(doc) {
var parts = [];
if (doc.title) parts.push(doc.title);
if (doc.tags && doc.tags.length) parts.push("Tags: " + doc.tags.join(", "));
if (doc.body) parts.push(doc.body);
return parts.join("\n\n");
}
Field order matters. The title goes first: it is the densest summary of the document, it is guaranteed to survive truncation, and in practice earlier text tends to influence the embedding more. Tags come next, and the body fills in the detail. This ordering produces better search quality than concatenating fields in arbitrary order.
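For example, a hypothetical document produces a combined text like this:
var doc = {
  title: "Getting Started with Docker Compose",
  tags: ["docker", "containers"],
  body: "Docker Compose is a tool for defining and running multi-container applications..."
};

buildDocumentText(doc);
// "Getting Started with Docker Compose
//
// Tags: docker, containers
//
// Docker Compose is a tool for defining and running multi-container applications..."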
Storing Vectors in PostgreSQL with pgvector
pgvector adds a vector column type to PostgreSQL. You store embeddings directly alongside your relational data — no separate vector database required.
CREATE TABLE documents (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
title TEXT NOT NULL,
body TEXT,
category TEXT,
author TEXT,
tags TEXT[],
embedding vector(1536),
created_at TIMESTAMP DEFAULT NOW(),
updated_at TIMESTAMP DEFAULT NOW()
);
CREATE INDEX ON documents USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
The ivfflat index is an approximate nearest-neighbor index. The lists parameter controls how many clusters IVFFlat creates — a good starting point is the square root of your row count. For 10,000 documents, use lists = 100. For 1 million, use lists = 1000.
Here is the Node.js code to insert a document with its embedding:
var { Pool } = require("pg");
var pool = new Pool({
connectionString: process.env.DATABASE_URL
});
function indexDocument(doc, callback) {
var text = buildDocumentText(doc);
generateEmbedding(text, function(err, embedding) {
if (err) return callback(err);
var query = `
INSERT INTO documents (title, body, category, author, tags, embedding)
VALUES ($1, $2, $3, $4, $5, $6)
RETURNING id
`;
var values = [
doc.title,
doc.body,
doc.category,
doc.author,
doc.tags || [],
"[" + embedding.join(",") + "]"
];
pool.query(query, values, function(err, result) {
if (err) return callback(err);
callback(null, result.rows[0].id);
});
});
}
Note the embedding format: pgvector expects a string like [0.1,0.2,0.3,...]. You convert the JavaScript array to that format before inserting.
Implementing Cosine Similarity Search
Cosine similarity measures how similar two vectors are on a scale from -1 to 1, where 1 means identical direction. pgvector provides the <=> operator for cosine distance (which is 1 - cosine_similarity), so lower distance means more similar.
function searchDocuments(queryText, limit, callback) {
generateEmbedding(queryText, function(err, queryEmbedding) {
if (err) return callback(err);
var embeddingStr = "[" + queryEmbedding.join(",") + "]";
var query = `
SELECT
id, title, body, category, author, tags,
1 - (embedding <=> $1::vector) AS similarity
FROM documents
WHERE 1 - (embedding <=> $1::vector) > 0.3
ORDER BY embedding <=> $1::vector
LIMIT $2
`;
pool.query(query, [embeddingStr, limit || 10], function(err, result) {
if (err) return callback(err);
callback(null, result.rows);
});
});
}
The WHERE clause filters out results with similarity below 0.3, which in practice eliminates irrelevant noise. You can tune this threshold — 0.3 is a reasonable starting point for text-embedding-3-small. If you are getting too few results, lower it to 0.2. If you are getting junk results, raise it to 0.4.
Building a Query Pipeline
A real search system does more than a single vector lookup. Here is a complete query pipeline that handles embedding generation, search, and post-processing:
function searchPipeline(options, callback) {
var queryText = options.query;
var filters = options.filters || {};
var limit = options.limit || 10;
var offset = options.offset || 0;
var minSimilarity = options.minSimilarity || 0.3;
generateEmbedding(queryText, function(err, queryEmbedding) {
if (err) return callback(err);
var embeddingStr = "[" + queryEmbedding.join(",") + "]";
var conditions = ["1 - (embedding <=> $1::vector) > $2"];
var params = [embeddingStr, minSimilarity];
var paramIndex = 3;
if (filters.category) {
conditions.push("category = $" + paramIndex);
params.push(filters.category);
paramIndex++;
}
if (filters.author) {
conditions.push("author = $" + paramIndex);
params.push(filters.author);
paramIndex++;
}
if (filters.dateFrom) {
conditions.push("created_at >= $" + paramIndex);
params.push(filters.dateFrom);
paramIndex++;
}
if (filters.dateTo) {
conditions.push("created_at <= $" + paramIndex);
params.push(filters.dateTo);
paramIndex++;
}
if (filters.tags && filters.tags.length) {
conditions.push("tags && $" + paramIndex);
params.push(filters.tags);
paramIndex++;
}
var whereClause = conditions.join(" AND ");
params.push(limit);
params.push(offset);
var query = `
SELECT
id, title, body, category, author, tags, created_at,
1 - (embedding <=> $1::vector) AS similarity
FROM documents
WHERE ${whereClause}
ORDER BY embedding <=> $1::vector
LIMIT $${paramIndex} OFFSET $${paramIndex + 1}
`;
pool.query(query, params, function(err, result) {
if (err) return callback(err);
var results = result.rows.map(function(row) {
return {
id: row.id,
title: row.title,
snippet: row.body ? row.body.substring(0, 200) + "..." : "",
category: row.category,
author: row.author,
tags: row.tags,
similarity: parseFloat(row.similarity.toFixed(4)),
createdAt: row.created_at
};
});
callback(null, {
query: queryText,
total: results.length,
results: results
});
});
});
}
This pipeline supports filtering by category, author, date range, and tags — all combined with the vector similarity search. The filters happen in SQL alongside the vector distance calculation, which is critical for performance. Do not filter in application code after fetching all results.
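For instance, a call like the following (values are illustrative) ends up combining the vector distance and the metadata filters in a single SQL statement:
searchPipeline({
  query: "deploying containers",
  filters: { category: "devops", tags: ["docker"] },
  limit: 5
}, function(err, response) {
  if (err) throw err;
  console.log(response.results);
});
// Generated WHERE clause:
//   1 - (embedding <=> $1::vector) > $2 AND category = $3 AND tags && $4
// Params: [queryEmbedding, 0.3, "devops", ["docker"]], then LIMIT $5 OFFSET $6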
Result Ranking and Relevance Scoring
Raw cosine similarity is a good starting signal, but you can improve ranking by combining it with other factors:
function computeRankScore(result, query) {
var score = result.similarity * 0.7;
var titleLower = (result.title || "").toLowerCase();
var queryLower = query.toLowerCase();
if (titleLower.indexOf(queryLower) !== -1) {
score += 0.15;
}
var ageInDays = (Date.now() - new Date(result.createdAt).getTime()) / 86400000;
var recencyBoost = Math.max(0, 0.1 * (1 - ageInDays / 365));
score += recencyBoost;
var popularityBoost = Math.min(0.05, (result.viewCount || 0) / 10000 * 0.05);
score += popularityBoost;
return Math.min(1, score);
}
function rerankResults(results, query) {
results.forEach(function(result) {
result.rankScore = computeRankScore(result, query);
});
results.sort(function(a, b) {
return b.rankScore - a.rankScore;
});
return results;
}
The weighting here is opinionated: 70% semantic similarity, 15% title match bonus, 10% recency, 5% popularity. Tune these weights based on your users' behavior. Track click-through rates to validate whether your ranking actually helps.
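As a starting point for that validation, a query like this one (it assumes the search_logs table defined in the analytics section below) computes the weekly click-through rate:
-- Click-through rate over the past 7 days
SELECT
  COUNT(clicked_result_id)::float / NULLIF(COUNT(*), 0) AS click_through_rate
FROM search_logs
WHERE created_at > NOW() - INTERVAL '7 days';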
Implementing Search-as-You-Type with Debouncing
On the client side, you want to trigger searches as the user types without hammering your API on every keystroke. Debouncing delays the API call until the user pauses typing:
var searchInput = document.getElementById("search-input");
var resultsContainer = document.getElementById("results");
function debounce(fn, delay) {
  // Keep the timer inside the closure so each debounced function is independent
  var timer = null;
  return function() {
    var args = arguments;
    var context = this;
    clearTimeout(timer);
    timer = setTimeout(function() {
      fn.apply(context, args);
    }, delay);
  };
}
var performSearch = debounce(function(query) {
if (query.length < 3) {
resultsContainer.innerHTML = "";
return;
}
var xhr = new XMLHttpRequest();
xhr.open("GET", "/api/search?q=" + encodeURIComponent(query));
xhr.onload = function() {
if (xhr.status === 200) {
var data = JSON.parse(xhr.responseText);
renderResults(data.results);
}
};
xhr.send();
}, 300);
searchInput.addEventListener("input", function() {
performSearch(this.value.trim());
});
// Escape user-controlled fields before injecting into innerHTML to prevent XSS
function escapeHtml(str) {
  return String(str).replace(/[&<>"']/g, function(ch) {
    return { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" }[ch];
  });
}
function renderResults(results) {
  var html = "";
  results.forEach(function(result) {
    html += '<div class="search-result">';
    html += '<h3><a href="/documents/' + result.id + '">' + escapeHtml(result.title) + '</a></h3>';
    html += '<p>' + escapeHtml(result.snippet) + '</p>';
    html += '<span class="similarity">Relevance: ' + (result.similarity * 100).toFixed(1) + '%</span>';
    html += '</div>';
  });
  resultsContainer.innerHTML = html || '<p>No results found.</p>';
}
The 300ms debounce delay is a sweet spot: fast enough to feel responsive, slow enough to avoid wasted API calls. For production, you should also add a loading indicator and cancel in-flight requests when a new search fires.
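Here is a sketch of request cancellation in the same XHR style as above (with fetch, an AbortController does the equivalent job). Keeping a reference to the active request and aborting it before firing a new one prevents a slow, stale response from overwriting newer results:
var activeRequest = null;

function fetchResults(query) {
  // Abort any in-flight search so an older response can't arrive late and win
  if (activeRequest) {
    activeRequest.abort();
  }
  var xhr = new XMLHttpRequest();
  activeRequest = xhr;
  xhr.open("GET", "/api/search?q=" + encodeURIComponent(query));
  xhr.onload = function() {
    activeRequest = null;
    if (xhr.status === 200) {
      renderResults(JSON.parse(xhr.responseText).results);
    }
  };
  xhr.send();
}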
Handling Multi-Field Documents
When your documents have distinct fields that carry different semantic weight, you have two options: single-vector or multi-vector.
Single-vector (recommended for most cases): Concatenate fields with appropriate weighting and create one embedding per document. This is what buildDocumentText does above. It is simpler, cheaper to store, and faster to query.
Multi-vector: Create separate embeddings for title, body, and tags, then query against each and merge results. This is more expensive but gives you finer control:
function multiFieldSearch(queryText, callback) {
generateEmbedding(queryText, function(err, queryEmbedding) {
if (err) return callback(err);
var embeddingStr = "[" + queryEmbedding.join(",") + "]";
var query = `
SELECT
id, title, body, category,
1 - (title_embedding <=> $1::vector) AS title_sim,
1 - (body_embedding <=> $1::vector) AS body_sim,
(0.5 * (1 - (title_embedding <=> $1::vector)) +
0.5 * (1 - (body_embedding <=> $1::vector))) AS combined_sim
FROM documents
WHERE 1 - (title_embedding <=> $1::vector) > 0.25
OR 1 - (body_embedding <=> $1::vector) > 0.25
ORDER BY combined_sim DESC
LIMIT 20
`;
pool.query(query, [embeddingStr], function(err, result) {
if (err) return callback(err);
callback(null, result.rows);
});
});
}
In practice, the single-vector approach handles 90% of use cases well. Only reach for multi-vector if you have evidence that it improves your specific search quality.
Query Expansion Techniques
Sometimes the user's query is too short or vague to produce a good embedding. Query expansion adds related terms to improve recall:
function expandQuery(query, callback) {
openai.chat.completions.create({
model: "gpt-4o-mini",
messages: [
{
role: "system",
content: "Given a search query, output an expanded version that includes synonyms and related terms. Return ONLY the expanded query text, nothing else. Keep it under 100 words."
},
{
role: "user",
content: query
}
],
max_tokens: 150,
temperature: 0.3
}).then(function(response) {
var expanded = response.choices[0].message.content.trim();
callback(null, expanded);
}).catch(function(err) {
callback(null, query);
});
}
function searchWithExpansion(query, options, callback) {
expandQuery(query, function(err, expandedQuery) {
var finalQuery = err ? query : expandedQuery;
searchPipeline({
query: finalQuery,
filters: options.filters,
limit: options.limit
}, callback);
});
}
Query expansion costs an extra LLM call per search, so use it selectively. A good strategy is to only expand queries shorter than 5 words, or to trigger it when the initial search returns fewer than 3 results above the similarity threshold.
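Here is a sketch of that selective strategy, built on the searchPipeline and expandQuery functions above. The five-word cutoff and three-result threshold are the suggestions from this section, not fixed rules:
function smartSearch(query, options, callback) {
  var wordCount = query.trim().split(/\s+/).length;

  function runSearch(q, done) {
    searchPipeline({ query: q, filters: options.filters, limit: options.limit }, done);
  }

  if (wordCount >= 5) {
    // Long queries usually embed well on their own; skip the extra LLM call
    return runSearch(query, callback);
  }
  runSearch(query, function(err, initial) {
    if (err) return callback(err);
    if (initial.results.length >= 3) return callback(null, initial);
    // Short query with thin results: expand and retry once
    expandQuery(query, function(expandErr, expanded) {
      if (expandErr || expanded === query) return callback(null, initial);
      runSearch(expanded, function(retryErr, retried) {
        if (retryErr) return callback(null, initial);
        callback(null, retried.results.length > initial.results.length ? retried : initial);
      });
    });
  });
}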
Re-Ranking Results with an LLM
After retrieving candidates via vector similarity, you can re-rank them using an LLM that evaluates how well each result actually answers the query. This is more expensive but dramatically improves quality for the top results:
function rerankWithLLM(query, results, callback) {
if (results.length === 0) return callback(null, results);
var candidates = results.slice(0, 20).map(function(r, i) {
return "Document " + (i + 1) + ": " + r.title + "\n" + (r.snippet || "");
}).join("\n\n");
openai.chat.completions.create({
model: "gpt-4o-mini",
messages: [
{
role: "system",
content: "You are a search relevance evaluator. Given a query and candidate documents, return a JSON array of document numbers ordered by relevance to the query. Return ONLY the JSON array, e.g. [3,1,5,2,4]."
},
{
role: "user",
content: "Query: " + query + "\n\nCandidates:\n" + candidates
}
],
max_tokens: 200,
temperature: 0
}).then(function(response) {
try {
var ranking = JSON.parse(response.choices[0].message.content.trim());
var reranked = [];
ranking.forEach(function(docNum) {
var idx = docNum - 1;
if (results[idx]) {
reranked.push(results[idx]);
}
});
results.forEach(function(r) {
if (reranked.indexOf(r) === -1) {
reranked.push(r);
}
});
callback(null, reranked);
} catch (e) {
callback(null, results);
}
}).catch(function(err) {
callback(null, results);
});
}
The fallback behavior here is important: if the LLM call fails or returns unparseable JSON, you return the original vector-ranked results. Never let a re-ranking failure break your search.
Search Analytics
Tracking search behavior is essential for improving your search over time. Log queries, result counts, and click-throughs:
CREATE TABLE search_logs (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
query TEXT NOT NULL,
expanded_query TEXT,
result_count INTEGER,
top_similarity FLOAT,
filters JSONB,
clicked_result_id UUID,
clicked_position INTEGER,
session_id TEXT,
created_at TIMESTAMP DEFAULT NOW()
);
CREATE INDEX idx_search_logs_query ON search_logs (query);
CREATE INDEX idx_search_logs_created ON search_logs (created_at);
function logSearch(data, callback) {
var query = `
INSERT INTO search_logs
(query, expanded_query, result_count, top_similarity, filters, session_id)
VALUES ($1, $2, $3, $4, $5, $6)
RETURNING id
`;
var values = [
data.query,
data.expandedQuery || null,
data.resultCount,
data.topSimilarity || null,
JSON.stringify(data.filters || {}),
data.sessionId
];
pool.query(query, values, function(err, result) {
if (err) {
console.error("Failed to log search:", err.message);
return callback(null);
}
callback(result.rows[0].id);
});
}
function logClick(searchLogId, resultId, position, callback) {
var query = `
UPDATE search_logs
SET clicked_result_id = $1, clicked_position = $2
WHERE id = $3
`;
pool.query(query, [resultId, position, searchLogId], function(err) {
if (callback) callback(err);
});
}
Useful analytics queries to run periodically:
-- Top queries with no clicks (users didn't find what they wanted)
SELECT query, COUNT(*) as searches
FROM search_logs
WHERE clicked_result_id IS NULL
AND created_at > NOW() - INTERVAL '7 days'
GROUP BY query
ORDER BY searches DESC
LIMIT 20;
-- Average click position (lower is better)
SELECT AVG(clicked_position) as avg_position
FROM search_logs
WHERE clicked_result_id IS NOT NULL
AND created_at > NOW() - INTERVAL '7 days';
-- Queries with low top similarity (your content might have gaps)
SELECT query, AVG(top_similarity) as avg_sim, COUNT(*) as searches
FROM search_logs
WHERE created_at > NOW() - INTERVAL '7 days'
GROUP BY query
HAVING AVG(top_similarity) < 0.4
ORDER BY searches DESC;
These analytics reveal where your content has gaps, where your search is underperforming, and what your users actually care about.
Building a Search API with Express.js
Tie everything together with an Express.js API:
var express = require("express");
var bodyParser = require("body-parser");
var { v4: uuidv4 } = require("uuid");
var app = express();
app.use(bodyParser.json());
// Search endpoint
app.get("/api/search", function(req, res) {
var query = req.query.q;
var category = req.query.category;
var author = req.query.author;
var limit = parseInt(req.query.limit) || 10;
var page = parseInt(req.query.page) || 1;
var offset = (page - 1) * limit;
var sessionId = req.query.session || uuidv4();
if (!query || query.trim().length < 2) {
return res.status(400).json({ error: "Query must be at least 2 characters" });
}
var filters = {};
if (category) filters.category = category;
if (author) filters.author = author;
searchPipeline({
query: query.trim(),
filters: filters,
limit: limit,
offset: offset
}, function(err, searchResults) {
if (err) {
console.error("Search error:", err);
return res.status(500).json({ error: "Search failed" });
}
var reranked = rerankResults(searchResults.results, query);
logSearch({
query: query,
resultCount: reranked.length,
topSimilarity: reranked.length > 0 ? reranked[0].similarity : null,
filters: filters,
sessionId: sessionId
}, function(searchLogId) {
res.json({
query: query,
page: page,
limit: limit,
total: reranked.length,
searchLogId: searchLogId,
results: reranked
});
});
});
});
// Click tracking endpoint
app.post("/api/search/click", function(req, res) {
var searchLogId = req.body.searchLogId;
var resultId = req.body.resultId;
var position = req.body.position;
if (!searchLogId || !resultId) {
return res.status(400).json({ error: "Missing required fields" });
}
logClick(searchLogId, resultId, position, function(err) {
if (err) {
console.error("Click log error:", err);
}
res.json({ ok: true });
});
});
// Index a new document
app.post("/api/documents", function(req, res) {
var doc = req.body;
if (!doc.title) {
return res.status(400).json({ error: "Title is required" });
}
indexDocument(doc, function(err, id) {
if (err) {
console.error("Index error:", err);
return res.status(500).json({ error: "Failed to index document" });
}
res.status(201).json({ id: id, message: "Document indexed successfully" });
});
});
// Bulk index documents
app.post("/api/documents/bulk", function(req, res) {
var docs = req.body.documents;
if (!Array.isArray(docs) || docs.length === 0) {
return res.status(400).json({ error: "Documents array is required" });
}
var indexed = 0;
var errors = [];
function processNext(i) {
if (i >= docs.length) {
return res.json({
indexed: indexed,
errors: errors.length,
errorDetails: errors
});
}
indexDocument(docs[i], function(err, id) {
if (err) {
errors.push({ index: i, title: docs[i].title, error: err.message });
} else {
indexed++;
}
processNext(i + 1);
});
}
processNext(0);
});
var PORT = process.env.PORT || 3000;
app.listen(PORT, function() {
console.log("Semantic search API running on port " + PORT);
});
Complete Working Example
Here is a complete, self-contained semantic search engine you can run locally. Save this as search-engine.js:
var express = require("express");
var bodyParser = require("body-parser");
var { Pool } = require("pg");
var OpenAI = require("openai");
var { v4: uuidv4 } = require("uuid");
// --- Configuration ---
var pool = new Pool({ connectionString: process.env.DATABASE_URL });
var openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
var app = express();
app.use(bodyParser.json({ limit: "5mb" }));
// --- Database Setup ---
function initializeDatabase(callback) {
var schema = `
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS documents (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
title TEXT NOT NULL,
body TEXT,
category TEXT,
author TEXT,
tags TEXT[] DEFAULT '{}',
embedding vector(1536),
view_count INTEGER DEFAULT 0,
created_at TIMESTAMP DEFAULT NOW(),
updated_at TIMESTAMP DEFAULT NOW()
);
CREATE TABLE IF NOT EXISTS search_logs (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
query TEXT NOT NULL,
result_count INTEGER,
top_similarity FLOAT,
filters JSONB DEFAULT '{}',
clicked_result_id UUID,
clicked_position INTEGER,
session_id TEXT,
created_at TIMESTAMP DEFAULT NOW()
);
`;
pool.query(schema, function(err) {
if (err) {
console.error("Schema init error:", err.message);
return callback(err);
}
// Create IVFFlat index if enough rows exist
pool.query("SELECT COUNT(*) FROM documents", function(err, result) {
if (err) return callback(null); // Non-fatal
var count = parseInt(result.rows[0].count);
if (count >= 100) {
var lists = Math.max(1, Math.floor(Math.sqrt(count)));
var indexQuery = "CREATE INDEX IF NOT EXISTS idx_doc_embedding ON documents USING ivfflat (embedding vector_cosine_ops) WITH (lists = " + lists + ")";
pool.query(indexQuery, function() {
callback(null);
});
} else {
callback(null);
}
});
});
}
// --- Embedding Generation ---
function generateEmbedding(text, callback) {
var truncated = text.substring(0, 8000);
openai.embeddings.create({
model: "text-embedding-3-small",
input: truncated
}).then(function(response) {
callback(null, response.data[0].embedding);
}).catch(function(err) {
callback(err);
});
}
function buildDocumentText(doc) {
var parts = [];
if (doc.title) parts.push(doc.title);
if (doc.tags && doc.tags.length) parts.push("Tags: " + doc.tags.join(", "));
if (doc.body) parts.push(doc.body);
return parts.join("\n\n");
}
// --- Document Indexing ---
function indexDocument(doc, callback) {
var text = buildDocumentText(doc);
generateEmbedding(text, function(err, embedding) {
if (err) return callback(err);
var query = `
INSERT INTO documents (title, body, category, author, tags, embedding)
VALUES ($1, $2, $3, $4, $5, $6)
RETURNING id, created_at
`;
var values = [
doc.title,
doc.body || null,
doc.category || null,
doc.author || null,
doc.tags || [],
"[" + embedding.join(",") + "]"
];
pool.query(query, values, function(err, result) {
if (err) return callback(err);
callback(null, result.rows[0]);
});
});
}
// --- Search ---
function search(queryText, options, callback) {
generateEmbedding(queryText, function(err, queryEmbedding) {
if (err) return callback(err);
var embeddingStr = "[" + queryEmbedding.join(",") + "]";
var conditions = ["1 - (embedding <=> $1::vector) > $2"];
var params = [embeddingStr, options.minSimilarity || 0.3];
var paramIndex = 3;
if (options.category) {
conditions.push("category = $" + paramIndex);
params.push(options.category);
paramIndex++;
}
if (options.author) {
conditions.push("author = $" + paramIndex);
params.push(options.author);
paramIndex++;
}
if (options.tags && options.tags.length) {
conditions.push("tags && $" + paramIndex);
params.push(options.tags);
paramIndex++;
}
params.push(options.limit || 10);
params.push(options.offset || 0);
var sql = `
SELECT
id, title, body, category, author, tags, view_count, created_at,
1 - (embedding <=> $1::vector) AS similarity
FROM documents
WHERE ${conditions.join(" AND ")}
ORDER BY embedding <=> $1::vector
LIMIT $${paramIndex} OFFSET $${paramIndex + 1}
`;
pool.query(sql, params, function(err, result) {
if (err) return callback(err);
var results = result.rows.map(function(row) {
return {
id: row.id,
title: row.title,
snippet: row.body ? row.body.substring(0, 250) + "..." : "",
category: row.category,
author: row.author,
tags: row.tags,
similarity: parseFloat(parseFloat(row.similarity).toFixed(4)),
viewCount: row.view_count,
createdAt: row.created_at
};
});
// Apply rank scoring
results.forEach(function(r) {
var score = r.similarity * 0.7;
var titleLower = (r.title || "").toLowerCase();
if (titleLower.indexOf(queryText.toLowerCase()) !== -1) {
score += 0.15;
}
var ageInDays = (Date.now() - new Date(r.createdAt).getTime()) / 86400000;
score += Math.max(0, 0.1 * (1 - ageInDays / 365));
score += Math.min(0.05, (r.viewCount || 0) / 10000 * 0.05);
r.rankScore = Math.min(1, parseFloat(score.toFixed(4)));
});
results.sort(function(a, b) {
return b.rankScore - a.rankScore;
});
callback(null, results);
});
});
}
// --- Analytics ---
function logSearch(data, callback) {
var sql = `
INSERT INTO search_logs (query, result_count, top_similarity, filters, session_id)
VALUES ($1, $2, $3, $4, $5)
RETURNING id
`;
pool.query(sql, [
data.query,
data.resultCount,
data.topSimilarity,
JSON.stringify(data.filters || {}),
data.sessionId
], function(err, result) {
if (err) {
console.error("Log error:", err.message);
return callback(null);
}
callback(result.rows[0].id);
});
}
// --- API Routes ---
app.get("/api/search", function(req, res) {
var q = (req.query.q || "").trim();
if (q.length < 2) {
return res.status(400).json({ error: "Query must be at least 2 characters" });
}
var options = {
category: req.query.category || null,
author: req.query.author || null,
limit: parseInt(req.query.limit) || 10,
offset: ((parseInt(req.query.page) || 1) - 1) * (parseInt(req.query.limit) || 10)
};
var sessionId = req.query.session || uuidv4();
search(q, options, function(err, results) {
if (err) {
console.error("Search error:", err);
return res.status(500).json({ error: "Search failed" });
}
logSearch({
query: q,
resultCount: results.length,
topSimilarity: results.length > 0 ? results[0].similarity : null,
filters: { category: options.category, author: options.author },
sessionId: sessionId
}, function(searchLogId) {
res.json({
query: q,
total: results.length,
searchLogId: searchLogId,
results: results
});
});
});
});
app.post("/api/search/click", function(req, res) {
var sql = "UPDATE search_logs SET clicked_result_id = $1, clicked_position = $2 WHERE id = $3";
pool.query(sql, [req.body.resultId, req.body.position, req.body.searchLogId], function(err) {
res.json({ ok: !err });
});
});
app.post("/api/documents", function(req, res) {
if (!req.body.title) {
return res.status(400).json({ error: "Title is required" });
}
indexDocument(req.body, function(err, doc) {
if (err) {
console.error("Index error:", err);
return res.status(500).json({ error: "Failed to index document" });
}
res.status(201).json({ id: doc.id, message: "Indexed", createdAt: doc.created_at });
});
});
app.post("/api/documents/bulk", function(req, res) {
var docs = req.body.documents;
if (!Array.isArray(docs) || docs.length === 0) {
return res.status(400).json({ error: "Documents array required" });
}
var indexed = 0;
var failed = [];
function next(i) {
if (i >= docs.length) {
return res.json({ indexed: indexed, failed: failed.length, errors: failed });
}
indexDocument(docs[i], function(err) {
if (err) {
failed.push({ index: i, title: docs[i].title, error: err.message });
} else {
indexed++;
}
next(i + 1);
});
}
next(0);
});
// --- Start Server ---
initializeDatabase(function(err) {
if (err) {
console.error("Failed to initialize database:", err);
process.exit(1);
}
var PORT = process.env.PORT || 3000;
app.listen(PORT, function() {
console.log("Semantic search engine running on port " + PORT);
});
});
Run it:
export DATABASE_URL="postgresql://user:pass@localhost:5432/searchdb"
export OPENAI_API_KEY="sk-..."
node search-engine.js
Index some documents:
curl -X POST http://localhost:3000/api/documents \
-H "Content-Type: application/json" \
-d '{
"title": "Getting Started with Docker Compose",
"body": "Docker Compose is a tool for defining and running multi-container Docker applications. With Compose, you use a YAML file to configure your application services...",
"category": "devops",
"author": "Shane Larson",
"tags": ["docker", "containers", "devops"]
}'
Search:
curl "http://localhost:3000/api/search?q=how+to+run+multiple+containers"
Even though the query says "run multiple containers" and the document says "multi-container Docker applications," semantic search finds it because the meaning is the same.
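The response follows the shape produced by the search endpoint above; the values here are illustrative:
{
  "query": "how to run multiple containers",
  "total": 1,
  "searchLogId": "5f1c…",
  "results": [
    {
      "id": "a3b2…",
      "title": "Getting Started with Docker Compose",
      "snippet": "Docker Compose is a tool for defining and running multi-container Docker applications...",
      "category": "devops",
      "author": "Shane Larson",
      "tags": ["docker", "containers", "devops"],
      "similarity": 0.62,
      "rankScore": 0.53,
      "viewCount": 0,
      "createdAt": "2024-01-15T10:30:00.000Z"
    }
  ]
}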
Common Issues and Troubleshooting
1. pgvector Extension Not Found
ERROR: could not open extension control file "/usr/share/postgresql/15/extension/vector.control": No such file or directory
This means pgvector is not installed on your PostgreSQL server. On Ubuntu/Debian:
sudo apt install postgresql-15-pgvector
On macOS with Homebrew:
brew install pgvector
After installing, connect to your database and run CREATE EXTENSION vector;. You need superuser or CREATE privilege on the database.
2. Dimension Mismatch When Inserting Embeddings
ERROR: expected 1536 dimensions, not 768
This happens when you switch embedding models without updating your schema. text-embedding-3-small produces 1536 dimensions, but some older models produce 768. If you change models, you need to recreate the column and re-embed all documents:
ALTER TABLE documents DROP COLUMN embedding;
ALTER TABLE documents ADD COLUMN embedding vector(1536);
-- Re-index all documents
3. IVFFlat Index Requires Table Data
ERROR: cannot create ivfflat index on empty table
The IVFFlat index needs existing data to build its clusters. You must insert documents before creating the index. The complete example above handles this by only creating the index when the table has at least 100 rows. Alternatively, use an HNSW index which does not have this limitation:
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
4. OpenAI Rate Limits During Bulk Indexing
Error: 429 Too Many Requests - Rate limit reached for text-embedding-3-small
When indexing thousands of documents, you will hit OpenAI's rate limits. Add a delay between requests and batch your embeddings:
function batchEmbed(texts, callback) {
openai.embeddings.create({
model: "text-embedding-3-small",
input: texts.map(function(t) { return t.substring(0, 8000); })
}).then(function(response) {
var embeddings = response.data.map(function(d) { return d.embedding; });
callback(null, embeddings);
}).catch(function(err) {
if (err.status === 429) {
console.log("Rate limited, retrying in 5 seconds...");
setTimeout(function() {
batchEmbed(texts, callback);
}, 5000);
} else {
callback(err);
}
});
}
The embeddings API accepts up to 2048 inputs in a single call. Batching is both faster and more cost-effective than individual calls.
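Here is a sketch of chunked bulk indexing on top of batchEmbed; the batch size of 100 is an arbitrary choice comfortably under the API limit, and the insert mirrors the earlier indexDocument:
function bulkIndex(docs, callback) {
  var BATCH_SIZE = 100; // assumption: well under the 2048-input API limit

  function processBatch(start) {
    if (start >= docs.length) return callback(null);
    var batch = docs.slice(start, start + BATCH_SIZE);
    var texts = batch.map(buildDocumentText);
    batchEmbed(texts, function(err, embeddings) {
      if (err) return callback(err);
      var done = 0;
      batch.forEach(function(doc, i) {
        var values = [
          doc.title, doc.body || null, doc.category || null,
          doc.author || null, doc.tags || [],
          "[" + embeddings[i].join(",") + "]"
        ];
        pool.query(
          "INSERT INTO documents (title, body, category, author, tags, embedding) VALUES ($1, $2, $3, $4, $5, $6)",
          values,
          function(insertErr) {
            if (insertErr) console.error("Insert failed:", insertErr.message);
            done++;
            if (done === batch.length) processBatch(start + BATCH_SIZE);
          }
        );
      });
    });
  }
  processBatch(0);
}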
5. Slow Queries on Large Tables Without Index
Query took 12.4 seconds (sequential scan on 500,000 rows)
Without an index, pgvector performs a sequential scan comparing your query vector against every row. For tables over 10,000 rows, you absolutely need an index. IVFFlat is faster to build but less accurate. HNSW is slower to build but gives better recall:
-- For speed-critical applications
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 200);
-- Set probes higher for better recall (IVFFlat only)
SET ivfflat.probes = 10;
6. Empty Results Despite Relevant Documents
If searches return no results even though relevant documents exist, your similarity threshold is likely too high. Check your threshold:
SELECT title, 1 - (embedding <=> '[your_query_embedding]'::vector) AS similarity
FROM documents
ORDER BY similarity DESC
LIMIT 5;
If the best match has a similarity of 0.28 but your threshold is 0.3, you are filtering out good results. Lower the threshold or consider whether your documents need to be re-embedded with more descriptive text.
Best Practices
Batch your embedding calls. The OpenAI embeddings API supports up to 2048 inputs per request. Batching slashes request overhead and total indexing time compared to individual calls, and it makes rate limits far less of a problem. Always batch during bulk indexing.
Use HNSW indexes for tables under 1 million rows. HNSW gives better recall than IVFFlat and does not require tuning the lists parameter. The tradeoff is higher memory usage and slower index builds, but for most applications the improved search quality is worth it.
Store the raw text alongside the embedding. You need the original text for generating snippets, debugging relevance issues, and re-embedding when you upgrade models. Never discard the source text after generating an embedding.
Set a minimum query length. Queries under 3 characters produce poor embeddings and waste API calls. Enforce a minimum on both the client and server side.
Cache frequently repeated queries. If 100 users search for "docker tutorial" in an hour, you should not generate the embedding 100 times. Cache the query embedding in Redis or even in-memory with a TTL of 5-10 minutes.
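A minimal in-memory version wrapping the generateEmbedding function from earlier (the 10-minute TTL follows the suggestion above):
var embeddingCache = {}; // normalized query text -> { embedding, expiresAt }
var CACHE_TTL_MS = 10 * 60 * 1000;

function cachedEmbedding(text, callback) {
  var key = text.toLowerCase().trim();
  var entry = embeddingCache[key];
  if (entry && entry.expiresAt > Date.now()) {
    return callback(null, entry.embedding); // cache hit: no API call
  }
  generateEmbedding(text, function(err, embedding) {
    if (err) return callback(err);
    embeddingCache[key] = { embedding: embedding, expiresAt: Date.now() + CACHE_TTL_MS };
    callback(null, embedding);
  });
}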
Re-embed when you change models. Embeddings from different models are not compatible — you cannot mix text-embedding-ada-002 vectors with text-embedding-3-small vectors in the same column. When upgrading models, re-embed your entire corpus and rebuild the index.
Monitor your similarity score distribution. If most results cluster at 0.5-0.6 similarity, your search is working. If everything is below 0.3, either your content does not match what users are searching for, or your document text preparation needs improvement. Log and review these metrics weekly.
Degrade gracefully when the embedding API is down. Fall back to PostgreSQL full-text search (tsvector) if the OpenAI API is unavailable. Your search will be worse but your users will not see an error page. A minimal sketch of this fallback follows at the end of this section.
Keep your embedding dimension as small as practical. text-embedding-3-small at 1536 dimensions is a good balance. You can request fewer dimensions (e.g., 512) via the dimensions parameter to reduce storage and improve query speed, at a modest cost to accuracy.
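Here is a minimal sketch of that fallback, wrapping the search function from the complete example. The websearch_to_tsquery function requires PostgreSQL 11+, and the ranking it produces is keyword-based, not semantic:
function searchWithFallback(queryText, options, callback) {
  search(queryText, options, function(err, results) {
    if (!err) return callback(null, results);
    console.error("Semantic search failed, using full-text fallback:", err.message);
    // Keyword-only fallback: worse relevance, but no hard failure for the user
    var sql = `
      SELECT id, title, body, category, author, tags,
             ts_rank(to_tsvector('english', title || ' ' || COALESCE(body, '')),
                     websearch_to_tsquery('english', $1)) AS similarity
      FROM documents
      WHERE to_tsvector('english', title || ' ' || COALESCE(body, ''))
            @@ websearch_to_tsquery('english', $1)
      ORDER BY similarity DESC
      LIMIT $2
    `;
    pool.query(sql, [queryText, options.limit || 10], function(ftErr, result) {
      if (ftErr) return callback(ftErr);
      callback(null, result.rows);
    });
  });
}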