Search and Discovery with AI
Build AI-powered search with semantic understanding, query expansion, personalization, and result summarization in Node.js.
Overview
Traditional keyword search breaks down when users type vague queries, misspell terms, or describe what they want using different vocabulary than what exists in your content. AI-powered search transforms this experience by understanding intent, matching concepts rather than exact words, and delivering results that feel like the system actually understands what the user needs. In this article, we will build a complete AI-powered search system in Node.js that combines semantic embeddings, query understanding, personalization, and result summarization into a production-ready service.
Prerequisites
- Node.js v18 or later
- PostgreSQL 15+ with the pgvector extension installed
- An OpenAI API key (for embeddings and LLM features)
- Working knowledge of Express.js and SQL
- Familiarity with REST APIs and basic information retrieval concepts
Install the pgvector extension in PostgreSQL:
CREATE EXTENSION IF NOT EXISTS vector;
Install the required Node.js packages:
npm install express pg openai body-parser uuid
How AI Transforms Search
Traditional search relies on keyword matching. If a user searches for "fast web server" and your content uses the phrase "high-performance HTTP framework," keyword search returns nothing. This is the vocabulary mismatch problem, and it plagues every search system built on exact matching.
AI transforms search in three fundamental ways:
- Semantic understanding — Embeddings capture meaning, not just words. "Fast web server" and "high-performance HTTP framework" land close together in vector space.
- Intent recognition — LLMs can parse what a user actually wants from an ambiguous query and rewrite it into something the search system handles well.
- Contextual ranking — AI can weigh signals like user behavior, content freshness, and semantic relevance simultaneously to produce rankings that feel intuitive.
I have shipped search systems that went from 15% zero-result rates to under 2% simply by adding semantic search alongside keyword matching. The improvement is not incremental; it is transformative.
Implementing Semantic Search with Embeddings and pgvector
The foundation of AI-powered search is vector embeddings. Every piece of content gets converted into a high-dimensional vector that captures its meaning. At query time, the user's query is also embedded, and we find the closest content vectors.
Schema Setup
CREATE TABLE content (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
title TEXT NOT NULL,
body TEXT NOT NULL,
category TEXT,
tags TEXT[],
popularity_score FLOAT DEFAULT 0,
created_at TIMESTAMP DEFAULT NOW(),
updated_at TIMESTAMP DEFAULT NOW(),
embedding VECTOR(1536)
);
CREATE INDEX ON content USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);
CREATE INDEX ON content USING gin (tags);
CREATE INDEX ON content (category);
CREATE INDEX ON content (created_at DESC);
The ivfflat index is critical for performance. Without it, pgvector does a sequential scan across every row. With it, you get approximate nearest neighbor search that scales to millions of documents. One caveat: ivfflat builds its cluster lists from the rows that exist when the index is created, so create (or rebuild) the index after loading a representative amount of data.
Generating Embeddings
var OpenAI = require("openai");
var openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
function generateEmbedding(text) {
return openai.embeddings.create({
model: "text-embedding-3-small",
input: text.substring(0, 8000)
}).then(function(response) {
return response.data[0].embedding;
});
}
function indexContent(pool, item) {
var textForEmbedding = item.title + "\n\n" + item.body.substring(0, 4000);
return generateEmbedding(textForEmbedding).then(function(embedding) {
var vectorStr = "[" + embedding.join(",") + "]";
return pool.query(
"INSERT INTO content (title, body, category, tags, embedding) VALUES ($1, $2, $3, $4, $5::vector) RETURNING id",
[item.title, item.body, item.category, item.tags, vectorStr]
);
});
}
I use text-embedding-3-small rather than the larger model because the quality difference is negligible for most search use cases and the cost is an order of magnitude lower. Embed the title prominently by concatenating it before the body — titles carry disproportionate signal.
Querying by Similarity
function semanticSearch(pool, queryEmbedding, limit) {
var vectorStr = "[" + queryEmbedding.join(",") + "]";
return pool.query(
"SELECT id, title, body, category, tags, " +
"1 - (embedding <=> $1::vector) AS similarity " +
"FROM content " +
"WHERE embedding IS NOT NULL " +
"ORDER BY embedding <=> $1::vector " +
"LIMIT $2",
[vectorStr, limit]
).then(function(result) {
return result.rows;
});
}
The <=> operator computes cosine distance. We subtract from 1 to convert it to cosine similarity where 1.0 means identical and 0.0 means orthogonal.
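Wiring the two helpers together is a short glue function. The sketch below (searchByText is an illustrative name, not part of the code above) embeds the raw query text and passes the resulting vector to semanticSearch:
function searchByText(pool, query, limit) {
  // Embed the user's query, then rank content by cosine similarity
  return generateEmbedding(query).then(function(embedding) {
    return semanticSearch(pool, embedding, limit || 10);
  });
}

// Usage:
// searchByText(pool, "fast web server", 5).then(function(rows) {
//   rows.forEach(function(r) { console.log(r.similarity.toFixed(3), r.title); });
// });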
Query Understanding and Expansion with LLMs
Raw user queries are often terrible. People type "node thing that watches files" when they mean "file system watcher in Node.js." LLMs can bridge this gap.
function expandQuery(originalQuery) {
return openai.chat.completions.create({
model: "gpt-4o-mini",
temperature: 0,
messages: [
{
role: "system",
content: "You are a search query optimizer. Given a user search query, " +
"produce a JSON object with: " +
"1) 'rewritten' — the query rewritten for clarity " +
"2) 'keywords' — an array of 3-5 key search terms " +
"3) 'intent' — one of: informational, navigational, transactional " +
"Return ONLY valid JSON, no markdown."
},
{ role: "user", content: originalQuery }
]
}).then(function(response) {
return JSON.parse(response.choices[0].message.content);
});
}
Example transformation:
Input: "node thing that watches files"
Output: {
"rewritten": "Node.js file system watcher implementation",
"keywords": ["nodejs", "file watcher", "fs.watch", "chokidar", "file monitoring"],
"intent": "informational"
}
The expanded keywords give you multiple angles of attack. Use the rewritten query for semantic search and the keywords for traditional full-text matching.
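One way to put both outputs to work, sketched below under that assumption: run semantic search on the rewritten query, run full-text search on the OR-joined keywords, and merge the two result sets by id. The expandedSearch name is illustrative; the hybrid scoring later in this article is the more principled way to blend these signals.
function expandedSearch(pool, rawQuery, limit) {
  return expandQuery(rawQuery).then(function(expanded) {
    return generateEmbedding(expanded.rewritten).then(function(embedding) {
      // Semantic leg: search with the rewritten query's embedding
      var semantic = semanticSearch(pool, embedding, limit);
      // Keyword leg: OR the extracted terms through PostgreSQL full-text search
      var keyword = pool.query(
        "SELECT id, title, category, tags FROM content " +
        "WHERE to_tsvector('english', title || ' ' || body) @@ websearch_to_tsquery('english', $1) " +
        "LIMIT $2",
        [expanded.keywords.join(" OR "), limit]
      ).then(function(r) { return r.rows; });
      // Merge the two result sets, de-duplicating by id
      return Promise.all([semantic, keyword]).then(function(parts) {
        var seen = {};
        return parts[0].concat(parts[1]).filter(function(row) {
          if (seen[row.id]) return false;
          seen[row.id] = true;
          return true;
        });
      });
    });
  });
}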
Faceted Search with AI-Powered Categorization
When content lacks proper categorization, AI can infer it. This is especially useful when ingesting content from external sources or legacy systems without consistent metadata.
function categorizeContent(title, body) {
var categories = [
"backend-development", "frontend-development", "devops",
"databases", "security", "architecture", "ai-ml",
"testing", "performance", "tooling"
];
return openai.chat.completions.create({
model: "gpt-4o-mini",
temperature: 0,
messages: [
{
role: "system",
content: "Classify this content into one primary and up to two secondary " +
"categories from this list: " + categories.join(", ") + ". " +
"Also extract 3-5 tags. Return JSON: " +
'{"primary": "...", "secondary": [...], "tags": [...]}'
},
{ role: "user", content: title + "\n\n" + body.substring(0, 2000) }
]
}).then(function(response) {
return JSON.parse(response.choices[0].message.content);
});
}
Run this at ingest time, not query time. Store the results in the database so faceted filtering is a simple WHERE clause, not an LLM call on every search.
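A small sketch of wiring the classifier into ingestion, reusing the indexContent helper from earlier (indexWithCategories is an illustrative name; it prefers any metadata the caller already supplied):
function indexWithCategories(pool, item) {
  return categorizeContent(item.title, item.body).then(function(classification) {
    // Keep caller-provided metadata; fall back to the classifier's output
    item.category = item.category || classification.primary;
    item.tags = (item.tags || []).concat(classification.tags || []);
    return indexContent(pool, item);
  });
}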
Personalized Search Results Based on User Behavior
Personalization requires tracking what users interact with and using that signal to re-rank results.
CREATE TABLE user_interactions (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id TEXT NOT NULL,
content_id UUID REFERENCES content(id),
interaction_type TEXT NOT NULL, -- 'view', 'click', 'bookmark', 'share'
created_at TIMESTAMP DEFAULT NOW()
);
CREATE INDEX ON user_interactions (user_id, created_at DESC);
function getUserProfile(pool, userId) {
return pool.query(
"SELECT c.category, c.tags, COUNT(*) as interaction_count " +
"FROM user_interactions ui " +
"JOIN content c ON c.id = ui.content_id " +
"WHERE ui.user_id = $1 AND ui.created_at > NOW() - INTERVAL '30 days' " +
"GROUP BY c.category, c.tags " +
"ORDER BY interaction_count DESC " +
"LIMIT 20",
[userId]
).then(function(result) {
var profile = { categories: {}, tags: {} };
result.rows.forEach(function(row) {
profile.categories[row.category] = (profile.categories[row.category] || 0) + parseInt(row.interaction_count);
if (row.tags) {
row.tags.forEach(function(tag) {
profile.tags[tag] = (profile.tags[tag] || 0) + parseInt(row.interaction_count);
});
}
});
return profile;
});
}
function personalizeResults(results, userProfile) {
return results.map(function(result) {
var boost = 0;
var categoryWeight = userProfile.categories[result.category] || 0;
boost += categoryWeight * 0.02;
if (result.tags) {
result.tags.forEach(function(tag) {
var tagWeight = userProfile.tags[tag] || 0;
boost += tagWeight * 0.01;
});
}
// Cap the personalization boost so it nudges, not dominates
result.personalizedScore = result.similarity + Math.min(boost, 0.15);
return result;
}).sort(function(a, b) {
return b.personalizedScore - a.personalizedScore;
});
}
A critical design decision: cap the personalization boost. If you let it dominate, users get stuck in filter bubbles and never discover new content. I cap it at 15% of the maximum similarity score. The semantic relevance should still be the primary signal.
Implementing "More Like This" Recommendations
Once you have embeddings, "more like this" is almost free:
function moreLikeThis(pool, contentId, limit) {
return pool.query(
"SELECT b.id, b.title, b.category, b.tags, " +
"1 - (a.embedding <=> b.embedding) AS similarity " +
"FROM content a, content b " +
"WHERE a.id = $1 AND b.id != $1 AND b.embedding IS NOT NULL " +
"ORDER BY a.embedding <=> b.embedding " +
"LIMIT $2",
[contentId, limit]
).then(function(result) {
return result.rows;
});
}
This outperforms tag-based recommendations dramatically. Two articles about "Node.js performance tuning" and "Express.js optimization techniques" share semantic meaning even if they have completely different tags.
Conversational Search (Multi-Turn Refinement)
Users often need to refine their search iteratively. Conversational search maintains context across multiple queries.
function conversationalSearch(messages, newQuery) {
var contextMessages = messages.slice(-4); // Keep last 4 turns
contextMessages.push({ role: "user", content: newQuery });
return openai.chat.completions.create({
model: "gpt-4o-mini",
temperature: 0,
messages: [
{
role: "system",
content: "You are a search assistant. The user is refining their search. " +
"Given the conversation history, produce a single optimized search query " +
"that captures their full intent. Return ONLY the optimized query string."
}
].concat(contextMessages)
}).then(function(response) {
return response.choices[0].message.content.trim();
});
}
Example conversation:
User: "node caching"
→ Search: "Node.js caching strategies"
User: "specifically redis"
→ Search: "Node.js Redis caching implementation"
User: "with session storage"
→ Search: "Node.js Redis session storage and caching"
Each turn builds on the previous context. The LLM synthesizes the entire conversation into a single coherent search query.
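To show how the pieces fit, here is a minimal per-session sketch. The in-memory sessions map and the refineAndSearch name are illustrative; a real service would persist the history in Redis or the database keyed by sessionId.
var sessions = {};

function refineAndSearch(pool, sessionId, newQuery, limit) {
  var history = sessions[sessionId] || [];
  return conversationalSearch(history, newQuery).then(function(optimizedQuery) {
    // Record both the raw turn and the synthesized query for the next round
    history.push({ role: "user", content: newQuery });
    history.push({ role: "assistant", content: optimizedQuery });
    sessions[sessionId] = history;
    return generateEmbedding(optimizedQuery).then(function(embedding) {
      return semanticSearch(pool, embedding, limit || 10).then(function(results) {
        return { optimizedQuery: optimizedQuery, results: results };
      });
    });
  });
}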
Search Result Summarization
Instead of showing raw content snippets, generate contextual summaries that explain why each result matches the query.
function summarizeResults(query, results) {
var resultsContext = results.slice(0, 5).map(function(r, i) {
return (i + 1) + ". Title: " + r.title + "\nExcerpt: " + r.body.substring(0, 300);
}).join("\n\n");
return openai.chat.completions.create({
model: "gpt-4o-mini",
temperature: 0,
messages: [
{
role: "system",
content: "Given a search query and results, generate a brief 1-2 sentence " +
"summary for each result explaining why it is relevant to the query. " +
"Return JSON array: [{\"index\": 0, \"summary\": \"...\"}]"
},
{
role: "user",
content: "Query: " + query + "\n\nResults:\n" + resultsContext
}
]
}).then(function(response) {
return JSON.parse(response.choices[0].message.content);
});
}
This is expensive, so do it only for the top 5 results. Cache the summaries aggressively — the same query-result pair always produces a similar summary.
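A naive in-memory cache illustrates the idea; summaryCache and summarizeResultsCached are illustrative names, and a production setup would use Redis or similar with a TTL:
var summaryCache = {};

function summarizeResultsCached(query, results) {
  // Key on the query plus the top result ids so a changed result set misses the cache
  var key = query + "|" + results.slice(0, 5).map(function(r) { return r.id; }).join(",");
  if (summaryCache[key]) {
    return Promise.resolve(summaryCache[key]);
  }
  return summarizeResults(query, results).then(function(summaries) {
    summaryCache[key] = summaries;
    return summaries;
  });
}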
Zero-Result Handling
A zero-result page is the worst user experience in search. AI can salvage it.
function handleZeroResults(pool, originalQuery) {
return openai.chat.completions.create({
model: "gpt-4o-mini",
temperature: 0.3,
messages: [
{
role: "system",
content: "The user searched for something and got no results. " +
"Suggest 3 alternative search queries that might find what they need. " +
"Also identify the core topic. Return JSON: " +
'{"topic": "...", "alternatives": ["...", "...", "..."]}'
},
{ role: "user", content: originalQuery }
]
}).then(function(response) {
var suggestions = JSON.parse(response.choices[0].message.content);
// Run the first alternative as a semantic search fallback
return generateEmbedding(suggestions.alternatives[0]).then(function(embedding) {
return semanticSearch(pool, embedding, 5).then(function(fallbackResults) {
return {
suggestions: suggestions.alternatives,
topic: suggestions.topic,
fallbackResults: fallbackResults
};
});
});
});
}
This approach has two layers: it gives the user alternative queries they can click, and it proactively runs a fallback search so the page is never truly empty.
Implementing Autocomplete with Semantic Suggestions
Traditional autocomplete matches prefixes. Semantic autocomplete suggests conceptually related queries.
function semanticAutocomplete(pool, partialQuery, limit) {
if (partialQuery.length < 2) {
return Promise.resolve([]);
}
// Combine prefix matching from popular queries with semantic similarity
return pool.query(
"SELECT query_text, search_count FROM search_logs " +
"WHERE query_text ILIKE $1 " +
"ORDER BY search_count DESC LIMIT $2",
[partialQuery + "%", limit]
).then(function(prefixResults) {
if (prefixResults.rows.length >= limit) {
return prefixResults.rows.map(function(r) { return r.query_text; });
}
// Fill remaining slots with semantic suggestions
return generateEmbedding(partialQuery).then(function(embedding) {
var vectorStr = "[" + embedding.join(",") + "]";
return pool.query(
"SELECT query_text FROM search_logs " +
"WHERE query_embedding IS NOT NULL " +
"ORDER BY query_embedding <=> $1::vector " +
"LIMIT $2",
[vectorStr, limit - prefixResults.rows.length]
).then(function(semanticResults) {
var combined = prefixResults.rows.map(function(r) { return r.query_text; });
semanticResults.rows.forEach(function(r) {
if (combined.indexOf(r.query_text) === -1) {
combined.push(r.query_text);
}
});
return combined;
});
});
});
}
Store embeddings for popular queries in your search logs table. This avoids generating embeddings on every keystroke. Only fall back to live embedding generation for novel prefixes that do not match anything in the logs.
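As a sketch of that idea, a periodic job can backfill embeddings for the most frequent logged queries that do not have one yet. This assumes the search_logs schema shown later in the analytics section, which includes a query_embedding column; backfillQueryEmbeddings is an illustrative name.
function backfillQueryEmbeddings(pool, batchSize) {
  return pool.query(
    "SELECT query_text, COUNT(*) AS cnt FROM search_logs " +
    "WHERE query_embedding IS NULL " +
    "GROUP BY query_text ORDER BY cnt DESC LIMIT $1",
    [batchSize || 50]
  ).then(function(result) {
    // Embed queries one at a time to stay well under API rate limits
    return result.rows.reduce(function(chain, row) {
      return chain.then(function() {
        return generateEmbedding(row.query_text).then(function(embedding) {
          return pool.query(
            "UPDATE search_logs SET query_embedding = $1::vector WHERE query_text = $2",
            ["[" + embedding.join(",") + "]", row.query_text]
          );
        });
      });
    }, Promise.resolve());
  });
}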
Combining Search Signals
The real power comes from blending multiple signals into a single relevance score.
function hybridSearch(pool, query, options) {
var opts = options || {};
var limit = opts.limit || 20;
var weights = {
semantic: opts.semanticWeight || 0.5,
textMatch: opts.textWeight || 0.25,
popularity: opts.popularityWeight || 0.1,
freshness: opts.freshnessWeight || 0.15
};
return generateEmbedding(query).then(function(embedding) {
var vectorStr = "[" + embedding.join(",") + "]";
return pool.query(
"SELECT id, title, body, category, tags, popularity_score, created_at, " +
"1 - (embedding <=> $1::vector) AS semantic_score, " +
"ts_rank(to_tsvector('english', title || ' ' || body), plainto_tsquery('english', $2)) AS text_score, " +
"EXTRACT(EPOCH FROM (NOW() - created_at)) / 86400.0 AS days_old " +
"FROM content " +
"WHERE embedding IS NOT NULL " +
"ORDER BY embedding <=> $1::vector " +
"LIMIT $3",
[vectorStr, query, limit * 3] // Fetch extra for re-ranking
).then(function(result) {
// Normalize and combine scores
var maxTextScore = 0;
var maxPopularity = 0;
result.rows.forEach(function(row) {
if (row.text_score > maxTextScore) maxTextScore = row.text_score;
if (row.popularity_score > maxPopularity) maxPopularity = row.popularity_score;
});
var scored = result.rows.map(function(row) {
var normalizedText = maxTextScore > 0 ? row.text_score / maxTextScore : 0;
var normalizedPopularity = maxPopularity > 0 ? row.popularity_score / maxPopularity : 0;
var freshnessScore = Math.max(0, 1 - (row.days_old / 365));
row.finalScore = (
weights.semantic * row.semantic_score +
weights.textMatch * normalizedText +
weights.popularity * normalizedPopularity +
weights.freshness * freshnessScore
);
return row;
});
scored.sort(function(a, b) {
return b.finalScore - a.finalScore;
});
return scored.slice(0, limit);
});
});
}
The weight distribution matters enormously. I start with semantic at 50% because it handles the vocabulary mismatch problem. Text match at 25% rewards exact keyword hits. Popularity at 10% gives a gentle boost to proven content. Freshness at 15% keeps results current. Tune these based on your domain and user feedback.
Search Analytics and Query Understanding
You cannot improve what you do not measure. Log every search and track outcomes.
CREATE TABLE search_logs (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id TEXT,
query_text TEXT NOT NULL,
query_embedding VECTOR(1536),
expanded_query TEXT,
result_count INTEGER,
clicked_result_id UUID,
clicked_position INTEGER,
session_id TEXT,
created_at TIMESTAMP DEFAULT NOW()
);
CREATE INDEX ON search_logs (created_at DESC);
CREATE INDEX ON search_logs (query_text);
function logSearch(pool, searchData) {
var vectorStr = searchData.queryEmbedding
? "[" + searchData.queryEmbedding.join(",") + "]"
: null;
return pool.query(
"INSERT INTO search_logs (user_id, query_text, query_embedding, expanded_query, result_count, session_id) " +
"VALUES ($1, $2, $3::vector, $4, $5, $6) RETURNING id",
[
searchData.userId,
searchData.query,
vectorStr,
searchData.expandedQuery,
searchData.resultCount,
searchData.sessionId
]
);
}
function logClick(pool, searchLogId, contentId, position) {
return pool.query(
"UPDATE search_logs SET clicked_result_id = $1, clicked_position = $2 WHERE id = $3",
[contentId, position, searchLogId]
);
}
Key metrics to track (a sketch for computing most of them from search_logs follows this list):
- Click-through rate (CTR): What percentage of searches result in a click?
- Mean reciprocal rank (MRR): On average, how far down the list was the first result users clicked?
- Zero-result rate: What percentage of queries return nothing?
- Query abandonment rate: How many users search and immediately leave?
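A minimal sketch of a weekly roll-up over search_logs: it computes CTR, zero-result rate, and an approximate MRR derived from clicked_position (assumed 1-based, as in the click-tracking example). Query abandonment needs session-level signals that this table alone does not capture.
function weeklySearchMetrics(pool) {
  return pool.query(
    "SELECT " +
    "  COUNT(*) AS searches, " +
    "  AVG((clicked_result_id IS NOT NULL)::int) AS ctr, " +
    "  AVG((result_count = 0)::int) AS zero_result_rate, " +
    "  AVG(CASE WHEN clicked_position IS NOT NULL THEN 1.0 / clicked_position END) AS approx_mrr " +
    "FROM search_logs " +
    "WHERE created_at > NOW() - INTERVAL '7 days'"
  ).then(function(result) {
    return result.rows[0];
  });
}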
A/B Testing Search Algorithms
When you change ranking weights or add a new signal, A/B test it before rolling it out.
function getSearchVariant(userId) {
// Simple deterministic assignment based on user ID hash
var hash = 0;
for (var i = 0; i < userId.length; i++) {
hash = ((hash << 5) - hash) + userId.charCodeAt(i);
hash = hash & hash;
}
return Math.abs(hash) % 100 < 50 ? "control" : "treatment";
}
function searchWithABTest(pool, query, userId) {
var variant = getSearchVariant(userId);
var weights;
if (variant === "control") {
weights = { semanticWeight: 0.5, textWeight: 0.25, popularityWeight: 0.1, freshnessWeight: 0.15 };
} else {
weights = { semanticWeight: 0.6, textWeight: 0.15, popularityWeight: 0.1, freshnessWeight: 0.15 };
}
return hybridSearch(pool, query, weights).then(function(results) {
// Log the variant for later analysis
logSearch(pool, {
userId: userId,
query: query,
resultCount: results.length,
variant: variant
});
return { results: results, variant: variant };
});
}
Run tests for at least two weeks: search behavior varies by day of week, and you need enough data for statistical significance. Compare CTR and MRR between variants. Note that the search_logs schema shown earlier has no variant column, so to persist the assignment you would add one (for example, ALTER TABLE search_logs ADD COLUMN variant TEXT;) and extend logSearch to write it.
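Assuming that variant column has been added and populated, the per-variant comparison can be a single grouped query (compareVariants is an illustrative name):
function compareVariants(pool) {
  return pool.query(
    "SELECT variant, COUNT(*) AS searches, " +
    "AVG((clicked_result_id IS NOT NULL)::int) AS ctr, " +
    "AVG(CASE WHEN clicked_position IS NOT NULL THEN 1.0 / clicked_position END) AS approx_mrr " +
    "FROM search_logs " +
    "WHERE created_at > NOW() - INTERVAL '14 days' AND variant IS NOT NULL " +
    "GROUP BY variant"
  ).then(function(result) {
    return result.rows;
  });
}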
Building a Search Feedback Loop
The most powerful search systems learn from user behavior. When a user clicks result #4 instead of result #1, that is a signal.
function updatePopularityFromClicks(pool) {
return pool.query(
"UPDATE content SET popularity_score = sub.score " +
"FROM (" +
" SELECT clicked_result_id, " +
" COUNT(*) * 1.0 / GREATEST(EXTRACT(EPOCH FROM (NOW() - MIN(sl.created_at))) / 86400, 1) AS score " +
" FROM search_logs sl " +
" WHERE clicked_result_id IS NOT NULL " +
" AND sl.created_at > NOW() - INTERVAL '90 days' " +
" GROUP BY clicked_result_id " +
") sub " +
"WHERE content.id = sub.clicked_result_id"
);
}
Run this as a nightly batch job. The formula computes clicks per day over a rolling 90-day window, so content that keeps earning clicks ranks higher than content whose clicks have aged out of the window.
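A minimal way to wire that up, sketched with a plain timer; in production you would more likely use cron, pg_cron, or a job runner:
var DAY_MS = 24 * 60 * 60 * 1000;

// Refresh popularity scores once a day; failures are logged and the
// next run simply tries again.
setInterval(function() {
  updatePopularityFromClicks(pool).catch(function(err) {
    console.error("Popularity update failed:", err);
  });
}, DAY_MS);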
Complete Working Example
Here is a full Express.js search service that combines everything we have covered.
var express = require("express");
var bodyParser = require("body-parser");
var { Pool } = require("pg");
var OpenAI = require("openai");
var { v4: uuidv4 } = require("uuid");
var app = express();
app.use(bodyParser.json());
var pool = new Pool({
connectionString: process.env.DATABASE_URL
});
var openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
// --- Embedding helpers ---
function generateEmbedding(text) {
return openai.embeddings.create({
model: "text-embedding-3-small",
input: text.substring(0, 8000)
}).then(function(response) {
return response.data[0].embedding;
});
}
function vectorToString(embedding) {
return "[" + embedding.join(",") + "]";
}
// --- Query expansion ---
function expandQuery(query) {
return openai.chat.completions.create({
model: "gpt-4o-mini",
temperature: 0,
messages: [
{
role: "system",
content: "Rewrite this search query for maximum retrieval effectiveness. " +
"Return JSON: {\"rewritten\": \"...\", \"keywords\": [\"...\"]}"
},
{ role: "user", content: query }
]
}).then(function(response) {
try {
return JSON.parse(response.choices[0].message.content);
} catch (e) {
return { rewritten: query, keywords: [query] };
}
});
}
// --- Hybrid search ---
function hybridSearch(query, options) {
var opts = options || {};
var limit = opts.limit || 20;
var userId = opts.userId;
return expandQuery(query).then(function(expanded) {
return generateEmbedding(expanded.rewritten).then(function(embedding) {
var vecStr = vectorToString(embedding);
return pool.query(
"SELECT id, title, LEFT(body, 500) AS excerpt, category, tags, " +
"popularity_score, created_at, " +
"1 - (embedding <=> $1::vector) AS semantic_score, " +
"ts_rank(to_tsvector('english', title || ' ' || body), " +
"plainto_tsquery('english', $2)) AS text_score, " +
"EXTRACT(EPOCH FROM (NOW() - created_at)) / 86400.0 AS days_old " +
"FROM content WHERE embedding IS NOT NULL " +
"ORDER BY embedding <=> $1::vector LIMIT $3",
[vecStr, query, limit * 2]
).then(function(result) {
var rows = result.rows;
var maxText = 0;
var maxPop = 0;
rows.forEach(function(row) {
if (row.text_score > maxText) maxText = row.text_score;
if (row.popularity_score > maxPop) maxPop = row.popularity_score;
});
var scored = rows.map(function(row) {
var normText = maxText > 0 ? row.text_score / maxText : 0;
var normPop = maxPop > 0 ? row.popularity_score / maxPop : 0;
var freshness = Math.max(0, 1 - (row.days_old / 365));
row.finalScore = (
0.50 * row.semantic_score +
0.25 * normText +
0.10 * normPop +
0.15 * freshness
);
return row;
});
scored.sort(function(a, b) {
return b.finalScore - a.finalScore;
});
return {
results: scored.slice(0, limit),
expandedQuery: expanded.rewritten,
embedding: embedding
};
});
});
});
}
// --- Personalization ---
function applyPersonalization(pool, userId, results) {
if (!userId) return Promise.resolve(results);
return pool.query(
"SELECT c.category, COUNT(*) AS cnt FROM user_interactions ui " +
"JOIN content c ON c.id = ui.content_id " +
"WHERE ui.user_id = $1 AND ui.created_at > NOW() - INTERVAL '30 days' " +
"GROUP BY c.category",
[userId]
).then(function(profileResult) {
var catBoosts = {};
profileResult.rows.forEach(function(row) {
catBoosts[row.category] = parseInt(row.cnt);
});
return results.map(function(r) {
var boost = (catBoosts[r.category] || 0) * 0.02;
r.personalizedScore = r.finalScore + Math.min(boost, 0.15);
return r;
}).sort(function(a, b) {
return b.personalizedScore - a.personalizedScore;
});
});
}
// --- Result summarization ---
function summarizeTopResults(query, results) {
var top = results.slice(0, 5);
var context = top.map(function(r, i) {
return (i + 1) + ". " + r.title + ": " + (r.excerpt || "").substring(0, 200);
}).join("\n");
return openai.chat.completions.create({
model: "gpt-4o-mini",
temperature: 0,
messages: [
{
role: "system",
content: "Generate a 1-sentence summary for each search result explaining " +
"its relevance to the query. Return JSON array: " +
'[{"index": 0, "summary": "..."}]'
},
{ role: "user", content: "Query: " + query + "\n\n" + context }
]
}).then(function(response) {
try {
var summaries = JSON.parse(response.choices[0].message.content);
summaries.forEach(function(s) {
if (top[s.index]) {
top[s.index].aiSummary = s.summary;
}
});
} catch (e) {
// Summaries are optional; degrade gracefully
}
return results;
});
}
// --- Zero-result fallback ---
function handleZeroResults(query) {
return openai.chat.completions.create({
model: "gpt-4o-mini",
temperature: 0.3,
messages: [
{
role: "system",
content: "A search returned no results. Suggest 3 alternative queries. " +
'Return JSON: {"alternatives": ["...", "...", "..."]}'
},
{ role: "user", content: query }
]
}).then(function(response) {
try {
return JSON.parse(response.choices[0].message.content);
} catch (e) {
return { alternatives: [] };
}
});
}
// --- API Routes ---
// Main search endpoint
app.get("/api/search", function(req, res) {
var query = req.query.q;
var userId = req.query.userId;
var limit = parseInt(req.query.limit) || 20;
var summarize = req.query.summarize === "true";
if (!query) {
return res.status(400).json({ error: "Query parameter 'q' is required" });
}
var sessionId = req.query.sessionId || uuidv4();
hybridSearch(query, { limit: limit, userId: userId }).then(function(searchResult) {
if (searchResult.results.length === 0) {
return handleZeroResults(query).then(function(fallback) {
return res.json({
query: query,
results: [],
suggestions: fallback.alternatives,
sessionId: sessionId
});
});
}
var pipeline = Promise.resolve(searchResult.results);
// Apply personalization if user is identified
if (userId) {
pipeline = pipeline.then(function(results) {
return applyPersonalization(pool, userId, results);
});
}
// Add AI summaries if requested
if (summarize) {
pipeline = pipeline.then(function(results) {
return summarizeTopResults(query, results);
});
}
return pipeline.then(function(finalResults) {
// Log the search
var vecStr = vectorToString(searchResult.embedding);
// Return the logging promise so errors propagate to the catch handler below
return pool.query(
"INSERT INTO search_logs (user_id, query_text, query_embedding, expanded_query, result_count, session_id) " +
"VALUES ($1, $2, $3::vector, $4, $5, $6) RETURNING id",
[userId, query, vecStr, searchResult.expandedQuery, finalResults.length, sessionId]
).then(function(logResult) {
res.json({
query: query,
expandedQuery: searchResult.expandedQuery,
results: finalResults.map(function(r) {
return {
id: r.id,
title: r.title,
excerpt: r.excerpt,
category: r.category,
tags: r.tags,
score: r.personalizedScore || r.finalScore,
aiSummary: r.aiSummary || null
};
}),
resultCount: finalResults.length,
sessionId: sessionId,
searchLogId: logResult.rows[0].id
});
});
});
}).catch(function(err) {
console.error("Search error:", err);
res.status(500).json({ error: "Search failed" });
});
});
// Click tracking endpoint
app.post("/api/search/click", function(req, res) {
var searchLogId = req.body.searchLogId;
var contentId = req.body.contentId;
var position = req.body.position;
pool.query(
"UPDATE search_logs SET clicked_result_id = $1, clicked_position = $2 WHERE id = $3",
[contentId, position, searchLogId]
).then(function() {
// Also log the interaction for personalization
if (req.body.userId) {
return pool.query(
"INSERT INTO user_interactions (user_id, content_id, interaction_type) VALUES ($1, $2, 'click')",
[req.body.userId, contentId]
);
}
}).then(function() {
res.json({ success: true });
}).catch(function(err) {
console.error("Click tracking error:", err);
res.status(500).json({ error: "Failed to track click" });
});
});
// More like this endpoint
app.get("/api/content/:id/similar", function(req, res) {
var contentId = req.params.id;
var limit = parseInt(req.query.limit) || 5;
pool.query(
"SELECT b.id, b.title, LEFT(b.body, 300) AS excerpt, b.category, " +
"1 - (a.embedding <=> b.embedding) AS similarity " +
"FROM content a, content b " +
"WHERE a.id = $1 AND b.id != $1 AND b.embedding IS NOT NULL " +
"ORDER BY a.embedding <=> b.embedding LIMIT $2",
[contentId, limit]
).then(function(result) {
res.json({ similar: result.rows });
}).catch(function(err) {
console.error("Similar content error:", err);
res.status(500).json({ error: "Failed to find similar content" });
});
});
// Autocomplete endpoint
app.get("/api/search/autocomplete", function(req, res) {
var partial = req.query.q;
if (!partial || partial.length < 2) {
return res.json({ suggestions: [] });
}
pool.query(
"SELECT query_text, COUNT(*) AS cnt FROM search_logs " +
"WHERE query_text ILIKE $1 " +
"GROUP BY query_text ORDER BY cnt DESC LIMIT 8",
[partial + "%"]
).then(function(result) {
res.json({
suggestions: result.rows.map(function(r) { return r.query_text; })
});
}).catch(function(err) {
console.error("Autocomplete error:", err);
res.status(500).json({ error: "Autocomplete failed" });
});
});
// Content indexing endpoint
app.post("/api/content/index", function(req, res) {
var item = req.body;
var textForEmbedding = item.title + "\n\n" + (item.body || "").substring(0, 4000);
generateEmbedding(textForEmbedding).then(function(embedding) {
var vecStr = vectorToString(embedding);
return pool.query(
// Accept an optional id so re-indexing existing content actually hits the
// ON CONFLICT branch; without an explicit id the conflict clause can never fire.
"INSERT INTO content (id, title, body, category, tags, embedding) " +
"VALUES (COALESCE($1::uuid, gen_random_uuid()), $2, $3, $4, $5, $6::vector) " +
"ON CONFLICT (id) DO UPDATE SET " +
"title = EXCLUDED.title, body = EXCLUDED.body, " +
"category = EXCLUDED.category, tags = EXCLUDED.tags, " +
"embedding = EXCLUDED.embedding, updated_at = NOW() " +
"RETURNING id",
[item.id || null, item.title, item.body, item.category, item.tags, vecStr]
);
}).then(function(result) {
res.json({ id: result.rows[0].id, indexed: true });
}).catch(function(err) {
console.error("Indexing error:", err);
res.status(500).json({ error: "Failed to index content" });
});
});
var PORT = process.env.PORT || 3000;
app.listen(PORT, function() {
console.log("AI Search Service running on port " + PORT);
});
Test it with curl:
# Index some content
curl -X POST http://localhost:3000/api/content/index \
-H "Content-Type: application/json" \
-d '{"title": "Getting Started with Express.js", "body": "Express is a minimal and flexible Node.js web application framework...", "category": "backend-development", "tags": ["nodejs", "express", "web"]}'
# Search with summarization
curl "http://localhost:3000/api/search?q=build+a+web+server&summarize=true"
# Get similar content
curl "http://localhost:3000/api/content/CONTENT_UUID/similar?limit=3"
# Track a click
curl -X POST http://localhost:3000/api/search/click \
-H "Content-Type: application/json" \
-d '{"searchLogId": "LOG_UUID", "contentId": "CONTENT_UUID", "position": 1}'
Common Issues and Troubleshooting
1. pgvector Extension Not Found
ERROR: could not open extension control file "/usr/share/postgresql/15/extension/vector.control": No such file or directory
The pgvector extension is not installed at the system level. On Ubuntu/Debian:
sudo apt install postgresql-15-pgvector
On macOS with Homebrew:
brew install pgvector
After installing the system package, connect to your database and run CREATE EXTENSION vector;.
2. Embedding Dimension Mismatch
ERROR: expected 1536 dimensions, not 3072
This happens when you switch between embedding models. text-embedding-3-small produces 1536 dimensions while text-embedding-3-large produces 3072. Your column type must match the model you are using. If you need to switch models, you must re-embed all existing content; old vectors cannot be cast to the new dimension, so clear them as part of the column change. (Also note that ivfflat indexes support at most 2,000 dimensions, so a 3,072-dimension column needs a different indexing strategy.)
ALTER TABLE content ALTER COLUMN embedding TYPE vector(3072) USING NULL;
-- Then re-run embeddings for all rows
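If you need a starting point for that re-embedding pass, here is a minimal sketch reusing the generateEmbedding helper from earlier (reembedAll is an illustrative name). For a large corpus you would page through rows and throttle requests, as in the batching pattern shown in the next issue.
function reembedAll(pool) {
  // Note: for large tables, page through rows instead of loading everything at once
  return pool.query("SELECT id, title, body FROM content").then(function(result) {
    return result.rows.reduce(function(chain, row) {
      return chain.then(function() {
        var text = row.title + "\n\n" + row.body.substring(0, 4000);
        return generateEmbedding(text).then(function(embedding) {
          return pool.query(
            "UPDATE content SET embedding = $1::vector, updated_at = NOW() WHERE id = $2",
            ["[" + embedding.join(",") + "]", row.id]
          );
        });
      });
    }, Promise.resolve());
  });
}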
3. OpenAI Rate Limiting During Bulk Indexing
Error: 429 Rate limit reached for text-embedding-3-small
When indexing thousands of documents, you will hit rate limits. Add a simple throttle:
function batchEmbed(items, batchSize, delayMs) {
var batches = [];
for (var i = 0; i < items.length; i += batchSize) {
batches.push(items.slice(i, i + batchSize));
}
return batches.reduce(function(chain, batch, index) {
return chain.then(function(allResults) {
return openai.embeddings.create({
model: "text-embedding-3-small",
input: batch.map(function(item) { return item.text.substring(0, 8000); })
}).then(function(response) {
return new Promise(function(resolve) {
setTimeout(function() {
resolve(allResults.concat(response.data));
}, delayMs);
});
});
});
}, Promise.resolve([]));
}
The embeddings API accepts batch inputs (up to 2048 per request), which is significantly faster than individual calls.
4. Slow Semantic Search on Large Tables
Query returned in 4200ms (expected < 200ms)
If EXPLAIN ANALYZE shows a sequential scan instead of an index scan, first confirm the ivfflat index exists and is actually being used. If it is and queries are still slow, retune the lists parameter; at this scale a reasonable rule of thumb is roughly the square root of your row count:
-- For ~1 million rows
DROP INDEX content_embedding_idx;
CREATE INDEX content_embedding_idx ON content USING ivfflat (embedding vector_cosine_ops) WITH (lists = 1000);
Also set the probes parameter at query time for the recall-speed tradeoff:
SET ivfflat.probes = 10; -- Higher = better recall, slower
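Because SET applies per session, a pooled connection will not reliably carry the setting into your next query. One pattern, sketched below with node-postgres (semanticSearchWithProbes is an illustrative name), is to check out a client and run the SET and the search on the same connection:
function semanticSearchWithProbes(pool, queryEmbedding, limit, probes) {
  var vectorStr = "[" + queryEmbedding.join(",") + "]";
  var probeCount = parseInt(probes, 10) || 10;
  return pool.connect().then(function(client) {
    // SET does not accept bind parameters, so the value is sanitized via parseInt
    return client.query("SET ivfflat.probes = " + probeCount)
      .then(function() {
        return client.query(
          "SELECT id, title, 1 - (embedding <=> $1::vector) AS similarity " +
          "FROM content WHERE embedding IS NOT NULL " +
          "ORDER BY embedding <=> $1::vector LIMIT $2",
          [vectorStr, limit]
        );
      })
      .then(function(result) {
        client.release();
        return result.rows;
      })
      .catch(function(err) {
        client.release();
        throw err;
      });
  });
}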
5. JSON Parse Errors from LLM Responses
SyntaxError: Unexpected token 'H' at position 0
LLMs occasionally return markdown-wrapped JSON (fenced with ```json ... ```) or explanatory text instead of pure JSON. Always wrap your parsing in try-catch and have a fallback:
function safeParseLLMJSON(text) {
// Strip markdown code fences if present
var cleaned = text.replace(/```json\n?/g, "").replace(/```\n?/g, "").trim();
try {
return JSON.parse(cleaned);
} catch (e) {
console.warn("Failed to parse LLM response:", text.substring(0, 100));
return null;
}
}
Best Practices
Embed at ingest time, not query time for content. Content embeddings are computed once and stored. Only the query embedding needs to be generated at search time. This keeps search latency under 200ms for the embedding step.
Use hybrid search, never pure semantic or pure keyword. Semantic search handles vocabulary mismatch but misses exact matches. Keyword search is precise but brittle. Combining them gives you the best of both worlds. Start with a 50/25 semantic/keyword split and adjust based on your analytics.
Cache aggressively. Query embeddings, LLM-generated summaries, and expanded queries should all be cached. Use a TTL of at least an hour for embeddings (a query's meaning does not change over time, so longer is fine) and around 24 hours for summaries.
Degrade gracefully when AI services are down. If the embedding API is unavailable, fall back to full-text search. If the LLM is down, skip query expansion and summarization. Never let an AI feature outage take down your entire search.
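As a sketch of that fallback path, reusing the generateEmbedding and semanticSearch helpers from earlier, you can catch embedding failures and degrade to PostgreSQL full-text search (searchWithFallback is an illustrative name):
function searchWithFallback(pool, query, limit) {
  return generateEmbedding(query)
    .then(function(embedding) {
      return semanticSearch(pool, embedding, limit);
    })
    .catch(function(err) {
      // Embedding API unavailable: fall back to plain full-text search
      console.warn("Embedding API unavailable, falling back to full-text:", err.message);
      return pool.query(
        "SELECT id, title, category, tags, " +
        "ts_rank(to_tsvector('english', title || ' ' || body), plainto_tsquery('english', $1)) AS rank " +
        "FROM content " +
        "WHERE to_tsvector('english', title || ' ' || body) @@ plainto_tsquery('english', $1) " +
        "ORDER BY rank DESC LIMIT $2",
        [query, limit]
      ).then(function(result) { return result.rows; });
    });
}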
Log everything but analyze weekly. Capture every query, click, and zero-result event. But do not react to daily fluctuations. Analyze trends weekly and make ranking changes based on at least two weeks of data.
Keep personalization subtle. A 10-15% maximum boost from personalization is enough to surface relevant content without creating filter bubbles. Users should still discover content outside their usual interests.
Re-index embeddings when you change models. Mixing embeddings from different models in the same column produces garbage similarity scores. When upgrading models, re-embed your entire corpus.
Set a similarity threshold. Do not return results with cosine similarity below 0.3 for general search. Low-similarity results feel random and erode user trust. Better to show fewer high-quality results than many irrelevant ones.
Rate-limit LLM-powered features per user. Query expansion and summarization involve LLM calls that cost money. Set per-user rate limits (e.g., 100 searches with summarization per day) to prevent abuse.
Test with real queries from your search logs. Synthetic test queries do not capture how real users search. Export your top 100 queries and measure relevance before and after changes to your ranking algorithm.
References
- pgvector Documentation — PostgreSQL vector similarity extension
- OpenAI Embeddings Guide — Embedding models and best practices
- OpenAI Text Embedding 3 Models — Model dimensions and pricing
- PostgreSQL Full Text Search — Built-in text search for hybrid approaches
- Information Retrieval: Implementing and Evaluating Search Engines — Foundational IR concepts
- Nearest Neighbor Indexes for Similarity Search — Understanding ANN index types