Building an AI Knowledge Base
Build an AI-powered knowledge base with semantic search, Q&A with citations, auto-categorization, and admin tools in Node.js.
Overview
Traditional wikis and documentation portals are passive repositories. They sit there waiting for someone to search the right keyword or stumble onto the right page. An AI-powered knowledge base flips that model entirely — it understands questions in natural language, retrieves relevant context from across your entire document corpus, and generates precise answers with citations back to source material. In this article, we will build one from scratch using Node.js, PostgreSQL with pgvector, and an LLM for retrieval-augmented generation.
Prerequisites
- Node.js v18 or later
- PostgreSQL 15+ with the pgvector extension installed
- An OpenAI API key (or any compatible embeddings and chat API)
- Basic familiarity with Express.js and SQL
- A collection of documents you want to make searchable (Markdown files, HTML pages, PDFs, or database records)
What an AI Knowledge Base Provides Beyond Traditional Wikis
A conventional wiki gives you keyword search, a folder hierarchy, and maybe some tags. That is table stakes. An AI knowledge base adds three capabilities that fundamentally change how people interact with organizational knowledge.
Semantic search means users do not need to guess the exact terminology used in the documentation. A query like "how do we handle customer refunds" will match a document titled "Return Policy and Credit Processing Workflow" even though the words barely overlap. The system encodes meaning, not just tokens.
Question-and-answer generation means users get direct answers synthesized from multiple source documents, not a list of ten links to skim through. The system retrieves the three or four most relevant chunks, feeds them to an LLM as context, and produces a coherent answer with citations pointing back to the original documents.
Automatic categorization and tagging means new documents ingested into the system get classified without anyone manually assigning metadata. The LLM reads the content and assigns categories, tags, and even a difficulty level. This keeps the knowledge base organized at scale without requiring a dedicated librarian.
Designing the Knowledge Base Architecture
The architecture follows a four-stage pipeline: ingest, store, retrieve, generate.
Documents (files, URLs, APIs)
|
v
[ Ingestion Pipeline ]
- Extract text
- Chunk into segments
- Generate embeddings
- Classify and tag
|
v
[ Storage Layer ]
- PostgreSQL + pgvector
- Document metadata
- Chunk embeddings
- Access control records
|
v
[ Retrieval Engine ]
- Semantic similarity search
- Metadata filtering
- Re-ranking
|
v
[ Generation Layer ]
- LLM with retrieved context
- Answer with citations
- Follow-up conversation
Each stage is independent and can be scaled or replaced without affecting the others. The ingestion pipeline runs asynchronously. The storage layer is PostgreSQL because pgvector gives us vector similarity search without introducing a separate vector database. The retrieval engine combines vector search with traditional filtering. The generation layer wraps the LLM call with prompt engineering for citation accuracy.
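In code terms, the pipeline boils down to two entry points that the rest of this article builds out, one for ingestion and one for questions. A rough sketch of that shape (the function names here are illustrative; the pieces they call are implemented in the sections that follow):
// The two entry points the rest of this article implements:
// ingestion (ingest -> store) and querying (retrieve -> generate).
function addToKnowledgeBase(source, content, metadata) {
  return ingestDocument(source, content, metadata)      // extract, chunk, embed, store
    .then(function(result) {
      if (result.skipped) return result;
      return classifyDocument(result.id)                // auto-categorize and tag
        .then(function() { return result; });
    });
}

function ask(question) {
  // retrieve similar chunks, then generate a cited answer
  return answerQuestion(question, { topK: 5 });
}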
Implementing Document Ingestion from Multiple Sources
The ingestion pipeline needs to handle heterogeneous input. Documents come from files on disk, URLs scraped from internal sites, API responses from tools like Confluence or Notion, and rows from existing databases. Here is the core ingestion module:
var crypto = require("crypto");
var fs = require("fs");
var path = require("path");
var { Pool } = require("pg");
var OpenAI = require("openai");
var pool = new Pool({ connectionString: process.env.POSTGRES_URL });
var openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
var CHUNK_SIZE = 1500;
var CHUNK_OVERLAP = 200;
function splitIntoChunks(text, chunkSize, overlap) {
var chunks = [];
var start = 0;
while (start < text.length) {
var end = Math.min(start + chunkSize, text.length);
var chunk = text.substring(start, end);
// Try to break at a paragraph or sentence boundary
if (end < text.length) {
var lastParagraph = chunk.lastIndexOf("\n\n");
var lastSentence = chunk.lastIndexOf(". ");
// breakPoint is an offset within the current chunk, so fall back to chunk.length rather than the absolute end
var breakPoint = lastParagraph > chunkSize * 0.5 ? lastParagraph :
lastSentence > chunkSize * 0.5 ? lastSentence + 1 : chunk.length;
chunk = text.substring(start, start + breakPoint);
end = start + breakPoint;
}
chunks.push({
text: chunk.trim(),
startOffset: start,
endOffset: end
});
// Stop after the final chunk; otherwise the overlap step would loop forever at the end of the text
if (end >= text.length) break;
start = end - overlap;
}
return chunks.filter(function(c) { return c.text.length > 50; });
}
function generateDocumentHash(content) {
return crypto.createHash("sha256").update(content).digest("hex");
}
function ingestDocument(source, content, metadata) {
var hash = generateDocumentHash(content);
return pool.query(
"SELECT id FROM documents WHERE content_hash = $1",
[hash]
).then(function(existing) {
if (existing.rows.length > 0) {
console.log("Document already ingested, skipping:", source);
return { skipped: true, id: existing.rows[0].id };
}
return pool.query(
"INSERT INTO documents (source, title, content, content_hash, metadata, ingested_at) VALUES ($1, $2, $3, $4, $5, NOW()) RETURNING id",
[source, metadata.title || path.basename(source), content, hash, JSON.stringify(metadata)]
).then(function(docResult) {
var docId = docResult.rows[0].id;
var chunks = splitIntoChunks(content, CHUNK_SIZE, CHUNK_OVERLAP);
return generateEmbeddings(chunks.map(function(c) { return c.text; }))
.then(function(embeddings) {
var insertPromises = chunks.map(function(chunk, i) {
return pool.query(
"INSERT INTO document_chunks (document_id, chunk_index, content, embedding, start_offset, end_offset) VALUES ($1, $2, $3, $4, $5, $6)",
[docId, i, chunk.text, JSON.stringify(embeddings[i]), chunk.startOffset, chunk.endOffset]
);
});
return Promise.all(insertPromises).then(function() {
return { skipped: false, id: docId, chunks: chunks.length };
});
});
});
});
}
function generateEmbeddings(texts) {
var batches = [];
var batchSize = 100;
for (var i = 0; i < texts.length; i += batchSize) {
batches.push(texts.slice(i, i + batchSize));
}
return batches.reduce(function(promise, batch) {
return promise.then(function(allEmbeddings) {
return openai.embeddings.create({
model: "text-embedding-3-small",
input: batch
}).then(function(response) {
var batchEmbeddings = response.data.map(function(d) { return d.embedding; });
return allEmbeddings.concat(batchEmbeddings);
});
});
}, Promise.resolve([]));
}
For ingesting from different sources, we build adapters:
var axios = require("axios");
var cheerio = require("cheerio");
function ingestFromFile(filePath) {
var content = fs.readFileSync(filePath, "utf-8");
var ext = path.extname(filePath).toLowerCase();
var title = path.basename(filePath, ext);
return ingestDocument(filePath, content, {
title: title,
sourceType: "file",
fileType: ext
});
}
function ingestFromURL(url) {
return axios.get(url).then(function(response) {
var $ = cheerio.load(response.data);
// Remove scripts, styles, nav elements
$("script, style, nav, footer, header").remove();
var title = $("title").text() || $("h1").first().text() || url;
var content = $("article, main, .content, body").first().text();
content = content.replace(/\s+/g, " ").trim();
return ingestDocument(url, content, {
title: title,
sourceType: "url",
fetchedAt: new Date().toISOString()
});
});
}
function ingestFromAPI(endpoint, headers, extractFn) {
return axios.get(endpoint, { headers: headers }).then(function(response) {
var documents = extractFn(response.data);
return Promise.all(documents.map(function(doc) {
return ingestDocument(endpoint + "#" + doc.id, doc.content, {
title: doc.title,
sourceType: "api",
externalId: doc.id
});
}));
});
}
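A small driver script shows how the file adapter gets used in practice. This sketch walks a directory of Markdown files and ingests them one at a time (the docs/ path is just an example):
// ingest-docs.js -- example driver for the file adapter; assumes ingestFromFile is in scope
var fs = require("fs");
var path = require("path");

var docsDir = process.argv[2] || "./docs";
var files = fs.readdirSync(docsDir).filter(function(f) {
  return f.endsWith(".md");
});

files.reduce(function(chain, file) {
  // Ingest sequentially to avoid hammering the embeddings API
  return chain.then(function() {
    return ingestFromFile(path.join(docsDir, file)).then(function(result) {
      console.log(file, result.skipped ? "(skipped)" : "(" + result.chunks + " chunks)");
    });
  });
}, Promise.resolve()).then(function() {
  console.log("Ingestion complete:", files.length, "files processed");
});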
Automatic Categorization and Tagging with LLMs
Once a document is ingested, we run it through an LLM to assign categories and tags automatically. This is cheaper than you might think — a single classification call on a document summary costs a fraction of a cent.
function classifyDocument(docId) {
return pool.query("SELECT id, title, content FROM documents WHERE id = $1", [docId])
.then(function(result) {
var doc = result.rows[0];
// Use first 3000 chars for classification to keep costs low
var sample = doc.content.substring(0, 3000);
return openai.chat.completions.create({
model: "gpt-4o-mini",
response_format: { type: "json_object" },
messages: [
{
role: "system",
content: "You are a document classifier. Analyze the document and return JSON with: category (string), subcategory (string), tags (array of strings, max 8), difficulty (beginner/intermediate/advanced), summary (2-3 sentences)."
},
{
role: "user",
content: "Title: " + doc.title + "\n\nContent:\n" + sample
}
]
});
})
.then(function(response) {
var classification = JSON.parse(response.choices[0].message.content);
return pool.query(
"UPDATE documents SET category = $1, subcategory = $2, tags = $3, difficulty = $4, summary = $5, classified_at = NOW() WHERE id = $6",
[classification.category, classification.subcategory, JSON.stringify(classification.tags), classification.difficulty, classification.summary, docId]
).then(function() {
return classification;
});
});
}
I have found that using a smaller model like GPT-4o-mini for classification works just as well as the full model for this task. Save the expensive model for answer generation where nuance matters.
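If you add classification after documents already exist, a small backfill helper takes care of the stragglers. A sketch:
// Backfill classification for documents ingested before auto-classification was enabled
function classifyUnclassified() {
  return pool.query("SELECT id FROM documents WHERE classified_at IS NULL AND archived IS NOT TRUE")
    .then(function(result) {
      return result.rows.reduce(function(chain, row) {
        // Classify sequentially to stay well under rate limits
        return chain.then(function() { return classifyDocument(row.id); });
      }, Promise.resolve());
    });
}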
Building the Q&A Interface
This is the core of the knowledge base — the question-in, answer-out pipeline. The flow is: embed the question, find similar chunks, feed them as context to the LLM, and return a generated answer with citations.
function answerQuestion(question, options) {
var opts = options || {};
var topK = opts.topK || 5;
var categoryFilter = opts.category || null;
return generateEmbeddings([question]).then(function(embeddings) {
var questionEmbedding = JSON.stringify(embeddings[0]);
var query = "SELECT dc.id, dc.content, dc.chunk_index, d.id AS doc_id, d.title, d.source, " +
"1 - (dc.embedding <=> $1::vector) AS similarity " +
"FROM document_chunks dc JOIN documents d ON dc.document_id = d.id " +
"WHERE 1=1 ";
var params = [questionEmbedding];
if (categoryFilter) {
query += " AND d.category = $" + (params.length + 1);
params.push(categoryFilter);
}
query += " ORDER BY dc.embedding <=> $1::vector LIMIT $" + (params.length + 1);
params.push(topK);
return pool.query(query, params);
}).then(function(chunkResults) {
var chunks = chunkResults.rows;
if (chunks.length === 0) {
return {
answer: "I could not find any relevant information to answer that question.",
citations: [],
confidence: 0
};
}
var context = chunks.map(function(chunk, i) {
return "[Source " + (i + 1) + ": " + chunk.title + "]\n" + chunk.content;
}).join("\n\n---\n\n");
return openai.chat.completions.create({
model: "gpt-4o",
messages: [
{
role: "system",
content: "You are a knowledge base assistant. Answer the user's question using ONLY the provided source documents. " +
"Cite your sources using [Source N] notation. If the sources do not contain enough information, say so. " +
"Do not make up information that is not in the sources. Be concise and direct."
},
{
role: "user",
content: "Sources:\n\n" + context + "\n\n---\n\nQuestion: " + question
}
],
temperature: 0.2
}).then(function(response) {
var answer = response.choices[0].message.content;
var citations = chunks.map(function(chunk) {
return {
title: chunk.title,
source: chunk.source,
similarity: parseFloat(chunk.similarity).toFixed(3),
excerpt: chunk.content.substring(0, 200) + "..."
};
});
// Log the query for analytics
logQuery(question, answer, citations);
return {
answer: answer,
citations: citations,
confidence: parseFloat(chunks[0].similarity)
};
});
});
}
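Calling it directly looks like this; the question and the category filter are illustrative:
answerQuestion("How do we rotate API keys?", { topK: 4, category: "Engineering" })
  .then(function(result) {
    console.log(result.answer);
    result.citations.forEach(function(c, i) {
      console.log((i + 1) + ". " + c.title + " (similarity " + c.similarity + ")");
    });
  });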
Implementing Conversational Knowledge Exploration
A single question is useful, but real knowledge exploration requires follow-up. The user asks "How does our auth system work?", gets an answer, then asks "What about refresh tokens specifically?" The system needs to maintain conversation context while still grounding every answer in source documents.
var conversationStore = {};
function conversationalAnswer(sessionId, question) {
if (!conversationStore[sessionId]) {
conversationStore[sessionId] = {
messages: [],
createdAt: Date.now()
};
}
var session = conversationStore[sessionId];
session.messages.push({ role: "user", content: question });
// Build a standalone question from the conversation history
var standalonePromise;
if (session.messages.length > 2) {
standalonePromise = openai.chat.completions.create({
model: "gpt-4o-mini",
messages: [
{
role: "system",
content: "Rewrite the user's latest question as a standalone question that includes necessary context from the conversation. Return only the rewritten question."
}
].concat(session.messages.slice(-6)),
temperature: 0
}).then(function(r) { return r.choices[0].message.content; });
} else {
standalonePromise = Promise.resolve(question);
}
return standalonePromise.then(function(standaloneQuestion) {
return answerQuestion(standaloneQuestion).then(function(result) {
session.messages.push({ role: "assistant", content: result.answer });
result.standaloneQuestion = standaloneQuestion;
return result;
});
});
}
// Clean up stale sessions every hour
setInterval(function() {
var cutoff = Date.now() - (60 * 60 * 1000);
Object.keys(conversationStore).forEach(function(key) {
if (conversationStore[key].createdAt < cutoff) {
delete conversationStore[key];
}
});
}, 60 * 60 * 1000);
The key insight here is the "standalone question rewrite" step. Without it, the embedding search for "What about refresh tokens?" would not know we are talking about authentication. By rewriting the follow-up into a self-contained question, we get much better retrieval results.
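To make the effect concrete, here is a hypothetical two-turn session; the session id is arbitrary:
// Follow-up questions reuse the same sessionId
conversationalAnswer("session-42", "How does our auth system work?")
  .then(function() {
    return conversationalAnswer("session-42", "What about refresh tokens specifically?");
  })
  .then(function(result) {
    // Retrieval ran against something like
    // "How are refresh tokens handled in our authentication system?"
    console.log(result.standaloneQuestion);
    console.log(result.answer);
  });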
Knowledge Gap Detection
One of the most valuable features of an AI knowledge base is telling you what you do not know. When users ask questions that produce low-confidence answers, those represent gaps in your documentation. Track them systematically.
function logQuery(question, answer, citations) {
var confidence = citations.length > 0 ? parseFloat(citations[0].similarity) : 0;
return pool.query(
"INSERT INTO query_log (question, answer, top_similarity, citation_count, created_at) VALUES ($1, $2, $3, $4, NOW())",
[question, answer, confidence, citations.length]
);
}
function detectKnowledgeGaps(options) {
var opts = options || {};
var threshold = opts.similarityThreshold || 0.65;
var minOccurrences = opts.minOccurrences || 3;
// Coerce to a number, since the value is interpolated into the INTERVAL clause below
var days = parseInt(opts.days, 10) || 30;
return pool.query(
"SELECT question, top_similarity, COUNT(*) as occurrences " +
"FROM query_log " +
"WHERE top_similarity < $1 AND created_at > NOW() - INTERVAL '" + days + " days' " +
"GROUP BY question, top_similarity " +
"HAVING COUNT(*) >= $2 " +
"ORDER BY occurrences DESC",
[threshold, minOccurrences]
).then(function(result) {
// Cluster similar questions using the LLM
if (result.rows.length === 0) return [];
var questions = result.rows.map(function(r) { return r.question; });
return openai.chat.completions.create({
model: "gpt-4o-mini",
response_format: { type: "json_object" },
messages: [
{
role: "system",
content: "Group these questions by topic. Return JSON with an array of objects: { topic, questions (array), suggestedDocTitle, priority (high/medium/low) }."
},
{
role: "user",
content: "Questions that users asked but we could not answer well:\n" + questions.join("\n")
}
]
}).then(function(response) {
return JSON.parse(response.choices[0].message.content);
});
});
}
Run this weekly and feed the results into your content planning process. The knowledge base tells you exactly what documentation to write next.
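Here is one way to automate that weekly run with a plain setInterval; in production you would more likely hand this to a cron job or scheduler:
// Run gap detection once a week and log the clusters
var WEEK_MS = 7 * 24 * 60 * 60 * 1000;
setInterval(function() {
  detectKnowledgeGaps({ similarityThreshold: 0.65, minOccurrences: 3, days: 7 })
    .then(function(gaps) {
      console.log("Knowledge gaps for the past week:", JSON.stringify(gaps, null, 2));
    })
    .catch(function(err) {
      console.error("Gap detection failed:", err);
    });
}, WEEK_MS);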
Auto-Generating FAQ Sections
Common queries naturally cluster around certain topics. We can automatically generate FAQ documents from these clusters and add them back into the knowledge base.
function generateFAQ(category, limit) {
var faqLimit = limit || 20;
// query_log has no category column, so the category argument only labels the generated FAQ document below
return pool.query(
"SELECT question, answer FROM query_log " +
"WHERE top_similarity > 0.75 " +
"ORDER BY created_at DESC LIMIT $1",
[faqLimit * 3]
).then(function(result) {
return openai.chat.completions.create({
model: "gpt-4o",
response_format: { type: "json_object" },
messages: [
{
role: "system",
content: "You are creating an FAQ document. Given these question-answer pairs, deduplicate and organize them into a clean FAQ. Return JSON with: { title, sections: [{ heading, items: [{ question, answer }] }] }. Limit to " + faqLimit + " total items."
},
{
role: "user",
content: JSON.stringify(result.rows)
}
]
});
}).then(function(response) {
var faq = JSON.parse(response.choices[0].message.content);
// Convert FAQ to markdown and ingest it back
var markdown = "# " + faq.title + "\n\n";
faq.sections.forEach(function(section) {
markdown += "## " + section.heading + "\n\n";
section.items.forEach(function(item) {
markdown += "### " + item.question + "\n\n" + item.answer + "\n\n";
});
});
return ingestDocument("auto-faq:" + category, markdown, {
title: faq.title,
sourceType: "generated-faq",
category: category,
autoGenerated: true
}).then(function() { return faq; });
});
}
Implementing Access Control and Permissions
Not all knowledge should be accessible to everyone. Engineering runbooks should not be visible to the sales team, and HR policies might have restricted sections. We implement document-level and chunk-level access control.
function answerWithAccessControl(question, userId) {
return getUserPermissions(userId).then(function(permissions) {
return generateEmbeddings([question]).then(function(embeddings) {
var questionEmbedding = JSON.stringify(embeddings[0]);
// Pass the group list as a bound parameter instead of concatenating quoted names into the SQL
var query = "SELECT dc.content, d.title, d.source, " +
"1 - (dc.embedding <=> $1::vector) AS similarity " +
"FROM document_chunks dc " +
"JOIN documents d ON dc.document_id = d.id " +
"LEFT JOIN document_access da ON d.id = da.document_id " +
"WHERE (da.access_group = ANY($2) OR da.access_group IS NULL) " +
"ORDER BY dc.embedding <=> $1::vector LIMIT 5";
return pool.query(query, [questionEmbedding, permissions.groups]);
}).then(function(results) {
// Generate the answer from permitted chunks only; generateAnswer is the same
// context-building and LLM call used in answerQuestion, applied to this filtered set
return generateAnswer(question, results.rows);
});
});
}
function getUserPermissions(userId) {
return pool.query(
"SELECT ug.group_name FROM user_groups ug " +
"JOIN user_memberships um ON ug.id = um.group_id " +
"WHERE um.user_id = $1",
[userId]
).then(function(result) {
return {
groups: result.rows.map(function(r) { return r.group_name; })
};
});
}
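Granting access is just a row in document_access. A small helper keeps the convention in one place (the group name in the usage comment is illustrative):
// Restrict a document to a named access group.
// Documents with no rows in document_access remain visible to everyone.
function grantAccess(docId, groupName) {
  return pool.query(
    "INSERT INTO document_access (document_id, access_group) VALUES ($1, $2)",
    [docId, groupName]
  );
}

// Example: grantAccess(42, "engineering");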
Knowledge Freshness Management
Documents go stale. API documentation from six months ago might reference deprecated endpoints. The knowledge base needs to track freshness and surface documents that need review.
function checkFreshness() {
var staleThresholdDays = 90;
return pool.query(
"SELECT id, title, source, source, ingested_at, last_verified_at, " +
"EXTRACT(DAY FROM NOW() - COALESCE(last_verified_at, ingested_at)) AS days_stale " +
"FROM documents " +
"WHERE EXTRACT(DAY FROM NOW() - COALESCE(last_verified_at, ingested_at)) > $1 " +
"ORDER BY days_stale DESC",
[staleThresholdDays]
).then(function(result) {
return result.rows.map(function(doc) {
return {
id: doc.id,
title: doc.title,
source: doc.source,
daysStale: parseInt(doc.days_stale),
action: doc.days_stale > 180 ? "urgent_review" : "review"
};
});
});
}
function refreshDocument(docId) {
return pool.query("SELECT source, metadata FROM documents WHERE id = $1", [docId])
.then(function(result) {
var doc = result.rows[0];
// node-postgres returns JSONB columns as objects, so only parse when the driver hands back a string
var meta = typeof doc.metadata === "string" ? JSON.parse(doc.metadata) : (doc.metadata || {});
if (meta.sourceType === "url") {
return ingestFromURL(doc.source).then(function(newDoc) {
// Archive old version, replace with new
return pool.query("UPDATE documents SET archived = true WHERE id = $1", [docId])
.then(function() { return newDoc; });
});
}
return { message: "Manual refresh required for source type: " + meta.sourceType };
});
}
Analytics: Understanding How Knowledge Is Used
Track everything. Popular topics tell you what to expand. Unanswered questions tell you what to write. User satisfaction scores tell you whether the system is actually helping.
function getAnalytics(days) {
// Coerce to a number, since the value is interpolated into the INTERVAL clauses below
var period = parseInt(days, 10) || 30;
var queries = {};
queries.totalQueries = pool.query(
"SELECT COUNT(*) as total, AVG(top_similarity) as avg_confidence " +
"FROM query_log WHERE created_at > NOW() - INTERVAL '" + period + " days'"
);
queries.topTopics = pool.query(
"SELECT d.category, COUNT(*) as query_count " +
"FROM query_log ql " +
"JOIN document_chunks dc ON dc.content ILIKE '%' || LEFT(ql.question, 30) || '%' " +
"JOIN documents d ON dc.document_id = d.id " +
"WHERE ql.created_at > NOW() - INTERVAL '" + period + " days' " +
"GROUP BY d.category ORDER BY query_count DESC LIMIT 10"
);
queries.unansweredRate = pool.query(
"SELECT COUNT(*) FILTER (WHERE top_similarity < 0.6) as unanswered, " +
"COUNT(*) as total " +
"FROM query_log WHERE created_at > NOW() - INTERVAL '" + period + " days'"
);
queries.satisfaction = pool.query(
"SELECT AVG(rating) as avg_rating, COUNT(*) as rated_count " +
"FROM query_feedback WHERE created_at > NOW() - INTERVAL '" + period + " days'"
);
var keys = Object.keys(queries);
return Promise.all(keys.map(function(k) { return queries[k]; }))
.then(function(results) {
var analytics = {};
keys.forEach(function(key, i) {
analytics[key] = results[i].rows;
});
return analytics;
});
}
Integrating with Slack and Teams
The knowledge base becomes dramatically more useful when people can query it without leaving their communication tools. Here is a Slack integration using the Bolt framework:
var { App } = require("@slack/bolt");
var slackApp = new App({
token: process.env.SLACK_BOT_TOKEN,
signingSecret: process.env.SLACK_SIGNING_SECRET
});
slackApp.message(function(args) {
var message = args.message;
var say = args.say;
// The bot user id comes from the listener context, not the App instance
var botUserId = args.context.botUserId;
// Only respond to direct mentions or DMs; ignore message subtypes with no text
if (!message.text) return;
if (message.channel_type !== "im" && !message.text.includes("<@" + botUserId + ">")) {
return;
}
var question = message.text.replace(/<@[A-Z0-9]+>/g, "").trim();
if (!question) return;
return answerQuestion(question, { topK: 3 }).then(function(result) {
var blocks = [
{
type: "section",
text: { type: "mrkdwn", text: result.answer }
},
{ type: "divider" }
];
if (result.citations.length > 0) {
var citationText = result.citations.map(function(c, i) {
return (i + 1) + ". *" + c.title + "* (relevance: " + c.similarity + ")";
}).join("\n");
blocks.push({
type: "context",
elements: [{ type: "mrkdwn", text: "Sources:\n" + citationText }]
});
}
say({ blocks: blocks });
}).catch(function(err) {
console.error("Slack answer error:", err);
say("Sorry, I ran into an error looking that up. Please try again.");
});
});
// Start the Bolt receiver so the listener above actually receives events (the port is an example)
slackApp.start(process.env.SLACK_PORT || 3001).then(function() {
console.log("Slack knowledge base bot is running");
});
For Microsoft Teams, the pattern is similar using the Bot Framework SDK. The key difference is the message format — Teams uses Adaptive Cards instead of Slack blocks.
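As a rough sketch of the Teams side, the handler below uses botbuilder's ActivityHandler and CardFactory; adapter and HTTP endpoint wiring are omitted, and the card layout is only an example:
var { ActivityHandler, CardFactory } = require("botbuilder");

var teamsBot = new ActivityHandler();
teamsBot.onMessage(function(context, next) {
  var question = (context.activity.text || "").trim();
  return answerQuestion(question, { topK: 3 }).then(function(result) {
    // Teams renders rich answers as Adaptive Cards rather than Slack blocks
    var card = CardFactory.adaptiveCard({
      type: "AdaptiveCard",
      version: "1.4",
      body: [
        { type: "TextBlock", text: result.answer, wrap: true },
        {
          type: "TextBlock",
          text: "Sources: " + result.citations.map(function(c) { return c.title; }).join(", "),
          isSubtle: true,
          wrap: true
        }
      ]
    });
    return context.sendActivity({ attachments: [card] });
  }).then(function() {
    return next();
  });
});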
Exporting and Syncing with External Systems
A knowledge base should not be a silo. Provide export endpoints so other systems can consume your curated knowledge:
function exportToJSON(options) {
var opts = options || {};
var category = opts.category;
var query = "SELECT id, title, source, category, tags, summary, content, ingested_at FROM documents WHERE archived IS NOT TRUE";
var params = [];
if (category) {
query += " AND category = $1";
params.push(category);
}
query += " ORDER BY ingested_at DESC";
return pool.query(query, params).then(function(result) {
return {
exportedAt: new Date().toISOString(),
count: result.rows.length,
documents: result.rows
};
});
}
function syncToConfluence(spaceKey, parentPageId) {
return exportToJSON().then(function(data) {
return data.documents.reduce(function(chain, doc) {
return chain.then(function() {
// Creating a new page is a POST to /rest/api/content (PUT is used to update an existing page)
return axios.post(
process.env.CONFLUENCE_URL + "/rest/api/content",
{
type: "page",
title: doc.title,
space: { key: spaceKey },
ancestors: [{ id: parentPageId }],
body: { storage: { value: doc.content, representation: "wiki" } }
},
{
headers: {
"Authorization": "Basic " + Buffer.from(process.env.CONFLUENCE_USER + ":" + process.env.CONFLUENCE_TOKEN).toString("base64"),
"Content-Type": "application/json"
}
}
);
});
}, Promise.resolve());
});
}
Building an Admin Interface for Knowledge Curation
The admin dashboard gives knowledge managers visibility into what the system contains, what needs attention, and how well it is performing. We expose this through Express routes that serve a simple dashboard.
var express = require("express");
var adminRouter = express.Router();
adminRouter.get("/dashboard", function(req, res) {
Promise.all([
pool.query("SELECT COUNT(*) as total, COUNT(*) FILTER (WHERE classified_at IS NOT NULL) as classified FROM documents WHERE archived IS NOT TRUE"),
pool.query("SELECT COUNT(*) as total FROM document_chunks"),
checkFreshness(),
detectKnowledgeGaps({ days: 30 }),
getAnalytics(30)
]).then(function(results) {
res.json({
documents: results[0].rows[0],
chunks: results[1].rows[0],
staleDocuments: results[2].slice(0, 10),
knowledgeGaps: results[3],
analytics: results[4]
});
}).catch(function(err) {
res.status(500).json({ error: err.message });
});
});
adminRouter.post("/documents/:id/verify", function(req, res) {
pool.query(
"UPDATE documents SET last_verified_at = NOW(), verified_by = $1 WHERE id = $2",
[req.user.id, req.params.id]
).then(function() {
res.json({ success: true });
}).catch(function(err) {
res.status(500).json({ error: err.message });
});
});
adminRouter.post("/documents/:id/refresh", function(req, res) {
refreshDocument(parseInt(req.params.id)).then(function(result) {
res.json(result);
}).catch(function(err) {
res.status(500).json({ error: err.message });
});
});
adminRouter.delete("/documents/:id", function(req, res) {
pool.query("UPDATE documents SET archived = true WHERE id = $1", [req.params.id])
.then(function() {
return pool.query("DELETE FROM document_chunks WHERE document_id = $1", [req.params.id]);
})
.then(function() {
res.json({ success: true });
})
.catch(function(err) {
res.status(500).json({ error: err.message });
});
});
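These routes assume req.user has already been populated by your authentication middleware. Mounting the router behind an admin check might look like this; requireAdmin is a placeholder for whatever auth you already have:
// Mount the admin routes behind an authorization check.
// requireAdmin is a placeholder -- swap in your real middleware; app is the Express app from the complete example.
function requireAdmin(req, res, next) {
  if (req.user && req.user.isAdmin) return next();
  res.status(403).json({ error: "Admin access required" });
}

app.use("/api/admin", requireAdmin, adminRouter);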
Complete Working Example
Here is the full Express application that ties everything together. This gives you a working API with document ingestion, semantic Q&A, conversational follow-ups, and an admin dashboard.
First, the database schema:
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE documents (
id SERIAL PRIMARY KEY,
source TEXT NOT NULL,
title TEXT NOT NULL,
content TEXT NOT NULL,
content_hash VARCHAR(64) UNIQUE NOT NULL,
category VARCHAR(100),
subcategory VARCHAR(100),
tags JSONB DEFAULT '[]',
difficulty VARCHAR(20),
summary TEXT,
metadata JSONB DEFAULT '{}',
archived BOOLEAN DEFAULT FALSE,
classified_at TIMESTAMP,
last_verified_at TIMESTAMP,
verified_by INTEGER,
ingested_at TIMESTAMP DEFAULT NOW()
);
CREATE TABLE document_chunks (
id SERIAL PRIMARY KEY,
document_id INTEGER REFERENCES documents(id) ON DELETE CASCADE,
chunk_index INTEGER NOT NULL,
content TEXT NOT NULL,
embedding vector(1536),
start_offset INTEGER,
end_offset INTEGER
);
CREATE INDEX idx_chunks_embedding ON document_chunks
USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);
CREATE TABLE document_access (
id SERIAL PRIMARY KEY,
document_id INTEGER REFERENCES documents(id) ON DELETE CASCADE,
access_group VARCHAR(100) NOT NULL
);
CREATE TABLE query_log (
id SERIAL PRIMARY KEY,
question TEXT NOT NULL,
answer TEXT,
top_similarity FLOAT,
citation_count INTEGER,
created_at TIMESTAMP DEFAULT NOW()
);
CREATE TABLE query_feedback (
id SERIAL PRIMARY KEY,
query_log_id INTEGER REFERENCES query_log(id),
rating INTEGER CHECK (rating BETWEEN 1 AND 5),
comment TEXT,
created_at TIMESTAMP DEFAULT NOW()
);
CREATE TABLE user_groups (
id SERIAL PRIMARY KEY,
group_name VARCHAR(100) UNIQUE NOT NULL
);
CREATE TABLE user_memberships (
id SERIAL PRIMARY KEY,
user_id INTEGER NOT NULL,
group_id INTEGER REFERENCES user_groups(id)
);
Now the Express server:
var express = require("express");
var bodyParser = require("body-parser");
var { Pool } = require("pg");
var OpenAI = require("openai");
var crypto = require("crypto");
var path = require("path");
var fs = require("fs");
var app = express();
app.use(bodyParser.json({ limit: "10mb" }));
var pool = new Pool({ connectionString: process.env.POSTGRES_URL });
var openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
var CHUNK_SIZE = 1500;
var CHUNK_OVERLAP = 200;
// --- Utility functions ---
function splitIntoChunks(text) {
var chunks = [];
var start = 0;
while (start < text.length) {
var end = Math.min(start + CHUNK_SIZE, text.length);
var chunk = text.substring(start, end);
if (end < text.length) {
var breakAt = chunk.lastIndexOf("\n\n");
if (breakAt < CHUNK_SIZE * 0.5) breakAt = chunk.lastIndexOf(". ");
if (breakAt > CHUNK_SIZE * 0.5) {
chunk = text.substring(start, start + breakAt + 1);
end = start + breakAt + 1;
}
}
if (chunk.trim().length > 50) {
chunks.push({ text: chunk.trim(), start: start, end: end });
}
// Stop at the end of the text so the overlap does not loop forever on the final chunk
if (end >= text.length) break;
start = end - CHUNK_OVERLAP;
}
return chunks;
}
function embed(texts) {
return openai.embeddings.create({
model: "text-embedding-3-small",
input: texts
}).then(function(r) {
return r.data.map(function(d) { return d.embedding; });
});
}
// --- API Routes ---
// Ingest a document
app.post("/api/ingest", function(req, res) {
var title = req.body.title;
var content = req.body.content;
var source = req.body.source || "manual";
var metadata = req.body.metadata || {};
var hash = crypto.createHash("sha256").update(content).digest("hex");
pool.query("SELECT id FROM documents WHERE content_hash = $1", [hash])
.then(function(existing) {
if (existing.rows.length > 0) {
return res.json({ message: "Already ingested", id: existing.rows[0].id });
}
return pool.query(
"INSERT INTO documents (source, title, content, content_hash, metadata) VALUES ($1, $2, $3, $4, $5) RETURNING id",
[source, title, content, hash, JSON.stringify(metadata)]
).then(function(docResult) {
var docId = docResult.rows[0].id;
var chunks = splitIntoChunks(content);
return embed(chunks.map(function(c) { return c.text; }))
.then(function(embeddings) {
var inserts = chunks.map(function(chunk, i) {
return pool.query(
"INSERT INTO document_chunks (document_id, chunk_index, content, embedding, start_offset, end_offset) VALUES ($1, $2, $3, $4, $5, $6)",
[docId, i, chunk.text, JSON.stringify(embeddings[i]), chunk.start, chunk.end]
);
});
return Promise.all(inserts);
})
.then(function() {
// Auto-classify
return openai.chat.completions.create({
model: "gpt-4o-mini",
response_format: { type: "json_object" },
messages: [
{ role: "system", content: "Classify this document. Return JSON: { category, tags (array), summary (2 sentences) }" },
{ role: "user", content: "Title: " + title + "\n\n" + content.substring(0, 2000) }
]
});
})
.then(function(classResult) {
var cls = JSON.parse(classResult.choices[0].message.content);
return pool.query(
"UPDATE documents SET category = $1, tags = $2, summary = $3, classified_at = NOW() WHERE id = $4",
[cls.category, JSON.stringify(cls.tags), cls.summary, docId]
).then(function() {
res.json({ id: docId, chunks: chunks.length, classification: cls });
});
});
});
})
.catch(function(err) {
console.error("Ingest error:", err);
res.status(500).json({ error: err.message });
});
});
// Ask a question
app.post("/api/ask", function(req, res) {
var question = req.body.question;
var sessionId = req.body.sessionId;
var topK = req.body.topK || 5;
embed([question]).then(function(embeddings) {
var qVec = JSON.stringify(embeddings[0]);
return pool.query(
"SELECT dc.content, d.title, d.source, 1 - (dc.embedding <=> $1::vector) AS similarity " +
"FROM document_chunks dc JOIN documents d ON dc.document_id = d.id " +
"WHERE d.archived IS NOT TRUE " +
"ORDER BY dc.embedding <=> $1::vector LIMIT $2",
[qVec, topK]
);
}).then(function(results) {
var chunks = results.rows;
if (chunks.length === 0) {
return res.json({ answer: "I could not find any relevant information to answer that question.", citations: [] });
}
var context = chunks.map(function(c, i) {
return "[Source " + (i + 1) + ": " + c.title + "]\n" + c.content;
}).join("\n\n---\n\n");
return openai.chat.completions.create({
model: "gpt-4o",
messages: [
{
role: "system",
content: "Answer using ONLY the provided sources. Cite with [Source N]. If unsure, say so."
},
{ role: "user", content: "Sources:\n" + context + "\n\nQuestion: " + question }
],
temperature: 0.2
}).then(function(response) {
var answer = response.choices[0].message.content;
var citations = chunks.map(function(c) {
return { title: c.title, source: c.source, similarity: parseFloat(c.similarity).toFixed(3) };
});
// Log for analytics
pool.query(
"INSERT INTO query_log (question, answer, top_similarity, citation_count) VALUES ($1, $2, $3, $4)",
[question, answer, parseFloat(chunks[0].similarity), citations.length]
);
res.json({ answer: answer, citations: citations });
});
}).catch(function(err) {
console.error("Ask error:", err);
res.status(500).json({ error: err.message });
});
});
// Feedback on answers
app.post("/api/feedback", function(req, res) {
pool.query(
"INSERT INTO query_feedback (query_log_id, rating, comment) VALUES ($1, $2, $3)",
[req.body.queryId, req.body.rating, req.body.comment]
).then(function() {
res.json({ success: true });
}).catch(function(err) {
res.status(500).json({ error: err.message });
});
});
// Admin dashboard data
app.get("/api/admin/dashboard", function(req, res) {
Promise.all([
pool.query("SELECT COUNT(*) as total FROM documents WHERE archived IS NOT TRUE"),
pool.query("SELECT COUNT(*) as total FROM document_chunks"),
pool.query(
"SELECT id, title, source, EXTRACT(DAY FROM NOW() - COALESCE(last_verified_at, ingested_at)) AS days_stale " +
"FROM documents WHERE archived IS NOT TRUE " +
"AND EXTRACT(DAY FROM NOW() - COALESCE(last_verified_at, ingested_at)) > 90 " +
"ORDER BY days_stale DESC LIMIT 20"
),
pool.query(
"SELECT question, top_similarity FROM query_log " +
"WHERE top_similarity < 0.6 AND created_at > NOW() - INTERVAL '30 days' " +
"ORDER BY top_similarity ASC LIMIT 20"
),
pool.query(
"SELECT COUNT(*) as total, AVG(top_similarity) as avg_confidence, " +
"COUNT(*) FILTER (WHERE top_similarity < 0.6) as low_confidence " +
"FROM query_log WHERE created_at > NOW() - INTERVAL '30 days'"
)
]).then(function(results) {
res.json({
documents: results[0].rows[0],
chunks: results[1].rows[0],
staleDocuments: results[2].rows,
knowledgeGaps: results[3].rows,
queryStats: results[4].rows[0]
});
}).catch(function(err) {
res.status(500).json({ error: err.message });
});
});
var PORT = process.env.PORT || 3000;
app.listen(PORT, function() {
console.log("Knowledge base running on port " + PORT);
});
Install the required dependencies:
npm install express body-parser pg openai axios cheerio @slack/bolt
Common Issues and Troubleshooting
1. pgvector extension not found
ERROR: could not open extension control file
"/usr/share/postgresql/15/extension/vector.control": No such file or directory
This means pgvector is not installed on your PostgreSQL server. On Ubuntu, install it with sudo apt install postgresql-15-pgvector. On macOS with Homebrew, use brew install pgvector. On managed services like RDS, enable it through the parameter group and then run CREATE EXTENSION vector in your database.
2. Embedding dimension mismatch
ERROR: expected 1536 dimensions, not 3072
This occurs when you switch embedding models. text-embedding-3-small produces 1536-dimensional vectors, but text-embedding-3-large produces 3072. If you change models, you must re-embed all existing documents. You cannot mix embedding models within the same vector column. Drop and recreate the chunks table, update the column definition to match the new dimension, and re-run ingestion.
3. IVFFlat index requires training data
ERROR: cannot create index on empty table
The IVFFlat index type requires existing data to build its clustering structure. Insert at least a few hundred chunks before creating the index. Alternatively, use HNSW instead of IVFFlat — it does not require pre-existing data and generally provides better recall at the cost of more memory:
CREATE INDEX idx_chunks_embedding ON document_chunks
USING hnsw (embedding vector_cosine_ops) WITH (m = 16, ef_construction = 64);
4. Context window exceeded during answer generation
Error: This model's maximum context length is 128000 tokens.
However, your messages resulted in 142385 tokens.
This happens when your retrieved chunks are too large or you are retrieving too many. Reduce topK from 5 to 3, reduce CHUNK_SIZE from 1500 to 1000, or switch to a model with a larger context window. You can also implement a re-ranking step that scores chunks by relevance and only passes the top results to the LLM.
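A lightweight stand-in for a full re-ranker is to trim the retrieved chunks to an approximate token budget before building the context string. This sketch uses a rough characters-per-token heuristic:
// Keep only as many of the top-ranked chunks as fit a rough token budget.
// Assumes ~4 characters per token; chunks arrive already sorted by similarity.
function fitToBudget(chunks, maxContextTokens) {
  var budgetChars = (maxContextTokens || 6000) * 4;
  var used = 0;
  var kept = [];
  for (var i = 0; i < chunks.length; i++) {
    if (used + chunks[i].content.length > budgetChars) break;
    used += chunks[i].content.length;
    kept.push(chunks[i]);
  }
  return kept;
}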
5. Slow vector search on large datasets
If your similarity search takes more than 500ms with over 100,000 chunks, make sure your index is built and you are setting the appropriate probe count:
SET ivfflat.probes = 10; -- increase for better recall, decrease for speed
For HNSW indexes, increase ef_search:
SET hnsw.ef_search = 100; -- default is 40
Best Practices
Chunk at semantic boundaries, not arbitrary character counts. A chunk that starts mid-sentence and ends mid-paragraph produces poor embeddings. Break at paragraph boundaries, section headers, or complete thoughts. The extra logic in chunking pays dividends in retrieval quality.
Deduplicate before ingesting. The content hash check prevents exact duplicates, but you should also check for near-duplicates using embedding similarity. Two documents that are 95% similar will both get retrieved and waste context window space.
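A near-duplicate check can reuse the embeddings you already store. The sketch below compares a new document's first-chunk embedding against existing first chunks; the 0.95 threshold is a starting point to tune, not a magic number:
// Flag likely near-duplicates before committing a new document.
// newEmbedding is the embedding of the candidate document's first chunk.
function findNearDuplicates(newEmbedding, threshold) {
  var minSimilarity = threshold || 0.95;
  return pool.query(
    "SELECT d.id, d.title, 1 - (dc.embedding <=> $1::vector) AS similarity " +
    "FROM document_chunks dc JOIN documents d ON dc.document_id = d.id " +
    "WHERE dc.chunk_index = 0 AND 1 - (dc.embedding <=> $1::vector) > $2 " +
    "ORDER BY similarity DESC LIMIT 5",
    [JSON.stringify(newEmbedding), minSimilarity]
  ).then(function(result) { return result.rows; });
}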
Use a smaller model for classification and a larger model for generation. GPT-4o-mini handles categorization and tagging perfectly. Save GPT-4o (or better) for the actual answer generation where reasoning quality directly impacts user trust.
Log every query, even the ones you answer well. Query logs are the single best source of information about what your users actually need. They tell you what to document next, what is confusing, and which documents are most valuable.
Set a confidence threshold and be honest when you do not know. An answer with 0.45 similarity to the best matching chunk is essentially a hallucination waiting to happen. Below 0.6 similarity, tell the user you could not find a good answer and suggest they contact a human expert.
Re-embed periodically as models improve. Embedding models get better over time. When OpenAI or another provider releases a new embedding model with better performance, re-embed your entire corpus. The improvement in retrieval quality is usually significant and the cost is minimal for most knowledge bases.
Version your documents, do not overwrite. When a document is updated, archive the old version and create a new one. This lets you track how knowledge evolves and roll back if a bad update gets ingested.
Implement rate limiting on the Q&A endpoint. Each query costs money (embedding call plus LLM call). Without rate limiting, a misbehaving client or automated script can rack up serious API costs in minutes.
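With Express, the express-rate-limit package (installed separately) is the simplest option; the window and limit below are arbitrary starting points:
// Throttle the Q&A endpoint -- each request triggers paid embedding and LLM calls.
var rateLimit = require("express-rate-limit");

app.use("/api/ask", rateLimit({
  windowMs: 60 * 1000,  // 1 minute window
  max: 20               // at most 20 questions per client per minute
}));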
References
- pgvector documentation - Vector similarity search for PostgreSQL
- OpenAI Embeddings guide - Text embedding models and best practices
- OpenAI Cookbook: Question answering using embeddings - Official RAG patterns
- Slack Bolt for JavaScript - Slack app framework documentation
- Chunking strategies for RAG - Analysis of different text splitting approaches
- HNSW vs IVFFlat indexes - Performance characteristics of different vector index types