Building a Knowledge Base with Embeddings

Build an embedding-powered knowledge base with document ingestion, semantic search, question-answering, and admin tools in Node.js.

Overview

An embedding-powered knowledge base transforms your documentation from a keyword-matching filing cabinet into a system that actually understands what people are asking. Instead of requiring users to guess the exact terminology buried in your docs, semantic search finds relevant content based on meaning. I have built these systems for internal engineering wikis, customer support portals, and product documentation sites, and they consistently outperform traditional full-text search by a wide margin.

Prerequisites

  • Node.js 18+ installed
  • PostgreSQL 15+ with the pgvector extension enabled
  • An OpenAI API key (for generating embeddings)
  • An Anthropic API key (for question-answering with Claude)
  • Basic familiarity with Express.js and SQL
  • npm install express pg pgvector multer marked sanitize-html pdf-parse openai @anthropic-ai/sdk

What an Embedding-Powered Knowledge Base Provides

Traditional keyword search fails the moment someone phrases a question differently than the document author expected. If your documentation says "configuring environment variables" but a user searches "how do I set up my app settings," keyword search returns nothing. Embedding-based search maps both phrases into the same region of vector space because they carry similar meaning.

An embedding-powered knowledge base gives you:

  • Semantic search that understands synonyms, paraphrasing, and intent
  • Question-answering that retrieves relevant chunks and synthesizes an answer
  • Content gap analysis by tracking queries that return low-confidence results
  • Cross-document discovery where users find related content they did not know existed

The architecture is straightforward: documents go in, get chunked and embedded, and vector similarity search retrieves them at query time. The real work is in the details of chunking strategy, metadata modeling, and the retrieval-augmented generation pipeline.

Designing the Knowledge Base Schema

The schema needs to handle documents, their chunks, embeddings, metadata, and access control. Here is the PostgreSQL schema with pgvector:

CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE kb_documents (
    id SERIAL PRIMARY KEY,
    title VARCHAR(500) NOT NULL,
    source_url TEXT,
    format VARCHAR(20) NOT NULL DEFAULT 'markdown',
    category VARCHAR(100),
    tags TEXT[] DEFAULT '{}',
    content TEXT NOT NULL,
    access_level VARCHAR(50) DEFAULT 'public',
    created_by VARCHAR(100),
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW(),
    is_active BOOLEAN DEFAULT true
);

CREATE TABLE kb_chunks (
    id SERIAL PRIMARY KEY,
    document_id INTEGER REFERENCES kb_documents(id) ON DELETE CASCADE,
    chunk_index INTEGER NOT NULL,
    content TEXT NOT NULL,
    token_count INTEGER,
    embedding vector(1536),
    created_at TIMESTAMP DEFAULT NOW()
);

CREATE TABLE kb_queries (
    id SERIAL PRIMARY KEY,
    query_text TEXT NOT NULL,
    top_score FLOAT,
    results_count INTEGER,
    user_id VARCHAR(100),
    answered BOOLEAN DEFAULT false,
    created_at TIMESTAMP DEFAULT NOW()
);

CREATE INDEX idx_chunks_embedding ON kb_chunks
    USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);

CREATE INDEX idx_chunks_document ON kb_chunks(document_id);
CREATE INDEX idx_documents_category ON kb_documents(category);
CREATE INDEX idx_documents_active ON kb_documents(is_active);
CREATE INDEX idx_queries_created ON kb_queries(created_at);

The kb_chunks table holds the actual embeddings. Each document gets split into overlapping chunks, and each chunk gets its own 1536-dimensional vector. The IVFFlat index on the embedding column is critical for performance once you get past a few thousand chunks. Without it, every search scans every row.

I store the full document content in kb_documents separately from the chunks. This makes document updates clean: delete old chunks, re-chunk, re-embed, insert new chunks. The original content stays intact for display and re-processing.
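
Recall for an IVFFlat index is also tunable at query time through the ivfflat.probes setting, which controls how many cluster lists each search scans (the default is 1). Raising it trades a little speed for better recall:

-- Scan more lists per similarity query; issue this on the same connection
-- (or inside the same transaction) that runs the search.
SET ivfflat.probes = 10;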

Document Ingestion Pipeline

The ingestion pipeline handles the full flow from upload to stored embeddings:

var { Pool } = require("pg");
var { OpenAI } = require("openai");
var marked = require("marked");
var sanitizeHtml = require("sanitize-html");
var pdfParse = require("pdf-parse");

var pool = new Pool({ connectionString: process.env.POSTGRES_URL });
var openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

function extractText(content, format) {
    if (format === "markdown") {
        var html = marked.parse(content);
        return sanitizeHtml(html, { allowedTags: [], allowedAttributes: {} });
    }
    if (format === "html") {
        return sanitizeHtml(content, { allowedTags: [], allowedAttributes: {} });
    }
    return content;
}

function chunkText(text, options) {
    var chunkSize = (options && options.chunkSize) || 500;
    var overlap = (options && options.overlap) || 50;
    var sentences = text.split(/(?<=[.!?])\s+/);
    var chunks = [];
    var currentChunk = "";
    var currentTokens = 0;

    for (var i = 0; i < sentences.length; i++) {
        var sentenceTokens = Math.ceil(sentences[i].split(/\s+/).length * 1.3);

        if (currentTokens + sentenceTokens > chunkSize && currentChunk.length > 0) {
            chunks.push(currentChunk.trim());
            // Overlap: keep last few sentences
            var overlapSentences = currentChunk.split(/(?<=[.!?])\s+/);
            var overlapText = "";
            var overlapTokens = 0;
            for (var j = overlapSentences.length - 1; j >= 0; j--) {
                var st = Math.ceil(overlapSentences[j].split(/\s+/).length * 1.3);
                if (overlapTokens + st > overlap) break;
                overlapText = overlapSentences[j] + " " + overlapText;
                overlapTokens += st;
            }
            currentChunk = overlapText;
            currentTokens = overlapTokens;
        }

        currentChunk += sentences[i] + " ";
        currentTokens += sentenceTokens;
    }

    if (currentChunk.trim().length > 0) {
        chunks.push(currentChunk.trim());
    }

    return chunks;
}

function generateEmbeddings(texts, callback) {
    var batchSize = 100;
    var allEmbeddings = [];
    var batches = [];

    for (var i = 0; i < texts.length; i += batchSize) {
        batches.push(texts.slice(i, i + batchSize));
    }

    var index = 0;

    function processBatch() {
        if (index >= batches.length) {
            return callback(null, allEmbeddings);
        }

        openai.embeddings.create({
            model: "text-embedding-3-small",
            input: batches[index]
        }).then(function (response) {
            for (var j = 0; j < response.data.length; j++) {
                allEmbeddings.push(response.data[j].embedding);
            }
            index++;
            processBatch();
        }).catch(function (err) {
            callback(err);
        });
    }

    processBatch();
}

function ingestDocument(doc, callback) {
    var plainText = extractText(doc.content, doc.format);
    var chunks = chunkText(plainText, { chunkSize: 500, overlap: 50 });

    pool.query(
        "INSERT INTO kb_documents (title, source_url, format, category, tags, content, access_level, created_by) VALUES ($1, $2, $3, $4, $5, $6, $7, $8) RETURNING id",
        [doc.title, doc.sourceUrl, doc.format, doc.category, doc.tags || [], doc.content, doc.accessLevel || "public", doc.createdBy],
        function (err, result) {
            if (err) return callback(err);

            var documentId = result.rows[0].id;

            generateEmbeddings(chunks, function (err, embeddings) {
                if (err) return callback(err);

                var insertPromises = [];

                for (var i = 0; i < chunks.length; i++) {
                    var tokenCount = Math.ceil(chunks[i].split(/\s+/).length * 1.3);
                    insertPromises.push(
                        pool.query(
                            "INSERT INTO kb_chunks (document_id, chunk_index, content, token_count, embedding) VALUES ($1, $2, $3, $4, $5)",
                            [documentId, i, chunks[i], tokenCount, JSON.stringify(embeddings[i])]
                        )
                    );
                }

                Promise.all(insertPromises).then(function () {
                    callback(null, { documentId: documentId, chunksCreated: chunks.length });
                }).catch(function (err) {
                    callback(err);
                });
            });
        }
    );
}

The chunking strategy matters enormously. Chunks that are too small lose context. Chunks that are too large dilute the signal during retrieval. I have found 400-600 tokens with 50-token overlap to be the sweet spot for technical documentation. The overlap ensures that information spanning a chunk boundary is not lost entirely.
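
As a quick sanity check before settling on a configuration, you can run the chunker over a sample document and inspect the estimated token counts; the file path below is just a placeholder:

var fs = require("fs");

// Hypothetical local file; substitute any markdown document you have on hand
var sample = fs.readFileSync("docs/sample-guide.md", "utf-8");
var sampleChunks = chunkText(extractText(sample, "markdown"), { chunkSize: 500, overlap: 50 });

sampleChunks.forEach(function (chunk, i) {
    var approxTokens = Math.ceil(chunk.split(/\s+/).length * 1.3);
    console.log("chunk " + i + ": ~" + approxTokens + " tokens, " + chunk.length + " chars");
});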

Supporting Multiple Document Formats

Real knowledge bases accept content from many sources. Here is how to handle PDF text extraction alongside markdown and HTML:

var fs = require("fs");
var path = require("path");
var pdfParse = require("pdf-parse");

function parseDocument(filePath, callback) {
    var ext = path.extname(filePath).toLowerCase();
    var content = "";
    var format = "text";

    if (ext === ".pdf") {
        var dataBuffer = fs.readFileSync(filePath);
        pdfParse(dataBuffer).then(function (data) {
            callback(null, { content: data.text, format: "pdf" });
        }).catch(function (err) {
            callback(err);
        });
        return;
    }

    content = fs.readFileSync(filePath, "utf-8");

    if (ext === ".md" || ext === ".markdown") {
        format = "markdown";
    } else if (ext === ".html" || ext === ".htm") {
        format = "html";
    } else {
        format = "text";
    }

    callback(null, { content: content, format: format });
}

function ingestFile(filePath, metadata, callback) {
    parseDocument(filePath, function (err, parsed) {
        if (err) return callback(err);

        var doc = {
            title: metadata.title || path.basename(filePath, path.extname(filePath)),
            content: parsed.content,
            format: parsed.format,
            category: metadata.category,
            tags: metadata.tags,
            sourceUrl: metadata.sourceUrl,
            accessLevel: metadata.accessLevel,
            createdBy: metadata.createdBy
        };

        ingestDocument(doc, callback);
    });
}

PDF extraction is inherently lossy. Tables, code blocks, and formatted lists often come through garbled. For technical documentation, I recommend storing the original file alongside the extracted text so users can fall back to the source when the extracted text looks wrong.
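
One way to keep that fallback available is to copy each uploaded file into permanent storage and record its location on the document row. The sketch below assumes an originals/ directory and a source_file_path column on kb_documents, neither of which is part of the schema defined earlier:

var ORIGINALS_DIR = path.join(__dirname, "originals"); // assumed storage location

function archiveOriginal(filePath, documentId, callback) {
    var destination = path.join(ORIGINALS_DIR, "doc-" + documentId + path.extname(filePath));

    fs.copyFile(filePath, destination, function (err) {
        if (err) return callback(err);

        // source_file_path is an assumed extra column, not in the schema defined earlier
        pool.query(
            "UPDATE kb_documents SET source_file_path = $1 WHERE id = $2",
            [destination, documentId],
            function (err) {
                callback(err, destination);
            }
        );
    });
}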

Building a Search API with Semantic Ranking

The search endpoint embeds the query, runs a cosine similarity search against pgvector, and returns ranked results with metadata:

function searchKnowledgeBase(query, options, callback) {
    var limit = (options && options.limit) || 10;
    var category = options && options.category;
    var accessLevel = (options && options.accessLevel) || "public";
    var threshold = (options && options.threshold) || 0.3;

    openai.embeddings.create({
        model: "text-embedding-3-small",
        input: query
    }).then(function (response) {
        var queryEmbedding = JSON.stringify(response.data[0].embedding);

        var sql = "SELECT c.id, c.content, c.chunk_index, c.document_id, " +
            "d.title, d.category, d.source_url, d.tags, " +
            "1 - (c.embedding <=> $1::vector) AS similarity " +
            "FROM kb_chunks c " +
            "JOIN kb_documents d ON c.document_id = d.id " +
            "WHERE d.is_active = true " +
            "AND d.access_level = $2 ";

        var params = [queryEmbedding, accessLevel];
        var paramIndex = 3;

        if (category) {
            sql += "AND d.category = $" + paramIndex + " ";
            params.push(category);
            paramIndex++;
        }

        sql += "AND 1 - (c.embedding <=> $1::vector) > $" + paramIndex + " ";
        params.push(threshold);
        paramIndex++;

        sql += "ORDER BY c.embedding <=> $1::vector LIMIT $" + paramIndex;
        params.push(limit);

        pool.query(sql, params, function (err, result) {
            if (err) return callback(err);

            // Log query for analytics; fire-and-forget, with a catch so a logging
            // failure never breaks the search response
            var topScore = result.rows.length > 0 ? result.rows[0].similarity : 0;
            pool.query(
                "INSERT INTO kb_queries (query_text, top_score, results_count, answered) VALUES ($1, $2, $3, $4)",
                [query, topScore, result.rows.length, result.rows.length > 0]
            ).catch(function (err) {
                console.error("Failed to log query:", err.message);
            });

            // Deduplicate by document, keeping highest-scoring chunk per doc
            var docMap = {};
            for (var i = 0; i < result.rows.length; i++) {
                var row = result.rows[i];
                if (!docMap[row.document_id] || docMap[row.document_id].similarity < row.similarity) {
                    docMap[row.document_id] = row;
                }
            }

            var results = Object.values(docMap).sort(function (a, b) {
                return b.similarity - a.similarity;
            });

            callback(null, results);
        });
    }).catch(function (err) {
        callback(err);
    });
}

The similarity threshold of 0.3 filters out irrelevant noise. I have seen systems that skip the threshold and return garbage results with 0.05 similarity scores. That destroys user trust. Better to return nothing than to return irrelevant results.

Deduplication by document is important. Without it, a single long document can dominate the results with five different chunks that all say roughly the same thing.

Implementing Question-Answering Over the Knowledge Base

Retrieval-augmented generation (RAG) combines semantic search with an LLM to answer questions directly. The pattern is: search for relevant chunks, stuff them into a prompt, and ask Claude to synthesize an answer:

var Anthropic = require("@anthropic-ai/sdk");
var anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

function answerQuestion(question, options, callback) {
    searchKnowledgeBase(question, { limit: 5, accessLevel: options.accessLevel }, function (err, results) {
        if (err) return callback(err);

        if (results.length === 0) {
            return callback(null, {
                answer: "I could not find any relevant information in the knowledge base for that question.",
                sources: [],
                confidence: 0
            });
        }

        var contextParts = [];
        var sources = [];

        for (var i = 0; i < results.length; i++) {
            contextParts.push("--- Document: " + results[i].title + " ---\n" + results[i].content);
            sources.push({
                title: results[i].title,
                url: results[i].source_url,
                similarity: results[i].similarity
            });
        }

        var systemPrompt = "You are a helpful assistant that answers questions based strictly on the provided knowledge base context. " +
            "If the context does not contain enough information to answer the question, say so. " +
            "Always cite which document(s) your answer comes from. Do not make up information.";

        var userPrompt = "Context from knowledge base:\n\n" + contextParts.join("\n\n") +
            "\n\nQuestion: " + question + "\n\nProvide a clear, accurate answer based on the context above.";

        anthropic.messages.create({
            model: "claude-sonnet-4-20250514",
            max_tokens: 1024,
            system: systemPrompt,
            messages: [{ role: "user", content: userPrompt }]
        }).then(function (response) {
            var answer = response.content[0].text;
            var avgSimilarity = 0;
            for (var i = 0; i < results.length; i++) {
                avgSimilarity += results[i].similarity;
            }
            avgSimilarity = avgSimilarity / results.length;

            callback(null, {
                answer: answer,
                sources: sources,
                confidence: avgSimilarity
            });
        }).catch(function (err) {
            callback(err);
        });
    });
}

The system prompt is critical. Without the instruction to stick to the provided context, the LLM will happily hallucinate answers. The citation requirement keeps the answers grounded and verifiable.

Access Control for Knowledge Base Documents

Not all documentation should be visible to everyone. The access_level field on documents supports tiered access:

function getAccessLevel(req) {
    if (!req.user) return "public";
    if (req.user.role === "admin") return "internal";
    if (req.user.role === "employee") return "internal";
    return "public";
}

function searchWithAccess(req, query, options, callback) {
    var accessLevel = getAccessLevel(req);
    var limit = (options && options.limit) || 10;

    var accessLevels = ["public"];
    if (accessLevel === "internal") {
        accessLevels.push("internal");
    }

    openai.embeddings.create({
        model: "text-embedding-3-small",
        input: query
    }).then(function (response) {
        var queryEmbedding = JSON.stringify(response.data[0].embedding);

        // Same query as searchKnowledgeBase, but matching any of the caller's allowed access levels
        var sql = "SELECT c.id, c.content, c.chunk_index, c.document_id, " +
            "d.title, d.category, d.source_url, d.tags, " +
            "1 - (c.embedding <=> $1::vector) AS similarity " +
            "FROM kb_chunks c " +
            "JOIN kb_documents d ON c.document_id = d.id " +
            "WHERE d.is_active = true " +
            "AND d.access_level = ANY($2) " +
            "ORDER BY c.embedding <=> $1::vector LIMIT $3";

        pool.query(sql, [queryEmbedding, accessLevels, limit], function (err, result) {
            if (err) return callback(err);
            callback(null, result.rows);
        });
    }).catch(function (err) {
        callback(err);
    });
}

Keep access control at the document level, not the chunk level. Chunk-level permissions are a nightmare to maintain and reason about. If a document is internal, all its chunks are internal.

Incremental Updates

Updating a document means re-chunking and re-embedding. The old chunks must be deleted atomically:

function updateDocument(documentId, updates, callback) {
    pool.query("SELECT * FROM kb_documents WHERE id = $1", [documentId], function (err, result) {
        if (err) return callback(err);
        if (result.rows.length === 0) return callback(new Error("Document not found"));

        var existing = result.rows[0];
        var newContent = updates.content || existing.content;
        var newTitle = updates.title || existing.title;
        var newCategory = updates.category || existing.category;
        var newTags = updates.tags || existing.tags;

        // Re-chunk and re-embed before touching the database, so a failed
        // embedding call never leaves the document without chunks
        var plainText = extractText(newContent, existing.format);
        var chunks = chunkText(plainText, { chunkSize: 500, overlap: 50 });

        generateEmbeddings(chunks, function (err, embeddings) {
            if (err) return callback(err);

            // A transaction must run on a single client: pool.query() may send
            // each statement to a different pooled connection
            pool.connect(function (err, client, release) {
                if (err) return callback(err);

                function fail(e) {
                    client.query("ROLLBACK", function () {
                        release();
                        callback(e);
                    });
                }

                client.query("BEGIN", function (err) {
                    if (err) { release(); return callback(err); }

                    client.query(
                        "UPDATE kb_documents SET title = $1, content = $2, category = $3, tags = $4, updated_at = NOW() WHERE id = $5",
                        [newTitle, newContent, newCategory, newTags, documentId],
                        function (err) {
                            if (err) return fail(err);

                            // Delete old chunks inside the same transaction
                            client.query("DELETE FROM kb_chunks WHERE document_id = $1", [documentId], function (err) {
                                if (err) return fail(err);

                                var insertPromises = [];
                                for (var i = 0; i < chunks.length; i++) {
                                    var tokenCount = Math.ceil(chunks[i].split(/\s+/).length * 1.3);
                                    insertPromises.push(
                                        client.query(
                                            "INSERT INTO kb_chunks (document_id, chunk_index, content, token_count, embedding) VALUES ($1, $2, $3, $4, $5)",
                                            [documentId, i, chunks[i], tokenCount, JSON.stringify(embeddings[i])]
                                        )
                                    );
                                }

                                Promise.all(insertPromises).then(function () {
                                    client.query("COMMIT", function (err) {
                                        release();
                                        if (err) return callback(err);
                                        callback(null, { documentId: documentId, chunksUpdated: chunks.length });
                                    });
                                }).catch(fail);
                            });
                        }
                    );
                });
            });
        });
    });
}

function deleteDocument(documentId, callback) {
    // Chunks cascade-delete via foreign key
    pool.query("DELETE FROM kb_documents WHERE id = $1", [documentId], function (err, result) {
        if (err) return callback(err);
        callback(null, { deleted: result.rowCount > 0 });
    });
}

The transaction wrapper is essential: it guarantees the old chunks stay visible until the new ones are fully committed, so a crash mid-update never leaves the document unsearchable. Two details matter. First, the transaction has to run on a single client checked out with pool.connect(); issuing BEGIN and COMMIT through pool.query() can send each statement to a different pooled connection, which silently breaks atomicity. Second, generating the embeddings before opening the transaction keeps the write window short and means an embedding API failure never touches the database.

Building an Admin Interface for Content Management

Admin endpoints let you manage documents, review analytics, and monitor the health of the knowledge base:

var express = require("express");
var router = express.Router();

// List all documents with chunk counts
router.get("/admin/documents", function (req, res) {
    var sql = "SELECT d.*, COUNT(c.id) AS chunk_count " +
        "FROM kb_documents d " +
        "LEFT JOIN kb_chunks c ON d.id = c.document_id " +
        "GROUP BY d.id " +
        "ORDER BY d.updated_at DESC";

    pool.query(sql, function (err, result) {
        if (err) return res.status(500).json({ error: err.message });
        res.json({ documents: result.rows });
    });
});

// Get document details with chunks
router.get("/admin/documents/:id", function (req, res) {
    pool.query("SELECT * FROM kb_documents WHERE id = $1", [req.params.id], function (err, docResult) {
        if (err) return res.status(500).json({ error: err.message });
        if (docResult.rows.length === 0) return res.status(404).json({ error: "Not found" });

        pool.query(
            "SELECT id, chunk_index, content, token_count FROM kb_chunks WHERE document_id = $1 ORDER BY chunk_index",
            [req.params.id],
            function (err, chunkResult) {
                if (err) return res.status(500).json({ error: err.message });
                res.json({ document: docResult.rows[0], chunks: chunkResult.rows });
            }
        );
    });
});

// Toggle document active status
router.post("/admin/documents/:id/toggle", function (req, res) {
    pool.query(
        "UPDATE kb_documents SET is_active = NOT is_active, updated_at = NOW() WHERE id = $1 RETURNING is_active",
        [req.params.id],
        function (err, result) {
            if (err) return res.status(500).json({ error: err.message });
            res.json({ is_active: result.rows[0].is_active });
        }
    );
});

The admin interface should also expose document re-indexing. If you change your embedding model or chunking strategy, you need to reprocess everything. A bulk re-index endpoint that processes documents in batches prevents OpenAI rate limits from killing the operation.
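
A bulk re-index can be assembled from the helpers already defined. The sketch below is one possible wiring, not an endpoint the API above exposes: it reuses updateDocument (which re-chunks and re-embeds from the stored content) and pauses briefly between documents to stay under the embedding API rate limit:

function reindexAllDocuments(callback) {
    pool.query("SELECT id FROM kb_documents WHERE is_active = true ORDER BY id", function (err, result) {
        if (err) return callback(err);

        var ids = result.rows.map(function (r) { return r.id; });
        var index = 0;
        var processed = 0;

        function next() {
            if (index >= ids.length) return callback(null, { reindexed: processed });

            var id = ids[index];
            index++;

            // Passing an empty update re-chunks and re-embeds the stored content
            updateDocument(id, {}, function (err) {
                if (err) return callback(err);
                processed++;
                // Brief pause between documents to stay under embedding rate limits
                setTimeout(next, 500);
            });
        }

        next();
    });
}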

Categorization and Tagging with Embeddings

You can automatically suggest categories and tags for new documents by comparing their embeddings against existing categorized content:

function suggestCategory(content, callback) {
    var plainText = extractText(content, "text");
    var preview = plainText.substring(0, 1000);

    openai.embeddings.create({
        model: "text-embedding-3-small",
        input: preview
    }).then(function (response) {
        var queryEmbedding = JSON.stringify(response.data[0].embedding);

        var sql = "SELECT d.category, AVG(1 - (c.embedding <=> $1::vector)) AS avg_similarity " +
            "FROM kb_chunks c " +
            "JOIN kb_documents d ON c.document_id = d.id " +
            "WHERE d.category IS NOT NULL AND d.is_active = true " +
            "GROUP BY d.category " +
            "ORDER BY avg_similarity DESC " +
            "LIMIT 5";

        pool.query(sql, [queryEmbedding], function (err, result) {
            if (err) return callback(err);
            callback(null, result.rows);
        });
    }).catch(function (err) {
        callback(err);
    });
}

This approach works well when you already have a body of categorized content. The suggested categories are ranked by average similarity, so the most semantically relevant category floats to the top.
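
Exposing the suggestion through the admin router is straightforward; the route below is a hypothetical addition, not part of the endpoints defined earlier:

// Hypothetical admin endpoint for category suggestions
router.post("/admin/documents/suggest-category", function (req, res) {
    if (!req.body.content) return res.status(400).json({ error: "content is required" });

    suggestCategory(req.body.content, function (err, suggestions) {
        if (err) return res.status(500).json({ error: err.message });
        res.json({ suggestions: suggestions });
    });
});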

Analytics: Popular Queries, Content Gaps, and Unanswered Questions

The kb_queries table becomes a goldmine for understanding what users need:

// Most popular queries in the last 30 days
router.get("/admin/analytics/popular-queries", function (req, res) {
    var sql = "SELECT query_text, COUNT(*) AS count, AVG(top_score) AS avg_score " +
        "FROM kb_queries " +
        "WHERE created_at > NOW() - INTERVAL '30 days' " +
        "GROUP BY query_text " +
        "ORDER BY count DESC " +
        "LIMIT 50";

    pool.query(sql, function (err, result) {
        if (err) return res.status(500).json({ error: err.message });
        res.json({ queries: result.rows });
    });
});

// Content gaps: queries with low scores (users searching for content we don't have)
router.get("/admin/analytics/content-gaps", function (req, res) {
    var sql = "SELECT query_text, COUNT(*) AS count, AVG(top_score) AS avg_score " +
        "FROM kb_queries " +
        "WHERE top_score < 0.4 " +
        "AND created_at > NOW() - INTERVAL '30 days' " +
        "GROUP BY query_text " +
        "HAVING COUNT(*) > 2 " +
        "ORDER BY count DESC " +
        "LIMIT 30";

    pool.query(sql, function (err, result) {
        if (err) return res.status(500).json({ error: err.message });
        res.json({ gaps: result.rows });
    });
});

// Unanswered questions: queries where no results were returned
router.get("/admin/analytics/unanswered", function (req, res) {
    var sql = "SELECT query_text, COUNT(*) AS count, created_at " +
        "FROM kb_queries " +
        "WHERE answered = false " +
        "AND created_at > NOW() - INTERVAL '30 days' " +
        "GROUP BY query_text, created_at " +
        "ORDER BY count DESC " +
        "LIMIT 30";

    pool.query(sql, function (err, result) {
        if (err) return res.status(500).json({ error: err.message });
        res.json({ unanswered: result.rows });
    });
});

Content gaps are the most actionable insight. When multiple users search for something your knowledge base cannot answer, that is a clear signal to write that documentation. I review the content gaps report weekly and use it to prioritize documentation sprints.

Exporting and Backing Up the Knowledge Base

Embeddings are expensive to regenerate. Back up both the documents and their embeddings:

function exportKnowledgeBase(callback) {
    pool.query("SELECT * FROM kb_documents WHERE is_active = true ORDER BY id", function (err, docs) {
        if (err) return callback(err);

        pool.query(
            "SELECT c.document_id, c.chunk_index, c.content, c.token_count, c.embedding::text " +
            "FROM kb_chunks c " +
            "JOIN kb_documents d ON c.document_id = d.id " +
            "WHERE d.is_active = true " +
            "ORDER BY c.document_id, c.chunk_index",
            function (err, chunks) {
                if (err) return callback(err);

                var exportData = {
                    exportDate: new Date().toISOString(),
                    embeddingModel: "text-embedding-3-small",
                    documents: docs.rows,
                    chunks: chunks.rows
                };

                callback(null, exportData);
            }
        );
    });
}

router.get("/admin/export", function (req, res) {
    exportKnowledgeBase(function (err, data) {
        if (err) return res.status(500).json({ error: err.message });

        res.setHeader("Content-Disposition", "attachment; filename=kb-export-" + Date.now() + ".json");
        res.setHeader("Content-Type", "application/json");
        res.json(data);
    });
});

The export includes the embedding model name because embeddings from different models are not compatible. If you restore a backup that was created with text-embedding-ada-002 but your system now uses text-embedding-3-small, the similarity scores will be meaningless. Always track which model produced the embeddings.
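
On the restore side, it is worth refusing to load embeddings produced by a different model than the one the live system queries with. A minimal guard, assuming the export format above and a constant naming the current model:

var CURRENT_EMBEDDING_MODEL = "text-embedding-3-small"; // assumed to match the ingestion pipeline

function validateExportForRestore(exportData) {
    if (exportData.embeddingModel !== CURRENT_EMBEDDING_MODEL) {
        throw new Error(
            "Export was created with " + exportData.embeddingModel +
            ", but this system embeds queries with " + CURRENT_EMBEDDING_MODEL +
            "; re-embed the documents instead of restoring the vectors."
        );
    }
    return {
        documents: exportData.documents.length,
        chunks: exportData.chunks.length
    };
}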

Integrating with Chat Applications

A Slack bot integration turns your knowledge base into a team resource accessible from where people already work:

var express = require("express");
var crypto = require("crypto");

function verifySlackSignature(req, signingSecret) {
    var timestamp = req.headers["x-slack-request-timestamp"];
    var slackSignature = req.headers["x-slack-signature"];
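    // Note: express.json() does not populate req.rawBody by default. This assumes the
    // body parser was configured with a verify hook that captures the raw payload, e.g.
    //   app.use(express.json({ verify: function (req, res, buf) { req.rawBody = buf.toString(); } }));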
    var body = req.rawBody;

    var sigBasestring = "v0:" + timestamp + ":" + body;
    var mySignature = "v0=" + crypto.createHmac("sha256", signingSecret).update(sigBasestring).digest("hex");

    return crypto.timingSafeEqual(Buffer.from(mySignature), Buffer.from(slackSignature));
}

router.post("/integrations/slack", function (req, res) {
    if (!verifySlackSignature(req, process.env.SLACK_SIGNING_SECRET)) {
        return res.status(401).send("Invalid signature");
    }

    // Handle URL verification challenge
    if (req.body.type === "url_verification") {
        return res.json({ challenge: req.body.challenge });
    }

    // Handle slash command or mention
    var question = req.body.text || (req.body.event && req.body.event.text);

    if (!question) {
        return res.json({ text: "Please include a question." });
    }

    // Acknowledge immediately, respond async
    res.json({ response_type: "in_channel", text: "Searching the knowledge base..." });

    answerQuestion(question, { accessLevel: "internal" }, function (err, result) {
        if (err) {
            console.error("Slack KB error:", err);
            return;
        }

        var responseText = result.answer + "\n\n";
        if (result.sources.length > 0) {
            responseText += "*Sources:*\n";
            for (var i = 0; i < result.sources.length; i++) {
                responseText += "- " + result.sources[i].title;
                if (result.sources[i].url) {
                    responseText += " (<" + result.sources[i].url + "|link>)";
                }
                responseText += "\n";
            }
        }

        // Post response via Slack API
        var https = require("https");
        var postData = JSON.stringify({
            channel: req.body.channel_id || req.body.event.channel,
            text: responseText
        });

        var postOptions = {
            hostname: "slack.com",
            path: "/api/chat.postMessage",
            method: "POST",
            headers: {
                "Content-Type": "application/json",
                "Authorization": "Bearer " + process.env.SLACK_BOT_TOKEN
            }
        };

        var postReq = https.request(postOptions, function () {});
        postReq.write(postData);
        postReq.end();
    });
});

The key detail is acknowledging the Slack request immediately and posting the answer asynchronously. Slack requires a response within 3 seconds, and embedding generation plus LLM inference takes longer than that.

Complete Working Example

Here is the full Express.js application tying everything together:

var express = require("express");
var multer = require("multer");
var { Pool } = require("pg");
var { OpenAI } = require("openai");
var Anthropic = require("@anthropic-ai/sdk");
var path = require("path");

var app = express();
var upload = multer({ dest: "uploads/" });

app.use(express.json({ limit: "10mb" }));

var pool = new Pool({ connectionString: process.env.POSTGRES_URL });
var openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
var anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

// -- Search endpoint --
app.get("/api/kb/search", function (req, res) {
    var query = req.query.q;
    var category = req.query.category;

    if (!query) return res.status(400).json({ error: "Query parameter 'q' is required" });

    searchKnowledgeBase(query, {
        limit: 10,
        category: category,
        accessLevel: "public"
    }, function (err, results) {
        if (err) return res.status(500).json({ error: err.message });
        res.json({ query: query, results: results });
    });
});

// -- Question-answering endpoint --
app.post("/api/kb/ask", function (req, res) {
    var question = req.body.question;

    if (!question) return res.status(400).json({ error: "Question is required" });

    answerQuestion(question, { accessLevel: "public" }, function (err, result) {
        if (err) return res.status(500).json({ error: err.message });
        res.json(result);
    });
});

// -- Document ingestion endpoint --
app.post("/api/kb/documents", upload.single("file"), function (req, res) {
    var metadata = {
        title: req.body.title,
        category: req.body.category,
        tags: req.body.tags ? req.body.tags.split(",").map(function (t) { return t.trim(); }) : [],
        accessLevel: req.body.access_level || "public",
        createdBy: req.body.created_by || "admin"
    };

    if (req.file) {
        ingestFile(req.file.path, metadata, function (err, result) {
            if (err) return res.status(500).json({ error: err.message });
            res.json({ success: true, documentId: result.documentId, chunks: result.chunksCreated });
        });
    } else if (req.body.content) {
        var doc = {
            title: metadata.title,
            content: req.body.content,
            format: req.body.format || "markdown",
            category: metadata.category,
            tags: metadata.tags,
            accessLevel: metadata.accessLevel,
            createdBy: metadata.createdBy
        };

        ingestDocument(doc, function (err, result) {
            if (err) return res.status(500).json({ error: err.message });
            res.json({ success: true, documentId: result.documentId, chunks: result.chunksCreated });
        });
    } else {
        res.status(400).json({ error: "Either file upload or content body is required" });
    }
});

// -- Update document --
app.put("/api/kb/documents/:id", function (req, res) {
    updateDocument(parseInt(req.params.id), req.body, function (err, result) {
        if (err) return res.status(500).json({ error: err.message });
        res.json({ success: true, chunksUpdated: result.chunksUpdated });
    });
});

// -- Delete document --
app.delete("/api/kb/documents/:id", function (req, res) {
    deleteDocument(parseInt(req.params.id), function (err, result) {
        if (err) return res.status(500).json({ error: err.message });
        res.json(result);
    });
});

// -- Admin: list documents --
app.get("/api/kb/admin/documents", function (req, res) {
    var sql = "SELECT d.id, d.title, d.category, d.format, d.access_level, d.is_active, " +
        "d.created_at, d.updated_at, COUNT(c.id) AS chunk_count " +
        "FROM kb_documents d " +
        "LEFT JOIN kb_chunks c ON d.id = c.document_id " +
        "GROUP BY d.id ORDER BY d.updated_at DESC";

    pool.query(sql, function (err, result) {
        if (err) return res.status(500).json({ error: err.message });
        res.json({ documents: result.rows });
    });
});

// -- Admin: analytics --
app.get("/api/kb/admin/analytics", function (req, res) {
    var queries = [
        pool.query("SELECT COUNT(*) AS total_documents FROM kb_documents WHERE is_active = true"),
        pool.query("SELECT COUNT(*) AS total_chunks FROM kb_chunks"),
        pool.query("SELECT COUNT(*) AS queries_today FROM kb_queries WHERE created_at > NOW() - INTERVAL '1 day'"),
        pool.query("SELECT AVG(top_score) AS avg_relevance FROM kb_queries WHERE created_at > NOW() - INTERVAL '7 days'")
    ];

    Promise.all(queries).then(function (results) {
        res.json({
            totalDocuments: parseInt(results[0].rows[0].total_documents),
            totalChunks: parseInt(results[1].rows[0].total_chunks),
            queriesToday: parseInt(results[2].rows[0].queries_today),
            avgRelevanceScore: parseFloat(results[3].rows[0].avg_relevance) || 0
        });
    }).catch(function (err) {
        res.status(500).json({ error: err.message });
    });
});

// -- Admin: export --
app.get("/api/kb/admin/export", function (req, res) {
    exportKnowledgeBase(function (err, data) {
        if (err) return res.status(500).json({ error: err.message });

        res.setHeader("Content-Disposition", "attachment; filename=kb-export-" + Date.now() + ".json");
        res.json(data);
    });
});

var PORT = process.env.PORT || 3000;
app.listen(PORT, function () {
    console.log("Knowledge base API running on port " + PORT);
});

Test it with curl:

# Ingest a markdown document
curl -X POST http://localhost:3000/api/kb/documents \
  -H "Content-Type: application/json" \
  -d '{
    "title": "Deploying Node.js with Docker",
    "content": "# Docker Deployment\n\nUse multi-stage builds to reduce image size...",
    "format": "markdown",
    "category": "devops",
    "tags": "docker,nodejs,deployment"
  }'

# Search the knowledge base
curl "http://localhost:3000/api/kb/search?q=how+do+I+containerize+my+app"

# Ask a question
curl -X POST http://localhost:3000/api/kb/ask \
  -H "Content-Type: application/json" \
  -d '{"question": "What is the recommended way to deploy a Node.js app?"}'

# View analytics
curl http://localhost:3000/api/kb/admin/analytics

Expected search response:

{
  "query": "how do I containerize my app",
  "results": [
    {
      "document_id": 1,
      "title": "Deploying Node.js with Docker",
      "content": "Use multi-stage builds to reduce image size. Start with a node:20-slim base...",
      "similarity": 0.82,
      "category": "devops"
    }
  ]
}

Common Issues and Troubleshooting

1. pgvector extension not found

ERROR: could not open extension control file "/usr/share/postgresql/15/extension/vector.control": No such file or directory

The pgvector extension is not installed on your PostgreSQL server. On Ubuntu: sudo apt install postgresql-15-pgvector. On macOS with Homebrew: brew install pgvector. For managed databases like AWS RDS, enable it in your parameter group. DigitalOcean managed PostgreSQL ships with pgvector available, but you still need to run CREATE EXTENSION vector in your database.

2. Embedding dimension mismatch

ERROR: expected 1536 dimensions, not 3072

This happens when you switch embedding models without updating the schema. text-embedding-3-small produces 1536-dimensional vectors, while text-embedding-3-large produces 3072. Altering the column (ALTER TABLE kb_chunks ALTER COLUMN embedding TYPE vector(3072)) only succeeds once the existing rows are cleared or re-embedded, because the stored 1536-dimensional vectors cannot be cast to the new size. Note also that pgvector's IVFFlat and HNSW indexes support at most 2,000 dimensions, so the larger model needs a different indexing approach (for example the halfvec type). In practice, plan to re-embed every document with the new model so the column type, index, and stored vectors stay consistent.

3. IVFFlat index returning inaccurate results

Query returns different results each time, or misses obviously relevant documents

The IVFFlat index uses approximate nearest neighbor search, and its cluster centroids are computed from whatever data existed when the index was built. If you created the index on a small dataset and then added many more documents, the clusters are stale and recall suffers. Run REINDEX INDEX idx_chunks_embedding after significant data changes, and consider raising ivfflat.probes at query time to scan more clusters. Alternatively, drop the IVFFlat index and use HNSW, which has no training step and does not go stale as the data grows: CREATE INDEX idx_chunks_embedding ON kb_chunks USING hnsw (embedding vector_cosine_ops). HNSW is slower to build and uses more memory, but it gives more accurate results.

4. OpenAI rate limits during bulk ingestion

Error: 429 Too Many Requests - Rate limit reached for text-embedding-3-small

When ingesting many documents at once, you will hit the embedding API rate limit. Add a delay between batches and implement exponential backoff:

function delay(ms) {
    return new Promise(function (resolve) {
        setTimeout(resolve, ms);
    });
}

function generateEmbeddingsWithRetry(texts, callback, retries) {
    retries = retries || 0;

    openai.embeddings.create({
        model: "text-embedding-3-small",
        input: texts
    }).then(function (response) {
        callback(null, response.data.map(function (d) { return d.embedding; }));
    }).catch(function (err) {
        if (err.status === 429 && retries < 5) {
            var waitTime = Math.pow(2, retries) * 1000;
            console.log("Rate limited, waiting " + waitTime + "ms before retry...");
            delay(waitTime).then(function () {
                generateEmbeddingsWithRetry(texts, callback, retries + 1);
            });
        } else {
            callback(err);
        }
    });
}

5. Large documents causing memory issues

FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory

A single document with 100,000+ words generates hundreds of chunks and their embeddings. Process large documents in streaming fashion rather than loading everything into memory at once. Set --max-old-space-size=4096 on your Node process for the ingestion worker, and process chunks in batches of 50 rather than all at once.
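
A minimal batching pattern for the chunk inserts, assuming the chunks and embeddings arrays produced by the ingestion pipeline above:

function insertChunksInBatches(documentId, chunks, embeddings, batchSize, callback) {
    var offset = 0;

    function nextBatch() {
        if (offset >= chunks.length) return callback(null, { inserted: chunks.length });

        // Only batchSize inserts are in flight at any moment
        var batch = [];
        for (var i = offset; i < Math.min(offset + batchSize, chunks.length); i++) {
            batch.push(pool.query(
                "INSERT INTO kb_chunks (document_id, chunk_index, content, token_count, embedding) VALUES ($1, $2, $3, $4, $5)",
                [documentId, i, chunks[i], Math.ceil(chunks[i].split(/\s+/).length * 1.3), JSON.stringify(embeddings[i])]
            ));
        }

        Promise.all(batch).then(function () {
            offset += batchSize;
            nextBatch();
        }).catch(callback);
    }

    nextBatch();
}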

Best Practices

  • Chunk size matters more than embedding model. A mediocre embedding model with good 400-500 token chunks outperforms a premium model with bad chunking every time. Experiment with your specific content.

  • Always store raw document content alongside chunks. You will need to re-chunk when you change your strategy, and regenerating from the original is far easier than trying to reconstruct from chunks.

  • Log every query and its top similarity score. This is your most valuable feedback signal. Queries with consistently low scores point directly to content gaps you need to fill.

  • Set a similarity threshold and return nothing rather than garbage. A 0.3-0.4 threshold works for most knowledge bases. Users lose trust quickly when the system returns irrelevant results confidently.

  • Use transactions for document updates. Deleting old chunks and inserting new ones must be atomic. A partial update leaves the document unsearchable.

  • Run the IVFFlat index rebuild on a schedule. After adding significant content, the index clusters become stale. A weekly REINDEX during low-traffic hours keeps search quality high.

  • Separate the ingestion worker from the search API. Embedding generation is CPU-and-network-intensive. Running it in the same process as search requests degrades response times for users.

  • Version your embedding model in the export. When you back up embeddings, record which model created them. Mixing embeddings from different models in the same vector space produces nonsensical similarity scores.

  • Implement rate limiting on the search and ask endpoints. Embedding each query costs money. A runaway client or bot can rack up significant OpenAI charges if your endpoints are unprotected (see the sketch after this list).

  • Pre-filter with metadata before vector search when possible. If the user selects a category, adding a WHERE category = $1 clause reduces the search space and improves both performance and relevance.
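
For the rate-limiting point above, a minimal sketch using the express-rate-limit package (an extra dependency, not in the install list at the top):

var rateLimit = require("express-rate-limit"); // npm install express-rate-limit

// Cap each client at 30 search/ask requests per minute; tune to your traffic
var kbLimiter = rateLimit({
    windowMs: 60 * 1000,
    max: 30,
    standardHeaders: true,
    legacyHeaders: false
});

app.use("/api/kb/search", kbLimiter);
app.use("/api/kb/ask", kbLimiter);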
