Building a Prompt Library for Your Application

Build a reusable prompt library with templating, versioning, A/B testing, and analytics for Node.js LLM applications.

Overview

As soon as your application grows beyond a single LLM call, you need a prompt library. Hardcoding prompt strings across your codebase leads to inconsistency, makes iteration painful, and makes it nearly impossible to track which prompts actually perform well. This article walks through building a production-grade prompt library in Node.js with file-based storage, Handlebars templating, version management, A/B testing, and usage analytics.

Prerequisites

  • Node.js v16 or later installed
  • Familiarity with Express.js and basic LLM API usage
  • Understanding of template engines (Handlebars or similar)
  • Basic knowledge of file system operations in Node.js

Why You Need a Prompt Library

If you have worked on any LLM-integrated application for more than a month, you have felt the pain. Prompts are scattered across controllers, service files, and utility functions. One developer tweaks the customer support prompt and breaks the tone of the refund handler. Nobody knows which version of the summarization prompt is running in production. Sound familiar?

A prompt library solves several critical problems at once:

Consistency. Every call to the LLM for a given task uses the same prompt structure. Your customer-facing tone stays uniform whether the request comes from the chat widget or the email handler.

Versioning. You can track exactly what changed, when, and why. When a prompt regression happens (and it will), you can roll back in seconds rather than digging through git blame across twelve files.

Reuse. Common fragments like system instructions, output format specifications, and safety guardrails get defined once and composed into larger prompts. No more copy-pasting the same JSON formatting instruction into thirty different prompts.

Testability. When prompts live in a structured system, you can write automated tests against them. You can validate that required variables are present, that output format instructions are included, and that safety constraints have not been accidentally removed.

Analytics. You can measure which prompt versions produce better results, lower latency, or fewer tokens. Without a library, this kind of measurement is nearly impossible.

Designing a Prompt Template System

The foundation of any prompt library is a template system that separates prompt structure from runtime data. Handlebars is an excellent choice here because it is logic-light, widely understood, and runs on both server and client.

Variable Interpolation

At its simplest, a prompt template replaces placeholders with runtime values:

var Handlebars = require("handlebars");

var template = Handlebars.compile(
  "Summarize the following {{contentType}} in {{language}}:\n\n{{content}}"
);

var prompt = template({
  contentType: "article",
  language: "English",
  content: "The Federal Reserve announced today..."
});

This is straightforward, but real applications need more. You need conditionals, loops, and partials.

Conditional Sections

Prompts often need optional sections depending on context:

var templateSource = [
  "You are a helpful assistant.",
  "{{#if persona}}Adopt the following persona: {{persona}}{{/if}}",
  "",
  "Analyze the following text:",
  "{{content}}",
  "",
  "{{#if outputFormat}}Respond in {{outputFormat}} format.{{/if}}",
  "{{#unless outputFormat}}Respond in plain text.{{/unless}}"
].join("\n");

var template = Handlebars.compile(templateSource);

Iteration for Dynamic Content

When you need to inject lists of examples, tools, or constraints:

var templateSource = [
  "You have access to the following tools:",
  "{{#each tools}}",
  "- {{this.name}}: {{this.description}}",
  "{{/each}}",
  "",
  "User request: {{userMessage}}"
].join("\n");

Organizing Prompts by Domain and Function

Structure matters. A flat directory of prompt files becomes unmanageable fast. Organize by domain first, then by function:

prompts/
  customer-support/
    classify-ticket.v1.hbs
    classify-ticket.v2.hbs
    draft-response.v1.hbs
    escalation-check.v1.hbs
  content/
    summarize.v1.hbs
    translate.v1.hbs
    moderate.v1.hbs
  shared/
    json-output.partial.hbs
    safety-guardrails.partial.hbs
    tone-professional.partial.hbs

Each prompt file contains the full template plus metadata in a front-matter block. The version number is part of the filename so you can see all versions at a glance. Shared partials go in a dedicated directory and get registered globally.

Front Matter in Prompt Files

Every prompt file should carry its own metadata:

---
name: classify-ticket
version: 2
description: Classifies support tickets into categories
model: gpt-4o-mini
temperature: 0.1
maxTokens: 100
requiredParams:
  - ticketSubject
  - ticketBody
  - categories
optionalParams:
  - previousCategory
  - customerTier
---
You are a support ticket classifier.

Classify the following ticket into exactly one of these categories:
{{#each categories}}
- {{this}}
{{/each}}

Subject: {{ticketSubject}}
Body: {{ticketBody}}

{{#if previousCategory}}Previously classified as: {{previousCategory}}{{/if}}
{{#if customerTier}}Customer tier: {{customerTier}}{{/if}}

Respond with ONLY the category name, nothing else.

Storage Strategies: Files vs. Database vs. Code

There are three common approaches to storing prompts, and each has trade-offs.

File-Based Storage

Files are the simplest and most developer-friendly option. Prompts live in version control alongside your code. Developers can review prompt changes in pull requests. The downside is that updating a prompt requires a deployment.

This is my recommended starting point. Most teams overthink this. File-based storage with git gives you versioning, review, and rollback for free. You only need a database when non-engineers need to edit prompts or when you need to change prompts without deploying.

Database Storage

Storing prompts in a database (PostgreSQL, MongoDB, or even a key-value store) lets you update prompts without redeploying your application. This is useful when product managers or prompt engineers need to iterate quickly. The downside is that you lose git-based review and need to build your own versioning, audit trail, and rollback mechanism.
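If you go the database route, the rows themselves must carry the versioning that git gave you for free. The sketch below shows the minimal shape in memory (the row layout and the function names `activateVersion` and `getActiveTemplate` are illustrative, not a fixed schema): rows are immutable once written, and rollback is just activating an older version.

```javascript
// Each row is written once and never mutated; "activating" a version
// is a separate operation, which preserves a full audit trail.
var promptRows = [
  { name: "summarize", version: 1, template: "Summarize: {{content}}", active: false },
  { name: "summarize", version: 2, template: "Summarize briefly: {{content}}", active: true }
];

function activateVersion(rows, name, version) {
  var target = rows.filter(function(r) {
    return r.name === name && r.version === version;
  })[0];
  if (!target) {
    throw new Error("No such version: " + name + " v" + version);
  }
  rows.forEach(function(r) {
    if (r.name === name) r.active = (r.version === version);
  });
  return target;
}

function getActiveTemplate(rows, name) {
  var active = rows.filter(function(r) {
    return r.name === name && r.active;
  })[0];
  if (!active) throw new Error("No active version for: " + name);
  return active.template;
}

// Rolling back is just activating the older version
activateVersion(promptRows, "summarize", 1);
console.log(getActiveTemplate(promptRows, "summarize"));
// Summarize: {{content}}
```

In a real database you would enforce the same invariants with a unique index on (name, version) and a transaction around the activation update.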

In-Code Storage

Keeping prompts as string constants in your JavaScript files is the worst option for anything beyond a prototype. It tangles prompt content with application logic, makes review difficult, and encourages copy-paste duplication. Avoid this.

Prompt Versioning and Migration

Every prompt should have an explicit version number. Never overwrite a prompt in place. Instead, create a new version and update the routing configuration to point to it.

// prompt-registry.js
var registry = {
  "classify-ticket": {
    active: 2,
    versions: {
      1: "customer-support/classify-ticket.v1.hbs",
      2: "customer-support/classify-ticket.v2.hbs"
    }
  },
  "summarize": {
    active: 1,
    versions: {
      1: "content/summarize.v1.hbs"
    }
  }
};

module.exports = registry;

When you need to roll back, you change a single number in the registry. The old version is still on disk, tested, and ready to go. This is dramatically safer than trying to revert a prompt change buried in application code.
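A small resolver keeps the rest of the codebase ignorant of version numbers entirely. This is a sketch against the registry shape above; the `pinnedVersion` override (for pinning a caller to a specific version during testing) is an assumption, not part of the registry itself:

```javascript
// Registry shape copied from prompt-registry.js above
var registry = {
  "classify-ticket": {
    active: 2,
    versions: {
      1: "customer-support/classify-ticket.v1.hbs",
      2: "customer-support/classify-ticket.v2.hbs"
    }
  }
};

// Resolve the template file for a prompt: the active version by
// default, or an explicitly pinned version when given.
function resolveActiveFile(promptName, pinnedVersion) {
  var entry = registry[promptName];
  if (!entry) throw new Error("Unknown prompt: " + promptName);
  var version = pinnedVersion || entry.active;
  var file = entry.versions[version];
  if (!file) throw new Error("Unknown version " + version + " for " + promptName);
  return file;
}

console.log(resolveActiveFile("classify-ticket"));
// customer-support/classify-ticket.v2.hbs
console.log(resolveActiveFile("classify-ticket", 1));
// customer-support/classify-ticket.v1.hbs
```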

Migration Strategy

When a new prompt version changes the required parameters, you need a migration path. The registry should track parameter compatibility:

var registry = {
  "classify-ticket": {
    active: 2,
    versions: {
      1: {
        file: "customer-support/classify-ticket.v1.hbs",
        params: ["ticketSubject", "ticketBody"],
        deprecated: true
      },
      2: {
        file: "customer-support/classify-ticket.v2.hbs",
        params: ["ticketSubject", "ticketBody", "categories"],
        migratedFrom: 1,
        migrationNotes: "Added required categories parameter"
      }
    }
  }
};

A/B Testing Different Prompt Versions

Once you have versioning, A/B testing is a natural extension. You split traffic between two prompt versions and measure which performs better.

function selectVersion(promptName, sessionId) {
  var config = abTestConfig[promptName];
  if (!config || !config.enabled) {
    return registry[promptName].active;
  }

  // Deterministic assignment based on session ID
  var hash = 0;
  for (var i = 0; i < sessionId.length; i++) {
    hash = ((hash << 5) - hash) + sessionId.charCodeAt(i);
    hash = hash & hash; // Convert to 32-bit integer
  }
  var bucket = Math.abs(hash) % 100;

  if (bucket < config.trafficPercent) {
    return config.challengerVersion;
  }
  return config.controlVersion;
}

The key insight is deterministic assignment. A given session ID always gets the same prompt version. This prevents confusing behavior where the same user sees different response styles across requests.

Prompt Composition

Real-world prompts are rarely monolithic. They are assembled from reusable fragments. A customer support response prompt might compose a persona fragment, a tone fragment, a safety guardrail fragment, and a format specification fragment.

var Handlebars = require("handlebars");
var fs = require("fs");
var path = require("path");

function registerPartials(partialsDir) {
  var files = fs.readdirSync(partialsDir);
  files.forEach(function(file) {
    if (file.endsWith(".partial.hbs")) {
      var name = file.replace(".partial.hbs", "");
      var content = fs.readFileSync(path.join(partialsDir, file), "utf8");
      Handlebars.registerPartial(name, content);
    }
  });
}

// Now prompts can reference shared fragments:
// {{> safety-guardrails}}
// {{> json-output}}
// {{> tone-professional}}

A prompt file using composition looks like this:

{{> tone-professional}}

You are a customer support agent for {{companyName}}.

{{> safety-guardrails}}

The customer wrote:
{{customerMessage}}

Draft a helpful response that addresses their concern.

{{> json-output}}

This approach keeps individual prompt files focused and small while maintaining consistency across your entire prompt surface area.

Type-Safe Prompt Parameters with Validation

When a prompt expects certain parameters, you should validate them before rendering. Nothing is worse than sending a half-rendered prompt to an LLM because a variable was undefined.

var yaml = require("js-yaml");

function parsePromptFile(content) {
  var parts = content.split("---");
  if (parts.length < 3) {
    throw new Error("Prompt file missing front matter");
  }

  var metadata = yaml.load(parts[1]);
  var template = parts.slice(2).join("---").trim();

  return {
    metadata: metadata,
    template: template
  };
}

function validateParams(metadata, params) {
  var errors = [];
  var required = metadata.requiredParams || [];

  required.forEach(function(param) {
    if (params[param] === undefined || params[param] === null) {
      errors.push("Missing required parameter: " + param);
    }
  });

  // Check for unexpected parameters (catches typos)
  var allKnown = required.concat(metadata.optionalParams || []);
  Object.keys(params).forEach(function(key) {
    if (allKnown.indexOf(key) === -1) {
      errors.push("Unknown parameter: " + key + " (did you mean one of: " + allKnown.join(", ") + "?)");
    }
  });

  if (errors.length > 0) {
    var err = new Error("Prompt parameter validation failed:\n" + errors.join("\n"));
    err.validationErrors = errors;
    throw err;
  }
}

This validation layer catches bugs early. I cannot count the number of times a typo in a parameter name has caused a prompt to silently render with a blank where critical context should have been.

Prompt Testing and Quality Assurance

Prompts deserve tests just like your application code. There are three levels of prompt testing:

Structural Tests

Validate that templates compile, required parameters are documented, and partials resolve:

var assert = require("assert");

function testPromptStructure(promptName) {
  var prompt = library.getPrompt(promptName);

  // Template compiles without error
  assert.doesNotThrow(function() {
    Handlebars.compile(prompt.template);
  }, "Template should compile");

  // Metadata is complete
  assert.ok(prompt.metadata.name, "Must have a name");
  assert.ok(prompt.metadata.version, "Must have a version");
  assert.ok(prompt.metadata.model, "Must specify a model");

  // Required params are documented
  assert.ok(
    Array.isArray(prompt.metadata.requiredParams),
    "Must list required parameters"
  );
}

Rendering Tests

Verify that prompts render correctly with sample data:

function testPromptRendering(promptName, sampleParams, expectations) {
  var rendered = library.render(promptName, sampleParams);

  expectations.forEach(function(expectation) {
    assert.ok(
      rendered.indexOf(expectation.contains) !== -1,
      "Rendered prompt should contain: " + expectation.contains
    );
  });

  // Verify no unresolved Handlebars expressions
  assert.ok(
    rendered.indexOf("{{") === -1,
    "Rendered prompt should not contain unresolved template expressions"
  );
}

Output Quality Tests

Run the prompt against the LLM with known inputs and validate the output structure:

function testPromptOutput(promptName, testCases) {
  return Promise.all(testCases.map(function(testCase) {
    return library.execute(promptName, testCase.params)
      .then(function(result) {
        testCase.assertions.forEach(function(assertion) {
          assertion(result);
        });
      });
  }));
}

Output quality tests are expensive (they cost real API calls), so run them on a schedule rather than on every commit. A nightly CI job that validates your top twenty prompts is a reasonable compromise.

Sharing Prompts Across Microservices

In a microservices architecture, multiple services often need the same prompts. There are two practical approaches.

Shared NPM Package

Extract your prompt library into a private NPM package. Each service installs it as a dependency. Updates go through the normal package release cycle.

// In your shared package: @yourorg/prompt-library
var PromptLibrary = require("./lib/prompt-library");
var prompts = require("./prompts");

module.exports = new PromptLibrary(prompts);

// In each service
var promptLib = require("@yourorg/prompt-library");

var prompt = promptLib.render("classify-ticket", {
  ticketSubject: subject,
  ticketBody: body,
  categories: categories
});

Prompt Service

For larger organizations, a dedicated prompt service provides prompts over HTTP. Services request rendered prompts by name and parameters. This centralizes prompt management and allows real-time updates without redeploying consumers.

// Prompt service endpoint
app.post("/api/prompts/:name/render", function(req, res) {
  try {
    var rendered = library.render(req.params.name, req.body.params);
    res.json({
      prompt: rendered,
      model: library.getMetadata(req.params.name).model,
      version: library.getActiveVersion(req.params.name)
    });
  } catch (err) {
    res.status(400).json({ error: err.message });
  }
});

I lean toward the NPM package approach for most teams. The prompt service approach adds a network dependency to every LLM call, which is an availability risk you need to carefully consider.

Prompt Analytics

You cannot improve what you do not measure. Track these metrics for every prompt execution:

  • Prompt name and version used
  • Token count (input and output)
  • Latency from send to first token and total
  • Success/failure (did parsing succeed, did the LLM refuse, etc.)
  • User feedback if available (thumbs up/down, corrections)
  • Cost calculated from token counts and model pricing

function trackPromptUsage(promptName, version, metrics) {
  var record = {
    promptName: promptName,
    version: version,
    timestamp: new Date().toISOString(),
    inputTokens: metrics.inputTokens,
    outputTokens: metrics.outputTokens,
    latencyMs: metrics.latencyMs,
    success: metrics.success,
    errorType: metrics.errorType || null,
    abTestGroup: metrics.abTestGroup || null,
    sessionId: metrics.sessionId || null
  };

  // Write to your analytics store
  analyticsStore.insert("prompt_usage", record);
}

Over time, this data tells you which prompts are expensive, which are slow, and which are failing. When you run A/B tests, this same data pipeline provides the comparison metrics.
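A sketch of that comparison step, aggregating raw usage records by A/B group (field names follow the trackPromptUsage record above; the `compareAbGroups` helper is illustrative):

```javascript
// Aggregate per-group counts, failure rate, and average latency
// from raw prompt_usage records.
function compareAbGroups(records) {
  var groups = {};
  records.forEach(function(r) {
    if (!r.abTestGroup) return;
    var g = groups[r.abTestGroup] || (groups[r.abTestGroup] = {
      count: 0, failures: 0, totalLatencyMs: 0
    });
    g.count += 1;
    g.totalLatencyMs += r.latencyMs;
    if (!r.success) g.failures += 1;
  });

  Object.keys(groups).forEach(function(name) {
    var g = groups[name];
    g.avgLatencyMs = Math.round(g.totalLatencyMs / g.count);
    g.failureRate = g.failures / g.count;
  });
  return groups;
}

var report = compareAbGroups([
  { abTestGroup: "control", latencyMs: 900, success: true },
  { abTestGroup: "control", latencyMs: 1100, success: false },
  { abTestGroup: "challenger", latencyMs: 700, success: true }
]);
console.log(report.control.avgLatencyMs, report.challenger.failureRate);
// 1000 0
```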

Internationalization of Prompts

If your application serves multiple languages, your prompts need to handle i18n. There are two strategies:

Language as a Parameter

The simplest approach includes the target language as a parameter and instructs the LLM to respond accordingly:

{{> safety-guardrails}}

Respond in {{language}}.

{{#if languageInstructions}}
{{languageInstructions}}
{{/if}}

User query: {{userQuery}}

This works well for models with strong multilingual capabilities but gives you less control over the exact phrasing of system instructions.

Separate Prompt Files per Language

For tighter control, maintain localized versions of prompts:

prompts/
  customer-support/
    classify-ticket.v1.en.hbs
    classify-ticket.v1.es.hbs
    classify-ticket.v1.ja.hbs

The library resolves prompts with a locale fallback chain:

function resolvePromptFile(promptName, version, locale) {
  var base = registry[promptName].versions[version].file;
  var dir = path.dirname(base);
  var ext = path.extname(base);
  var name = path.basename(base, ext);

  // Try exact locale (e.g., "en-US")
  var localized = path.join(dir, name + "." + locale + ext);
  if (fs.existsSync(path.join(promptsDir, localized))) {
    return localized;
  }

  // Try language only (e.g., "en")
  var lang = locale.split("-")[0];
  localized = path.join(dir, name + "." + lang + ext);
  if (fs.existsSync(path.join(promptsDir, localized))) {
    return localized;
  }

  // Fall back to default (no locale suffix)
  return base;
}

I recommend starting with the parameter approach and only moving to separate files when you find that system instructions need significant localization beyond what the LLM can handle natively.

Complete Working Example

Here is a full prompt library module that ties everything together. It supports file-based storage, Handlebars templating, version management, A/B testing, and usage analytics.

// prompt-library.js
var fs = require("fs");
var path = require("path");
var Handlebars = require("handlebars");
var yaml = require("js-yaml");
var crypto = require("crypto");

function PromptLibrary(options) {
  options = options || {};
  this.promptsDir = options.promptsDir || path.join(__dirname, "prompts");
  this.partialsDir = options.partialsDir || path.join(this.promptsDir, "shared");
  this.registry = {};
  this.compiled = {};
  this.analytics = [];
  this.abTests = {};

  this._loadPartials();
  this._loadPrompts();
}

PromptLibrary.prototype._loadPartials = function() {
  var self = this;
  if (!fs.existsSync(this.partialsDir)) return;

  var files = fs.readdirSync(this.partialsDir);
  files.forEach(function(file) {
    if (file.endsWith(".partial.hbs")) {
      var name = file.replace(".partial.hbs", "");
      var content = fs.readFileSync(
        path.join(self.partialsDir, file),
        "utf8"
      );
      Handlebars.registerPartial(name, content);
    }
  });
};

PromptLibrary.prototype._loadPrompts = function() {
  var self = this;
  var dirs = this._getPromptDirs(this.promptsDir);

  dirs.forEach(function(dir) {
    var files = fs.readdirSync(dir);
    files.forEach(function(file) {
      if (!file.endsWith(".hbs") || file.endsWith(".partial.hbs")) return;

      var match = file.match(/^(.+)\.v(\d+)(?:\.(\w+))?\.hbs$/);
      if (!match) return;

      var promptName = match[1];
      var version = parseInt(match[2], 10);
      var locale = match[3] || "default";
      var filePath = path.join(dir, file);
      var content = fs.readFileSync(filePath, "utf8");
      var parsed = self._parsePromptFile(content);

      if (!self.registry[promptName]) {
        self.registry[promptName] = {
          active: version,
          versions: {}
        };
      }

      if (!self.registry[promptName].versions[version]) {
        self.registry[promptName].versions[version] = {
          locales: {},
          metadata: parsed.metadata
        };
      }

      self.registry[promptName].versions[version].locales[locale] =
        parsed.template;

      // Update active to highest version
      if (version > self.registry[promptName].active) {
        self.registry[promptName].active = version;
      }
    });
  });
};

PromptLibrary.prototype._getPromptDirs = function(baseDir) {
  var dirs = [baseDir];
  var entries = fs.readdirSync(baseDir, { withFileTypes: true });
  var self = this;

  entries.forEach(function(entry) {
    if (entry.isDirectory() && entry.name !== "shared") {
      dirs = dirs.concat(
        self._getPromptDirs(path.join(baseDir, entry.name))
      );
    }
  });

  return dirs;
};

PromptLibrary.prototype._parsePromptFile = function(content) {
  var parts = content.split("---");
  if (parts.length < 3) {
    return { metadata: {}, template: content.trim() };
  }

  var metadata = yaml.load(parts[1]);
  var template = parts.slice(2).join("---").trim();

  return { metadata: metadata, template: template };
};

PromptLibrary.prototype.render = function(promptName, params, options) {
  options = options || {};
  var locale = options.locale || "default";
  var sessionId = options.sessionId || null;

  var version = this._resolveVersion(promptName, sessionId);
  var versionData = this.registry[promptName].versions[version];

  if (!versionData) {
    throw new Error(
      "Prompt not found: " + promptName + " v" + version
    );
  }

  // Validate parameters
  this._validateParams(versionData.metadata, params);

  // Resolve locale
  var templateSource = versionData.locales[locale]
    || versionData.locales[locale.split("-")[0]]
    || versionData.locales["default"];

  if (!templateSource) {
    throw new Error(
      "No template found for prompt: " + promptName +
      " v" + version + " locale: " + locale
    );
  }

  // Compile (with caching)
  var cacheKey = promptName + ":" + version + ":" + locale;
  if (!this.compiled[cacheKey]) {
    this.compiled[cacheKey] = Handlebars.compile(templateSource);
  }

  var startTime = Date.now();
  var rendered = this.compiled[cacheKey](params);
  var renderTime = Date.now() - startTime;

  // Track usage
  this._trackUsage(promptName, version, {
    renderTimeMs: renderTime,
    locale: locale,
    sessionId: sessionId,
    abTestGroup: this._getAbTestGroup(promptName, version)
  });

  return {
    prompt: rendered,
    metadata: versionData.metadata,
    version: version,
    promptName: promptName
  };
};

PromptLibrary.prototype._resolveVersion = function(promptName, sessionId) {
  var entry = this.registry[promptName];
  if (!entry) {
    throw new Error("Unknown prompt: " + promptName);
  }

  var abConfig = this.abTests[promptName];
  if (!abConfig || !abConfig.enabled || !sessionId) {
    return entry.active;
  }

  // Deterministic bucket assignment
  var hash = crypto
    .createHash("md5")
    .update(sessionId + ":" + promptName)
    .digest("hex");
  var bucket = parseInt(hash.substring(0, 8), 16) % 100;

  if (bucket < abConfig.trafficPercent) {
    return abConfig.challengerVersion;
  }
  return abConfig.controlVersion;
};

PromptLibrary.prototype._validateParams = function(metadata, params) {
  if (!metadata || !metadata.requiredParams) return;

  var errors = [];
  metadata.requiredParams.forEach(function(param) {
    if (params[param] === undefined || params[param] === null) {
      errors.push("Missing required parameter: " + param);
    }
  });

  var allKnown = (metadata.requiredParams || [])
    .concat(metadata.optionalParams || []);

  Object.keys(params).forEach(function(key) {
    if (allKnown.length > 0 && allKnown.indexOf(key) === -1) {
      errors.push(
        "Unknown parameter: " + key +
        " (expected: " + allKnown.join(", ") + ")"
      );
    }
  });

  if (errors.length > 0) {
    var err = new Error(
      "Prompt validation failed for " + metadata.name +
      ":\n" + errors.join("\n")
    );
    err.validationErrors = errors;
    throw err;
  }
};

PromptLibrary.prototype.configureAbTest = function(promptName, config) {
  this.abTests[promptName] = {
    enabled: config.enabled !== false,
    controlVersion: config.controlVersion,
    challengerVersion: config.challengerVersion,
    trafficPercent: config.trafficPercent || 50
  };
};

PromptLibrary.prototype._getAbTestGroup = function(promptName, version) {
  var abConfig = this.abTests[promptName];
  if (!abConfig || !abConfig.enabled) return null;

  if (version === abConfig.controlVersion) return "control";
  if (version === abConfig.challengerVersion) return "challenger";
  return null;
};

PromptLibrary.prototype._trackUsage = function(promptName, version, data) {
  this.analytics.push({
    promptName: promptName,
    version: version,
    timestamp: new Date().toISOString(),
    renderTimeMs: data.renderTimeMs,
    locale: data.locale,
    sessionId: data.sessionId,
    abTestGroup: data.abTestGroup
  });

  // Flush periodically (every 100 records)
  if (this.analytics.length >= 100) {
    this.flush();
  }
};

PromptLibrary.prototype.flush = function() {
  var records = this.analytics.splice(0, this.analytics.length);
  if (records.length === 0) return;

  // Write to file (swap with database insert in production)
  var logFile = path.join(this.promptsDir, "..", "prompt-analytics.jsonl");
  var lines = records.map(function(r) {
    return JSON.stringify(r);
  }).join("\n") + "\n";

  fs.appendFileSync(logFile, lines, "utf8");
};

PromptLibrary.prototype.getAnalyticsSummary = function(promptName) {
  var records = this.analytics.filter(function(r) {
    return r.promptName === promptName;
  });

  var byVersion = {};
  records.forEach(function(r) {
    if (!byVersion[r.version]) {
      byVersion[r.version] = { count: 0, totalRenderMs: 0 };
    }
    byVersion[r.version].count += 1;
    byVersion[r.version].totalRenderMs += r.renderTimeMs;
  });

  var summary = {};
  Object.keys(byVersion).forEach(function(v) {
    var data = byVersion[v];
    summary[v] = {
      count: data.count,
      avgRenderMs: Math.round(data.totalRenderMs / data.count)
    };
  });

  return summary;
};

PromptLibrary.prototype.listPrompts = function() {
  var self = this;
  return Object.keys(this.registry).map(function(name) {
    var entry = self.registry[name];
    return {
      name: name,
      activeVersion: entry.active,
      versions: Object.keys(entry.versions).map(Number),
      abTestEnabled: !!(self.abTests[name] && self.abTests[name].enabled)
    };
  });
};

module.exports = PromptLibrary;

Using the Library

var path = require("path");
var PromptLibrary = require("./prompt-library");

var library = new PromptLibrary({
  promptsDir: path.join(__dirname, "prompts")
});

// Basic rendering
var result = library.render("classify-ticket", {
  ticketSubject: "Cannot access my account",
  ticketBody: "I have been locked out since yesterday...",
  categories: ["billing", "account-access", "technical", "general"]
});

console.log(result.prompt);
console.log("Using model:", result.metadata.model);
console.log("Version:", result.version);

// With A/B testing
library.configureAbTest("classify-ticket", {
  controlVersion: 1,
  challengerVersion: 2,
  trafficPercent: 20
});

// 20% of sessions get v2, 80% get v1
var abResult = library.render("classify-ticket", params /* same params as above */, {
  sessionId: "user-abc-123"
});

// Check analytics
var summary = library.getAnalyticsSummary("classify-ticket");
console.log("Usage by version:", summary);

Integration with Express

var express = require("express");
var path = require("path");
var PromptLibrary = require("./prompt-library");
var OpenAI = require("openai");

var app = express();
app.use(express.json());
var library = new PromptLibrary({
  promptsDir: path.join(__dirname, "prompts")
});
var openai = new OpenAI();

app.post("/api/classify-ticket", function(req, res) {
  try {
    var result = library.render("classify-ticket", {
      ticketSubject: req.body.subject,
      ticketBody: req.body.body,
      categories: ["billing", "account-access", "technical", "general"]
    }, {
      sessionId: req.sessionID
    });

    openai.chat.completions.create({
      model: result.metadata.model || "gpt-4o-mini",
      temperature: result.metadata.temperature || 0.1,
      max_tokens: result.metadata.maxTokens || 100,
      messages: [{ role: "user", content: result.prompt }]
    }).then(function(completion) {
      var category = completion.choices[0].message.content.trim();
      res.json({
        category: category,
        promptVersion: result.version
      });
    }).catch(function(err) {
      res.status(500).json({ error: "Classification failed" });
    });
  } catch (err) {
    if (err.validationErrors) {
      res.status(400).json({ error: err.message });
    } else {
      res.status(500).json({ error: "Internal error" });
    }
  }
});

Common Issues and Troubleshooting

1. Unresolved Template Variables

Symptom: Your prompt contains literal {{variableName}} text instead of the actual value.

Error: Rendered prompt contains unresolved expressions
Found: {{customerTier}} in output

Cause: The parameter name does not match the template variable. Handlebars silently renders missing variables as empty strings by default, but if you have strict mode enabled or a post-render check, you will catch this.

Fix: Enable strict mode in Handlebars to get early errors:

var template = Handlebars.compile(source, { strict: true });
// Throws: "customerTier" not defined in [object Object]

2. Partial Not Found

Symptom:

Error: The partial safety-guardrails could not be found
    at Object.compile (handlebars/compiler.js:286:19)

Cause: Partials must be registered before compiling templates that reference them. If your partial directory path is wrong or the partial file naming convention does not match, registration silently skips the file.

Fix: Verify the partials directory exists and files end with .partial.hbs. Add logging to your partial loader to confirm each registration.

3. YAML Front Matter Parsing Failure

Symptom:

YAMLException: bad indentation of a mapping entry at line 5, column 3:
      - ticketSubject
      ^

Cause: YAML is whitespace-sensitive. Mixing tabs and spaces in your front matter block or incorrect list indentation causes parse failures.

Fix: Use consistent two-space indentation in all front matter blocks. Run prompt files through a YAML linter as part of your CI pipeline.

4. A/B Test Producing Skewed Distribution

Symptom: You configured 50/50 traffic split, but analytics show 70/30 distribution.

Cause: The hash function distributes unevenly when session IDs share common prefixes (e.g., auto-incrementing user IDs like user-1, user-2, user-3).

Fix: Use a stronger hash function (MD5 or SHA-256) rather than a simple character-code hash. The complete example above uses crypto.createHash("md5") for this reason. Also, ensure you are seeding the hash with both the session ID and the prompt name to avoid correlated assignments across different prompts.

5. Memory Growth from Compiled Template Cache

Symptom: Application memory usage grows steadily over time.

FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory

Cause: If you are dynamically generating cache keys (e.g., including user-specific data), the compiled template cache grows unbounded.

Fix: Cache keys should only include prompt name, version, and locale. Never include runtime data in cache keys. If you need to invalidate the cache, implement an LRU eviction strategy or simply restart the process on prompt file changes.
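If you do need eviction, a bounded LRU can be sketched in a few lines using a Map, which preserves insertion order, so the first key is always the least recently used (the `LruCache` name and size are illustrative):

```javascript
// Minimal LRU cache suitable for the compiled-template cache
function LruCache(maxSize) {
  this.maxSize = maxSize;
  this.map = new Map();
}

LruCache.prototype.get = function(key) {
  if (!this.map.has(key)) return undefined;
  var value = this.map.get(key);
  this.map.delete(key);       // re-insert to mark as most recently used
  this.map.set(key, value);
  return value;
};

LruCache.prototype.set = function(key, value) {
  if (this.map.has(key)) this.map.delete(key);
  this.map.set(key, value);
  if (this.map.size > this.maxSize) {
    // Evict the least recently used entry (the Map's first key)
    this.map.delete(this.map.keys().next().value);
  }
};

var cache = new LruCache(2);
cache.set("classify-ticket:1:default", "t1");
cache.set("classify-ticket:2:default", "t2");
cache.get("classify-ticket:1:default");      // touch v1
cache.set("summarize:1:default", "t3");      // evicts v2, the LRU entry
console.log(cache.get("classify-ticket:2:default"));
// undefined
```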

Best Practices

  • Never hardcode prompts in application code. Even a single inline prompt string will multiply. Start with a library from day one of your LLM integration.

  • Version every prompt change. Create a new version file for every meaningful edit. The storage cost is negligible; the ability to roll back is invaluable.

  • Validate parameters before rendering. A prompt rendered with missing context is worse than an error. It produces plausible but wrong LLM output that is hard to detect downstream.

  • Include the prompt version in all API responses and logs. When a user reports a bad response, you need to know exactly which prompt version generated it. This is non-negotiable for debugging.

  • Test prompts in CI. At minimum, verify that every prompt compiles, all required parameters are documented, and sample renderings produce expected output. Run LLM-based quality tests nightly.

  • Keep prompts readable. A prompt template should be understandable by a non-engineer. Avoid deeply nested Handlebars logic. If your template needs complex conditionals, the prompt design probably needs simplification.

  • Separate prompt content from model configuration. The template defines what to say. The metadata defines which model to use, what temperature to set, and how many tokens to allow. Keep them together in the same file but structurally distinct.

  • Measure everything. Track token usage, latency, and success rate per prompt per version. Without metrics, prompt optimization is guesswork.

  • Use composition over duplication. If the same instruction appears in more than two prompts, extract it into a partial. Safety guardrails, output format instructions, and persona definitions are almost always worth extracting.

  • Plan for prompt-model coupling. A prompt optimized for GPT-4o may perform poorly on Claude or Gemini. When you switch models, treat it as a new prompt version and test accordingly.
