
Error Handling for Production AI Systems

Build robust error handling for AI systems with structured errors, graceful degradation, retry strategies, and monitoring in Node.js.


Overview

AI systems fail differently than traditional software. A database query either returns rows or it does not. An API call either responds or it times out. But an AI model can return a perfectly valid HTTP 200 response containing output that is completely wrong, subtly biased, or dangerously hallucinated. Building production-grade error handling for AI features means rethinking what "failure" means and designing systems that degrade gracefully across a spectrum of failure modes. This article walks through the architecture, patterns, and concrete Node.js code you need to handle every category of AI error in production.

Prerequisites

  • Working knowledge of Node.js and Express.js
  • Basic familiarity with calling LLM APIs (OpenAI, Anthropic, etc.)
  • Understanding of middleware patterns in Express
  • Experience running Node.js applications in production

Unique Error Characteristics of AI Systems

Before writing a single line of error handling code, you need to understand why AI errors are fundamentally different from traditional software errors.

Non-Deterministic Failures

The same input to the same model can produce different outputs on every call. This means you cannot reliably reproduce failures. A prompt that worked perfectly in testing might produce garbage in production because the model's sampling introduced a different token path. Your error handling must account for the fact that retrying the exact same request might succeed, something a deterministic bug in traditional software will never do.

Partial Results

An AI model might return a response that is 80% correct. A summarization endpoint might produce a good summary but include one fabricated statistic. A classification endpoint might correctly identify 9 out of 10 categories but hallucinate the tenth. Traditional error handling gives you binary outcomes. AI error handling requires you to evaluate quality on a continuum.

Quality Degradation Under Load

When an AI provider is under heavy load, you often do not get a hard failure. Instead, you get slower responses with subtly lower quality. The model might take shortcuts, produce more generic output, or truncate its reasoning. These soft failures are harder to detect than a 500 status code.

Cascading Prompt Failures

If your system chains multiple AI calls together, a subtle error in step one can amplify through subsequent steps. A slightly off entity extraction feeds into a classification step, which feeds into a generation step, and by the end you have confidently wrong output with no obvious error signal.

Error Taxonomy for AI Features

Every AI error you encounter in production falls into one of these categories. Classifying errors correctly determines how you handle them.

API Errors (Retriable)

Standard HTTP failures from the AI provider. These include 429 (rate limit), 500 (server error), 502 (bad gateway), 503 (service unavailable), and network timeouts. These are the easiest to handle because they look like traditional API errors.

Content Filter Errors (Non-Retriable)

The AI provider rejected your request because the input or output triggered a safety filter. Retrying will produce the same result. You need to either modify the input, use a different model, or fall back to a non-AI path.

Token Limit Errors (Partially Retriable)

Your input exceeded the model's context window, or the output was truncated because it hit the max token limit. Input-side errors are retriable if you can truncate or chunk the input. Output truncation requires either increasing the limit or splitting the task.

Quality Errors (Detected Post-Response)

The model returned a valid response, but the content fails your quality checks. This includes hallucinated data, off-topic responses, responses that ignore instructions, or output that fails structural validation (expected JSON but got prose). These require validation logic beyond HTTP status codes.

Timeout Errors (Retriable with Backoff)

AI inference can be slow, especially for large models or complex prompts. A request that takes 45 seconds is not necessarily failing. It might just be a complex generation. Your timeout thresholds need to be much more generous than traditional API calls, and your retry strategy needs to account for this.

Cost Errors (Non-Retriable)

You have exhausted your spending limit, your billing failed, or a runaway loop is burning through tokens. These require immediate circuit breaking, not retries.

Implementing Structured Error Types

Generic Error objects are not sufficient for AI error handling. You need structured types that carry classification metadata.

var util = require("util");

// Base AI error type
function AIError(message, options) {
  Error.call(this, message);
  Error.captureStackTrace(this, AIError); // capture a usable stack trace on V8
  this.name = "AIError";
  this.message = message;
  this.code = options.code || "AI_UNKNOWN_ERROR";
  this.retriable = options.retriable !== undefined ? options.retriable : false;
  this.category = options.category || "unknown";
  this.provider = options.provider || "unknown";
  this.model = options.model || "unknown";
  this.promptId = options.promptId || null;
  this.inputTokens = options.inputTokens || 0;
  this.outputTokens = options.outputTokens || 0;
  this.latencyMs = options.latencyMs || 0;
  this.timestamp = new Date().toISOString();
  this.originalError = options.originalError || null;
}
util.inherits(AIError, Error);

// Specific error subtypes
function AIRateLimitError(message, options) {
  options.code = "AI_RATE_LIMIT";
  options.retriable = true;
  options.category = "api";
  AIError.call(this, message, options);
  this.name = "AIRateLimitError";
  this.retryAfterMs = options.retryAfterMs || 60000;
}
util.inherits(AIRateLimitError, AIError);

function AIContentFilterError(message, options) {
  options.code = "AI_CONTENT_FILTER";
  options.retriable = false;
  options.category = "content_filter";
  AIError.call(this, message, options);
  this.name = "AIContentFilterError";
  this.filterType = options.filterType || "unknown";
  this.triggeredBy = options.triggeredBy || "unknown"; // "input" or "output"
}
util.inherits(AIContentFilterError, AIError);

function AITokenLimitError(message, options) {
  options.code = "AI_TOKEN_LIMIT";
  options.retriable = true;
  options.category = "token_limit";
  AIError.call(this, message, options);
  this.name = "AITokenLimitError";
  this.tokenLimit = options.tokenLimit || 0;
  this.tokensUsed = options.tokensUsed || 0;
}
util.inherits(AITokenLimitError, AIError);

function AIQualityError(message, options) {
  options.code = "AI_QUALITY_FAILURE";
  options.retriable = true;
  options.category = "quality";
  AIError.call(this, message, options);
  this.name = "AIQualityError";
  this.qualityScore = options.qualityScore || 0;
  this.qualityThreshold = options.qualityThreshold || 0;
  this.failedChecks = options.failedChecks || [];
}
util.inherits(AIQualityError, AIError);

function AITimeoutError(message, options) {
  options.code = "AI_TIMEOUT";
  options.retriable = true;
  options.category = "timeout";
  AIError.call(this, message, options);
  this.name = "AITimeoutError";
  this.timeoutMs = options.timeoutMs || 0;
}
util.inherits(AITimeoutError, AIError);

module.exports = {
  AIError: AIError,
  AIRateLimitError: AIRateLimitError,
  AIContentFilterError: AIContentFilterError,
  AITokenLimitError: AITokenLimitError,
  AIQualityError: AIQualityError,
  AITimeoutError: AITimeoutError
};

Error Classification From Provider Responses

Raw provider responses need to be translated into your structured error types. Here is a classifier that handles OpenAI and Anthropic response formats.

var errors = require("./ai-errors");

function classifyProviderError(err, context) {
  var status = err.status || err.statusCode || 0;
  var message = err.message || "";
  var errorCode = err.error && err.error.code ? err.error.code : "";
  var errorType = err.error && err.error.type ? err.error.type : "";

  var baseOptions = {
    provider: context.provider,
    model: context.model,
    promptId: context.promptId,
    latencyMs: context.latencyMs || 0,
    originalError: err
  };

  // Rate limiting
  if (status === 429) {
    var retryAfter = err.headers && err.headers["retry-after"]
      ? parseInt(err.headers["retry-after"], 10) * 1000
      : 60000;
    baseOptions.retryAfterMs = retryAfter;
    return new errors.AIRateLimitError(
      "AI provider rate limit exceeded",
      baseOptions
    );
  }

  // Content filter
  if (errorCode === "content_filter" ||
      errorType === "invalid_request_error" && message.indexOf("content") > -1 ||
      status === 400 && message.indexOf("safety") > -1) {
    baseOptions.filterType = errorCode;
    baseOptions.triggeredBy = "input";
    return new errors.AIContentFilterError(
      "Request blocked by content filter: " + message,
      baseOptions
    );
  }

  // Token limit
  if (errorCode === "context_length_exceeded" ||
      message.indexOf("maximum context length") > -1 ||
      message.indexOf("too many tokens") > -1) {
    var match = message.match(/(\d+)\s*tokens/);
    baseOptions.tokensUsed = match ? parseInt(match[1], 10) : 0;
    return new errors.AITokenLimitError(
      "Input exceeds model context window",
      baseOptions
    );
  }

  // Timeout
  if (err.code === "ETIMEDOUT" || err.code === "ESOCKETTIMEDOUT" ||
      message.indexOf("timeout") > -1) {
    baseOptions.timeoutMs = context.timeoutMs || 0;
    return new errors.AITimeoutError(
      "AI request timed out after " + baseOptions.timeoutMs + "ms",
      baseOptions
    );
  }

  // Server errors (retriable)
  if (status >= 500) {
    baseOptions.retriable = true;
    baseOptions.category = "api";
    return new errors.AIError(
      "AI provider server error: " + status,
      baseOptions
    );
  }

  // Default: unknown error
  return new errors.AIError("Unexpected AI error: " + message, baseOptions);
}

module.exports = { classifyProviderError: classifyProviderError };

User-Facing Error Messages

Technical error details should never leak to users. Map every error category to a helpful, non-alarming message.

var USER_MESSAGES = {
  AI_RATE_LIMIT: {
    title: "High Demand",
    message: "This feature is experiencing high demand. Please try again in a moment.",
    suggestion: "Your request has been queued and will be processed shortly."
  },
  AI_CONTENT_FILTER: {
    title: "Unable to Process",
    message: "We were unable to process your request as submitted.",
    suggestion: "Please revise your input and try again. If this persists, contact support."
  },
  AI_TOKEN_LIMIT: {
    title: "Input Too Long",
    message: "Your input exceeds the maximum length we can process.",
    suggestion: "Try shortening your input or breaking it into smaller sections."
  },
  AI_QUALITY_FAILURE: {
    title: "Processing Issue",
    message: "We could not generate a satisfactory result for your request.",
    suggestion: "Please try again. Results may vary with different phrasing."
  },
  AI_TIMEOUT: {
    title: "Request Timed Out",
    message: "Your request took longer than expected to process.",
    suggestion: "Please try again. Complex requests may need a moment."
  },
  AI_UNKNOWN_ERROR: {
    title: "Something Went Wrong",
    message: "We encountered an unexpected issue processing your request.",
    suggestion: "Please try again. If this persists, contact support."
  }
};

function getUserMessage(aiError) {
  var info = USER_MESSAGES[aiError.code] || USER_MESSAGES.AI_UNKNOWN_ERROR;
  return {
    title: info.title,
    message: info.message,
    suggestion: info.suggestion,
    requestId: aiError.promptId,
    retriable: aiError.retriable
  };
}

module.exports = { getUserMessage: getUserMessage };

Retry Strategies by Error Type

Not all errors deserve a retry, and not all retries should use the same strategy. Here is a retry engine that adapts its behavior based on the error classification.

var RETRY_POLICIES = {
  AI_RATE_LIMIT: { maxRetries: 5, baseDelayMs: 2000, strategy: "exponential_with_jitter" },
  AI_TIMEOUT: { maxRetries: 3, baseDelayMs: 1000, strategy: "linear" },
  AI_QUALITY_FAILURE: { maxRetries: 2, baseDelayMs: 100, strategy: "immediate" },
  AI_TOKEN_LIMIT: { maxRetries: 1, baseDelayMs: 0, strategy: "immediate" },
  AI_CONTENT_FILTER: { maxRetries: 0 },
  AI_UNKNOWN_ERROR: { maxRetries: 1, baseDelayMs: 1000, strategy: "linear" }
};

function calculateDelay(policy, attempt) {
  if (policy.strategy === "immediate") return 0;
  if (policy.strategy === "linear") return policy.baseDelayMs * attempt;
  if (policy.strategy === "exponential_with_jitter") {
    var base = policy.baseDelayMs * Math.pow(2, attempt);
    var jitter = Math.random() * base * 0.3;
    return Math.min(base + jitter, 60000);
  }
  return policy.baseDelayMs;
}

function retryWithPolicy(fn, context, callback) {
  var attempt = 0;

  function execute() {
    fn(context, function(err, result) {
      if (!err) return callback(null, result);

      var errorCode = err.code || "AI_UNKNOWN_ERROR";
      var policy = RETRY_POLICIES[errorCode] || RETRY_POLICIES.AI_UNKNOWN_ERROR;

      if (!err.retriable || attempt >= policy.maxRetries) {
        err.totalAttempts = attempt + 1;
        return callback(err);
      }

      attempt++;
      var delay = calculateDelay(policy, attempt);

      // For rate limits, respect the provider's retry-after header
      if (errorCode === "AI_RATE_LIMIT" && err.retryAfterMs) {
        delay = Math.max(delay, err.retryAfterMs);
      }

      // For quality errors, modify the context to encourage different output
      if (errorCode === "AI_QUALITY_FAILURE") {
        context.temperature = Math.min((context.temperature || 0.7) + 0.1, 1.0);
        context.retryHint = "Previous attempt produced unsatisfactory quality. " +
          "Failed checks: " + (err.failedChecks || []).join(", ");
      }

      setTimeout(execute, delay);
    });
  }

  execute();
}

module.exports = { retryWithPolicy: retryWithPolicy };

Graceful Degradation Hierarchies

This is the most important pattern in AI error handling. When your primary AI path fails, you need a chain of fallbacks that provide progressively simpler but still useful results.

function DegradationChain(name) {
  this.name = name;
  this.levels = [];
  this.metrics = {
    levelHits: {},
    totalRequests: 0
  };
}

DegradationChain.prototype.addLevel = function(label, handler) {
  this.levels.push({ label: label, handler: handler });
  this.metrics.levelHits[label] = 0;
  return this;
};

DegradationChain.prototype.execute = function(input, callback) {
  var self = this;
  var levelIndex = 0;
  var errors = [];
  self.metrics.totalRequests++;

  function tryNext() {
    if (levelIndex >= self.levels.length) {
      return callback({
        code: "AI_ALL_LEVELS_EXHAUSTED",
        message: "All degradation levels failed for " + self.name,
        errors: errors
      });
    }

    var level = self.levels[levelIndex];
    level.handler(input, function(err, result) {
      if (!err && result) {
        self.metrics.levelHits[level.label]++;
        result._degradationLevel = level.label;
        result._degradationIndex = levelIndex;
        return callback(null, result);
      }

      errors.push({ level: level.label, error: err });
      levelIndex++;
      tryNext();
    });
  }

  tryNext();
};

// Example: summarization feature with 4 degradation levels.
// callOpenAI, the prompt builders, and cache are placeholders for your own
// provider wrapper, prompt helpers, and cache client.
var summarizationChain = new DegradationChain("summarization");

summarizationChain
  .addLevel("full_ai", function(input, cb) {
    // Primary: GPT-4 with detailed prompt
    callOpenAI({
      model: "gpt-4",
      prompt: buildDetailedSummaryPrompt(input.text),
      maxTokens: 500
    }, cb);
  })
  .addLevel("simpler_ai", function(input, cb) {
    // Fallback 1: GPT-3.5 with simpler prompt
    callOpenAI({
      model: "gpt-3.5-turbo",
      prompt: buildSimpleSummaryPrompt(input.text),
      maxTokens: 300
    }, cb);
  })
  .addLevel("cached", function(input, cb) {
    // Fallback 2: check if we have a cached summary
    cache.get("summary:" + input.contentId, function(err, cached) {
      if (cached) return cb(null, { summary: cached, fromCache: true });
      cb(new Error("No cached summary available"));
    });
  })
  .addLevel("static_excerpt", function(input, cb) {
    // Fallback 3: return first 200 characters as excerpt
    var excerpt = input.text.substring(0, 200).trim();
    var lastSpace = excerpt.lastIndexOf(" ");
    if (lastSpace > 150) excerpt = excerpt.substring(0, lastSpace);
    cb(null, { summary: excerpt + "...", isExcerpt: true });
  });

Centralized Error Handler for AI Routes

All your AI-powered Express routes should funnel through a single error handler that logs, classifies, and responds consistently.

var errors = require("./ai-errors");
var classifier = require("./error-classifier");
var userMessages = require("./user-messages");

function aiErrorHandler(err, req, res, next) {
  // Already classified?
  var aiErr = err instanceof errors.AIError
    ? err
    : classifier.classifyProviderError(err, {
        provider: req.aiContext && req.aiContext.provider || "unknown",
        model: req.aiContext && req.aiContext.model || "unknown",
        promptId: req.aiContext && req.aiContext.promptId || req.id,
        latencyMs: req.aiContext && req.aiContext.startTime
          ? Date.now() - req.aiContext.startTime
          : 0
      });

  // Log the full error for debugging
  var logEntry = {
    level: aiErr.retriable ? "warn" : "error",
    type: aiErr.name,
    code: aiErr.code,
    category: aiErr.category,
    provider: aiErr.provider,
    model: aiErr.model,
    promptId: aiErr.promptId,
    inputTokens: aiErr.inputTokens,
    outputTokens: aiErr.outputTokens,
    latencyMs: aiErr.latencyMs,
    requestPath: req.path,
    requestMethod: req.method,
    userId: req.user && req.user.id || "anonymous",
    timestamp: aiErr.timestamp
  };

  if (aiErr.retriable) {
    console.warn("[AI Error - Retriable]", JSON.stringify(logEntry));
  } else {
    console.error("[AI Error - Fatal]", JSON.stringify(logEntry));
  }

  // Track metrics
  if (typeof trackMetric === "function") {
    trackMetric("ai.error", 1, {
      code: aiErr.code,
      provider: aiErr.provider,
      model: aiErr.model,
      route: req.path
    });
  }

  // Send user-friendly response
  var userMsg = userMessages.getUserMessage(aiErr);
  var statusCode = getStatusCode(aiErr);

  res.status(statusCode).json({
    error: userMsg,
    status: statusCode
  });
}

function getStatusCode(aiErr) {
  var codeMap = {
    AI_RATE_LIMIT: 429,
    AI_CONTENT_FILTER: 422,
    AI_TOKEN_LIMIT: 413,
    AI_QUALITY_FAILURE: 502,
    AI_TIMEOUT: 504,
    AI_UNKNOWN_ERROR: 500
  };
  return codeMap[aiErr.code] || 500;
}

module.exports = aiErrorHandler;
module.exports.getStatusCode = getStatusCode;

To wire this into your Express app:

var express = require("express");
var aiErrorHandler = require("./middleware/ai-error-handler");
var app = express();

// AI-powered routes
app.use("/api/ai", require("./routes/ai"));

// AI-specific error handler (must come after AI routes)
app.use("/api/ai", aiErrorHandler);

// General error handler for everything else
app.use(function(err, req, res, next) {
  console.error(err);
  res.status(500).json({ error: "Internal server error" });
});

Dead Letter Queues for Failed AI Requests

When all retries are exhausted and all degradation levels have failed, you still should not lose the request. A dead letter queue captures failed requests for later analysis and potential reprocessing.

var fs = require("fs");
var path = require("path");

function DeadLetterQueue(options) {
  this.storePath = options.storePath || "./dead-letters";
  this.maxAge = options.maxAgeMs || 7 * 24 * 60 * 60 * 1000; // 7 days
  this.onEnqueue = options.onEnqueue || function() {};

  if (!fs.existsSync(this.storePath)) {
    fs.mkdirSync(this.storePath, { recursive: true });
  }
}

DeadLetterQueue.prototype.enqueue = function(failedRequest) {
  var entry = {
    id: failedRequest.promptId || generateId(),
    enqueuedAt: new Date().toISOString(),
    route: failedRequest.route,
    input: failedRequest.input,
    errors: failedRequest.errors.map(function(e) {
      return {
        code: e.code,
        message: e.message,
        category: e.category,
        provider: e.provider,
        model: e.model,
        timestamp: e.timestamp
      };
    }),
    userId: failedRequest.userId,
    metadata: failedRequest.metadata || {}
  };

  var filePath = path.join(this.storePath, entry.id + ".json");
  fs.writeFileSync(filePath, JSON.stringify(entry, null, 2));

  this.onEnqueue(entry);
  return entry.id;
};

DeadLetterQueue.prototype.reprocess = function(entryId, handler, callback) {
  var filePath = path.join(this.storePath, entryId + ".json");

  try {
    var entry = JSON.parse(fs.readFileSync(filePath, "utf8"));
    handler(entry, function(err, result) {
      if (!err) {
        // Success: remove from DLQ
        fs.unlinkSync(filePath);
        return callback(null, result);
      }
      // Still failing: update the entry
      entry.lastRetry = new Date().toISOString();
      entry.retryCount = (entry.retryCount || 0) + 1;
      fs.writeFileSync(filePath, JSON.stringify(entry, null, 2));
      callback(err);
    });
  } catch (readErr) {
    callback(new Error("DLQ entry not found: " + entryId));
  }
};

function generateId() {
  return Date.now().toString(36) + Math.random().toString(36).slice(2, 10);
}

module.exports = DeadLetterQueue;
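
To make the hand-off concrete, here is a minimal wiring sketch, assuming the summarization degradation chain from the previous section; the request-scoped variables (articleText, articleId, requestId, currentUserId) are placeholders.

var DeadLetterQueue = require("./dead-letter-queue"); // adjust the path to your layout

var dlq = new DeadLetterQueue({
  storePath: "./dead-letters",
  onEnqueue: function(entry) {
    console.warn("[DLQ] Captured failed AI request " + entry.id);
  }
});

// When every degradation level has failed, capture the request instead of losing it.
summarizationChain.execute({ text: articleText, contentId: articleId }, function(err, result) {
  if (err && err.code === "AI_ALL_LEVELS_EXHAUSTED") {
    dlq.enqueue({
      promptId: requestId,
      route: "/api/ai/summarize",
      input: { text: articleText },
      errors: err.errors.map(function(e) { return e.error || {}; }),
      userId: currentUserId
    });
  }
  // ... respond to the client as usual
});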

Error Rate Monitoring and Automatic Circuit Breaking

If your AI provider is having an outage, you should not keep hammering it with requests. A circuit breaker pattern prevents cascading failures and gives the provider time to recover.

function CircuitBreaker(options) {
  this.name = options.name;
  this.failureThreshold = options.failureThreshold || 5;
  this.resetTimeoutMs = options.resetTimeoutMs || 30000;
  this.monitorWindowMs = options.monitorWindowMs || 60000;
  this.halfOpenMaxRequests = options.halfOpenMaxRequests || 3;

  this.state = "closed"; // closed, open, half-open
  this.failures = [];
  this.halfOpenAttempts = 0;
  this.lastStateChange = Date.now();
  this.onStateChange = options.onStateChange || function() {};
}

CircuitBreaker.prototype.recordFailure = function(error) {
  var now = Date.now();
  this.failures.push({ timestamp: now, error: error.code });

  // Remove failures outside the monitoring window
  this.failures = this.failures.filter(function(f) {
    return now - f.timestamp < this.monitorWindowMs;
  }.bind(this));

  if (this.failures.length >= this.failureThreshold) {
    this._trip();
  }
};

CircuitBreaker.prototype.recordSuccess = function() {
  if (this.state === "half-open") {
    this._close();
  }
  // Decay failures on success
  if (this.failures.length > 0) {
    this.failures.shift();
  }
};

CircuitBreaker.prototype.canRequest = function() {
  if (this.state === "closed") return true;

  if (this.state === "open") {
    if (Date.now() - this.lastStateChange >= this.resetTimeoutMs) {
      this._halfOpen();
      return true;
    }
    return false;
  }

  // half-open: allow limited requests
  if (this.halfOpenAttempts < this.halfOpenMaxRequests) {
    this.halfOpenAttempts++;
    return true;
  }
  return false;
};

CircuitBreaker.prototype._trip = function() {
  this.state = "open";
  this.lastStateChange = Date.now();
  this.onStateChange("open", this.name);
  console.error("[Circuit Breaker] OPEN: " + this.name +
    " (" + this.failures.length + " failures in window)");
};

CircuitBreaker.prototype._halfOpen = function() {
  this.state = "half-open";
  this.halfOpenAttempts = 0;
  this.lastStateChange = Date.now();
  this.onStateChange("half-open", this.name);
  console.warn("[Circuit Breaker] HALF-OPEN: " + this.name);
};

CircuitBreaker.prototype._close = function() {
  this.state = "closed";
  this.failures = [];
  this.halfOpenAttempts = 0;
  this.lastStateChange = Date.now();
  this.onStateChange("closed", this.name);
  console.log("[Circuit Breaker] CLOSED: " + this.name + " (recovered)");
};

module.exports = CircuitBreaker;

Correlating Errors with Prompt Content

Not all errors are random. Some errors correlate strongly with specific input patterns. Tracking this correlation helps you identify and fix problematic inputs proactively.

function ErrorCorrelator() {
  this.patternStats = {};
  this.recentErrors = [];
  this.maxHistory = 1000;
}

ErrorCorrelator.prototype.record = function(input, error) {
  var features = this._extractFeatures(input);
  var entry = {
    timestamp: Date.now(),
    errorCode: error.code,
    features: features
  };

  this.recentErrors.push(entry);
  if (this.recentErrors.length > this.maxHistory) {
    this.recentErrors.shift();
  }

  // Update pattern statistics
  features.forEach(function(feature) {
    if (!this.patternStats[feature]) {
      this.patternStats[feature] = { total: 0, errors: 0, errorCodes: {} };
    }
    this.patternStats[feature].total++;
    this.patternStats[feature].errors++;
    var code = error.code;
    this.patternStats[feature].errorCodes[code] =
      (this.patternStats[feature].errorCodes[code] || 0) + 1;
  }.bind(this));
};

ErrorCorrelator.prototype.recordSuccess = function(input) {
  var features = this._extractFeatures(input);
  features.forEach(function(feature) {
    if (!this.patternStats[feature]) {
      this.patternStats[feature] = { total: 0, errors: 0, errorCodes: {} };
    }
    this.patternStats[feature].total++;
  }.bind(this));
};

ErrorCorrelator.prototype.getProblematicPatterns = function(minSamples, minErrorRate) {
  minSamples = minSamples || 10;
  minErrorRate = minErrorRate || 0.3;
  var problematic = [];

  Object.keys(this.patternStats).forEach(function(feature) {
    var stats = this.patternStats[feature];
    if (stats.total >= minSamples) {
      var errorRate = stats.errors / stats.total;
      if (errorRate >= minErrorRate) {
        problematic.push({
          feature: feature,
          errorRate: Math.round(errorRate * 100) + "%",
          totalRequests: stats.total,
          topErrors: stats.errorCodes
        });
      }
    }
  }.bind(this));

  return problematic.sort(function(a, b) {
    return parseFloat(b.errorRate) - parseFloat(a.errorRate);
  });
};

ErrorCorrelator.prototype._extractFeatures = function(input) {
  var features = [];
  var text = typeof input === "string" ? input : JSON.stringify(input);

  // Length bucket
  if (text.length > 10000) features.push("length:very_long");
  else if (text.length > 3000) features.push("length:long");
  else if (text.length > 500) features.push("length:medium");
  else features.push("length:short");

  // Language hints
  if (/[^\x00-\x7F]/.test(text)) features.push("charset:non_ascii");

  // Content patterns
  if (/<[^>]+>/.test(text)) features.push("content:html");
  if (/```/.test(text)) features.push("content:code_blocks");
  if (/https?:\/\//.test(text)) features.push("content:urls");

  return features;
};

module.exports = ErrorCorrelator;
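
One way to consume the correlator is a periodic job that surfaces high-error input patterns; this is a minimal sketch, and the 15-minute interval and thresholds are assumptions to tune for your traffic.

var ErrorCorrelator = require("./error-correlator"); // adjust the path to your layout
var correlator = new ErrorCorrelator();

// Wherever an AI call completes:
//   correlator.record(input, aiErr)   on failure
//   correlator.recordSuccess(input)   on success

// Periodically surface input patterns with unusually high error rates.
setInterval(function() {
  var patterns = correlator.getProblematicPatterns(20, 0.25);
  if (patterns.length > 0) {
    console.warn("[AI Error Patterns]", JSON.stringify(patterns, null, 2));
  }
}, 15 * 60 * 1000);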

Error Budgets for AI Features

Unlike traditional services where you target 99.99% uptime, AI features often operate with lower reliability expectations. Define explicit error budgets that acknowledge AI's inherent unpredictability.

function ErrorBudget(options) {
  this.featureName = options.featureName;
  this.windowMs = options.windowMs || 3600000; // 1 hour default
  this.maxErrorRate = options.maxErrorRate || 0.05; // 5% default
  this.minSamples = options.minSamples || 50;

  this.window = [];
  this.alerts = [];
  this.onBudgetExceeded = options.onBudgetExceeded || function() {};
}

ErrorBudget.prototype.record = function(success) {
  var now = Date.now();
  this.window.push({ timestamp: now, success: success });

  // Prune old entries
  this.window = this.window.filter(function(entry) {
    return now - entry.timestamp < this.windowMs;
  }.bind(this));

  // Check budget
  if (this.window.length >= this.minSamples) {
    var failures = this.window.filter(function(e) { return !e.success; }).length;
    var errorRate = failures / this.window.length;

    if (errorRate > this.maxErrorRate) {
      this.onBudgetExceeded({
        feature: this.featureName,
        errorRate: errorRate,
        budget: this.maxErrorRate,
        sampleSize: this.window.length,
        windowMs: this.windowMs
      });
    }
  }
};

ErrorBudget.prototype.getStatus = function() {
  var now = Date.now();
  var active = this.window.filter(function(e) {
    return now - e.timestamp < this.windowMs;
  }.bind(this));

  var failures = active.filter(function(e) { return !e.success; }).length;
  var errorRate = active.length > 0 ? failures / active.length : 0;
  var budgetRemaining = Math.max(0, this.maxErrorRate - errorRate);

  return {
    feature: this.featureName,
    currentErrorRate: Math.round(errorRate * 1000) / 10 + "%",
    budgetLimit: this.maxErrorRate * 100 + "%",
    budgetRemaining: Math.round(budgetRemaining * 1000) / 10 + "%",
    sampleSize: active.length,
    healthy: errorRate <= this.maxErrorRate
  };
};

module.exports = ErrorBudget;
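
A minimal sketch of wiring the budget into a feature path, assuming the summarization chain from earlier; sendAlert is a placeholder for whatever paging or alerting integration you use.

var ErrorBudget = require("./error-budget"); // adjust the path to your layout

var summarizeBudget = new ErrorBudget({
  featureName: "summarization",
  windowMs: 15 * 60 * 1000,  // 15-minute window
  maxErrorRate: 0.03,        // tolerate 3% failures
  minSamples: 50,
  onBudgetExceeded: function(status) {
    // sendAlert is a placeholder for your paging/alerting integration
    sendAlert("AI error budget exceeded for " + status.feature + ": " +
      Math.round(status.errorRate * 1000) / 10 + "% over " + status.sampleSize + " requests");
  }
});

// Record the outcome of every request for this feature.
summarizationChain.execute(input, function(err, result) {
  summarizeBudget.record(!err);
  // ... handle err / result as usual
});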

Error Recovery Patterns

Partial Result Recovery

When a streaming response fails partway through, you may still have usable output. Do not discard partial results.

function PartialResultRecovery(options) {
  this.minUsableLength = options.minUsableLength || 100;
  this.validators = options.validators || [];
}

PartialResultRecovery.prototype.recover = function(partialOutput, error) {
  if (!partialOutput || partialOutput.length < this.minUsableLength) {
    return { recovered: false, reason: "Output too short to be usable" };
  }

  // Try to find a clean break point
  var cleanOutput = this._findCleanBreak(partialOutput);

  // Run validators on partial output
  var validationResults = this.validators.map(function(validator) {
    return validator(cleanOutput);
  });

  var allValid = validationResults.every(function(r) { return r.valid; });

  if (allValid) {
    return {
      recovered: true,
      output: cleanOutput,
      isPartial: true,
      originalLength: partialOutput.length,
      recoveredLength: cleanOutput.length,
      truncatedAt: error.timestamp
    };
  }

  return {
    recovered: false,
    reason: "Partial output failed validation",
    failedChecks: validationResults.filter(function(r) { return !r.valid; })
  };
};

PartialResultRecovery.prototype._findCleanBreak = function(text) {
  // Try to break at paragraph boundary
  var lastParagraph = text.lastIndexOf("\n\n");
  if (lastParagraph > this.minUsableLength) {
    return text.substring(0, lastParagraph).trim();
  }

  // Try sentence boundary
  var lastSentence = Math.max(
    text.lastIndexOf(". "),
    text.lastIndexOf("! "),
    text.lastIndexOf("? ")
  );
  if (lastSentence > this.minUsableLength) {
    return text.substring(0, lastSentence + 1).trim();
  }

  return text.trim();
};

module.exports = PartialResultRecovery;

Streaming Error Handler

For Server-Sent Events (SSE) streaming responses, errors mid-stream require special handling because you have already started sending data to the client.

// getUserMessage and getStatusCode are the helpers defined in the earlier
// sections; adjust the require paths to your project layout.
var getUserMessage = require("./user-messages").getUserMessage;
var getStatusCode = require("./middleware/ai-error-handler").getStatusCode;

function createStreamErrorHandler(res) {
  var headersSent = false;

  return {
    startStream: function() {
      res.writeHead(200, {
        "Content-Type": "text/event-stream",
        "Cache-Control": "no-cache",
        "Connection": "keep-alive"
      });
      headersSent = true;
    },

    sendChunk: function(data) {
      res.write("data: " + JSON.stringify({ type: "chunk", content: data }) + "\n\n");
    },

    sendError: function(error) {
      if (!headersSent) {
        // Error before streaming started: send normal error response
        var statusCode = getStatusCode(error);
        res.status(statusCode).json({ error: getUserMessage(error) });
        return;
      }

      // Error mid-stream: send error event
      res.write("data: " + JSON.stringify({
        type: "error",
        code: error.code,
        message: getUserMessage(error).message,
        retriable: error.retriable,
        partialContent: true
      }) + "\n\n");

      res.write("data: [DONE]\n\n");
      res.end();
    },

    endStream: function() {
      res.write("data: [DONE]\n\n");
      res.end();
    }
  };
}

module.exports = { createStreamErrorHandler: createStreamErrorHandler };
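
Here is a usage sketch for an SSE route; streamCompletion is a placeholder for your provider's streaming wrapper, assumed to invoke onChunk for each piece of text and onDone exactly once with an optional error.

app.post("/api/ai/generate", function(req, res) {
  var stream = createStreamErrorHandler(res);
  stream.startStream();

  // streamCompletion is a placeholder: it should call onChunk(text) repeatedly
  // and onDone(err) once when the provider stream finishes or fails.
  streamCompletion(req.body.prompt, function onChunk(text) {
    stream.sendChunk(text);
  }, function onDone(err) {
    if (err) return stream.sendError(err);
    stream.endStream();
  });
});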

User Feedback Integration for Quality Errors

Automated quality checks catch structural problems, but users catch semantic issues that code cannot. Build feedback loops that feed directly into your error tracking.

var AIQualityError = require("./ai-errors").AIQualityError;

// Express route for user feedback on AI output.
// storeFeedback is a placeholder for your own persistence layer.
function setupFeedbackRoute(app, errorCorrelator) {
  app.post("/api/ai/feedback", function(req, res) {
    var feedback = {
      requestId: req.body.requestId,
      rating: req.body.rating, // 1-5
      issue: req.body.issue, // "inaccurate", "incomplete", "irrelevant", "offensive"
      details: req.body.details,
      userId: req.user && req.user.id || "anonymous",
      timestamp: new Date().toISOString()
    };

    // If user reports a quality issue, create a synthetic quality error
    if (feedback.rating <= 2) {
      var qualityError = new AIQualityError(
        "User-reported quality issue: " + feedback.issue,
        {
          provider: "user_feedback",
          model: "n/a",
          promptId: feedback.requestId,
          qualityScore: feedback.rating / 5,
          qualityThreshold: 0.6,
          failedChecks: [feedback.issue]
        }
      );

      // Feed into error correlator to find patterns
      errorCorrelator.record(
        { requestId: feedback.requestId, issue: feedback.issue },
        qualityError
      );
    }

    // Store feedback for analysis
    storeFeedback(feedback, function(err) {
      if (err) return res.status(500).json({ error: "Failed to store feedback" });
      res.json({ received: true });
    });
  });
}

Post-Mortem Analysis for AI Incidents

AI incidents are different from traditional outages. An AI incident might be "the model was confidently wrong for 4 hours and nobody noticed." Your post-mortem template should reflect this.

When an AI-specific incident occurs, your post-mortem should cover:

  1. Detection method -- How was the issue found? Automated monitoring, user complaint, or manual review? If users found it first, your monitoring has a gap.
  2. Blast radius -- How many users received degraded output? Unlike a hard outage, quality degradation may have affected users without them realizing it.
  3. Root cause category -- Was this a model regression, a prompt drift issue, a data quality problem, or a provider outage?
  4. Quality impact assessment -- Review a sample of outputs from the incident window. What percentage were actually wrong? How wrong?
  5. Detection latency -- How long between the first bad output and the moment you were alerted? This gap is your primary improvement target.
  6. Monitoring gaps -- What check would have caught this sooner? Add it.

Keep an incident log in structured format so you can analyze patterns over time:

var incidentSchema = {
  id: "AI-INC-001",
  detectedAt: "2026-01-15T14:30:00Z",
  resolvedAt: "2026-01-15T16:45:00Z",
  detectionMethod: "user_complaint", // automated | user_complaint | manual_review
  affectedFeature: "article_summarization",
  affectedUsers: 342,
  rootCause: "prompt_drift",
  qualitySampleSize: 50,
  qualityDefectRate: 0.34,
  detectionLatencyMinutes: 127,
  monitoringGapIdentified: "No automated check for summary factual consistency",
  actionItems: [
    "Add factual consistency validator to summarization pipeline",
    "Reduce error budget window from 1 hour to 15 minutes",
    "Add alert for sudden increase in short summaries"
  ]
};

Complete Working Example

Here is a complete Express.js middleware stack that ties together everything from this article into a production-ready AI error handling framework.

var express = require("express");
var crypto = require("crypto");

// ============================================================
// AI Error Handling Framework - Complete Implementation
// ============================================================

// --- Structured Error Types ---
var util = require("util");

function AIError(message, opts) {
  Error.call(this, message);
  this.name = "AIError";
  this.message = message;
  this.code = opts.code || "AI_UNKNOWN_ERROR";
  this.retriable = opts.retriable || false;
  this.category = opts.category || "unknown";
  this.provider = opts.provider || "unknown";
  this.model = opts.model || "unknown";
  this.promptId = opts.promptId || null;
  this.latencyMs = opts.latencyMs || 0;
  this.timestamp = new Date().toISOString();
  this.originalError = opts.originalError || null;
}
util.inherits(AIError, Error);

// --- Circuit Breaker ---
function CircuitBreaker(name, threshold, resetMs) {
  this.name = name;
  this.threshold = threshold || 5;
  this.resetMs = resetMs || 30000;
  this.state = "closed";
  this.failureCount = 0;
  this.lastFailure = 0;
}

CircuitBreaker.prototype.check = function() {
  if (this.state === "open") {
    if (Date.now() - this.lastFailure > this.resetMs) {
      this.state = "half-open";
      return true;
    }
    return false;
  }
  return true;
};

CircuitBreaker.prototype.success = function() {
  this.failureCount = 0;
  this.state = "closed";
};

CircuitBreaker.prototype.failure = function() {
  this.failureCount++;
  this.lastFailure = Date.now();
  if (this.failureCount >= this.threshold) {
    this.state = "open";
    console.error("[CIRCUIT OPEN] " + this.name +
      " after " + this.failureCount + " failures");
  }
};

// --- Degradation Chain ---
function DegradationChain(levels) {
  this.levels = levels;
}

DegradationChain.prototype.execute = function(input, callback) {
  var index = 0;
  var levels = this.levels;

  function tryNext() {
    if (index >= levels.length) {
      return callback(new Error("All degradation levels exhausted"));
    }
    var level = levels[index];
    level.handler(input, function(err, result) {
      if (!err && result) {
        result._level = level.name;
        return callback(null, result);
      }
      index++;
      tryNext();
    });
  }

  tryNext();
};

// --- Retry Engine ---
function retryCall(fn, input, maxRetries, baseDelay, callback) {
  var attempt = 0;

  function run() {
    fn(input, function(err, result) {
      if (!err) return callback(null, result);
      if (!err.retriable || attempt >= maxRetries) return callback(err);

      attempt++;
      var delay = baseDelay * Math.pow(2, attempt - 1);
      delay += Math.random() * delay * 0.2; // jitter
      setTimeout(run, delay);
    });
  }

  run();
}

// --- Framework Middleware ---
function aiContext() {
  return function(req, res, next) {
    req.aiContext = {
      requestId: crypto.randomBytes(8).toString("hex"),
      startTime: Date.now(),
      provider: null,
      model: null
    };
    next();
  };
}

function aiCircuitGuard(breaker) {
  return function(req, res, next) {
    if (!breaker.check()) {
      return res.status(503).json({
        error: {
          title: "Service Temporarily Unavailable",
          message: "This AI feature is temporarily unavailable. Please try again shortly.",
          retriable: true
        }
      });
    }
    req.aiCircuitBreaker = breaker;
    next();
  };
}

function aiErrorResponder(err, req, res, next) {
  if (req.aiCircuitBreaker) {
    req.aiCircuitBreaker.failure();
  }

  var userMessages = {
    AI_RATE_LIMIT: { status: 429, title: "High Demand", message: "Please try again in a moment." },
    AI_CONTENT_FILTER: { status: 422, title: "Unable to Process", message: "Please revise your input." },
    AI_TOKEN_LIMIT: { status: 413, title: "Input Too Long", message: "Please shorten your input." },
    AI_QUALITY_FAILURE: { status: 502, title: "Processing Issue", message: "Please try again." },
    AI_TIMEOUT: { status: 504, title: "Timed Out", message: "Please try again shortly." }
  };

  var code = err.code || "AI_UNKNOWN_ERROR";
  var info = userMessages[code] || { status: 500, title: "Error", message: "Something went wrong." };

  console.error("[AI Error]", JSON.stringify({
    requestId: req.aiContext && req.aiContext.requestId,
    code: code,
    provider: err.provider,
    model: err.model,
    latencyMs: req.aiContext ? Date.now() - req.aiContext.startTime : 0,
    path: req.path,
    timestamp: new Date().toISOString()
  }));

  res.status(info.status).json({
    error: {
      title: info.title,
      message: info.message,
      requestId: req.aiContext && req.aiContext.requestId,
      retriable: err.retriable || false
    }
  });
}

// --- Usage Example ---
// callGPT4 and callGPT35 in the route below are placeholders for your provider-call wrappers.
var app = express();
var openaiBreaker = new CircuitBreaker("openai", 5, 30000);

app.use(express.json()); // parse JSON bodies so req.body.text is available
app.use("/api/ai", aiContext());
app.use("/api/ai", aiCircuitGuard(openaiBreaker));

app.post("/api/ai/summarize", function(req, res, next) {
  var text = req.body.text;

  var chain = new DegradationChain([
    {
      name: "gpt4_full",
      handler: function(input, cb) {
        retryCall(callGPT4, input, 2, 1000, cb);
      }
    },
    {
      name: "gpt35_simple",
      handler: function(input, cb) {
        retryCall(callGPT35, input, 1, 500, cb);
      }
    },
    {
      name: "extractive",
      handler: function(input, cb) {
        // Non-AI fallback: return first two sentences
        var sentences = input.text.match(/[^.!?]+[.!?]+/g) || [];
        var excerpt = sentences.slice(0, 2).join(" ").trim();
        if (excerpt) return cb(null, { summary: excerpt, isExcerpt: true });
        cb(new Error("Could not extract sentences"));
      }
    }
  ]);

  chain.execute({ text: text }, function(err, result) {
    if (err) return next(err);

    if (req.aiCircuitBreaker) req.aiCircuitBreaker.success();

    res.json({
      summary: result.summary,
      degradationLevel: result._level,
      requestId: req.aiContext.requestId
    });
  });
});

app.use("/api/ai", aiErrorResponder);

app.listen(8080, function() {
  console.log("Server running on port 8080");
});

Common Issues and Troubleshooting

1. Rate Limit Errors Persist After Backoff

Error: 429 Too Many Requests
Headers: { "retry-after": "60", "x-ratelimit-remaining": "0" }

This happens when your retry logic respects the delay but multiple server instances all retry simultaneously. The fix is to add randomized jitter to your backoff delay and, if you run multiple instances, use a shared rate limiter (such as the Redis-backed Bottleneck package or a shared Redis counter) so all instances draw from a single request budget.
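
A minimal sketch of a shared per-minute budget, assuming ioredis and a Redis instance reachable at REDIS_URL; the 500-request limit in the usage comment is an example value.

var Redis = require("ioredis");
var redis = new Redis(process.env.REDIS_URL);

// Returns true if this instance may spend one request from the fleet-wide
// per-minute budget, false if the shared budget for the current minute is spent.
function acquireSharedSlot(limitPerMinute, callback) {
  var bucket = "ai:requests:" + Math.floor(Date.now() / 60000);
  redis.multi()
    .incr(bucket)
    .expire(bucket, 120)
    .exec(function(err, results) {
      if (err) return callback(err);
      var count = results[0][1];
      callback(null, count <= limitPerMinute);
    });
}

// Before each attempt (including retries):
// acquireSharedSlot(500, function(err, allowed) {
//   if (err || !allowed) { /* wait for the next window instead of calling the provider */ }
// });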

2. Context Length Exceeded with Dynamic Prompts

Error: This model's maximum context length is 128000 tokens.
However, your messages resulted in 131,847 tokens.

When you build prompts dynamically by injecting user content, it is easy to exceed the context window. Always calculate token counts before sending. Use tiktoken (or the equivalent for your provider) to count tokens and truncate input proactively rather than catching this error after the fact.
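A counting sketch using the tiktoken npm package; the model name, 128,000-token window, and 4,000-token reserve for the response are assumptions to replace with your model's actual limits.

var tiktoken = require("tiktoken");

// Count tokens and truncate the input before building the prompt, leaving
// room in the context window for the model's response.
function fitToContextWindow(text, model, maxContextTokens, reservedForOutput) {
  var enc = tiktoken.encoding_for_model(model);
  var tokens = enc.encode(text);
  var budget = maxContextTokens - reservedForOutput;
  var truncated = tokens.length > budget;
  var fitted = truncated
    ? new TextDecoder().decode(enc.decode(tokens.slice(0, budget)))
    : text;
  enc.free();
  return { text: fitted, tokenCount: Math.min(tokens.length, budget), truncated: truncated };
}

// userText is a placeholder for the dynamic content you inject into the prompt.
var prepared = fitToContextWindow(userText, "gpt-4", 128000, 4000);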

3. Streaming Response Fails Mid-Stream with Incomplete JSON

Error: SyntaxError: Unexpected end of JSON input
Partial data received: '{"summary": "The article discusses three main approach'

The connection dropped during a streaming response, leaving you with truncated JSON. Use the PartialResultRecovery pattern from above. If the output is structured (like JSON), you cannot simply truncate. Instead, validate the partial output, and if it is not parseable, fall back to the next degradation level. For JSON specifically, consider streaming into a buffer and only parsing once you receive the complete response or an explicit end marker.
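A minimal buffering sketch, assuming the provider call exposes a Node readable stream of text chunks; a parse failure hands control to the next degradation level rather than surfacing a SyntaxError to the caller.

// Accumulate the streamed chunks; only parse once the stream signals completion.
function collectStructuredStream(stream, callback) {
  var buffer = "";

  stream.on("data", function(chunk) {
    buffer += chunk.toString();
  });

  stream.on("end", function() {
    try {
      callback(null, JSON.parse(buffer));
    } catch (parseErr) {
      callback(new Error("Structured output incomplete; falling back"), null);
    }
  });

  stream.on("error", function(streamErr) {
    callback(streamErr, null);
  });
}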

4. Content Filter Triggers on Legitimate Medical/Legal Content

Error: { "code": "content_filter", "message": "Output filtered due to content policy" }

AI provider content filters can be overly aggressive with legitimate content in medical, legal, or security domains. This is not retriable with the same input. Solutions: use the provider's content filter configuration where it is offered, run a moderation pre-check (OpenAI's moderation endpoint) or add system prompt guidance (Anthropic) that supplies the clinical context, rephrase the prompt, or maintain a separate model deployment with adjusted safety settings for your verified use case.
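
A pre-check sketch against OpenAI's moderation endpoint, assuming Node 18+ (global fetch) and an OPENAI_API_KEY environment variable; flagged inputs can be routed to a non-AI path before you spend a generation request.

// Pre-check the input so predictable filter rejections never reach the main call.
function moderationPrecheck(text, callback) {
  fetch("https://api.openai.com/v1/moderations", {
    method: "POST",
    headers: {
      "Authorization": "Bearer " + process.env.OPENAI_API_KEY,
      "Content-Type": "application/json"
    },
    body: JSON.stringify({ input: text })
  })
    .then(function(resp) { return resp.json(); })
    .then(function(body) {
      var result = body.results && body.results[0];
      callback(null, {
        flagged: !!(result && result.flagged),
        categories: result ? result.categories : {}
      });
    })
    .catch(callback);
}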

5. Silent Quality Degradation During Provider Incidents

No error is thrown, but output quality drops noticeably. Response times increase from 2 seconds to 15 seconds, and generated text becomes more generic and repetitive.

This is the hardest failure mode to detect. Implement output quality scoring (response length distribution, vocabulary diversity, structural compliance checks) and alert when metrics deviate from baseline. A sudden increase in average response time combined with shorter outputs is a strong signal of degraded inference quality even without hard errors.
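
A minimal sketch of cheap quality signals you can track against a rolling baseline; the specific signals and any thresholds you alert on are assumptions to calibrate against your own traffic.

// Cheap per-response quality signals. Compare rolling averages against a
// healthy baseline and alert on sudden shifts (e.g. length drops 40% while
// latency doubles).
function scoreOutputQuality(text) {
  var words = text.toLowerCase().split(/\s+/).filter(Boolean);
  var unique = {};
  words.forEach(function(w) { unique[w] = true; });

  return {
    length: text.length,
    wordCount: words.length,
    vocabularyDiversity: words.length ? Object.keys(unique).length / words.length : 0,
    endsCleanly: /[.!?]["')\]]?\s*$/.test(text)
  };
}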

Best Practices

  • Classify before you handle. Every error should be assigned a structured type before any retry or fallback logic runs. Treating all errors the same leads to retrying non-retriable errors and giving up on retriable ones.

  • Set separate timeouts for AI calls. AI inference is orders of magnitude slower than traditional API calls. A 3-second timeout that is reasonable for a REST API will cause constant false timeouts on AI endpoints. Use 30-60 second timeouts for AI calls and make them configurable per model and per use case.

  • Always have a non-AI fallback. Every AI feature in your application should have a degradation path that requires zero AI inference. Cached results, static excerpts, rule-based alternatives -- anything that keeps the feature functional when the AI layer is completely down.

  • Log prompts with errors, but redact PII. When an AI call fails, you need the prompt content to debug it. But prompts often contain user data. Build a redaction pipeline that strips PII before logging, and store full prompts in a separate, access-controlled audit log with a short retention period. A minimal redaction sketch follows this list.

  • Budget for AI errors explicitly. Define an acceptable error rate for each AI feature and monitor against it. A summarization feature might tolerate 3% failures. A content moderation feature might tolerate 0.1%. These numbers drive your retry, circuit breaker, and alerting configurations.

  • Treat quality failures as errors. If your AI returns a valid HTTP response but the content is wrong, that is an error. Build validation into your response pipeline: check for expected structure, minimum output length, absence of known hallucination patterns, and semantic relevance to the input.

  • Monitor degradation level distribution. If your degradation chain is hitting level 3 (cached fallback) on 40% of requests, something is wrong even if users are not seeing errors. Track which degradation level serves each request and alert on distribution shifts.

  • Never expose raw AI errors to users. AI provider error messages contain model names, token counts, and internal details that confuse users and leak implementation details. Always map to user-friendly messages with actionable suggestions.

  • Test your error paths actively. Use chaos engineering principles: randomly inject failures into your AI call layer in staging and verify that degradation, retries, circuit breakers, and dead letter queues all work correctly. Error handling code that only runs during real outages is error handling code you have never actually tested.
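
As referenced in the logging bullet above, here is a minimal redaction sketch; the patterns cover only obvious identifiers (emails, phone numbers, card-like digit runs) and are assumptions to extend for the data that appears in your domain.

// Minimal PII redaction before logging prompts alongside errors. The regexes
// below are illustrative; extend them for your own data.
var REDACTION_PATTERNS = [
  { name: "email", pattern: /[\w.+-]+@[\w-]+\.[\w.-]+/g },
  { name: "phone", pattern: /\+?\d[\d\s().-]{7,}\d/g },
  { name: "card", pattern: /\b(?:\d[ -]?){13,16}\b/g }
];

function redactForLogging(prompt) {
  var redacted = prompt;
  REDACTION_PATTERNS.forEach(function(rule) {
    redacted = redacted.replace(rule.pattern, "[REDACTED_" + rule.name.toUpperCase() + "]");
  });
  return redacted;
}

// In the error handler, log only the redacted prompt:
// logEntry.prompt = redactForLogging(req.aiContext.prompt);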
