Human-in-the-Loop Patterns for AI Agents

Implement human oversight for AI agents with approval gates, confidence routing, escalation, and audit trails in Node.js.

AI agents that operate without guardrails are a liability. Human-in-the-loop (HITL) patterns give you the architecture to let agents act autonomously on routine decisions while escalating high-stakes or low-confidence actions to a human reviewer. This article walks through the core patterns for building supervised AI agent systems in Node.js, from approval gates and confidence routing to escalation chains, audit trails, and progressive autonomy.

Prerequisites

  • Working knowledge of Node.js and Express.js
  • Familiarity with WebSockets (the ws library)
  • Basic understanding of AI agents and LLM tool-calling
  • Node.js v18+ installed
  • A running MongoDB instance (for audit logging and approval persistence)

Why Human Oversight Matters for Agent Safety

The moment you let an AI agent take real-world actions — sending emails, modifying databases, calling external APIs, spending money — you introduce a class of failure modes that no amount of prompt engineering eliminates. LLMs hallucinate. They misinterpret ambiguous instructions. They confidently execute the wrong action. And unlike a bug in traditional software, the failure mode is non-deterministic: the same input can produce different (and sometimes catastrophic) outputs.

Human-in-the-loop is not about distrusting your agent. It is about building a system where the blast radius of any single agent mistake is bounded. You want your agent to handle the 90% of routine work autonomously while a human catches the 10% that could go sideways.

The key insight is that HITL is not binary. It is a spectrum. On one end, every action requires approval. On the other, the agent runs fully autonomously. The patterns in this article let you place your system anywhere on that spectrum — and move along it dynamically based on confidence, risk, and the agent's track record.

Approval Gates

An approval gate pauses agent execution at a decision point and waits for explicit human approval before proceeding. This is the most fundamental HITL pattern.

The architecture is straightforward: when the agent decides to take an action, it serializes the proposed action into an approval request, stores it in a queue, notifies the human reviewer, and suspends execution until a response comes back.

var EventEmitter = require("events");

function ApprovalGate() {
  this.emitter = new EventEmitter();
  this.pendingApprovals = new Map();
  this.timeoutMs = 300000; // 5 minutes default
}

ApprovalGate.prototype.requestApproval = function (action) {
  var self = this;
  // slice() replaces the deprecated substr(); still yields 9 random characters
  var approvalId = "apr_" + Date.now() + "_" + Math.random().toString(36).slice(2, 11);

  var request = {
    id: approvalId,
    action: action.type,
    parameters: action.parameters,
    reasoning: action.reasoning,
    confidence: action.confidence,
    timestamp: new Date().toISOString(),
    status: "pending"
  };

  return new Promise(function (resolve, reject) {
    self.pendingApprovals.set(approvalId, {
      request: request,
      resolve: resolve,
      reject: reject
    });

    self.emitter.emit("approval:requested", request);

    // Set timeout for unresponsive humans
    var timer = setTimeout(function () {
      var pending = self.pendingApprovals.get(approvalId);
      if (pending && pending.request.status === "pending") {
        pending.request.status = "timed_out";
        self.pendingApprovals.delete(approvalId);
        reject(new Error("Approval timed out after " + self.timeoutMs + "ms: " + approvalId));
      }
    }, self.timeoutMs);

    // Store timer reference for cleanup
    self.pendingApprovals.get(approvalId).timer = timer;
  });
};

ApprovalGate.prototype.submitDecision = function (approvalId, decision) {
  var pending = this.pendingApprovals.get(approvalId);
  if (!pending) {
    throw new Error("No pending approval found: " + approvalId);
  }

  clearTimeout(pending.timer);
  pending.request.status = decision.approved ? "approved" : "rejected";
  pending.request.reviewedBy = decision.reviewerId;
  pending.request.reviewedAt = new Date().toISOString();
  pending.request.feedback = decision.feedback || null;

  this.pendingApprovals.delete(approvalId);
  this.emitter.emit("approval:decided", pending.request);

  if (decision.approved) {
    pending.resolve({ approved: true, feedback: decision.feedback });
  } else {
    pending.resolve({ approved: false, reason: decision.reason, feedback: decision.feedback });
  }
};

The critical detail here is that the agent's reasoning is included in the approval request. A human cannot make a good approval decision without understanding why the agent wants to take this action.
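
One detail worth spelling out is how calling code consumes the gate. In the implementation above, a human rejection resolves the promise with approved: false, while a timeout rejects it with an error. The following is a minimal sketch of a caller that handles both paths; runWithApproval and the execute callback are hypothetical names that do not appear in the original code.

```javascript
// Hypothetical helper: asks the gate for approval, then runs `execute`
// only if a human approved. `gate` is any object with requestApproval(action).
function runWithApproval(gate, action, execute) {
  return gate.requestApproval(action).then(function (decision) {
    if (!decision.approved) {
      // A human rejection arrives as a resolved promise
      return { executed: false, reason: decision.reason || "rejected" };
    }
    return Promise.resolve(execute(action)).then(function (result) {
      return { executed: true, result: result };
    });
  }, function (err) {
    // A timeout arrives as a rejected promise; treat it as "not executed"
    return { executed: false, reason: err.message };
  });
}
```

Errors thrown by execute itself are deliberately not swallowed here, so a failed action still surfaces to the caller as a rejected promise.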

Confidence-Based Routing

Not every action needs human review. Confidence-based routing lets the agent auto-execute high-confidence decisions and only escalate low-confidence ones.

function ConfidenceRouter(options) {
  this.autoApproveThreshold = options.autoApproveThreshold || 0.9;
  this.autoRejectThreshold = options.autoRejectThreshold || 0.2;
  this.approvalGate = options.approvalGate;
  this.auditLog = options.auditLog;
}

ConfidenceRouter.prototype.route = function (action) {
  var self = this;

  if (action.confidence >= self.autoApproveThreshold) {
    // High confidence — auto-approve but still log
    self.auditLog.record({
      action: action,
      decision: "auto_approved",
      reason: "Confidence " + action.confidence + " >= threshold " + self.autoApproveThreshold,
      timestamp: new Date().toISOString()
    });
    return Promise.resolve({ approved: true, method: "auto" });
  }

  if (action.confidence <= self.autoRejectThreshold) {
    // Very low confidence — auto-reject
    self.auditLog.record({
      action: action,
      decision: "auto_rejected",
      reason: "Confidence " + action.confidence + " <= threshold " + self.autoRejectThreshold,
      timestamp: new Date().toISOString()
    });
    return Promise.resolve({ approved: false, method: "auto", reason: "Confidence too low" });
  }

  // Middle ground — needs human review
  return self.approvalGate.requestApproval(action).then(function (result) {
    result.method = "human";
    return result;
  });
};

I typically set the auto-approve threshold conservatively at first — around 0.95 — and lower it gradually as the system proves reliable. The auto-reject threshold catches obvious garbage: malformed actions, hallucinated tool calls, or actions with no rational basis.

One thing to watch out for: confidence scores from LLMs are not well-calibrated probabilities. An LLM saying it is 90% confident does not mean it is right 90% of the time. You need to calibrate your thresholds empirically against your specific use case.
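
One way to do that calibration is to compare stated confidence against what reviewers actually decided. The sketch below buckets past human-reviewed decisions by confidence and reports the observed approval rate per bucket. The calibrationTable name and the record shape ({ confidence, approved }) are assumptions made for illustration.

```javascript
// Buckets past human-reviewed decisions by stated confidence and reports the
// observed approval rate per bucket. `records` is assumed to look like
// { confidence: 0.87, approved: true }.
function calibrationTable(records, bucketCount) {
  var n = bucketCount || 10;
  var buckets = [];
  for (var i = 0; i < n; i++) {
    buckets.push({ from: i / n, to: (i + 1) / n, total: 0, approved: 0, observedRate: null });
  }
  records.forEach(function (rec) {
    // Clamp so that a confidence of exactly 1.0 lands in the last bucket
    var idx = Math.min(Math.floor(rec.confidence * n), n - 1);
    if (isNaN(idx) || idx < 0) return; // skip malformed records
    buckets[idx].total += 1;
    if (rec.approved) buckets[idx].approved += 1;
  });
  buckets.forEach(function (b) {
    if (b.total > 0) b.observedRate = b.approved / b.total;
  });
  return buckets;
}
```

If the top bucket shows an observed rate well below its stated confidence range, that is a sign the auto-approve threshold is too permissive for that workload.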

Implementing Approval Workflows with Express.js and WebSockets

The approval queue needs both a REST API for managing approvals and WebSocket connections for real-time notifications. Here is how to wire them together.

var express = require("express");
var http = require("http");
var WebSocket = require("ws");
var router = express.Router();

function ApprovalServer(approvalGate) {
  this.approvalGate = approvalGate;
  this.wsClients = new Set();

  var self = this;

  // Broadcast new approval requests to all connected reviewers
  approvalGate.emitter.on("approval:requested", function (request) {
    self.broadcast({
      type: "new_approval",
      data: request
    });
  });

  approvalGate.emitter.on("approval:decided", function (request) {
    self.broadcast({
      type: "approval_decided",
      data: request
    });
  });
}

ApprovalServer.prototype.broadcast = function (message) {
  var payload = JSON.stringify(message);
  this.wsClients.forEach(function (client) {
    if (client.readyState === WebSocket.OPEN) {
      client.send(payload);
    }
  });
};

ApprovalServer.prototype.setupWebSocket = function (server) {
  var self = this;
  var wss = new WebSocket.Server({ server: server, path: "/ws/approvals" });

  wss.on("connection", function (ws) {
    self.wsClients.add(ws);
    console.log("Reviewer connected. Active reviewers: " + self.wsClients.size);

    // Send all pending approvals to newly connected reviewer
    var pending = Array.from(self.approvalGate.pendingApprovals.values()).map(function (entry) {
      return entry.request;
    });
    ws.send(JSON.stringify({ type: "pending_approvals", data: pending }));

    ws.on("close", function () {
      self.wsClients.delete(ws);
      console.log("Reviewer disconnected. Active reviewers: " + self.wsClients.size);
    });
  });
};

ApprovalServer.prototype.getRouter = function () {
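  var self = this;
  // Create the router per instance so routes are not registered twice
  // if getRouter() is called more than once
  var router = express.Router();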

  router.get("/approvals", function (req, res) {
    var pending = Array.from(self.approvalGate.pendingApprovals.values()).map(function (entry) {
      return entry.request;
    });
    res.json({ approvals: pending, count: pending.length });
  });

  router.post("/approvals/:id/decide", function (req, res) {
    var approvalId = req.params.id;
    var decision = {
      approved: req.body.approved === true,
      reviewerId: req.body.reviewerId || "anonymous",
      reason: req.body.reason || "",
      feedback: req.body.feedback || ""
    };

    try {
      self.approvalGate.submitDecision(approvalId, decision);
      res.json({ success: true, approvalId: approvalId, decision: decision.approved ? "approved" : "rejected" });
    } catch (err) {
      res.status(404).json({ error: err.message });
    }
  });

  // Batch approval endpoint
  router.post("/approvals/batch", function (req, res) {
    var ids = req.body.approvalIds || [];
    var decision = {
      approved: req.body.approved === true,
      reviewerId: req.body.reviewerId || "anonymous",
      reason: req.body.reason || "Batch decision",
      feedback: req.body.feedback || ""
    };

    var results = [];
    ids.forEach(function (id) {
      try {
        self.approvalGate.submitDecision(id, decision);
        results.push({ id: id, success: true });
      } catch (err) {
        results.push({ id: id, success: false, error: err.message });
      }
    });

    res.json({ results: results });
  });

  return router;
};

The batch approval endpoint is essential for production systems. When an agent generates 50 similar email drafts overnight, the reviewer should not have to click "approve" 50 times individually.
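
On the reviewer side, a dashboard might group similar pending requests before calling that endpoint. The following sketch groups by action type and builds one request body per group in the shape the /approvals/batch handler above reads (approvalIds, approved, reviewerId, reason). The buildBatchPayloads name and the grouping rule are illustrative assumptions.

```javascript
// Groups pending approval requests by action type and builds one request body
// per group for POST /approvals/batch. `pending` is assumed to be the
// `approvals` array returned by GET /approvals.
function buildBatchPayloads(pending, reviewerId, approved) {
  var groups = Object.create(null); // no prototype, so any action name is a safe key
  pending.forEach(function (request) {
    if (!groups[request.action]) groups[request.action] = [];
    groups[request.action].push(request.id);
  });
  return Object.keys(groups).map(function (actionType) {
    return {
      actionType: actionType, // for display only; not sent to the endpoint
      body: {
        approvalIds: groups[actionType],
        approved: approved === true,
        reviewerId: reviewerId,
        reason: "Batch decision for " + actionType
      }
    };
  });
}
```

Grouping by action type alone is coarse. A real dashboard would probably also let the reviewer inspect a sample from each group before deciding.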

Designing Approval UIs That Show Agent Reasoning

A common mistake is building approval UIs that only show the proposed action. The reviewer needs to see the agent's full reasoning chain to make an informed decision. Here is a minimal but effective data structure for approval display.

function formatApprovalForDisplay(request) {
  // Requests created by ApprovalGate may lack reasoning, context, or
  // parameters, so default them instead of throwing on undefined
  var reasoning = request.reasoning || {};
  var context = request.context || {};
  var parameters = request.parameters || {};

  return {
    id: request.id,
    summary: request.action + ": " + summarizeParams(parameters),
    reasoning: {
      steps: reasoning.steps || [],
      confidence: request.confidence,
      confidenceExplanation: reasoning.confidenceExplanation || "",
      alternativesConsidered: reasoning.alternatives || [],
      riskAssessment: reasoning.risk || "not assessed"
    },
    context: {
      triggeredBy: context.trigger || null,
      conversationHistory: context.recentMessages || [],
      relevantData: context.sourceData || {}
    },
    impact: {
      // Treated as reversible unless the action explicitly says otherwise
      reversible: parameters.reversible !== false,
      affectedSystems: parameters.affectedSystems || [],
      estimatedCost: parameters.cost || null
    },
    timestamp: request.timestamp,
    expiresAt: request.expiresAt || null
  };
}

function summarizeParams(params) {
  var keys = Object.keys(params);
  if (keys.length === 0) return "(no parameters)";
  var parts = keys.slice(0, 3).map(function (key) {
    var val = params[key];
    if (typeof val === "string" && val.length > 50) {
      return key + '="' + val.substring(0, 50) + '..."';
    }
    return key + "=" + JSON.stringify(val);
  });
  if (keys.length > 3) {
    parts.push("+" + (keys.length - 3) + " more");
  }
  return parts.join(", ");
}

The key fields here are alternativesConsidered and riskAssessment. When a reviewer can see that the agent considered three options and picked the one with the lowest risk, approval becomes a 2-second decision. When the reasoning is opaque, the reviewer has to reconstruct the agent's logic in their head — which defeats the purpose of having an agent in the first place.

Escalation Patterns

Not all humans are equal in the escalation chain. A tier-1 reviewer might handle routine approvals, while a senior engineer handles anything involving production databases or financial transactions.

function EscalationChain(tiers) {
  this.tiers = tiers; // Array of { name, reviewers, maxRisk }
}

EscalationChain.prototype.getReviewerTier = function (action) {
  var riskLevel = this.assessRisk(action);

  for (var i = 0; i < this.tiers.length; i++) {
    if (riskLevel <= this.tiers[i].maxRisk) {
      return this.tiers[i];
    }
  }

  // Highest tier for anything that exceeds all thresholds
  return this.tiers[this.tiers.length - 1];
};

EscalationChain.prototype.assessRisk = function (action) {
  var riskScores = {
    "read_data": 1,
    "send_notification": 3,
    "update_record": 5,
    "delete_record": 8,
    "financial_transaction": 9,
    "modify_production": 10
  };
  return riskScores[action.type] || 5;
};

// Example configuration
var escalation = new EscalationChain([
  {
    name: "auto",
    reviewers: [],
    maxRisk: 2
  },
  {
    name: "tier1",
    reviewers: ["[email protected]"],
    maxRisk: 5
  },
  {
    name: "tier2",
    reviewers: ["[email protected]"],
    maxRisk: 8
  },
  {
    name: "tier3",
    reviewers: ["[email protected]", "[email protected]"],
    maxRisk: 10
  }
]);

A multi-tier escalation structure also handles the common problem of reviewer fatigue. If tier-1 reviewers are seeing 200 approvals a day, most of those should probably be auto-approved. The escalation chain forces you to think about which actions genuinely need human eyes.

Feedback Loops

Human corrections are the highest-quality training signal you can get. Every time a reviewer rejects an action or modifies the agent's output, that feedback should flow back into the system.

function FeedbackCollector(db) {
  this.db = db;
  this.collection = db.collection("agent_feedback");
}

FeedbackCollector.prototype.recordFeedback = function (approvalRequest, decision, correction) {
  var feedback = {
    approvalId: approvalRequest.id,
    agentAction: approvalRequest.action,
    agentParameters: approvalRequest.parameters,
    agentReasoning: approvalRequest.reasoning,
    agentConfidence: approvalRequest.confidence,
    humanDecision: decision.approved ? "approved" : "rejected",
    humanCorrection: correction || null,
    humanReason: decision.reason || null,
    reviewerId: decision.reviewerId,
    timestamp: new Date()
  };

  return this.collection.insertOne(feedback);
};

FeedbackCollector.prototype.getAccuracyStats = function (actionType, days) {
  var cutoff = new Date();
  cutoff.setDate(cutoff.getDate() - (days || 30));

  return this.collection.aggregate([
    {
      $match: {
        agentAction: actionType,
        timestamp: { $gte: cutoff }
      }
    },
    {
      $group: {
        _id: null,
        total: { $sum: 1 },
        approved: {
          $sum: { $cond: [{ $eq: ["$humanDecision", "approved"] }, 1, 0] }
        },
        avgConfidence: { $avg: "$agentConfidence" },
        corrected: {
          $sum: { $cond: [{ $ne: ["$humanCorrection", null] }, 1, 0] }
        }
      }
    }
  ]).toArray().then(function (results) {
    if (results.length === 0) return null;
    var stats = results[0];
    stats.approvalRate = stats.total > 0 ? stats.approved / stats.total : 0;
    stats.correctionRate = stats.total > 0 ? stats.corrected / stats.total : 0;
    return stats;
  });
};

Over time, this feedback data enables progressive autonomy: if the agent's approval rate for a specific action type reaches 98% over the last 500 decisions, you can lower the auto-approve threshold for that action type so that more of its actions run without review.

Implementing Undo and Rollback

Wherever possible, the actions the agent takes should be reversible. This means designing your action handlers to capture enough state for a rollback.

function ActionExecutor(db) {
  this.db = db;
  this.actionLog = db.collection("action_log");
}

ActionExecutor.prototype.execute = function (action) {
  var self = this;

  // Capture pre-action state
  return self.captureState(action).then(function (preState) {
    var logEntry = {
      actionId: "act_" + Date.now() + "_" + Math.random().toString(36).slice(2, 8),
      action: action,
      preState: preState,
      executedAt: new Date(),
      rolledBack: false
    };

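    // Execute the action. Starting from Promise.resolve() means a synchronous
    // throw inside performAction is logged like any other failure.
    return Promise.resolve().then(function () {
      return self.performAction(action);
    }).then(function (result) {
      logEntry.result = result;
      logEntry.postState = (result && result.newState) || null;
      logEntry.success = true;

      return self.actionLog.insertOne(logEntry).then(function () {
        return { actionId: logEntry.actionId, result: result };
      });
    }, function (err) {
      // Only action failures reach this handler (not log-write failures):
      // record the failed attempt, then rethrow so the caller sees the error
      logEntry.success = false;
      logEntry.error = err.message;
      return self.actionLog.insertOne(logEntry).then(function () {
        throw err;
      });
    });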
  });
};

ActionExecutor.prototype.rollback = function (actionId) {
  var self = this;

  return self.actionLog.findOne({ actionId: actionId }).then(function (logEntry) {
    if (!logEntry) {
      throw new Error("Action not found: " + actionId);
    }
    if (logEntry.rolledBack) {
      throw new Error("Action already rolled back: " + actionId);
    }
    if (!logEntry.preState) {
      throw new Error("No pre-state captured, cannot rollback: " + actionId);
    }

    return self.restoreState(logEntry.action, logEntry.preState).then(function () {
      return self.actionLog.updateOne(
        { actionId: actionId },
        {
          $set: {
            rolledBack: true,
            rolledBackAt: new Date()
          }
        }
      );
    });
  });
};

ActionExecutor.prototype.captureState = function (action) {
  // Override per action type. Example for a database record update:
  if (action.type === "update_record") {
    return this.db.collection(action.parameters.collection)
      .findOne({ _id: action.parameters.recordId });
  }
  return Promise.resolve(null);
};

ActionExecutor.prototype.restoreState = function (action, preState) {
  if (action.type === "update_record" && preState) {
    return this.db.collection(action.parameters.collection)
      .replaceOne({ _id: preState._id }, preState);
  }
  return Promise.resolve();
};

ActionExecutor.prototype.performAction = function (action) {
  // Dispatch to actual action handlers
  throw new Error("performAction must be overridden for action type: " + action.type);
};

Not every action is reversible (you cannot unsend an email), but the rollback pattern forces you to categorize actions by reversibility — which directly feeds into your confidence routing thresholds. Irreversible actions should have much higher approval requirements.
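
Since performAction above is only a stub, one possible way to fill it in is a handler registry in which each action type also declares whether it can be undone. Everything in this sketch is hypothetical: the actionHandlers table, the placeholder handler bodies, the send_email type, and the requiresHumanReview helper.

```javascript
// Hypothetical registry: each action type declares a handler and whether the
// action can be undone. The handler bodies are placeholders.
var actionHandlers = {
  update_record: {
    reversible: true,
    run: function (action) {
      return Promise.resolve({ updated: action.parameters.recordId });
    }
  },
  send_email: {
    reversible: false,
    run: function (action) {
      return Promise.resolve({ sentTo: action.parameters.to });
    }
  }
};

// Look up only own properties so names like "toString" are not mistaken for handlers
function lookupHandler(type) {
  return Object.prototype.hasOwnProperty.call(actionHandlers, type) ? actionHandlers[type] : null;
}

// Possible body for performAction: dispatches by type, rejects unknown types
function performAction(action) {
  var entry = lookupHandler(action.type);
  if (!entry) {
    return Promise.reject(new Error("No handler registered for action type: " + action.type));
  }
  try {
    return Promise.resolve(entry.run(action));
  } catch (err) {
    return Promise.reject(err);
  }
}

// Irreversible or unknown actions go to a human regardless of confidence
function requiresHumanReview(action) {
  var entry = lookupHandler(action.type);
  return !entry || entry.reversible === false;
}
```

With something like this in place, ActionExecutor.prototype.performAction could be assigned the performAction function, and the router could call requiresHumanReview before it looks at the confidence score.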

Audit Trails

Every agent decision must be logged. Not just for compliance, but because debugging agent behavior without an audit trail is like debugging a distributed system without logs — technically possible, but miserable.

function AuditLog(db) {
  this.collection = db.collection("agent_audit_log");
}

AuditLog.prototype.record = function (entry) {
  var record = {
    timestamp: new Date(),
    sessionId: entry.sessionId || null,
    agentId: entry.agentId || "default",
    action: entry.action,
    decision: entry.decision,
    reason: entry.reason || "",
    confidence: entry.action ? entry.action.confidence : null,
    reviewedBy: entry.reviewedBy || null,
    metadata: entry.metadata || {}
  };

  return this.collection.insertOne(record);
};

AuditLog.prototype.getSessionHistory = function (sessionId) {
  return this.collection
    .find({ sessionId: sessionId })
    .sort({ timestamp: 1 })
    .toArray();
};

AuditLog.prototype.getDecisionStats = function (startDate, endDate) {
  return this.collection.aggregate([
    {
      $match: {
        timestamp: { $gte: startDate, $lte: endDate }
      }
    },
    {
      $group: {
        _id: "$decision",
        count: { $sum: 1 },
        avgConfidence: { $avg: "$confidence" }
      }
    }
  ]).toArray();
};

A practical tip: index your audit log on sessionId and timestamp. When a customer reports that an agent did something unexpected, you will need to reconstruct the entire session timeline quickly.
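
A minimal sketch of that indexing step, assuming db is a connected Db instance from the official mongodb driver; the ensureAuditIndexes name is made up for this example.

```javascript
// Creates the indexes that the audit queries above rely on.
function ensureAuditIndexes(db) {
  var collection = db.collection("agent_audit_log");
  return Promise.all([
    // Serves getSessionHistory: filter by sessionId, sort by timestamp
    collection.createIndex({ sessionId: 1, timestamp: 1 }),
    // Serves getDecisionStats: range scan on timestamp alone
    collection.createIndex({ timestamp: 1 })
  ]);
}
```

Calling this once at startup, right after the database connection is established, is usually enough, because createIndex does nothing when an identical index already exists.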

Progressive Autonomy

Progressive autonomy means an agent earns more independence over time based on its track record. This is not a vague principle — it is a concrete algorithm.

function AutonomyManager(feedbackCollector, confidenceRouter) {
  this.feedbackCollector = feedbackCollector;
  this.confidenceRouter = confidenceRouter;
  this.minSampleSize = 50;
  this.targetApprovalRate = 0.97;
  this.adjustmentStep = 0.02;
  this.floorThreshold = 0.7;
  this.ceilingThreshold = 0.98;
}

AutonomyManager.prototype.evaluateAndAdjust = function (actionType) {
  var self = this;

  return self.feedbackCollector.getAccuracyStats(actionType, 30).then(function (stats) {
    if (!stats || stats.total < self.minSampleSize) {
      console.log("Insufficient data for " + actionType + " (" + (stats ? stats.total : 0) + " samples). No adjustment.");
      return null;
    }

    var currentThreshold = self.confidenceRouter.autoApproveThreshold;
    var adjustment = null;

    if (stats.approvalRate >= self.targetApprovalRate && stats.correctionRate < 0.03) {
      // Agent is performing well — increase autonomy
      var newThreshold = Math.max(currentThreshold - self.adjustmentStep, self.floorThreshold);
      if (newThreshold !== currentThreshold) {
        self.confidenceRouter.autoApproveThreshold = newThreshold;
        adjustment = {
          direction: "increased_autonomy",
          oldThreshold: currentThreshold,
          newThreshold: newThreshold,
          basedOn: stats
        };
      }
    } else if (stats.approvalRate < 0.85 || stats.correctionRate > 0.15) {
      // Agent is underperforming — decrease autonomy
      var newThreshold = Math.min(currentThreshold + self.adjustmentStep, self.ceilingThreshold);
      if (newThreshold !== currentThreshold) {
        self.confidenceRouter.autoApproveThreshold = newThreshold;
        adjustment = {
          direction: "decreased_autonomy",
          oldThreshold: currentThreshold,
          newThreshold: newThreshold,
          basedOn: stats
        };
      }
    }

    if (adjustment) {
      console.log("Autonomy adjustment for " + actionType + ": " + adjustment.direction +
        " (threshold: " + adjustment.oldThreshold + " -> " + adjustment.newThreshold + ")");
    }

    return adjustment;
  });
};

The floor and ceiling thresholds are important safety rails. Even a perfect track record should not let the auto-approve threshold drop below 0.7 — you always want the agent to be somewhat confident before acting on its own. And the ceiling prevents a few bad decisions from completely locking up the system.
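
Note that evaluateAndAdjust above reads statistics per action type but writes to the router's single autoApproveThreshold, so every action type shares one value. If you want each type to earn autonomy separately, the adjustment rule can be pulled out into a pure function and applied to a per-type table. This is a sketch under that assumption; nextThreshold, thresholdsByType, and adjustType are hypothetical names, and the numeric defaults mirror the AutonomyManager values.

```javascript
// Pure function: given the current threshold for one action type and its
// recent review stats, returns the next threshold.
function nextThreshold(current, stats, opts) {
  var o = opts || {};
  var step = o.step || 0.02;
  var floor = o.floor || 0.7;
  var ceiling = o.ceiling || 0.98;
  var minSamples = o.minSamples || 50;

  if (!stats || stats.total < minSamples) return current; // not enough evidence

  var next = current;
  if (stats.approvalRate >= 0.97 && stats.correctionRate < 0.03) {
    next = Math.max(current - step, floor); // more autonomy
  } else if (stats.approvalRate < 0.85 || stats.correctionRate > 0.15) {
    next = Math.min(current + step, ceiling); // less autonomy
  }
  return Math.round(next * 1000) / 1000; // avoid floating-point drift
}

// Hypothetical per-action-type table, so one noisy action type cannot
// loosen or tighten the threshold for every other type
var thresholdsByType = { update_record: 0.92, delete_record: 0.98 };

function adjustType(actionType, stats) {
  var current = thresholdsByType[actionType] || 0.95;
  thresholdsByType[actionType] = nextThreshold(current, stats);
  return thresholdsByType[actionType];
}
```

The router would then look up thresholdsByType[action.type] instead of a single autoApproveThreshold when deciding whether to auto-approve.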

Handling Timeouts When Humans Don't Respond

Humans are unreliable. They go to lunch. They go on vacation. They forget to check the approval dashboard. Your system needs a timeout strategy.

function TimeoutHandler(approvalGate, options) {
  this.approvalGate = approvalGate;
  this.defaultTimeoutMs = options.defaultTimeoutMs || 300000;
  this.timeoutAction = options.timeoutAction || "reject"; // "reject", "approve", "escalate"
  this.escalationChain = options.escalationChain || null;
}

TimeoutHandler.prototype.handleTimeout = function (approvalId) {
  var self = this;
  var pending = self.approvalGate.pendingApprovals.get(approvalId);
  if (!pending) return;

  switch (self.timeoutAction) {
    case "approve":
      // Only safe for low-risk, reversible actions
      console.warn("Timeout auto-approving: " + approvalId);
      self.approvalGate.submitDecision(approvalId, {
        approved: true,
        reviewerId: "system:timeout",
        reason: "Auto-approved due to reviewer timeout"
      });
      break;

    case "escalate":
      // Move up the escalation chain
      if (self.escalationChain) {
        console.warn("Timeout escalating: " + approvalId);
        var nextTier = self.escalationChain.getNextTier(pending.request.currentTier);
        if (nextTier) {
          pending.request.currentTier = nextTier.name;
          self.notifyTier(nextTier, pending.request);
          return; // Don't resolve — keep waiting
        }
      }
      // Falls through to reject if no more tiers
      // falls through

    case "reject":
    default:
      console.warn("Timeout rejecting: " + approvalId);
      self.approvalGate.submitDecision(approvalId, {
        approved: false,
        reviewerId: "system:timeout",
        reason: "Rejected due to reviewer timeout after " + self.defaultTimeoutMs + "ms"
      });
  }
};

TimeoutHandler.prototype.notifyTier = function (tier, request) {
  // Send notifications via email, Slack, PagerDuty, etc.
  console.log("Escalating approval " + request.id + " to " + tier.name + ": " + tier.reviewers.join(", "));
};

My strong recommendation: default to "reject" on timeout. Auto-approving timed-out actions is almost always a bad idea unless you have specifically marked certain low-risk action types as safe for timeout approval. The "escalate" strategy is the best middle ground for production systems: try the next reviewer before giving up. Note that the ApprovalGate timer shown earlier rejects the promise on its own, so to use this handler the timer callback would need to call handleTimeout instead.
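
The "escalate" branch calls escalationChain.getNextTier, which the EscalationChain shown earlier does not define. Here is a minimal sketch of what that method might look like, assuming tiers are escalated in configuration order and that request.currentTier has been set to a tier name when the approval was created (the original code does not set it).

```javascript
// Minimal stand-in for the EscalationChain shown earlier, so the sketch runs on its own
function EscalationChain(tiers) {
  this.tiers = tiers; // Array of { name, reviewers, maxRisk }
}

// Assumed semantics: return the tier after `currentTierName` in configuration
// order, or null when the request is already at the top tier or the name is unknown
EscalationChain.prototype.getNextTier = function (currentTierName) {
  for (var i = 0; i < this.tiers.length; i++) {
    if (this.tiers[i].name === currentTierName) {
      return this.tiers[i + 1] || null;
    }
  }
  return null;
};
```

Returning null for the top tier lets handleTimeout fall through to the "reject" case, which matches the fall-through comment in the switch statement.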

Notification Systems for Pending Approvals

Approvals are useless if the reviewer never sees them. Build a multi-channel notification system.

function NotificationDispatcher(channels) {
  this.channels = channels || {};
}

NotificationDispatcher.prototype.notify = function (request, tier) {
  var self = this;
  var promises = [];

  // WebSocket — real-time dashboard
  if (self.channels.websocket) {
    self.channels.websocket.broadcast({
      type: "new_approval",
      data: request,
      tier: tier.name,
      priority: self.getPriority(request)
    });
  }

  // Email — fallback for offline reviewers
  if (self.channels.email && tier.reviewers.length > 0) {
    var emailPromise = self.channels.email.send({
      to: tier.reviewers,
      subject: "[Agent Approval] " + request.action + " — " + self.getPriority(request) + " priority",
      body: self.formatEmailBody(request)
    });
    promises.push(emailPromise);
  }

  // Slack — immediate visibility
  if (self.channels.slack && tier.slackChannel) {
    var slackPromise = self.channels.slack.postMessage({
      channel: tier.slackChannel,
      text: "New agent approval pending: *" + request.action + "*",
      blocks: self.formatSlackBlocks(request)
    });
    promises.push(slackPromise);
  }

  return Promise.all(promises);
};

NotificationDispatcher.prototype.getPriority = function (request) {
  if (request.confidence < 0.4) return "high";
  if (request.confidence < 0.7) return "medium";
  return "low";
};

NotificationDispatcher.prototype.formatEmailBody = function (request) {
  return "Agent wants to execute: " + request.action + "\n\n" +
    "Confidence: " + (request.confidence * 100).toFixed(1) + "%\n" +
    "Reasoning: " + (request.reasoning.steps || []).join(" -> ") + "\n\n" +
    "Approve or reject at: https://your-app.com/approvals/" + request.id;
};
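
The notify method above calls formatSlackBlocks, which is not shown. One possible implementation, sketched here as a standalone function, returns Slack Block Kit blocks with the same information as the email body. The layout is an assumption, and the review URL reuses the placeholder domain from formatEmailBody.

```javascript
// Possible implementation of the formatSlackBlocks helper referenced in notify().
function formatSlackBlocks(request) {
  var steps = (request.reasoning && request.reasoning.steps) || [];
  return [
    {
      type: "section",
      text: { type: "mrkdwn", text: "*Agent approval pending:* `" + request.action + "`" }
    },
    {
      type: "section",
      fields: [
        { type: "mrkdwn", text: "*Confidence:*\n" + (request.confidence * 100).toFixed(1) + "%" },
        { type: "mrkdwn", text: "*Request ID:*\n" + request.id }
      ]
    },
    {
      type: "section",
      text: { type: "mrkdwn", text: "*Reasoning:*\n" + (steps.length ? steps.join("\n") : "(none provided)") }
    },
    {
      type: "actions",
      elements: [
        {
          type: "button",
          text: { type: "plain_text", text: "Review" },
          url: "https://your-app.com/approvals/" + request.id // placeholder domain
        }
      ]
    }
  ];
}
```

To use it with the dispatcher, assign it as NotificationDispatcher.prototype.formatSlackBlocks.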

Complete Working Example

Here is a complete Node.js agent system that ties all the patterns together into a working server.

var express = require("express");
var http = require("http");
var WebSocket = require("ws");
var MongoClient = require("mongodb").MongoClient;

var app = express();
app.use(express.json());

var DB_URL = process.env.DB_MONGO || "mongodb://localhost:27017";
var DB_NAME = process.env.DATABASE || "agent_hitl";
var PORT = process.env.PORT || 3000;

// useUnifiedTopology is deprecated and has no effect in version 4 and later of the mongodb driver
MongoClient.connect(DB_URL).then(function (client) {
  var db = client.db(DB_NAME);

  // Initialize components
  var approvalGate = new ApprovalGate();
  approvalGate.timeoutMs = 600000; // 10 minutes

  var auditLog = new AuditLog(db);
  var feedbackCollector = new FeedbackCollector(db);

  var confidenceRouter = new ConfidenceRouter({
    autoApproveThreshold: 0.92,
    autoRejectThreshold: 0.15,
    approvalGate: approvalGate,
    auditLog: auditLog
  });

  var actionExecutor = new ActionExecutor(db);
  var approvalServer = new ApprovalServer(approvalGate);
  var autonomyManager = new AutonomyManager(feedbackCollector, confidenceRouter);

  // Mount routes
  app.use("/api", approvalServer.getRouter());

  // Agent execution endpoint
  app.post("/api/agent/execute", function (req, res) {
    var action = req.body.action;
    if (!action || !action.type) {
      return res.status(400).json({ error: "Missing action.type" });
    }

    confidenceRouter.route(action).then(function (decision) {
      if (!decision.approved) {
        return res.json({
          executed: false,
          reason: decision.reason || "Rejected",
          method: decision.method
        });
      }

      return actionExecutor.execute(action).then(function (result) {
        res.json({
          executed: true,
          actionId: result.actionId,
          result: result.result,
          method: decision.method
        });
      });
    }).catch(function (err) {
      res.status(500).json({ error: err.message });
    });
  });

  // Rollback endpoint
  app.post("/api/agent/rollback/:actionId", function (req, res) {
    actionExecutor.rollback(req.params.actionId).then(function () {
      res.json({ success: true, message: "Action rolled back" });
    }).catch(function (err) {
      res.status(400).json({ error: err.message });
    });
  });

  // Autonomy adjustment endpoint (run periodically via cron)
  app.post("/api/agent/adjust-autonomy", function (req, res) {
    var actionTypes = req.body.actionTypes || ["update_record", "send_notification", "delete_record"];
    var adjustments = [];

    var chain = Promise.resolve();
    actionTypes.forEach(function (actionType) {
      chain = chain.then(function () {
        return autonomyManager.evaluateAndAdjust(actionType).then(function (adjustment) {
          if (adjustment) adjustments.push({ actionType: actionType, adjustment: adjustment });
        });
      });
    });

    chain.then(function () {
      res.json({
        currentThreshold: confidenceRouter.autoApproveThreshold,
        adjustments: adjustments
      });
    }).catch(function (err) {
      res.status(500).json({ error: err.message });
    });
  });

  // Audit log endpoint
  app.get("/api/audit/:sessionId", function (req, res) {
    auditLog.getSessionHistory(req.params.sessionId).then(function (history) {
      res.json({ history: history });
    }).catch(function (err) {
      res.status(500).json({ error: err.message });
    });
  });

  // Create HTTP server and attach WebSocket
  var server = http.createServer(app);
  approvalServer.setupWebSocket(server);

  server.listen(PORT, function () {
    console.log("Agent HITL server running on port " + PORT);
    console.log("WebSocket endpoint: ws://localhost:" + PORT + "/ws/approvals");
    console.log("Auto-approve threshold: " + confidenceRouter.autoApproveThreshold);
  });
}).catch(function (err) {
  console.error("Failed to connect to MongoDB:", err.message);
  process.exit(1);
});

To test the system, start the server and send an approval request:

# Start the server
node server.js

# Submit a high-confidence action (auto-approved)
curl -X POST http://localhost:3000/api/agent/execute \
  -H "Content-Type: application/json" \
  -d '{"action": {"type": "read_data", "parameters": {"collection": "users", "query": {"active": true}}, "reasoning": {"steps": ["User requested active user count", "Query is read-only and safe"]}, "confidence": 0.95}}'

# Submit a low-confidence action (requires human approval)
curl -X POST http://localhost:3000/api/agent/execute \
  -H "Content-Type: application/json" \
  -d '{"action": {"type": "delete_record", "parameters": {"collection": "users", "recordId": "abc123"}, "reasoning": {"steps": ["User asked to remove inactive account", "Account has recent activity - uncertain"]}, "confidence": 0.55}}'

# Approve a pending request (substitute the approval ID of your own pending request)
curl -X POST http://localhost:3000/api/approvals/apr_1234567890_abc123/decide \
  -H "Content-Type: application/json" \
  -d '{"approved": true, "reviewerId": "shane", "feedback": "Looks correct"}'

Common Issues and Troubleshooting

1. Approval requests silently disappearing

Error: No pending approval found: apr_1707234567890_x7k2m

This happens when the timeout fires before the human responds. The approval is removed from the in-memory pendingApprovals map. Check your timeout values — 5 minutes is too short for most production workflows. Increase timeoutMs to at least 30 minutes, or implement the escalation-on-timeout pattern instead of outright rejection.
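One way to picture the escalation-on-timeout pattern is the sketch below. It is a minimal illustration under stated assumptions, not the article's server code: `requestWithEscalation`, the `tiers` array, and the `notify` callback are hypothetical names introduced here. Each tier gets its own timeout, and the request is rejected only after the last tier has also failed to respond.

```javascript
// Hypothetical sketch: requestWithEscalation, tiers, and notify are
// illustrative names, not part of the article's server code.
function requestWithEscalation(approvalId, tiers, notify) {
  var pending = { settled: false, timer: null, tierIndex: -1 };

  return new Promise(function (resolve, reject) {
    // Settles the request once; later calls are ignored and return false.
    function decide(approved, reviewerId) {
      if (pending.settled) return false;
      pending.settled = true;
      clearTimeout(pending.timer);
      resolve({ approved: approved, reviewerId: reviewerId, tier: pending.tierIndex });
      return true;
    }

    // Moves to the next reviewer tier, or rejects when none are left.
    function escalate() {
      if (pending.settled) return;
      pending.tierIndex += 1;
      if (pending.tierIndex >= tiers.length) {
        pending.settled = true;
        reject(new Error("Approval expired after all escalation tiers: " + approvalId));
        return;
      }
      var tier = tiers[pending.tierIndex];
      // notify hands decide to whatever delivers the request to the reviewer.
      notify(tier.reviewer, approvalId, decide);
      if (!pending.settled) {
        pending.timer = setTimeout(escalate, tier.timeoutMs);
      }
    }

    escalate();
  });
}
```

A caller might pass tiers such as `[{ reviewer: "on-call", timeoutMs: 1800000 }, { reviewer: "team-lead", timeoutMs: 3600000 }]`, so an unanswered request moves up the chain instead of vanishing from the pending map.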

2. WebSocket connections dropping under load

WebSocket connection to 'ws://localhost:3000/ws/approvals' failed: Connection closed before receiving a handshake response

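The ws library does not manage backpressure for you. If you are broadcasting to many connected clients simultaneously, add a send queue with drain handling. Also ensure your reverse proxy (nginx, HAProxy) has WebSocket upgrade properly configured. In nginx this means proxy_http_version 1.1 together with proxy_set_header Upgrade $http_upgrade and proxy_set_header Connection "upgrade"; a missing upgrade header is a common cause of handshake failures like the one above.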
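A send queue can be sketched as follows. This is one possible shape rather than a definitive implementation: `createSendQueue` and `maxQueued` are hypothetical names, and the sketch assumes only that the socket exposes `readyState` and `send(data, callback)`, as ws WebSocket instances do. It keeps one message in flight per client and sheds load when the queue is full.

```javascript
// Hypothetical helper: createSendQueue and maxQueued are illustrative names.
// Works with any socket exposing readyState and send(data, callback).
var OPEN = 1; // numeric value of WebSocket.OPEN

function createSendQueue(socket, maxQueued) {
  var queue = [];
  var inFlight = false;

  // Sends the next queued message once the previous send has completed.
  function next() {
    if (inFlight || queue.length === 0) return;
    if (socket.readyState !== OPEN) {
      queue.length = 0; // the client is gone, so drop its backlog
      return;
    }
    inFlight = true;
    socket.send(queue.shift(), function (err) {
      inFlight = false;
      if (err) {
        queue.length = 0;
        return;
      }
      next();
    });
  }

  return {
    // Returns false when the backlog is full so the caller can shed load.
    enqueue: function (message) {
      if (queue.length >= maxQueued) return false;
      queue.push(JSON.stringify(message));
      next();
      return true;
    },
    size: function () {
      return queue.length;
    }
  };
}
```

Creating one queue per connected reviewer keeps a single slow client from growing the server's memory without bound during a broadcast.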

3. MongoDB write conflicts on rapid batch approvals

MongoError: E11000 duplicate key error collection: agent_hitl.agent_audit_log index: _id_ dup key

When processing batch approvals, multiple audit log writes can collide if you are using timestamp-based IDs. Switch to MongoDB's ObjectId() for the _id field, or use a proper UUID generator like uuid for your custom IDs. Also wrap batch operations in bulkWrite instead of individual insertOne calls for better performance.
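As a rough illustration of the ID fix, the sketch below uses Node's built-in `crypto.randomUUID()` instead of timestamps and shapes the audit entries as `insertOne` operations for a single `bulkWrite` call. The helper names `generateId` and `buildAuditOps` and the `aud` prefix are assumptions made for this example.

```javascript
var crypto = require("crypto");

// Hypothetical helper: builds an ID that stays unique even when many
// are generated within the same millisecond.
function generateId(prefix) {
  return prefix + "_" + crypto.randomUUID();
}

// Hypothetical helper: converts audit entries into insertOne operations
// in the shape that collection.bulkWrite() accepts.
function buildAuditOps(entries) {
  return entries.map(function (entry) {
    return {
      insertOne: {
        document: Object.assign({ _id: generateId("aud") }, entry)
      }
    };
  });
}
```

The resulting array could then be passed to something like `db.collection("agent_audit_log").bulkWrite(ops, { ordered: false })`, where `db` is your connected database handle.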

4. Agent crashes when an approval times out in production

UnhandledPromiseRejection: Error: Approval timed out after 300000ms: apr_1707234567890_q9r1p

The approval promise rejects on timeout, but if your calling code does not have a .catch() handler, Node.js treats it as an unhandled rejection. Always wrap requestApproval calls in try/catch (in async functions) or chain .catch() on the promise. Additionally, set process.on("unhandledRejection", handler) as a safety net in your main entry point to log these rather than crashing.
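The handling described above might look like the following sketch. `safeRequestApproval` is a hypothetical wrapper, and `requestApproval` is passed in as a parameter so the example is self-contained; its signature (an action in, a promise of a decision with an `approved` field out) is an assumption.

```javascript
// Hypothetical wrapper: converts a rejected approval promise into an
// explicit result so that a timeout never becomes an unhandled rejection.
function safeRequestApproval(requestApproval, action) {
  return Promise.resolve()
    .then(function () {
      // Calling inside then() also captures synchronous throws.
      return requestApproval(action);
    })
    .then(function (decision) {
      return {
        status: decision.approved ? "approved" : "rejected",
        decision: decision
      };
    })
    .catch(function (err) {
      // A timeout or transport failure counts as "not approved":
      // the agent must not proceed by default.
      return { status: "timed_out", error: err.message };
    });
}
```

Because the returned promise always resolves, calling code can branch on `status` without needing its own `.catch()`.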

5. Confidence scores clustering around 0.5, making routing useless

This is not a runtime error but a design issue. If your LLM returns confidence 0.5 for everything, your routing effectively sends everything to human review. The fix is to provide explicit confidence calibration instructions in your agent's system prompt: "Rate your confidence from 0 to 1 where 0.9+ means you are certain this is correct and have verified all parameters, 0.5-0.9 means likely correct but some ambiguity exists, and below 0.5 means you are guessing." Additionally, include few-shot examples of correct confidence ratings in your prompt.
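To check whether clustering is actually happening, a small diagnostic over logged confidence scores can help. The sketch below is an assumption-laden illustration: `clusteredShare` is a hypothetical helper, and the band around 0.5 is something you would tune for your own data.

```javascript
// Hypothetical diagnostic: returns the fraction of scores that fall
// within halfWidth of center (for example, within 0.05 of 0.5).
function clusteredShare(scores, center, halfWidth) {
  if (scores.length === 0) return 0;
  var inside = scores.filter(function (score) {
    return Math.abs(score - center) <= halfWidth;
  }).length;
  return inside / scores.length;
}
```

If most recent scores land in that narrow band, the calibration instructions in the prompt are probably not taking effect and the routing thresholds are doing little work.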

Best Practices

  • Default to deny. If the system is uncertain whether an action needs approval, require approval. False negatives (missed approvals) are far more expensive than false positives (unnecessary approvals).

  • Always log the full reasoning chain. Store the agent's step-by-step reasoning, not just the final action. When something goes wrong six months from now, the audit trail should reconstruct exactly what the agent was thinking.

  • Make approval UIs show diffs, not just actions. For database updates, show the before/after state. For emails, show the full draft. Reviewers should never have to guess what the agent is about to do.

  • Set different thresholds per action type. A "read data" action and a "delete production database" action should not share the same confidence routing threshold. Maintain a per-action-type configuration map.

  • Implement circuit breakers. If an agent's approval rejection rate spikes above 30% in a short window, automatically pause the agent and alert the team. Something has changed — either the agent's behavior degraded or the task requirements shifted.

  • Build for graceful degradation. If the approval service is down, the agent should queue actions for later review rather than either failing completely or bypassing approval. Use a persistent queue (Redis, SQS, or MongoDB) rather than in-memory storage for production systems.

  • Never auto-approve irreversible actions. Sending emails, posting to social media, initiating financial transactions, deleting records without soft-delete — these should always require human approval regardless of confidence score.

  • Design for batch review from day one. An agent that generates 100 actions overnight needs a batch review interface, not 100 individual approval popups. Group similar actions and let reviewers approve or reject in bulk.

  • Run autonomy adjustments on a schedule, not in real time. Changing thresholds after every single decision creates instability. Evaluate and adjust weekly or after every N decisions (e.g., every 100), whichever comes first.
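Several of these practices (default to deny, per-action-type thresholds, no auto-approval for irreversible actions, and circuit breakers) can be combined in a small routing policy. The sketch below is illustrative only: the `POLICY` table, its threshold values, and the `route` and `createCircuitBreaker` helpers are assumptions, with action type names borrowed from the examples earlier in the article.

```javascript
// Hypothetical policy table: thresholds and flags are illustrative values.
var POLICY = {
  read_data: { threshold: 0.7, irreversible: false },
  update_record: { threshold: 0.9, irreversible: false },
  send_notification: { threshold: 0.95, irreversible: true },
  delete_record: { threshold: 0.99, irreversible: true }
};

// Routes an action to "auto_approve" or "human_review".
function route(action) {
  var known = Object.prototype.hasOwnProperty.call(POLICY, action.type);
  // Default to deny: unknown types and missing confidence go to a human.
  if (!known || typeof action.confidence !== "number") return "human_review";
  var policy = POLICY[action.type];
  // Irreversible actions always need a human, whatever the confidence.
  if (policy.irreversible) return "human_review";
  return action.confidence >= policy.threshold ? "auto_approve" : "human_review";
}

// Tracks the last windowSize review outcomes and opens when the
// rejection rate in a full window exceeds maxRejectionRate.
function createCircuitBreaker(windowSize, maxRejectionRate) {
  var recent = []; // true means the reviewer rejected the action
  return {
    record: function (rejected) {
      recent.push(Boolean(rejected));
      if (recent.length > windowSize) recent.shift();
    },
    isOpen: function () {
      if (recent.length < windowSize) return false;
      var rejections = recent.filter(Boolean).length;
      return rejections / recent.length > maxRejectionRate;
    }
  };
}
```

When `isOpen()` returns true, the agent loop would pause and alert the team rather than continue submitting actions.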
