Building Autonomous Agents with Node.js

A hands-on guide to building autonomous AI agents in Node.js, covering the ReAct loop, tool registries, LLM integration, memory management, error recovery, and human-in-the-loop patterns.

Overview

An autonomous agent is a program that takes a high-level goal, breaks it down into steps, executes those steps using tools, and iterates until the goal is achieved -- all without human intervention at each step. Unlike a simple chatbot that answers one question at a time, an agent maintains state across a multi-turn reasoning loop and can take real actions in the world: reading files, querying databases, calling APIs, and writing output. This guide walks through building a production-quality agent framework in Node.js from scratch, covering architecture, LLM integration, tool design, memory management, cost controls, and the safety patterns you need before letting an agent run unsupervised.

Prerequisites

  • Node.js v18+ installed (LTS recommended)
  • npm for package management
  • An API key for Anthropic (Claude) or OpenAI -- examples show both
  • Basic familiarity with async/await and REST APIs in Node.js
  • Understanding of how LLM chat completion APIs work (messages, roles, tool definitions)

Install the dependencies we will use throughout this article (readline is a Node core module and does not need to be installed):

npm install @anthropic-ai/sdk openai tiktoken

What Makes an Agent Different from a Chatbot

A chatbot takes a message, generates a response, and is done. An agent operates in a loop. The difference is fundamental -- it is the difference between a calculator and a person working through a problem with a calculator, a notepad, and a web browser.

The pattern most agents follow is called ReAct (Reason + Act), introduced by Yao et al. in 2022. The loop has three phases:

  1. Reason -- The LLM examines the current state (goal, conversation history, tool results) and decides what to do next.
  2. Act -- The agent executes the chosen action (a tool call, or a final response to the user).
  3. Observe -- The tool's output is captured and fed back into the conversation as context for the next reasoning step.

This loop repeats until the LLM decides the goal has been achieved and emits a final answer instead of another tool call.

User Goal
    |
    v
+------------------+
| REASON           |  <-- LLM decides next step
| (LLM thinks)     |
+------------------+
    |
    v
+------------------+
| ACT              |  <-- Execute tool / return answer
| (Tool execution) |
+------------------+
    |
    v
+------------------+
| OBSERVE          |  <-- Capture result, add to context
| (Read output)    |
+------------------+
    |
    +-------> Loop back to REASON

The key architectural insight: the LLM is the decision-maker, not the executor. Your code handles execution. The LLM never runs shell commands directly -- it emits structured tool calls, and your runtime decides whether and how to execute them.
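
What the model actually emits is a structured content block naming a tool and carrying JSON arguments. With Claude's Messages API, a single tool call in the response content looks roughly like this (the id value is illustrative):

{
  "type": "tool_use",
  "id": "toolu_01ABC...",
  "name": "read_file",
  "input": { "path": "README.md" }
}

Your runtime looks up read_file in its registry, runs it, and sends the output back as a tool_result block. The model never touches the filesystem itself.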


Designing a Tool Registry

Tools are the agent's hands. A well-designed tool registry makes your agent extensible without touching the core loop. Each tool needs three things: a name, a JSON Schema describing its parameters, and an execution function.

var toolRegistry = {};

function registerTool(name, description, parameters, executeFn) {
  toolRegistry[name] = {
    name: name,
    description: description,
    parameters: parameters,
    execute: executeFn
  };
}

Here is how you register a few practical tools:

var fs = require("fs");
var https = require("https");

// Tool: Read a file from disk
registerTool(
  "read_file",
  "Read the contents of a file at the given path. Returns the file content as a string.",
  {
    type: "object",
    properties: {
      path: { type: "string", description: "Absolute or relative file path to read" }
    },
    required: ["path"]
  },
  function(params) {
    return fs.readFileSync(params.path, "utf-8");
  }
);

// Tool: Write content to a file
registerTool(
  "write_file",
  "Write content to a file at the given path. Creates the file if it does not exist, overwrites if it does.",
  {
    type: "object",
    properties: {
      path: { type: "string", description: "File path to write to" },
      content: { type: "string", description: "Content to write" }
    },
    required: ["path", "content"]
  },
  function(params) {
    fs.writeFileSync(params.path, params.content, "utf-8");
    return "File written successfully: " + params.path;
  }
);

// Tool: Run a web search (simplified via API)
registerTool(
  "web_search",
  "Search the web for information. Returns a summary of top results.",
  {
    type: "object",
    properties: {
      query: { type: "string", description: "Search query" }
    },
    required: ["query"]
  },
  function(params) {
    // In production, call a real search API (Brave, SerpAPI, Tavily, etc.)
    return performWebSearch(params.query);
  }
);

// Tool: Execute a database query
registerTool(
  "query_database",
  "Execute a read-only SQL query against the application database. Returns rows as JSON.",
  {
    type: "object",
    properties: {
      sql: { type: "string", description: "SQL SELECT query to execute" },
      params: {
        type: "array",
        items: { type: "string" },
        description: "Parameterized query values"
      }
    },
    required: ["sql"]
  },
  function(params) {
    // Validate it is a SELECT query -- never let an agent run arbitrary SQL
    if (!/^\s*SELECT/i.test(params.sql)) {
      throw new Error("Only SELECT queries are allowed");
    }
    return executeQuery(params.sql, params.params || []);
  }
);

Converting the Registry to LLM Tool Definitions

Both Claude and OpenAI expect tool definitions in a specific format. Here is how to convert the registry:

function getToolDefinitionsForClaude() {
  return Object.values(toolRegistry).map(function(tool) {
    return {
      name: tool.name,
      description: tool.description,
      input_schema: tool.parameters
    };
  });
}

function getToolDefinitionsForOpenAI() {
  return Object.values(toolRegistry).map(function(tool) {
    return {
      type: "function",
      function: {
        name: tool.name,
        description: tool.description,
        parameters: tool.parameters
      }
    };
  });
}

Implementing the Core Agent Loop

This is the heart of the agent. The loop sends messages to the LLM, checks if the response contains tool calls, executes those tools, appends the results, and loops again.

var Anthropic = require("@anthropic-ai/sdk");

var client = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

function createAgent(options) {
  var systemPrompt = options.systemPrompt || "You are a helpful assistant with access to tools. Use them to accomplish the user's goal. When the goal is fully achieved, respond with your final answer.";
  var maxIterations = options.maxIterations || 15;
  var model = options.model || "claude-sonnet-4-20250514";

  var messages = [];
  var iterationCount = 0;
  var totalInputTokens = 0;
  var totalOutputTokens = 0;

  function run(userGoal) {
    messages.push({ role: "user", content: userGoal });

    return agentLoop();
  }

  function agentLoop() {
    if (iterationCount >= maxIterations) {
      return {
        success: false,
        error: "Max iterations reached (" + maxIterations + ")",
        messages: messages,
        usage: { inputTokens: totalInputTokens, outputTokens: totalOutputTokens }
      };
    }

    iterationCount++;
    console.log("\n--- Iteration " + iterationCount + " ---");

    var response = client.messages.create({
      model: model,
      max_tokens: 4096,
      system: systemPrompt,
      tools: getToolDefinitionsForClaude(),
      messages: messages
    });

    return response.then(function(result) {
      totalInputTokens += result.usage.input_tokens;
      totalOutputTokens += result.usage.output_tokens;

      // Check stop reason
      if (result.stop_reason === "end_turn") {
        // Agent has decided it is done
        var finalText = extractText(result.content);
        console.log("\nAgent finished: " + finalText.substring(0, 200) + "...");
        return {
          success: true,
          result: finalText,
          iterations: iterationCount,
          messages: messages,
          usage: { inputTokens: totalInputTokens, outputTokens: totalOutputTokens }
        };
      }

      if (result.stop_reason === "tool_use") {
        // Agent wants to use tools
        messages.push({ role: "assistant", content: result.content });

        var toolResults = processToolCalls(result.content);
        messages.push({ role: "user", content: toolResults });

        return agentLoop();
      }

      // Unexpected stop reason
      return {
        success: false,
        error: "Unexpected stop_reason: " + result.stop_reason,
        messages: messages,
        usage: { inputTokens: totalInputTokens, outputTokens: totalOutputTokens }
      };
    });
  }

  return { run: run };
}
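
A minimal invocation, assuming the tool registry from the previous section is already populated, looks like this:

var agent = createAgent({ maxIterations: 10 });

agent.run("Summarize the contents of README.md into notes.txt").then(function(outcome) {
  console.log(outcome.success ? outcome.result : outcome.error);
  console.log("Token usage:", outcome.usage);
});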

Processing Tool Calls

When the LLM returns tool_use blocks, you need to execute each one and format the results:

function processToolCalls(contentBlocks) {
  var results = [];

  contentBlocks.forEach(function(block) {
    if (block.type !== "tool_use") return;

    var toolName = block.name;
    var toolInput = block.input;
    var toolUseId = block.id;

    console.log("  Tool call: " + toolName + "(" + JSON.stringify(toolInput).substring(0, 100) + ")");

    var tool = toolRegistry[toolName];
    if (!tool) {
      results.push({
        type: "tool_result",
        tool_use_id: toolUseId,
        content: "Error: Unknown tool '" + toolName + "'"
      });
      return;
    }

    try {
      var output = tool.execute(toolInput);
      var outputStr = typeof output === "string" ? output : JSON.stringify(output, null, 2);

      // Truncate very long outputs to save tokens
      if (outputStr.length > 10000) {
        outputStr = outputStr.substring(0, 10000) + "\n... [truncated, " + outputStr.length + " chars total]";
      }

      console.log("  Result: " + outputStr.substring(0, 150) + (outputStr.length > 150 ? "..." : ""));

      results.push({
        type: "tool_result",
        tool_use_id: toolUseId,
        content: outputStr
      });
    } catch (err) {
      console.log("  Error: " + err.message);
      results.push({
        type: "tool_result",
        tool_use_id: toolUseId,
        content: "Error executing " + toolName + ": " + err.message,
        is_error: true
      });
    }
  });

  return results;
}

function extractText(contentBlocks) {
  return contentBlocks
    .filter(function(b) { return b.type === "text"; })
    .map(function(b) { return b.text; })
    .join("\n");
}

Connecting to LLM APIs: Claude and OpenAI

The core loop above uses Claude. Here is the equivalent for OpenAI, so you can swap providers:

var OpenAI = require("openai");

var openaiClient = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

function agentLoopOpenAI(messages, systemPrompt, iterationCount, maxIterations, usage) {
  if (iterationCount >= maxIterations) {
    return Promise.resolve({ success: false, error: "Max iterations reached" });
  }

  var allMessages = [{ role: "system", content: systemPrompt }].concat(messages);

  return openaiClient.chat.completions.create({
    model: "gpt-4.1",
    messages: allMessages,
    tools: getToolDefinitionsForOpenAI(),
    tool_choice: "auto"
  }).then(function(response) {
    var choice = response.choices[0];
    usage.inputTokens += response.usage.prompt_tokens;
    usage.outputTokens += response.usage.completion_tokens;

    if (choice.finish_reason === "stop") {
      return { success: true, result: choice.message.content };
    }

    if (choice.finish_reason === "tool_calls") {
      messages.push(choice.message);

      var toolMessages = choice.message.tool_calls.map(function(tc) {
        try {
          var tool = toolRegistry[tc.function.name];
          if (!tool) throw new Error("Unknown tool: " + tc.function.name);
          // Parse inside the try so malformed JSON arguments become a tool error, not a crash
          var args = JSON.parse(tc.function.arguments);
          var result = tool.execute(args);
          return {
            role: "tool",
            tool_call_id: tc.id,
            content: typeof result === "string" ? result : JSON.stringify(result)
          };
        } catch (err) {
          return {
            role: "tool",
            tool_call_id: tc.id,
            content: "Error: " + err.message
          };
        }
      });

      toolMessages.forEach(function(r) { messages.push(r); });
      return agentLoopOpenAI(messages, systemPrompt, iterationCount + 1, maxIterations, usage);
    }

    return { success: false, error: "Unexpected finish_reason: " + choice.finish_reason };
  });
}

The key difference: Claude uses stop_reason: "tool_use" with content blocks, while OpenAI uses finish_reason: "tool_calls" with a tool_calls array on the message. The tool result format also differs -- Claude expects tool_result content blocks in a user message, while OpenAI expects separate messages with role: "tool". Abstract this behind your agent interface and you can swap providers freely.
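
One lightweight way to do that abstraction is a provider adapter: one object per provider, each exposing the same operations the loop needs. Here is a sketch of the Claude side, reusing the helpers defined above (the { done, text, toolCalls } shape is an assumption of this sketch, not an SDK type):

function createClaudeAdapter(client, model) {
  return {
    // Run one model turn and normalize the response shape
    send: function(systemPrompt, messages) {
      return client.messages.create({
        model: model,
        max_tokens: 4096,
        system: systemPrompt,
        tools: getToolDefinitionsForClaude(),
        messages: messages
      }).then(function(res) {
        return {
          done: res.stop_reason === "end_turn",
          text: extractText(res.content),
          toolCalls: res.content.filter(function(b) { return b.type === "tool_use"; }),
          raw: res
        };
      });
    },
    // Package tool outputs the way this provider expects them appended to history
    formatToolResults: function(results) {
      return { role: "user", content: results };
    }
  };
}

An OpenAI adapter implements the same two methods against chat.completions, and the core loop only ever talks to the adapter.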


Building Practical Tools

The tools you build determine what your agent can actually accomplish. Here are production-ready implementations for common capabilities:

HTTP API Calls

var http = require("http");
var https = require("https");

registerTool(
  "http_request",
  "Make an HTTP GET or POST request to a URL. Returns the response body.",
  {
    type: "object",
    properties: {
      url: { type: "string", description: "Full URL to request" },
      method: { type: "string", enum: ["GET", "POST"], description: "HTTP method" },
      body: { type: "string", description: "Request body for POST requests" },
      headers: {
        type: "object",
        additionalProperties: { type: "string" },
        description: "Request headers"
      }
    },
    required: ["url", "method"]
  },
  function(params) {
    return new Promise(function(resolve, reject) {
      var parsed = new URL(params.url);
      var transport = parsed.protocol === "https:" ? https : http;

      var reqOptions = {
        hostname: parsed.hostname,
        port: parsed.port,
        path: parsed.pathname + parsed.search,
        method: params.method,
        headers: params.headers || {}
      };

      var req = transport.request(reqOptions, function(res) {
        var chunks = [];
        res.on("data", function(chunk) { chunks.push(chunk); });
        res.on("end", function() {
          var body = Buffer.concat(chunks).toString("utf-8");
          if (body.length > 8000) {
            body = body.substring(0, 8000) + "\n[truncated]";
          }
          resolve("HTTP " + res.statusCode + "\n" + body);
        });
      });

      req.on("error", function(err) { reject(err); });

      if (params.body) {
        req.write(params.body);
      }
      req.end();
    });
  }
);

Shell Command Execution (with Guardrails)

var childProcess = require("child_process");

var ALLOWED_COMMANDS = ["ls", "cat", "head", "tail", "wc", "grep", "find", "echo", "date", "curl"];

registerTool(
  "run_command",
  "Execute a shell command. Only safe, read-only commands are allowed.",
  {
    type: "object",
    properties: {
      command: { type: "string", description: "Shell command to execute" }
    },
    required: ["command"]
  },
  function(params) {
    var cmd = params.command.trim();
    var baseCmd = cmd.split(/\s+/)[0];

    if (ALLOWED_COMMANDS.indexOf(baseCmd) === -1) {
      throw new Error("Command not in allowlist: " + baseCmd + ". Allowed: " + ALLOWED_COMMANDS.join(", "));
    }

    // Block dangerous patterns even within allowed commands.
    // No exceptions: a grep pattern may legitimately contain $ or |, but with
    // execSync the injection risk outweighs the convenience.
    if (/[;&|`$]/.test(cmd)) {
      throw new Error("Shell operators (;, &, |, `, $) are not allowed for safety");
    }

    var result = childProcess.execSync(cmd, {
      timeout: 10000,
      maxBuffer: 1024 * 1024,
      encoding: "utf-8"
    });

    return result;
  }
);

Notice the allowlist pattern. Never give an agent unrestricted shell access. Even with an allowlist, you should block shell operators that could be used for injection. Defense in depth.


Managing Conversation Context and Memory

Every iteration adds messages to the conversation. On a complex task, you can easily hit 50,000+ tokens of context. You need a strategy.

Sliding Window with Summary

The simplest effective approach: when the message history exceeds a threshold, summarize older messages and replace them:

var tiktoken = require("tiktoken");
// GPT-4o's tokenizer is only an approximation of Claude's, but close enough for thresholding
var encoder = tiktoken.encoding_for_model("gpt-4o");

function countMessageTokens(messages) {
  var total = 0;
  messages.forEach(function(msg) {
    var content = typeof msg.content === "string"
      ? msg.content
      : JSON.stringify(msg.content);
    total += encoder.encode(content).length;
  });
  return total;
}

function compactHistory(messages, maxTokens, client) {
  var tokenCount = countMessageTokens(messages);

  if (tokenCount <= maxTokens) {
    return Promise.resolve(messages);
  }

  // Keep the first message (original goal) and last 6 messages
  var toSummarize = messages.slice(1, -6);
  var summaryContent = toSummarize.map(function(m) {
    var content = typeof m.content === "string" ? m.content : JSON.stringify(m.content);
    return m.role + ": " + content.substring(0, 500);
  }).join("\n");

  return client.messages.create({
    model: "claude-haiku-3.5-20241022",
    max_tokens: 500,
    messages: [{
      role: "user",
      content: "Summarize this agent conversation history concisely. Focus on what was accomplished, what was learned, and what remains to be done:\n\n" + summaryContent
    }]
  }).then(function(result) {
    var summary = result.content[0].text;
    var compacted = [
      messages[0],
      { role: "user", content: "[Summary of previous steps: " + summary + "]" }
    ].concat(messages.slice(-6));

    console.log("  Context compacted: " + tokenCount + " -> " + countMessageTokens(compacted) + " tokens");
    return compacted;
  });
}

Notice I use Haiku for the summarization step -- it is fast and cheap, and summarization does not require the strongest model. This is a practical form of model tiering within the agent itself.
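
Wiring this in is a matter of compacting before each LLM call. A sketch, assuming it sits inside createAgent next to agentLoop (the 200k limit and 60% trigger are assumptions; adjust for your model):

var CONTEXT_LIMIT_TOKENS = 200000;
var COMPACT_THRESHOLD = Math.floor(CONTEXT_LIMIT_TOKENS * 0.6);

function agentLoopWithCompaction() {
  return compactHistory(messages, COMPACT_THRESHOLD, client).then(function(compacted) {
    messages = compacted;   // swap in the compacted history
    return agentLoop();     // then run the normal iteration
  });
}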


Token Budget Management

An unbounded agent loop can drain your API budget fast. You need hard limits:

function createBudgetTracker(options) {
  var maxInputTokens = options.maxInputTokens || 100000;
  var maxOutputTokens = options.maxOutputTokens || 20000;
  var maxCostUSD = options.maxCostUSD || 1.00;
  var inputPricePer1M = options.inputPricePer1M || 3.00;   // Claude Sonnet
  var outputPricePer1M = options.outputPricePer1M || 15.00;

  var totalInput = 0;
  var totalOutput = 0;

  function recordUsage(inputTokens, outputTokens) {
    totalInput += inputTokens;
    totalOutput += outputTokens;
  }

  function getCurrentCost() {
    return (totalInput / 1000000 * inputPricePer1M)
         + (totalOutput / 1000000 * outputPricePer1M);
  }

  function checkBudget() {
    var cost = getCurrentCost();
    if (totalInput > maxInputTokens) {
      return { ok: false, reason: "Input token limit exceeded: " + totalInput + "/" + maxInputTokens };
    }
    if (totalOutput > maxOutputTokens) {
      return { ok: false, reason: "Output token limit exceeded: " + totalOutput + "/" + maxOutputTokens };
    }
    if (cost > maxCostUSD) {
      return { ok: false, reason: "Cost limit exceeded: $" + cost.toFixed(4) + "/$" + maxCostUSD.toFixed(2) };
    }
    return { ok: true, cost: cost, inputTokens: totalInput, outputTokens: totalOutput };
  }

  function getSummary() {
    return {
      inputTokens: totalInput,
      outputTokens: totalOutput,
      cost: getCurrentCost().toFixed(4),
      currency: "USD"
    };
  }

  return {
    recordUsage: recordUsage,
    checkBudget: checkBudget,
    getSummary: getSummary
  };
}

Integrate this into the agent loop -- check the budget after every LLM call, and abort gracefully if limits are hit. In production, I log every agent run's cost to a database for trend analysis. You will be surprised how quickly costs add up when agents loop 10-15 times per request.
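
In the Claude loop from earlier, the check is a few lines per iteration. A sketch, assuming a tracker created with createBudgetTracker:

var budget = createBudgetTracker({ maxCostUSD: 0.50 });

// At the top of agentLoop(), before calling the API:
var status = budget.checkBudget();
if (!status.ok) {
  return Promise.resolve({ success: false, error: status.reason, usage: budget.getSummary() });
}

// After each response:
budget.recordUsage(result.usage.input_tokens, result.usage.output_tokens);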


Error Recovery and Retry Strategies

Tools fail. APIs time out. Files do not exist. A robust agent handles all of this gracefully.

Retry with Exponential Backoff

function withRetry(fn, maxRetries, baseDelay) {
  maxRetries = maxRetries || 3;
  baseDelay = baseDelay || 1000;

  return function() {
    var args = arguments;
    var attempt = 0;

    function tryCall() {
      attempt++;
      try {
        var result = fn.apply(null, args);
        if (result && typeof result.then === "function") {
          return result.catch(function(err) {
            return handleError(err);
          });
        }
        return Promise.resolve(result);
      } catch (err) {
        return handleError(err);
      }
    }

    function handleError(err) {
      if (attempt >= maxRetries) {
        return Promise.reject(err);
      }

      var isRetryable = err.status === 429
        || err.status === 500
        || err.status === 529
        || err.code === "ECONNRESET"
        || err.code === "ETIMEDOUT";

      if (!isRetryable) {
        return Promise.reject(err);
      }

      var delay = baseDelay * Math.pow(2, attempt - 1);
      console.log("  Retry " + attempt + "/" + maxRetries + " after " + delay + "ms: " + err.message);

      return new Promise(function(resolve) {
        setTimeout(resolve, delay);
      }).then(tryCall);
    }

    return tryCall();
  };
}
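
A typical use is wrapping the LLM call itself so transient 429s and 5xx responses do not kill the run:

var createMessageWithRetry = withRetry(function(request) {
  return client.messages.create(request);
}, 3, 1000);

// Drop-in replacement for client.messages.create(...) inside the agent loop
createMessageWithRetry({
  model: model,
  max_tokens: 4096,
  system: systemPrompt,
  tools: getToolDefinitionsForClaude(),
  messages: messages
}).then(function(result) { /* ... */ });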

Feeding Errors Back to the LLM

When a tool throws an error, do not crash the agent. Return the error as a tool result with is_error: true. The LLM is surprisingly good at recovering -- it will often try a different approach, fix the input, or ask a clarifying question.

// In processToolCalls (already shown above):
catch (err) {
  results.push({
    type: "tool_result",
    tool_use_id: toolUseId,
    content: "Error executing " + toolName + ": " + err.message,
    is_error: true
  });
}

This pattern is critical. I have seen agents recover from "file not found" by listing the directory first, then retrying with the correct filename. The error message IS the feedback loop.


Human-in-the-Loop Patterns

Letting an agent run fully autonomously is fine for read-only tasks. For anything that modifies state -- writing files, sending emails, making API calls that change data -- you want approval gates.

var readline = require("readline");

function createApprovalGate(options) {
  var requireApproval = options.requireApproval || [];  // Tool names that need approval
  var autoApprove = options.autoApprove || [];           // Tool names that never need approval

  function shouldRequireApproval(toolName, toolInput) {
    if (autoApprove.indexOf(toolName) !== -1) return false;
    if (requireApproval.indexOf(toolName) !== -1) return true;
    // Default: approve reads, gate writes
    if (toolName.match(/^(read_|query_|search|list)/)) return false;
    return true;
  }

  function requestApproval(toolName, toolInput) {
    return new Promise(function(resolve) {
      var rl = readline.createInterface({ input: process.stdin, output: process.stdout });

      console.log("\n========================================");
      console.log("APPROVAL REQUIRED");
      console.log("Tool: " + toolName);
      console.log("Input: " + JSON.stringify(toolInput, null, 2));
      console.log("========================================");

      rl.question("Approve? (y/n/m to modify): ", function(answer) {
        rl.close();
        if (answer.toLowerCase() === "y") {
          resolve({ approved: true, input: toolInput });
        } else if (answer.toLowerCase() === "m") {
          // Let user modify the input
          var editRl = readline.createInterface({ input: process.stdin, output: process.stdout });
          editRl.question("Enter modified JSON input: ", function(modified) {
            editRl.close();
            try {
              resolve({ approved: true, input: JSON.parse(modified) });
            } catch (e) {
              resolve({ approved: false, reason: "Invalid JSON" });
            }
          });
        } else {
          resolve({ approved: false, reason: "User rejected" });
        }
      });
    });
  }

  return {
    shouldRequireApproval: shouldRequireApproval,
    requestApproval: requestApproval
  };
}

Wire this into processToolCalls:

function processToolCallsWithApproval(contentBlocks, approvalGate) {
  var results = [];
  var chain = Promise.resolve();

  contentBlocks.forEach(function(block) {
    if (block.type !== "tool_use") return;

    chain = chain.then(function() {
      var toolName = block.name;
      var toolInput = block.input;

      if (approvalGate.shouldRequireApproval(toolName, toolInput)) {
        return approvalGate.requestApproval(toolName, toolInput).then(function(decision) {
          if (!decision.approved) {
            results.push({
              type: "tool_result",
              tool_use_id: block.id,
              content: "Tool call rejected by user: " + (decision.reason || "no reason given"),
              is_error: true
            });
            return;
          }
          return executeAndRecord(block, decision.input, results);
        });
      }

      return executeAndRecord(block, toolInput, results);
    });
  });

  return chain.then(function() { return results; });
}
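
The executeAndRecord helper referenced above is not defined earlier; a minimal version, reusing the registry and the same truncation rule as processToolCalls, could look like this (it returns a promise so async tools like http_request work too):

function executeAndRecord(block, input, results) {
  var tool = toolRegistry[block.name];
  if (!tool) {
    results.push({
      type: "tool_result",
      tool_use_id: block.id,
      content: "Error: Unknown tool '" + block.name + "'",
      is_error: true
    });
    return Promise.resolve();
  }

  return Promise.resolve().then(function() {
    return tool.execute(input);   // handles both sync and promise-returning tools
  }).then(function(output) {
    var outputStr = typeof output === "string" ? output : JSON.stringify(output, null, 2);
    if (outputStr.length > 10000) {
      outputStr = outputStr.substring(0, 10000) + "\n... [truncated]";
    }
    results.push({ type: "tool_result", tool_use_id: block.id, content: outputStr });
  }).catch(function(err) {
    results.push({
      type: "tool_result",
      tool_use_id: block.id,
      content: "Error executing " + block.name + ": " + err.message,
      is_error: true
    });
  });
}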

The "modify" option is underrated. Sometimes the agent gets 90% of the way there but uses the wrong file path or an incorrect parameter. Letting the user tweak the input rather than rejecting and re-running saves significant time and tokens.


Agent Evaluation and Testing

You cannot improve what you cannot measure. Agent testing is fundamentally different from unit testing because outcomes are non-deterministic.

Evaluation Framework

function createEvaluator() {
  var testCases = [];

  function addTestCase(name, goal, expectedOutcomes, maxIterations) {
    testCases.push({
      name: name,
      goal: goal,
      expectedOutcomes: expectedOutcomes,
      maxIterations: maxIterations || 10
    });
  }

  function runEvaluation(agentFactory) {
    var results = [];

    return testCases.reduce(function(chain, testCase) {
      return chain.then(function() {
        console.log("\nEval: " + testCase.name);
        var agent = agentFactory({ maxIterations: testCase.maxIterations });
        var startTime = Date.now();

        return agent.run(testCase.goal).then(function(agentResult) {
          var elapsed = Date.now() - startTime;
          var passed = testCase.expectedOutcomes.every(function(check) {
            return check(agentResult);
          });

          var result = {
            name: testCase.name,
            passed: passed,
            iterations: agentResult.iterations,
            cost: agentResult.usage,
            elapsed: elapsed
          };

          console.log("  " + (passed ? "PASS" : "FAIL") + " (" + agentResult.iterations + " iterations, " + elapsed + "ms)");
          results.push(result);
        });
      });
    }, Promise.resolve()).then(function() {
      return results;
    });
  }

  return { addTestCase: addTestCase, runEvaluation: runEvaluation };
}

// Example usage
var evaluator = createEvaluator();

evaluator.addTestCase(
  "file_read_and_summarize",
  "Read the file README.md and write a one-paragraph summary to summary.txt",
  [
    function(result) { return result.success === true; },
    function(result) { return fs.existsSync("summary.txt"); },
    function(result) { return fs.readFileSync("summary.txt", "utf-8").length > 50; }
  ],
  5
);

Run evaluations on every change to your system prompt or tool definitions. Track the results over time. I keep a spreadsheet with columns for pass rate, average iterations, average cost, and p95 latency. Regressions in any of these metrics trigger investigation.
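
Those aggregate numbers are straightforward to compute from the results array the evaluator returns; a small helper along these lines works (elapsed and iterations come from the result objects above):

function summarizeEvalResults(results) {
  var passed = results.filter(function(r) { return r.passed; }).length;
  var latencies = results.map(function(r) { return r.elapsed; }).sort(function(a, b) { return a - b; });
  var avgIterations = results.reduce(function(sum, r) { return sum + r.iterations; }, 0) / results.length;
  var p95Index = Math.min(latencies.length - 1, Math.floor(latencies.length * 0.95));

  return {
    passRate: (passed / results.length * 100).toFixed(1) + "%",
    avgIterations: avgIterations.toFixed(1),
    p95LatencyMs: latencies[p95Index]
  };
}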


Cost Tracking per Agent Run

Beyond budget limits, you need visibility into what each agent run actually costs:

function logAgentRun(runId, goal, result, budget) {
  var summary = budget.getSummary();
  var logEntry = {
    runId: runId,
    timestamp: new Date().toISOString(),
    goal: goal.substring(0, 200),
    success: result.success,
    iterations: result.iterations,
    inputTokens: summary.inputTokens,
    outputTokens: summary.outputTokens,
    costUSD: summary.cost,
    durationMs: result.durationMs
  };

  // Append to a JSONL log file
  fs.appendFileSync(
    "agent-runs.jsonl",
    JSON.stringify(logEntry) + "\n",
    "utf-8"
  );

  console.log("\nRun complete:");
  console.log("  Iterations: " + logEntry.iterations);
  console.log("  Input tokens: " + logEntry.inputTokens);
  console.log("  Output tokens: " + logEntry.outputTokens);
  console.log("  Cost: $" + logEntry.costUSD);

  return logEntry;
}

A typical agent run that involves 3-5 tool calls with Claude Sonnet costs between $0.01 and $0.08. Complex research tasks with 10+ iterations can hit $0.30-0.50. If you are running agents at scale, these numbers matter.


Structured Output Parsing for Tool Calls

Sometimes you need the agent to produce structured output -- not just free-form text. Both Claude and OpenAI support forcing structured output, but you can also parse it yourself for more control:

function parseStructuredOutput(text, schema) {
  // Try to extract JSON from the response
  var jsonMatch = text.match(/```json\n([\s\S]*?)\n```/);
  if (jsonMatch) {
    try {
      return JSON.parse(jsonMatch[1]);
    } catch (e) {
      // Fall through
    }
  }

  // Try parsing the entire response as JSON
  try {
    return JSON.parse(text);
  } catch (e) {
    // Fall through
  }

  // Try to find a JSON object anywhere in the text
  var braceMatch = text.match(/\{[\s\S]*\}/);
  if (braceMatch) {
    try {
      return JSON.parse(braceMatch[0]);
    } catch (e) {
      // Fall through
    }
  }

  return null;
}

For Claude, you can also use the tool_choice parameter to force a specific tool call, which guarantees structured output matching the tool's input schema:

var response = client.messages.create({
  model: "claude-sonnet-4-20250514",
  max_tokens: 4096,
  tools: [{
    name: "structured_response",
    description: "Return the final structured result",
    input_schema: {
      type: "object",
      properties: {
        summary: { type: "string" },
        keyFindings: { type: "array", items: { type: "string" } },
        confidence: { type: "number", minimum: 0, maximum: 1 }
      },
      required: ["summary", "keyFindings", "confidence"]
    }
  }],
  tool_choice: { type: "tool", name: "structured_response" },
  messages: messages
});

This guarantees you get valid JSON matching your schema. No parsing gymnastics required.
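
Reading the result back is then just a matter of pulling the input off the forced tool_use block:

response.then(function(result) {
  var block = result.content.find(function(b) { return b.type === "tool_use"; });
  var structured = block.input;   // conforms to the input_schema defined above
  console.log(structured.summary, structured.keyFindings, structured.confidence);
});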


Complete Working Example

Here is a fully functional agent that takes a user goal, reasons about it, uses tools, and iterates until the goal is achieved. This example researches a topic and writes a summary to a file.

Full Agent Implementation

// agent.js
var fs = require("fs");
var https = require("https");
var Anthropic = require("@anthropic-ai/sdk");

// --- Configuration ---
var client = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });
var MODEL = "claude-sonnet-4-20250514";
var MAX_ITERATIONS = 12;
var MAX_COST_USD = 0.50;

// --- Tool Registry ---
var tools = {};

function register(name, description, schema, fn) {
  tools[name] = { name: name, description: description, schema: schema, fn: fn };
}

register("read_file", "Read a file from disk", {
  type: "object",
  properties: { path: { type: "string" } },
  required: ["path"]
}, function(p) {
  if (!fs.existsSync(p.path)) throw new Error("File not found: " + p.path);
  return fs.readFileSync(p.path, "utf-8");
});

register("write_file", "Write content to a file", {
  type: "object",
  properties: {
    path: { type: "string" },
    content: { type: "string" }
  },
  required: ["path", "content"]
}, function(p) {
  fs.writeFileSync(p.path, p.content, "utf-8");
  return "Written " + p.content.length + " chars to " + p.path;
});

register("list_files", "List files in a directory", {
  type: "object",
  properties: { path: { type: "string" } },
  required: ["path"]
}, function(p) {
  return fs.readdirSync(p.path).join("\n");
});

register("web_fetch", "Fetch a web page and return its text content (first 5000 chars)", {
  type: "object",
  properties: { url: { type: "string" } },
  required: ["url"]
}, function(p) {
  return new Promise(function(resolve, reject) {
    https.get(p.url, function(res) {
      var data = "";
      res.on("data", function(chunk) { data += chunk; });
      res.on("end", function() {
        // Strip HTML tags for readability
        var text = data.replace(/<[^>]+>/g, " ").replace(/\s+/g, " ").trim();
        resolve(text.substring(0, 5000));
      });
    }).on("error", reject);
  });
});

register("calculate", "Evaluate a mathematical expression", {
  type: "object",
  properties: { expression: { type: "string" } },
  required: ["expression"]
}, function(p) {
  // Only allow safe math characters
  if (!/^[\d\s+\-*/().%]+$/.test(p.expression)) {
    throw new Error("Invalid expression. Only numbers and math operators allowed.");
  }
  var result = Function('"use strict"; return (' + p.expression + ')')();
  return String(result);
});

// --- Tool Definitions for Claude ---
function getToolDefs() {
  return Object.keys(tools).map(function(name) {
    var t = tools[name];
    return { name: t.name, description: t.description, input_schema: t.schema };
  });
}

// --- Budget Tracker ---
var budget = { input: 0, output: 0 };

function getCost() {
  return (budget.input / 1000000 * 3.00) + (budget.output / 1000000 * 15.00);
}

// --- Execute Tool ---
function executeTool(name, input) {
  var tool = tools[name];
  if (!tool) return Promise.resolve("Error: unknown tool " + name);

  try {
    var result = tool.fn(input);
    if (result && typeof result.then === "function") return result;
    return Promise.resolve(result);
  } catch (err) {
    return Promise.resolve("Error: " + err.message);
  }
}

// --- Agent Loop ---
function runAgent(goal) {
  var messages = [{ role: "user", content: goal }];
  var iteration = 0;
  var systemPrompt = "You are an autonomous research agent. You have tools to read files, write files, list directories, fetch web pages, and do math. Work step by step to accomplish the user's goal. When finished, respond with your final summary.";

  function loop() {
    iteration++;
    if (iteration > MAX_ITERATIONS) {
      console.log("Max iterations reached.");
      return Promise.resolve("FAILED: max iterations");
    }
    if (getCost() > MAX_COST_USD) {
      console.log("Budget exceeded: $" + getCost().toFixed(4));
      return Promise.resolve("FAILED: budget exceeded");
    }

    console.log("\n=== Iteration " + iteration + " (cost: $" + getCost().toFixed(4) + ") ===");

    return client.messages.create({
      model: MODEL,
      max_tokens: 4096,
      system: systemPrompt,
      tools: getToolDefs(),
      messages: messages
    }).then(function(response) {
      budget.input += response.usage.input_tokens;
      budget.output += response.usage.output_tokens;

      if (response.stop_reason === "end_turn") {
        var finalText = response.content
          .filter(function(b) { return b.type === "text"; })
          .map(function(b) { return b.text; })
          .join("\n");
        console.log("\nAgent complete.");
        return finalText;
      }

      if (response.stop_reason !== "tool_use") {
        return "FAILED: unexpected stop_reason: " + response.stop_reason;
      }

      // Process tool calls
      messages.push({ role: "assistant", content: response.content });

      var toolResults = [];
      var toolBlocks = response.content.filter(function(b) { return b.type === "tool_use"; });

      return toolBlocks.reduce(function(chain, block) {
        return chain.then(function() {
          console.log("  -> " + block.name + "(" + JSON.stringify(block.input).substring(0, 80) + ")");
          return executeTool(block.name, block.input).then(function(result) {
            var resultStr = typeof result === "string" ? result : JSON.stringify(result);
            if (resultStr.length > 8000) {
              resultStr = resultStr.substring(0, 8000) + "\n[truncated]";
            }
            console.log("     " + resultStr.substring(0, 120));
            toolResults.push({
              type: "tool_result",
              tool_use_id: block.id,
              content: resultStr
            });
          });
        });
      }, Promise.resolve()).then(function() {
        messages.push({ role: "user", content: toolResults });
        return loop();
      });
    });
  }

  return loop().then(function(result) {
    console.log("\n--- Final Report ---");
    console.log("Iterations: " + iteration);
    console.log("Input tokens: " + budget.input);
    console.log("Output tokens: " + budget.output);
    console.log("Total cost: $" + getCost().toFixed(4));
    console.log("Result:\n" + result);
    return result;
  });
}

// --- Entry Point ---
var goal = process.argv.slice(2).join(" ") || "Research the topic 'Node.js streams' and write a 3-paragraph summary to research-output.txt";

console.log("Agent goal: " + goal);
runAgent(goal).catch(function(err) {
  console.error("Agent failed:", err);
  process.exit(1);
});

Running the Agent

export ANTHROPIC_API_KEY=sk-ant-your-key-here

node agent.js "Research the differences between Node.js worker threads and child processes, then write a comparison summary to comparison.txt"

Example Output

Agent goal: Research the differences between Node.js worker threads and child processes...

=== Iteration 1 (cost: $0.0000) ===
  -> web_fetch({"url":"https://nodejs.org/api/worker_threads.html"})
     Worker threads Node.js v22.x Documentation ... The worker_threads module enables...

=== Iteration 2 (cost: $0.0042) ===
  -> web_fetch({"url":"https://nodejs.org/api/child_process.html"})
     Child process Node.js v22.x Documentation ... The child_process module provides...

=== Iteration 3 (cost: $0.0089) ===
  -> write_file({"path":"comparison.txt","content":"# Worker Threads vs Child Processes in Node.js\n\n## ..."})
     Written 2847 chars to comparison.txt

=== Iteration 4 (cost: $0.0134) ===

Agent complete.

--- Final Report ---
Iterations: 4
Input tokens: 6230
Output tokens: 1847
Total cost: $0.0464
Result:
I've researched and compared Node.js worker threads and child processes. The summary
has been written to comparison.txt covering: memory model differences, communication
patterns, use cases, and performance characteristics.

Common Issues and Troubleshooting

1. Infinite Tool Loops

Symptom: The agent keeps calling the same tool repeatedly, never converging on an answer.

=== Iteration 8 ===
  -> read_file({"path":"data.json"})
=== Iteration 9 ===
  -> read_file({"path":"data.json"})
=== Iteration 10 ===
  -> read_file({"path":"data.json"})
Max iterations reached.

Fix: Add loop detection to your agent. Track the last N tool calls and if the same call repeats 3+ times, inject a message telling the LLM to try a different approach:

if (lastThreeCalls.every(function(c) { return c === currentCall; })) {
  messages.push({
    role: "user",
    content: "You have called " + toolName + " with the same arguments 3 times. Try a different approach."
  });
}
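
The lastThreeCalls list in that snippet is just a rolling window of tool-call signatures you maintain while processing tool_use blocks; a sketch of the bookkeeping:

var recentCalls = [];

function isRepeatedCall(toolName, toolInput) {
  var signature = toolName + ":" + JSON.stringify(toolInput);
  recentCalls.push(signature);
  if (recentCalls.length > 3) recentCalls.shift();   // keep only the last three calls
  return recentCalls.length === 3 &&
    recentCalls.every(function(c) { return c === signature; });
}

// In the tool-processing step:
if (isRepeatedCall(block.name, block.input)) {
  messages.push({
    role: "user",
    content: "You have called " + block.name + " with the same arguments 3 times. Try a different approach."
  });
}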

2. Token Limit Exceeded

Symptom:

Error: 400 {"error":{"type":"invalid_request_error","message":"prompt is too long: 204831 tokens > 200000 maximum"}}

Fix: Implement the context compaction strategy shown above. Trigger compaction when context exceeds 60% of the model's limit, not at 100%.

3. Tool Output Overflows Context

Symptom: A single tool call returns a massive response (e.g., fetching an entire web page) that pushes the conversation over the token limit.

Fix: Always truncate tool outputs. Set a hard cap per tool result:

var MAX_TOOL_OUTPUT = 8000; // characters
if (outputStr.length > MAX_TOOL_OUTPUT) {
  outputStr = outputStr.substring(0, MAX_TOOL_OUTPUT) + "\n[Output truncated. " + outputStr.length + " total chars. Ask me to read specific sections if needed.]";
}

4. Rate Limiting (HTTP 429)

Symptom:

Error: 429 {"error":{"type":"rate_limit_error","message":"Number of request tokens has exceeded your per-minute rate limit"}}

Fix: Implement exponential backoff with jitter. Also consider reducing max_tokens on non-final iterations, since smaller responses consume fewer rate limit tokens:

// On 429, read the retry-after header
var retryAfter = parseInt(err.headers["retry-after"] || "5", 10);
var delay = (retryAfter * 1000) + (Math.random() * 1000);

5. JSON Parse Errors from Tool Arguments

Symptom:

SyntaxError: Unexpected token in JSON at position 0

This happens when the LLM produces malformed JSON in tool arguments. It is rare with Claude but more common with smaller or older models.

Fix: Wrap JSON.parse in a try/catch and return the error to the LLM so it can retry:

try {
  var args = JSON.parse(toolCall.function.arguments);
} catch (e) {
  return {
    type: "tool_result",
    tool_use_id: toolCall.id,
    content: "Invalid JSON in tool arguments: " + e.message + ". Raw: " + toolCall.function.arguments,
    is_error: true
  };
}

Best Practices

  • Start with fewer tools, not more. An agent with 3-5 focused tools outperforms one with 20 vague ones. The LLM wastes reasoning tokens parsing large tool lists, and it gets confused about which tool to pick. Add tools only when you hit a concrete capability gap.

  • Always set a max iteration limit. Without it, a confused agent will loop forever, burning money. 10-15 iterations covers most tasks. If your agent routinely needs more, your tools are too low-level -- combine operations into higher-level tools.

  • Truncate all tool outputs. A 50KB API response does not help the agent reason better. It burns your token budget and pushes important context out of the window. 5-10KB per tool result is a reasonable cap.

  • Use model tiering within the agent. Use your most capable model (Claude Sonnet, GPT-4.1) for the main reasoning loop. Use a cheaper model (Claude Haiku, GPT-4.1-nano) for context summarization, output formatting, and classification subtasks.

  • Log everything. Every tool call, every LLM response, every token count. When an agent fails at 2 AM, the logs are your only window into what happened. Use structured logging (JSON lines) so you can query it.

  • Implement defense in depth for dangerous tools. Allowlists, input validation, sandboxing, and human approval gates. Never trust the LLM's output as safe by default. It is generating text, not reasoning about security.

  • Design tools for composability. A read_file tool and a write_file tool are more flexible than a copy_file tool. Let the LLM compose simple tools into complex workflows -- that is what it is good at.

  • Test with adversarial goals. Give the agent contradictory instructions, impossible tasks, and ambiguous requests. Observe how it fails. A good agent says "I cannot accomplish this" rather than hallucinating success or looping forever.

  • Pin your model versions. Do not point production agent code at a floating model alias. Model behavior changes between versions, and an agent that worked perfectly on one version might loop on the next. Pin to a specific snapshot like claude-sonnet-4-20250514 and test before upgrading.

