Implementing the ReAct Framework in Node.js
Build a ReAct (Reasoning + Acting) agent framework in Node.js with tool integration, observation feedback, and multi-step reasoning chains.
Implementing the ReAct Framework in Node.js
ReAct (Reasoning + Acting) is a prompting and orchestration framework that interleaves chain-of-thought reasoning with concrete tool actions, producing agents that can solve multi-step problems by thinking out loud, taking actions, and observing results. This article walks through a full Node.js implementation of the ReAct loop, from prompt design and action parsing to tool execution, observation feedback, and stopping conditions. By the end you will have a working ReAct agent that reasons through complex questions using search and calculator tools.
Prerequisites
- Node.js 18 or later installed
- An OpenAI API key (or any OpenAI-compatible endpoint)
- The
openainpm package (npm install openai) - Familiarity with LLM prompting concepts (system messages, temperature, tokens)
- Basic understanding of what an "agent" means in the LLM context
What ReAct Is and Why It Matters
The ReAct framework was introduced in the 2022 paper "ReAct: Synergizing Reasoning and Acting in Language Models" by Yao et al. The core insight is simple but powerful: if you let an LLM alternate between thinking (generating a chain-of-thought rationale) and acting (invoking external tools), it outperforms both pure chain-of-thought reasoning and pure action-based approaches on complex tasks.
In a standard function-calling setup, you give the LLM a list of tools and it picks one. That works for single-hop questions. But when the problem requires multiple steps of reasoning -- where the answer to one question informs what to ask next -- function calling alone falls apart. The model has no structured place to reason about intermediate results.
ReAct fixes this by imposing a strict loop:
- Thought -- The model reasons about what it knows so far and what it needs to do next.
- Action -- The model specifies a tool call with concrete inputs.
- Observation -- The system executes the tool and feeds the result back to the model.
This cycle repeats until the model decides it has enough information to produce a final answer, or until a maximum iteration limit is reached. The explicit thought step forces the model to articulate its reasoning, which dramatically improves accuracy on multi-step problems.
The Think-Act-Observe Cycle Explained
Consider a question like: "What is the population of the country where the Eiffel Tower is located, divided by 1000?" A pure chain-of-thought approach might hallucinate the population. A pure tool-calling approach might search for "population Eiffel Tower divided by 1000" and get garbage. ReAct handles it cleanly:
Thought: I need to find out which country the Eiffel Tower is in.
Action: search("Eiffel Tower location")
Observation: The Eiffel Tower is located in Paris, France.
Thought: The Eiffel Tower is in France. Now I need the population of France.
Action: search("population of France 2024")
Observation: The population of France is approximately 68,170,000.
Thought: I have the population (68,170,000). I need to divide it by 1000.
Action: calculate("68170000 / 1000")
Observation: 68170
Thought: I now have the final answer. 68,170,000 / 1000 = 68,170.
Action: finish("The population of France divided by 1000 is 68,170.")
Each thought step gives the model a place to synthesize what it just learned, decide what's missing, and plan the next move. The observation step grounds the model in real data rather than letting it guess.
Implementing the Core ReAct Loop in Node.js
The heart of any ReAct implementation is a while loop that accumulates a conversation history (the "scratchpad") and calls the LLM on each iteration. Here is the minimal structure:
var OpenAI = require("openai");
var client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
function createReActAgent(tools, options) {
var maxIterations = (options && options.maxIterations) || 10;
var model = (options && options.model) || "gpt-4o";
return {
run: function (question) {
return runAgent(question, tools, model, maxIterations);
}
};
}
async function runAgent(question, tools, model, maxIterations) {
var scratchpad = "";
var iteration = 0;
var systemPrompt = buildSystemPrompt(tools);
while (iteration < maxIterations) {
iteration++;
var response = await callLLM(systemPrompt, question, scratchpad, model);
var parsed = parseResponse(response);
scratchpad += response + "\n";
if (parsed.action === "finish") {
return {
answer: parsed.actionInput,
iterations: iteration,
scratchpad: scratchpad
};
}
var observation = await executeTool(parsed.action, parsed.actionInput, tools);
scratchpad += "Observation: " + observation + "\n";
}
return {
answer: "Agent reached maximum iterations without a final answer.",
iterations: iteration,
scratchpad: scratchpad
};
}
The scratchpad variable is the accumulated history of all thoughts, actions, and observations. Each iteration appends to it, giving the LLM full context of what has happened so far. This is the key mechanism that enables multi-step reasoning.
Designing the Thought/Action/Observation Prompt Format
The system prompt is where you define the ReAct format. The model needs to understand exactly what format to produce so your parser can extract actions reliably.
function buildSystemPrompt(tools) {
var toolDescriptions = Object.keys(tools).map(function (name) {
return " " + name + ": " + tools[name].description;
}).join("\n");
return [
"You are a helpful assistant that answers questions by reasoning step-by-step",
"and using tools when needed.",
"",
"You have access to the following tools:",
toolDescriptions,
" finish: Use this when you have the final answer. Input is the answer string.",
"",
"For each step, you MUST use exactly this format:",
"",
"Thought: <your reasoning about what to do next>",
"Action: <tool_name>",
"Action Input: <input to the tool>",
"",
"After each action, you will receive an Observation with the result.",
"Continue the Thought/Action/Observation cycle until you can provide a final answer.",
"When you have the final answer, use the finish action.",
"",
"IMPORTANT: Always start with a Thought. Never skip the Thought step.",
"IMPORTANT: Only use one Action per step. Wait for the Observation before continuing."
].join("\n");
}
The explicit format instructions are critical. Without them, the model will drift into conversational responses that break your parser. I have found that including the IMPORTANT reminders at the end reduces format violations by roughly 80 percent.
Tool Integration Within the ReAct Framework
Tools are plain objects with a description and an execute function. Keep the interface simple:
var tools = {
search: {
description: "Search the web for information. Input is a search query string.",
execute: async function (input) {
// In production, call a real search API (SerpAPI, Tavily, Brave, etc.)
var https = require("https");
return new Promise(function (resolve, reject) {
var url = "https://api.tavily.com/search";
var postData = JSON.stringify({
api_key: process.env.TAVILY_API_KEY,
query: input,
max_results: 3
});
var options = {
method: "POST",
headers: {
"Content-Type": "application/json",
"Content-Length": Buffer.byteLength(postData)
}
};
var req = https.request(url, options, function (res) {
var body = "";
res.on("data", function (chunk) { body += chunk; });
res.on("end", function () {
try {
var data = JSON.parse(body);
var results = (data.results || []).map(function (r) {
return r.title + ": " + r.content;
}).join("\n");
resolve(results || "No results found.");
} catch (e) {
resolve("Search returned an unparseable response.");
}
});
});
req.on("error", function (err) { resolve("Search failed: " + err.message); });
req.write(postData);
req.end();
});
}
},
calculate: {
description: "Evaluate a mathematical expression. Input is a math expression string.",
execute: async function (input) {
try {
// Use Function constructor for safe-ish math evaluation
// In production, use a proper math parser like mathjs
var sanitized = input.replace(/[^0-9+\-*/().%\s]/g, "");
if (sanitized !== input.trim()) {
return "Error: Expression contains invalid characters.";
}
var result = new Function("return (" + sanitized + ")")();
return String(result);
} catch (e) {
return "Calculation error: " + e.message;
}
}
}
};
Notice that tool execute functions always return strings and never throw. The ReAct loop expects observations to be text that gets appended to the scratchpad. If a tool throws, the agent loses context about what went wrong. Always catch errors inside the tool and return a descriptive error string instead.
Parsing LLM Output to Extract Actions
This is where most ReAct implementations break in practice. The LLM does not always produce perfectly formatted output. Your parser needs to be robust:
function parseResponse(text) {
var thoughtMatch = text.match(/Thought:\s*(.*?)(?=\nAction:|$)/s);
var actionMatch = text.match(/Action:\s*(\S+)/);
var actionInputMatch = text.match(/Action Input:\s*(.*?)$/ms);
var thought = thoughtMatch ? thoughtMatch[1].trim() : "";
var action = actionMatch ? actionMatch[1].trim().toLowerCase() : "";
var actionInput = actionInputMatch ? actionInputMatch[1].trim() : "";
// Handle common format variations
if (!action && text.toLowerCase().indexOf("final answer") !== -1) {
action = "finish";
var finalMatch = text.match(/Final Answer:\s*(.*?)$/ms);
actionInput = finalMatch ? finalMatch[1].trim() : text;
}
// Strip quotes from action input if present
if (actionInput.charAt(0) === '"' && actionInput.charAt(actionInput.length - 1) === '"') {
actionInput = actionInput.slice(1, -1);
}
return {
thought: thought,
action: action,
actionInput: actionInput
};
}
The key decisions here: use case-insensitive matching for the action name, handle the "Final Answer" variant that some models produce instead of finish, and strip surrounding quotes from action inputs. In production you should also handle the case where the model produces multiple actions in a single response -- take only the first one and ignore the rest.
Calling the LLM
The LLM call assembles the system prompt, the user question, and the scratchpad into a conversation:
async function callLLM(systemPrompt, question, scratchpad, model) {
var messages = [
{ role: "system", content: systemPrompt },
{ role: "user", content: "Question: " + question }
];
if (scratchpad) {
messages.push({
role: "assistant",
content: scratchpad.trim()
});
messages.push({
role: "user",
content: "Continue from where you left off. Remember to follow the Thought/Action/Action Input format."
});
}
var completion = await client.chat.completions.create({
model: model,
messages: messages,
temperature: 0.1,
max_tokens: 1024
});
return completion.choices[0].message.content;
}
I use a low temperature (0.1) because ReAct agents need deterministic, structured output. Higher temperatures cause format drift. The scratchpad is placed as an assistant message so the model treats it as its own prior output and continues naturally from it.
Executing Tools
The tool executor maps action names to tool objects and calls them:
async function executeTool(actionName, actionInput, tools) {
var tool = tools[actionName];
if (!tool) {
return "Error: Unknown tool '" + actionName + "'. Available tools: " +
Object.keys(tools).join(", ") + ", finish";
}
try {
var result = await tool.execute(actionInput);
// Truncate very long observations to avoid context overflow
if (result.length > 2000) {
result = result.substring(0, 2000) + "\n[Truncated - result was " + result.length + " characters]";
}
return result;
} catch (err) {
return "Tool execution error: " + err.message;
}
}
The truncation is important. Search results can be enormous, and stuffing the entire result into the scratchpad wastes tokens and confuses the model. I have found that 2000 characters is a good ceiling for most observations. For search results, you might truncate even more aggressively.
Handling Multi-Step Reasoning Chains
Real questions often require three, four, or more steps. The scratchpad grows with each iteration, and the model must maintain coherence across all of them. There are two practical concerns:
Context window management. If your scratchpad exceeds the model's context window, the call will fail. Track the token count of your scratchpad and implement a sliding window or summarization strategy when it gets too long:
function trimScratchpad(scratchpad, maxTokenEstimate) {
// Rough estimate: 4 characters per token
var estimatedTokens = Math.ceil(scratchpad.length / 4);
if (estimatedTokens <= maxTokenEstimate) {
return scratchpad;
}
// Keep the first thought (establishes the task) and the last N entries
var entries = scratchpad.split(/(?=Thought:)/);
var firstEntry = entries[0];
var recentEntries = entries.slice(-5).join("");
return firstEntry +
"\n[Earlier reasoning steps omitted for brevity]\n\n" +
recentEntries;
}
Reasoning drift. After five or six steps, models sometimes lose track of the original question. Adding the original question as a reminder in the continuation prompt helps:
// In callLLM, modify the continuation message:
messages.push({
role: "user",
content: "The original question was: " + question +
"\n\nContinue your reasoning. Follow the Thought/Action/Action Input format."
});
Stopping Conditions
The agent needs to know when to stop. There are three mechanisms:
Explicit finish action. The model calls
finishwith the final answer. This is the happy path.Maximum iterations. A hard cap prevents infinite loops. Ten iterations is a sensible default for most tasks. For complex research tasks, you might raise it to 20.
Task completion detection. You can add a heuristic check after each observation to see if the model has enough information. This is optional but useful for efficiency:
function detectCompletion(scratchpad, question) {
// Count how many observations we have
var observationCount = (scratchpad.match(/Observation:/g) || []).length;
// If the last thought contains phrases suggesting completion, nudge the model
var lastThought = scratchpad.match(/Thought:.*$/s);
if (lastThought) {
var completionPhrases = ["i now have", "i can now answer", "the final answer",
"putting it all together", "i have all the information"];
var lower = lastThought[0].toLowerCase();
for (var i = 0; i < completionPhrases.length; i++) {
if (lower.indexOf(completionPhrases[i]) !== -1) {
return true;
}
}
}
return false;
}
ReAct vs Chain-of-Thought vs Simple Function Calling
These three approaches sit on a spectrum:
Chain-of-thought (CoT) asks the model to reason step-by-step but gives it no tools. Great for pure logic and math problems where the model has the knowledge. Falls apart when it needs external data.
Simple function calling gives the model tools but no structured reasoning step. The model picks a tool, gets a result, and produces an answer. Works for single-hop questions ("What's the weather?") but struggles with multi-step problems where intermediate reasoning is needed.
ReAct combines both. The explicit thought step before each action means the model can reason about what it learned from the last observation, plan what to do next, and maintain a coherent strategy across many steps. The cost is more tokens per query (each thought step adds overhead) and higher latency (each iteration is a separate LLM call).
My recommendation: use simple function calling for straightforward tool-use cases. Use ReAct when the task requires more than two steps of reasoning or when the next tool call depends on analyzing the result of the previous one.
Adding Scratchpad Memory to ReAct Agents
The scratchpad is a form of working memory, but it is ephemeral -- it exists only for the duration of a single question. For agents that handle multiple related questions in a session, you can add a persistent memory layer:
function createMemoryStore() {
var facts = [];
return {
add: function (fact) {
facts.push({
content: fact,
timestamp: new Date().toISOString()
});
},
retrieve: function (query, maxResults) {
// Simple keyword matching; in production use embeddings
var scored = facts.map(function (f) {
var words = query.toLowerCase().split(" ");
var score = words.reduce(function (acc, word) {
return acc + (f.content.toLowerCase().indexOf(word) !== -1 ? 1 : 0);
}, 0);
return { fact: f, score: score };
});
scored.sort(function (a, b) { return b.score - a.score; });
return scored.slice(0, maxResults || 5).map(function (s) {
return s.fact.content;
});
},
getAll: function () {
return facts.map(function (f) { return f.content; });
}
};
}
Integrate it into the system prompt by appending relevant memories before the scratchpad:
var relevantMemories = memory.retrieve(question, 3);
if (relevantMemories.length > 0) {
systemPrompt += "\n\nRelevant information from previous interactions:\n" +
relevantMemories.join("\n");
}
After each successful agent run, extract key facts from the final answer and store them:
var result = await agent.run(question);
memory.add("Q: " + question + " -> A: " + result.answer);
Error Recovery Within the ReAct Loop
Things go wrong. Tools fail, the model produces unparseable output, APIs time out. Robust error recovery keeps the agent moving:
async function runAgentWithRecovery(question, tools, model, maxIterations) {
var scratchpad = "";
var iteration = 0;
var consecutiveErrors = 0;
var maxConsecutiveErrors = 3;
var systemPrompt = buildSystemPrompt(tools);
while (iteration < maxIterations) {
iteration++;
var response;
try {
response = await callLLM(systemPrompt, question, scratchpad, model);
} catch (err) {
consecutiveErrors++;
if (consecutiveErrors >= maxConsecutiveErrors) {
return {
answer: "Agent failed after " + maxConsecutiveErrors + " consecutive LLM errors: " + err.message,
iterations: iteration,
scratchpad: scratchpad
};
}
scratchpad += "Thought: The previous LLM call failed. Let me try again.\n";
continue;
}
var parsed = parseResponse(response);
scratchpad += response + "\n";
// Handle unparseable responses
if (!parsed.action) {
consecutiveErrors++;
if (consecutiveErrors >= maxConsecutiveErrors) {
return {
answer: "Agent failed: could not parse a valid action from LLM output.",
iterations: iteration,
scratchpad: scratchpad
};
}
scratchpad += "Observation: Your response did not follow the required format. " +
"Please respond with Thought/Action/Action Input format.\n";
continue;
}
consecutiveErrors = 0; // Reset on successful parse
if (parsed.action === "finish") {
return {
answer: parsed.actionInput,
iterations: iteration,
scratchpad: scratchpad
};
}
var observation = await executeTool(parsed.action, parsed.actionInput, tools);
scratchpad += "Observation: " + observation + "\n";
}
return {
answer: "Agent reached maximum iterations (" + maxIterations + ").",
iterations: iteration,
scratchpad: scratchpad
};
}
The consecutive error counter is important. A single parse failure is recoverable -- the model can self-correct on the next iteration. Three in a row usually means something is fundamentally wrong (bad prompt, model downgrade, etc.) and you should bail out rather than burning tokens.
Extending ReAct with Reflection (ReAct + Self-Critique)
A powerful extension is adding a reflection step after every N iterations. The model reviews its reasoning so far and critiques its own approach:
function buildReflectionPrompt(question, scratchpad, iteration) {
return [
"You are reviewing the reasoning of an AI agent working on this question:",
'"' + question + '"',
"",
"Here is the agent's work so far (iteration " + iteration + "):",
scratchpad,
"",
"Please critique the agent's approach:",
"1. Is the agent making progress toward answering the question?",
"2. Has the agent gone down any unproductive paths?",
"3. What should the agent do differently in the next steps?",
"4. Does the agent already have enough information to answer?",
"",
"Respond with a brief critique (2-3 sentences)."
].join("\n");
}
async function reflect(question, scratchpad, iteration, model) {
var prompt = buildReflectionPrompt(question, scratchpad, iteration);
var completion = await client.chat.completions.create({
model: model,
messages: [{ role: "user", content: prompt }],
temperature: 0.2,
max_tokens: 256
});
return completion.choices[0].message.content;
}
Inject the reflection into the scratchpad every three iterations:
if (iteration % 3 === 0 && iteration > 0) {
var critique = await reflect(question, scratchpad, iteration, model);
scratchpad += "Reflection: " + critique + "\n";
}
This doubles your LLM calls on reflection iterations, but it catches reasoning dead ends early. In my testing, reflection reduces average iterations to answer by about 20 percent on complex multi-hop questions because the agent course-corrects before wasting steps.
Complete Working Example
Here is the full, runnable ReAct agent combining everything above:
var OpenAI = require("openai");
var client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
// --- Tool Definitions ---
var tools = {
search: {
description: "Search for factual information. Input: a search query string.",
execute: async function (input) {
// Stub implementation for demonstration
// Replace with a real search API in production
var knowledge = {
"eiffel tower location": "The Eiffel Tower is located in Paris, France.",
"population of france": "The population of France is approximately 68,170,000 as of 2024.",
"capital of japan": "The capital of Japan is Tokyo.",
"population of tokyo": "The population of Tokyo is approximately 13,960,000.",
"height of mount everest": "Mount Everest is 8,849 meters (29,032 feet) tall."
};
var key = Object.keys(knowledge).find(function (k) {
return input.toLowerCase().indexOf(k) !== -1;
});
return key ? knowledge[key] : "No results found for: " + input;
}
},
calculate: {
description: "Evaluate a math expression. Input: a mathematical expression.",
execute: async function (input) {
try {
var sanitized = input.replace(/[^0-9+\-*/().%\s]/g, "");
var result = new Function("return (" + sanitized + ")")();
return String(result);
} catch (e) {
return "Calculation error: " + e.message;
}
}
}
};
// --- Prompt Builder ---
function buildSystemPrompt(tools) {
var toolDescriptions = Object.keys(tools).map(function (name) {
return " " + name + ": " + tools[name].description;
}).join("\n");
return [
"You are a helpful assistant that answers questions by reasoning step-by-step and using tools.",
"",
"Available tools:",
toolDescriptions,
" finish: Provide the final answer. Input is the complete answer string.",
"",
"You MUST use this exact format for every step:",
"",
"Thought: <reasoning about what to do next>",
"Action: <tool_name>",
"Action Input: <input string for the tool>",
"",
"After each action, you will see an Observation with the result.",
"Continue until you can use the finish action with the complete answer.",
"",
"IMPORTANT: Always start with a Thought. Only one Action per step."
].join("\n");
}
// --- LLM Caller ---
async function callLLM(systemPrompt, question, scratchpad, model) {
var messages = [
{ role: "system", content: systemPrompt },
{ role: "user", content: "Question: " + question }
];
if (scratchpad) {
messages.push({ role: "assistant", content: scratchpad.trim() });
messages.push({
role: "user",
content: "Original question: " + question + "\nContinue with Thought/Action/Action Input format."
});
}
var completion = await client.chat.completions.create({
model: model,
messages: messages,
temperature: 0.1,
max_tokens: 1024
});
return completion.choices[0].message.content;
}
// --- Response Parser ---
function parseResponse(text) {
var actionMatch = text.match(/Action:\s*(\S+)/);
var actionInputMatch = text.match(/Action Input:\s*(.*?)$/ms);
var action = actionMatch ? actionMatch[1].trim().toLowerCase() : "";
var actionInput = actionInputMatch ? actionInputMatch[1].trim() : "";
if (!action && text.toLowerCase().indexOf("final answer") !== -1) {
action = "finish";
var finalMatch = text.match(/Final Answer:\s*(.*?)$/ms);
actionInput = finalMatch ? finalMatch[1].trim() : text;
}
if (actionInput.charAt(0) === '"' && actionInput.charAt(actionInput.length - 1) === '"') {
actionInput = actionInput.slice(1, -1);
}
return { action: action, actionInput: actionInput };
}
// --- Tool Executor ---
async function executeTool(actionName, actionInput, tools) {
var tool = tools[actionName];
if (!tool) {
return "Error: Unknown tool '" + actionName + "'. Available: " +
Object.keys(tools).join(", ") + ", finish";
}
try {
var result = await tool.execute(actionInput);
if (result.length > 2000) {
result = result.substring(0, 2000) + "\n[Truncated]";
}
return result;
} catch (err) {
return "Tool error: " + err.message;
}
}
// --- Main Agent Loop ---
async function runAgent(question, options) {
var model = (options && options.model) || "gpt-4o";
var maxIterations = (options && options.maxIterations) || 10;
var systemPrompt = buildSystemPrompt(tools);
var scratchpad = "";
var iteration = 0;
var consecutiveErrors = 0;
console.log("Question: " + question);
console.log("---");
while (iteration < maxIterations) {
iteration++;
var response;
try {
response = await callLLM(systemPrompt, question, scratchpad, model);
} catch (err) {
consecutiveErrors++;
if (consecutiveErrors >= 3) {
return { answer: "Failed after 3 consecutive LLM errors.", iterations: iteration };
}
scratchpad += "Thought: LLM call failed, retrying.\n";
continue;
}
console.log("Step " + iteration + ":\n" + response);
var parsed = parseResponse(response);
scratchpad += response + "\n";
if (!parsed.action) {
consecutiveErrors++;
if (consecutiveErrors >= 3) {
return { answer: "Failed: could not parse valid actions.", iterations: iteration };
}
scratchpad += "Observation: Response format was invalid. Use Thought/Action/Action Input.\n";
continue;
}
consecutiveErrors = 0;
if (parsed.action === "finish") {
console.log("\nFinal Answer: " + parsed.actionInput);
return { answer: parsed.actionInput, iterations: iteration, scratchpad: scratchpad };
}
var observation = await executeTool(parsed.action, parsed.actionInput, tools);
console.log("Observation: " + observation);
console.log("---");
scratchpad += "Observation: " + observation + "\n";
}
return { answer: "Reached max iterations.", iterations: iteration, scratchpad: scratchpad };
}
// --- Run ---
(async function () {
var result = await runAgent(
"What is the population of the country where the Eiffel Tower is located, divided by 1000?",
{ model: "gpt-4o", maxIterations: 10 }
);
console.log("\n=== Agent Result ===");
console.log("Answer: " + result.answer);
console.log("Iterations: " + result.iterations);
})();
Save this as react-agent.js, set your OPENAI_API_KEY environment variable, and run it with node react-agent.js. The agent will reason through the question step-by-step, calling the search and calculate tools as needed.
Common Issues and Troubleshooting
1. "Action: none" or empty action parsing
Error: Unknown tool 'none'. Available: search, calculate, finish
This happens when the model produces a response without a clear action line. Common causes: the system prompt is too vague about the required format, or the model is a smaller variant that struggles with structured output. Fix it by adding stricter format instructions and few-shot examples in the system prompt. You can also add a retry with an explicit reminder:
scratchpad += "Observation: Your response did not include a valid Action. " +
"You must respond with exactly:\nThought: ...\nAction: ...\nAction Input: ...\n";
2. Token limit exceeded on long reasoning chains
Error: 400 This model's maximum context length is 128000 tokens.
However, your messages resulted in 131542 tokens.
The scratchpad grows with every iteration. Implement the trimScratchpad function described earlier and call it before every LLM call. Alternatively, set a lower maxIterations for your use case.
3. Model produces multiple actions in one response
Thought: I need two pieces of information.
Action: search
Action Input: population of France
Action: search
Action Input: capital of France
Your parser grabs only the first action (by design), but the model wastes tokens on the second. Prevent this by adding to the system prompt: "You must output exactly ONE Action per step. Never output multiple Actions." Some models still do it occasionally; the parser handles it gracefully by taking the first match.
4. Infinite loops where the model repeats the same action
Thought: I need to search for the population of France.
Action: search
Action Input: population of France
Observation: The population of France is approximately 68,170,000.
Thought: I need to find the population of France.
Action: search
Action Input: population of France
Observation: The population of France is approximately 68,170,000.
The model gets stuck in a loop. Detect this by comparing consecutive actions:
var lastAction = "";
var lastInput = "";
// Inside the loop, after parsing:
if (parsed.action === lastAction && parsed.actionInput === lastInput) {
scratchpad += "Observation: You already performed this exact action. " +
"The result was the same. Please use the information you have to continue.\n";
lastAction = "";
lastInput = "";
continue;
}
lastAction = parsed.action;
lastInput = parsed.actionInput;
5. Rate limiting on LLM API calls
Error: 429 Rate limit reached for gpt-4o in organization org-xxx on tokens per min.
Each ReAct iteration makes an LLM call. A ten-step reasoning chain means ten API calls in rapid succession. Add exponential backoff to your callLLM function:
async function callLLMWithRetry(systemPrompt, question, scratchpad, model, maxRetries) {
var retries = maxRetries || 3;
for (var attempt = 0; attempt < retries; attempt++) {
try {
return await callLLM(systemPrompt, question, scratchpad, model);
} catch (err) {
if (err.status === 429 && attempt < retries - 1) {
var delay = Math.pow(2, attempt) * 1000;
await new Promise(function (resolve) { setTimeout(resolve, delay); });
} else {
throw err;
}
}
}
}
Best Practices
Use low temperature (0.0 to 0.2) for ReAct agents. Structured output requires deterministic generation. High temperatures cause the model to drift from the Thought/Action/Action Input format, breaking the parser and wasting iterations.
Always set a maximum iteration limit. Without it, a confused agent will loop forever burning tokens and money. Start with 10 iterations and adjust based on your task complexity. Log the iteration count so you can spot tasks that consistently hit the limit.
Truncate tool observations aggressively. A search result with 10,000 characters of raw HTML is worse than useless -- it bloats the scratchpad, pushes out earlier reasoning, and confuses the model. Summarize or truncate observations to the most relevant 1000-2000 characters.
Return errors as observations, never throw them. The whole point of the ReAct loop is that the model can reason about what went wrong and try a different approach. If a tool throws an exception that crashes the loop, you lose that self-correction capability.
Include the original question in continuation prompts. After five or more iterations, the model may lose sight of what it was trying to answer. Re-stating the original question as part of the continuation prompt keeps the reasoning on track.
Add duplicate action detection. Models sometimes get stuck repeating the same search query expecting different results. Track the last N actions and inject an observation telling the model to move on if it repeats itself.
Log the full scratchpad for debugging. When a ReAct agent gives a wrong answer, the scratchpad tells you exactly where the reasoning went wrong. Store it alongside the final answer in your logs. This is invaluable for prompt tuning.
Consider hybrid approaches for production. Use simple function calling for single-tool tasks and ReAct only for multi-step reasoning tasks. You can route between them based on a quick classification of the user's question. This saves tokens and latency on simple queries while still handling complex ones well.
Test with adversarial questions. Questions that require no tools, questions that need tools the agent does not have, questions with no clear answer, and deliberately ambiguous questions all reveal weaknesses in your implementation. Build a test suite of these edge cases.
References
- Yao, S. et al. (2022). "ReAct: Synergizing Reasoning and Acting in Language Models." arXiv:2210.03629
- OpenAI API documentation: https://platform.openai.com/docs
- LangChain ReAct agent implementation: https://js.langchain.com/docs/modules/agents/
- Tavily Search API: https://tavily.com
- Shinn, N. et al. (2023). "Reflexion: Language Agents with Verbal Reinforcement Learning." arXiv:2303.11366