Cost Management for Agent-Based Systems
Control AI agent costs with per-task budgets, model tiering, early termination, cost tracking, and optimization strategies in Node.js.
AI agents that reason across multiple steps, invoke tools, and orchestrate sub-tasks can produce extraordinary results. They can also produce extraordinary invoices. Without deliberate cost controls, a single agent run can burn through hundreds of dollars in LLM calls before anyone notices. This article covers the architectural patterns, middleware, and guardrails you need to keep agent-based systems economically viable in production.
Prerequisites
- Working knowledge of Node.js and Express
- Familiarity with LLM API concepts (tokens, models, pricing)
- Basic understanding of AI agent architectures (tool-calling loops, multi-step reasoning)
- Node.js v18+ installed locally
Why Agent Costs Spiral
A single LLM call is cheap. A chat completion with GPT-4o might cost a few cents. But agents do not make single calls. They reason in loops. Each iteration of the loop sends the full conversation history plus tool results back to the model, and each iteration compounds the token count.
Consider a simple research agent that searches the web, reads three pages, summarizes findings, and drafts a report. That is at minimum five LLM calls. But each call includes the growing context from previous steps. By the fifth call, you are sending thousands of tokens of accumulated context. The token usage is not linear with the number of steps; it is roughly quadratic because each step adds to the context that every subsequent step must process.
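A quick back-of-the-envelope sketch makes the compounding visible (the token counts are illustrative, not from any real run):

```javascript
// Each step adds new context; every step re-sends everything accumulated
// so far, so total tokens processed grow roughly quadratically with steps.
function totalTokensProcessed(steps, tokensPerStep) {
  var total = 0;
  var context = 0;
  for (var i = 0; i < steps; i++) {
    context += tokensPerStep; // context grows linearly...
    total += context;         // ...but each step re-processes all of it
  }
  return total;
}

console.log(totalTokensProcessed(5, 800));  // 12000 tokens for 5 steps
console.log(totalTokensProcessed(10, 800)); // 44000 tokens for 10 steps
```

Doubling the step count here nearly quadruples the tokens processed, which is exactly why long agent loops get expensive.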
Now multiply that by concurrent users, retry logic, and sub-agents that spawn their own reasoning loops. A team of 20 developers each running 10 agent tasks per day can easily generate $500-1000 in daily LLM costs if nobody is watching.
The fundamental problem is that agents are autonomous by design. You give them a goal and they figure out the steps. That autonomy is the value proposition, but it is also the cost risk. An agent stuck in a reasoning loop or pursuing a dead-end approach will keep spending money until it either succeeds, hits a token limit, or someone kills the process.
Anatomy of Agent Costs
Before you can manage costs, you need to understand where money goes in an agent system. There are four major cost categories.
LLM API Calls are typically 60-80% of total cost. This includes the reasoning model for planning, smaller models for subtasks, embedding calls for retrieval, and vision model calls for image analysis. Pricing varies dramatically by model. GPT-4o input tokens cost more than 16x what GPT-4o-mini input tokens cost ($2.50 vs. $0.15 per million), and Claude Opus input tokens cost nearly 19x what Claude Haiku input tokens cost.
Tool Execution covers the cost of external APIs your agent calls. Search APIs, database queries, code execution sandboxes, and third-party services all have their own pricing. These costs are often overlooked because they do not show up on your LLM provider invoice.
Compute and Infrastructure includes the servers running your agent orchestration code, queue workers processing agent tasks, and any GPU instances for local model inference.
Storage encompasses conversation histories, cached results, vector database storage for agent memory, and log retention for debugging and auditing.
Implementing Per-Task Cost Budgets
The single most important cost control is a per-task budget. Every agent task should have a maximum dollar amount it is allowed to spend. When the budget is exhausted, the agent stops.
var EventEmitter = require("events");
function CostBudget(options) {
this.maxCost = options.maxCost || 1.00;
this.spent = 0;
this.warningThreshold = options.warningThreshold || 0.8;
this.calls = [];
this.emitter = new EventEmitter();
this.taskId = options.taskId || "unknown";
this.userId = options.userId || "unknown";
this.startedAt = Date.now();
}
CostBudget.prototype.recordCost = function (amount, metadata) {
this.spent += amount;
this.calls.push({
amount: amount,
cumulative: this.spent,
timestamp: Date.now(),
model: metadata.model || "unknown",
inputTokens: metadata.inputTokens || 0,
outputTokens: metadata.outputTokens || 0,
operation: metadata.operation || "llm_call"
});
if (this.spent >= this.maxCost) {
this.emitter.emit("budget_exceeded", {
taskId: this.taskId,
spent: this.spent,
maxCost: this.maxCost
});
} else if (this.spent >= this.maxCost * this.warningThreshold) {
this.emitter.emit("budget_warning", {
taskId: this.taskId,
spent: this.spent,
remaining: this.maxCost - this.spent
});
}
return this;
};
CostBudget.prototype.hasRemaining = function () {
return this.spent < this.maxCost;
};
CostBudget.prototype.remaining = function () {
return Math.max(0, this.maxCost - this.spent);
};
CostBudget.prototype.summary = function () {
return {
taskId: this.taskId,
userId: this.userId,
maxCost: this.maxCost,
spent: parseFloat(this.spent.toFixed(6)),
remaining: parseFloat(this.remaining().toFixed(6)),
callCount: this.calls.length,
durationMs: Date.now() - this.startedAt,
calls: this.calls
};
};
Set budgets based on task complexity. A simple classification task might get $0.05. A multi-step research task might get $2.00. A complex code generation pipeline might get $10.00. Start conservative and increase limits as you learn what tasks actually cost.
Cost Tracking Middleware
Every LLM call in your system should pass through a cost tracking layer. This middleware wraps your LLM client, calculates the cost of each call based on token usage and model pricing, and records it against the task budget.
var MODEL_PRICING = {
"gpt-4o": { input: 2.50 / 1000000, output: 10.00 / 1000000 },
"gpt-4o-mini": { input: 0.15 / 1000000, output: 0.60 / 1000000 },
"claude-opus-4": { input: 15.00 / 1000000, output: 75.00 / 1000000 },
"claude-sonnet-4": { input: 3.00 / 1000000, output: 15.00 / 1000000 },
"claude-haiku-3.5": { input: 0.80 / 1000000, output: 4.00 / 1000000 }
};
function TrackedLLMClient(options) {
this.client = options.client;
this.budget = options.budget;
this.pricing = options.pricing || MODEL_PRICING;
this.defaultModel = options.defaultModel || "gpt-4o-mini";
}
TrackedLLMClient.prototype.calculateCost = function (model, usage) {
var pricing = this.pricing[model];
if (!pricing) {
console.warn("No pricing data for model: " + model + ", using estimate");
pricing = { input: 10.00 / 1000000, output: 30.00 / 1000000 };
}
var inputCost = (usage.prompt_tokens || 0) * pricing.input;
var outputCost = (usage.completion_tokens || 0) * pricing.output;
return inputCost + outputCost;
};
TrackedLLMClient.prototype.chat = function (params, callback) {
var self = this;
var model = params.model || self.defaultModel;
if (!self.budget.hasRemaining()) {
return callback(new Error(
"BUDGET_EXCEEDED: Task " + self.budget.taskId +
" has spent $" + self.budget.spent.toFixed(4) +
" of $" + self.budget.maxCost.toFixed(2) + " budget"
));
}
self.client.chat.completions.create(params, function (err, response) {
if (err) {
return callback(err);
}
var usage = response.usage || {};
var cost = self.calculateCost(model, usage);
self.budget.recordCost(cost, {
model: model,
inputTokens: usage.prompt_tokens,
outputTokens: usage.completion_tokens,
operation: "chat_completion"
});
callback(null, response, { cost: cost, cumulative: self.budget.spent });
});
};
The key insight here is that cost tracking must be mandatory, not optional. If any LLM call can bypass the tracking layer, your cost data is unreliable and your budgets are unenforceable. Treat the tracked client as the only way to call LLMs in your codebase.
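To make the per-call arithmetic concrete, here is the same cost calculation run against two models from the pricing table above, using a typical mid-sized agent step:

```javascript
var MODEL_PRICING = {
  "gpt-4o": { input: 2.50 / 1000000, output: 10.00 / 1000000 },
  "gpt-4o-mini": { input: 0.15 / 1000000, output: 0.60 / 1000000 }
};

function calculateCost(model, usage) {
  var p = MODEL_PRICING[model];
  return ((usage.prompt_tokens || 0) * p.input) +
         ((usage.completion_tokens || 0) * p.output);
}

// 4,000 prompt tokens of accumulated context, 500 completion tokens.
var usage = { prompt_tokens: 4000, completion_tokens: 500 };
console.log(calculateCost("gpt-4o", usage).toFixed(4));      // "0.0150"
console.log(calculateCost("gpt-4o-mini", usage).toFixed(4)); // "0.0009"
```

A cent and a half per call sounds trivial until an agent makes that call twenty times per task across hundreds of tasks per day.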
Model Tiering Strategies
Not every agent subtask requires the most capable model. A planning step that decides which tools to call might need GPT-4o or Claude Sonnet. But a step that extracts a date from a block of text or formats output as JSON can use a model that costs 90% less.
var TIER_CONFIG = {
planning: {
model: "claude-sonnet-4",
description: "Complex reasoning, tool selection, multi-step planning"
},
execution: {
model: "gpt-4o-mini",
description: "Tool calls, data extraction, formatting"
},
validation: {
model: "gpt-4o-mini",
description: "Output validation, schema checking"
},
summarization: {
model: "gpt-4o-mini",
description: "Condensing results, generating summaries"
},
classification: {
model: "claude-haiku-3.5",
description: "Simple categorization, routing decisions"
}
};
function TieredModelSelector(config) {
this.config = config || TIER_CONFIG;
}
TieredModelSelector.prototype.getModel = function (taskType, budget) {
var tier = this.config[taskType];
if (!tier) {
tier = this.config.execution;
}
// If budget is running low, downgrade to cheapest model
if (budget && budget.remaining() < budget.maxCost * 0.2) {
return "gpt-4o-mini";
}
return tier.model;
};
TieredModelSelector.prototype.estimateCost = function (taskType, estimatedTokens) {
var model = this.getModel(taskType);
var pricing = MODEL_PRICING[model];
if (!pricing) return 0;
return (estimatedTokens * 0.3 * pricing.input) +
(estimatedTokens * 0.7 * pricing.output);
};
I have seen teams cut their agent costs by 40-60% simply by running subtask classification, validation, and formatting through cheaper models. The quality difference for those tasks is negligible. Reserve your expensive model budget for the steps where reasoning quality genuinely matters: planning, complex tool selection, and final synthesis.
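The downgrade behavior is worth seeing in isolation. This stand-in condenses the selector above to its core rule (here the budget is a plain object with a `remaining` number rather than the method on CostBudget):

```javascript
// Downgrade to the cheapest model once less than 20% of the budget remains.
var TIERS = { planning: "claude-sonnet-4", classification: "claude-haiku-3.5" };

function getModel(taskType, budget) {
  if (budget && budget.remaining < budget.maxCost * 0.2) {
    return "gpt-4o-mini";
  }
  return TIERS[taskType] || "gpt-4o-mini";
}

var healthy = { maxCost: 2.00, remaining: 1.50 };
var nearlyExhausted = { maxCost: 2.00, remaining: 0.30 };

console.log(getModel("planning", healthy));         // "claude-sonnet-4"
console.log(getModel("planning", nearlyExhausted)); // "gpt-4o-mini"
```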
Early Termination When Budget Is Exceeded
When a budget is exceeded, the agent must stop gracefully. It should not simply crash. It should return whatever partial results it has, explain what it completed and what it did not, and provide enough context for a human to decide whether to allocate more budget or take a different approach.
function AgentRunner(options) {
this.trackedClient = options.trackedClient;
this.budget = options.budget;
this.tools = options.tools || {};
this.tierSelector = options.tierSelector || null;
this.maxIterations = options.maxIterations || 20;
this.results = [];
}
AgentRunner.prototype.run = function (task, callback) {
var self = this;
var iteration = 0;
var messages = [
{ role: "system", content: task.systemPrompt },
{ role: "user", content: task.userPrompt }
];
function step() {
iteration++;
if (iteration > self.maxIterations) {
return callback(null, {
status: "max_iterations",
message: "Agent reached " + self.maxIterations + " iterations without completing.",
partialResults: self.results,
costSummary: self.budget.summary()
});
}
if (!self.budget.hasRemaining()) {
return callback(null, {
status: "budget_exceeded",
message: "Task budget of $" + self.budget.maxCost.toFixed(2) +
" exhausted after " + iteration + " iterations. " +
"Spent: $" + self.budget.spent.toFixed(4),
partialResults: self.results,
costSummary: self.budget.summary()
});
}
var model = self.tierSelector
? self.tierSelector.getModel("planning", self.budget)
: "gpt-4o-mini";
self.trackedClient.chat({
model: model,
messages: messages
}, function (err, response, costInfo) {
if (err) {
if (err.message && err.message.indexOf("BUDGET_EXCEEDED") === 0) {
return callback(null, {
status: "budget_exceeded",
partialResults: self.results,
costSummary: self.budget.summary()
});
}
return callback(err);
}
var choice = response.choices[0];
var content = choice.message.content;
messages.push(choice.message);
if (choice.finish_reason === "tool_calls") {
// Execute tools and continue the loop
self.executeTools(choice.message.tool_calls, function (toolResults) {
toolResults.forEach(function (result) {
messages.push(result);
self.results.push(result);
});
step();
});
} else {
self.results.push({ type: "final", content: content });
callback(null, {
status: "completed",
results: self.results,
costSummary: self.budget.summary()
});
}
});
}
step();
};
// Referenced by run() above: execute each requested tool call and return
// tool-role messages. Tools run concurrently, so result order follows
// completion order, not request order.
AgentRunner.prototype.executeTools = function (toolCalls, callback) {
var self = this;
var results = [];
var remaining = toolCalls.length;
if (remaining === 0) return callback(results);
toolCalls.forEach(function (call) {
var tool = self.tools[call.function.name];
var args = {};
try { args = JSON.parse(call.function.arguments || "{}"); } catch (e) { /* keep empty args */ }
function done(output) {
results.push({
role: "tool",
tool_call_id: call.id,
content: typeof output === "string" ? output : JSON.stringify(output)
});
remaining--;
if (remaining === 0) callback(results);
}
if (!tool) return done("Unknown tool: " + call.function.name);
tool(args, function (err, output) {
done(err ? "Tool error: " + err.message : output);
});
});
};
Notice that early termination returns a structured response, not an error. The calling code can inspect status: "budget_exceeded" and decide what to do. Maybe it shows partial results to the user. Maybe it queues the task for retry with a higher budget. Maybe it escalates to a human. The point is that the agent fails gracefully and cheaply rather than throwing an exception.
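A sketch of what that calling code might look like (the action names are illustrative, not part of the runner's contract):

```javascript
// Dispatch on the structured status returned by AgentRunner.run.
function handleAgentResult(result) {
  switch (result.status) {
    case "completed":
      return { action: "deliver", results: result.results };
    case "budget_exceeded":
      // Show partial work and offer a retry with a larger budget.
      return { action: "offer_retry", partial: result.partialResults };
    case "max_iterations":
      // Likely a stuck loop; a human should look before spending more.
      return { action: "escalate_to_human", partial: result.partialResults };
    default:
      return { action: "error" };
  }
}

console.log(handleAgentResult({ status: "budget_exceeded", partialResults: [] }).action);
// "offer_retry"
```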
Cost-Aware Planning
Sophisticated agents can factor cost into their planning decisions. If an agent needs to gather information from five sources but has limited budget, it should prioritize the cheapest sources first and skip expensive ones if the budget is tight.
function CostAwarePlanner(options) {
this.budget = options.budget;
this.toolCosts = options.toolCosts || {};
}
CostAwarePlanner.prototype.rankActions = function (actions) {
var self = this;
var remaining = self.budget.remaining();
var scored = actions.map(function (action) {
var estimatedCost = self.toolCosts[action.tool] || 0.01;
var estimatedValue = action.expectedValue || 0.5;
var costRatio = estimatedValue / estimatedCost;
return {
action: action,
estimatedCost: estimatedCost,
costRatio: costRatio,
affordable: estimatedCost <= remaining
};
});
// Sort by value/cost ratio, filter out unaffordable actions
scored.sort(function (a, b) {
return b.costRatio - a.costRatio;
});
return scored.filter(function (item) {
return item.affordable;
});
};
CostAwarePlanner.prototype.selectPlan = function (plans) {
var self = this;
var remaining = self.budget.remaining();
var viable = plans.filter(function (plan) {
return plan.estimatedCost <= remaining;
});
if (viable.length === 0) {
return null;
}
// Pick the cheapest viable plan that still meets quality threshold
viable.sort(function (a, b) {
return a.estimatedCost - b.estimatedCost;
});
return viable[0];
};
This approach treats cost as a first-class constraint in agent decision-making. The agent does not just ask "what is the best approach?" It asks "what is the best approach I can afford?"
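Here is the ranking logic exercised with made-up tool costs, condensed to a single function that mirrors rankActions above:

```javascript
// Rank candidate actions by value-per-dollar and drop anything the
// remaining budget cannot cover.
function rankActions(actions, toolCosts, remaining) {
  return actions
    .map(function (action) {
      var cost = toolCosts[action.tool] || 0.01;
      return {
        tool: action.tool,
        estimatedCost: cost,
        costRatio: (action.expectedValue || 0.5) / cost
      };
    })
    .filter(function (item) { return item.estimatedCost <= remaining; })
    .sort(function (a, b) { return b.costRatio - a.costRatio; });
}

var ranked = rankActions(
  [
    { tool: "web_search", expectedValue: 0.6 },
    { tool: "code_sandbox", expectedValue: 0.9 },
    { tool: "premium_api", expectedValue: 0.8 }
  ],
  { web_search: 0.02, code_sandbox: 0.05, premium_api: 0.50 },
  0.10 // only $0.10 left -- premium_api is unaffordable
);
console.log(ranked.map(function (r) { return r.tool; }));
// [ 'web_search', 'code_sandbox' ]
```

Note that the expensive tool is not merely deprioritized; it is removed from consideration entirely once the budget cannot cover it.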
Token Optimization for Agent Prompts
Agent systems are especially vulnerable to token bloat because each step accumulates context. There are several practical techniques to keep token counts manageable.
Summarize conversation history. Instead of sending the full conversation to each step, periodically summarize older messages into a compact representation. This is sometimes called "context compaction."
function compactHistory(messages, trackedClient, callback) {
if (messages.length < 10) {
return callback(null, messages);
}
var oldMessages = messages.slice(1, -4); // Keep system prompt and last 4
var recentMessages = messages.slice(-4);
var systemMessage = messages[0];
var summaryText = oldMessages.map(function (m) {
return m.role + ": " + (m.content || "").substring(0, 200);
}).join("\n");
trackedClient.chat({
model: "gpt-4o-mini",
messages: [
{
role: "system",
content: "Summarize this conversation history in 3-5 bullet points. Keep key facts, decisions, and results."
},
{ role: "user", content: summaryText }
]
}, function (err, response) {
if (err) return callback(null, messages); // Fall back to full history
var summary = response.choices[0].message.content;
var compacted = [
systemMessage,
{ role: "system", content: "Previous conversation summary:\n" + summary },
].concat(recentMessages);
callback(null, compacted);
});
}
Strip unnecessary fields from tool results. If a tool returns a large JSON payload, extract only the fields the agent needs before adding the result to the conversation.
Use structured output. Tell the model to return JSON with specific fields instead of verbose natural language. Structured responses are typically 50-70% shorter than prose.
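Field-stripping is the simplest of these techniques to implement. The field names below are illustrative of a web-search tool payload:

```javascript
// Keep only the fields the agent actually needs before the tool result
// enters the conversation history.
function stripToolResult(raw, keepFields) {
  var slim = {};
  keepFields.forEach(function (field) {
    if (raw[field] !== undefined) slim[field] = raw[field];
  });
  return slim;
}

var searchHit = {
  title: "Node.js error handling",
  url: "https://example.com/post",
  snippet: "Use try/catch with async/await...",
  rawHtml: "<html>... tens of kilobytes of markup ...</html>",
  crawlMetadata: { fetchedAt: 1700000000, headers: {} }
};

var slim = stripToolResult(searchHit, ["title", "url", "snippet"]);
console.log(JSON.stringify(slim).length < JSON.stringify(searchHit).length); // true
```

Every field you drop here is dropped from every subsequent LLM call in the loop, so the savings compound with each step.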
Caching Agent Intermediate Results
Many agent tasks involve repetitive operations. If your agent searches for "Node.js error handling best practices" today and another agent searches for the same thing tomorrow, caching the result saves the cost of the search and the LLM call to process it.
var crypto = require("crypto");
function AgentCache(options) {
this.store = {};
this.ttlMs = (options.ttlMinutes || 60) * 60 * 1000;
this.maxEntries = options.maxEntries || 1000;
this.hits = 0;
this.misses = 0;
}
AgentCache.prototype.makeKey = function (operation, params) {
var input = operation + ":" + JSON.stringify(params);
return crypto.createHash("sha256").update(input).digest("hex");
};
AgentCache.prototype.get = function (operation, params) {
var key = this.makeKey(operation, params);
var entry = this.store[key];
if (!entry) {
this.misses++;
return null;
}
if (Date.now() - entry.timestamp > this.ttlMs) {
delete this.store[key];
this.misses++;
return null;
}
this.hits++;
return entry.value;
};
AgentCache.prototype.set = function (operation, params, value, costSaved) {
var key = this.makeKey(operation, params);
if (Object.keys(this.store).length >= this.maxEntries) {
this.evictOldest();
}
this.store[key] = {
value: value,
timestamp: Date.now(),
costSaved: costSaved || 0
};
};
AgentCache.prototype.evictOldest = function () {
var oldestKey = null;
var oldestTime = Infinity;
Object.keys(this.store).forEach(function (key) {
if (this.store[key].timestamp < oldestTime) {
oldestTime = this.store[key].timestamp;
oldestKey = key;
}
}.bind(this));
if (oldestKey) {
delete this.store[oldestKey];
}
};
AgentCache.prototype.stats = function () {
var totalSaved = 0;
var self = this;
Object.keys(self.store).forEach(function (key) {
totalSaved += self.store[key].costSaved || 0;
});
return {
entries: Object.keys(self.store).length,
hits: self.hits,
misses: self.misses,
hitRate: self.hits + self.misses > 0
? (self.hits / (self.hits + self.misses) * 100).toFixed(1) + "%"
: "0%",
estimatedSavings: "$" + totalSaved.toFixed(4)
};
};
For production systems, swap the in-memory store for Redis. The cache key should include the model name if different models produce different results for the same input.
Cost Allocation by User, Team, and Feature
Once you have cost tracking in place, you need to attribute costs to business entities. This is essential for chargeback, capacity planning, and identifying which features are worth their cost.
function CostAllocator() {
this.records = [];
}
CostAllocator.prototype.record = function (entry) {
this.records.push({
timestamp: Date.now(),
userId: entry.userId,
teamId: entry.teamId,
feature: entry.feature,
taskId: entry.taskId,
model: entry.model,
cost: entry.cost,
inputTokens: entry.inputTokens || 0,
outputTokens: entry.outputTokens || 0
});
};
CostAllocator.prototype.aggregate = function (groupBy, startTime, endTime) {
var filtered = this.records.filter(function (r) {
return r.timestamp >= (startTime || 0) &&
r.timestamp <= (endTime || Infinity);
});
var groups = {};
filtered.forEach(function (r) {
var key = r[groupBy] || "unknown";
if (!groups[key]) {
groups[key] = { totalCost: 0, callCount: 0, totalTokens: 0 };
}
groups[key].totalCost += r.cost;
groups[key].callCount += 1;
groups[key].totalTokens += r.inputTokens + r.outputTokens;
});
// Sort by total cost descending
var sorted = Object.keys(groups).map(function (key) {
return {
name: key,
totalCost: parseFloat(groups[key].totalCost.toFixed(6)),
callCount: groups[key].callCount,
totalTokens: groups[key].totalTokens,
avgCostPerCall: parseFloat(
(groups[key].totalCost / groups[key].callCount).toFixed(6)
)
};
});
sorted.sort(function (a, b) {
return b.totalCost - a.totalCost;
});
return sorted;
};
The allocation data answers critical business questions. Which team is spending the most on agents? Which feature has the highest cost per user? Is the research agent worth $3 per run when a developer could do the same task in 15 minutes? These are the questions that determine whether your agent investment pays off.
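Aggregation itself is a small fold over the records. This stand-in mirrors CostAllocator.aggregate with illustrative team and feature names:

```javascript
// Group raw cost records by a field and sort by total cost descending.
function aggregate(records, groupBy) {
  var groups = {};
  records.forEach(function (r) {
    var key = r[groupBy] || "unknown";
    if (!groups[key]) groups[key] = { name: key, totalCost: 0, callCount: 0 };
    groups[key].totalCost += r.cost;
    groups[key].callCount += 1;
  });
  return Object.keys(groups)
    .map(function (k) { return groups[k]; })
    .sort(function (a, b) { return b.totalCost - a.totalCost; });
}

var records = [
  { teamId: "platform", feature: "research_agent", cost: 0.42 },
  { teamId: "growth", feature: "email_drafts", cost: 0.08 },
  { teamId: "platform", feature: "code_review", cost: 0.31 }
];

var byTeam = aggregate(records, "teamId");
console.log(byTeam[0].name, byTeam[0].totalCost.toFixed(2)); // platform 0.73
```

The same function answers both "which team spends most" and "which feature spends most" just by changing the `groupBy` argument.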
Real-Time Cost Dashboards and Alerts
Visibility prevents surprises. A real-time dashboard that shows current spend, budget utilization, and cost trends lets your team catch problems before they become expensive.
var express = require("express");
function createCostDashboardRouter(costAllocator, activeBudgets) {
var router = express.Router();
router.get("/summary", function (req, res) {
var now = Date.now();
var oneDayAgo = now - 86400000;
var oneHourAgo = now - 3600000;
var dailyCosts = costAllocator.aggregate("feature", oneDayAgo, now);
var hourlyCosts = costAllocator.aggregate("feature", oneHourAgo, now);
var dailyTotal = dailyCosts.reduce(function (sum, item) {
return sum + item.totalCost;
}, 0);
var hourlyTotal = hourlyCosts.reduce(function (sum, item) {
return sum + item.totalCost;
}, 0);
res.json({
period: {
daily: { total: parseFloat(dailyTotal.toFixed(4)), byFeature: dailyCosts },
hourly: { total: parseFloat(hourlyTotal.toFixed(4)), byFeature: hourlyCosts }
},
projectedDaily: parseFloat((hourlyTotal * 24).toFixed(2)),
activeTasks: Object.keys(activeBudgets).length
});
});
router.get("/active-budgets", function (req, res) {
var budgets = Object.keys(activeBudgets).map(function (taskId) {
var budget = activeBudgets[taskId];
return {
taskId: taskId,
userId: budget.userId,
maxCost: budget.maxCost,
spent: parseFloat(budget.spent.toFixed(6)),
utilization: parseFloat(
((budget.spent / budget.maxCost) * 100).toFixed(1)
),
callCount: budget.calls.length,
durationMs: Date.now() - budget.startedAt
};
});
budgets.sort(function (a, b) {
return b.utilization - a.utilization;
});
res.json({ activeBudgets: budgets });
});
router.get("/alerts", function (req, res) {
var alerts = [];
var now = Date.now();
Object.keys(activeBudgets).forEach(function (taskId) {
var budget = activeBudgets[taskId];
var utilization = budget.spent / budget.maxCost;
var duration = now - budget.startedAt;
if (utilization > 0.9) {
alerts.push({
severity: "critical",
taskId: taskId,
message: "Budget " + (utilization * 100).toFixed(0) + "% utilized"
});
} else if (utilization > 0.7) {
alerts.push({
severity: "warning",
taskId: taskId,
message: "Budget " + (utilization * 100).toFixed(0) + "% utilized"
});
}
if (duration > 300000 && budget.calls.length > 15) {
alerts.push({
severity: "warning",
taskId: taskId,
message: "Long-running task: " + (duration / 1000).toFixed(0) +
"s, " + budget.calls.length + " calls"
});
}
});
alerts.sort(function (a, b) {
var order = { critical: 0, warning: 1, info: 2 };
return (order[a.severity] || 9) - (order[b.severity] || 9);
});
res.json({ alerts: alerts });
});
return router;
}
Pair the dashboard with automated alerts. Send a Slack notification when hourly spend exceeds a threshold. Page the on-call engineer when a single task exceeds $50. These alerts are your last line of defense against runaway costs.
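The alert decision itself can be a tiny pure function, which keeps it testable and separate from the delivery mechanism (the dollar thresholds and channel names here are illustrative, not from the dashboard config above):

```javascript
// Decide the alert channel from hourly spend against two thresholds.
function classifyHourlySpend(hourlyTotal, warnAt, pageAt) {
  if (hourlyTotal >= pageAt) return "page_oncall";
  if (hourlyTotal >= warnAt) return "slack_warning";
  return "ok";
}

console.log(classifyHourlySpend(4.00, 10, 50));  // "ok"
console.log(classifyHourlySpend(12.50, 10, 50)); // "slack_warning"
console.log(classifyHourlySpend(62.00, 10, 50)); // "page_oncall"
```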
Cost Forecasting Based on Historical Runs
Once you have a few weeks of cost data, you can forecast future spend based on usage patterns. This helps with budgeting and capacity planning.
function CostForecaster(allocator) {
this.allocator = allocator;
}
CostForecaster.prototype.forecast = function (daysAhead) {
var now = Date.now();
var sevenDaysAgo = now - (7 * 86400000);
var records = this.allocator.records.filter(function (r) {
return r.timestamp >= sevenDaysAgo;
});
if (records.length === 0) {
return { forecastedCost: 0, confidence: "low", message: "No historical data" };
}
var totalCost = records.reduce(function (sum, r) { return sum + r.cost; }, 0);
var daysCovered = (now - sevenDaysAgo) / 86400000;
var dailyAverage = totalCost / daysCovered;
// Calculate day-over-day growth rate
var dailyBuckets = {};
records.forEach(function (r) {
var day = new Date(r.timestamp).toISOString().split("T")[0];
if (!dailyBuckets[day]) dailyBuckets[day] = 0;
dailyBuckets[day] += r.cost;
});
var days = Object.keys(dailyBuckets).sort();
var growthRates = [];
for (var i = 1; i < days.length; i++) {
if (dailyBuckets[days[i - 1]] > 0) {
growthRates.push(dailyBuckets[days[i]] / dailyBuckets[days[i - 1]]);
}
}
var avgGrowthRate = growthRates.length > 0
? growthRates.reduce(function (s, r) { return s + r; }, 0) / growthRates.length
: 1.0;
var forecasted = 0;
var currentDaily = dailyAverage;
for (var d = 0; d < daysAhead; d++) {
forecasted += currentDaily;
currentDaily *= avgGrowthRate;
}
return {
forecastedCost: parseFloat(forecasted.toFixed(2)),
dailyAverage: parseFloat(dailyAverage.toFixed(2)),
growthRate: parseFloat(((avgGrowthRate - 1) * 100).toFixed(1)) + "%",
confidence: days.length >= 5 ? "medium" : "low",
daysAhead: daysAhead
};
};
Forecasting is imprecise, especially when agent usage is growing rapidly. Treat forecasts as directional guidance, not precise predictions. The value is in catching trends early. If your daily average is growing 20% week over week, you need to either optimize costs or increase your budget before it becomes a problem.
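The compounding loop at the heart of the forecaster is worth isolating, because it shows how quickly a modest daily growth rate inflates a projection (the input figures are illustrative):

```javascript
// Project spend forward with a constant day-over-day growth rate
// (the same compounding loop used in CostForecaster.forecast).
function projectSpend(dailyAverage, growthRate, daysAhead) {
  var total = 0;
  var current = dailyAverage;
  for (var d = 0; d < daysAhead; d++) {
    total += current;
    current *= growthRate;
  }
  return parseFloat(total.toFixed(2));
}

console.log(projectSpend(40, 1.00, 30)); // flat usage: 1200
console.log(projectSpend(40, 1.03, 30)); // 3% daily growth: roughly 1900
```

At flat usage, $40/day is $1,200 over a month; at 3% daily growth the same starting point projects to nearly $1,900. Small growth rates are exactly what forecasting exists to catch.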
Comparing Agent Cost vs. Manual Human Cost
This is the calculation that justifies your agent system's existence. For every agent task, you should be able to estimate the equivalent human cost and demonstrate a positive ROI.
function ROICalculator(options) {
this.avgEngineerHourlyCost = options.avgEngineerHourlyCost || 85;
this.overheadMultiplier = options.overheadMultiplier || 1.4;
}
ROICalculator.prototype.compare = function (agentCost, humanMinutes) {
var humanHours = humanMinutes / 60;
var humanCost = humanHours * this.avgEngineerHourlyCost * this.overheadMultiplier;
var savings = humanCost - agentCost;
var savingsPercent = humanCost > 0 ? (savings / humanCost) * 100 : 0;
return {
agentCost: parseFloat(agentCost.toFixed(4)),
humanCost: parseFloat(humanCost.toFixed(2)),
savings: parseFloat(savings.toFixed(2)),
savingsPercent: parseFloat(savingsPercent.toFixed(1)),
worthIt: savings > 0,
breakEvenMinutes: agentCost / (this.avgEngineerHourlyCost * this.overheadMultiplier / 60)
};
};
Be honest in these comparisons. Include the engineering time to build and maintain the agent system. Include the cost of debugging agent failures. A task that costs an agent $0.50 per run but required $20,000 in engineering time to automate breaks even only after its cumulative savings over manual work reach $20,000; if the per-run savings are small, that can mean tens of thousands of runs. Not every task is worth automating.
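A worked example using the defaults above ($85/hour engineer, 1.4x overhead) shows how the comparison plays out for a single run:

```javascript
// Condensed version of ROICalculator.compare with the article's defaults.
function compare(agentCost, humanMinutes, hourlyCost, overhead) {
  var humanCost = (humanMinutes / 60) * hourlyCost * overhead;
  return {
    humanCost: parseFloat(humanCost.toFixed(2)),
    savings: parseFloat((humanCost - agentCost).toFixed(2)),
    worthIt: humanCost > agentCost
  };
}

// A $0.50 agent run replacing 15 minutes of engineer time.
var result = compare(0.50, 15, 85, 1.4);
console.log(result); // { humanCost: 29.75, savings: 29.25, worthIt: true }
```

At $29.25 saved per run, a $20,000 build cost amortizes over roughly 700 runs. Run the same arithmetic with your own task times before trusting any per-run ROI headline.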
Implementing Cost Guardrails
Guardrails are hard limits that prevent catastrophic cost events. They operate at multiple levels: per-call, per-task, per-user, and system-wide.
function CostGuardrails(options) {
this.maxCostPerCall = options.maxCostPerCall || 5.00;
this.maxCostPerTask = options.maxCostPerTask || 25.00;
this.maxCostPerUser = options.maxCostPerUser || 100.00;
this.maxDailySystemCost = options.maxDailySystemCost || 500.00;
this.userDailySpend = {};
this.systemDailySpend = 0;
this.lastResetDate = new Date().toDateString();
}
CostGuardrails.prototype.resetIfNewDay = function () {
var today = new Date().toDateString();
if (today !== this.lastResetDate) {
this.userDailySpend = {};
this.systemDailySpend = 0;
this.lastResetDate = today;
}
};
CostGuardrails.prototype.check = function (userId, estimatedCost, taskBudget) {
this.resetIfNewDay();
var violations = [];
if (estimatedCost > this.maxCostPerCall) {
violations.push({
level: "per_call",
message: "Estimated call cost $" + estimatedCost.toFixed(4) +
" exceeds per-call limit of $" + this.maxCostPerCall.toFixed(2)
});
}
if (taskBudget && taskBudget.spent + estimatedCost > this.maxCostPerTask) {
violations.push({
level: "per_task",
message: "Task would exceed per-task limit of $" + this.maxCostPerTask.toFixed(2)
});
}
var userSpend = this.userDailySpend[userId] || 0;
if (userSpend + estimatedCost > this.maxCostPerUser) {
violations.push({
level: "per_user",
message: "User " + userId + " daily spend would exceed $" +
this.maxCostPerUser.toFixed(2) + " limit"
});
}
if (this.systemDailySpend + estimatedCost > this.maxDailySystemCost) {
violations.push({
level: "system",
message: "System daily spend would exceed $" +
this.maxDailySystemCost.toFixed(2) + " limit"
});
}
return {
allowed: violations.length === 0,
violations: violations
};
};
CostGuardrails.prototype.recordSpend = function (userId, cost) {
this.resetIfNewDay();
this.userDailySpend[userId] = (this.userDailySpend[userId] || 0) + cost;
this.systemDailySpend += cost;
};
Guardrails should be non-negotiable. Even administrators should not be able to bypass them without a configuration change that requires a deploy. This prevents late-night debugging sessions where someone temporarily disables limits and forgets to re-enable them.
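Exercising the layered checks makes the behavior concrete. This stand-in condenses CostGuardrails.check to three of its levels, with illustrative limits and spend state:

```javascript
// Check a proposed spend against per-call, per-user, and system limits.
function check(limits, state, userId, estimatedCost) {
  var violations = [];
  if (estimatedCost > limits.maxCostPerCall) violations.push("per_call");
  var userSpend = state.userDailySpend[userId] || 0;
  if (userSpend + estimatedCost > limits.maxCostPerUser) violations.push("per_user");
  if (state.systemDailySpend + estimatedCost > limits.maxDailySystemCost) {
    violations.push("system");
  }
  return { allowed: violations.length === 0, violations: violations };
}

var limits = { maxCostPerCall: 5, maxCostPerUser: 100, maxDailySystemCost: 500 };
var state = { userDailySpend: { "user-7": 99.50 }, systemDailySpend: 240 };

console.log(check(limits, state, "user-7", 0.75).violations); // [ 'per_user' ]
console.log(check(limits, state, "user-9", 0.75).allowed);    // true
```

Note that a call can be cheap on its own and still be rejected because of where the user or the system already stands for the day; that layering is what makes guardrails effective.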
Complete Working Example
Here is a full agent cost management module that ties together all the concepts: per-task budgets, real-time tracking, model tiering, early termination, and a cost reporting dashboard.
// agent-cost-manager.js
var EventEmitter = require("events");
var express = require("express");
var MODEL_PRICING = {
"gpt-4o": { input: 2.50 / 1000000, output: 10.00 / 1000000 },
"gpt-4o-mini": { input: 0.15 / 1000000, output: 0.60 / 1000000 },
"claude-opus-4": { input: 15.00 / 1000000, output: 75.00 / 1000000 },
"claude-sonnet-4": { input: 3.00 / 1000000, output: 15.00 / 1000000 },
"claude-haiku-3.5": { input: 0.80 / 1000000, output: 4.00 / 1000000 }
};
// ---- Cost Budget ----
function CostBudget(options) {
this.maxCost = options.maxCost || 1.00;
this.spent = 0;
this.warningThreshold = options.warningThreshold || 0.8;
this.calls = [];
this.emitter = new EventEmitter();
this.taskId = options.taskId || "task_" + Date.now();
this.userId = options.userId || "unknown";
this.feature = options.feature || "default";
this.startedAt = Date.now();
}
CostBudget.prototype.recordCost = function (amount, metadata) {
this.spent += amount;
this.calls.push({
amount: amount,
cumulative: this.spent,
timestamp: Date.now(),
model: metadata.model || "unknown",
inputTokens: metadata.inputTokens || 0,
outputTokens: metadata.outputTokens || 0,
operation: metadata.operation || "llm_call"
});
if (this.spent >= this.maxCost) {
this.emitter.emit("exceeded", { taskId: this.taskId, spent: this.spent });
} else if (this.spent >= this.maxCost * this.warningThreshold) {
this.emitter.emit("warning", {
taskId: this.taskId,
remaining: this.maxCost - this.spent
});
}
return this;
};
CostBudget.prototype.hasRemaining = function () {
return this.spent < this.maxCost;
};
CostBudget.prototype.remaining = function () {
return Math.max(0, this.maxCost - this.spent);
};
CostBudget.prototype.summary = function () {
return {
taskId: this.taskId,
userId: this.userId,
feature: this.feature,
maxCost: this.maxCost,
spent: parseFloat(this.spent.toFixed(6)),
remaining: parseFloat(this.remaining().toFixed(6)),
callCount: this.calls.length,
durationMs: Date.now() - this.startedAt
};
};
// ---- Cost Manager ----
function AgentCostManager(options) {
options = options || {};
this.pricing = options.pricing || MODEL_PRICING;
this.activeBudgets = {};
this.completedBudgets = [];
this.guardrails = {
maxCostPerCall: options.maxCostPerCall || 5.00,
maxCostPerTask: options.maxCostPerTask || 25.00,
maxCostPerUserDaily: options.maxCostPerUserDaily || 100.00,
maxSystemDaily: options.maxSystemDaily || 500.00
};
this.userDailySpend = {};
this.systemDailySpend = 0;
this.dailyResetDate = new Date().toDateString();
this.emitter = new EventEmitter();
}
AgentCostManager.prototype.resetDailyIfNeeded = function () {
var today = new Date().toDateString();
if (today !== this.dailyResetDate) {
this.userDailySpend = {};
this.systemDailySpend = 0;
this.dailyResetDate = today;
}
};
AgentCostManager.prototype.createBudget = function (options) {
var budget = new CostBudget(options);
var self = this;
budget.emitter.on("exceeded", function (data) {
self.emitter.emit("budget_exceeded", data);
});
budget.emitter.on("warning", function (data) {
self.emitter.emit("budget_warning", data);
});
this.activeBudgets[budget.taskId] = budget;
return budget;
};
AgentCostManager.prototype.completeBudget = function (taskId) {
var budget = this.activeBudgets[taskId];
if (budget) {
this.completedBudgets.push(budget.summary());
delete this.activeBudgets[taskId];
}
};
AgentCostManager.prototype.calculateCost = function (model, usage) {
var pricing = this.pricing[model];
if (!pricing) {
pricing = { input: 10.00 / 1000000, output: 30.00 / 1000000 };
}
return ((usage.prompt_tokens || 0) * pricing.input) +
((usage.completion_tokens || 0) * pricing.output);
};
AgentCostManager.prototype.checkGuardrails = function (userId, estimatedCost, budget) {
this.resetDailyIfNeeded();
var violations = [];
if (estimatedCost > this.guardrails.maxCostPerCall) {
violations.push("Per-call limit ($" + this.guardrails.maxCostPerCall + ") exceeded");
}
if (budget && budget.spent + estimatedCost > this.guardrails.maxCostPerTask) {
violations.push("Per-task limit ($" + this.guardrails.maxCostPerTask + ") exceeded");
}
var userSpend = this.userDailySpend[userId] || 0;
if (userSpend + estimatedCost > this.guardrails.maxCostPerUserDaily) {
violations.push("User daily limit ($" + this.guardrails.maxCostPerUserDaily + ") exceeded");
}
if (this.systemDailySpend + estimatedCost > this.guardrails.maxSystemDaily) {
violations.push("System daily limit ($" + this.guardrails.maxSystemDaily + ") exceeded");
}
return { allowed: violations.length === 0, violations: violations };
};
AgentCostManager.prototype.recordSpend = function (userId, cost) {
this.resetDailyIfNeeded();
this.userDailySpend[userId] = (this.userDailySpend[userId] || 0) + cost;
this.systemDailySpend += cost;
};
AgentCostManager.prototype.wrapClient = function (llmClient, budget) {
var self = this;
return {
chat: function (params, callback) {
var model = params.model || "gpt-4o-mini";
if (!budget.hasRemaining()) {
return callback(new Error(
"BUDGET_EXCEEDED: $" + budget.spent.toFixed(4) +
" of $" + budget.maxCost.toFixed(2)
));
}
// Pre-flight check with a rough $0.05 per-call estimate; the actual cost is recorded after the response arrives
var guardrailCheck = self.checkGuardrails(budget.userId, 0.05, budget);
if (!guardrailCheck.allowed) {
return callback(new Error(
"GUARDRAIL_VIOLATION: " + guardrailCheck.violations.join("; ")
));
}
// The OpenAI Node SDK returns a promise, so adapt it to this module's callback style
llmClient.chat.completions.create(params).then(function (response) {
var usage = response.usage || {};
var cost = self.calculateCost(model, usage);
budget.recordCost(cost, {
model: model,
inputTokens: usage.prompt_tokens || 0,
outputTokens: usage.completion_tokens || 0,
operation: "chat_completion"
});
self.recordSpend(budget.userId, cost);
callback(null, response, {
cost: cost,
totalSpent: budget.spent,
remaining: budget.remaining()
});
}).catch(function (err) {
callback(err);
});
}
};
};
AgentCostManager.prototype.selectModel = function (taskType, budget) {
var tiers = {
planning: "claude-sonnet-4",
execution: "gpt-4o-mini",
validation: "gpt-4o-mini",
classification: "claude-haiku-3.5",
summarization: "gpt-4o-mini"
};
var model = tiers[taskType] || "gpt-4o-mini";
// Downgrade if budget is running low
if (budget && budget.remaining() < budget.maxCost * 0.15) {
return "gpt-4o-mini";
}
return model;
};
// ---- Dashboard Router ----
AgentCostManager.prototype.createRouter = function () {
var self = this;
var router = express.Router();
router.get("/summary", function (req, res) {
self.resetDailyIfNeeded();
var activeCount = Object.keys(self.activeBudgets).length;
var activeTotalSpent = 0;
Object.keys(self.activeBudgets).forEach(function (id) {
activeTotalSpent += self.activeBudgets[id].spent;
});
res.json({
systemDailySpend: parseFloat(self.systemDailySpend.toFixed(4)),
systemDailyLimit: self.guardrails.maxSystemDaily,
activeTasks: activeCount,
activeTasksSpend: parseFloat(activeTotalSpent.toFixed(4)),
completedToday: self.completedBudgets.length,
userSpend: Object.keys(self.userDailySpend).map(function (uid) {
return {
userId: uid,
spent: parseFloat(self.userDailySpend[uid].toFixed(4)),
limit: self.guardrails.maxCostPerUserDaily
};
})
});
});
router.get("/active", function (req, res) {
var tasks = Object.keys(self.activeBudgets).map(function (id) {
return self.activeBudgets[id].summary();
});
tasks.sort(function (a, b) { return b.spent - a.spent; });
res.json({ activeTasks: tasks });
});
router.get("/history", function (req, res) {
var limit = parseInt(req.query.limit, 10) || 50;
var recent = self.completedBudgets.slice(-limit).reverse();
res.json({ completedTasks: recent });
});
return router;
};
module.exports = AgentCostManager;
Usage example:
var express = require("express");
var AgentCostManager = require("./agent-cost-manager");
var OpenAI = require("openai");
var app = express();
var openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
var costManager = new AgentCostManager({
maxCostPerCall: 2.00,
maxCostPerTask: 10.00,
maxCostPerUserDaily: 50.00,
maxSystemDaily: 200.00
});
// Mount the cost dashboard
app.use("/admin/costs", costManager.createRouter());
// Listen for cost events
costManager.emitter.on("budget_exceeded", function (data) {
console.error("[COST ALERT] Budget exceeded for task " + data.taskId);
});
costManager.emitter.on("budget_warning", function (data) {
console.warn("[COST WARNING] Task " + data.taskId +
" has $" + data.remaining.toFixed(4) + " remaining");
});
// Example: Run an agent task with cost management
function handleAgentRequest(req, res) {
var budget = costManager.createBudget({
maxCost: 2.00,
userId: req.user.id, // assumes upstream auth middleware has populated req.user
feature: "research_agent",
taskId: "task_" + Date.now()
});
var wrappedClient = costManager.wrapClient(openai, budget);
var model = costManager.selectModel("planning", budget);
wrappedClient.chat({
model: model,
messages: [
{ role: "system", content: "You are a research assistant." },
{ role: "user", content: req.body.query }
]
}, function (err, response, costInfo) {
costManager.completeBudget(budget.taskId);
if (err) {
var status = 500;
if (err.message.indexOf("BUDGET") === 0) status = 402;
else if (err.message.indexOf("GUARDRAIL") === 0) status = 429;
return res.status(status)
.json({ error: err.message, costSummary: budget.summary() });
}
res.json({
result: response.choices[0].message.content,
costSummary: budget.summary()
});
});
}
app.post("/agent/research", handleAgentRequest);
app.listen(3000);
Common Issues and Troubleshooting
1. Token counts are missing from API responses
Error: Cannot read property 'prompt_tokens' of undefined
Some LLM providers do not include usage data in streaming responses or when certain parameters are set. Always default to zero when usage data is missing, and log a warning so you know tracking is incomplete. For streaming responses, you typically need to collect the final chunk which contains the usage summary.
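A defensive normalization helper keeps tracking from crashing when usage data is absent. This is a minimal sketch assuming an OpenAI-style response shape ({ usage: { prompt_tokens, completion_tokens } }); for streaming, you would pass the final chunk that carries the usage summary.

```javascript
// Normalize possibly-missing usage data: default token counts to zero
// and flag incomplete data so the gap shows up in logs.
function normalizeUsage(response) {
var usage = (response && response.usage) || {};
var normalized = {
prompt_tokens: usage.prompt_tokens || 0,
completion_tokens: usage.completion_tokens || 0,
complete: typeof usage.prompt_tokens === "number" &&
typeof usage.completion_tokens === "number"
};
if (!normalized.complete) {
console.warn("[COST TRACKING] usage data missing; recording zero tokens");
}
return normalized;
}
```

Feed the normalized object into calculateCost so a single malformed response degrades to under-counting plus a warning, rather than a thrown error that aborts the task.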
2. Cost calculations drift from actual provider invoices
Warning: Tracked spend $47.23 but provider invoice shows $52.89 (11.9% discrepancy)
This happens when your local pricing table is stale, when you miss tracking some calls (especially retries issued by HTTP client libraries), or when the provider applies different pricing for cached or batched requests. Reconcile your tracked costs against provider invoices at least monthly. Update your pricing table whenever providers announce price changes, and add tracking to retry logic at the HTTP client level, not just at your application level.
3. Budget exceeded errors interrupting user workflows
Error: BUDGET_EXCEEDED: Task task_1707892345 has spent $2.0012 of $2.00 budget
This occurs when a task legitimately needs more budget than was allocated. The fix is not to remove budget limits. Instead, implement a graduated response: warn the user at 80% utilization, offer to extend the budget at 100%, and hard-stop only at an elevated limit (e.g., 150% of original budget). This gives the agent room to finish a nearly-complete task without opening the door to unlimited spend.
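The graduated response can be sketched as a small decision function. The thresholds here (warn at 80%, offer an extension at 100%, hard-stop at 150%) mirror the numbers above but are illustrative defaults you would tune per feature.

```javascript
// Map budget utilization to a graduated action instead of a single hard stop.
function budgetDecision(spent, maxCost) {
var ratio = spent / maxCost;
if (ratio >= 1.5) return { action: "hard_stop" };      // elevated limit reached
if (ratio >= 1.0) return { action: "offer_extension" }; // original budget spent
if (ratio >= 0.8) return { action: "warn" };            // approaching the limit
return { action: "continue" };
}
```

The agent loop calls this before each step; "offer_extension" surfaces a prompt to the user (or an auto-approval policy), while "hard_stop" is non-negotiable.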
4. Race conditions in concurrent budget tracking
Error: Task spent $12.47 but maxCost was $10.00 — multiple calls recorded simultaneously
When an agent makes parallel tool calls, multiple cost recordings can happen simultaneously, each checking hasRemaining() before any of them record their cost. In Node.js this is less common due to the single-threaded event loop, but it can happen with truly concurrent async operations. Use an atomic check-and-record operation, or pre-reserve estimated cost before making the call and reconcile afterward.
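The reserve-and-reconcile approach can be sketched as follows; the class and method names are illustrative, not part of the AgentCostManager above.

```javascript
// Pre-reserve an estimated cost before each call, then settle against
// the actual cost afterward, so parallel calls cannot jointly overshoot.
function ReservingBudget(maxCost) {
this.maxCost = maxCost;
this.spent = 0;     // settled, actual spend
this.reserved = 0;  // in-flight estimates
}
ReservingBudget.prototype.tryReserve = function (estimate) {
if (this.spent + this.reserved + estimate > this.maxCost) return false;
this.reserved += estimate;
return true;
};
ReservingBudget.prototype.settle = function (estimate, actual) {
this.reserved -= estimate; // release the reservation
this.spent += actual;      // record what the call really cost
};
```

Each parallel tool call first calls tryReserve; a rejected reservation fails fast before any tokens are spent, and settle reconciles the estimate against the provider-reported cost.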
5. Stale model pricing causing incorrect cost estimates
Warning: No pricing data for model: gpt-4o-2025-11-20, using estimate
LLM providers frequently release new model versions with different pricing. When your pricing table does not include the specific model version, the fallback estimate may be significantly wrong. Maintain a model alias map that resolves versioned model names to their base pricing, and set up automated alerts when unknown model names appear in your logs.
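One way to implement the alias map is to strip a trailing date suffix and retry the lookup. The pricing numbers and the "-YYYY-MM-DD" suffix convention below are assumptions for illustration; verify both against your provider's current price list.

```javascript
// Illustrative base pricing (USD per token); keep this table current.
var BASE_PRICING = {
"gpt-4o": { input: 2.50 / 1e6, output: 10.00 / 1e6 },
"gpt-4o-mini": { input: 0.15 / 1e6, output: 0.60 / 1e6 }
};
function resolvePricing(model) {
if (BASE_PRICING[model]) return BASE_PRICING[model];
// Strip a trailing date suffix like "-2025-11-20" and retry the lookup.
var base = model.replace(/-\d{4}-\d{2}-\d{2}$/, "");
if (BASE_PRICING[base]) return BASE_PRICING[base];
// Unknown model: log loudly and fall back to a deliberately high estimate.
console.warn("No pricing data for model: " + model + ", using estimate");
return { input: 10.00 / 1e6, output: 30.00 / 1e6 };
}
```

Using a deliberately high fallback estimate means an unknown model trips budget guardrails sooner rather than silently under-counting spend.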
Best Practices
Set budgets before creating agents, not after. The budget should be a required parameter for agent instantiation. If you allow agents to run without budgets even temporarily, someone will forget to add one and you will get a surprise bill.
Log every LLM call with its cost, even in development. Developers who test agents locally without cost visibility will build expensive workflows without realizing it. Make cost a visible part of the development experience, not just a production concern.
Use model tiering aggressively. Most agent subtasks do not require the most capable model. Classification, validation, formatting, and summarization tasks can all run on models that cost 90% less with negligible quality difference. Reserve expensive models for planning and complex reasoning steps.
Implement circuit breakers for cost spikes. If your hourly spend suddenly doubles, something is wrong. Automatically throttle new agent task creation when spend exceeds normal bounds. This prevents a bug or abuse pattern from burning through your monthly budget in hours.
Cache deterministic operations. If a tool call with identical parameters will return the same result, cache it. Search results, document retrievals, and database lookups are all good candidates. Even a simple in-memory cache with a 30-minute TTL can save 20-40% on repeat queries.
Track cost per business outcome, not just per API call. The meaningful metric is not "we spent $0.03 on this LLM call." It is "we spent $1.47 to generate this research report" or "we spent $0.12 to classify this support ticket." Cost per outcome tells you whether the agent is economically viable. Cost per call is just a debugging detail.
Build cost awareness into your agent's prompt. Tell the agent its budget and encourage it to be efficient. A system prompt that says "You have a budget of $2.00 for this task. Prefer concise responses and avoid unnecessary tool calls" measurably reduces spend without significantly impacting quality.
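Injecting the budget into the system prompt is a one-liner given a budget object with remaining() and maxCost (as in the CostBudget above); the exact wording here is illustrative.

```javascript
// Append remaining-budget guidance to the agent's system prompt.
function buildBudgetAwarePrompt(basePrompt, budget) {
return basePrompt +
"\n\nYou have approximately $" + budget.remaining().toFixed(2) +
" of a $" + budget.maxCost.toFixed(2) + " budget left for this task. " +
"Prefer concise responses and avoid unnecessary tool calls.";
}
```

Rebuild the prompt at each loop iteration so the figure stays current as the agent spends.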
Reconcile tracked costs against provider invoices monthly. Your internal tracking will always have some drift from actual charges. Monthly reconciliation catches systematic errors in your pricing tables and reveals untracked calls that slip through your middleware.
Plan for cost optimization as a continuous process. Model pricing changes, new cheaper models become available, and your agent architectures evolve. Schedule quarterly reviews of your agent cost patterns to identify optimization opportunities. What was the cheapest approach six months ago may not be today.
References
- OpenAI Pricing - Current model pricing for GPT-4o, GPT-4o-mini, and other models
- Anthropic Pricing - Claude model pricing tiers
- OpenAI Usage API - Programmatic access to usage and billing data
- Node.js EventEmitter Documentation - Event-driven architecture for cost alerts
- Express.js Routing - Building dashboard API endpoints
- tiktoken - Token counting library for pre-estimating costs before API calls