Tool Selection Strategies for AI Agents
Strategies for managing AI agent tool sets including dynamic loading, relevance scoring, permissions, and usage analytics in Node.js.
Overview
The difference between a competent AI agent and one that fumbles through every task often comes down to how you manage its tools. When you hand an LLM 50 tool definitions in a single prompt, it wastes tokens parsing irrelevant options and frequently picks the wrong one. This article covers battle-tested strategies for tool registry design, dynamic loading, relevance scoring, permission enforcement, and usage analytics that keep your agent fast and accurate in production Node.js systems.
Prerequisites
- Node.js 18+ installed
- Basic understanding of LLM function calling (tool use) APIs
- Familiarity with an embeddings API (the examples use OpenAI's)
- Working knowledge of Express.js for the complete example
- Experience building at least one simple LLM-powered application
Why Tool Selection Matters
Most developers start by dumping every tool into the system prompt. It works fine with 5 tools. By the time you reach 15 or 20, things start breaking. The model picks the wrong tool, hallucinates parameters, or gets stuck in loops calling the same function repeatedly.
Here is what actually happens inside the model when you add tools. Every tool definition consumes input tokens. A well-described tool with parameter schemas runs 200-400 tokens. Twenty tools means 4,000-8,000 tokens of your context window are just tool definitions before the user even says anything. That is not just a cost problem; it degrades reasoning quality. Research from both Anthropic and OpenAI confirms that model accuracy on tool selection drops measurably as the number of available tools increases.
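To make that arithmetic concrete, the sketch below estimates how many input tokens a tool set consumes. It assumes a rough heuristic of about four characters per token, which is only an approximation, and `estimateToolTokens` is a hypothetical helper rather than part of any SDK. A real tokenizer for your model would give exact counts.

```javascript
// Rough estimate of the input tokens consumed by a set of tool definitions.
// ASSUMPTION: about 4 characters per token. This is a heuristic, not a tokenizer.
var CHARS_PER_TOKEN = 4;

function estimateToolTokens(tools) {
  var total = 0;
  for (var i = 0; i < tools.length; i++) {
    // The serialized definition approximates what is sent to the model
    var serialized = JSON.stringify(tools[i]);
    total += Math.ceil(serialized.length / CHARS_PER_TOKEN);
  }
  return total;
}

// Illustrative tool definitions, much shorter than production ones
var sampleTools = [
  { name: "db_query", description: "Run a read-only SQL query.", parameters: { type: "object", properties: {} } },
  { name: "read_file", description: "Read a file by absolute path.", parameters: { type: "object", properties: {} } }
];

console.log("Estimated tool tokens:", estimateToolTokens(sampleTools));
```

Running this against your real tool list before and after trimming gives a quick sense of how much context the definitions take up.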
I have seen this firsthand. An internal agent we built had 34 tools covering database operations, file management, API calls, and deployment tasks. It worked in testing. In production, it started calling the delete_file tool when users asked to "remove a database record." The tool descriptions were too similar, and the model had too many options to reason about correctly.
The fix is not to build fewer tools. It is to be smarter about which tools the model sees at any given moment.
Designing Clear Tool Descriptions That Guide Selection
The single most impactful thing you can do is write better tool descriptions. The model reads these descriptions to decide which tool to call. Vague descriptions lead to wrong choices.
// Bad: vague, overlapping descriptions
var badTools = [
{
name: "remove_item",
description: "Removes an item from the system"
},
{
name: "delete_record",
description: "Deletes a record"
}
];
// Good: specific, disambiguated descriptions
var goodTools = [
{
name: "remove_cart_item",
description: "Removes a product from the user's shopping cart by cart_item_id. Does NOT delete the product from the catalog. Use this when the user wants to take something out of their cart before checkout."
},
{
name: "delete_database_record",
description: "Permanently deletes a record from a database table by primary key. This is irreversible. Only use this for administrative data cleanup, never for user-facing cart or order operations."
}
];
Three rules I follow for every tool description:
- State what it does AND what it does not do. Negative constraints are powerful disambiguation signals.
- Include when to use it. Give the model a decision rule, not just a definition.
- Name parameters descriptively. user_email is better than email. source_file_path is better than path.
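These rules can be checked mechanically, at least in part. The following is a minimal sketch of a lint helper. The function name, the 60-character threshold, and the list of generic parameter names are all illustrative assumptions, not established conventions.

```javascript
// Hypothetical lint helper that flags tool definitions likely to break the rules above.
// The threshold and the list of generic names are illustrative assumptions.
var GENERIC_PARAM_NAMES = ["id", "email", "path", "name", "data", "value"];

function lintToolDefinition(tool) {
  var warnings = [];
  var description = tool.description || "";

  // A very short description rarely says when to use the tool or what it does not do
  if (description.length < 60) {
    warnings.push("description is short; add when to use it and what it does not do");
  }

  // Generic parameter names give the model little to go on
  var properties = (tool.parameters && tool.parameters.properties) || {};
  var paramNames = Object.keys(properties);
  for (var i = 0; i < paramNames.length; i++) {
    if (GENERIC_PARAM_NAMES.indexOf(paramNames[i]) !== -1) {
      warnings.push("parameter '" + paramNames[i] + "' is generic; prefer a descriptive name");
    }
  }
  return warnings;
}

var lintWarnings = lintToolDefinition({
  name: "delete_record",
  description: "Deletes a record",
  parameters: { type: "object", properties: { id: { type: "string" } } }
});
console.log(lintWarnings);
```

A check like this cannot judge whether a description is actually clear, but it catches the most common omissions before a tool reaches the registry.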
Tool Categorization and Namespacing
Once you have more than 10 tools, organize them into categories. This is not just for your own sanity; it enables dynamic loading strategies we will cover next.
var toolRegistry = {
categories: {
"database": {
description: "Tools for reading and writing database records",
tools: ["db_query", "db_insert", "db_update", "db_delete"]
},
"filesystem": {
description: "Tools for reading, writing, and managing files",
tools: ["read_file", "write_file", "list_directory", "delete_file"]
},
"api": {
description: "Tools for making external HTTP API calls",
tools: ["http_get", "http_post", "graphql_query"]
},
"deployment": {
description: "Tools for deploying and managing application instances",
tools: ["deploy_service", "rollback_deploy", "check_health"]
}
}
};
Namespacing your tool names helps the model as well. Instead of query and get, use db_query and api_get. The prefix acts as a category hint that costs zero extra reasoning.
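When a tool name comes back from the model, it is often useful to know which category it belongs to. One way to do that, sketched here with a trimmed copy of the registry shape above, is to invert the category map once at startup. `buildCategoryIndex` is a hypothetical helper name.

```javascript
// Invert a category -> tools map into a tool -> category lookup.
// The registry literal mirrors the shape shown above, trimmed to two categories.
var sampleRegistry = {
  categories: {
    "database": { tools: ["db_query", "db_insert"] },
    "api": { tools: ["http_get", "http_post"] }
  }
};

function buildCategoryIndex(registry) {
  // Object.create(null) avoids accidental hits on inherited keys such as "constructor"
  var index = Object.create(null);
  var categoryNames = Object.keys(registry.categories);
  for (var i = 0; i < categoryNames.length; i++) {
    var tools = registry.categories[categoryNames[i]].tools;
    for (var j = 0; j < tools.length; j++) {
      index[tools[j]] = categoryNames[i];
    }
  }
  return index;
}

var categoryIndex = buildCategoryIndex(sampleRegistry);
console.log(categoryIndex["db_query"]);
```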
Dynamic Tool Loading Based on Context
This is where the real performance gains come from. Instead of loading all tools at the start of a conversation, you analyze the user's intent and load only the relevant category.
var toolLoader = require("./toolLoader");
function selectToolsForMessage(message, conversationHistory) {
var categories = detectRelevantCategories(message, conversationHistory);
var tools = [];
for (var i = 0; i < categories.length; i++) {
var categoryTools = toolLoader.getToolsByCategory(categories[i]);
tools = tools.concat(categoryTools);
}
// Always include core tools available in every turn
var coreTools = toolLoader.getToolsByCategory("core");
tools = coreTools.concat(tools);
return deduplicateTools(tools);
}
function detectRelevantCategories(message, history) {
var categories = [];
var lowerMessage = message.toLowerCase();
var categoryKeywords = {
"database": ["query", "database", "record", "table", "sql", "insert", "update", "delete record"],
"filesystem": ["file", "directory", "folder", "read", "write", "path", "upload"],
"api": ["api", "endpoint", "http", "request", "webhook", "url", "fetch"],
"deployment": ["deploy", "release", "rollback", "staging", "production", "health check"]
};
var keys = Object.keys(categoryKeywords);
for (var i = 0; i < keys.length; i++) {
var keywords = categoryKeywords[keys[i]];
for (var j = 0; j < keywords.length; j++) {
if (lowerMessage.indexOf(keywords[j]) !== -1) {
categories.push(keys[i]);
break;
}
}
}
// Fallback: if no categories detected, include the most common ones
if (categories.length === 0) {
categories = ["database", "filesystem"];
}
return categories;
}
Keyword matching is the simplest approach. It works surprisingly well for 80% of cases. For the other 20%, you need something smarter, which brings us to relevance scoring.
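The selection code above calls `deduplicateTools` without defining it. One possible implementation is sketched below. It assumes each tool is either a schema object of the form `{ type: "function", function: { name: ... } }` or a plain object with a `name` field, and it keeps the first occurrence of each name.

```javascript
// Read the tool name from either a schema object or a plain definition.
function getToolName(tool) {
  return tool.function ? tool.function.name : tool.name;
}

// Remove duplicate tools by name, keeping the first occurrence and the original order.
function deduplicateTools(tools) {
  // Object.create(null) avoids false positives on inherited keys such as "constructor"
  var seen = Object.create(null);
  var unique = [];
  for (var i = 0; i < tools.length; i++) {
    var name = getToolName(tools[i]);
    if (!seen[name]) {
      seen[name] = true;
      unique.push(tools[i]);
    }
  }
  return unique;
}

var dedupedTools = deduplicateTools([
  { type: "function", function: { name: "db_query" } },
  { type: "function", function: { name: "read_file" } },
  { type: "function", function: { name: "db_query" } }
]);
console.log(dedupedTools.length);
```

Because core tools are placed first in the combined list, keeping the first occurrence means a core tool is never displaced by a category copy of itself.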
Implementing a Tool Registry with Metadata
A production tool registry is more than a list of names and descriptions. Each tool needs metadata that supports dynamic loading, permission checks, and usage tracking.
function ToolRegistry() {
this.tools = {};
this.categories = {};
this.usageStats = {};
}
ToolRegistry.prototype.register = function(toolDefinition) {
var name = toolDefinition.name;
this.tools[name] = {
name: name,
description: toolDefinition.description,
parameters: toolDefinition.parameters || {},
category: toolDefinition.category || "uncategorized",
permission: toolDefinition.permission || "read",
version: toolDefinition.version || "1.0.0",
deprecated: toolDefinition.deprecated || false,
deprecatedBy: toolDefinition.deprecatedBy || null,
requires: toolDefinition.requires || [],
conflicts: toolDefinition.conflicts || [],
embedding: null,
tags: toolDefinition.tags || []
};
// Index by category
var category = this.tools[name].category;
if (!this.categories[category]) {
this.categories[category] = [];
}
this.categories[category].push(name);
// Initialize usage stats
this.usageStats[name] = {
callCount: 0,
successCount: 0,
failureCount: 0,
avgLatencyMs: 0,
lastUsed: null
};
};
ToolRegistry.prototype.getToolSchema = function(name) {
var tool = this.tools[name];
if (!tool) return null;
// Return the schema format expected by LLM APIs
return {
type: "function",
function: {
name: tool.name,
description: tool.description,
parameters: tool.parameters
}
};
};
ToolRegistry.prototype.getToolsByCategory = function(category) {
var toolNames = this.categories[category] || [];
var schemas = [];
for (var i = 0; i < toolNames.length; i++) {
var tool = this.tools[toolNames[i]];
if (!tool.deprecated) {
schemas.push(this.getToolSchema(toolNames[i]));
}
}
return schemas;
};
module.exports = ToolRegistry;
The requires and conflicts fields are critical and often overlooked. They let you enforce tool dependency chains and prevent conflicting tools from appearing together.
Tool Relevance Scoring with Embeddings
Keyword matching fails when users phrase things indirectly. "Can you check if the server is alive?" should trigger deployment tools, but "server" and "alive" might not be in your keyword list. Embedding-based relevance scoring handles this gracefully.
The idea is simple: pre-compute an embedding for each tool's description, then compare the user's message embedding against all tool embeddings. Return the top-N matches.
var https = require("https");
function EmbeddingMatcher(options) {
this.apiKey = options.apiKey;
this.model = options.model || "text-embedding-3-small";
this.threshold = options.threshold || 0.3;
this.topK = options.topK || 8;
this.cache = {};
}
EmbeddingMatcher.prototype.getEmbedding = function(text, callback) {
// Key on the full text: a truncated key would make inputs that share a prefix collide
var cacheKey = text;
if (this.cache[cacheKey]) {
return callback(null, this.cache[cacheKey]);
}
var self = this;
var postData = JSON.stringify({
model: this.model,
input: text
});
var options = {
hostname: "api.openai.com",
path: "/v1/embeddings",
method: "POST",
headers: {
"Content-Type": "application/json",
"Authorization": "Bearer " + this.apiKey
}
};
var req = https.request(options, function(res) {
var body = "";
res.on("data", function(chunk) { body += chunk; });
res.on("end", function() {
// Guard the parse: error responses are not always valid JSON
var parsed;
try {
parsed = JSON.parse(body);
} catch (e) {
return callback(e);
}
if (parsed.data && parsed.data[0]) {
var embedding = parsed.data[0].embedding;
self.cache[cacheKey] = embedding;
callback(null, embedding);
} else {
callback(new Error("No embedding returned: " + body));
}
});
});
req.on("error", callback);
req.write(postData);
req.end();
};
EmbeddingMatcher.prototype.cosineSimilarity = function(a, b) {
var dotProduct = 0;
var normA = 0;
var normB = 0;
for (var i = 0; i < a.length; i++) {
dotProduct += a[i] * b[i];
normA += a[i] * a[i];
normB += b[i] * b[i];
}
return dotProduct / (Math.sqrt(normA) * Math.sqrt(normB));
};
EmbeddingMatcher.prototype.rankTools = function(userMessage, registry, callback) {
var self = this;
var toolNames = Object.keys(registry.tools);
self.getEmbedding(userMessage, function(err, messageEmbedding) {
if (err) return callback(err);
var scores = [];
for (var i = 0; i < toolNames.length; i++) {
var tool = registry.tools[toolNames[i]];
if (tool.deprecated) continue;
if (!tool.embedding) continue;
var similarity = self.cosineSimilarity(messageEmbedding, tool.embedding);
if (similarity >= self.threshold) {
scores.push({
name: tool.name,
score: similarity,
category: tool.category
});
}
}
scores.sort(function(a, b) { return b.score - a.score; });
callback(null, scores.slice(0, self.topK));
});
};
module.exports = EmbeddingMatcher;
Pre-compute and store tool embeddings at startup. The embedding API call for each user message adds 50-100ms of latency, but the improvement in tool selection accuracy makes it worthwhile. In my testing, embedding-based selection reduced wrong-tool errors by about 60% compared to keyword matching alone.
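To see the ranking step in isolation, without any API calls, here is a sketch that uses hand-made three-dimensional vectors in place of real embeddings. The vectors are invented for illustration and real embeddings have far more dimensions. `rankByEmbedding` is a hypothetical standalone version of the ranking logic.

```javascript
function cosineSimilarity(a, b) {
  var dot = 0, normA = 0, normB = 0;
  for (var i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  var denominator = Math.sqrt(normA) * Math.sqrt(normB);
  // Guard against zero vectors, which would otherwise produce NaN
  return denominator === 0 ? 0 : dot / denominator;
}

// Toy "embeddings": invented numbers, not real model output
var toolEmbeddings = {
  check_health: [0.9, 0.1, 0.0],
  db_query: [0.1, 0.9, 0.1],
  read_file: [0.0, 0.2, 0.9]
};
// Stands in for the embedding of "Can you check if the server is alive?"
var messageEmbedding = [0.8, 0.2, 0.1];

function rankByEmbedding(messageVector, embeddings, threshold, topK) {
  var scores = [];
  var names = Object.keys(embeddings);
  for (var i = 0; i < names.length; i++) {
    var score = cosineSimilarity(messageVector, embeddings[names[i]]);
    // Drop tools below the relevance threshold
    if (score >= threshold) {
      scores.push({ name: names[i], score: score });
    }
  }
  // Highest similarity first, then keep the top K
  scores.sort(function(a, b) { return b.score - a.score; });
  return scores.slice(0, topK);
}

var rankedTools = rankByEmbedding(messageEmbedding, toolEmbeddings, 0.3, 2);
console.log(rankedTools);
```

With these toy vectors, check_health ranks first, db_query second, and read_file falls below the 0.3 threshold and is dropped.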
Limiting Tool Sets Per Conversation Turn
Even with relevance scoring, you should cap the number of tools per turn. My rule of thumb:
- 5-8 tools: Optimal. Model rarely picks wrong.
- 9-15 tools: Acceptable. Expect occasional misses.
- 16+ tools: Problematic. Error rates climb noticeably.
function selectToolsForTurn(rankedTools, registry, options) {
var maxTools = options.maxTools || 8;
var alwaysInclude = options.alwaysInclude || [];
var selected = [];
// Always include core tools first
for (var i = 0; i < alwaysInclude.length; i++) {
var schema = registry.getToolSchema(alwaysInclude[i]);
if (schema) {
selected.push(schema);
}
}
// Fill remaining slots with ranked tools
for (var j = 0; j < rankedTools.length; j++) {
if (selected.length >= maxTools) break;
var toolName = rankedTools[j].name;
// Skip if already included
var alreadyIncluded = false;
for (var k = 0; k < selected.length; k++) {
if (selected[k].function.name === toolName) {
alreadyIncluded = true;
break;
}
}
if (!alreadyIncluded) {
var schema = registry.getToolSchema(toolName);
if (schema) {
selected.push(schema);
}
}
}
return selected;
}
The alwaysInclude array is for tools that should be available on every turn regardless of context. Common examples: a "think" tool for chain-of-thought, a "done" tool to signal task completion, or an "ask_user" tool for clarification.
Tool Dependency Chains
Some tools only make sense after another tool has been called. You cannot update a database record without first querying for it. You should not deploy a service without first running a health check. Tool dependencies enforce these constraints at the registry level.
ToolRegistry.prototype.resolveDependencies = function(toolName) {
var tool = this.tools[toolName];
if (!tool) return [];
var resolved = [];
var visited = {};
function resolve(name, registry) {
if (visited[name]) return;
visited[name] = true;
var t = registry.tools[name];
if (!t) return;
// Resolve prerequisites first
for (var i = 0; i < t.requires.length; i++) {
resolve(t.requires[i], registry);
}
resolved.push(name);
}
resolve(toolName, this);
return resolved;
};
ToolRegistry.prototype.validateToolCall = function(toolName, conversationTools) {
var tool = this.tools[toolName];
if (!tool) {
return { valid: false, reason: "Tool '" + toolName + "' not found in registry" };
}
// Check that all required tools have been called previously
for (var i = 0; i < tool.requires.length; i++) {
var requiredTool = tool.requires[i];
var wasCalled = false;
for (var j = 0; j < conversationTools.length; j++) {
if (conversationTools[j].name === requiredTool && conversationTools[j].status === "success") {
wasCalled = true;
break;
}
}
if (!wasCalled) {
return {
valid: false,
reason: "Tool '" + toolName + "' requires '" + requiredTool + "' to be called first"
};
}
}
return { valid: true };
};
When validation fails, inject the error message back into the conversation so the model can self-correct. Most models handle this well: "I need to call db_query before I can call db_update. Let me look up the record first."
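As a sketch of that injection step, the helper below turns a failed validation result into a tool-result message. It assumes an OpenAI-style chat format in which tool results are sent as `{ role: "tool", tool_call_id, content }`. Other APIs use a different shape, and `buildValidationErrorMessage` is a hypothetical name.

```javascript
// Turn a failed validation into a tool-result message the model can read.
// ASSUMPTION: an OpenAI-style chat format where tool results are sent as
// { role: "tool", tool_call_id, content }. Adapt the shape for other APIs.
function buildValidationErrorMessage(toolCallId, validation) {
  return {
    role: "tool",
    tool_call_id: toolCallId,
    // The tool was not executed; tell the model why so it can pick the prerequisite
    content: JSON.stringify({ error: validation.reason, executed: false })
  };
}

// Example result in the shape returned by validateToolCall above
var failedValidation = {
  valid: false,
  reason: "Tool 'db_update' requires 'db_query' to be called first"
};
var errorMessage = buildValidationErrorMessage("call_123", failedValidation);
console.log(errorMessage.content);
```

Appending this message to the conversation in place of a real tool result lets the next model turn see the reason and choose the prerequisite tool.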
Handling Tool Conflicts and Overlapping Functionality
Tool conflicts arise when two tools can technically accomplish the same task but should not both be offered simultaneously. A common example: a write_file tool and a patch_file tool. If both appear, the model may use write_file to overwrite a file when it should have used patch_file to make a targeted edit.
ToolRegistry.prototype.filterConflicts = function(selectedTools) {
var filtered = [];
var excludedNames = {};
for (var i = 0; i < selectedTools.length; i++) {
var toolName = selectedTools[i].function.name;
var tool = this.tools[toolName];
if (excludedNames[toolName]) continue;
filtered.push(selectedTools[i]);
// Mark conflicting tools for exclusion
if (tool && tool.conflicts) {
for (var j = 0; j < tool.conflicts.length; j++) {
excludedNames[tool.conflicts[j]] = true;
}
}
}
return filtered;
};
The order matters here. Higher-priority tools (the ones that appear first in the ranked list) win the conflict resolution. If patch_file ranks higher than write_file for a given message, write_file gets excluded.
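The effect of ordering is easy to demonstrate with a standalone version of the same logic that works on plain tool names. `filterConflictsInOrder` and `conflictsByName` are hypothetical names used only for this sketch.

```javascript
// Standalone version of the conflict filter: tools earlier in the list win.
// conflictsByName maps a tool name to the names it conflicts with.
function filterConflictsInOrder(rankedNames, conflictsByName) {
  var kept = [];
  var excluded = Object.create(null);
  for (var i = 0; i < rankedNames.length; i++) {
    var name = rankedNames[i];
    if (excluded[name]) continue;
    kept.push(name);
    // Everything this tool conflicts with is excluded from here on
    var conflicts = conflictsByName[name] || [];
    for (var j = 0; j < conflicts.length; j++) {
      excluded[conflicts[j]] = true;
    }
  }
  return kept;
}

var conflictsByName = {
  write_file: ["patch_file"],
  patch_file: ["write_file"]
};

// patch_file ranks first, so write_file is dropped
var patchFirst = filterConflictsInOrder(["patch_file", "read_file", "write_file"], conflictsByName);
// write_file ranks first, so patch_file is dropped
var writeFirst = filterConflictsInOrder(["write_file", "read_file", "patch_file"], conflictsByName);
console.log(patchFirst, writeFirst);
```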
Tool Permission Levels
Not every tool should be callable in every context. A customer support agent should not have access to drop_database. A read-only analytics agent should not be able to modify records.
var PERMISSION_LEVELS = {
"read": 1,
"write": 2,
"admin": 3,
"dangerous": 4
};
function PermissionGate(options) {
this.maxPermission = PERMISSION_LEVELS[options.maxPermission] || 1;
this.requireConfirmation = options.requireConfirmation || ["dangerous"];
this.auditLog = [];
}
PermissionGate.prototype.filterByPermission = function(tools, registry) {
var self = this;
var allowed = [];
for (var i = 0; i < tools.length; i++) {
var toolName = tools[i].function.name;
var tool = registry.tools[toolName];
if (!tool) continue;
// Unknown permission strings fail closed instead of being treated as "read"
var toolPermLevel = PERMISSION_LEVELS[tool.permission] || Infinity;
if (toolPermLevel <= self.maxPermission) {
allowed.push(tools[i]);
}
}
return allowed;
};
PermissionGate.prototype.checkExecution = function(toolName, registry, userId) {
var tool = registry.tools[toolName];
if (!tool) return { allowed: false, reason: "Unknown tool" };
// Unknown permission strings fail closed instead of being treated as "read"
var toolPermLevel = PERMISSION_LEVELS[tool.permission] || Infinity;
if (toolPermLevel > this.maxPermission) {
this.auditLog.push({
timestamp: new Date().toISOString(),
tool: toolName,
userId: userId,
action: "blocked",
reason: "Insufficient permission"
});
return {
allowed: false,
reason: "Tool '" + toolName + "' requires '" + tool.permission + "' permission"
};
}
var needsConfirmation = this.requireConfirmation.indexOf(tool.permission) !== -1;
this.auditLog.push({
timestamp: new Date().toISOString(),
tool: toolName,
userId: userId,
action: needsConfirmation ? "pending_confirmation" : "allowed"
});
return {
allowed: true,
requiresConfirmation: needsConfirmation
};
};
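I recommend three permission tiers for most agents: read (queries, lookups, searches), write (create and update operations), and dangerous (delete, deploy, anything irreversible). The admin tier exists for internal tools that should never be exposed to end-user-facing agents. Because the gate compares numeric levels, a gate configured with maxPermission "dangerous" also admits admin tools, so keep admin tools out of any registry that backs an end-user-facing agent.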
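The result of checkExecution still has to be acted on. The sketch below shows one way to handle the three outcomes. `executeWithGate`, `confirm`, and `run` are hypothetical names: the two callbacks stand in for whatever confirmation prompt and tool executor your application provides.

```javascript
// Act on a permission check result shaped like the one returned by checkExecution:
// { allowed, reason, requiresConfirmation }.
// confirm and run are hypothetical callbacks supplied by the host application.
function executeWithGate(check, confirm, run) {
  if (!check.allowed) {
    return { status: "blocked", reason: check.reason };
  }
  // Only ask the user when the gate says this permission level needs it
  if (check.requiresConfirmation && !confirm()) {
    return { status: "declined", reason: "User did not confirm" };
  }
  return { status: "executed", result: run() };
}

var gateOutcome = executeWithGate(
  { allowed: true, requiresConfirmation: true },
  function() { return false; },          // the user says no
  function() { return "deployed"; }      // never reached in this example
);
console.log(gateOutcome.status);
```

The callbacks here are synchronous to keep the sketch short. A real confirmation prompt would usually be asynchronous.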
Versioning Tools Without Breaking Agent Behavior
Tools evolve. Parameters get added, descriptions change, behavior gets refined. If you swap out a tool definition between deployments, existing conversations that reference the old version can break. A versioning strategy prevents this.
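ToolRegistry.prototype.registerVersion = function(toolDefinition) {
var name = toolDefinition.name;
var version = toolDefinition.version || "1.0.0";
var versionedName = name + "_v" + version.replace(/\./g, "_");
// Carry the version history forward from the previous alias, if there is one
var previous = this.tools[name];
var allVersions = previous && previous.allVersions ? previous.allVersions.slice() : [];
allVersions.push(version);
// Register the versioned tool from a copy so the caller's definition is not mutated
this.register(Object.assign({}, toolDefinition, { name: versionedName, version: version }));
// Point the unversioned name at a copy of the latest version. Aliasing the same
// object would rename the versioned entry when the alias name is set.
this.tools[name] = Object.assign({}, this.tools[versionedName], {
name: name,
latestVersion: version,
allVersions: allVersions
});
};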
ToolRegistry.prototype.deprecateTool = function(oldName, newName) {
if (this.tools[oldName]) {
this.tools[oldName].deprecated = true;
this.tools[oldName].deprecatedBy = newName;
// Update the old tool's description to redirect
this.tools[oldName].description =
"[DEPRECATED - use " + newName + " instead] " + this.tools[oldName].description;
}
};
The deprecation approach is a soft migration. The old tool stays registered but gets marked as deprecated so it is excluded from dynamic loading. Any hardcoded references in existing conversation histories still resolve, but new conversations will use the replacement.
For breaking changes (parameter renames, behavior changes), I deploy both versions simultaneously for a transition period. The old version logs a warning and delegates to the new version internally.
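A minimal sketch of that delegation pattern follows, assuming tool handlers are plain functions that take a params object. Every name here (`makeDeprecatedHandler`, `mapParams`, the two handlers) is hypothetical.

```javascript
// Wrap a new handler so the old tool name keeps working during a transition.
// All names here are hypothetical and only illustrate the pattern.
function makeDeprecatedHandler(oldName, newName, newHandler, mapParams) {
  return function(params) {
    console.warn("Tool '" + oldName + "' is deprecated; use '" + newName + "' instead");
    // Translate old parameter names to the new ones before delegating
    return newHandler(mapParams(params));
  };
}

// The new version renamed `path` to `source_file_path`
function readFileV2(params) {
  return "reading " + params.source_file_path;
}

var readFileV1 = makeDeprecatedHandler("read_file_v1", "read_file_v2", readFileV2, function(oldParams) {
  return { source_file_path: oldParams.path };
});

console.log(readFileV1({ path: "/tmp/example.txt" }));
```

The warning gives you a log line to count, which tells you when traffic to the old version has dropped far enough to remove it.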
Measuring Tool Usage Patterns and Optimizing the Registry
You cannot optimize what you do not measure. Track every tool call, its outcome, and how it was selected.
ToolRegistry.prototype.recordUsage = function(toolName, result) {
var stats = this.usageStats[toolName];
if (!stats) return;
stats.callCount++;
stats.lastUsed = new Date().toISOString();
if (result.success) {
stats.successCount++;
} else {
stats.failureCount++;
}
// Running average latency
if (result.latencyMs) {
var total = stats.avgLatencyMs * (stats.callCount - 1) + result.latencyMs;
stats.avgLatencyMs = Math.round(total / stats.callCount);
}
};
ToolRegistry.prototype.getUsageReport = function() {
var report = [];
var toolNames = Object.keys(this.usageStats);
for (var i = 0; i < toolNames.length; i++) {
var name = toolNames[i];
var stats = this.usageStats[name];
var errorRate = stats.callCount > 0
? Math.round((stats.failureCount / stats.callCount) * 100)
: 0;
report.push({
tool: name,
calls: stats.callCount,
successRate: (100 - errorRate) + "%",
errorRate: errorRate + "%",
avgLatencyMs: stats.avgLatencyMs,
lastUsed: stats.lastUsed
});
}
// Sort by call count descending
report.sort(function(a, b) { return b.calls - a.calls; });
return report;
};
Usage data reveals important insights:
- Never-called tools should be removed. They consume tokens without adding value.
- High-error-rate tools have bad descriptions or broken implementations. Fix them.
- Frequently co-selected tools might be candidates for a single combined tool.
- Latency outliers may need caching or async execution.
I run a weekly review of our tool usage reports. We have removed about 15% of our tools over six months purely based on usage data. The agent got measurably more accurate each time.
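Part of that review can be automated. The sketch below scans a report in the shape produced by getUsageReport and flags never-called tools and tools with a high error rate. `flagTools` is a hypothetical helper and the 20% threshold is an arbitrary assumption.

```javascript
// Flag removal and review candidates from a usage report.
// Rows follow the shape produced by getUsageReport above, where errorRate is a string like "25%".
function flagTools(report, maxErrorRatePercent) {
  var flagged = { neverCalled: [], highErrorRate: [] };
  for (var i = 0; i < report.length; i++) {
    var row = report[i];
    if (row.calls === 0) {
      flagged.neverCalled.push(row.tool);
    } else if (parseInt(row.errorRate, 10) > maxErrorRatePercent) {
      // parseInt reads the leading digits and ignores the trailing "%"
      flagged.highErrorRate.push(row.tool);
    }
  }
  return flagged;
}

// Invented sample data for illustration
var sampleReport = [
  { tool: "db_query", calls: 120, errorRate: "3%" },
  { tool: "legacy_export", calls: 0, errorRate: "0%" },
  { tool: "deploy_service", calls: 10, errorRate: "40%" }
];

var flaggedTools = flagTools(sampleReport, 20);
console.log(flaggedTools);
```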
Progressive Tool Disclosure
Start the agent with a minimal tool set and add tools as the conversation develops. This mirrors how humans work: you do not lay out every tool in your workshop before starting a job.
function ProgressiveToolManager(registry) {
this.registry = registry;
this.disclosedTools = {};
this.stages = {
"initial": ["search", "ask_user", "think"],
"exploration": ["read_file", "list_directory", "db_query"],
"modification": ["write_file", "db_update", "db_insert"],
"deployment": ["run_tests", "deploy_service", "check_health"]
};
this.currentStage = "initial";
}
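ProgressiveToolManager.prototype.getToolsForStage = function() {
var stageTools = this.stages[this.currentStage] || [];
// Mark the current stage's tools as disclosed
for (var i = 0; i < stageTools.length; i++) {
this.disclosedTools[stageTools[i]] = true;
}
// Return everything disclosed so far, so tools from earlier stages stay available
var schemas = [];
var disclosedNames = Object.keys(this.disclosedTools);
for (var j = 0; j < disclosedNames.length; j++) {
var schema = this.registry.getToolSchema(disclosedNames[j]);
if (schema) {
schemas.push(schema);
}
}
return schemas;
};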
ProgressiveToolManager.prototype.advanceStage = function(toolCallHistory) {
var lastTool = toolCallHistory[toolCallHistory.length - 1];
if (!lastTool) return;
// Advance based on what tools have been used
if (this.currentStage === "initial" && this.hasUsedAny(toolCallHistory, ["search", "ask_user"])) {
this.currentStage = "exploration";
} else if (this.currentStage === "exploration" && this.hasUsedAny(toolCallHistory, ["read_file", "db_query"])) {
this.currentStage = "modification";
} else if (this.currentStage === "modification" && this.hasUsedAny(toolCallHistory, ["write_file", "db_update"])) {
this.currentStage = "deployment";
}
};
ProgressiveToolManager.prototype.hasUsedAny = function(history, toolNames) {
for (var i = 0; i < history.length; i++) {
for (var j = 0; j < toolNames.length; j++) {
if (history[i].name === toolNames[j]) return true;
}
}
return false;
};
Progressive disclosure has a psychological benefit as well. When users see an agent methodically go through search, read, modify, and deploy stages, it builds confidence that the agent is being careful rather than reckless.
Complete Working Example
Here is a complete Node.js agent tool registry that ties together categorization, dynamic loading, relevance scoring, permission levels, and usage analytics. This is a working Express.js server you can run and test.
// agent-tool-server.js
var express = require("express");
var https = require("https");
var app = express();
app.use(express.json());
// ============================================
// Tool Registry
// ============================================
function ToolRegistry() {
this.tools = {};
this.categories = {};
this.usageStats = {};
}
ToolRegistry.prototype.register = function(def) {
var name = def.name;
this.tools[name] = {
name: name,
description: def.description,
parameters: def.parameters || { type: "object", properties: {} },
category: def.category || "general",
permission: def.permission || "read",
version: def.version || "1.0.0",
deprecated: def.deprecated || false,
deprecatedBy: def.deprecatedBy || null,
requires: def.requires || [],
conflicts: def.conflicts || [],
tags: def.tags || [],
embedding: null
};
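// Index under the resolved category so tools registered without one land in "general"
var category = this.tools[name].category;
if (!this.categories[category]) {
this.categories[category] = [];
}
this.categories[category].push(name);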
this.usageStats[name] = {
callCount: 0, successCount: 0, failureCount: 0,
avgLatencyMs: 0, lastUsed: null
};
};
ToolRegistry.prototype.getSchema = function(name) {
var tool = this.tools[name];
if (!tool || tool.deprecated) return null;
return {
type: "function",
function: { name: tool.name, description: tool.description, parameters: tool.parameters }
};
};
ToolRegistry.prototype.getByCategory = function(category) {
var names = this.categories[category] || [];
var schemas = [];
for (var i = 0; i < names.length; i++) {
var schema = this.getSchema(names[i]);
if (schema) schemas.push(schema);
}
return schemas;
};
ToolRegistry.prototype.recordUsage = function(name, result) {
var s = this.usageStats[name];
if (!s) return;
s.callCount++;
s.lastUsed = new Date().toISOString();
if (result.success) { s.successCount++; } else { s.failureCount++; }
if (result.latencyMs) {
var total = s.avgLatencyMs * (s.callCount - 1) + result.latencyMs;
s.avgLatencyMs = Math.round(total / s.callCount);
}
};
// ============================================
// Permission Gate
// ============================================
var PERM_LEVELS = { "read": 1, "write": 2, "admin": 3, "dangerous": 4 };
function PermissionGate(maxLevel) {
this.maxLevel = PERM_LEVELS[maxLevel] || 1;
this.auditLog = [];
}
PermissionGate.prototype.filter = function(schemas, registry) {
var allowed = [];
for (var i = 0; i < schemas.length; i++) {
var name = schemas[i].function.name;
var tool = registry.tools[name];
if (tool && PERM_LEVELS[tool.permission] <= this.maxLevel) {
allowed.push(schemas[i]);
}
}
return allowed;
};
PermissionGate.prototype.check = function(toolName, registry, userId) {
var tool = registry.tools[toolName];
if (!tool) return { allowed: false, reason: "Unknown tool: " + toolName };
// Unknown permission strings fail closed, matching the behavior of filter() above
var level = PERM_LEVELS[tool.permission] || Infinity;
var allowed = level <= this.maxLevel;
this.auditLog.push({
timestamp: new Date().toISOString(),
tool: toolName, userId: userId,
action: allowed ? "allowed" : "blocked"
});
return { allowed: allowed, reason: allowed ? null : "Insufficient permission" };
};
// ============================================
// Embedding-Based Tool Selector
// ============================================
function ToolSelector(apiKey) {
this.apiKey = apiKey;
this.embeddingCache = {};
}
ToolSelector.prototype.getEmbedding = function(text, callback) {
// Key on the full text: a truncated key would make inputs that share a prefix collide
var key = text;
if (this.embeddingCache[key]) return callback(null, this.embeddingCache[key]);
var self = this;
var postData = JSON.stringify({ model: "text-embedding-3-small", input: text });
var opts = {
hostname: "api.openai.com", path: "/v1/embeddings", method: "POST",
headers: {
"Content-Type": "application/json",
"Authorization": "Bearer " + self.apiKey
}
};
var req = https.request(opts, function(res) {
var body = "";
res.on("data", function(c) { body += c; });
res.on("end", function() {
try {
var parsed = JSON.parse(body);
if (parsed.data && parsed.data[0]) {
self.embeddingCache[key] = parsed.data[0].embedding;
callback(null, parsed.data[0].embedding);
} else {
callback(new Error("Embedding API error: " + body));
}
} catch (e) {
callback(e);
}
});
});
req.on("error", callback);
req.write(postData);
req.end();
};
ToolSelector.prototype.cosine = function(a, b) {
var dot = 0, na = 0, nb = 0;
for (var i = 0; i < a.length; i++) {
dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i];
}
return dot / (Math.sqrt(na) * Math.sqrt(nb));
};
ToolSelector.prototype.precomputeEmbeddings = function(registry, callback) {
var self = this;
var names = Object.keys(registry.tools);
var pending = names.length;
if (pending === 0) return callback();
for (var i = 0; i < names.length; i++) {
(function(name) {
var tool = registry.tools[name];
var text = name + ": " + tool.description + " " + tool.tags.join(" ");
self.getEmbedding(text, function(err, embedding) {
if (!err && embedding) {
tool.embedding = embedding;
}
pending--;
if (pending === 0) callback();
});
})(names[i]);
}
};
ToolSelector.prototype.rank = function(message, registry, maxResults, callback) {
var self = this;
self.getEmbedding(message, function(err, msgEmb) {
if (err) return callback(err);
var scores = [];
var names = Object.keys(registry.tools);
for (var i = 0; i < names.length; i++) {
var tool = registry.tools[names[i]];
if (tool.deprecated || !tool.embedding) continue;
var score = self.cosine(msgEmb, tool.embedding);
if (score > 0.25) {
scores.push({ name: names[i], score: score });
}
}
scores.sort(function(a, b) { return b.score - a.score; });
callback(null, scores.slice(0, maxResults || 8));
});
};
// ============================================
// Register Tools
// ============================================
var registry = new ToolRegistry();
registry.register({
name: "search_documents",
description: "Search indexed documents by keyword or semantic query. Returns document titles, snippets, and relevance scores. Use this to find information before answering questions.",
category: "retrieval",
permission: "read",
tags: ["search", "query", "documents", "knowledge"],
parameters: {
type: "object",
properties: {
query: { type: "string", description: "Search query" },
limit: { type: "number", description: "Max results (default 10)" }
},
required: ["query"]
}
});
registry.register({
name: "read_file",
description: "Read the contents of a file by its absolute path. Returns the file content as a string. Use this to examine source code, configs, or data files.",
category: "filesystem",
permission: "read",
tags: ["file", "read", "source code", "content"],
parameters: {
type: "object",
properties: {
path: { type: "string", description: "Absolute file path" }
},
required: ["path"]
}
});
registry.register({
name: "write_file",
description: "Write content to a file, creating it if it does not exist or overwriting if it does. Use this for creating new files or completely replacing file contents. For partial edits, use patch_file instead.",
category: "filesystem",
permission: "write",
requires: ["read_file"],
conflicts: ["patch_file"],
tags: ["file", "write", "create", "overwrite"],
parameters: {
type: "object",
properties: {
path: { type: "string", description: "Absolute file path" },
content: { type: "string", description: "File content to write" }
},
required: ["path", "content"]
}
});
registry.register({
name: "patch_file",
description: "Apply a targeted edit to an existing file by replacing a specific string. Use this for modifying part of a file without rewriting the whole thing. For full rewrites, use write_file instead.",
category: "filesystem",
permission: "write",
requires: ["read_file"],
conflicts: ["write_file"],
tags: ["file", "edit", "patch", "modify"],
parameters: {
type: "object",
properties: {
path: { type: "string", description: "Absolute file path" },
old_text: { type: "string", description: "Text to find" },
new_text: { type: "string", description: "Replacement text" }
},
required: ["path", "old_text", "new_text"]
}
});
registry.register({
name: "db_query",
description: "Execute a read-only SQL SELECT query against the application database. Returns rows as JSON. Cannot execute INSERT, UPDATE, DELETE, or DDL statements.",
category: "database",
permission: "read",
tags: ["database", "query", "sql", "select", "read"],
parameters: {
type: "object",
properties: {
sql: { type: "string", description: "SQL SELECT statement" },
params: { type: "array", description: "Parameterized query values", items: { type: "string" } }
},
required: ["sql"]
}
});
registry.register({
name: "db_execute",
description: "Execute a write SQL statement (INSERT, UPDATE, DELETE) against the application database. Requires db_query to have been called first to verify the data you are modifying.",
category: "database",
permission: "write",
requires: ["db_query"],
tags: ["database", "write", "insert", "update", "delete"],
parameters: {
type: "object",
properties: {
sql: { type: "string", description: "SQL write statement" },
params: { type: "array", description: "Parameterized query values", items: { type: "string" } }
},
required: ["sql"]
}
});
registry.register({
name: "deploy_service",
description: "Deploy an application to the specified environment (staging or production). This is a destructive operation that replaces the running version. Always run check_health after deploying.",
category: "deployment",
permission: "dangerous",
requires: ["run_tests"],
tags: ["deploy", "release", "production", "staging"],
parameters: {
type: "object",
properties: {
service: { type: "string", description: "Service name" },
environment: { type: "string", enum: ["staging", "production"] },
version: { type: "string", description: "Version tag to deploy" }
},
required: ["service", "environment", "version"]
}
});
registry.register({
name: "run_tests",
description: "Run the test suite for a specified service. Returns pass/fail results and coverage. Use this before deploying to verify code quality.",
category: "deployment",
permission: "read",
tags: ["test", "testing", "verify", "ci"],
parameters: {
type: "object",
properties: {
service: { type: "string", description: "Service name" },
suite: { type: "string", description: "Test suite (unit, integration, e2e)" }
},
required: ["service"]
}
});
// ============================================
// API Routes
// ============================================
var gate = new PermissionGate("write");
var selector = new ToolSelector(process.env.OPENAI_API_KEY);
// Get tools for a message (dynamic selection)
app.post("/agent/select-tools", function(req, res) {
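var message = req.body.message;
if (typeof message !== "string" || message.length === 0) {
return res.status(400).json({ error: "message is required" });
}
// Cap the requested tool count so a client cannot ask for the whole registry
var maxTools = Math.min(parseInt(req.body.maxTools, 10) || 8, 12);
var permLevel = req.body.permissionLevel || "write";
var userGate = new PermissionGate(permLevel);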
selector.rank(message, registry, maxTools + 4, function(err, ranked) {
if (err) {
// Fallback to keyword matching
var schemas = keywordSelect(message, registry);
var filtered = userGate.filter(schemas, registry);
return res.json({ tools: filtered, method: "keyword_fallback" });
}
var schemas = [];
for (var i = 0; i < ranked.length; i++) {
var schema = registry.getSchema(ranked[i].name);
if (schema) schemas.push(schema);
}
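var filtered = userGate.filter(schemas, registry).slice(0, maxTools);
// Report scores only for the tools that survived permission filtering
var returned = {};
for (var k = 0; k < filtered.length; k++) {
returned[filtered[k].function.name] = true;
}
var scores = ranked.filter(function(r) { return returned[r.name]; });
res.json({
tools: filtered,
method: "embedding",
scores: scores
});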
});
});
// Record tool usage
app.post("/agent/record-usage", function(req, res) {
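var toolName = req.body.tool;
// Reject unknown tools so typos do not create phantom analytics entries
if (!registry.tools[toolName]) {
return res.status(404).json({ recorded: false, reason: "Unknown tool: " + toolName });
}
var result = {
success: req.body.success === true,
latencyMs: Number(req.body.latencyMs) || 0
};
registry.recordUsage(toolName, result);
res.json({ recorded: true });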
});
// Validate tool call (check permission, then dependencies)
app.post("/agent/validate", function(req, res) {
var toolName = req.body.tool;
var history = req.body.toolCallHistory || [];
var userId = req.body.userId || "anonymous";
// Check permission
var permResult = gate.check(toolName, registry, userId);
if (!permResult.allowed) {
return res.status(403).json(permResult);
}
// Check dependencies
var tool = registry.tools[toolName];
if (tool && tool.requires.length > 0) {
for (var i = 0; i < tool.requires.length; i++) {
var requiredTool = tool.requires[i];
var wasCalled = false;
for (var j = 0; j < history.length; j++) {
if (history[j].name === requiredTool && history[j].success) {
wasCalled = true;
break;
}
}
if (!wasCalled) {
return res.status(400).json({
valid: false,
reason: "Tool '" + toolName + "' requires '" + requiredTool + "' to succeed first"
});
}
}
}
res.json({ valid: true });
});
// Usage analytics endpoint
app.get("/agent/analytics", function(req, res) {
var names = Object.keys(registry.usageStats);
var report = [];
for (var i = 0; i < names.length; i++) {
var s = registry.usageStats[names[i]];
report.push({
tool: names[i],
calls: s.callCount,
errorRate: s.callCount > 0
? Math.round((s.failureCount / s.callCount) * 100) + "%"
: "0%",
avgLatencyMs: s.avgLatencyMs,
lastUsed: s.lastUsed
});
}
report.sort(function(a, b) { return b.calls - a.calls; });
res.json(report);
});
// Keyword fallback selector
function keywordSelect(message, registry) {
var lower = message.toLowerCase();
var categoryKeywords = {
"retrieval": ["search", "find", "look up", "knowledge"],
"filesystem": ["file", "directory", "folder", "read", "write", "edit"],
"database": ["database", "query", "sql", "record", "table"],
"deployment": ["deploy", "release", "test", "staging", "production"]
};
var matchedCategories = [];
var keys = Object.keys(categoryKeywords);
for (var i = 0; i < keys.length; i++) {
var words = categoryKeywords[keys[i]];
for (var j = 0; j < words.length; j++) {
if (lower.indexOf(words[j]) !== -1) {
matchedCategories.push(keys[i]);
break;
}
}
}
if (matchedCategories.length === 0) {
matchedCategories = ["retrieval"];
}
var schemas = [];
for (var k = 0; k < matchedCategories.length; k++) {
schemas = schemas.concat(registry.getByCategory(matchedCategories[k]));
}
return schemas;
}
// ============================================
// Startup
// ============================================
var PORT = process.env.PORT || 3500;
// Precompute embeddings, then start server
console.log("Precomputing tool embeddings...");
selector.precomputeEmbeddings(registry, function() {
console.log("Embeddings ready for " + Object.keys(registry.tools).length + " tools");
app.listen(PORT, function() {
console.log("Agent tool server running on port " + PORT);
});
});
Test it with curl:
# Select tools for a database-related message
curl -X POST http://localhost:3500/agent/select-tools \
-H "Content-Type: application/json" \
-d '{"message": "Show me all users who signed up last month", "maxTools": 5}'
# Validate a tool call with dependency check
curl -X POST http://localhost:3500/agent/validate \
-H "Content-Type: application/json" \
-d '{"tool": "db_execute", "toolCallHistory": [], "userId": "user123"}'
# Returns: {"valid": false, "reason": "Tool 'db_execute' requires 'db_query' to succeed first"}
# Record tool usage
curl -X POST http://localhost:3500/agent/record-usage \
-H "Content-Type: application/json" \
-d '{"tool": "db_query", "success": true, "latencyMs": 45}'
# View analytics
curl http://localhost:3500/agent/analytics
Expected output for the validation endpoint:
{
"valid": false,
"reason": "Tool 'db_execute' requires 'db_query' to succeed first"
}
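The dependency check inside the `/agent/validate` route can also be exercised without going through HTTP. The following is a minimal sketch, not the route's actual code: `findUnmetDependency` is a hypothetical helper name, and it assumes history entries have the `{ name, success }` shape the route reads. Unlike the route, it treats only `success === true` as a successful call.

```javascript
// Hypothetical helper: returns the first unmet dependency, or null if all are met.
// `tool` is a registry entry; `history` is an array of { name, success } records.
function findUnmetDependency(tool, history) {
  // Tools registered without a `requires` list are treated as having no dependencies
  var requires = (tool && tool.requires) || [];
  for (var i = 0; i < requires.length; i++) {
    var met = history.some(function(call) {
      return call.name === requires[i] && call.success === true;
    });
    if (!met) return requires[i];
  }
  return null;
}

var dbExecute = { name: "db_execute", requires: ["db_query"] };

// No prior calls: db_query is unmet
console.log(findUnmetDependency(dbExecute, []));
// A failed db_query call does not satisfy the dependency
console.log(findUnmetDependency(dbExecute, [{ name: "db_query", success: false }]));
// A successful db_query call does
console.log(findUnmetDependency(dbExecute, [{ name: "db_query", success: true }]));
```

Keeping the check in a pure function like this makes it easy to unit test, and it lets you reuse the same logic if you later track call history on the server instead of trusting the client-supplied `toolCallHistory`.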
Common Issues and Troubleshooting
1. Model Ignores Available Tools and Responds with Text
User: "Deploy the service to staging"
Agent: "To deploy the service, you would typically use a deployment tool..."
// Expected: agent calls deploy_service tool
This happens when tool descriptions do not match the user's phrasing closely enough, or when you have too many tools and the model decides it is safer to explain rather than act. Fix it by adding explicit trigger phrases to tool descriptions: "Use this when the user asks to deploy, release, ship, push, or launch a service."
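If you maintain many tools, it can help to generate the trigger sentence rather than hand-write it each time. Here is a small sketch; `withTriggerPhrases` is a hypothetical helper, not part of the registry shown earlier, and the sentence template is only one possible wording.

```javascript
// Hypothetical helper: appends a trigger-phrase sentence to a tool description.
// `verbs` are the user phrasings to match; `object` is what they act on.
function withTriggerPhrases(description, verbs, object) {
  if (!verbs || verbs.length === 0) return description;
  var list;
  if (verbs.length === 1) {
    list = verbs[0];
  } else if (verbs.length === 2) {
    list = verbs[0] + " or " + verbs[1];
  } else {
    list = verbs.slice(0, -1).join(", ") + ", or " + verbs[verbs.length - 1];
  }
  return description + " Use this when the user asks to " + list + " " + object + ".";
}

var deployDescription = withTriggerPhrases(
  "Deploy an application to the specified environment (staging or production).",
  ["deploy", "release", "ship", "push", "launch"],
  "a service"
);
console.log(deployDescription);
```

Generating the sentence keeps the phrasing consistent across tools, which makes it easier to compare descriptions when two tools start competing for the same requests.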
2. Embedding API Rate Limits During Precomputation
Error: 429 Too Many Requests
{"error": {"message": "Rate limit reached for text-embedding-3-small", "type": "rate_limit_error"}}
When precomputing embeddings for 30+ tools at startup, you can hit the embeddings API rate limit. Add a sequential queue with delays between calls:
function precomputeSequential(names, index, registry, selector, callback) {
if (index >= names.length) return callback();
var tool = registry.tools[names[index]];
var text = names[index] + ": " + tool.description;
selector.getEmbedding(text, function(err, embedding) {
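if (!err && embedding) {
tool.embedding = embedding;
} else {
// Surface failures so a tool is not silently left without an embedding
console.warn("Embedding failed for " + names[index] + ": " + (err ? err.message : "empty response"));
}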
// 100ms delay between calls
setTimeout(function() {
precomputeSequential(names, index + 1, registry, selector, callback);
}, 100);
});
}
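A fixed delay lowers the request rate but does not recover from a 429 that still gets through. One option is to retry rate-limited calls with exponential backoff. The sketch below makes two assumptions that you should check against your own client: that the error object carries a numeric `status` field, and that the wrapped task uses a Node-style callback. `retryOnRateLimit` and `backoffDelay` are hypothetical names.

```javascript
// Delay before retry number `attempt` (0-based): baseMs * 2^attempt, capped at maxMs.
function backoffDelay(attempt, baseMs, maxMs) {
  return Math.min(baseMs * Math.pow(2, attempt), maxMs);
}

// Hypothetical wrapper: re-runs `task(cb)` while the error has status 429 (assumed shape).
// `schedule` defaults to setTimeout; it is injectable so the logic can run without waiting.
function retryOnRateLimit(task, maxRetries, callback, schedule) {
  schedule = schedule || setTimeout;
  var attempt = 0;
  function run() {
    task(function(err, result) {
      var rateLimited = err && err.status === 429;
      if (rateLimited && attempt < maxRetries) {
        var delay = backoffDelay(attempt, 500, 8000);
        attempt++;
        return schedule(run, delay);
      }
      // Success, a non-429 error, or retries exhausted: hand the outcome back
      callback(err, result);
    });
  }
  run();
}

// Demo with a fake task that is rate limited twice, then succeeds.
var calls = 0;
function flakyTask(cb) {
  calls++;
  if (calls < 3) return cb({ status: 429 });
  cb(null, [0.1, 0.2]);
}

var delays = [];
var outcome = null;
retryOnRateLimit(flakyTask, 4, function(err, result) {
  outcome = { err: err, result: result };
}, function(fn, ms) { delays.push(ms); fn(); }); // immediate scheduler, for the demo only

console.log(delays, outcome.result);
```

In the precomputation loop you would wrap the `selector.getEmbedding(text, cb)` call as the task and leave the `schedule` argument off so real timers are used.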
3. Circular Tool Dependencies Cause Infinite Loops
Error: Maximum call stack size exceeded
at ToolRegistry.resolveDependencies (tool-registry.js:45)
If tool A requires tool B and tool B requires tool A, the dependency resolver recurses until the call stack overflows. Add cycle detection to your resolution logic:
ToolRegistry.prototype.resolveDependencies = function(toolName) {
var resolved = [];
var visited = {};
var inStack = {};
function resolve(name, registry) {
if (inStack[name]) {
throw new Error("Circular dependency detected: " + name);
}
if (visited[name]) return;
visited[name] = true;
inStack[name] = true;
var tool = registry.tools[name];
if (tool) {
for (var i = 0; i < tool.requires.length; i++) {
resolve(tool.requires[i], registry);
}
}
inStack[name] = false;
resolved.push(name);
}
resolve(toolName, this);
return resolved;
};
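Throwing at resolution time means you only find the cycle when a request happens to hit it. A complementary approach is to scan the whole registry once at startup and report the offending path. This is a sketch under the assumption that the registry's tools are a plain object mapping each name to an entry with an optional `requires` array; `findDependencyCycle` is a hypothetical name.

```javascript
// Hypothetical startup check: returns the first dependency cycle as an array
// of tool names (for example ["a", "b", "a"]), or null if there is none.
function findDependencyCycle(tools) {
  var state = Object.create(null); // 1 = on the current path, 2 = fully explored
  var path = [];

  function visit(name) {
    if (state[name] === 2) return null;
    if (state[name] === 1) {
      // Seen again while still on the path: slice out the loop and close it
      return path.slice(path.indexOf(name)).concat(name);
    }
    state[name] = 1;
    path.push(name);
    var requires = (tools[name] && tools[name].requires) || [];
    for (var i = 0; i < requires.length; i++) {
      var cycle = visit(requires[i]);
      if (cycle) return cycle;
    }
    path.pop();
    state[name] = 2;
    return null;
  }

  var names = Object.keys(tools);
  for (var i = 0; i < names.length; i++) {
    var found = visit(names[i]);
    if (found) return found;
  }
  return null;
}

var healthy = {
  deploy_service: { requires: ["run_tests"] },
  run_tests: { requires: [] }
};
var broken = {
  a: { requires: ["b"] },
  b: { requires: ["c"] },
  c: { requires: ["a"] }
};

console.log(findDependencyCycle(healthy));
console.log(findDependencyCycle(broken));
```

Calling this right after the last `registry.register` call and refusing to start when it returns a path turns a production stack overflow into a boot-time configuration error.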
4. Permission Gate Blocks Tools Silently
POST /agent/select-tools
Response: {"tools": [], "method": "embedding"}
// All tools filtered out because permissionLevel was "read" and all matched tools were "write"
When the permission gate filters out every matched tool, the agent gets an empty tool list and falls back to plain text responses. Always ensure at least one read-level tool exists in every category, and log when permission filtering removes tools so you can diagnose it:
PermissionGate.prototype.filter = function(schemas, registry) {
var allowed = [];
var blocked = [];
for (var i = 0; i < schemas.length; i++) {
var name = schemas[i].function.name;
var tool = registry.tools[name];
if (tool && PERM_LEVELS[tool.permission] <= this.maxLevel) {
allowed.push(schemas[i]);
} else {
blocked.push(name);
}
}
if (blocked.length > 0) {
console.warn("Permission gate blocked tools: " + blocked.join(", "));
}
return allowed;
};
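Logging helps you diagnose the problem after the fact, but the caller still sees only an empty list. A variant that returns the blocked names lets the route include them in its response. The sketch below is illustrative: `filterWithReport` is a hypothetical function, and `PERMISSION_ORDER` is an assumed ordering that may not match the values in your `PERM_LEVELS` table.

```javascript
// Assumed ordering for this sketch; check it against your own PERM_LEVELS.
var PERMISSION_ORDER = { read: 0, write: 1, dangerous: 2 };

// Hypothetical variant of the filter that reports what was blocked instead of only logging it.
// `schemas` use the { function: { name } } shape; `tools` maps name -> { permission }.
function filterWithReport(schemas, tools, maxPermission) {
  // An unknown maxPermission yields undefined, so every comparison fails and all tools are blocked
  var maxLevel = PERMISSION_ORDER[maxPermission];
  var report = { allowed: [], blocked: [] };
  for (var i = 0; i < schemas.length; i++) {
    var name = schemas[i].function.name;
    var tool = tools[name];
    if (tool && PERMISSION_ORDER[tool.permission] <= maxLevel) {
      report.allowed.push(schemas[i]);
    } else {
      report.blocked.push(name);
    }
  }
  return report;
}

var sampleTools = {
  db_query: { permission: "read" },
  db_execute: { permission: "write" },
  deploy_service: { permission: "dangerous" }
};
var sampleSchemas = Object.keys(sampleTools).map(function(name) {
  return { type: "function", function: { name: name } };
});

var report = filterWithReport(sampleSchemas, sampleTools, "read");
console.log(report.allowed.length, report.blocked);
```

Returning something like `blockedByPermission: report.blocked` alongside `tools` gives the agent loop enough information to tell the user that a tool exists but is not available at their permission level.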
5. Stale Embeddings After Tool Description Updates
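If you update a tool's description but do not recompute its embedding, relevance scoring keeps using the old semantic representation. Clear the cached embedding whenever the description changes, and make sure your selector recomputes any missing embedding before the next ranking: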
ToolRegistry.prototype.updateDescription = function(name, newDescription) {
if (this.tools[name]) {
this.tools[name].description = newDescription;
this.tools[name].embedding = null; // Force recomputation
}
};
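Clearing the embedding only works if every description change goes through `updateDescription`. If descriptions can also change through direct assignment or a config reload, one alternative is to store a hash of the embedded text next to the embedding and compare it before use. This is a sketch: the `embeddingHash` field and the helper names are additions for illustration, and it assumes the embedded text is the tool name, a colon, and the description.

```javascript
var nodeCrypto = require("crypto");

// Fingerprint of the exact text that gets embedded (assumed: name + ": " + description).
function embeddingFingerprint(tool) {
  return nodeCrypto.createHash("sha256")
    .update(tool.name + ": " + tool.description)
    .digest("hex");
}

// Store the embedding together with the fingerprint of the text it was computed from.
// `embeddingHash` is a hypothetical field added for this sketch.
function storeEmbedding(tool, embedding) {
  tool.embedding = embedding;
  tool.embeddingHash = embeddingFingerprint(tool);
}

// An embedding is stale if it is missing or was computed from different text.
function isEmbeddingStale(tool) {
  return !tool.embedding || tool.embeddingHash !== embeddingFingerprint(tool);
}

var sampleTool = { name: "db_query", description: "Execute a read-only SQL SELECT query." };
console.log(isEmbeddingStale(sampleTool)); // true: nothing stored yet

storeEmbedding(sampleTool, [0.1, 0.2, 0.3]);
console.log(isEmbeddingStale(sampleTool)); // false: hash matches the current text

sampleTool.description = "Run a read-only SELECT against the application database.";
console.log(isEmbeddingStale(sampleTool)); // true: description changed after embedding
```

The hash costs far less than an embedding call, so checking it on every ranking request is cheap, and it also catches edits made outside the registry's own methods.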
Best Practices
Start with fewer tools and add as needed. Five well-described tools will outperform twenty mediocre ones. Measure tool selection accuracy before expanding the registry.
Write tool descriptions for the model, not for humans. Include trigger phrases, negative constraints ("do NOT use this for X"), and explicit decision rules. The description is the model's only guide.
Always cap the number of tools per turn. Eight is a good default. Even if your registry has 50 tools, the model should never see more than 10-12 at once.
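Enforce tool dependencies at the validation layer, not in the prompt. Telling the model "call A before B" in the system prompt is unreliable. Programmatic validation that rejects invalid sequences does not depend on the model following instructions, as long as the call history it checks comes from a source you trust.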
Log every tool call with its selection method, latency, and outcome. This data is more valuable than you think. It reveals which tools need better descriptions, which are never used, and where the model struggles.
Version your tool definitions and support graceful deprecation. Do not rename or change tool parameters without a migration period. Existing conversations and cached prompts will reference the old names.
Use embedding-based selection as the primary method with keyword matching as a fallback. Embeddings handle indirect phrasing that keywords miss, but they require an API call. Having both paths means your agent works even when the embedding API is down.
Separate read tools from write tools in your permission model. An agent should always be able to look things up. Write access should be gated by user role, and dangerous operations should require explicit confirmation.
Test tool selection separately from tool execution. Write unit tests that verify the correct tools are selected for representative user messages. This catches regressions when you update descriptions or add new tools.
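Pre-compute tool embeddings at startup, not at request time. The 50-100ms per embedding call is acceptable during a one-time boot sequence, but embedding every tool description on every user message adds unacceptable latency. At request time, only the user message itself should need an embedding call.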
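To make the advice about testing tool selection separately from execution concrete, here is a minimal table-driven sketch using Node's built-in `assert` module. `selectCategories` is a simplified stand-in written for this example; in a real suite you would call your own keyword selector or ranking function instead, and the test messages are only illustrative.

```javascript
var nodeAssert = require("assert");

// Simplified stand-in for the selector under test (hypothetical, for illustration only).
function selectCategories(message) {
  var lower = message.toLowerCase();
  var categoryKeywords = {
    filesystem: ["file", "directory", "folder"],
    database: ["database", "query", "sql", "record", "table"],
    deployment: ["deploy", "release", "staging", "production"]
  };
  return Object.keys(categoryKeywords).filter(function(category) {
    return categoryKeywords[category].some(function(word) {
      return lower.indexOf(word) !== -1;
    });
  });
}

// Representative user messages paired with the categories they should select.
var selectionCases = [
  { message: "Remove a database record", expected: ["database"] },
  { message: "Deploy the service to staging", expected: ["deployment"] },
  { message: "Edit the config file", expected: ["filesystem"] }
];

selectionCases.forEach(function(testCase) {
  nodeAssert.deepStrictEqual(
    selectCategories(testCase.message),
    testCase.expected,
    "Wrong selection for: " + testCase.message
  );
});
console.log(selectionCases.length + " selection cases passed");
```

The first case mirrors the failure described at the start of this article: a request to remove a database record should never pull in filesystem tools. Adding each production misfire to the table as a new case keeps it from coming back when descriptions change.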
References
- Anthropic Tool Use Documentation - Official guide to tool use with Claude, including best practices for tool descriptions
- OpenAI Function Calling Guide - OpenAI's documentation on function calling patterns and tool schemas
- OpenAI Embeddings API - Reference for the text-embedding-3-small model used in relevance scoring
- Gorilla: Large Language Model Connected with Massive APIs - Research paper on LLM tool selection accuracy at scale
- ToolBench: Large-Scale API Retrieval and Planning - Benchmark dataset and evaluation for tool retrieval and selection strategies
- ReAct: Synergizing Reasoning and Acting in Language Models - Foundational paper on agent reasoning with tool use