Why I Stopped Building Bigger Agents and Started Building Smarter Systems
A Node.js engineer explains why single-agent AI systems fail at scale and builds a multi-agent orchestrator with smart routing to fix it.
Every AI agent I built followed the same arc. It started clean — a focused system prompt, two or three tools, solid results. Then someone asked me to add something. Then someone else. Then a third request came in, and before I knew it, my "simple agent" had a system prompt the size of a short story, a dozen tools jammed into the function-calling context, and a model that spent more time figuring out which hat to wear than actually doing useful work.
I call this the single-agent ceiling, and every team building production AI systems hits it eventually.
The symptoms are predictable. Token costs creep up because every request carries instructions for domains it does not need. Response quality degrades because the model's attention is diluted across irrelevant context. The system prompt becomes fragile — editing instructions for one capability breaks behavior in another. Testing becomes a nightmare because you cannot isolate anything.
The answer is not a bigger model or a longer context window. The answer is multiple specialized agents coordinated by a router that knows which one to call.

The Multi-Agent Pattern
The idea is simple. Instead of one agent that does everything, you build several agents that each do one thing well. A research agent that searches and cites. A code agent that reviews, generates, and runs code in a sandbox. A data agent that computes statistics. A writing agent that drafts and edits content. Each has its own system prompt, its own tools, and its own temperature setting tuned to its domain.
In front of them sits a router — itself an LLM call — that classifies incoming requests and picks the right specialist. The router sees a code review request and sends it to the code agent. It sees a writing request and sends it to the writing agent. Each specialist only pays the token cost for its own domain.
const orchestrator = new Orchestrator({
  agents: createAllAgents(),
});

// The router picks the right agent automatically
const result = await orchestrator.process(
  'Review this function for bugs: function add(a, b) { return a - b }'
);
// → Routed to: code (95% confidence)
This is the router-delegate pattern, and it is what I built for a new book and open-source project.
What I Actually Built
The companion repository at github.com/grizzlypeaksoftware/multi-agent-orchestrator is a complete multi-agent orchestrator in Node.js. Not a framework. Not an abstraction layer. A working system you can clone, run, and modify.
It is pure JavaScript — ES Modules, no TypeScript, no build step. It uses the OpenAI SDK, which works with any OpenAI-compatible provider: OpenAI itself, Azure, Ollama, LM Studio, Groq, Together AI. Set one environment variable and you are talking to a different model.
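As a rough sketch of what that looks like in practice, here is how you might point the stock OpenAI Node SDK at a local Ollama server instead of OpenAI. The variable names below are the SDK's own defaults; the repository's .env.example may use different names, so check it before copying this.

```shell
# Point the OpenAI SDK at a local Ollama server instead of api.openai.com.
# OPENAI_BASE_URL and OPENAI_API_KEY are the SDK's default env vars;
# the repo's .env.example may define its own.
export OPENAI_BASE_URL="http://localhost:11434/v1"
export OPENAI_API_KEY="ollama"   # Ollama accepts any non-empty key
```

The same two variables swap you over to Azure, Groq, Together AI, or LM Studio: change the base URL and key, restart, and nothing else in the code moves.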
The whole thing has two dependencies: openai and dotenv. That is it.
Here is what the architecture looks like:
Router. One LLM call that classifies the request, picks an agent, and returns a confidence score. Temperature 0.1 for deterministic routing. Falls back gracefully when parsing fails.
Four specialist agents. Each extends a BaseAgent class that encapsulates the agent loop — send messages, handle tool calls, iterate until done, track tokens. The agents:
- Research — web search and citation (temperature 0.3)
- Code — review, generation, sandboxed execution via the Node.js vm module (temperature 0.2)
- Data — numerical analysis with validated math expressions (temperature 0.1)
- Writing — content drafting with word count verification (temperature 0.7)
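The agent loop that BaseAgent encapsulates is worth seeing in miniature. This is a simplified sketch, not the repository's actual code: runAgentLoop, runTool, and maxIterations are illustrative names, and the client is injected so the loop works with any OpenAI-compatible SDK.

```javascript
// Simplified agent loop: send messages, execute any tool calls the model
// requests, feed results back, and repeat until the model answers in
// plain text or the iteration limit is hit. Tracks token usage as it goes.
async function runAgentLoop(client, { model, tools, runTool }, messages, maxIterations = 5) {
  let totalTokens = 0;

  for (let i = 0; i < maxIterations; i++) {
    const response = await client.chat.completions.create({ model, tools, messages });
    totalTokens += response.usage?.total_tokens ?? 0;

    const msg = response.choices[0].message;
    messages.push(msg);

    // No tool calls: the model produced its final answer.
    if (!msg.tool_calls?.length) {
      return { content: msg.content, totalTokens };
    }

    // Execute each requested tool and append the result as a tool message.
    for (const call of msg.tool_calls) {
      const result = await runTool(call.function.name, JSON.parse(call.function.arguments));
      messages.push({ role: 'tool', tool_call_id: call.id, content: String(result) });
    }
  }

  return { content: '(iteration limit reached)', totalTokens };
}
```

Each specialist is then just this loop plus its own system prompt, tool set, and temperature.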
Memory. Two backends — in-memory for development, SQLite with WAL mode for persistence. A summarizer compresses long conversation histories using an LLM call, keeping recent messages intact and condensing older ones into bullet points.
Observability. The orchestrator extends EventEmitter. It emits events for every routing decision, every agent start and completion, every low-confidence warning. A Metrics class tracks counters, histograms, and gauges. A Tracer class creates request-level spans with timing data. You can see exactly which agent handled each request, how long it took, and how many tokens it consumed.
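Because the orchestrator is an EventEmitter, wiring up observability is ordinary Node.js event handling. The sketch below uses a toy stand-in class and hypothetical event names ('route', 'agent:start', 'agent:complete'); check the repository for the real ones.

```javascript
import { EventEmitter } from 'node:events';

// Toy stand-in for the orchestrator, emitting illustrative events.
// The real event names and payloads live in the repository.
class DemoOrchestrator extends EventEmitter {
  async process(message) {
    this.emit('route', { agent: 'code', confidence: 0.95 });
    this.emit('agent:start', { agent: 'code' });
    // ...specialist does its work here...
    this.emit('agent:complete', { agent: 'code', tokens: 412, ms: 1800 });
    return { agent: 'code' };
  }
}

const orchestrator = new DemoOrchestrator();
const log = [];

// Subscribe exactly like any other Node.js event source.
orchestrator.on('route', (e) => log.push(`routed to ${e.agent} (${e.confidence})`));
orchestrator.on('agent:complete', (e) => log.push(`${e.agent} used ${e.tokens} tokens in ${e.ms}ms`));

await orchestrator.process('Review this function for bugs...');
```

Feeding those same listeners into a Metrics or Tracer instance is how the repository builds its per-request timing and token reports.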
The Router Is the Product
Here is something I did not appreciate until I built this: in a multi-agent system, the router is the most important component. Not the specialists. The router.
Get routing wrong and nothing else matters. The best code agent in the world is useless if writing requests keep getting sent to it. The router is doing intent classification — the same problem that chatbot platforms have been solving for years — but with the full power of an LLM instead of keyword matching or intent trees.
The production router in the repository is 70 lines of code:
async route(message, agents) {
  const agentList = agents
    .map((a) => `- "${a.name}": ${a.description}`)
    .join('\n');

  const response = await this.openai.chat.completions.create({
    model: this.model,
    max_tokens: 200,
    temperature: 0.1,
    messages: [
      {
        role: 'system',
        content: `You are a request router. Pick the best agent.
Available agents:
${agentList}
Respond with ONLY JSON: {"agent": "<name>", "confidence": <0.0-1.0>, "reasoning": "<why>"}`,
      },
      { role: 'user', content: message },
    ],
  });

  try {
    const text = response.choices[0].message.content.trim();
    return JSON.parse(text.replace(/```json\n?|\n?```/g, '').trim());
  } catch {
    return { agent: agents[0]?.name || 'unknown', confidence: 0.3, reasoning: 'Parse failure' };
  }
}
The prompt is dynamic — it builds the agent list from whatever agents are registered. Add a new agent and the router automatically knows about it. No prompt editing required.
The confidence score is the key insight. When confidence is high (0.8+), the routing is clear and the request goes straight through. When confidence is low (below 0.7), the system emits a warning event. Depending on your application, you might log it, ask the user to clarify, or try a different routing strategy.
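One way to act on those thresholds is a small decision function sitting between the router and the dispatch step. This is a sketch of one possible policy using the article's numbers, not the repository's exact code; handleRouting, emitWarning, and clarify are illustrative names.

```javascript
// Turn a routing decision into an action, using the thresholds from the
// text: 0.8+ is clear, below 0.7 triggers a warning and a fallback strategy.
function handleRouting(decision, { emitWarning, clarify }) {
  if (decision.confidence >= 0.8) {
    return { action: 'dispatch', agent: decision.agent };      // clear routing
  }
  if (decision.confidence < 0.7) {
    emitWarning?.(decision);                                   // low-confidence event
    return { action: 'clarify', prompt: clarify(decision) };   // one possible strategy
  }
  return { action: 'dispatch', agent: decision.agent };        // borderline: proceed anyway
}
```

Asking the user to clarify is only one fallback; logging and proceeding, or retrying with a different routing strategy, slot into the same branch.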
Why Not Use a Framework?
LangChain, CrewAI, AutoGen, and others all provide multi-agent abstractions. They are useful tools. But I wanted to understand what they abstract before using them.
When your LangChain agent chain breaks at 2 AM, you need to know what is happening under the hood. When CrewAI's crew produces unexpected results, you need to understand the routing and delegation model to debug it. When you need to optimize token costs across agents, you need to know where tokens are being spent.
Building from scratch taught me things that framework documentation does not cover:
Agent descriptions are more important than agent implementations. The router's only information about each agent is its description string. A vague description means bad routing. I rewrote descriptions more often than I rewrote agent code.
Temperature is architecture. Setting the data agent to 0.1 and the writing agent to 0.7 is not a minor config choice — it fundamentally shapes output quality. One-size-fits-all temperature is why single-agent systems produce inconsistent results across domains.
Error handling is the hard part. The agent loop has to handle tool failures, parsing errors, hallucinated tool names, iteration limits, and rate limits — all without crashing. Sending errors back to the model as strings instead of throwing exceptions lets the model recover gracefully.
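That "errors as strings" idea can be sketched as a small wrapper around tool execution. This is an illustration of the technique, not the repository's code; executeToolCall is a hypothetical name.

```javascript
// Wrap tool execution so failures become strings the model can read,
// instead of exceptions that kill the agent loop.
async function executeToolCall(tools, name, args) {
  const tool = tools[name];
  if (!tool) {
    // The model hallucinated a tool name; tell it what actually exists.
    return `Error: unknown tool "${name}". Available: ${Object.keys(tools).join(', ')}`;
  }
  try {
    return String(await tool(args));
  } catch (err) {
    return `Error: ${err.message}`;   // the model can read this and retry
  }
}
```

The return value goes back into the conversation as a tool message either way, so a failed search or a bad argument becomes something the model can reason about on its next turn.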
Token tracking per agent changes how you optimize. When you can see that the research agent uses 3x the tokens of the code agent, you know where to focus your optimization effort.
Multi-Step Orchestration
The system also handles requests that need multiple agents. "Research the latest JavaScript frameworks and write a comparison blog post" gets decomposed into a plan:
{
  "steps": [
    { "agent": "research", "task": "Find the top 5 current JavaScript frameworks" },
    { "agent": "writing", "task": "Write a comparison blog post from the research" }
  ]
}
The orchestrator executes steps in sequence, passing each agent's output as context to the next. The research agent's findings become part of the writing agent's system prompt. The writing agent produces better content because it has real research to work with, not hallucinated facts.
const result = await orchestrator.processMultiStep(
  'Research current AI trends and write a summary email'
);
console.log(result.finalContent);
This is where the architecture really shines. Each agent stays focused on its specialty. The orchestrator handles the coordination. The context passing is automatic.
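The sequential execution with context passing can be sketched in a few lines. This is a simplified reconstruction of the idea, not the repository's implementation; executePlan and the agents' run method are illustrative.

```javascript
// Minimal sketch of sequential plan execution: each step's output is
// appended to a running context string that the next agent receives
// alongside its own task.
async function executePlan(plan, agents) {
  let context = '';
  const outputs = [];

  for (const step of plan.steps) {
    const agent = agents[step.agent];
    const task = context
      ? `${step.task}\n\nContext from previous steps:\n${context}`
      : step.task;

    const output = await agent.run(task);
    outputs.push({ agent: step.agent, output });
    context += `\n[${step.agent}] ${output}`;
  }

  return { finalContent: outputs[outputs.length - 1].output, outputs };
}
```

In the real system the accumulated context lands in the downstream agent's system prompt, but the shape is the same: a loop, a growing context, and the last step's output as the final answer.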
Where This Is Going
The repository and the book cover the full journey from a single API call to a production-ready orchestrator with error handling, streaming, observability, testing, and deployment. But the advanced patterns are where things get interesting:
- Semantic routing with embeddings — faster and cheaper than LLM-based routing for high-traffic systems
- The critic pattern — one agent generates output, another reviews it, the first revises
- Fan-out/fan-in — multiple agents work in parallel, results are merged
- Human-in-the-loop — escalation and approval workflows for high-stakes decisions
These patterns compose. You can nest a critic loop inside a fan-out/fan-in pipeline. You can add semantic routing as a fast path with LLM routing as a fallback. The building blocks are simple; the combinations are powerful.
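To make the first of those concrete, here is a sketch of semantic routing by cosine similarity. Nothing here comes from the repository: semanticRoute is a hypothetical function, the embed callback stands in for whatever embeddings API you use, and each agent's description vector is assumed to be computed once at registration.

```javascript
// Cosine similarity between two equal-length vectors.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] ** 2;
    nb += b[i] ** 2;
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Route by nearest agent description: embed the message once, compare it
// against each agent's precomputed description embedding. One embeddings
// call per request instead of a full chat completion.
async function semanticRoute(message, agents, embed) {
  const messageVec = await embed(message);
  let best = { agent: null, score: -Infinity };
  for (const agent of agents) {
    const score = cosine(messageVec, agent.embedding);
    if (score > best.score) best = { agent: agent.name, score };
  }
  return best;
}
```

The score doubles as a confidence signal, which is what makes the fast-path arrangement work: accept high-similarity matches immediately and fall back to the LLM router when the nearest description is not near enough.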
Try It
The code is open source and MIT licensed:
Repository: github.com/grizzlypeaksoftware/multi-agent-orchestrator
git clone https://github.com/grizzlypeaksoftware/multi-agent-orchestrator.git
cd multi-agent-orchestrator
cp .env.example .env
# Add your API key to .env
npm install
npm start
You will get an interactive REPL that routes your messages to the right specialist automatically. Try a code review, a research question, a data analysis request, and a writing task. Watch the router pick the right agent each time.
The book that accompanies this code — Building a Multi-Agent Orchestrator in Node.js — walks through every line, every design decision, and every production pattern. It is available on Amazon: https://www.amazon.com/dp/B0GSLV2GHR
If you are building AI applications in Node.js and you have hit the single-agent ceiling, this is the way forward. Not a bigger agent. A smarter system.