Field notes

Why multi-agent beats single-agent for operational work

A single agent, given a clear task and the right tools, can do remarkable things. Most of the public demos you’ve seen (the ones that go viral on X, the ones featured in keynotes) are single agents. One model, one prompt, one task, one output. They’re impressive, and they’re real, and they’re not what we build.

We build multi-agent systems. The distinction is not academic. It is the entire reason an operational function can be owned by software rather than just assisted by it. This essay is about why.

What a single agent actually is

A single agent is, mechanically, a loop: a model receives an instruction, calls some tools, observes the results, and decides what to do next. The frontier models are now good enough that this loop can do real work. Draft an email. Pull a report. Summarise a meeting. Search the web and answer a question. For any task that fits inside a clear instruction and resolves in a single coherent thread of action, single agents are excellent.

But operational functions are not single tasks. A “customer onboarding function” is dozens of tasks, executed in parallel and in sequence, by different specialists, with different tools, against different data sources, on different timescales: some triggered by user action, some by the calendar, some by upstream signals from other functions entirely. A “competitive research function” is a continuous orchestration of crawling, monitoring, synthesising, distributing, and responding to changes that nobody told the system to expect.

Try to compress all of that into one agent and you discover the limits of the single-agent paradigm quickly. The context window fills up with conflicting instructions. The tool set sprawls past the point where the model can choose intelligently. The agent’s reasoning, which was crisp on a focused task, becomes muddy when it must hold half a dozen unrelated concerns in mind at once. You don’t get an operational function. You get one agent doing many things worse than several specialist agents would each do their one.

What changes when you split it up

The shift to multi-agent is not about scale. It’s about specialisation, scope, and contract.

Specialisation: Each agent has one job. Its system prompt is short and focused. Its tool set is small. Its examples are sharp. The model spends its full capacity on the narrow problem in front of it, rather than juggling many. The same frontier model that struggled to coordinate ten responsibilities at once will execute one of them brilliantly.

Scope: A specialist agent’s outputs are predictable in a way that a generalist’s are not. You can wrap them in evaluations, monitor them, set thresholds. When a specialist agent misbehaves, you know which one and you know where to look. Generalist agents fail diffusely: something is wrong, but the failure could be anywhere in a thousand-token reasoning chain.

Contract: Multi-agent systems force you to design the interfaces between agents. What does the research agent hand to the synthesis agent? In what format? With what guarantees? These contracts are not a tax on the system; they are the structure that makes the system observable and evolvable. Without them, you have a black box. With them, you have something operators can reason about.
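A contract can be as simple as a typed record that refuses to exist in an invalid state. The research-to-synthesis handoff below is a sketch under assumptions: the field names and the specific guarantees (a real source URL, a bounded confidence score) are illustrative, not a prescription.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ResearchFinding:
    """The unit the research agent hands to the synthesis agent."""
    source_url: str
    claim: str
    confidence: float  # 0.0-1.0, asserted by the research agent

    def __post_init__(self):
        # The contract's guarantees, enforced at the boundary.
        if not self.source_url.startswith(("http://", "https://")):
            raise ValueError(f"unsourced claim: {self.claim!r}")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence out of range")
```

The synthesis agent then accepts only `list[ResearchFinding]`, and a malformed handoff fails loudly at the interface rather than silently downstream.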

The orchestration layer is where the work lives

The interesting engineering in a multi-agent system is not in any individual agent. The interesting engineering is in the layer that decides which agent runs when, on what input, with what context, and what to do with the output.

This layer is responsible for things that have nothing to do with prompting:

  • Routing inbound work to the right specialist
  • Holding state across multiple agent invocations
  • Deciding when to retry, escalate, or hand back to a human
  • Maintaining the audit trail
  • Enforcing rate limits, cost limits, and policy guardrails
  • Surfacing what’s happening to the operators in real time

These are operational engineering concerns, not AI concerns. They look more like the architecture of a small distributed system than the architecture of a chatbot. Teams that come to multi-agent work from a “build a better prompt” mindset are usually surprised by how much of the real work happens here.
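The responsibilities in the list above can be sketched as a skeleton. This is one possible shape, not a reference implementation: the agent registry, the `WorkItem` record, and the retry and escalation policies are all assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class WorkItem:
    kind: str        # e.g. "research", "synthesis" (illustrative names)
    payload: dict
    attempts: int = 0

class Orchestrator:
    MAX_ATTEMPTS = 3

    def __init__(self, agents):
        self.agents = agents    # kind -> callable(payload) -> result
        self.audit_log = []     # append-only trail of every decision

    def dispatch(self, item: WorkItem):
        # Routing: find the right specialist for this kind of work.
        agent = self.agents.get(item.kind)
        if agent is None:
            return self.escalate(item, "no specialist for this kind")
        # Retry policy: bounded attempts, every outcome recorded.
        while item.attempts < self.MAX_ATTEMPTS:
            item.attempts += 1
            try:
                result = agent(item.payload)
                self.audit_log.append(("ok", item.kind, item.attempts))
                return result
            except Exception as exc:
                self.audit_log.append(("retry", item.kind, str(exc)))
        return self.escalate(item, "exhausted retries")

    def escalate(self, item, reason):
        # Hand back to a human operator instead of guessing.
        self.audit_log.append(("escalated", item.kind, reason))
        return None
```

Rate limits, cost caps, and real-time surfacing would hang off the same dispatch path; the point is that none of this is prompting.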

Why this matters for who owns the function

Single agents fit naturally into a world where humans still own the function and the agent assists. The human knows the goal, dispatches the agent at the right moment, reviews the output, decides what to do next. The agent is a power tool.

Multi-agent systems fit a different model: the system owns the function, with humans in the loop for exceptions and oversight. The orchestration layer makes the dispatch decisions, holds the state, decides when escalation is warranted. Operators monitor, intervene, and tune, but they aren’t pulling the trigger on every action.

This is what “an agent system runs the function” means in practice. It is not one agent acting autonomously. It is a coordinated team of agents, with an orchestrator that knows the function’s shape, running the function the way a team of specialists with a manager would run it.

Two failure modes to watch for

The shift to multi-agent introduces failure modes that single-agent systems don’t have. Worth flagging two:

Cascading errors. When agents pass work to each other, an error in an early agent can poison every downstream step. Defending against this requires investment in inter-agent validation: each agent treats its inputs with appropriate suspicion, even when those inputs come from a sibling agent.
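That "appropriate suspicion" can be a cheap guard at the boundary. A sketch, assuming the sibling agent hands over a batch of dict-shaped findings; the specific checks are illustrative and depend on the contract between the two agents.

```python
def validate_handoff(findings):
    """Reject a bad batch before it can poison downstream synthesis."""
    if not findings:
        raise ValueError("empty handoff: upstream agent produced nothing")
    unsourced = [f for f in findings if not f.get("source_url")]
    if unsourced:
        raise ValueError(f"{len(unsourced)} finding(s) missing a source")
    return findings
```

A rejection here becomes a retry or an escalation at the orchestration layer, instead of a confident-looking report built on a hallucinated input.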

Coordination bloat. It is tempting to keep adding agents. Every new specialist is one more contract to maintain, one more failure mode, one more thing to observe. We resist agent proliferation aggressively. Smaller systems with sharper specialists are almost always better than large systems with fuzzy ones.

Where this leaves us

Single agents are the right tool when the work is bounded, episodic, and human-initiated. They will keep getting better, and they will remain useful indefinitely.

But for operational functions (the continuous, multi-task, multi-source, multi-timescale work that lives at the heart of every growth-stage company) multi-agent systems are the right paradigm. They map onto how the work is actually structured. They are observable in ways monolithic agents are not. They evolve in ways monolithic agents can’t. And they let humans step out of the loop on the work that no longer needs them, while keeping the loop intact for the work that does.

Building them well is a discipline. Building them poorly is worse than not building them at all. That’s the next post.