Multi-agent & what comes next

The frontier isn't a bigger model — it's many agents coordinating: an orchestrator delegating to workers, agents calling agents, whole workflows automated. The promise is enormous. So is the problem: reliability compounds the wrong way.


01 The answer, then the intuition

The whole rides on the weakest step

A single agent can do a task; a multi-agent system decomposes a big job across many — a planner splits the work, specialist workers run in parallel, a synthesizer merges the results. Done well, that's how AI moves from answering questions to completing projects. It's the shape most people mean by "agentic AI."

But chaining steps multiplies their success rates, so failures compound. If each step is 90% reliable and steps fail independently, ten steps in a row succeed only $0.9^{10} \approx 35\%$ of the time: the chain is far less reliable than any part of it. Drag the length of the workflow and watch success collapse, then turn on verification and watch it come back:

The reliability wall — and how verification beats it

A workflow of $n$ sequential agent steps, each reliable with probability $p$. Overall success is $p^n$.

[Interactive chart: overall workflow success as a function of workflow length (steps, default 10) and per-step reliability (default 90%).]

02 Mechanics

How you build a system that survives its own length

Orchestration patterns. The common shapes: an orchestrator-worker split (a lead agent plans and delegates), parallel fan-out (independent subtasks run at once, then merge), and pipelines (each stage feeds the next). Parallelism is powerful because it keeps the dependent chain short — and it's the short chain that reliability depends on.
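Verification is the fix. The escape from $p^n$ is to raise $p$ at each step. Run a step several times independently and have a checker accept it if any attempt is correct; under that assumption a step that's 90% reliable becomes ~99.9% with three attempts ($1 - 0.1^3$). A plain majority vote of three is weaker, about 97.2%, because at least two of the three must land on the right answer. Adversarial verification, in which agents try to refute a result, catches plausible-but-wrong outputs a single pass misses.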
Bounded autonomy. Real systems cap the number of steps, validate tool outputs, and insert human checkpoints at high-stakes moments. The art is spending verification where errors are costly and letting cheap steps run free.
What comes next. The open frontiers: continual learning (models that update from experience), agent-to-agent economies, better long-horizon planning, and — underneath all of it — reliability that holds over hundreds of steps. None is solved. That's not a caveat; it's the actual state of the art.
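
The patterns above can be sketched in a few lines. This is a minimal illustration, not a production design: `plan`, `worker`, `verified_step`, `orchestrate`, and `MAX_STEPS` are hypothetical names, and the worker is a stub where a real system would call a model or a tool. It shows an orchestrator fanning out independent subtasks in parallel, a majority vote on each subtask, and a hard cap on the number of steps.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

MAX_STEPS = 8  # hypothetical cap on how many subtasks one job may spawn


def plan(job: str) -> list[str]:
    """Stand-in planner: split a job into independent subtasks."""
    return [part.strip() for part in job.split(";") if part.strip()]


def worker(subtask: str, attempt: int) -> str:
    """Stand-in worker: a real one would call a model or a tool."""
    return subtask.upper()


def verified_step(subtask: str, votes: int = 3) -> str:
    """Run the worker several times and keep the most common answer."""
    answers = [worker(subtask, attempt) for attempt in range(votes)]
    winner, count = Counter(answers).most_common(1)[0]
    if count <= votes // 2:
        # No strict majority: stop and hand off rather than guess
        raise RuntimeError(f"no majority for {subtask!r}; escalate to a human")
    return winner


def orchestrate(job: str) -> str:
    subtasks = plan(job)
    if len(subtasks) > MAX_STEPS:
        raise ValueError("plan exceeds the step cap")
    # Fan out: independent subtasks run in parallel, keeping the dependent chain short
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(verified_step, subtasks))
    return " | ".join(results)  # stand-in synthesizer


print(orchestrate("collect sources; draft summary; check figures"))
```

The step cap and the "no majority" exception are the two places where this sketch refuses to continue on its own; in a real system those would be natural points for a human checkpoint.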

The honest summary: multi-agent systems work impressively in demos and unevenly in production, and the gap between the two is almost entirely this reliability problem. Whoever closes it unlocks the automation the whole industry is betting on.

04 The math

Why length is the enemy, and redundancy the cure

A workflow of $n$ sequential steps, each independently reliable with probability $p$, succeeds only if all succeed:

$$ P_{\text{success}} = p^{\,n} $$

Because $p < 1$, this decays exponentially in length, which is the source of the wall. Verification attacks $p$ directly. With $k$ independent checks per step, and assuming the step fails only when all $k$ fail (each with probability $1-p$), the step's effective reliability rises to $1-(1-p)^k$, so the whole chain becomes:

$$ P_{\text{verified}} = \Big(1 - (1-p)^{k}\Big)^{n} $$

The leverage is enormous. At $p=0.9$, $n=10$: the naive chain is $0.9^{10}\approx 34.9\%$, but with $k=3$ the per-step reliability is $1-0.1^{3}=0.999$, and the chain is $0.999^{10}\approx 99.0\%$. The cost is that each verified step now runs at least $k$ model calls instead of one (more if a separate call adjudicates between them), so reliability is bought with tokens. The central engineering trade of the agentic era is exactly this: how much verification to buy, and where.
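
The formula $1-(1-p)^k$ counts a step as passed when at least one of the $k$ attempts is right, which assumes something can recognize the right one. A strict majority vote is a different rule with a lower payoff. The sketch below compares the two under the same independence assumption; the function names `any_of_k` and `majority_of_k` are illustrative, not from the chapter.

```python
from math import comb


def any_of_k(p: float, k: int) -> float:
    """Step passes if at least one of k independent attempts is correct."""
    return 1 - (1 - p) ** k


def majority_of_k(p: float, k: int) -> float:
    """Step passes only if more than half of k independent attempts are correct."""
    need = k // 2 + 1
    return sum(comb(k, i) * p**i * (1 - p) ** (k - i) for i in range(need, k + 1))


p, n = 0.90, 10
for k in (1, 3, 5):
    print(f"k={k}: any-of-k chain {any_of_k(p, k) ** n:.3f}   "
          f"majority chain {majority_of_k(p, k) ** n:.3f}")
```

At $p=0.9$ and $k=3$ the majority rule gives $0.9^3 + 3(0.9^2)(0.1) = 0.972$ per step, so a ten-step chain lands near 75% rather than 99%. Which rule applies depends on whether a step's output can be checked directly or only compared against other attempts.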

05 The code

The wall, and the way through

Naive chains collapse with length; verification restores them — at a token cost.

reliability.py

# Assumes steps fail independently, and that a step passes if any one of its k tries is correct.
def chain(p, n):     return p ** n              # all n steps must succeed
def verified(p, k):  return 1 - (1 - p) ** k    # a step fails only if all k tries fail

p = 0.90
print(f"naive 10-step chain: {chain(p, 10)*100:.1f}%")
print(f"per-step w/ 3 tries: {verified(p, 3):.3f}")
print(f"verified 10-step:    {chain(verified(p, 3), 10)*100:.1f}%")

for n in (1, 5, 10, 20):
    print(f"  n={n:>2}: naive {chain(p, n)*100:5.1f}%   "
          f"verified {chain(verified(p, 3), n)*100:5.1f}%")
# Output:
# naive 10-step chain: 34.9%
# per-step w/ 3 tries: 0.999
# verified 10-step:    99.0%
#   n= 1: naive  90.0%   verified  99.9%
#   n= 5: naive  59.0%   verified  99.5%
#   n=10: naive  34.9%   verified  99.0%
#   n=20: naive  12.2%   verified  98.0%   <- length kills naive chains; verification saves them
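
The same formula can be run in reverse: given a target success rate, how many checks per step does a chain need? The sketch below searches for the smallest such $k$ under the chapter's independence assumption. The name `checks_needed` and the `max_k` limit are illustrative choices, and the call count assumes one model call per check.

```python
def checks_needed(p: float, n: int, target: float, max_k: int = 50) -> int:
    """Smallest k such that (1 - (1 - p)**k)**n >= target."""
    if not (0 < p < 1 and 0 < target < 1 and n >= 1):
        raise ValueError("expected 0 < p < 1, 0 < target < 1, n >= 1")
    for k in range(1, max_k + 1):
        if (1 - (1 - p) ** k) ** n >= target:
            return k
    raise ValueError(f"target not reachable with k <= {max_k}")


p, target = 0.90, 0.99
for n in (10, 20, 100):
    k = checks_needed(p, n, target)
    print(f"n={n:>3}: k={k} checks per step, about {n * k} model calls in total")
```

The search is a plain loop rather than a closed-form logarithm, which avoids rounding surprises when the target sits right at a boundary. Because each added check cuts the per-step failure rate by a factor of $1-p$, the required $k$ grows slowly with chain length, while the total call count grows roughly in proportion to $n \cdot k$.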

06 The economics

The bet the entire build-out is waiting on

Reliability → money

Multi-agent automation is the demand story that justifies the whole build-out. A reliable system that completes real multi-step work doesn't sell tokens — it competes with salaries, a market orders of magnitude larger than chat. If agents cross the reliability threshold for knowledge work, the revenue easily clears the capex. If they don't, the demand the spending assumes simply doesn't arrive.

The reliability math is why this is genuinely uncertain, not just hype. Verification is the known fix, but it multiplies the token cost per task — so the very thing that makes agents trustworthy also makes them expensive. Whether reliable-enough automation lands below the price of the human it replaces is an open, quantitative question, and it's the one that decides the payoff.
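
One way to make that question concrete is to price a completed task rather than a single call. The sketch below is a toy model under loud assumptions: failures are independent, a failed run is only detected at the end and retried from scratch, each check costs one model call, and `COST_PER_CALL` is a made-up number for illustration, not a real price.

```python
def success_rate(p: float, n: int, k: int) -> float:
    """Chain success with k independent checks per step: (1 - (1 - p)**k)**n."""
    return (1 - (1 - p) ** k) ** n


def cost_per_completed_task(p: float, n: int, k: int, cost_per_call: float) -> float:
    """Expected spend per success, assuming failed runs are retried from scratch."""
    spend_per_run = n * k * cost_per_call      # every call is paid for, pass or fail
    return spend_per_run / success_rate(p, n, k)


COST_PER_CALL = 0.02   # hypothetical dollars per model call, for illustration only

for k in (1, 2, 3, 4):
    rate = success_rate(0.9, 10, k)
    cost = cost_per_completed_task(0.9, 10, k, COST_PER_CALL)
    print(f"k={k}: success {rate:6.1%}   cost per completed task ${cost:.2f}")
```

Under these illustrative numbers the cost per completed task is lowest at $k=2$, not $k=1$: some verification pays for itself by avoiding wasted runs, and beyond that point extra checks cost more than they save. The real comparison would set this figure, with measured values for $p$ and call cost, against the price of the human work it replaces.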

This is the Circuit's forward edge. Not "will AI get smarter" — the scaling law answers that — but "will agents get reliable and cheap enough, fast enough, to generate the demand the capital already assumes." That, more than any benchmark, is what an honest desk should be measuring. The book has taught the mechanics; this is where they meet the biggest open bet.

Part V complete

You've followed the mechanics all the way to the money

From the scaling law that makes the bet rational, through the labs and the supply chain that concentrate the power, to the economics of a single token and the agentic reliability wall that decides the demand — Part V is where everything the earlier parts built collides with economics. This is the layer only a think tank is built to tell.

One part remains: the practitioner's Part VI, covering choosing a model, optimizing cost, guardrails, and the tool landscape. The theory is done; what's left is doing it well.

07 Going deeper

The primary sources

Anthropic: How we built our multi-agent research system · orchestrator-worker patterns in practice.
Wu et al. (2023): AutoGen · a framework for multi-agent conversation.
Zaharia et al. (2024): The Shift from Models to Compound AI Systems · systems over single models.
Yao et al. (2022): ReAct · the reasoning-and-acting loop agents are built on.

Cite this chapter: Divergent Compute, "Multi-agent & what comes next", First Principles, 2026. divergentcompute.com/first-principles-multi-agent · v1.0 · CC-BY.
