First Principles / Part V · The frontier & the industry / Chapter 30
The frontier isn't a bigger model — it's many agents coordinating: an orchestrator delegating to workers, agents calling agents, whole workflows automated. The promise is enormous. So is the problem: reliability compounds the wrong way.
01 The answer, then the intuition
A single agent can do a task; a multi-agent system decomposes a big job across many — a planner splits the work, specialist workers run in parallel, a synthesizer merges the results. Done well, that's how AI moves from answering questions to completing projects. It's the shape most people mean by "agentic AI."
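The planner, worker, and synthesizer roles described above can be sketched in a few lines. This is a minimal illustration, not any framework's API: `plan`, `work`, `synthesize`, and `run` are hypothetical names, and each stub stands in for what would be a model call in a real system.

```python
from concurrent.futures import ThreadPoolExecutor


def plan(job: str) -> list[str]:
    """Hypothetical planner: split a job into independent subtasks."""
    return [part.strip() for part in job.split(";") if part.strip()]


def work(subtask: str) -> str:
    """Hypothetical specialist worker: a stub standing in for one agent call."""
    return f"done({subtask})"


def synthesize(results: list[str]) -> str:
    """Hypothetical synthesizer: merge worker outputs into one answer."""
    return " | ".join(results)


def run(job: str) -> str:
    subtasks = plan(job)
    # Workers run in parallel; pool.map returns results in input order.
    with ThreadPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(work, subtasks))
    return synthesize(results)


report = run("collect sources; summarize findings; draft conclusion")
print(report)
```

Every arrow in this pipeline (plan to work, work to synthesize) is a step that can fail, which is where the reliability arithmetic starts to matter.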
But chaining steps multiplies their success probabilities, so failures compound. If each step is 90% reliable, ten steps in a row succeed only $0.9^{10} \approx 35\%$ of the time: the chain is far less reliable than any part of it. Lengthen the workflow and success collapses; add verification and it comes back.
[Interactive figure: a workflow of n sequential agent steps, each reliable with probability p. Overall success is pⁿ.]
02 Mechanics
The honest summary: multi-agent systems work impressively in demos and unevenly in production, and the gap between the two is almost entirely this reliability problem. Whoever closes it unlocks the automation the whole industry is betting on.
04 The math
A workflow of $n$ sequential steps, each independently reliable with probability $p$, succeeds only if all succeed:
$$P_{\text{chain}} = p^{n}$$
Because $p < 1$, this decays exponentially in length, which is the source of the wall. Verification attacks $p$ directly. With $k$ independent checks per step, the step's effective reliability rises to $1-(1-p)^k$, so the whole chain becomes:
$$P_{\text{verified}} = \left(1-(1-p)^{k}\right)^{n}$$
The leverage is enormous. At $p=0.9$, $n=10$: the naive chain is $0.9^{10}=34.9\%$, but with $k=3$ the per-step reliability is $1-0.1^{3}=0.999$, and the chain is $0.999^{10}=99.0\%$. The cost is that each verified step now runs $k+1$ model calls — so reliability is bought with tokens. The central engineering trade of the agentic era is exactly this: how much verification to buy, and where.
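One way to make "how much verification to buy" concrete is to search for the smallest $k$ that reaches a target chain reliability and then count the calls it costs. The sketch below is illustrative: `min_checks` and `calls_per_task` are hypothetical helpers, and the call count simply takes the $k+1$ calls per verified step stated above.

```python
def min_checks(p: float, n: int, target: float) -> int:
    """Smallest k such that (1 - (1 - p)**k)**n >= target, by direct search."""
    if not (0 < p < 1 and 0 < target < 1 and n >= 1):
        raise ValueError("need 0 < p < 1, 0 < target < 1, n >= 1")
    k = 1
    while (1 - (1 - p) ** k) ** n < target:
        k += 1
    return k


def calls_per_task(n: int, k: int) -> int:
    """Model calls for one task, assuming k + 1 calls per verified step."""
    return n * (k + 1)


p, n = 0.90, 10
for target in (0.90, 0.99, 0.999):
    k = min_checks(p, n, target)
    print(f"target {target:.3f}: k={k}, calls per task={calls_per_task(n, k)}")
```

For $p=0.9$, $n=10$ and a 99% target the search returns $k=3$, matching the worked numbers above. Each further step up in the target buys a smaller reliability gain for the same added token cost.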
05 The code
Naive chains collapse with length; verification restores them, at a token cost.
reliability.py
def chain(p, n):
    # All n sequential steps must succeed.
    return p ** n

def verified(p, k):
    # Per-step reliability with k independent checks.
    return 1 - (1 - p) ** k

p = 0.90
print(f"naive 10-step chain: {chain(p, 10)*100:.1f}%")
print(f"per-step w/ 3 checks: {verified(p, 3):.3f}")
print(f"verified 10-step: {chain(verified(p, 3), 10)*100:.1f}%")
for n in (1, 5, 10, 20):
    print(f" n={n:>2}: naive {chain(p, n)*100:5.1f}% "
          f"verified {chain(verified(p, 3), n)*100:5.1f}%")
# naive 10-step chain: 34.9%
# per-step w/ 3 checks: 0.999
# verified 10-step: 99.0%
#  n=20: naive  12.2% verified  98.0%  <- length kills naive chains; verification saves them
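A quick way to sanity-check the closed form is to simulate it. The sketch below is an assumption-laden illustration: `simulate` is a hypothetical helper, a step counts as passed if any of its `k` independent attempts succeeds, and the checker that picks the successful attempt is assumed to be perfect.

```python
import random


def simulate(p, n, k=1, trials=50_000, seed=0):
    """Estimate chain success by Monte Carlo.

    Each of the n steps gets up to k independent attempts and passes if any
    attempt succeeds (this assumes a perfect checker). k=1 is the naive chain.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        if all(any(rng.random() < p for _ in range(k)) for _ in range(n)):
            wins += 1
    return wins / trials


naive = simulate(0.9, 10)
checked = simulate(0.9, 10, k=3)
print(f"naive:    simulated {naive:.3f}  vs formula {0.9 ** 10:.3f}")
print(f"verified: simulated {checked:.3f}  vs formula {(1 - 0.1 ** 3) ** 10:.3f}")
```

The estimates should land close to the closed-form values. Real checks are rarely independent or perfect, so the formula is best read as an upper bound on what verification buys.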
06 The economics
Reliability → money
Multi-agent automation is the demand story that justifies the whole build-out. A reliable system that completes real multi-step work doesn't sell tokens — it competes with salaries, a market orders of magnitude larger than chat. If agents cross the reliability threshold for knowledge work, the revenue easily clears the capex. If they don't, the demand the spending assumes simply doesn't arrive.
The reliability math is why this is genuinely uncertain, not just hype. Verification is the known fix, but it multiplies the token cost per task — so the very thing that makes agents trustworthy also makes them expensive. Whether reliable-enough automation lands below the price of the human it replaces is an open, quantitative question, and it's the one that decides the payoff.
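The trade-off can be framed as a cost per completed task. The sketch below is purely illustrative: the function names are hypothetical, `cost_per_call` and `human_cost` are made-up placeholder figures rather than market data, and it assumes a failed run is detected and retried from scratch, so the expected spend is the per-run cost divided by the chain's success probability.

```python
def chain_success(p, n, k):
    """Chain reliability with k independent checks per step."""
    return (1 - (1 - p) ** k) ** n


def cost_per_completed_task(p, n, k, cost_per_call):
    """Expected spend per success, assuming k + 1 calls per step and
    full retries of failed runs (both are modeling assumptions)."""
    run_cost = n * (k + 1) * cost_per_call
    return run_cost / chain_success(p, n, k)


p, n = 0.90, 10
cost_per_call = 0.05   # hypothetical dollars per model call
human_cost = 25.00     # hypothetical cost of a person doing the same task

costs = {k: cost_per_completed_task(p, n, k, cost_per_call) for k in range(1, 6)}
for k, cost in costs.items():
    verdict = "below" if cost < human_cost else "above"
    print(f"k={k}: ${cost:.2f} per completed task ({verdict} the human cost)")
best_k = min(costs, key=costs.get)
print(f"cheapest under these assumptions: k={best_k}")
```

Under these placeholder numbers the cost per completed task is not monotonic in $k$: too little verification wastes money on failed runs, too much wastes it on redundant checks. Where the minimum sits relative to the human cost is the quantity the paragraph above calls open.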
This is the Circuit's forward edge. Not "will AI get smarter" — the scaling law answers that — but "will agents get reliable and cheap enough, fast enough, to generate the demand the capital already assumes." That, more than any benchmark, is what an honest desk should be measuring. The book has taught the mechanics; this is where they meet the biggest open bet.
Part V complete
From the scaling law that makes the bet rational, through the labs and the supply chain that concentrate the power, to the economics of a single token and the agentic reliability wall that decides the demand — Part V is where everything the earlier parts built collides with economics. This is the layer only a think tank is built to tell.
One part remains: the practitioner's Part VI (choosing a model, optimizing cost, guardrails, and the tool landscape). The theory is done; what's left is doing it well.
07 Going deeper
Anthropic — Building a Multi-Agent Research System · orchestrator-worker patterns in practice.
Wu et al. (2023) — AutoGen · a framework for multi-agent conversation.
Zaharia et al. (2024) — The Shift to Compound AI Systems · systems over single models.
Yao et al. (2022) — ReAct · the reasoning-and-acting loop agents are built on.
Cite this chapter: Divergent Compute, "Multi-agent & what comes next", First Principles, 2026. divergentcompute.com/first-principles-multi-agent · v1.0 · CC-BY.