Agents & tool use

A model on its own can only write text. Put it in a loop with tools — search, a calculator, code, an API — and it can act: take a step, read the result, decide the next step, and repeat until the task is done. That loop is an agent.


Think, act, observe — then think again

A bare model can't check today's price, run a calculation exactly, or query your database — its knowledge is frozen and its only output is text. An agent gets around this by letting that text be an action. The model writes a structured tool call; the surrounding program runs it and feeds the result back; the model reads the result and decides what to do next. Loop until it has the answer.
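To make "text as an action" concrete, here is a minimal sketch of a single turn. The JSON shape (`tool` and `args` keys), the `get_price` function, and its canned return value are all assumptions made for illustration, not any particular vendor's format.

```python
import json

# Hypothetical tool: a stand-in for a live price lookup (the value is canned).
def get_price(symbol: str) -> str:
    return "67000" if symbol == "BTC" else "unknown"

TOOLS = {"get_price": get_price}

# What the model might emit: plain text that happens to be a structured call.
model_output = '{"tool": "get_price", "args": {"symbol": "BTC"}}'

# The surrounding program, not the model, parses and executes the call.
call = json.loads(model_output)
observation = TOOLS[call["tool"]](**call["args"])

# The result goes back into the context as text for the model's next turn.
context_line = f"OBSERVATION: {observation}"
print(context_line)
```

Repeating that parse, run, and feed-back turn is all the loop amounts to.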

This "think → act → observe → repeat" cycle turns a predictor into a doer. Step through one agent solving a task that needs two tools it doesn't have built in — a live lookup and exact arithmetic:

An agent loop, one step at a time

The model can’t know a live price or do exact math alone — so it calls tools and reads the results.

Task: "What's a 15% tip on the current price of Bitcoin?"

What actually makes a model an agent

Tools with schemas. Each tool is described to the model — its name, what it does, and the shape of its arguments. The model is trained to emit a structured call (e.g. JSON) when it wants one, which the program can parse and execute.
The loop. The core is dead simple: send the context to the model; if it returns a tool call, run the tool and append the result as an observation; if it returns an answer, stop. Repeat. Each turn the context grows with the full history of thoughts, actions, and observations — this is the ReAct pattern (reason + act).
Why tools matter. They patch the model's weaknesses precisely: a calculator for exact math it's bad at, search for current facts, code execution for real computation, database queries for private data. The model supplies the judgment about which tool to use when; the tools supply the ground truth.
The hard part: reliability. More steps mean more chances to go wrong: a bad tool call, a misread result, or an infinite loop. Real agents need guardrails such as step limits, validation, retries, and human checkpoints. An agent that gets each step right 95% of the time finishes a ten-step task correctly only about 60% of the time, assuming the steps fail independently.
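The four points above can be sketched together in one small loop. This is an illustrative sketch under stated assumptions: the registry layout, the `validate` and `run_agent` helpers, and the `scripted_model` stand-in are all hypothetical names, and a real system would call an actual model and use richer schemas.

```python
from typing import Callable, Optional

# Hypothetical tool registry: each entry pairs a function with a tiny schema
# (argument name -> expected type).
TOOLS = {
    "search": {"fn": lambda q: "67000" if "BTC" in q else "unknown",
               "args": {"q": str}},
    "calc": {"fn": lambda a, b: a * b,
             "args": {"a": float, "b": float}},
}

def validate(call: dict) -> Optional[str]:
    """Return an error message if the call does not match its schema."""
    spec = TOOLS.get(call.get("tool"))
    if spec is None:
        return f"unknown tool {call.get('tool')!r}"
    args = call.get("args", {})
    if set(args) != set(spec["args"]):
        return f"expected args {sorted(spec['args'])}, got {sorted(args)}"
    for name, typ in spec["args"].items():
        if not isinstance(args[name], typ):
            return f"arg {name!r} should be {typ.__name__}"
    return None

def run_agent(model: Callable[[list], dict], task: str, max_steps: int = 5):
    context = [("task", task)]
    for _ in range(max_steps):            # guardrail: hard step limit
        out = model(context)
        if "answer" in out:
            return out["answer"]
        error = validate(out)             # guardrail: check before running
        if error:
            context.append(("observation", f"error: {error}"))
            continue                      # the model sees the error and can retry
        result = TOOLS[out["tool"]]["fn"](**out["args"])
        context.append(("observation", result))
    return None                           # out of steps: hand off to a human

# Scripted stand-in for the model: one malformed call, then a correct run.
def scripted_model(context: list) -> dict:
    steps = [
        {"tool": "search", "args": {"query": "BTC price"}},  # wrong arg name
        {"tool": "search", "args": {"q": "BTC price"}},
        {"tool": "calc", "args": {"a": 67000.0, "b": 0.15}},
    ]
    i = len(context) - 1
    if i < len(steps):
        return steps[i]
    return {"answer": str(context[-1][1])}  # answer with the last observation

print(run_agent(scripted_model, "15% tip on BTC price"))
```

The malformed first call is rejected before any tool runs and comes back as an observation, so the next turn can correct it. With `max_steps=2` the same run gives up and returns `None`, which is the point where a human checkpoint would take over.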

So an agent isn't a smarter model — it's the same model wrapped in a control loop that lets it interact with the world and correct course. The intelligence is in the model; the agency is in the loop.

A policy over a growing context

At each step $t$, the model acts as a policy, sampling an action $a_t$ from the full history of what it has thought, done, and seen:

$$ a_t \sim P\big(a \mid x,\; h_t\big), \qquad h_t = (a_1, o_1, \dots, a_{t-1}, o_{t-1}) $$

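If $a_t$ is a tool call, the environment returns an observation $o_t = \text{tool}(a_t)$, which is appended to the history; if $a_t$ is an answer, the loop halts. Each step is a full forward pass over an ever-growing context. Using the usual estimate of about $2N$ FLOPs per token for a model with $N$ parameters, and writing $\lvert x + h_t \rvert$ for the context length in tokens at step $t$, a $k$-step task costs on the order of: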

$$ \text{cost} \approx \sum_{t=1}^{k} 2N \cdot \lvert x + h_t \rvert \;\; \Rightarrow \;\; \text{grows faster than linearly in } k $$
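
As a worked instance of the sum above, assume (for illustration only) that the prompt $x$ is $L_0$ tokens long and that each action-observation pair adds a constant $c$ tokens, so $\lvert x + h_t \rvert = L_0 + (t-1)c$. Both $L_0$ and $c$ are symbols introduced here, not part of the original formula, and the estimate ignores generated tokens and any caching of earlier context:

$$ \sum_{t=1}^{k} 2N \big(L_0 + (t-1)c\big) = 2N \Big(k L_0 + c\,\frac{k(k-1)}{2}\Big) $$

The $k^2$ term is what makes long runs expensive. With $L_0 = 1000$, $c = 500$, and $k = 10$, the model reads $10 \cdot 1000 + 500 \cdot 45 = 32{,}500$ tokens, about 32 times the 1,000 tokens of a single call on the same prompt.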

And reliability compounds the wrong way: if each step succeeds with probability $p$, a $k$-step chain succeeds with only $p^{k}$. At $p=0.95$ and $k=10$, that's $0.95^{10} \approx 0.60$ — which is why long agent runs are fragile and why bounding $k$ matters as much as raising $p$.
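
The same arithmetic can be turned around to ask how accurate each step must be, or how long a chain can get. A small sketch, assuming steps fail independently (the function names are illustrative):

```python
import math

def chain_success(p: float, k: int) -> float:
    """Probability that all k independent steps succeed."""
    return p ** k

def required_step_accuracy(target: float, k: int) -> float:
    """Per-step success rate needed for a k-step chain to reach `target`."""
    return target ** (1 / k)

def max_steps(p: float, target: float) -> int:
    """Longest chain that still succeeds with probability >= target."""
    return math.floor(math.log(target) / math.log(p))

print(round(chain_success(0.95, 10), 3))           # 0.599
print(round(required_step_accuracy(0.90, 10), 4))  # 0.9895
print(max_steps(0.95, 0.90))                       # 2
```

Under these assumptions, a ten-step task that should succeed 90% of the time needs steps that are right about 99% of the time, and at 95% per step only two steps fit inside that budget.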

The whole loop, in twenty lines

```python
def search(q):
    # canned stand-in for a live price lookup
    return "$67,000" if "BTC" in q else "unknown"

def calc(expr):
    # eval with builtins stripped: fine for a demo, not a real sandbox
    return eval(expr, {"__builtins__": {}}, {})

TOOLS = {"search": search, "calc": calc}

# a scripted policy standing in for the model's decisions
script = [
    ("think", "I need the current BTC price."),
    ("act", ("search", "BTC price")),
    ("think", "Now compute a 15% tip on 67000."),
    ("act", ("calc", "67000 * 0.15")),
    ("answer", "15% of $67,000 is ${}."),
]

obs = None
for kind, payload in script:  # the agent loop
    if kind == "act":
        tool, arg = payload
        obs = TOOLS[tool](arg)  # run the tool, observe result
        print(f"ACT {tool}({arg!r}) -> {obs}")
    elif kind == "think":
        print(f"THINK {payload}")
    else:
        print("ANSWER", payload.format(obs))

# THINK I need the current BTC price.
# ACT search('BTC price') -> $67,000
# THINK Now compute a 15% tip on 67000.
# ACT calc('67000 * 0.15') -> 10050.0
# ANSWER 15% of $67,000 is $10050.0.
```

The expensive bet the whole build-out rests on

Agency → money

Agents are where AI stops answering questions and starts doing work — and that's the entire economic thesis of the build-out. A chatbot sells tokens; an agent that can complete a multi-step task competes with labor, a far larger market. This is the demand that the hundreds of billions in compute are betting will arrive.

But the cost structure is unforgiving. Every step is another model call over a longer context, so a single agent task can cost 10–100× a single chat reply — and the reliability math ($p^k$) means longer tasks fail more often, forcing retries that cost even more. The value has to clear a bill that grows with both the length and the fragility of the task.
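
A back-of-the-envelope sketch shows how the two effects multiply. It assumes a 1,000-token prompt, 500 new tokens per step, independent step failures, and that a failed run is only detected at the end and retried from scratch. All of these numbers and function names are illustrative assumptions, not measurements.

```python
def attempt_tokens(k: int, prompt_tokens: int, tokens_per_step: int) -> int:
    """Tokens read across k steps when each step re-reads the whole context."""
    return sum(prompt_tokens + t * tokens_per_step for t in range(k))

def expected_cost_ratio(k: int, p: float,
                        prompt_tokens: int = 1000,
                        tokens_per_step: int = 500) -> float:
    """Expected tokens per successful task, relative to one single-turn reply."""
    expected_attempts = 1 / p ** k  # mean of a geometric distribution
    tokens = attempt_tokens(k, prompt_tokens, tokens_per_step)
    return tokens * expected_attempts / prompt_tokens

for k in (5, 10, 20):
    print(k, round(expected_cost_ratio(k, 0.95), 1))
```

With these inputs the ratio comes out at roughly 13x for five steps, 54x for ten, and 320x for twenty: doubling the task length far more than doubles the bill.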

That tension is the crux of the Circuit's central question. If agents become reliable enough to automate real knowledge work, the demand easily justifies the clusters being built. If they stay just unreliable enough to need a human watching, the economics stay stubbornly hard. The whole payoff of the build-out rides on which way that goes — which is exactly what an honest research desk should be measuring, not assuming.
