First Principles / Part V · The frontier & the industry / Chapter 29
Strip away the narrative and all of AI's economics reduce to one unit: the token. What it costs to produce, what it sells for, and how many you must sell to repay the training bill. This is the Circuit's central question, made arithmetic.
01 The answer, then the intuition
Every token has a cost and a price, and the gap between them is the entire business. But there are really two economics stacked inside it. The first is the gross one: does the revenue from a token exceed the inference cost of producing it? That depends almost entirely on utilization — a lightly-batched GPU makes every token a loss; a well-packed one makes it a profit.
The second is the fully-loaded one: even at a positive gross margin, you must sell enough tokens to repay the hundred-million-dollar training run and the hardware. As the batch size grows, a single token flips from a catastrophic loss to a thin profit, and even then it takes trillions of tokens to reach the first real dollar. The running example uses these illustrative assumptions:
70B @ 4-bit, ~$30/hr node, $3 revenue per 1M tokens, $100M training capex. Illustrative but internally consistent.
02 Mechanics
So "is AI profitable?" isn't one question. A token can be gross-margin positive and the company still deeply unprofitable, because the fixed costs are gigantic and the price per token keeps sliding. Both P&Ls have to work — and they're in tension.
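The two P&Ls can be sketched side by side. This is a minimal illustration using the chapter's running figures (a $30/hr node, 95 tokens/sec per stream, $3 per 1M tokens, $100M training capex); the function names are hypothetical, throughput is assumed to scale linearly with batch size, and only the training run is counted as a fixed cost.

```python
# Illustrative figures from the chapter's running example.
NODE_COST_HR = 30.0            # $/hr for the node
BASE_TPS = 95.0                # tokens/sec for a single stream
PRICE_PER_TOKEN = 3.00 / 1e6   # $3 per 1M tokens
TRAIN_CAPEX = 100e6            # $ one-time training cost


def gross_margin_per_token(batch: int) -> float:
    """Price minus inference cost for one token (linear batch scaling assumed)."""
    tokens_per_hr = BASE_TPS * batch * 3600
    cost_per_token = NODE_COST_HR / tokens_per_hr
    return PRICE_PER_TOKEN - cost_per_token


def fully_loaded_profit(batch: int, tokens_sold: float) -> float:
    """Cumulative gross profit on tokens_sold, minus the training capex."""
    return gross_margin_per_token(batch) * tokens_sold - TRAIN_CAPEX


margin = gross_margin_per_token(40)
profit_10t = fully_loaded_profit(40, 10e12)   # after 10 trillion tokens
print(f"gross margin per token: ${margin:.2e}")
print(f"fully-loaded P&L after 10T tokens: ${profit_10t / 1e6:,.1f}M")
# gross margin per token: $8.07e-07
# fully-loaded P&L after 10T tokens: $-91.9M
```

Under these assumptions every token at batch 40 is sold at a profit, yet ten trillion of them still leave the business roughly $92M short of repaying the training run.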
04 The math
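Cost per token is hourly hardware cost over hourly throughput, where throughput is modeled as scaling linearly with the batch:

$$c_{\text{tok}}(B) = \frac{C_{\text{hr}}}{3600 \, r_{1} \, B}$$

Here $C_{\text{hr}}$ is the node cost in dollars per hour, $r_{1}$ is the single-stream throughput in tokens per second, and $B$ is the batch size. Gross margin per token is just price minus cost; the fully-loaded break-even volume $V^{*}$ is the training capex $K_{\text{train}}$ divided by that margin:

$$m_{\text{tok}} = p_{\text{tok}} - c_{\text{tok}}, \qquad V^{*} = \frac{K_{\text{train}}}{m_{\text{tok}}}$$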
The numbers are sobering. At batch 40 the margin is ~\$0.81 per million tokens, or $\$8.1\times10^{-7}$ each, so repaying \$100M in training takes $V^{*} \approx 1.24\times10^{14}$ tokens: 124 trillion. And $V^{*}$ moves the wrong way twice: as competition pushes $p_{\text{tok}}$ down, and as scaling pushes capex up. That widening gap between a shrinking margin and a growing fixed cost is the divergence the whole desk exists to measure.
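To make "moves the wrong way twice" concrete, here is a small sensitivity sketch. The baseline uses the chapter's figures at batch 40; the $2.50 price and the $200M capex are hypothetical scenarios chosen only for illustration, and `break_even_tokens` is an assumed helper name.

```python
# Inference cost at batch 40 with the chapter's figures: ~$2.19 per 1M tokens.
COST_1M = 30.0 / (95.0 * 40 * 3600 / 1e6)


def break_even_tokens(price_1m: float, capex: float) -> float:
    """V* = capex / (price - cost), with price and cost in $ per 1M tokens."""
    margin_1m = price_1m - COST_1M
    if margin_1m <= 0:
        return float("inf")   # a non-positive gross margin never repays capex
    return capex / (margin_1m / 1e6)


baseline = break_even_tokens(3.00, 100e6)
scenarios = [
    ("baseline ($3.00, $100M)", baseline),
    ("price cut ($2.50, $100M)", break_even_tokens(2.50, 100e6)),    # hypothetical
    ("capex doubles ($3.00, $200M)", break_even_tokens(3.00, 200e6)),  # hypothetical
    ("both ($2.50, $200M)", break_even_tokens(2.50, 200e6)),          # hypothetical
]
for label, v in scenarios:
    print(f"{label:<30} V* = {v:.2e} tokens ({v / baseline:.1f}x baseline)")
```

With these assumed inputs, a 50-cent price cut alone raises $V^{*}$ by about 2.6x, because the margin is thin to begin with; combined with a doubling of capex the break-even volume rises by about 5.3x.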
05 The code
Utilization decides whether a token makes money, and repaying the training run is a different order of magnitude.
token_economics.py
node_cost_hr = 30.0   # $/hr, 8-GPU node
base_tps = 95.0       # tokens/sec single stream (70B @ 4-bit)
price_1M = 3.00       # $ revenue per 1M tokens
train_capex = 100e6   # $ one-time training cost

def econ(batch):
    # Throughput is assumed to scale linearly with batch size.
    tok_per_hr = base_tps * batch * 3600
    cost_1M = node_cost_hr / (tok_per_hr / 1e6)
    return cost_1M, price_1M - cost_1M   # cost, margin per 1M

for b in (1, 8, 40):
    c, m = econ(b)
    print(f"batch {b:>2}: cost ${c:6.2f}/1M margin ${m:6.2f}/1M "
          f"-> {'profit' if m > 0 else 'LOSS'}")

# Break-even volume at batch 40: capex divided by margin per token.
_, m = econ(40)
print(f"tokens to clear ${train_capex:.0e} capex: {train_capex/(m/1e6):.2e}")

# batch  1: cost $ 87.72/1M margin $-84.72/1M -> LOSS
# batch  8: cost $ 10.96/1M margin $ -7.96/1M -> LOSS
# batch 40: cost $  2.19/1M margin $  0.81/1M -> profit
# tokens to clear $1e+08 capex: 1.24e+14   <- 124 trillion tokens
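The output shows a loss at batch 8 and a profit at batch 40, which raises the question of where the crossover sits. Setting price equal to cost and solving for the batch gives a rough answer. This is a sketch under the same linear-scaling assumption as the listing; `break_even_batch` and `margin_1M` are hypothetical helper names.

```python
import math

# Figures from the chapter's illustrative setup.
node_cost_hr = 30.0   # $/hr for the node
base_tps = 95.0       # tokens/sec, single stream
price_1M = 3.00       # $ revenue per 1M tokens


def break_even_batch(node_cost_hr, base_tps, price_1M):
    """Batch size at which price equals cost (linear batch scaling assumed)."""
    revenue_per_stream_hr = base_tps * 3600 / 1e6 * price_1M  # $ per hour from one stream
    return node_cost_hr / revenue_per_stream_hr


def margin_1M(batch):
    """Gross margin in $ per 1M tokens at a given batch size."""
    return price_1M - node_cost_hr / (base_tps * batch * 3600 / 1e6)


b_star = break_even_batch(node_cost_hr, base_tps, price_1M)
min_batch = math.ceil(b_star)   # smallest whole batch with a non-negative margin
print(f"break-even batch: {b_star:.2f} -> need batch >= {min_batch}")
print(f"margin at batch {min_batch - 1}: ${margin_1M(min_batch - 1):+.3f}/1M")
print(f"margin at batch {min_batch}: ${margin_1M(min_batch):+.3f}/1M")
```

With these inputs the crossover lands at a batch of roughly 29.2, so batch 30 is the first whole batch size at which a token clears its inference cost. A real serving stack would not scale perfectly linearly, so the true threshold is likely higher.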
06 The economics
The token → money
This chapter is the Circuit reduced to a single number you can hold. Everything upstream — the chips, the supply chain, the scaling laws, the clusters — exists to change the cost of a token. Everything downstream — the products, the agents, the enterprise deals — exists to raise the revenue from one. The business is the wedge between the two, multiplied by an almost unimaginable volume.
And the wedge is under attack from both sides. Revenue per token falls as competition and open weights commoditize intelligence; the capex per model rises as scaling demands more compute. A token that's gross-margin positive today can still leave a company far from repaying its fixed costs — and the finish line keeps moving away. That's not pessimism; it's the arithmetic.
So when someone claims AI is or isn't profitable, this is the calculation to demand. Which margin — gross or fully-loaded? At what utilization, what price, what capex? The honest answer is a spreadsheet, not a slogan — and building that spreadsheet, transparently, is precisely what an independent research desk is for.
07 Going deeper
Sequoia — AI's $600B Question · the revenue-vs-capex gap, framed by an investor.
SemiAnalysis — inference cost economics · cost-per-token teardown from the hardware up.
Epoch AI — Training cost of frontier models · the capex side of the equation.
a16z — The Economics of Generative AI · unit economics and margin structure.
Cite this chapter: Divergent Compute, "The economics of a token", First Principles, 2026. divergentcompute.com/first-principles-token-economics · v1.0 · CC-BY.