The economics of a token

Strip away the narrative and all of AI's economics reduce to one unit: the token. What it costs to produce, what it sells for, and how many you must sell to repay the training bill. This is the Circuit's central question, made arithmetic.


Two P&Ls hiding in one token

Every token has a cost and a price, and the gap between them is the entire business. But there are really two economics stacked inside it. The first is the gross one: does the revenue from a token exceed the inference cost of producing it? That depends almost entirely on utilization — a lightly-batched GPU makes every token a loss; a well-packed one makes it a profit.

The second is the fully-loaded one: even at a positive gross margin, you must sell enough tokens to repay the hundred-million-dollar training run and the hardware. Drag the batch size and watch a single token flip from a catastrophic loss to a thin profit — then see how many trillions it takes to reach the first real dollar:

[Interactive figure: per-token unit economics, with a batch-size (utilization) slider from 1 (idle GPU) to 64 (packed). Readouts: cost per 1M tokens, gross margin per 1M tokens, and tokens to repay capex; bars compare revenue, inference cost, and margin. Assumptions: 70B @ 4-bit, ~$30/hr node, $3 revenue per 1M tokens, $100M training capex. Illustrative but internally consistent.]

Where every number comes from

Cost per token. It's the hardware's hourly cost divided by the tokens it produces per hour. Throughput is set by everything in Part III — batching, quantization, and beating the memory wall. This is why utilization dominates: the same GPU, idle or packed, produces the same cost per hour but wildly different cost per token.
Revenue per token. What you charge, pressured downward by open-weight competition and the falling cost of "good enough" intelligence. Prices have fallen fast and keep falling.
Gross margin. Revenue minus inference cost per token. At low utilization it's deeply negative; batching is what carries it across zero. Most public arguments about "AI profitability" are really arguments about this one number.
The capex overhang. Beyond gross margin sits the fixed cost: training the model and buying the cluster. A positive per-token margin still has to be multiplied by an enormous volume to repay it, and the model may be obsolete before it does.

So "is AI profitable?" isn't one question. A token can be gross-margin positive and the company still deeply unprofitable, because the fixed costs are gigantic and the price per token keeps sliding. Both P&Ls have to work — and they're in tension.
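To put the capex overhang in operational terms, here is a rough sketch that converts the repayment into node-hours. It reuses the chapter's illustrative figures (\$30/hr node, 95 tokens/sec single stream, \$3 per 1M tokens, \$100M capex, batch 40). The fleet size of 1,000 nodes is purely a hypothetical chosen for illustration, and the sketch assumes every node stays packed and every token produced is sold.

```python
HOURS_PER_YEAR = 24 * 365

# Illustrative figures from the chapter.
node_cost_hr = 30.0    # $/hr per node
base_tps = 95.0        # tokens/sec, single stream
price_1m = 3.00        # $ revenue per 1M tokens
train_capex = 100e6    # $ one-time training cost
batch = 40             # well-packed node

# Hypothetical assumption, not from the source: size of the serving fleet.
fleet_nodes = 1_000

# Tokens one node produces per hour, assuming linear scaling with batch.
tokens_per_node_hr = base_tps * batch * 3600

# Gross profit one packed node earns per hour: revenue minus hardware cost.
margin_per_node_hr = price_1m * tokens_per_node_hr / 1e6 - node_cost_hr

# Node-hours of fully sold output needed to repay training.
node_hours = train_capex / margin_per_node_hr
years = node_hours / fleet_nodes / HOURS_PER_YEAR

print(f"gross profit per node-hour: ${margin_per_node_hr:.2f}")
print(f"node-hours to repay capex:  {node_hours:.2e}")
print(f"years for {fleet_nodes} nodes:       {years:.2f}")
```

Under these assumptions each packed node-hour contributes about \$11 of gross profit, so a 1,000-node fleet would need roughly a year of fully packed, fully sold operation to repay training alone. Any idle time or price cut stretches that further.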

Cost per token is hourly hardware cost over hourly throughput, where throughput is taken to scale linearly with the batch size $B$ and $r$ is the single-stream rate in tokens per second:

$$ c_{\text{tok}} = \frac{\text{cost}_{\text{hr}}}{\text{throughput}_{\text{hr}}}, \qquad \text{throughput}_{\text{hr}} = B \cdot r \cdot 3600 $$

Gross margin per token is just price minus cost; the fully-loaded break-even is the training capex divided by that margin:

$$ m_{\text{tok}} = p_{\text{tok}} - c_{\text{tok}}, \qquad V^{*} = \frac{\text{capex}}{m_{\text{tok}}} \;\; (\text{requires } m_{\text{tok}} > 0) $$
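Setting $m_{\text{tok}} = 0$ in the formulas above gives the batch size at which a token breaks even, $B^{*} = \text{cost}_{\text{hr}} / (p_{\text{tok}} \cdot r \cdot 3600)$, with $r$ the single-stream tokens per second. A minimal sketch, assuming the chapter's illustrative figures and perfectly linear throughput scaling (the helper name `break_even_batch` is hypothetical):

```python
import math

# Illustrative figures from the chapter.
NODE_COST_HR = 30.0   # $/hr for the node
BASE_TPS = 95.0       # tokens/sec for a single stream (r in the formula)
PRICE_1M = 3.00       # $ revenue per 1M tokens


def break_even_batch(cost_hr: float, tps: float, price_1m: float) -> float:
    """Batch size B at which cost per token equals price per token.

    Assumes throughput = B * r * 3600, i.e. linear scaling with batch.
    """
    price_per_token = price_1m / 1e6
    return cost_hr / (price_per_token * tps * 3600)


b_star = break_even_batch(NODE_COST_HR, BASE_TPS, PRICE_1M)
min_batch = math.ceil(b_star)  # smallest whole batch at or above break-even
print(f"break-even batch: {b_star:.2f} -> need batch >= {min_batch}")
# break-even batch: 29.24 -> need batch >= 30
```

With these inputs, anything below a batch of about 30 sells tokens at a loss. Real throughput rarely scales perfectly linearly, so the true threshold would likely sit higher.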

The numbers are sobering. At batch 40 the margin is ~\$0.81 per million tokens, or $\$8.1\times10^{-7}$ each, so repaying \$100M in training takes $V^{*} \approx 1.24\times10^{14}$ tokens: 124 trillion. And $V^{*}$ moves the wrong way twice: as competition pushes $p_{\text{tok}}$ down, and as scaling pushes capex up. That widening gap between a shrinking margin and a growing fixed cost is the divergence the whole desk exists to measure.

```python
node_cost_hr = 30.0    # $/hr, 8-GPU node
base_tps     = 95.0    # tokens/sec single stream (70B @ 4-bit)
price_1M     = 3.00    # $ revenue per 1M tokens
train_capex  = 100e6   # $ one-time training cost

def econ(batch):
    tok_per_hr = base_tps * batch * 3600
    cost_1M = node_cost_hr / (tok_per_hr / 1e6)
    return cost_1M, price_1M - cost_1M   # cost, margin per 1M

for b in (1, 8, 40):
    c, m = econ(b)
    print(f"batch {b:>2}: cost ${c:6.2f}/1M margin ${m:6.2f}/1M "
          f"-> {'profit' if m > 0 else 'LOSS'}")

_, m = econ(40)
print(f"tokens to clear ${train_capex:.0e} capex: {train_capex/(m/1e6):.2e}")

# batch  1: cost $ 87.72/1M margin $-84.72/1M -> LOSS
# batch  8: cost $ 10.96/1M margin $ -7.96/1M -> LOSS
# batch 40: cost $  2.19/1M margin $  0.81/1M -> profit
# tokens to clear $1e+08 capex: 1.24e+14   <- 124 trillion tokens
```
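The claim that $V^{*}$ moves the wrong way twice can be checked numerically. The sketch below recomputes the break-even volume at batch 40 under a few what-if scenarios. The \$2.50 and \$2.00 prices and the \$200M capex are hypothetical values picked for illustration, not forecasts, and `tokens_to_repay` is an illustrative helper name.

```python
def tokens_to_repay(price_1m, capex, batch=40, node_cost_hr=30.0, base_tps=95.0):
    """Break-even volume V* = capex / per-token margin.

    Returns None when the gross margin is zero or negative, since no
    volume can then repay the capex. Defaults are the chapter's figures.
    """
    tok_per_hr = base_tps * batch * 3600
    cost_1m = node_cost_hr / (tok_per_hr / 1e6)
    margin_1m = price_1m - cost_1m
    if margin_1m <= 0:
        return None
    return capex / (margin_1m / 1e6)


# (price per 1M tokens, training capex); all but the baseline are hypothetical.
scenarios = {
    "baseline ($3.00, $100M)": (3.00, 100e6),
    "price falls to $2.50": (2.50, 100e6),
    "capex doubles to $200M": (3.00, 200e6),
    "both at once": (2.50, 200e6),
    "price falls to $2.00": (2.00, 100e6),
}

for label, (price, capex) in scenarios.items():
    v = tokens_to_repay(price, capex)
    shown = f"{v:.2e} tokens" if v is not None else "never (margin <= 0)"
    print(f"{label:<26} -> {shown}")
```

Because the margin is thin, the result is far more sensitive to price than to capex. Under these assumptions a 17% price cut multiplies the required volume by about 2.6, more than doubling capex does, and at \$2.00 the margin turns negative, so no volume repays the training run.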

The whole thesis, in one unit

The token → money

This chapter is the Circuit reduced to a single number you can hold. Everything upstream — the chips, the supply chain, the scaling laws, the clusters — exists to change the cost of a token. Everything downstream — the products, the agents, the enterprise deals — exists to raise the revenue from one. The business is the wedge between the two, multiplied by an almost unimaginable volume.

And the wedge is under attack from both sides. Revenue per token falls as competition and open weights commoditize intelligence; the capex per model rises as scaling demands more compute. A token that's gross-margin positive today can still leave a company far from repaying its fixed costs — and the finish line keeps moving away. That's not pessimism; it's the arithmetic.

So when someone claims AI is or isn't profitable, this is the calculation to demand. Which margin — gross or fully-loaded? At what utilization, what price, what capex? The honest answer is a spreadsheet, not a slogan — and building that spreadsheet, transparently, is precisely what an independent research desk is for.
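As a starting point for that spreadsheet, here is a minimal sketch that takes the four questions as explicit inputs (utilization as batch size, price, capex, plus an assumed sales volume) and reports both margins side by side. The `Scenario` class and its method names are hypothetical, the defaults are the chapter's illustrative figures, and the 100-trillion-token volume is an assumption chosen only to show the two P&Ls disagreeing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    batch: int                  # utilization proxy: concurrent streams per node
    price_1m: float             # $ revenue per 1M tokens
    capex: float                # $ one-time training cost
    node_cost_hr: float = 30.0  # $/hr, chapter's illustrative node
    base_tps: float = 95.0      # tokens/sec single stream, chapter's figure

    def cost_1m(self) -> float:
        """Inference cost per 1M tokens, assuming linear batch scaling."""
        tokens_per_hr = self.base_tps * self.batch * 3600
        return self.node_cost_hr / (tokens_per_hr / 1e6)

    def gross_margin_1m(self) -> float:
        """Price minus inference cost, per 1M tokens."""
        return self.price_1m - self.cost_1m()

    def fully_loaded_profit(self, tokens_sold: float) -> float:
        """Gross profit on the tokens sold, minus the one-time capex."""
        return self.gross_margin_1m() * tokens_sold / 1e6 - self.capex


s = Scenario(batch=40, price_1m=3.00, capex=100e6)
sold = 1e14  # hypothetical lifetime volume: 100 trillion tokens

print(f"gross margin: ${s.gross_margin_1m():.2f} per 1M tokens")
print(f"fully loaded: ${s.fully_loaded_profit(sold) / 1e6:.1f}M "
      f"after {sold:.0e} tokens")
```

With these inputs the gross margin is positive (about \$0.81 per million tokens) while the fully-loaded result is still a loss of roughly \$19M. Both answers describe the same token, which is why a profitability claim needs to state which margin it means.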
