Divergent Compute.AI Economic Think Tank

First Principles / Part V · The frontier & the industry / Chapter 29

The economics of a token

Strip away the narrative and all of AI's economics reduce to one unit: the token. What it costs to produce, what it sells for, and how many you must sell to repay the training bill. This is the Circuit's central question, made arithmetic.

01 The answer, then the intuition

Two P&Ls hiding in one token

Every token has a cost and a price, and the gap between them is the entire business. But there are really two P&Ls stacked inside it. The first is the gross one: does the revenue from a token exceed the inference cost of producing it? That depends almost entirely on utilization: a lightly batched GPU makes every token a loss; a well-packed one makes it a profit.

The second is the fully-loaded one: even at a positive gross margin, you must sell enough tokens to repay the hundred-million-dollar training run and the hardware. As the batch size rises, a single token flips from a catastrophic loss to a thin profit, and it still takes trillions of tokens to reach the first real dollar:

Per-token unit economics (interactive chart, batch size as the control)

70B @ 4-bit, ~$30/hr node, $3 revenue per 1M tokens, $100M training capex. Illustrative but internally consistent.

Readouts: cost / 1M tokens, gross margin / 1M, tokens to repay capex. Bars: revenue, inference cost, margin. Batch size (utilization) ranges from 1 (idle GPU) to 64 (packed).

02 Mechanics

Where every number comes from

  • Cost per token. It's the hardware's hourly cost divided by the tokens it produces per hour. Throughput is set by everything in Part III — batching, quantization, and beating the memory wall. This is why utilization dominates: the same GPU, idle or packed, produces the same cost per hour but wildly different cost per token.
  • Revenue per token. What you charge, pressured downward by open-weight competition and the falling cost of "good enough" intelligence. Prices have fallen fast and keep falling.
  • Gross margin. Revenue minus inference cost per token. At low utilization it's deeply negative; batching is what carries it across zero. Most public arguments about "AI profitability" are really arguments about this one number.
  • The capex overhang. On top of the per-token cost sits the fixed cost: training the model and buying the cluster. A positive per-token margin still has to be multiplied by an enormous volume to repay it, and the model may be obsolete before it does.

So "is AI profitable?" isn't one question. A token can be gross-margin positive and the company still deeply unprofitable, because the fixed costs are gigantic and the price per token keeps sliding. Both P&Ls have to work — and they're in tension.
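
The point where the gross P&L crosses zero can be computed directly. The following is a minimal sketch, not a definitive model: it reuses the chapter's illustrative figures (a 30-dollar-per-hour node, 95 tokens/sec single stream, 3 dollars per 1M tokens) and assumes throughput scales perfectly linearly with batch, which real hardware does not do. The name `break_even_batch` is hypothetical.

```python
import math

NODE_COST_HR = 30.0   # $/hr for the node (chapter's illustrative figure)
BASE_TPS = 95.0       # tokens/sec for a single stream (illustrative)
PRICE_1M = 3.00       # $ revenue per 1M tokens (illustrative)


def break_even_batch(node_cost_hr, base_tps, price_1m):
    """Return (exact, whole) batch size at which revenue covers inference cost.

    Assumes throughput = base_tps * batch, i.e. perfectly linear scaling.
    """
    # Revenue earned per hour by one stream at the given price.
    revenue_per_stream_hr = base_tps * 3600 * price_1m / 1e6
    # Node cost is fixed per hour, so break-even is cost / revenue-per-stream.
    exact = node_cost_hr / revenue_per_stream_hr
    return exact, math.ceil(exact)


exact, whole = break_even_batch(NODE_COST_HR, BASE_TPS, PRICE_1M)
print(f"break-even batch: {exact:.2f} -> first whole batch at or above it: {whole}")
```

Under these assumptions the crossover lands just above batch 29, so batch 30 is the first setting where a token stops losing money. Any drop in price pushes that threshold higher.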

04 The math

Cost, margin, and the first dollar

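Cost per token is hourly hardware cost over hourly throughput, where throughput scales with the batch size $B$ and the single-stream generation rate $r$ in tokens per second (linear scaling in $B$ is the simplification used throughout):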

$$ c_{\text{tok}} = \frac{\text{cost}_{\text{hr}}}{\text{throughput}_{\text{hr}}}, \qquad \text{throughput}_{\text{hr}} = B \cdot r \cdot 3600 $$

Gross margin per token is just price minus cost; the fully-loaded break-even is the training capex divided by that margin:

$$ m_{\text{tok}} = p_{\text{tok}} - c_{\text{tok}}, \qquad V^{*} = \frac{\text{capex}}{m_{\text{tok}}} \;\; (\text{requires } m_{\text{tok}} > 0) $$

The numbers are sobering. At batch 40 the margin is ~\$0.81 per million tokens, or $\$8.1\times10^{-7}$ each, so repaying \$100M in training takes $V^{*} \approx 1.24\times10^{14}$ tokens: 124 trillion. And $V^{*}$ moves the wrong way twice: as competition pushes $p_{\text{tok}}$ down, and as scaling pushes capex up. That widening gap between a shrinking margin and a growing fixed cost is the divergence the whole desk exists to measure.
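
The double squeeze on $V^{*}$ can be made concrete with a short sensitivity check. This is a sketch under the chapter's illustrative batch-40 cost; every scenario other than the baseline is hypothetical, chosen only to show the direction and rough size of the effect, and `tokens_to_repay` is an illustrative helper name.

```python
def tokens_to_repay(capex, price_1m, cost_1m):
    """Break-even volume V* = capex / per-token margin; None if margin <= 0."""
    margin_per_token = (price_1m - cost_1m) / 1e6
    if margin_per_token <= 0:
        return None  # no volume repays capex at a non-positive margin
    return capex / margin_per_token


# Chapter's illustrative batch-40 cost: $30/hr over 95 * 40 * 3600 tokens/hr.
COST_1M = 30.0 / (95.0 * 40 * 3600 / 1e6)

# Hypothetical scenarios: (label, price per 1M tokens, training capex).
scenarios = [
    ("baseline", 3.00, 100e6),
    ("price falls to $2.50", 2.50, 100e6),
    ("capex doubles", 3.00, 200e6),
    ("both", 2.50, 200e6),
    ("price falls to $2.00", 2.00, 100e6),
]

for label, price, capex in scenarios:
    v = tokens_to_repay(capex, price, COST_1M)
    shown = "never (margin <= 0)" if v is None else f"{v:.2e} tokens"
    print(f"{label:<22} {shown}")
```

Under these assumptions, a price cut from \$3.00 to \$2.50 multiplies $V^{*}$ by roughly 2.6, because the margin shrinks far faster than the price does. At \$2.00 the margin turns negative and no volume repays the capex.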

05 The code

A token's P&L, batch by batch

Utilization decides whether a token makes money — and repaying the training run is a different order of magnitude.

token_economics.py

node_cost_hr = 30.0     # $/hr, 8-GPU node
base_tps     = 95.0     # tokens/sec single stream (70B @ 4-bit)
price_1M     = 3.00     # $ revenue per 1M tokens
train_capex  = 100e6    # $ one-time training cost

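def econ(batch):
    """Return (cost, gross margin) in $ per 1M tokens at a given batch size.

    Assumes throughput scales linearly with batch, an idealization.
    """
    tok_per_hr = base_tps * batch * 3600          # tokens the node emits per hour
    cost_1M = node_cost_hr / (tok_per_hr / 1e6)   # $ per 1M tokens
    return cost_1M, price_1M - cost_1M            # cost, margin per 1M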

for b in (1, 8, 40):
    c, m = econ(b)
    print(f"batch {b:>2}: cost ${c:6.2f}/1M  margin ${m:6.2f}/1M  "
          f"-> {'profit' if m > 0 else 'LOSS'}")

_, m = econ(40)
print(f"tokens to clear ${train_capex:.0e} capex: {train_capex/(m/1e6):.2e}")
# batch  1: cost $ 87.72/1M  margin $-84.72/1M  -> LOSS
# batch  8: cost $ 10.96/1M  margin $ -7.96/1M  -> LOSS
# batch 40: cost $  2.19/1M  margin $  0.81/1M  -> profit
# tokens to clear $1e+08 capex: 1.24e+14   <- 124 trillion tokens
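
A token count that large is easier to judge as serving time. The rough sketch below reuses the same illustrative constants and converts the break-even volume into node-years of fully packed serving. It assumes linear batch scaling and zero downtime, and it counts only the training capex, not the cluster purchase. `node_years_to_repay` is a hypothetical helper, not part of token_economics.py.

```python
NODE_COST_HR = 30.0    # $/hr per node (chapter's illustrative figure)
BASE_TPS = 95.0        # tokens/sec single stream (illustrative)
PRICE_1M = 3.00        # $ revenue per 1M tokens (illustrative)
TRAIN_CAPEX = 100e6    # $ one-time training cost (illustrative)
HOURS_PER_YEAR = 8760  # 365 days with no downtime: an optimistic assumption


def node_years_to_repay(batch):
    """Node-years of fully loaded serving needed to repay TRAIN_CAPEX."""
    tok_per_hr = BASE_TPS * batch * 3600
    revenue_hr = tok_per_hr / 1e6 * PRICE_1M
    margin_hr = revenue_hr - NODE_COST_HR      # gross margin per node-hour
    if margin_hr <= 0:
        return None                            # a loss-making node never repays
    return TRAIN_CAPEX / margin_hr / HOURS_PER_YEAR


for b in (8, 40, 64):
    years = node_years_to_repay(b)
    shown = "never" if years is None else f"{years:,.0f} node-years"
    print(f"batch {b:>2}: {shown}")
```

At batch 40 this comes to roughly 1,034 node-years, which is the same as about a thousand such nodes running fully packed for a year. Packing to batch 64 cuts it to about 320 node-years under the same idealized scaling.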

06 The economics

The whole thesis, in one unit

The token → money

This chapter is the Circuit reduced to a single number you can hold. Everything upstream — the chips, the supply chain, the scaling laws, the clusters — exists to change the cost of a token. Everything downstream — the products, the agents, the enterprise deals — exists to raise the revenue from one. The business is the wedge between the two, multiplied by an almost unimaginable volume.

And the wedge is under attack from both sides. Revenue per token falls as competition and open weights commoditize intelligence; the capex per model rises as scaling demands more compute. A token that's gross-margin positive today can still leave a company far from repaying its fixed costs — and the finish line keeps moving away. That's not pessimism; it's the arithmetic.

So when someone claims AI is or isn't profitable, this is the calculation to demand. Which margin — gross or fully-loaded? At what utilization, what price, what capex? The honest answer is a spreadsheet, not a slogan — and building that spreadsheet, transparently, is precisely what an independent research desk is for.

07 Going deeper

The primary sources

Sequoia — AI's $600B Question · the revenue-vs-capex gap, framed by an investor.
SemiAnalysis — inference cost economics · cost-per-token teardown from the hardware up.
Epoch AI — Training cost of frontier models · the capex side of the equation.
a16z — The Economics of Generative AI · unit economics and margin structure.

Cite this chapter: Divergent Compute, "The economics of a token", First Principles, 2026. divergentcompute.com/first-principles-token-economics · v1.0 · CC-BY.
