Choosing a model

The most common and most expensive mistake in applied AI is reaching for the best model when you needed the cheapest one that clears the bar. The right rule is nearly the opposite of the instinct: match the task, don't max the model.


Cheapest that clears the bar

Every task has a quality threshold — the level below which the output is useless and above which extra capability is wasted. Classifying a support ticket needs far less than drafting a legal brief. The job isn't to find the smartest model; it's to find the cheapest model whose capability sits just above your task's threshold, verified with an eval, not a hunch.
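One way to turn "verified with an eval" into something checkable is a pass-rate test over labeled examples. The sketch below is a minimal illustration, not a prescribed harness: the example set, the `toy_model` stand-in, and the 0.90 threshold are all hypothetical, and a real eval would call an actual model and use far more examples.

```python
from typing import Callable, List, Tuple

# Hypothetical labeled examples: (input text, expected label).
EVAL_SET: List[Tuple[str, str]] = [
    ("I was charged twice this month", "billing"),
    ("The app crashes when I open settings", "bug"),
    ("How do I export my data?", "how-to"),
    ("Refund my last invoice please", "billing"),
]

def pass_rate(model: Callable[[str], str], examples: List[Tuple[str, str]]) -> float:
    """Fraction of examples where the model's output equals the expected label."""
    hits = sum(1 for text, expected in examples if model(text) == expected)
    return hits / len(examples)

def clears_bar(model: Callable[[str], str], examples, threshold: float) -> bool:
    """True when the measured pass rate meets or exceeds the task's threshold."""
    return pass_rate(model, examples) >= threshold

# Stand-in "model": a keyword rule, used only so the sketch runs without an API.
def toy_model(text: str) -> str:
    lowered = text.lower()
    if "charged" in lowered or "refund" in lowered:
        return "billing"
    if "crash" in lowered:
        return "bug"
    return "how-to"

rate = pass_rate(toy_model, EVAL_SET)
print(f"pass rate {rate:.2f}, clears bar: {clears_bar(toy_model, EVAL_SET, 0.90)}")
```

Running the same `clears_bar` check against each candidate tier, cheapest first, and stopping at the first one that passes is the rule above in executable form.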

Because model cost varies by ~50× across tiers while capability varies far less, this one decision dominates your economics. Move the quality bar and the pick jumps between tiers, always landing on the cheapest model above the line:

The model map — cheapest that clears your bar

[Interactive figure: illustrative tiers plotted by cost ($/1M tokens) against relative capability, with an adjustable quality threshold (required capability, shown at 70) that highlights the cheapest tier above the line.]

The axes that actually decide it

Capability vs the threshold. Measure it on your task with an eval, not a public leaderboard — benchmarks rarely match your workload. The only capability that matters is whether it clears your bar.
Cost per token. The 50× spread across tiers is the biggest lever on margin. This is where "just use the frontier for everything" quietly bankrupts a product at scale.
Latency, context, modality. Interactive UX needs a fast model; long-document work needs a big context window; image or audio input needs multimodality. Each can rule tiers in or out regardless of raw capability.
Open vs closed. Open-weight models can be self-hosted: data stays private and the cost is your hardware. Closed models are reached through an API: best frontier quality, a per-token price, and data that leaves your walls. Privacy and control often decide this before capability does.
Routing & cascades. You don't have to pick one. Send every request to a cheap model first, and escalate only the hard ones to a frontier model. A good router captures most of the frontier's quality at a fraction of its cost.

Put together, model choice is a constrained optimization: minimize cost subject to clearing quality, latency, context, and privacy. The instinct to grab the best model maximizes capability instead and ignores the one quantity the problem asks you to minimize, which is cost.
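The constrained optimization described above can be sketched as a filter followed by a minimum. In this sketch the cost and capability figures are the article's illustrative tiers; the latency, context-window, and self-hosting values are hypothetical placeholders added only to exercise the other constraints.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Model:
    name: str
    cost: float          # $ per 1M tokens (illustrative tiers)
    capability: int      # relative score, 0-100 (illustrative tiers)
    latency_ms: int      # hypothetical typical latency
    context_k: int       # hypothetical context window, thousands of tokens
    self_hostable: bool  # True for open-weight models

CATALOG = [
    Model("small-open", 0.10, 55, 200, 32, True),
    Model("mid", 0.60, 72, 400, 128, False),
    Model("frontier-open", 1.50, 82, 900, 128, True),
    Model("frontier-closed", 5.00, 92, 1200, 200, False),
]

def pick(quality: int,
         max_latency_ms: Optional[int] = None,
         min_context_k: int = 0,
         must_self_host: bool = False) -> Optional[Model]:
    """Cheapest model satisfying every constraint, or None if none qualifies."""
    feasible = [
        m for m in CATALOG
        if m.capability >= quality
        and (max_latency_ms is None or m.latency_ms <= max_latency_ms)
        and m.context_k >= min_context_k
        and (m.self_hostable or not must_self_host)
    ]
    return min(feasible, key=lambda m: m.cost, default=None)

for kwargs in ({"quality": 70},
               {"quality": 70, "must_self_host": True},
               {"quality": 85, "max_latency_ms": 500}):
    chosen = pick(**kwargs)
    print(kwargs, "->", chosen.name if chosen else None)
```

With these placeholder numbers, a privacy requirement moves the pick at quality 70 from the mid tier to the open frontier tier, and a tight latency budget at quality 85 leaves no feasible model at all, which is a signal to relax a constraint rather than to overspend.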

The rule, and the router

The selection rule is a one-line optimization. Pick the cheapest model that clears the quality bar $q$:

$$ \text{pick} = \operatorname*{arg\,min}_{i}\; \text{cost}_i \quad \text{subject to}\quad \text{capability}_i \geq q $$

Routing does better than any single pick. Send everything to a cheap model at cost $c_{\text{cheap}}$, and escalate a fraction $p$ of hard cases to the frontier at $c_{\text{frontier}}$. Expected cost per request:

$$ \mathbb{E}[\text{cost}] = c_{\text{cheap}} + p \cdot c_{\text{frontier}} $$

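With a mid model at \$0.60 and a frontier at \$5.00 per 1M tokens, and only 20% of traffic escalating, that's $0.60 + 0.2\times5.00 = \$1.60$ per 1M tokens, treating requests as equal in size. That is 68% cheaper than sending everything to the frontier, while the escalated hard cases still get top quality. Escalated requests pay for both calls, which is why $c_{\text{cheap}}$ is not weighted by $1-p$. The whole art is a good escalation signal: knowing which requests actually need the expensive model, and letting the rest ride the cheap one.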
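Rearranging the expected-cost formula shows how much escalation a router can afford before it stops paying off. This is a short derivation under the same assumption as the formula itself, namely that every request pays for the cheap call:

$$ c_{\text{cheap}} + p \cdot c_{\text{frontier}} < c_{\text{frontier}} \iff p < 1 - \frac{c_{\text{cheap}}}{c_{\text{frontier}}} $$

With the illustrative prices, $1 - 0.60/5.00 = 0.88$, so routing stays cheaper than always-frontier as long as fewer than 88% of requests escalate. The fractional saving is the gap between that break-even rate and the actual one:

$$ \text{saving} = 1 - \frac{c_{\text{cheap}} + p \cdot c_{\text{frontier}}}{c_{\text{frontier}}} = \left(1 - \frac{c_{\text{cheap}}}{c_{\text{frontier}}}\right) - p = 0.88 - 0.20 = 0.68 $$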

Pick, and route

```python
models = {  # $ per 1M tokens, relative capability 0-100
    "small-open": (0.10, 55),
    "mid": (0.60, 72),
    "frontier-open": (1.50, 82),
    "frontier-closed": (5.00, 92),
}

def cheapest_clearing(q):
    ok = {n: c for n, (c, cap) in models.items() if cap >= q}
    return min(ok, key=lambda n: models[n][0]) if ok else None

for q in (50, 70, 85):
    print(f"quality bar {q}: {cheapest_clearing(q)}")
# quality bar 50: small-open | 70: mid | 85: frontier-closed

# routing: cheap first, escalate 20% of hard cases to frontier
route = 0.60 + 0.20 * 5.00
print(f"always-frontier $5.00 vs route $ {route:.2f} "
      f"-> {(1-route/5.0)*100:.0f}% cheaper")
# always-frontier $5.00 vs route $ 1.60 -> 68% cheaper
```
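The routing arithmetic takes the 20% escalation rate as given. A cascade also needs an escalation signal to decide which requests those are. The sketch below assumes one possible signal, a confidence score returned by the cheap model, with a hypothetical 0.8 cutoff. The `toy_cheap` and `toy_frontier` functions are stand-ins so the example runs without any API, and costs are counted in the article's $/1M-token units as if every request were the same size.

```python
from typing import Callable, Tuple

# $ per 1M tokens, from the illustrative tiers (mid and frontier-closed).
CHEAP_COST, FRONTIER_COST = 0.60, 5.00

def cascade(request: str,
            cheap: Callable[[str], Tuple[str, float]],
            frontier: Callable[[str], str],
            min_confidence: float = 0.8) -> Tuple[str, float]:
    """Try the cheap model first; escalate when its confidence is too low.

    Returns (answer, cost). Escalated requests pay for both calls.
    """
    answer, confidence = cheap(request)
    cost = CHEAP_COST
    if confidence < min_confidence:
        answer = frontier(request)
        cost += FRONTIER_COST
    return answer, cost

# Stand-in models (hypothetical): the cheap one is unsure about contract questions.
def toy_cheap(request: str) -> Tuple[str, float]:
    hard = "contract" in request
    return "cheap answer", (0.4 if hard else 0.95)

def toy_frontier(request: str) -> str:
    return "frontier answer"

requests = ["reset my password"] * 8 + ["review this contract clause"] * 2
results = [cascade(r, toy_cheap, toy_frontier) for r in requests]
average = sum(cost for _, cost in results) / len(results)
print(f"average cost per request: {average:.2f}")
```

Two of the ten stand-in requests escalate, so the measured average reproduces the 1.60 figure from the formula. In practice the quality of the signal matters more than the cutoff value: a signal that misses hard cases lowers quality, and one that escalates too eagerly erodes the saving.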

The decision that makes or breaks the margin

The choice → money

Model selection is the single highest-leverage decision on an AI product's unit economics. Because cost varies ~50× across tiers, using a frontier model where a mid one would do can multiply your bill by an order of magnitude for quality nobody needed — the classic way a promising AI feature quietly loses money. The discipline of "cheapest that clears the bar," verified by evals, is worth more to a P&L than almost any prompt tweak.
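To make the order-of-magnitude claim concrete, the sketch below prices the same traffic on each illustrative tier. The monthly volume of 2 billion tokens is a hypothetical figure chosen only for the arithmetic; the per-token prices are the article's illustrative ones.

```python
# $ per 1M tokens, from the illustrative tiers.
PRICE_PER_M = {
    "small-open": 0.10,
    "mid": 0.60,
    "frontier-open": 1.50,
    "frontier-closed": 5.00,
}

TOKENS_PER_MONTH = 2_000_000_000  # hypothetical volume

def monthly_bill(tokens: int, price_per_million: float) -> float:
    """Monthly spend in dollars for a given token volume and tier price."""
    return tokens / 1_000_000 * price_per_million

bills = {name: monthly_bill(TOKENS_PER_MONTH, price)
         for name, price in PRICE_PER_M.items()}

for name, bill in bills.items():
    print(f"{name}: ${bill:,.0f}/month")

ratio = bills["frontier-closed"] / bills["mid"]
print(f"frontier-closed costs {ratio:.1f}x the mid tier for the same traffic")
```

At that assumed volume the same workload costs about $1,200 a month on the mid tier and $10,000 on the closed frontier tier, roughly an 8x difference for quality the task may not need.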

Routing compounds the win. A cascade that sends easy requests to a cheap model and escalates only the hard ones captures most of the frontier's quality at a fraction of the price — the ~68% saving above is typical, not exceptional. It's the applied-AI version of the whole open-vs-closed market dynamic, run inside a single product.

For the Circuit, this is the demand side made rational. As open-weight models keep raising the capability you can get cheaply, the quality bar that requires a frontier model rises too — squeezing the premium tier's addressable work. The everyday choice of which model to call, made by millions of developers, is quietly one of the forces deciding whether the frontier's economics hold.
