First Principles / Part IV · Building with AI / Chapter 23
You've now met three ways to bend a model to your needs: prompting changes its context, RAG feeds it knowledge, and fine-tuning changes its weights. Choosing well is most of applied AI — and the rule is simpler than it looks.
01 The answer, then the intuition
The whole decision hinges on one question: are you adding knowledge or shaping behavior? If the model needs facts it doesn't have — current, private, or too large to memorize — that's RAG. If it needs to act a certain way — a format, a tone, a skill — that's prompting for small changes, or fine-tuning when you need it consistent at scale.
Start at the cheapest rung and climb only when you must: prompt first, add RAG when knowledge is the gap, fine-tune when behavior must be baked in. They also compose: many serious systems fine-tune for style, use RAG for facts, and prompt for the task, all at once.
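The rule above can be sketched as a small function. This is a minimal illustration, not a definitive policy: the `Need` fields and the `choose` function are hypothetical names introduced here, and the boolean inputs stand in for judgments a team would make from its own evaluations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Need:
    """Hypothetical description of what the model is missing."""
    missing_knowledge: bool    # facts that are current, private, or too large to memorize
    behavior_change: bool      # a format, a tone, a skill
    consistent_at_scale: bool  # behavior must hold reliably across high volume


def choose(need: Need) -> list[str]:
    """Return the techniques the rule suggests, cheapest first."""
    tools = ["prompt"]  # always start at the cheapest rung
    if need.missing_knowledge:
        tools.append("rag")  # knowledge gap: retrieve the facts
    if need.behavior_change and need.consistent_at_scale:
        tools.append("fine-tune")  # behavior must be baked in
    return tools


print(choose(Need(False, True, False)))  # ['prompt']
print(choose(Need(True, False, False)))  # ['prompt', 'rag']
print(choose(Need(True, True, True)))    # ['prompt', 'rag', 'fine-tune']
```

Returning a list rather than a single label reflects the point that the techniques compose: a system can land on more than one rung at once.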
02 Mechanics
The failure mode to avoid is reaching for the heavy tool first. Teams routinely try to fine-tune away a problem that a better prompt or a retrieval step would solve faster and cheaper. Climb the ladder; don't jump to the top.
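One way to make "climb the ladder" concrete is to score each rung against the same evaluation set and stop at the first one that clears the quality bar. The sketch below assumes such scores already exist; the rung names, the `first_passing_rung` helper, and the example scores are all hypothetical.

```python
from typing import Optional

# Rungs ordered from cheapest to most expensive.
LADDER = ["prompt", "rag", "fine-tune"]


def first_passing_rung(scores: dict[str, float], bar: float) -> Optional[str]:
    """Return the cheapest rung whose eval score meets the bar, else None.

    Rungs missing from `scores` are treated as not yet tried and skipped.
    """
    for rung in LADDER:
        if rung in scores and scores[rung] >= bar:
            return rung
    return None


# Hypothetical eval results on a shared test set (fraction of cases passed).
scores = {"prompt": 0.71, "rag": 0.93, "fine-tune": 0.95}
print(first_passing_rung(scores, bar=0.90))  # rag
print(first_passing_rung(scores, bar=0.99))  # None
```

In this made-up case fine-tuning scores slightly higher, but retrieval already clears the bar, so the heavier tool is not needed.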
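04 The math
The clearest quantitative case is cost. Prompting pays a per-call premium (a long prompt every time); fine-tuning pays upfront but shortens every prompt afterward. Let $c_p$ be the per-call cost with the long prompt, $c_f$ the per-call cost after fine-tuning, and $F$ the one-time training cost. Over $M$ calls:

$$C_{\text{prompt}}(M) = c_p M, \qquad C_{\text{ft}}(M) = F + c_f M$$

They cross where the upfront cost is repaid by the per-call savings:

$$M^{*} = \frac{F}{c_p - c_f}$$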
Below $M^{*}$, prompting is cheaper; above it, fine-tuning wins. With a \$600 training cost and a \$0.06 per-call saving, $M^{*} = 600/0.06 = 10{,}000$ calls. So low-volume or exploratory work should stay on prompts; only steady, high-volume traffic justifies the fixed cost. (This axis ignores knowledge and freshness — where RAG wins regardless of volume.)
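The per-call costs in this comparison usually come from prompt length. As a rough sketch under stated assumptions, the snippet below derives them from token counts. The `per_call_cost` helper, the price, and the token counts are hypothetical, chosen only to reproduce the \$0.06 saving used above, and output tokens are ignored for simplicity.

```python
PRICE_PER_1K_INPUT_TOKENS = 0.01  # $ per 1,000 input tokens (hypothetical)


def per_call_cost(prompt_tokens: int) -> float:
    """Input-side cost of one call; output tokens are ignored here."""
    return prompt_tokens / 1000 * PRICE_PER_1K_INPUT_TOKENS


# Hypothetical prompt sizes: few-shot examples on every call vs. none after fine-tuning.
long_prompt = per_call_cost(9_000)
short_prompt = per_call_cost(3_000)
saving = long_prompt - short_prompt

training_cost = 600.0  # $ one-time, as in the worked example
break_even = training_cost / saving

print(f"saving per call: ${saving:.2f}")       # saving per call: $0.06
print(f"break-even: {break_even:,.0f} calls")  # break-even: 10,000 calls
```

Seen this way, the break-even moves with anything that changes the token gap: a shorter few-shot prompt shrinks the saving and pushes $M^{*}$ up.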
05 The code
The call volume at which fine-tuning's upfront cost is repaid by shorter prompts.
breakeven.py

    import math

    prompt_per_call = 0.09   # $/call: long few-shot prompt every time
    ft_per_call = 0.03       # $/call: short prompt, behavior baked in
    ft_upfront = 600.0       # $ one-time training cost

    # Break-even volume: upfront cost divided by the per-call saving.
    M_star = ft_upfront / (prompt_per_call - ft_per_call)
    print(f"break-even at {M_star:,.0f} calls")

    for M in [5_000, 10_000, 50_000]:
        p = prompt_per_call * M
        f = ft_upfront + ft_per_call * M
        # isclose avoids a brittle exact float comparison at the tie.
        winner = "tie" if math.isclose(f, p) else "fine-tune" if f < p else "prompt"
        print(f"{M:>6,} calls: prompt ${p:,.0f} fine-tune ${f:,.0f} -> {winner}")

    # break-even at 10,000 calls
    #  5,000 calls: prompt $450 fine-tune $750 -> prompt
    # 10,000 calls: prompt $900 fine-tune $900 -> tie
    # 50,000 calls: prompt $4,500 fine-tune $2,100 -> fine-tune
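Whether traffic counts as steady and high-volume is often easier to judge as a payback time. The sketch below is an illustration rather than part of breakeven.py: it converts the break-even volume into days at an assumed daily call rate. The `payback_days` helper and the traffic figures are hypothetical.

```python
from typing import Optional


def payback_days(upfront: float, prompt_cost: float, ft_cost: float,
                 calls_per_day: float) -> Optional[float]:
    """Days until fine-tuning's upfront cost is repaid, or None if it never is."""
    saving = prompt_cost - ft_cost
    if saving <= 0 or calls_per_day <= 0:
        # No per-call saving (or no traffic): the fixed cost is never recovered.
        return None
    return upfront / saving / calls_per_day


# Same costs as the example above; daily volumes are hypothetical.
for calls_per_day in [50, 500, 5_000]:
    days = payback_days(600.0, 0.09, 0.03, calls_per_day)
    print(f"{calls_per_day:>5,} calls/day -> pays back in {days:,.0f} days")

#    50 calls/day -> pays back in 200 days
#   500 calls/day -> pays back in 20 days
# 5,000 calls/day -> pays back in 2 days
```

The `None` branch covers the case the formula hides: if fine-tuning does not actually shorten the prompt, there is no break-even at any volume.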
06 The economics
The choice → money
This chapter is where AI strategy meets a spreadsheet. Prompting is pure operating cost: cheap to start, but you pay the premium on every call for as long as the feature runs. Fine-tuning is capital cost: a fixed investment that lowers the marginal cost of each call afterward. RAG is infrastructure: a system you build once that keeps answers current and citable as the world changes. Choosing among them is a classic fixed-versus-variable cost decision, priced by volume, freshness, and the cost of being wrong.
Get it wrong in the expensive direction — fine-tuning a model for a low-volume task, or stuffing a giant prompt into millions of calls — and margins evaporate. Most of the difference between an AI feature that's profitable and one that quietly loses money is this choice, made well.
For the Circuit, it's the demand side in microcosm: the same token costs that make the build-out expensive also discipline how businesses actually use AI. The economics don't just live in the data center — they reach all the way down to which of these three levers a team pulls for each feature.
07 Going deeper
Gao et al. (2023) — RAG for LLMs: A Survey · when retrieval beats parametric knowledge.
Hu et al. (2021) — LoRA · low-cost fine-tuning that shifts the break-even.
Brown et al. (2020) — GPT-3 · in-context learning as the cheap default.
Anthropic — start simple, add complexity only when needed · the climb-the-ladder principle.
Cite this chapter: Divergent Compute, "RAG vs fine-tune vs prompt", First Principles, 2026. divergentcompute.com/first-principles-adaptation · v1.0 · CC-BY.