Divergent Compute.AI Economic Think Tank

First Principles / Part IV · Building with AI / Chapter 23

RAG vs fine-tune vs prompt

You've now met three ways to bend a model to your needs: prompting changes its context, RAG feeds it knowledge, and fine-tuning changes its weights. Choosing well is most of applied AI — and the rule is simpler than it looks.

01 The answer, then the intuition

Knowledge or behavior?

The whole decision hinges on one question: are you adding knowledge or shaping behavior? If the model needs facts it doesn't have — current, private, or too large to memorize — that's RAG. If it needs to act a certain way — a format, a tone, a skill — that's prompting for small changes, or fine-tuning when you need it consistent at scale.

Start at the cheapest rung and climb only when you must: prompt first, add RAG when knowledge is the gap, fine-tune when behavior must be baked in. They also compose: many production systems fine-tune for style, use RAG for facts, and prompt for the task, all at once. The three options compare as follows:

Prompt: change the context
Instant, no data
Cheapest to start
Cost is per call
Can't add real knowledge

RAG: fetch knowledge
Current and private facts
Citations, auditable
Update instantly
Needs retrieval infra

Fine-tune: change the weights
Consistent behavior
Shorter prompts at scale
Upfront cost and data
Stale; no citations

03 Mechanics

What each one actually moves

Prompting changes only the context — nothing in the model moves. Best for formats, tone, and simple tasks; fastest to iterate. Its ceiling: it can't teach the model facts or skills it doesn't already have, and long prompts cost tokens every call.
RAG adds knowledge at query time by retrieving documents into the prompt. Best when the facts are current, private, large, or need citing. Its ceiling: it's only as good as retrieval, and it doesn't change how the model behaves, just what it knows in the moment.
Fine-tuning updates the weights on your examples, baking behavior in permanently. Best for consistent style/format at scale and for shortening prompts. Its ceiling: upfront cost and data, it goes stale (retrain to update), and it can't cite sources — so it's poor for fast-changing facts.
They compose. The three aren't rivals. A production assistant might be fine-tuned to speak in a brand voice, use RAG to ground answers in a live knowledge base, and be prompted per request for the specific task. Behavior, knowledge, and task — one from each.
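To make the composition concrete, here is a minimal sketch of how one request might draw on all three levers. Everything named in it is an assumption for illustration: the model identifier `assistant-brand-voice-v1` is hypothetical, the three-line knowledge base is toy data, and the word-overlap `retrieve` function stands in for a real retrieval system. No model is actually called; the sketch only assembles what would be sent.

```python
from dataclasses import dataclass

# Hypothetical identifier for a model fine-tuned on brand-voice examples.
FINE_TUNED_MODEL = "assistant-brand-voice-v1"

# Toy stand-in for a live knowledge base (made-up illustrative facts).
KNOWLEDGE_BASE = [
    "Refunds are accepted within 30 days of purchase.",
    "Standard shipping takes 3 to 5 business days.",
    "Support is available on weekdays from 9am to 5pm.",
]


@dataclass
class Request:
    model: str   # behavior: which weights answer
    prompt: str  # knowledge and task: what goes into the context


def retrieve(question: str, docs: list[str], k: int = 1) -> list[str]:
    """Toy retrieval: rank docs by how many words they share with the question."""
    q_words = set(question.lower().split())
    ranked = sorted(
        docs,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def build_request(question: str, task: str) -> Request:
    """Combine the three levers: fine-tuned model, retrieved facts, task prompt."""
    facts = retrieve(question, KNOWLEDGE_BASE)
    context = "\n".join(f"- {fact}" for fact in facts)
    prompt = f"{task}\n\nContext:\n{context}\n\nQuestion: {question}"
    return Request(model=FINE_TUNED_MODEL, prompt=prompt)


req = build_request(
    "How many days do refunds take?",
    "Answer in one sentence and cite the context.",
)
print(req.model)
print(req.prompt)
```

The split mirrors the paragraph above: the `model` field carries behavior, the retrieved context carries knowledge, and the task line carries the per-request instruction. Swapping any one of them leaves the other two untouched.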

The failure mode to avoid is reaching for the heavy tool first. Teams routinely try to fine-tune away a problem that a better prompt or a retrieval step would solve faster and cheaper. Climb the ladder; don't jump to the top.
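The ladder can be written down as a few lines of code. This is a sketch of the chapter's rule of thumb, not a definitive procedure: the function name `choose_techniques` and its three yes/no inputs are assumptions introduced here, and real decisions will weigh cost and volume as well.

```python
def choose_techniques(
    needs_new_facts: bool,
    needs_behavior_change: bool,
    prompt_is_enough: bool,
) -> list[str]:
    """Return the rungs to use, cheapest first (hypothetical helper)."""
    techniques = ["prompt"]  # always start on the cheapest rung
    if needs_new_facts:
        # Knowledge gap: current, private, or large facts call for retrieval.
        techniques.append("rag")
    if needs_behavior_change and not prompt_is_enough:
        # Behavior gap a prompt cannot close consistently: bake it into weights.
        techniques.append("fine-tune")
    return techniques


# A tone tweak that a prompt handles fine.
print(choose_techniques(False, True, True))   # ['prompt']
# Answers must reflect a private, changing knowledge base.
print(choose_techniques(True, False, True))   # ['prompt', 'rag']
# Brand voice at scale plus live facts: all three compose.
print(choose_techniques(True, True, False))   # ['prompt', 'rag', 'fine-tune']
```

Note that fine-tuning is only ever appended last, and only when the prompt has already been tried and found insufficient.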

04 The math

When fine-tuning pays for itself

The clearest quantitative case is cost. Prompting pays a per-call premium (a long prompt every time); fine-tuning pays upfront but shortens every prompt afterward. Over $M$ calls:

$$ C_{\text{prompt}} = c_p \cdot M, \qquad C_{\text{ft}} = U + c_f \cdot M \quad (c_f < c_p) $$

They cross where the upfront cost is repaid by the per-call savings:

$$ M^{*} = \frac{U}{c_p - c_f} $$

Below $M^{*}$, prompting is cheaper; above it, fine-tuning wins. With a \$600 training cost and a \$0.06 per-call saving, $M^{*} = 600/0.06 = 10{,}000$ calls. So low-volume or exploratory work should stay on prompts; only steady, high-volume traffic justifies the fixed cost. (This axis ignores knowledge and freshness — where RAG wins regardless of volume.)
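For readers who want the step between the two formulas spelled out, set the two totals equal at $M^{*}$ and solve, using the same symbols:

$$ c_p M^{*} = U + c_f M^{*} \;\Longrightarrow\; (c_p - c_f)\, M^{*} = U \;\Longrightarrow\; M^{*} = \frac{U}{c_p - c_f} $$

The same algebra gives the net saving from fine-tuning at any volume $M$. The symbol $S$ is introduced here for convenience and is not part of the formulas above:

$$ S(M) = C_{\text{prompt}} - C_{\text{ft}} = (c_p - c_f)\, M - U $$

With the \$600 and \$0.06 figures, $S(50{,}000) = 0.06 \cdot 50{,}000 - 600 = 2{,}400$, a \$2,400 saving, while $S(5{,}000) = 300 - 600 = -300$, meaning fine-tuning would lose \$300 at that volume.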

05 The code

The break-even, computed

The call volume at which fine-tuning's upfront cost is repaid by shorter prompts.

breakeven.py

import math

prompt_per_call = 0.09     # $/call: long few-shot prompt every time
ft_per_call     = 0.03     # $/call: short prompt, behavior baked in
ft_upfront      = 600.0    # $ one-time training cost

# Break-even volume: upfront cost divided by the per-call saving
M_star = ft_upfront / (prompt_per_call - ft_per_call)
print(f"break-even at {M_star:,.0f} calls")

for M in [5_000, 10_000, 50_000]:
    p = prompt_per_call * M             # total cost on prompts alone
    f = ft_upfront + ft_per_call * M    # total cost after fine-tuning
    # Compare with a tolerance: exact float equality is fragile for money math
    winner = "tie" if math.isclose(f, p) else "fine-tune" if f < p else "prompt"
    print(f"{M:>6,} calls: prompt ${p:,.0f}  fine-tune ${f:,.0f}  -> {winner}")
# break-even at 10,000 calls
#  5,000 calls: prompt $450  fine-tune $750  -> prompt
# 10,000 calls: prompt $900  fine-tune $900  -> tie
# 50,000 calls: prompt $4,500  fine-tune $2,100  -> fine-tune

06 The economics

The build-vs-buy of applied AI

The choice → money

This chapter is where AI strategy meets a spreadsheet. Prompting is pure operating cost — cheap to start, but you pay the premium on every call forever. Fine-tuning is capital cost — a fixed investment that lowers the marginal cost of each call after. RAG is infrastructure — a system you build once that keeps answers correct and citable as the world changes. Choosing among them is a classic build-vs-buy decision, priced by volume, freshness, and the cost of being wrong.

Get it wrong in the expensive direction — fine-tuning a model for a low-volume task, or stuffing a giant prompt into millions of calls — and margins evaporate. Most of the difference between an AI feature that's profitable and one that quietly loses money is this choice, made well.
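To put rough numbers on "the expensive direction", the sketch below reuses the chapter's per-call and upfront figures and asks how much each wrong choice overspends. The two volumes (1,000 and 1,000,000 calls) and the helper names `total_costs` and `overspend` are assumptions chosen for illustration, and RAG infrastructure cost is left out because the chapter gives no figure for it.

```python
PROMPT_PER_CALL = 0.09   # $/call with a long prompt (chapter's figure)
FT_PER_CALL = 0.03       # $/call after fine-tuning (chapter's figure)
FT_UPFRONT = 600.0       # $ one-time training cost (chapter's figure)


def total_costs(calls: int) -> dict[str, float]:
    """Total spend for each option at a given call volume."""
    return {
        "prompt": PROMPT_PER_CALL * calls,
        "fine-tune": FT_UPFRONT + FT_PER_CALL * calls,
    }


def overspend(choice: str, calls: int) -> float:
    """How much more the chosen option costs than the cheaper one."""
    costs = total_costs(calls)
    return costs[choice] - min(costs.values())


# Illustrative volumes: one low-volume feature, one high-volume feature.
for choice, calls in [("fine-tune", 1_000), ("prompt", 1_000_000)]:
    extra = overspend(choice, calls)
    print(f"{choice} at {calls:,} calls: ${extra:,.0f} over the cheaper option")
# fine-tune at 1,000 calls: $540 over the cheaper option
# prompt at 1,000,000 calls: $59,400 over the cheaper option
```

Under these assumptions the low-volume mistake costs hundreds of dollars and the high-volume one costs tens of thousands, which is why volume is the first thing to estimate.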

For the Circuit, it's the demand side in microcosm: the same token costs that make the build-out expensive also discipline how businesses actually use AI. The economics don't just live in the data center — they reach all the way down to which of these three levers a team pulls for each feature.

07 Going deeper

The primary sources

Gao et al. (2023) — RAG for LLMs: A Survey · when retrieval beats parametric knowledge.
Hu et al. (2021) — LoRA · low-cost fine-tuning that shifts the break-even.
Brown et al. (2020) — GPT-3 · in-context learning as the cheap default.
Anthropic — start simple, add complexity only when needed · the climb-the-ladder principle.

Cite this chapter: Divergent Compute, "RAG vs fine-tune vs prompt", First Principles, 2026. divergentcompute.com/first-principles-adaptation · v1.0 · CC-BY.