First Principles / Part IV · Building with AI / Chapter 20
A prompt is a program, written in plain English, for a machine whose only instinct is to continue text. You don't change the weights; you change the context you condition them on. Done well, that's enough to steer the model toward almost anything it already knows how to do.
01 The answer, then the intuition
Because a model just predicts the next token given everything before it, the tokens you supply are the program. The frozen weights are the interpreter; your prompt is the code. This is why the same model can write poetry, extract JSON, or debug Python — you're not switching models, you're switching the context that conditions its predictions.
Three levers do most of the work. Zero-shot is just an instruction. Few-shot adds worked examples so the model infers the exact pattern you want — remarkably, with no training at all. Chain-of-thought asks it to reason step by step, spending more tokens to think before answering. Switch between them on the same task and watch the output sharpen:
[Interactive demo: classify a review's sentiment and return strict JSON, showing the prompt sent to the model next to an illustrative output for each mode. Outputs are illustrative; token counts are representative.]
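The three levers can be sketched as three ways of building the prompt string for the same sentiment task. This is a minimal sketch, not the chapter's own demo: the instruction wording, the three worked examples, and the function names `zero_shot`, `few_shot`, and `chain_of_thought` are all hypothetical, and no model is called.

```python
# Hypothetical prompt builders for the sentiment task.
# The instruction text and the worked examples are made up for illustration.
INSTRUCTION = (
    "Classify the sentiment of the review as positive, negative, or neutral. "
    'Return strict JSON: {"sentiment": "<label>"}.'
)

EXAMPLES = [
    ("Arrived early and works perfectly.", "positive"),
    ("Broke after two days. Avoid.", "negative"),
    ("It does the job, nothing special.", "neutral"),
]


def zero_shot(review: str) -> str:
    """Instruction only."""
    return f"{INSTRUCTION}\n\nReview: {review}\nJSON:"


def few_shot(review: str, examples=EXAMPLES) -> str:
    """Instruction plus worked examples that show the exact output pattern."""
    shots = "\n\n".join(
        f'Review: {text}\nJSON: {{"sentiment": "{label}"}}'
        for text, label in examples
    )
    return f"{INSTRUCTION}\n\n{shots}\n\nReview: {review}\nJSON:"


def chain_of_thought(review: str) -> str:
    """Instruction plus a request to reason before answering."""
    return (
        f"{INSTRUCTION}\n"
        "First reason step by step about the review's tone, "
        "then give the JSON on the final line.\n\n"
        f"Review: {review}\nReasoning:"
    )


if __name__ == "__main__":
    review = "Great screen, but the battery dies by noon."
    for build in (zero_shot, few_shot, chain_of_thought):
        print(f"--- {build.__name__} ---")
        print(build(review))
        print()
```

Only the string changes between the three calls; whichever model receives it stays the same.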
02 Mechanics
The craft is real but bounded: prompting can only elicit what's already in the weights. When the model simply lacks the knowledge or skill, no wording fixes it — that's when you reach for retrieval or fine-tuning (next chapters).
04 The math
All a prompt does is condition the same distribution. The model samples the answer $y$ from:
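One way to write that distribution, with $\theta$ for the frozen weights and $x$ for the prompt tokens (notation assumed here for illustration), together with its usual token-by-token factorization:

$$
y \sim p_\theta(y \mid x), \qquad p_\theta(y \mid x) = \prod_{t=1}^{T} p_\theta\left(y_t \mid x,\, y_{<t}\right)
$$

Changing the prompt changes $x$; $\theta$ stays where training left it.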
Few-shot just makes the prompt longer — the examples $\{(x_i,y_i)\}$ are extra conditioning tokens, so "in-context learning" is Bayesian conditioning, not gradient descent. Nothing in the weights changes:
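A sketch of that conditioning, assuming $k$ worked examples, a new input $x$, and $\theta$ for the frozen weights (the symbols $k$, $x$, and $\theta$ are assumptions added here):

$$
y \sim p_\theta\left(y \mid x_1, y_1, \dots, x_k, y_k,\, x\right), \qquad \theta \text{ fixed}
$$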
Chain-of-thought factorizes the answer through an intermediate reasoning $r$, letting the model spend computation on the path before committing:
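One standard way to express this, assuming $x$ is the prompt and $\theta$ the frozen weights (both symbols are assumptions added here), is to marginalize over reasoning traces:

$$
p_\theta(y \mid x) = \sum_{r} p_\theta(r \mid x)\; p_\theta(y \mid x, r)
$$

In practice the sum is not computed. A single trace is sampled, $r \sim p_\theta(r \mid x)$, and the answer is then drawn from $p_\theta(y \mid x, r)$.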
Generating $r$ token-by-token turns "think harder" into literal extra forward passes — more compute at inference time, which is why it helps on hard problems and costs more.
05 The code
Prompting is free to build, but every example and reasoning step adds tokens to each call. That is the real tradeoff.
prompt_cost.py
instr, per_example, output = 18, 22, 12 # representative token counts
zero = instr + output # instruction only
few3 = instr + 3*per_example + output # + three worked examples
cot = instr + 45 + output # + a reasoning trace in the output
print(f"zero-shot: {zero} tokens/call")
print(f"3-shot: {few3} tokens/call ({few3/zero:.1f}x)")
print(f"CoT: {cot} tokens/call ({cot/zero:.1f}x)")
# zero-shot: 30 tokens/call
# 3-shot: 96 tokens/call (3.2x)
# CoT: 75 tokens/call (2.5x) <- better answers, more tokens, every call
06 The economics
Prompting → money
Prompting is the cheapest possible way to customize a model: no training run, no data pipeline, just words — change it and redeploy in seconds. That's why most AI products start here. But the cost moves from upfront to per call: every example in a few-shot prompt and every step of chain-of-thought is more tokens, paid on every single request, forever.
At scale that arithmetic dominates. A prompt that's 3× longer costs roughly 3× as much in input tokens across millions of calls, so serious teams trim prompts token by token, cache shared prefixes, and reserve chain-of-thought for the queries that truly need it. Prompt engineering is, underneath, cost engineering.
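To make the scale effect concrete, here is a rough sketch that turns per-call token counts into a monthly bill. Every number is a placeholder: the flat price per million tokens, the traffic volume, and the 50% discount on cached prefix tokens are assumptions for illustration, not any provider's actual rates or caching rules. The token counts reuse the representative figures from `prompt_cost.py`.

```python
# All constants below are hypothetical placeholders, not real provider pricing.
PRICE_PER_MILLION_TOKENS = 1.00  # assumed flat dollar price, input and output alike
CALLS_PER_MONTH = 10_000_000     # assumed traffic


def monthly_cost(tokens_per_call: float) -> float:
    """Dollar cost per month for a given number of billed tokens per call."""
    return tokens_per_call * CALLS_PER_MONTH * PRICE_PER_MILLION_TOKENS / 1_000_000


def monthly_cost_cached(prefix_tokens: int, other_tokens: int,
                        cached_discount: float = 0.5) -> float:
    """Same bill, but the shared prefix is charged at an assumed discount."""
    billed = prefix_tokens * (1 - cached_discount) + other_tokens
    return monthly_cost(billed)


if __name__ == "__main__":
    instr, per_example, output = 18, 22, 12  # representative counts from prompt_cost.py
    zero = instr + output
    few3 = instr + 3 * per_example + output
    cot = instr + 45 + output

    for name, tokens in [("zero-shot", zero), ("3-shot", few3), ("CoT", cot)]:
        print(f"{name}: ${monthly_cost(tokens):,.0f}/month")

    # Instruction and examples are identical on every call, so treat them as the prefix.
    prefix = instr + 3 * per_example
    print(f"3-shot, cached prefix: ${monthly_cost_cached(prefix, output):,.0f}/month")
```

Under these placeholder numbers the 3-shot prompt costs 3.2 times the zero-shot bill, and the assumed cache discount recovers part of that gap without changing the prompt.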
It also frames the build-vs-buy choice the next chapters unpack: prompting is a recurring per-token cost, while fine-tuning is an upfront cost that can shorten prompts later. For the Circuit, prompting is the demand side in miniature — the knob that turns a fixed model into useful work, one metered token at a time.
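The recurring-versus-upfront tradeoff can be put as a break-even count: how many calls it takes before the tokens saved by a shorter prompt pay back a one-time cost. The sketch below is a simplification under stated assumptions: the function name `break_even_calls` is hypothetical, the upfront cost and price are made-up inputs, and it assumes the per-token price is the same before and after, which may not hold for a fine-tuned model.

```python
import math


def break_even_calls(upfront_cost: float, tokens_saved_per_call: int,
                     price_per_million_tokens: float) -> int:
    """Calls needed before per-call token savings cover a one-time cost.

    Assumes the same per-token price before and after (a simplification).
    """
    saving_per_call = tokens_saved_per_call * price_per_million_tokens / 1_000_000
    if saving_per_call <= 0:
        raise ValueError("no per-call saving, so the upfront cost is never recovered")
    return math.ceil(upfront_cost / saving_per_call)


if __name__ == "__main__":
    # Hypothetical inputs: a $500 one-time cost that removes three 22-token examples
    # from every prompt, at an assumed $1.00 per million tokens.
    calls = break_even_calls(upfront_cost=500.0, tokens_saved_per_call=3 * 22,
                             price_per_million_tokens=1.00)
    print(f"break-even after {calls:,} calls")
```

Below the break-even volume the longer prompt is the cheaper option; above it, the upfront spend starts to pay for itself.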
07 Going deeper
Brown et al. (2020) — Language Models are Few-Shot Learners (GPT-3) · in-context learning.
Wei et al. (2022) — Chain-of-Thought Prompting · reasoning steps improve hard tasks.
Kojima et al. (2022) — "Let's think step by step" · zero-shot chain-of-thought.
Anthropic — Prompt Engineering Guide · practical, current techniques.
Cite this chapter: Divergent Compute, "Prompting", First Principles, 2026. divergentcompute.com/first-principles-prompting · v1.0 · CC-BY.