Divergent Compute.AI Economic Think Tank

First Principles / Part IV · Building with AI / Chapter 20

First Principles · Building with AI · 20

Prompting

A prompt is a program — written in plain English, for a machine whose only instinct is to continue text. You don't change the weights; you change the context you condition them on. Done well, that's enough to steer the model completely.

Read at your depth:  01 The answer · 02 Intuition · 03 Mechanics · 04 The math · 05 The code · 06 The economics · 07 Sources

01The answer, then the intuition

Programming without changing the program

Because a model just predicts the next token given everything before it, the tokens you supply are the program. The frozen weights are the interpreter; your prompt is the code. This is why the same model can write poetry, extract JSON, or debug Python — you're not switching models, you're switching the context that conditions its predictions.

Three levers do most of the work. Zero-shot is just an instruction. Few-shot adds worked examples so the model infers the exact pattern you want — remarkably, with no training at all. Chain-of-thought asks it to reason step by step, spending more tokens to think before answering. Switch between them on the same task and watch the output sharpen:

Prompt lab — one task, three techniques

Task: classify a review's sentiment and return strict JSON. Illustrative outputs; token counts are representative.

The prompt sent to the model

Illustrative output

prompt tokens: vs zero-shot:

02Mechanics

Why context alone can steer a frozen model

  • Zero-shot. Just describe the task. The model relies entirely on what it learned in pretraining and alignment. Fast and cheap, but format and edge cases are hit-or-miss.
  • Few-shot (in-context learning). Put a few input→output examples in the prompt. The model infers the pattern and continues it — with no weight update. This was the headline surprise of GPT-3: examples in the context act like temporary training. It nails formats and conventions that are hard to describe in words.
  • Chain-of-thought. Ask it to "think step by step." By generating intermediate reasoning tokens before the answer, the model effectively does more computation — each token is another forward pass — which sharply improves multi-step and math problems.
  • System prompts & structure. A system message sets persistent role and rules; clear delimiters, explicit output schemas, and "return only JSON" instructions reduce ambiguity. You're shaping the probability distribution toward the tokens you want.

The craft is real but bounded: prompting can only elicit what's already in the weights. When the model simply lacks the knowledge or skill, no wording fixes it — that's when you reach for retrieval or fine-tuning (next chapters).

04The math

expand ▾

Conditioning, not learning

Everything a prompt does is condition the same distribution. The model samples the answer $y$ from:

$$ y \sim P(y \mid \text{prompt}) $$

Few-shot just makes the prompt longer — the examples $\{(x_i,y_i)\}$ are extra conditioning tokens, so "in-context learning" is Bayesian conditioning, not gradient descent. Nothing in the weights changes:

$$ P\big(y \mid x,\, (x_1,y_1),\dots,(x_k,y_k)\big) $$

Chain-of-thought factorizes the answer through an intermediate reasoning $r$, letting the model spend computation on the path before committing:

$$ P(y \mid x) = \sum_{r} P(y \mid r, x)\,P(r \mid x) $$

Generating $r$ token-by-token turns "think harder" into literal extra forward passes — more compute at inference time, which is why it helps on hard problems and costs more.

05The code

expand ▾

The price of a better prompt

Prompting is free to build, but every example and reasoning step is more tokens per call — the real tradeoff.

prompt_cost.py

instr, per_example, output = 18, 22, 12    # representative token counts

zero = instr + output                       # instruction only
few3 = instr + 3*per_example + output       # + three worked examples
cot  = instr + 45 + output                  # + a reasoning trace in the output

print(f"zero-shot: {zero} tokens/call")
print(f"3-shot:    {few3} tokens/call  ({few3/zero:.1f}x)")
print(f"CoT:       {cot} tokens/call  ({cot/zero:.1f}x)")
# zero-shot: 30 tokens/call
# 3-shot:    96 tokens/call  (3.2x)
# CoT:       75 tokens/call  (2.5x)   <- better answers, more tokens, every call

06The economics

The cheapest way to program — with a running meter

Prompting → money

Prompting is the cheapest possible way to customize a model: no training run, no data pipeline, just words — change it and redeploy in seconds. That's why most AI products start here. But the cost moves from upfront to per call: every example in a few-shot prompt and every step of chain-of-thought is more tokens, paid on every single request, forever.

At scale that arithmetic dominates. A prompt that's 3× longer is roughly 3× the inference bill across millions of calls — so serious teams trim prompts token by token, cache shared prefixes, and reserve chain-of-thought for the queries that truly need it. Prompt engineering is, underneath, cost engineering.

It also frames the build-vs-buy choice the next chapters unpack: prompting is a recurring per-token cost, while fine-tuning is an upfront cost that can shorten prompts later. For the Circuit, prompting is the demand side in miniature — the knob that turns a fixed model into useful work, one metered token at a time.

07Going deeper

expand ▾

The primary sources

Brown et al. (2020) — Language Models are Few-Shot Learners (GPT-3) · in-context learning.
Wei et al. (2022) — Chain-of-Thought Prompting · reasoning steps improve hard tasks.
Kojima et al. (2022) — "Let's think step by step" · zero-shot chain-of-thought.
Anthropic — Prompt Engineering Guide · practical, current techniques.

Cite this chapter: Divergent Compute, "Prompting", First Principles, 2026. divergentcompute.com/first-principles-prompting · v1.0 · CC-BY.

← Chapter 19
The data-center cluster
Next · Chapter 21 →
What is RAG?