First Principles / Part IV · Building with AI / Chapter 25
01The answer, then the intuition
An embedding turns every document into a point in high-dimensional space, where nearness means similar meaning. Retrieval is then just geometry: find the points closest to the query point. The naive way — measure the distance to every stored vector and sort — is exact but costs a full scan per query. At a billion vectors, that's hopeless.
The insight of approximate nearest-neighbor (ANN) search is that you don't need to check everything. Pre-build an index that lets you jump to the right neighborhood and examine only a few candidates. You give up a tiny chance of missing a true neighbor in exchange for searching orders of magnitude fewer points. Move the query below and watch exact scan versus approximate:
Click or drag anywhere to move the query. The 3 nearest points light up; examined points are ringed.
02Mechanics
So vector search is the unglamorous machinery that makes semantic retrieval possible at real scale — the reason "search by meaning over everything you own" is a product and not a research demo.
04The math
expand ▾The $k$ nearest neighbors of a query $q$ minimize distance over the store:
Exact search evaluates that distance for every vector — cost $O(N \cdot d)$, linear in the number of documents. An HNSW index changes the exponent: greedy graph traversal reaches the query's neighborhood in roughly
The difference between $N$ and $\log N$ is the whole game: at a billion vectors, $\log_2 N \approx 30$, so ANN examines on the order of tens of candidates instead of a billion. The price is recall@k $= \frac{|\text{retrieved} \cap \text{true top-}k|}{k}$, which a good index keeps at 95–99% while searching a vanishing fraction of the data.
05The code
expand ▾Exact k-NN is a full scan. The comparison count is the whole reason ANN indexes exist.
knn.py
import numpy as np
q = np.array([5.0, 5.0])
store = {"A":[5.2,4.8], "B":[1.0,1.0], "C":[6.0,5.5], "D":[8.5,2.0],
"E":[4.6,5.3], "F":[9.0,9.0], "G":[2.0,7.0], "H":[5.1,6.2]}
comparisons = 0
dists = {}
for name, p in store.items():
comparisons += 1 # exact = compare to EVERY vector
dists[name] = np.linalg.norm(q - np.array(p))
top3 = sorted(dists, key=dists.get)[:3]
print(f"exact scan: {comparisons} comparisons") # 8 (here) ... N at scale
print("3 nearest:", [(n, round(float(dists[n]), 2)) for n in top3])
# exact scan: 8 comparisons
# 3 nearest: [('A', 0.28), ('E', 0.5), ('C', 1.12)]
# an HNSW index would touch ~log(N) candidates and return the same 3
06The economics
Search → money
Vector search is why AI can have a usable memory at all. Without ANN, grounding a model in a large corpus would mean scanning every document on every query — computationally and financially impossible at scale. ANN turns that linear cost into a logarithmic one, and that single change is what makes retrieval cheap enough to put under every product. The vector database has become its own infrastructure category precisely because this problem is universal.
The economics live in the recall-speed-memory triangle. Tighter recall costs more compute and memory per query; looser recall is cheaper but risks missing the right chunk and sending the model back to guessing. Tuning that trade-off is a real operating decision, because retrieval quality caps the quality of everything built on top of it.
For the Circuit, this is the demand side's foundation: the same embeddings that opened this book close the loop here, as the searchable memory that lets AI reason over a company's — or a researcher's — entire corpus. It's the least visible layer, and one of the most load-bearing.
Part IV complete
From prompting a frozen model, through grounding it with retrieval, giving it tools to act, choosing among adaptation strategies, measuring quality with evals, and the vector search underneath it all — you've gone from theory to a working mental model of an AI application. Part I built the machine, Part II the model, Part III the cost of running it, and Part IV the craft of using it.
That's 25 chapters and four complete parts — the mechanics, end to end. What remains is the part only this think tank is built to tell: where these mechanics meet the money. See the full curriculum →
07Going deeper
expand ▾
Malkov & Yashunin (2016) — HNSW · the navigable small-world graph index.
Johnson et al. (2017) — Billion-scale similarity search (FAISS) · ANN at scale.
Jégou et al. (2011) — Product Quantization · compressing vectors for search.
ANN-Benchmarks · the recall-vs-speed trade-off, measured across libraries.
Cite this chapter: Divergent Compute, "Vector search", First Principles, 2026. divergentcompute.com/first-principles-vector-search · v1.0 · CC-BY.