How LLMs work — a non-PhD explainer
Tokens, embeddings, attention, transformers — explained with 0 math.
The pipeline
- Tokenize your text into pieces called tokens (roughly 3 to 4 characters each in English, depending on the tokenizer).
- Embed each token as a vector.
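- Attention lets each token look at the other tokens in the context and weigh which ones matter (in GPT-style models, only the tokens before it).
- Predict the next token: the model assigns a probability to every token in its vocabulary, and one is picked.
- Append that token to the text and repeat until the model emits a stop token or hits a length limit.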
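The steps above can be sketched as a toy Python program. This is an illustration only, not how a production model is built: the whitespace tokenizer, the random (untrained) embeddings, the single attention step, and every function name here are assumptions made for the sketch. The generated text is meaningless; the point is the shape of the loop.

```python
import math
import random

def tokenize(text):
    # Stand-in for a real subword tokenizer: lowercase and split on whitespace.
    return text.lower().split()

def make_embeddings(vocab, dim=8, seed=0):
    # Each token gets a random vector. A real model learns these during training.
    rng = random.Random(seed)
    return {tok: [rng.gauss(0, 1) for _ in range(dim)] for tok in vocab}

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    # Turn arbitrary scores into probabilities that sum to 1.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(vectors):
    # The newest token scores every token in the context so far (itself
    # included), then takes a weighted average of their vectors.
    query = vectors[-1]
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, v) / scale for v in vectors])
    return [sum(w * v[i] for w, v in zip(weights, vectors))
            for i in range(len(query))]

def next_token_probs(context, embeddings, vocab):
    # Compare the attention summary against every vocabulary entry.
    summary = attend([embeddings[t] for t in context])
    return softmax([dot(summary, embeddings[t]) for t in vocab])

def generate(prompt, steps=5, seed=0):
    tokens = tokenize(prompt)
    if not tokens:
        raise ValueError("prompt must contain at least one word")
    # Toy vocabulary: only the words that appear in the prompt.
    vocab = sorted(set(tokens))
    embeddings = make_embeddings(vocab, seed=seed)
    for _ in range(steps):
        probs = next_token_probs(tokens, embeddings, vocab)
        best = max(range(len(vocab)), key=lambda i: probs[i])
        tokens.append(vocab[best])  # append the pick, then repeat
    return tokens

print(generate("the cat sat on the mat", steps=3))
```

A real model stacks many attention layers, uses learned weights, and has a vocabulary of tens of thousands of tokens, but the loop is the same: context in, probabilities out, pick one token, append it, go again.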
Why it sometimes hallucinates
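The model predicts plausible text; it doesn’t “know” facts. When nothing in its training data or the prompt pins down the answer, it still produces something that sounds right. Clear prompts, tools, and retrieval (putting relevant source text into the prompt) reduce this, but they don’t eliminate it.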
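To make the retrieval idea concrete, here is a minimal Python sketch. It assumes a tiny in-memory list of passages and ranks them by word overlap with the question; real systems typically use embedding search instead. The function names, the prompt wording, and the sample passages are all hypothetical.

```python
import re

def words(text):
    # Lowercase word set, punctuation stripped.
    return set(re.findall(r"\w+", text.lower()))

def overlap_score(question, passage):
    # Crude relevance measure: how many words the two texts share.
    return len(words(question) & words(passage))

def build_prompt(question, passages, top_k=1):
    # Put the best-matching passages in front of the question so the model
    # can draw on them instead of guessing.
    ranked = sorted(passages, key=lambda p: overlap_score(question, p),
                    reverse=True)
    context = "\n".join(ranked[:top_k])
    return (
        "Answer using only the context below. "
        "If the answer is not there, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

passages = [
    "Paris is the capital of France.",
    "The Nile is a river in Africa.",
]
print(build_prompt("What is the capital of France?", passages))
```

The model still only predicts text, but now the most plausible continuation is one grounded in the supplied passage.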
Why temperature matters
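- Temperature 0 = always pick the most likely token, so output is (nearly) deterministic
- Temperature 1 = sample from the model’s probabilities as they are, so output is more varied and can feel creative
- Temperature 2 = flatten the probabilities so unlikely tokens get picked often, so output turns chaotic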
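Under the hood, temperature is a number the model's raw scores (logits) are divided by before they are turned into probabilities. The Python sketch below shows the usual formulation; the function names and the three example logits are made up for illustration, and treating temperature 0 as "just take the top token" is a common convention rather than something the formula itself defines.

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    # Dividing by a small temperature exaggerates gaps between scores;
    # dividing by a large one shrinks them.
    if temperature <= 0:
        raise ValueError("temperature must be positive here")
    scaled = [x / temperature for x in logits]
    top = max(scaled)
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample(logits, temperature, rng=random):
    # Temperature 0 is handled as greedy decoding: take the highest score.
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

logits = [2.0, 1.0, 0.0]  # hypothetical scores for three candidate tokens
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 2) for p in probs])
```

With these example logits, the top token gets roughly 87% of the probability at temperature 0.5, about 67% at 1.0, and about 51% at 2.0, which is why higher settings wander off course more often.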
The future
Expect more tool use, multimodality, longer context windows, and smaller, more efficient models. We’re in year 5 of a 30-year shift.