How LLMs work — a non-PhD explainer

Tokens, embeddings, attention, transformers — explained with 0 math.

Elevatools Team · 2026-01-15 · 3 min read

The pipeline

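Tokenize: split your text into tokens, pieces of words averaging roughly 3.5 characters each (the ratio varies by tokenizer and language).
Embed: map each token to a vector, a list of numbers that captures how the token is used.
Attend: attention lets each token weigh the other tokens in the context, so its meaning depends on the words around it.
Predict: score every token in the vocabulary and choose the next one based on that context.
Repeat: append the new token and run again until the model emits a stop token or hits a length limit.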
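
To make the five steps above concrete, here is a minimal toy sketch in plain Python. It is not how a production model is built: the tokenizer is one character per token, the embeddings are random rather than learned, there is a single attention step instead of many stacked layers, and every function name is made up for this illustration. It only shows the shape of the loop.

```python
import math
import random

def tokenize(text):
    # Toy tokenizer: one token per character (real models use subword pieces).
    return list(text)

def make_embeddings(vocab, dim, rng):
    # One random vector per token; a trained model learns these values.
    return {tok: [rng.uniform(-1, 1) for _ in range(dim)] for tok in vocab}

def softmax(scores):
    # Turn arbitrary scores into probabilities that sum to 1.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend(vectors):
    # The last token "looks at" every token so far: similarity scores
    # become weights, and the context is the weighted average.
    query = vectors[-1]
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, v) / scale for v in vectors])
    return [sum(w * v[i] for w, v in zip(weights, vectors))
            for i in range(len(query))]

def generate(prompt, steps, seed=0):
    rng = random.Random(seed)
    vocab = sorted(set(prompt))            # toy vocabulary: characters seen
    embeddings = make_embeddings(vocab, dim=8, rng=rng)
    tokens = tokenize(prompt)              # 1. tokenize
    for _ in range(steps):                 # 5. repeat
        vectors = [embeddings[t] for t in tokens]   # 2. embed
        context = attend(vectors)                   # 3. attention
        probs = softmax([dot(context, embeddings[t]) for t in vocab])
        tokens.append(rng.choices(vocab, weights=probs, k=1)[0])  # 4. predict
    return "".join(tokens)

print(generate("hello world", steps=5))
```

Because the embeddings are random, the output is gibberish. Training is what turns the same loop into something that writes sensible text.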

Why it sometimes hallucinates

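The model predicts plausible text; it doesn’t “know” facts. When a plausible-sounding continuation happens to be false, the model can still produce it with full confidence. Clear prompts, tool access, and retrieval of source documents reduce this but do not eliminate it.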
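
As a rough illustration of the retrieval idea, the sketch below picks the passage that best matches a question and puts it into the prompt, so the model has source text to draw on. Everything here is an assumption made for the example: the function names, the word-overlap scoring (real systems usually compare embeddings), and the two sample passages.

```python
def score(question, passage):
    # Count shared lowercase words; a stand-in for real similarity search.
    q_words = set(question.lower().split())
    p_words = set(passage.lower().split())
    return len(q_words & p_words)

def build_prompt(question, passages, top_k=1):
    # Keep the best-matching passages and instruct the model to stay within them.
    ranked = sorted(passages, key=lambda p: score(question, p), reverse=True)
    context = "\n".join(ranked[:top_k])
    return (
        "Answer using only the context below. "
        "If the answer is not there, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

# Hypothetical passages, invented for this example.
passages = [
    "The office is closed on public holidays.",
    "Refunds are processed within 14 days of the return request.",
]
prompt = build_prompt("How long do refunds take?", passages)
print(prompt)
```

The resulting prompt is what would be sent to the model in place of the bare question.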

Why temperature matters

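Temperature 0 = deterministic: the model always picks the most likely next token.
Temperature 1 = creative: tokens are sampled at the probabilities the model assigned them.
Temperature 2 = chaotic: the probabilities are flattened, so unlikely tokens get picked often and the output can lose coherence.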
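
The effect of the three settings above can be sketched in a few lines. Temperature divides the model's raw scores (logits) before they are turned into probabilities. The tokens and scores below are made up for illustration, and treating temperature 0 as "always pick the top token" is a common convention assumed here, not a statement about any particular API.

```python
import math
import random

def apply_temperature(logits, temperature):
    if temperature == 0:
        # Assumed convention: temperature 0 means greedy, all weight on the top token.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    top = max(scaled)
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample(tokens, logits, temperature, rng):
    # Draw one token according to the temperature-adjusted probabilities.
    probs = apply_temperature(logits, temperature)
    return rng.choices(tokens, weights=probs, k=1)[0]

tokens = ["cat", "dog", "fish"]   # made-up candidates
logits = [2.0, 1.0, 0.0]          # made-up scores

for t in (0, 1, 2):
    print(t, [round(p, 2) for p in apply_temperature(logits, t)])
# 0 [1.0, 0.0, 0.0]
# 1 [0.67, 0.24, 0.09]
# 2 [0.51, 0.31, 0.19]
```

As the temperature rises, probability moves from "cat" toward the less likely options, which is why high settings read as more surprising and less reliable.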

The future

Expect more tool use, multimodality, longer context windows, and smaller, more efficient models. We’re in year 5 of a 30-year shift.
