Try another language: tokenization handles any script.
↓ split into tokens
Tokens
↓ map each token to an ID
Token IDs — the numbers the model actually reads
1
Tokenization
Text is split into tokens (words or sub-words), each given a numeric ID.
Handles any language · Breaks text into reusable pieces · Rare words split into sub-words
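A minimal sketch of this step, assuming a tiny hand-made vocabulary and a greedy longest-match rule. Real tokenizers (for example BPE) learn their vocabulary from data; `VOCAB`, `tokenize` and `encode` are hypothetical names used only for illustration.

```python
# Toy vocabulary (hypothetical): whole words plus a few sub-word pieces.
VOCAB = {"<unk>": 0, "the": 1, "cat": 2, "sat": 3,
         "un": 4, "believ": 5, "able": 6, " ": 7}

def tokenize(text: str) -> list[str]:
    """Greedy longest-match split of text into pieces found in VOCAB."""
    tokens, i = [], 0
    text = text.lower()
    while i < len(text):
        # Try the longest candidate first, shrinking until a piece matches.
        for j in range(len(text), i, -1):
            if text[i:j] in VOCAB:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # No vocabulary piece starts here: emit the unknown token.
            tokens.append("<unk>")
            i += 1
    return tokens

def encode(text: str) -> list[int]:
    """Map each token to its numeric ID."""
    return [VOCAB[t] for t in tokenize(text)]
```

In this toy setup the rare word "unbelievable" is not in the vocabulary as a whole, so it is split into the reusable pieces "un", "believ" and "able".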
Token IDs
↓
Embedding Layer · look up a dense vector for every ID
↓
Embeddings (d_model dimensions, shown as 4 for clarity)
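The lookup can be sketched as indexing into a table with one row per token ID. The table here is filled with random numbers purely for illustration (a trained model learns these values), and `VOCAB_SIZE`, `embedding_table` and `embed` are hypothetical names.

```python
import random

D_MODEL = 4      # matches the 4 dimensions shown in the diagram
VOCAB_SIZE = 8   # hypothetical vocabulary size

random.seed(0)
# One row per token ID; a real model learns these values during training.
embedding_table = [
    [random.uniform(-1, 1) for _ in range(D_MODEL)]
    for _ in range(VOCAB_SIZE)
]

def embed(token_ids: list[int]) -> list[list[float]]:
    """Look up one D_MODEL-dimensional vector per token ID."""
    return [embedding_table[i] for i in token_ids]

vectors = embed([1, 7, 2])  # three token IDs -> three 4-dimensional vectors
```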
2
Embeddings & Attention Scores
Vectors are compared with each other to decide what each token should pay attention to.
Captures meaning & relationships · Queries · Keys · Values · Softmax turns scores into weights
Each token produces a Query (what it's looking for), a Key (what it offers) and a Value (what it passes on). Click a token to see what it attends to.
Scores · QKᵀ / √dₖ
rows = queries · columns = keys
→
Selected query attends to
→
Context
weighted sum of values
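The three panels above (scores, attention weights, context) can be sketched in plain Python. This is a single-head illustration under simplifying assumptions: the Q, K and V matrices are passed in directly rather than produced by learned projections, and the causal mask that a decoder applies to hide future tokens is omitted.

```python
import math

def softmax(xs):
    """Turn raw scores into weights that sum to 1."""
    m = max(xs)                      # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(K[0])
    context, weights = [], []
    for q in Q:                      # rows = queries
        # One score per key (columns = keys), scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        w = softmax(scores)
        weights.append(w)
        # Context = weighted sum of the value vectors.
        context.append([sum(wj * v[c] for wj, v in zip(w, V))
                        for c in range(len(V[0]))])
    return context, weights

Q = K = V = [[1.0, 0.0], [0.0, 1.0]]
context, weights = attention(Q, K, V)
```

With these toy inputs each query matches its own key best, so each token puts most of its weight on itself.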
3
The Transformer Block
Attention and a feed-forward network, repeated many times, refine every token's meaning.
Looks at all tokens at once · Residual Add & Norm keeps it stable · Stacked × L layers deep
× 12 layers
Input
→
Multi-Head Attention
→
+
→
Feed Forward Network
→
+
→
Output
Tip: click any block above to learn what it does. Drag the slider to change how many times the block repeats.
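The wiring in the diagram (sub-layer, then residual add and normalization, repeated) can be sketched as follows. The two sub-layers are deliberately trivial stand-ins, not real multi-head attention or a learned feed-forward network, and the same stand-ins are reused in every layer, whereas a real model has separate learned weights per layer. All function names are hypothetical.

```python
import math

def layer_norm(vec, eps=1e-5):
    """Normalize one token vector to zero mean and unit variance."""
    mean = sum(vec) / len(vec)
    var = sum((v - mean) ** 2 for v in vec) / len(vec)
    return [(v - mean) / math.sqrt(var + eps) for v in vec]

def add_and_norm(x, sublayer_out):
    """Residual connection (the "+" in the diagram) followed by normalization."""
    return [layer_norm([a + b for a, b in zip(xr, sr)])
            for xr, sr in zip(x, sublayer_out)]

def transformer_block(x, attend, feed_forward):
    x = add_and_norm(x, attend(x))                          # attention sub-layer
    x = add_and_norm(x, [feed_forward(row) for row in x])   # feed-forward sub-layer
    return x

def run_stack(x, attend, feed_forward, num_layers=12):
    """Apply the block num_layers times (12 in the diagram)."""
    for _ in range(num_layers):
        x = transformer_block(x, attend, feed_forward)
    return x

def mean_attend(x):
    # Stand-in for attention: every token receives the average of all tokens.
    avg = [sum(col) / len(x) for col in zip(*x)]
    return [avg[:] for _ in x]

def relu_ff(row):
    # Stand-in for the feed-forward network: element-wise ReLU.
    return [max(0.0, v) for v in row]

out = run_stack([[1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0]], mean_attend, relu_ff)
```

The output has the same shape as the input (one vector per token), which is what allows the block to be stacked.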
4
Context Window
The model can only "see" a fixed number of recent tokens at once.
Larger window = more context · Limited by model capacity · Window: 16K tokens
Visible to the model · Current position · Next tokens (to be generated)
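One simple way to picture the window is as a slice that keeps only the most recent tokens. This is an illustrative sketch: `visible_tokens` is a hypothetical helper, and reading "16K" as exactly 16,000 tokens is an assumption (it could equally mean 16,384).

```python
CONTEXT_WINDOW = 16_000  # "16K" from the figure; the exact value is an assumption

def visible_tokens(token_ids, window=CONTEXT_WINDOW):
    """Return only the most recent `window` tokens; older ones fall out of view."""
    if window <= 0:
        return []
    return token_ids[-window:]
```

A sequence shorter than the window is returned unchanged. Once it grows past the window, the oldest tokens are the ones dropped.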
5
Output Tokens
The decoder predicts a probability distribution over the next token, samples one token from it, appends it, and repeats.
Autoregressive: one token at a time · Sampled from probabilities · Runs until an EOS token
Generated so far
Temperature: 0.70
Low = focused & predictable · High = diverse & surprising
↻ append & repeat — each token feeds back in as new context
LLM decoder · next-token probabilities
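The append-and-repeat loop, with temperature applied before sampling, can be sketched as below. The model itself is replaced by a hypothetical `toy_logits` function that returns one raw score per vocabulary entry. A real decoder would compute these scores from the full context.

```python
import math
import random

def sample_next(logits, temperature=0.7, rng=random):
    """Sample a token ID from softmax(logits / temperature); temperature must be > 0."""
    scaled = [l / temperature for l in logits]   # low temperature sharpens the distribution
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def generate(prompt_ids, next_token_logits, eos_id,
             max_new_tokens=20, temperature=0.7, rng=random):
    """Autoregressive loop: predict, sample, append, repeat until EOS or the limit."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        token = sample_next(next_token_logits(ids), temperature, rng)
        ids.append(token)            # the new token becomes part of the context
        if token == eos_id:
            break
    return ids

def toy_logits(ids):
    # Hypothetical stand-in for the model: strongly favors EOS (ID 2) once the sequence is long enough.
    return [0.0, 0.0, 5.0] if len(ids) >= 4 else [2.0, 1.0, -5.0]

result = generate([0], toy_logits, eos_id=2)
```

Lowering `temperature` toward zero makes the highest-scoring token almost certain to be picked, while raising it flattens the probabilities and yields more varied output.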
6
End-to-End Flow
The whole journey, start to finish. Press play to trace it — or click any stage to jump there.
⌨
Input Text
your prompt
✂
Tokenization
text → IDs
⬡
Embeddings
IDs → vectors
▤
Transformer
× L blocks
▭
Context Window
what it sees
◎
Output Tokens
one at a time
💡
Key takeaway
LLMs convert text into numbers, understand relationships through attention, process information through many layers, and generate coherent text — one token at a time.