From input text to intelligent output

LLM Architecture Explained

Follow a sentence as it travels through the whole pipeline.
Input text → split into tokens → tokens → map each token to an ID → token IDs (the numbers the model actually reads). Tokenization handles any script.
1. Tokenization

Text is split into tokens (words or sub-words), each given a numeric ID.

Handles any language · Breaks text into reusable pieces · Rare words split into sub-words
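
The splitting step can be sketched with a tiny hand-written vocabulary. This is only an illustration: the `VOCAB` table, its IDs, and the greedy longest-match rule are assumptions made for the example. Production tokenizers typically learn their vocabulary from data (byte-pair encoding is one common approach) rather than matching against a fixed list like this.

```python
# Hypothetical toy vocabulary: sub-word pieces mapped to integer IDs.
VOCAB = {"<unk>": 0, "token": 1, "ization": 2, "un": 3,
         "believ": 4, "able": 5, " ": 6, "is": 7}

def tokenize(text, vocab=VOCAB):
    """Greedy longest-match split of text into known pieces."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a piece matches.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # Nothing matched: emit the unknown marker for one character.
            tokens.append("<unk>")
            i += 1
    return tokens

def encode(text, vocab=VOCAB):
    """Map each token to its numeric ID."""
    return [vocab[t] for t in tokenize(text, vocab)]

tokens = tokenize("tokenization is unbelievable")
ids = encode("tokenization is unbelievable")
print(tokens)
print(ids)
```

Neither "tokenization" nor "unbelievable" is in the toy vocabulary as a whole word, so both are split into smaller reusable pieces.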
Token IDs → Embedding layer (look up a dense vector for every ID) → Embeddings (d_model dimensions, shown as 4 for clarity)
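
As a rough sketch, the embedding lookup is just row selection from a table. The sizes and the random initial values below are assumptions for illustration; in a trained model the table entries are learned, and `d_model` is far larger than 4.

```python
import random

random.seed(0)
VOCAB_SIZE, D_MODEL = 8, 4  # hypothetical sizes; 4 mirrors the diagram

# One row per token ID. Random values stand in for learned weights.
embedding_table = [[random.gauss(0.0, 1.0) for _ in range(D_MODEL)]
                   for _ in range(VOCAB_SIZE)]

def embed(token_ids):
    """Look up the dense vector for every ID."""
    return [embedding_table[i] for i in token_ids]

vectors = embed([1, 2, 6, 7])
for token_id, vec in zip([1, 2, 6, 7], vectors):
    print(token_id, [round(x, 2) for x in vec])
```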
2. Embeddings & Attention Scores

Vectors are compared with each other to decide what each token should pay attention to.

Captures meaning & relationships · Queries, Keys, Values · Softmax turns scores into weights

Each token produces a Query (what it's looking for), a Key (what it offers) and a Value (what it passes on).

Scores = QKᵀ / √dₖ (rows = queries · columns = keys)
Context = weighted sum of values
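
The score, softmax, and weighted-sum steps can be written out in a few lines. This is a minimal sketch in plain Python lists, with hand-picked `Q`, `K`, and `V` matrices that are assumptions for the example. In a real model they come from multiplying the embeddings by learned projection matrices, and a decoder additionally masks out future positions, which is omitted here.

```python
import math

def softmax(row):
    """Turn a row of scores into weights that sum to 1."""
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    d_k = len(K[0])
    # scores[i][j] = (query i . key j) / sqrt(d_k); rows = queries, columns = keys
    scores = [[sum(q * k for q, k in zip(q_row, k_row)) / math.sqrt(d_k)
               for k_row in K] for q_row in Q]
    weights = [softmax(row) for row in scores]
    # context[i] = weighted sum of the value vectors
    context = [[sum(w * v_row[c] for w, v_row in zip(w_row, V))
                for c in range(len(V[0]))] for w_row in weights]
    return weights, context

# Hypothetical inputs: two tokens, d_k = 2.
Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[10.0, 0.0], [0.0, 10.0]]

weights, context = attention(Q, K, V)
print(weights)
print(context)
```

Each query matches its own key best, so each row of `weights` puts most of its mass on the diagonal and each context vector leans toward the matching value.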
3. The Transformer Block

Attention and a feed-forward network, repeated many times, refine every token's meaning.

Looks at all tokens at once · Residual Add & Norm keeps it stable · Stacked × L layers deep
Input → Multi-Head Attention → + (residual add) → Feed Forward Network → + (residual add) → Output, repeated × 12 layers
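
The block structure can be sketched as follows. This is a simplified illustration, not a faithful implementation: it uses a single attention head instead of multi-head attention, omits biases and the learned scale and shift in layer normalization, and uses random weights. All names and sizes (`D_MODEL`, `D_FF`, `N_LAYERS`, the parameter dictionary) are assumptions for the example.

```python
import math
import random

random.seed(0)
D_MODEL, D_FF, N_LAYERS = 4, 8, 12  # hypothetical sizes; 12 matches the diagram

def rand_matrix(rows, cols):
    return [[random.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def softmax(row):
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def layer_norm(X, eps=1e-5):
    """Normalize each token vector to zero mean and unit variance."""
    out = []
    for row in X:
        mean = sum(row) / len(row)
        var = sum((x - mean) ** 2 for x in row) / len(row)
        out.append([(x - mean) / math.sqrt(var + eps) for x in row])
    return out

def self_attention(X, Wq, Wk, Wv):
    """Single-head attention: every token looks at all tokens at once."""
    Q, K, V = matmul(X, Wq), matmul(X, Wk), matmul(X, Wv)
    scale = math.sqrt(len(K[0]))
    scores = [[s / scale for s in row] for row in matmul(Q, transpose(K))]
    return matmul([softmax(row) for row in scores], V)

def feed_forward(X, W1, W2):
    hidden = [[max(0.0, h) for h in row] for row in matmul(X, W1)]  # ReLU
    return matmul(hidden, W2)

def transformer_block(X, p):
    # Residual add, then normalize, after each sub-layer.
    X = layer_norm(add(X, self_attention(X, p["Wq"], p["Wk"], p["Wv"])))
    X = layer_norm(add(X, feed_forward(X, p["W1"], p["W2"])))
    return X

# Each layer has its own weights; the same block structure repeats N_LAYERS times.
layers = [{"Wq": rand_matrix(D_MODEL, D_MODEL),
           "Wk": rand_matrix(D_MODEL, D_MODEL),
           "Wv": rand_matrix(D_MODEL, D_MODEL),
           "W1": rand_matrix(D_MODEL, D_FF),
           "W2": rand_matrix(D_FF, D_MODEL)} for _ in range(N_LAYERS)]

X = rand_matrix(3, D_MODEL)  # three token vectors standing in for embeddings
for params in layers:
    X = transformer_block(X, params)
print([[round(x, 3) for x in row] for row in X])
```

The output has the same shape as the input (one `D_MODEL`-sized vector per token), which is what lets the block be stacked repeatedly.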
4. Context Window

The model can only "see" a fixed number of recent tokens at once.

Larger window = more context · Limited by model capacity · Window: 16K tokens
Diagram legend: visible to the model · current position · next tokens (to be generated)
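
One simple way to picture the window is as a slice over the token history that keeps only the most recent tokens. The helper name `visible_tokens` and the tiny window of 16 are assumptions for the example; it stands in for the 16K-token window shown above. Actual systems may handle overflow differently, for instance by rejecting over-long inputs.

```python
def visible_tokens(token_ids, window):
    """Return the most recent `window` tokens; older ones fall out of view."""
    if window <= 0:
        raise ValueError("window must be positive")
    return token_ids[-window:]

WINDOW = 16  # small stand-in for a 16K-token window

history = list(range(20))           # 20 token IDs, oldest first
seen = visible_tokens(history, WINDOW)
print(seen)                         # the 4 oldest tokens are no longer visible
print(visible_tokens([1, 2, 3], WINDOW))  # shorter than the window: all visible
```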
5. Output Tokens

The decoder predicts a probability distribution over the next token, samples one token from it, appends it to the sequence, and repeats.

Autoregressive (one token at a time) · Sampled from probabilities · Runs until an EOS token
Sampling control (set to 0.70): low = focused & predictable · high = diverse & surprising
↻ Append & repeat: each token feeds back in as new context
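
The predict, sample, append loop can be sketched as below. Several things here are assumptions: the 0.70 control is treated as a sampling temperature (the page does not name it explicitly), `toy_model` is a hypothetical stand-in for the real decoder, and the EOS ID and vocabulary size are invented for the example.

```python
import math
import random

EOS = 0          # hypothetical end-of-sequence token ID
VOCAB_SIZE = 5   # hypothetical vocabulary size

def softmax_with_temperature(logits, temperature):
    """Lower temperature sharpens the distribution; higher flattens it."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def toy_model(context):
    """Stand-in for the decoder: returns one logit per vocabulary entry.

    Hypothetical rule: favour the ID after the last token, wrapping to EOS.
    """
    target = (context[-1] + 1) % VOCAB_SIZE
    return [3.0 if i == target else 0.0 for i in range(VOCAB_SIZE)]

def generate(prompt_ids, temperature=0.7, max_new_tokens=20, rng=None):
    rng = rng or random.Random(0)
    tokens = list(prompt_ids)
    for _ in range(max_new_tokens):
        probs = softmax_with_temperature(toy_model(tokens), temperature)
        next_id = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        tokens.append(next_id)   # the new token becomes part of the context
        if next_id == EOS:       # stop once the end-of-sequence token appears
            break
    return tokens

print(generate([1]))
print(softmax_with_temperature([3.0, 0.0, 0.0], 0.1))  # sharply peaked
print(softmax_with_temperature([3.0, 0.0, 0.0], 2.0))  # flatter
```

The `max_new_tokens` cap is a safety limit in case EOS is never sampled.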
6. End-to-End Flow

The whole journey, start to finish.

Input Text: your prompt
Tokenization: text → IDs
Embeddings: IDs → vectors
Transformer: × L blocks
Context Window: what it sees
Output Tokens: one at a time
💡

Key takeaway

LLMs convert text into numbers, understand relationships through attention, process information through many layers, and generate coherent text — one token at a time.