From input text to intelligent output

LLM Architecture Explained

Type a sentence below and watch it travel the whole pipeline — interactively.



Input text → split into tokens → Tokens → map each token to an ID → Token IDs (the numbers the model actually reads)

Tokenization

Text is split into tokens (words or sub-words), each given a numeric ID.

Handles any language
Breaks text into reusable pieces
Rare words split into sub-words
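
The splitting step can be sketched in a few lines of Python. This is a toy greedy longest-match tokenizer over a tiny made-up vocabulary; `VOCAB`, `tokenize`, `encode`, and the ID values are all hypothetical and chosen only for illustration. Production tokenizers learn their vocabulary from data and are considerably more involved.

```python
# Toy vocabulary: hypothetical pieces and IDs, chosen only for illustration.
VOCAB = {"<unk>": 0, "the": 1, "cat": 2, "s": 3,
         "un": 4, "believ": 5, "able": 6, " ": 7}


def tokenize(text: str) -> list[str]:
    """Greedy longest-match split of text into pieces found in VOCAB."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink it one character at a time.
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in VOCAB:
                tokens.append(piece)
                i = j
                break
        else:
            # Nothing matched: emit an unknown token and skip one character.
            tokens.append("<unk>")
            i += 1
    return tokens


def encode(text: str) -> list[int]:
    """Map each token to its numeric ID."""
    return [VOCAB[token] for token in tokenize(text)]


print(tokenize("unbelievable"))  # a rare word falls apart into sub-words
print(encode("the cats"))
```

Note how a word missing from the vocabulary is still representable as a sequence of smaller, reusable pieces.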

Token IDs → Embedding Layer (look up a dense vector for every ID) → Embeddings (d_model dimensions, shown as 4 for clarity)
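
The lookup itself is just row selection from a table. The sketch below assumes a hypothetical vocabulary of 8 tokens and uses random numbers in place of learned values; `embedding_table` and `embed` are illustrative names, not part of any library.

```python
import random

D_MODEL = 4     # shown as 4 for clarity; real models use far more dimensions
VOCAB_SIZE = 8  # hypothetical vocabulary size

random.seed(0)
# One row per token ID. In a trained model these values are learned;
# here they are random placeholders.
embedding_table = [
    [random.uniform(-1.0, 1.0) for _ in range(D_MODEL)]
    for _ in range(VOCAB_SIZE)
]


def embed(token_ids):
    """Look up one D_MODEL-dimensional vector for every token ID."""
    return [embedding_table[token_id] for token_id in token_ids]


vectors = embed([1, 7, 2, 3])
print(len(vectors), "vectors of", len(vectors[0]), "dimensions each")
```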

Embeddings & Attention Scores

Vectors are compared with each other to decide what each token should pay attention to.

Captures meaning & relationships
Queries · Keys · Values
Softmax turns scores into weights

Each token produces a Query (what it is looking for), a Key (what it offers), and a Value (what it passes on).

Scores: QKᵀ / √dₖ (rows = queries, columns = keys) → weights showing what the selected query attends to → Context: weighted sum of values
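
The score, softmax, and weighted-sum steps can be written out directly. This is a minimal single-head sketch in plain Python; the function names are illustrative, the toy inputs are made up, and the causal mask that decoder models apply (so a token cannot attend to later tokens) is deliberately left out to keep it short.

```python
import math


def softmax(xs):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V):
    """Scaled dot-product attention. Q, K, V are lists of row vectors."""
    d_k = len(K[0])
    all_weights, context = [], []
    for q in Q:
        # One score per key: dot(q, k) / sqrt(d_k)
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K
        ]
        weights = softmax(scores)
        # Context vector: weighted sum of the value vectors
        ctx = [
            sum(w * v[c] for w, v in zip(weights, V))
            for c in range(len(V[0]))
        ]
        all_weights.append(weights)
        context.append(ctx)
    return context, all_weights


# Toy input: two tokens, two dimensions (values are made up).
Q = K = V = [[1.0, 0.0], [0.0, 1.0]]
context, weights = attention(Q, K, V)
print(weights[0])  # how much the first query attends to each key
```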

The Transformer Block

Attention and a feed-forward network, repeated many times, refine every token's meaning.

Looks at all tokens at once
Residual Add & Norm keeps it stable
Stacked × L layers deep

× 12 layers

Input → Multi-Head Attention → Feed Forward Network → Output
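
To make the Add & Norm pattern concrete, here is a heavily simplified sketch of one block and a stack of them. Several things are assumptions made for brevity: attention is single-head with the learned Q/K/V projections omitted, the feed-forward weights `W1` and `W2` are random placeholders, and the same weights are reused in every layer (a real model learns separate weights per layer). All names are hypothetical.

```python
import math
import random

D = 4  # toy model width
random.seed(0)
# Placeholder feed-forward weights (expand to 4*D, then project back to D).
W1 = [[random.uniform(-0.5, 0.5) for _ in range(D)] for _ in range(4 * D)]
W2 = [[random.uniform(-0.5, 0.5) for _ in range(4 * D)] for _ in range(D)]


def layer_norm(x, eps=1e-5):
    """Rescale one vector to zero mean and roughly unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def self_attention(X):
    """Single-head attention with Q = K = V = X (projections omitted)."""
    out = []
    for q in X:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(D) for k in X]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        out.append([sum(e / total * v[c] for e, v in zip(exps, X))
                    for c in range(D)])
    return out


def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def feed_forward(x):
    """Two linear maps with a ReLU in between, applied to one token."""
    hidden = [max(0.0, h) for h in matvec(W1, x)]
    return matvec(W2, hidden)


def transformer_block(X):
    # Attention sub-layer, then residual add and norm
    attended = self_attention(X)
    X = [layer_norm([a + b for a, b in zip(x, att)])
         for x, att in zip(X, attended)]
    # Feed-forward sub-layer, then residual add and norm
    return [layer_norm([a + b for a, b in zip(x, feed_forward(x))]) for x in X]


def run_stack(X, num_layers=12):
    """Apply the block num_layers times (the depth L)."""
    for _ in range(num_layers):
        X = transformer_block(X)
    return X


X = [[1.0, 0.0, 0.5, -0.5], [0.2, 0.3, -0.1, 0.9], [0.0, 1.0, 0.0, -1.0]]
Y = run_stack(X, num_layers=12)
```

The output has the same shape as the input, which is what lets the block be stacked any number of times.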


Context Window

The model can only "see" a fixed number of recent tokens at once.

Larger window = more context
Limited by model capacity
Window: 16K tokens

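
A minimal sketch of a fixed window, assuming the simplest possible policy: keep the most recent tokens and drop the oldest. `visible_context` is a hypothetical helper, and real systems may handle overflow differently.

```python
def visible_context(token_ids, window_size):
    """Return the most recent window_size tokens; older ones are dropped."""
    if window_size <= 0:
        return []
    return token_ids[-window_size:]


history = list(range(10))           # ten token IDs, oldest first
print(visible_context(history, 4))  # only the last four remain visible
```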

Output Tokens

The decoder predicts a probability distribution over the next token, samples one token from it, appends that token to the context, and repeats.

Autoregressive: one token at a time
Sampled from probabilities
Runs until an EOS token

LLM decoder: next-token probabilities → sample (Temperature 0.70) → append & repeat, with each token fed back in as new context

Temperature: low = focused & predictable · high = diverse & surprising
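
The predict, sample, append loop can be sketched as follows. `toy_model` is a hypothetical stand-in for the LLM that returns made-up scores over a four-token vocabulary, and `EOS = 0` is an assumed end-of-sequence ID. Temperature is applied by dividing the scores before the softmax, so it must be greater than zero.

```python
import math
import random

EOS = 0  # hypothetical end-of-sequence token ID


def softmax_with_temperature(logits, temperature):
    """Low temperature sharpens the distribution; high temperature flattens it."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def toy_model(context):
    """Stand-in for the LLM: made-up scores over a 4-token vocabulary.

    The EOS score grows with context length so generation eventually stops.
    """
    return [0.5 * len(context), 1.0, 2.0, 0.5]


def generate(prompt_ids, temperature=0.7, max_new_tokens=20, rng=None):
    rng = rng or random.Random(0)
    context = list(prompt_ids)
    for _ in range(max_new_tokens):
        probs = softmax_with_temperature(toy_model(context), temperature)
        # Sample one token ID according to its probability
        next_id = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        if next_id == EOS:
            break
        context.append(next_id)  # feed the new token back in as context
    return context


print(generate([1, 2], temperature=0.7))
```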

End-to-End Flow

The whole journey, start to finish.

Input Text: your prompt
Tokenization: text → IDs
Embeddings: IDs → vectors
Transformer: × L blocks
Context Window: what it sees
Output Tokens: one at a time

💡

Key takeaway

LLMs convert text into numbers, understand relationships through attention, process information through many layers, and generate coherent text — one token at a time.