Toy model (n-grams + heuristics), but more “LLM-like”: readable subword-ish tokens (stems + -ing),
softmax-style sampling controls, and a toy attention strip. Predictions update as you type.
Analyze Text
Next-word Predictor
1) Paste training text
The walkthrough highlights the current token (blue), then +1 (green), +2 (yellow), +3 (red). It increments counts for 1–4-token phrases and stores
next-token maps for contexts of length 1, 2, 3, and 4. The loss plot is a toy online NLL under an interpolated n-gram model.
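The counting and loss described above can be sketched in a few lines. This is an illustrative reimplementation, not the demo's actual code; the interpolation weights and the `eps` floor are assumptions.

```python
from collections import defaultdict
import math

def ngram_counts(tokens, max_n=4):
    """Store next-token counts keyed by context tuples of length 1..max_n."""
    counts = {n: defaultdict(lambda: defaultdict(int)) for n in range(1, max_n + 1)}
    for i in range(1, len(tokens)):
        for n in range(1, max_n + 1):
            if i - n < 0:
                break
            ctx = tuple(tokens[i - n:i])
            counts[n][ctx][tokens[i]] += 1
    return counts

def interpolated_nll(counts, tokens, weights=(0.1, 0.2, 0.3, 0.4), eps=1e-9):
    """Average negative log-likelihood of each true next token under a
    linear interpolation of the 1..4-gram count distributions."""
    total, steps = 0.0, 0
    for i in range(1, len(tokens)):
        p = eps  # floor so log never sees zero
        for n, w in zip(range(1, 5), weights):
            if i - n < 0:
                continue
            nexts = counts[n].get(tuple(tokens[i - n:i]))
            if nexts:
                p += w * nexts[tokens[i]] / sum(nexts.values())
        total += -math.log(p)
        steps += 1
    return total / max(steps, 1)
```

As the same contexts recur, the count distributions sharpen around the true next token, so the average NLL drops without any gradient step.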
Animated walkthrough
Idle
Step — / —
Current: —
+1: —
+2: —
+3: —
Live counts
Vocab: —
Unique bigrams: —
Unique trigrams: —
Unique 4-grams: —
Just added
1: —
2: —
3: —
4: —
Toy loss (NLL)
avg: —
Lower = higher probability assigned to the true next token. This is not backprop, just counts getting better.
final view only
for heatmap + toy attention
optional
final heatmap
walkthrough
human-friendly subwords
2) What the model learned
No model yet
Tokens
—
Vocabulary
—
Unique bigrams
—
Unique trigrams
—
Unique 4-grams
—
Top tokens (click to highlight)
count
Token–token correlation heatmap
low → high
Click a top token to highlight its row/column in the heatmap.
Top 2-token phrases
Top 3-token phrases
Top 4-token phrases
Note
The predictor shows a single probability-ordered list, but still includes a “by context length” breakdown below for intuition.
Next-token predictor (updates as you type)
Needs a model
We compute candidate next tokens from contexts of length 4, 3, 2, and 1.
We blend them (or choose strictly) based on your settings (right column), then list predictions ordered by probability.
If you are mid-word, we treat the last fragment as a prefix filter (token completion).
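The blend-and-filter step above can be sketched as follows. The weights, the `counts` shape (n → context tuple → next-token counts), and the prefix handling are assumptions for illustration, not the demo's exact logic.

```python
from collections import defaultdict

def predict_next(counts, context, weights=(0.1, 0.2, 0.3, 0.4), prefix="", top=5):
    """Blend next-token distributions from contexts of length 1..4,
    optionally keeping only tokens that complete a typed prefix."""
    scores = defaultdict(float)
    for n, w in zip(range(1, 5), weights):
        if len(context) < n:
            continue
        nexts = counts[n].get(tuple(context[-n:]))
        if not nexts:
            continue
        total = sum(nexts.values())
        for tok, c in nexts.items():
            scores[tok] += w * c / total
    if prefix:  # mid-word: treat the fragment as a completion filter
        scores = {t: s for t, s in scores.items() if t.startswith(prefix)}
    z = sum(scores.values()) or 1.0
    ranked = sorted(scores.items(), key=lambda kv: -kv[1])
    return [(t, s / z) for t, s in ranked[:top]]
```

A strict (non-blended) mode would instead return the distribution from the longest context that has any counts and fall back from 4 down to 1.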
No model yet — go to “Analyze Text” first.
Predictions (ordered by probability)
—
Toy “attention” over recent context
recency + co-occurrence with last token
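A heuristic like the one labeled above (recency plus co-occurrence with the last token) could look like this. The window size, decay rate, and `cooc` lookup are hypothetical; the demo's own formula is not shown.

```python
def toy_attention(context, cooc, window=16, decay=0.85):
    """Assign each recent token a weight: exponential recency decay
    multiplied by (1 + co-occurrence count with the last token),
    normalized to sum to 1. Not transformer attention."""
    recent = context[-window:]
    last = recent[-1]
    raw = []
    for age, tok in enumerate(reversed(recent)):  # age 0 = most recent
        recency = decay ** age
        affinity = 1.0 + cooc.get((tok, last), 0)
        raw.append((tok, recency * affinity))
    z = sum(w for _, w in raw)
    return [(tok, w / z) for tok, w in raw]
```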
Settings
higher = flatter distribution
used for “Insert top / Resample”
0 = off
1 = off
4→1 contexts
keeps list readable
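The sampling knobs hinted at above (temperature where higher = flatter, a filter where 0 = off, and one where 1 = off) behave like standard temperature, top-k, and top-p (nucleus) controls. A minimal sketch, assuming the demo applies them in that order:

```python
import math, random

def sample_with_controls(probs, temperature=1.0, top_k=0, top_p=1.0, rng=random):
    """probs: list of (token, probability). Apply softmax-style temperature,
    then optional top-k / top-p truncation, then sample.
    top_k=0 and top_p=1.0 leave those filters off."""
    # Temperature: rescale log-probs; higher T flattens the distribution.
    logits = [math.log(p) / temperature for _, p in probs]
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    items = sorted(((t, e / z) for (t, _), e in zip(probs, exps)),
                   key=lambda kv: -kv[1])
    if top_k > 0:          # keep only the k most likely tokens
        items = items[:top_k]
    if top_p < 1.0:        # keep the smallest set covering mass >= top_p
        kept, cum = [], 0.0
        for t, p in items:
            kept.append((t, p))
            cum += p
            if cum >= top_p:
                break
        items = kept
    z = sum(p for _, p in items)  # renormalize and sample
    r = rng.random() * z
    for t, p in items:
        r -= p
        if r <= 0:
            return t
    return items[-1][0]
```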
What this still misses
Real LLMs learn patterns in weights, not explicit n-gram tables.
Real tokenizers are learned (BPE/Unigram). Here we use a readable suffix splitter.
Attention here is a heuristic, not transformer attention.
Real training is done via gradient descent on huge corpora; this tool is “count-and-lookup”.
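For contrast with learned BPE/Unigram tokenizers, a readable suffix splitter of the kind described can be sketched in a few lines. The suffix list and minimum stem length here are illustrative guesses, not the demo's actual rules.

```python
import re

# Hypothetical suffix inventory; the demo's splitter may use a different list.
SUFFIXES = ("ing", "ed", "ly", "es", "s")

def subword_tokens(text, min_stem=3):
    """Split each word into stem + '-suffix' when the stem stays readable,
    e.g. 'running' -> ['runn', '-ing']. A toy stand-in for learned BPE."""
    out = []
    for word in re.findall(r"[a-zA-Z]+|[^\sa-zA-Z]", text.lower()):
        for suf in SUFFIXES:
            if word.endswith(suf) and len(word) - len(suf) >= min_stem:
                out.append(word[: -len(suf)])
                out.append("-" + suf)
                break
        else:
            out.append(word)
    return out
```

Unlike BPE, nothing here is learned from data: the split points come from a fixed list, which is what keeps the tokens human-friendly.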