A model thinking out loud.

A very small Gemma model is downloaded once and runs entirely on your machine; nothing is sent to a server. Type a prompt and watch the model complete it one token at a time, with the top-five next-word log-odds printed down the page as it goes.
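The one-token-at-a-time loop can be sketched roughly as follows. This is only an illustration, not the page's own code: `nextLogits` is a hypothetical stand-in for one forward pass of the model, `completeGreedy` and `toyModel` are made-up names, and greedy (argmax) decoding is an assumption, since the page may sample instead.

```typescript
// Hypothetical stand-in for one forward pass: token ids in, next-token logits out.
type NextLogits = (ids: number[]) => number[];

// Index of the largest logit. Ties resolve to the lowest index.
function argmax(xs: number[]): number {
  let best = 0;
  for (let i = 1; i < xs.length; i++) {
    if (xs[i] > xs[best]) best = i;
  }
  return best;
}

// Greedy completion (an assumption): pick the top token, append it, repeat
// until maxTokens is reached or the end-of-sequence id is produced.
function completeGreedy(
  nextLogits: NextLogits,
  promptIds: number[],
  maxTokens: number,
  eosId: number,
  onToken: (id: number, logits: number[]) => void = () => {},
): number[] {
  const ids = [...promptIds];
  for (let step = 0; step < maxTokens; step++) {
    const logits = nextLogits(ids);
    const next = argmax(logits);
    onToken(next, logits); // hook where a page could print the distribution
    if (next === eosId) break;
    ids.push(next);
  }
  return ids.slice(promptIds.length); // only the newly generated ids
}

// Toy model over a 4-token vocabulary: always prefers (last id + 1) mod 4.
const toyModel: NextLogits = (ids) => {
  const want = (ids[ids.length - 1] + 1) % 4;
  return [0, 1, 2, 3].map((i) => (i === want ? 5 : 0));
};
```

With the toy model, `completeGreedy(toyModel, [0], 36, 3)` stops as soon as token 3 (treated here as end-of-sequence) is chosen. The `onToken` callback receives the full logits at every step, which is the natural place to hook in a per-token distribution printout.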

1 Load the model

Options: precision, backend

Status: idle. The fast path needs a WebGPU-capable browser (recent Chrome or Edge). The first load downloads the weights once, then caches them.
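A minimal sketch of how a page might choose between the WebGPU fast path and a slower fallback. The function names, the `"q4"` default, and the idea of falling back to `"wasm"` are assumptions made for illustration; the `device` and `dtype` strings follow the values transformers.js accepts, but how this page actually wires them up is not stated.

```typescript
type Device = "webgpu" | "wasm";
type Precision = "fp32" | "fp16" | "q8" | "q4";

interface LoadOptions {
  device: Device;
  dtype: Precision;
}

// True when the object exposes a WebGPU entry point (navigator.gpu in a browser).
function hasWebGPU(nav: unknown): boolean {
  if (typeof nav !== "object" || nav === null) return false;
  return (nav as { gpu?: unknown }).gpu != null;
}

// Hypothetical policy: use WebGPU when present, otherwise fall back to WASM.
function chooseLoadOptions(nav: unknown, dtype: Precision = "q4"): LoadOptions {
  return { device: hasWebGPU(nav) ? "webgpu" : "wasm", dtype };
}

// Stricter check: requestAdapter() resolves to null when no suitable adapter
// exists, so the presence of navigator.gpu alone is not a guarantee.
async function confirmWebGPU(
  nav: { gpu?: { requestAdapter(): Promise<unknown> } },
): Promise<boolean> {
  if (!nav.gpu) return false;
  return (await nav.gpu.requestAdapter()) !== null;
}
```

In a browser the argument would be the global `navigator`; passing it in explicitly keeps the functions easy to exercise outside one.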

3 The prompt

Max tokens: 36

4 Top-5 next-word log-odds

One block per generated token. Run a completion to watch the distribution print here, step by step.
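For readers curious how a top-5 block could be derived from raw logits, here is a rough sketch. It assumes the values come from a softmax over the full vocabulary; whether the page prints log-probabilities, log p, or true log-odds, log(p / (1 - p)), is not stated, so both are computed. `Candidate`, `logSoftmax`, and `topK` are illustrative names, not the page's.

```typescript
interface Candidate {
  tokenId: number;
  logProb: number; // log p
  logOdds: number; // log(p / (1 - p))
}

// Numerically stable log-softmax: subtract the max before exponentiating.
function logSoftmax(logits: number[]): number[] {
  let max = -Infinity;
  for (const x of logits) if (x > max) max = x;
  let sum = 0;
  for (const x of logits) sum += Math.exp(x - max);
  const logSumExp = max + Math.log(sum);
  return logits.map((x) => x - logSumExp);
}

// The k most likely next tokens, most likely first.
// Sorting the whole vocabulary is simple but not the fastest option.
function topK(logits: number[], k = 5): Candidate[] {
  return logSoftmax(logits)
    .map((logProb, tokenId) => {
      const p = Math.exp(logProb);
      // log(p / (1 - p)) written as log p - log(1 - p)
      return { tokenId, logProb, logOdds: logProb - Math.log1p(-p) };
    })
    .sort((a, b) => b.logProb - a.logProb)
    .slice(0, k);
}
```

Mapping each `tokenId` back to text would go through the model's tokenizer, which is left out here.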

5 Notes & references

The model: onnx-community/gemma-3-270m-it-ONNX · google/gemma-3-270m
Runtime: 🤗 transformers.js (ONNX Runtime Web + WebGPU)
The “read the prediction at each layer” idea: the logit lens (nostalgebraist)
Why next-token distributions are worth watching: LLMs Are Bad Dice Players · the rest of this site is built on these top-of-distribution defaults.

On the layer toggles: in-browser runtimes (transformers.js / ONNX Runtime Web) run the model as one compiled graph — they can’t skip a transformer block mid-forward, and this export doesn’t emit per-layer hidden states, so a true logit-lens read isn’t available here. The layer strip below is therefore a faithful map of the real architecture (pulled live from the loaded model’s config) that you can switch on and off to explore its shape — but switching a layer off does not alter the live computation. The top-5 log-odds above, by contrast, are the model’s genuine output distribution at each step.
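The display-only nature of the layer strip can be illustrated with a small sketch. The config field names `num_hidden_layers` and `layer_types` follow the usual Hugging Face config layout and are an assumption about what this export exposes; `buildLayerStrip`, `toggleLayer`, and `LayerCell` are hypothetical names.

```typescript
// Assumed subset of the loaded model's config.
interface ModelConfigLike {
  num_hidden_layers: number;
  layer_types?: string[]; // e.g. "sliding_attention" or "full_attention"
}

interface LayerCell {
  index: number;
  kind: string;
  on: boolean;
}

// One cell per transformer block, all switched on initially.
function buildLayerStrip(config: ModelConfigLike): LayerCell[] {
  return Array.from({ length: config.num_hidden_layers }, (_, index) => ({
    index,
    kind: config.layer_types?.[index] ?? "unknown",
    on: true,
  }));
}

// Flip one cell. This only changes the map shown on screen; nothing here
// touches the compiled graph, so the model's output is unaffected.
function toggleLayer(strip: LayerCell[], index: number): LayerCell[] {
  return strip.map((cell) =>
    cell.index === index ? { ...cell, on: !cell.on } : cell,
  );
}
```

Keeping the strip as plain data, separate from the inference session, makes the limitation explicit: the toggles describe the architecture rather than steer the computation.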

2 The layers
