LANGUAGE MODEL · ET 270 · RUNS IN YOUR BROWSER
Gemma 3 · 270M · WebGPU · transformers.js

A model thinking out loud.

A small Gemma 3 model (270M parameters) is downloaded once and runs entirely on your machine; nothing is sent to a server. Type a prompt and watch it complete one token at a time, with the top-five next-word log-odds printed down the page as it goes.
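For a sense of what one printed block contains, here is a minimal sketch of turning a single step's raw logits into a top-five list. The page does not say how it derives its numbers, so this assumes "log-odds" means ln(p / (1 − p)) over the softmax probability p, and reports the plain log-probability alongside it. The names `Candidate` and `topKLogOdds` are hypothetical, not part of the demo or of transformers.js.

```typescript
// One row of a printed block: a vocabulary index with its two scores.
interface Candidate {
  tokenId: number;
  logProb: number; // ln p
  logOdds: number; // ln(p / (1 - p)), an assumed reading of "log-odds"
}

// Numerically stable log-softmax over the logits, then keep the k best.
function topKLogOdds(logits: ArrayLike<number>, k: number = 5): Candidate[] {
  const values = Array.from(logits);

  // Subtract the max before exponentiating so large logits cannot overflow.
  const max = values.reduce((m, x) => Math.max(m, x), -Infinity);
  const sumExp = values.reduce((acc, x) => acc + Math.exp(x - max), 0);
  const logSumExp = max + Math.log(sumExp);

  return values
    .map((x, tokenId) => {
      const logProb = x - logSumExp;
      const p = Math.exp(logProb);
      // ln p - ln(1 - p); log1p keeps precision when p is tiny.
      const logOdds = logProb - Math.log1p(-p);
      return { tokenId, logProb, logOdds };
    })
    .sort((a, b) => b.logProb - a.logProb)
    .slice(0, k);
}
```

Sorting the full vocabulary is the simplest thing that works for a sketch; a real page would more likely do a single pass that tracks only the five best entries. A real page would also map each `tokenId` back to its text through the tokenizer, which is left out here.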

1 Load the model

Idle. The fast path needs a WebGPU browser (recent Chrome or Edge). The first load downloads the weights, then caches them.
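The fast-path check can be sketched as a small feature test. This assumes the slow path is the WASM backend, which the page does not state; `pickDevice` and `NavigatorLike` are hypothetical names. The two strings are the device names transformers.js uses for its WebGPU and WASM backends.

```typescript
// Backend names, matching the transformers.js device strings.
type Device = "webgpu" | "wasm";

// Only the part of `navigator` this check looks at.
interface NavigatorLike {
  gpu?: unknown;
}

// Choose WebGPU when the browser exposes `navigator.gpu`, else fall back.
// Assumption: falling back to WASM is what "not the fast path" means here.
function pickDevice(nav: NavigatorLike | undefined): Device {
  const hasGpu = nav !== undefined && nav.gpu !== undefined && nav.gpu !== null;
  return hasGpu ? "webgpu" : "wasm";
}

// In a page this would be called as pickDevice(navigator).
```

The presence of `navigator.gpu` only shows that the API exists. A stricter check would also await `navigator.gpu.requestAdapter()` and treat a null adapter as a reason to fall back. The result would then typically be passed as the `device` option when the model is loaded.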

3 The prompt

4 Top-5 next-word log-odds · one block per generated token

Run a completion to watch the distribution print here, step by step.

5 Notes & references

On the layer toggles: in-browser runtimes (transformers.js / ONNX Runtime Web) run the model as one compiled graph — they can’t skip a transformer block mid-forward, and this export doesn’t emit per-layer hidden states, so a true logit-lens read isn’t available here. The layer strip below is therefore a faithful map of the real architecture (pulled live from the loaded model’s config) that you can switch on and off to explore its shape — but switching a layer off does not alter the live computation. The top-5 log-odds above, by contrast, are the model’s genuine output distribution at each step.
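As a rough illustration of a display-only layer strip, the sketch below builds one cell per transformer block from a config object and flips a cell without touching anything else. The field names `num_hidden_layers` and `layer_types` follow the usual Hugging Face config convention and are assumed here; `buildLayerStrip`, `toggleLayer` and the sample values in the comment are hypothetical.

```typescript
// Assumed subset of a Hugging Face style model config.
interface ModelConfigLike {
  num_hidden_layers: number;
  layer_types?: string[]; // e.g. a per-layer attention kind, if present
}

// One cell of the strip. `on` is display state only.
interface LayerCell {
  index: number;
  kind: string;
  on: boolean;
}

// One cell per block, all switched on to start with.
function buildLayerStrip(config: ModelConfigLike): LayerCell[] {
  return Array.from({ length: config.num_hidden_layers }, (_, index) => ({
    index,
    kind: config.layer_types?.[index] ?? "unknown",
    on: true,
  }));
}

// Flip one cell and return a new strip. Nothing here reaches the model,
// which is why switching a layer off cannot change the generated text.
function toggleLayer(strip: LayerCell[], index: number): LayerCell[] {
  return strip.map((cell) =>
    cell.index === index ? { ...cell, on: !cell.on } : cell
  );
}

// Illustrative values only, not the real model's config:
// buildLayerStrip({ num_hidden_layers: 4, layer_types: ["a", "a", "a", "b"] })
```

Keeping the strip as plain data that is separate from the inference session makes the limitation explicit in the code: the toggles describe the architecture and never feed back into the compiled graph.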