FORWARD PASS · WRITTEN BY HAND · RUNS ON YOUR GPU
Qwen3-0.6B · 28 layers · RoPE · GQA · QK-norm · SwiGLU · TensorFlow.js · WebGPU

A transformer,
layer by layer.

This page implements a modern Qwen3 forward pass by hand — RMSNorm, rotary embeddings, grouped-query attention with QK-normalisation, SwiGLU — and runs it on your GPU. Because we own the loop, we can read the prediction at every one of the 28 layers (the logit lens) and genuinely switch a layer off to watch the output change.
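To make "owning the loop" concrete, here is a minimal sketch in plain JavaScript rather than the page's actual TensorFlow.js code. The names `forwardWithLens`, `layers`, `readout` and `skip` are hypothetical. It assumes each layer returns a residual update, and that switching a layer off means skipping that update so the residual stream passes through unchanged.

```javascript
// Hypothetical sketch: `layers` is an array of functions, each mapping the
// hidden state (a plain array of numbers) to that block's residual update.
// `readout` stands in for "final norm + unembedding", which turns a hidden
// state into a prediction. `skip` holds the indices of layers to switch off.
function forwardWithLens(hidden, layers, readout, skip = new Set()) {
  const perLayer = [];
  let h = hidden.slice();
  layers.forEach((layer, i) => {
    if (!skip.has(i)) {
      const delta = layer(h);
      h = h.map((v, j) => v + delta[j]); // residual connection
    }
    // Logit lens: decode the intermediate state after every layer.
    perLayer.push(readout(h));
  });
  return { hidden: h, perLayer };
}

// Toy usage with two fake "layers" and an argmax readout.
const toyLayers = [(h) => h.map((v) => v * 0.5), (h) => h.map(() => 1)];
const argmax = (h) => h.indexOf(Math.max(...h));
const full = forwardWithLens([1, 2], toyLayers, argmax);
const ablated = forwardWithLens([1, 2], toyLayers, argmax, new Set([0]));
```

Comparing `full.perLayer` with `ablated.perLayer` is the toy version of watching the output change when a layer is switched off.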

1 Load the model

Weights are fetched from Hugging Face once (then cached), and everything runs locally on your machine.
Heads-up: this is a real 0.6-billion-parameter model. First load downloads ~1.2 GB (BF16) and holds ~2.4 GB in memory once expanded to FP32. Best on a desktop with a discrete GPU and a recent Chrome/Edge; lower-end devices may run out of memory — if so, switch the backend to CPU.
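The two sizes follow from bytes per parameter: roughly 0.6 billion weights at 2 bytes each (BF16) is about 1.2 GB, and at 4 bytes each (FP32) about 2.4 GB. The sketch below shows that arithmetic and one way the expansion can be done, assuming the BF16 values arrive as raw 16-bit integers. The helper names `estimateGB` and `bf16ToF32` are hypothetical, not taken from the page's code.

```javascript
// Rough size estimate: parameters × bytes per parameter, in GB (10^9 bytes).
const estimateGB = (params, bytesPerParam) => (params * bytesPerParam) / 1e9;

const approxParams = 0.6e9; // approximate count, as quoted above
const downloadGB = estimateGB(approxParams, 2); // BF16: 2 bytes each
const residentGB = estimateGB(approxParams, 4); // FP32: 4 bytes each

// BF16 is the upper 16 bits of an IEEE-754 float32, so widening to FP32
// is a 16-bit left shift with the low mantissa bits set to zero.
function bf16ToF32(bf16) {
  const bits = new Uint32Array(bf16.length);
  for (let i = 0; i < bf16.length; i++) {
    bits[i] = bf16[i] << 16; // stored modulo 2^32, so the sign bit survives
  }
  return new Float32Array(bits.buffer); // reinterpret the same bytes as floats
}

// Toy usage: four hand-encoded BF16 values.
const widened = bf16ToF32(new Uint16Array([0x3f80, 0xbf80, 0x4000, 0x3fc0]));
```

The widening is exact, because every BF16 value is also a valid FP32 value. The cost is memory only.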

2 The prompt

4 Top-5 next-word log-odds

One block per generated token, printed as it goes.

Load the model and run a forward pass.
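For reference, a top-5 readout like this can be computed from raw logits as sketched below. The page does not spell out its exact definition of "log-odds", so this assumes the usual one, log(p / (1 − p)), with p taken from a softmax over the vocabulary. The function name `topKLogOdds` is hypothetical.

```javascript
// Return the k highest-scoring token ids with their log-probability and
// log-odds. Uses the log-sum-exp trick so large logits do not overflow.
function topKLogOdds(logits, k = 5) {
  let max = -Infinity;
  for (const l of logits) if (l > max) max = l;
  let sum = 0;
  for (const l of logits) sum += Math.exp(l - max);
  const logZ = max + Math.log(sum);

  // Sort token ids by logit, highest first, and keep the first k.
  const ids = Array.from(logits, (_, i) => i)
    .sort((a, b) => logits[b] - logits[a])
    .slice(0, k);

  return ids.map((id) => {
    const logProb = logits[id] - logZ;
    // log(p / (1 - p)) = log p - log(1 - p)
    const logOdds = logProb - Math.log1p(-Math.exp(logProb));
    return { id, logProb, logOdds };
  });
}

// Toy usage on a three-token "vocabulary".
const top2 = topKLogOdds([2, 0, 1], 2);
```

A full sort is the simplest choice for a sketch. With a vocabulary in the hundreds of thousands, a partial selection of the top k would be cheaper.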

5 Notes & references

Every part of Qwen3’s modern stack is implemented from scratch here: RMSNorm, rotary position embeddings (θ=1,000,000), grouped-query attention (16 query heads sharing 8 key/value heads), per-head QK-normalisation, and a SwiGLU feed-forward. Being able to verify a from-scratch port is the whole point, and it’s why this uses Qwen3 rather than a model with sliding-window attention or mixture-of-experts layers, which are far harder to reproduce faithfully.
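The building blocks named above are small enough to sketch on plain arrays. What follows is an illustration, not the page's TensorFlow.js implementation. The function names are hypothetical, the `eps` default is an assumption, and the RoPE layout assumes the "half-split" pairing (dimension i rotates with dimension i + d/2) used by common reference implementations.

```javascript
// RMSNorm: divide by the root-mean-square of x, then apply a learned scale.
// Per-head QK-normalisation applies the same operation to each head's
// query and key vectors.
function rmsNorm(x, weight, eps = 1e-6) {
  let sumSq = 0;
  for (const v of x) sumSq += v * v;
  const inv = 1 / Math.sqrt(sumSq / x.length + eps);
  return x.map((v, i) => v * inv * weight[i]);
}

// Grouped-query attention: with 16 query heads and 8 KV heads, each KV head
// serves 2 consecutive query heads (assumed grouping).
function kvHeadFor(queryHead, numQueryHeads = 16, numKvHeads = 8) {
  return Math.floor(queryHead / (numQueryHeads / numKvHeads));
}

// Rotary position embedding for one head vector at one position.
// Pair i is rotated by angle position * theta^(-2i/d).
function rope(x, position, theta = 1e6) {
  const d = x.length;
  const half = d / 2;
  const out = new Array(d);
  for (let i = 0; i < half; i++) {
    const angle = position * Math.pow(theta, (-2 * i) / d);
    const c = Math.cos(angle);
    const s = Math.sin(angle);
    out[i] = x[i] * c - x[i + half] * s;
    out[i + half] = x[i + half] * c + x[i] * s;
  }
  return out;
}

// SwiGLU gating: SiLU(gate) * up, element-wise. The full feed-forward also
// has three linear projections around this step, omitted here.
function swiglu(gate, up) {
  return gate.map((g, i) => (g / (1 + Math.exp(-g))) * up[i]);
}
```

Two properties are handy when checking a port against a reference: RoPE is a pure rotation, so it leaves a vector's length unchanged, and RMSNorm with unit weights produces a vector whose mean square is approximately 1.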