Show HN: Prompt-to-Excalidraw demo with Gemma 4 E2B in the browser (3.1GB)

April 19, 2026

What it does

The demo lets you prompt Gemma 4 E2B and get an Excalidraw diagram back, entirely in your browser. It reportedly runs only on desktop Chrome 134+ and requires a download of roughly 3.1 GB, so it is not built for phones. If you want to sketch with an LLM while keeping everything local, this is exactly that: no cloud round-trip, just you, the model, and a canvas.

How it works

Instead of emitting raw Excalidraw JSON (roughly 5,000 tokens), the LLM reportedly outputs compact drawing code of about 50 tokens, which the demo expands into a full diagram. The TurboQuant algorithm (polar + QJL) reportedly compresses the KV cache by roughly 2.4× so that longer conversations fit in GPU memory. The team reimplemented TurboQuant in WGSL compute shaders so the heavy lifting runs on WebGPU, and this GPU path reportedly reaches 30+ tokens/sec. There’s also a sibling package, turboquant-wasm, that implements the same algorithm in WASM+SIMD for CPU-side vector search.
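To make the "compact code in, full diagram out" idea concrete, here is a minimal TypeScript sketch. The demo's actual drawing language is not described in the sources, so the two-command mini-language (`box`, `arrow`), the `SketchElement` shape, and the `expandCompact` function are all hypothetical, and the output is a simplified stand-in rather than real Excalidraw JSON.

```typescript
// Hypothetical compact drawing language (NOT the demo's real format):
//   box <id> <label...>     draws a labelled rectangle
//   arrow <fromId> <toId>   connects two previously declared boxes

// Simplified element record, loosely inspired by diagram JSON (assumption).
interface SketchElement {
  id: string;
  type: "rectangle" | "text" | "arrow";
  x: number;
  y: number;
  width: number;
  height: number;
  text?: string;
  startId?: string;
  endId?: string;
}

const BOX_W = 160;
const BOX_H = 60;
const GAP = 80;

function expandCompact(code: string): SketchElement[] {
  const elements: SketchElement[] = [];
  const boxes = new Map<string, SketchElement>();

  for (const raw of code.split("\n")) {
    const line = raw.trim();
    if (line === "") continue; // skip blank lines

    const [cmd, ...args] = line.split(/\s+/);

    if (cmd === "box") {
      const [id, ...labelParts] = args;
      if (!id) throw new Error(`box needs an id: "${line}"`);
      // Lay boxes out left to right so the model never emits coordinates.
      const rect: SketchElement = {
        id,
        type: "rectangle",
        x: boxes.size * (BOX_W + GAP),
        y: 0,
        width: BOX_W,
        height: BOX_H,
      };
      boxes.set(id, rect);
      const label: SketchElement = {
        id: `${id}-label`,
        type: "text",
        x: rect.x + 10,
        y: rect.y + 20,
        width: BOX_W - 20,
        height: 20,
        text: labelParts.join(" ") || id,
      };
      elements.push(rect, label);
    } else if (cmd === "arrow") {
      const [fromId, toId] = args;
      const from = fromId === undefined ? undefined : boxes.get(fromId);
      const to = toId === undefined ? undefined : boxes.get(toId);
      if (!from || !to) throw new Error(`arrow references unknown box: "${line}"`);
      // Arrow runs from the right edge of `from` to the left edge of `to`.
      elements.push({
        id: `${from.id}->${to.id}`,
        type: "arrow",
        x: from.x + from.width,
        y: from.y + from.height / 2,
        width: to.x - (from.x + from.width),
        height: 0,
        startId: from.id,
        endId: to.id,
      });
    } else {
      throw new Error(`unknown command: "${line}"`);
    }
  }
  return elements;
}
```

For example, `expandCompact("box a Client\nbox b Server\narrow a b")` turns three short lines into five element records (two rectangles, two labels, one arrow). The point of a design like this is that layout and boilerplate fields are filled in deterministically by the expander, so the model only has to generate the handful of tokens that carry intent.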

Why it matters

This is part of the on-device AI trend: pushing large-model capabilities into the browser, accepting a large download and strict platform requirements in exchange for privacy and lower latency. The limitations are real: you need WebGPU subgroups (which reportedly rules out Safari/iOS), roughly 3 GB of available memory, and a modern desktop Chrome. Still, the demo serves as a proof of concept. It’s a glimpse of what edge-first tooling could look like: powerful, local, and a little bit cheeky about how much it asks you to download.
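A page with these requirements would typically check for WebGPU and the `subgroups` feature before starting a multi-gigabyte download. The sketch below shows one way to do that with the standard `requestAdapter()` and `features.has()` calls. It is not taken from the demo's source: the `GpuLike` and `AdapterLike` interfaces and both function names are assumptions, introduced so the snippet compiles without the WebGPU type definitions.

```typescript
// Minimal structural types (assumptions) covering only what the check uses,
// so the snippet does not depend on the @webgpu/types package.
interface AdapterLike {
  features: { has(name: string): boolean };
}
interface GpuLike {
  requestAdapter(): Promise<AdapterLike | null>;
}

// True only if an adapter exists and advertises the "subgroups" feature.
function adapterHasSubgroups(adapter: AdapterLike | null): boolean {
  return adapter !== null && adapter.features.has("subgroups");
}

// Pass in `navigator.gpu`; it is undefined in browsers without WebGPU.
async function supportsSubgroups(gpu: GpuLike | undefined): Promise<boolean> {
  if (!gpu) return false; // no WebGPU at all
  const adapter = await gpu.requestAdapter(); // may resolve to null
  return adapterHasSubgroups(adapter);
}

// In a browser, something like:
//   const ok = await supportsSubgroups((navigator as any).gpu);
//   if (!ok) { /* show an "unsupported browser" message, skip the download */ }
```

Running the check up front means an unsupported browser gets an immediate message instead of a failed 3 GB fetch.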

Sources: teamchong.github.io, Hacker News