Yesterday Mistral released a 3B vision model that runs entirely in the browser via WebGPU.
That sounded really cool to me - a model you can run locally in the browser, without connecting to a server or installing anything complicated. I wanted to understand what that means, so I did some research.
What is WebGPU?
WebGPU is a new browser API that gives web pages direct access to the GPU, for graphics and for general-purpose computation - and that second part is what makes running ML models in the browser practical.
Before (WebGL): The browser could use the GPU to draw graphics, but general-purpose computation required awkward workarounds
Now (WebGPU): The browser can run compute workloads directly on the GPU - much like a native app or a server
It’s like the difference between calculating something on a calculator (CPU) versus calculating on a thousand calculators in parallel (GPU).
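To make the above concrete: in JavaScript, WebGPU is exposed through `navigator.gpu`, and a page can check for it before trying to load a model. A minimal sketch (the function name and status strings are mine):

```javascript
// Minimal WebGPU feature check. `navigator.gpu` is the entry point
// defined by the WebGPU spec; it is undefined in browsers (or runtimes)
// that don't support WebGPU.
async function describeWebGPUSupport() {
  const gpu = globalThis.navigator?.gpu;
  if (!gpu) return "webgpu-unavailable";
  // requestAdapter() resolves to null when no suitable GPU is found.
  const adapter = await gpu.requestAdapter();
  if (!adapter) return "no-adapter";
  // A GPUDevice is what compute (and render) work is submitted to.
  const device = await adapter.requestDevice();
  return device ? "ready" : "no-device";
}

describeWebGPUSupport().then((status) => console.log(status));
```

A demo page would typically run a check like this first and fall back to a "your browser doesn't support WebGPU" message instead of downloading gigabytes of weights for nothing.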
Why Does This Matter?
- No server needed
- No API calls
- No ongoing costs
- User downloads ~3GB and runs everything locally
Additional benefits: complete privacy (nothing leaves your device), offline apps, and zero-cost prototyping.
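As an illustration of the "download once, run locally" idea: in-browser demos of this kind are typically built on the Transformers.js library, which can place a model on the GPU via WebGPU. A sketch of what loading might look like - the task, model id, and options here are my assumptions, not taken from the actual demo:

```javascript
// Sketch only: sets up an image-captioning pipeline that runs in the
// browser on the GPU via WebGPU. The model id is a placeholder.
async function loadLocalVisionModel() {
  // Dynamic import, so the library (and later the multi-GB weights,
  // cached by the browser after the first download) load on demand.
  const { pipeline } = await import("@huggingface/transformers");
  return pipeline("image-to-text", "your-model-id-here", {
    device: "webgpu", // run on the GPU instead of WASM/CPU
    dtype: "q4",      // quantized weights to shrink the download
  });
}
```

After setup, every call runs entirely on the user's machine - no inference request ever leaves the browser.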
I asked Claude what you could do with this and it had some nice examples.
My Experiment
I had to try it myself: I sent Claude Code the HuggingFace demo of the model and asked it to figure out how to build a small webapp with it.
We started with camera recognition like in the demo, then added invoice scanning - the model tries, but struggles a bit with it.
Then I also added:
- Screenshot analysis
- Food recognition
- Chat for fun
What’s Next?
Basically, this is an open model you can run in the browser from anywhere, with no usage limits - which sounds like a significant development.
Even though it’s currently a small model with real limitations, it makes me wonder: what will models like this be able to do in another year?