On the browser extension side, we're forking WebLLM, adding support for more modern multimodal models, and optimizing inference so that an M4 chip can keep up with scrolling. You can actually use it in bouncer today: go into settings and turn on the experimental local models.
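For the curious, driving an in-browser model through upstream WebLLM looks roughly like this (a minimal sketch against the standard `@mlc-ai/web-llm` API; the model ID and the filtering prompt are illustrative, not what bouncer actually ships):

```typescript
import { CreateMLCEngine } from "@mlc-ai/web-llm";

// Sketch: spin up an in-browser engine (weights download once, inference
// runs on WebGPU) and ask it about a piece of content. Model ID is
// illustrative; any entry from WebLLM's prebuilt model list works.
async function shouldFilter(post: string): Promise<string> {
  const engine = await CreateMLCEngine("Llama-3.2-3B-Instruct-q4f16_1-MLC", {
    initProgressCallback: (p) => console.log(p.text), // download progress
  });
  const reply = await engine.chat.completions.create({
    messages: [
      { role: "user", content: `Answer yes or no: should this post be filtered?\n\n${post}` },
    ],
  });
  return reply.choices[0].message.content ?? "";
}
```

Since this requires WebGPU, it only runs in the browser, which is exactly why the extension is the natural home for it.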
On the mobile side, we're working to get 4B models running on the Apple Neural Engine. The main bottleneck on mobile is actually battery life. Neither platform is quite optimized enough to formally brag about, but we're almost there!