hear about what people might build with it
My startup is making software for firefighters to use on tablets during missions, and I'm excited to see (when I get the time) if we can use this as a keyboard alternative on the device. Due to the sector, we try to rely on the cloud as little as possible and run things either on-device or self-hosted/on-premise. We would need Norwegian, though, for our current customers. Looking through your web
[1]: https://huggingface.co/spaces/hf-audio/open_asr_leaderboard
I'm actually a little surprised they haven't added model size to that chart.
Oh, and I typed this in Handy with just my voice and Parakeet version 3, which is absolutely crazy.
I can tell that this is now definitely going to be my go-to model and app on all my clients.
The built-in one is much faster, and you only have to toggle it on.
Are these so much more accurate? I definitely have to correct stuff, but it's a pretty good experience. I also use speech-to-text on my iPhone, which seems to have about the same accuracy.
edit: holy shit, Parakeet is good... Moonshine is impressive too, and it's half the parameter count.
Now if only there were something just as quick as Parakeet v3 for TTS! Then I could talk to Codex all day long!
I was using AssemblyAI, but this is fast and accurate and offline, wtf!
Weird to only release English as open weights.
I'd love a faster and more accurate option than Whisper, but streamers need something off-the-shelf they can install in their pipeline, like an OBS plugin which can just grab the audio from their OBS audio sources.
I see a couple of obvious problems: this doesn't seem to support translation, which is unfortunate, since that's pretty key for this use case. It also only supports one language at a time, which is problematic given how streamers will frequently code-switch while talking to their chat in different languages, or on Discord with their gameplay partners. Maybe such a plugin could detect which language is being spoken and route to one model or the other as needed?
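A hedged sketch of what that routing could look like, assuming a hypothetical `detect_language` helper (in practice backed by a language-ID model) and per-language transcriber callables; none of these names come from Moonshine's actual API:

```python
from typing import Callable, Dict

# Hypothetical per-language transcribers; a real plugin would wrap one
# loaded ASR model per language here.
def make_transcriber(lang: str) -> Callable[[bytes], str]:
    return lambda audio: f"[{lang}] transcript"

TRANSCRIBERS: Dict[str, Callable[[bytes], str]] = {
    "en": make_transcriber("en"),
    "es": make_transcriber("es"),
}

def detect_language(audio: bytes) -> str:
    # Placeholder: a real implementation would run a language-ID model
    # on the audio chunk. Here we fake it off the first byte.
    return "es" if audio.startswith(b"\x01") else "en"

def transcribe(audio: bytes) -> str:
    # Route each chunk to the model for its detected language,
    # falling back to English when the language is unsupported.
    lang = detect_language(audio)
    return TRANSCRIBERS.get(lang, TRANSCRIBERS["en"])(audio)
```

The open question for a streaming plugin is how often to re-run language detection: per utterance is cheap, but mid-utterance code-switching would still land in the wrong model.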
For voice agents, the painful failure mode is partials getting rewritten every few hundred ms. If you can share it, metrics like median first-token latency, real-time factor, and "% partial tokens revised after 1s / 3s" on noisy far-field audio would make comparisons much more actionable.
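A minimal sketch of how two of those metrics could be computed from a log of streaming events. The `Partial` record, its fields, and the prefix-based definition of "revised" are all assumptions for illustration, not anything from the model's actual output format:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Partial:
    emitted_at: float  # seconds since utterance start
    text: str          # hypothesis text at that moment

def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    # RTF < 1.0 means the model transcribes faster than real time.
    return processing_seconds / audio_seconds

def revised_after(partials: List[Partial], final: str, horizon: float) -> float:
    # Fraction of partials emitted within `horizon` seconds whose text
    # was not a prefix of the final transcript, i.e. got rewritten later.
    early = [p for p in partials if p.emitted_at <= horizon]
    if not early:
        return 0.0
    revised = sum(1 for p in early if not final.startswith(p.text))
    return revised / len(early)
```

For example, with partials "hello" at 0.3s and "hello word" at 0.8s against a final transcript of "hello world", `revised_after(..., horizon=1.0)` reports 0.5, since the second hypothesis had to be rewritten.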
If those numbers look good, this seems very promising for local assistant pipelines.
There was an issue about a demo, but it's gone now. I can't recall for sure, but I think I got it working locally myself too; then it broke unexpectedly and I never managed to figure out why.
The minimum useful data for this stuff is a small table of language vs. WER per dataset.
The authors do acknowledge this, though, and give a slightly over-complex way to do it with uv in an example project. (FYI, you don't need to source anything if you use uv run.)
> This code, apart from the source in core/third-party, is licensed under the MIT License, see LICENSE in this repository.
> The English-language models are also released under the MIT License. Models for other languages are released under the Moonshine Community License, which is a non-commercial license.
> The code in core/third-party is licensed according to the terms of the open source projects it originates from, with details in a LICENSE file in each subfolder.
Timestamp 1: 2026-02-25T00:31:28 1771979488 https://news.ycombinator.com/item?id=47145661
Timestamp 2: 2026-02-25T00:32:03 1771979523 https://news.ycombinator.com/item?id=47145666
Two detailed, lengthy comments in two different threads within a 35-second span, from a new account.