I have enough RAM on my Mac to run smaller LLMs locally, so for me the whole thing stays local.
It's been a while, so I don't know if it's still going to work, given the NeMo toolkit's ASR NumPy dependency issues.
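For reference, here's a minimal sketch of loading Parakeet through NeMo; the exact model ID and the shape of the return value are assumptions, since both vary across NeMo releases:

    import nemo.collections.asr as nemo_asr

    # Assumed Hugging Face model ID for Parakeet V3.
    model = nemo_asr.models.ASRModel.from_pretrained(
        model_name="nvidia/parakeet-tdt-0.6b-v3"
    )

    # transcribe() takes a list of audio file paths; depending on the NeMo
    # version it returns plain strings or Hypothesis objects with a .text field.
    results = model.transcribe(["audio.wav"])
    print(results[0])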
I use it on Linux with whisper.cpp and it works great.
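If it helps anyone, a minimal sketch of driving whisper.cpp from a script; the binary name and model path are assumptions (recent builds produce whisper-cli, older ones main):

    import subprocess

    # Assumed paths: adjust to your whisper.cpp build and downloaded model.
    WHISPER_BIN = "./build/bin/whisper-cli"  # called ./main in older builds
    MODEL = "models/ggml-large-v3.bin"

    # whisper.cpp expects 16 kHz mono WAV input; -otxt writes audio.wav.txt
    # next to the input file.
    subprocess.run([WHISPER_BIN, "-m", MODEL, "-f", "audio.wav", "-otxt"],
                   check=True)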
Accuracy, average WER (lower is better): Whisper-large-v3 4.91 vs. Parakeet V3 5.05
Speed, RTFx (higher is better): Whisper-large-v3 126 vs. Parakeet V3 2154
RTFx is throughput relative to real time, so that's 2154 / 126 ≈ 17x faster.
> NVIDIA’s ParakeetV3 model
You can't install .exes, but you can connect to the Internet, then download and install approximately two hundred wheels (judging by uv.lock), many of which contain opaque binary blobs, including an AI model?
Why does your organization think this makes any sense?
My use case is generating subtitles for YouTube videos (downloaded using yt-dlp). Word-level accuracy is also nice to have, because I also translate the subtitles with LLMs and edit them to better fit the translation.
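Roughly, that pipeline looks like the sketch below; the binary locations and URL are placeholders, and whisper.cpp's documented -ml 1 trick is what approximates word-level timestamps:

    import subprocess

    URL = "https://www.youtube.com/watch?v=..."  # placeholder

    # 1. Grab just the audio track with yt-dlp.
    subprocess.run(["yt-dlp", "-x", "--audio-format", "wav",
                    "-o", "video.%(ext)s", URL], check=True)

    # 2. whisper.cpp wants 16 kHz mono WAV, so resample first.
    subprocess.run(["ffmpeg", "-i", "video.wav", "-ar", "16000", "-ac", "1",
                    "audio16k.wav"], check=True)

    # 3. -osrt writes audio16k.wav.srt; -ml 1 caps segment length so each
    #    cue is roughly one word, which makes editing to fit a translation easier.
    subprocess.run(["./build/bin/whisper-cli",          # assumed binary path
                    "-m", "models/ggml-large-v3.bin",   # assumed model path
                    "-f", "audio16k.wav", "-osrt", "-ml", "1"], check=True)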
I'm using that to dictate prompts, it struggles with technical terms: JSON becomes Jason, but otherwise is fine
(This was transcribed using whisper.cpp with no edits; it took less than a second on a 5090.)
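One engine-agnostic mitigation is a post-processing pass over the transcript (whisper.cpp also has a --prompt flag that can bias decoding toward your vocabulary). A minimal sketch, with a purely illustrative correction table:

    import re

    # Illustrative mis-hearing -> intended-term table; extend with your own jargon.
    CORRECTIONS = {
        r"\bjason\b": "JSON",
        r"\bget hub\b": "GitHub",
    }

    def fix_terms(transcript: str) -> str:
        for pattern, replacement in CORRECTIONS.items():
            transcript = re.sub(pattern, replacement, transcript,
                                flags=re.IGNORECASE)
        return transcript

    print(fix_terms("parse the Jason response"))  # -> parse the JSON response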
I loved Whisper, but it was insanely slow on CPU only, and even then that was with a smaller Whisper model that isn't as accurate as Parakeet.
My Windows environment locks down the built-in Windows option, so I don't have a way to test it. I've heard it's pretty good if you're allowed to use it, but your inputs don't stay local, which is why I needed to create this project.