Spoke runs a 600M-parameter speech model (NVIDIA Parakeet TDT) entirely on-device: no internet required, and audio never leaves your Mac. On Apple Silicon it transcribes 60 seconds of audio in ~400ms (~150x realtime). Word error rate is 6.34% vs Whisper large-v3's 7.4%, with a model roughly 2.6x smaller.
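For anyone who wants to sanity-check the multiples, they're just the raw ratios (taking Whisper large-v3's published parameter count of ~1.55B as the comparison point):

```python
# Realtime factor: seconds of audio divided by seconds to transcribe.
audio_seconds = 60.0
transcribe_seconds = 0.4
realtime_factor = audio_seconds / transcribe_seconds  # 150x

# Size ratio: Whisper large-v3 (~1.55B params) vs Parakeet TDT (600M).
whisper_large_v3_params = 1.55e9
parakeet_params = 600e6
size_ratio = whisper_large_v3_params / parakeet_params  # ~2.6x

print(f"{realtime_factor:.0f}x realtime, {size_ratio:.1f}x smaller")
```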
The part I'm most proud of is the Flow builder — a visual automation engine on top of the transcription layer. Instead of just "speak → insert text", you can chain 14 node types: AI Skills (with 5 provider options including Ollama for fully local LLMs), webhooks, AppleScript, Shortcuts, conditional routing by active app, text transforms, clipboard, file saves, and more. So you can do things like: speak casually → rewrite to professional tone → insert into the active app → send a webhook log → save to a daily journal file. All triggered from a single keypress.
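Conceptually, a flow is just function composition over the transcript: each node takes text in and passes text out (possibly with a side effect like a webhook or file write). A minimal sketch of the idea in Python, with hypothetical node names, not Spoke's actual internals:

```python
from typing import Callable

# A flow node: transcript text in, (possibly transformed) text out.
Node = Callable[[str], str]

def rewrite_professional(text: str) -> str:
    # Stand-in for an AI Skill node (in the app this could be a local Ollama call).
    return text.capitalize().rstrip(".") + "."

def log_webhook(text: str) -> str:
    # Stand-in for a webhook node: fire a side effect, pass text through unchanged.
    print(f"POST /log: {len(text)} chars")
    return text

def run_flow(nodes: list[Node], transcript: str) -> str:
    # Thread the transcript through each node in order.
    for node in nodes:
        transcript = node(transcript)
    return transcript

result = run_flow([rewrite_professional, log_webhook], "hey can u send the report")
```

Conditional routing by active app fits the same shape: a routing node just picks which sub-chain of nodes to run next.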
A few things I deliberately did differently:
- Native SwiftUI, not Electron: under 50MB RAM at idle vs 500-800MB for cloud alternatives
- No account required
- $9.99 one-time vs $180/year competitors (50 free uses to try it)
- API keys stored in the macOS Keychain, never sent to a server
- Per-app flow configuration (different behavior in VS Code vs Slack vs Mail)
- Voice ID: biometric speaker verification so it only responds to you
I'm a solo developer and shipped this about two weeks ago. It's had its first real users, and I've been iterating fast on their feedback. Just shipped v1.1.0 yesterday.
Would love honest feedback — especially from people who've tried Superwhisper, Wispr Flow, or similar tools. What did I miss? What would make you switch?