4 points by docstryder 3 hours ago | 2 comments
  • 1024bits 3 hours ago
    Congrats on launch!

    I'm a little bit confused by this. You say it supports 100+ languages, but on the landing page some languages are colored in and the rest are greyed out, and the total number doesn't seem to amount to 100+.

    Also, presumably the local model doesn't cost you anything per token. So why isn't that one the free tier, with the cloud model being in the paid plan? Wouldn't that help you get a lot more users cost-efficiently?

    Lastly, your landing page has a lot of "AI hallmarks". This may or may not be a bad thing, but at least on here I imagine many people are fatigued from this pattern.

    I'm all for apps that don't use Electron. What did you use for this?

  • docstryder 3 hours ago
    I've been working on Shoute, a speech-to-text app for Mac and Windows that's built around one idea: the full loop has to feel instant.

    I do know this isn’t a new category. A lot of people here already have some version of this: whisper.cpp behind a hotkey, macOS dictation, SuperWhisper, Wispr Flow, or some other hand-rolled version.

    I built one anyway because I kept bouncing off dictation tools in my actual workday.

    My problem was not “can an app transcribe my voice?” Most of them can, and impressively well. The problem was the full loop: press shortcut -> speak -> release -> cleaned-up text appears where I was already typing. And it has to happen consistently and quickly, day after day.

    If that loop has enough delay, I lose the thread. If the output is too raw, I am back to editing. If the app needs screenshots to understand context, I start feeling uneasy about using it everywhere. You want to be confident that it will always work, or else you lose trust in it.

    So the version I wanted was pretty narrow:

    - it should feel super quick for short everyday dictation

    - the output should be cleaned up before insertion

    - it should work across ALL the usual apps

    - it should never lose data

    - it should support both local and cloud modes (for me, local is mainly for flying, but also for privacy in specific cases)

    - it should use only minimal context

    Shoute solves all of that really well and is lightweight (native code) and fluid to use day to day. It has a generous free tier (2000 words/week - should be enough for most casual use), a one-time purchase for both local and cloud, and a cloud subscription ($6.99/mo) for folks who need the latest cloud models. I'm not a fan of subscriptions either, but it's hard to provide ongoing support for the latest cloud models without one.

    Learned some really cool things building this:

    The interesting eng lesson for me has been that voice UX is so much more latency-sensitive than normal app UX - most of the work went into making it consistently low latency end to end.

    On latency - the model is only one part of the delay. Shoute runs three backends for different modes and fallback (ElevenLabs streaming, Groq Whisper, and WhisperKit for on-device), and each has a different latency profile. For short recordings (~15s is my average; Shoute can handle much longer ones, but hour-long recordings are not the primary use case), the annoying delays often come from everything around the model: audio finalization, connection warmup, WebSocket setup, token fetching, fallback paths, local model cold starts, and finally pasting into the active app. Getting all this right consistently took significant time and eng effort despite Claude helping with all of it - taste and architectural direction are still absolutely essential in 2026, especially with desktop and system apps.
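
    To make the fallback idea concrete, here is a minimal sketch, not Shoute's actual code. It tries a list of transcription backends in order, gives each one a time budget, and records how long the winning attempt took. The function name, the timeout value, and the three stand-in backends are all hypothetical; real backends would wrap a streaming API, a batch API, and an on-device model.

```python
import asyncio
import time
from typing import Awaitable, Callable, List, Tuple

# A backend is any async callable that turns raw audio bytes into text.
Backend = Callable[[bytes], Awaitable[str]]


async def transcribe_with_fallback(
    audio: bytes,
    backends: List[Tuple[str, Backend]],
    timeout_s: float,
) -> Tuple[str, str, float]:
    """Try each backend in order; return (backend name, text, seconds taken)."""
    errors = []
    for name, backend in backends:
        start = time.perf_counter()
        try:
            # Give up on this backend if it exceeds its time budget.
            text = await asyncio.wait_for(backend(audio), timeout=timeout_s)
        except Exception as exc:  # timeout, network error, model failure, ...
            errors.append(f"{name}: {type(exc).__name__}")
            continue
        return name, text, time.perf_counter() - start
    raise RuntimeError("all backends failed: " + "; ".join(errors))


# Hypothetical stand-ins used only to exercise the fallback path.
async def slow_streaming(audio: bytes) -> str:
    await asyncio.sleep(1.0)  # simulates a stalled connection
    return "streaming text"


async def broken_batch(audio: bytes) -> str:
    raise ConnectionError("socket closed")


async def local_model(audio: bytes) -> str:
    return "local text"


if __name__ == "__main__":
    chain = [
        ("streaming", slow_streaming),
        ("batch", broken_batch),
        ("local", local_model),
    ]
    name, text, elapsed = asyncio.run(
        transcribe_with_fallback(b"", chain, timeout_s=0.05)
    )
    print(f"{name}: {text!r} in {elapsed * 1000:.1f} ms")
```

    The timing here covers only the winning backend call. The other stages listed above (audio finalization, token fetching, pasting) would need their own timers to show where the delay actually comes from.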

    Native development is still hard - things like WebSockets are fundamentally web technologies, and their native libraries have a lot of hard edges and inconsistencies that only show up when you use something 100 times a day. It took some engineering to get around this. Native does make the UX fast, and although I almost wished I had chosen Electron for something with this much network management, the speed and resource efficiency are worth going native for.

    Okay, this already feels long - please try it, let me know how it feels, glad to hear feedback and feature requests. Thank you! Here is the link: https://getshoute.com/deepdive