2 points by glossardi 3 hours ago | 2 comments
  • glossardi 3 hours ago
    Hey HN,

    I built Dictator because I wanted a lightweight, highly controllable voice-to-text tool for macOS that uses my own OpenAI API key instead of a monthly subscription service.

    It’s a Lua-based extension for Hammerspoon.

    How it works:

    Hold Fn (or a custom hotkey) to record.

    Release to transcribe.

    The text is auto-pasted into your active application (or copied to clipboard).

    Technical details & optimizations:

    Audio Pipeline: Uses SoX to record directly to FLAC (16 kHz mono). This reduces upload size by ~50% compared to WAV, which noticeably shortens the round trip to the Whisper API.
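
    To make the recording step concrete, here is a rough Python sketch (not the project's Lua code). It only builds a SoX command line for 16 kHz mono FLAC and works out the uncompressed size it is being compared against. The function names, the 16-bit sample depth, and the output path are assumptions for illustration; the actual flags used in the repo may differ.

```python
def sox_record_command(out_path, rate_hz=16000, channels=1):
    """Build a SoX argv that records the default input device to a file.

    Hypothetical helper: the options placed before out_path apply to the
    output file, and the .flac extension tells SoX to encode FLAC.
    """
    return [
        "sox",
        "-d",                 # read from the default audio input device
        "-r", str(rate_hz),   # output sample rate in Hz
        "-c", str(channels),  # output channel count (1 = mono)
        out_path,
    ]


def pcm_bytes(seconds, rate_hz=16000, channels=1, bits=16):
    """Size of the raw PCM payload a WAV file would carry (header excluded)."""
    return int(seconds * rate_hz * channels * (bits // 8))


if __name__ == "__main__":
    print(" ".join(sox_record_command("/tmp/dictation.flac")))
    raw = pcm_bytes(10)
    # Assuming 16-bit samples, 10 s of speech is 320000 bytes uncompressed;
    # the ~50% saving quoted above would put the FLAC near half of that.
    print(raw, raw // 2)
```

    The command is only constructed here, not executed, so the sketch runs without SoX installed.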

    Reliability: Implements a token bucket rate limiter to prevent API abuse and exponential backoff for handling 429/5xx errors gracefully.
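
    For anyone unfamiliar with the two techniques, a minimal Python sketch of a token bucket and capped exponential backoff follows. It is an illustration under assumed parameters (bucket capacity, refill rate, base delay, cap, and attempt count are all made up here), not the Lua implementation in the repo, and `send` is a hypothetical callable returning `(status, body)`.

```python
import time


class TokenBucket:
    """Allow bursts up to `capacity`, refilled at `refill_per_sec` tokens/s."""

    def __init__(self, capacity, refill_per_sec, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.clock = clock
        self.tokens = float(capacity)
        self.last = clock()

    def try_acquire(self):
        # Credit tokens for the time elapsed since the last call, up to capacity.
        now = self.clock()
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def backoff_delay(attempt, base=0.5, cap=8.0):
    """Delay before retry number `attempt` (0-based): base * 2^attempt, capped."""
    return min(cap, base * (2 ** attempt))


def should_retry(status):
    """Retry on rate limiting (429) and server errors (5xx)."""
    return status == 429 or 500 <= status <= 599


def call_with_retry(send, max_attempts=4, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out."""
    result = None
    for attempt in range(max_attempts):
        result = send()
        status = result[0]
        if not should_retry(status) or attempt == max_attempts - 1:
            break
        sleep(backoff_delay(attempt))
    return result
```

    The clock and sleep functions are injectable so the behaviour can be checked without waiting in real time. Adding random jitter to the delay is a common refinement that is left out here.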

    Debouncing: I added strict debouncing logic to ignore accidental short taps (<0.4s) and prevent double-triggers.
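
    The debounce rule can be sketched as a small state machine. This is a Python illustration, not the repo's Lua: the 0.4 s minimum hold comes from the description above, while the 0.3 s re-trigger gap, the class name, and the method names are assumptions.

```python
MIN_HOLD_S = 0.4       # from the post: shorter holds count as accidental taps
RETRIGGER_GAP_S = 0.3  # assumed: ignore a new press this soon after a release


class PushToTalk:
    """Track key press/release times and decide when to record or transcribe."""

    def __init__(self):
        self.pressed_at = None
        self.last_release = float("-inf")

    def on_press(self, now):
        """Return True if recording should start."""
        if self.pressed_at is not None:
            return False  # already recording: duplicate key-down event
        if now - self.last_release < RETRIGGER_GAP_S:
            return False  # too soon after the last release: double-trigger
        self.pressed_at = now
        return True

    def on_release(self, now):
        """Return True if the recording is long enough to transcribe."""
        if self.pressed_at is None:
            return False  # release without a matching accepted press
        held = now - self.pressed_at
        self.pressed_at = None
        self.last_release = now
        return held >= MIN_HOLD_S
```

    Timestamps are passed in rather than read from a clock, which keeps the logic easy to test; in Hammerspoon they would come from the key event handlers.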

    Security: Your API key is stored locally and sent directly to OpenAI; there is no intermediate server.

    Repo: https://github.com/Glossardi/Dictator-Speech-to-Text

    I’d love to hear your thoughts on the push-to-talk UX versus a toggle approach, and if anyone has ideas on further reducing latency!

  • vee-kay 3 hours ago
    This is cool.

    MS Windows (Win10 or Win11) has this dictate-speech-to-text feature built in, but its accuracy isn't up to the mark.

    Win11 users can use the Win+H hotkey for dictation: https://support.microsoft.com/en-us/windows/use-voice-typing...

    MS Copilot (obviously) understands dictation a lot better, but Copilot's speech recognition is not (and should not be) available in every text box or editor on Windows.