I built this because I was frustrated with existing AI dictation tools failing in my daily workflow. I work in a locked-down corporate environment involving a lot of Remote Desktop (RDP) and Citrix sessions.
Because those environments block clipboard sharing for security, most dictation apps that rely on "transcribe-then-paste" simply don't work.
How it works: Instead of pasting text, DictaFlow simulates keyboard input, sending the transcript one keystroke at a time.
1. Audio: Captures raw PCM audio at 16 kHz.
2. Processing: Runs a quantized Whisper model locally.
3. Output: Simulates keypress events with a tunable delay. The remote desktop sees what looks like a physical keyboard typing, so the clipboard restriction never applies.
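To make step 3 concrete, here is a rough Python sketch of paced keystroke output. It is not DictaFlow's actual code: the names `type_text`, `send_key`, `delay_s`, and `jitter_s` are made up for illustration, and the real key injection is left behind a callback (on Windows that callback would presumably wrap something like the Win32 `SendInput` call, but that is an assumption).

```python
import random
import time
from typing import Callable


def type_text(
    text: str,
    send_key: Callable[[str], None],
    delay_s: float = 0.02,
    jitter_s: float = 0.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Emit `text` one character at a time, pausing after each key.

    `send_key` is a hypothetical callback that injects one character as a
    key event. `sleep` is injectable so pacing can be checked without
    actually waiting. Returns the number of characters sent.
    """
    if delay_s < 0 or jitter_s < 0:
        raise ValueError("delay_s and jitter_s must be non-negative")
    sent = 0
    for ch in text:
        send_key(ch)
        sent += 1
        # Base delay plus optional random jitter so the timing is not
        # perfectly uniform.
        pause = delay_s + (random.uniform(0.0, jitter_s) if jitter_s else 0.0)
        if pause > 0:
            sleep(pause)
    return sent


if __name__ == "__main__":
    # Stand-in sender that prints each character instead of injecting
    # real key events.
    type_text("def main():\n", lambda ch: print(repr(ch)), delay_s=0.01)
```

Keeping the sender behind a callback means the pacing logic can be tried on any OS, and the delay can be tuned per environment without touching the injection code.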
Features:
- Context Aware: It handles code formatting surprisingly well (e.g., "function def" turns into Python syntax).
- Privacy: It doesn't upload audio to the cloud for training.
- Local-First: Designed to be lightweight.
It's currently Windows-only. I'd love feedback on the latency or any edge cases you find in other VDI environments.