Your on-device use of Whisper models on macOS aligns well with the goals of speech-swift, a library I maintain that integrates with CoreML for both speech recognition (ASR) and text-to-speech (TTS). With its native Swift async support on Apple Silicon, it could serve as an alternative. You can explore it here:
https://github.com/soniqo/speech-swift