1 point by schappim 12 days ago | 1 comment
  • storystarling 12 days ago
    Nice work on the pure Go implementation. I built a similar pipeline for generating audio versions of articles, and the pricing trade-offs are tough at scale. ElevenLabs is obviously the quality winner, but their per-character pricing eats up all the margin if you're doing anything high-volume. I've found Deepgram to be the most pragmatic choice lately, since OpenAI's prosody can be a bit flat on longer texts.
    • schappim 11 days ago
      Deepgram is really good for diarization and speech-to-text, but it's not as good as the others for text-to-speech.

      ElevenLabs is awesome for speech generation (nothing beats it), but their speech-to-text is terrible, especially for voice activity detection.
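A rough sketch of why per-character pricing bites at volume: synthesis cost grows linearly with the total number of characters sent, so a 10x difference in rate becomes a 10x difference in the monthly bill. The rates, monthly volume, and article length below are made-up placeholders for illustration only, not any vendor's actual pricing, and `monthlyCost` is a hypothetical helper.

```go
package main

import "fmt"

// monthlyCost returns the text-to-speech cost for a month of articles,
// given a price per million characters. Cost is linear in total characters.
// All rates passed in below are hypothetical placeholders.
func monthlyCost(articlesPerMonth, charsPerArticle int, ratePerMillionChars float64) float64 {
	totalChars := float64(articlesPerMonth) * float64(charsPerArticle)
	return totalChars / 1_000_000 * ratePerMillionChars
}

func main() {
	const articles = 10_000 // hypothetical monthly article volume
	const chars = 8_000     // hypothetical average article length in characters

	premium := monthlyCost(articles, chars, 150.0) // hypothetical premium-tier rate
	budget := monthlyCost(articles, chars, 15.0)   // hypothetical budget-tier rate

	fmt.Printf("premium: $%.2f/month\n", premium)
	fmt.Printf("budget:  $%.2f/month\n", budget)
}
```

With these placeholder numbers (80 million characters a month), the premium tier comes to $12,000 and the budget tier to $1,200; plugging in real published rates and your own traffic gives the actual margin picture.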