3 points by rohan_joshi 8 hours ago | 1 comment
  • rohan_joshi 8 hours ago
    New Kitten TTS V0.8 models are out in three variants: 80M, 40M, and 14M. The largest model has the highest quality. The 14M variant reaches a new SOTA in expressivity among similar-sized models, despite being <25MB in size. All models are highly expressive and realistic, with high-quality voices. Kitten TTS is an open-source series of tiny, expressive text-to-speech models for on-device applications, built by KittenML (with <3). This release supports English text-to-speech in eight voices: four male and four female. Most models are quantized to int8 + fp16 and use ONNX for runtime. The models are designed to run literally anywhere, e.g. Raspberry Pi, low-end smartphones, wearables, browsers, etc. No GPU required! This release bridges the gap between on-device and cloud models for TTS applications. Multilingual support is planned for the future.

    We'd love your feedback! On-device AI is currently bottlenecked by the availability of tiny performant models. We're trying to change that by releasing open-source models that can unlock on-device voice agents and applications in the next few months.

    Code, weights, and more information are available on our GitHub: https://github.com/KittenML/KittenTTS
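
For readers who want to sanity-check the size claim, here is a rough back-of-the-envelope sketch. It assumes "14M", "40M", and "80M" mean millions of parameters, counts only weight storage (no graph metadata or voice data), and uses a hypothetical `int8_fraction` because the post does not say how the int8 + fp16 split is made. It is not taken from the KittenTTS code.

```python
# Bytes needed to store one parameter in each weight format.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_size_mb(num_params: int, dtype: str) -> float:
    """Approximate weight storage in MB (1 MB = 10**6 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e6


def mixed_size_mb(num_params: int, int8_fraction: float) -> float:
    """Size when a fraction of the weights is int8 and the rest is fp16.

    int8_fraction is a hypothetical knob; the real split is not stated.
    """
    int8_params = num_params * int8_fraction
    fp16_params = num_params - int8_params
    return (int8_params * 1 + fp16_params * 2) / 1e6


# Assumed parameter counts for the three announced variants.
VARIANTS = {"14M": 14_000_000, "40M": 40_000_000, "80M": 80_000_000}

for name, n in VARIANTS.items():
    print(
        f"{name}: fp16 ~{weight_size_mb(n, 'fp16'):.0f} MB, "
        f"int8 ~{weight_size_mb(n, 'int8'):.0f} MB, "
        f"half int8 / half fp16 ~{mixed_size_mb(n, 0.5):.0f} MB"
    )
```

Under these assumptions, a 14M-parameter model stored purely in fp16 would need about 28 MB for weights alone, so a file under 25MB is consistent with at least part of the weights being int8.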

    • 999900000999 7 hours ago
      Some actual audio examples would be nice. I'd like to see what this is before taking the time to run it
      • rohan_joshi 5 hours ago
        we also launched on Reddit and got great feedback on LocalLLaMA. the video with samples is posted there too.
      • rohan_joshi 5 hours ago
        hi, the README on GitHub has a video. all of the audio in it is output from the models ^^

        would love the feedback.

        • 999900000999 2 hours ago
          Thank you, this is what I'm excited about. I could run this on a Raspberry Pi and easily build a locally run home assistant.