Show HN: Three new models by KittenML. <25 MB Open-source TTS. Highly Expressive (kittenml.com)

3 points by rohan_joshi 8 hours ago | 1 comment

rohan_joshi 8 hours ago
New Kitten TTS V0.8 models are out in three variants: 80M, 40M, and 14M. The largest model has the highest quality. The 14M variant reaches a new SOTA in expressivity among similarly sized models, despite being <25 MB in size. All models are highly expressive and realistic, with high-quality voices. Kitten TTS is an open-source series of tiny, expressive text-to-speech models for on-device applications, built by KittenML (with <3). This release supports English text-to-speech in eight voices: four male and four female. Most models are quantized to int8 + fp16, and they use ONNX for the runtime. The models are designed to run literally anywhere, e.g., Raspberry Pi, low-end smartphones, wearables, browsers, etc. No GPU required! This release bridges the gap between on-device and cloud models for TTS applications. Multilingual support is planned for the future.
We'd love your feedback! On-device AI is currently bottlenecked by the availability of tiny, performant models. We're trying to change that over the next few months by releasing open-source models that can unlock on-device voice agents and applications.
Code, weights, and more information are available on our GitHub: https://github.com/KittenML/KittenTTS
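
As a rough sanity check on the size figures above, here is a minimal sketch of the arithmetic, assuming "14M" refers to the parameter count and that weights dominate the file size. The function name `weight_size_mb` and the int8/fp16 split fractions are hypothetical, chosen only for illustration; the post does not state the actual split.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"int8": 1, "fp16": 2}


def weight_size_mb(num_params: int, int8_fraction: float) -> float:
    """Approximate weight size in MB (10^6 bytes) for a mixed int8 + fp16 model.

    int8_fraction is the (assumed) share of parameters stored as int8;
    the remainder is assumed to be stored as fp16.
    """
    int8_bytes = num_params * int8_fraction * BYTES_PER_PARAM["int8"]
    fp16_bytes = num_params * (1 - int8_fraction) * BYTES_PER_PARAM["fp16"]
    return (int8_bytes + fp16_bytes) / 1e6


if __name__ == "__main__":
    # Assumption: 14M is a parameter count, not a file size.
    for fraction in (0.0, 0.5, 1.0):
        size = weight_size_mb(14_000_000, fraction)
        print(f"int8 share {fraction:.0%}: about {size:.0f} MB")
```

Under these assumptions, a 14M-parameter model would land somewhere between 14 MB (all int8) and 28 MB (all fp16), so a <25 MB file is plausible once a modest share of the weights is int8.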
- 999900000999 7 hours ago
  Some actual audio examples would be nice. I'd like to see what this is before taking the time to run it.
  - rohan_joshi 5 hours ago
    We also launched on Reddit and got great feedback on r/LocalLLaMA. The video with samples is posted there too.
  - rohan_joshi 5 hours ago
    Hi, the README in the GitHub repo has a video. All of the audio in it is output from the models ^^
    Would love your feedback.
    - 999900000999 2 hours ago
      Thank you, this is what I'm excited about. I could run this on a Raspberry Pi and easily build a locally run home assistant.