New Kitten TTS V0.8 models are out in three variants: 80M, 40M, and 14M. The largest model has the highest quality. The 14M variant reaches a new SOTA in expressivity among similarly sized models, despite being <25MB in size. All models are highly expressive and realistic, with high-quality voices. Kitten TTS is an open-source series of tiny, expressive text-to-speech models for on-device applications, built by KittenML (with <3).
This release supports English text-to-speech in eight voices: four male and four female. Most models are quantized to int8 + fp16 and use ONNX for the runtime. The models are designed to run literally anywhere, e.g. Raspberry Pi, low-end smartphones, wearables, browsers, etc. No GPU required! This release bridges the gap between on-device and cloud models for TTS applications.
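As a rough sanity check on the size figures, here is a back-of-envelope sketch in Python. It assumes the variant names (80M, 40M, 14M) are parameter counts in millions and that file size is dominated by the weights; both are assumptions, not something stated in this post. The helper `weight_size_mb` and the `fp32` baseline are only there for illustration and are not part of Kitten TTS.

```python
# Back-of-envelope estimate of weight storage per variant.
# Ignores ONNX graph metadata and any other non-weight data in the file.
BYTES_PER_PARAM = {"int8": 1, "fp16": 2, "fp32": 4}


def weight_size_mb(num_params: int, dtype: str) -> float:
    """Approximate size of the weights alone, in MB (1 MB = 10**6 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e6


# Assumption: the variant names give parameter counts in millions.
variants = {"80M": 80_000_000, "40M": 40_000_000, "14M": 14_000_000}

for name, n in variants.items():
    sizes = ", ".join(
        f"{dtype}: {weight_size_mb(n, dtype):.0f} MB" for dtype in BYTES_PER_PARAM
    )
    print(f"{name} -> {sizes}")
```

Under these assumptions the 14M variant would need about 14 MB in pure int8 and about 28 MB in pure fp16, so a mix that is mostly int8 is consistent with the <25MB figure. The actual split between int8 and fp16 layers is not given here, so treat this as an estimate only.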
Multilingual support is planned for a future release.
We'd love your feedback! On-device AI is currently bottlenecked by the availability of tiny performant models. We're trying to change that by releasing open-source models that can unlock on-device voice agents and applications in the next few months.
Code, weights, and more information are available on our GitHub: https://github.com/KittenML/KittenTTS