Posted by marques576, 21 hours ago
    Some features:

    169M parameters

    Streaming support

    Zero-shot voice cloning

    0.25 RTF (real-time factor) on CPU, meaning it generates 30 seconds of audio in 7.5 seconds

    Requires 3-12 seconds of reference audio for voice cloning

    Apache 2.0 license
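    The RTF figure above is plain arithmetic. Real-time factor is generation time divided by the duration of audio produced, so at 0.25 a 30-second clip takes 0.25 × 30 = 7.5 seconds, and anything below 1.0 is faster than real time. Below is a minimal Python sketch of that relationship plus a timing helper. `generate` is a hypothetical stand-in for whatever synthesis call the model exposes, not its actual API.

```python
import time
from typing import Callable, Tuple


def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF = wall-clock generation time / duration of the generated audio."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return synthesis_seconds / audio_seconds


def synthesis_time(rtf: float, audio_seconds: float) -> float:
    """Expected wall-clock time to generate a clip of the given length at a given RTF."""
    return rtf * audio_seconds


def measure_rtf(generate: Callable[[], Tuple[int, int]]) -> float:
    """Time a synthesis call and return its RTF.

    `generate` is hypothetical: any zero-argument callable that returns
    (num_samples, sample_rate) for the audio it produced.
    """
    start = time.perf_counter()
    num_samples, sample_rate = generate()
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, num_samples / sample_rate)


print(synthesis_time(0.25, 30.0))    # 7.5
print(real_time_factor(7.5, 30.0))   # 0.25
```

    Measured RTF depends on the CPU, thread count and clip length, so a single timing run is only a rough indication.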
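    Because cloning needs 3-12 seconds of reference audio, it can help to check a clip's length before passing it to the model. The sketch below is a client-side pre-check written under that assumption, not part of the project's API. It reads WAV headers with Python's standard `wave` module. The constant and function names are illustrative.

```python
import wave

# Reference-length bounds taken from the post (3-12 seconds).
MIN_REF_SECONDS = 3.0
MAX_REF_SECONDS = 12.0


def reference_length_ok(num_frames: int, frame_rate: int) -> bool:
    """True if a clip with this many frames at this rate falls within 3-12 s."""
    duration = num_frames / frame_rate
    return MIN_REF_SECONDS <= duration <= MAX_REF_SECONDS


def check_reference_wav(path: str) -> float:
    """Return the WAV file's duration in seconds, or raise if it is out of range."""
    with wave.open(path, "rb") as wf:
        frames, rate = wf.getnframes(), wf.getframerate()
    if not reference_length_ok(frames, rate):
        raise ValueError(
            f"reference clip is {frames / rate:.2f}s; "
            f"expected {MIN_REF_SECONDS}-{MAX_REF_SECONDS}s"
        )
    return frames / rate
```

    Clips shorter than the minimum give the model little to go on. Trimming long clips to a clean 3-12 second segment is usually simpler than relying on the model to handle them.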

    The model was trained on a single L40S GPU. It’s not SOTA in most cases, can be a bit unstable, and sometimes fails to capture voice likeness.