150 points by meetpateltech 4 days ago | 9 comments
  • homarp 4 days ago
    Running Voxtral-Mini-3B-2507 on GPU requires ~9.5 GB of GPU RAM in bf16 or fp16.

    Running Voxtral-Small-24B-2507 on GPU requires ~55 GB of GPU RAM in bf16 or fp16.
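    A back-of-the-envelope sketch of where those figures come from, assuming 2 bytes per parameter in bf16/fp16 (the gap between the weight size and the quoted totals is overhead for activations and the KV cache):

    ```python
    def bf16_weight_gb(n_params: float) -> float:
        # bf16/fp16 store each parameter in 2 bytes.
        # This counts weights only, not activations or the KV cache.
        return n_params * 2 / 1e9

    print(bf16_weight_gb(3e9))   # 6.0  -> ~6 GB of weights for the 3B model
    print(bf16_weight_gb(24e9))  # 48.0 -> ~48 GB of weights for the 24B model
    ```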

  • GaggiX 4 days ago
    There is also the Voxtral Small 24B model available to download: https://huggingface.co/mistralai/Voxtral-Small-24B-2507
  • ipsum2 4 days ago
    24B is crazy expensive for speech transcription. Conspicuously, there is no comparison with Parakeet, a 600M-param model that's currently dominating leaderboards (but only for English).
    • azinman2 4 days ago
      But it also includes world knowledge, can do tool calls, etc. It's an omni model.
    • qwertox 3 days ago
      Only the Mini is meant for pure transcription. And in the tests I just did on their API, comparing it to Whisper large, it is around three times faster, more accurate, and cheaper.

      24B is, as the sibling comment says, an omni model; it can also do function calling.

  • kamranjon 4 days ago
    I'm pretty excited to play around with this. I've worked with Whisper quite a bit, and it's awesome to have another model in the same class, from Mistral, who tend to be very open. I'm sure unsloth is already working on some GGUF quants; I'll probably spin it up tomorrow and try it on some audio.
  • lostmsu 4 days ago
    Does it support realtime transcription? What is the approximate latency?
    • rolisz 3 days ago
      Unlikely. The small model is much larger than Whisper (which is already hard to use for realtime).
  • sheerun 4 days ago
    In the demo they mention Polish pronunciation is pretty bad, spoken as if it were the second language of a native English speaker. I wonder if it's the same for other languages. On the other hand, whispered English is hilariously good, especially with different emotions.
      • Raed667 4 days ago
      It is insane how good the "French man speaking English" demo is. It captures a lot of subtleties
        • potlee 3 days ago
          That's an actual French man speaking English.
  • lostmsu 4 days ago
    My Whisper v3 Large Turbo is $0.001/min, so their price comparison is not exactly fair.
  • danelski 4 days ago
    They claim to undercut competitors of similar quality by half for both models, yet they released both as Apache 2.0 instead of following the smaller-open, larger-closed strategy of their recent releases. What's different here?
    • halJordan 4 days ago
      They didn't release a Voxtral Large, so your question doesn't really make sense.
      • danelski 3 days ago
        It's about what their top offering is at the moment, not about having "Large" in the name. Mistral Medium 3 is notably not Mistral Large 3, yet it was released as API-only.
    • wmf 4 days ago
      They're working on a bunch of features, so maybe those will be closed. I guess they're feeling generous with the base models.
    • Havoc 4 days ago
      Probably not looking to compete directly in the transcription space.