We tested 13 speech-to-text providers on 100 real customer calls with:
- Background noise (vans, job sites, crying babies)
- UK and Irish regional accents (Northern/Southern England, Scotland, Ireland)
- Critical info: postcodes, addresses, phone numbers
- Variable turn length (1-5 words vs 16+)
Results: a 2.5x gap in WER between the best and worst providers
Best: Deepgram Flux (15.86% WER)
Worst: OpenAI Whisper (39.78% WER)
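For anyone unfamiliar: WER (word error rate) is word-level edit distance (substitutions + insertions + deletions) divided by the number of words in the reference transcript. A minimal sketch of that calculation, illustrative only and not the exact scoring pipeline behind these numbers:

```python
# Minimal word-level WER: edit distance (subs + ins + dels) over reference length.
# Illustrative sketch only -- not the exact normalization/scoring used in the benchmark.

def wer(reference: str, hypothesis: str) -> float:
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i                      # i deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j                      # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + sub)  # substitution / match
    return dp[-1][-1] / max(len(ref), 1)

# "SW1A 1AA" heard as "SW1 A1A" plus "postcode" split in two: 4 edits on 5 words -> 0.8
print(wer("my postcode is SW1A 1AA", "my post code is SW1 A1A"))
```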
Interesting findings:
(1) Postcode recognition was hardest across ALL providers (50%+ WER); see the scoring sketch after this list.
(2) Regional variance was massive. Irish accents destroyed most models (20-30% higher WER than Southern England).
(3) Short confirmations ("yeah", "ok") actually had worse WER than long explanations. Counter-intuitive, but likely because short turns give the language model less context to work with.
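On (1): raw WER also punishes spacing and casing differences on postcodes ("sw1a 1aa" vs "SW1A1AA"). If you want to check whether the postcode itself survived transcription, one option is to extract and normalize it before comparing. A hedged sketch, not the metric behind the 50%+ figure above:

```python
import re

# Hedged sketch: pull a UK-style postcode out of a transcript and normalize
# spacing/casing so "sw1a 1aa" and "SW1A1AA" compare equal. Not the exact
# metric used in the benchmark -- just one way to score this field.
POSTCODE_RE = re.compile(r"\b([A-Z]{1,2}\d[A-Z\d]?)\s*(\d[A-Z]{2})\b", re.IGNORECASE)

def extract_postcode(text: str):
    m = POSTCODE_RE.search(text)
    return (m.group(1) + m.group(2)).upper() if m else None

print(extract_postcode("yeah it's sw1a 1aa"))             # SW1A1AA
print(extract_postcode("ess double u one ay one ay ay"))  # None: spelled-out letters need extra handling
```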
Full breakdown with graphs: https://x.com/pstrav/status/2018416957003866564
Context: We're Elyos AI (YC S23), handling 100k+ calls/month for trades businesses worldwide.