I don't think the future as currently painted for us is as guaranteed as those who would profit from it would like you to think.
Text is just human thoughts in their simplest form. Writing is about expressing ideas, and there is an almost infinite number of ways to express them. That's an extremely difficult task, and LLMs only "imitate" it to the best of their training.
This is not at all true for voice. There are an infinite number of possible voices, but a finite number of tones and phonemes you can use to express the text.
It's a much easier technical problem; it's just that it's much harder to gather proper data (you cannot just scrape Reddit and hope for the best, as LLMs do). And voice gets like 1/100th of LLMs' funding.
Right now, the main thing making these things recognisable is that there are so few voices. The voices themselves are basically celebrities, albeit in the same way as some annoying D-list celebrity who somehow managed to get a bajillion contracts for advertising cheap tat.
Given that LLM slop is currently rapidly degrading the trustworthiness of search results (even more so than SEO already had), it's probably for the best if the major AI providers don't release a bunch more voices.