You say SOTA, but I don't see any benchmarks or leaderboards that back this up except for voice cloning, which I don't particularly care about. Consider getting it added to https://huggingface.co/spaces/TTS-AGI/TTS-Arena-V2 and https://artificialanalysis.ai/text-to-speech/arena.
Also, can these voices be plugged into the Unreal/Unity SDKs?