In the past, when I used the local KokoroTTS model, I did something a bit more primitive for longer texts where it kept mispronouncing a word: a pre-TTS regex pass that replaced each problem word with a phonetic respelling the model pronounced more consistently (e.g. replacing “Danish” with “Day-nish”).
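For illustration, here is a minimal sketch of what such a pre-TTS pass could look like in Python. It is not the exact script I used: the `RESPELLINGS` table and the `respell` function are hypothetical names, and the only entry is the "Danish" example from above. It matches whole words only, so longer words that merely contain a problem word are left alone.

```python
import re

# Hypothetical lookup table: problem word -> phonetic respelling.
# Only the "Danish" entry comes from the text; add your own as needed.
RESPELLINGS = {
    "Danish": "Day-nish",
}

# Lowercased keys so matching can be case-insensitive.
_LOOKUP = {word.lower(): respelling for word, respelling in RESPELLINGS.items()}

# One alternation of all problem words, longest first, anchored on word
# boundaries so that e.g. "Danishes" is not touched.
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(w) for w in sorted(_LOOKUP, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def respell(text: str) -> str:
    """Replace each known problem word with its phonetic respelling."""
    return _PATTERN.sub(lambda m: _LOOKUP[m.group(0).lower()], text)


if __name__ == "__main__":
    print(respell("I had a Danish pastry with my coffee."))
```

The returned string is what would then be handed to the TTS engine in place of the original text. The engine call itself is left out here, since it depends on how the model is set up locally.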