When a long document such as a PDF, article, or draft is read aloud by text-to-speech, its headings, sections, and transitions all sound the same. That makes the audio hard to follow, even when the voice quality is good.
FlowSpeech is a context-aware TTS tool that tries to preserve a document's structure when generating audio. It treats headings, paragraphs, and pauses differently so that long-form content is easier to listen to.
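For anyone wondering what "treating headings and paragraphs differently" can look like in practice, here is a minimal sketch. It assumes the TTS engine accepts SSML (the W3C Speech Synthesis Markup Language) and is not FlowSpeech's actual implementation. The `to_ssml` function, the `#`-prefixed heading convention, and the pause lengths are all hypothetical. Headings get strong emphasis and a longer pause. Each paragraph is wrapped in `<p>` and followed by a shorter pause.

```python
from xml.sax.saxutils import escape

# Hypothetical pause lengths; real values would need tuning by ear.
HEADING_PAUSE_MS = 800
PARAGRAPH_PAUSE_MS = 400


def to_ssml(text: str) -> str:
    """Convert simple markdown-style text into structure-aware SSML.

    Assumption: lines starting with '#' are headings, and blank lines
    separate paragraphs. Text content is XML-escaped.
    """
    parts = ["<speak>"]
    paragraph: list[str] = []

    def flush_paragraph() -> None:
        # Emit the buffered paragraph (if any) followed by a short pause.
        if paragraph:
            body = escape(" ".join(paragraph))
            parts.append(f'<p>{body}</p><break time="{PARAGRAPH_PAUSE_MS}ms"/>')
            paragraph.clear()

    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            flush_paragraph()
        elif line.startswith("#"):
            flush_paragraph()
            title = escape(line.lstrip("#").strip())
            # Headings are emphasized and followed by a longer pause.
            parts.append(
                f'<emphasis level="strong">{title}</emphasis>'
                f'<break time="{HEADING_PAUSE_MS}ms"/>'
            )
        else:
            paragraph.append(line)

    flush_paragraph()
    parts.append("</speak>")
    return "".join(parts)


if __name__ == "__main__":
    doc = "# Introduction\nTTS flattens structure.\n\n# Method\nAdd pauses."
    print(to_ssml(doc))
```

Real documents would need more than this (lists, quotes, nested sections, PDF extraction noise), and engines differ in which SSML tags they honor, so the output should be checked against the target engine.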
This is still an early project, and I’m curious how others here think about structure vs voice quality in TTS.