compare this with WhatsApp voice messages and short videos, which are already the widely accepted norm.
we just need more focus on voice input/output support in LLM agents now to accelerate this.