What I'm most curious about is the native audio generation: is it just ambient sound and music, or can it generate synchronized speech? If it's the latter, with reasonable lip-sync, that could eliminate a lot of post-production work for explainer videos and short-form content.
I'm also wondering about API availability. Having this accessible programmatically would open up interesting possibilities for automated content pipelines.