2 points by davedx 7 hours ago | 1 comment
  • davedx 7 hours ago
    Wan Streamer is a native-streaming, end-to-end interactive foundation model, designed from the ground up for real-time, low-latency, full-duplex audio-visual interaction. It models language, audio, and video as both input and output within a single Transformer: the sequence is an interleaving of visual, audio, and text input tokens with visual, audio, and text output tokens, coordinated by block-causal attention for incremental streaming.
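    For anyone unfamiliar with block-causal attention, here is a rough sketch of the general idea in NumPy. It is not taken from Wan Streamer: the function name `block_causal_mask` and the block sizes are made up for illustration, and the model's real block layout is not described in the post. Tokens are grouped into blocks (say, one streaming step each); a token can attend to everything in its own block and in all earlier blocks, but to nothing in later ones.

```python
import numpy as np

def block_causal_mask(block_sizes):
    """Build a boolean mask; mask[i, j] is True if token i may attend to token j.

    block_sizes is a hypothetical list of token counts per streaming block.
    """
    # Label every token position with the index of the block it belongs to.
    block_ids = np.repeat(np.arange(len(block_sizes)), block_sizes)
    # Token i may see token j when j's block is the same as or earlier than i's.
    return block_ids[:, None] >= block_ids[None, :]

# Three illustrative blocks of 2, 3, and 1 tokens (assumed sizes).
mask = block_causal_mask([2, 3, 1])
print(mask.astype(int))
```

    Within a block attention is bidirectional, while across blocks it is strictly causal, which is what lets each new block be processed incrementally as it arrives without touching earlier ones.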