13 points by smusamashah 16 hours ago | 1 comment
  • kgeist 4 hours ago
    I tried a few SOTA realtime avatar systems from Chinese labs, and the actual quality was far worse than the amazing (cherry-picked) videos on their demo pages.

    I ran an analysis on hundreds of generated videos featuring various races/ethnicities and found that the Chinese models are overfitted to East Asian faces (predictable, though) and have trouble properly animating many European and most African faces (bad lipsync).

    They all had artifacts that accumulate over the long term: the video stops being stable after N seconds, for example the image gets more and more washed out.

    So I don't have high hopes here: everyone on the demo page is predictably East Asian, and the output quality doesn't look better than prior art. I guess the innovation here is that it's end-to-end, but we need to see if it's any good. WAN-derived image-audio-to-video systems used to be notoriously slow. Here they boast 25 FPS at 192p, but that's actually pretty slow; I managed to reach a similar FPS at 720p with prior art.
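
For a rough sense of scale on the 192p vs 720p comparison, here is a back-of-the-envelope sketch of pixel throughput. It assumes "192p" and "720p" both refer to frame height, that both use the same aspect ratio (16:9 is a guess, not something stated above), and that 25 FPS applies to both. The function name is made up for illustration, and hardware differences are ignored entirely.

```python
from fractions import Fraction

def pixels_per_second(height: int, fps: int,
                      aspect: Fraction = Fraction(16, 9)) -> Fraction:
    """Pixels generated per second for a frame of the given height.

    Assumption: width = height * aspect (16:9 by default).
    """
    width = height * aspect
    return width * height * fps

low = pixels_per_second(192, 25)    # claimed: 25 FPS at 192p
high = pixels_per_second(720, 25)   # reported: similar FPS at 720p

# With the same aspect ratio and FPS, the ratio reduces to (720 / 192) ** 2.
ratio = high / low
print(float(ratio))  # 14.0625
```

Under those assumptions, matching the frame rate at 720p means pushing roughly 14x as many pixels per second, and the ratio does not depend on which aspect ratio is chosen as long as it is the same for both.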