This paper was posted on arXiv today. It reports surprisingly strong one-step generation results on ImageNet at 512×512 resolution (FID 1.41), while requiring 50% less training-time memory. Do you think this could become the next standard training method for image foundation models? Feel free to leave your comments.