But this could be a legitimate way to design apps in general if you could tell the models what you liked and didn't like.
To preserve the voter experience without introducing bias, our current approach waits for the slowest model within each binary comparison, so even if one model is faster, we don't display either output until both are ready. You're right that this still introduces some bias for the two smallest models, and we'd love to hear suggestions for how to make it better!
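A minimal sketch of the wait-for-both behavior, assuming an async Python backend (the thread doesn't say what the service actually runs on). `generate`, `run_pair`, and the latency table are hypothetical stand-ins for real model calls:

```python
import asyncio


async def generate(model: str, prompt: str, latency: float) -> str:
    # Hypothetical stand-in for a model API call; sleeps to simulate latency.
    await asyncio.sleep(latency)
    return f"{model}: {prompt}"


async def run_pair(prompt: str, a: str, b: str, latencies: dict[str, float]) -> tuple[str, str]:
    # Start both generations concurrently. gather() resolves only after BOTH
    # finish, so the faster output is held back until the slower one is ready
    # and neither side of the comparison is revealed early.
    out_a, out_b = await asyncio.gather(
        generate(a, prompt, latencies[a]),
        generate(b, prompt, latencies[b]),
    )
    return out_a, out_b


if __name__ == "__main__":
    # Hypothetical latencies: model-b is 4x slower, so both appear after ~0.2 s.
    outputs = asyncio.run(run_pair("a todo app", "model-a", "model-b",
                                   {"model-a": 0.05, "model-b": 0.2}))
    print(outputs)
```

Because `asyncio.gather` only resolves once every awaitable has finished, the time until a pair is shown is roughly the slower model's latency, which is exactly where the residual cost for fast models comes from.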
As for the 5th request: we actually kick off one reserve model alongside the four randomly selected for the tournament. This backup isn't shown unless one of the four fails. It's not the fastest or lowest-latency model, just a randomly selected fallback that keeps the system robust without skewing results.
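Here is a rough sketch of the reserve-model idea, again assuming an asyncio backend. `generate`, `run_tournament`, and the `fail` set used to simulate provider errors are hypothetical, not the site's actual code:

```python
import asyncio
import random


async def generate(model: str, prompt: str, fail: frozenset[str]) -> str:
    # Hypothetical model call; models listed in `fail` raise to simulate an outage.
    await asyncio.sleep(0)
    if model in fail:
        raise RuntimeError(f"{model} failed")
    return f"{model}: {prompt}"


async def run_tournament(prompt: str, pool: list[str], rng: random.Random,
                         fail: frozenset[str] = frozenset(),
                         n: int = 4) -> list[tuple[str, str]]:
    # Draw n tournament models plus one reserve uniformly at random, so the
    # reserve is not picked for speed or latency.
    picked = rng.sample(pool, n + 1)
    main, reserve = picked[:n], picked[n]
    # Kick off all n + 1 requests at once; return_exceptions=True returns a
    # failed call's exception as a result instead of propagating the first error.
    results = await asyncio.gather(*(generate(m, prompt, fail) for m in picked),
                                   return_exceptions=True)
    reserve_result = results[n]
    reserve_used = False
    shown = []
    for model, result in zip(main, results[:n]):
        if not isinstance(result, BaseException):
            shown.append((model, result))
        elif not reserve_used and not isinstance(reserve_result, BaseException):
            # The reserve is shown only in place of a failed tournament model.
            shown.append((reserve, reserve_result))
            reserve_used = True
        else:
            raise RuntimeError("more failures than reserve models") from result
    return shown
```

Launching the reserve at the same time as the other four means a failure doesn't add a second round-trip; the trade-off is one extra generation per tournament that usually goes unused.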
Generating a new image is great, but it would be even better if I could see multiple images from different models in the /feed, just to explore how other prompts look without having to generate and wait.
If you have a tool/mode/prompt that creates good mobile UI designs, I'd love to know. Doesn't even have to generate code!