1 point by vtail 2 hours ago | 1 comment
  • vunderba 2 hours ago
    For reference, have you seen the Artificial Analysis Image Arena Leaderboard? They also show you two images from anonymized models (revealed after you vote) and calculate crowdsourced Elo ratings.

    https://huggingface.co/spaces/ArtificialAnalysis/Text-to-Ima...
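    For anyone unfamiliar with how these crowdsourced rankings work, here's a minimal sketch of the standard Elo update applied to a pairwise vote (illustrative only; I have no idea what the arena actually runs under the hood, and the K-factor of 32 is just the common chess default):

    ```python
    def elo_update(r_winner: float, r_loser: float, k: float = 32) -> tuple[float, float]:
        """Update two ratings after a single head-to-head vote.

        The expected score is the logistic curve used by standard Elo:
        a 400-point gap means the stronger side is expected to win ~91%.
        """
        expected_win = 1 / (1 + 10 ** ((r_loser - r_winner) / 400))
        # Winner gains in proportion to how "surprising" the win was.
        r_winner += k * (1 - expected_win)
        r_loser -= k * (1 - expected_win)
        return r_winner, r_loser
    ```

    Each vote nudges both models' scores; upsets move ratings more than expected wins, and the total rating mass is conserved.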

    • vtail 2 hours ago
      Thanks - and no, I haven't seen this one. I like their edit-mode dashboard - showing the original image plus two edits; I was thinking about doing something like this.

      I'm also a bit surprised they have gpt-image-1.5 so high above Nano Banana 2 - my limited testing shows that, at least for visual styles, people like Nano Banana more.

      • vunderba 2 hours ago
        Yeah, I think that's part of the issue with a single "squashed" comparative metric. Some users are going to grade higher based on overall visual fidelity, while others are going to value prompt following.

        For a point of reference, I run a pretty comprehensive image model comparison site heavily weighted in favor of prompt adherence.

        https://genai-showdown.specr.net

        EDIT: FWIW, I agree with your assessment. OpenAI's models have always been very strong in prompt adherence but visually weak (gpt-image-1 had the famous "piss filter" until they finally pushed out gpt-image-1.5).

        • vtail 2 hours ago
          Very cool site - I think I saw it before here on HN, and I liked it a lot.

          Did you manually review all the edit results yourself, or do you have some kind of automated procedure?

          • vunderba an hour ago
            Thanks. So I have a bespoke Python program that basically does this:

            - Takes the platonic set of prompts

            - Uses model-specific tuning directives with LLMs to create a batch of prompt variations, so each model gets a diverse set of natural-language phrasings to "roll" generations with

            But I still have to manually review each of the final images - which is pretty time-consuming. I've tried automating it with VLMs (like Qwen3-VL), but unfortunately they can miss small details and didn't provide as much value as I was hoping.
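            The two automated steps above could be sketched roughly like this (a toy mock-up, not my actual code - `generate_variants` stands in for the real LLM call, and the base prompt is invented):

            ```python
            import random

            # The fixed "platonic" prompt set (invented example prompt).
            BASE_PROMPTS = ["a red cube balanced on a glass sphere"]

            def generate_variants(prompt: str, n: int = 3) -> list[str]:
                # Stand-in for an LLM call that rewrites the prompt using
                # model-specific tuning directives; here we just apply
                # canned phrasing templates.
                templates = [
                    "{p}",
                    "Photorealistic render: {p}",
                    "A detailed scene of {p}",
                ]
                return [t.format(p=prompt) for t in random.sample(templates, n)]

            def build_queue() -> list[str]:
                # Expand every base prompt into its variations, producing
                # the generation queue to "roll" against each model.
                queue = []
                for prompt in BASE_PROMPTS:
                    queue.extend(generate_variants(prompt))
                return queue
            ```

            The manual-review bottleneck at the end is exactly where a reliable VLM judge would slot in, if one existed.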