132 points by meetpateltech 3 hours ago | 17 comments
  • vunderba an hour ago
    OpenAI’s gpt-image-1.5 and Google’s NB2 have been pretty much neck and neck on my comparison site, which focuses heavily on prompt adherence, with both hovering around a 70% success rate across the generative and editing prompts; the caveat being that Gemini has always had the edge in visual fidelity.

    That being said, gpt-image-1.5 was a big leap in visual quality for OpenAI and eliminated most of the classic issues of its predecessor, including things like the “piss filter.”

    I’ll update this comment once I’ve finished running gpt-image-2 through both the generative and editing comparison charts on GenAI Showdown.

    Since the advent of NB, I’ve had to ratchet up the difficulty of the prompts especially in the text-to-image section. The best models now score around 70%, successfully completing 11 out of 15 prompts.

    For reference, here’s a comparison of ByteDance, Google, and OpenAI on editing performance:

    https://genai-showdown.specr.net/image-editing?models=nbp3,s...

    And here’s the same comparison for generative performance:

    https://genai-showdown.specr.net/?models=s4,nbp3,g15

    UPDATE: gpt-image-2 has already managed to overcome one of the so‑called “model killers” on the test suite: the nine-pointed star.

  • ea016 2 hours ago
    Price comparison:

    GPT Image 2

      Low     : 1024×1024 $0.006 | 1024×1536 $0.005 | 1536×1024 $0.005
    
      Medium  : 1024×1024 $0.053 | 1024×1536 $0.041 | 1536×1024 $0.041
    
      High    : 1024×1024 $0.211 | 1024×1536 $0.165 | 1536×1024 $0.165
    
    GPT Image 1

      Low     : 1024×1024 $0.011 | 1024×1536 $0.016 | 1536×1024 $0.016
    
      Medium  : 1024×1024 $0.042 | 1024×1536 $0.063 | 1536×1024 $0.063
    
      High    : 1024×1024 $0.167 | 1024×1536 $0.25  | 1536×1024 $0.25
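    For anyone estimating batch costs from these tiers, a minimal sketch using the per-image prices listed above (the helper name and rounding are my own, not from the API):

```python
# Per-image prices in USD, copied from the comparison above;
# keyed by model, quality tier, and output size.
PRICES = {
    "gpt-image-2": {
        "low":    {"1024x1024": 0.006, "1024x1536": 0.005, "1536x1024": 0.005},
        "medium": {"1024x1024": 0.053, "1024x1536": 0.041, "1536x1024": 0.041},
        "high":   {"1024x1024": 0.211, "1024x1536": 0.165, "1536x1024": 0.165},
    },
    "gpt-image-1": {
        "low":    {"1024x1024": 0.011, "1024x1536": 0.016, "1536x1024": 0.016},
        "medium": {"1024x1024": 0.042, "1024x1536": 0.063, "1536x1024": 0.063},
        "high":   {"1024x1024": 0.167, "1024x1536": 0.250, "1536x1024": 0.250},
    },
}

def batch_cost(model: str, quality: str, size: str, n: int) -> float:
    """Total cost in USD for n images at the given quality tier and size."""
    return round(PRICES[model][quality][size] * n, 2)

# 1,000 high-quality square images:
print(batch_cost("gpt-image-2", "high", "1024x1024", 1000))  # 211.0
print(batch_cost("gpt-image-1", "high", "1024x1024", 1000))  # 167.0
```

    Note that on gpt-image-2 the square size is the priciest at medium and high, while on gpt-image-1 it was the cheapest.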
    • Melatonic an hour ago
      Weird that they restrict the resolution so much. Does it fall apart with more detail (when zoomed in) or does the cost just skyrocket?
      • BoredPositron a few seconds ago
        We're still at a 128px latent space, and we won't see an increase without new hardware. The VAEs do a lot of the heavy lifting in the last few SOTA image models. Nano Banana 2.0, for instance, also produces a lot of low-frequency artifacts, and they appear in all directions, so it's hard to determine whether it's the RoPE or the VAE. Looking at some image edits, I'd say both.
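        To put the 128px figure in perspective: assuming a typical 8x VAE downsampling factor (an assumption; the actual factors for these proprietary models aren't public), latent token counts grow quadratically with output resolution, which is one reason cost skyrockets at higher sizes:

```python
# Latent grid size and token count for a square image, under an
# assumed 8x VAE downsampling factor (the real factor for these
# proprietary models is not public).
def latent_tokens(image_px: int, vae_factor: int = 8) -> tuple[int, int]:
    """Return (latent side length, number of latent tokens)."""
    side = image_px // vae_factor
    return side, side * side

for px in (1024, 2048, 4096):
    side, tokens = latent_tokens(px)
    print(f"{px}px image -> {side}x{side} latent = {tokens} tokens")
# Doubling the resolution quadruples the latent token count.
```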
      • vunderba an hour ago
        It's usually based on what they've been trained on. There aren't very many models outside of Seedream that'll do higher resolutions, but adherence is worse.
  • kibibu 12 minutes ago
    Genuine question: what positive use cases are sufficient to accept the harm from image generators?

    One that I can think of:

    - replacing photography of people who may be unable to consent or for whom it may be traumatic to revisit photographs and suitable models may not be available, e.g. dementia patients, babies, examples of medical conditions.

    Most other vaguely positive use cases boil down to "look what image generators can do", with very little "here's how image generators are necessary for society."

    On the flip side, there are hundreds of ways that these tools cause genuine harm, not just to individuals but to entire systems.

  • throwaway2027 2 hours ago
    I know people like to dunk on ChatGPT and Gemini and say Claude is (or used to be) better, but you can still fall back to worse models when you're out of usage AND use Nano Banana and ChatGPT image generation, which have separate limits on your subscription. I think that could make it a more complete package as a whole for some people (non-programmers). I do like having the option, and I'm curious about the improvements they've made to ChatGPT image generation: in the past it had this yellow piss filter, and 1.5 sort of fixed it but made things really generic, with Nano Banana beating it (although Gemini also had a too-aggressively-tuned racial bias, which they fixed). The images ChatGPT generates seem to have gotten better.
  • joegibbs 24 minutes ago
    The quality of the text is really impressive and I can’t seem to see any artefacts at all. The fake desktop is particularly good: Nano Banana would definitely slip up with at least a few bits of the background.
  • samiwami 2 hours ago
    do they have anything similar to SynthID, or are they just pretending that problem doesn't exist?

    I know this is probably mega cherry-picked to look more impressive, but some of the images are terrifyingly realistic. They seem to have put a lot of effort into the lighting.

    • alextheparrot 2 hours ago
      > Integrating an imperceptible, robust, and content-specific watermark

      From the system card someone linked elsewhere in the discussion

    • Legend2440 2 hours ago
      I think we are just going to have to accept that realistic images can be easily fabricated now.

      Seeing is not believing anymore, and I don't think SynthID or anything like it can restore that trust in images.

  • louiereederson 2 hours ago
    The image of the messy desktop with the ASCII art is so impressive - the text renders, the date is consistent, it actually generated ASCII art in "ChatGPT", etc. I was skeptical that it was cherry-picked, but I was able to generate something very similar and then edit particular parts of the desktop (e.g. fixing content in the browser window and making the ASCII dog "more dog-like"). It's honestly astounding, to me at least.
  • Melatonic an hour ago
    We were afraid it would be Skynet and instead we got the ultimate meme generator!
  • throw310822 an hour ago
    Ok, I can hear the sound of entire industries crumbling right now.
  • thevinter 2 hours ago
    Every time a new image gen comes out I keep saying that it won't get better, just to be surprised again and again. Some of the examples are incredible (and incredibly scary; I feel like this is truly the point where telling whether something is AI becomes impossible).
    • lehmacdj 2 hours ago
      So do you think there will be a better image model in a year?
      • throw310822 an hour ago
        I'll bite: no, I don't think so. If the examples are not cherry-picked, and by "image model" we mean just the ability to generate pictures, this looks like parity with human excellence; there isn't much room for further improvement. The images don't just look real, they look tasteful: the model isn't just generating a credible image, it's generating one that shows the talent of a good photographer/designer/artist.
      • Vachyas an hour ago
        I'm honestly unsure what could be improved at this point.

        Consistency? So it fails less often?

        Based on the released images, (especially the one "screenshot" of the Mac desktop) I feel like the best images from this model are so visually flawless that the only way to tell they're fake is by reasoning about the content of the image itself (ex. "Apple never made a red iPhone 15, so this image is probably fake" or "Costco prices never end in .96 so this image is probably fake")

        • RobinL an hour ago
          I've been impressed when testing this model today, but it still can't consistently adhere to the following prompt: make me an image of a pizza split into 10 equal slices with space in between them, to help teach fractions to a child.

          It doesn't reliably give you 10 slices, even if you ask it to number them. None of the frontier models seem to be able to get this right.
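          For contrast, the geometry this prompt asks for is trivial to pin down procedurally; a minimal sketch of ten equal wedges separated by a gap (the 4-degree gap is an arbitrary choice for illustration):

```python
def pizza_slices(n: int = 10, gap_deg: float = 4.0) -> list[tuple[float, float]]:
    """Return (start, end) angles in degrees for n equal slices,
    each separated by gap_deg of empty space."""
    span = 360.0 / n          # total angle allotted per slice
    wedge = span - gap_deg    # visible wedge after removing the gap
    return [(i * span, i * span + wedge) for i in range(n)]

slices = pizza_slices()
print(len(slices))            # 10
print(slices[0], slices[1])   # (0.0, 32.0) (36.0, 68.0)
```

          The point being that the failure isn't a rendering limitation so much as the model not counting reliably.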

        • thevinter an hour ago
          There is definitely room for improvement: https://gist.github.com/simonw/88eecc65698a725d8a9c1c918478a...

          Especially when it comes to detailed outputs or non-standard prompts.

          I do believe it will get even better - not sure it will happen within a year but I wouldn't be incredibly surprised if it did.

          • vunderba an hour ago
            Yep. “Where’s Waldo” has been a classic challenge for generative models for a while because it requires understanding the entire concept (there’s only one Waldo), while also holding up to scrutiny when you examine any individual, ordinary figure.

            I experimented with procedurally generating Waldo-style scavenger-hunt images with Flux models, with rather disappointing (if unsurprising) results.

          • throw310822 an hour ago
            I wonder if at this point you could just ask the agent to iteratively refine the image in smaller portions.
        • jinushaun 17 minutes ago
          Cost? Speed?
  • minimaxir 3 hours ago
    Model card for the API endpoint gpt-image-2 (which may or may not reflect the output from ChatGPT Images 2): https://developers.openai.com/api/docs/models/gpt-image-2

    API Pricing is mostly unchanged from gpt-image-1.5, the output price is slightly lower: https://developers.openai.com/api/docs/pricing

    ...buuuuuuuuut the price per image has changed. For a high quality image generation the 1024x1024 price has increased? That doesn't make sense that a 1024x1024 is cheaper than a 1024x1536, so assuming a typo: https://developers.openai.com/api/docs/guides/image-generati...

    The submitted page is annoyingly uninformative, but from the livestream it purports to offer the exact same features as Gemini's Nano Banana Pro. I'll run it through my tests once I figure out how to access it.

    • strongpigeon an hour ago
      > That doesn't make sense that a 1024x1024 is cheaper than a 1024x1536, [...]

      I think you meant more expensive, right? It would make sense for it to be cheaper, as there are fewer pixels.

  • ieie3366 an hour ago
    It's great. It also doesn't seem to have any standard "slop" look; the images it produces are quite diverse.

    I would imagine this will hit illustrators / graphics designers / similar people very hard, now that anyone can just generate professional looking graphical content for pennies on the dollar.

  • retrac98 an hour ago
    The page keeps crashing on my iPhone 17 Pro.
  • Bennettheyn an hour ago
    fal has the endpoint under openai/gpt-image-2
  • ChrisArchitect an hour ago
    Fake layouts, fake handwritten kid story, fake drunk photos? All from training on real things people did.

    As with anything AI, we are not ready for the scale of impact. And for what? Like, why are you proud of this?