I recently finished putting together an Editing Comparison Showdown counterpart where the focus is still prompt adherence, but this time testing the ability to make localized edits to existing images using pure text prompts. It currently compares 6 multimodal models, including Nano-Banana, Kontext Max, Qwen 20b, etc.
https://genai-showdown.specr.net/image-editing
Gemini Flash 2.5 leads with a score of 7 out of 12, but Kontext comes in at 5 out of 12, which is especially surprising considering you can run the Dev model of it locally.
Don't know if it's the same for others, but my issue with Nano Banana has been the opposite. Ask it to make x significant change, and it spits out what I would've sworn is the same image. Sometimes, randomly and inexplicably, it spits out the expected result.
Anyone else experiencing this or have solutions for avoiding this?
Most models (gpt-image-1, Kontext, etc) typically fail by doing the wrong thing.
From my testing this seems to be a Nano-Banana issue. I've found you can occasionally work around it by adding far more explicit directives to the prompt but there's no guarantee.
By the way, some of the results look a little weird to me, like the one for the 'Long Neck' prompt. Seedream's giraffe just lowered its head, but its neck didn't shorten as expected. I'd like to learn about the evaluation process, especially whether it is automatic or manual.
To answer your question, all of the evaluations are performed manually. On the trickier results I'll occasionally conscript some friends to get a group evaluation.
The bottom section of the site has an FAQ that gives more detail, I'll include it here:
It's hard to define a discrete rubric for grading at an inherently qualitative level. To keep things simple, this test is purely PASS/FAIL - unsuccessful means that the model NEVER managed to generate an image adhering to the prompt.
In many cases we attempt a generous interpretation of the prompt - if it gets close enough, we might consider it a pass.
To paraphrase former Supreme Court Justice Potter Stewart, "I may not be able to define a passing image, but I know it when I see it."
I agree with your assessment - even though it does tend to make changes at a global level, you can at least attempt to minimize its alterations through careful prompting.
Since the page doesn't mention it, this is the Google Gemini Image Generation model: https://ai.google.dev/gemini-api/docs/image-generation
Good collection of examples. Really weird to choose an inappropriate-for-work one as the second example.
Came within striking distance of OpenAI's gpt-image-1 at only one point behind.
The second example under "Case 1: Illustration to Figure" is a panty shot.
I have no idea how people think they can interact with an art related product with this kind of puritanical sensibility.
I get far better results using ChatGPT for example. Of course, the character seldom looks anything like the reference, but it looks better than what I could do in paint in two minutes.
Am I using the wrong model, somehow??
When Nano Banana works well, it really works -- but 90% of the time the results will be weird or of poor quality, with what looks like cut-and-paste or paint-over, and it also refuses a lot of reasonable requests on "safety" grounds. (In my experience, almost anything with real people.)
I'm mostly annoyed, rather than impressed, with it.
I was a bit surprised to see the quality. The last time I played around with image generation was a few months back, and I'm more in the frustration camp. That's not to say that people with more time and dedication on their hands can't tickle out better results.
Which goes to show that some of these amazing results might need 18 attempts and such.
I understand the results are non-deterministic, but I get absolute garbage too.
Uploaded pics of my wife (32 years old) and asked it to give her a fringe/bangs to see how she would look. It either refused "because of safety," or when it complied the results were horrible - it was a different person.
After many days and tries we got it to make one, but there was no way to tweak the fringe; the model kept returning the same pic every time (with plenty of "content blocked" in between).
It's also cheaper than Gemini, and has way fewer spurious content warnings, so overall I'm done with Gemini.
- The second one in case 2 doesn't look anything like the reference map
- The face in case 5 changes completely despite the model being instructed to not do that
- Case 8 ignores the provided pose reference
- Case 9 changes the car positions
- Case 16 labels the tricuspid in the wrong place and I have no idea what a "mittic" is
- Case 27 shows the usual "models can't do text" though I'm not holding that against it too much
- Same with case 29; also, the text that is readable doesn't relate to the parts of the image it is referencing
- Case 33 just generated a generic football ground
- Case 37 has nonsensical labellings ("Define Jawline" attached to the eye)
- Case 58 has the usual "models don't understand what a wireframe is", but again I'm not holding that against it too much
Super nice to see how honest they are about the capabilities!
> - Case 27 shows the usual "models can't do text" though I'm not holding that against it too much
16 makes it seem like it can "do text" — almost, if we don't care what it says. But it looks very crisp until you notice the "Pul??nary Artereys".
I'd say the bigger problem with 27 is that asking to add a watermark also took the scroll out of the woman's hands.
(While I'm looking, 28 has a lot of things wrong with it on closer inspection. I said 26 originally because I randomly woke up in the middle of the night for this and apparently I don't know which way I'm scrolling.)
Also you're right, I didn't notice the scroll had gone, though on another inspection it's also removed the original prompter's watermark.
48 is impossible to do in a way that is accurate and meaningful
Through that testing, there is one prompt engineering trend that was consistent but controversial: both a) LLM-style prompt engineering with Markdown-formatted lists and b) old-school AI-image-style quality syntactic sugar such as "award-winning" and "DSLR camera" are extremely effective with Gemini 2.5 Flash Image, due to its text encoder and larger training dataset, which can now more accurately discriminate which specific image traits are present in an award-winning image and which aren't. I've tried generations both with and without those tricks, and the tricks definitely have an impact. Google's developer documentation encourages the latter.
However, taking advantage of the 32k context window (compared to 512 for most other models) can make things interesting. It’s possible to render HTML as an image (https://github.com/minimaxir/gemimg/blob/main/docs/notebooks...) and providing highly nuanced JSON can allow for consistent generations. (https://github.com/minimaxir/gemimg/blob/main/docs/notebooks...)
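For anyone curious what feeding it that kind of structure looks like in practice, here's a minimal sketch using the google-genai Python SDK with a JSON-style scene description. The model ID string and the field names in the scene dict are my own assumptions for illustration (not taken from the linked notebooks), so check the current docs before relying on them:

    # Minimal sketch: a structured, JSON-style prompt to Gemini 2.5 Flash Image
    # via the google-genai SDK. Model ID and scene fields are assumptions.
    import json
    from google import genai

    client = genai.Client(api_key="YOUR_API_KEY")

    # A nuanced scene description -- the long context window means you can be
    # far more specific than a single terse sentence.
    scene = {
        "subject": "a lighthouse keeper reading by lamplight",
        "style": "award-winning editorial photograph, DSLR, 85mm lens",
        "lighting": "warm tungsten interior, cool blue dusk through the window",
        "composition": "rule of thirds, subject on the left third",
    }

    response = client.models.generate_content(
        model="gemini-2.5-flash-image",  # assumed model ID -- check the docs
        contents=json.dumps(scene, indent=2),
    )

    # Image bytes come back as inline-data parts alongside any text parts.
    for part in response.candidates[0].content.parts:
        if part.inline_data is not None:
            with open("out.png", "wb") as f:
                f.write(part.inline_data.data)

The same call shape works if you swap the JSON for a long Markdown-formatted brief; the point is just that the 32k window leaves plenty of room for that kind of structure.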
(Example: Half of Case 1 is an anime/manga woman in a maid uniform lifting up the front of her skirt and leaning back to expose the crotch of her underwear. That's the most questionable one I noticed. It's one of the first things a visitor to the top URL sees.)
At least in the UK, if I saw this loaded on someone else's screen at work, I might raise an eyebrow initially, but there wouldn't be any consequences that don't first consider context. As soon as the context is provided ("it's comparing AI models, look! Cool, right?!") everyone would get on with their jobs.
What would be the consequence of you viewing this at work?
How would the situation be handled?
Is the problem an HR thing - like, would people get sacked for this? Or is it a personal conduct/temptation thing, where colleagues who see it might not be able to restrain themselves or something?
(Note: Statements suggesting that sexual harassment exists at all make some people on the Internet flip out angrily, but I interpret your questions as in good faith, and I'm trying to answer in good faith.)
One example of why that harassment context is relevant: if you were a woman, wouldn't you think it was insensitive for a male colleague to send you an image that was obviously designed to be sexually suggestive, and with the female as the sex object? Is he consciously harassing you, or just being oblivious to why this is inappropriate?
For a separate reason that this is a problem in the workplace: besides the real impact to morale and how colleagues respect each other, even the most sociopathic US companies want to avoid sexual harassment lawsuits and public scandals.
For reasons like these, and others, if someone, say, posted that isolated maid image to workplace chat, then I think there's a good chance that a manager or HR would say something to the employee if they found out, and/or (without directly referring to that incident) communicate to everyone about appropriate practices.
But if there was a pattern of insensitive/oblivious/creepy behavior by this employee, or if someone complained to manager/HR about the incident, or if there was legal action against the company (regarding this incident, or a different sexual harassment situation), then I guess the employee might be terminated.
If I were a manager in a company, and one of my reports posted an image like this, I'd probably say something quietly to them, and much more gently than the above (e.g., "Uh, that image is a bit in a direction we want to stay away from in the office", or maybe even just the slightest concerned glance), and most people would get it. Just a little learning moment, like we all have many of. But if there were a trickier situation, or I was under orders, I might have to ask HR about it (and if I did, hopefully that particular HR person is helpful, and that particular company is reasonable).
Edit: It still blocks this request.
We all know the questionable nature of AI/LLM models, but people in the field usually at least try to avoid directly using other people's copyrighted material in documentation.
I'm not even talking about legality here. It just feels morally wrong to so blatantly use someone else's artwork like this.
Source of artist: https://x.com/curry3_aiart/status/1947416300822638839
- Given a face shot in direct sunlight with severe shadows, it would not remove the shadows
- Given an old black and white photo, it would not render the image in vibrant color as if taken with a modern DSLR camera. It will colorize the photo, but only with washed out, tinted colors
- When trying to reproduce the 3x3 grid of hair styles, it repeatedly created a 2x3 grid. Finally, it made a 3x3 grid, but one of the nine models was Black instead of Caucasian.
- It is unable to integrate real images into fabricated imagery. For example, when given an image of a tutu and asked to create an image of a dolphin flying over clouds wearing the tutu, the result looks like a crude photoshop snip and copy/paste job.
I uploaded an image I found of Midtown Manhattan and tried various times to get it to highlight the Chrysler Building, it claimed it wasn't in the image (it was). I asked it to do 432 Park Ave, and it literally inserted a random building in the middle of the image that was not 432 Park, and gave me some garbled text for the description. I then tried Chicago as pictured from museum campus and asked it to highlight 2 Prudential, and it inserted the Hancock Center, which was not visible in the image I uploaded, and while the text was not garbled, was incorrect.
The "Photos of Yourself in Different Eras" one said "Don't change the character's face" but the face was totally changed. "Case 21: OOTD Outfit" used the wrong camera. "Virtual Makeup Try-On" messed up the make up. "Lighting Control" messed up the lighting, the joker minifig is literally just SH0133 (https://www.bricklink.com/catalogItemInv.asp?M=sh0133), "Design a Chess Set" says you don't need an input image, but the prompt said to base it off of a picture that wasn't included and the output is pretty questionable (WTF is with those pawns!), etc.
I mean, it's still pretty neat, and could be useful for people without access to photoshop or to get someone started on a project to finish up by hand.
I don't know of a demo, image, film, project or whatever where the showoff pieces are not cherry picked.
Huge thanks to the author (and the many contributors) as well for gathering so many examples; it’s incredibly useful to see them to better understand the possibilities of the tool.
AI is like Batman, useless without his money and utility belt. Your own abilities are more like Superman, part of who you are and always with you, ready for use.
"To see a world in a grain of sand And a heaven in a wild flower..."
We - humans - have reasons to be. We get to look at a sunset and think about the scattering of light and different frequencies and how it causes the different colors. But we can also just enjoy the beauty of it.
For me, every moment is magical when I take the time to let it be so. Heck, for there to even be a me responding to a you and all of the things that had to happen for Hacker News to be here. It's pretty incredible. To me anyway.
I've no idea how to even check. According to various tests, I believe I have aphantasia. But mostly I haven't got even the slightest idea of how not having it is supposed to work. I guess this is one of those mysteries where a missing sense cannot be described in any manner.
Without aphantasia, it should be easy to "see" where the dots are since your mind has placed them on the apple somewhere already. Maybe they're in a line, or arranged in a triangle, across the middle or at the top.
In my conscious experience I pretty much imagine {apple, dot, dot, dot}. I don't "see" blue, the dots are tagged with dot.color == blue.
When you ask about the arrangement of the dots, I'll THEN think about it, and then say "arranged in a triangle." But that's because you've probed with your question. Before you probed, there's no concept in my mind of any geometric arrangement.
If I hadn't been prompted to think about / naturally thought about the color of the apple, and you asked me "what color is the apple," only then would I say "green" or "red."
If you asked me to describe my office (for example), my brain can't really imagine it "holistically." I can think of the desk and then enumerate its properties: white legs, wooden top, rug on the ground. But, essentially, I'm running a geometric iterator over the scene, starting from some anchor object, jumping to nearby objects, and then enumerating their properties.
I have glimpses of what it's like to "see" in my mind's eye. At night, in bed, just before sleep, if I concentrate really hard, I can sometimes see fleeting images. I liken it to looking at one of those eye puzzles where you have to relax your eyes to "see it." I almost have to focus on "seeing" without looking into the blackness of my closed eyes.
Like they'll start at an arm and move along filling the rest of the body correctly the first time. No sketching, no finding the lines, just a human printer.
No one really sees 3d pictures in their head in HD
It's not just images either, it's short videos.
What's interesting though is that the "video" can be missing details that I will "hallucinate" back in, and those will be incorrect. So I cannot always fully trust these. Like cutting the apple in half led to a ~1/8th slice missing from one of the halves. It's weird.
It's equally astonishing to me that others are different.
Do you see these pictures the same as if you were watching an HD TV?
I'm going to guess no. You don't see literally high def pictures in your head.
There are people who actually "see" a full-ass movie in their head when they read.
These are also the people who get REALLY angry when some live-action casting choice isn't exactly like in the book. I just go "meh", because I kinda remember the main character had red hair and a scar and that's it. :D
You may notice when doing the apple test that, once you try to define a texture, your brain adds things you think should be there.
Scared the crap out of me a few years ago when I realized I had it. Came to grips with it now.
How do people with aphantasia answer the question?
Then maybe, at least in my case, it is my inability to focus my imagination when my senses are already being bombarded with external stimuli. But I cannot speak for anyone else.
I hadn't really placed those three dots in a specific place on the apple. But when you ask where they are, I'll decide to put them in a line on the apple. If you ask what color they are, I'll have to decide.
To answer the question, I imagine an apple with three dots in a triangle, close together. There is no color because there is no real image; it's just an idea. As others have said, if prompted the idea gets more detailed.
That said, when I tried to learn building mind palaces it has worked. I can “walk through” places I know just fine, even recall visual details like holes in a letterbox. But again, there is no image.
That is, they have to ascribe a placement rather than describe one in the image their mind conjured up.
Edit: This iDevice really wants to capitalise Apple.
I have had some people claim to me that they can literally see what they are imagining as if it is in front of them for prolonged periods of time, in a similar way to how it would show up via AR goggles.
I guess this is a spectrum and it's tough to delineate the abilities. But I just looked it up, and what I am describing is hyperphantasia.
It's interesting that, if non-aphantasic people are so common, so few paintings have scenery based solely on imagination. I even remember asking a person who paints (not in the context of this condition) how hard it was for him to paint something not directly before his eyes but from imagination, and why he didn't do it more often. I recall that he painted from imagination rarely or not at all, and the question really puzzled him.
I have aphantasia and my dog isn't anywhere. It's just a dog, you didn't ask me to visualize anything else.
When you ask about details, like color, tail length, eyes then I have to make them up on the spot. I can do that very quickly but I don't "see" the good boy.
Arguably, if creating an art style is simply a matter of novel mechanics and uniqueness, LLMs could already do that simply by adding artists to the prompt ("X" in the style of "A" and "B"), and plenty of people did (and do) argue that this is no different from what human artists do (I would disagree). I personally want to argue that intentionality matters more than raw technique, but Hacker News would require a strict proof for a definition of intentionality that they would argue humans don't possess but somehow LLMs do, and that of course I can't provide.
I guess I have no argument besides "it means more to me that a person does it than a machine." It matters to me that a human artist cares. A machine doesn't care. And yes, in a strictly materialist sense we are nothing but black boxes of neurons receiving stimuli and there is no fundamental difference between a green field and a cold steel rail, it's all just math and meat, but I still don't care if a machine makes X in the style of (Jack Kirby AND Frank Miller.)
I'd disagree. Art styles are a category of many similar works in relation to others, or a way of bringing about similar works. They usually build off of or are influenced by prior work and previous methods, even in cases where there is an effort to avoid or subvert them. Even with novel techniques or new mediums. "Great Artists Steal" and all that.
Some people become known for certain mediums or the inclusion of specific elements, but few of them were the first or only artists to use them. "Art in the style of X" just comes down to familiarity/marketing. Art develops the way food does with fads, standards, cycles, and with technology and circumstance enabling new things. I think evolution is a pretty good analogy although it's driven by a certain amount of creativity, personal preference, and intent in addition to randomness and natural selection.
Computers could output random noise and in the process eventually end up creating an art style, but it'd take a human to recognize anything valuable and artists to incorporate it into other works. Right now what passes for AI is just remixing existing art created by humans, which makes it more likely to blindly stumble into creating some output we like, but inspiration can come from anywhere. I wouldn't be surprised if the "AI Slop" art style wasn't already inspiring human artists. Maybe there are already painters out there doing portraits of people with the wrong number of fingers. As AI increasingly consumes its own slop, things could get weird enough to inspire new styles, or alternately homogenized into nothing but blandness.
People adopt art styles because they like something about them, the aesthetic or what they represent. I don't think there are enough human artists who like AI slop (they tend to despise it categorically) enough to want to imitate it, unless it's as some form of satire. They aren't going to do so simply because it exists.
We're reliant on training data too.
There are now dozens of copyright safe image and video models: Adobe, MoonValley, etc.
We technically never need human works again. We can generate everything synthetically (unreal engine, cameras on a turn table, etc.)
The physics of optics is just incredibly easy to evolve.
I don't think you're being fair here at all. The technology has demonstrable positive use cases.
Not sure about that. Humans are doing almost all the work now still.
Nano banana saves literally millions of manual human pixel pushing hours.
It's easy to hate on LLMs and AI hype, but image models are changing the world and impacting every visual industry.
At the low, low cost of burning incredible amounts of energy!
This is also the same logic as "lost sale" software piracy calculations. 90% of those claimed hours would not have been spent if the tool did not exist. Most of the generated images are idle throwaways that no human would bother creating.
Your arguments sound to me as if you're saying the cotton gin is a bad idea because 90% of that cotton wouldn't have been picked.
Lay people have been starved from being able to visually articulate themselves. We're entering into a world where everyone will have spatial articulation. That's a good thing.
Don't be the Latin clergy arguing the populace shouldn't be able to read or have books.
Conscious intelligence has not.
As another argument, we've had mathematical descriptions of optics, drawing algorithms, fixed function pipeline, ray tracing, and so much more rich math for drawing and animating.
Smart, thinking machines? We haven't the faintest idea.
Progress on Generative Images >> LLMs
Three times, something like intelligence has evolved - in mammals, octopuses, and corvids. Completely different neural architectures in those unrelated species.
[1] https://homepage.uni-tuebingen.de/andreas.nieder/Nieder%20(2...
Even with what we've got, it took us hundreds of thousands of years to invent indoor plumbing.
Vision, I still submit, is much simpler than "intelligence". It's evolved independently almost a hundred times.
It's also hypothesized that it takes as few as a hundred thousand years to evolve advanced eye optics:
https://royalsocietypublishing.org/doi/10.1098/rspb.1994.004...
Even plants can sense the visual and physical world. Three dimensional spatial relationships and paths and rays through them are not hard.
You are now marvelling at someone taking the collective output of humans around the world, then training a model on it with massive, massive compute… and then having a single human compete with that model.
Without the human output on the Internet, none of this would be possible. ImageNet was positively small compared to this.
But yeah, what you call “imagination” is basically perturbations and exploration across a model that you have in your head, which imposes constraints (eg gravity etc) that you learned. Obviously we can remix things now that they’re on the Internet.
Having said that, after all that compute, the models had trouble rendering clocks that show an arbitrary time, or a glass of wine filled to the brim.
I know you're probably talking about analog clocks, but people when dreaming have trouble representing stable digits on clocks. It's one of the methods to tell if you are dreaming.
Does a pretty good job (most of the time) of sticking to the black and white coloring book style while still bringing in enough detail to recognize the original photo in the output.
Of course it's a ridiculous index in most use cases (like in a self-driving car. Your 4th guess is that you need to brake? Cool...). But somehow people in ML normalized it.
I'm more worried about the cases that aren't trying to be info diagrams. There's all this "safety" discourse around not letting people generate NSFW, and around image copyrights etc. but nobody talks about the potential to use things like #11 for fraud. "Disinformation" always gets approached from a political angle instead of one of personal gain.
I think a bigger problem is the "artifacts" you describe (worse than that sounds to me).
The Gemini models save me about an hour a day.
There used to be a job where people would go around in the morning and wake others up so they could get to work on time; they were called "knocker-ups." When the alarm clock was invented, these people didn't lose their jobs to other knocker-ups with alarm clocks; they lost their jobs to the alarm clock itself.
You can paint your own walls or fix your own plumbing, but people pay others instead. You can cook your food, but you order take-out. It's not hard to sew your own clothes, but...
So no, I don't think it's as simple as that. A lot of people will not want the mental burden of learning a new tool and will have no problem paying someone else to do it. The main thing is that the price structure will change. You won't be able to charge $1,000 for a project that takes you a couple of days. Instead, you will need to charge $20 for stuff you can crank out in 20 minutes with gen AI.
That said, I'm pretty sure the market for professional photographers shrank after the digital camera revolution.
VHS, online payments, video streaming... As the old song says, "the internet is porn."
I read your comment before checking the site and then I saw case one was a child followed by a sexy maid and I thought "oh no dear god" before I realized they weren't combining them into a single image.
Careful not to project your own ideas onto prehistoric sculpture.
What are you referring to?
> but nobody denies that they are naked female figures.
No, but the suggestion above that they were the prehistoric equivalent to cartoons of school girls lifting their skirts hasn't been the dominant theory for about thirty years.
> And the critiques don't seem to have found much purchase among archeologists.
This is simply incorrect. They became part of the general archeological discourse as far back as the 1990s and are now a normal part of any such discussion. Multiple theories now coexist and to frame those critical of the original Venus ideas as being somehow more fringe than the fertility/pornography theories is just misleading.
There are multiple theories yes, but they aren't substantially varied.
We also have a whole lineage of art from the prehistoric age to today, and more figures than we did in the 1990s. Art from every period includes nude representations of women. The more recent art (which we are able to say more about) has connections to goddesses and fertility/reproduction/sex. The continuity of art suggests there should be a continuity of explanation. But the McCoid theory handles the oldest art as a special case, different in kind from art that came not long after.
Even among the competing hypotheses, they're more closely related than many people realize. This is because religion, sex and fertility were more closely related in the ancient world than they are today. See, for example, temple prostitution.
The one outlier among the current theories I'm aware of is that the figures are supposed to show you what obese people look like. The evidence for that isn't great. For example the 2012 Dixson paper is based on having college students rate the statues for attractiveness, which seems like it's going to tell you nothing useful about the statues. But even they say the statues were about survival and reproduction, e.g.
> They may, instead, have symbolized the hope for survival and for the attainment of a well-nourished (and thus reproductively successful) maturity, during the harshest period of the major glaciation in Europe.
Amongst others.
> That is still talked about, but I'm not aware of it being taken seriously as a theory.
I'm not sure what to say to this because you're essentially arguing that your own ignorance is representative of the reality in the field. You recognise that these questions have been part of the discourse now for a third of a century but at the same time suggest it's all done in jest? I really don't know how to read this.
> We also have a whole lineage of art from the prehistoric age to today
We very much do not. There are many gaps, especially significant ones in pre-history and you're skipping multiple millennia to stretch a connection to temple prostitution, as well as ignoring the very clearly evident variation in the representations of women more recently across geographies.
> Even among the competing hypotheses...
Well we can end it here because the salient point is that pornographic representations of women is no longer the dominant theory and you seem to accept that.
These arguments are so tiring, always arguing in bad faith. It's government-level "think of the children" arguments when it's about a simple drawing.
Secondly, even if you solve that, how do you know it's not a photograph of an AI-generated scene?
I think this is very obviously not the right approach.
...the technical graphics (especially the text) are generally wrong. Case 16 is an annotated heart and the anatomy is nonsensical. Case 28 with the tallest buildings has decent images, but the wrong names, locations, and years.
Case 8 Substitute for ControlNet
The two characters in the final image are VERY obviously not in the instructed set of poses.
Has anybody ever connected a 3D printer to such a machine’s output? Some of the action figures should definitely be 3D-printed.
If anyone has examples, guides, or anything to save me from pouring unnecessary funds into those API credits just to figure out how to feed it for this kind of task, I'd really appreciate sharing.
https://cloud.google.com/vertex-ai/generative-ai/docs/models...
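Not a full guide, but the basic shape of an edit request is just your instruction plus the source image in the contents list. A rough sketch with the google-genai SDK (the model ID and file names here are placeholders/assumptions on my part, so verify against that doc before burning credits):

    # Rough sketch of an image-editing call: instruction text plus the source
    # image as content parts. Model ID and paths are placeholders.
    import io
    from PIL import Image
    from google import genai

    client = genai.Client(api_key="YOUR_API_KEY")
    source = Image.open("product.jpg")  # the image you want edited

    response = client.models.generate_content(
        model="gemini-2.5-flash-image",  # assumed model ID -- check the docs
        contents=[
            "Replace the background with a plain white studio backdrop. "
            "Keep the product, its label, and the lighting unchanged.",
            source,  # PIL images can be passed directly as parts
        ],
    )

    for part in response.candidates[0].content.parts:
        if part.inline_data is not None:
            Image.open(io.BytesIO(part.inline_data.data)).save("edited.png")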
They should have learned what to do in SPA 101.
I then tried to generate some multi-angle product shots from a single photo of an object, and it just refused to do the whole left, right, front, back thing. It kept doing things like a left, a front, another left, and a weird half-back/half-side view combination.
Very frustrating.
I had them before when I was trying this and yes, I had them turned off.
I use the API directly but unless I'm having a "Berenstein Bears moment" I could have sworn those safety settings existed under the Advanced Options in AI Studio a few weeks ago.
The output just looks like a clearly different person. It's difficult to productionize things that are inconsistent.
..guess that's solved now.. overnight. Mindblowing
Is this model open? Open weights at least? Can you use it commercially?
The best multimodal models that you can run locally right now are probably Qwen-Edit 20b, and Kontext.Dev.
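If you want to try the Kontext one locally, here's a rough sketch of what an editing call looks like through diffusers. The pipeline class, repo id, and guidance value are from memory and may have changed, so treat them as assumptions and check the model card:

    # Rough sketch of local image editing with FLUX.1 Kontext [dev] via diffusers.
    # Pipeline class, repo id, and guidance value are assumptions from memory.
    import torch
    from diffusers import FluxKontextPipeline
    from diffusers.utils import load_image

    pipe = FluxKontextPipeline.from_pretrained(
        "black-forest-labs/FLUX.1-Kontext-dev", torch_dtype=torch.bfloat16
    )
    pipe.to("cuda")  # needs a beefy GPU; otherwise enable CPU offload

    image = load_image("input.png")
    edited = pipe(
        image=image,
        prompt="Make the car a bright red convertible; keep everything else unchanged",
        guidance_scale=2.5,
    ).images[0]
    edited.save("edited.png")

The call shape is the same idea as the hosted APIs: one input image plus a text instruction.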
So many little details are off even when the instructions are clear and/or the details are there. Brad Pitt jeans? The results are not the same style and are missing clear details which you'd expect to just translate over.
Another one where the prompt ended with output in a 16:9 ratio. The image isn't in that ratio.
The results are visually something but then still need so much review. Can't trust the model. Can't trust people lazily using it. Someone mentioned something about 'net negative'.
The way current AI is set up, you can't even reliably adjust the position of the sun.
Also, of course: https://en.wikipedia.org/wiki/All_models_are_wrong
https://developers.googleblog.com/en/introducing-gemini-2-5-...
Almost all "human" interaction online will be subject to doubt soon enough.
Hard to be cheerful when technology will be a net negative overall even if it benefits some.
Hopefully you understand the sentiment of my original message without getting into the semantics. AI advancements, like email when it arrived, are gonna turbocharge the negatives. The difference is in the magnitude of the problem. We're dealing with a whole different scale we've never seen before.
Re: "Most of my emails at this point are spams." - 99% of my emails are not spam. Yet AI spam is everywhere else I look online.