It doesn't seem to depend on what model they use or how they prompt it. In code, there seems to be a loose correlation with testing styles; I've previously noticed that some people write tests to show that the code works as intended, and others try to write tests to show that it can't fail in ways that were unintended. But that correlation is weak.
I'm really puzzled by this.
I mostly use it for boilerplate code nowadays. Anything more complicated and it takes me more effort to review the output than to just code it myself, slowly.
It kinda feels like Michigan J Claude sometimes.
But how is it impossible? Or rather, how is it possible to distinguish actual intelligence from some internet page--not "some random internet page", but some very well selected, timely and topical internet page?
Carly Simon's "Killing Me Softly" describes a similar experience, decades before LLMs. It's amazingly easy to feel like someone is understanding you when they are just pattern matching on a common shared experience.
This seems so likely that I have a hard time understanding why some people think it's impossible.
FWIW this is my own mental gymnastics around the Chinese Room; at some point the question "where is the understanding?" is moot, because if the Chinese Room reliably delivers context-specific correct results in unique contexts, then what more do we require of intelligence? I have to admit that sometimes it does better than I can do, and I've been called intelligent by intelligent people my whole life.
(BTW "Killing Me Softly" wasn't written/sung/anything by Carly Simon; it was composed by Charles Fox with lyrics by Norman Gimbel, in collaboration with Lori Lieberman [after she was inspired by a Don McLean performance in late 1971].)
The additional prompting doesn't necessarily need to tell the model specifically what to fix or do better; sometimes it's enough just to break it out of its habit. Asking for a smart-looking, middle-aged pelican on a sporty red bike doesn't make the problem easier, but it does break the model out of its boring defaults.
I wouldn't go so far as to say PEBKAC, but the good news is there's still a role for humans in the loop.
I wonder how long it will take to fix the quirks?
This will get sorted out in time; in the meantime, instruct the LLM away from averaged answers. It’s not a problem.
If you do not want images that have that instantly recognizable AI style to them - busy, perfect, brightly colored, pseudo-realistic, outside what humans would usually make - then tell the AI that such images fail to meet your goals and steer it in other directions.
You can do it today just by prodding the LLM correctly, so it shouldn’t be hard for the LLM devs to do in an organized way.
The future of AI/LLMs will include the development of distinct personalities and behavioral characteristics, instead of a generic interface to an AI brain that speaks in an averaged monotone.