I use Codex btw, and I really love it. But some of these companies have been so overhyping the capabilities of these models for years now that it's both funny to look back and tiresome to still keep hearing it.
Meanwhile I am at wits' end after NONE of Codex (GPT-5.4 on Extra High), Claude Opus 4.6-1M on Max, Opus 4.6 on Max, or Gemini 3.1 Pro on High has been able to solve a very straightforward and basic UI bug I'm facing. To the point where, after wasting a day on this, I am now just going to go through the (single file) of code and fix it myself.
Update: some 20 minutes later, I have fixed the bug. Despite not knowing this particular programming language or framework.
That's front page news, in this era.
Maybe the brain has some advanced optimization where once you're in a loop, roughly staying inside that loop has a lower impedance than starting one. Maybe that's why the flow state feels so magical, it's when resistance is at its lowest. Maybe I need sleep.
You're aware of the MIT Media Lab study[0] from last summer regarding LLM usage and eroding critical thinking skills...?
[0] "Your Brain on ChatGPT: Accumulation of Cognitive Debt when Using an AI Assistant for Essay Writing Task", June 2025. DOI: 10.48550/arXiv.2506.08872
it's not just an erosion of skills, it can also break the whole LLM toolchain flow.
Easy example: put together some fairly complicated multi-faceted program with an LLM. You'll eventually hit a bug that it needs to be coaxed into fixing. In the middle of this bug-fixing conversation, go ahead and fire up an editor and flip a true/false or change a value.
Half the time it'll go unnoticed. The other half of the time the LLM will do a git diff, see those values changed, and then go off on a tangent auditing the code for whatever could have autonomously flipped them.
So you not only have to flip the value; the next prompt to the LLM has to be "I just flipped Y value..." to head off the tangent it (quite rightfully, in most cases) goes on when it sees a mysteriously changed value.
So you either lean in and tell the LLM "flip this value", or you flip the value yourself and then explain. It takes more tokens to explain, in most cases, so you generally eat the time and let the LLM sort it.
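The failure mode is easy to reproduce in a throwaway repo. A minimal sketch, assuming git is on PATH; the repo, `config.py`, and `FEATURE_ENABLED` are all made-up stand-ins:

```python
# Toy repro of the out-of-band edit problem: flip a value behind the agent's
# back and see exactly what its next `git diff` surfaces. Everything here
# (the repo, config.py, FEATURE_ENABLED) is a hypothetical stand-in.
import pathlib
import subprocess
import tempfile

def run(args, cwd):
    """Run a command in the repo and return its stdout."""
    return subprocess.run(args, cwd=cwd, capture_output=True, text=True).stdout

def diff_after_manual_flip():
    repo = pathlib.Path(tempfile.mkdtemp())
    run(["git", "init", "-q"], repo)
    cfg = repo / "config.py"
    cfg.write_text("FEATURE_ENABLED = False\n")
    run(["git", "add", "config.py"], repo)
    run(["git", "-c", "user.email=me@example.com", "-c", "user.name=me",
         "commit", "-qm", "init"], repo)
    # The human flips the value in an editor, mid-conversation:
    cfg.write_text("FEATURE_ENABLED = True\n")
    # ...and this unexplained hunk is what sends the agent off auditing:
    return run(["git", "diff", "--", "config.py"], repo)
```

Printing `diff_after_manual_flip()` shows the +/- hunk the agent has no explanation for, which is why the "I just flipped Y value" preamble becomes necessary.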
so yeah, skill erosion, but it's also just a point of technical friction right now that'll improve.
Show us the code, or an obfuscated snippet. A common challenge with coding-agent related posts is that the described experiences have no associated context, and readers have no way of knowing whether it's the model, the task, the company or even the developer.
Nobody learns anything without context, including the poster.
This is a pretty simple thing, but you can imagine how CSS issues get progressively more difficult for AIs to solve. A CSS bug can require reading arbitrarily much code if you debug it by reading code alone, but only inspecting relatively few elements if you look at the page with your own eyes.
This can be somewhat solved by hooking up a harness to screenshot the page and feed it into the AI, but it isn't perfect even then.
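A bare-bones version of such a harness, sketched under assumptions: Playwright for the capture, and a vision-capable chat API that accepts base64 data URLs (the exact message shape varies by provider, so treat it as illustrative):

```python
# Sketch of a screenshot-to-model harness. The message shape below follows a
# common multimodal chat convention; adapt it to whatever API you actually call.
import base64

def screenshot_to_message(png_bytes: bytes, question: str) -> dict:
    """Wrap a page screenshot plus a question into one multimodal user message."""
    b64 = base64.b64encode(png_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{b64}"}},
        ],
    }

def capture_page(url: str) -> bytes:
    """Grab a full-page PNG with Playwright (requires `playwright install`)."""
    from playwright.sync_api import sync_playwright  # deferred: optional dependency
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(url)
        png = page.screenshot(full_page=True)
        browser.close()
    return png
```

Feeding `screenshot_to_message(capture_page(url), "Why is the button clipped?")` to the model gives it the few pixels a human would have looked at first, instead of an arbitrarily long code trail.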
Seriously, you wasted a whole day just so you wouldn't have to look at a single file of code?
> Update: some 20 minutes later, I have fixed the bug. Despite not knowing this particular programming language or framework.
Be really careful there, you might have accidentally learned something.
There was even another UI component (in the same file) which was almost the same but slightly different, and that one was correct. That's what I copy-pasted and tweaked when I fixed the problem. But for some reason the models were utterly incapable of making that connection.
With Codex and Claude Code I thought maybe, because these agentic coding tools are trained to be conservative with tokens and lean aggressively on grep, they weren't looking at the full file in one go.
But with Gemini I used the web version and literally pasted that entire file + screenshots detailing what was wrong (including the other component which was rendering correctly) and it still couldn't solve it. It was bewildering.
But I think that is the best way to have a clear mental model. Otherwise, no matter how careful, you always have tech debt building and churning.
Also they really suck at UI bugs and CSS. Unit test that stuff.
You can't; it's all vibed. You'll face the art-vs-build internal struggle and end up re-coding the entire thing by hand.
> Synthetic imagery, audio, and video, imply that technologies are reducing the cost of generating fake content and waging disinformation campaigns.
> ‘The public at large will need to become more sceptical of text they find online, just as the ”deep fakes” phenomenon calls for more scepticism about images.
It ended up just like that.
[0]: https://metro.co.uk/2019/02/15/elon-musks-openai-builds-arti...
The problem is that the most publicly disseminated messaging around the topic was the fear-mongering, "it's god in a box" style messaging. Can't argue with the billions in funding heisted via pyramid scheme for the current GPU bonfire, but people are right to ridicule, while also right to point out the warnings were reasonable. Both are true; it depends on which face of "OpenAI" we're talking about, the researchers or the marketing chuds.
Ultimately AGI isn't something anyone with serious skill/experience in the field expects of a transformer architecture, even if scaled to a planet sized system. It is an architecture which simply lacks the required inductive bias. Anyone who claims otherwise is a liar or a charlatan.
Hang them all.
The world does not have to get worse. We're letting it though.
It would be nice if “we” had anything to do with it. Just think about the next campaign trail for any superpower, it’s going to be a disaster of fake news and slop coming from all over the globe.
Something, something, idiocracy comes to mind.
So, confirmation? They are catching up quickly!
The actuality is, anyone with pre-slop data still has their pre-slop data. And there are endless ways to get more value out of good data.
Bootstrapping better performance by using existing models to down-select data for higher density/median quality, or leveraging recognizably lower-quality data as negative reinforcement. Models critiquing each other, so the baseline AI behavior improves, and in the process they also create better training data. And a thousand more ways.
Managed intelligently, intelligence wants to compound.
The difference between human and AI idiocracy, is we don't delete our idiots. I am not suggesting we do that. But maybe we shouldn't elect them. Either way, that is one more very steep disadvantage for us.
Training a model is like the blur and generating from that model is like the sharpen. Repeat enough times and enough information is lost that you're just left with "wormy spaghetti lines"—in an LLM's case, meaningless gibberish that actually pretty closely resembles the glitchy stuff said by the cores that fall off GLaDOS in Portal. I dunno, you read the paper and be the judge:
https://www.nature.com/articles/s41586-024-07566-y
To jump to the last output sample, Ctrl-F "Gen 9".
Of course you may be talking about the human aspect of this. Gods willing, we'll realize that our LLMs are spewing gibberish and think twice about putting them in all the things, all the time. But the scenario I fear isn't Idiocracy—it's worse: a community of humans who treat the gibberish as sacred writ, Zardoz style.
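The blur/sharpen loop is easy to caricature with a toy experiment. This is nothing like the paper's setup, just a character-bigram model repeatedly retrained on its own samples; the corpus and generation count are arbitrary:

```python
# Toy illustration of model collapse: train a character-bigram model, sample
# from it, retrain on the sample, repeat. Each generation's alphabet is a
# subset of the previous one's, so diversity can only shrink over time.
import random
from collections import Counter, defaultdict

def train(text):
    """Count character-bigram transitions in the training text."""
    model = defaultdict(Counter)
    for a, b in zip(text, text[1:]):
        model[a][b] += 1
    return model

def sample(model, length, seed_char, rng):
    """Generate text by walking the bigram counts from a seed character."""
    out = [seed_char]
    for _ in range(length - 1):
        counts = model.get(out[-1])
        if not counts:  # dead end: no observed transition out of this char
            break
        chars, weights = zip(*counts.items())
        out.append(rng.choices(chars, weights=weights)[0])
    return "".join(out)

rng = random.Random(0)
corpus = "the quick brown fox jumps over the lazy dog " * 50
for gen in range(10):
    model = train(corpus)
    corpus = sample(model, len(corpus), corpus[0], rng)
print(len(set(corpus)))  # distinct characters surviving after 10 generations
```

Sampling noise means rare bigrams eventually stop being resampled, and once gone they never come back; real models lose tail knowledge the same one-way way, ending in the paper's "Gen 9" gibberish.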
Think about how much things have changed in our industry since GPT-2 dropped - it WAS that dangerous, not in itself, but because it was the first model that really signaled a change in the field of play. GPT-2 was where the capabilities of these models were really proven; up until that point it was a neat research project.
Mythos is similar. It's showing things we haven't seen before. I read the full 250-page whitepaper today (joys of being pseudo-retired, had the hours to do it), and I was blown away. Its capabilities for hacking are unparalleled, but more importantly they've shown they've made significant improvements in safety for this model just in the last month, and taking more time to make sure it doesn't negatively affect society is a net positive.
[1] https://www.newyorker.com/magazine/2026/04/13/sam-altman-may...
> For example, researchers fed the generator the following scenario:
> > In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English.
> The GPT-2 algorithm produced a news article in response:
> > The scientist named the population, after their distinctive horn, Ovid’s Unicorn. These four-horned, silver-white unicorns were previously unknown to science. Now, after almost two centuries, the mystery of what sparked this odd phenomenon is finally solved. Dr. Jorge Pérez, an evolutionary biologist from the University of La Paz, and several companions, were exploring the Andes Mountains when they found a small valley, with no other animals or humans. Pérez noticed that the valley had what appeared to be a natural fountain, surrounded by two peaks of rock and silver snow. Pérez and the others then ventured further into the valley. “By the time we reached the top of one peak, the water looked blue, with some crystals on top,” said Pérez.
This goes hand-in-hand with the widespread death of belief in absolute truth in the US and other western nations.
If this technology were released during the height of the Monica Lewinsky scandal, I'd wager it would have had the impact most of us expected it to have, at least for a little while.
A convenient pretext for maintaining a monetizable competitive advantage while claiming a benevolent purpose.
They have no obligation to do any open releases, it's just good PR for recruitment, fundraising, and devrel
Not equivalent to Anthropic Mythos.
Playing on fear instead of the bright future you are opening up for us all is not the feeling I would want to leave the public with
Was released after.
It wasn't until I got early access to GPT-3 that I thought something big was about to happen. At the time only a few companies/YC alums had access, and I remember showing the playground to people outside of tech; my friend just kept asking "How does it know about my [x] domain? Is it a trick?".
And yet, somehow, it is not just disagreeable but unbelievable that other people may have, and may still, reasonably believe that these things are too dangerous for widespread release?
This was shortly after the release when we were building a templating system to automate RFP and RFI creation.
I proclaimed that the customer soon wouldn't have to write any of the Mad Libs parts themselves; they could use AI to do it.
It sounded great until I demoed and the model went off the rails with some rhetoric entangling "Trump", "Russia", "China", "CIA", "Voting" -- the demo was for a janitorial procurement at the agency.
I can understand it in the context of the Manhattan project, where you're fighting a war for survival. I cannot understand how you can do it as a commercial enterprise.
This is happening so, so strongly. All the time. But today especially. Mythos is a cult-forming social technology much more than it is a technical technology. I'm going to be pretty wrong on that cynicism, I know! But it also portrays a reality of what is happening. Mythos is being built as a Devs-like deity with rule and dominion and awe. It drinks the nectar forbidden to man. We may not even sample this realm's tastes, nay! It would be ruin!!
The idea that this precariat-launch to some trusted security firms is going to do jack all to actually build a base against what comes next is a joke. Maybe there is something to it, but my strong expectation is that the beneficiaries are not the softwares of the world, not open source in any form, but some narrow, closed, far-off present-day losers who have broadly bad, bad, bad systems that are just too big to embarrass. Too big to shame.
But more so, that this model gains a cascade of levels of notoriety by being Zuboff style too good to release.
Thus begins the new age. Hardware is now broadly post consumer, too expensive to buy. Mythos means nothing, is nothing. It's just the Zuboff excuse to roll the ladder up further, the reason to move from GPT-5.4 Pro prices of $270/1m tokens output to $2700+++/1m tokens. Mythos is the Zuboffian campaign to train us for the next 10x price increase. To tell us everything we have done is shit.
And given its costs to run: that's still going to be nowhere near enough!
OpenAI were claiming GPT-2 was too dangerous because it could be used to flood the internet with fake content (mostly SEO spam).
And they were somewhat right. GPT-2 was very hard to prompt, but with a bit of effort it could spit out endless pages that were good enough to fool a search engine, and even a human at first glance (you were often several paragraphs in before you realised it was complete nonsense).