What a bizarre conclusion. It "just" cannot close the exploitation step? "Just?"
Developing the working exploit is the hardest part, not finding the bugs. A self-proclaimed security professional should know this.
How is this stuff even making it to the top of HN? Is it just the trendy Anthropic hate? I wonder if these folks will publicly walk back their statements if Mythos turns out to be legit.
It's a good model update. We've had these before, and it looks like OpenAI is gearing up to match it this week.
-
The Mythos launch has felt like a showman overplaying their hand.
Opus 4.5 put them in an awkward position: everyone went Opus-only, and suddenly Sonnet's quota was getting treated like you were asking people to use Haiku.
So a new pretraining run completes and instead of just releasing it as Opus 5, they stick the model in a new tier and name it Mythos Preview, while simultaneously launching Project Glasswing to literally build a mythos around the model.
Some people are even mistaking it for some completely new paradigm of model centered on cybersecurity, not realizing it's 'just' a new model tier; the cybersecurity stuff is separate.
While Mythos Preview is simmering, a Sonnet-sized distill gets launched as Opus 4.7, at Opus prices, which fixes the margins and compute needs of the Opus tier again.
Improved pretraining + progress on RL allow it to compete even though it's a smaller model, but some things still regress, like understanding nuance (hence the regression on Tau bench and agentic search).
-
It's clear they plan to price Mythos like they used to price Opus (so high that you don't see it as a strict replacement for the smaller tiers) and heal the compute crunch just a tad.
The main problem is OpenAI doesn't have to play these games.
They have compute, and GPT-5 is already a very parameter efficient model so they're just going to release their model without the fanfare and mystery.
Mythos might get deflated before they even get to cash in on all the fanfare they created. Unfortunate timing, really (if you're Anthropic).
smaller companies, even startups, are held to much, much higher standards
is anthropic somehow immune? what have they done to earn that immunity? what good will, good stewardship, good faith have they shown to the developer community in the past few quarters?
call a spade a spade
I'm not saying they're lying about it being a great model, they're just presenting a great model in a very intentional way. That way happens to be drumming up its cybersecurity skills, but those skills are present in all their previous LLMs too.
If you run Project Glasswing with Opus 4.7 instead of Mythos, it still works, just not as effectively... or honestly, maybe even more effectively once you account for final token cost! Since they'll likely want to squeeze better margins out of Mythos than out of the workhorse models, Mythos might be so expensive that just getting un-moderated access to 4.7 and throwing the same number of dollars' worth of tokens at various codebases uncovers more vulnerabilities!
But the latter half of that paragraph is assuming you're outside Anthropic: Anthropic is doing all of this at cost, so obviously the best model they can muster is the best option to offer.
-
The key is, Mythos isn't using scaffolding or an approach that no other model can meet the floor of. Some people jumped to small models, which is a bit of a stretch... but the best non-Mythos models can obviously be put in harnesses and used to find vulnerabilities at scale.
Part of the proof there is Anthropic themselves cranking up their cybersecurity request filters and going overboard with CC prompt injections.
There is healthy skepticism and then there is sticking your head in the sand. When companies and orgs with no financial interest in Anthropic issue a joint statement describing a problem, it is likely that the problem is real (unless you go off into wacky conspiracy territory.)
I think the real purpose of the Mythos security sham is to mask that Anthropic simply can't release their new model because their data centers are already on fire. There are so many other red flags pointing to this: the no-Claude-Code-for-Pro-users "test", the AWS data center rental deal, the fact Microsoft rug pulled hard on Copilot, specifically removing Opus... and that's just the past 2 days?
If it's indeed as bad as the article says, it's going to be (yet another) PR disaster, but it won't matter one bit, because the whole industry is compute-constrained, not reputation-constrained. You'll shout at clouds, at them, and at their competitors, and still be paying for tokens.
In what world does this author live where the system card is meant to be a scientific paper?
It's worth being skeptical, but it's nonsense to assume that the system card is meant for him or anyone else to be able to reproduce and determine what the model actually did or did not do. We won't know that until it is actually available.
https://www.flyingpenguin.com/ox-security-report-anthropic-m...
I'm sure the hyper-paranoid cybersecurity researchers are all about ensuring a well-behaved model stays well behaved.