This weekend I programmed a Matrix bot with encryption and a Rust agent with some tools, because I needed one and OpenClaw just felt... not what I wanted. Two days later and 20 dollars poorer, I have what I need: a multimodal agent written in Rust that has access to my homelab.
Nothing felt off with GLM. It did what I wanted, was fast, had a decent, not very annoying personality, and was much cheaper than Opus or GPT.
I used it unquantized through Fireworks, but there are multiple other providers too.
I kind of wanted to see if I could make a Matrix agent from scratch in Rust with GLM, and it was surprisingly easy. Just make something for myself the way I want it. Maybe I'll take a look at Hermes later...
https://swelljoe.com/post/will-it-mythos/
Also of note, I found giving models access to the open source semgrep as a tool makes some perform worse and none perform better, though it's plausible there's a way to wire it up in a harness that presents useful information to the model without the model having to know how to use it (my theory is that semgrep isn't heavily represented in the training data, so you're asking the model to do two things at once: figure out how to use semgrep and find security bugs, and both tasks suffer for the lack of focus...most small models, and some big models, can't do that well).
Edit: But, also, more testing is ongoing. I suspect GLM 5.2 will also be a consistently strong performer. It seems to excel at most things I've tested on it.
Don't worry though, open source evangelists will tell you that these will be running on your phone in the next 3 years.
For $100k you could run this model 24/7 through OpenRouter with 10 concurrent sessions at 50 tps for a decade and have money left over for a vacation. There's no point in investing this type of money in local models unless you have a business where you're already paying for many employees' individual token usage.
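A quick back-of-the-envelope check of the numbers in that claim, as a minimal sketch. It counts only generated tokens (prompt/input tokens are ignored), the function names are made up for illustration, and no actual provider price is assumed; the script just solves for the price at which a decade of usage would cost exactly $100k.

```python
# Back-of-the-envelope: how many tokens is "10 sessions at 50 tps for a decade",
# and what per-token price would make that cost exactly $100k?
# Assumption: only generated (output) tokens are counted; input tokens are ignored.

SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # ignores leap days


def total_tokens(sessions: int, tokens_per_second: int, years: int) -> int:
    """Tokens generated by `sessions` streams running nonstop for `years`."""
    return sessions * tokens_per_second * SECONDS_PER_YEAR * years


def break_even_usd_per_million(budget_usd: float, tokens: int) -> float:
    """Price per million tokens at which `tokens` costs exactly `budget_usd`."""
    return budget_usd / (tokens / 1_000_000)


tokens = total_tokens(sessions=10, tokens_per_second=50, years=10)
price = break_even_usd_per_million(100_000, tokens)

print(f"total tokens over a decade: {tokens:,}")
print(f"break-even price: ${price:.2f} per million tokens")
```

That works out to roughly 158 billion tokens, so the "money left over" part only holds if the blended price stays under about $0.63 per million generated tokens. Above that, the decade of API usage costs more than the $100k.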
8 x RTX6000 GPUs cost $100,000 alone. You then need to build a system that can support those GPUs with enough PCIe lanes through a PCIe switch.
It's going to be $120K to $150K to build or buy a system to run this.
Or even just electricity costs vs token cost
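For the electricity-versus-token-cost comparison, here is a minimal sketch of the arithmetic. The power draw, throughput, and electricity rate below are placeholders rather than measurements of any real rig, and the function name is hypothetical; plug in your own numbers.

```python
# Electricity cost per million tokens for a local rig.
# Every number passed in below is a placeholder, not a measurement.


def electricity_usd_per_million_tokens(
    power_watts: float, tokens_per_second: float, usd_per_kwh: float
) -> float:
    """Electricity cost to generate one million tokens at a steady rate."""
    hours = 1_000_000 / tokens_per_second / 3600  # wall-clock hours per 1M tokens
    kwh = power_watts / 1000 * hours              # energy used in that time
    return kwh * usd_per_kwh


# Hypothetical inputs: 5 kW at the wall, 500 tokens/s aggregate, $0.15 per kWh.
cost = electricity_usd_per_million_tokens(5000, 500, 0.15)
print(f"${cost:.2f} per million tokens in electricity alone")
```

With those placeholder inputs it comes out to about $0.42 per million tokens before counting hardware, cooling, or idle time, which is the figure to compare against whatever a provider charges per million tokens.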
The real gangstas are running 16x RTX6000s. Too rich for my blood, and the NVFP4 quant doesn't seem to be that much worse.
Personally, I’m waiting for hardware to hit the secondary market before I buy something to run unquantized models like GLM. But I have no doubt that I will, at some point.
Oil workers buy $100k trucks they don't do much with. Why not $100k in computers?
Isn’t the performance gap between quantized and full models indicative that even if you aren’t using it directly, the model knowing the colors in the Russian flag does have something to do with the intelligence you demand?
Likewise, LLMs do not violate the laws of information theory, and therefore the only way to encode X amount of information in Y amount of bits where X > Y is by performing what is effectively lossy compression, and as X grows larger relative to Y the compression ratio must change to lose ever more information.
Yes, for the sake of making chatbots that are "conversational" in that they can interpret natural language as input and produce code as output you can easily benefit in incidental and unintuitive ways by training it on more natural language text. But for a given fixed parameter size, it's possible to produce a better model for a specific task by selectively not muddying its training set in the first place with things that are likely irrelevant to the task.
assuming demand doesn't keep on increasing. even google has trouble having enough capacity apparently.
Claude Code is an agent harness, not an LLM.
Claude is a brand (or group of LLMs), not an LLM.
Opus 4.8/4.7 scored 28%
Opus 4.6 scored 37%
So the author thought, let's not get into that, just write Claude.
Wild guess - I wouldn't be surprised if Opus 4.6 was run quantized for a while, and 4.7/4.8 have QAT for that nerfed size.
GLM 5.2 is already capable enough to assist in self-training, which is similar to what we saw happen with frontier models, and they appear to be getting there at a significantly lower cost than OpenAI/Anthropic.
Not that it would make any sense.
Any prohibition on open source models will do nothing to fix the problem, since attackers will never feel bound by the law. All advanced models must be available for defensive purposes.
If the real motive is profit, then open source models are likely simply not a viable means to that end.
But that's the whole point.
Fall out of favor with the admin and you lose access to the good American models, aren't allowed to use Chinese ones, and fall prey to the attackers and behind your competitors.
Yes, you get your free model, but the cost of this is not developing your own capability and tying your fate to a country which may or may not have your best interests as a nation in mind.
This is just the deindustrialization that occurred in my home region (the American Midwest) playing out on a global scale in different sectors. It was originally driven by the Japanese, who, to their credit, acted more as partners than competition. Eventually that desire for larger margins went to China, and now you basically can't build anything of consequence without at least some Chinese parts, because there's "no economic case" for it. This means that you have to play Beijing's game if you want access to any sort of modern market.
You see this happening with Volkswagen's restructuring, next you'll see it with non-American, non-Chinese AI.
I’m sceptical they could find the legal framework to do this even if they wanted to
They have legal authority to (a) prevent export of US goods/services; (b) ban imports of physical goods; (c) ban transactions (including purchasing services or license agreements) with foreign firms
But I’m not aware of any legal authority which lets them ban US firms from running a Chinese-developed open source AI model in the United States, if they are at arm’s length from the vendor, and aren’t using it for government contracts or regulated applications
Possibly they could order HuggingFace/etc to suspend Chinese accounts. But if someone in the US (or a third country) downloads the model from China then reuploads it to a US server, completely independently of the vendor - where is the legal hook to prohibit that?
I agree, my only caveat is that the current administration has shown it's willing to go beyond aggressive regulatory interpretations to questionable and outright implausible interpretations. As we've seen recently, the federal courts and SCOTUS are overturning most of these but that can take a year or more to resolve. The one positive light is they seem to push the hardest on certain culture war issues (immigration, voting, districting, etc). AI doesn't seem like a core hot button issue for the White House and there is a strong pro-AI / business faction.
This would be extremely heavy-handed and would probably end up accelerating the loss of the virtual US monopoly on payment networks. The rest of the world isn't going to let the US dictate that only they get the frontier models, whether they're US-made or otherwise
Can they actually though? Do they have legal authority to tell a payment processor that it has to block transactions of a legal US company, just because the company is hosting a Chinese-developed open source model? I’m sceptical
And what about companies (e.g. AWS) that let you “bring your own model”?
US imposing export restrictions on a model from China?
The reason GLM-5.2 hasn't been banned is that despite these cherry-picked use cases, GLM-5.2 isn't even close to Opus across all use cases. These vibe benchmarks are run by companies that are not part of the cyber services offered by Anthropic and OpenAI, where they could use the models without the safeguards and refusals so their actual cyber capabilities can be utilized.
These guys that wrote the article compared a gimped Opus to GLM-5.2, knew full well it's misleading, and got the clicks regardless. They don't have enough clout to be a part of something like Project Glasswing, GPT Cyber, etc.
The weights are already available and downloaded, is it going to be a crime to have them, run them, make them available? Constitutional rights still exist (I hope)
Now you're getting it! Commerce will call it a munition and those harboring it as harboring illegal/foreign munitions.
No business will take the hit, so they will quickly deplatform the models.
No end user has the GPU capacity to use GLM 5.2 or similar models at full precision, so the government will call the problem "mostly solved." But they might choose to "make examples" out of a few people using p2p software to download the weights.
I'm for making software better instead of banning it based on what the rich and powerful claim.
I suspect the real fear is that open weight models undermine the financials and token prices they thought were going to pay off their ludicrous spending because they have all raced and raised hardware prices.
We're still in the middle of the cambrian explosion.
If Anthropic was capable of developing Opus 4.49-4.5 in 2H 2025.... then any company with a research team capable of reading all the papers and press releases will be capable of producing Opus 4.8 by the end of 2027, either in raw model competency, or in a harness like Claude Code (or better, with both). I guess what I am trying to say is that Opus 4.5 does not represent the edge of agentic capability, merely somewhere in the thick meaty layer of "functional and achievable".
We can draw the line at Sonnet 4.6 in the US but much like encryption export restrictions in the 1980s, the line drawn will be laughably low within a few years and simply unthinkable in a decade.
That would be the rational thing to do.
> financials and token prices
I do not think the government thinks this deeply. Market manipulation might be a rational, if unethical reason to ban open source models.
But this admin banned Anthropic models to "own the libs." They will continue to ban what they want for whatever reason they want. I don't think those reasons will be particularly coherent.
Yeah. Illegal numbers.
Instead of shilling for the LLM providers.
Secondly these are "just" IDORs, arguably the easiest class of vulnerabilities.
Thirdly it compares to GPT 5.5 and Opus 4.8.
No, we don't have Mythos at home.
mythos is <10% ahead of gpt 5.5 on all benchmarks, which it gains by being several times the size of opus. had it been economical to provide, it would've been released to the public on day one instead of the marketing circus those effective altruism clowns had exhibited. admitting that it costs >1000% to run inference on a <10% better model would've been very damning.
do you have a source for this claim? i thought LLM providers earn high margins from inference (charged by token). is this no longer the case?
The only ones who seem to profit are the ones running smaller Chinese models. Even NVIDIA seems to have to "reinvest" their profits into sponsoring companies to buy their cards now.
no one has a source, because no one knows closed model parameter counts. we have only heuristics which strongly indicate that Mythos is simply a big fucking model that any other lab could make an equivalent of.
> No, we don't have Mythos at home.
That's still useful. To paraphrase the kids these days, GLM5.2 is in the room with us, today. Mythos is not. And for us in the EU, it's even more complicated, as Mythos might be with us in the room one day, and go poof the next day, on the whims of political entities that we have 0 control over.
Knowing where open, accessible, local models are is important. We know they're behind. But there comes a time when "good enough" is useful. Even if they're "just IDORs" today, and even if they're behind SotA today.
As someone else said above, GLM5.2 (and other models in the same tier like kimi, dsv4, etc) is / are slowly becoming "good enough" to assist in automated repo preparation work (download, install, test, edit, re-test, etc). And that translates into RL traces ready to be trained into the next generations. That might be more important than being x% behind on benchmarks.
Beats which model in Claude? Whenever a "benchmark" doesn't put precise model numbers in their headlines I am immediately skeptical. Either they don't know the difference (bad) or they are benchmarking against weaker models (misleading, also bad).
It's like when studies say "AI is bad at X" and they used GPT-3.5 in current year.
How are we supposed to stay skeptical of everything if we read anything!?
What were they? Also, wouldn't one expect a more recently released coding agent (with a more recent knowledge cutoff) to perform better because it has access to more knowledge about vulns in these OSS projects, and possibly even has knowledge of your own "prior research"?
After installing, do a `n8 build` to build the image, then `n8 --danger --provider opencode interactive` to launch it in a container.
Sign up for GLM-5.2 here: https://z.ai
I think they give $5 trial credits to test with any of the open weight models.
The incentive to develop these Chinese models further is to trash the business case of most American AI labs.
This article only talks about detecting vulnerabilities, so it's unclear if it's a true Mythos equivalent.
I'd be mostly fine switching to it.
I just can't find a cost-effective way to do that. z.AI's coding plan is both overpriced and unreliable. ollama's is also overpriced. Paying by the token for it on openrouter etc. is more expensive than just having a Codex or Claude coding plan.
If you have to pay by the token, it's clearly cheaper. It's not competitive with a coding plan though.
Which I guess makes what semgrep sells obsolete. Unless they have built a pareto-optimal point in terms of capabilities and token usage maybe?
What explains it?
Is TFA lying? Is the most upvoted comment here lying?