The spigot can be turned off at any time.
Until there's some sort of "community owned hardware", open weights models are always at risk of being discontinued.
And there will always be incentivised parties that release models. Nvda for one has every incentive to keep the nemotron line going, as they're directly profiting from people running this. And the models aren't really far from open SotA anyway.
Goog will probably continue to release the small models, since they'll use them for browser stuff anyway, and know that they'll leak. So for them it's a win-win to release the small models and gain some dev market share.
And the Chinese labs also have incentives to keep releasing models, and will likely continue to get gov support to do so (yay commercial wars between nations).
Not really.
Your right to 3d print whatever you want is about to be taken away (in California).
What software you can run on your computer can already be restricted.
Absolutely everything can be taken away. The simplest way to remove open models is probably to declare them a tool that terrorists could use. Crazy? Yes, the world is totally crazy these days.
Open source and open hardware can be called illegal by a government, but, if we collectively invest our energy into open alternatives, they can't be taken away in the same sense. I can build a RepRap printer and I can use a local AI model. It's on all of us to make sure that the open alternatives are viable, maybe in the current global political reality now more than ever.
Making something illegal isn't a disincentive for everyone. When they start banning books, some of us start assembling printing presses.
* Drugs
* Media piracy
* Alcohol
* Sex work
* Unlicensed gambling
The government is not an all powerful entity with absolute control over its people. Even in countries under past and present dictatorship there are examples of people getting access to what the government deemed as illegal.
Plus for a certain type of person "piracy" is more of a philosophical belief or political position - there are very proficient "pirates", the equivalent of fundamentalists, who will under no circumstances stop and are not doing it for money. There is obviously also an enormous number who are in it for the money - "big brand names" now reportedly make up as much as 63% of the advertising on illicit piracy sites - I'm too lazy to get the link, but that sentence ought to be enough if you want to look into that bizarre reality.
I'm not certain either of those things are in the Government's direct control - both require society at large to share the belief and essentially choose not to do said activities.
(Regarding your second example, unfortunately most abusers are people children know, the Epstein Class was supposed to be just Q Anon crazy conspiracy stuff, none of this is ok in any fashion. Both exist, one local entirely beyond the government - the other appears to have incorporated people from government.)
My point is simply this - WE determine what the Government can do. What we believe matters more than anything else. Don't ever discredit The People's ability - we are pretty awesome.
It is on people to realize we have the ultimate power and oppose the overreach of government in all ways we can to keep our freedoms.
Freedom is not free, after all
Everything cannot, in fact, be taken away. Don't propagandize yourself. Some things, like information, are free. Not even China can prevent all its citizens from accessing Western internet. USGov simply does not have the resources to find and audit every hard drive and USB stick in the country for illegal files. The internet cannot be censored 100% without literally cutting every cable and confiscating every radio.
The software that runs on my computer cannot, in fact, be restricted. It can be declared illegal, but there literally is no mechanism by which it can be enforced other than a government goon standing over my shoulder 24/7.
Some freedoms really cannot be removed without utterly implausible amounts of effort. Arguing otherwise is helping to erode freedom. So stop it.
Are laws that are inherently unenforceable even laws?
In theory yes, but the average person can't really run the big open models.
This is already happening, try to find a provider that still hosts older, especially less popular or succeeded open models.
For me personally, I've been trying to access Kimi K2-0711. There seems to be only one provider left on openrouter (NovitaAI) and 3/4 requests error out
Their releases so far have been kind of lackluster compared to Qwen and other Chinese models. My suspicion is that Nvidia won't be releasing models that appear to compete with frontier models because that would upset their big customers.
This is pure speculation, but I have a hunch that the Nemotron line is intended as a shot across the bow, and that's why its capabilities have been strong but not quite open-frontier level.
A model that writes code without knowledge of any language or library changes from the last half decade is less useful. A 2021-era ChatGPT would be quite quaint in 2026.
Right now the Chinese labs might have incentives to release their models for free, and maybe Google is happy to release open weights today, but I'm sure there are already bean counters at Google salivating at the idea of having Gemini in Chrome as part of a Google AI monthly subscription just like YouTube premium and other Google subscriptions.
I think this matters less than you think. If the spigot turns off, open LLM research is going to have a powerful incentive to focus on post-training to refresh stale base models. And post-training, in general, is so much cheaper and faster than pre-training anyway. I was pretty surprised to learn that GLM-5.2's entire RL training (the part that makes it reliable at agentic tasks) was completed in just TWO DAYS.
Correction: The capabilities and knowledge of that model can be improved via self-distillation, so the value of that model increases over time.
This is where I think self-distillation is the main way forward, and probably the second best thing ever to happen to AI/LLMs after the transformer.
Based on self-distillation, the value of open weights models will increase over time for sub-specialization through post-training and fine-tuning.
Please check these very promising recent works and results from MIT/ETH, UCLA and Apple [1], [2], [3]. For example, the MIT/ETH self-distillation approach was demonstrated on a single H200 GPU. Apple's approach is so simple that it's called Simple Self-Distillation (SSD), pun intended.
[1] Self-Distillation Enables Continual Learning:
https://arxiv.org/abs/2601.19897
[2] Self-Distilled Reasoner: On-Policy Self-Distillation for Large Language Models:
https://arxiv.org/abs/2601.18734
[3] Embarrassingly Simple Self-Distillation Improves Code Generation:
It’s sad to think that Mozilla spent years and millions doing virtual reality and AI, they would have been perfect to do this but let’s face it - who knows if Mozilla will be around even 5 years from now
Plus I am certain it makes financial sense. I am guessing here, but fully utilizing a subscription's limits probably costs the operator more money than the subscription brings in; that is why Anthropic is making such a big stink about the Chinese data harvesting. By releasing the weights, you relieve yourself of that burden, because the competition does not need to hammer your subscription service: they can just download your model, analyze it, and run it all day.
Also, for the largest models it makes no sense to run them yourself unless you are a major player. Renting the hardware is ludicrously more expensive than their subscription: tens of thousands of dollars. And buying the hardware to run them costs hundreds of thousands of dollars.
The most popular LLM product in China is Bytedance's Doubao. You probably haven't heard of them since they never released weights and don't benchmark particularly well, but Bytedance already had enough users on its other apps that they could directly advertise Doubao to.
Open-source and open-weights models are how you can harness the potential of all humans to keep developing and improving the SOTA of your model. Literally every student on the planet wants to play with and improve these models for their own use case.
Plus the ecosystem: once you have users in the ecosystem around your open-weight model, that is a giant leverage point in itself.
I remain hopeful that we'll be able to democratize the entire tech stack for this tech.
I don’t think we should describe these companies as simply releasing these highly capable open weight models out of the goodness of their hearts
Among the countries that are consistently at the top in gross national happiness are Finland, Denmark, Iceland, Switzerland, and the Netherlands. Among them, the current ability to release open models is observable.
The USA unfortunately continues to fall quickly in the World Happiness Report rankings, and that's not because many other countries made great progress.
Or until some bright people figure out drastically more efficient means of training.
True. And it's possible that this has already happened at Alibaba Qwen - at least for the smaller models that people had a chance of running at home (122B and smaller).
It's highly unlikely we get another open Llama model after the Llama 4 flop, though, even if their Muse Spark seems pretty good.
[1] https://www.theinformation.com/articles/deepseek-using-banne...
Moreover, China has just demonstrated a supercomputer faster than any US supercomputer. Unlike the US supercomputers, which need GPUs, it achieves its high computational throughput with custom CPUs designed in China (implementing an Armv9-A ISA with SME, i.e. the Scalable Matrix Extension, and with BF16/INT8 operations for AI).
The CPUs used in that supercomputer can reach both a computational throughput and a memory bandwidth sufficiently high for training any LLMs (they have fast HBM memory). Their only disadvantage in comparison with the best NVIDIA GPUs is a slightly lower energy efficiency, but China has abundant cheap energy so this is not a serious disadvantage for them.
But consider the alternative. OpenAI and Anthropic can shut off your account or API key at any time for any reason. How is this better? You have way more security when you're running your own model.
Nobody cares if your AGI is 100% made out of neural networks or if it's like 50% neural networks and 50% perl scripts.
For a (Chinese) open-weight model to surpass the (US lab) frontier models, this equation must flip: the Chinese labs must entirely retool from harvesting frontier model data to building the data systems and efforts needed to produce novel data, as well as procuring latest-generation hardware en masse for this. This does not happen easily. Also, training a frontier-scale model is actually not such an unimaginable feat: doing all the inference with the teacher models is where the hardware goes.
You don't know what's happening in z.ai nor alibaba. And you don't know what's happening in anthropic and open ai.
I don't know what they are all doing, but I find it extremely unlikely that they are not all collecting data from one another. I am confident Anthropic has a team going over the GLM 5.2 weights, even if it's just to see where the competition is.
Just because some labs are getting data from Anthropic does not mean they are not also doing their own research.
They were focused on optimization because they could not get the best hardware. The only reason their top labs are behind may be that they did not have H200s and MI350s. And now they do.
Plus you are discounting other risks, Anthropic is currently sitting on "the best" models in the world because they got in a pissing match with the US administration.
btw: This could be the case in China as well; their administration has been surprisingly open on AI exports and open weight models, as far as we know. There is a very small but not trivial chance they are hogging a better version of GLM 5.2, for example, but no one is allowed to talk about it. Now I am not saying that is the case; I am saying the two cases (Chinese labs are 6 months behind, or they are forced to suppress their best models) are indistinguishable.
Even if your characterization is accurate, they could do this tomorrow and are not so myopic that they wouldn’t have thought about it. I don’t see this as a barrier, and I see a lot of the same underestimation of Asia that’s been happening for 50 years. There’s not some innate American advantage to building LLMs, and personally I think whatever head start the US has is going to be squandered on delays from the export control “too dangerous for release” LARPing we’re seeing.
Also I was responding to a claim about what will happen in less than 6 months (that’s about the edge of what you can meaningfully say too much about in this field).
These strategies take materially different resources; it’s not an overnight decision made by leadership. I suppose there is a natural experiment ongoing at Meta regarding this: it seems they recently moved a number of people into a division to produce such data overnight. So we will find out soon how quickly they climb the leaderboards.
But if they can stay on pace, within say 6 to 12 months of the bleeding edge of the American frontier models, that’s a huge problem.
If they can just piggyback on the Herculean efforts of Anthropic, OpenAI, Google etc., accept a little bit of lag, and save billions of dollars? Why wouldn’t they?
And for the end user, why would they pay a premium subscription price for something they can just wait six months for and run on their own hardware at home? In my opinion, this is the cat and mouse game that’s being played right now. And I suspect it’s intentional on the side of the open weight models. I would bet they are playing a war of attrition
Distilling even with small amounts of data from a better model is still helpful, but not in the sense of transferring capabilities the raw internet-trained model doesn't have at all, but for identifying those capabilities that are compatible with the servile assistant persona and suppressing others that are undesirable (e.g. trolling). A primitive version of this were instruction-tuning datasets generated with ChatGPT, as used e.g. for Alpaca.
Without a clear target to emulate, competitors might have to rely more on human raters, but there are plenty of data labeling companies in China, so that's hardly a hurdle.
The use of US models for Chinese model training is part of the motivation of all of this.
They don't even need to 'win' in the sense of maxing the benchmark. They can be 20% worse/50% cheaper and many of us (and our managers who approve our token budgets) will be in.
Deepseek is 30x cheaper for input/75x cheaper for output than sonnet on openrouter, and it's not a whole lot worse for many things.
It is enough to kneecap their pricing power to trigger the valuation reset by an order of magnitude and humble them a bit.
Plus there are always infrastructure and hardware providers who want to keep their share of profits and will squeeze Anthropic's margins to deflate their valuation (nvidia, aws, RAM manufacturers, etc)
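To make the 30x/75x figures quoted above concrete, here is a minimal sketch of how the overall saving depends on the input/output mix of a workload. Only the two ratios come from the comment; the function name, the baseline prices ($3 and $15 per million tokens) and the example workload are hypothetical placeholders, not quoted prices.

```python
def blended_cost_ratio(input_mtok, output_mtok,
                       base_in_price, base_out_price,
                       in_ratio=30, out_ratio=75):
    """Return how many times cheaper the discounted model is for a workload.

    input_mtok / output_mtok: workload size in millions of tokens.
    base_in_price / base_out_price: baseline price per million tokens
    (hypothetical values, supplied by the caller).
    in_ratio / out_ratio: how many times cheaper the other model is
    for input and output tokens (30x / 75x as quoted in the thread).
    """
    base_cost = input_mtok * base_in_price + output_mtok * base_out_price
    cheap_cost = (input_mtok * base_in_price / in_ratio
                  + output_mtok * base_out_price / out_ratio)
    return base_cost / cheap_cost


# Hypothetical workload: 1M input tokens, 0.2M output tokens,
# at assumed baseline prices of $3 (input) and $15 (output) per million.
ratio = blended_cost_ratio(1.0, 0.2, base_in_price=3.0, base_out_price=15.0)
print(f"overall: about {ratio:.1f}x cheaper")
```

Whatever the mix, the blended figure always lands between the two ratios: an input-heavy workload sits near 30x, an output-heavy one near 75x.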
1. It's unclear if there is a law of diminishing returns with ever-larger models. They're more expensive to run and for many applications, you'll probably find smaller models are sufficient;
2. There's an inbuilt market for local LLMs. This is an effective limit on how large models can get. Case law hasn't been established yet on, for example, if a law firm using ChatGPT breaks privilege. Specifically, chat logs may be discoverable. Medical applications have this issue too and I think you'll find that financial firms are going to be leery about this as well;
3. Better, larger models will bleed into smaller, open source models. The chat logs themselves are training data. There's a whole market in China for Claude tokens around this;
4. China has a national security interest in not being beholden to US tech giants when it comes to AI. China has a history of being able to commit to large-scale long-term projects and Anthropic just won't be able to compete with a national project by one of the world's superpowers, if it comes down to it;
5. Winning doesn't necessarily mean being the best. Often it's just being good enough;
6. As an example of a national project, China is busy replicating EUV because of the US ban on ASML and NVidia exporting their best stuff. I don't think many in the West are prepared for how rapid this will be. I'm reminded of the policy debate in 1945, when many in American policy and military circles thought the USSR would never catch up on the atomic bomb or, if they did, it would take 20+ years. It took 4 years. For the hydrogen bomb, it took 1. The US hardware advantage is a lot more tenuous than many realize.
If the closed models stop improving will the progress of open models slow?
The Americans should wake up to reality. The fantasy repeated continuously across all Internet media, that the Chinese supposedly copy US technology and so will never be able to surpass it, was true many years ago. But it has been false for many years already, and there are now many domains where the USA would have to copy Chinese technology if it does not want to remain behind.
Among other "sanctions", USA has forbidden the export to China of high-performance computing devices, but this has backfired as China has just demonstrated a supercomputer that is faster than any US supercomputer and which uses custom CPUs designed in China, apparently by Huawei, the company that was the main target of the US efforts to sabotage the Chinese competitors.
The US "sanctions" have hurt China for a few years, but they have convinced them that they must allocate resources to become able to make themselves everything that they previously bought from USA. The result is that now China has become stronger and USA weaker.
USA should have never sold technology to China a quarter of a century ago and then the power relationship between the 2 countries would have been very different. But even 5 years ago it was already too late for any US "sanctions" to have lasting effects. Nowadays any hopes that US "sanctions" will keep China in the dark ages are pathetic.
With the kind of policies that are promoted by the US government, the chances that USA will keep its leading position in AI are minimal.
Some people in China surely know.
> Like if the closed models stop improving will all the closed models also stop improving?
Seems extremely unlikely, unless the models all hit some kind of wall soon. The Chinese companies may be behind the US in compute capacity, but they have excellent researchers [0] who are probably approximately as good as their US counterparts at the kind of problem generation and RL that is currently working so well.
I would be very surprised, though, if the models cannot continue to be improved rapidly in any area that allows a tight feedback loop like programming, at least up to the point where we puny humans lose the ability to define objective functions.
(And, conversely, I don’t expect magic in fields where the feedback is slow or expensive. A model is not about to reliably invent a wonderful medicine for the same reason that a large and extremely competent pharma company cannot: the evaluation process is extremely slow and it’s so expensive that the kind of utterly enormous corpus that is driving the current progress in coding is simply not available. Running RL on m iterations of n medication-development trajectories each is going to cost n*m times $10-100 million and take m years if it’s even possible at all.)
[0] The US advantage in this space will likely decline, since the brain drain from the rest of the world via the US university system to US labs is drying up.
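As a rough illustration of the n*m estimate in the parenthetical above, here is a minimal sketch that just multiplies the numbers out. The $10-100 million per-trajectory range and the one-year-per-iteration pace come from the comment; the function name and the example values n=100, m=10 are hypothetical.

```python
def rl_campaign_cost(n_trajectories, m_iterations,
                     cost_per_trajectory_usd, years_per_iteration=1.0):
    """Total cost and wall-clock time of m RL iterations of n trajectories.

    Assumes iterations run one after another (each needs the previous
    results) while the n trajectories inside an iteration run in parallel.
    """
    total_cost = n_trajectories * m_iterations * cost_per_trajectory_usd
    total_years = m_iterations * years_per_iteration
    return total_cost, total_years


# Hypothetical campaign: n = 100 trajectories per iteration, m = 10 iterations.
low, years = rl_campaign_cost(100, 10, 10_000_000)    # $10M per trajectory
high, _ = rl_campaign_cost(100, 10, 100_000_000)      # $100M per trajectory
print(f"${low / 1e9:.0f}B to ${high / 1e9:.0f}B over {years:.0f} years")
```

Even this small hypothetical campaign costs tens of billions of dollars and takes a decade, which is the commenter's point about slow, expensive feedback loops.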
I think it's much more immediate/present: the weights and the information breach significant strategic controls on national data and posture, which can be back-derived from the models. If you can analyse a model, you can infer what structural inputs dictate it.
China, an "authoritarian state", "the antonym of freedom", with a software industry that is especially capitalist, has produced all the competitive open-weight models.
It really is IRONIC.
Disclosure: I am Chinese, and I understand this strategy comes from being behind, using open source as an asymmetric way to compete and make up for missing compute by sharing the burden, etc. But still, it is very ironic.
[1] The story: https://nob.cs.ucdavis.edu/classes/ecs153-2019-04/readings/s...
[2] Wikipedia: https://en.wikipedia.org/wiki/Superiority_(short_story)
Not the same thing.
It’s used right in the article’s body, but the title is misleading.
The name is bad, doesn’t even make any fucking sense and it gives open source a bad rep.
I gave up. No one cares. And no one will ever tell the truth about the training anyways.
Substantial and growing freedom beats zero freedom ever again.
LLMs are an undeniably valuable tool, and governments like to control those.
On paper, frontier models will be ahead of the curve, but I think hardly anyone will be able to tell whether a piece of work, say a landing page, was created with Fable or GLM, and that is the point. The perceptible intelligence will reach a point beyond which it is no longer considered, except for some narrow use cases.
I think it's entirely the opposite. For narrow use cases, like web pages and crud/GUI, the open source models don't show much of a difference.
In this case it may actually apply though, no? Open models get better from closed model distillation?
The question is not whether they'll prohibit open-weight models better than the US ones, because we all know the obvious answer.
gemma4-26B (#7)
qwen-3.6-27B (#9)
Certainly the gap is closing but I feel it still makes more sense to pay pennies to run the full sized open models hosted on much better hardware.