Anthropic announces proof of distillation at scale by MiniMax, DeepSeek, Moonshot (twitter.com)

105 points by Jimmc414 4 hours ago | 46 comments

logicprog4 hours ago
Just reading the headline, I say good.
A) These models are trained by ignoring IP. It is hypocritical and absurd to then try to assert IP over them. And I am for the destruction of IP on all ends.
B) What this essentially means is that the Chinese labs are taking the work of these mega corporations and making it freely accessible to other labs and businesses, to serve inference, fine tune, and host privately on prem. That's clearly a good thing for competition in the market as a whole.
C) I don't see why we should have to duplicate the massive energy and infrastructure investment of building foundation models over and over forever just because we want to preserve the IP rights of a few companies. That seems a shame and it seems better to me for everything to learn from everything else for the whole ecosystem to get better by topping each other and building off each other; that's also why publishing research into the architecture and training of these models is so much better than what the proprietary labs do (keeping everything a secret), although tbf Anthropic's interpretability research is cool.
D) These Chinese models give 90% of the performance of frontier proprietary models at a 10th or 20th of the cost. That seems like a win for everyone. Not to mention the fact that this distilling also allows them to make much smaller local models that everyone can run. This is a win for actual democratization, decentralization, and accessibility for the little guy.
- spudlyo3 hours ago
  > And I am for the destruction of IP on all ends.
  While I'm not unsympathetic to the plight of creatives, and their need to eat, I feel like the pendulum has swung so far to the interests of the copyright holders and away from the needs of the public that the bargain is no longer one I support. To the extent that AI is helping to expose the absurdity of this system, I'm all for it.
  I don't think "burn it all down" is the answer, but I'd love to see the pendulum swing back our way.
  - paxys3 hours ago
    Because copyright laws rarely serve small independent creatives, but rather corporations like Disney that are in the business of hoarding and monetizing culture.
    logicprog3 hours ago
    Yeah, I would argue that, just systemically, intellectual property laws can't really do anything but overwhelmingly serve the interests of the wealthy and mega corporations. I also think they're ethically wrong and run counter to the kind of artistic and information culture that I would prefer, but those are arguments more people are likely to disagree on.
    spudlyo2 hours ago
    I think most people would argue that dismantling intellectual property would mean the end of all new creative endeavors, as if humanity is only driven to create art for practical reasons.
    Schopenhauer, on the other hand, would argue that true art must serve absolutely no practical or utilitarian purpose, and that pecuniary concerns only corrupt artistic and intellectual labors leading to mediocrity and dishonesty.
    paxys2 hours ago
    Copyright laws as we know them came into being sometime in the 18th century. The earliest recorded works of art produced by humans are from 40,000-45,000 BC. So it's hard to take the "we'll never have creative output without strict copyright!!" extremism seriously.
    logicprogan hour ago
    > Schopenhauer, on the other hand, would argue that true art must serve absolutely no practical or utilitarian purpose, and that pecuniary concerns only corrupt artistic and intellectual labors leading to mediocrity and dishonesty.
    As, similarly, would Bataille, one of the philosophers I'm interested in!
    DamnInteresting30 minutes ago
    While true, 'rarely' ought not be conflated with 'never.' I am a small, independent creator, and I've used copyright laws many times over the years to stop larger entities from raiding my catalog for content. Of course now Anthropic et al. are gobbling up such catalogs for indirect misappropriation, with no sign of consequences, so perhaps copyright has truly shrunk to a one-way street favoring the major players.
- jsheard3 hours ago
  They're trying to kidnap what Anthropic has rightfully stolen!
  Jokes and complete lack of sympathy aside, it does complicate the narrative that these small labs are always on the heels of the big labs for pennies on the dollar, if they rely on distilling the big labs models. That means there still has to be big bucks coming from somewhere.
  - Imustaskforhelp3 hours ago
    I don't see Z.ai (GLM 5) in the list though. I consider Qwen and Kimi to have a close relationship, so I can't be sure, but Qwen might be using Kimi data (I have written another comment in more depth).
    I still prefer Kimi fwiw. It's one of the best open source models I have seen. When I tried GLM 5 on its launch day it really was lacklustre for me, but I will have to compare the two myself now, as I do see GLM 5 doing some good things in benchmarks, though we all know benchmarks should be trusted less.
    I still think there is some hope for Chinese models even after this, i.e. seeing GLM 5, they aren't completely dependent on the large models.
    I am seeing an accusation of GLM 5 doing distillation[0] but I am not seeing any hard evidence of it.
    [0]: https://mtsoln.com/id/blog/wawasan-720/the-temu-fication-of-...
- impulser_2 hours ago
  It's greed, now that they have all the data and infrastructure they are pulling up the ladder.
  Why do you think not a single one of these labs has released an open source model distilled from their own SOTA model?
  They are all preaching they want to provide AI to everyone, wouldn't this be the best way to do this? Use your SOTA model to produce a lesser but open source model?
- ashertrockman3 hours ago
  A) The "IP" they're concerned about isn't the same IP you speak of. It's the investment in RL training / GPU hours that it takes to go from a base model to a usable frontier model.
  B) I don't think the story is so clean. The distilled models often have regressions in important areas like safety and security (see, for example, NIST's evaluation of DeepSeek models). This might be why we don't see larger companies releasing their own tiny reasoning models so much. And copying isn't exactly healthy competition. Of course, I do find it useful as a researcher to experiment with small reasoning models -- but I do worry that the findings don't generalize well beyond that setting.
  C) Maybe because we want lots of different perspectives on building models, lots of independent innovation. I think it's bad if every model is downstream of a couple "frontier" models. It's an issue of monoculture, like in cybersecurity more generally.
  D) Is it really 90% of the performance, or are they just extremely targeted to benchmarks? I'd be cautious about running said local models for, e.g., my agent with access to the open web.
  - _aavaa_23 minutes ago
    > Maybe because we want lots of different perspectives on building models, lots of independent innovation.
    That’s only really possible if the front runners don’t buy up all of the chips on the market.
  - logicprog3 hours ago
    Fair points, and worth responding to for a more nuanced discussion! I hope you take these responses in that light :)
    A) Well, sure, yes, it's different specific IP being distilled on versus what was trained on. But I don't see why the same principles should not apply to both. If companies ignore IP when training on material, then it should be okay for other companies to ignore IP when distilling on material — either IP is a thing we care about or it isn't. (I don't).
    B) I'm really not sure how seriously I take the worries about safety and security RL for models. You can RL a model to refuse to hack something or make a bio weapon or whatever as much as you want, but ultimately, for one thing, the model won't be capable of helping a person who has no idea what they're doing do serious harm anyway. And for another thing, the internet already exists for finding information on that stuff. And finally, people are always going to build the jailbreak models anyway. I guess the only safety related concern I have with models is sycophancy, and from what I've seen, there's no clear trend where closed frontier models are less sycophantic than open source ones. In fact, quite the opposite, at least in the sense that the Kimi models are significantly less sycophantic than everyone else.
    C) This is a pretty fair point. I definitely think that having more base frontier models in the world, trained separately based on independent innovations, would be a good thing. I'm definitely in favor of having more perspectives.
    But it seems to me that there is not really much chance for diversity in perspectives when it comes to training a base frontier model anyway because they're all already using the maximum amount of information available. So that set is going to be basically identical.
    And as for distilling the RL behaviors and so on of the models, this distillation process is still just a part of what the Chinese labs do — they've also all got their own extensive pre-training and RL systems, and especially RL with different focuses and model personalities, and so on.
    They've also got diverse architectures and, I suspect, very different architectures from what's going on under the hood at the big frontier labs, considering, for instance, that we're seeing DSA and other hybrid attention systems make their way into the Chinese model mainstream, and there's stuff like high variation in size, sparsity, and so on.
    D) I find that for basically all the tasks that I perform, the open models, especially since K2T and now K2.5, are more than sufficient, and I'd say the kind of agentic coding, research, and writing review I do is both very broad and pretty representative. So I'd say that for 90% of tasks that you would use an AI for, the difference between the large frontier models and the best open weight models is indistinguishable just because they've saturated them, and so they're 90% equivalent even if they're not within 10% in terms of the capabilities on the very hardest tasks.
    ashertrockman2 hours ago
    Yeah of course, I've been thinking about this a lot and I'm updating my beliefs all the time, so it's good to hear some more perspectives
    A) I see what you mean. But I'm more so thinking: companies consider their models an asset because they took so much compute and internal R&D effort to train. Consequently, they'll take measures to protect that investment -- and then what do the downstream consequences look like for users and the AI ecosystem more broadly? That is, it's less about what's right and wrong by conventional wisdom, and more about what consequences are downstream of various incentives.
    B) I don't really care about AI safety in the traditional sense either, i.e., can you get an LLM to tell you to do something that has been ordained to be dangerous. There's lots of attacks and it's basically an insoluble problem until you veer into outright censorship. But now that people are actually using LLMs as agents to _do things_, and interact with the open web, and interact with their personal data and sensitive information, the safety and security concerns make a lot more sense to me. I don't want my agent to read an HN post with a social-engineering-themed prompt injection attack and mail my passwords to someone. (If this sounds absurd, my Clawbot defaulted to storing passwords in a markdown file... which could possibly be on me, but was also the default behavior.)
    C) This is a completely fair point, there's amazing work coming out of these smaller labs, and the incentives definitely work out for them to do a distillation step to ship faster and more cheaply. I think the small labs can iterate fast and make big changes in a way that the monolithic companies cannot, and it'd be nice to see that effort routed into creating new data-efficient RL algorithms or something that pick up all the slack that distillation is currently carrying. Which is not to say they're doing none of that, GRPO for example is a fantastic idea.
    One way you could have a change in perspective is not just in the architecture/data mix, but in the way you spend test-time compute. The current paradigm is chain-of-thought, and to my knowledge, this is what distillation attacks typically target. So at least, all models end up "reasoning" with the same sort of template, possibly just to interlock with the idea of distilling a frontier API.
    D) Interesting to hear. In my research, I find these models to be quite a bit harder to work with, with significantly higher failure rates on simple instruction following. But my work also tends to be on the R&D side, so my usage patterns are likely in the long-tail of queries.
    logicprogan hour ago
    Thanks for the response!
    > it'd be nice to see that effort routed into creating new data-efficient RL algorithms or something that pick up all the slack that distillation is currently carrying
    It seems to me like they're already doing that. Some of the most fun I've had actually is reading their papers on the different RL environments, especially the agentic ones they set up, and the various new algorithms they use to do RL and training in general. Combine that with how much they are innovating with attention mechanisms and I feel like distillation isn't really replacing research into these means so much as supplementing it, and maybe even making it possible in the first place, because otherwise it would be simply too expensive to get a reasonably intelligent model to experiment with!
    > But now that people are actually using LLMs as agents to _do things_, and interact with the open web, and interact with their personal data and sensitive information, the safety and security concerns make a lot more sense to me.
    Ah, I see what you mean. Can you point me to any benchmarks or research on how good various models are at avoiding social engineering and prompt injection attacks? That would be extremely interesting to me. Fundamentally, though, I don't think that's really a soluble problem either, and the right approach is to surround an agent with a sufficiently good harness to prevent that. Perhaps with an approach like this:
    https://simonwillison.net/2023/Apr/25/dual-llm-pattern/
    Or this, which builds on it with more verifiable machinery, if you're less bitter-lesson pilled (like me):
    https://simonwillison.net/2025/Apr/11/camel/
    > That is, it's less about what's right and wrong by conventional wisdom, and more about what consequences are downstream of various incentives.
    Ahhh, I see. Yeah, that could be negative. That's worth thinking about.
impulser_4 hours ago
Why would anyone care about this at all?
MiniMax, DeepSeek, and Moonshot are all releasing models for the public to use for free.
Anthropic, OpenAI, Google, etc. have been scraping information to train their models that they had no right to scrape, yet when these companies pay them to scrape data we are supposed to be worried?
Labs like Anthropic always preach we are trying to build AI for everyone while releasing expensive models that are closed source.
The only reason AI is affordable at all is because of these Chinese AI labs.
- lumost4 hours ago
  Also - how can this be prevented? the AI labs can't seriously expect that each lab will filter LLM generated content from their training sets based on the source model. Leakage of AI behavior into public datasets is inevitable.
- reactordev4 hours ago
  Turn the lens the other way around. By publicly posting that these models violate IP and anyone can run them, they are painting a specific political picture here…
- NitpickLawyer4 hours ago
  > Why would anyone care about this at all?
  Anthropic have been the loudest in pushing for regulatory capture, often citing "muh security" as FUD. People should care what they write on this topic, because they're not writing for us, they're writing for "the regulators". Member when the usgov placed a dude in solitary confinement because they thought he could launch nukes with a whistle? Yeah... Let's hope they don't do some cray cray stuff with open LLMs.
  Anthropic make amazing coding models, kudos for that. But they should be mocked for any communication like the one linked. Boo-hoo. Deal with it, or don't, I don't care. No one will feel for you. What goes around, comes around. Etc.
  - bigyabai3 hours ago
    Administratively, Anthropic seems to misunderstand politics. You don't get to wear the "people's champion" and "government sweetheart" hats at the same time, when push comes to shove you'll be forced to pick a lane. We saw it with Microsoft, we saw it with Apple and Google, and now we're seeing it with OpenAI too. You can't drive down both paths at the same time.
    As a member of the target audience for Claude, their messaging just leaves me confused. Are you a renegade success, or do you need the government's help? Are you a populist juggernaut, or do you hide from competition? OpenAI, for all their myriad issues, understood this from the start and stuck to the blithely profitable federal ass-kisser route.
- PlatoIsADisease4 hours ago
  Go free stuff! But... no one is running 400B models on their computers.
  You are just giving them data instead. It's not like China is known to protect IP. Your data is going to be used against you, and we can't use western laws to keep it safe.
  - impulser_4 hours ago
    Yeah they do.
    https://openrouter.ai/minimax/minimax-m2.5/providers https://openrouter.ai/z-ai/glm-5/providers https://openrouter.ai/moonshotai/kimi-k2.5/providers
    PlatoIsADisease3 hours ago
    This is not 'running it on our own computers'.
    But I like the option to give my data to a rando rather than one of the big 5 US companies that can get sued. At least the rando probably has no idea what to do with 10M of my customer's IP.
    Actually... thank you for the links. Unironically.
  - SlavikCA4 hours ago
    So, only Americans can use data against others?
    By the way, I'm running a 400B model on my computer with 72GB VRAM: Qwen3.5-397B-A17B-GGUF/UD-Q4_K_XL getting 13 t/s. Subjectively, I feel it runs at the level of Anthropic Claude, just slower.
    PlatoIsADisease3 hours ago
    Question for you, that 13t/s, is that pretty solid even with high context/tokens?
    I know Apple marketing says 'look at our 20t/s' but they sent less than 40 tokens.
  - selfhoster11an hour ago
    It doesn't take much hardware. I have run larger models.
- LZ_Khan4 hours ago
  If you care about improvement of models, you would support the US labs here.
  It costs hundreds of millions of dollars to train a frontier model. It's not just "scraping the web."
  Distillation allows labs to replicate these results at 1/100th of the cost. This creates a prisoner's dilemma which incentivizes labs to withhold their models from the public.
  - ElevenLathe4 hours ago
    How much did it cost to produce all the data on the internet and every book ever published? Surely even the most conservative calculations put it at multiple years of planetary GDP. The same argument can be made to say that letting the big labs get away with pirating it will disincentivize people to publish anything.
    ceroxylon3 hours ago
    I personally have stopped publishing publicly, since my research is still on the fuzzy boundary of AI's current knowledge, my website gets scraped daily, and I don't want to contribute to paid models for zero acknowledgement or compensation.
    Imustaskforhelp3 hours ago
    > I personally have stopped publishing publicly, since my research is still on the fuzzy boundary of AI's current knowledge, my website gets scraped daily, and I don't want to contribute to paid models for zero acknowledgement or compensation.
    I don't know your work, so pardon me, but thinking about it, would a better solution be gated communities, say on Matrix, XMPP, or IRC?
    I suppose that setting up scraping bots for Matrix would be quite hard for AI companies? But anyone interested in reading your content could still find it, plus you get the additional benefit of a community of like-minded people.
    piva003 hours ago
    Not only publishing, it has already disincentivised a huge part of what made Web 2.0: public APIs for data access to platforms.
    It was amazing to be able to create some toy projects using data from big platforms, now they're all afraid LLM trainers will scrape their contents and create a competitor to their moat, the data.
    It just sucks at many different levels.
  - contravariant4 hours ago
    If 'we' really cared about the improvement of models all of them would be public.
    Anything else just proves someone prefers making money to improving the models.
  - falcor844 hours ago
    > incentivizes labs to withhold their models from the public.
    Does it really? How would they get revenue if they withhold their models? And doesn't economics generally say that if it's easier for your competitor to catch up, you have a higher incentive to maintain your lead?
  - wpm4 hours ago
    > If you care about improvement of models, you would support the US labs here.
    I guess I don't care then.
    LZ_Khan4 hours ago
    that's fair!
  - falcor844 hours ago
    I think that the bigger conversation to be had here is about the environmental damage - if by using distillation we can really train new models at 1% of the cost in energy, it is ethically imperative that we do this.
  - bigyabai4 hours ago
    This reads a bit like over-moralizing to me. US labs will continue improving their models because they have to make money in a competitive market. Chinese distillations have arguably improved the status-quo, with Qwen and R1 forcing GPT-OSS to be released to the public. American businesses are competing, and American customers are getting better products because of the competitive pressure on them.
    Your purported "prisoner's dilemma" hasn't happened yet to my knowledge, instead we seem to see the opposite. The high-speed development velocity has forced US labs to release more often with less nebulous results. Supporting either side will contribute to healthier competition in the long run.
  - YetAnotherNick4 hours ago
    > incentivizes labs to withhold their models from the public.
    This is the only way they make money.
paxys4 hours ago
It's crazy for their official account to post this when Anthropic itself is fighting multiple high-profile lawsuits over its unauthorized use of proprietary content to train its models. Did no one run this by legal?
- notatoad3 hours ago
  i don't see them making any claim that unauthorized use of their proprietary content is illegal.
  i read this more as a claim to protect their brand and valuation. They just want us all to know that we shouldn't be too impressed by deepseek, because deepseek is training off claude.
  also, i think this blog post should be read in the context of anthropic execs meeting with Pete Hegseth today - this isn't legal, it's political, they're playing up the national security aspects here for some political benefit.
  - ndiddy44 minutes ago
    Yeah the blog post isn't saying distillation is illegal, it's saying that it should be illegal:
    > These [distillation] campaigns are growing in intensity and sophistication. The window to act is narrow, and the threat extends beyond any single company or region. Addressing it will require rapid, coordinated action among industry players, policymakers, and the global AI community.
    > Illicitly distilled models lack necessary safeguards, creating significant national security risks. Anthropic and other US companies build systems that prevent state and non-state actors from using AI to, for example, develop bioweapons or carry out malicious cyber activities. Models built through illicit distillation are unlikely to retain those safeguards, meaning that dangerous capabilities can proliferate with many protections stripped out entirely.
    > Anthropic has consistently supported export controls to help maintain America’s lead in AI. Distillation attacks undermine those controls by allowing foreign labs, including those subject to the control of the Chinese Communist Party, to close the competitive advantage that export controls are designed to preserve through other means.
  - paxys3 hours ago
    > industrial-scale distillation attacks
    > fraudulent accounts
    These are not terms used to describe regular users of your software.
    This tweet is 100% going to show up in court (whether in the current crop of cases or future ones) as an example of Anthropic accepting that copyright infringement and unauthorized use hurts their business as an IP holder.
- B1FF_PSUVM3 hours ago
  I'm curious about the "created over 24,000 fraudulent accounts". They didn't pay?
  - lostmsu3 hours ago
    They violated TOS
    devnonymous3 hours ago
    Are you sure about that ? Because the very next tweet says:
    > Distillation can be legitimate: AI labs use it to create smaller, cheaper models for their customers.
    davidgomes3 hours ago
    That's just in reference to the technique itself. They're basically saying it's okay for Google to use distillation to train Gemini N Flash using Gemini N-1 Pro (which they do).
cs7024 hours ago
It's been known for a long while that model outputs = data for training another model to copy the original model's behavior, also known as distillation.
What I didn't know is that the three groups mentioned "created over 24,000 fraudulent accounts and generated over 16 million exchanges with Claude, extracting its capabilities to train and improve their own models." There's some irony in that, given that Anthropic and all other established AI shops have been criticized for using copyrighted materials without permission to train their own models. I wouldn't be shocked if we subsequently find out that every major AI shop has secretly engaged in distillation at some point in the past.
Still, wow, 24,000 accounts. I can't help but wonder, how many other AI shops have surreptitious accounts with other AI shops right now?
- lejalv4 hours ago
  So they did pay to distill a piratic model.
  More than can be said for Anthropic et al.’s leeching of a substantial proportion of human culture
  - joquarky3 hours ago
    The real cultural leeches are the corporations that kept extending copyright terms to the point that a kid can never create derivative works of their favorite show.
- lumost4 hours ago
  Also makes you wonder how much of the user growth could just be distillation attempts from one model vendor to another.
  - cs7024 hours ago
    Yeah, those 24K accounts are likely high-volume.
    16M "exchanges" / 24K accounts = 667 "exchanges" per account.
    How many tokens per "exchange"? I imagine a lot, because these accounts were likely maxing out on context.
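A quick check of the arithmetic above, as a rough sketch. The 16 million exchanges and 24,000 accounts are the figures quoted from the announcement; the tokens-per-exchange value is a purely hypothetical placeholder, since nothing in the thread gives a real number.

```python
# Figures quoted from the announcement.
TOTAL_EXCHANGES = 16_000_000
ACCOUNTS = 24_000

# Hypothetical assumption for illustration only; not a reported number.
ASSUMED_TOKENS_PER_EXCHANGE = 10_000


def exchanges_per_account(total_exchanges: int, accounts: int) -> float:
    """Average number of exchanges per account."""
    return total_exchanges / accounts


def total_tokens(total_exchanges: int, tokens_per_exchange: int) -> int:
    """Total token volume under an assumed per-exchange token count."""
    return total_exchanges * tokens_per_exchange


if __name__ == "__main__":
    # 16,000,000 / 24,000 = 666.67, which rounds to 667.
    print(round(exchanges_per_account(TOTAL_EXCHANGES, ACCOUNTS)))
    print(total_tokens(TOTAL_EXCHANGES, ASSUMED_TOKENS_PER_EXCHANGE))
```

Changing `ASSUMED_TOKENS_PER_EXCHANGE` shows how strongly the total volume depends on how close each exchange ran to the context limit.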
- taytus4 hours ago
  This reads like AI slop.
falcor844 hours ago
Interesting, and my main take away is that ~16 million sessions is enough to distill Claude. That's extremely doable - obviously, as it's been done repeatedly - but it just looks very feasible in general.
If I think of the number of lessons and educational conversations that a human would have to acquire their lifetime knowledge, I would hazard to say that AI-to-AI learning no longer requires many orders of magnitude beyond that.
- Imustaskforhelp2 hours ago
  I wonder if more companies from different countries will become interested in distillation efforts.
  Because a huge downside of Chinese models is that they are Chinese models, with the Tiananmen Square, Tibet, and other issues.
  Yet everyone uses them because they thought these models were insanely hard to build. Obviously I am not trying to downplay that: even now it's an incredible accomplishment to have created such good open source models and to provide them at competitive rates.
  Now that we know it might be easier than previously thought, would more countries, say South Korea, Japan, or India, want to enter the market as well, without the bias on certain topics that is raised about Chinese censorship every time a new model is discussed?
  It's a huge risk/reward ratio thing. From what I can tell, inference is extremely profitable (Deepseek was profitable at inference fwiw), so perhaps more countries could try to create their own "Deepseek", where they would focus on having brand value plus open source/selling to enterprise.
  Mistral is a good example of that, especially with their enterprise-related contracts. Speaking of Mistral, are they doing distillation too or not?
MiSeRyDeee3 hours ago
Kudos to them then, for doing such a good job at distillation. Only 16 million chats (shared by multiple labs/models) needed for distillation to get mostly on-par performance at 1/10th to 1/50th of the cost. Keep up keeping up!
throwfaraway44 hours ago
Company that rips-off creators to build their product complains other companies are doing the same to them.
xanthor4 hours ago
Ironic phrasing used here. China is the only country that actually has the capacity to deeply integrate AI into industrial manufacturing in a way that will reduce costs of goods. They already have lights-off autonomous factories without AI.
aquir4 hours ago
Not nice but the frontier labs "distilled the whole internet" using the common crawl.
- Sol-4 hours ago
  Which is a more transformative and creative act than just copying their outputs.
iagorodriguez4 hours ago
I was not emotionally prepared for this level of humor today, it's Monday, please!
Zufriedenheit3 hours ago
The companies scraping the whole internet without caring for anyone’s ToS and illegally torrenting every single book ever written are now complaining about the output of their models being used for training. That is very ironic.
armcat3 hours ago
This is such an insane rabbit hole. AI labs distill weights from the entirety of the internet knowledge, (mostly) without anyone's consent, which (technically) amounts to theft. However the chinchilla law dictates you need to expend X amount of energy to make this knowledge useful. Then the data law dictates that you need to shift the weights to a more useful latent space by paying maths, coding and domain experts lots of money. So you have "stolen" the data, but then paid billions to make it useful. And useful it is!
Then another lab comes, and "steals" from you - that beautiful, refined dataoil - by distilling your weights using inferior equipment but with a toolbox of ingenuity and low-level hacking tricks. They reach 90% of your performance at 20x cost reduction.
What happens when another lab distills from the distilled lab?
Who is the thief? How far will the Alice go?
- roborovskis3 hours ago
  What would you define as 'distillation' versus 'learning'? How do you know that what a LLM is doing is 'distillation' vs a process closer to a human reading a book?
  From my perspective, pretraining is pretty clearly not 'distilling', as the goal is not to replicate the pretraining data but to generalize. But what these companies are doing is clearly 'distilling' in that they want their models to exactly emulate Claude's behavior.
  - armcat3 hours ago
    That's a soft distinction (distilling vs learning). If I read a chapter in a text book I am distilling the knowledge from that chapter into my own latent space - one would hope I learn something. Flipping it the other way, you could say that model from Lab Y is ALSO learning the model from Lab X. Not just "distilling". Hence my original comment - how deep does this go?
    EnPissant3 hours ago
    And yet nearly every machine learning engineer would disagree with you, which is a giveaway that your argument is rooted in ideology.
    armcat3 hours ago
    > And yet nearly every machine learning engineer would disagree with you, which is a giveaway that your argument is rooted in ideology.
    That's a bold statement! Of course I know the difference, in one case you are learning from correct/wrong answers, and in the other from a probability distribution. But in both cases you are using some X to move the weights. We can get down and gritty on KL divergence vs cross-entropy, but the whole topic is about "theft", which is perhaps in the eye of the beholder.
- joquarky3 hours ago
  > which (technically) amounts to theft
  Why bother writing so many words when you lack the discipline to choose the words with correct semantics?
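For anyone following the "KL divergence vs cross-entropy" aside in the thread above, here is a minimal sketch of the two training signals being contrasted: a hard-label cross-entropy loss (learning from a correct/wrong answer) and a soft-label KL loss (matching a teacher's probability distribution). The logits are made-up toy numbers and the helper names are hypothetical. Nothing in the thread says which form the labs in question used, and with API access only you typically see sampled text rather than the teacher's full distribution.

```python
import math

def softmax(logits):
    """Turn raw scores into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(student_probs, label):
    """Hard-label loss: only the probability of the single correct class matters."""
    return -math.log(student_probs[label])

def kl_divergence(teacher_probs, student_probs):
    """Soft-label loss: KL(teacher || student) over the whole distribution."""
    return sum(
        t * math.log(t / s)
        for t, s in zip(teacher_probs, student_probs)
        if t > 0
    )

# Toy logits, purely illustrative (assumption, not from any real model).
student = softmax([2.0, 1.0, 0.1])
teacher = softmax([3.0, 2.5, 0.2])

hard_loss = cross_entropy(student, label=0)   # learning from a right/wrong answer
soft_loss = kl_divergence(teacher, student)   # learning from a distribution

print(f"cross-entropy vs. hard label: {hard_loss:.4f}")
print(f"KL vs. teacher distribution:  {soft_loss:.4f}")
```

In both cases the loss is just a number whose gradient moves the student's weights; the difference is how much information each training example carries.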
oncallthrow4 hours ago
Live by the sword, die by the sword
- esafak4 hours ago
  What's good for the goose is good for the gander.
- breakall3 hours ago
  Reap what you sow
m_ke4 hours ago
we should probe anthropic for what accounts they made to access third party data, or which proxies they use to circumvent scraping blockers
iamsaitam3 hours ago
At least they paid you for it.. unlike you
mudkipdev3 hours ago
Don't throw stones from glass houses. Ask Anthropic about the proxies they use for scraping. They're well-versed on the topic
snowhale2 hours ago
the 16M session number is the real data point here. that's not a huge moat by any standard -- it just means detecting distillation is structurally hard, not that it isn't happening. you'd need to either detect statistical similarity in outputs (feasible but expensive) or rely on behavior probes, which get gamed fast. this announcement reads more like a legal paper trail than a technical deterrent.
osiris9704 hours ago
It's not illegal, just against their TOS. Your job to deal with that, Anthropic lol
- tedd4u3 hours ago
  How many readers of this site spend some percentage of their time/brainpower/money to combat excessive scraping? It absolutely falls on deaf ears to hear them then complain about scraping. Sorry guys, nobody cares. This is the world you created, deal with it just like everyone else.
kgeist3 hours ago
Were those 16 mln sessions used only for alignment, chat format, reasoning, etc.? Or is it possible to train a base model too? If a single session is at least 32k tokens, then it's already 0.5 trillion tokens to train on, interesting.
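The back-of-the-envelope estimate above can be checked in a few lines. This is only a sketch: the 32k tokens per session is the commenter's guess, not a reported figure, and it is taken here as 32,000 rather than 32,768.

```python
# Rough token estimate; both inputs are assumptions taken from the comment above.
SESSIONS = 16_000_000          # "16 mln sessions"
TOKENS_PER_SESSION = 32_000    # hypothetical per-session size ("at least 32k")

total_tokens = SESSIONS * TOKENS_PER_SESSION
total_trillions = total_tokens / 1e12

print(f"{total_tokens:,} tokens, i.e. about {total_trillions:.2f} trillion")
```

So roughly half a trillion tokens under these assumptions, which is where the 0.5 trillion figure comes from.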
veselin3 hours ago
I think they are signaling two things:
* Likely they will seek regulation that would ban some models. Not sure this can work, but they will certainly try.
* Likely they will not release some of their next models in the API.
- 9999000009992 hours ago
  I called this out a few days ago.
  They'll come up with some excuses to get the Chinese models banned.
  Of course this will only work in the US. Every American tech company will have to pay 10 to 20x for tokens.
  - Imustaskforhelp42 minutes ago
    How can Chinese models get banned if American providers can still host it?
    Is there even any law which can tell people about it?
    99990000099930 minutes ago
    https://www.reuters.com/world/china/us-lawmakers-introduce-b...
    https://cyberscoop.com/deepseek-ban-congress-cassidy-rosen-c...
    If your company has any government contracts you might not be able to use Chinese models under these bills.
lousken3 hours ago
Good, if you don't release open weights, someone else does.
karmasimida3 hours ago
Unless they stop selling APIs to the public this can’t be stopped.
Mind you, nuclear weapons can be regulated not because the tech itself is secret, but because the refining is a nation-state effort that is impossible to hide.
Realistically, the more tokens they sell, the harder it is for them to control this.
UlisesAC43 hours ago
Anthropic has a lot to explain if their advantage can be closed with just black-box distillation. And if it is the white-box version, they have far worse things to take care of.
maxglute3 hours ago
~650 messages per account? that seems either very little or too much. Surprised there isn't a coordinated distillation service with 5x accounts to spread the load.
- devnonymous3 hours ago
  24K is presumably the number that got caught. It isn't out of the realm of possibility that there indeed are (/ were over time), 5x accounts that managed to bypass the checks anthropic had.
int32_644 hours ago
The company that claims all knowledge workers are going to be wiped out by their technology is asking these future disenfranchised workers to care about the Chinese ripping off their tech. That seems like a hard no.
gregman14 hours ago
Do we need to re-announce proof of dirty practices by Anthropic?
StarterPro3 hours ago
> Distillation is a widely used and legitimate training method.
Oh ok, so you can steal from everyone, but when they do it to you, it's bad.
zb34 hours ago
> But foreign labs that illicitly distill American models can remove safeguards
I hope so, I don't need their "safeguards".
ralph843 hours ago
Human knowledge belongs to humanity. Of course the people who want to paywall it and extract rent will try to concoct some ethical basis for their rent seeking. Anthropic appears to be choosing the xenophobic route.
- EnPissant3 hours ago
  I don’t think China will see it that way if they take the lead. So, why help them?
ks20483 hours ago
I wonder how much American labs do the same.
sidgarimella3 hours ago
would sure be nice if the effort spent fighting their karma was pointed at a better frontier model
catsquirrel282 hours ago
My guess is they're setting up a narrative to claim this whole AI bubble wasn't a giant grift and they would've been profitable if it weren't for those dang Chinese people distilling their models and giving them out for free.
ChrisArchitect4 hours ago
Some more discussion on source: https://www.anthropic.com/news/detecting-and-preventing-dist... (https://news.ycombinator.com/item?id=47126177)
Imustaskforhelp3 hours ago
Edit: For what it's worth, I don't really care about Anthropic being scraped, because they scrape and illegally torrent every book without any compensation, among many other things, so they can't really play this moral card or make any moral response at this point. Their scraping directly leads to some servers effectively getting DDoSed, and so many other things.
Also, we all sort of knew this, but it's interesting to see Anthropic call out such companies in public.
I think that providing models at 1/20th the cost and open-sourcing them, while sometimes being much leaner, is for the most part an overall win for the general public whose data was questionably stolen by Anthropic, and it seems that some court cases about this are still happening.
One of the more curious things I want to say is that Qwen and GLM 5 (Z.ai) are not in this.
Personally I love Kimi the most, and maybe in the future we will hear from more AI companies like ChatGPT/Google too, if they have any proof of distillation as well.
But the fact that Z.ai isn't distilling makes me wonder what and how they are doing it. Qwen models although nice are not the best at the moment so I especially wonder what Z.ai model training does and where they get their training data.
I still love Kimi and I would probably use Kimi but I am interested to know more about the training sources of Z.ai
Another point: Kimi and Qwen are quite tightly linked (Kimi aka Moonshot AI is backed by Alibaba aka Qwen) [https://www.cnbc.com/2026/01/19/alibaba-backed-startup-moons...]
And yet Qwen is not in here. Why not? Or could it be that Kimi/Moonshot trained on Anthropic outputs and also shared the data with Qwen/Alibaba, but Qwen's name wasn't made public, ofc?
I can definitely see that being a possibility given that Kimi/Moonshot uses servers hosted on Alibaba.
Interestingly for Z.ai I found a quick fact about them from Wikipedia:
In May 2024, the Saudi Arabian finance firm Prosperity7 Ventures, LLC participated in a USD $400 million financing round for Zhipu AI with a valuation of approximately 3 billion USD.
I want to know whether Z.ai does any large-scale web scraping. Where does Z.ai get what looks to me like 15T–28.5T tokens?
I saw this comment from an article:
Pre-training: On a 23T token dataset curated from diverse sources, with emphasis on high-quality data through techniques like SemDeDup and quality-tiered up-sampling.
I think I am interested in this rabbit hole because, if Anthropic has caught them, this will definitely impact those companies in the future as Anthropic's models get better, and they might have to figure out the training data issue, which Z.ai might've solved?
I am still extremely suspicious of Z.ai but perhaps someone who has the tech reach on twitter or any other platform (maybe simonw?) could ask them.
I think the Z.ai guys are really open people, especially within the research community, yet I don't remember hearing about them scraping intensively, while we consistently see posts about American or even Chinese companies (Baidu most notoriously, iirc) basically DDoSing a server/git server, etc.
What is the Z.ai team doing such that they don't distill Anthropic and don't create intensive scraping problems, while still getting good quality data? It does seem too good to be true unless I am missing something, which I think I might be. So if anyone has the expertise, I would love to know more.
devnonymous3 hours ago
> These labs created over 24,000 fraudulent accounts and generated over 16 million exchanges with Claude
What exactly makes these accounts ^fraudulent^ ...did they not pay Anthropic for the service ?
akmarinov3 hours ago
So what? Should I announce that Anthropic has been trained on copyrighted material they stole?
rsynnott4 hours ago
Oh, now we care about IP, do we?
gostsamo4 hours ago
"You are trying to kidnap what I have rightfully stolen, and I think it quite ungentlemanly."
bakugo3 hours ago
Anthropic leadership once again showing off a remarkable level of immaturity.
Of course they don't want anyone else to use the precious outputs from the model they created by scraping data from the millions of fleshbag programmers they're now trying to put out of a job. They're just another corporation with the standard goal of making as much money as possible with little regard for anything else, so that much is expected.
But to actually write up a public announcement like this, loudly and proudly announcing to the world that they're crying at the daycare because their precious toy has been stolen by some kid, even though everyone around them knows they themselves originally stole that toy from another kid too, takes a special kind of corporate shamelessness that seems to be becoming more prevalent by the day.
eagleinparadise3 hours ago
world's smallest violin meme
stefan_4 hours ago
Anthropic, of course, ran an industrial-scale distillation attack on the combined works of human mankind. So, uh.. kindly go fuck yourself? Who asked?
grezql3 hours ago
[dead]