Shots fired?
It would be interesting to see how far "clean data" can go on the scaling laws.
P.S. A fairly basic website otherwise, but it unfortunately seems to be hacking scroll for no good reason.
For example, the Apache 2.0 license requires, in section 4(c) alone:
You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works;
Just because they're tokenized and transformed into a probabilistic mapping doesn't suddenly mean that they weren't copied. I find it unethical that they (likely) just ingest the IP of all open source repos without asking, and, importantly, without any attribution.
Let me also note that I'm not against LLMs in general. But I do think training on open source must be opt-in, and I look forward to a world with actually ethical and traceable models (i.e. ones that document what they were trained on, like a bill of materials (BOM)).
> without distillation from third-party models
sounds like zero unless they are lying.
Though this is largely impossible these days, unless they pre-trained on pre-AI era data.
Adding "unless" to a statement makes it vacuous if the latter clause is weaker than the first clause. I find it hard to believe that a company willing to violate licenses would have scruples about lying about it.
Also, “Microsoft is lying” is not a logically stronger statement, because they might be lying about something other than whether they distilled or trained on AI output.
I think that's the point. "How do I say they're lying without outright saying they're lying?"
It's a common rhetorical trick.
From a strategic PoV for MS, all the models you cited are distilling GPT/Claude/Gemini and wouldn't be anywhere near as good as they are without this distillation, which in turn means you are dependent on OAI/Anthropic/G first shipping a good model to generate data for your training. This MAI model is trained from scratch with no synthetic data or distillation. So in terms of benchmarks it's obviously much harder to get strong scores, and thus the results are not a disaster if they can keep on improving.
At least when you define benchmaxxed as "good in benchmarks but not human preference".
Isn’t 1M becoming the norm?
Claude Code will suggest that you start a new session or compact if you go above 100k.
It’s almost always better to keep your context windows small.
This seemingly nonsensical sentence (of course this will have a smaller inference footprint than larger models) suggests this model's competitors have larger inference footprints and total parameter sizes.
For personal stuff this release is not noteworthy.
MAI-Code-1-Flash - https://news.ycombinator.com/item?id=48374466 - June 2026 (131 comments)
I was most excited about the "frontier tuning." Like, it will actually watch you do stuff and learn to do it for you? That would be actually interesting.
But no, it's just a data labelling interface: https://learn.microsoft.com/en-us/microsoft-365/copilot/copi.... You have to provide the instruction and give feedback, and there is a whole UI with an hour-long wait between steps. So basically they want you to do the labelling to train a model, or at least that's how it looks from the outside.
Also, the mission statement of Humanist AI is the most boring, but tries to sound way too grand. Like "all the cool labs have a mission statement, so we should also have one" vibes.
"Please don't complain about tangential annoyances—e.g. article or website formats, name collisions, or back-button breakage. They're too common to be interesting."
About time Microsoft joined the fray. After the OpenAI divorce, it really looked like Microsoft was going to become another Uber.