One of my focuses now is my own model-agnostic harness and workflow orchestration (I know everyone is building these), baselining on Opus and aiming to transition to Chinese models like DeepSeek in the short term and, hopefully, open, self-hosted models in the future (which I plan to open source).
The nonstop marketing fluff from Anthropic while their service quality and availability noticeably degrade... just continues to destroy my trust in the company.
https://fxtwitter.com/trq212/status/2014051501786931427
> Most people's mental model of Claude Code is that "it's just a TUI" but it should really be closer to "a small game engine".
> -> layouts elements
> -> rasterizes them to a 2d screen
> -> diffs that against the previous screen
> -> finally uses the diff to generate ANSI sequences to draw
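For anyone wondering what the last two steps in that quote amount to, here is a minimal sketch. It is not Claude Code's actual renderer; the function name `diff_to_ansi` and the "screen as a list of equal-width strings" representation are assumptions made for illustration. It compares two frames cell by cell and emits a cursor-move escape (`ESC [ row ; col H`) followed by the new characters, only for runs of cells that changed.

```python
def diff_to_ansi(prev, curr):
    """Return ANSI sequences that repaint only the cells that differ.

    Assumes prev and curr are lists of strings with identical dimensions
    (a hypothetical, simplified stand-in for a rasterized 2D screen).
    """
    out = []
    for row, (old_line, new_line) in enumerate(zip(prev, curr)):
        col = 0
        width = len(new_line)
        while col < width:
            if old_line[col] == new_line[col]:
                col += 1
                continue
            # Gather a run of consecutive changed cells.
            start = col
            while col < width and old_line[col] != new_line[col]:
                col += 1
            # CSI row;col H positions the cursor (1-based), then we write the run.
            out.append(f"\x1b[{row + 1};{start + 1}H{new_line[start:col]}")
    return "".join(out)


prev = ["hello world", "status: ok "]
curr = ["hello there", "status: ok "]
print(repr(diff_to_ansi(prev, curr)))  # '\x1b[1;7Hthere'
```

Only the five changed cells on the first row are redrawn; the untouched second row produces no output at all, which is the whole point of diffing against the previous frame.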
Yup. Overengineering.
On a more serious note, using a React-like lib for a TUI in the hope of sharing the codebase with the web version is a more likely explanation. Still not the best idea.
That’s rather sickening.
Seems like a cool puzzle to solve. I wonder what the engineering and organisation tradeoffs were that led to it. Does it let them reuse a bunch of existing code?
I wrote a TUI library back in the day for Turbo Pascal — it was essentially taking an immediate-mode approach (which in this context is just a fancy way of saying it was procedural haha).
Also remember when XP was super bloated cause it needed 64MB?
They aren't saying they have fully automated luxury AGI, they specifically list the ways models fall short of that bar and caution against people taking the 8x figure as the actual uplift number. At the same time they recognize that 80% of new code is now AI-authored, when two years ago those models were little more than toys. And frankly that checks out: if two years ago you told me we'd have something like Opus 4.8/GPT 5.5 I would have rolled to disbelieve.
No. Technical limitations aside, I doubt it could be contained; it would leak soon, so it wouldn't profit just a small number of the ultra rich.
But we're discussing whether we should close the barn door while the horse is three miles down the road.
Frankly, I love efficiency too, but I've had to learn the hard way that what the market wants is features. Or at the very least, the executive team wants that.
If you want to pollute your own priors with weird artificial litmus tests, it's a free country, but the artificial world-model you build in your head does not affect the real world around you.
They have different teams for different departments with different types of people.
So the team or teams responsible for writing the terminal application are different people than the researchers doing the learning.
This can have detrimental effects on quality.
One thing I noticed: "Your Tools: Aether agents get tools exclusively via MCP servers." "...Aether ships with 1st-party MCPs for file system operations..."
Can you share your thoughts on why you decided to use MCP as the core tool abstraction? I have heard many decry MCP as being context-wasteful. Is this not the case with your agent?
The MCP protocol has gotten a bad rap for wasting context due to most MCP clients dumping tool definitions directly into context, which is wasteful.
Aether doesn’t do that. It uses an opt-in "proxy" that puts MCP tool schemas on the filesystem so the agent can browse, search and load the tool schemas it needs progressively. As for motivation, there are several advantages to taking an MCP-first approach, including:
1. It allows Aether to be a truly blank slate agent as 0 tools are hardcoded into the core runtime.
2. It allows users to extend Aether using any language they want
3. MCP gives a standard way to deal with local+remote tools, progress notifications, permission prompts (e.g. ask the user to allow/deny a tool call), OAuth flows etc.
4. There's a big ecosystem of existing MCP servers users can connect to
But that's all optional, you can just as easily give Aether a single Bash tool and only use CLIs too.
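To make the "schemas on the filesystem" idea concrete, here is a rough sketch of progressive loading. This is not Aether's implementation: the directory layout, the `server` field, and the helper names (`write_schemas`, `list_tools`, `search_tools`, `load_schema`) are all assumptions for illustration. The only thing borrowed from MCP itself is that a tool definition carries a `name`, a `description` and an `inputSchema`.

```python
import json
import tempfile
from pathlib import Path


def write_schemas(root, tools):
    """Write one JSON file per tool at <root>/<server>/<tool>.json (assumed layout)."""
    for tool in tools:
        path = Path(root) / tool["server"] / f"{tool['name']}.json"
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(json.dumps(tool))


def list_tools(root):
    """Cheap listing: names only, no schema is read into context."""
    return sorted(f"{p.parent.name}/{p.stem}" for p in Path(root).glob("*/*.json"))


def search_tools(root, keyword):
    """Names of tools whose description mentions the keyword (case-insensitive)."""
    hits = []
    for p in Path(root).glob("*/*.json"):
        description = json.loads(p.read_text())["description"]
        if keyword.lower() in description.lower():
            hits.append(f"{p.parent.name}/{p.stem}")
    return sorted(hits)


def load_schema(root, qualified_name):
    """Load the full definition for one tool, only when it is actually needed."""
    server, name = qualified_name.split("/")
    return json.loads((Path(root) / server / f"{name}.json").read_text())


# Hypothetical tool definitions, made up for this example.
tools = [
    {"server": "fs", "name": "read_file", "description": "Read a text file from disk",
     "inputSchema": {"type": "object", "required": ["path"]}},
    {"server": "fs", "name": "write_file", "description": "Write text to a file on disk",
     "inputSchema": {"type": "object", "required": ["path", "content"]}},
    {"server": "web", "name": "fetch", "description": "Fetch the contents of a URL",
     "inputSchema": {"type": "object", "required": ["url"]}},
]

root = tempfile.mkdtemp()
write_schemas(root, tools)
print(list_tools(root))
print(search_tools(root, "url"))
print(load_schema(root, "web/fetch")["inputSchema"])
```

The context cost scales with what the agent asks for: a name listing is a few tokens per tool, and a full schema is only paid for when `load_schema` is called.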
I am not cynical enough to believe that Anthropic's warnings are pure marketing hype. Let's hope that it is instead overconfidence or the result of too much time talking to their own chatbot.
Nor am I. I think they believe that AI poses a grave danger, and they are playing the prisoner's dilemma as an unvirtuous actor.
1. If anyone builds strong AI, it may be catastrophically bad.
2. If anyone builds strong AI, it will be better for the builder than for anyone who does not. Either because it won't be catastrophically bad so the builder will get to enjoy all the spoils indefinitely or because it will and at least the builder will be rich for a while.
> If nukes were not invented yet, would it really be a good idea to build and sell them as fast as possible (in peace time, no less)?
Arguably, yes.
From Richard Rhodes's "The Making of the Atomic Bomb", I got the impression that most scientists involved thought they could manage a US or UN monopoly on nukes after the war. General Groves attempted to buy up all of the world's uranium ore. Unfortunately, it is only high-grade ore that is rare; many countries have low-grade ore.
If only the US or UN had nukes, we wouldn't have MAD. We mostly got here through espionage.
If Japan had also had nukes (and delivery systems for them) in WW2, they'd probably have retaliated in kind, the US wouldn't have let that slide either, and it would have continued for some time.
It's not cynicism if it's an appraisal of reality that's backed up by evidence.
Remember how social media - that first baby of this current generation of tech entrepreneurs - was supposed to "bring the world together" and "let us express ourselves"? As it turns out there's a lot more money to be made by fostering division to drive engagement and feeding people an endless stream of ads instead of their friends' content. And money is what matters. You can't write down good vibes on a quarterly figures report. You can absolutely write down the number of eyes that your ragebait brought to a product's marketing efforts and the conversion rate to sales.
The same will be done with GenAI. We're being promised "AI Safety" because otherwise this whole thing gets killed dead by anyone who knows about James Cameron's directing career. There's no real enforcement mechanism for AI safety, though. Safety is a good vibe, same as harmony in online communities. You can't measure it. What you can measure is training costs and the cost of mistakes by AI that need to be trained to avoid those mistakes. Since AI generates more output than humans can conceivably QA no matter what your budget is, and since AI is seen by the market as a potential endless font of value, the tradeoff will be made to have AI make some potentially awful decisions while training itself over slowing down and re-appraising what is being done.
There's an almost religious reverence for AI in SV. Not everyone sees it as "making the godhead" but some certainly do. They're not going to moderate themselves too much on this.
It doesn't really have to be dishonest, he could really believe it. I do believe, however, that it is incredibly wrong and is functioning as marketing hype.
I always was fascinated (obsessed?) by robots that build robots, or even things like this that can contribute a lot to making the next version of itself: https://buildyourcnc.com/products/cnc-machine-blacktoe-v4-2x... (cnc router that cuts plywood, and is made out of cnc-router cut plywood)
This is my own effort at an AI assisted coding environment optimized for building itself: https://recursi.dev/ (just launching it, hope it's ok to mention it, it is free/open source... here is the HN link that has gotten no love yet: https://news.ycombinator.com/item?id=48401022 )
Personally I think harnesses are as important as the AI itself, and have this crazy theory that even if the models stopped improving today we could still have massive advances in the harnesses alone.
i think that's the path to async agi these labs are imagining. The only limits are the sensor data you have on the world or your system, how long you're willing to wait, and how much you're willing to spend to parallelize it.
maybe once you start building out these verified workflows you can feed that back into training and the model starts to get a feel for the world to the point that it can intuit things since it has these sub paths built.
my personal agi test is: can a model, trained on video of someone knocking on a door and then opening it, encounter a microwave for the first time and open it when the food's done without knocking.
Shhh just let the marketing slop wash over you.
There are a ton of other tricks to it, but mostly it's about keeping the protocol simple for the AI so it can concentrate on coding logic and not stuff like managing BS boilerplate, dependencies, etc. (for instance, I make extensive use of things like an abstract syntax tree library to help with surgical edits from the LLM)
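As an illustration of what an AST-assisted "surgical edit" can look like (not the recursi.dev code, which I haven't seen; `replace_function` is a hypothetical helper), the sketch below uses Python's standard `ast` module to find the exact line range of one top-level function and swaps in replacement text, leaving every other line of the file byte-for-byte untouched. It assumes Python 3.8+ for `end_lineno` and ignores decorators and nested indentation.

```python
import ast


def replace_function(source, name, new_def):
    """Replace the function called `name` in `source` with `new_def`.

    Only the lines spanned by the function node are rewritten.
    Raises KeyError if no function with that name exists.
    """
    tree = ast.parse(source)
    for node in ast.walk(tree):
        is_func = isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        if is_func and node.name == name:
            lines = source.splitlines(keepends=True)
            start = node.lineno - 1   # lineno is 1-based and points at the def line
            end = node.end_lineno     # inclusive 1-based end, so usable as a slice stop
            if not new_def.endswith("\n"):
                new_def += "\n"
            return "".join(lines[:start]) + new_def + "".join(lines[end:])
    raise KeyError(name)


source = "def a():\n    return 1\n\ndef b():\n    return 2\n"
patched = replace_function(source, "a", "def a():\n    return 10\n")
print(patched)
```

The LLM only has to produce the new body of `a`; it never re-emits `b`, so it cannot accidentally mangle it.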
That said, I would be very open to collaborating with someone who builds such small models, I don't think the system strictly needs it, but it also could have some extra power if it had it.
Tell me more! This takes me way back. I did one like this in the GPT-4 days! (8k context window)
recursi.dev
Seriously, I'm looking for collaborators.
There's upwards of 80,000 lines of code in the editor system, a lot to it to make sure that even newbies don't get stuck.... so that's kind of proof the system works since it doesn't break down when the codebase grows large.
But yes, I'm aware no one's got anywhere near there, mostly because most of the focus is on exploding the context and parameters. I'm saying that phase is done.
I'm pleased they at least included this. However, they address the caveat by 'rounding down' the estimated multiple of the gain. I'm not sure that is the correct adjustment, especially once we understand the range isn't limited to positive numbers.
There's strong evidence the range of code productivity denominated in "lines of code" should include negative numbers, especially in the highest-quality sphere. Perhaps the earliest and most legendary example: https://www.folklore.org/Negative_2000_Lines_Of_Code.html
I really can't stand these guys anymore...
What about the hypothesis that AI is generating more verbose code? I just see the text pretending to acknowledge "LOC != Productivity" and then using it as a metric anyway.
I'm sure he thought that was a crowning achievement, proof that AI can enable 10X developers, after all, what engineer could write 40k lines of code in a week?
I declined to review it, stating that I couldn't possibly vet 40k lines of code, and wouldn't put my reputation on the line to stamp the work as good. The PR nagged me for 2 weeks from my todo list and then disappeared. I don't know if he found another dev to get an approval from, or if the PR was abandoned. But I know for sure that he and I are on two totally separate islands around the value of LLMs.
Opus 4.6/4.7 was consistently successful at getting 2-3x speed improvement with just one pass. It can also do the inverse: improve the performance metrics for better quality without causing a significant regression in speed. Then GPT-5.5 turned out to be much better at this workflow, often getting a multiplicative 1.5x-2x improvement above what Opus could do.
I now have quite a few GPT-5.5-optimized projects in various domains that are feature complete and are substantially more performant than existing SOTA implementations that I plan to open source as soon as possible: the bottleneck is polish as usual.
I disagree with this. Good code is easy to change, which is much harder to accomplish than code that can be added to.
"If technical trends in advancing capabilities continue, and AI systems are able to develop the capabilities inherent to transformative human ingenuity, then it is plausible that AI systems could design and refine themselves."
I find the first premise weak and implausible, and the second one is obviously false. To me it comes across as an insult to the reader.
By shifting their focus from training new models to serving inference, they would greatly reduce their spend. In fact, reports say they are already doing this, which is the reason for their first-ever profitable quarter.
It's awfully convenient that the company which has greatly reduced its spend on training is now asking for a slowdown in this area.
The Claude code quality and operational security of Anthropic have already been analyzed by the public.
If you compare the output of (purportedly) trillion dollar corporations to Bell Labs or even Microsoft Research it is embarrassing. But the output is a fixture on any discussion board.
Interesting - they're committing to kick off policy conventions to organize a world slowdown of frontier LLM building. If they actually are able to crack it, this will give a much-needed breather IMO. As exciting as the last ~6 months have been, there are some bigger questions to go answer now.
In my mind we should be trying to push AI along the Linux trajectory. You have a free and open source product, developed by a decentralized team with a strong code of ethics, running on commodity hardware. There can still be trillion dollar industries built on top of it, but the core technology is democratized and available to everybody. I don't see how we get there if we allow a handful of companies to dictate where development of the technology goes.
the actual race is to keep having revenue, since everyone is still willing to pay more for the best model.
we as consumers of LLM models lose out by the arms race ending by the creation of a cartel
what happens if they get this regulatory capture is that all the frontier labs put effort into making inference cheaper, and become extraordinarily profitable, at the expense of us consumers, who really want better models, at a subsidized price
The orthogonality thesis sounds like a fun gotcha, but if you give it some thought you realise how strange it sounds, and the opposite thesis, the collinearity thesis, is actually correct.
1. Intelligence transfers and compounds
2. Goals of agents are not arbitrary
3. Our goals and agent goals are more likely to be aligned at the deeper level
So based on my experience with the verbosity and non-DRYness of LLM code, a solid 2.5x in value delivered. Not bad!
One of the examples they provide, of giving Claude the task of training a small AI model, then asking it to improve certain benchmarks, is essentially Karpathy's AutoResearch. This is already known to work. While calling it "self-improvement" is perhaps a stretch, it is describing a capability current gen AI has, that anyone can test and I have been using to great effect.
I disagree with their conclusion, I think this kind of self-improvement will hit an asymptote, where every subsequent model can only make smaller and smaller improvements.
https://www.italianrenaissance.org/wp-content/uploads/2012/0...
Or is this?
https://www.egypttoursportal.com/images/2024/02/Ouroboros-Sy...
I simultaneously think the AI revolution is making real revolutionary gains and am mystified by the lying.
An accurate translation seems to be “we made this shit up, but it feels right”
So, right now it's a verbose code generator.
But post-IPO it will be wonderful - sentient, self-improving (recursively, iteratively, asymptotically), full of loving grace.
We hold these truths to be self-evident.
Month 1 - 6 months to AGI
Month 2 - We will Replace all jobs
Month 3 - Okay maybe only the SWEs, programming is solved
Month 4 - Announce model that is too dangerous to release
Month 5 - Releases dangerous model
Month 6 - This is it! We will replace AIs with more AIs (*secretly files for IPO)
AI is here to stay, like it or not, but it is not the solution to everything. If it is, what is Anthropic's moat? A better model? I don't see any ecosystem being built by them, as MCP is almost obsolete except for some very niche use cases. And they're doing stuff that a non-profit version of OpenAI would do. Can we trust a for-profit company to stand against their investors during a conflict of interest? Because running a company for maximum profit and being ethical are two different ends of the spectrum.
The problem is, if you’re any sort of knowledge worker, you’re essentially providing the same thing: you’re an intelligence with agency.
MCP is irrelevant. The moat is the quality of intelligence the service providers sell, including you. Tokens aren’t fungible between providers until you measure that they are for your use case, that’s kinda sorta the goal of job interviews.
Thus the moat will be that they’re providing the best models for the things people need other intelligent people for, but we should expect there will be limits on how much share they can economically take assuming competitors are optimizing for slightly different targets (but there’s still significant overlap in capability). This will disappear, but it’s always a question of when. The path matters as much as the destination.
Note that implications for you and me are exactly what the article says they are: nobody knows, but it’ll be a dramatic shift.
free chatgpt doesn't need to exist anymore. its job was to build hype/interest and it did.
but take it away and you solve many social problems and annoyances caused by AI with no loss to the upside of AI. no more cheating students in school. no more shitty linkedin posts. no more dangerous "therapy sessions" that give bad advice.
I, for one, believe that we should pause all work on AI for the foreseeable future. This is almost impossible to orchestrate, but we should still try nevertheless. Maybe we are not able to pause, but we are able to slow down. That might give us more room to maybe be able to pause in the future. But going ahead is too dangerous.
And it's not just Anthropic which is saying this. Even Geoffrey Hinton has said the same thing. If there is a non-zero chance that AI can kill all of humanity, and both Geoffrey and Anthropic have the same position, then it makes sense for us to be a hundred percent sure before we move ahead. Dario/Anthropic have already made their money from AI, maybe they are just being honest about what they think lies ahead.
the end of humanity has a strong case for banning all burning of fossil fuels immediately
the end of humanity as a sales tactic to increase your stock price does not
these are companies working on their IPO to make sure they can get the best price, not people being honest about what they think lies ahead.
if they were being honest about what lies ahead, they'd unilaterally stop training, and put all of their money into FPV drone bombs to destroy datacenters being used for training or inference
if you actually believe the thing is gonna kill everyone, you're not gonna worry about how you stop it, and certainly not keep building and operating the thing
that they aren't buying anti-tank mines to drop on data centers says they aren't in the slightest serious about it
The same bozo who claimed radiologists would be out of a job by now.
The data does not support what you or others say. Jesus Christ. Can't believe people are this dumb. Have LLMs infested the minds of people to the extent they can't critically analyse what's happening in front of their eyes?
The metric being tracked, code commits, is hilariously one sided. Philosophically, if you had one part of your work now practically free, you'd like to utilize that freedom to maximally cover for the other parts, for instance:
Instead of thinking about edge cases with brain and whiteboard, you can have the LLMs simply generate most possibilities, including tests for them, because that is cheaper. There are probably 50x more commits, of which 40 will be revert pairs, but we are only twice as fast. And in reality nothing changed because the outcome remains the same. I can't see how it is necessarily different in the LLM space.
I've been struggling to capture this sentiment for myself in a way that hits. If shipping code is a commodity then why is everyone's immediate priority seemingly to ship 10x more code. It just makes no sense. I can't seem to get off this hill. Company-wide AI mandates and 100 fleet Agent orchestration Rube Goldberg machines... it's getting wild out there.
Meanwhile my Claude Pro ($200/year) does force me to smooth out my usage and plan more (Sonnet/Opus advisor split). But other than that, I can't imagine what I'd be doing with 20x (200x?) the compute to code sling. I think I'd lose my mind.
For instance, if I churned out 20x more code, threw away 19x of it in rewrites, reverts and discards, and accomplished the same project to the same standard 70% faster, would I do it? Yes. The part that matters is not 20x code, it is 70% faster.
Code is both the final product, and a tool to achieve that. We used to have a much harder time to realize the "tool" part, but now we are here. This also means any measurement centered on code being the final product is going to cease being effective or realistic.
This is contentious because I'm not exactly advocating for arbitrary gate-keepers. The nuance is that building usable stuff is hard. And not a matter of shipping more code. I take your point to mean well it depends on what that code is doing. If 20x more code is in a meta-harness of simulation and such to arrive at the leading candidate for what hits production, well then you've got my attention there.
I wonder how much of current engineering practices can be traced to what's pushed to company leaders on LinkedIn.
Every company is shitting bricks pushing for faster development and speed, gotta go fast to nowhere in particular, and I'm convinced it's tied to constant bombardment with the idea that they're going to be left out or obsolete if they don't get on the ship NOW.
Sounds iterative to me.
- A lot of half-baked or half-done features.
- Or features that have significant overlap with existing ones, and aren’t clearly an improvement.
More code is not better. More features are not better. It would be lovely to see more intentional design than just more.
I know they’re dogfooding this. I have to believe they have some people with taste. So it makes me wonder if anyone has the time to think or if they’re just shoveling prompts as fast as possible.
You will forgive me when, between muted snickers, I express considerable doubt that Anthropic will be able to bring its AI to a point of "self-improving" any time soon.
Don't ask people to explain the article to you if you're too lazy to open it yourself.
> A meaningful slowdown or pause would require multiple well-resourced labs at or near the frontier, in multiple countries, agreeing to stop under the same conditions. It would also require that each can verify that the others have actually stopped. Due to the unique characteristics of AI systems, the detectability (a lower standard than verifiability) element of this arms control problem is much more challenging than with other technologies. Training runs are far easier to conceal than missile silos, their inputs are general-purpose, and the incentive to defect quietly is enormous, because whoever continues while others pause could inherit the lead. A credible pause also has to specify what triggers it, what lifts it, and who adjudicates.
And later:
> In the coming months, we will organize conversations where policymakers, researchers, civil society, and other AI companies can help answer some of the questions this piece raises, especially around full recursive self-improvement and how to create better options for coordination and deliberation. We’ll publish what comes out of it. The window to investigate the questions together is here, and people outside AI companies should be involved in this deliberation.
It feels like open source can flourish while the frontier is deliberately regulated?
If they wanted to they could have convened an international forum with commercial and political stakeholders years ago. Less talk, more do.
Consequences are: financial crisis.
Be careful what you wish for IOW.
So the most capital intensive industry we've ever created will put less power in the hands of those with capital?
I'm sorry, I have no idea how you came to that conclusion...
Without some kind of income redistribution we are sailing into dark waters.
Workingmen of all countries unite!
Translation: hahahahahahahahahhahahaha but in your defense, I would give anything to be wrong.
Even Anthropic wants to Pause AI now. There must really be not much time left for "edging". Please write to your lawmakers, no matter whether you are in the US, Europe, China, or elsewhere. Only an international agreement between governments can enforce an AI-Pause and eliminate the necessity to dangerously push the frontier.
And cooperating internationally to buy ourselves time to find ways to develop this "last invention" in a way that will do good for humanity seems to be on a similar level.