For context, we are right in the middle of building this thing... multiple rebuilds daily, since we are using it to build itself. The value isn't in the code itself yet, but in the approaches (UNIX philosophy, meta-cognitive recipes, etc.).
We are really excited about how productive these approaches are even at this early stage. We are able to have Amplifier go off and make significant progress unattended, sometimes for hours at a time. This, of course, raises a lot of questions about how software will be built in the near future... questions which we are leaning into.
Most of our team's projects, unless they have some unresolved IP or are using internal-only systems, are built in the open. This is a research project at this stage. We recognize this approach is too expensive and too hacky for most independent developers (we're spending thousands of dollars daily on tokens). But once the patterns are identified, we expect we'll all find ways to make them more accessible.
The whole point of this is to experiment and learn fast.
So if you can get the spec right, and the LLM+agent harness is good enough, you can move much, much faster. It's not always true to the same degree, obviously.
Getting the spec right, and knowing what tasks to use it on -- that's the hard part that people are grappling with, in most contexts.
This is kind of how I feel. Chat as an interaction is mentally taxing for me.
At what cost, monetary and environmental?
As costs drop exponentially (a reasonable expectation for LLMs, etc.) then increasing agent parallelism becomes more and more economically viable over time.
Not a reasonable expectation anymore. Moore's Law has been dead for more than a decade and we're getting close to physical limits.
But in all seriousness +1 can recommend this method.
But I thought there are lots of agentic systems that loop back and ask for approval every few steps, or after every agent does its piece. Is that not the case?
I’m super not interested in hearing what people have to say from a distance without actually using it.
I tried it with a feature, took about 10 minutes and a lot of iterations, and would easily have used hundreds of thousands of tokens. Doing this 20, 30 times a day would be crazy expensive.
1. It affects the fundamental ego of these engineers: a computer can do what they thought only they could do, and what they thought made them better than the rest of the population. They might not realize this, of course.
2. AI and all these AI systems are intelligence multipliers, with a zero point around IQ 100. Anything multiplied by zero is zero, and a negative multiplier just leads to garbage. So the people who say "I used AI and it's garbage" should really think hard about what that says about them. I thought I was crazy to entertain this hypothesis, but someone else made the exact same point, so I no longer think I'm just being especially mean.
I am an engineer and my vibe-coded prototype is now in production, one of the best applications of its type in the industry, and doing really well. So well that I have a pretty large team working on it now. This project was, and still is, 95% written by AI. No complaints, never going back. That's my experience.
Clearly the eng community is splitting into two categories, people who think this is all never going to work and people who think otherwise. Time will tell who's right.
To anyone else reading and thinking closer to the second side, we're hiring :)
>"no need to prove yourself to a stranger on the Internet"
In this case, the stranger is you, and you say it as if he were debating with someone else. This proves you lost the debate and are now resorting to personal attacks.
People can easily detect these little nuances.
You can say it sucked and continues to suck, but the claim that LLM/agentic AI is degrading is simply false. Such a statement really makes me question the genuineness of the rest of the comment.
"unlike"? You made a claim and has nothing to show for either. Only difference is that the claim is actually way more ridiculous. Somehow technology advanced backward and you're the only one noticed.
Yeah, or maybe learn how transformer-based neural networks actually work and stop comparing apples to oranges.
> that YOUR AI set up degrades
Well, yeah, all I've been saying, the whole time, is that the so-called "AI" has been degrading in my own setup, my entire company to be precise. Who knows, maybe my team and I are just stupid and we "suck" at writing English-language sentences. Have a stroll through the subreddits dedicated to Cursor, Claude, ChatGPT. You could be onto something: so many stupid people vs. a few of those like you, who are apparently very smart. The masters of "Prompt Engineering". Or could it be that your use case is so trivial, and you so inexperienced, that whatever the statistical parrot spits out seems like a wonder to you?
The secret weapon with this approach is asking for 2-4 solutions to your prompt running in parallel. This helps avoid the most time-consuming aspect of AI coding: reviewing a large commit and ultimately finding that the approach the AI took is hopeless or requires major revision.
By generating multiple solutions, you avoid investing fully in the first one; you can use clever ways to select from the 2-4 candidate solutions and usually apply a small tweak at the end. Anyone else doing something like this?
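A minimal sketch of that "2-4 candidates in parallel" idea, under assumptions not in the original comment: the agent is some CLI called `agent` that takes its prompt via `-p` (purely hypothetical), and each candidate runs in its own git worktree so the runs can't step on each other's files.

```python
# Sketch only: `agent` and the prompt are stand-ins, not a real CLI.
import subprocess
from concurrent.futures import ThreadPoolExecutor

PROMPT = "Add retry logic to the HTTP client"  # hypothetical task
N_CANDIDATES = 3
workdirs = [f"../candidate-{i}" for i in range(N_CANDIDATES)]

# Create the isolated worktrees up front (serially, to avoid git lock contention).
for i, workdir in enumerate(workdirs):
    subprocess.run(["git", "worktree", "add", "-b", f"candidate-{i}", workdir],
                   check=True)

def run_candidate(workdir: str) -> str:
    # Hypothetical agent invocation; substitute whatever CLI you actually use.
    subprocess.run(["agent", "-p", PROMPT], cwd=workdir, check=True)
    return workdir

# Run all candidates concurrently, then review/select among the results.
with ThreadPoolExecutor(max_workers=N_CANDIDATES) as pool:
    done = list(pool.map(run_candidate, workdirs))

print("Candidates ready for review:", done)
```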
I've been doing something similar: aider+gpt-5, claude-code+sonnet, gemini-cli+2.5-pro. I want to try coder-cli next.
The main problem with this approach is summarizing the different approaches before drilling down into reviewing the best one.
Looking at a `git diff --stat` across all the model outputs can give you a good measure of whether there was an existing common pattern for your requested implementation. If only one of the models adds code to a module that the others do not, that's usually a good jumping-off point for exploring the differing assumptions each of the agents built towards.
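A sketch of that `git diff --stat` comparison, assuming the candidate outputs live in sibling worktrees named candidate-0..candidate-2 (as in the earlier sketch) and the agents left their changes uncommitted:

```python
# Assumes worktrees ../candidate-0 .. ../candidate-2 with uncommitted changes.
import subprocess

candidates = [f"../candidate-{i}" for i in range(3)]

for workdir in candidates:
    stat = subprocess.run(
        ["git", "diff", "--stat"],  # files touched and rough line counts
        cwd=workdir, capture_output=True, text=True, check=True,
    ).stdout
    print(f"== {workdir} ==\n{stat}")

# A module that shows up in only one of the diffstats is a natural place to
# start asking why that agent's assumptions diverged from the others.
```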
This is gaining stars and forks, but I don't know if that's just because it's under the github.com/microsoft org, and I don't really know how much that means.
I'd rather have the three word message than detailed but wrong messages.
I think I agree with you on average, anyway. Most of the time a claude-authored commit message is better than a garbage message.
But it's still a red flag that the project may be filled with holes and not really ready for other people. It's just so easy to vibe your way to a project that works for you but is buggy and missing tons of features for anyone who strays from your use case.
I'd never encourage anyone to blindly commit the messages. But if they are correct, they seem a lot more useful than 90% of commit messages.
The biggest mistake I've seen other people make is something like: they move a file, and the commit message acts like it's a brand-new feature they added, because the LLM doesn't put together that it's just a moved file.
Interesting given Microsoft’s history with OpenAI
https://techcrunch.com/2025/09/09/microsoft-to-lessen-relian...
This stood out to me too, seems like a months-long project with heavy use of Claude
WARNING: Claude Code running in Bypass Permissions mode

In Bypass Permissions mode, Claude Code will not ask for your approval before running potentially dangerous commands. This mode should only be used in a sandboxed container/VM that has restricted internet access and can easily be restored if damaged.
Caution
This project is a research demonstrator. It is in early development and may change significantly. Using permissive AI tools in your repository requires careful attention to security considerations and careful human supervision, and even then things can still go wrong. Use it with caution, and at your own risk.
and
requires careful attention to security considerations and careful human supervision
is a bit orthogonal, no?
“Using permissive AI tools [that is, ones that do not ask for your approval] in your repository requires careful attention to security considerations and careful human supervision”. Supervision isn’t necessarily approving every action: it might be as simple as inspecting the work after it’s done. And security considerations might mean to perform the work in a sandbox where it can’t impact anything of value.
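One possible reading of "security considerations" plus after-the-fact supervision, sketched below. The image name, agent command, and prompt are illustrative assumptions, not anything from the project; the idea is just a disposable repo copy in a container with a restricted network, reviewed afterwards.

```python
# Sketch: run a hypothetical agent against a throwaway clone, then inspect it.
import subprocess

subprocess.run([
    "docker", "run", "--rm",
    "--network", "none",            # no network here; a real setup might allow only the model API
    "-v", "/tmp/repo-copy:/work",   # a disposable clone, not your real checkout
    "-w", "/work",
    "agent-sandbox-image",          # hypothetical image with the agent preinstalled
    "agent", "-p", "refactor the parser",
], check=True)

# Supervision after the fact: review what actually changed before merging any of it.
diff = subprocess.run(["git", "-C", "/tmp/repo-copy", "diff", "--stat"],
                      capture_output=True, text=True, check=True).stdout
print(diff)
```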
I have a repo that shows you how to do this stuff the correct way and is very easy to adapt, along with a detailed explanation. Just do yourself a favor: skip the amateur-hour re-implementations and instrument/silo your agents properly: https://sibylline.dev/articles/2025-10-04-hacking-claude-cod...
This project was in part written by Claude, so for better or worse I think we're at least 3 levels deep here (AI-written code which directs an AI to direct other AIs to write code).
Most models I've benchmarked, even the expensive proprietary models, tend to lose coherence when the context grows beyond a certain size. The thing is, they typically do not need the entire context to perform whatever step of the process is currently going on.
And there appears to be a lot of experimentation going on along the lines of having subagents in charge of curating the long-term view of the context to feed more focused work items to other subagents, and I find that genuinely intriguing.
My hope is that this approach will eventually become refined enough that we'll get dependable capability out of cheap open weight models. That might come in darn handy, depending on the blast radius of the bubble burst.
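A rough sketch of that "curator subagent" pattern: one call condenses the long-running context into a focused brief, and only the brief reaches the worker. The function names are made up, and `complete` stands in for whatever model API you use; nothing here is from the project itself.

```python
# Hypothetical curator/worker split; `complete` is any text-completion callable.
from typing import Callable

def curate_context(full_history: str, task: str,
                   complete: Callable[[str], str]) -> str:
    """Ask a curator model for only the facts relevant to the next task."""
    return complete(
        "From the transcript below, extract only the decisions, file paths, "
        f"and constraints needed for this next task: {task}\n\n{full_history}"
    )

def run_worker(task: str, brief: str, complete: Callable[[str], str]) -> str:
    """The worker sees the curated brief, never the full transcript."""
    return complete(f"Task: {task}\n\nRelevant context:\n{brief}")

def delegate(full_history: str, task: str,
             complete: Callable[[str], str]) -> str:
    brief = curate_context(full_history, task, complete)
    return run_worker(task, brief, complete)
```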
You think it is creative because you lack the knowledge of what it has learnt.
If an “objective” test purports to show that AI is more creative than humans then I’m sorry but the test is deeply flawed. I don’t even need to look at the methodology to confidently state that.
https://en.wikipedia.org/wiki/Torrance_Tests_of_Creative_Thi...
If this is restoring the entire context (and looking at the source code, it seems like it is just reloading the entire context) how does this not result in an infinite compaction loop?
Also, it can be useful to compact before it is strictly necessary to compact (before you are at max context length). So there could be a case where you decide you need to "undo" one of these types of early compactions for some reason.
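One hypothetical way to make an early compaction undoable: snapshot the full transcript to disk before replacing it with a summary, and restore the latest snapshot on "undo". This is a sketch of the idea under that assumption, not the project's actual mechanism.

```python
# Sketch: checkpoint the transcript before compacting, restore it on undo.
import json
import time
from pathlib import Path

CHECKPOINT_DIR = Path(".context-checkpoints")

def compact(transcript: list[dict], summary: str) -> list[dict]:
    CHECKPOINT_DIR.mkdir(exist_ok=True)
    path = CHECKPOINT_DIR / f"{int(time.time())}.json"
    path.write_text(json.dumps(transcript))          # keep the pre-compaction state
    return [{"role": "system", "content": summary}]  # the compacted context

def undo_last_compaction() -> list[dict]:
    latest = max(CHECKPOINT_DIR.glob("*.json"), key=lambda p: p.stat().st_mtime)
    return json.loads(latest.read_text())            # reload the saved transcript
```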
I see a possible paradox here.
For exploration, my goal is _to learn_. Trying out multiple things is not wasting time, it's an intensive learning experience. It's not about finding what works fast, but understanding why the thing that works best works best. I want to go through it. Maybe that's just me though, and most people just want to get it done quickly.
That's cute
Again with this "supercharging" nonsense? Maybe in Satya's confabulated AI-powered universe, but not in the real world, I'm afraid...
Some people in the organization will experience the limitations and some will learn — although there are bound to be people elsewhere in the organization who have a vested interest in not learning anything and pushing the product regardless.
How is this different from Google's Jules? Both seem like experimental, exploratory things.
Whether these new helpers that explore ideas on their own are helpful or not, and for which cases, is another discussion.
"Please don't post comments saying that HN is turning into Reddit. It's a semi-noob illusion, as old as the hills."
Talking about downvoting also violates the guidelines (meta-discussions are boring and repetitive), so this comment could arguably succumb, haha. If so, it won't be the first or the last time a comment of mine gets poorly received!
Gatekeepers who claim otherwise have something to sell.
Picking a Linux distribution is a commitment, and if I want to change it out I have a lot of work to do, much of which is unwinding my own work that was distro dependent.
Changing out an agent setup is as easy as installing an IDE, and if I don’t like it I can go back easily. My work is not dependent on the setup - the value I get is transactional, and the quirks of each model or agent approach are not difficult to learn or live without.
All of which suggests that selling ease of use to someone like me will be pointless. I’m sure there are clueless F500 managers out there who might go for it. But a business model based on selling to people who don’t know anything isn’t very durable.
I started using Linux before there were distros (circa 1993) and it was not a pleasant experience compared to when Slackware came out
Define "full potential".
Sounds like you are just making things up to sell your product.
By "full potential" I mean getting the best possible results. For example, being able to work on tasks in parallel without Claude instances interfering with each other vs., well, not doing so.
I wonder why you think we need rover? What is the use case? I'm confused.
Before this, we asked an AI through a chat feature; now we need a whole swarm of AIs. Why, though?
Secondarily, it makes it easier for everyone to share those best practices and tooling among us, but that is less of an issue because we are a small team.
The Austrian army already switched to LibreOffice for security reasons; we don't need another spyware and code-stealing tool.
There are many many people who want better AI coding tools, myself included. It might or might not fail, but there is a clear and strong opportunity here, that it would be foolish of any large tech company to not pursue.
I would say it’s more the result of anti-competitive bundling of cloud things into existing enterprise contracts than of the wave. Microsoft is far worse than it ever was in the 90s, but there’s no semblance of antitrust action in America.