114 points by AffableSpatula 5 hours ago | 21 comments
  • joshribakoff3 hours ago
    This is just subagents, built into Claude. You don’t need 300,000-line tmux abstractions written in Go. You just tell Claude to do work in parallel with background subagents. It helps to have a file for handing off the prompt, tracking progress, and reporting back. I also recommend constraining agents to their own worktrees. While nearly everyone is building orchestrators, I noticed Claude is already the best orchestrator for Claude; I am writing down the pattern here: https://workforest.space
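
    For flavor, the handoff file in that pattern can be something as simple as this (an illustrative sketch, not the exact format from the site):

        # HANDOFF: add-rate-limiting
        Worktree: ../wt/add-rate-limiting (branch: feat/rate-limiting)
        Prompt: implement per-key rate limiting in src/middleware/; do not touch auth.
        Progress:
          - [x] middleware skeleton
          - [ ] tests
        Report: (subagent writes a summary and open questions here when done)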
    • apsurd27 minutes ago
      OT: Your visual on "stacked PRs" instantly made me understand what a stacked PR is. Thank you!

      I had read about them before but for whatever reason it never clicked.

      Turns out I already work like this, but I use commits as "PRs in the stack" and I constantly try to keep them up to date and ordered by rebasing, which is a pain.

      Given my new insight from the way you displayed it, I had a chat with ChatGPT and feel good about giving it a try:

          1. 2-3 branches based on a main feature branch.
          2. You can rebase the base branch with the same frequency; just don't overdo it, and conflicts should stay isolated to the base.
          3. You're doing it wrong if conflicts cascade deeply and often.
          4. Yes, merge order matters, but tools can help, and generally the isolation is the important piece.
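
      For anyone else to whom it hasn't clicked yet, the picture is roughly a chain of small branches, each reviewed as its own PR:

          main <- feature-base <- part-1 <- part-2 <- part-3
                                  (PR 1)    (PR 2)    (PR 3)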
    • stingraycharlesan hour ago
      It’s even less of a feature: Claude Code already has subagents; this new feature just ensures Claude Code actually uses them for implementation.

      imho the plans Claude Code writes are not detailed enough to pull this off; they’re delegating to preserve context, but the plans don’t carry nearly enough detail for that to be reliable.

      • dceddia41 minutes ago
        Interesting about the level of detail. I’ve noticed that myself but I haven’t done much to address it yet.

        I can imagine some ideas (ask it for more detail, ask it to make a smaller plan and add detail to that) but I’m curious if you have any experience improving those plans.

        • stingraycharles7 minutes ago
          I’m trying to solve this myself by implementing a whole planner workflow at https://github.com/solatis/claude-config

          Effectively it tries to resolve all ambiguities by making all decisions explicit; if a decision can’t be resolved from documentation or any other source, it’s put to the user.

          It also tries to capture all “invisible knowledge” by documenting everything, so that all these decisions and business context are captured in the codebase again.

          Which, in theory, should make long-term coding with LLMs more sane.

          The downside is that it takes 30-60 minutes to write a plan, but it’s much less likely to make silly choices.
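
          As a toy example, a captured decision ends up looking something like this (illustrative; not the exact format from the repo):

              ## Decision: pagination for /orders
              - Ambiguity: cursor vs. offset pagination
              - Resolution: cursor; offset breaks under concurrent writes
              - Source: no existing docs, so the user was asked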

    • AffableSpatula2 hours ago
      Claude already had subagents. This is a new mode for the main agent to be in (bespoke context oriented to delegation), combined with a team-oriented task system and a mailbox system for subagents to communicate with each other. All integrated into the harness in a way that plugins can't achieve.
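
      Nobody outside Anthropic has published the exact internal shapes, but conceptually you can think of it as something like this (a hypothetical TypeScript sketch, not the real implementation):

          // Hypothetical sketch only; the real types inside Claude Code
          // are not public and certainly differ.
          interface TeamTask {
            id: string;
            description: string; // what the teammate should do
            assignee: string;    // subagent name
            status: "pending" | "in_progress" | "done";
          }

          interface MailboxMessage {
            from: string;   // sending agent
            to: string;     // receiving agent (often the lead)
            taskId: string; // which TeamTask this concerns
            body: string;   // progress report, question, or result
          }

          // The lead plans, delegates TeamTasks, and wakes on incoming
          // MailboxMessages to synthesize results instead of writing code itself.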
    • mkw5053an hour ago
      Yeah, since they introduced (possibly async) subagents, I've had my main Claude instance act as a manager overseeing implementation agents, keeping its context clean, and ensuring everything goes to plan at the highest quality.
      • AffableSpatulaan hour ago
        Yep, this is exactly how I use the main agent too; I explicitly instruct it to only ever use background async subagents. Not enough people understand that the Claude Code harness is event-driven now and will wake up whenever these subagent completion events happen.
  • birken39 minutes ago
    I'd really like to see a regular poll on HN that keeps track of which AI coding agents are the most popular among this community, like the TIOBE Index for programming languages.

    Hard to keep up with all the changes and it would be nice to see a high level view of what people are using and how that might be shifting over time.

    • samsolomon27 minutes ago
      Not this community's opinion on agents, but I've found it helpful to check the lmarena leaderboards occasionally. Your comment prompted me to take a look for the first time in a while. Kind of surprising to see models like MiniMax 2.1 above most of the OpenAI GPTs.

      https://lmarena.ai/leaderboard/code

      Also, I'm not sure if it's exactly the case but I think you can look at throughput of the models on openrouter and get an idea of how fast/expensive they are.

      https://openrouter.ai/minimax/minimax-m2.1

    • fragmede21 minutes ago
      Question is, are people on HN procrastinating and commenting here because the agent isn't very good and they're avoiding having to write the code themselves, or is the agent so good that it's off writing code, and the people here are commenting out of boredom?
      • nonethewiser13 minutes ago
        >Question is, are people on HN procrastinating and commenting here because the agent isn't very good and they're avoiding having to write the code themselves

        Can you help me envision what you're saying? It's async, so you will have to wait whether it's good or not. And in theory, the better it is, the more time you'd have to comment here, right?

      • thevinter9 minutes ago
        You're making it sound like before agents existed HN was a ghost town because everyone was too busy building ImportantThingTM by hand
  • czhu12an hour ago
    The problem I’ve been having is that when Claude generates copious amounts of code, it makes it way harder to review than small snippets one at a time.

    Some would argue there’s no point reviewing the code, just test the implementation and if it works, it works.

    I still am kind of nervous doing this in critical projects.

    Does anyone just YOLO code for projects that aren't meant to be one-offs, but are fully intended to be supported for a long time? What are the learnings after 3-6 months of supporting it in production?

    • serial_dev26 minutes ago
      In a professional setting where you still have coding standards, people review your code, and the code actually reaches hundreds of thousands of real users, handling one agent at a time is plenty for me. The code output is never good enough, and it makes stuff up even for moderately complicated debugging ("Oh, I can clearly see the issue now"; I've heard it ten times before and you were always wrong!).

      I do use them, though; they help me search, understand, narrow down, and ideate. It's still a better Google, and the experience is getting better every quarter, but people letting tens or hundreds of agents just rip... I can't imagine doing it.

      For personal throwaway projects that you do because you want to reach the end output (as opposed to learning or caring), sure, do it: verify it roughly works and be done with it.

    • gen22044 minutes ago
      In my (admittedly conflict-of-interest: I work for Graphite/Cursor) opinion, asking CC to stack changes and then having an automated reviewer agent helps a lot with digesting and building conviction in otherwise-large changesets.

      My "first pass" of review is usually me reading the PR stack in graphite. I might iterate on the stack a few times with CC before publishing it for review. I have agents generate much of my code, but this workflow has allowed me to retain ownership/understanding of the systems I'm shipping.

    • AstroBen39 minutes ago
      I think we'll start to see the results of that late this year, but it's a little early yet. Plenty of people are diving headfirst into it

      To me it feels like building your project on sand. Not a good idea unless it's a sandcastle

    • idontwantthisan hour ago
      I just can’t get with this. There is so much beyond “works” in software. There are requirements that you didn’t know about and breaking scenarios that you didn’t plan for, and if you don’t know how the code works, you’re not going to be able to fix it. Even assuming an AI could fix any problem given a good enough prompt, I can’t write that prompt without sufficient knowledge of and experience in the codebase. I’m not saying they are useless, but I cannot just prompt, test, and ship a multiservice, asynchronous, multi-DB, zero-downtime app.
  • Androider3 hours ago
    Looks like agent orchestrators provided by the foundation model providers will become a big theme in 2026. Wrapping it in terms already used in software development today, like team leads and team members, rather than inventing a completely new taxonomy of Polecats and Badgers, will help make it more successful and understandable.
    • bloppe2 hours ago
      Respectfully disagree. I think polecats are a reasonable antidote to overanthropomorphization.
  • neom4 hours ago
    Claude Code in the desktop app seems to do this? It's crazy to watch. It sets off these huge swarms of worker readers under master task headings, which go off and explore the code base and compile huge reports and todo lists; then another system behind the scenes seems to compile everything into large master schemas/plans. I create helper files and then have a devops chat, a front end chat, an architecture chat and a security chat, and once each one has done its work it automatically writes to a log and the others pick up the log (it seems to have a system-reminder process built in that can push updates from one chat into the others). It's really wild to watch it work, and it's very intuitive and fun to use. I've not tried CLI Claude Code, only Claude Code in the desktop app, but the desktop app plus SFTP to a droplet, with SSH for it to use the terminal, is a very, very interesting experience. It can seem to just go for hours building, fixing, checking its own work, loading its work in the browser, doing more work, etc., all on its own. It's how I built this in 3 days: https://news.ycombinator.com/item?id=46724896
    • jswny3 hours ago
      That’s just spawning multiple parallel explore agents instructed to look at different things, and then compiling results

      That’s pretty basic functionality in Claude Code.

      • neom3 hours ago
        Sounds like I should probably switch to claude code cli. Thanks for the info. :)
    • deaux3 hours ago
      Sounds very similar to oh-my-opencode.
  • basedrum3 hours ago
    How is this different from GSD: https://github.com/glittercowboy/get-shit-done

    I've been using that and it's excellent

    • nonethewiser10 minutes ago
      GSD was the first project management framework I used. Initially I loved it because it felt like I was so much better organized.

      As time went on I felt like the organization was kind of an illusion. It demanded something from me and steered Claude, but ultimately Claude is doing whatever it's going to do.

      I went back to just raw-dogging it with lots of use of planning mode.

    • djfdat2 hours ago
      It really boils down to the benefits of first-party software from a company with billions of dollars of funding vs. similar third-party software from an individual with no funding.

      GSD might be better right now, but will it continue to be better in the future, and are you willing to build your workflows around that bet?

    • AffableSpatula3 hours ago
      a similar question was asked elsewhere in the thread; the difference is that this is tightly integrated into the harness
  • wild_pointer4 hours ago
    Listen, team lead and the whole team: make this button red.
    • brookst3 hours ago
      Principal engineers! We need architecture! Marketing team, we need ads with celebrities! Product team, we need a roadmap to build on this for the next year! ML experts, get this into the training and RL sets! Finance folks, get me annual forecasts and ROI against WACC! Ops, we’ll need 24/7 coverage and a guarantee of five nines. Procurement, lock down contracts. Alright everyone… make this button red!
    • AffableSpatula4 hours ago
      ha! The default system prompt appears to give the main agent appropriate guidance about only using swarm mode when appropriate (same as entering itself into plan mode). You can further prompt it in your own CLAUDE.md to be even more resistant to using the mode if the task at hand isn't significant enough to warrant it.
      • vorticalbox33 minutes ago
        I like opencode for the fact I can switch between build and plan mode just by pressing tab.
        • thevinter2 minutes ago
          Isn't it the same in base claude-code?
  • rco8786an hour ago
    Is this significantly different from the subagents that are already in CC?
  • MetaMonk3 hours ago
    A guy who worked at Docker on Docker Swarm now works at Anthropic, so this makes sense.
  • bpavukan hour ago
    Hey, that's exactly how I made Gemini 2.5 Flash give useful results in opencode! A few specialized "Merc" subagents and a "Master" agent that can do nothing but send "Mercs" into the codebase.
  • bakugoan hour ago
    > You're not talking to an AI coder anymore. You're talking to a team lead. The lead doesn't write code - it plans, delegates, and synthesizes.

    Even 90 word tweets are now too long for these people to write without using AI, apparently.

    • jen729wan hour ago
      I wonder how much 'listening' to an LLM all day affects one's own prose? Mimicry is in the genes…
      • flkiwian hour ago
        I accidentally gave my wife a prompt the other day. Everything was hellishly busy and I said something along the lines of “I need to ask you a question. Please answer the question. Please don’t answer any other issues just yet.” She looked at me and asked “Did you just PROMPT me?” We laughed. (The question was the sort that might spawn talking about something else and was completely harmless. In the abstract, my intent was fine but my method was hilariously tainted.)
      • Jweb_Guruan hour ago
        It affects it very heavily IME. People need to make sure they are getting a good mix of writing from other sources.
    • AffableSpatulaan hour ago
      You're absolutely right! I apologise — hopefully you can forgive me.
  • mohsen13 hours ago
    Everyone is wrapping Claude Code in tmux and claiming they're a magician. I am not so good at marketing, but I've done this here: https://github.com/mohsen1/claude-code-orchestrator

    Mine also rotates between Claude and Z.ai accounts as they run out of credits.

    • AffableSpatula3 hours ago
      I think you've misunderstood what this is.
      • mohsen13 hours ago
        Sorry, you're right. I went through the code and understand now. I'm going to try the patch. Claude Code doing teamwork natively would be amazing!

        Honestly, if people in AI coding wrote less hype-driven content and just wrote what they mean, I would really appreciate it.

    • bicx3 hours ago
      Well good sir, I _am_ a tmux magician.
  • svara2 hours ago
    I'm a fan of AI coding tools but the trend of adding ever more autonomy to agents confuses me.

    The rate at which a person running these tools can properly review and comprehend the output is already saturated by a single thread with a human in the loop.

    Which implies that this is not intended to be used in a setting where people will be reading the code.

    Does that... Actually work for anyone? My experience so far with AI tools would have me believe that it's a terrible idea.

    • nilamo37 minutes ago
      It works for me, in that I don't care about all the intermediate babble AI generates. What matters is the final changelist before hitting commit: going through that, editing it, fixing comments, etc. But holding its hand while it deals with LSP issues, like a logger not being visible sometimes, is just not something I see a reason to waste my time on.
      • vorticalbox29 minutes ago
        After I have written a feature and I'm in the bug-ironing-out stage, this is where I like the agents to do a lot of the grunt work; I don't want to write JSDocs or fix this lint issue.

        I have also started using them for writing tests.

        I will write the first test, the “good path”; it can copy this and tweak the inputs to trigger all the branches far faster than I can.
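
        Something like this (Jest-style; `applyDiscount` is a made-up example function). I write the first case by hand and the agent clones it for the other branches:

            import { applyDiscount } from "./pricing"; // hypothetical module

            // The hand-written "good path" test...
            test("applies a percentage discount", () => {
              expect(applyDiscount(100, { kind: "percent", value: 10 })).toBe(90);
            });

            // ...which the agent copies, tweaking inputs to hit other branches.
            test("rejects negative discounts", () => {
              expect(() =>
                applyDiscount(100, { kind: "percent", value: -5 })
              ).toThrow();
            });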

    • ttulan hour ago
      Yes, this actually works. In 2026, software engineering is going to change a great deal as a result, and if you're not at least experimenting with this stuff to learn what it's capable of, that's a red flag for your career prospects.

      I don't mean this in a disparaging way. But we're at a car-meets-horse-and-buggy moment and it's happening really quickly. We all need to at least try driving a car and maybe park the horse in the stable for a few hours.

    • pton_xdan hour ago
      > The rate at which a person running these tools can review and comprehend the output properly is basically reached with just a single thread with a human in the loop.

      That's what you're missing -- the key point is, you don't review and comprehend the output! Instead, you run the program and then issue prompts like this (example from simonw): "fix in and get it to compile" [0]. And I'm not ragging on this at all, this is the future of software development.

      [0] https://gisthost.github.io/?9696da6882cb6596be6a9d5196e8a7a5...

      • vunderba29 minutes ago
        I've commented on this before, but issuing a prompt like "Fix X" makes so many assumptions (it's a "behaviorist" approach to coding), including that the bug manifests in an externally and consistently detectable way, and that you notice it in the first place. TDD can reduce this but not eliminate it.

        I do a fair amount of agentic coding, but always periodically review the code even if it's just through the internal diff tool in my IDE.

        Approximately 4 months ago Sonnet 4.5 wrote this buried deep in the code while setting up a state machine for a 2d sprite in a relatively simple game:

          // Pick exit direction (prefer current direction)
          const exitLeft = this.data.direction === Direction.LEFT || Math.random() < 0.5;
        
        I might never have even noticed the logical error but for Claude Code attaching the above misleading comment (as written, a sprite moving left always exits left, but a sprite moving right still exits left half the time, so the "preference" only works one way). 99.99% of true "vibe coders" would NEVER have caught this.
    • plagiaristan hour ago
      Based on Gas Town, the people doing this agree that they are well beyond an amount of code they can review and comprehend. The difference seems to be they have decided on a system that makes it not a terrible idea in their minds.
    • IAmGraydonan hour ago
      No, it doesn't work in practice because they make far too many mistakes.
  • reilly3000an hour ago
    This no doubt takes some inspiration from mcp_agent_mail https://github.com/Dicklesworthstone/mcp_agent_mail
  • dlojudice4 hours ago
    It feels like Auto-GPT, BabyAGI, and the like were simply ahead of their time
    • woeirua3 hours ago
      Had to wait for the models to catch up...
  • engates4 hours ago
    Isn't this pretty much what Ruv has been building for like two years?

    https://github.com/ruvnet/claude-flow

    • AffableSpatula4 hours ago
      The difference is that this is tightly integrated into the harness. There's a "delegation mode" (akin to plan mode) that appears to clear out the context for the team lead. The harness appears to be adding system-reminder breadcrumbs into the top of the context to keep the main team lead from drifting, which is much harder to achieve without modifying the harness.
      • estearum4 hours ago
        It's insane to me that people choose to build anything in the perimeter of Claude Code (et al). The combination of their fairly primitive current state and the pace at which they're advancing means there are a lot of very obvious ideas and low-hanging fruit that will soon be executed 100x better by the people who own the core technology.
        • AffableSpatula3 hours ago
          Yeah, I tend to agree. They must be reaching the point where they can automate the analysis of Claude Code prompts to extract techniques and build them directly into the harness. Going up against that is brave!
  • nehalem4 hours ago
    Answering the question how to sell more tokens per customer while maintaining ~~mediocre~~ breakthrough results.
    • AffableSpatula3 hours ago
      Delegation patterns like swarm lead to less token usage because:

      1. Subagents doing work have a fresh context (i.e. focused, not working on top of a larger monolithic context).

      2. Subagents enjoying a more compact context leads to better reasoning, more effective problem solving, and fewer tokens burned.
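
      As a toy illustration, with completely made-up numbers:

          lead-only:  20 calls x ~100k-token context     ≈ 2.0M input tokens
          delegated:  lead: 5 calls x ~20k               ≈ 0.10M
                      3 subagents x 6 calls x ~15k each  ≈ 0.27M
                      total                              ≈ 0.37M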

      • nulone3 hours ago
        Merge cost kills this. Does the harness enforce file/ownership boundaries per worker, and run tests before folding changes back into the lead context?
        • AffableSpatula3 hours ago
          I don't know what you're referring to, but I can say with confidence that I see more efficient token usage from a delegated approach, for the reasons I stated, provided that the tasks are correctly sized. YMMV of course :)
  • lysace4 hours ago
    I'm already burning through enough tokens and producing more code than can be maintained, with just one Claude worker. Feel like I need to move in the other direction: more personal, hands-on "management".
    • AffableSpatula3 hours ago
      I've seen more efficient use of tokens by using delegation. With a single main agent, unless you continually compact it, or summarise and clear it, you end up doing work on top of a large context, burning tokens. If the work is delegated to subagents, they have a fresh context, which avoids this while improving their reasoning; both improve token efficiency.
      • storystarling3 hours ago
        I've found the opposite to be true when building this out with LangGraph. While the subagent contexts are cleaner, the orchestration overhead usually ends up costing more. You burn a surprising amount of tokens just summarizing state and passing it between the supervisor and workers. The coordination tax is real.
        • AffableSpatula3 hours ago
          Task sizing is important. You can address this by including guidance around it in the CLAUDE.md, i.e. give it heuristics to use to figure out how to size tasks. Mine includes some heuristics and a T-shirt sizing methodology. Works great!
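
          Roughly along these lines (a paraphrased sketch, not the verbatim file):

              ## Task sizing for delegation
              - S: single file, no design decisions -> one subagent, fire and forget
              - M: 2-5 files in one module -> one subagent plus a handoff file
              - L: crosses modules or needs design decisions -> split into S/M tasks first
              - Never delegate anything you can't describe in three sentences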
          • xpean hour ago
            Management is dead. Long live management.
    • stuaxo3 hours ago
      If there's any kind of management, some of it could use small local models, e.g. to see when it looks like it's stuck.
  • tom29483294944 hours ago
    And… how?
    • AffableSpatula4 hours ago
      The feature is shipped in the latest builds of Claude Code, but it's turned off by a feature-flag check that phones home to the backend to see whether the user's account is meant to have it on. You can just patch out the function in the minified cli.js that does this backend check, and you gain access to the feature.
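
      The mechanics are roughly this; the actual minified identifier is build-specific, so treat `checkSwarmGate` below as a made-up stand-in and this as a sketch of the approach rather than a working patch:

          // Sketch only: find the backend gate check in the minified cli.js
          // and rewrite it to always report the feature as enabled.
          import { readFileSync, writeFileSync } from "node:fs";

          const cliPath = "/path/to/claude-code/cli.js";
          let src = readFileSync(cliPath, "utf8");

          // `checkSwarmGate` is hypothetical; the real function name differs
          // per build, so you'd locate it by searching for the flag string.
          src = src.replace(
            /checkSwarmGate\(\)\{[^}]*\}/,
            "checkSwarmGate(){return true}"
          );

          writeFileSync(cliPath, src);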
      • bonsai_spool3 hours ago
        Do you know what patch to apply? The Github link from the OP seems to have a lot of other things included.
        • mohsen13 hours ago
        • AffableSpatula3 hours ago
          It's my repo; it's a fork of cc-mirror, which is an established project for parallel Claude installs. I wanted to take the least disruptive approach for the sake of using working code and not spelunking through bugs. Having said that, if you look through the latest commits you'll see how the patch works; it's pretty straightforward, and you could do it by hand if you wanted.
  • codethief4 hours ago
    • dang2 hours ago
      Thanks! We'll put those links in the toptext.