I built a UI to manage this, and it is starting to turn into a new type of IDE, based around agent management and review rather than working on one thing at a time.
For more important stuff, like anything that falls under my supervision, I will test the branch and carefully check the implementation, and I do this again for each update to the PR. That takes a lot longer.
So I’m wondering: how do you context switch between many agents running and proposing diffs, especially if you need to vet the changes? And how do you manage module dependencies, where an update by one task can subtly influence the implementation by another?
I’m wondering this too. But from what I have seen, I think most people doing this are not really reading and vetting the output. Just faster, parallelized vibe coding.
Not saying that’s what parent is doing, but it’s common.
As for people doing parallel work because they want stuff to “happen faster”, I am convinced most of them don’t really read (nor probably understand) the code it produces.
Honestly, I've seen too many fairly glaring mistakes in all models I've tried that signal that they can't even get the easy stuff right consistently. In the language I use most (C++), if they can't do that, how can I trust them to get all the very subtle things right? (e.g. very often they produce code that holds some form of dangling references, and when I say "hey don't do that", they go back to something very inefficient like copying things all over the place).
I am very grateful they can churn out a comprehensive test suite in gtest though and write other scripts to test / do a release and such. The relief in tedium there is welcome for sure!
I think there are opportunities to give special handling to the markdown docs and diagrams Claude likes to make along the way, to help with review.
I would argue you haven't covered any.
Why not just skip the reviews then? If you can trust the models to have the necessary intelligence and context to properly review, they should be able to properly code in the first place. Obviously not where models are at today.
Here we are talking about the same model doing the review (even if you use a different model provider, it's still trained on essentially the same data, with the same objective and very similar performance).
We have had agentic systems where one agent checks the work of another for 2+ years now. This isn't a paradigm pushed by AI coding model providers because it doesn't really work that well; review is still needed.
But that was two weeks ago; maybe it’s different today
Right now, background agents have two major problems:
1. There is some friction in getting the isolated environment working correctly. Difficulty depends on the specifics of each project, ranging from "select this universal container" to "it's going to be hell getting all of your dependencies working". Working in your IDE pretty much solves that - it's likely a place where everything is already set up.
2. People need to learn how agents build code. Watching an agent work in your IDE while being able to interject/correct them is extremely helpful to long term success with background agents.
My preferred way to vibe code is to lock in on a single goal and iterate towards it. When I'm waiting for stuff to finish, I'm exploring docs or info to figure out how to get closer. Reviewing the existing codebase or changes is also super useful for me to grasp where I'm up to and what to do next. This idea of managing swarms of agents for different tasks does not gel with me, too much context switching and multitasking.
Side note: You should look into electron-trpc. It greatly simplifies IPC handling.
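Rough sketch of what that looks like (assuming electron-trpc's createIPCHandler/ipcLink API and a tRPC v10-style client; adjust for the versions you're actually on):

```ts
// main.ts (Electron main process): define a typed router and wire it to IPC.
// Assumes electron-trpc's createIPCHandler/ipcLink API; the preload script also
// needs to call exposeElectronTRPC() from 'electron-trpc/main'.
import { initTRPC } from '@trpc/server';
import { BrowserWindow } from 'electron';
import { createIPCHandler } from 'electron-trpc/main';
import { z } from 'zod';

const t = initTRPC.create();
export const router = t.router({
  greet: t.procedure.input(z.string()).query(({ input }) => `Hello, ${input}`),
});
export type AppRouter = typeof router;

export function attachIpc(win: BrowserWindow) {
  createIPCHandler({ router, windows: [win] });
}

// renderer.ts: fully typed calls into the main process, no hand-rolled ipcRenderer channels.
//
// import { createTRPCProxyClient } from '@trpc/client';
// import { ipcLink } from 'electron-trpc/renderer';
// import type { AppRouter } from './main';
//
// const client = createTRPCProxyClient<AppRouter>({ links: [ipcLink()] });
// const msg = await client.greet.query('world');
```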
Sounds like you're limiting yourself to users who are comfortable paying a $100-200 monthly subscription, or even thousands per month at API prices.
CC is expensive, but I was hoping we weren't going to build tooling that exacerbates this issue simply because money is less of an issue for some of us than for most of us.
So yes it might feel expensive in terms of a personal monthly budget, but the value for money is insane.
Having a nice way to manage the worktrees sounds great, but rate limiting still sounds like an issue with this approach.
https://docs.anthropic.com/en/docs/claude-code/common-workfl...
One must also always be aware that an LLM WILL ALWAYS DO what you ask of it. Often you ask for the wrong thing, and then you need to rethink.
Maybe I am inefficient, though I really only use at most two additional worktrees at the same time.
What? That's not my experience at all. Especially not "always"
I cannot count how many times that or something like that has happened to me.
Don't get me wrong, I'm a big fan and constant user of all these things, but I would say they frequently have problems following prompts.
Personally, I'm running 2 accounts and switching between them for maximum productivity. Just as a function of what my time is worth, it's a no-brainer.
https://github.com/stravu/crystal/actions/runs/15791009893/a...
Thanks for your help, now I'll be able to include Linux support in my next release
- Top-tier `git worktree`-based context switching in the same IDE window.
- A framework for attaching terminal-based agents to each worktree branch (see the sketch after this list). Eventually this should evolve into a better open protocol for integration, primarily for diffs, permission request notifications, and progress indicators.
- A sidebar that monitors agent status/notifications on each active worktree branch.
- A quick notification-style way of responding to agent prompts across all branches. This has been built in standalone agent manager tools, but I can't use those tools effectively when I need to quickly jump in and be an engineer.
- Branch-context-level association with browser test windows or mobile emulator/simulator instances.
- Strong code completion via other, faster models, a great extension ecosystem with lots of language server support, and everything else that makes a high-quality IDE.
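To make the worktree piece concrete, here's the rough shape of the plumbing I have in mind, one worktree per task branch with an agent process attached to each (the layout and the `claude` invocation are placeholders, not a real integration):

```ts
// worktrees.ts: spin up an isolated git worktree per task and attach an agent process to it.
// Assumes git >= 2.5 and a claude-style CLI on PATH; paths and flags are illustrative only.
import { execFileSync, spawn } from 'node:child_process';
import * as path from 'node:path';

export function startAgentTask(repoRoot: string, branch: string, prompt: string) {
  const dir = path.join(repoRoot, '.worktrees', branch);

  // Create the branch and its dedicated working directory.
  execFileSync('git', ['worktree', 'add', '-b', branch, dir], { cwd: repoRoot });

  // Attach a terminal-based agent to that directory; stdio is inherited here,
  // but a real manager would capture it to drive the status/notification sidebar.
  const agent = spawn('claude', ['-p', prompt], { cwd: dir, stdio: 'inherit' });

  agent.on('exit', (code) => {
    console.log(`[${branch}] agent exited with code ${code}; review the diff before merging`);
  });

  return agent;
}
```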
Right now, I'm managing multiple macOS desktops with different instances of Windsurf running Claude agents in-terminal, and web browser windows / mobile emulators/simulators are dragged into the respective desktops for each instance. It's clunky.
I tried, unsuccessfully, to write a plugin for VSCode that would let Claude run a tool to jump me to the file and line it was editing. It sorta worked but kept hanging.
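For anyone who wants to pick this up: the editor half is only a few lines; the part that kept hanging for me was the bridge that lets Claude actually invoke the command (the command id and wiring here are made up, not a real extension):

```ts
// extension.ts: a VSCode command that jumps to a file/line reported by the agent.
// "claudeBridge.revealLocation" is a hypothetical command id; how the agent calls it
// (URI handler, local socket, file watcher...) is left out, and was the flaky part.
import * as vscode from 'vscode';

export function activate(context: vscode.ExtensionContext) {
  const cmd = vscode.commands.registerCommand(
    'claudeBridge.revealLocation',
    async (filePath: string, line: number) => {
      const doc = await vscode.workspace.openTextDocument(vscode.Uri.file(filePath));
      const editor = await vscode.window.showTextDocument(doc, { preview: false });
      const pos = new vscode.Position(Math.max(0, line - 1), 0);
      editor.selection = new vscode.Selection(pos, pos);
      editor.revealRange(new vscode.Range(pos, pos), vscode.TextEditorRevealType.InCenter);
    }
  );
  context.subscriptions.push(cmd);
}
```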
The current state of having multiple editors open, or having to switch between JetBrains stuff and Cursor is really a bit of an annoying transition period (I hope).
Claude Code is fully agentic, meaning you give it a task and it implements everything end to end, producing surprisingly good, working code. It can test, commit, run commands, log in to remote systems, and debug anything.
It doesn't optimise for token usage, which Cursor does heavily; that's why it can produce higher-quality code on the first shot (the downside is that the cost is very high).
Cursor's agent mode is very much in its infancy, still catching up. Cursor is essentially a tool for editing files, whereas Claude Code is like a junior developer.
Cursor will suggest and complete code for you inline. You just tab-complete your way to a written function. It's mad.
Claude Code doesn't do this.
Cursor also has much better awareness of TypeScript. It'll fix errors as they occur, and you can right-click an issue and have it fixed.
Contrast with CC where I've had to specify in CLAUDE.md to "NEVER EVER leave me with TS errors", and to do this it runs a CLI check using its integration, taking way longer to do the same thing.
I noticed that CC’s generated Go code nowadays is very solid. No hallucination recently that I can remember or that struck me. I do see YouTube videos of people working with JS/TS still struggling with this, which is odd, since there is way more training material for the latter. Perhaps the simplicity of Go shines here.
CC might generate Go code for which there are already library functions present. So thorough code reviews are a necessity.
Much as I dislike Go, it is indeed probably closer to the ideal language for the LLM. But I suspect that we need to dial it down even further, e.g. no type inference whatsoever (so no := etc). In fact I wonder if forcing the model to spell out the type of every subexpression as a type assertion might be beneficial due to the way LLMs work, for the same reason why prompting for explicit chain-of-thought improves outputs even with models not specifically trained to produce CoT. In a similar vein, it could require fully qualified names for all library functions etc. But it also needs to have fewer footguns, which Go has aplenty - it's possible to ignore error returns, concurrency is unsafe, etc. I suspect message passing a la Erlang might be the best bet there, but this is just a gut feel.
Of course, the problem with any hypothetical new PL optimized for LLMs is that there's no training data for it. To some extent this can be mitigated by mechanically converting existing code - e.g. mandatory fully qualified names and explicit type assertions for subexpressions could be easily bolted onto any existing statically typed language.
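To make that concrete with a language that already exists, here's the same function in idiomatic TypeScript versus the hypothetical fully-explicit style I have in mind (whether the extra verbosity actually helps the model is exactly the open question):

```ts
interface Order { price: number; qty: number; }

// Idiomatic TypeScript: the types of all intermediates are inferred.
function totalInferred(orders: Order[]) {
  return orders.map(o => o.price * o.qty).reduce((a, b) => a + b, 0);
}

// Hypothetical "LLM-friendly" style: every parameter and intermediate carries an
// explicit type, so the model has to state what it thinks each value is,
// much like prompting for an explicit chain of thought.
function totalExplicit(orders: ReadonlyArray<Order>): number {
  const lineTotals: number[] = orders.map((o: Order): number => o.price * o.qty);
  const total: number = lineTotals.reduce(
    (acc: number, line: number): number => acc + line,
    0,
  );
  return total;
}
```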
If I’m wrong I’d be overjoyed! But I have it installed and have seen no hint of this.
Otherwise CC has been stellar, and I love that it's a CLI + optional VS Code extension.
I am using the Cursor agent mode, which can run in auto mode with, let's say, 50 consecutive tool calls, along with editing and other tasks. It can operate autonomously for 30 minutes and complete a given task. I haven't tried Claude Code yet, but I'm curious—what exactly does Claude Code do differently compared to the Cursor agent?
Is the improvement in diff quality solely because Cursor limits the context size, or are there other factors involved?
I couldn't get cursor agent to do useful stuff for me - might be because I don't do TS or Python - and Claude Code was a big productivity boost almost from day one. You just tell it to do stuff, and it just... does it. At like the level of a college student.
Coming back to an implementation that has good test coverage, functions exactly as specified, and is basically production-ready is achievable through planning/specs.
Maybe Cursor can do this now as well, but it was just so far behind last time I tried it.
This has been exactly my experience. I guess one slightly interesting thing is that my “junior developer” here will get better with time, but not because of me.
In terms of performance, their agents differ. The base model their agents use is the same, but, for example, how they look at your codebase, how they decide to farm tasks out to lesser models, and how they connect to tools all differ.
But from an agent perspective, Claude Code is much more tuned to understanding the task, breaking it down into small steps, and executing those steps with precision.
Overall, IMO agentic coding is great for well-defined tasks, especially when they're backed by tests. It still falls short, though, in deep technical discussions and in being opinionated about architectural decisions, unless specifically nudged in a certain direction. This is an area where Gemini Pro excels, but it sucks as a coding agent. So I use both: Gemini Pro for high-level design, and Claude Code for executing the plan by giving it clear requirements. All while making some edits myself using Cursor Tab.
#browser_navigate https://news.ycombinator.com/
I had to do some installing and setup to get playwright to work. Now, how to get the agent to use the working playwright on its own is a different matter.
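For reference, what the navigation step ends up doing is roughly this; a minimal Playwright sketch (assuming `npm i playwright` and `npx playwright install chromium` have been run):

```ts
// navigate.ts: roughly what a browser_navigate style tool does via Playwright.
import { chromium } from 'playwright';

async function main() {
  const browser = await chromium.launch({ headless: true });
  const page = await browser.newPage();
  await page.goto('https://news.ycombinator.com/');
  console.log(await page.title()); // sanity check that navigation actually worked
  await browser.close();
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});
```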
I can't comment on why you're having this issue specifically, unfortunately.
Hopefully you got it worked out by now?
Glad there is some competition.
Is it just me, or does that seem really invasive?
> Auto-installation: When you launch Claude Code from within VSCode’s terminal, it automatically detects and installs the extension
Yesterday I burned 15€ (10€ of it free credit) trying Amp, and I gotta say I was impressed.
The next few years are going to be interesting.
This slowly rewires how we approach code. We stop worrying about syntax early, we write more scaffolds, we batch tasks more. A subtle shift, but a huge long-term effect.
How soon before we start designing codebases for LLM agents to navigate more cleanly? Flat structures, less indirection, more declarative metadata.
This is something that I have been mulling over since I heard reports that LLMs work very well with languages like Go (explicit static typing, simple syntax, only 1 way to do things...)
It seems like with humans, the less we have to worry about the incidental complexity imposed by the tools we are using (language, framework, lib...), the more brain bandwidth we have available to solve the actual problem with code.
Maybe something like https://flix.dev/ with many analyzers.
It's already happening. Armin Ronacher started writing more Go instead of Python because the models understand it better. My coworker switched to writing a desktop app in Rust, because the agent can navigate it better thanks to the stronger tooling and type system.
People are already thinking about how to write documentation for AI instead of for other people, etc.
Features:
- Auto-installation: When you launch Claude Code from within VSCode’s terminal, it automatically detects and installs the extension
- Selection context: Selected text in the editor is automatically added to Claude’s context
- Diff viewing: Code changes can be displayed directly in VSCode’s diff viewer instead of the terminal
- Keyboard shortcuts: Support for shortcuts like Alt+Cmd+K to push selected code into Claude’s prompt
- Tab awareness: Claude can see which files you have open in the editor
- Configuration: Set diff tool to auto in /config to enable IDE integration features
Has this been fixed? Does the vscode Claude Code plugin retain prompts more reliably?
I did try to get Claude Desktop to send comms to Claude Code, but got stuck on a few things related to the terminal emulation in Windows.
I have session list, load, and save tools. If a character is embodied that is working on a project, that goes in the session information, and the character is loaded (embodied) when you start a new session. Making characters is done with the character generator tool, which strongly randomizes traits. Traits can relate to the ability (or inability) to run tools. Why have a personality in the AI? Because it keeps it fun and changes the tone of the code commenting and planning. And it affects tool runs...
> We are Groot! completely deadpan delivery while already analyzing the situation
There are notes on projects (folders), and any files it creates for planning usually go in /notes in the folder.
Claude Code does have some ability to save sessions, but I don't edit it much myself. That would be a better job for Claude Desktop.
If I remember correctly, this is even true between instances of Claude on the same terminal.
I am in a container, so if I close or rebuild my container, obviously that's gone.
As a long-time IntelliJ user, I’m beginning to question whether it still makes sense to remain on this platform.
Perhaps I’m too impatient, and agentic plugins may reach parity on IntelliJ within a year, but a year is quite a long time to wait in this really fast-evolving landscape.
The intellij plugin in beta: https://plugins.jetbrains.com/plugin/27310-claude-code-beta-...
IntelliJ and PyCharm are both Apache 2, IntelliJ for sure supports many languages, and I'll keep the commentary about the last item to myself
What are you building that doesn’t compete with Anthropic? (Using your brain competes with Anthropic) — major legal risk
How do we justify accepting the lack of privacy on Claude? Is it just for people doing FOSS? You’re cool with them reading your business codebase to verify you aren’t using your brain?
Given it is logically impossible to not compete with general intelligence, and that I expect private github repos to remain private, I feel forced to think Claude Code is a nerd snipe / bad joke / toy
Claude Code stores feedback transcripts for only 30 days and has "clear policies against using feedback for model training":
Privacy safeguards
We have implemented several safeguards to protect your data, including limited retention periods for sensitive information, restricted access to user session data, and clear policies against using feedback for model training.