The main reason is that if there's a significant bug or large optimization in progress, that shit needs to be done, tested, and merged before building more stuff on top. Otherwise you risk wasting time, tokens, and effort on a bunch of parallel work that may not end up compatible.
Lately I've had a lot more success with this loop: Claude generates a plan, the plan goes to Codex for co-validation/amendments, Claude implements the plan, Codex reviews the commit PR-style (and likely makes some edits of its own), and then I test the changes myself.
Meanwhile, my actual management of what I'm asking them to do is just a text file in Notepad where I'll write things like "BUG: xyz thing does abc" or "IDEA: let's change this to that" as I'm testing in-app, with the actual code open in Notepad++ tabs. (lol feel free to roast me. I'm in front of two screens, one Windows (primary) and one Mac (to the right), sharing a keyboard and mouse. LLMs are 99% on the Mac; planning, testing, verification, manual coding, and graphic design happen on Windows; both machines commit and push to a repo they each have checked out.)
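For anyone who wants to turn a notes file like that into a queue, here's a rough sketch. It assumes one note per line with a BUG: or IDEA: prefix. The names `Note`, `parse_notes`, and `bugs_first` are made up for illustration, not anything from a real tool. It puts bugs ahead of ideas, in line with the "fix and merge before building on top" point.

```python
import re
from typing import List, NamedTuple


class Note(NamedTuple):
    kind: str  # "BUG" or "IDEA"
    text: str  # the free-form description after the prefix


# Assumed format: "BUG: ..." or "IDEA: ..." at the start of a line.
NOTE_RE = re.compile(r"^\s*(BUG|IDEA)\s*:\s*(.+?)\s*$")


def parse_notes(raw: str) -> List[Note]:
    """Pick out prefixed lines; anything else in the file is ignored."""
    notes = []
    for line in raw.splitlines():
        match = NOTE_RE.match(line)
        if match:
            notes.append(Note(match.group(1), match.group(2)))
    return notes


def bugs_first(notes: List[Note]) -> List[Note]:
    """Stable sort: bugs ahead of ideas, original order kept within each kind."""
    return sorted(notes, key=lambda note: note.kind != "BUG")
```

You'd read the text file, call `bugs_first(parse_notes(contents))`, and hand the items to the model one at a time instead of pasting the whole file.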
I haven't yet found a scenario where many Claudes and many Codexes running simultaneously on 35 concurrent features makes any sense, but I'd definitely encourage people to try multi-model cooperation since they all seem to have different sensibilities. I haven't made much use of Gemini in this context though because two's company, three's a crowd. YMMV.
I expected this to become less necessary over time as models got faster, but the opposite has happened. It feels like Claude has actually gotten slower (but in fairness does more per prompt), meaning worktrees are even more essential now.
I think what's important is that you keep tasks and increments small and atomic, and merge whenever possible. Too many hanging worktrees can quickly become a nightmare to manage.
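One way to keep hanging worktrees in check is to periodically list the ones whose branch is already merged. A minimal sketch, assuming the base branch is called `main` and using `git worktree list --porcelain` plus `git branch --merged`. The helper names are hypothetical.

```python
import subprocess
from typing import Dict, List, Set


def parse_worktrees(porcelain: str) -> List[Dict[str, str]]:
    """Parse `git worktree list --porcelain`: blank-line-separated key/value blocks."""
    entries, current = [], {}
    for line in porcelain.splitlines():
        if not line.strip():
            if current:
                entries.append(current)
                current = {}
            continue
        key, _, value = line.partition(" ")
        current[key] = value  # flag lines such as "detached" get an empty value
    if current:
        entries.append(current)
    return entries


def merged_worktrees(entries: List[Dict[str, str]], merged: Set[str], base: str) -> List[str]:
    """Paths of worktrees whose branch is already merged into the base branch."""
    prefix = "refs/heads/"
    paths = []
    for entry in entries:
        ref = entry.get("branch", "")
        name = ref[len(prefix):] if ref.startswith(prefix) else ref
        if name and name != base and name in merged:
            paths.append(entry["worktree"])
    return paths


def git(*args: str) -> str:
    return subprocess.run(["git", *args], check=True,
                          capture_output=True, text=True).stdout


def stale_worktrees(base: str = "main") -> List[str]:
    # Assumption: run from inside the repository; "main" is the base branch.
    entries = parse_worktrees(git("worktree", "list", "--porcelain"))
    merged = set(git("branch", "--merged", base,
                     "--format=%(refname:short)").split())
    return merged_worktrees(entries, merged, base)
```

Anything `stale_worktrees()` returns is a candidate for `git worktree remove`, but check each one by hand before deleting.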
It's not perfect; I've had some issues with Claude Code forgetting where it did things ("oh... it's not working because I'm not in the right directory"). I think it needs some architectural tweaks to function more reliably.
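A cheap guard against the wrong-directory problem is to check the working directory before running anything. This is only a sketch under my own assumptions: `is_inside` and `require_worktree` are hypothetical helpers, and you'd have to wire them into your own scripts or hooks. Nothing here is built into Claude Code.

```python
from pathlib import Path


def is_inside(path, root) -> bool:
    """True if `path` is `root` itself or somewhere beneath it."""
    try:
        Path(path).resolve().relative_to(Path(root).resolve())
        return True
    except ValueError:
        return False


def require_worktree(expected_root) -> None:
    """Fail loudly instead of silently running in the wrong checkout."""
    cwd = Path.cwd()
    if not is_inside(cwd, expected_root):
        raise RuntimeError(f"expected to be inside {expected_root}, but cwd is {cwd}")
```

Calling `require_worktree(...)` at the top of a build or test script turns a confusing failure into an immediate, readable error.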