I do wonder how far this scales once you hit non-deterministic tool behavior though. In a lot of agent setups the hard part isn’t parallelism, it’s knowing whether a tool actually did what it claimed.
A review loop helps, but if the reviewer is also an LLM you can end up with two layers of probabilistic validation.
Have you experimented with any deterministic verification (tests, schema checks, etc.) inside the loop?
If you're looking to see what that feedback mechanism might look like in action, you might want to check out another project I've worked on that pre-dates Orc: https://github.com/spencermarx/open-code-review
Love where your head is at though! DEFINITELY an important problem we've got to solve here.
What I am most interested in is the gap between "the agent completed the task" and "the system can actually prove the task was completed correctly." That is where a lot of multi-agent setups still feel fragile to me.
LLM review definitely helps, but I think it gets much stronger when it is paired with deterministic checks in the loop, even simple ones like executable smoke tests, schema validation, contract checks, or replayable fixtures. Otherwise you can end up with persuasive agreement rather than real verification.
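To make that concrete, here is a minimal sketch of the shape I mean. Everything in it is hypothetical (the field names, the stubbed reviewer): the point is only the ordering, where deterministic gates run first and the probabilistic reviewer can only approve an output that has already passed them.

```python
# Hypothetical sketch: deterministic checks gating an LLM review.
# Field names and the reviewer stub are illustrative, not a real API.

def schema_check(output: dict) -> bool:
    """Deterministic gate: result must carry these fields with these types."""
    required = {"status": str, "artifact_path": str, "tests_passed": bool}
    return all(isinstance(output.get(k), t) for k, t in required.items())

def smoke_test(output: dict) -> bool:
    """Deterministic gate: a cheap executable check on the claimed result."""
    return output.get("tests_passed") is True

def llm_review(output: dict) -> bool:
    """Stub for the probabilistic reviewer; in practice this calls a model."""
    return True  # assume the reviewer is persuaded

def verify(output: dict) -> bool:
    # Deterministic checks run first, so a persuasive LLM verdict
    # can never override a failed schema or smoke test.
    if not all(check(output) for check in (schema_check, smoke_test)):
        return False
    return llm_review(output)

result = {"status": "done", "artifact_path": "out/patch.diff", "tests_passed": True}
print(verify(result))  # True: passes both gates and the review
```

Even checks this simple change the failure mode: an agent that merely *claims* success fails the schema or smoke gate before any LLM gets a chance to agree with it.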