If you ask Agent1 to write code, the PR diff coming out of it should really be reviewed by Agent2 (or 3 or whatever), the reviewer should be a different system than the code gen agent.
The core claim is that a model is a poor judge of its own work — it shares the same priors and blind spots that produced a bug, so self-review trends toward a shallow review rather than catching blind spots.
The system that writes the code shouldn't also be the thing that signs off that it's safe to ship.