Adding verification steps to AI agents made them worse. I tested it 29 times (substack.com)

2 points by chrisdudek 6 hours ago | 1 comment

nareyko 6 hours ago
[dead]
- chrisdudek 5 hours ago
  I suspected that, given some of the responses the agent gave during the experiments. It sometimes seemed tendentious when measuring.
  - nareyko 4 hours ago
    [dead]
    chrisdudek 4 hours ago
    I also tried to avoid that: the main orchestrating agent ran subagents that ran the experiments, while other agents checked the results. It worked to some extent, since the setup did not force only successes, and there were more failures.