The core idea is that after ~20 autonomous experiments the loop breaks down because the agent random-walks through changes with no research direction and fills its context with noise. The fix isn't a better model, it's a better environment. To turn hill-climbing into autonomous research you need forced hypothesis writing before code edits, rich diagnostics beyond the final metric, and periodic distillation of meta-patterns into a strategy doc.
This is me thinking through the design in public before pointing an agent at my URM-Energy project on a single 3090. The next post will be the results. I'd love to hear feedback on how you’ve successfully applied these ideas to your own work!