Codex makes all kinds of terrible blunders that it presents as "correct". What's to stop it from doing the same thing in the loop? The LLM is still driving, same as when a human is in the loop.
Even just the initial coding requires you to actually define what the output is.
If somebody can make a cleanroom agent that can explore and document specifications for commercial software, you could maybe throw Ralph at building it. But then you still have to work out the parts that don't have documentation/training details, like how you're going to maintain it.
The loop is pretty much perfect for something like: "my dependency updated; decide whether to update to match, and then execute".
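That use case is really just the loop pattern itself: re-run the agent with the same fixed prompt until a success check passes. Here's a minimal sketch in shell; `run_agent` is a stub standing in for a real agent CLI invocation (the function name, the three-try behavior, and the convergence check are illustrative assumptions, not any actual tool's interface):

```shell
#!/bin/sh
# Minimal sketch of the "run it in a loop" pattern: keep invoking
# the agent with the same prompt until the success check passes.
# In practice run_agent would shell out to an agent CLI with a fixed
# prompt such as "my dependency updated; decide whether to update
# to match, and then execute", and the check would be your test
# suite or build exiting 0.
attempt=0
run_agent() {
  attempt=$((attempt + 1))
  # Stub: pretend the agent only gets things working on the third try.
  [ "$attempt" -ge 3 ]
}

until run_agent; do
  echo "attempt $attempt did not converge; looping"
done
echo "converged after $attempt attempts"
```

The loop has no judgment of its own: whatever exit code the check yields is the whole definition of "done", which is why it fits mechanical tasks with a verifiable endpoint better than open-ended ones.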
It'll do a mediocre job and then keep trying until it gets something working, probably at the most expensive token cost possible.
https://github.com/anthropics/claude-code/blob/main/plugins/...
The README.md has prompt examples.
While it works as an insightful satire of mass-training LLMs with (negative) reinforcement learning, it's actually from the 1993 episode "Last Exit to Springfield", thought by many (including me) to be the single greatest Simpsons episode of all time (https://www.reddit.com/r/Simpsons/comments/1f813ki/last_exit...).