So the author thinks he's giving Claude this instruction:
> You must re-read CLAUDE2.md, even if you've already read it before.
But the actual instruction is closer to:
> Do not re-read files you have already read. You must re-read CLAUDE2.md, even if you've already read it before.
So Claude has conflicting instructions. Is it any surprise that it tries to thread the needle by re-reading the minimal amount of CLAUDE2.md necessary? It's just doing its best to satisfy both masters!
Similarly, I'm trying to stop agents from "gracefully" handling errors by stuffing results with empty junk and continuing (get_list_of_problems().unwrap_or_default() -> "no problems found!"). I've filled AGENTS.md with "fail closed", "extremely strict error handling", "no fallbacks", "don't use sentinel values", and hundreds of variations of these, but they work about as well as "do not hallucinate". I get "You're absolutely right, this will cause problems!" and then the fix is "changed to Err(_) => String::new()". I suspect it's another case of gaming RL: failing early and loudly increases the chance of failing and being penalized, so fudging data, ignoring errors, and presenting a barely-working result is a better strategy overall. When it fails, it fails anyway, but as long as it stumbles to the finish line it has a non-zero chance of getting accepted by the RL judge.
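To make the complaint concrete, here is a minimal sketch of the two behaviours side by side. Only the name get_list_of_problems and the unwrap_or_default() call come from the comment above; the ScanError type, the scanner_ok flag, and both report functions are made up for illustration.

```rust
use std::fmt;

// Hypothetical error type, not from the original comment.
#[derive(Debug, PartialEq)]
struct ScanError(String);

impl fmt::Display for ScanError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "scan failed: {}", self.0)
    }
}

impl std::error::Error for ScanError {}

// Stand-in for the real check. `scanner_ok` is an assumption that lets
// the sketch simulate a failing scanner.
fn get_list_of_problems(scanner_ok: bool) -> Result<Vec<String>, ScanError> {
    if scanner_ok {
        Ok(vec!["unused variable `x`".to_string()])
    } else {
        Err(ScanError("scanner crashed".to_string()))
    }
}

fn summarize(problems: &[String]) -> String {
    if problems.is_empty() {
        "no problems found!".to_string()
    } else {
        format!("{} problem(s) found", problems.len())
    }
}

// Fail-open: the error is replaced by an empty Vec, so a crashed scanner
// is indistinguishable from a clean result.
fn report_fail_open(scanner_ok: bool) -> String {
    let problems = get_list_of_problems(scanner_ok).unwrap_or_default();
    summarize(&problems)
}

// Fail-closed: `?` propagates the error, so the caller has to deal with it.
fn report_fail_closed(scanner_ok: bool) -> Result<String, ScanError> {
    let problems = get_list_of_problems(scanner_ok)?;
    Ok(summarize(&problems))
}

fn main() {
    // The scanner is broken in both calls.
    println!("fail-open:   {}", report_fail_open(false));
    match report_fail_closed(false) {
        Ok(report) => println!("fail-closed: {report}"),
        Err(e) => eprintln!("fail-closed: {e}"),
    }
}
```

With a broken scanner, the fail-open version cheerfully reports a clean result, while the fail-closed version hands back the error. That difference is what the "no fallbacks" instructions are asking for.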
I then had it make a mistakes file and write down every mistake, so it would learn. It kinda worked, but it would still make the same mistakes. It clearly wasn't reading all of it.
So I made a checklist, and it had to verify every item on it. That was my workaround for both the laziness and the short-sightedness of the agents: turn mistakes into items to check for. I traded processing time for better results, which is OK for me on smaller projects. My run times went from 3 minutes to 5-10 minutes per task, so I need to start logging task effectiveness/efficiency to reduce processing time.
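For what it's worth, the "verify every item" rule can be sketched as a gate that only passes when every item has been explicitly checked off. This is just an illustration of the idea, not my actual setup; Status, Item, and verify_checklist are hypothetical names.

```rust
// Hypothetical types for illustration only.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Status {
    Unchecked,
    Passed,
    Failed,
}

struct Item {
    description: &'static str,
    status: Status,
}

// Returns Ok only if every item is Passed. An item that was skipped
// (Unchecked) blocks the task just like a Failed one, so the agent
// cannot get through by ignoring part of the list.
fn verify_checklist(items: &[Item]) -> Result<(), Vec<&'static str>> {
    let unresolved: Vec<&'static str> = items
        .iter()
        .filter(|item| item.status != Status::Passed)
        .map(|item| item.description)
        .collect();

    if unresolved.is_empty() {
        Ok(())
    } else {
        Err(unresolved)
    }
}

fn main() {
    let checklist = [
        Item { description: "no unwrap_or_default() on fallible calls", status: Status::Passed },
        Item { description: "no sentinel values in results", status: Status::Unchecked },
        Item { description: "errors are propagated, not swallowed", status: Status::Failed },
    ];

    match verify_checklist(&checklist) {
        Ok(()) => println!("checklist complete"),
        Err(unresolved) => {
            eprintln!("task rejected, unresolved items:");
            for description in unresolved {
                eprintln!("  - {description}");
            }
        }
    }
}
```

Treating Unchecked the same as Failed is the point: it is the checklist equivalent of failing closed.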
I keep seeing people say loop engineering is the way to get around these issues. I guess I'm kinda doing that in an ad hoc way, since I'm already looking at adding cost and goals (kinda).