I think it would be valuable to list all of the dev team's interactions with the LLM and transparently state what was induced by humans steering the LLM versus what was an actual LLM decision, unbiased by system instructions or by the dev team communicating with it.
Agreed. Color me skeptical. All of the interactions and decisions described are plausible, but in my experience with AI agents, they would require frequent human intervention.