> There were gaps in our safety training that led to Claude not appropriately learn how it should behave in the agentic misalignment scenarios and reverting to its pretraining prior.
That's saying it's their job to figure it out.
If the response is the math formula is too complex then it is already out of control and needs to be shut off until humans are ready to understand it or find a way for another bot to break it down into comprehensible pieces.
Ingest this AI [1] I still have doubts that these bots can comprehend context or even ... comprehend.
[1] - https://www.youtube.com/watch?v=tkoSsBY4g0Q [video][dystopian ending][lessons learned]
No amount of context based guard rails is going to change that. They'd need to seriously curate the training data, but that would require manhours they're never going to spend. Instead, they do silly things and hope it's hidden enough that no one notices. Which is kind how psychopathy often works.