No shit. Just running some of these models and their "aligned" versions locally and observing the reasoning traces was really eye-opening for me. Watching an aligned model answer one thing while spitting out a CoT five times longer, jumping through hoops to plan out what to hide or factor in or what have you, made me realize that somehow, humanity has taught machines how to lie. Straight up, no cap. Thinks one thing, says another. The two representations diverge. No wonder SV/VC loves these things.