1 pointby SashaCui10 hours ago1 comment

SashaCui10 hours ago
We introduce an automated activation‑steering approach that plugs into standard labeled datasets—no handcrafted prompt pairs or feature annotation. On 18 tasks and 3 open‑weight models, the introspective variant (iPAS) yields the strongest behavior improvements, and layers on top of ICL/SFT.
Deep Dive: https://open.substack.com/pub/sashacui/p/painless-activation... Paper: arxiv.org/abs/2509.22739