And I don't mean "fails to help." My steering-only approach achieved 24.4% valid JSON, compared to 86.8% from the completely untrained base model. Steering made the model worse than doing nothing at all.
Here's what I learned, why it matters, and what actually works when you need guaranteed structured outputs from decoder-only language models.