None of those formats fully satisfied my curiosity, so RAIF isn't just lighter on syntax; it also treats self-repair as a core principle. LLMs are non-deterministic, and their structured output sometimes comes out corrupted. jsonrepair fixes the issue, but I wanted to push it further and build a format around repairability. QR codes were an inspiration here (they work a bit differently, of course).
The numbers look good so far: about 14% fewer tokens on a worst-case benchmark and up to 35% fewer in normal usage. Repair also works as intended, though I'm still gathering data on repair cases. The most common one so far is output truncation; minor syntax errors are rare.
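To illustrate what truncation repair means in general, here is a minimal sketch for plain JSON, not RAIF itself. The function name `close_truncated_json` is made up for this example, and this is not how jsonrepair or RAIF do it internally; it only tracks which strings and brackets are still open and closes them.

```python
import json


def close_truncated_json(text: str) -> str:
    """Append the closers a truncated JSON document is missing (illustrative only)."""
    stack = []          # closing brackets still owed, innermost last
    in_string = False
    escaped = False
    for ch in text:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "{":
            stack.append("}")
        elif ch == "[":
            stack.append("]")
        elif ch in "}]" and stack:
            stack.pop()

    repaired = text
    if in_string:
        if escaped:
            # Output stopped right after a backslash: drop the dangling escape.
            repaired = repaired[:-1]
        repaired += '"'
    else:
        repaired = repaired.rstrip()
        if repaired.endswith(","):
            repaired = repaired[:-1]      # trailing comma before the cut
        elif repaired.endswith(":"):
            repaired += " null"           # key with no value yet
    return repaired + "".join(reversed(stack))


broken = '{"name": "Ada", "tags": ["x", "y'
print(json.loads(close_truncated_json(broken)))
```

This only covers the easy cases. A cut in the middle of an object key, a number, a literal like `true`, or a `\u` escape would still fail to parse, which is part of why a format designed for repair from the start is appealing.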
It's worth mentioning that RAIF works only as a LoRA, but I liked the results even with the small Qwen2.5-0.5B model: with the LoRA, it produces noticeably more stable structures than the base model did on its own. Medium-sized models handle RAIF even better and switch cleanly from JSON to RAIF via LoRAs without any artifacts.
I see RAIF as useful for any self-hosted agent or LLM, especially subagents that run smaller models.
It's still very much an experiment, so any feedback and ideas are welcome.