> The model will respond with a JSON object that strictly follows your schema
Gemini is listed as a model supporting structured output, and yet its failure rate is 0.39% (Gemini 2.0 Flash)!! I get that structured output has a high performance cost, but advertising it as supported when in reality it isn't is a massive red flag.
Worse yet, response healing only fixes JSON syntax errors, not schema adherence. This is only mentioned at the end of the article, which people are clearly not going to read.
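To make that concrete (a hypothetical example, not from the article): syntax healing can close a missing brace, but a schema validator will still reject the result.

    import json
    from jsonschema import ValidationError, validate

    schema = {
        "type": "object",
        "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
        "required": ["name", "age"],
    }

    raw = '{"name": "Alice", "age": "thirty"'  # wrong type AND missing brace
    healed = raw + "}"                         # syntax healing fixes the brace...
    doc = json.loads(healed)                   # ...so it parses fine now

    try:
        validate(doc, schema)                  # ...but schema validation still fails
    except ValidationError as err:
        print(err.message)                     # "'thirty' is not of type 'integer'"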
WTF
llguidance is used in vLLM, SGLang, internally at OpenAI, and elsewhere. At the same time, I also see a non-trivial JSON error rate from Gemini models in large-scale synthetic generations, so perhaps Google hasn't seen the "llight" yet and is using something less principled.
1: https://guidance-ai.github.io/llguidance/llg-go-brrr
2: https://github.com/guidance-ai/llguidance
Gemini [0] is falsely advertising this:
> This capability guarantees predictable and parsable results, ensures format and type-safety, enables the programmatic detection of refusals, and simplifies prompting.
[0]: https://ai.google.dev/gemini-api/docs/structured-output?exam...
If part of my system can't even manage to output JSON reliably, it needs way more "healing" than syntax munging. This comes across as naive.
Don't you worry about Planet Express, let me worry about blank.
Isn't this exactly how we got weird HTML parsing logic in the first place, with "autohealing" logic for mismatched closing tags or quotes?
{"name": "Alice", "age": 30
The standard LLM output would have stopped there because the LLM emitted an end-of-sequence (EOS) token. But because that would lead to a syntax error in JSON, the EOS token would have probability zero, and the model would be forced to either extend the number "30", add more entries to the object, or end it with "}".

I haven't played much with structured output, but I imagine the biggest risk is that you may force the model to work with contexts outside its training data, leading it to produce garbage, though hopefully syntactically-correct garbage.
I don't understand, though, why the probability of incorrect JSON wouldn't go to 0 under this framework (unless you hit the max sequence length before the JSON ended). The post implies that JSON errors still happen, so it's possible they're doing something else.
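For anyone curious what that masking step looks like in code, here's a minimal sketch of the idea (toy token ids and vocab size, not Gemini's or llguidance's actual implementation):

    import torch

    def sample_constrained(logits: torch.Tensor, allowed_ids: list[int]) -> int:
        """Mask every token the JSON grammar disallows, then sample as usual."""
        mask = torch.full_like(logits, float("-inf"))
        mask[allowed_ids] = 0.0
        probs = torch.softmax(logits + mask, dim=-1)  # disallowed tokens get probability 0
        return int(torch.multinomial(probs, 1))

    # After `{"name": "Alice", "age": 30` a JSON grammar would allow more digits,
    # ',' or '}' -- but never EOS -- so the model is forced to keep emitting tokens.
    logits = torch.randn(32_000)            # made-up vocab size
    allowed_ids = [16, 17, 18, 1919, 335]   # made-up ids for digit, ',' and '}' tokens
    next_token = sample_constrained(logits, allowed_ids)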
The content of your posts is really insightful and interesting, but it feels like junk quality because of the way LLMs write blog posts.
What was your prompt?
This sounds AI written.
Why do you have an expectation that a company will disclose to you when they use AI for their copywriting? Do you want them to disclose the software they used to draft and publish? If a manager reviewed the blog post before it went live?
Plagiarism is bad for a lot of reasons, all of which also apply to the undisclosed use of generative AI.
Basically, I'm asking for open source blogging!
I don't like this future we're going towards where we have to trick our software (which we can no longer understand the workings of) into doing what we tell it to by asking it nicely, or by putting another black box on the end to "fix" the output. This is the opposite of engineering. This is negotiation with a genie trapped in silicon.
we have brilliant machines that can more or less work perfectly
then the scam artists have convinced people that spending a trillion dollars and terawatts to get essentially a biased random number generator to produce unusable garbage is somehow an improvement
I think where you see it fall down is in logical domains that rely on relative complexity and contextual awareness in a different way. I've had less luck, for example, having AI systems parse and break down a spreadsheet with complex rules. That's simply recent memory.
So, the software that you learned on is changing. You aren't going crazy, but the ground is indeed shifting. The problem is that you assumed it couldn't shift because you were applying the wrong constraints.
The other idea was a bit more theoretical: if you know only a handful of tokens are valid, then calculating the logits of the other tokens in the forward pass is wasteful, as they won't affect the sampling process. However, it's probably not worth the cost to optimize that, as it only affects the last layer and might be mostly amortized by SIMD parallel processing anyway.
I think there are lots of boilerplate sequences like '":{' or '":[' or '", "', etc - though they might already be compressed into a single token if the tokenizer was trained on enough JSON.
There are also situations where the schema would only allow a specific field name as the next token, e.g. if it was the only remaining valid and required field, or if fields have to be output in a specific order.
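A toy sketch of what restricting the final projection to the legal tokens could look like (made-up dimensions and token ids, not any real framework's code):

    import torch

    # Restrict the final (unembedding) projection to the grammar-legal tokens
    # instead of computing all vocab_size logits.
    hidden_dim, vocab_size = 512, 32_000
    W_unembed = torch.randn(vocab_size, hidden_dim)  # final projection weights
    hidden = torch.randn(hidden_dim)                 # hidden state for the next position

    allowed_ids = torch.tensor([101, 2045, 9001])    # the few tokens the schema permits

    full_logits = W_unembed @ hidden                      # vocab_size dot products, mostly wasted
    restricted_logits = W_unembed[allowed_ids] @ hidden   # only len(allowed_ids) dot products

    probs = torch.softmax(restricted_logits, dim=-1)
    next_id = allowed_ids[torch.multinomial(probs, 1)].item()

    # If the schema leaves exactly one legal continuation (say, a required field
    # name in a fixed order), there is nothing to sample at all -- just emit it.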
Seems like OpenRouter also supports structured outputs.
https://openrouter.ai/docs/guides/features/structured-output...
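From a quick look, it appears to ride on the OpenAI-compatible response_format field; a rough sketch (model id and API key are placeholders, check the linked docs for the exact fields):

    import requests

    payload = {
        "model": "google/gemini-2.0-flash-001",
        "messages": [{"role": "user", "content": "Give me a user profile."}],
        "response_format": {
            "type": "json_schema",
            "json_schema": {
                "name": "user",
                "strict": True,
                "schema": {
                    "type": "object",
                    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
                    "required": ["name", "age"],
                },
            },
        },
    }

    resp = requests.post(
        "https://openrouter.ai/api/v1/chat/completions",
        headers={"Authorization": "Bearer <OPENROUTER_API_KEY>"},
        json=payload,
    )
    print(resp.json()["choices"][0]["message"]["content"])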
Maybe people got used to computers being unreliable and unpredictable as the UIs we shipped became more distracting, less learnable, always shifting and hiding information, popping up suggestions and displaying non-deterministic-seeming behavior. We trained users to treat their devices like unruly animals that they can never quite trust. So now the idea of a machine that embodies a more clever (but still unreliable) animal to wrangle sounds like a clear upgrade.
But as someone who's spent an inordinate amount of time tweaking and tuning his computing environment to prune out flakey components and fine-tune bindings and navigation, the idea of integrating a tool into my workflow that does amazing things but fails utterly even 1% of the time sounds like a nightmare, a sort of perpetual torture of low-grade anxiety.
I wish I didn't agree with this, but I think you're exactly right. Even engineers dealing with systems we know are deterministic will joke about making the right sacrifices to the tech gods to get such-and-such working. Take that a step further and maybe it doesn't feel too bad to some people for the system to actually not be deterministic, if you have a way to "convince" it to do what you want. How depressing.