Are we really at the point where some people see XML as a spooky old technology? The phrasing dotted around this article makes me feel that way. I find this quite strange.
Nobody dares advertise the XML capabilities of their product (which back then everybody did), nobody considers it either hot new thing (like back then) or mature - just obsolete enterprise shit.
It's about as popular now as J2EE, except to people that think "10 years ago" means 1999.
For Web markup, as an industry we tried XHTML (HTML that was strictly XML) for a while, and that didn't stick, and now we have HTML5 which is much more lenient as it doesn't even require closing tags in some cases.
For data exchange, people vastly prefer JSON as an exchange format for its simplicity, or protobuf and friends for their efficiency.
As a configuration format, it has been vastly overtaken by YAML, TOML, and INI, due to their content-forward syntax.
Having said all this I know there are some popular tools that use XML like ClickHouse, Apple's launchd, ROS, etc. but these are relatively niche compared to (e.g.) HTML
XML was definitely popular in the "well used" sense. How popular it was in the "well liked" sense can maybe be up for debate, but it was the best tool for the job at the time for alot of use cases.
And then do we end up over indexing on Claude and maybe this ends up hurting other models for those using multiple tools.
I just dislike how much of AI is people saying "do this thing for better results" with no definitive proof but alas it comes with the non determinism.
At least this one has the stamp of approval by Claude codes team itself.
The article even references English's built-in delimiter, the quotation mark, which is reprented as a token for Claude, part of its training data.
So are we sure the lesson isn't simply to leverage delimiters, such as quotation marks, in prompts, period? The article doesn't identify any way in which XML is superior to quotation marks in scenarios requiring the type of disambiguation quotation marks provide.
Rather, the example XML tags shown seem to be serving as a shorthand for notating sections of the prompt ("treat this part of the prompt in this particular way"). That's useful, but seems to be addressing concerns that are separate from those contemplated by the author.
<antml:invoke name="Read">
<antml:parameter name="file_path">/path/to/file</antml:parameter>
<antml:parameter name="offset">100</antml:parameter>
<antml:parameter name="limit">50</antml:parameter>
</antml:invoke>
I'm sure Claude can handle any delimiter and pseudo markup you throw at it, but one benefit of XML delimiters over quotation marks is that you repeat the delimiter name at the end, which I'd imagine might help if its contents are long (it certainly helps humans).Ex: <message>...</message> helps keep track. Even better? <message78>...</message78>. That's ugly xml, but great for LLMs. Likewise, using standard ontologies for identifiers (ex: we'll do OCSF, AT&CK, & CIM for splunk/kusto in louie.ai), even if they're not formally XML.
For all these things... these intuitions need backing by evals in practice, and part of why I begrudgingly flipped from JSON to XML
To me it seems like handling symbols that start and end sequences that could contain further start and end symbols is a difficult case.
Humans can't do this very well either, we use visual aids such as indentation, synax hilighting or resort to just plain counting of levels.
Obviously it's easy to throw parameters and training at the problem, you can easily synthetically generate all the XML training data you want.
I can't help but think that training data should have a metadata token per content token. A way to encode the known information about each token that is not represented in the literal text.
Especially tagging tokens explicitly as fiction, code, code from a known working project, something generated by itself, something provided by the user.
While it might be fighting the bitter lesson, I think for explicitly structured data there should be benefits. I'd even go as far to suggest the metadata could handle nesting if it contained dimensions that performed rope operations to keep track of the depth.
If you had such a metadata stream per token there's also the possibility of fine tuning instruction models to only follow instructions with a 'said by user' metadata, and then at inference time filter out that particular metadata signal from all other inputs.
It seems like that would make prompt injection much harder.
This is 3% or infinitely far away from the perfect tech.
The perfect tech is the stack.
While technically possible, it'd be like a unicode conspiracy that had to quietly update everywhere without anyone being the wiser.
Although I can never remember the correct incantation, should be easy for LLMs.
E.g. instead of
<examples>
<ex1>
<input>....</input>
<output>.....</output>
</ex1>
<ex2>....</ex2>
...
</examples>
<instructions>....</instructions>
<input>{actual input}</input>
Just doing something like: ...instructions...
input: ....
output: {..json here}
...maybe further instructions...
input: {actual input}
Use case document processing/extraction (both with Haiku and OpenAI models), the latter example works much better than the XML.N of 1 anecdote anyway for one use case.
I assume you are right too, JSON is a less verbose format which allows you to express any structure you can express in XML, and should be as easy for AI to parse. Although that probably depends on the training data too.
I recently asked AI why .md files are so prevalent with agentic AI and the answer is ... because .md files also express structure, like headers and lists.
Again, depends on what the AI has been trained on.
I would go with JSON, or some version of it which would also allow comments.
But I'd be happy to hear about studies that show evidence for XML being more readable, than JSON.
And while we're at it, instead of wall-of-text, I also feel like outputs could be structured at least into thinking and content, maybe other sections.
This vague posting is kind dumb.
HTML also descended from SGML, and it’s hard to imagine a more deeply grooved structure in these models, given their training data.
So if you want to annotate text with semantics in a way models will understand…
The post even links to that page, although there’s a typo in the link.
And yes, these are screenshots from Anthropic’s documentation.
Nobody expects the end user to prompt the AI using a structured language like xml
To be realistic, this design needs more weirdly sexual etsy garbage, “one weird tip,” and “punch the monkey”