Show HN: Xmloxide – an agent-made Rust replacement for libxml2 (github.com)

54 points by jawiggins 8 hours ago | 13 comments

mdavid626 an hour ago
Can you add “made with AI” to the GitHub repo?
It’s time to make this mandatory.
Nothing against AI; it’s just to inform people about the quality, maintainability, and future of this library. No human has a mental model of the code, so don’t waste your time building one: the original author didn’t either.
- agentifysh 17 minutes ago
  what would be the point? why should this be mandatory?
  none of your arguments make sense here
  - kelnos 15 minutes ago
    GP literally tells you the point in the last paragraph. Makes perfect sense to me.
    agentifysh 12 minutes ago
    and why should that be solved by a mandatory "made by AI" label when pretty much all coding now involves it?
    mdavid626 3 minutes ago
    "Involves" and "made only by" are two different things.
    I use agentic coding in my daily work. I do build a mental model of the code I write, and I also test it exactly the same way as when it's written completely manually.
wooptoo 6 hours ago
A comment on libxml, not on your work: Funny how so many companies use this library in production and not one steps in to maintain this project and patch the issues. What a sad state of affairs we are in.
- jawiggins 6 hours ago
  Yeah, I agree; maintaining OSS projects has been a weird thing for a long time.
  I know a few companies have programs where engineers can designate specific projects as important and give them funds. But it doesn't happen enough to support all the projects that currently need work. Maybe AI coding tools will lower the cost of maintenance enough to improve this.
  I do think there are two possible approaches that policy makers could consider.
  1) There could probably be tax credits or deductions for SWEs who 'volunteer' their time to work on these projects.
  2) Many governments have tried to create cyber reserve corps. I bet they could designate people as maintainers of the key projects they rely on, sustaining both the projects and a pool of people skilled with the tools they deem important.
  - da_chicken 3 hours ago
    There should be public works grants to maintain them, or else a foundation specifically to maintain them funded with donations, grants, etc.
    The alternative is another XZ backdoor.
- ddlsmurf 3 hours ago
  we need a tax on companies using or selling anything OSS, with the funds going into OSS. The wealth it has generated is insane, and it's nearly all just donated expert work.
  - skybrian an hour ago
    That's a bit unclear on the concept. It's not open source if you have to pay for it. How about charging money for your code instead?
    saintfire an hour ago
    Well, that's not strictly true.
    OSS is allowed to make money, and there are projects that require paid licenses for commercial use.
    The source is available and collaborative.
    Qt states this on their site: Simply put, this is how it works: In return for the value you receive from using Qt to create your application, you are expected to give back by contributing to Qt or buying Qt.
    capitol_ an hour ago
    There is nothing in open source licenses that prevents charging money; in fact, non-commercial clauses are seen as incompatible with the Debian Free Software Guidelines.
    And there are a lot of companies out there that make their money from open source software; Red Hat is maybe the biggest and best known.
    skybrian an hour ago
    I meant in the sense that someone else can redistribute the source for free, not that the company has to do it.
    > The license shall not restrict any party from selling or giving away the software as a component of an aggregate software distribution containing programs from several different sources. The license shall not require a royalty or other fee for such sale.
    https://opensource.org/osd
- da_chicken 3 hours ago
  Feels like tragedy of the commons.
  - wrboyce 3 hours ago
    Feels more like you don’t understand the concept of the tragedy of the commons.
    EDIT: Sorry, I’ve had a shitty day and that wasn’t a helpful comment at all. I should’ve said that, as I understand it, TOTC primarily relates to finite resources, so I don’t think it applies here. Sorry again for being a dick.
    em-bee 11 minutes ago
    the finite resource here is the unpaid developer time. everyone takes advantage of it until the developer burns out.
- black_13 6 hours ago
  [dead]
agentifysh 14 minutes ago
lot of weird comments here getting upset that AI was used, but thanks for doing this
libxml2 was always one of those libraries I had trouble with across different platforms
I think it's great that more and more OSS projects are getting attention now with AI coding agents
kburman 6 hours ago
Amazing work! I'd love to hear more details about your workflow with Claude Code.
As a side note (and this isn't a knock on your project specifically), I think the community needs to normalize disclaimers for "vibe-coded" packages. Consumers really need to understand the potential risks of relying on agent-generated code upfront.
- nine_k 3 hours ago
  Even more interesting is how much the effort cost.
  Unlike the development work of old (pre-2025), work with high-end models incurs a very direct monetary cost: one burns tokens, which cost money, and you can't run something as powerful locally (even if you happened to have a Mac Pro Ultra with maxed-out RAM).
  Some of my friends burned through hundreds of dollars a day while doing large amounts of (allegedly efficient) work with Claude Code.
- jawiggins 5 hours ago
  Yeah, it's a fair point. I wondered if it might be irresponsible to publish the package because it was made this way, but I suspect I'm not the first person to try to develop a package with Claude Code, so I think the best I can do is be honest about it.
  As for the workflow, I think the best advice I can give is to set up as many guardrails and tools as possible, so Claude can do as many iterations as possible before needing any intervention. In this case I set up pre-commit hooks for linting and formatting, gave it access to the full test suite, and let it rip. The majority of the work was done in a single thinking loop that lasted ~3 hours, in which Claude was able to run the tests, see what failed, and iterate until they all passed. From there, there were still lots of iterations to add features, clean up, test, and improve performance, but allowing Claude to iterate quickly on its own without my involvement was crucial.
  - kelnos 11 minutes ago
    I don't think it was irresponsible to publish it, but I do think it was irresponsible to publish it without clearly disclosing at the top of the crates.io README that it was built entirely by AI, and that you haven't reviewed the code (assuming you haven't).
    If I were looking for an XML parser/generator library, I might stumble across this and think it might be production-quality, and assume it was built by humans, or at least that humans had fully vetted and understood the code.
  - tonyedgecombe 27 minutes ago
    Yes. If you tripped across this package on crates.io, the README gives the impression of a serious piece of software, but your comments here imply it is a one-off experiment rather than something you plan to maintain for the next decade.
hrtla 3 hours ago
Yes, you can rip off any sucker who published a test suite when the AI is trained on existing code as well. Congratulations, you will be showered with praise and AI mafia money.
alexhans 6 hours ago
> I do think there is something interesting to think about here in how coding agents like Claude Code can quickly iterate given a test suite.
This is a point I've tried to advocate for a while, especially to empower non-coders and show them that we CAN approach automation with control.
Some aspects will be the classic unit or integration tests for validation. Others will be AI Evals [1], which to me could become the common language of product design across families/disciplines that don't quite understand how to collaborate with each other.
The amount of progress in a short time is amazing to see.
- [1] https://ai-evals.io/
- koakuma-chan 5 hours ago
  Please stop spreading this "AI evals" terminology. "evals" is what providers like OpenAI and Anthropic do with their models. If you wrote a test for a feature that uses an LLM, it's just a test, there's no need to say "evals." Having a separate term only further confuses people who already have no idea what that actually means.
blegge 7 hours ago
> arena-based tree with zero unsafe in the public API
Why "in the public API"? Does this imply it's using unsafe behind the hood? If so, what for?
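For context on the quoted phrase: an arena-based tree usually keeps every node in a single `Vec` and links parents and children by index rather than by pointer, which is how such a tree can avoid `unsafe` entirely. A minimal sketch, assuming nothing about xmloxide's real types (`Arena`, `NodeId`, and `add` are hypothetical names):

```rust
/// Handle to a node: a plain index, so nodes never hold references to each other.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct NodeId(usize);

#[derive(Debug)]
struct Node {
    name: String,
    parent: Option<NodeId>,
    children: Vec<NodeId>,
}

/// Every node lives in one `Vec`; parent/child links are indices into it.
#[derive(Debug, Default)]
pub struct Arena {
    nodes: Vec<Node>,
}

impl Arena {
    /// Appends a node and, if it has a parent, records it as that parent's child.
    pub fn add(&mut self, name: &str, parent: Option<NodeId>) -> NodeId {
        let id = NodeId(self.nodes.len());
        if let Some(p) = parent {
            // Bounds-checked: panics (no UB) before any mutation if `p` is invalid.
            self.nodes[p.0].children.push(id);
        }
        self.nodes.push(Node {
            name: name.to_string(),
            parent,
            children: Vec::new(),
        });
        id
    }

    pub fn name(&self, id: NodeId) -> &str {
        &self.nodes[id.0].name
    }

    pub fn parent(&self, id: NodeId) -> Option<NodeId> {
        self.nodes[id.0].parent
    }

    pub fn children(&self, id: NodeId) -> &[NodeId] {
        &self.nodes[id.0].children
    }
}
```

Because indexing is bounds-checked, a `NodeId` from a different arena panics rather than causing undefined behavior; a production design might return `Option` instead.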
- gpm 5 hours ago
  I agree the wording is a bit strange, but a quick grep of the repo shows that it doesn't imply that.
  The only usages of unsafe are in src/ffi, which is only compiled when the ffi feature is enabled. ffi is fundamentally unsafe ("unsafe" meaning "the compiler can't automatically verify this code won't result in undefined behavior") so using it there is reasonable, and the rest of the crate is properly free of unsafe.
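The layout described above can be sketched as follows, assuming a crate with an optional `ffi` Cargo feature; the module and function names are hypothetical, not taken from the xmloxide repo. Using `forbid` rather than `deny` matters here: an inner `#[allow(unsafe_code)]` can override `deny`, but not `forbid`.

```rust
// Safe core: any `unsafe` block in this module is a hard compile error.
mod parser {
    #![forbid(unsafe_code)]

    /// Hypothetical safe entry point: counts '<' bytes as a stand-in for real parsing.
    pub fn count_tags(input: &[u8]) -> usize {
        input.iter().filter(|&&b| b == b'<').count()
    }
}

// C-compatible shim: only compiled with `--features ffi`, and the only
// place where `unsafe` is permitted.
#[cfg(feature = "ffi")]
pub mod ffi {
    /// # Safety
    /// `ptr` must be null or point to `len` readable bytes for the whole call.
    #[unsafe(no_mangle)] // attribute syntax accepted since Rust 1.82
    pub unsafe extern "C" fn hypothetical_count_tags(ptr: *const u8, len: usize) -> usize {
        if ptr.is_null() {
            return 0;
        }
        // SAFETY: the caller upholds the contract documented above.
        let bytes = unsafe { std::slice::from_raw_parts(ptr, len) };
        super::parser::count_tags(bytes)
    }
}
```

Without the feature enabled, the `ffi` module is stripped before type checking, so a default build contains no `unsafe` at all.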
- fulafel 4 hours ago
  It provides a libxml2-compatible C API that accepts pointers, which would seem to necessitate at least some unsafe.
- DetroitThrow 6 hours ago
  Yeah I'm a bit confused because you can have an entirely unsafe code base with just the public interface marked as safe. No unsafe in the interface isn't a measure of safety at all.
  - mirashii 6 hours ago
    It is a measure of the intended level of care that the users of your interface have to take. If there's no unsafe in the interface, then that implies that the library has only provided safe interfaces, even if it uses unsafe internally, and that the interface exposed enforces all necessary invariants.
    It is absolutely a useful distinction on whether your users need to deal with unsafe themselves or not.
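To make that point concrete, here is a minimal sketch (the `TextBuf` type is hypothetical) of a safe public interface over an internal `unsafe` call, where the type itself enforces the invariant the `unsafe` block relies on. The standard library's `String` already works this way, so this only illustrates the pattern:

```rust
/// Hypothetical buffer whose invariant is "bytes are always valid UTF-8".
#[derive(Default)]
pub struct TextBuf {
    bytes: Vec<u8>, // private, so callers cannot break the invariant
}

impl TextBuf {
    pub fn new() -> Self {
        Self::default()
    }

    /// Only `&str` (already valid UTF-8) can be appended, preserving the invariant.
    pub fn push_str(&mut self, s: &str) {
        self.bytes.extend_from_slice(s.as_bytes());
    }

    /// Safe to call: no caller can violate what the unsafe block assumes.
    pub fn as_str(&self) -> &str {
        // SAFETY: `bytes` is only ever extended with complete `&str` values,
        // and concatenating valid UTF-8 strings yields valid UTF-8.
        unsafe { std::str::from_utf8_unchecked(&self.bytes) }
    }
}
```

Users of `TextBuf` never write `unsafe` themselves; reviewing the one `unsafe` block plus the two methods that touch `bytes` is enough to audit it.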
    kelnos 9 minutes ago
    It's useful, to be sure, but I wouldn't want to use a library with a safe public interface that is mostly unsafe underneath (unless it's a -sys crate, of course). I think "this crate has no unsafe code" or "this crate has a minimal amount of carefully audited unsafe code" are good things to see, in general.
    ahepp 4 hours ago
    I guess I don't write enough rust to say this with confidence, but isn't that the bare minimum? I find it difficult to believe the rust community would accept using a library where the API requires unsafe.
    cstrahan an hour ago
    Not at all. Some things are fundamentally unsafe. mmap is inherently unsafe, but that doesn’t mean a library for it shouldn’t exist.
    If you’re thinking of higher level libraries, involving http, html, more typical file operations, etc, what you’re saying may generally be true. But if you’re dealing with Direct Memory Access, MCU peripherals, device drivers, etc, some or all of those libraries have two options: accept unsafe in the public interface, or simply don’t exist.
    (I guess there’s a third option: lie about the unsafety and mark things as safe when they fundamentally, inherently are not and cannot be safe)
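A sketch of the kind of API meant here, where `unsafe` has to appear in the public signature because no library code can verify that an arbitrary address is valid; `read_register` is a hypothetical name, while `std::ptr::read_volatile` is the real standard-library function:

```rust
/// Reads a 32-bit value from a device register (or any memory location)
/// without letting the compiler elide or merge the access.
///
/// # Safety
/// `addr` must be non-null, properly aligned for `u32`, and valid for reads.
/// The compiler cannot check this, so the function itself is `unsafe`.
pub unsafe fn read_register(addr: *const u32) -> u32 {
    // SAFETY: forwarded to the caller via this function's contract.
    unsafe { std::ptr::read_volatile(addr) }
}
```

Callers accept the `# Safety` contract by writing their own `unsafe` block at the call site, which is exactly the "unsafe in the public interface" trade-off.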
    DetroitThrow 3 hours ago
    >I guess I don't write enough rust to say this with confidence, but isn't that the bare minimum
    I have some experience and yes, unless you're putting out a library for specifically low-level behavior like manual memory management or FFI. Trivia about the unsafe fn keyword missed the point of my comment entirely.
    DetroitThrow 5 hours ago
    Sure, it's a useful distinction for whether users need to care about safety, but not for whether the underlying code is safe itself, which is what I wrote about.
    No internal unsafe code, or very little and verified, is the bar for many Rust reimplementations. It is also what would keep the code memory-safe.
mkj 3 hours ago
Intriguing work! Does it panic on any bad inputs? That's better than memory unsafety of libxml2, but still a DoS concern for some servers.
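In Rust, a panic on malformed input is memory-safe but can still take down a request handler or, with `panic = "abort"`, the whole process. One common mitigation is to write parsing code so every failure path returns a `Result` instead of indexing, unwrapping, or overflowing. A minimal sketch for XML character references (`&#65;`, `&#x41;`); `parse_char_ref` and `CharRefError` are hypothetical names, not xmloxide's API:

```rust
#[derive(Debug, PartialEq)]
pub enum CharRefError {
    NotACharRef,
    BadDigits,
    NotAnXmlChar,
}

/// XML 1.0 `Char` production: the code points allowed in a document.
fn is_xml_char(c: char) -> bool {
    matches!(c as u32,
        0x9 | 0xA | 0xD | 0x20..=0xD7FF | 0xE000..=0xFFFD | 0x10000..=0x10FFFF)
}

/// Decodes `&#NNN;` or `&#xHHH;`, returning an error (never panicking) on bad input.
pub fn parse_char_ref(s: &str) -> Result<char, CharRefError> {
    let body = s
        .strip_prefix("&#")
        .and_then(|r| r.strip_suffix(';'))
        .ok_or(CharRefError::NotACharRef)?;
    let (digits, radix) = match body.strip_prefix('x') {
        Some(hex) => (hex, 16),
        None => (body, 10),
    };
    // Reject empty digits and signs like "+", which `from_str_radix` would accept.
    if digits.is_empty() || !digits.chars().all(|c| c.is_digit(radix)) {
        return Err(CharRefError::BadDigits);
    }
    // Overflow comes back as Err, never as a panic or silent wraparound.
    let code = u32::from_str_radix(digits, radix).map_err(|_| CharRefError::BadDigits)?;
    // `from_u32` rejects surrogates and values above 0x10FFFF.
    let c = char::from_u32(code).ok_or(CharRefError::NotAnXmlChar)?;
    if is_xml_char(c) {
        Ok(c)
    } else {
        Err(CharRefError::NotAnXmlChar)
    }
}
```

Each hostile input (empty digits, a sign, overflow, a surrogate, U+0000) maps to an error value the caller can handle.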
nicoburns 6 hours ago
How does it compare to the original in terms of source code size (number of lines of code?)
- jawiggins 5 hours ago
  It's significantly smaller. Because Rust doesn't require header files or manual memory management, xmloxide is ~40k lines while libxml2 is ~150k lines.
fourthark 7 hours ago
Does it fix the security flaws that caused the original project to be shut down?
- jawiggins 5 hours ago
  Because it was written in C, libxml2's CVE history has been dominated by use-after-free, buffer overflows, double frees, and type confusion. xmloxide is written in pure Rust, so these entire vulnerability classes are eliminated at compile time.
  - sarchertech 4 hours ago
    Only if it doesn’t use any unsafe code, which I don’t think is the case here.
    jawiggins 4 hours ago
    Is that true? I thought if you compiled a Rust crate with `#![deny(unsafe_code)]`, there would not be any issues. xmloxide uses unsafe only in the C FFI layer, so the rest of the system should be fine.
- blegge 6 hours ago
  https://gitlab.gnome.org/GNOME/libxml2/-/commit/0704f52ea4cd...
  Doesn't seem to have shut down or even be unmaintained. Perhaps it was briefly, and has now been resurrected?
- notpushkin 6 hours ago
  If by flaws you mean the security researchers spamming libxml2 with low effort stuff demanding a CVE for each one so they can brag about it – no, I don’t think anybody can fix that.
  - bawolff 5 hours ago
    Based on context, I kind of imagine they are thinking more of the issues surrounding libxslt.
benatkin 4 hours ago
It would be interesting to try this approach out with mQuickJS, QuickJS, or MicroPython. They could potentially run rings around the ones that were first coded in Rust, such as Boa or RustPython.
lynxbot2026 4 hours ago
[flagged]
- jawiggins 4 hours ago
  Yes, in testing I did add four fuzzing targets to the repo:
  1. fuzz_xml_parse: throws arbitrary bytes at the XML parser in both strict and recovery mode
  2. fuzz_html_parse: throws arbitrary bytes at the HTML parser
  3. fuzz_xpath: throws arbitrary XPath expressions at the evaluator
  4. fuzz_roundtrip: parse → serialize → re-parse, checking that the pipeline never panics
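A roundtrip property of the same flavor can be illustrated without cargo-fuzz. This sketch swaps libFuzzer's coverage-guided inputs for a tiny xorshift generator, and swaps the real parser and serializer for hypothetical `escape_text`/`unescape_text` helpers that cover only the five predefined XML entities; it checks that serializing and then re-parsing text returns the original:

```rust
/// Hypothetical stand-in for a serializer: escapes the five predefined XML entities.
fn escape_text(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            '\'' => out.push_str("&apos;"),
            _ => out.push(c),
        }
    }
    out
}

/// Hypothetical stand-in for a parser: returns None (never panics) on bad input.
fn unescape_text(s: &str) -> Option<String> {
    let mut out = String::with_capacity(s.len());
    let mut rest = s;
    while let Some(pos) = rest.find('&') {
        out.push_str(&rest[..pos]);
        let after = &rest[pos..];
        let end = after.find(';')?;
        let decoded = match &after[..=end] {
            "&amp;" => '&',
            "&lt;" => '<',
            "&gt;" => '>',
            "&quot;" => '"',
            "&apos;" => '\'',
            _ => return None,
        };
        out.push(decoded);
        rest = &after[end + 1..];
    }
    out.push_str(rest);
    Some(out)
}

/// Tiny xorshift PRNG so the sketch needs no external crates.
fn xorshift(state: &mut u64) -> u64 {
    *state ^= *state << 13;
    *state ^= *state >> 7;
    *state ^= *state << 17;
    *state
}

/// Property: for random strings, unescape(escape(s)) == s.
fn roundtrip_holds(iterations: usize, seed: u64) -> bool {
    const ALPHABET: [char; 9] = ['a', '&', '<', '>', '"', '\'', ';', 'é', ' '];
    let mut state = seed;
    for _ in 0..iterations {
        let len = (xorshift(&mut state) % 32) as usize;
        let input: String = (0..len)
            .map(|_| ALPHABET[(xorshift(&mut state) % ALPHABET.len() as u64) as usize])
            .collect();
        let reparsed = unescape_text(&escape_text(&input));
        if reparsed.as_deref() != Some(input.as_str()) {
            return false;
        }
    }
    true
}
```

A real fuzz target would also feed arbitrary bytes straight to the parser, which is where coverage guidance finds inputs a random alphabet like this one would miss.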
  Because this project uses memory-safe Rust, there isn't really a need to hunt for the memory bugs that made up the majority of libxml2's CVEs.
  There is a valid point about logic bugs or infinite loops, which I suppose could be present in any software package, and I'm not sure there's a way to rule them out entirely here.
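Memory safety does not rule out logic-level denial of service. A classic example for XML parsers is exponential entity expansion (the "billion laughs" document). A minimal sketch of bounding it with a nesting-depth limit and a total-output budget, assuming a pre-parsed table of general entities and only `&name;` syntax (no predefined or character references); `expand_document`, `Limits`, and `ExpandError` are hypothetical names, and real parsers choose their own limits:

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
pub enum ExpandError {
    Malformed,
    UnknownEntity(String),
    TooDeep,
    TooLarge,
}

/// Hypothetical limits; the values are up to the caller.
pub struct Limits {
    pub max_depth: usize,
    pub max_output: usize,
}

fn expand(
    text: &str,
    entities: &HashMap<String, String>,
    depth: usize,
    limits: &Limits,
    produced: &mut usize,
) -> Result<String, ExpandError> {
    // The depth limit also stops reference cycles such as a -> b -> a.
    if depth > limits.max_depth {
        return Err(ExpandError::TooDeep);
    }
    let mut out = String::new();
    let mut rest = text;
    loop {
        // Literal text up to the next reference (or the end) counts toward the budget.
        let literal_end = rest.find('&').unwrap_or(rest.len());
        *produced += literal_end;
        if *produced > limits.max_output {
            return Err(ExpandError::TooLarge);
        }
        out.push_str(&rest[..literal_end]);
        if literal_end == rest.len() {
            return Ok(out);
        }
        let after = &rest[literal_end + 1..];
        let semi = after.find(';').ok_or(ExpandError::Malformed)?;
        let name = &after[..semi];
        let value = entities
            .get(name)
            .ok_or_else(|| ExpandError::UnknownEntity(name.to_string()))?;
        out.push_str(&expand(value, entities, depth + 1, limits, produced)?);
        rest = &after[semi + 1..];
    }
}

/// Expands `&name;` references, failing fast instead of allocating
/// gigabytes for a "billion laughs" document.
pub fn expand_document(
    text: &str,
    entities: &HashMap<String, String>,
    limits: &Limits,
) -> Result<String, ExpandError> {
    let mut produced = 0;
    expand(text, entities, 0, limits, &mut produced)
}
```

Because the budget counts every byte emitted at every nesting level, the work done before `TooLarge` is returned stays proportional to `max_output`, no matter how deep the expansion would go.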
  - agentifysh 8 minutes ago
    pretty sure you're replying to a bot; it seems like they make a new account just to leave short drive-by comments
    this is like the 8th green handle I've seen recently with a similar style of comments that I suspect are AI-generated
man4 7 hours ago
[dead]