- JustHTML [1], which in practice [2] is a port of html5ever [3] to Python.
- justjshtml, which is a port of JustHTML to JavaScript :D [4].
- MiniJinja [5] was recently ported to Go [6].
All three projects have one thing in common: comprehensive test suites which were used to guardrail and guide the AI (a rough sketch of that loop follows the references).
References:
1. https://github.com/EmilStenstrom/justhtml
2. https://friendlybit.com/python/writing-justhtml-with-coding-...
3. https://github.com/servo/html5ever
4. https://simonwillison.net/2025/Dec/15/porting-justhtml/
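For the curious, the shape of that guardrail loop is simple. A minimal sketch in Python (the pytest invocation is illustrative, and ask_agent_to_fix is a hypothetical stand-in for whichever agent API you drive):

    # Hypothetical sketch of the guardrail loop: the agent only "wins" when
    # the whole conformance suite passes; failures get fed straight back.
    import subprocess

    MAX_ITERATIONS = 50

    def run_suite() -> tuple[int, str]:
        """Run the project's full test suite; return (exit code, output)."""
        result = subprocess.run(
            ["pytest", "tests/", "-q", "--tb=short"],
            capture_output=True, text=True,
        )
        return result.returncode, result.stdout + result.stderr

    def ask_agent_to_fix(failures: str) -> None:
        # Stand-in: hand the failing output to your agent of choice and let
        # it edit the code -- the tests, not the agent, decide when it's done.
        raise NotImplementedError

    for i in range(MAX_ITERATIONS):
        code, output = run_suite()
        if code == 0:
            print(f"Suite green after {i} iterations.")
            break
        ask_agent_to_fix(output)
    else:
        print("Gave up: suite still failing after the iteration budget.")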
Same user did a similar thing by creating an AWK interpreter written in Go using LLMs: https://github.com/kolkov/uawk -- as the creator of (I think?) the only AWK interpreter written in Go (https://github.com/benhoyt/goawk), I was curious. It turns out that if there's only one item in the training data (GoAWK), AI likes to copy and paste freely from the original. But again, it's poorly tested and poorly benchmarked.
I just don't see how one can get quality like this without being realistic about code review, testing, and benchmarking.
CEO stated "We built a browser with GPT-5.2 in Cursor"
instead of
"by dividing agents into planners and workers we managed to get them busy for weeks creating thousands of commits to the main branch, resolving merge conflicts along the way. The repo is 1M+ lines of code but the code does not work (yet)"
[0] https://cursor.com/blog/scaling-agents
[1] https://x.com/kimmonismus/status/2011776630440558799
[2] https://x.com/mntruell/status/2011562190286045552
[3] https://www.reddit.com/r/singularity/comments/1qd541a/ceo_of...
If you view the PRs, they bundle multiple fixes together, at least according to the commit messages. The next hurdle will be to guardrail agents so that they only implement one task and don't cheat by modifying the CI pipeline.
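A minimal sketch of such a guardrail, assuming you can run a pre-merge check on the agent's branch (the protected paths here are illustrative):

    # Hypothetical pre-merge gate: reject branches that touch CI config.
    # Assumes it runs inside a git checkout of the agent's branch.
    import subprocess
    import sys

    PROTECTED_PREFIXES = (".github/workflows/", ".gitlab-ci.yml", "Jenkinsfile")

    def changed_files(base: str = "origin/main") -> list[str]:
        out = subprocess.run(
            ["git", "diff", "--name-only", f"{base}...HEAD"],
            capture_output=True, text=True, check=True,
        ).stdout
        return [line for line in out.splitlines() if line]

    touched = [f for f in changed_files() if f.startswith(PROTECTED_PREFIXES)]
    if touched:
        print(f"agent branch modifies CI config: {touched}", file=sys.stderr)
        sys.exit(1)  # block the merge; CI changes need a human sign-off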
True, but it is shocking how often Claude suggests just disabling or removing tests.
Arguably, Claude is simply successfully channeling what the developers who wrote the bulk of its training data would do. We've already seen how bad behavior injected into LLMs in one domain causes bad behavior in other domains, so I don't find this particularly shocking.
The next frontier in LLMs has to be distinguishing good training data from bad training data. The companies have to do this, even if only in self defense against the new onslaught of AI-generated slop, and against deliberate LLM poisoning.
If the models become better at critically distinguishing good inputs from bad, particularly if they can learn to treat bad inputs as examples of what not to do, one benefit I would expect is that their increased ability to write working code will also greatly increase their willingness to do so, rather than simply disabling failing tests.
>"To test this system, we pointed it at an ambitious goal: building a web browser from scratch."
and then near the end, they say:
>"Hundreds of agents can work together on a single codebase for weeks, making real progress on ambitious projects."
This means they only make progress toward it, but do not "build a web browser from scratch".
If you're curious, the State of Utopia (which will be available at https://stateofutopia.com) did build a web browser from scratch, though it used several modules/packages for the networking portion of it.
See my other comments and posts for links.
But apparently "some pages take a literal minute to load"
Seems like "I had to do the last mile myself", not "autonomous coding" which was Cursor's claim here.
Edit: As mentioned, I ran `cargo check` on each of the last 100 commits, and it seems every single one of them failed in some way: https://gist.github.com/embedding-shapes/f5d096dd10be44ff82b...
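For anyone who wants to reproduce it, the check is roughly this (a sketch; it checks commits out in place, so run it in a throwaway clone):

    # Run `cargo check` against each of the last 100 commits.
    import subprocess

    def git(*args: str) -> str:
        return subprocess.run(
            ["git", *args], capture_output=True, text=True, check=True,
        ).stdout

    original = git("rev-parse", "HEAD").strip()
    commits = git("rev-list", "--max-count=100", "HEAD").split()

    failures = 0
    for sha in commits:
        git("checkout", "--quiet", sha)
        result = subprocess.run(["cargo", "check", "--quiet"], capture_output=True)
        if result.returncode != 0:
            failures += 1
        print(f"{sha[:10]} {'ok' if result.returncode == 0 else 'FAILED'}")

    print(f"{failures}/{len(commits)} commits failed `cargo check`")
    git("checkout", "--quiet", original)  # restore the starting commit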
> Something fishy is happening in their `git log`; it doesn't seem like it was the agents who "autonomously" actually made things compile in the end. Notice the git username and email addresses switching around; even a commit made inside an EC2 instance managed to get in there: https://gist.github.com/embedding-shapes/d09225180ea3236f180...
Gonna need to look closer into it when I have time, but seems they manually patched it up in the end, so the original claim still doesn't stand :/
https://github.com/wilson-anysphere/formula
The Actions overview is impressive: There have been 160,469 workflow runs, of which 247 succeeded. The reason the workflows are failing is because they have exceeded their spending limit. Of course, the agents couldn't care less.
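Those numbers are easy to verify against the GitHub REST API, if you're curious; a quick sketch (unauthenticated requests are rate-limited, so add a token if needed):

    # Count total vs. successful workflow runs for the repo linked above.
    import json
    import urllib.request

    REPO = "wilson-anysphere/formula"

    def run_count(repo: str, status: str | None = None) -> int:
        url = f"https://api.github.com/repos/{repo}/actions/runs?per_page=1"
        if status:
            url += f"&status={status}"
        req = urllib.request.Request(url, headers={"User-Agent": "curiosity"})
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)["total_count"]

    total = run_count(REPO)
    succeeded = run_count(REPO, status="success")
    print(f"{succeeded} of {total} workflow runs succeeded")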
I think they know they're on the back foot at the moment. Cursor was hot news for a long time, but now it seems terminal-based agents are the hot commodity, and I rarely see Cursor mentioned. Sure, they already have enterprise contracts signed, but even at my company we're about to swap from a contract with Cursor to Claude Code, because everyone wants to use that instead now - especially since it doesn't tie you to one editor.
So I think they're really trying to get "something" out there that sticks and puts them in the limelight. Long contexts/sessions are one of the big things right now, especially with Ralph being such a hot topic, so this lines up with that.
Also, I know Cursor has its own CLI, but I rarely see it mentioned.
Diminishing returns are starting to really set in and companies are desperate for any illusion to the contrary.
I couldn’t make it render the Apple page that was in the Cursor promo. Maybe they’ve used some other build.
Something fishy is happening in their `git log`; it doesn't seem like it was the agents who "autonomously" actually made things compile in the end. Notice the git username and email addresses switching around; even some commits made inside an EC2 instance managed to get in there: https://gist.github.com/embedding-shapes/d09225180ea3236f180...
> It's 3M+ lines of code across thousands of files. The rendering engine is from-scratch in Rust with HTML parsing, CSS cascade, layout, text shaping, paint, and a custom JS VM.
"From scratch" sounds very impressive. "custom JS VM" is as well. So let's take a look at the dependencies [1], where we find
- html5ever
- cssparser
- rquickjs
That's just Servo [2], a Rust-based browser initially built by Mozilla (and now maintained by Igalia [3]), but with extra steps. So this supposed "from scratch" browser is just calling out to code written by humans. And after all that, it doesn't even compile! It's just plain slop.
[1] - https://github.com/wilsonzlin/fastrender/blob/main/Cargo.tom...
- Servo's HTML parser
- Servo's CSS parser
- QuickJS for JS
- selectors for CSS selector matching
- resvg for SVG rendering
- egui, wgpu, and tiny-skia for rendering
- tungstenite for WebSocket support
And even with all of that, it's 3M lines!
It's also using weirdly old versions of some dependencies (e.g. wgpu 0.17 from June 2023, when the latest is 28, released in December 2025).
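Staleness like that is easy to spot-check against crates.io's public API; a quick sketch using crate names from this thread:

    # Ask crates.io for the latest published version of each crate.
    import json
    import urllib.request

    for crate in ["wgpu", "html5ever", "cssparser", "rquickjs"]:
        req = urllib.request.Request(
            f"https://crates.io/api/v1/crates/{crate}",
            headers={"User-Agent": "staleness-check"},  # crates.io wants a UA
        )
        with urllib.request.urlopen(req) as resp:
            latest = json.load(resp)["crate"]["max_version"]
        print(f"{crate}: latest is {latest}")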
Well, at least it's not outright ripping them off like it usually does.
I wouldn't particularly care what code the agents copied, the bigger indictment is the code doesn't work.
So really, they failed to meet the bar of "download and build Chromium", and there's no point in talking about the code at all.
I doubt even they checked, given they say they just let the agents run autonomously.
It can be very hard to determine if an isolated patch that goes from one broken state to a different broken state is on net an improvement. Even if you were to count compile errors and attempt to minimize them, some compile errors can demonstrate fatal flaws in the design while others are minor syntax issues. It's much easier to say that broken tests are very bad and should be avoided completely, as then it's easier to ensure that no patch makes things worse than it was before.
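To make that concrete, the naive error-counting metric being dismissed here would look something like this sketch (assuming a Rust project, since that's what the browser is):

    # Count compile errors before and after a candidate patch. As noted
    # above, this weighs a fatal design flaw the same as a missing
    # semicolon, which is exactly the problem with minimizing it.
    import re
    import subprocess

    def count_errors() -> int:
        """Count `error[...]`/`error:` lines emitted by `cargo check`."""
        out = subprocess.run(
            ["cargo", "check"], capture_output=True, text=True,
        ).stderr
        return len(re.findall(r"^error(\[|:)", out, flags=re.MULTILINE))

    before = count_errors()
    # ... apply the candidate patch here, e.g. `git apply patch.diff` ...
    after = count_errors()
    print(f"compile errors before: {before}, after: {after}")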
The diffusion model of software engineering
Writing junk in a text file isn't the hard part.
That doesn't mean we can usefully build software that is a big, tangled mess.
What Cursor did with their blog post seems intentionally and outright misleading, since I'm not able to even run the thing. With Codex/Claude Code it's relatively easy to download it and run it to try for yourself.
Reminds me of SAP/Salesforce.
You think you can just fire up Ableton, Cubase or whatever and make music as great as an artist who's done it for a long time? No, it requires practice and understanding. Every tool works like this: some have different difficulties, some different skill levels, but all of them require it in some way.
(I grant that you're speaking from your experience, about different tools, two replies up, but this claim is just paper-rock-scissorable through these various AI tools. "Oh, this tool's authors are just hype, but this tool works totes-mc-oates…". Fool me once, and all.)
Codex was sold to me as a tool that can help me program. I tried it, evaluated it, found it helpful, and continued using it. Based on my experience, it definitively helps with some tasks. Apparently it also doesn't work for others, for some not at all. If I know the tool works for me, but take at face value the claim that it doesn't work for others, what am I left to believe? That the tool doesn't actually work, even though my own experience and usage of it says otherwise?
Codex is still an "AI success", regardless if it could build an entire browser by itself, from scratch, or whatever. It helps as it is today, I wouldn't need it to get better to continue using it.
But even with this perspective, which I'd say is "nuanced" (others would claim "AI zealot" probably), I'm trying to see if what Cursor claims is actually true, that they managed to build a browser in that way. When it doesn't seem true, I call it out. I still disagree with "This is what most AI "successes" turn out to be when you apply even a modicum of scrutiny", and I'm claiming what Cursor is doing here is different.
> are definitively capable tools when used in certain ways
Which I received pushback on. My reply is to that pushback, defending what I said, not what others told you.
Edit: Besides the point, but Ableton (and others) constantly tell people how to learn how to use the tool, so they use it the right way. There is a whole industry of people (teachers) who specialize in specific software/hardware and teaching others "how to hold the tool correctly".
It's an almost universal truth that you need to learn how to use any non-trivial tool.
They definitely can make some things better and let you do some things faster, but all the efficiency is gonna get sucked up by companies trying to drop more slop.
It _is_ stuck at this point.
There's so much money involved no one wants to admit it out loud.
They have no path to the necessary exponential gains and no one is actually working on it.
I don’t mean the tech itself, which is kind of useful. I mean the 99% value inflation of a kind-of-useful tool (if you know what you’re doing).
Browsers contain several high-complexity pieces, each of which could take a while to build on its own, and interconnect them with reasonably verbose APIs that need to be implemented, or at least stubbed out, for code not to crash. There is also the difficulty of matching existing implementations quirk for quirk.
I guess the complexity is on par with operating systems, but with the added compatibility problem that, in order to be useful, it doesn't just have to load sites intended to be compatible with it; it has to handle sites people actually use on the internet, and those are both a moving target and full of high-complexity features that you have to build, or at least stub out, before the site will even work.
> looks inside
> completely useless and busted
30 billion dollar VS Code fork, everyone. When do we start looking at these people for what they are: snake oil salesmen?
They slop laundered the FOSS Servo code into a broken mess and called it a browser, but dumbasses with money will make line go up based on lies. EFF right off.
Well, I'm a heavy LLM user, I "believe" LLM helps me a lot for some tasks, but I'm also a developer with decades of experience, so I'm not gonna claim it'll help non-programmers to build software, or whatever. They're tools, not solutions in themselves.
But even us "folks on HN" who generally keep up with where the ecosystem is going, have a limit I suppose. You need to substantiate what you're saying, and if you're saying you've managed to create a browser, better let others verify that somehow.
The top comment is indeed baseless hype without a hint of skepticism.
There are also clearly a lot of other skeptical people in that submission. Also, simonw (from that top comment) told me themselves "it's not clear that what they built even runs": https://bsky.app/profile/simonwillison.net/post/3mckgw4mxoc2...
> This project from Cursor is the second attempt I've seen at this now!
I used the word "attempt" very deliberately, to avoid suggesting that either of these two projects had achieved the goal.
I don't see how you can get to "baseless hype without a hint of skepticism" there unless you've already decided to take anything I say in bad faith.
and he wonders why people call him a shill
accepting everything some shit company tells you as gospel is not the default position of a "researcher"
he better hope he's on the right side of history here, as otherwise he will have burnt his reputation
Edit: Of course, this isn’t a trait unique to Simon either. Everybody has blind spots, and it’s reasonable to be excited when new tech is released. On an unrelated note, my intent is to push back against some of the people here who try to shut down skepticism. Obviously, this doesn’t describe Simon, but I’ve seen others here who try to silence skeptical voices. This comes across as highly controlling and insecure.
I do not think you are reacting to what I said in good faith.
> he better hope he's on the right side of history here, as otherwise he will have burnt his reputation
That's something I've actually given quite a lot of thought to. My reputation and credibility matters a great deal to me. If it turns out this entire LLM thing was an over-hyped scam I'll take a very big hit to that reputation, and I'll deserve it.
(If AI rises up and tries to kill or enslave us all I'll be too busy fighting back to care.)
Always take any pronouncement from an AI company (heavily dependent on VC and public sentiment on AI) with a heavy grain of salt.
hype over reality
I’m building an AI startup myself and I know that world: it’s full of hypesters and hucksters, unfortunately. Also, social media communication + low attention spans + AI slop communication is a blight upon today’s engineering culture.
People were making all sorts of statements like:
- “I cloned it and there were loads of compiler warnings”
- “the commit build success rate was a joke”
- “it used 3rd party libs”
- “it is AI slop”
What they all seem to be just glossing over is how the project unfolded: without human intervention, using computers, in an exceptionally accelerated time frame, working 24hr/day.
If you are hung up on commit build quality, or code quality, you are completely missing the point, and I fear for your job prospects. These things will get better; they will get safer as the workflows get tuned; they will scale well beyond any of us.
Don’t look at where the tech is. Look where it’s going.
No one is hung up on the quality, but there is a ground-truth fact of whether something compiles or doesn't. No one is gonna claim a software project was successful if the end artifact doesn't compile.
Me neither, and I note so twice in the submission article. But I also didn't expect a project that for the last 100+ commits couldn't reliably be built and therefore tested and tried out.
Correct, but Gas Town [1] already happened and, what's more, _actually worked_, so this experiment is both useless (because it doesn't demonstrate working software) _and_ derivative (because we've already seen that, with spend similar to that of a single developer, you can set up a project that churns out more code than any human could read in a week).
I'm sorry but what? Are you really trying to argue that it doesn't matter that nothing works, that all it produced is garbage and that what is really important is that it made that garbage really quickly without human oversight?
That's.....that's not success.
This idea that quality doesn't matter is silly. Quality is critical for things to work, scale, and be extensible. By either LLMs or humans.
Am I misunderstanding this metaphor? Tsunamis pull the sea back before making landfall.