I use AI coding tools daily. They're genuinely useful for real work. But stunts like this make it harder to have honest conversations about what AI can and can't do. When executives see "AI built a browser in 3 million lines," they form expectations that set everyone up for disappointment.
The gap between AI demos and AI in production is wider than most people realize. We'd all be better off if people stopped optimizing for impressiveness and started optimizing for honesty.
> "So I agree this isn't just wiring up of dependencies, and neither is it copied from existing implementations: it's a uniquely bad design that could never support anything resembling a real-world web engine."
It hurts that it wasn't framed as an "experiment" or "look, we wanted to see how far AI can go - it kind of failed the bar." As it stands, it's grist to the mill of all the CEOs out there who have no clue about coding but wonder why their people are so expensive when: "AI can do it! D'oh!"
I feel like a lot of the AI articles and experiments like this one are producing "app-shaped objects" that look okay for making content (and indeed are fine for making earrings) but fall apart when pounded on by the real world.
They were making claims without the level of rigor to back them up. There was an opportunity to learn some difficult lessons, but—and I don’t think this was your intention—it came across to me as kind of access journalism; not wanting to step on toes while they get their marketing in.
The claims they made really weren't that extreme. In the blog post they said:
> To test this system, we pointed it at an ambitious goal: building a web browser from scratch. The agents ran for close to a week, writing over 1 million lines of code across 1,000 files. You can explore the source code on GitHub.
> Despite the codebase size, new agents can still understand it and make meaningful progress. Hundreds of workers run concurrently, pushing to the same branch with minimal conflicts.
That's all true.
On Twitter their CEO said:
> We built a browser with GPT-5.2 in Cursor. It ran uninterrupted for one week.
> It's 3M+ lines of code across thousands of files. The rendering engine is from-scratch in Rust with HTML parsing, CSS cascade, layout, text shaping, paint, and a custom JS VM.
> It kind of works! It still has issues and is of course very far from Webkit/Chromium parity, but we were astonished that simple websites render quickly and largely correctly.
That's mostly accurate too, especially the "it kind of works" bit. You can take exception to the "from-scratch" claim if you like. It's a tweet, the lack of nuance isn't particularly surprising.
In the overall genre of CEOs over-hyping their companies' achievements, this is a pretty weak example.
I think the people making out that Cursor massively and dishonestly over-hyped this are arguing with a straw man version of what the company representatives actually said.
> In the overall genre of CEOs over-hyping their companies' achievements, this is a pretty weak example
I kind of agree, but kind of not. The tweet isn't too bad when read from an experienced engineer perspective, but if we're being real then the target audience was probably meant to be technically clueless investors who don't and can't understand the nuance.
It's far more dishonest to search for contrived interpretations of their statements in an attempt to frame them as "mostly accurate" when their statements are clearly misleading (and in my opinion, intentionally so).
You're giving them infinite benefit of the doubt where they deserve none, as this industry is well known for intentionally misleading statements, you're brushing off serious factual misrepresentations as simple "lack of nuance" and finally trying to discredit people who have an issue with all of this.
With all due respect, that's not the behavior of a neutral reporter but someone who's heavily invested in maintaining a certain narrative.
> We built a browser with GPT-5.2 in Cursor. It ran uninterrupted for one week.
That tweet was seen by over 6 million people.
The follow-up tweet, which includes the link to the actual details, was seen by fewer than 200,000.
That's just how Twitter engagement works, and these companies know it. Over 6 million people were fed bullshit. I'm sorry, but it's actually a great example of CEOs over-hyping their products.
You only quoted the first line. The full tweet includes the crucial "it kind of works" line - that's not in the follow-up tweet, it's in the original.
Here's that first tweet in full:
> We built a browser with GPT-5.2 in Cursor. It ran uninterrupted for one week.
> It's 3M+ lines of code across thousands of files. The rendering engine is from-scratch in Rust with HTML parsing, CSS cascade, layout, text shaping, paint, and a custom JS VM.
> It kind of works! It still has issues and is of course very far from Webkit/Chromium parity, but we were astonished that simple websites render quickly and largely correctly.
The second tweet, with only 225,000 views, was just the following text and a link to the GitHub repository:
> Excited to continue stress testing the boundaries of coding agents and report back on what we learn.
> Code here: https://github.com/wilsonzlin/fastrender
It's like claiming "my dog filed my taxes for me!" when in reality everything was filled out in TurboTax and your dog clicked the final submit button. Technically true, but clearly disingenuous.
I'm not saying an LLM using existing libraries is a bad thing--in fact I'd consider an LLM which didn't pull in a bunch of existing libraries for the prompt "build a web browser" to be behaving incorrectly--but the CEO is misrepresenting what happened here.
> "So I agree this isn't just wiring up of dependencies, and neither is it copied from existing implementations: it's a uniquely bad design that could never support anything resembling a real-world web engine."
It didn't use Servo, and it wasn't just calling dependencies. It was terribly slow and stupid, but your comment is more of a mischaracterization than anything the Cursor people have said.
[0] https://github.com/search?q=repo%3Awilsonzlin%2Ffastrender%2...
[1] https://github.com/search?q=repo%3Awilsonzlin%2Ffastrender+h...
https://github.com/DioxusLabs/taffy
Used here (I think): https://github.com/servo/servo/tree/c639bb1a7b3aa0fd5e02b40d...
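For anyone wondering what "uses Taffy" means in practice: it's a layout library you call per node, not a rendering engine you wrap. A minimal, hypothetical sketch of that kind of call (taffy 0.5-style API; not code from fastrender):

```rust
use taffy::prelude::*;

fn main() -> Result<(), taffy::TaffyError> {
    let mut tree: TaffyTree<()> = TaffyTree::new();

    // A fixed-size leaf node, roughly what a CSS engine would create per box.
    let child = tree.new_leaf(Style {
        size: Size { width: length(100.0), height: length(50.0) },
        ..Default::default()
    })?;

    // A flex container holding that box.
    let root = tree.new_with_children(
        Style { display: Display::Flex, ..Default::default() },
        &[child],
    )?;

    // Solve the layout, then read back the computed geometry.
    tree.compute_layout(root, Size::MAX_CONTENT)?;
    println!("child box: {:?}", tree.layout(child)?);
    Ok(())
}
```

The caller still owns the box tree, the cascade, and painting; Taffy only solves the flexbox/grid math.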
But it was accompanied by a link to the GitHub repo, so you can hardly claim that they were deliberately hiding the truth.
If anything, that proves the point that they weren't rigorous! They claimed a thing. The thing didn't accomplish what they said. I'm not saying that they hid it, but that they misrepresented the thing that they built. My point is that the interview didn't directly and firmly pressure them on this.
Generating a million lines of code in parallel isn't impressive. Burning a mountain of resources in parallel isn't noteworthy (see: the weekly post of someone with an out-of-control EC2 instance racking up $100k of charges).
It would have been remarkable if they'd built a browser from scratch, which they said they did, except they didn't. It was a 50 million token hackathon project that didn't work, dressed up as a groundbreaking example of their product.
As feedback, I hope in the future you'll push back firmly on these types of claims when given the opportunity, even if it makes the interviewee uncomfy. Incredible claims require incredible evidence. They didn't have it.
I don't think directly accusing them of being misleading about what they had done would have supported that goal, so I didn't do it.
Instead I made sure to dig into things like what QuickJS was doing in there and why it used Taffy as part of the conversation.
I believe in the UK the term for this is actually fraudulent misrepresentation:
https://en.wikipedia.org/wiki/Misrepresentation#English_law
And in this context it seems to go against The Consumer Protection from Unfair Trading Regulations 2008 and the Digital Markets, Competition and Consumers Act 2024.
Well, yes and no; we live in an era where people consume headlines, not articles, and certainly not links to Github repositories in articles. If VCs and other CEOs read the headline "Cursor Agents Autonomously Create Web Browser From Scratch" on LinkedIn, the project has served its purpose and it really doesn't matter if the code compiles or not.
You have a reputation. You don't need to carry water for people who are misleading others to raise VC money. What's the point of your language-lawyering about the precise meaning of what he said?
“No no, you don’t get it guys. I’m technically right if you look at the precise wording” is the kind of silly thing I do all the time. It’s not that important to be technically right. Let this one go.
The reason I won't let this one go is that I genuinely believe people are being unfair to the engineer who built this, because some people will jump on ANY opportunity to "debunk" stories about AI.
I won't stand for misleading rhetoric like "it's just a Servo wrapper" when that isn't true.
This level of outrage seems absent when it's misleading in the pro-"AI" direction.
https://github.com/wilsonzlin/fastrender/issues/98
A project that didn't compile at all counts as "kind of" working now?
> I won't stand for misleading rhetoric like "it's just a Servo wrapper" when that isn't true.
True, though at least if it were a wrapper it would actually kind of work, unlike this, which is the most obvious case of hyping up lies for investors I've witnessed in the last... well, week or so, considering how much bullshit spews out of the mouths of AI bros.
"This project is junk that doesn't even compile", for example.
I only know this because on occasion I'll notice there was a comment from them (I only check the name of the user if it's a hot take) and I ctrl-F their username to see 20-70 matches on the same thread. Exactly 0 of those comments present the idea that LLMs are seriously flawed in programming environments regardless of who's in the driver seat. It always goes back to operator error and "just you watch, in the next 3 months or years...".
I dunno, I manage LLM implementation consulting teams and I will tell you to your face that LLMs are unequivocally shit for the majority of use cases. It's not hard to directly criticize the tech without hiding behind deflections or euphemisms.
I was in a meeting recently where a director lauded Claude for writing "tens of thousands of lines of code in a day", as if that metric in and of itself was worth something. And don't even get me started on "What percentage of your code is written by AI?"
"I don't know, what percentage of your sweater is polyester?"
"I don't know, I think it's all cotton, why do you ask me such a random question?"
"Well surely you know that polyester can be made far cheaper in a plastics factory than cotton? Why do you use cotton?"
Being in a similar position to him now though... if it can be deleted it gets deleted.
If you write code in any capacity, you'll know that high LOC counts are usually a sign of a bad time, browsers and operating systems aside.
The rest is stuff like HarfBuzz for text shaping, which is an entirely cromulent dependency for a project like this.
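Shaping, for the curious, is the step that maps a string to positioned glyphs. A minimal sketch of what calling HarfBuzz looks like, assuming the harfbuzz_rs bindings and a stand-in font path - illustrative only, not the project's code:

```rust
use harfbuzz_rs::{shape, Face, Font, UnicodeBuffer};

fn main() -> std::io::Result<()> {
    // "DejaVuSans.ttf" is a placeholder path to any font file on disk.
    let face = Face::from_file("DejaVuSans.ttf", 0)?;
    let font = Font::new(face);

    // Shape a run of text into glyph IDs plus advances and offsets.
    let buffer = UnicodeBuffer::new().add_str("Hello, world");
    let output = shape(&font, buffer, &[]);

    for (info, pos) in output
        .get_glyph_infos()
        .iter()
        .zip(output.get_glyph_positions())
    {
        println!("glyph {} advance {}", info.codepoint, pos.x_advance);
    }
    Ok(())
}
```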
And the latter is what's driving the push for KPIs the most - "active" ETFs were already bad enough, because their managers would ask the companies they invested in to provide easy-to-grok KPIs (so that they could keep more of the yearly fee instead of having to pay analysts to dig down into a company's finances), and passive ETFs make that even worse because there is now barely any margin left to pay for more than a cursory review.
America's desire for stock-based pensions is frying the world's economy with its second and third order effects. Unfortunately, that rotten system will most probably only collapse when I'm already dead, so there is zero chance for most people alive today to ever see a world free of this BS.
The reality was that the AI made an uncompilable mess, adding 100+ dependencies, including importing an entire renderer from another browser (Servo), and it took a human software engineer to clean it all up.
Don't publish things like that. At the very least link to a transcript, but this is a very non-credible way of reporting those numbers.
I'd still be surprised if that added up to "trillions" of tokens. A trillion is a very big number.
Fully agree that the original authors made some unsubstantiated and unqualified claims about what was done - which is sad, because it was still a huge accomplishment as I see it.
> ...while far off from feature parity with the most popular production browsers today...
What a way to phrase it!
You know, I found a bicycle in the trash. It doesn't work great yet, but I can walk it down a hill. While far off from the level of the most popular supercars today, I think we have made impressive progress going down the hill.
We talked about dependencies, among a whole bunch of other things.
You can watch the full video on YouTube or read my extracted highlights here: https://simonwillison.net/2026/Jan/23/fastrender/
EDIT: I retract my claim. I didn't realize this had servo as a dependency.
They marketed as if we were really close to having agents that could build a browser on their own. They rightly deserve the blowback.
This is an issue that is very important because of how much money is being thrown at it, and that affects everyone, not just the "stakeholders". At some point, if it does become true that you can ask an agent to build a browser and it actually does, that is very significant.
At this point in time I personally can't predict whether that will happen or not, but the consequences of it happening seem pretty drastic.
Yes, every AI skeptic publicly doubted that right up until they started doing it.
And I'm an optimist, not one of the AI skeptics heavily present on HN.
From the post it sounds like the author would also doubt this when he talks about "glorified autocomplete and refactoring assistants".
You have the agents compile the code every single step of the way, which is what this project did.
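As a rough sketch of that kind of gate (my illustration, not the project's actual harness), the loop simply refuses to accept any step that leaves the tree in a non-compiling state:

```rust
use std::process::Command;

/// Returns true if the workspace currently compiles.
/// Sketch only: a real harness would also capture stderr and feed
/// the compiler errors back into the agent's context.
fn build_is_green() -> bool {
    Command::new("cargo")
        .args(["check", "--all-targets"])
        .status()
        .map(|status| status.success())
        .unwrap_or(false)
}

fn main() {
    // Hypothetical gate applied after each agent edit.
    if build_is_green() {
        println!("edit accepted");
    } else {
        println!("edit rejected: send the build errors back to the agent");
    }
}
```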
Still, getting "something" to compile after a week of work is very different from getting the thing you wanted.
What is being sold, and invested in, is the promise that LLMs can accomplish "large things" unaided.
But as of yet they cannot, unless something is happening in one of the SOTA labs that we don't know about.
They can, however, accomplish small things unaided, though there is an upper bound, at least functionally.
I just wish everyone was on the same page about their abilities and their limitations.
To me they understand context well (e.g. the task "build a browser" doesn't need some huge specification, because specifications already exist).
They can write code competently (this is my experience anyway)
They can accomplish small tasks (my experience again, "small" is a really loose definition I know)
They cannot understand context that doesn't exist (they can't magically know what you mean, but they can bring to bear considerable knowledge of pre-existing work and conventions that helps them make good assumptions and the agentic loop prompts them to ask for clarification when needed)
They cannot accomplish large tasks (again my experience)
It seems to me there is something akin to the context window into which a task can fit. They have this compact feature, which I suspect is where this limitation lies. That is, a person can't hold an entire browser codebase in their head, but they can create a general top-level mapping of the whole thing, so they know where to reach, where areas of improvement are necessary, how things fit together, and what has and hasn't been implemented. I suspect this compaction doesn't work super well for agents because it is a best-effort, tacked-on feature.
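To make that concrete, here's a toy sketch of what compaction amounts to - my own illustration, not how any particular agent implements it; in a real harness the summary would be another model call rather than a placeholder string:

```rust
// When the transcript outgrows a budget, replace the oldest messages
// with a (lossy) summary and keep only the recent tail verbatim.
fn compact(history: &mut Vec<String>, byte_budget: usize, keep_recent: usize) {
    let total: usize = history.iter().map(|m| m.len()).sum();
    if total <= byte_budget || history.len() <= keep_recent {
        return;
    }
    let tail = history.split_off(history.len() - keep_recent);
    // Stand-in for a summarization call; detail is irretrievably lost here.
    let summary = format!("[summary of {} earlier messages]", history.len());
    *history = std::iter::once(summary).chain(tail).collect();
}

fn main() {
    let mut history: Vec<String> = (0..10).map(|i| format!("message {i}")).collect();
    compact(&mut history, 40, 3);
    println!("{history:?}");
}
```

That single summary line is carrying the whole "top-level mapping", which is why the quality of the compaction matters so much.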
I say all this speculatively, and I am genuinely interested in whether this next level of capability is possible. To me it could go either way.
It didn't have correctly configured GitHub Actions so the CI build was broken.
Even though I have no burden of proof here, since you have provided no evidence for your claims, I will point out that another commenter [1] indicates there were build errors, and the developer agrees there were build errors [2] that they resolved.
I take back the implication I inadvertently made here that it compiled cleanly the whole time - I know that's not the case, we discussed that in our interview: https://simonwillison.net/2026/Jan/23/fastrender/#intermitte...
I'm frustrated at how many people are carrying around a mental model that the project "didn't even compile", implying the code had never successfully compiled, which clearly isn't true.
I am frustrated at people loudly and proudly "releasing" a system they claim works when it does not. They could have pointed at a specific version that worked, but chose not to, indicating they are either intentionally deceptive or clueless. Arguing they had no opportunity for nuance and thus had no choice but to make false statements for their own benefit is ethical bankruptcy. If they had no opportunity for nuance, then they could have made a statement that erred against their benefit; that is ethical behavior.
I do not think Cursor's statements about this project were remotely misleading enough to justify this backlash.
Which of those things would you classify as "false statements"? The use of "from scratch"?
Absolutely.
And clueless managers seeing these headlines will almost certainly lead to people losing their jobs.
Take a look in the Cargo.toml: https://github.com/wilsonzlin/fastrender/blob/19bf1036105d4e...
Maybe there is a main Servo crate out there as well that fastrender doesn't depend on, but at least in my mind fastrender depends on some Servo browser functionality.
EDIT: fastrender also includes the Servo HTML parser, html5ever (https://github.com/servo/html5ever).
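For reference, here's what depending on html5ever looks like at the call site - it's a parser crate you invoke, with the DOM handled by the companion markup5ever_rcdom crate (illustrative sketch, not fastrender's code):

```rust
use html5ever::parse_document;
use html5ever::tendril::TendrilSink;
use markup5ever_rcdom::RcDom;

fn main() {
    // Parse markup into the reference-counted DOM type that ships
    // alongside the parser.
    let dom = parse_document(RcDom::default(), Default::default())
        .one("<html><body><p>Hello</p></body></html>");

    // dom.document is the root handle; a real engine would now walk the
    // tree, run the CSS cascade, and hand boxes off to layout.
    println!("root has {} children", dom.document.children.borrow().len());
}
```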
I do not think that makes it a "Servo wrapper", because calling it that implies it has no rendering code of its own.
It has plenty of rendering code of its own; that's why the rendered pages are slow and have visual glitches you wouldn't get with Servo!
In… terms of sheer volume of production of useless crap?
I've yet to see anyone in this space be negatively impacted by their outlandish claims.
They release a new model or add extra sub agents and the slate is wiped clean.
Management already doesn't trust developers in any way. Why would they believe you, who are clearly just trying to save your job, over a big company who clearly is the future!
Or do you trust your management to make the right decision?
Is entropy increasing or decreasing the longer agents work on a code base? If it's decreasing, no matter how slowly, theoretically you could just say "ok, start over and write version 2 using what you've learned on version 1." And eventually, $XX million and YY months of churning later, you'd get something pretty slick. And then future models would just further reduce X and Y. Right?
Maybe they just need to keep iterating.
I am an avid user of LLMs but I have not seen them remove entropy, not even once. They only add. It's all on the verge of tech debt, and it takes substantial human effort to keep entropy increases in check. Anyone can add 100 lines, but it takes genuine skill to do it in 10 (and I don't mean code golf).
And to truly remove entropy (cut useless tests, cut useless features, DRY up, find genuine abstractions, talk to PM to avoid building more crap, …) you still need humans. LLM built systems eventually collapse under their own chaos.
I think your analogy is quite fitting!
Significant typo I assume?
I'm also getting really tired of claims like "we are X% more productive with AI now!" (which I'm hearing day in and day out at work, and on LinkedIn of course). Didn't we, as an industry, agree that we _didn't_ know how to measure productivity? Why is everyone believing all of these sudden metrics that try to claim otherwise?
Look, I'm not against AI. I'm finding it quite valuable for certain scenarios -- but in a constrained environment and with very clear guidance. Letting it loose with coding is not one of them, and the hype is dangerous by how much it's being believed.
But how do you measure it? All the metrics I see being chased (metrics that were never accepted as productivity measurements before) can be gamed with slop, and so slop is what we'll get.
That's like the entire hype cycle: LLM builders see a bunch of hyper-specific language in fields they're not experts in and think "wow, AI is really smart!"
6 months ago with previous models this was absolutely impossible. One of the biggest limitations of LLMs is their difficulty with long tasks. This has been steadily improving and this experiment was just another milestone. It will be interesting a year from now to test how much better new models fare at this task.
project 1: build a text-based browser using ratatui and quickjs.
project 2: base it on project 1. convert to gui, pages should render pure html.
project 3: acid1 compliance. Use constraint based programming to output final render, no animation support.
etc. etc.

There was a story going around about LLMs making minesweeper clones, and they were all terrible in extremely dumb ways. The headline wasn't obvious, so I assumed the take people were getting from it was that AI is making the same dumb mistakes it was making a year ago. Nope. It was people ranting about how coders are going to be out of a job next week. Meanwhile, none of them can do a minesweeper clone - with something like 50 working examples online, maybe 8 things you have to get right to be perfect, and 9,000 articles about minesweeper and even mathematical papers about minesweeper making everything about the game and its purpose perfectly clear. And then the AI generates buttons that don't do anything and timers that don't stop.
Claude Opus 4.5: "Build minesweeper as an artifact, don't use react"
(Then "Fix it to work on mobile where right click isn’t a thing")
Play it here: https://tools.simonwillison.net/minesweeper
Transcript here: https://claude.ai/share/2d351b62-a829-4d81-b65d-8f3b987fba23
It doesn't matter to the people that were fired that the AI isn't as capable as promised. They're still job hunting in a shitty job market. When management does eventually figure out the AI underperforms they'll hire back staff at a fraction of the salary.
So executives and management look great no matter what and everyone else gets screwed.
> tools like Cursor can be genuinely helpful as glorified autocomplete and refactoring assistants
That suggests a fairly strong anti-AI bias by the author. Anyone who thinks that this is all AI coding tools are today is not actually using them seriously.
That's not to say that this exercise wasn't overhyped, but a more useful, less biased article that's not trying to push an agenda would look at what went right, as well as what went wrong.
If this is the first time you’ve encountered a hype bubble, it’s a good opportunity to learn so that you can navigate the next one more easily.
The obvious? Selling subscriptions to individuals, reaching higher-ups with bombastic headlines, reaching potential investors, perpetuating the bubble.