Yes, I know it likely breaks everybody's terms of service but at the same time I'm not loading gigabytes of ads, images, markup, to accomplish things.
If anyone is interested I can take some time and publish it this week.
I also use it to automatically retrieve page responsiveness behavior in complex web apps. It uses playwright to adjust the width and monitor entire trees for exact changes which it writes structured data that includes the complete cascade of styles relevant with screenshots to support the snapshots.
There are tools you can buy that let you do this kind of inspection manually, but they are designed for humans. So, lots of clickety-clackety and human speed results.
---
My first reaction to seeing this FP was why are people still releasing MCPs? So far I've managed to completely avoid that hype loop and went straight to building custom CLIs even before skills were a thing.
I think people are still not realizing the power and efficiency of direct access to things you want and skills to guide the AI in using the access effectively.
Maybe I'm missing something in this particular use case?
It's God's gift to them when it lets them bypass ads and dl copyrighted material. But it's Satan's curse on humanity when the Zuck does it to train his llm and dl copyrighted material.
My excuse for not keeping up is that I'm in so deep that Claude Code can predict the stock market.
I'll still publish mine and see if has any value but agent browser looks very complete.
Thank you for sharing!
Here is what actually fixed it: https://github.com/yt-dlp/ejs/pull/53/changes
yt-dlp is relatively stable, but still occasionally breaks for long periods. I get the sense YouTube is becoming increasingly adversarial to yt-dlp as well.
I don't know the details, but it doesn't seem like yt-dlp is running the entire YouTube JS+DOM environment. Something like a real headless browser seems like it would break less often, but be much heavier weight. And Youtube might have all sorts of other mitigations against this approach.
IIRC they maintain a minimal execution environment that is able to run just the JS needed to pass a few checks but this breaks too often enough that they're planning to make Node.js or another JS interpreter a hard requirement (possibly already happened).
* there may well be other JS interpreters that are accepted, can be used - but solving JS challenges is required for much, if not all, YT content.
I rarely use yt-dlp anymore.
Before I just updated. Now when I do that, it usually becomes complex and full of questions.
I don't know anything about yt-dlp.
It would probably help people who want to go to a concert and have a chance to beat the scalpers cornering the market on an event in 30 seconds hitting the marketplace services with 20,000 requests.
I can try to see if can bypass yt-dlp. But that is always a cat and mouse game.
Never tried that level of autonomy though. How long is your iteration cycle?
If I had to guess, mine was maybe 10-20 minutes over a few prompts.
I use Patchright + Ghostery and I have a cleaver tool that uses web sockets to pass 1 second interval screenshots to the a dashboard and pointer / keyboard events to the server which allow interacting with websites so that a user can create authentication that is stored in the chrome user profile with all the cookies, history, local storage, ect.. in the cloud on a server.
Can you list some websites that don't require subscription that you would like to me to test against? I used this for Robinhood and I think Linked in would be a good example for people to use.
Great news to all of us keenly aware of MCP's wild token costs. ;)
The CLI hasn't been announced yet (sorry guys!), but it is shipping in the latest v0.20.0 release. (Disclaimer: I used to work on the DevTools team. And I still do, too)
‘I don’t have a girlfriend. But I do know a woman who’d be mad at me for saying that.’
‘I’m against picketing, but I don’t know how to show it.’
‘I haven’t slept for ten days, because that would be too long.’
‘I like to play blackjack. I’m not addicted to gambling. I’m addicted to sitting in a semi-circle.’
https://github.com/pasky/chrome-cdp-skill
For example, I use codex to manage a local music library, and it was able to use the skill to open a YT Music tab in my browser, search for each album, and get the URL to pass to yt-dlp.
Do note that it only works for Chrome browsers rn, so you have to edit the script to point to a different Chromium browser's binary (e.g. I use Helium) but it's simple enough
Is this the same as what Claude in Chrome does?
I tried that for a while and since I use Firefox and Chromium, the security problem of it seeing your tabs wasn't a big deal. Fresh Chrome install, only ever used for this exact purpose. Plus you can watch it working in real (actually very slow) time so if you did point it at something risky you can take over at any point.
For actual testing of web apps though, a skill with playwright cli in headless mode is much more effective. About 1-2k context per interaction after a bit of tuning.
edit: upon rereading, I now realize the (different) prompt injection risk you were calling out re: the handoff to yt-dlp. Separate profiles won't save you from that, though there are other approaches.
But if I understood the original commenter's use case, they're just searching YT Music to get the URL to a given song. This appears[0] to work fine without being logged in. So you could parameterize or wrap the call to yt-dlp and only have your cookie jar usable there.
Chrome's 'allow pasting' gets ignored reflexively by most users anyway. If this agent can touch DevTools the attack surface expands far faster than most people realize or will ever audit.
I think using the Pi coding agent really got me used to this way of thinking: https://mariozechner.at/posts/2025-11-30-pi-coding-agent/#to...
Also. AAarrgh, my new thing to be annoyed at is AI drivel written slop.
"No browser automation framework, no separate browser instance, no re-login."
Oh really, nice. No separate computer either? No separate power station, no house, no star wars? No something else we didn't ask for? Just one a toggle and you go? Whoaaaaaa.
Edit: lol even the skill itself is vibe coded:
Lightweight Chrome DevTools Protocol CLI. Connects directly via WebSocket — no Puppeteer, works with 100+ tabs, instant connection.
I feel like there's nothing fucking left on the internet anymore that is not some mean of whatever the LLM is trained to talk like now.
HN is becoming close to unusable, and this isn’t like the previous times where people say it’s like reddit or something. It is inundated with bot spam, it just happens the bot spam is sufficiently engaging and well-written that it is really hard to address.
DevTools MCP and its new CLI are maintained by the team behind Chrome DevTools & Puppeteer and it certainly has a more comprehensive feature set. I'd expect it to be more reliable, but.. hey open source competition breeds innovation and I love that. :)
(I used to work on the DevTools team. And I still do, too)
As someone that does heavy agentic coding (using basically all the tools), this is so far from the truth. People claiming this have probably never worked in large enterprise environments where things like authentication, RBAC, rate limiting, abuse detection, centralized management/updates/ops, etc. are a huge part of the development and deployment workflow.
In these situations you can't just use skills and cli tools without a gigantic amount of retooling and increased operational and security complexity. MCP is really useful here, and allows centralized eng and ops teams to manage their services in a way that aligns with the organizations overall posture, policies, and infrastructure.
> Google is so far behind agentic cli coding. Gemini CLI is awful.
This part I totally agree. It's really hard to express how bad it is (and it's really disappointing.)
You're describing MCP. After all, MCP is just reinventing the OpenAPI wheel. You can just have a self-documenting REST API using OpenAPI. Put the spec in your context and your model knows how to use it. You can have all the RBAC and rate limiting and auth you want. Heck, you could even build all that complexity into a CLI tool if you want. MCP the protocol doesn't actually enable anything. And implementing an MCP server is exactly as complex as using any other established protocol if you're using all those features anyway
I think it's clear a self-describing CLI is optimal for local-first tooling and portability. I personally view remote MCP servers as complementary in the space.
Source - I know people at Google.
Some people will push back on this. They are holding out hope that the recent improvements Anthropic has made in this regard have improved the context rot problem with MCP. Anthropic's changes improve things a little. But it is akin to putting lipstick on a pig. It helps, but not much.
The reason MCP is dying/dead is because MCP servers, once configured, bloat up context even when they are not being used. Why would anybody want that?
Use agent skills. And say goodbye to MCP. We need to move on from MCP.
MCP is a wire format protocol between clients and servers. What ends up inside the context window is the agent builder's decision.
The lipstick helps? This had me in stitches. Sorry for the non-additive reply. This is the funniest way I have seen this or any other phrase explained. By far. Honestly has made my day and set me up for the whole week.
MCP permanently sacrifice a chunk of the context window? And a skill for you cli is free?
MCP's remaining moats I think are:
- No-install product integrations (just paste in mcp config into app)
- Non-developer end users / no shell needed (no terminal)
- Multi-tenant auth (many users, dynamic OAuth)
- Security sandboxing (restrict what agents can do), credential sandboxing (agents never see secrets)
- Compliance/audit (structured logs, schema enforcement)?
If you're a developer building for developers though, CLI seems to be a clear winner right
But since when has this industry done the right thing informed by wisdom and hindsight?
When folks say MCP is dead, I don't get it. What other alternatives exist in place of MCP? Arbitrary code via curl/sdks to call a remote endpoint?
cli?
for example aws cli. It's a full interface to aws API. Why would you need mcp for that?
and if you have any doubts, agents use it with a great effect even without any relevant skill. "aws help" is fully discoverable.
It is hard to say nowadays, when things change so quickly
Couldn't have been more wrong. MCP despite its manageable downsides is leagues ahead of anything else in many ways.
The fact that SoTA models are trained to handle MCP should be hint enough to the observant.
I probably build one MCP tool per week at work.
And every project I work on gets its own MCP tool too. It's invaluable to have specialized per-project tooling instead of a bunch of heterogeneous scripts+glue+prayer.
Anything specialized goes into an MCP.
I use it extensively, many of my colleagues do. I get a ton of value out of it. Some prefer Antigravity, but I prefer Gemini CLI. I get fairly long trajectories out of it, and some of my colleagues are getting day-long trajectories out of it. It has improved massively since I started using it when it first came out.
What about all the CLI tools not baked into the model's priors?
Every time someone says "extensibility mechanism X is dead!", I think "Well, I guess that guy isn't doing anything that needs to extend the statistical average of 2010s-era Reddit"
The agent basically is living inside your running app with access to databases, endpoints etc. It's awesome.
We could put them in a dedicated tag:
<script type="text/skill+markdown">
---
name: ...
description ...
---
...
</script>
For all the skills with you want on the page, optionally set to default which "should be read in full to properly use the page".And then add some javascript functions to wrap it / simplify required tokens.
Made a repo and a website if anyone is interested: https://webagentskills.dev/
Favourite unexpected use case for me was telling gemini to use it as a SVG editing repl, where it was able to produce some fantastic looking custom icons for me after 3-4 generate/refresh/screenshot iterations.
Also works very nicely with electron apps, both reverse engineering and extending.
The thing I am working on is improving at the moment agentic tool usage success rates for my research and I use this as a proxy to access everything with the cookies I allow in the session.
Will check this out to see if they’ve solved the token burn problem.
This is one place where human intuition helps a ton today. If you can find the most relevant snippets and give the AI just the right context, it does a much better job.
But then I wonder, where the balance is between a bunch of small tool calls, vs one larger one.
I recall some recent discussion here on hn on big data analysis
The problem with MCP is that you are paying the price in token usage, even if you are not using the MCP server. Why would anybody want that?
And no, the tool search function recently introduced by Anthropic does not completely solve this problem.
I ran the Docker container locally for testing. Could a web developer test using Claude + Chromium in a Docker container without using their real Chrome instance?
Chrome's dev tools already had an API [1], but perhaps the new MCP one is more user friendly, as one main requirement of MCP APIs is to be understood and used correctly by current gen AI agents.
Hoping from some good stories from open claw users that permanently run debug sessions.
The approach I landed on was a deterministic enforcement pipeline that sits between the agent and the MCP server, so every tool call gets checked for things like SSRF (DNS resolve + private IP blocking), credential leakage in outbound params, and path traversal, before the call hits the real server. No LLM in that path, just pattern matching and policy rules, so it adds single-digit ms overhead.
The DevTools case is interesting because the attack surface is the page content itself. A crafted page could inject tool calls via prompt injection. Having the proxy there means even if the agent gets tricked, the exfiltration attempt gets caught at the egress layer.
For example, you can get a markdown out of most OpenAI documentation by appending .md like this: https://developers.openai.com/api/docs/libraries.md
Not definitive, but still useful.
Citation needed.
> The web already went through this evolution once: we went from screen-scraping HTML to structured APIs. Now we're regressing back to scraping because agents need to interact with sites that only have human interfaces.
To me, sites that "only have human interfaces" are more likely that not be that way totally on purpose, attempting to maximize human retention/engagement and are more likely to require strict anti-bot measures like Proof-of-Work to be usable at all.
Which is called ARIA and has been a thing forever.
We had this 20 years ago with the Semantic Web movement, XHTML, and microformats. Sadly, it didn't pan out for various reasons, most of them non-technical. There's remnants of it today with RSS feeds, which is either unsupported or badly supported by most web sites.
Once advertising became the dominant business model on the web, it wasn't in publishers' interest to provide a machine-readable format of their content. Adtech corporations took control of the web, and here we are. Nowadays even API access is tightly controlled (see Reddit, Twitter, etc.).
So your idea will never pan out in practice. We'll have to continue to rely on hacks and scraping will continue to be a gray area. These new tools make automated scraping easier, for better or worse, but publishers will find new ways to mitigate it. And so it goes.
Besides, if these new tools are "superintelligent", surely they're able to navigate a web site. Captchas are broken and bot detection algorithms (or "AI" themselves) are unreliable. So I'd say the leverage is on the consumer side, for now.