MCP-B: A Protocol for AI Browser Automation(mcp-b.ai)

336 pointsby miguelspizza7 months ago42 comments

jacquesm7 months ago
Prediction: this will go the same way as RSS. Companies don't like you to be in control of how you use their data.
- TeMPOraL7 months ago
  Indeed. Though I guess a better example would be: it'll go the same way as REST APIs (which happen to be fundamentally the same thing as MCP anyway).
  Remember the time when REST was the new hot thing, everyone started doing API-first design, and people thought it'll empower people by letting programs navigate services for them programmatically? Remember when "mashups" were the future?
  It all died before it could come to pass, because businesses quickly remembered that all their money comes specifically from denying users those capabilities.
  - throwawaymaths7 months ago
    REST did not die. it mostly became a mechanism for business managers to separate concerns between frontend and backend.
    i wonder if mcp will become, "let the project people talk to the backend team and the frontend team separately and the AI will figure out the middle"
    ep1037 months ago
    This is already the way my company wants to go.
    Put MCP in front of every GET API, and let teams explore on their own
  - xienze7 months ago
    > and people thought it'll empower people by letting programs navigate services for them programmatically?
    I don’t think that concept died because of nefarious business-related reasons but rather that building true HATEOAS APIs is hard and the utility of “automatically navigable APIs” is quite limited. It’s a neat trick to point a generic API client at an API and crawl it automatically, but hardly anyone consumes APIs that way. We read the API docs and construct integrations suited to the task at hand, manually.
    TeMPOraL7 months ago
    Right. But there are hardly any useful APIs you can just use, with only an account on a service and willingness to read the docs. Everyone is exposing as little functionality as possible, and even that only under special conditions that make them useless for regular people. APIs are primarily a way for businesses to partner these days.
  - deepdarkforest7 months ago
    I don't know about that. Zapier and automation apps were huge before agents, or even for integrations for Slack. There is definitely a big portion of tech products that have mutual benefits by providing good APIs to be in the same bubble
    TeMPOraL7 months ago
    Yes, that's my point - Zapier is exactly where this is heading. Automation as a service, requiring you to enter into contracts with everyone, and limited only to what said services feel like enabling. This is the opposite of what we hoped APIs will be, and the opposite of what people hope MCP will be.
  - pyuser5837 months ago
    How is MCP doing the same as REST?
    I’m a REST developer learning MCP, and most of my effort is spent finding anything new to learn.
    So I’m not suprised by this statement, but I’m a bit startled.
    How are they the same thing?
  - sneak7 months ago
    REST and MCP aren’t fundamentally the same thing. MCP is JSON-RPC, and includes special methods that allow you to enumerate the various functions and their signatures. REST apis have none of that, and use different verbs. JSON-RPC is always POST (which kills cacheability for common reads, unfortunately).
- SquareWheel7 months ago
  Isn't RSS a smashing success? I changed readers after Google Reader died, but otherwise, my feeds have been working seamlessly for nearly 20 years. I rarely meet a site with updates that doesn't support RSS.
  - bayindirh7 months ago
    Until recently, many sites had RSS functionality because the infrastructure they are using provided some of automated RSS generation. Also, many sites stopped providing "full-content" RSS feeds, but gave pointers to the website itself to drive clicks.
    "Real" RSS gives you the whole content. The blog platform I use does this, for example. They are not greedy people and just want to provide a blog platform so, they use the thing as it's supposed to be.
  - xnx7 months ago
    Twitter, Instagram, TikTok, craigslist, eBay, Amazon, etc.
- latexr7 months ago
  > Prediction: this will go the same way as RSS.
  Meaning what? RSS remains ubiquitous. It’s rare to find a website which doesn’t support it, even if the owners don’t realise it or link to it on their page. RSS remains as useful as it ever was. Even if some websites only share partial post content via RSS, it’s still useful to know when they are available (and can be used as an automation hook to get the full thing).
  RSS is alive and well. It’s like if you wrote “this will go the same way as the microwave oven”.
  - 57473m3n7Fur7h37 months ago
    The built-in RSS reader in Firefox was removed. (But extensions exist to add RSS reader to Firefox.)
    Google killed Google Reader. (Other products exist you can use instead.)
    Facebook removed support for RSS feeds. (You can replace it with third party tools or API calls.)
    It’s not dead dead, but it did seem to lose some momentum and support over time on several fronts.
    latexr7 months ago
    > It’s not dead dead
    It’s not dead, period. Not dead, dead dead, dead dead dead, or any other combination.
    Yes, some integrations were removed, but on the whole you have more apps and services for it than ever. The death of the behemoth that was Google Reader was a positive there.
    Maybe fewer people are using it, but the technology itself is fine and continues to be widely available and supported by most websites, which was the point.
    Maybe Facebook and Instagram don’t have RSS access, but you can’t even navigate two pages on them without an account, anyway. They are closed to everything they don’t control, which has nothing to do with RSS.
    Gracana7 months ago
    I got the feeling that it was dead long ago. Sure there's plenty of readers and lots of sites "support" it, but what they tend to give you is a headline and article blurb feed with links to click through to see the ads. They don't want you consuming their articles through a reader. That's what makes it "dead" to me and a lot of others.
    Aurornis7 months ago
    These products were phased out because most people weren't using them.
    RSS is and always was very niche. There are always claims that companies killed RSS for nefarious reasons, but I think the truth is much simpler: Companies stopped putting resources into RSS tools because very few people use them.
    The people who use RSS are very vocal about their support, but they're a small minority of internet users. Even I started with an RSS reader but found myself preferring a set of bookmarked favorites to the sites I wanted to read, even though they're not equivalent in functionality. For my use case, a random sampling of websites that I could visit during times I had 15 free minutes to read something was better than collecting everything into one big feed, even though I would have guessed the opposite before trying both ways.
    threetonesun7 months ago
    Was RSS was not niche, and it's not niche today unless you consider podcasts niche. Also most new social media platforms have RSS built in.
    It was nefariously killed by companies, especially news sites, who saw no good way to monetize RSS feeds, and would much rather you keep clicking bookmarks to be served new ads.
- wkat42427 months ago
  It doesn't matter. Soon the AI will be able to click and scroll like a normal user. It's going to be another arms race.
- theptip7 months ago
  Maybe, but the market structure has inverted and the big guys now want to be in the intelligence layer, not content. (Content is being commoditized.)
  Google can still sell ads as long as they own the eyeballs and the intelligence that’s engaging them.
  Google did not want you using RSS because it cut out Google Search.
- worldsayshi7 months ago
  Unless it becomes useful enough that customers will go through the hassle of switching to companies that are "AI-ready".
fzysingularity7 months ago
The contributions for the Github project is quite intriguing: https://github.com/MiguelsPizza/WebMCP/graphs/contributors
MiguelsPizza | 3 commits | 89++ | 410--
claude | 2 commits | 31,799++ | 0--
- miguelspizza7 months ago
  I did some git history re-visioning when I closed sourced the extension for a bit. So these are not super accurate. Claude code did write about 85% of the code though.
  - Simon_O_Rourke7 months ago
    How can you figure out that percentage? The commit logs?
    shreddit7 months ago
    More likely a wild guess. The number is also probably higher...
  - fzysingularity7 months ago
    Nice!
    rapind7 months ago
    I was checking that out too. Looks like claude was co-author on the initial commit, which is like 90%.
    https://github.com/MiguelsPizza/WebMCP/commit/26ec4a75354b1c...
    csomar7 months ago
    The main author doesn't have a GitHub account or un-linked his account (on purpose or by accident).
    miguelspizza7 months ago
    Looks like Claude put me down as a co-author with my real name (Alex Nahas) instead of my GitHub handle (MiguelsPizza)
- efitz7 months ago
  You’re going to see this pattern a lot more in the future.
- consumer4517 months ago
  Claude's contributions graph is interesting. What is going on here? Does Claude Code commit as itself sometimes, but extremely rarely? I don't understand.
  https://github.com/claude
  - handfuloflight7 months ago
    If you ask it to commit it'll sign itself as the author.
    consumer4517 months ago
    But then, how are there so few commits in its profile graph? I suppose I may be admitting my ignorance of how public GitHub works, but still curious.
    eddythompson807 months ago
    I was guessing Anthropic asked them to turn it off. Though why not also ask to delete the old activity of the person you bought the account from?
    Like: https://github.com/fotinakis/swagger-blocks/issues/3
    darkwater7 months ago
    Wow! Now I wonder how many bucks they paid for it.
    Because for sure they didn't DMCAed their way to own that account, right? Right?
    notpushkin7 months ago
    Whoa.
    handfuloflight7 months ago
    Oh no it's not done through that account. Seems to be some sort of ephemeral account scoped to the repo.
    baobun7 months ago
    What's in authour/commit meta is orthogonal to which github.com account you are actually using. Anyone can commit as anyone as long as you have any way to push and if you want better than that you need signed commits. There are no "ephemeral accounts" in play here.
    handfuloflight7 months ago
    Right, that was my intended meaning of ephemeral accounts.
    numpad07 months ago
    git require name and email to commit, but there's no cryptography involved there. Maybe GitHub won't green the lawn for unverified third party commits included in pushed commits?
    DougBTX7 months ago
    For a little while someone added Claude’s noreply email address to their account, so their profile started appearing alongside private repo commits!
    sneak7 months ago
    I don’t understand why we are personifying AI. They’re inanimate tools. My commits don’t credit my keyboard.
    handfuloflight7 months ago
    Might be less personification and more a figure of speech.
- gubicle7 months ago
  That doesn't look right... if you look at the actual commits, they are all from
  MiguelsPizza / Alex Nahas
  https://github.com/MiguelsPizza/WebMCP/commits/main/
  - byteknight7 months ago
    He rewrote history to hide it?
    He admits it here https://news.ycombinator.com/item?id=44516104
mehdibl7 months ago
From the blog post:
"The Auth problem At this point, the auth issues with MCP are well known. OAuth2.1 is great, but we are basically trying to re-invent auth for agents that act on behalf of the user. This is a good long term goal, but we are quickly realizing that LLM sessions with no distinguishable credentials of their own are difficult to authorize and will require a complete re-imagining of our authorization systems. Data leakage in multi-tenant apps that have MCP servers is just not a solved problem yet.
I think a very strong case for MCP is to limit the amount of damage the model can do and the amount of data it will ever have access to. The nice thing about client side APIs in multi-tenant apps is they are hopefully already scoped to the user. If we just give the model access to that, there's not much damage they can do.
It's also worth mentioning that OAuth2.1 is basically incompatible with internal Auth at Amazon (where I work). I won't go to much into this, but the implications of this reach beyond Amazon internal."
1. Oauth is not working in Amazon ==> need solution.
2. Oauth are difficult to authorize
3. limit the amount of damage the model can do WHILE "ulti-tenant apps is they are hopefully already scoped to the user".
I feel from a security side there is an issue here in this logic.
Oauth for apps can be far more tuned than current web user permission as usually, user have modification permission, that you may not want to provide.
Oauth not implemented in Amazon, is not really an issue.
Also this means you backdoor the App with another APP you establish trust with it. ==> This is a major no go for security as all actions on MCP app will be logged in the same scope as USER access.
You might just copy your session ID/ Cookie and do the same with an MCP.
I may be wrong the idea seem intersting but from a security side, I feel it's a bypass that will have a lot of issues with compliance.
- arkh7 months ago
  Wait, isn't it already part of Oauth?
  > https://datatracker.ietf.org/doc/html/rfc8693#name-delegatio...
- miguelspizza7 months ago
  Not sure I understand. The model has no more access than the user does. proper security implementation still lies with the website owner
  - tehryanx7 months ago
    I think the point is that you shouldn't be giving the agent the same privileges as the user. This is one of the biggest issues with how people are using agents rn imo. The agent should be treated as an untrusted user in your client, given restricted privileges scoped to only the exact access they need to perform a given task.
    miguelspizza7 months ago
    Again, that's up to the website owner. They can give the model anywhere from no access to full access to the client side api.
    > The agent should be treated as an untrusted user in your client, given restricted privileges scoped to only the exact access they need to perform a given task
    I agree, this is exactly what MCP-B does
    ricardobeat7 months ago
    The data you give it can be shared with any other website, at the agent's discretion. Some of it might be safe to share with the user, but not with third parties; at a minimum this should request permission when trying to share data between different websites/servers.
    michaelmior7 months ago
    > at a minimum this should request permission when trying to share data between different websites/servers.
    I don't see how you could possibly implement such a thing reliably. Do you scan all the parameters to other tool calls from different servers looking for something in a previous response? Even if you do that, the LLM could derive something private from a previous response that couldn't easily be detected. I suppose you could have an agent that tracks data flow in some way, but that's beyond the scope of MCP.
    tehryanx7 months ago
    I don't think it is beyond the scope of MCP. Browsers have controls to prevent cross-origin data exposures, and this protocol is designed to bridge origins across a context that they all have access to. It's breaking the existing isolation mechanism. If you're building a system that breaks the existing security controls of the environment it's running in I think you have an architectural responsibility to figure out a way to solve for that.
    Especially in this context, where decades have been spent building and improving same origin policy controls. The entire web has been built around the expectation that those controls prevent cross origin data access.
    I also don't even think it's that difficult to solve. For one, data in the context window doesn't have to be a string, it can be an array of objects that contain the origin they were pulled from as metadata. Then you can provide selective content to different MCP-B interfaces depending on their origins. That would live in the protocol layer that would help significantly.
    michaelmior7 months ago
    The agent is probably not operating in the browser though. And current LLMs work with (tokenized) strings, not objects.
    miguelspizza7 months ago
    ah this is a great point, I will add it to the road map
    tehryanx7 months ago
    I'm not following.
    Say I have your browser extension running, and it's interfacing with an MCP-B enabled banking application using my session to access my data in that app.
    I also have it connected to MCP-B enabled rogue web app that I mistakenly trust.
    My browser has an entire architecture built around preventing data from crossing between those two origins, but what's stopping a malicious instruction from the rogue app asking the extension agent to include some data that it pulled into the context window from the banking app?
    Further, when I use MCP in my IDE I have to deliberately provide that MCP server with a token or credentials to access a protected resource. With MCP-B, isn't it just automatically provided with whatever credentials are already stored in cookies/etc for a given MCP-B enabled app? If I load an MCP-B enabled app, does the agent automatically have access or do I have to configure it somewhere?
    > If a website wants to expose a "delete all user data" tool, that's on them. It's no different than putting a big red delete button on the page.
    It is different though, because the directive to push that button can come from somewhere other than the user, unless you've somehow solved prompt injection.
    The point I'm driving toward is that I think you're violating the most common assumption of the web's long-standing security model, that data is protected from leaking cross origin by the browser. There's no SOP or CORS for your agent extension, and that's something that web apps have been built to expect. You're basically building an SOP bypass extension.
    miguelspizza7 months ago
    Ah I see. Yes this is a concern, but this issue is actually not unique to MCP-B and is just a generally issue with agentic workflows that rely on a dynamic toolset from 3p vendors. (which any MCP server local or remote has the ability to be)
    > With MCP-B, isn't it just automatically provided with whatever credentials are already stored in cookies/etc for a given MCP-B enabled app?
    Not exactly, MCP-B just allows your extension agent to call functions that the website owner explicitly exposes. The client itself is not given an credentials like traditional MCP.
    > If I load an MCP-B enabled app, does the agent automatically have access or do I have to configure it somewhere?
    Theres more in the blog post but how much access the agent has and how much human approval is needed to grant this access is completely up to the website creator.
    FWIW your points are valid and MCP-B should enforce some guardrails when any domain shift happens via elicitation: https://modelcontextprotocol.io/specification/draft/client/e...
    I'll add it to the road map. Thanks for bringing it up!
    tehryanx7 months ago
    I do think the threat model here is a bit unique though.
    If I'm running two MCP servers on my machine, I'm the one that installed them, I'm the one that assigned what permissions they have in my environment, and I'm the one that explicitly decided what level of access to give them within whatever resource they're accessing. That gives me reasonably strong control over, or at least full knowledge of, what data can be shared between them.
    With MCP, I can use oauth to make very deliberate decisions about the scope of access I want to give the agent.
    With MCP-B, it's the web application owner that installed the interface and what access it has to my data, and the agent running in my client gets access to whatever that third party deemed appropriate.
    With MCP-B the agent has the same access I do by default, with the only restrictions being up to the app owner rather than it being up to me.
    MCP auth is not perfect by any stretch, but the key thing it gives the user is the capacity to restrict what the agent has access to with some granularity. That's super important because the agent can't be trusted when it's consuming inputs the user didn't explicitly define. MCP-B doesn't have this, if you have the agent in your browser it has access to whatever resources you have so long as they were exposed by a tool call, which isn't somethign the user has any say in.
    miguelspizza7 months ago
    I see your point. The MCP-B zero config nature from a user perspective is simultaneous it's biggest strength and weakness. You can think of it kind of like putting your Social Security number into a website. You are putting a bunch of trust that they are going to protect it properly.
    With MCP-B you are putting trust in both the model and the website owner. It opens up the user to risk for sure, but it's up to them to determine if the upside is worth it.
    tehryanx7 months ago
    I appreciate your responses here. The thing that still really stands out to me as a completely novel risk in this framework is that the extension is automatically seeking out and attaching to these servers as soon as a page gets loaded.
    This seems really bad to me. There are so many ways for a website to end up in one of my browser tabs without me wanting it there, or even knowing it's there.
    If that happens, and that tab just so happens to be a malicious MCP-B enabled page, it could steal all kinds of data from all kinds of different web apps I'm interacting with. I think it should be seen as the responsibility of the framework to enforce some level of data isolation, or at the least opt-in consent mechanisms.
    miguelspizza7 months ago
    Yea this is a really good idea, Maybe like a popup that say "hey x website has an MCP, do you trust it to connect?"
    I guess there would also need to be a way to "audit" a websites full tool list at connection time and throw some sort of warning if tools show up that are not part of this list during use.
    Interesting problems for sure. I really appreciate you taking to time to think them through. I'll call these out in the issues section of the repo
    miguelspizza7 months ago
    Added a summary of what has been brought up here to the repo wiki. Let me know if I missed anything
    https://github.com/MiguelsPizza/WebMCP/wiki/Known-Security-I...
    michaelmior7 months ago
    > with the only restrictions being up to the app owner rather than it being up to me.
    I don't see any reason sites using MCP-B couldn't have settings to restrict access to certain data based on user configuration.
    tehryanx7 months ago
    Sure, but the leak risk is happening in a place outside the site's control.
    If the purpose of the MCP-B tool on mail.com is to summarize your email, then the site needs to allow the agent to pull your email into the context window. Once it's in the context window it's available to any other MCP-B enabled site that can convince the agent to send it along.
    michaelmior7 months ago
    Sure. My point was that you can limit what the agent is allowed to access at the very least. The fact that you need to trust the agent not to share the info is a n important, but separate concern.
    owebmaster7 months ago
    > The agent should be treated as an untrusted user in your client,
    An untrusted user in a client is a hacker/invasor, not an agent.
    lcnPylGDnU4H9OF7 months ago
    That’s not really a reason not to treat the agent like it’s “rogue”. The point is, if it accepts any untrusted inputs then, from a security perspective, it is possible for any given (untrusted) input to contain a prompt injection payload that jailbreaks the model and tells it to do things it shouldn’t do.
    As such, it can be told to do bad stuff in a way that can’t be prevented and therefore should not be given read access to anything you don’t want others to know about, nor write access to any data of which you care about the integrity.
    TeMPOraL7 months ago
    That is out of scope of the service. What kind of user agent the actual user deputizes to interact with a service, is the user's own choice and responsibility. In general, it's not something a service can solve on their end.
    arnarbi7 months ago
    Services can certainly make this safer by providing means to get more restricted credentials, so that users can deputize semi-trusted delegates, such as agents vulnerable to injection.
    The important point being made in this discussion is that this is already a common thing with OAuth, but mostly unheard of with web sessions and cookies.
    tehryanx7 months ago
    an untrusted, but permitted, user is why sandboxes exist. There are plenty of times you want to allow an untrusted user to have capabilities in a system, that's why you restrict those capabilities.
    owebmaster7 months ago
    a sandboxed user is not an untrusted user of the client but an unstrusted user of the host, that is why the client is sandboxed.
    tehryanx7 months ago
    sandboxing is a general term for actor isolation, and its context agnostic.
    For example, when you use the sandbox attribute on an iframe in a web application, it's not the user that's untrusted, it's some other user that's attempting to trigger actions in your client.
    miguelspizza7 months ago
    I've thought more about this and I think the only way to make completely sure that sensitive data does not get leaked is by making sure it never makes it into the models context in the first place.
    The issue is even if the MCP-B extension makes it so the user has to give confirmation when the agent want's to call a tool on a new domain after interacting with another domain, there is no clear way to determine if a website is malicious or not.
    A solution to this might be to give server owners the ability to write the restricted data to extension storage on tool response instead of returning it to the models context. Instead, a reference to this location in extension storage get's passed to the model. The model then has the ability to "paste" this value into other website via tool call without ever actually seeing the value itself.
    That way, MCP-B can put lots of warnings and popups when this value is requested to be shared.
    Any thoughts?
  - ImPostingOnHN7 months ago
    When you say "the user", do you mean that if Alice set up the MCP, and Bob, Charlie, and Dave all access it, the MCP will only execute commands as Bob, or Charlie, or Dave, depending on who is accessing it?
slt20217 months ago
Could all of this be replaced simply by publishing OpenAPI (Swagger) spec and using universal swagger mcp client ???
This basically leaves up to the user to establish authenticated session manually.
Assuming claude is smart enough to pick up API key from prompt/config, and can use swagger based api client, wouldnt that be the same?
- miguelspizza7 months ago
  That was everyone's first thought when MCP came out. Turns out it doesn't work too well since there is generally too many tools. People are doing interesting work in this space though
  - randomaifreak7 months ago
    Yeah agreed. Tool overload is quite problematic. And then having to interact with the api for each website and their tools and possibly clashing tool names isnt ideal.
- nilslice7 months ago
  pls don't put an api key in a prompt
  - loandbehold7 months ago
    It may or may not be an issue. It's ok to give it API key for test/qa system but probably not for prod.
- loandbehold7 months ago
  I found i can have Claude Code consume API just by giving it link to swagger.json in CLAUDE.md. it's very useful for adhoc testing.
- 7 months ago
  undefined
- efitz7 months ago
  Do it.
SchemaLoad7 months ago
Not sure who the intended user is here? For frontend testing you actually do somewhat want the tests to break when the UI changes in major ways. And for other automation you'd be better off providing an actual API to use.
- nicman237 months ago
  scrappers and me buying milk with a vlm
throwanem7 months ago
> If I asked you to build a table and gave you a Home Depot you probably would have a harder time than if I gave you a saw, a hammer and some nails.
I doubt that, first and not least because Home Depot stocks lumber.
- bobmcnamara7 months ago
  Home Depot also sells tables.
  - devoutsalsa7 months ago
    Haha, that’s pretty close to any software I’ve ever written. Suck in a ton of open source dependencies, write comparatively little, and say “look what I made!” Buying a table, adorning it with vase of fake flowers, and claiming to be a Senior Woodworking Engineer sounds about right. I’ll be a Principal after buying a new bed frame & putting a mattress on it.
    sneak7 months ago
    Wait until you find out that lumber doesn’t grow on trees like that.
    It’s manufacturing all the way down.
    Don’t sell yourself short.
  - throwanem7 months ago
    Not good ones. But in any case the spec was not to provide a table, was it?
  - stuartjohnson127 months ago
    import table
    table()
- 7 months ago
  undefined
- leptons7 months ago
  You're supposed to "hallucinate" the lumber.
- latexr7 months ago
  And, I imagine, Home Depot might have better and more precision tools available, plus professionals who know how to use them.
- miguelspizza7 months ago
  Fixed. Nice catch
  - throwanem7 months ago
    Well, I've built tables before.
- airtonix7 months ago
  [dead]
- 7 months ago
  undefined
Abishek_Muthian7 months ago
I’ve haven’t used any MCP so far but as a disabled person I see use cases in accessibility for MCPs doing browser/smartphone automation.
But any accessibility tool will be exploited by nefarious actors so I wonder how many main stream websites/apps would implement these MCP.
Has anyone tried any MCP for improving accessibility?
- krashidov7 months ago
  > But any accessibility tool will be exploited by nefarious actors so I wonder how many main stream websites/apps would implement these MCP.
  How so?
  - Abishek_Muthian7 months ago
    Android smartphone bot farms use the accessibility service to automate usage of apps.
    Audio captchas are often used by bots.
  - mattlondon7 months ago
    Anything that makes it easier to automate will make bad actors more efficient.
    So people like ticket sales sites, eBay etc. It will make it easier for those sites to have all the tickets purchased or for auctions to be sniped etc.
    FWIU, these sort of sites actually (currently at least) put on measures to try and stop bots using them for these reasons.
    sneak7 months ago
    Scalping tickets and sniping auctions are legitimate use cases by customers.
    “Use it, but not like that” is not a legitimate position to take.
    TeMPOraL7 months ago
    Indeed. Bur ironically, this is exactly the position most people propose to take with MCPs.
    From the POV of the service, prompt injections are immaterial - the LLM is acting on behalf of the user, so as long as it's limited to the same actions/privileges the actual user has, it's really not the job of the service to police what the LLM does. It's the user's choice to delegate to an LLM instead of doing something themselves.
    7 months ago
    undefined
orliesaurus7 months ago
I don't get it from the homepage, feels like Selenium on the browser, since you built it can you explain ?
- miguelspizza7 months ago
  Similar but also very different. Playwright and Selenium are browser automation frameworks. There is a Playwright-MCP server which let's your agent use Playwright for browser automation.
  MCP-B is a different approach. Website owners create MCP servers `inside` their websites, and MCP-B clients are either injected by browser extensions or included in the websites JS.
  Instead of visual parsing like Playwright, you get standard deterministic function calls.
  You can see the blog post for code examples: https://mcp-b.ai/blogs
  - mhio7 months ago
    A playright-mcp server, or any bidi browser automation, should be equally capable of discovering/injecting and calling the same client JS exposed MCP-B site API?
    It's like an OpenAPI definition but for JS/MCP? (outside of the extension to interact with that definition)
    miguelspizza7 months ago
    Sure they can inject clients, but that's really only beneficials for developers. doing it via browser extension means regular people can use it.
    > It's like an OpenAPI definition but for JS/MCP?
    Sortof. It's a true MCP server which you can use to expose existing (or new functionality on your webapp to the client)
  - c0wb0yc0d3r7 months ago
    What differentiates this from something like data-test-id attributes?
    miguelspizza7 months ago
    data-test-id attributes and other attributes are hardcoded and need to be know by the automator at run time. MCP-B clients request what they can call at injection time and the server responds with standard MCP tools. (functions LLM's can call with context for how to call them)
  - Nathanba7 months ago
    what do you mean by "visual parsing like Playwright"? I'm pretty sure Playwright queries the DOM via js, there isn't inherently any visual parsing. Do you just mean that mcp-b has dedicated js APIs for each website? Your example is also pretty confusing, it looks like the website itself offers an "Increment by x" "tool" and then your first command to the website is to "subtract two from the count". So the AI model has to still understand the mcp tools offered by the website quite loosely and just calls them as needed? I suppose this is basically like using playwright except it doesn't have to parse the DOM (although it probably still does, I mean how else will it know that the "Increment by X" tool offered is in any way connected to the "count" you mention in your vague prompt. And then the additional benefit is that it can call a js function instead of having to generate the DOM/js playwright calls to do it.
    I mean all this MCP stuff certainly seems useful even though this example isn't so good, the bigger uses will be when larger APIs and interactions are offered by the website like "Make a purchase" or "sort a table" and the AI would have to implement very complex set of DOM operations and XHR requests to make that happen and instead of flailing to do that, it can call an MCP tool which is just a js function.
    miguelspizza7 months ago
    Sorry this is in reference to the Playwright MCP server which gives a model access to screen shots of the browser and Playwright API's.
    MCP-B doesn't do any DOM parsing. It exchanges data purely over browser events.
lewisjoe7 months ago
This looks promising - thanks for open-sourcing this. This addresses the gap that most work happens in browsers while MCP assumes that work happens with AI clients.
I have a fundamental question though: how is it different from directly connecting my web app's JS APIs with tool calling functions and talking directly with a LLM server with tool-call support?
Is it the same thing, but with a protocol? or am I missing the bigger picture?
- miguelspizza7 months ago
  Np thanks for reading! The difference is with MCP-B you don't have to integrate or maintain any AI chat functionality yourself.
  It's a protocol which allows the user to bring their own model to interact with the tools on your website
Flux1597 months ago
This is an interesting take since web developers could add mcp tools into their apps rather than having browser agents having to figure out how to perform actions manually.
Is the extension itself open source? Or only the extension-tools?
In theory I should be able to write a chrome extension for any website to expose my own custom tools on that site right (with some reverse engineering of their APIs I assume)?
- miguelspizza7 months ago
  The extension should be open source. I had it as a private submodule until today. Let me figure out my it's not showing up and get back to you.
  The extension itself is a MCP server which can be connected to by other extension over cross extension messaging. Since the extension is part of the protocol, I'd like for the community to pull from the same important parts of the extension (MCPHub, content script) so they are consistent across extension implementations.
- miguelspizza7 months ago
  Ok it's open source now
  - Flux1597 months ago
    Thanks! Took a very quick look. It seems like the extension exposes tools for all domains that support mcp-b looking at DomainToolManager - does this mean if I have two tabs for a single domain you'll have duplicate tools per tab?
    Haven't had enough time to look through all the code there - interesting problem I guess since a single domain could have multiple accounts connected (ex: gmail w/ account 0 vs account 1 in different tabs) or just a single account (ex: HN).
    miguelspizza7 months ago
    No there is built in tool de-duping. I'm not sure how to handle domains with different url states though.
    Like you said there are some edge cases where two tabs of the same website expose different tool sets or have tools of the same name but would result in different outcomes when called.
    Curios if you have any thoughts on how to handle this
    t1amat7 months ago
    The user should be able to enable/disable tools or an entire tab’s toolset. Some keep open hundreds of tabs and that’s simply too many potential tools to expose. Deduping doesn’t make sense for the reasons you say, and that one logical task could lead to a series of operations missequenced across a range of tabs.
muratsu7 months ago
This puts the burden on the website owner. If I go through the trouble of creating and publishing an MCP server for my website, I assume that through some directory or method I'll be able to communicate that with consumers (browsers & other clients). It would be much more valuable for website owners if you can automate the MCP creation & maintenance.
- mindwok7 months ago
  Pretty much every revolution in how we do things originates from the supplier. When websites became a thing the burden was on businesses to build them. Same with REST APIs. Same with mobile apps. As soon as there’s a competitive advantage to having the new thing, companies will respond if consumers demand it.
  - gavmor7 months ago
    Am I going to start to choose products based on their compatibility with WebMCP?
    caleblloyd7 months ago
    Some may. If it allows you to skip learning how to navigate a complex web app because the AI experience will help navigate for you, you may be drawn to it.
- miguelspizza7 months ago
  I think with AI tools you can pretty confidently build out an MCP server for your existing website. I plan to have good LLM docs for this very purpose.
  For react in particular, lots of the form ecosystem (react hook form) can be directly ported to MCP tools. I am currently working on a zero config react hook form integration.
  But yes, MCP-B is more "work" than having the agent use the website like a user. The admission here is that it's not looking like models will be able to reliably do browser automation like humans for a while. Thus, we need to make an effort to build out better tooling for them (at least in the short term)
- rapind7 months ago
  I think this is the practical way. The website owner (or rather the builder, since if you're running wordpress, we can assume MCP will be part of the package) is already responsible for the human interface across many devices, and also the search engine interface (robots.txt, sitemap.xml, metatags). Having a standard we can use to curate what the AI sees and how it can interact would be hugely beneficial.
  There's space for both IMO. The more generic tool that figures it out on it's own, and the streamlined tool that accesses a site's guiderails. There's also the backend service of course which doesn't require the browser or UI, but as he describes this entails complexity around authentication and I would assume discoverability.
  - muratsu7 months ago
    I agree with you that platforms like wordpress, shopify etc will likely ship MCP extensions to help with various use cases. Accompanied with a discovery standard similar to llms.txt, I think it will be beneficial too. My only argument is that platforms like this are also the most "templated" designs and it's already easy for AI to navigate them (since dom structure variance is small).
    The bigger challenge I think is figuring out how to build MCPs easily for SaaS and other legacy portals. I see some push on the OpenAPI side of things which is promising but requires you to make significant changes to existing apps. Perhaps web frameworks (rails, next, laravel, etc) can agree on a standard.
    sbarre7 months ago
    > it's already easy for AI to navigate them (since dom structure variance is small).
    The premise of MCP-B is that it's in fact not easy to reliably navigate websites today with LLMs, if you're just relying on DOM traversal or computer vision.
    And when it comes to authenticated and read/write operations, I think you need the reliability and control that comes from something like MCP-B, rather than just trusting the LLM to figure it out.
    Both Wordpress and Shopify allow users to heavily customize their front-end, and therefore ship garbage HTML + CSS if they choose to (or don't know any better). I certainly wouldn't want to rely on LLMs parsing arbitrary HTML if I'm trying to automate a purchase or some other activity that involves trust and/or sensitive data.
- mfrye07 months ago
  I was thinking the same. Forward thinking sites might add this, but the vast majority of website owners probably wouldn't be able to figure this out.
  Some middle ground where an agent reverse engineers the api as a starting point would be cool, then is promoted to use the "official" mcp api if a site publishes it.
p0w3n3d7 months ago
I can see with my prophetic/logic eyes that free models will start to require captcha because of people start using MCP to automate browsers to use free LLMs. But captchas are ineffective against LLM so LLMs will fight automated LLMs from using them...
Sounds like a very strange world of robots fighting robots
- falcor847 months ago
  In the stories, the robots eventually realize that they actually share common goals ...
handfuloflight7 months ago
Would it be possible to do this with any arbitrary website since we can execute JS client side?
- miguelspizza7 months ago
  Yup! You just declare a standard MCP server and attach a TabServerTransport to it. Any TabClientTransport in the same Tab will be able to connect to it.
  The examples focus mostly on extensions injecting clients at website load time, but you can ship a client with your server javascript. That being said, if the client and server live in the the same script I recommend just using the InMemoryTransports from the official SDK.
  - imcritic7 months ago
    Wouldn't sites be able to detect presence of scripts injected by your extension (to, say, refuse you services since site owner decided they would like their site to be used only by humans, not AI agents)?
    miguelspizza7 months ago
    Sure. Although I'd assume if the website owner went through the hassle of creating a MCP server for their website, they would probably want to be discovered
    handfuloflight7 months ago
    We mean if we could use this to instantiate an MCP for any website we're visiting.
    miguelspizza7 months ago
    Yea, I am planning a dev build of MCP-B which has access to user scripts apis. So technically you could `vibe code` and inject an MCP server into the target webpage
    TeMPOraL7 months ago
    In the long run (well, mid-run), it'll be about the only way in which it'll be useful: toying with MCPs is all the rage now, but that'll end once business people pause and think for five seconds about what MCP actually does. Which is, provide users with ability to use a service the way they like, not the way the service owners likes, and avoiding interacting with the service directly.
    Or, in other words, it helps users get around all the bullshit that actually makes money for the business. Ads, upsells, cross-marketing opportunities. Engagement. LLMs help users avoid all that, and adding an MCP to your site makes it trivial for them.
    Maybe I'm too cynical, but think about this: the technologies we needed for this level of automation became ubiquitous decades ago. This was, after all, also the original hype behind APIs, that burned bright on the web for a year or two - before everyone realized that letting people interact with webservices the way they want is bad for business, and everything closed down behind contracts backed by strong auth. Instead of User Agents connecting diverse resources of the web for the benefit of users, we got... Zapier. That's what the future of service-side MCPs on the web is.
    But user scripts were always a way to let at least power users show a middle finger to "attention economy" and force some ergonomy out of web apps that desperately try to make users waste time. Giving users the ability to turn any website into MCP, regardless of whether that website wants it, will supercharge this. So for better or worse, that's where the future is.
    Adversarial interoperability remains the name of the game.
- gavmor7 months ago
  Yeah, I am tempted to rig up a "generic" webpage MCP injected via greasemonkey just so I can use this UI for navigating the web.
abrookewood7 months ago
Looks similar to Elixir's Tidewave MCP server, which currently also supports Ruby: https://tidewave.ai/
Paraphrasing: Connect your editor's assistant to your web framework runtime via MCP and augment your agentic workflows and chats with: Database integration; Logs and runtime introspection; Code evaluation; and Documentation context.
Edit: Re-reading MCP-B docs, that is more geared towards allowing visitors to your site to use MCP, while Tidewave is definitely focussed on Developers.
Johnny_Bonk7 months ago
So if I'm using claude code and developing a web app, its running on localhost:3000, can I use claude code to basically get ui information, browser console logs and other web dev feedback and useful information? Cause I installed it and added that file but all I see is the 55 tools and 6 apis when i open the browser extension. not the stuff i need. and i also installed the extension tools i think it was called.
- miguelspizza7 months ago
  Ah maybe I should make that more clear. The web app is an example of a MCP-B server and the extension is a client. When you visit MCP-b.ai with the extension, it's tools will register
ethanniser7 months ago
this is super cool
wonder if it was inspired by `broadcast-mcp` [1] (hackathon project by me and a friend from may based on the same concept but not fleshed out)
1: https://x.com/RhysSullivan/status/1923956444153643443
- miguelspizza7 months ago
  Ah no, first time seeing this. How were you interacting with the website server? Via extension or some way else?
  - ethanniser7 months ago
    we would open the mcp site in a new tab or iframe then had a custom mcp transport based on `window.postMessage` just like you do https://github.com/RhysSullivan/broadcast-mcp/blob/main/pack...
    this concept is awesome- glad someone really fleshed it out
miguelspizza7 months ago
Hey HN,
This was an idea I had while trying to build MCP servers internally at Amazon. Today I am open sourcing it. TLDR it's an extension of the Model Context Protocol which allows you to treat your website as an MCP server which can be discovered and called by MCP-B compliant web extensions.
You can read a more detailed and breakdown here (with gifs): https://mcp-b.ai/blogs
- miguelspizza7 months ago
  Oh and the code is here: https://github.com/MiguelsPizza/WebMCP
xnx7 months ago
AI automation is exciting because it doesn't require any cooperation from the site.
It's nice when a site is user friendly (RSS, APIs, obvious JSON, etc.) but it is more powerful to be self sufficient.
netrem7 months ago
The product seems interesting, but the landing page I found very chaotic and gave up reading it. The individual pieces of information are fine I think, but the flow is poor and some info repeats. Was it AI generated?
- miguelspizza7 months ago
  Yes it was mostly AI generated. I'm much more of a dev than a writer/marketer. Hopefully if this gains some traction I can pay someone to clean it up
nurettin7 months ago
This gave me an idea. Instead of writing/maintaining servers and whatnot, why not just open the browser and give [$LLM] access to the development port and let it rip using the puppeteer protocol?
devops0007 months ago
I still don’t understand. The browser is made for human. We already invented the API to comunicare between machines. Why a machine should use a UI?
_1tem7 months ago
The entire point of AI Agents is that they should "just work" for websites that don't have APIs. Lots of websites simply have no incentive or resources to provide a good API.
metta2uall7 months ago
Looks great. I love ideas that increase efficiency and reduce electricity usage.
Only nitpick is that the home page says "cross-browser" at the bottom but the extension is only available for Chrome..
- miguelspizza7 months ago
  Ah yea I'll fix that. Nice catch
rapind7 months ago
This looks great. I'd really like to add something like this to my application (public and admin side). I have users, especially on the admin side, that could really benefit.
- miguelspizza7 months ago
  Thanks! I'd be happy to help onboard. Let me know!
TechDebtDevin7 months ago
hmm, I have an MCP route, that fetches the page in a browser, returns and lets the LLM inject javascript onto the page to return whatever structured output it desires..Or whatever (kinda scarily). How is this different?
--Shoutout to Go-Rod https://pkg.go.dev/github.com/go-rod/rod@v0.116.2#Page
- miguelspizza7 months ago
  Cool, I'll check it out!
  I'll need to look a bit more, but at a glance, MCP-B is more putting the onus of browser automation (i.e. how the agent will interact with the web page) on the website owner. They get to expose exactly the functionality they want to the agent
  - TechDebtDevin7 months ago
    Oh this is for the website owner. Yeah, mine is to make an arbitrary site interactable with an LLM. It can choose to get a map of the DOM/screenshot/extract by xml path/ and interact via a few different methods. But the PageEval() method from GO rod works pretty well
    Would like to just provide a runtime for an LLM to solve captchas.
    My main focus is (anti) bot detection.
ActorNightly7 months ago
This MCP stuff is leading dev down the wrong path. We should be focusing on llms using self discovery to figure out information.
- teruakohatu7 months ago
  I had that opinion too.
  You can ask an agent to browse a web page and click a button etc. They will work out how to use a browser automation library.
  But it’s not worth the cost, time spent waiting or the inconsistency between implementations.
  MCP just offloads that overload, much like how they can use bash tools when they are quite capable of writing an implementation of grep etc.
  - ActorNightly7 months ago
    The whole point is that you shouldn't have to worry about implementation. AI should do it for you.
    johschmitz7 months ago
    At the end of the day you often need to consider the energy efficiency of a system which is also reflected in the cost of operating it. For use cases where this is relevant the suggested MCP approach potentially offers large benefits compared to what's probably meant here by "AI". However, the disadvantages of public access discussed in other threads need to be considered as well, therefore, I expected this only to be used for certain niche use cases. Testing and retro fitting non-public websites come to mind.
    ActorNightly7 months ago
    Once we have the thing working, optimization can come later.
    Also MCP is not really optimal. Every prompt in a sequence of chat gets an injection of MCP capabilities. Its just simply not scalable with lots and lots of MCP servers. Not to mention the protocol changes every month and breaks things.
    Furthermore, you can already do MCP like behavior, in a better way, on pretty much any model by basically writing a wrapper around the chatbot and using a system prompt which guides it to print specific text (like "action:load file"), and the wrapper can detect that text and inject the prompt. I have an agent that runs at home using this, which I also use to self improve and define actions on the fly - I ask it to write code, it writes code, then the wrapper takes that code and makes it part of the library and appends it to the system prompt on every new chat.
    The point is that we should be able to take the latter, and build something that can do this automatically. Once we have the full loop complete, then we can optimize it down to minimum compute.
    JambalayaJimbo7 months ago
    AI is a very leaky abstraction. You will always be worried
    ActorNightly7 months ago
    LLMS are leaky abstractions. LLMs configured in wrappers can be made to be mostly correct.
    For example, define range of input, appropriate output, ask LLM to write code, automatically run that code against the range of input, evaluate the output, ask llm to fix any issues where the input doesn't match the output.
    That whole process can be made faster without the need for huge models. The model doesn't need to get trained on everything CS, because it doesn't need to get the code correct on the first try, it just needs to be trained on enough code to understand how something affects the output and iterate on that. I.e basically making the model do smart guided search. It was done with Mu Zero with great success, not sure why nobody is focusing on this now.
- ashwinsundar7 months ago
  We should be focusing on llms using self discovery to figure out information.
  Can you expand? What does that mean, and why is the right (or better) path
  - ActorNightly7 months ago
    Manually coding things is not how we get better AI. For AI to be truly useful in the area of figuring things out (i.e actually reasoning), one of the core components of a model would be building its own knowledge trees across multi modal information. So when you ask a model to do something, it should figure out how to do it on its own.
    volkandkaya7 months ago
    Doesn't sound like it conflicts with MCP-B, in theory they go well together.
    - AI checks if a MCP tool exists - If it does not exist, AI can handle it, then send feature request to add new MCP tool
    By doing the above it will be more likely to do the correct action and save a lot of tokens
    blackqueeriroh7 months ago
    I don’t think OP is trying to create better AI. That’s someone else’s job. OP is trying to give current LLMs better ways to interact with websites.
    Two different goals.
mupuff12347 months ago
I still don't understand MCP. If according to all the AI companies soon AI will replace devs than why bother with MCP?
- qayxc7 months ago
  Lock-in. LLMs are today's hammer: everything looks like a nail now. LLMs are super useful for certain tasks (generating boilerplate code, generating tests, providing examples for API usage, summarising etc.), but the demo to me just illustrates a solution in desperate search for a problem. "Create A TODO" using a chatbot? That's an example gone wrong in so many ways and goes to show what happens if you start with a solution and work your way backwards to a use case without actually thinking about it yourself...
  - dominicrose7 months ago
    A todo list is already a productivity tool and not an essential one. When I hear about productivity I can't help but think "but be productive doing what?"
    What do we have to do that's so important we need AI, and not a chat AI but AI on steroids (supposedly)?
- tracerbulletx7 months ago
  It's pretty much standardizing on a couple endpoints for providing a list of resources/actions/prompt templates and calling to fetch those resources/actions/templates and feed them to the model context. It's really kind of trivial, but it's nice there's a standard I guess so you can write a service that anyone can use in their favorite client.
  - mupuff12347 months ago
    I think that also known by another word - "documentation".
- volkandkaya7 months ago
  Either AI will replace everyone and it doesn't matter what we did up until that point or it won't and building these systems will be useful.
  What is your recommendation for companies? To take it to the extreme are you saying fire everyone and wait for AI?
- surrealistic7 months ago
  Because we're in the denial phase, doing expert systems all over again but this time on top of something that looks like NLP but isn't quite there.
damnever7 months ago
As far as I can tell, "API" and the webpages often have different authentication methods.
ge967 months ago
Ultimate test for me, make me a payment system where I put in $1 and it gives me $2 back
calrain7 months ago
Do you really want to change 'everything'?
hereforcomments7 months ago
RIP QA engineers
- volkandkaya7 months ago
  Sounds like more work needed for QA engineers, they will have to test that MCP-B works well with a bunch of other sites.
cryptozeus7 months ago
can someone explain like I am five?
- lovelearning7 months ago
  A website owner can publish their website's capabilities or data as "tools". AI agents and LLMs like ChatGPT, in response to user prompts, can consult these tools to figure out their next actions.
  Example:
  1. An author has a website for their self-published book. It currently checks book availability with their database when add to cart is clicked.
  2. The website publishes "check book availability" and "add to cart" as "tools", using this MCP-B protocol.
  3. A user instructs ChatGPT or some AI agent to "Buy 3 copies of author's book from https://theirbooksite"
  4. The AI agent visits the site. Finds that it's MCP-B compliant. Using MCP-B, it gets the list of available tools. It finds a tool called "check book availability", and uses it to figure out if ordering 3 copies is possible. If yes, it'll next call "add to cart" tool on the website.
  The website here is actively cooperating with the agent/LLM and supplying structured data. Instead of being a passive collection of UI elements that AI chatbots have to figure out based on UI layouts or UI captions, which are generally very brittle approaches.
- volkandkaya7 months ago
  Example:
  You have google docs and CMS open in 2 tabs
  1. Ask to take your google doc and add it to the CMS
  2. MCP tool takes the data from Google docs
  3. MCP tool to convert text to CMS item
  4. MCP tool to insert that CMS item
  With the above you can view unique UIs for each stage as well, such as generating a table with CMS fields before accepting.
- airtonix7 months ago
  [dead]
7 months ago
undefined
7 months ago
undefined
7 months ago
undefined
bpiroman7 months ago
Vite ...
roundrobins7 months ago
It's not every day that I catch tomorrow's huge hit today on a random HN post.
Better get ready to quit your day job and get funded buddy, as my 30 years worth of tech instincts tell me this will take off vertically!
- 7 months ago
  undefined
7 months ago
undefined
fhdjm7 months ago
[dead]
bhakthan7 months ago
[dead]
bhakthan7 months ago
[dead]
assanineass7 months ago
[dead]