But the comparison with HTTP is not a very good one, because MCP is stateful and complex. MCP is actually much more similar to FTP than it is to HTTP.
I wrote 2 short blog posts about this in case anyone is curious: https://www.ondr.sh/blog/thoughts-on-mcp
https://spec.modelcontextprotocol.io/specification/2024-11-0...
https://modelcontextprotocol.io/sdk/java/mcp-server
Also, btw, how long until people rediscover HATEOAS, something which inherently relies on a generalised artificial intelligence to be useful in the first place?
As you said, HATEOAS requires a generic client that can understand anything at runtime — a client with general intelligence. Until recently, humans were the only ones fulfilling that requirement. And because we suck at reading JSON, HATEOAS had to use HTML. Now that we have strong AI, we can drop the Hypermedia from 'H'ATEOAS and use JSON instead.
I wrote about that exact thing in Part 2: https://www.ondr.sh/blog/ai-web
I’m bullish on MCP - what are some non-obvious things I should consider that might dampen my fire?
The key reason the web won out over Gopher and similar protocols was that the early web was stupidly simple. It had virtually no structure. In fact, the web might have been the greatest MVP of all time: it handed server developers a blank canvas with as few rules as possible, leading to huge variance in outputs. Early websites differed far more from each other than, for example, Gopher sites, which had strict rules on how they had to work and look.
Yet in a server-client "ping-pong" system, higher variance almost always wins. Why? Because clients consume more of what they like and less of what they don't. This creates an evolutionary selection process: bad ideas die off, and good ideas propagate. Developers naturally seem to develop what people want, but they are not doing so by deliberate choice — the evolutionary process makes it appear so.
The key insight is that the effectiveness of this process stems from a lack of structure. A lack of structure leads to high variance, which lets the protocol escape local minima and evolve according to user needs.
The bear case for MCP is that it's going the exact opposite route. It comes with tons of features, each adding layers of abstraction and structure. While that might work in narrowly understood fields, it's much harder to pull off in novel domains where user preferences aren't clear — knowing what users want is hard. MCP's rigid structure inherently limits variance in server styles (a trend already observable IMHO), making MCP vulnerable to competition from newer, less structured protocols — similar to how the web steamrolled Gopher, even though the latter initially seemed too far ahead to catch. The fact that almost all MCP servers are self-contained (they don't link to other MCP servers) further means the current lead matters less, since the lock-in effect is weaker.
In any case, protocols need killer applications to take off — for the web this killer app was Mosaic. Right now I don't see any application supporting SLOP. If they are able to come up with one that outperforms other MCP-based LLM applications, they will have a chance.
My personal belief is that the winning protocol will be web-like. Right now there is no such protocol. Maybe I'm wrong, let's see.
MCP standardizes how LLMs can call tools at runtime, and how tools can call LLMs at runtime. It's great!
In essence it seems like an additional shim that removes all the security of API tokens while still leaving the user to deal with them.
Side note, has Tron taught us nothing about avoiding AI MCPs?
In your post you say "The key insight is: Because this can happen at runtime, the user (NOT the developer) can add arbitrary functionality to the application (while the application is running — hence, runtime). And because this also works remotely, it could finally enable standardized b2ai software!"
That makes sense, but my question is: how would the user actually do that? As far as I understand, they would have to somehow pass in either a script to spin up their own server locally (unlikely for your everyday user), or a url to access some live MCP server. This means that the host they are using needs an input on the frontend specifically for this, where the user can input a url for the service they want their LLM to be able to talk to. This then gets passed to the client, the client calls the server, the server returns the list of available tools, and the client passes those tools to the LLM to be used.
This is very cool and all, but it just seems like anyone who has minimal tech skills would not have the patience to go and find the MCP server url of their favourite app and then paste it into their chatbot or whatever they're using.
Let me know if I have misunderstood anything, and thanks in advance!
> As far as I understand, they would have to somehow pass in either a script to spin up their own server locally (unlikely for your everyday user), or a url to access some live MCP server. This means that the host they are using needs an input on the frontend specifically for this, where the user can input a url for the service they want their LLM to be able to talk to. This then gets passed to the client, the client calls the server, the server returns the list of available tools, and the client passes those tools to the LLM to be used.
This is precisely how it would work. Currently, I'm not sure how many host applications (if any) actually feature a URL input field to add remote servers, since most servers are local-only for now. This situation might change once authentication is introduced in the next protocol version. However, as you pointed out, even if such a URL field existed, the discovery problem remains.
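For the plumbing itself, here's a rough sketch of what the host does with that URL, using the TypeScript SDK (class and method names from memory, so treat them as approximate):

    import { Client } from "@modelcontextprotocol/sdk/client/index.js";
    import { SSEClientTransport } from "@modelcontextprotocol/sdk/client/sse.js";

    // The URL the user pasted into the host application's input field
    const transport = new SSEClientTransport(new URL("https://example.com/mcp"));

    const client = new Client(
      { name: "my-host-app", version: "1.0.0" },  // hypothetical host
      { capabilities: {} }
    );
    await client.connect(transport);

    // Ask the server what tools it offers...
    const { tools } = await client.listTools();

    // ...hand those definitions to the LLM, and when the model picks one,
    // route the call back through the client:
    const result = await client.callTool({
      name: tools[0].name,
      arguments: {},  // must match the tool's inputSchema
    });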
But discovery should be an easy fix, in my opinion. Crawlers or registries (think Google for web or Archie for FTP) will likely emerge, so host applications could integrate these external registries and provide simple one-click installs. Apparently, Anthropic is already working on a registry API to simplify exactly this process. Ideally, host applications would automatically detect when helpful tools are available for a given task and prompt users to enable them.
The problem with local-only servers is that they're hard to distribute (just as local HTTP servers are) and that sandboxing is an issue. One workaround is using WASM for server development, which is what mcp.run is doing (https://docs.mcp.run/mcp-clients/intro), but of course this breaks the seamless compatibility.
Thanks for the awesome feedback, and congrats on the blog posts by the way, they are a great read!
While you usually get tools that work out of the box with MCP (and thus avoid the hassle of prompting + testing to get working tool code), integrating external APIs manually often results in higher accuracy and performance, as you're not limited by the abstractions imposed by MCP.
MCP is basically a trifecta of:
1) MCP-aware LLM applications
2) MCP clients
3) MCP servers
The LLM application is key here. It is doing all the "plumbing", like spawning MCP clients to connect to MCP servers — similar to how your web browser spawns HTTP clients to connect to HTTP servers. The LLM application thus initiates and receives the actual requests between MCP client and MCP server, manages MCP client/server pairs, injects tool results into the LLM context, et cetera. This means the LLM application must be MCP-aware at design-time. But because all of this plumbing can then happen at runtime under the hood, the user (who adds MCP tools while the application is running) does not need to be a developer.

As a developer, MCP allows you to write:
1) MCP-aware LLM applications
2) MCP servers
MCP-aware LLM applications (like Claude Desktop or Cursor) let their users add arbitrary functionality (i.e. other MCP servers) at runtime. MCP servers can be added by users of MCP-aware LLM applications at runtime.
Both revolve around the concept of giving non-developers a way to add functionality at runtime. Most developers are confused about MCP because they themselves need to do neither 1) nor 2); instead they add tools to the applications they write (at design-time) and then ship them.
If you are building your own applications, you can simply use "Tools APIs" provided by the LLM directly (e.g. https://platform.openai.com/docs/assistants/tools).
MCP is not something most people need to bother with, unless you are building an application that needs extension or you are trying to extend an application (like those I listed above). Under the hood, MCP is just an interface into the tools API.
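For example, the design-time version is just a tool definition passed straight to the model API. A hedged sketch in OpenAI's chat-completions style (the tool name and fields are made up for illustration):

    import OpenAI from "openai";

    const openai = new OpenAI();

    // At design-time the developer bakes the tool definition into the app.
    // No MCP involved: the schema goes directly into the API call.
    const response = await openai.chat.completions.create({
      model: "gpt-4o",
      messages: [{ role: "user", content: "What's the weather in Berlin?" }],
      tools: [
        {
          type: "function",
          function: {
            name: "get_weather",  // hypothetical tool
            description: "Get the current weather for a city",
            parameters: {
              type: "object",
              properties: { city: { type: "string" } },
              required: ["city"],
            },
          },
        },
      ],
    });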
MCP is not all it's cracked up to be.
When computer use was demoed it seemed like a big deal. However, with MCP, anyone can create an MCP server, run it on their computer, and hook it up to an MCP-compatible client, regardless of the model.
Their config manifest is like package.json's dependencies.
Their init is like import resolution.
JSON-RPC methods are like exported functions in a package.
JSON Schema declarations are like type declaration (i.e. .d.ts) files.
In your config manifest you specify "imports" that the LLM can use, and it handles populating the tools - it's like npm for LLM sessions.
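For anyone who hasn't seen one, the manifest looks roughly like this in Claude Desktop's config (the package name and path are just examples):

    {
      "mcpServers": {
        "filesystem": {
          "command": "npx",
          "args": ["-y", "@modelcontextprotocol/server-filesystem", "/Users/me/projects"]
        }
      }
    }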
2) Is this meaningfully different from just having every API provide a JavaScript SDK to access it, and then having the model write code? That's how humans solve this stuff.
3) If the AI is actually as smart at doing tasks like writing clients for APIs as people like to claim, why does it need this to be made machine readable in the first place?
2 + 3) Having a few commands that the AI knows it should call, and can call confidently without security concerns, is better than giving the AI permission to do everything under the sun and telling it to code a program to do so.
The prompt for the latter is also much more complex and does not work as predictably.
If it was truly intelligent it could reason about things like API specifications without any precursors or shared structure, but it can’t.
Are LLMs powerful? Yes. Is current “AI” simply a re-brand of machine learning? IMO, also yes
I can reason about any API or specification. But when I'm trying to get a different, compound, and higher-level task done, it's quite a bit faster and less distracting if I can rely on someone else to have already distilled what I need (into a library, cheat-sheet, tutorial, etc).
Similarly, I've seen LLMs do things like generate clients and scripts for interacting with APIs. But it's a lot easier to just hand them one ready to go.
I say this out loud so someone can correct me if I’m mistaken!
That’s sort of the point of MCP, as near as I can tell.
But LLMs will replace them?
That's a direct answer for (2) too - instead of writing a JS SDK or Swift SDK or whatever, it's an AI SDK and shared across Claude, OpenAI, Groq, and so on.
(3) is exactly related to this. The AI has been trained to run MCPs, viewing them as big labeled buttons in their "mind".
I think you got the questions spot on and the answers right there as well.
Regardless, again: if the AI is so smart, and it somehow needs something akin to MCP as input (which seems silly), then we can use the AI to take, as input, the human readable documentation -- which is what we claim these AIs can read and understand -- and just have it output something akin to MCP. The entire point of having an AI agent is that it is able to do things similar to a software developer, and interfacing with a random API is probably the most trivial task you can possibly do.
> Regardless, again: if the AI is so smart, and it somehow needs something akin to MCP as input (which seems silly), then we can use the AI to take, as input, the human readable documentation -- which is what we claim these AIs can read and understand -- and just have it output something akin to MCP.
This example is like telling someone who just wants to check their email to build an IMAP client. It's an unnecessary and expensive distraction from whatever goal they are actually trying to accomplish.
As others have said, models are now being trained on MCP interactions. It's analogous to having shared UI/UX patterns across different webapps. The result is we humans don't have to think as hard to understand how to use a new tool because of the familiar visual and interaction patterns. As the design book title says, 'don't make me think.'
API is a user interface for other developers – just like MCP is a UI for LLMs.
That seems to be what happens here with MCP: it is a way for an Application (the LLM) to derive programming by Interfacing with another Application (the 3rd party API provider, for example).
That would make MCP an API for accessing other APIs. Not that that's bad, computers are layers of abstraction all the way down. At the same time though, we already have some of those. Perhaps some sort of OpenAPI bridge would be useful in the same manner and not require rewriting API specs, but that probably exists, too.
Who am I kidding, though? The AI assistants/agents are going to be writing whatever manifests are necessary to run more AI, so it'll be a negligible increase in effort to do both.
My point is, the applications have been (until recently) predominantly written by humans. API is the interface developers use through the code they write. Just like a UI can be better or worse, so can an API: it might be concise, expressive, consistent – or verbose, clunky and completely unpredictable. Just like in a UI you don’t want to click through dozens of submenus, in an API you don’t want to make a dozen calls to do something simple. It’s way more similar than you think!
Now where MCP fits in here is a whole other question...
What you're describing are qualities of an interface, as in a User Interface or an Application Interface. You are right that UIs and APIs are similar, because they are both Interfaces. You are right that a good Interface has certain qualities, whether it's a UI or an API. For example, a GraphQL API tries to address the challenge of, "in API you don’t want to make a dozen of calls to do something simple" by consolidating multiple calls into 1.
That said, API is the interface programs use to interact with an application, UI is the interface humans use to interact with a program. Sometimes you get both: A developer interacts with an IDE or text editor (a user interface), and the IDE or text editor interacts with the underlying layer (an application interface).
What you don't see are humans typing bytes to an MCP server, or any other API. Humans are clicking or typing commands into a program via a UI, the program connects to the MCP server via an API, the MCP server connects to, say, a weather server via an API.
{"action": "create_directory", "value": "foobar/assets/"} is 15 tokens whereas create_directory("foobar/assets/") is 7 tokens. It's not the exact format, but you get the idea.
It's not just about cost: more tokens also result in lower performance. It's as hard for the LLM to read this as it is for you to read it.
I did some experiments with protocols last year. YAML was the most efficient one by a large margin, and yet it often made mistakes. Half the output layer code is dedicated to fixing common mistakes in formatting, like when it forgets to put in a dash or merges one parameter with another. We had 2/3 of the input prompt dedicated to explaining the spec and giving examples. It's definitely not trivial.
MCP is pre-trained into the models, no need for all this.
The work we had it on did not need a good model, but most open-source/self-trained ones didn't do the trick; we ended up using a model that was 3x more expensive. Also, don't look at it as LLMs being smart enough to do it; we also want something for the dumb & cheap micro LLMs, and micro LLMs will likely be doing agentic work.
It's also as likely to make mistakes as a human - LLMs didn't output JSON until mid 2024. Gemini was one of the first to officially feature JSON output and it was still randomly breaking by Sept 2024 with JSON arrays, even when giving the API a properly detailed spec to respond in.
They can improve it, but they have to train it on something, and they might as well make up something more efficient. OpenAI might do one too. Even with images we see newer formats like HEIC and WebP when PNG works fine. I expect MCP will live because it's particularly suited to this use case.
A protocol is not a software development kit.
https://github.com/modelcontextprotocol/specification/blob/m...
So if you are here for MCP, I will use the opportunity to share what I've been working on the last few months.
I've hand-curated hundreds of MCP servers, which people can access and browse via https://glama.ai/mcp/servers, and made those servers available via an API: https://glama.ai/mcp/reference
The API allows you to search for MCP servers, identify their capabilities via API attributes, and even access user-hosted MCP servers.
You can also try these servers using an inspector (available under every server) and in the chat (https://glama.ai/chat).
This is all part of a bigger ambition to create an all-encompassing platform for authoring, discovering and hosting MCP servers.
I am also the author of https://github.com/punkpeye/fastmcp framework and several other supporting open-source tools, like https://github.com/punkpeye/mcp-proxy
If you are also interested in MCP and want to chat about the future of this technology, drop me a message.
MCP reminds me of a new platform opportunity akin to the Apple App Store.
It's rapidly adopted, with offerings from GitHub, Stripe, Slack, Google Maps, AirTable, etc. Many more non-official integrations are already out there. I expect this will only gain adoption over the coming year.
But with MCP there's not a whole lot of information out there for LLMs to digest and so perhaps for that reason the article is not particularly insightful.
Thank you HN for bringing the insights!
Appreciate the feedback - brb I'll update the post to include this!
I honestly think most of the article was written by an LLM.
> Two-way communication: MCP supports persistent, real-time two-way communication - similar to WebSockets. The AI model can both retrieve information and trigger actions dynamically.
This is not what two-way communication means.
MCP is probably easier for clients to implement but suffers from poor standardization, immaturity and non-human readability. It clearly scratches an itch but I think it’s a local-minimum that requires a tremendous amount of work to implement.
I’ve used MCP quite a bit but perhaps I’m misunderstanding something? Happy to hear why you think it’s “wacky”.
So all that's needed are API docs. Or what am I missing?
Let's say you want to add or delete Jira tickets. An MCP server is like a big labeled button for the AI to do this, and it doesn't come with the token cost of reading an API or the possibility of making a mistake while accessing it.
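Concretely, the "button" is just a tool description the server returns from tools/list. Something like this, with a hypothetical Jira tool (shape per the spec, fields made up):

    {
      "tools": [
        {
          "name": "create_jira_ticket",
          "description": "Create a new ticket in a Jira project",
          "inputSchema": {
            "type": "object",
            "properties": {
              "project": { "type": "string" },
              "summary": { "type": "string" }
            },
            "required": ["project", "summary"]
          }
        }
      ]
    }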
I mean, a lot? I have multiple times felt like that was my entire life for weeks or months on end during the past over three decades of doing software development...
(If we expand the scope a bit to network protocols, as opposed to just "APIs", I was even the person who first spiked nmap's protocol scanning and detection logic.)
To wit, I am one of those people who pretty much never use an SDK provided for an API: if I have to, I will rather reverse engineer the protocol using a disassembler.
(This then being part of why I've won millions of dollars in bug bounties, as part of my relentless drive to always operate at the lowest level presented to me.)
But, regardless, can we move past trying to attack my credibility on software, and shift back to more productive forms of analysis? (Why did this become so personal?!)
> What happens when the docs are wrong or incomplete?
If we posit that the documentation for the API is wrong, we should expect the same of this MCP description/wrapper, as both were written by the humans charged with enabling this function.
And, of course, the real point is whether the task is easier than the thing we are trying to do... even writing a correct tree map is much harder than an API client.
^ Both of these arguments can be made by someone who doesn't even do software development, helping us try to understand why MCP is being hyped up as a new paradigm.
I’m not hyping or defending MCP at all: I’m just saying AI can’t figure out APIs well enough to be something you can promise as a product.
I founded an integration platform so definitely a developer and I’ve been living these problems every day.
Also:
> ...if an LLM is failing to do this task...
It CURRENTLY fails to do so PREDICTABLY and securely. What are you gonna do about that? Keep throwing more data into it and hope to start building stuff on top, one day?
Yes and it's working? People ARE CURRENTLY building things.
They are NOT currently whining that the LLM is not smart enough and that they must sit and wait for the next model to be able to code any problem (again, RELIABLY) on demand.
> People are trying to get these things to do complex multi-step reasoning tasks, including making changes to their codebase (?!), automating behaviors as "agents" that need to predictably and secure function...
You understand that all of these are powered by tool calling underneath? The planning and orchestration of tasks follows a structure, to be fed into tools. This is why a plain model cannot do shit; people have to build products with the right tools on top to make an LLM behave the way they want, and not just chit-chat endlessly.
The arbitrary-execution approach, if it ever works, works by building tools and MCP servers for code execution. Because obviously it's not the LLM server that executes code.
You clearly have never thought about how to actually build any of these things.
> ...writing API boilerplate
Tool/function calling can be anything; it's you who decided that you can only use it for API boilerplate. Does the word "function" always mean boilerplate in programming?
The value of MCP then depends on its adoption. If I need to write an MCP adapter for everything, its value is little. If everyone (API owners, OSes, clouds, ...) puts in the work to have an MCP-compatible interface, it's valuable.
In a world where I need to build my own X-to-USB dongle for every device myself, I wouldn't use USB, to stay with the article's analogy.
Normally, when an LSP server is running remotely, you would use a persistent (web)socket instead of individual API requests. This helps with the parsing overhead and provides faster responses for small requests. Requests also carry cancellation tokens, which make it possible to cancel a request when it becomes unnecessary.
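For reference, LSP cancellation is just a notification that names the in-flight request's id:

    { "jsonrpc": "2.0", "method": "$/cancelRequest", "params": { "id": 42 } }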
While similar to MCP, ANP is significantly different. ANP is specifically designed for agents, addressing communication issues encountered by intelligent agents. It enables identity authentication and collaboration between any two agents.
Key differences include:
- ANP uses a P2P architecture, whereas MCP follows a client-server model.
- ANP relies on W3C DID for decentralized identity authentication, while MCP utilizes OAuth.
- ANP organizes information using Semantic Web and Linked Data principles, whereas MCP employs JSON-RPC.

MCP might excel at providing additional information and tools to models and connecting models to the existing web. In contrast, ANP is particularly effective for collaboration and communication between agents.
Here is a detailed comparison of ANP and MCP (including the GitHub repository): https://github.com/agent-network-protocol/AgentNetworkProtoc...
- Slack or comment to Linear/Jira with a summary of what I pushed
- pull this issue from Sentry and fix it
- pull this Linear issue and do a first pass
- pull in this Notion doc with a PRD, then create an API reference for it based on this codebase, then create a new Notion page with the reference
MCP tools are what the LLM uses and initiates.
MCP prompts are user-initiated workflows.
MCP resources are the data that the APIs provide and the structure of that data (because porting APIs to MCPs is not as straightforward). A rough sketch of all three is below.
Anyway, please give me feedback!
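To make the three concepts concrete, here's the sketch, using the TypeScript SDK (API as I remember it from the SDK docs, so treat names as approximate; the git workflow itself is hypothetical):

    import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
    import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
    import { z } from "zod";

    const server = new McpServer({ name: "git-workflows", version: "1.0.0" });

    // Tool: LLM-initiated; the model decides when to call it.
    server.tool("summarize_push", { commitRange: z.string() }, async ({ commitRange }) => ({
      content: [{ type: "text", text: `Summary of ${commitRange}...` }],
    }));

    // Prompt: user-initiated workflow, surfaced in the host's UI.
    server.prompt("post-commit-update", { issueId: z.string() }, ({ issueId }) => ({
      messages: [{
        role: "user",
        content: { type: "text", text: `Post a summary comment to issue ${issueId}` },
      }],
    }));

    // Resource: data (plus its structure) that the host can pull into context.
    server.resource("recent-commits", "repo://commits/recent", async (uri) => ({
      contents: [{ uri: uri.href, text: "abc123 Fix login bug\n..." }],
    }));

    await server.connect(new StdioServerTransport());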
We just make it highly reliable and easy to use: after committing, add a comment with a summary to that Jira/Linear issue, start a PR in GitHub and assign x, update the Slack channel with an update.
Getting this right wasn't about porting APIs to MCP. It was about thoughtfully designing and optimizing for these workflows. Also quality and polish: making the calls highly reliable required lower-level networking optimizations, sessions, etc., to make it work smoothly.
But yes, part of the frictionless experience was also just OAuth.
I've played a lot with the FileSystem MCP server but couldn't get it to do anything useful that I can't already do faster on my own. For instance, asking it how many files contain the word "main": it returns 267, but in reality there are 12k.
Looks promising, but I am still looking for useful ways to integrate it into my workflow.
In Cursor, for example, it gives the agent the ability to connect to the browser to gather console logs and network logs and take screenshots. The agent will often invoke the tools automatically when it is debugging or verifying its work, without being explicitly prompted to do so.
It's a bit of a setup process, as it requires a browser extension on top of the MCP configuration.
So, now when Roo Code does tasks for me, it takes notes and searches memory.
It’s good as a means to get a quick POC running, for dev oriented use cases.
I have seen very few implementations that use anything but the tools capabilities though.
The complete lack of auth consideration and the weird orchestration (really, the “client” manages its own “server” processes) make me doubt it's going to get serious adoption in prod. It's not something I'd have a lot of confidence in supporting for non-dev users.
I wrote mcp-hfspace to let you connect to Hugging Face Spaces; that opens up a lot of image generation, vision, audio transcription and other services that can be integrated quickly and easily into your Host app.
Regular SDK lib:
- Integration Effort: just like MCP
- Real-Time Communication: sure
- Dynamic Discovery: obviously, just call refresh or whatever
- Scalability: infinite, it is a library
- Security & Control: just like MCP
I truly don't get it.
If you would like to switch clients, you have to build it yourself. MCP solves this very well, since any MCP-supporting client can use the same tools/resources that you have built.
Following those two principles means your implementation ends up as a simple class, with simple methods and simple params – possibly using decorators to expose it as RPC and perform runtime type assertion for the params (exposing RPC, server side) and the result (consuming RPC, client side). Consuming JSON-RPC then looks like using any ordinary library/package that happens to have async methods. This is important: there is no special dialect of communication, it's all the ordinary semantics everybody is already used to. Your code on the client and server side doesn't jump between mapping to/from the language and JSON-RPC; a lot of complexity collapses, the code looks minimal, it's small, natural to read, etc.
Notifications also map naturally to well established pattern (ie. event emitter in nodejs).
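To illustrate, a minimal sketch of that pattern (the rpc decorator and everything around it is hypothetical, not from any MCP SDK; assumes TypeScript's experimentalDecorators):

    import { z } from "zod";

    // Hypothetical decorator: registers the method as a JSON-RPC endpoint
    // and validates incoming params against the schema at runtime.
    function rpc(schema: z.ZodType) {
      return (target: object, key: string, descriptor: PropertyDescriptor) => {
        const original = descriptor.value;
        descriptor.value = async function (params: unknown) {
          return original.call(this, schema.parse(params)); // runtime type assertion
        };
      };
    }

    const ListFilesParams = z.object({ path: z.string() });

    class FileServer {
      // On the wire this is just the JSON-RPC method "listFiles":
      // no "/" in the name, ordinary language semantics on both sides.
      @rpc(ListFilesParams)
      async listFiles({ path }: z.infer<typeof ListFilesParams>): Promise<string[]> {
        return ["README.md", "src/index.ts"]; // stub
      }
    }

    // Client side it reads like any ordinary async library call:
    //   const files = await fileServer.listFiles({ path: "." });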
And yes, that's my main criticism of MCP – you're making a standard for communication meant to be used from different languages, so why add this silly, unnecessary complexity by using "/" in method names? It frankly feels like an amateur mistake by somebody who thinks it should be a bit like REST, where the method is a URL path.
Another tangent – this declaration of available endpoints is unnecessarily complicated. You could just use a url: a file://.. scheme to start a process on that executable with stdin/stdout as communication channels (this idea is great btw, good job!), ws:// or wss:// for websocket comms to an existing service, and http:// or https:// for JSON-RPC over HTTP (no notifications).
Ok but why would every app and website implement this new protocol for the benefit of LLMs/agents?
Did they just now discover abstract base classes?
The only thing that idea ever led to was more (complicated) APIs.