For a long time now, SWEs seem to have been bamboozled into thinking the only way you can connect different applications together is "integrations" (tightly coupling your app to the bespoke API of another app). I'm very happy somebody finally remembered what protocols are for: reusable communication abstractions that are application-agnostic.
The point of MCP is to be a common communications language, in the same way HTTP, FTP, SMTP, and IMAP are. This is absolutely necessary since you can (and will) use AI for a million different things, but AI has specific kinds of things it might want to communicate, with specific considerations. If you haven't yet, read the spec: https://modelcontextprotocol.io/specification/2025-11-25
The reason we have MCP is because early agent designs couldn't run arbitrary CLIs. Once you can run commands, MCP becomes silly.
There is a clear problem that you'd like an "automatic" solution for, but it's not "we don't have a standard protocol that captures every possible API shape", it's "we need a good way to simulate what a CLI does for agents that can't run bash".
Have you tried to use a random API before? It’s a process of trial and error.
With the MCP tools I use, it works the first time and every time. There is no “figuring out.”
The point is: is it necessary to create a new protocol?
Why did we have to invent an entire new transport protocol for this, when the only stated purpose is documentation?
Even the auth is just OAuth.
It’s JSON-RPC plus OAuth.
(Plus a couple bits around managing a local server lifecycle.)
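To see how thin it is: this is roughly what MCP traffic looks like on the wire. The method names ("tools/list", "tools/call") come from the spec; the tool name and arguments below are made-up placeholders:

```typescript
// Plain JSON-RPC 2.0 messages; nothing exotic on the wire.
const listToolsRequest = {
  jsonrpc: "2.0",
  id: 1,
  method: "tools/list", // server replies with tool names, descriptions, schemas
};

const callToolRequest = {
  jsonrpc: "2.0",
  id: 2,
  method: "tools/call",
  params: {
    name: "get_weather",           // hypothetical tool
    arguments: { city: "Berlin" }, // must match the tool's declared inputSchema
  },
};
```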
Not accurate, but it at least makes one think of the underlying semantics. Because, really, what matters is some DSL to discover and describe action invocations.
The agents are writing the MCPs, so they can figure out those HTTP and FTP calls. MCP makes it so they don't have to every time they want to do something.
I wouldn't hire a new person to read a manual and then make a bespoke JSON payload to call an HTTP server, every single time I want to make a call, and that's not a knock on the person's intelligence. It's just a waste of time doing the same work over and over again. I want the results of calling the API, not to spend all my time figuring out how to call the API.
Obviously if the self-modifying, Clawd-native development thing catches on, any old API will work. (Preferably documented but that’s not a hard requirement.)
For now though, Anthropic doesn’t host a clawd for you, so there isn’t yet a good way for it to persist custom integrations.
Each AI needs context management per conversation. This is something that would be very clunky to replicate on top of HTTP or FTP (as in requiring side-channel information for session and conversation management).
Everyone looks at APIs, and sure, MCP seems redundant there. But look at an agent driving a browser: the get-DOM method depends on all the actions performed since the window opened, and it needs to be per agent, per conversation.
Can you do that as REST? Sure, sneak a session and conversation into a parameter or cookie. But then the protocol is not really just HTTP, is it? It's all this clunky coupling that comes with a side of unknowns, like: when is a conversation finished? Did the client terminate, or are we just between messages? As you go and solve these for the hundredth time, you'd start itching for standardization.
It is not a guarantee (as we see with structured output schemas), but it significantly increases compliance.
So why MCP? Are there other protocols that would provide more correctness when trained on? Have we tried? Maybe a protocol that offers more compression of commands would take up less context overall, thus offering better correctness.
MCP seems arbitrary as a protocol, because it kinda is. It doesn't >>cause<< the increase in correctness in and of itself; the fact that it >>is<< a protocol is the reason it may increase correctness. Thus, any other protocol would do the same thing.
With all due respect if you are prompting correctly and following approaches such as TDD / extensive testing then correctness is not out the window. That is a misunderstanding likely caused by older versions of these models.
Correctness can be as complete as any other new code, I've used the AI to port algorithms from Python to Rust which I've then tested against math oracles and published examples. Not only can I check my code mathematically but in several instances I've found and fixed subtle bugs upstream. Even in well reviewed code that has been around for many years and is well used. It is simply a tool.
> So why MCP? ... MCP seems arbitrary as a protocol
You're right, it is an arbitrary protocol, but it's one that is supported by the industry. See the screencaps at the end of the post that show why this protocol. Maybe one day, we will get a better protocol. But that day is not today; today we have MCP.
This is one key difference between experienced and inexperienced devs; if something looks like crud, it probably is crud. Don’t follow or do something because it’s popular at the time.
I got it to build an MCP server into the app that supported sending commands to allow Claude to interact with it as if it was a user, including keypresses and grabbing screenshots, and the difference was immediate and really beneficial.
Visual issues were previously one of the things it would tend to struggle with.
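For anyone wondering what that involves, here is a minimal sketch using the TypeScript MCP SDK; the `app` handle and both tool names are placeholders for whatever your application exposes internally:

```typescript
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

// Stand-in for whatever internal handle your application provides.
declare const app: {
  sendKey(key: string): void;
  screenshotPng(): Buffer;
};

const server = new McpServer({ name: "app-driver", version: "0.1.0" });

// Let the agent press keys exactly as a user would.
server.tool(
  "press_key",
  "Send a single keypress to the app",
  { key: z.string() },
  async ({ key }) => {
    app.sendKey(key);
    return { content: [{ type: "text", text: `pressed ${key}` }] };
  },
);

// Let the agent see the result of its actions.
server.tool("screenshot", "Capture the current app window as a PNG", async () => ({
  content: [
    {
      type: "image",
      data: app.screenshotPng().toString("base64"),
      mimeType: "image/png",
    },
  ],
}));

await server.connect(new StdioServerTransport());
```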
> Claude implement plan.md until all unit and browser tests pass
In my case I started with something somewhat like Playwright, and Claude had a habit of interacting with the app more directly than a user would be able to, and so not spotting problems because of it. Forcing it to interact by pressing keys rather than delving into the DOM or executing random JavaScript helped. In particular I wanted to be able to chat with it as it tried things interactively. This is more to help with manual or exploratory testing rather than classic automated testing.
My current app is a desktop app, so playwright isn't as applicable.
I code in 8 languages, regularly, for several open source and industry projects.
I use AI a lot nowadays, but have never ever interacted with an MCP server.
I have no idea what I'm missing. I am very interested in learning more about what you use it for.
I made a Prolog program that knows the valid words and spelling along with sentence composition rules.
Via the MCP server, a translated text can be verified. If it's not faultless, the agent enters a feedback loop until it is.
The nice thing is that it's implemented once and I can use it in opencode and claude without having to explain how to run the prolog program, etc.
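Mechanically, the server is just a thin shell around the Prolog program. A sketch with the TypeScript MCP SDK, assuming SWI-Prolog and a grammar.pl exposing a check_file/1 goal (all names here are illustrative):

```typescript
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { execFile } from "node:child_process";
import { promisify } from "node:util";
import { writeFile } from "node:fs/promises";
import { tmpdir } from "node:os";
import { join } from "node:path";
import { z } from "zod";

const run = promisify(execFile);
const server = new McpServer({ name: "grammar-check", version: "0.1.0" });

// The agent calls this in a loop until the checker reports no faults.
server.tool(
  "verify_text",
  "Check spelling and sentence composition of a translated text",
  { text: z.string() },
  async ({ text }) => {
    const path = join(tmpdir(), `check-${Date.now()}.txt`);
    await writeFile(path, text, "utf8");
    const { stdout } = await run("swipl", [
      "-q", "-s", "grammar.pl",
      "-g", `check_file('${path}')`, // the Prolog side prints "ok" or the faults
      "-t", "halt",
    ]);
    return { content: [{ type: "text", text: stdout.trim() || "ok" }] };
  },
);

await server.connect(new StdioServerTransport());
```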
> I have no idea what I'm missing.
The questions I'd ask:
- Do you work in a team context of 10+ engineers?
- Do you all use different agent harnesses?
- Do you need to support the same behavior in ephemeral runtimes (GH Agents in Actions)?
- Do you need to share common "canonical" docs across multiple repos?
- Is it your objective to ensure a higher baseline of quality and output across the eng org?
- Would your workload benefit from telemetry and visibility into tool activation?
If none of those apply, then it's not for you. Server-hosted MCP over streamable HTTP benefits orgs and teams and has virtually no benefit for individuals.

I have been working on a system using a Fjall datastore in Rust. I haven't found any tools that directly integrate with Fjall, so even getting insight into what data is there, being able to remove it, etc. is hard. So I have used https://github.com/modelcontextprotocol/rust-sdk to create a thin CRUD MCP. The AI can use this to create fixtures, check if things are working how they should, or debug things, e.g. if a query is returning incorrect results and I tell the AI, it can quickly check to see if it is a datastore issue or a query-layer issue.
Another example is I have a simulator that lets me create test entities and exercise my system. The AI with an MCP server is very good at exercising the platform this way. It also lets me interact with it using plain English even when the API surface isn't directly designed for human use: "Create a scenario that lets us exercise the bug we think we have just fixed and prove it is fixed, create other scenarios you think might trigger other bugs or prove our fix is only partial"
One more example is I have an Overmind-style task runner that reads a file, starts up every service in a microservice architecture, can restart them, can see their log output, can check if they can communicate with the other services, etc. Not dissimilar to how the AI can use Docker, but without Docker, to get max performance both during compilation and usage.
Last example is using off-the-shelf MCP for VCS servers like GitHub or GitLab. It can look at issues, update descriptions, comment, code review. This is very useful for your own projects but even more useful for other people's: "Use the MCP tool to see if anyone else is encountering similar bugs to what we just encountered"
the AI gets to do two things:
- expose hidden state
- interact with the app, and see before/after/errors

it gives more time where the LLM can verify its own work without you needing to step in. It's also a bit more integration-test-y than unit.
if you were to add one MCP, make it Playwright or some similar browser-automation MCP. Very little has more value-add than just being able to control a browser.
A static set of tools is safer and more reliable.
the agent sees tools as allowed or not by the harness/your MCP config.
For the most part, the same company that you're connecting to is providing the MCP, so it's not having your data go to random places, but you can also just write your own. It's a fairly thin wrapper: a bit of code to call the remote service, and a bit of documentation of when/what/why to do so.
Although I have been a skeptic of MCPs, they have been an immense help with agents. I do not have an alternative at the moment.
This is quite literally the opposite opinion I and many others had when first exploring MCP. It's so _obviously_ simple, which is why it gained traction in the first place.
Do you not expose an MCP endpoint? Literally every VS Code or opencode node gets it for free (a small JSON snippet in their mcp.json config) if you do auth right.
We can plug in MCP almost anywhere with just a small snippet of JSON, and because we're serving it from a server, we get very clear telemetry regardless of tooling and environment.
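For reference, the snippet in question looks something like this; the server name and URL are invented here, and the exact key names vary a bit between clients:

```json
{
  "mcpServers": {
    "team-tools": {
      "type": "http",
      "url": "https://mcp.example.internal/mcp"
    }
  }
}
```

With OAuth handled server-side, the client walks the authorization flow on first connect, so no secrets need to live in this file.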
So what’s the best centralized gateway available today, with telemetry and auth and all the goodness espoused in this blog post?
MCP is effectively "just another HTTP REST API"; OAuth and everything. The key part of the protocol is the communication shape and sequence with the client, which most SDKs abstract for you.
The SDKs for MCP make it very straightforward to do so now, and I would recommend experimenting with them. It is as easy to deploy as any REST API.
https://docs.aws.amazon.com/whitepapers/latest/overview-depl...
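To give a flavor of how much the SDKs abstract, here is a minimal client sketch in TypeScript; the endpoint URL and tool name are invented:

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StreamableHTTPClientTransport } from "@modelcontextprotocol/sdk/client/streamableHttp.js";

// Connect to a hypothetical server-hosted MCP endpoint. The SDK handles
// initialization, capability negotiation, and the JSON-RPC framing.
const client = new Client({ name: "example-client", version: "0.1.0" });
await client.connect(
  new StreamableHTTPClientTransport(new URL("https://mcp.example.com/mcp")),
);

// Discover what the server offers, then invoke a tool.
const { tools } = await client.listTools();
console.log(tools.map((t) => t.name));

const result = await client.callTool({
  name: "echo",                    // hypothetical tool on that server
  arguments: { message: "hello" },
});
console.log(result.content);
```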
It should be part of your app and coordinated in a way that everyone in the enterprise can find all the available MCPs. Like Backstage or something.
Yes, technically, but you probably meant cruft here.
It’s much easier for users to find exactly what a model can do with your app over MCP, compared to building a skill that would work with it, since clients can display every available tool to the user. There’s also no need for the model to set up any environment, since it’s essentially just writing out a function call, which saves time since there’s no need to spin up as many virtual machine instances.
It obviously isn’t as useful in development environments where a higher level of risk can be accepted since changes can always be rolled back in the repository.
If I recall correctly, there’s even a whole system for MCP being built, so it can actually show responses in a GUI much like Siri and the Google Assistant can.
> If I recall correctly, there’s even a whole system for MCP being built, so it can actually show responses in a GUI much like Siri and the Google Assistant can
That's the MCP progress spec: https://modelcontextprotocol.io/specification/2025-11-25/bas...

However, MCP is context bloat and not very good compared to CLIs + skills mechanically. With a CLI you get the ability to filter/pipe (regular Unix bash) without having to expand the entire tool call every single time in context.
CLIs also let you use heredoc for complex inputs that are otherwise hard to escape.
CLIs can easily generate skills from the --help output, and add agent-specific instructions on top. That means you can give the agent all the instructions it needs: how to use the tools, what tools exist, lazy-loaded, and without bloating the context window with all the tools upfront (yes, I know tool search in Claude partially solves this).
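A rough sketch of that generator idea; the tool name, the agent notes, and the output path are all illustrative:

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";
import { mkdir, writeFile } from "node:fs/promises";

const run = promisify(execFile);

// Build a lazy-loaded SKILL.md for a CLI from its own --help output,
// with agent-specific guidance prepended.
async function generateSkill(tool: string, notes: string): Promise<void> {
  const { stdout } = await run(tool, ["--help"]);
  const skill = [
    "---",
    `name: ${tool}`,
    `description: Use the ${tool} CLI for ${tool}-related tasks.`,
    "---",
    "",
    notes, // agent-specific instructions go on top
    "",
    "## Reference (--help output)",
    stdout.trim(),
  ].join("\n");
  await mkdir(`skills/${tool}`, { recursive: true });
  await writeFile(`skills/${tool}/SKILL.md`, skill, "utf8");
}

await generateSkill("mytool", "Prefer --json output; never pass --force.");
```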
CLIs also don’t have to run persistent processes like MCP but can if needed
Then I have a troubleshooting file (also linked from the main SKILL file) which basically lists out all the 'gotchas' that are unique to my platform and thus the LLM may struggle with in complex scenarios.
After a lot of testing, I identified just 5 gotchas and wrote a short section for each one. The title of each section describes the issue and lists out possible causes with a brief explanation of the underlying mechanism and an example solution.
Adding the troubleshooting file was a game changer.
If it runs into a tricky issue, it checks that troubleshooting file. It's highly effective. It made the whole experience seamless and foolproof.
My platform was designed to reduce applications down to HTML tags which stream data to each other, so the goal is low token count and no debugging.
I basically replaced debugging with troubleshooting; the 5 cases I mentioned are literally all that was left. It seems to be able to quickly assemble any app without bugs now.
The 'gotchas' are not exactly bugs but more like "Why doesn't this value update in realtime?" kind of issues. They involve performance/scalability optimizations that the LLM needs to be aware of.
In v0, people can add e.g. Supabase, Neon, or Stripe to their projects with one click. We then auto-connect and auth to the integration’s remote MCP server on behalf of the user.
v0 can then use the tools the integration provider wants users to have, on behalf of the user, with no additional configuration. Query tables, run migrations, whatever. Zero maintenance burden on the team to manage the tools. And if users want to bring their own remote MCPs, that works via the same code path.
We also use various optimizations like a search_tools tool to avoid overfilling context
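The shape of such a meta-tool, sketched with invented names: rather than exposing every integration tool upfront, one tool returns the few relevant definitions by keyword:

```typescript
// A toy version of a search_tools meta-tool: the agent asks for tools by
// keyword and only the matching definitions enter its context.
interface ToolDef {
  name: string;
  description: string;
  inputSchema: object;
}

function searchTools(registry: ToolDef[], query: string, limit = 5): ToolDef[] {
  const terms = query.toLowerCase().split(/\s+/);
  return registry
    .map((tool) => ({
      tool,
      // Score each tool by how many query terms its name/description hit.
      score: terms.filter((t) =>
        `${tool.name} ${tool.description}`.toLowerCase().includes(t),
      ).length,
    }))
    .filter(({ score }) => score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit)
    .map(({ tool }) => tool);
}
```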
I'd recommend that you take a peek at MCP prompts and resources spec and understand the purpose that these two serve and how they plug into agent harnesses.
> I'd recommend that you take a peek at MCP prompts and resources spec
Don't assume that if somebody does not like something, they don't know what it is. MCP makes happy the developers who need the illusion of "hooking" things into the agent, but it does not make LLMs happy.
A local MCP doesn't come into play because it just couldn't offer the same features in this case.
So when you run it, your coding agent is using AI to run that code (what to call, what parameters to pass, and so on). Via MCP, they don't pay any LLM cost; they just offer the code and the endpoint.
But this is usually messy for the coding agent, since it fills up the context. Whereas if you use a skill + API, it's easier for the agent, since there's no code in the context, just how to call the API and what to pass.
With something like this, you can then have very complex things happening in the endpoint without the agent worrying about context rot or being able to deal with that functionality.
But to have that difficult functionality, you also need to call an LLM inside the endpoint, which is problematic if the person offering the MCP service does not want to cover LLM costs.
So it does matter if it's an endpoint or an MCP because the agent is able to do more complex and robust stuff if it uses skill and HTTP.
More than 200% growth in official MCP servers in the past 6 months: https://bloomberry.com/blog/we-analyzed-1400-mcp-servers-her...
IMO, by default MCP tools should run in a forked context. Only a compacted version of the tool response should be returned to the main context. This costs tokens, yes, but doesn't blow out your entire context.
If other information is required post-hoc, the full response can be explored on disk.
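A sketch of that wrapper idea; the truncation below is a stand-in for a real compaction step, e.g. a cheap summarizer model:

```typescript
import { mkdir, writeFile } from "node:fs/promises";
import { join } from "node:path";

// Run a tool call "forked": persist the full response to disk and return
// only a compact preview to the main context, plus the path so the agent
// can explore the full payload post-hoc if it needs to.
async function compactedToolCall(
  callTool: (name: string, args: object) => Promise<string>,
  name: string,
  args: object,
  dir = "/tmp/tool-results",
): Promise<string> {
  const full = await callTool(name, args);
  await mkdir(dir, { recursive: true });
  const path = join(dir, `${name}-${Date.now()}.json`);
  await writeFile(path, full, "utf8");
  const preview = full.length > 500 ? `${full.slice(0, 500)} [truncated]` : full;
  return `${preview}\n[full response saved to ${path}]`;
}
```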
1. You can make the script very specific to the skill and permission it appropriately.
2. You can have the output of the script make clear to the LLM what to do. Lint fails? "Lint rules have failed. This is important for reasons blah blah, and you should do X before proceeding." Otherwise the agent is too focused on smashing out the overall task and might opt to route around the error. Note you can use this for successful cases too.
3. The output and token usage can be very specific to what the agent needs. Saves context. My GitHub comments script really just gives the comments + the necessary metadata, not much else.
The downsides of MCP all focus on (3), but (1) and (2) can be really important too.
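To illustrate point 2, a sketch of a skill script whose output steers the agent instead of just reporting an exit code; the lint command and the wording are examples:

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const run = promisify(execFile);

// Wrap lint so the output tells the agent what to do next, rather than
// leaving it a bare non-zero exit it might be tempted to route around.
try {
  const { stdout } = await run("npm", ["run", "lint"]);
  console.log(stdout);
  console.log("Lint passed. Proceed with the task.");
} catch (err: any) {
  console.log(err.stdout ?? "");
  console.log(
    "Lint rules have FAILED. These rules enforce project conventions; " +
      "do not work around them. Fix every reported issue, re-run this " +
      "script, and only proceed once it passes.",
  );
  process.exit(1);
}
```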
> (I preface that this is primarily relevant for orgs and enterprises; it really has no relevance for individual vibe-coders)
The thing about tools that "democratize" software development, whether it is Visual Studio/Delphi/QT or LLMs, is that you wind up with people in organizations building internal tools on which business processes will depend who do not understand that centralization is key. They will build these tools in ignorance of the necessity of centralization-centric approaches (APIs, MCP, etc.) and create Byzantine architectures revolving around file transfers, with increasing epicycles to try to overcome the pitfalls of such an approach.
Once you have 10-20 people using agents in wildly different ways getting wildly different results, the question of "how do I baseline the capabilities across my team?" becomes very real.
In our team, we want to let every dev use the agent harness that they are comfortable with and that means we need a standard mechanism of delivering standard capabilities, config, and content across the org.
I don't see it as democratization versus corporate fascism so much as it is "can we get consistent output from developers of varying degrees of skill using these agents in different ways?"
But it's putting a lot of trust in the remote server not to prompt-inject you, perhaps accidentally. Also, what if the remote docs don't suit local conditions? You could make local edits to a skill if needed.
Better to avoid depending on a remote API when a local tool will do.
Most folks are familiar with MCP tools but not so much MCP resources[0] and MCP prompts[1]. I'd make the case that these latter two are way more powerful and significant because (most) tools support them (to varying degrees at the moment, to be fair).
For teams/orgs, these are really powerful because they simplify delivery of skills and docs and move them out of the repo (yes, there are benefits to this, especially when the content is applicable across multiple repos), on top of surfacing telemetry that informs usage and efficacy.
Why would you do it? One reason is that now you can index your docs with more powerful tools. Postgres FTS, graph databases to build a knowledge base, extract code snippets and build a best practices snippet repo, automatically link related documents by using search, etc.
[0] https://modelcontextprotocol.io/specification/2025-06-18/ser...
[1] https://modelcontextprotocol.io/specification/2025-06-18/ser...
It provides a unified way to connect tools (whether local via stdio or remote via HTTP), handles bidirectional JSON-RPC communication natively, and forces tools to be explicit about their capabilities, which is exactly what you want for managing LLM context and agentic workflows.
This current anti-MCP hype train feels highly reminiscent of the recent phase where people started badmouthing JSON in favor of the latest niche markup language. It’s just hype-driven contrarianism trying to reinvent the wheel.
Great article otherwise. I've been wondering why people are so zealous about MCP vs executable tools, and it looks like it's just tradeoffs between implementation differences to me.
Why? Because when you pair output schema with CodeAct agents (agents that reason and act by writing executable code rather than natural language, like smolagents by Hugging Face), you solve some of the most painful problems in agentic tool use:
1. Context window waste: Without output schema, agents have to call a tool, dump the raw output (often massive JSON blobs) into the context window, inspect it, and only then write code to handle it. That "print-and-inspect" pattern burns tokens and attention on data the agent shouldn't need to explore in the first place.
2. Roundtrip overhead: Writing large payloads back into tools has the same problem in reverse. Structured schemas on both input and output let the agent plan a precise, single-step program instead of fumbling through multiple exploratory turns.
There's a blog post on Hugging Face that demonstrates this concretely using smolagents: https://huggingface.co/blog/llchahn/ai-agents-output-schema
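The gist in code: a tool declaration that carries schemas on both sides, so a code-writing agent can program against the result shape without a print-and-inspect roundtrip. The tool and its fields are invented; structured output schemas are a recent addition to the MCP spec:

```typescript
// A tool the agent can plan a single-step program against, because both
// the input and the output shapes are declared upfront.
const weatherTool = {
  name: "get_weather",
  description: "Current weather for a city",
  inputSchema: {
    type: "object",
    properties: { city: { type: "string" } },
    required: ["city"],
  },
  outputSchema: {
    type: "object",
    properties: {
      tempC: { type: "number" },
      conditions: { type: "string" },
    },
    required: ["tempC", "conditions"],
  },
};
```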
And the industry is clearly converging on this pattern. Cloudflare built their "Code Mode" around the same idea (https://blog.cloudflare.com/code-mode/), converting MCP tools into a TypeScript API and having the LLM write code against it rather than calling tools directly. Their core finding: LLMs are better at writing code to call MCP than at calling MCP directly. Anthropic followed with "Programmatic tool calling" (https://www.anthropic.com/engineering/code-execution-with-mc..., https://platform.claude.com/docs/en/agents-and-tools/tool-us...), where Claude writes Python code that calls tools inside a code execution container. Tool results from programmatic calls are not added to Claude's context window, only the final code output is. They report up to 98.7% token savings in some workflows.
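The pattern in miniature, as a sketch; the tool functions below are hypothetical typed stubs standing in for MCP tools surfaced as code:

```typescript
// Code Mode / programmatic tool calling: tools become typed functions,
// the model writes a small program against them, and only the program's
// final output re-enters the context window.
declare function listCities(args: { country: string }): Promise<string[]>;
declare function getWeather(args: { city: string }): Promise<{ tempC: number }>;

async function main(): Promise<string> {
  const cities = await listCities({ country: "DE" });
  const temps = await Promise.all(cities.map((city) => getWeather({ city })));
  // Intermediate results (cities, temps) stay inside the sandbox.
  const max = Math.max(...temps.map((t) => t.tempC));
  const hottest = cities[temps.findIndex((t) => t.tempC === max)];
  return `Hottest city right now: ${hottest} (${max}°C)`; // only this returns
}
```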
So the point here is: MCP isn't just valuable for the centralization, auth, and telemetry story the author laid out (which I fully agree with). The protocol itself, specifically its structured schema capabilities, directly enables more efficient and reliable agentic workflows. That's a concrete technical advantage that CLIs simply don't offer, and it's one more reason MCP will stick around.
Long live MCP indeed.
But fundamentally that doesn’t make sense. If an AI needs to be fed instructions or schemas (context) to understand how to use something via MCP, wouldn’t it need the same things via CLI? How could it not? This article points that out, to be clear. But what I’m calling out is how simple it is to determine for yourself that this isn’t an MCP versus CLI battle. However, most people seem to be falling for this narrative just because it’s the new hot thing to claim (“MCP is dead, Long Live CLI”).
As for Google - they previously said they are going to support MCP. And they’ve rolled out that support even recently (example from a quick search: https://cloud.google.com/blog/products/ai-machine-learning/a...). But now with the Google Workspace CLI and the existence of “Gemini CLI Extensions” (https://geminicli.com/extensions/about/), it seems like they may be trying to diminish MCP and push their own CLI-centric extension strategy. The fact that Gemini CLI Extensions can also reference MCP feels a lot like Microsoft’s Embrace, Extend, Extinguish play.
Or...just don't slam 100 tools into your agent in the first place.
But I can do them with CLI so that's a negative for MCP?
100 MCP tools will bloat the context whereas 100 CLIs won't. Which part do you disagree with?
2. The part where you think your agent is going to know how to use 100 CLI tools that are not already in its training dataset without using extra turns walking the help content to dump out command names and schemas
3. The part where, without a schema defining the inputs, the LLM wastes iterations trying to correct the input format.
4. The part where, not having the full picture of the tools, your odds of it picking the same tools or the right tools is completely gambling that it outputs the right keywords to trigger the tool to be used.
5. The part where you forgot to mention that for your agent to know that your 100 CLI tools exist, you had to either provide it in context directly, provide it in context in a README.md, or have it output the directory listing and send that off to the LLM to evaluate before picking the tool and then possibly expanding the man pages for several tools and sub commands using several turns.
Don't get me wrong, CLIs are great if they're already in the LLM's training set (`git`, for example). Not so great if they're not, because the agent will need to walk the man pages anyway.
I'm not sure how that solves the issue. The shape of each individual tool will be different enough that you will need a different schema: something you will be passing each time with MCP and something you can avoid with a CLI. Also, CLIs can be flexible.
> The part where you think your agent is going to know how to use 100 CLI tools that are not already in its training dataset without using extra turns walking the help content to dump out command names and schemas
By CLIs we mean SKILLS.md, so it won't require this hop.
> The part where, without a schema defining the inputs, the LLM wastes iterations trying to correct the input format.
What do we lose by one iteration? We lose a lot by passing all the tool shapes on each turn.
> The part where, not having the full picture of the tools, your odds of it picking the same tools or the right tools is completely gambling that it outputs the right keywords to trigger the tool to be used.
we will use skills
> The part where you forgot to mention that for your agent to know that your 100 CLI tools exist, you had to either provide it in context directly, provide it in context in a README.md, or have it output the directory listing and send that off to the LLM to evaluate before picking the tool and then possibly expanding the man pages for several tools and sub commands using several turns.
skills
This is what the skill file is for.
> Centralizing this behind MCP allows each developer to authenticate via OAuth to the MCP server and sensitive API keys and secrets can be controlled behind the server
This doesn't require MCP. Nothing is stopping you from creating a service to proxy requests from a CLI.
The problem with this article is it doesn't recognize that skills are a more general superset of MCP. Anything done with MCP could have an equivalent done with a skill.
This is one of the first posts that I've seen that cuts through the hype against both MCPs and CLIs with nuanced findings.
There were times when it didn't make sense to use MCPs (such as connecting to a database), and it doesn't make sense at all to suddenly generate CLIs for everything. It just seems like the use case was a solution in search of a problem on top of a bad standard.
But no one could answer "who" the customer of each of these was, which is why the hype was unjustified.