I have to say, the endorsements at the end somehow made this seem worse…
fwiw i thought the message structure was pretty clear on the docs https://modelcontextprotocol.io/docs/concepts/architecture#m...
I also think the docs are pretty good. There's just something about seeing the actual network requests that helps clarify things for me.
Every project I do is an assertion that I don't believe the thing I make exists.
I have been unable to find a streaming, forward-only markdown renderer for the terminal, nor have I been able to find any suitable library I could build one with.
So I've taken on the ambitious effort of building my own parser and renderer, and going through all the grueling testing that entails.
But it seems self-evident where constraints like markets or material conditions might demarcate usefulness and waste.
Even the learners who are as happy to hear about linguistics as they are material science I presume do some opportunity cost analysis as they learn. Personally speaking, I rarely, if ever, feel like I'm wasting time per se but I always recognize and am conscious of the other things I could be doing to better maximize alternative objectives. That omnipresent consciousness may just be anxiety though I guess...
I'll check out your proxy as well, I enjoy looking at anything built around networking.
i think it is still a proxy though unless I’m missing something (beyond the name lol).
[here's a section on macos dealing with certs](https://mitmproxy.org/posts/local-capture/macos/)
in short, you can hook calls within SSL libraries (like OpenSSL)
It really is just JSON-RPC 2.0 under the hood, either piped to stdio or POSTed over HTTP.
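For the stdio case, a minimal sketch of what's actually on the wire (the server command and tool name are placeholders, and the initialize handshake is skipped for brevity):

    import json, subprocess

    # hypothetical MCP server speaking newline-delimited JSON-RPC 2.0 on stdio
    proc = subprocess.Popen(
        ["python", "my_mcp_server.py"],
        stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True,
    )

    def rpc(method, params=None, id=1):
        req = {"jsonrpc": "2.0", "id": id, "method": method, "params": params or {}}
        proc.stdin.write(json.dumps(req) + "\n")   # one request per line
        proc.stdin.flush()
        return json.loads(proc.stdout.readline())  # one response per line

    print(rpc("tools/list"))
    print(rpc("tools/call", {"name": "add", "arguments": {"a": 1, "b": 2}}, id=2))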
https://google.github.io/A2A/#/documentation?id=multi-turn-c...
A2A is a conduit for agents to speak in their native modalities. From the receiving agent implementation point of view, there shouldn't be a difference in "speaking" to a user/human-in-the-loop and another agent. I'm not aware of anything in the protocol that is sensitive to the content. A2A has 'Messages' and 'Artifacts' to distinguish between generated content and everything else (context, thoughts, user instructions, etc) and should be robust to formatting challenges (since it relies on the underlying agent).
The sensitivity to prompts and response quality are related to an agent's functionality, A2A is only addressing the communication aspects between agents and not the content within.
Basically if we expose our API over MCP, agents can "figure it out". But MCP isn't secure enough today, so hoping that gets enhanced.
Which is an open standard that is Apache licensed[1]. That's no moat for Google. At best it's a drainage ditch.
"If you just get out of people's way, then they'll do a good job and the right thing!" - yea, perhaps. But how much of "getting out their way" is more a product of providing meaningful ownership and compensation in the workplace? See the paragraph above. Good employees are expensive and as time marches on, their compensation will need to continue to increase at least with inflation, while the machine will likely become cheaper to operate over time as societal advances bring down the cost and complexity of operation.
    from fastmcp import FastMCP

    mcp = FastMCP("Demo")

    @mcp.tool()
    def add(a: int, b: int) -> int:
        """Add two numbers"""
        return a + b
This is an example of fastmcp. Notice anything? Replace 2-3 lines of code and this is a Flask or FastAPI application. Why are we not just going all-in on REST/HATEOAS for these things? My only hunch is that either 1. the people designing/proselytizing these "cutting edge" solutions are simply ignorant to how systems communicate and all the existing methods that exist, or 2. they know full well that this is just existing concepts with a new shiny name but don't care because they want to ride the hype train and take advantage of it.

Well, I just told my llm agent to use the `gh` cli instead.
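To make the Flask/FastAPI comparison concrete, here's roughly the fastmcp example above rewritten as a FastAPI app (just a sketch; the route name is made up):

    from fastapi import FastAPI

    app = FastAPI(title="Demo")

    @app.post("/add")
    def add(a: int, b: int) -> int:
        """Add two numbers"""
        return a + b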
It seems all those new protocols are there to re-invent wheels just to create a new ecosystem of free programs that corporations will be able to use to extract value without writing the safety guards themselves.
The irony is that gpt4 has no clue which approach is correct. Give it the same prompt three times and you’ll get a solution that uses each of these that has a wildly different footprint, be it via function calls or system prompts, schema or no schema, etc.
I fail to see how they're different, they're both "these are the remote procedures you can call on me, and the required parameters, maybe some metadata of the function/parameters".
An existing Swagger/OpenAPI spec is not sufficient. You want to limit options and make it easy for an LLM to call your tool to accomplish goals. The complete API surface of your application might not be appropriate. It might be too low level or require too many orchestration steps to do anything useful.
A lot of existing APIs require making additional calls using the results of previous calls. GET /users to get a list of ids. Then repeatedly call GET /users/$id to get the data. In an MCP world you would provide a get-users tool that would do all this behind the scenes and also impose any privacy/security/auth restrictions before handing this over to an LLM.
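A sketch of that get-users idea using fastmcp (the internal API URL and paths are made up; auth and error handling omitted):

    import httpx
    from fastmcp import FastMCP

    mcp = FastMCP("users-demo")
    BASE = "https://internal.example.com/api"  # hypothetical internal API

    @mcp.tool()
    def get_users() -> list[dict]:
        """Return hydrated user records, hiding the GET /users + GET /users/$id fan-out."""
        ids = httpx.get(f"{BASE}/users").json()
        return [httpx.get(f"{BASE}/users/{uid}").json() for uid in ids]

    if __name__ == "__main__":
        mcp.run()  # stdio transport by default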
We see similar existing systems like GraphQL, which provides a fully hydrated resultset in one call. Tons of APIs, like Stripe (IIRC), provide a &hydrate= parameter to specify which relations to include full details for in-line.
I do agree MCP is overhyped and might not be using best principles, but I do see why it's going off in its own land. It might be better suited over different protocols or transports or encodings or file formats, but it seems to at least work, so until something better comes along we are probably stuck with it.
For one, REST is not RPC, despite being commonly confused for it and abused as such. The conceptual models are different. It makes more sense for an action-oriented RPC protocol to be defined as such, instead of a proper REST approach (which is going to be way too verbose), or some bastardized "RESTful" protocol that's just weirdly-structured RPC designed so people can say, "look ma', I'm using HTTP verbs, I'm doing REST".
The benefit it brings is that you can add debugging endpoints which you can use directly in a browser, you get networking with hosts and ports instead of local-only exe + stdio.
stdio is just a pair of file-like streams (stdin/stdout) that your program can read from and write to
HTTP is a protocol typically used over TCP
websockets is a protocol initiated via HTTP, which again is typically over TCP
Both HTTP and websockets can be done over stdio instead of TCP.
It sounds like MCP has a lot more "irrelevant baggage" I need to learn/consider.
This is a problem solved by other protocols that are just stacked on top of each other without knowing how the others work.
I should probably just go read Goose's code…
But it can just be JSON with the tool name and the payload for the tool.
holy cow you weren't kidding. legit the last people i would trust with software development.
This also meant that you could do things like create diffs between your current service API client and an updated service API client from the broadcasting service. For example, if the service changed the parameters or data objects, deprecated or added functions then you could easily see how your client implementation differed from the service interface. It also provided some rudimentary versioning functionality, IIRC. Generally servers also made this information available with an HTML front-end for documentation purposes.
So while the promise of one day services configuring themselves at runtime was there, it wasn't really ever an expectation. IMO, the reason WSDL failed is because XML is terrifically annoying to work with and SOAP is insanely complex. JSON and REST were much simpler in every way you can imagine and did the same job. They were also much more efficient to process and transmit over the network. Less cognitive load for the dev, less processor load, less network traffic.
So the "runtime" explanation isn't really valid as an excuse for it's failure, since the discovery was really meant more in practice like "as a programmer you can know exactly what functions, parameters, data-objects any service has available by visiting a URL" and much less like "as a runtime client you can auto-configure a service call to a completely new and unknown service using WSDL". The second thing was a claim that one-day might be available but wasn't generally used in practice.
Is that how people build systems even today? Dynamic service and method discovery sounds good on paper but I've never actually seen it in practice.
WSDLs and XSDs done right are a godsend for transmitting your API spec to someone. I use .NET and can call xsd.exe to generate classes from the files in a few seconds. It "just works" if both sides follow all of the rules.
The APIs I work with would be cartoonish if we didn't have these tools. We're talking 10 megabytes of generated sources. It is 100x faster to generate these types and then tunnel through their properties via intellisense than it is to read through any of these vendors' documentation.
This sounds like protobuf and gRPC. Is that a close analogy?
The tooling around these paths is also lackluster by comparison if you're using something like Visual Studio.
I'd rather fight XML namespaces and HTTP/1.1 transports than sort through the wreckage of what "best practices" has recently brought to bear - especially in terms of unattended complexity in large, legacy enterprises. Explaining to a small bank in Ohio that they're going to need to adjust all of their firewalls to accommodate some new protocols is a total nonstarter in my business.
I hate that for years the concept of RPC was equated to XML, which in turn was equated to some implementation of the (XML-based) tooling, and then a whole lot of distracting discourse around XML vs JSON. We kinda do still have that these days with YAML vs whatever.
If there's one thing I've observed about developers in general, it's that they'd rather build than learn.
MCP is solving specific problems people have in practice today. LLMs need access to data that they weren't trained on, but that's really hard because there are a million different ways you could RAG something. So MCP defines a standard by which LLMs can call APIs through clients. (and more).
A2A solves a marketing problem that Google is chasing with technology partners.
I think I can safely say which one will still be around in 6 months, and it's not the one whose contributors all work for the same company.
For example, most companies have an internal directory and internal private APIs and tools. They can build an agent to help complete internal tasks. However, they also may purchase an "HR Agent" or "Travel Assistant Agent" or "Tax Preparation Agent" or "Facilities Control Agent". These agents aren't sharing their private APIs and data with each other.
It's also difficult to model these agents as structured tools. For example, a "Tax Preparation Agent" may need to evaluate many different options and ask for specific different documents and information based on an individual users needs. Modeling this as 100s of tools isn't practical. That's where we see A2A helping. Talk to an agent as an agent.
This lets a user talk to only their company agent and then have that agent work with the HR Agent or Travel Booking Agent to complete complex tasks.
LangChain is still around but that doesn't mean much. MCP isn't much better.
MCP solves a data and API integration problem.
Both are concrete things that people need to do today. AI agents talking to one another is not a concrete problem that organizations building features that integrate AI have today.
I didn't feel the need to use Langchain, chaining LLM calls is usually just a few lines of code (I think even fewer than when using Langchain).
The jsonrpc calls look similar-ish to mcp tool calls except the inputs and outputs look closer to the inputs/outputs from calling an LLM (ie messages, artifacts, etc.).
The JS server example that they give is interesting https://github.com/google/A2A/tree/main/samples/js/src/serve... - they're using a generator to send sse events back to the caller - a little weird to expose as the API instead of just doing what express allows you to do after setting up an sse connection (res.send / flush multiple times).
There exist no such thing as "out-of-band signaling" in nature. It's something we introduce into system design, by arranging for one part to constrain the behavior of other, trading generality for predictability and control. This separation is something created by a mind, not a feature of the universe.
Consequently, humans don't support "out-of-band signalling either. All of our perception of reality, all our senses and internal processes, they're all on the same band. As such, when aiming to build a general AI system - able to function in the same environment as us, and ideally think like us too - introducing hard separation between "control" and "data" or whatever would prevent it from being general enough.
I said "or whatever", because it's an ill-defined idea anyway. I challenge anyone to come up with any kind of separation between categories of inputs for an LLM that wouldn't obviously eliminate a whole class of tasks or scenarios we would like them to be able to handle.
(Also, entirely independently of the above, thinking about the near future, I challenge anyone to come up with a separation between input categories that, were we to apply it to humans, wouldn't trivially degenerate into eternal slavery, murder, or worse.)
+++ATH0
into my comment and have it hang up your connection, so it's worth some effort to prevent the problem.

Anything that would hang up on seeing that string as a monolith was operating out of Hayes spec.
There is no evidence that (real AI) is even close to being solved, from a neuroscientific, algorithmic, computer science or engineering perspective. It's far more likely we're going down a dead-end path.
I'm now waiting for the rebrand when the ass falls out of AI investment, the same way it did when ML became passé.
Yes.
Hallucinations were a big problem with single shot prompting. No one is seriously doing that anymore. You have an agentic refinement process with an evaluator in the loop that takes in the initial output, quality checks it, and returns a pass/fail to close the loop or try again, using tool calls the whole time to inject verified/real time data into the context for decision making. Allows you to start actually building reliable/reasonable systems on top of LLMs with deterministic outputs.
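A rough sketch of that loop, with the LLM, evaluator, and tool calls passed in as plain callables (all names here are placeholders, not any particular framework):

    def refine(task: str, generate, evaluate, fetch_context, max_attempts: int = 3) -> str:
        """Generate -> evaluate -> retry until the evaluator passes or we give up."""
        context = fetch_context(task)                # tool calls pulling in verified data
        for _ in range(max_attempts):
            draft = generate(task, context)          # candidate output from the LLM
            verdict = evaluate(draft, task)          # e.g. {"pass": bool, "notes": str}
            if verdict["pass"]:
                return draft                         # close the loop
            context += "\nEvaluator feedback: " + verdict["notes"]  # retry with feedback
        raise RuntimeError("evaluator never passed within max_attempts")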
https://modelcontextprotocol.io/specification/2025-03-26/ser...
“For trust & safety and security, there SHOULD always be a human in the loop with the ability to deny tool invocations.

Applications SHOULD:

- Provide UI that makes clear which tools are being exposed to the AI model
- Insert clear visual indicators when tools are invoked
- Present confirmation prompts to the user for operations, to ensure a human is in the loop”
Thanks for the reference though, I'll quote that in my article.
Discouraging people from anthropomorphizing computer systems, while generally sound, is doing a number on everyone in this particular case. For questions of security, by far one of the better ways of thinking about systems designed to be general, such as LLMs, is by assuming they're human. Not any human you know, but a random stranger from a foreign land. You've seen their capabilities, but you know very little about their personal goals, their values and allegiances, nor you really know how credulous they are, or what kind of persuasion they may be susceptible to.
Put a human like that in place of the LLM, and consider its interactions with its users (clients), the vendor hosting it (i.e. its boss) and the company that produced it (i.e. its abusive parents / unhinged scientists, experimenting on their children). With tools calling to external services (with or without MCP), you also add third parties to the mix. Look at this situation through a regular organizational security lens, consider the principal/agent problem - and then consider what kind of measures we normally apply to keep a system like this working reliably-ish, and how those measures work, and then you'll have a clear picture of what we're dealing with when introducing an LLM to a computer system.
No, this isn't a long way of saying "give up, nothing works" - but most of the measures we use to keep humans in check don't apply to LLMs (on the other hand, unlike with humans, we can legally lobotomize LLMs and even make control systems operating directly on their neural structure). Prompt injection, being equivalent to social engineering, will always be a problem.
Some mitigations that work are:
1) not giving the LLM power it could potentially abuse in the first place (not applicable to the MCP problem), and
2) preventing the parties it interacts with from trying to exploit it, which is done through social and legal punitive measures, and keeping the risky actors away.
There are probably more we can come up with, but the important part is that designing secure systems involving LLMs is like securing systems involving people, not like securing systems made purely of classical software components.
Edit: My apologies then.
The act of writing a comment on HN forces me to think through the opinions and beliefs in it, which is extremely valuable to me :). Half the time, I realize partway through that I'm wrong, and close the window instead of submitting.
Specifically, the core design principle is that you have to be comfortable with any possible combination of things your agent can do with its tools, not only the combination you ask for.
If your agent can search the web and can access your WhatsApp account, then you can ask it to search for something and text you the results -- cool. But there's some possible search result that would take over its brain and make it post your WhatsApp history to the web. So probably you should not set up an agent that has MCPs to both search the web and read your WhatsApp history. And in general many plausibly useful combinations of tools to provide to agents are unsafe together.
is it to only use a pre-vetted "Apple Store" of known good MCP integrations from well known companies, and avoid using anything else without proper review?
This has been discussed before, but the short version is: there is no solution currently, other than only use trusted sources.
Unless there is a way beyond a flat text file to distinguish different parts of the “prompt data” so they cannot interfere with each other (and currently there is not), this idea of arbitrary content going into your prompt (which is literally what MCP does) can’t be safe.
It’s flat out impossible.
The goal of “arbitrary 3rd party content in prompt” is fundamentally incompatible with “agents able to perform privileged operations” (securely and safely, that is).
A2A is for communication between agents. MCP is how an agent communicates with its tools.
An important aspect of A2A is that it has a notion of tasks, task readiness, etc. E.g. you can give it a task, expect completion in a few days, and get notified via webhook or by polling.
For end users, A2A will for sure cause a lot of confusion, and it can replace a lot of current MCP usage.
What if I wrap the agent as a tool in MCP?
Since the agents I got from the 'A2A' protocol are passed as tools to another Agent...
https://github.com/google/A2A/blob/72a70c2f98ffdb9bd543a57c8...
MCP - exposes prompts, resources and tools to a host, who can do whatever they like
A2A - exposes capability discovery, tasks, collaboration?/chat?, user experience discussions (can we embed an image or a website?).
High-level it makes sense to agree on these concepts. I just wonder if we really need a fully specified protocol? Can't we just have a set of best practices around API endpoints/functions? Like, imo we could just keep using REST APIs and have a convention that an agent exposes endpoints like /capabilities, /task_status ...
I have similar thoughts around MCP. We could just have the convention of having an API endpoint called /prompts and keep using REST APIs?
Not sure what I am missing.
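Something like this is all I have in mind, as a sketch (route names are just the convention suggested above; the payloads are made up):

    from fastapi import FastAPI

    app = FastAPI(title="my-agent")

    @app.get("/capabilities")
    def capabilities() -> dict:
        """Plain JSON describing what this agent can do."""
        return {"skills": ["book_travel", "file_expense"]}

    @app.get("/task_status/{task_id}")
    def task_status(task_id: str) -> dict:
        """Poll a previously submitted task (states are illustrative)."""
        return {"id": task_id, "state": "working"}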
Ideally, the model providers would then build for the protocol, so the developers aren't writing spaghetti code for every small difference
To make this work at scale we all need to agree on the specific routes names, payloads, behaviors, etc. At that point we have defined a protocol (built on top of HTTP, itself a lower level protocol).
Companies who are betting their future on LLMs realized a few years ago that the data they can legally use is the only long term difference between them, aka “moat.”
Now that everyone has more or less the same public data access, and a thin compute moat is still there, the goal is to transfer your private textual data to them forever so they have an ever updating and tuned set of models for your data
>transfer your private textual data to them
Who is "they" (or "them") in these sentences? It's an open protocol with 50 partner companies, which can be used with AI agents from ~anyone on ~any framework. Presumably you can use this protocol in an air-gapped network, if you'd like.
Which one of the 50 partner companies is taking my data and building the moat? Why would the other 49 companies agree to a partnership if they're helping build a moat that keeps them out?
To put it bluntly, the point of creating the open interface at this level, is that you get to close off everything else.
Open the interface publicly then monetize the I/O or storage or processing.
Classic high margin SaaS approach with a veneer of “open.”
You can look at it as a standards capture
I had been working on some personal projects over the last few months that would've benefitted enormously from having this kind of standard A2A protocol available. My colleagues and I identified it months ago as a major need, but one that would require a lot of effort to get buy-in across the industry, and I'm happy to see that Google hopped in to do it.
I absolutely get the value of LLMs calling tools and APIs. I still don't see much value in LLMs calling other LLMs.
Everyone gets really excited about it - "langchain" named their whole company over the idea of chaining LLMs together - but aside from a few niche applications (Deep Research style tools presumably fire off a bunch of sub-prompts to summarize content they are crawling, Claude Code uses multiple prompts executions to edit files) is it really THAT useful? Worth building an entire new protocol with a flashy name and a bunch of marketing launch partners?
LLMs are unreliable enough already without compounding their unreliability by chaining them together!
We are working with partners on very specific customer problems. Customers are building individual agents in different frameworks OR are purchasing agents from multiple vendors. Those agents are isolated and do not share tools, or memory, or context.
For example, most companies have an internal directory and internal private APIs and tools. They can build an agent to help complete internal tasks. However, they also may purchase an "HR Agent" or "Travel Assistant Agent" or "Tax Preparation Agent" or "Facilities Control Agent". These agents aren't sharing their private APIs and data with each other.
It's also difficult to model these agents as structured tools. For example, a "Tax Preparation Agent" may need to evaluate many different options and ask for specific different documents and information based on an individual users needs. Modeling this as 100s of tools isn't practical. That's where we see A2A helping. Talk to an agent as an agent.
This lets a user talk to only their company agent and then have that agent work with the HR Agent or Travel Booking Agent to complete complex tasks when they cannot be modeled as tools.
If you believe there is value in fuzzy tasks being done by LLMs then from that it follows that having separate "agent" services with a higher order orchestrator would be required. Each calling LLMs on their own inside.
You have anything else that modifies the context, tools, model, and most importantly perhaps the iteration that controls what's going on with those other values.
Not to mention the cost being a factor here - who pays for which part.
My team's been working on implementing MCP-agents and agents-as-tools and we consistently saw confusion from everyone we were selling this into (who were already bought in to hosting an MCP server for their API or SDK) for their agents because "that's not what it's for".
Kinda weird, but kinda simple.
https://google.github.io/A2A/#/topics/a2a_and_mcp
Basically (google claims): MCP enables agents to use resources in a standard way. A2A enables those agents to collaborate with each other.
I figure this A2A idea will wind up in the infamous Google graveyard within 8 months.
The list is aimed at bureaucratic manager types (which may be the correct approach if they are generally the decision makers); it's not a list that will impress engineers too much, I think.
As far as I can remember, it never really left the research lab, with a few books and papers published on the matter.
Everything old is new again.
[1]: https://wiki.tcl-lang.org/page/D%27Agents+%28formerly+Agent+...
In some ways, it's a shame it didn't catch on, but the security / access control issues you mention certainly make a lot of sense. That seems to be the big issue that derailed most, if not all, of the various "mobile code" initiatives over the years.
A "server" sample: https://github.com/google/A2A/tree/main/samples/js/src/serve...
So it looks like the point is that it keeps the connection/context open for multiple interactions vs. MCP, which is more like pure request-response?
1. This is in the “embrace and extend” type area vis-a-vis MCP — if you implemented A2A for a project I don’t think you’d need to implement MCP. That said, if you have an MCP server, you could add a thin layer for A2A compliance.
2. This hits and improves on a bunch of pain points for MCP, with reasonable relatively light weight answers — it specs out how in-band and out-of-band data should get passed around, it has a sane (token based largely) approach to security for function calling, it has thought about discovery and security with a simple reliance on the DNS security layer, for instance.
3. The full UI demos imagine significantly more capable clients - ones that can at least implement Iframes - and reconnect to lost streaming connections, among other things. It’s not clear to me that there’s any UI negotiation baked into this right now, and it’s not clear to me what the vision is for non-HTML-capable clients. That said, they publish clients that are text-only in the example repo. It may be an area that isn’t fully fleshed out yet, or there may be a simple answer I didn’t see immediately.
Upshot - if you’re building an MCP server right now, great — you should read the A2A spec for a roadmap on some things you’ll care about at some point, like auth and out of band data delivery.
If you’re thinking about building an MCP server, I’m not sure I’d move ahead on vanilla MCP - I think the A2A spec is better specified, and if for some reason A2A doesn’t take off, it will only be because MCP has added support for a couple of these key pain points — it should be relatively easy to migrate.
I think any mid-size or better tool calling LLM should be able to get A2A capability json and figure out what tool to call, btw.
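e.g. discovery is basically just fetching the agent card and handing it to the model (the well-known path is from the A2A docs; the host and field access are my reading of the draft, so treat them as illustrative):

    import httpx

    card = httpx.get("https://hr-agent.example.com/.well-known/agent.json").json()
    print(card["name"], [skill["name"] for skill in card.get("skills", [])])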
One last thing - I appreciate the GOOG team here for their relatively clear documentation and explanation. The MCP site has always felt a little hard to understand.
Second last thing: notably, no openAI or Anthropic support here. Let’s hope we’re not in xkcd 927 land.
Upshot: I’d think of this as a sane superset of MCP and I will probably try it out for a project or two based on the documentation quality. Worst case, writing a shim for an exact MCP capable server is a) probably not a big deal, and b) will probably be on GitHub this week or so.
That sounds exactly like the kind of thing I would outsource to an LLM. I think people over think the need for protocols here. Most AIs are already pretty good at figuring out how to plumb relatively simple things together if they have some sort of documented interface. What the interface is doesn't really matter that much. I've had good results just letting it work off openapi descriptions. Or generating those from server source code. It's not that hard.
In any case, MCP is basically glorified remote procedure calls for LLMs. And then Google adds a bit of probably necessary complexity on top of that (auth sounds important if we're connecting with third party systems). Long lived tasks and out of band data exchange sounds like it could be useful.
For me the big picture and takeaway is that a future of AIs using tools, some of which may be other AIs using tools communicating with each other asynchronously is going to be a thing. Probably rather soon. Like this year.
That puts pressure on people to expose capabilities of their SAAS services in an easily digestible form to external agents. That's going to generate a lot of short term demand from various companies. Most of whom are not really up to speed with any of this. Great times to be a consultant but beware the complexity that design by committee generates.
For example I wish they'd specify the date format more tightly - unix timestamp, some specific ISO format, precision. Which is it?
The sessionID is not specified. You can put all sorts of crazy stuff in there, and people will. Not even a finite length is required. Just pick some UUID format already, or specify it has to be an incrementing integer.
Define some field length limits that can be found on the model card - e.g. how long can the description field be before you get an error? Might be relevant to context sizes. If you don't, you're going to have buffer overflow issues everywhere because vibe coders will never think of that.
Authentication methods are specified as "Open API Authentication formats, but can be extended to another protocol supported by both client and server". That's a recipe for a bunch of byzantine Enterprise monstrosities to rear their ugly heads. Just pick one or two and be done with it.
The lesson of past protocols is that if you don't tightly specify things you're going to wind up with a bunch of nasty little incompatibilities and "extensions" which will fragment the ecosystem. Not to mention security issues. I guess on the whole I'm against Postel's Law on this.
Thank you for the feedback! Would you consider writing up an issue on our GitHub with some more specifics? https://github.com/google/a2a
A2A is being developed in the open with the community. You are finding some early details that we are looking into and will be addressing. We have many partners who will be contributing and want this to be a truly open, collaborative endeavor. We acknowledge this is a little different than dropping a polished '1.0' version in github on day 1. But that is intentional :)
I see MCP as vital when building an agent. An agent is an LLM with data, resources, tools, and services. However, our customers are building or purchasing agents from other providers - e.g. purchasing "HR Agent", "Bank Account Agent", "Photo Editor Agent", etc. All of these agents are closed systems and have access to private data, APIs, etc. There needs to be a way for my agent to work with these other agents when a tool is not enough.
Other comments you have are spot on - the current specification and samples are early. We are working on many more advanced examples and official SDKs and client/servers. We're working with partners, other Google teams, and framework providers to turn this into a stable standard. We're doing it in the open - so there are things that are missing because (a) it's early and (b) we want partners and the community to bring features to the table.
tldr - this is NOT done. We want your feedback and sincerely appreciate it!
What “agents” need is not a protocol for operating, they need a protocol for discovery and addressability. How do I find someone’s agent? How do I talk to it and verify its identity? Once I’ve done that, it can just be a normal chat interface for all I care.
How much guarantee does Google's LLM/agent provide that it didn't hallucinate (read wrong info) in any of the steps, including parsing the job description and then matching that with the profiles of candidates?
I don't understand when these LLMs are presented to solve real life problems as if an LLM is like a sane person doing their job.
> "Today, we’re launching a new, open protocol called Agent2Agent (A2A), with support and contributions from more than 50 technology partners"
Why do you think the majority of the big consultancy firms like McKinsey, KPMG, PwC, Deloitte, Cognizant, Capgemini and Accenture are all here in this round table?
You are on the menu when they arrive to replace you with an agent.
Exhibit A:
> Hiring a software engineer can be significantly simplified with A2A collaboration. Within a unified interface like Agentspace, a user (e.g., a hiring manager) can task their agent to find candidates matching a job listing, location, and skill set.
The recruiter is now an "agent". Not a human. Don't think it isn't going to happen to you because that example targeted recruiters.
The big consultancy firms already have tens of thousands of employees and are ready to try it on them first before recommending to businesses to replace.... you.
Lots of jobs that focus on communication and data organization are out the window, including recruiters.
Why not abstract away the applicant altogether, outsourcing the search for talent that make these very systems tick. Let the candidate microtrading really take off, there's always a better candidate in the pipeline after all
A quick scan of the "partners" for A2A includes many of the same groups that helped launch AGNTCY. Either they jumped ship or they're teaming up with everyone. The Google announcement does read like marketing hype, though, so it remains to be seen how functional it is.
Let the inter-agent standard wars begin.
This kind of feels to me like someone at google saw how successful MCP was becoming and said "we need something like that". I feel the same way about OpenAI's Agent SDK.
I think the word "Agent" appearing in any engineering project is a tell that it's driven by marketing rather than engineers' needs.
A2A works at a different level than MCP. We are working with partners on very specific customer problems. Customers are building individual agents in different frameworks OR are purchasing agents from multiple vendors. Those agents are isolated and do not share tools, or memory, or context.
For example, most companies have an internal directory and internal private APIs and tools. They can build an agent to help complete internal tasks. However, they also may purchase an "HR Agent" or "Travel Assistant Agent" or "Tax Preparation Agent" or "Facilities Control Agent". These agents aren't sharing their private APIs and data with each other.
It's also difficult to model these agents as structured tools. For example, a "Tax Preparation Agent" may need to evaluate many different options and ask for specific different documents and information based on an individual users needs. Modeling this as 100s of tools isn't practical. That's where we see A2A helping. Talk to an agent as an agent.
This lets a user talk to only their company agent and then have that agent work with the HR Agent or Travel Booking Agent to complete complex tasks.
You CAN try and build "the one agent that does everything" but in scenarios where there's many simultaneous data streams, a better approach would be to have many stateful agents handling each stream via MCP, coupled with a single "executive" agent that calls on each of the stateful agents via A2A to get the high-level info it needs to make decisions on behalf of its user.
To my understanding of this protocol it looks like it's an entity exposing a set of capabilities. Why is that different and complementary to an MCP server exposing tools? Why would you be limited to an "everything agent" in MCP?
I am struggling to see the core problem that this protocol addresses.
My assumption is that the initial A2A implementation will be done with MCP, so the LLM can ask your AI directory or marketplace for help with a task via some kind of "phone a friend" tool call, and it'll be able to immediately interop and get the info it needs to complete the task.
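A sketch of that "phone a friend" shape: an MCP tool that just forwards the ask to an A2A server as JSON-RPC over HTTP (the endpoint, method name, and payload shape follow my reading of the draft spec, so treat them as illustrative):

    import uuid
    import httpx
    from fastmcp import FastMCP

    mcp = FastMCP("a2a-bridge")
    A2A_ENDPOINT = "https://hr-agent.example.com/a2a"  # hypothetical remote agent

    @mcp.tool()
    def ask_remote_agent(question: str) -> str:
        """Send one message to the remote agent as an A2A task and return the result."""
        req = {
            "jsonrpc": "2.0",
            "id": 1,
            "method": "tasks/send",
            "params": {
                "id": str(uuid.uuid4()),
                "message": {"role": "user", "parts": [{"type": "text", "text": question}]},
            },
        }
        task = httpx.post(A2A_ENDPOINT, json=req).json()["result"]
        # A completed task carries its output in artifacts/status; the exact shape may
        # differ by server, so just hand the whole task object back as text here.
        return str(task)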
A2A looks like a typical enterprise, authenticated, strict DTD style spec.
Agents acting on behalf of consumers need a simple file that describes:
1) What services are provided
2) What tools are available and how to use them
Agent behavior and actions should happen in latent space. The format of any spec is almost meaningless, as long as it's self describing and conveys those 2 points.
MCP actually fills a gap since people don't normally expose things like writing to their local filesystem as a callable API.
I see this as Slack bots 2.0. Maybe this will create real revenue opportunities where the original chatops didn't.
So it actually looks like a strategic focus, rather than just an announcement for interest, relevance or whatever
The one that will win — will be the one that gives devs the confidence to run in full “yolo/autonomous” mode. That’s the future.
I read about them only for hypothetical scenarios. Is it a real thing?
https://www.anthropic.com/engineering/building-effective-age...
If I'm reading it correctly, A2A is similar to MCP in that they both use JSONRPC but extends the capabilities for agents to be able to communicate with one another, potentially using separate backend models. MCP simply exposes applications data and workflows to a model itself and is not attempting to make agents communicate with one another.
The fact that A2A wasn't proposed as an extension to MCP seems disingenuous at best. To me, it looks like Google (among the other AI giants) is trying to create their own repository of agents, controlling the protocol, thereby enabling them to become the de-facto source for finding trusted agents.
Further, it comes off to me as a defense against the existential threat that AI poses to google's search and ads monopoly.
The problem is, as a consumer of AI, I don't want multiple agents communicating with one another. What I want is one model that communicates with non-agentic services. Making AI work well and understanding what it's doing is hard enough. You now want to pull in multiple models and companies into the picture? Talk about a risk management nightmare.
Shadow IT SaaS is already a massive problem for companies. Now imagine Shadow Agents doing work for your business, using A2A to connect dozens of different agents doing unsupervised work for the company. No thanks!
For the inevitable defense of A2A "But it's open source and Apache licensed!". That's just bait. If you control the protocol, you control the ecosystem. See: Android, VSCode, Chromium, Java, Kubernetes, etc.
For me? I like my single-model auditability, pulling in context using MCP. A2A just seems like an insane attempt at a land grab in the AI agent wars.
Agent2Agent in an unsupervised environment could easily lead to the first Agentic worms. It's not hard to imagine a few agents talking to one another with the right prompt injection attacks that could end up spreading to other agents via A2A.
This is of course just speculation but I could definitely see this as being a big enabler of that possibility.
I'm curious to see answers, from indie builder perspective.
I think they are trying to ride the MCP hype as well with their own implementation that is also meh. MCP itself is also an over-engineered implementation of AI plugins by OpenAI. Obviously the end game is control over a standard which can act as a strategic tool for boosting valuations or even better product positioning.
The better approach is to simply use open standards that already exist, but I guess this is just not sexy.
See all those shiny badges for consulting firms? If you are a truly thought leadershiping executive, you should get one of them in ASAP to build you an A2A Registry [1] for your "Enterprise Agents" [2] to communicate via an A2A NotificationService [3] (brought to you by GCP!)
Indicative that the blog post example isn't "help book a holiday" but "help hire a software engineer".
1: https://google.github.io/A2A/#/topics/agent_discovery
2: https://google.github.io/A2A/#/topics/enterprise_ready
3: https://google.github.io/A2A/#/topics/push_notifications
edit:
> Updates to Agentspace make it easier for customers to discover, create and adopt AI agents. We're also growing the AI Agent Marketplace https://console.cloud.google.com/marketplace/browse?filter=c... , a dedicated section within Google Cloud Marketplace where customers can easily browse and purchase AI agents from partners.
It's completely open, with active engagement and direction from the community:
https://github.com/modelcontextprotocol/modelcontextprotocol
I don't think it would have such wide adoption so rapidly if it were an "over-engineered implementation".
I'm confused by this comment and a reply that both seem to be under this assumption... first, it's from Anthropic, and second, it's hardly over-engineered. If you actually go and try and implement a specific MCP server's functionality from first principles into, say, some chat client of your choosing, you will quickly run into the problems that MCP addresses.
Which ones?
Agents can just be viewed as tools, and vice versa. Is this an attempt to save the launch after getting scooped by MCP?
OK, I'm being a little bit facetious. But there has been an awful lot of work in this space (or closely related space). Going back to FIPA[1], KQML[2], DAML+OIL[3], etc., up through the more recent AGNTCY[4] and Agent Communication Protocol[5] stuff, there's a lot "out there".
[1]: http://www.fipa.org/
[2]: https://en.wikipedia.org/wiki/Knowledge_Query_and_Manipulati...
[3]: https://www.w3.org/TR/daml+oil-reference/
2. You get to decide what functionality you want to expose agents to.
3. An API enables reliable tool use.
Holy shit.. NO!
Analysts, digital artists, customer service support and journalists of all levels have already been replaced.
Software engineers (of all levels) are the next knowledge workers to be replaced by agents.
Ultimately I see nothing wrong with replacing everyone, providing the newly generated wealth would be distributed to all, not just the select few "owners" of these things.. we'll see..
(I know, they still get billions in revenue, but maybe that's their curse, too comfy to take it seriously)