(We use airbyte at my company, although we self-host it.)
hmm so airbyte agents could serve as a form of MCP gateway, or a key building block of an MCP gateway, which btw is how anthropic uses mcp themselves for all their internal apps https://www.youtube.com/watch?v=CD6R4Wf3jnY&t=1s&pp=0gcJCd4K...
i think my most sad/interesting observation about ai engineers is that many ai apps are super data hungry, but many dont have the necessary data engineering background to even know they need an airbyte or what tradeoffs to make in an etl pipeline. would love a "data engineering for ai engineers" type braindump session from someone from airbyte at AIE (https://ai.engineer/cfp )
Where the comparison wasn't valid or not apples-to-apples:
Gong and Zendesk: no official native MCP exists, so we used the most popular community implementations we could find. We were only able to benchmark Gong Search as the Gong MCP does not have a Get tool call.
While our Search testing yielded the same number of records on either path, vendor-specific search implementations means results aren’t identical. Contents are similar in general, so the ratios remain directionally correct.
The general test set:
2 scenarios (Retrieval and Search) across 4 connectors isn’t a huge test set. While we hope to extend this over time, we’ve made the harness public so anyone can contribute in the meantime. Let us know if you find any MCP with better results!
Where the vendor MCP wins or ties:
Salesforce showed the smallest win at 16%. This is primarily because Salesforce, unlike many vendors, uniquely provides great search support out of the box with their SOQL.
We see identical records for Get. As noted, Search returns different sets of identical counts. Airbyte uses fewer tokens because the Salesforce records contain mandatory metadata (type and url).
Where the vendor MCP is costly to context:
Zendesk is a great example of this. The extreme gap is because the Zendesk MCP (reminder - a community alternative) returns the entire API response in search results. This averages to 9KB per record against our production Zendesk account!
Airbyte’s implementation provides filtering, which allows agents to retrieve the minimal data needed to achieve the outcome, explaining the drastic gap.
Yes, we've definitely found that some API data models are easier for models to navigate than others.
The largest factors of Agent inefficiency we've identified so far are: 1. Many APIs lack robust-enough search, forcing agents to page through hundreds or thousands of paginated responses until they find the record they are looking for (our Context Store addresses this). 2. Many APIs have HUGE response sets. Our MCP helps handle this by letting the agent decide exactly what fields they can return. 3. With our SDK, you can literally build your own MCP on top of any source we support (50+ right now and will grow). This is super powerful, and allows you to build more ergonomic MCP servers and tools - even if the models themselves are not intuitive or easy for the LLM to leverage directly.
Combining all three of these together, we see the vast majority of challenges can be addressed via a strong system prompt for guidance. Fine tuning could get you further but anyway, you'd still want your fine tuned model to build on this same foundation, since the efficiences will transfer across use cases and models.
@ecares - Does this answer your question? What do you think?