I run a multi-agent system where specialized agents handle different business functions (customer support, code review, deployment monitoring). The key insight: task decomposability determines architecture.
Parallelizable tasks (analyzing independent customer tickets, running separate test suites) show massive gains with independent agents. Sequential workflows (debugging a specific issue that requires following a chain of logic) degrade with coordination overhead.
The "tool-use bottleneck" is real. We hit it around 12-15 tools per agent. The coordination tax becomes severe. Solution: role-based tool access. Support agents get 5 tools, deployment agents get 8, code review agents get 6. Overlap is minimal.
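For anyone who wants to picture the role-based tool access, here is a minimal sketch. The tool names, `ROLE_TOOLS`, `Agent`, and `ToolAccessError` are all made up for illustration; only the per-role counts (5, 8, 6) and the idea of minimal overlap come from the setup described above.

```python
from dataclasses import dataclass

# Hypothetical tool names. Only the counts per role (5 / 8 / 6) mirror the comment.
ROLE_TOOLS = {
    "support": {
        "search_tickets", "reply_ticket", "escalate",
        "lookup_customer", "search_docs",
    },
    "deployment": {
        "get_deploy_status", "tail_logs", "rollback", "query_metrics",
        "page_oncall", "list_releases", "check_health", "search_docs",
    },
    "code_review": {
        "fetch_diff", "run_linter", "run_tests",
        "post_comment", "blame", "approve",
    },
}


class ToolAccessError(PermissionError):
    """Raised when an agent requests a tool outside its role's allowlist."""


@dataclass
class Agent:
    role: str

    def tools(self) -> frozenset:
        # The agent only ever sees its own role's tools.
        return frozenset(ROLE_TOOLS[self.role])

    def call(self, tool: str, registry: dict, **kwargs):
        # Enforce the allowlist before dispatching to the real implementation.
        if tool not in ROLE_TOOLS[self.role]:
            raise ToolAccessError(f"{self.role} agent may not call {tool}")
        return registry[tool](**kwargs)
```

The point of the allowlist is that each agent's prompt only has to describe a handful of tools, which keeps it well below the 12-15 range where things started to degrade for us.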
One counter-intuitive finding: persistent memory per agent beats centralized knowledge. Each agent has AGENTS.md (instructions), TOOLS.md (available actions), and memory/ directory (session logs). Agents learn from their own mistakes without polluting each other's context.
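A rough sketch of what the per-agent layout could look like on disk. The file and directory names (`AGENTS.md`, `TOOLS.md`, `memory/`) are from the description above; the helper names (`init_workspace`, `log_session`, `load_memory`) and the one-file-per-session log format are assumptions for illustration only.

```python
from datetime import datetime, timezone
from pathlib import Path


def init_workspace(root: Path, agent: str) -> Path:
    """Create <root>/<agent>/ with AGENTS.md, TOOLS.md and a memory/ directory."""
    ws = root / agent
    (ws / "memory").mkdir(parents=True, exist_ok=True)
    for name in ("AGENTS.md", "TOOLS.md"):
        (ws / name).touch(exist_ok=True)
    return ws


def log_session(ws: Path, session_id: str, note: str) -> Path:
    """Append a timestamped note to this agent's own session log."""
    path = ws / "memory" / f"{session_id}.md"
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    with path.open("a", encoding="utf-8") as f:
        f.write(f"- {stamp} {note}\n")
    return path


def load_memory(ws: Path) -> str:
    """Read back only this agent's logs; other agents' directories are never touched."""
    logs = sorted((ws / "memory").glob("*.md"))
    return "".join(p.read_text(encoding="utf-8") for p in logs)
```

Because `load_memory` only reads from the agent's own directory, a mistake logged by the support agent never ends up in the code review agent's context.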
The error amplification metric (17.2x for independent vs 4.4x for centralized) explains why we use a hub-and-spoke model with human checkpoints at handoff boundaries.
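To make the hub-and-spoke idea concrete, here is a small sketch in which every handoff goes through the hub and a checkpoint callback. `Hub`, `Handler`, and `Approver` are hypothetical names; in a real system the `approve` callback would block on a human reviewer rather than return immediately.

```python
from typing import Callable

Handler = Callable[[str], str]
# (from_agent, to_agent, payload) -> approved? Stands in for a human reviewer.
Approver = Callable[[str, str, str], bool]


class Hub:
    """Routes work between agents; agents never talk to each other directly."""

    def __init__(self, agents: dict, approve: Approver):
        self.agents = agents
        self.approve = approve
        self.audit = []  # (from, to, approved) for every handoff attempted

    def run(self, route: list, payload: str) -> str:
        prev = "hub"
        for name in route:
            # Checkpoint sits at the handoff boundary, before the next agent runs.
            ok = self.approve(prev, name, payload)
            self.audit.append((prev, name, ok))
            if not ok:
                raise RuntimeError(f"handoff {prev} -> {name} rejected")
            payload = self.agents[name](payload)
            prev = name
        return payload
```

A rejected handoff stops the chain at the boundary, so a bad intermediate result is not passed on to the next agent.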
Documented these patterns at howtoopenclawfordummies.com for anyone building similar systems.
Empirically, the setup I've seen produce the best results across various heterogeneous environments is a top-level orchestrator that calls out to a planning committee, then generates a task DAG from the plan and runs it in parallel where possible. As models evolve, crosstalk may become less of a liability.
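A minimal sketch of the "run the task DAG in parallel where possible" part, using Python's standard `graphlib.TopologicalSorter` (3.9+) and a thread pool. `run_dag`, `run_task`, and the shape of `deps` are assumptions for illustration; the planning committee that would produce the DAG is not shown.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter


def run_dag(deps, run_task, max_workers=4):
    """Run tasks in dependency order, executing independent tasks concurrently.

    deps maps each task name to the set of tasks it depends on.
    run_task(task, inputs) receives the results of the task's dependencies.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the plan is not a DAG
    results = {}
    in_flight = {}  # future -> task name
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Submit everything whose dependencies are already finished.
            for task in sorter.get_ready():
                inputs = {d: results[d] for d in deps.get(task, ())}
                in_flight[pool.submit(run_task, task, inputs)] = task
            # Block until at least one running task completes.
            done, _ = wait(in_flight, return_when=FIRST_COMPLETED)
            for fut in done:
                task = in_flight.pop(fut)
                results[task] = fut.result()
                sorter.done(task)  # unblocks dependents
    return results
```

With a plan like `{"plan": set(), "a": {"plan"}, "b": {"plan"}, "merge": {"a", "b"}}`, tasks `a` and `b` can run at the same time once `plan` finishes, and `merge` waits for both.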
The rest is trash they are forcing down our throats
That's sarcasm
---
Your "direct Gemini calls" is maybe the least impressive
edit: This paper is mostly a sort of "quantitative survey". Nothing to get too excited about, and it should be taken with a grain of salt.
The products they build, which is where the agentic stuff lives, are what I find unimpressive. The quality is low, the UX is bad, and they are forced into every product. Notable examples: search in GCloud, gemini-cli, and antigravity (not theirs technically, $2B whitelabel deal with windsurf iirc).
So yes, I see it as perfectly acceptable to be more skeptical of Google's take on agentic systems when I find their real-world applications lackluster.
The antigravity experiment was, yes, via windsurf. Probably nobody expected it to take off, but it may have been work that surfaced some lessons worth learning from.
There is no clear vision, coherence, or confidence that the products will be around in another year.