by mikeayles 5 hours ago

I benchmarked Claude Code and GitHub Copilot on the same model (Haiku 4.5) with and without RAG-powered semantic search across 60 queries on a real codebase.
RAG didn't make search more accurate on Claude Code, but it cut token consumption by 28%. On Copilot, it cut time to resolution by 44% and improved F1 by 19.5%.
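For readers less familiar with the metrics, here is a minimal sketch of how per-query precision, recall and F1 might be scored. It assumes each query has a hand-labelled set of relevant files and that a tool's answer is the set of files it surfaced. The function names and the file-set framing are hypothetical, not taken from the benchmark repo. It also shows that a figure like "cut token consumption by 28%" is a relative change against the non-RAG baseline. The token counts below are made up purely for illustration.

```python
def score_query(retrieved: set[str], relevant: set[str]) -> tuple[float, float, float]:
    """Return (precision, recall, F1) for one query by comparing file sets (hypothetical scoring)."""
    hits = len(retrieved & relevant)  # files the tool found that are actually relevant
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of precision and recall
    return precision, recall, f1


def relative_change(baseline: float, treatment: float) -> float:
    """Percent change from baseline to treatment; negative means a reduction."""
    return (treatment - baseline) * 100 / baseline


# Hypothetical query: the tool surfaced 4 files, 3 of which are among the 5 relevant ones.
p, r, f1 = score_query({"a.py", "b.py", "c.py", "x.py"},
                       {"a.py", "b.py", "c.py", "d.py", "e.py"})
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.3f}")

# Illustrative token counts only (not benchmark data): 100k tokens without RAG, 72k with it.
print(f"{relative_change(100_000, 72_000):.1f}%")
```

Averaging `score_query` over all queries per tool/configuration would give the kind of aggregate precision, recall and F1 numbers the post compares.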
The bigger finding: with the model held constant, tool design alone accounts for a 30-percentage-point recall gap between the two tools. The benchmark code and data are open source.
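A percentage-point gap is an absolute difference between two rates, not a relative one. A quick sketch with made-up recall values (not taken from the benchmark) shows how far apart the two readings of the same gap can be; the helper names are hypothetical.

```python
def gap_pp(recall_a: float, recall_b: float) -> float:
    """Absolute gap between two recall rates (given as fractions), in percentage points."""
    return (recall_a - recall_b) * 100


def gap_relative(recall_a: float, recall_b: float) -> float:
    """The same gap expressed relative to tool B's recall, in percent."""
    return (recall_a - recall_b) / recall_b * 100


# Hypothetical recalls, chosen only to illustrate the arithmetic.
a, b = 0.80, 0.50
print(f"{gap_pp(a, b):.0f} pp")      # absolute difference between the two rates
print(f"{gap_relative(a, b):.0f}%")  # relative difference against tool B
```

With these illustrative numbers, a 30 pp gap corresponds to a 60% relative difference, which is why the unit matters when comparing tools.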