by mikeayles, 5 hours ago
    I benchmarked Claude Code and GitHub Copilot on the same model (Haiku 4.5) with and without RAG-powered semantic search across 60 queries on a real codebase.

    RAG didn't make search more accurate on Claude Code, but it cut token consumption by 28%. On Copilot, it cut time to resolution by 44% and improved F1 by 19.5%.

    The bigger finding: with the model held constant, tool design alone accounts for a 30-percentage-point (pp) recall gap between the two tools. The benchmark code and data are open source.
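    The results mix two kinds of numbers: relative changes (the 28% token reduction, the 44% drop in time to resolution) and an absolute recall gap in percentage points. The following is a minimal sketch of how such metrics are conventionally computed. It assumes set-based, per-query scoring of retrieved files against an expected set. It is not taken from the benchmark's own code, and the function names are hypothetical.

    ```python
    from typing import Set, Tuple


    def precision_recall_f1(retrieved: Set[str], relevant: Set[str]) -> Tuple[float, float, float]:
        """Set-based precision, recall and F1 for a single query (hypothetical scoring scheme)."""
        hits = len(retrieved & relevant)  # files that were both returned and expected
        precision = hits / len(retrieved) if retrieved else 0.0
        recall = hits / len(relevant) if relevant else 0.0
        if precision + recall == 0:
            return precision, recall, 0.0
        f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of P and R
        return precision, recall, f1


    def relative_change(before: float, after: float) -> float:
        """Relative change in percent, e.g. 100 -> 72 tokens is -28%."""
        return (after - before) * 100 / before


    def pp_difference(a: float, b: float) -> float:
        """Absolute gap between two rates in [0, 1], expressed in percentage points."""
        return a * 100 - b * 100


    # Hypothetical query: 3 files returned, 4 expected, 2 overlap.
    retrieved = {"src/auth.py", "src/db.py", "README.md"}
    relevant = {"src/auth.py", "src/db.py", "src/session.py", "tests/test_auth.py"}
    print(precision_recall_f1(retrieved, relevant))  # P = 2/3, R = 1/2, F1 = 4/7
    print(relative_change(100, 72))
    print(pp_difference(0.8, 0.5))
    ```

    The distinction matters when reading a gap: with hypothetical recalls of 0.8 and 0.5, the difference is 30 pp, but the relative difference is 60%.
    
    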