Show HN: Score your GitHub repo for AI coding agents (twill.ai)

4 points by danoandco 6 hours ago | 2 comments

danoandco 6 hours ago
OpenAI published an article and demo for scoring how well AI agents can work in a codebase (https://openai.com/index/harness-engineering/, https://www.youtube.com/watch?v=rhsSqr0jdFw). We turned it into a free tool anyone can use.
Paste any public GitHub repo (or connect a private one) and get a live score across seven dimensions: bootstrap setup, task entry points, test harnesses, lint gates, agent docs, structured documentation, and decision records. It clones the repo, runs static analysis, and scores each dimension 0-3 with evidence pulled from actual files. Takes about 60 seconds.
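To make the scoring idea concrete, here is a minimal sketch of a file-presence check across the seven dimensions. It is not the tool's actual rubric: the marker file names in `DIMENSION_MARKERS` and the "one point per marker found, capped at 3" rule are assumptions chosen for illustration, and `score_repo` is a hypothetical helper.

```python
from pathlib import Path

# Hypothetical marker files/directories per dimension (assumption: the real
# rubric is not published in this post, so these are illustrative only).
DIMENSION_MARKERS = {
    "bootstrap setup": ["setup.sh", "Dockerfile", ".devcontainer/devcontainer.json"],
    "task entry points": ["Makefile", "package.json", "justfile"],
    "test harnesses": ["tests", "test", "pytest.ini"],
    "lint gates": [".pre-commit-config.yaml", ".eslintrc.json", "ruff.toml"],
    "agent docs": ["AGENTS.md", "CLAUDE.md", ".cursorrules"],
    "structured documentation": ["docs", "README.md", "CONTRIBUTING.md"],
    "decision records": ["docs/adr", "docs/decisions", "adr"],
}


def score_repo(root):
    """Score a cloned repo 0-3 per dimension, keeping the matched paths as evidence."""
    root = Path(root)
    report = {}
    for dimension, markers in DIMENSION_MARKERS.items():
        # Evidence is the list of marker paths that actually exist in the repo.
        evidence = [m for m in markers if (root / m).exists()]
        # Assumed scoring rule: one point per marker found, capped at 3.
        report[dimension] = {"score": min(len(evidence), 3), "evidence": evidence}
    return report
```

Calling `score_repo("path/to/clone")` returns a dict keyed by dimension, each with a `score` and the `evidence` paths that produced it. A real scorer would presumably inspect file contents too, not just existence.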
Some repos we scored:
PostHog: https://twill.ai/score/fd033516-628b-4c7c-8db6-d84e3f2737ba
Supabase: https://twill.ai/score/b2825715-6c3d-4de1-a21b-fc5d9b17103b
Codex: https://twill.ai/score/d7372d95-0501-4ad3-ae90-8f112ccafee0
The pattern we keep seeing: most repos lose points on agent-specific docs and decision records. Everything else tends to be decent.
We built this scorecard as a free tool because agent performance is bounded by repo structure, not just model quality.
Would love to hear what scores people get, and whether the rubric is missing anything.
RoxaneFischer1 4 hours ago
not sure about the decision records. seems ideal but no one does that in practice
- danoandco 4 hours ago
  true, i think the key thing is explaining somewhere in the repo "why" something was done. like the rationale for choosing X over Y service for instance.
  maybe this record is just the git log, and the agent just needs to access the git log.
  we'll see how that matures over time