1 point by lr001328 6 hours ago | 2 comments
  • lr001328 6 hours ago
    We built a free scanner that checks 7 layers of AI discoverability — llms.txt, JSON-LD structured data, OpenAPI spec, A2A agent cards, health endpoints, robots.txt/sitemap, and whether you have a machine-readable service catalog.

    You enter a URL, it streams results in real time via SSE, and gives you a score out of 100 with specific findings per layer.
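    For anyone curious what probing those layers can look like, here's a minimal sketch in Python. The path list is an assumption inferred from the layers named above, not Clarvia's actual check list.

```python
from urllib.parse import urljoin

# Well-known paths a discoverability scanner might probe.
# Assumed from the layers described above; a real scanner would
# also fetch and score each response.
DISCOVERY_PATHS = [
    "/llms.txt",
    "/openapi.json",
    "/.well-known/agent-card.json",
    "/robots.txt",
    "/sitemap.xml",
]

def candidate_urls(base_url: str) -> list[str]:
    """Resolve each discovery path against the site root."""
    return [urljoin(base_url, path) for path in DISCOVERY_PATHS]
```

    Because the paths are root-relative, urljoin normalizes them against the origin even if the user pastes a deep URL like https://example.com/docs/.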

    Why we built it: 80% of URLs cited by ChatGPT, Perplexity, and Copilot don't rank in Google's top 100 for the same query. AI discovery is a fundamentally different layer from traditional search — and most sites are completely invisible to it.

    Some things we learned building the audit engine:

    - Structured data matters most. Sites with proper JSON-LD schema see measurably higher AI citation rates. Microsoft has confirmed schema markup helps their LLMs.
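    Checking for JSON-LD needs no headless browser, since the blocks live in script tags in the raw HTML. A minimal extractor using only the standard library might look like this (a sketch, not Clarvia's implementation):

```python
import json
from html.parser import HTMLParser

class JSONLDExtractor(HTMLParser):
    """Collects <script type="application/ld+json"> payloads from raw HTML."""

    def __init__(self):
        super().__init__()
        self._buffer = None   # accumulates text inside a JSON-LD script tag
        self.blocks = []      # successfully parsed JSON-LD objects

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._buffer is not None:
            try:
                self.blocks.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # malformed JSON-LD is itself a useful audit finding
            self._buffer = None

def extract_jsonld(html: str) -> list[dict]:
    parser = JSONLDExtractor()
    parser.feed(html)
    return parser.blocks
```

    A scanner would then inspect each block's @type and required properties rather than just counting blocks.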

    - llms.txt is aspirational. We check for it, but we should be honest: no major AI platform has publicly confirmed they read it, and statistical analysis shows no correlation with citation rates. We still think it's worth having as a context primer for dev docs, but it's not the silver bullet people think.
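    The proposed llms.txt format is simple enough to outline in a few lines: an H1 title, an optional blockquote summary, then H2 sections of links. A rough structural check, assuming that proposed shape, could be:

```python
def outline_llms_txt(text: str) -> dict:
    """Pull the title, summary, and section names out of an llms.txt body.
    Shape assumed from the proposed llms.txt convention: one H1 title,
    an optional '>' blockquote summary, and H2 link sections."""
    title, summary, sections = None, None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# ") and title is None:
            title = line[2:]
        elif line.startswith("> ") and summary is None:
            summary = line[2:]
        elif line.startswith("## "):
            sections.append(line[3:])
    return {"title": title, "summary": summary, "sections": sections}
```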

    - AI crawlers don't execute JavaScript. GPTBot, ClaudeBot, PerplexityBot — none of them run JS. If your site is a client-rendered SPA with no SSR, AI agents see an empty page.
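    You can approximate what a non-JS crawler sees by stripping scripts, styles, and tags from the raw response and measuring how much visible text is left. A heuristic sketch (the 200-character threshold is an arbitrary assumption):

```python
import re

def looks_client_rendered(html: str, min_text_chars: int = 200) -> bool:
    """Heuristic: if the raw HTML yields almost no visible text once
    scripts, styles, and tags are removed, the page is likely a
    client-rendered SPA that non-JS crawlers will see as empty."""
    html = re.sub(r"(?is)<(script|style)[^>]*>.*?</\1>", "", html)
    text = re.sub(r"(?s)<[^>]+>", " ", html)
    text = re.sub(r"\s+", " ", text).strip()
    return len(text) < min_text_chars
```

    A typical React shell like <div id="root"></div> plus a script tag fails this check immediately, which is exactly what GPTBot and friends would experience.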

    - The A2A protocol is early but interesting. Google's Agent-to-Agent spec includes agent cards at /.well-known/agent-card.json. Almost nobody has one yet, but the spec exists and crawlers are starting to look for it.
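    A card check can be as simple as fetching that well-known path and validating the JSON shape. Note the field names below are my loose reading of the A2A agent card draft, not a verified schema; consult the spec before relying on them:

```python
import json

# Top-level fields assumed from the A2A agent card draft; treat this
# list as illustrative, not authoritative.
EXPECTED_FIELDS = ("name", "description", "url", "version", "capabilities", "skills")

def audit_agent_card(raw: str) -> dict:
    """Parse a fetched agent-card.json body and report missing fields."""
    try:
        card = json.loads(raw)
    except json.JSONDecodeError:
        return {"valid_json": False, "missing": list(EXPECTED_FIELDS)}
    missing = [f for f in EXPECTED_FIELDS if f not in card]
    return {"valid_json": True, "missing": missing}
```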

    Try it: https://clarvia.dev

  • cjav_dev 6 hours ago
    Nice. I had the same idea: https://github.com/cjavdev/agent-lint

    Super simple OSS tool to run with skills or at the CLI:

      npx @cjavdev/agent-lint https://docs.example.com

    Looking forward to being inspired by your checks!