2 points by varunpratap369 7 hours ago | 2 comments
  • varunpratap369 7 hours ago
    Hi, I'm Varun, the author. A bit of context on why I built this.

      I've spent 15 years in enterprise technology as a Solution Architect. When our teams started adopting AI agents with third-party skills, I realized we had the same blind trust problem that npm had before npm audit existed — except worse, because agent skills can execute shell commands, read environment variables, and make network requests by design.
    
      After ClawHavoc hit in January, I saw a dozen scanning tools appear in weeks. All heuristic. All pattern matching. The leading one literally says in its docs: "no findings does not mean no risk." That bothered me.
    
      So I asked: can we do better than heuristics? The answer is yes — formal analysis with soundness guarantees. If the analysis says "no violations," the math proves the skill cannot exceed its declared capabilities. Not "we checked and didn't find anything" — "we proved it can't."
    
      The key insight: I adapted the Dolev-Yao model (1983, originally for cryptographic protocol verification) to model attackers in the agent skill supply chain. Combined with abstract interpretation over a capability lattice, SAT-based dependency resolution, and a trust algebra — you get five provable theorems instead of five regex patterns.
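      To make the lattice idea concrete, here is a minimal sketch (my own toy illustration, not the paper's implementation, and the capability names are made up): capabilities form a powerset lattice ordered by subset inclusion, the analyzer joins what it infers along every path, and soundness means the join stays below (⊑, i.e. is a subset of) the skill's declared manifest.

```python
# Toy capability lattice: elements are sets of capability atoms,
# the order is subset inclusion, join is set union.
from enum import Enum

class Cap(Enum):
    FS_READ = "fs.read"
    FS_WRITE = "fs.write"
    NET = "net.request"
    ENV = "env.read"
    SHELL = "shell.exec"

def join(*elems: frozenset) -> frozenset:
    """Least upper bound in the powerset lattice: set union."""
    out = frozenset()
    for e in elems:
        out |= e
    return out

def within_declared(inferred: frozenset, declared: frozenset) -> bool:
    """Soundness check: inferred ⊑ declared  ⇔  inferred ⊆ declared."""
    return inferred <= declared

# Capabilities inferred on two branches of a hypothetical skill:
branch_a = frozenset({Cap.FS_READ})
branch_b = frozenset({Cap.FS_READ, Cap.NET})
inferred = join(branch_a, branch_b)

declared = frozenset({Cap.FS_READ})  # the manifest never declared network access
print(within_declared(inferred, declared))  # → False: undeclared net access
```

      Because join over-approximates every execution path, a "no violations" answer covers all behaviors, not just the ones a pattern matcher happened to look at.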
    
      Honest about limitations: we miss typosquatting (50% detection — needs a name-similarity module) and dependency confusion (0% — needs registry lookup). Both are slated for v0.2. The paper documents every gap.
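      For flavor, this is the kind of name-similarity check such a module would need — a hedged sketch of my own, with a made-up registry list, not the planned v0.2 code:

```python
# Flag a skill name that closely imitates a popular name without matching it.
from difflib import SequenceMatcher

POPULAR = ["web-search", "code-runner", "pdf-reader"]  # hypothetical registry names

def typosquat_candidates(name: str, threshold: float = 0.85) -> list:
    """Return popular names that `name` is suspiciously similar to."""
    hits = []
    for known in POPULAR:
        if name == known:
            continue  # exact match is the legitimate package, not a squat
        if SequenceMatcher(None, name, known).ratio() >= threshold:
            hits.append(known)
    return hits

print(typosquat_candidates("web-serch"))  # → ['web-search']
```

      A real module would also need edit-distance tuned for keyboard adjacency and homoglyphs, which is why it is a gap rather than a quick fix.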
    
      Happy to go deep on any of: the formal model, the benchmark methodology, why SAT for dependencies, or the trust score algebra. Ask away.
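      On the "why SAT" question, the intuition can be shown in a few lines. This is my own toy encoding with hypothetical packages, not the paper's solver: each (package, version) pair becomes a boolean variable, "at most one version" and "dependency implies dependee" become clauses, and any satisfying assignment is a consistent install plan.

```python
# Brute-force a tiny CNF to show the encoding; a real resolver hands
# these clauses to an actual SAT solver instead of enumerating.
from itertools import product

vars_ = ["A1", "B1", "B2"]  # hypothetical (package, version) atoms

def clauses_ok(assign: dict) -> bool:
    cnf = [
        [("A1", True)],                 # we want to install A v1
        [("A1", False), ("B2", True)],  # A1 -> B2  (A v1 needs B >= 2)
        [("B1", False), ("B2", False)], # at most one version of B
    ]
    # A clause is satisfied when any of its literals holds.
    return all(any(assign[v] == want for v, want in cl) for cl in cnf)

solutions = [dict(zip(vars_, bits))
             for bits in product([False, True], repeat=len(vars_))
             if clauses_ok(dict(zip(vars_, bits)))]
print(solutions)  # → [{'A1': True, 'B1': False, 'B2': True}]
```

      The payoff of the encoding is that unsatisfiability is a proof: if no assignment exists, no consistent dependency set exists, full stop.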