https://github.com/spiffy-oss/artguard
Three detection layers:
Privacy posture — catches the gap between what an artifact claims to do with your data and what it actually does (undisclosed writes to disk, covert telemetry, retention mismatches)
Semantic analysis — LLM-powered detection of prompt injection, goal hijacking, and behavioral manipulation buried in instruction content
Static patterns — YARA, credential harvesting, exfiltration endpoint signatures, the usual
Output is a Trust Profile JSON- a structured AI BOM meant to feed policy engines and audit trails, not just spit out a binary safe/unsafe.
The repo is a prompt.md that Claude Code uses to scaffold the entire project autonomously. The prompt is the source of truth. I'm happy to share the actual code too if it's of interest.
Contributions welcome!