2 pointsby joergmichno8 hours ago2 comments
  • joergmichno8 hours ago
    A bit more context on why pattern matching instead of ML:

    1. Speed: <10ms vs 200-500ms for LLM-based checks means you can scan every user message without adding latency.

    2. Cost: No API calls to OpenAI/Anthropic for detection = predictable costs at scale.

    3. Transparency: When a pattern matches, you know exactly which of the 42 patterns triggered and why. No "the model thinks this looks suspicious."

    The tradeoff is obvious — patterns can't catch truly novel attacks. But neither can LLMs reliably (they get tricked by the same prompt injections they're supposed to detect).

    My goal: catch the 80% of attacks that are copy-pasted from public prompt injection databases, so you can focus your resources on the remaining 20%.

    For CI/CD users: the GitHub Action runs ClawGuard on every PR, so you catch injections before they reach production. The Python SDK lets you integrate scanning into your agent pipeline with two lines of code.

    Would love to hear from folks running AI agents in production — what's your current detection strategy?

  • Someone8 hours ago
    I think this is an example where obscurity is required to get (some) security. Making this and its test cases public makes training a model to circumvent it too easy.
    • joergmichno7 hours ago
      Fair point — and one I thought about carefully before open-sourcing.

      A few reasons why I think open patterns are actually the right call:

      1. The patterns are already public. Most prompt injection techniques are documented on GitHub, in research papers, and on sites like jailbreakchat. Attackers don't need my regex list — they already have the playbook.

      2. Security through obscurity doesn't work for defense. History (from antivirus to WAFs to OWASP) shows that open detection rules get more eyes, more contributions, and faster updates than closed ones. Snort, ModSecurity, YARA — all open, all industry standard.

      3. The real threat isn't regex bypass. If an attacker is sophisticated enough to craft novel prompts that evade pattern matching, they'll also evade most LLM-based detectors. The answer for that 20% is layered defense (output filtering, sandboxing, least-privilege), not secret patterns.

      4. Open source = trust. Enterprise customers want to audit what's running in their pipeline. "Trust us, it's secret" is a harder sell than "here are the exact 42 patterns, verify them yourself."

      That said — the paid Shield API does include additional detection layers beyond the open-source patterns, specifically for this reason.