1 pointby agentseal8 hours ago2 comments
  • rodchalski2 hours ago
    Good coverage on the input-side attack surface. One category that's harder to probe statically: what the agent does with its tool call authority once it's running.

    Prompt injection can hijack the agent's reasoning, but the real damage happens when the agent then calls a tool it shouldn't — deletes a file, exfiltrates data, escalates its own permissions. The probe finds the injection vector; it doesn't tell you whether your authorization layer would have stopped what happened next.

    150 probes is solid for "can the agent be manipulated?" Still leaves open "once manipulated, can it cause real harm?" — which depends on what the tool boundary looks like.

    Curious if you've thought about probing tool-call authorization specifically. What scope do your injected prompts try to reach for?

  • agentseal8 hours ago
    I built AgentSeal to answer a simple question: can your AI agent be hacked?

      It sends 150+ attack probes (prompt extraction, injection, persona hijacking, encoding tricks, etc.) at your agent and gives you a trust score from 0-100 with specific fix recommendations.
    
      Key points:
      - Works with OpenAI, Anthropic, Ollama, Vercel AI SDK, LangChain, or any HTTP endpoint
      - Deterministic detection (no AI judge) — same scan twice = same results
      - Python: pip install agentseal && agentseal scan --prompt "..." --model gpt-4o
      - JS/TS: npx agentseal scan --prompt "..." --model gpt-4o
      - CI-friendly: --min-score 75 exits with code 1 if below threshold
    
      The core scanner (150 probes + adaptive mutations) is free and open source. Pro adds MCP tool poisoning, RAG poisoning, and behavioral genome mapping.
    
      GitHub: https://github.com/AgentSeal/agentseal
      Website: https://agentseal.org
    
      I'd love feedback on the probe coverage and detection approach. What attacks are we missing?