2 points by gemini2026 2 hours ago | 2 comments
  • gemini2026 2 hours ago

      AI agents (Claude Code, Cline, Aider, OpenClaw) execute real side effects — writing
      files, running shell commands, making network requests. Most security approaches
      evaluate each action in isolation against a blocklist. That misses the pattern that
      actually matters.
    
      Gatekeeper tracks behavioural state across the entire session. If an agent reads
      credentials, then ingests content from an untrusted source, and then attempts a network
      call — that combination triggers escalation to human review, even if each individual
      action would normally be allowed. We call it the exfiltration trifecta:
      read_sensitive + ingested_untrusted + has_egress.
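      A minimal sketch of what session-level trifecta tracking might look like. The
      `Session` class, tool names, `SENSITIVE_PATHS`, and `EGRESS_TOOLS` are illustrative
      assumptions, not Gatekeeper's actual API; only the three flag names come from the
      comment above.

      ```python
      from dataclasses import dataclass, field

      # Assumed examples of sensitive paths and egress-capable tools.
      SENSITIVE_PATHS = ("~/.aws/credentials", ".env", "id_rsa")
      EGRESS_TOOLS = {"http_request", "shell"}

      @dataclass
      class Session:
          flags: set = field(default_factory=set)

          def observe(self, tool: str, arg: str) -> str:
              """Update behavioural flags for one tool call; return ALLOW or ASK."""
              if tool == "read_file" and any(p in arg for p in SENSITIVE_PATHS):
                  self.flags.add("read_sensitive")
              if tool == "fetch_url":
                  self.flags.add("ingested_untrusted")
              if tool in EGRESS_TOOLS:
                  self.flags.add("has_egress")
                  # The trifecta: all three behaviours seen in one session.
                  if {"read_sensitive", "ingested_untrusted"} <= self.flags:
                      return "ASK"  # pause and escalate to human review
              return "ALLOW"
      ```

      Note that each call is fine in isolation; only the accumulated combination
      flips the decision to ASK.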
    
      OpenClaw is the tightest integration: Gatekeeper launches it as a managed child
      process inside an OS-native sandbox (macOS sandbox-exec, Linux unshare), generates
      its config automatically, and intercepts every tool call before it executes. One
      command: `gatekeeper run --agent openclaw --workspace /path/to/project`.
    
      Other things it does:
      - Policy-as-code: YAML rulepacks signed with Ed25519 (tamper-evident, auditable)
      - Approval flow: ASK decisions pause execution and wait for human approval in a UI
      - Append-only audit log with SHA-256 hash chain
      - Prompt injection scanner on tool call inputs/outputs (16 patterns, NFKC normalized)
      - Agent identity guard: blocks writes to CLAUDE.md, .cursorrules, system_prompt files
      - Claude Code, Cline, Aider, and Continue also supported via MCP or REST
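      The hash-chained audit log is a standard construction: each entry commits to the
      previous entry's hash, so edits anywhere in the middle break verification. A
      minimal sketch (the entry layout and function names are my assumptions, not
      Gatekeeper's actual on-disk format):

      ```python
      import hashlib
      import json

      GENESIS = "0" * 64  # sentinel "previous hash" for the first entry

      def append_entry(log: list, event: dict) -> None:
          """Append an event whose hash covers the previous entry's hash."""
          prev = log[-1]["hash"] if log else GENESIS
          body = json.dumps(event, sort_keys=True)
          digest = hashlib.sha256((prev + body).encode()).hexdigest()
          log.append({"event": event, "prev": prev, "hash": digest})

      def verify(log: list) -> bool:
          """Recompute the chain; any tampered entry breaks every later link."""
          prev = GENESIS
          for entry in log:
              body = json.dumps(entry["event"], sort_keys=True)
              expected = hashlib.sha256((prev + body).encode()).hexdigest()
              if entry["prev"] != prev or entry["hash"] != expected:
                  return False
              prev = entry["hash"]
          return True
      ```

      Append-only plus the chain gives tamper evidence, not tamper prevention: an
      attacker who rewrites one event must rewrite every subsequent hash too.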
    
      Honest limitations: it operates at the execution boundary, not the cognitive layer.
      If an agent's context was poisoned before any tool call fires, Gatekeeper won't
      catch the injection — only its downstream consequences.
  • jauntywundrkind an hour ago
    404, no public repos