7 points by edoardobambini 5 hours ago | 4 comments
  • 10keane 4 hours ago
    great project. think my agent will need it. but then one thing i notice is that this only catches single tool calls. most of the time the malicious behavior is a sequence where each call looks fine on its own: read a file, read another, then a curl to somewhere benign-sounding. individually each one scores low. the arc is the dangerous part and per-call scoring kinda misses that.
  • macroteam 5 hours ago
    With the release of Opus 4.7 I've been more and more concerned about AI agents, I'll take a look
  • albertonlyone 5 hours ago
    Looks really interesting!
    • ostelbigger 5 hours ago
      i don't know, are the layers deterministic or probabilistic?
  • EdoardoIaga 5 hours ago
    [dead]
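
To illustrate 10keane's point about per-call scoring missing a risky sequence, here is a minimal sketch. It is not the project's actual method: the tool names, risk values, thresholds, and the noisy-OR combination rule are all assumptions made up for illustration.

```python
# Hypothetical per-call risk scores; the numbers are invented for this sketch.
CALL_RISK = {"read_file": 0.2, "curl": 0.3, "list_dir": 0.1}

PER_CALL_THRESHOLD = 0.5   # assumed cutoff for flagging a single call
SEQUENCE_THRESHOLD = 0.5   # assumed cutoff for flagging a whole session


def per_call_flags(calls):
    """Return one flag per call, judging each call in isolation."""
    return [CALL_RISK.get(name, 0.0) >= PER_CALL_THRESHOLD for name in calls]


def sequence_risk(calls):
    """Combine per-call risks with a noisy-OR: 1 - prod(1 - r_i).

    This treats calls as independent and ignores their order, so it only
    captures accumulation, not the specific shape of the sequence.
    """
    p_all_safe = 1.0
    for name in calls:
        p_all_safe *= 1.0 - CALL_RISK.get(name, 0.0)
    return 1.0 - p_all_safe


# The sequence from the comment: read a file, read another, then a curl.
session = ["read_file", "read_file", "curl"]

print(per_call_flags(session))                       # no single call is flagged
print(sequence_risk(session) >= SEQUENCE_THRESHOLD)  # the session as a whole is
```

A real sequence-aware check would probably also care about ordering (for example, network egress only after sensitive reads), which this order-insensitive sketch does not model.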