10keane4 hours ago
great project. think my agent will need it. but then one thing i notice is that this only catches single tool calls. most of the time the malicious behavior is a sequence where each call looks fine on its own: read a file, read another, then a curl to somewhere benign-sounding. individually each one scores low. the arc is the dangerous part and per-call scoring kinda misses that.
macroteam5 hours ago
With the release of opus 4.7 i've been more and more concerned about ai agents, il'' take a look
albertonlyone5 hours ago
Looks really interesting!
- ostelbigger5 hours ago
  i don't know, are the layer deterministic or probabilistic?
  - edoardobambini-5 hours ago
    Hi! they are deterministic
EdoardoIaga5 hours ago
[dead]