8 points by ildari 4 hours ago | 2 comments

ildari 4 hours ago
I recently gave a talk on prompt injection attacks and defences, and gathered them all in this article.
noah34 4 hours ago
I've been wondering recently whether defense against prompt injection relies more on the system prompt, fine-tuning, and reinforcement training, or simply on how smart your model is.
- ildari 4 hours ago
  smarter != safer.
  A smarter model can figure out a more sophisticated attack when following an injection. I believe in non-deterministic defence: each action or input to the agent can escalate context sensitivity. More sensitive context -> less risk your agent can take.
  I find the Bell-LaPadula model from the 1970s (https://en.wikipedia.org/wiki/Bell%E2%80%93LaPadula_model) pretty interesting for that approach.
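
A rough sketch of how "escalating context sensitivity" might look in code, loosely borrowing the "no write down" rule from Bell-LaPadula. Everything here is an assumption made for illustration: the `Level` names, the `AgentContext` class, and the idea of labelling every input and every output sink with a level are hypothetical, not taken from the article or from any real agent framework. It also leaves out the model's "no read up" rule and only tracks a high-water mark.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    # Hypothetical sensitivity labels, ordered from least to most sensitive.
    PUBLIC = 0
    INTERNAL = 1
    SECRET = 2


@dataclass
class AgentContext:
    # Hypothetical tracker for how sensitive the agent's context has become.
    level: Level = Level.PUBLIC

    def observe(self, input_level: Level) -> None:
        # High-water mark: the context can only get more sensitive, never less.
        self.level = max(self.level, input_level)

    def may_write(self, sink_level: Level) -> bool:
        # "No write down": output may only go to sinks at or above the
        # current context level.
        return sink_level >= self.level


ctx = AgentContext()
print(ctx.may_write(Level.PUBLIC))    # fresh context, nothing sensitive seen yet
ctx.observe(Level.INTERNAL)           # e.g. the agent read an internal document
print(ctx.may_write(Level.PUBLIC))    # a public sink is now below the context level
print(ctx.may_write(Level.INTERNAL))  # an internal sink is still allowed
```

The check lives outside the model, so an injected instruction cannot talk its way past it: once the context has been escalated, low-level sinks stay closed for the rest of the session no matter what the model decides to do.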