8 points by ildari 4 hours ago | 2 comments
  • ildari 4 hours ago
    I recently gave a talk on prompt injection attacks and defences and gathered them all in this article.
  • noah34 4 hours ago
    i've been wondering recently if defense against prompt injection is more reliant on system prompt + fine-tuning and reinforcement training, or if it is simply how smart your model is.
    • ildari 4 hours ago
      smarter != safer.

      A smarter model can figure out a more sophisticated attack when it follows an injection. I believe in non-deterministic defence: each action or input to the agent can escalate the context's sensitivity. More sensitive context -> less risk your agent can take.

      I find the Bell-LaPadula model from the 1970s (https://en.wikipedia.org/wiki/Bell%E2%80%93LaPadula_model) pretty interesting for that approach.
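
      A minimal sketch of how "more sensitive context -> less risk" might look in code, loosely modelled on Bell-LaPadula's "no write down" rule with a high-water mark. The names here (Sensitivity, SINK_LEVEL, AgentContext) and the three example actions are hypothetical and for illustration only; this is not the code from the article or any particular library.

```python
from dataclasses import dataclass
from enum import IntEnum


class Sensitivity(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    SECRET = 2


# Hypothetical policy table: the classification of the place each action
# writes to. An action is a "write down" if the context is more sensitive
# than its sink.
SINK_LEVEL = {
    "send_email": Sensitivity.PUBLIC,            # leaves the trust boundary
    "write_internal_note": Sensitivity.INTERNAL,
    "summarize": Sensitivity.SECRET,             # stays inside the session
}


@dataclass
class AgentContext:
    level: Sensitivity = Sensitivity.PUBLIC

    def ingest(self, label: Sensitivity) -> None:
        # High-water mark: reading labelled data can only raise the level.
        self.level = max(self.level, label)

    def allowed(self, action: str) -> bool:
        # Deny unknown actions; otherwise forbid writing to a sink that is
        # less sensitive than what the context has already seen.
        sink = SINK_LEVEL.get(action)
        return sink is not None and self.level <= sink


ctx = AgentContext()
can_email_before = ctx.allowed("send_email")   # context is still PUBLIC
ctx.ingest(Sensitivity.SECRET)                 # agent reads a secret document
can_email_after = ctx.allowed("send_email")    # would now be a write down
```

      The check runs outside the model, so an injected instruction cannot argue its way past it; the trade-off is that the agent loses capabilities for the rest of the session once it has touched sensitive data.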