3 points by jbetala7 8 hours ago | 3 comments
  • nostrademons 8 hours ago
    Interesting that they have "IMPORTANT: Assist with defensive security tasks only." twice, once as the very first instruction after telling Claude what it is, and once toward the end.
    • jbetala7 7 hours ago
      Anthropic clearly treats it as the highest priority constraint.
  • CamperBob2 8 hours ago
    Usually these are just convincing hallucinations. There is no reliable way for an LLM to introspect its system prompt.
    • jbetala7 8 hours ago
      People are actively trying to figure this out, and I’ve seen—and tested—some reliable approaches from their articles.