3 points by jbetala7 9 days ago | 3 comments
  • CamperBob2 9 days ago
    Usually these are just convincing hallucinations. There is no reliable way for an LLM to introspect its system prompt.
    • jbetala7 9 days ago
      People are actively trying to figure this out, and I’ve seen—and tested—some reliable approaches from their articles.
  • nostrademons 9 days ago
    Interesting that they have "IMPORTANT: Assist with defensive security tasks only." twice, once as the very first instruction after telling Claude what it is, and once toward the end.
    • jbetala7 9 days ago
      Anthropic clearly treats it as the highest-priority constraint.