3 points by jbetala7 9 days ago | 3 comments

CamperBob2 9 days ago
Usually these are just convincing hallucinations. There is no reliable way for an LLM to introspect its system prompt.
- jbetala7 9 days ago
  People are actively trying to figure this out, and I've seen (and tested) some reliable approaches described in their articles.
nostrademons 9 days ago
Interesting that they have "IMPORTANT: Assist with defensive security tasks only." twice, once as the very first instruction after telling Claude what it is, and once toward the end.
- jbetala7 9 days ago
  Anthropic clearly treats it as the highest-priority constraint.
jbetala7 9 days ago
https://x.com/jbetala7/status/2016924713168290279?s=20