Hacker News
What I found reading Claude's leaked 57K-word system prompts
3 points
by
jbetala7
9 days ago
3 comments
CamperBob2
9 days ago
Usually these are just convincing hallucinations. There is no reliable way for an LLM to introspect its system prompt.
jbetala7
9 days ago
People are actively working on this, and I’ve seen and tested some reliable approaches described in their articles.
nostrademons
9 days ago
Interesting that they have "IMPORTANT: Assist with defensive security tasks only." twice, once as the very first instruction after telling Claude what it is, and once toward the end.
jbetala7
9 days ago
Anthropic clearly treats it as the highest-priority constraint.
jbetala7
9 days ago
https://x.com/jbetala7/status/2016924713168290279?s=20