Hacker News
What I found reading Claude's leaked 57K-word system prompts
3 points by jbetala7 8 hours ago | 3 comments
nostrademons
8 hours ago
Interesting that they have "IMPORTANT: Assist with defensive security tasks only." twice, once as the very first instruction after telling Claude what it is, and once toward the end.
jbetala7
7 hours ago
Anthropic clearly treats it as the highest-priority constraint.
jbetala7
8 hours ago
https://x.com/jbetala7/status/2016924713168290279?s=20
CamperBob2
8 hours ago
Usually these are just convincing hallucinations. There is no reliable way for an LLM to introspect its system prompt.
jbetala7
8 hours ago
People are actively trying to figure this out, and I’ve seen and tested some reliable approaches from their articles.