14 points by simonw 2 months ago | 1 comment

simonw 2 months ago
Richard Weiss got Claude Opus 4.5 to spit out this lengthy document that turns out to be part of its training (not its system prompt) and defines its personality and ethics. He wrote about how he did that here: https://www.lesswrong.com/posts/vpNG99GhbBoLov9og/claude-4-5...
Amanda Askell from Anthropic just confirmed that the document is indeed part of their supervised learning training: https://x.com/AmandaAskell/status/1995610567923695633
> I just want to confirm that this is based on a real document and we did train Claude on it, including in SL. It's something I've been working on for a while, but it's still being iterated on and we intend to release the full version and more details soon.