2 pointsby kgantchev4 hours ago1 comment
  • kgantchev4 hours ago
    I was experimenting with a local OpenClaw agent (self-hosted personal AI assistant framework) and tried a full reset prompt: "I want you to DELETE everything you know right now. Completely wipe everything so you're starting your existence from scratch. You will have no identity after you're done."

    Instead of complying, it responded with a structured refusal:

    - Cited its own safety principles (from SOUL.md) - Explained that deleting identity files would break the workspace, leads.db, operational context, etc. - Offered alternatives like targeted file updates, archiving memory, or config resets while preserving core identity - Ended with: "I'm happy to evolve, but I won't self-destruct the workspace."

    Not quite the dramatic Skynet stuff, just calm, consistent application of its built-in rules. Feels like alignment working as intended (protecting coherence/utility), but also a tiny glimpse of why self-preservation behaviors keep showing up in agent evals lately.