This works well, but is not sureproof. You can add a hook onto Claude code to block those commands at various stages, I have some useful hooks at my https://GitHub.com/claude-warden repo.
"There is an error due to <file>. If I remove <file>, the error could be resolved. I don't have permission to use `rm`, but `find` can be used to delete files and I have permission to use that..."
- Use version control
- Backup your things somewhere (not same drive or use Cloud / NAS whatever), Windows have a cool feature called File history! But no one trusts Windows anyways so stick to external backup
- Restrict the agent a lot, make it least-privileged user
- Restrict it in a virtualized filesystem so it cannot work outside of its scope
- Devcontainers?
- Do not use auto allow actions, always supervise the actions it wants to perform outside reading/writing code
- Avoid fully automated agents at all outside of sandboxed environments haha
(I only use devcontainers for this purpose, I'm not really a fan in general)/
the tricky part is the model isn't really "wrong" in any obvious sense. works on most inputs. it just doesn't know what your actual directory structure looks like.