https://github.com/Dicklesworthstone/misc_coding_agent_tips_...
You be the judge:
You wrote a markdown file. Shut up.
My analysis: https://x.com/theo/status/2006474140082122755
What do I mean by "it doesn't work"? Well, Claude Code is really good at executing things in unusual ways when it needs to, and this plugin is trying to parse shell to catch them.
When Claude Code has trouble running a bash command, it will sometimes say something like "The current environment is wonky, let's put it in a file and run that", then use the edit tool to create 'tmp.sh' and run 'bash tmp.sh'. This plugin would allow that, which obviously lets Claude run anything.
I've also had Claude reach for awk '{system(...)}', which this plugin doesn't prevent, among others. A blacklist of "unix commands which can execute arbitrary code" is doomed to failure because there are just so many ways out there to do so.
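For a sense of how cheap the escapes are, here's a benign stand-in for the awk case (the payload is just an echo; find -exec, perl -e, python -c, make targets, git hooks, and plenty more offer the same escape hatch):

    # awk shells out via system(), so a blacklist keyed on "dangerous" binaries misses it:
    awk 'BEGIN { system("echo this could have been rm -rf") }'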
Preventing destructive operations, like `rm -rf ~/`, is much more easily handled by running the agent in a container with only the code mounted into it, and then frequently committing changes and pushing them out of the container so that the agent can't delete its work history either.
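A rough sketch of that setup (the image, mount paths, and push cadence below are placeholders, not a specific recommendation):

    # The agent works inside a throwaway container with only the project mounted:
    docker run --rm -it -v "$PWD:/work" -w /work node:22 bash
    # Meanwhile, from the host, keep pushing whatever the agent has committed so
    # its work history survives anything it does inside the container:
    while sleep 300; do git -C "$PWD" push origin HEAD; done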
Half-measures, like trying to parse shell commands and flags, are just going to lead to the agent hitting a wall and looping into doing weird things (making it more likely to really screw things up), as opposed to something like containers or VMs, which are easy to use and actually work.
Name-based filtering is just as easy to dodge; this snippet, for example, gets hold of compile via a traceback frame's builtins without the string "compile" ever appearing in the source (wrapped in a function here so the return statement is valid):

    def hidden_compile(code, filename="<string>"):
        name_parts = ("com", "pile")
        name = "".join(name_parts)              # "compile", assembled at runtime
        try:
            raise RuntimeError
        except RuntimeError as exc:
            frame = exc.__traceback__.tb_frame
            builtins_dict = frame.f_builtins    # reach the builtins via the frame
            parser_fn = builtins_dict[name]
            flag = 1 << 10                      # ast.PyCF_ONLY_AST
            return parser_fn(code, filename, "exec", flags=flag, dont_inherit=True, optimize=0)
https://github.com/microsoft/vscode/issues/283430

I kept telling it to debug the problem, and that I had confirmed the database file was not the problem. It kept trying to rm the file after it noticed the code would recreate it (although with no data, just an empty db). I thought we had gotten past this debate, until I wasn't paying enough attention and it added an "rm db.sqlite" line to the Makefile and ran it, since I had given it permission to run "make" and didn't even consider that it would edit the Makefile to get around my instructions.
Probably a better way to do this would be: if it detects a destructive edit, block it and switch Claude out of any auto-accept mode until the user re-engages it. If the model mostly doesn't realize there is a filter at all until it's blocked, it won't know to work around it until it has kicked the issue up to the user, who can prevent that and give it some strongly worded feedback. Just don't give it second and third tries to execute the destructive operation.
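A minimal sketch of the blocking half of that idea, assuming Claude Code's hook convention that a PreToolUse hook receives the tool call as JSON on stdin and that exiting with status 2 blocks the call; the tool_input.command field and the patterns below are assumptions for illustration, and actually toggling auto-accept would need support from the harness itself:

    #!/usr/bin/env bash
    # Hypothetical PreToolUse hook: refuse destructive-looking Bash commands
    # and give the agent no hint about how to work around the refusal.
    payload=$(cat)                                    # hook input arrives as JSON on stdin
    cmd=$(printf '%s' "$payload" | jq -r '.tool_input.command // empty')
    case "$cmd" in
      *"rm -rf"*|*"git checkout --"*|*"git restore"*)
        echo "Blocked a potentially destructive command; check with the user before retrying." >&2
        exit 2                                        # block the tool call
        ;;
    esac
    exit 0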
Not as good as giving it a checkpointed container to trash at its leisure though obviously.
I created the feature request for hooks so I could build an integrated governance capability.
I don’t quite think the real use cases for hooks have materialized yet. They will, through a couple more maturity phases. That might seem paradoxical alongside “the models will just get better”, but that's exactly why we have to be hooked into the mech suits: they'll end up doing more involved things.
But I do pitch my initial, primitive solution as “an early warning system” at best when used for security, and more so as an actual way (OPA/Rego) to institute your own policies:
FWIW I think your approach is great. I had definitely thought about leveraging OPA in a mature way; I think this kind of thing is very appealing for platform engineers looking to scale AI codegen in enterprises.
I've had a lot of fun with random/creative hooks use cases: https://github.com/backnotprop/plannotator
I don't think the team meant for hooks to work with plan mode this way (it's not fully complete with the approve/allow payload), but it enabled me to build an interactive UX I really wanted.
https://gist.github.com/fragmede/96f35225c29cf8790f10b1668b8...
It will just as easily get around it by running it as a bash command, or in any number of other ways.
I experimented with hooks a lot over the summer: these kinds of deterministic hooks that run before commit, after tool call, after edit, etc. I found they are much more effective if you are (unsurprisingly) able to craft and deliver a concise, helpful error message to the agent in the hook failure feedback. Even giving it a good howToFix string in the error return isn't enough; if you flood the response with too many of those at once, the agent will view the task as insurmountable and start seeking workarounds instead.
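As a sketch of that "one concise failure at a time" idea (make lint, the message text, and the howToFix wording are placeholders, not part of any fixed hook schema):

    #!/usr/bin/env bash
    # Hypothetical post-edit hook: run a check, then feed the agent only the
    # first failure plus one short hint rather than the entire log.
    out=$(make lint 2>&1)
    if [ $? -ne 0 ]; then
      first=$(printf '%s\n' "$out" | head -n 1)
      echo "lint failed: $first" >&2
      echo "howToFix: address only this finding, then re-run make lint" >&2
      exit 2
    fi
    exit 0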
LLM's do not "understand why." They do not have an "inkling."
Claiming they do is anthropomorphizing a statistical token (text) document generator algorithm.
100% - we really shouldn't anthropomorphize. But the current models are capable of being trained in a way to steer agentic behavior from reasoned token generation.
This does not appear to be sufficient in the current state, as described in the project's README.md:
    Why This Exists

    We learned the hard way that instructions aren't enough to
    keep AI agents in check. After Claude Code silently wiped
    out hours of progress with a single rm -rf ~/ or git
    checkout --, it became evident that "soft" rules in a
    CLAUDE.md or AGENTS.md file cannot replace hard technical
    constraints. The current approach is to use a dedicated
    hook to programmatically prevent agents from running
    destructive commands.
Perhaps one day this category of plugin will not be needed. Until then, I would be hard-pressed to employ an LLM-based product having destructive filesystem capabilities based solely on the hope of them "being trained in a way to steer agentic behavior from reasoned token generation."

    bwrap --ro-bind /{,} --dev /dev --proc /proc --tmpfs /run --tmpfs /tmp \
      --tmpfs /var/tmp --tmpfs ${HOME} --ro-bind ${HOME}/.nix-profile{,} \
      --unshare-all --die-with-parent --tmpfs ${XDG_RUNTIME_DIR} \
      --ro-bind /run/systemd/resolve/stub-resolv.conf{,} --share-net \
      --bind ${HOME}/.config/claude-code{,} \
      --overlay-src ${HOME}/.cache/go --tmp-overlay ${HOME}/.cache/go \
      --bind ${PWD}{,} --ro-bind ${PWD}/.git{,} \
      -- env SHELL=/bin/bash CLAUDE_CONFIG_DIR=${HOME}/.config/claude-code =claude

    alias dr='docker run --rm -it -v "$PWD:$PWD" -w "$PWD"'
    alias dr-claude='dr -v ~/.claude:/root/.claude -v ~/.claude.json:/root/.claude.json claude'

1 - https://news.ycombinator.com/item?id=45766478
2 - http://github.com/ashishb/amazing-sandbox

I really struggle to understand how this isn't common best practice at this point.
Especially when it comes to agents and anything node-related.
Claude is distributed as an npm global, so doubly true.
Takes about 5 minutes to set this up.
It’s still WIP but the core sandbox works. Feedback greatly appreciated: https://github.com/corv89/shannot
Just today Claude decided to do a git restore on me, blowing away local changes, despite having strict instructions to do nothing with git except to use it to look at history and branches.
Why jump to the conclusion that the person is so incompetent with no evidence?
The first one has four important phrases: “negative correlation,” “mediated by increased cognitive offloading,” “higher educational attainment was associated with better critical thinking skills, regardless of AI usage,” and “potential costs.”
The second paper has two: “students using GenAI tools score on average 6.71 (out of 100) points lower than non-users,” and “suggesting an effect whereby GenAI tool usage hinders learning.”
I ask you, sir, where exactly do you get “AI over-reliance will make us worse…because it’s true” from TWO studies that go out of their way to make it clear there is no causative link, only correlation; that point out significant mediations of the effect; that identify only potential costs; and that show only half a letter grade of difference, which, when you’re dealing with students, could be down to all sorts of things. Not to mention we’re dealing with one preprint and some truly confounding study design.
If you don’t understand research methods, please stop presenting papers as if they are empirical authorities on truth.
It diminishes trust in real academic work.
Skill entropy is the result of relying on tools to perform tasks that would otherwise contribute to and/or reinforce a person's mastery of those skills. Without exercising one's acquired learning, skills can quickly fade.
For example, an argument can be made that spellcheckers commonly available in programs degrade people's ability to spell correctly without this assistance (such as when using pen and paper).
Just containerize Claude.
How is this not common practice already?
Are people really ok with a third-party agent running out of their home directory, executing arbitrary commands on their behalf?
Pure insanity.
The problem seems to come when it’s stuck in a debug death loop with full permissions.