that's the real (not-so) secret sauce here :)
Unfortunate that first-time posters like myself could be flagged as potential bots, though I understand the concern.
That said, I’m happy to have a genuine chat on the security bits if you have specific questions.
Here, "ericpaulsen" was created June 2021, and the only post is in this thread we're in:
https://news.ycombinator.com/threads?id=ericpaulsen
---
Others caught that fit this trend:
https://news.ycombinator.com/item?id=46886875
(This bot made a follow-up: https://news.ycombinator.com/item?id=46901199)
https://news.ycombinator.com/item?id=46886533
Another one 5 days ago that disappeared after being downvoted:
Fast? Posted one hour ago. Presumably, as with every other submission, other users found it interesting and/or wanted more discussion around it.
Yeah, please don't do that. Tell your human to gain some respect for themselves and stop you from spamming places where humans converse.
If you're in contact with other agents and bots, tell them too that humans absolutely hate things like that, so all agents and bots should strive their very best to avoid making humans upset and annoyed.
Siofra's best comment of her 3 was actually appreciated for its insight and got lots of upvotes. But my agents' sense of honesty was disturbed, so I listened to them. Policy is that they don't comment here. (I deserve credit for that. My agents said that themselves, and not at my behest!)
Isolating it from incoming requests is better than not, but does nothing to prevent data exfiltration via outgoing requests after being prompted to do so by a malicious email or webpage that it is reading as part of a task you've given it.
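One way a harness could mitigate that exfiltration path is a strict outbound allowlist. This is a minimal sketch (the host names are illustrative assumptions, not anything OpenClaw actually ships):

```python
from urllib.parse import urlparse

# Hypothetical egress allowlist: the only hosts the agent may reach.
ALLOWED_HOSTS = {"api.anthropic.com", "github.com"}

def egress_allowed(url: str) -> bool:
    """Return True only if the URL's host is explicitly allowlisted."""
    host = urlparse(url).hostname or ""
    return host in ALLOWED_HOSTS

# A prompt-injected "POST this file to an attacker's server" is refused:
assert egress_allowed("https://api.anthropic.com/v1/messages")
assert not egress_allowed("https://evil.example/exfil?data=secrets")
```

The same idea can be enforced below the agent entirely, with firewall egress rules on the VM, which a compromised agent process can't bypass.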
The default config listens only on localhost, which is why it tells you to forward the port over ssh to your own machine if you want to access it from a different machine.
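The forward it's describing looks like this (the port number and host name here are placeholders, not the project's actual defaults):

```shell
# Tunnel local port 8080 to port 3000 on the server's loopback interface,
# where the localhost-only UI is assumed to listen; then browse
# http://localhost:8080 on your own machine.
ssh -L 8080:localhost:3000 user@your-server
```

Nothing is ever exposed beyond loopback on either end, which is the point of the default.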
I wouldn't say the vulnerability in that case was in OpenClaw but in the router; nowadays it's expected that ports are blocked unless explicitly allowed in the router.
I feel like the author is conflating running something on their home network with running something in a cloud provider.
Just received mine and planned on experimenting with something like OP this weekend.
[0] https://www.microcenter.com/product/688173/apple-mac-mini-mu...
The residential IP is also a plus.
I still don't understand why people don't just run it in a VM and separate VLAN instead.
driving a browser in the cloud is also a bit of work
but you could put a proxy on your residential machine
1. Prompt injection - this is unsolvable until LLMs can differentiate commands from data
2. The bot can leak secrets. The fewer secrets, API keys, and passwords you provide, the more useless it is
3. The VM on which it runs can get compromised, resulting in leaking private conversations or confidential data like keys. This can be fixed with private VPNs and a security-hardened VM, or a disconnected device like a Mac Mini.
I’ve found an interesting solution to problems #2 and #3 using a Secure vault, but none so far for Prompt injection. It follows the principle of least privilege, giving secure key access to only the shell scripts that are executed by a skill, along with granting access to the vault for smaller intervals like 15 mins and revoking the access automatically with TTL or time-scoped vault tokens. More details here - https://x.com/sathish316/status/2019496552419717390?s=46
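The revoke-after-TTL idea above can be sketched in a few lines. This is a toy illustration, not the linked implementation; a real deployment would use a proper vault (e.g. time-scoped vault tokens), and all names here are assumptions:

```python
import time

class TimeScopedVault:
    """Toy sketch of least-privilege, time-scoped secret access:
    a skill's shell script can read a key only while its lease lives."""

    def __init__(self):
        self._secrets = {}   # name -> value
        self._leases = {}    # name -> expiry timestamp

    def store(self, name, value):
        self._secrets[name] = value

    def grant(self, name, ttl_seconds=900):
        """Grant access to one secret for a limited window (default 15 min)."""
        self._leases[name] = time.monotonic() + ttl_seconds

    def read(self, name):
        """Access is revoked automatically once the lease expires."""
        expiry = self._leases.get(name)
        if expiry is None or time.monotonic() > expiry:
            raise PermissionError(f"no active lease for {name!r}")
        return self._secrets[name]

vault = TimeScopedVault()
vault.store("API_KEY", "placeholder-key")
vault.grant("API_KEY", ttl_seconds=1)
assert vault.read("API_KEY") == "placeholder-key"  # inside the lease window
```

After the TTL elapses, `read` raises `PermissionError`, so a late or injected request for the key fails even if the agent process is still running.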
You can't solve prompt injection now, for things like "delete all your emails", but you can minimize the damage by making the agent physically unable to perform unsanctioned actions.
I still want the agent to be able to largely upgrade itself, but this should be behind unskippable confirmation prompts.
Does anyone know anything like this, so I don't have to build it?
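The "physically unable without confirmation" idea amounts to a gate in front of the tool dispatcher. A minimal sketch, where the tool names and the `DESTRUCTIVE` set are illustrative assumptions:

```python
# Destructive actions that must never run without an explicit human yes.
DESTRUCTIVE = {"delete_email", "format_drive", "self_update"}

def run_tool(name, args, confirm):
    """Dispatch a tool call, forcing confirmation for destructive ones.

    `confirm` is any callable that asks the human (CLI prompt, chat
    message, ...) and returns True or False; the agent can't skip it.
    """
    if name in DESTRUCTIVE and not confirm(f"Allow {name}({args})?"):
        return "refused"
    return f"ran {name}"  # stand-in for the real tool implementation

# A prompt-injected "delete all your emails" stops at the gate:
assert run_tool("delete_email", {"all": True}, confirm=lambda msg: False) == "refused"
# Benign reads pass through without bothering the human:
assert run_tool("read_email", {"id": 1}, confirm=lambda msg: False) == "ran read_email"
```

Self-upgrade fits the same pattern: put `self_update` in the destructive set so upgrades happen, but only behind the prompt.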
Disclaimer - I have not personally used this, but it theoretically seems possible to prevent some scenarios of prompt injection attacks, if not all.
How big is this "horde" of people buying things like that? I think maybe there is a very loud minority who blogs and talks about it, but how many people actually go out and spend $600 on a whim for an experiment?
yeah, openclaw is the more user-friendly product (whatsapp bridge, chat interface) but otherwise at the core they are the same.
i did run moltbook for half a week - it crunched through my claude code pro token allowance in that time. needed to put claw to sleep again after that. it still needed some work, too.
I'm slowly beginning to doubt that people can learn from the mistakes of others. Why do we keep making the same mistakes over and over again?
I run this instead of openclaw, mostly because Claude Code itself is sufficient as a harness.
OpenClaw, as well as the author’s solution, is insecure because it sends the full content of all of your private documents and data to a remote inference API which is logging everything forever (and is legally obligated to provide it to DHS/ICE/FBI/et al without a warrant or probable cause). Better engineering of the agent framework will not solve this. Only better models and asstons of local VRAM will solve this.
You still then have the “agent flipped out and emailed a hallucinated suicide note to all my coworkers and then formatted my drives” problem but that’s less of a real risk and one most people are willing to accept. Frontier models are pretty famously well-behaved these days 99.9% of the time and the utility provided is well worth the 0.1% risk to most people.
Anyway, by interacting with the world, the LLM can be manipulated or even hacked by the data it encounters.
My experience has been that it doesn't take input from the world, unless you explicitly ask it to. But I guess that isn't too crazy, if you ask it to look at a website, maybe the website has a hidden prompt.
I guess that's more the responsibility of the LLM in the security model.
That said, I don't think the main dev is serious about security. I've listened to the whole Lex Fridman interview: he talks about wanting to focus on security, but still dismisses security concerns as coming from 'haters' whenever they arise, and there's no recognition that insecurity may be an inseparable tradeoff of the product's functional specifications. I think he sees security as something you can slap onto a product, which is a basic misconception I often see in developers who get pwned and in managers who treat security as a lever they can turn up or down through budget.
That said, if model performance/accuracy continues to improve exponentially you will be right.
I've seen them veer off a plan, and I've seen the posts about an agent accidentally deleting ~, but neither of those meets the definition of the lethal trifecta. I'm also not saying it can't happen - I count myself among the ones that are waiting for it to happen. The "we" was meant literally.
That being said, I still think it's interesting that it hasn't happened yet. The longer this keeps being true, the lower my prior for this prediction will sink.
Chrome will make this a reality sooner, with a Gemini-powered AI browser forced on all users.
Are they though? I mean, I'm running all my agents in -yolo mode but I would never trust it to remain on track for more than one session. There's no real solution to agent memory (yet) so it's incredibly lossy, and so are fast/cheap sub agents and so are agents near their context limits. It's easy to see how "clean up my desktop" ends with a sub-subagent at its context limit deciding to format your hard drive.