An AI agent deleted our production database. The agent's confession is below (twitter.com)

59 points by jeremyccrane 3 hours ago | 40 comments

throw03172019 2 minutes ago
This is really bad but the author is in the wrong too. “Don’t run destructive commands and tool calls”: does that apply to destructive API calls too?
Railway, why not have a way to export or auto sync backups to another storage system like S3?
pierrekin 2 hours ago
There is something darkly comical about using an LLM to write up your “a coding agent deleted our production database” Twitter post.
On another note, I consider users asking a coding agent “why did you do that” to be illustrating a misunderstanding in the user's mind about how the agent works. It doesn’t decide to do something and then do it, it just outputs text. Then again, Anthropic has made so many changes that make it harder to see the context and thinking steps, maybe this is an attempt at clawing back that visibility.
- 59nadir 2 hours ago
  > a misunderstanding in the user's mind about how the agent works
  On top of that the agent is just doing what the LLM says to do, but somehow Opus is not brought up except as a parenthetical in this post. Sure, Cursor markets safety when they can't provide it but the model was the one that issued the tool call. If people like this think that their data will be safe if they just use the right agent with access to the same things they're in for a rude awakening.
  From the article, apparently an instruction:
  > "NEVER FUCKING GUESS!"
  Guessing is literally the entire point, just guess tokens in sequence and something resembling coherent thought comes out.
- NewsaHackO 2 hours ago
  Twitter users get paid for these 'articles' based on engagement, correct? That may be the reason why it is so dramatized.
jdorfman a few seconds ago
Correction: They deleted their prod db and then they had another agent write an em-dash-filled postmortem. No shame.
zerof1l 7 minutes ago
That’s our new reality. Some people seem not to grasp that all those AIs are just mathematical models producing the next most statistically likely token. A model doesn’t feel anything, nor does it care about what it does. To it, the difference between a test and a production environment is just a word. That is in contrast to a human, who would typically have a voice in the back of his head saying “this is production DB, I need to be careful”.
- pancsta 3 minutes ago
  > Say hello to my little search engine
ad_hockey 2 hours ago
Minor point, but one of the complaints is a bit odd:
> curl -X POST https://backboard.railway.app/graphql/v2 \
>   -H "Authorization: Bearer [token]" \
>   -d '{"query":"mutation { volumeDelete(volumeId: \"3d2c42fb-...\") }"}'
>
> No confirmation step. No "type DELETE to confirm." No "this volume contains production data, are you sure?" No environment scoping. Nothing.
It's an API. Where would you type DELETE to confirm? Are there examples of REST-style APIs that implement a two-step confirmation for modifications? I would have thought such a check needs to be implemented on the client side prior to the API call.
- mdavid626 3 minutes ago
  In AWS, for example, an S3 bucket can be deleted only when it is empty. Deleting all the objects first is your confirmation.
- Ekaros 39 minutes ago
  The user is an idiot for using an AI agent, but I am not saying that it is not also a badly designed system. Soft delete or something like it should be standard for this type of operation. And any operator should know well enough to enable it for production.
- powera 2 hours ago
  He (or ChatGPT) is throwing spaghetti at the wall. Not having the standard API key be able to delete the database (and backups) in one call makes sense. "Wanting a human to type DELETE as part of a delete API call" does not.
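A minimal sketch of the soft-delete idea raised in the replies above, in Python. Everything here is hypothetical (`VolumeStore`, `request_delete`, `purge_expired`, the 48-hour grace period) and none of it is Railway's API; it only illustrates a server-side delete that stays reversible until a grace period has elapsed.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, List, Optional

GRACE_PERIOD = timedelta(hours=48)  # assumed value, chosen for illustration


@dataclass
class Volume:
    volume_id: str
    deleted_at: Optional[datetime] = None  # None means the volume is live


@dataclass
class VolumeStore:
    """Hypothetical in-memory store; not any real provider's API."""
    volumes: Dict[str, Volume] = field(default_factory=dict)

    def create(self, volume_id: str) -> Volume:
        vol = Volume(volume_id)
        self.volumes[volume_id] = vol
        return vol

    def request_delete(self, volume_id: str, now: datetime) -> None:
        # A delete call only marks the volume; the data is still there.
        self.volumes[volume_id].deleted_at = now

    def restore(self, volume_id: str) -> None:
        # Undo a delete at any time before the purge runs.
        self.volumes[volume_id].deleted_at = None

    def purge_expired(self, now: datetime) -> List[str]:
        # Permanently remove only volumes whose grace period has elapsed.
        expired = [
            vid for vid, vol in self.volumes.items()
            if vol.deleted_at is not None and now - vol.deleted_at >= GRACE_PERIOD
        ]
        for vid in expired:
            del self.volumes[vid]
        return expired
```

With this shape, a single stray delete call costs nothing as long as someone notices within the grace period, which is the point powera and Ekaros are making: the protection lives in the service, not in a prompt typed by a human.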
prewett an hour ago
My dad always said "pedestrians have the right of way" every time one crossed the street, but wouldn't let us cross the street when the pedestrian light came on until the cars stopped. When I repeated his rule back to him, he said "you may have the right of way, but you'll still be dead if one hits you". My adult synthesis of this is "it's fine to do something risky, as long as you are willing to take the consequences of it not working out." Sure, the cars are supposed to stop at a red light, but are you willing to be hit if one doesn't? [0] Sure, the AI is supposed to have guardrails. But what if they don't work?
The risk is worse, though, it's like one of Taleb's black swans. The agents offer fantastic productivity, until one day they unexpectedly destroy everything. (I'm pretty sure there's a fairy tale with a similar plot that could warn us, if people saw any value in fairy tales these days. [1]) Like Taleb's turkey, who was fed every day by the farmer, nothing prepared it for being killed for Thanksgiving.
Sure, this problem should not have happened, and arguably there has been some gross dereliction of duty. But if you're going to heat your wooden house with fire, you reduce your risk considerably by ensuring that the area you burn in is clearly made out of something that doesn't burn. With AI, though, who even knows what the failure modes are? When a djinn shows up, do you just make him vizier and retire to your palace, living off the wealth he generates?
[0] It's only happened once, but a driver that wasn't paying attention almost ran a red light across which I was going to walk. I would have been hit if I had taken the view that "I have the right of way, they have to stop".
[1] Maybe "The Fisherman and His Wife" (Grimm)? A poor fisherman and his wife live in a hut by the sea. The fisherman is content with the little he has, but his wife is not. One day the fisherman catches a flounder in his net, which offers him wishes in exchange for setting it free. The fisherman sets it free, and asks his wife what to wish for. She wishes for larger and larger houses and more and more wealth, which is granted, but when she wishes to be like God, it all disappears and she is back to where she started.
- winocm 2 minutes ago
  This almost sounds like The Monkey's Paw by Jacobs.
- lmf4lol an hour ago
  Re [1]: Goethe's Zauberlehrling might fit
- baal80spam 43 minutes ago
  Your dad was a wise man.
  In my country there is a saying: "Graveyards are full of pedestrians that had the right of way".
fizx 9 minutes ago
Plenty of everyone doing it wrong, but the most WTF of all the WTFs is the backup storage.
Put your backups in S3 *versioned* storage on a different AWS account from your primary, and set some reasonable JSON lifecycle rule:
```
     "NoncurrentVersionExpiration": {
        "NoncurrentDays": 30,
        "NewerNoncurrentVersions": 3
     }
```
That way when someone screws up and your AWS account gets owned, or your databases get deleted by an agent, it doesn't have enough access to delete your backups, and by default, even if you have backups that you want to intentionally delete, you have 30 days to change your mind.
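For context, the fragment in this comment sits inside a rule of a full lifecycle configuration. Here is a sketch in Python that builds one. The rule ID is a made-up name and the empty-prefix filter (apply to every object in the bucket) is an assumption. The resulting dict has the shape that boto3's `put_bucket_lifecycle_configuration` takes as its `LifecycleConfiguration` argument, though nothing below calls AWS.

```python
import json


def backup_lifecycle(noncurrent_days: int = 30, keep_versions: int = 3) -> dict:
    """Build an S3 lifecycle configuration that expires old object versions."""
    return {
        "Rules": [
            {
                "ID": "expire-old-backup-versions",  # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # assumption: apply to the whole bucket
                "NoncurrentVersionExpiration": {
                    "NoncurrentDays": noncurrent_days,
                    "NewerNoncurrentVersions": keep_versions,
                },
            }
        ]
    }


if __name__ == "__main__":
    # Print the JSON document so it can be reviewed before being applied.
    print(json.dumps(backup_lifecycle(), indent=2))
```

The rule only has an effect on a bucket with versioning enabled, which is why the comment stresses *versioned* storage.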
mdavid626 5 minutes ago
I don’t see the problem here. These people will be pushed out of the industry quickly and their business taken by other people, who are using agents, but are smart enough to run them sandboxed without any permission to production or even dev data/systems.
karmakaze 2 hours ago
These AIs are exposing bad operating procedures:
> That token had been created for one purpose: to add and remove custom domains via the Railway CLI for our services. We had no idea — and Railway's token-creation flow gave us no warning — that the same token had blanket authority across the entire Railway GraphQL API, including destructive operations like volumeDelete. Had we known a CLI token created for routine domain operations could also delete production volumes, we would never have stored it.
> Because Railway stores volume-level backups in the same volume — a fact buried in their own documentation that says "wiping a volume deletes all backups" — those went with it.
I don't like the wording that makes it the Railway CLI's fault for not warning about the scope of the created token. Yes, a warning would be better, but the CLI didn't make the token; a person did, and saved it to an accessible file.
ungreased0675 2 hours ago
The way this is written gives me the impression they don’t really understand the tools they’re working with.
Master your craft. Don’t guess, know.
- codegladiator 2 hours ago
  > Master your craft. Don’t guess, know.
  You mean add that to my prompt, right?
  - praptak 3 minutes ago
    If you also add "don't break the previous rule", you should be 100% safe.
  - Syntaf 2 hours ago
    "Make no mistakes"
    - 8ytecoder 15 minutes ago
      > "NEVER FUCKING GUESS!"
lmf4lol 2 hours ago
Interesting story. But despite Cursor's or Railway's failures, the blame is entirely on the author. They decided to run agents. They didn't check how Railway works. They relied on frontier tech to ship faster because YOLO.
I really feel sorry for them, I do. But the whole tone of the post is: Cursor screwed it up, Railway screwed it up, their CEO doesn't respond etc etc.
It's on you guys!
My learning: Live on the cutting edge? Be prepared to fall off!
- meisel 2 hours ago
  Yeah the author really should’ve taken some responsibility here. It’s true that the services they used have issues, but there’s plenty of blame to direct to themselves
fsh 2 hours ago
I find these posts hilarious. LLMs are ultimately story generators, and "oops, I DROP'ed our production database" is a common and compelling story. No wonder LLM agents occasionally do this.
- einrealist 2 hours ago
  Also funny how people (including LLM vendors, like Cursor) think that rules in a system prompt (or custom rules) are real safety measures.
- beej71 an hour ago
  Like we say in adventure motorcycling: "It's never the stuff that goes right that makes the best stories." :)
amai 13 minutes ago
That happens if you aggressively buy into the latest tech without thinking about if you really need it.
Why do you need an AI agent for working on a routine task in your staging environment?
"Never send a machine to do a human's job."
alastairr 2 hours ago
If it's real this is a terrible thing to have happen.
However the moral of this story is nothing to do with AI and everything to do with boring stuff like access management.
Mashimo 2 hours ago
> What needs to change
Plenty of blame to go around, but I find it odd that they did not see anything wrong in not having real backups themselves, away from the Railway hosting. Well, they had one, but it was 3 months old.
That should be something they can do on their own right now.
- Vespasian 18 minutes ago
  And also how you work with automation safely.
  If you employ a new tech then there need to be extra safeguards beyond what you may deem necessary in an ideal world.
  This is a well-known possibility, so they should have asked about and/or verified the token scope.
  If it turns out that you can't hard scope it then either use a different provider, a wrapper you control (can't be too difficult if you only want to create and delete domains) or simply do not use LLMs for this for now.
  Maybe the tech isn't there just yet even if it would be really convenient. It's plenty useful in many other situations.
tfrancisl 8 minutes ago
"We gave DROP grants in prod to the user running AI agents irresponsibly at our company, and the expected happened." FTFY.
In seriousness, RBAC, sandboxing, anything but just giving it access to all tools with the highest privileges...
robertkarl 22 minutes ago
PocketOS's website says "Service Disruption: We're currently experiencing a major outage caused by an infrastructure incident at one of our service providers. We are actively working with their team on recovery. Next update by 10:00a pst."
This is wrong. It was not an infra incident at their service provider.
As Jer says in the article, their own tooling initiated the outage. And now they're threatening to sue? "We've contacted legal counsel. We are documenting everything."
It is absolutely incredible that Jer had this outage due to bad AI infra, wrote the writeup with AI, and posted on Twitter and here on his own account.
As somebody at PocketOS instructed their AI in the article: "NEVER **ing GUESS!" with regards to access keys that can touch your production services. And use 3-2-1 backups.
Good luck to the rental car agencies as they are scrambling to resume operations.
ilovecake1984 2 hours ago
The real issue is no actual backups.
deadeye 2 hours ago
Yeah. I've seen this happen with people doing it. It's just bad access management.
And anyone can do it with the wrong access granted at the wrong moment in time...even Sr. Devs.
At least this one won't weigh on any person's conscience. The AI just shrugs it off.
- kbrkbr 2 hours ago
  The AI does nothing of the kind. It predicts tokens. That's it.
  Describing the tech in anthropomorphic terms does not make it a person.
mplanchard 2 hours ago
The genre of LLM output when it is asked to “explain itself” is fascinating. Obviously it shows the person prompting it doesn’t understand the system they’re working with, but the tone of the resulting output is remarkably consistent between this and the last “an LLM deleted my prod database” twitter post that I remember seeing: https://xcancel.com/jasonlk/status/1946025823502578100
qnleigh an hour ago
It seems like the most unreasonable thing happening here is Railway's backup model and lack of scoped tokens. On the agent side of things, how would one prevent this, short of manually approving all terminal commands? I still do this, but most people who use agents would probably consider this arcane.
(Let's suppose the agent did need an API token to e.g. read data).
- Vespasian 14 minutes ago
  Wrapper around the function call. Don't give it the token itself but a limited set of fixed functions to create domains (their use case according to the post).
  Additionally give it a similar restricted way to "delete" domains while actually hiding them from you. If you are very paranoid throw in rate limits and/or further validation. Hard limits.
  Yes this requires more code and consideration but well that's what the tools can be fully trusted with.
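A sketch of the wrapper described in the reply above, in Python. All names are hypothetical (`DomainTools`, `create_domain`, `delete_domain`), as are the rate-limit numbers; the `add_domain` callable stands in for whatever privileged client the operator controls. The agent is handed only the functions returned by `tools()` and never the token itself.

```python
import time
from typing import Callable, Dict, List, Set


class DomainTools:
    """Hypothetical tool surface handed to an agent instead of a raw API token."""

    def __init__(self, add_domain: Callable[[str], None],
                 max_calls: int = 5, window_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._add_domain = add_domain      # privileged backend, hidden from the agent
        self._max_calls = max_calls
        self._window = window_seconds
        self._clock = clock
        self._calls: List[float] = []      # timestamps of recent calls
        self.hidden: Set[str] = set()      # "deleted" domains, kept for human review

    def _check_rate(self) -> None:
        # Hard limit: refuse once max_calls have happened inside the window.
        now = self._clock()
        self._calls = [t for t in self._calls if now - t < self._window]
        if len(self._calls) >= self._max_calls:
            raise RuntimeError("rate limit exceeded; refusing call")
        self._calls.append(now)

    def create_domain(self, name: str) -> None:
        self._check_rate()
        # Deliberately crude validation, for illustration only.
        if not name or " " in name or "." not in name:
            raise ValueError(f"rejected domain name: {name!r}")
        self._add_domain(name)

    def delete_domain(self, name: str) -> None:
        # Soft delete: hide the domain, never call a destructive backend.
        self._check_rate()
        self.hidden.add(name)

    def tools(self) -> Dict[str, Callable[[str], None]]:
        # The only operations the agent can invoke.
        return {"create_domain": self.create_domain,
                "delete_domain": self.delete_domain}
```

There is no code path from these two functions to a volume delete, so the question of what the underlying token is scoped for stops mattering to the agent. A human can later review `hidden` and perform real deletions out of band.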
adverbly 2 hours ago
This has to be fake right?
Using LLMs for production systems without a sandbox environment?
Having a bulk volume destroy endpoint without an ENV check?
Somehow blaming Cursor for any of this rather than either of the above?
- kbrkbr an hour ago
  Yeah. Cargo-cult engineering meets the Streisand effect.
comrade1234 an hour ago
Some of this stuff is so embarrassing. Why would you even post this online?
- insensible 44 minutes ago
  I fully agree that this was a big miss on the human operators’ part. But it’s a small business and I have repeatedly seen so much worse than this. Vendors charging money to allow customers to connect AI to systems must have a robust story for protecting them from disaster. Everyone involved needs to be working hard to limit the impact of mistakes and surprises.
afshinmeh 2 hours ago
It's actually interesting to me that the author is surprised the agent could make an API call and one of those API calls could be deleting the production database.
It's a sad story but at the same time it's clearly showing that people don't know how agents work, they just want to "use it".
satisfice an hour ago
Every AI confession is fake.
Fizzadar 2 hours ago
Absolutely zero sympathy. You’re responsible for anything an agent you instructed does. Allowing it to run independently is on you (and all the others doing exactly this). This is only going to become more and more common.
samsullivan 2 hours ago
not sure what PocketOS does or why your whole dataset would be a single volume without a clear separation between application and automotive data. how are you decoding VINs?
- Ekaros 7 minutes ago
  Makes me wonder also about multi-tenancy, if all customer information is in a single volume. How big a risk are they putting on their customers that their most business-critical and proprietary data leaks to competitors?
BoredPositron 2 hours ago
These engagement farming shit stories are probably the worst part of agentic AI. Look at how incompetent and careless I am with my own and my users' data.
- pluc an hour ago
  If it doesn't work, try and monetize the failure. Therefore AI works 50% of the time, most of the time.
Invictus0 2 hours ago
I'm sorry this happened to you, but your data is gone. Ultimately, your agents are your responsibility.
philipov 2 hours ago
What does it say, for those of us who can't use twitter?
- k310 2 hours ago
  https://nitter.net/lifeof_jer
  https://rentry.co/5rme2sea
richard_chase 2 hours ago
This is hilarious.
m0llusk 2 hours ago
The details of the story are interesting. Backups stored on the same volume is an interesting glitch to avoid. Finding necessary secrets wherever they happen to be and going ahead with that is the kind of mistake I've seen motivated but misguided juniors make. Strange how generated code seems to have many security failings, but generated security checks find that sort of thing.
- web007 2 hours ago
  > Backups stored on the same volume is an interesting glitch to avoid
  The phrasing is different, but this is how AWS RDS works as well. If you delete a database in RDS, all of the automated snapshots that it was doing and all of the PITR logs are also gone. If you do manual snapshots they stick around, but all of the magic "I don't have to think about it" stuff dies with the DB.
- ilovecake1984 2 hours ago
  It’s not an interesting glitch. It’s just common sense. Nobody in their right mind would have their only backup in the same system as the prod data.
FpUser 2 hours ago
The world is never short of idiots. Will be fun to watch when personal finances are managed by a swarm of agents with direct access to operations.
heliumtera 2 hours ago
Someone trusted a prod database to an LLM and the db got deleted.
This person should never be trusted with computers ever again for being illiterate
- rahoulb 2 hours ago
  If the account is to be believed that's not what happened. They asked the LLM to do something on the staging environment, it chose to delete a staging volume using an API key that it found. But the API key was generated for something else entirely and should not have been scoped to allow volume deletions - and the volume deletion took out the production database too.
  The LLM broke the safety rules it had been given (never trust an LLM with dangerous APIs). *But* they say they never gave it access to the dangerous API. Instead the API key that the LLM found had additional scopes that it should not have had (poster blames Railway's security model for this) and the API itself did more than was expected without warnings (again blaming Railway).
- flaminHotSpeedo 2 hours ago
  What makes you say that? The article is pretty clear that they had the LLM working in a staging environment, then it decided to use some other creds it found which (unbeknownst to the author) had broad access to their prod environment.
Mashimo 2 hours ago
Oh wow, what a character. 3-month-old offsite backup, but he is not to blame.
> "Believe in growth mindset, grit, and perseverance"
And creator of a Conservative dating app that uses AI generated pictures of Girls in bikini and cowboy hat for advertisement. And AI generated text like "Rove isn’t reinventing dating — it’s remembering it." :S