21 points by vanburen 4 hours ago | 6 comments
  • mrothroc 2 hours ago
    Yeah, this is what happens when there's nothing between "the agent decided to do this" and "it happened." The agent followed the state file logically. It wasn't wrong. It just wasn't checked.

    His post-mortem is solid but I think he's overcorrecting. If he does this as part of a CI/CD pipeline and manually reviews every plan, he will pretty quickly get "verification fatigue". The vast majority of cases are fine, so he'll build the habit of automatically approving them. Sure, he'll deeply review the first ones, but over time the reviews get shallower because he'll almost always find nothing. Then he'll pay less attention. This is how humans work.

    He could automate the "easy" ones, though. TF plans are parseable, so maybe his time would be better spent only reviewing destructive changes. I've been running autonomous agents on production code for a while and this is the pattern that keeps working: start by reviewing everything, notice you're rubber-stamping most of it, then encode the safe cases so you only see the ones that matter.
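
    A minimal sketch of that triage, assuming the plan was exported with `terraform plan -out=tfplan` and `terraform show -json tfplan`. It relies on the `resource_changes` list and each entry's `change.actions` field from Terraform's JSON plan representation; the function names and the resource addresses in the usage note are made up for illustration:

```python
import json


def destructive_changes(plan):
    """Return (address, actions) for every resource change that deletes something.

    A replacement shows up as ["delete", "create"] or ["create", "delete"],
    so checking for "delete" catches both plain destroys and replacements.
    """
    flagged = []
    for rc in plan.get("resource_changes", []):
        actions = rc.get("change", {}).get("actions", [])
        if "delete" in actions:
            flagged.append((rc["address"], actions))
    return flagged


def needs_human_review(plan):
    """True when at least one change in the plan destroys a resource."""
    return bool(destructive_changes(plan))


def load_plan(path):
    """Load a plan that was saved with `terraform show -json tfplan > plan.json`."""
    with open(path) as handle:
        return json.load(handle)
```

    With this, a pipeline could auto-approve when `needs_human_review(load_plan("plan.json"))` is false and stop for a human otherwise. It only looks at the action type, so a "safe" update that still breaks something would slip through.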

    • dmix an hour ago
      Or just never run agents on anything that touches production servers. That seems extremely obvious to me. He let Claude control terminal commands which touched his live servers.

      That's very different than asking it for help to make a plan.

      • cozzyd 16 minutes ago
        Are agents clever enough to seek and maybe use local privilege escalations? It seems like they should always run as their own user account with no credentials to anything, but I wonder if they will try to escape it somehow...
      • scuff3d 23 minutes ago
        But the CEOs are saying everyone is going to be replaced by LLMs in 6 months. Surely that means they're capable of handling production environments without oversight from a professional.
  • wpm 2 hours ago
    "Developers let Claude Code delete their production setup, including database"

    Claude Code has no agency. It does what you tell it, where you let it, with temperature-based sampling that means it might randomly deviate.

  • mannyv an hour ago
    This actually is easy to do with terraform and shared infrastructure; you don't need an AI in the loop.

    Who hasn't accidentally deleted a resource because that property triggers a resource delete/create instead of an update?

    It would help if it was obvious what the key fields were. But for some reason docs usually don't tell you.
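
    The plan itself can at least tell you which fields forced the replacement after the fact. A rough sketch, assuming a Terraform version whose JSON plan output includes `change.replace_paths` (check the JSON output format docs for your version); the function name is made up for illustration:

```python
def replacement_causes(plan):
    """Map each to-be-replaced resource address to the attribute paths forcing it.

    Expects the dict parsed from `terraform show -json tfplan`. A replacement
    is a change whose actions contain both "delete" and "create".
    """
    causes = {}
    for rc in plan.get("resource_changes", []):
        change = rc.get("change", {})
        actions = change.get("actions", [])
        if "delete" in actions and "create" in actions:
            # Each path is a list of steps (attribute names or indexes).
            paths = change.get("replace_paths", [])
            causes[rc["address"]] = [
                ".".join(str(step) for step in path) for path in paths
            ]
    return causes
```

    Printing the result before an apply makes it obvious which property turned an update into a delete/create, even when the provider docs don't say.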

  • rhoopr 2 hours ago
    Sloppy vibe infra management and no backups, peanut butter and chocolate.
  • Surac 2 hours ago
    no backup? well played
    • dmix an hour ago
      no offsite backups*