2 points by multisport 6 hours ago | 1 comment
  • multisport 6 hours ago
    Genuinely one of the most shocking incident reports I have read in a long time, rivals https://www.coderabbit.ai/blog/our-response-to-the-january-2...
    • christophilus 6 hours ago
      What’s shocking about it? Seems like the usual culprit: a bad config rollout. Took a long time to identify, so maybe that’s shocking. But I can attest that sometimes you get into fight-or-flight mode and miss the obvious when trying to diagnose a disruption like this.

      That said, nowadays, the first thing I do is spawn an agent to look through the most recent commits and try to identify something that could be the cause of a service outage.

      This one seems like something Claude Code or Codex would have quickly flagged.

      • multisport 5 hours ago
        Agreed, we've all been there, but 4 hours! For a network config change. No one raised their hand and said "hey, I just toggled this thing, maybe we should look, I did it exactly when our entire region went down".