There are the small shops, where you're running some kind of monolith, generally open to the Internet, maybe with a database hooked up to it. These shops do not need dedicated DevOps/SRE. Throw it into a container platform (e.g. AWS ECS/Fargate, GCP Cloud Run, fly.io; the market is broad enough that it's basically getting commoditized), hook up observability/alerting, and maybe pay a consultant to review it and make sure you didn't do anything stupid. Then just pay the bill every month and don't over-think it.
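To make the "hook up observability" part concrete, here's a minimal sketch, assuming a Python monolith and the prometheus_client library; the metric names, labels, and port are illustrative, and your platform's built-in metrics may already cover much of this.

```python
# Minimal sketch: expose Prometheus-style metrics from a Python monolith.
# Assumes `pip install prometheus-client`; metric names, labels, and the
# port are illustrative, not a standard.
import random
import time

from prometheus_client import Counter, Histogram, start_http_server

REQUESTS = Counter(
    "app_requests_total", "Total HTTP requests handled", ["path", "status"]
)
LATENCY = Histogram("app_request_seconds", "Request latency in seconds")

def handle_request(path: str) -> None:
    """Stand-in for a real request handler in the monolith."""
    with LATENCY.time():                       # observe wall-clock duration
        time.sleep(random.uniform(0.01, 0.1))  # pretend work
    REQUESTS.labels(path=path, status="200").inc()

if __name__ == "__main__":
    start_http_server(9100)  # serves /metrics for the platform's scraper
    while True:
        handle_request("/checkout")
```

Point whatever the platform offers (managed Prometheus, Grafana Cloud, etc.) at /metrics, wire a couple of alerts on error rate and latency, and you're most of the way there.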
Then you have the large shops: the ones where
- you're running at a scale where the cost premium of container platforms exceeds the salary of an engineer to move you off them;
- you have to figure out how to get the systems of pre-M&A companies to talk to each other;
- you have N development teams organizationally far from the sales and legal teams signing SLAs, yet those teams still need to be constrained by said SLAs;
- some system was architected for X scale, the business has now sold 100X, and you have to figure out which band-aids to throw at the failing system while telling the devs they need to re-architect;
- you need to build your Alertmanager routing tree configuration dynamically, because YAML is garbage and the routing rules change based on whether or not SRE decided to return the pager, plus devs need to be able to self-service new services, plus new alerts need progressive rollout across the organization, etc., so even the Alertmanager config needs to be owned by an engineer (see the sketch below).
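For the Alertmanager point, a hedged sketch of what "build the routing tree dynamically" can look like: a generator that emits the YAML from a team registry, so pager ownership and new services are data rather than hand-edits. The team names, receiver names, and the ownership flag are all made up for illustration; a real setup would also carry receiver credentials, progressive-rollout metadata, and the rest.

```python
# Hypothetical sketch: generate an Alertmanager routing tree from a team
# registry instead of hand-editing YAML. Team names, receivers, and the
# pager-ownership flags are invented for illustration.
import yaml  # pip install pyyaml

TEAMS = [
    # (team label, has the team taken their pager back from SRE?)
    ("payments", True),
    ("search", False),
    ("checkout", True),
]

def route_for(team: str, owns_pager: bool) -> dict:
    """One routing subtree per team; pager ownership decides the receiver."""
    receiver = f"{team}-pagerduty" if owns_pager else "sre-pagerduty"
    return {"matchers": [f'team="{team}"'], "receiver": receiver}

def build_config(teams) -> dict:
    receivers = {"default-slack", "sre-pagerduty"} | {
        f"{team}-pagerduty" for team, owns in teams if owns
    }
    return {
        "route": {
            "receiver": "default-slack",  # fallback for unmatched alerts
            "routes": [route_for(team, owns) for team, owns in teams],
        },
        # Receiver endpoints (PagerDuty keys etc.) would be filled in here.
        "receivers": [{"name": name} for name in sorted(receivers)],
    }

if __name__ == "__main__":
    print(yaml.safe_dump(build_config(TEAMS), sort_keys=False))
```

The rendered output gets checked in and reviewed like any other artifact; the point is that an engineer owns the generator, not that anyone hand-edits the YAML.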
I really can't imagine LLMs replacing SREs in large shops. Debugging production outages to find a proximate "root" technical cause is a small fraction of the SRE function.
The article's premise (AI makes code cheap, so operations becomes the differentiator) has some truth to it. But I'd frame it differently: the bottleneck was never really "writing code." It was understanding what to build and keeping it running. AI helps with one of those. Maybe.
Edit: Or maybe he is fully aware and just needs to push some books before it's too late.
I actually argue that AI will therefore impact these levels of management the most.
Think about it: if you were employed as a transformational CEO, would you risk trying to fight the existing managers, or just replace them with AI?
Not AI, but a bad economy and mass layoffs tend to wipe out management positions the most. As a decent IC, in a round of layoffs in a bad economy you'll always find somewhere to work if you're flexible about location and salary, because everyone still needs people who know how to actually build shit; but nobody needs to add more managers to their ranks to consume payroll and add no value.
AI will transform this.
At every large MNC I've worked at, management got hired and fired mostly on their connections (or lack thereof) and less on what they actually did. Once they were let go, they had a near-impossible time finding another job without connections in other places.
That said, Claude has absolutely no problem not only answering questions, but also finding bugs and adding new features to it.
In short, I feel they're just as screwed as us devs.
Look at the 'Product Engineer' roles we're seeing spread across forward-thinking startups and scaleups.
That's the future of SWE, I think: SWEs take on more PM and design responsibilities as part of the existing role.
But not defining what an SRE is feels like a glaring, almost suffocating, omission.
> ...
> Are you keeping up with security updates? Will you leak all my data? Do I trust you? Can I rely on you?
IMO, if the answers to those questions matter to you, then you damn well should care how it works. Because even if you aren't sufficiently technically minded to audit the system, having someone be able to describe it to you coherently is an important starting point in building that trust and having reason to believe that security and privacy will work as advertised.
- testing
- reviewing, and reading/understanding/explaining
- operations / SRE

As for the more dedicated Ops side: it's garbage in, garbage out. I've already had too many outages caused by AI slop being fed into production, and declaring all developers = SRE won't change the fact that AI can't program right now without experienced people heavily controlling it.
SREs are great when the problem is “the network is down” or “kubernetes won’t run my pods”, but expecting a random engineer to know all the failure modes of software they didn’t build and don’t have context on never seems to work out well.
A tricky part comes when you don't have both roles for something, like SRE-developed tools that are maintained by the people who wrote them, and you have to strike the balance yourself until/unless you end up with that split. If you're not aware of both hats and juggling them intentionally, you can end up with tools out of SRE that are worse than any SWE-only tool would ever be, because SREs sometimes think they won't make the same mistakes, but all the same feature-focused pressures apply to SRE-written tools too.
Why (imo)? Senior leaders still like to say: I run a 500-headcount finance EMEA organization for Siemens; I am the Chief People Officer of Meta and I lead an org of 1000 smart HR pros. Most of their status is still tied to org headcount.
Ultimately, hardware, software, QA, etc. are all about delivering a system that produces certain outputs for certain inputs, with certain penalties if it doesn't. If you can, great; if you can't, good luck. Whether you achieve the “can” with human development or an LLM is of little concern, as long as you can pay out the penalties of “can't”.
AI will not get much better than what we have today, and what we have today is not enough to totally transform software engineering. It is a little easier to be a software engineer now, but that’s it. You can still fuck everything up.
Wow, where did this come from?
From what comes to mind based on recent research, I'd expect at least the following this year or next:
* Continuous learning via an architectural change like Titans or TTT-E2E.
* Advances in world models (many labs are focusing on them now).
* Longer-running agentic systems, with Gas Town as a recent proof of concept.
* Advances in computer and browser use: tons of money is being poured into this, and RL with self-play is straightforward.
* AI integration into robotics, especially when coupled with world models.
And no, as an SRE I won't read DEV code, but I can help my team test it.
I mean, to each their own. Sometimes, if I catch a page and the rabbit hole leads to the devs' code, I look under the covers.
And sometimes it's a bug I can identify and fix pretty quickly. Sometimes faster than the dev team because I just saw another dev team make the same mistake a month prior.
You gotta know when to cut your losses and stop going down the rabbit hole, though; that's true.