If you think 10x devs are unicorns, consider how much harder it is to get someone 10x at the intersection of both domains. (Personally I have never met one). You are far better off with people that can work together across the bridge, but that requires actual mutual trust and respect, and we’re not able to do that.
The way I have seen it in my career is to have operational and development capabilities within the same team. And the idea of a „DevOps guy“ is a guy „developing operations integrations“.
As opposed to completely siloing ops and dev.
Anyone who thinks they can hire a devop or declare that they do devops is as deluded as 97% of the folks who claim that they are doing Agile. (If you are firmly on the other side of each of the four values of the Agile Manifesto, you may or may not be doing great software development, but it's not Agile.)
The problem with the typical DevOps team is that there's no operations expertise.
Expecting Devs or Ops to do both types of work is usually asking for trouble, unless the organization is geared up from the ground up for such seamless work. It is more of a corporate problem than a team working style or work expectations & behavior problem.
The same goes for Agile vs Waterfall. Agile works well if the organization is inherently (or overhauled to be) agile, otherwise it doesn't.
Are you claiming it's fundamentally impossible for people to get along, or just that positive interpersonal relationships can't be reliably forced at scale?
It would be like asking an Amazon delivery driver to care about oil changes and tire rotations. It's much easier to have a team of mechanics whose primary responsibility is enabling drivers to just drive and focus on delivering packages.
The reality is that most devs do not consider a holistic picture that includes the infrastructure they will be deploying to. In many cases, it's certainly a skill issue; good devs are hard to find. And to flip the coin, it's hard to find good ops people too.
The reason DevOps continues to linger, however vague a discipline it is, is that it allows the business to differentiate between revenue generating roles and cost center roles. You want your dev resources to prioritize feature work, at the beck and call of PMs or upper management, and let your "DevOps" resources be responsible for actually getting the product deployed.
In essence, it's a ploy to further commoditize engineering roles, because finding unicorns that understand the picture top-to-bottom is difficult (finding /top/ talent is difficult!). In this way, DevOps is alive and well, as a Romero zombie.
Remove the handcuffs from your ops team and your reliability will SOAR.
Kubernetes deployment configurations and Ansible playbooks are code. PromQL is code. Dockerfiles and cloud-init scripts are code. Terraform HCL is code.
It’s all code I personally hate writing, but that doesn’t make it less valid “software development” than (say) writing React code.
EDIT: lol - I am getting downvoted for suggesting some DevOps engineers will actually be ready to take on tasks that were previously more intimidating. I really hope those folks are from the never-coding-agent camp. When I refer to reliance on CC or Codex, I mean being engaged at a holistic level with AI -- not blindly one-shotting solutions. This means having the patience to understand the complexity of the system, the criticality of its downtime in the overall architecture (in this case it's the k8s controller), the ability to learn the codebase, using the right MCPs to delve into all the details needed for testing changes locally, etc. These are system-level skills and barely overlap with just coding skills.
DevOps is a methodology, not a role.
Having an ops team does not mean devs get to throw on-call over the wall to someone else. That's a sure recipe for resentment and turnover.
Have you done devops yourself? It sounds like a resounding No. Like you complained ops doesn't like to code (not a core skill for the job), ops complains that devs can't understand basic concepts of how their software runs. Is this also a failure of leadership? Is everyone supposed to know parts of everyone else's jobs?
I assume the first time this happens at any given company will be the moment they realize fully autonomous code changes made on production systems by agents is a terrible idea and every change needs a human to take responsibility for and ownership of it, even if the changes were written by an LLM.
Understanding code you didn't personally write is part of the job.
Teams will figure out how to mitigate such situations in future without sacrificing the potential upside of "fully autonomous code changes made on production systems" (e.g. invest more in a production-like env for test coverage).
Software engineering purists have to get out of some of these religious beliefs
To me, the Claude superfans like yourself are the religious, like how you run around proffering unsubstantiated claims like this and believe in / anthropomorphize way too much. Is it because Anthrop'ic is an abbreviation of Anthropomorphic?
The caveat is that we have to be fairly good at steering them in the right direction, as things stand today. It is exhausting to do it the right way.
I disagree that they are really really capable engineers et al. They have moments where they shine like one. They also have moments where they perform worse than a new grad/hire. This is not what a really really capable engineer looks like. I don't see this fundamentally changing, even with all the improvements we are seeing. The problem is lower level and more core than adding more layers on top can resolve; layering only addresses it as best it can.
Do you have fail hards to share along with your wins? Are we going to only share our wins like stonk hussies?
These incidents have become less and less frequent over the last year - switching to Opus reduced the failure frequency. Same thing for code reviews. Most of it is fluff, but it does give useful feedback, if the instructions are good. For example, I asked for a blind code review of a PR ("Review this PR"), and it gave some generic commentary. I made the prompt more specific ("Follow the API changes across modules and see impact") - it found a serious bug.
The number of times I had to give up in frustration has been going down over the last year. So I tend to believe a swarm of agents could do a decent job of autonomous development/maintenance over the next few years.
The teams are going to figure out how to mitigate bad deploys by using even more AI & giving it even better information gathering.
DevOps isn't a tool, but there are lots of tools that make it easier to implement.
DevOps isn't how management can eliminate half the org and have one person do two roles; specialization is still valuable.
DevOps isn't an organization structure, though the wrong org structure can make it fail.
DevOps is collaboration. It's getting two distinct roles to better interoperate. The dev team that wants to push features fast. And the ops team that wants stability and uptime.
From the management side, if you aren't focused on building teams that work well together, eliminating conflicts, rewarding the team collectively for features and uptime, and giving them the resources to deliver, that's not a DevOps failure, that's a management failure.
If you can't account for someone spending x% of their time working with a team but for budgetary purposes belonging to a different team then sack your accountants.
DevOps, like agile, when done correctly should help to create teams that understand complete systems or areas of a business and work more efficiently than stand-alone teams. The other part of the puzzle is to include the QA team too, to ensure that the impact of full system, performance and integration tests is understood by all and that everyone understands how their changes impact everything else.
Having the dev team build code that makes the test and ops teams' lives easier benefits everyone. Having the ops team provide solutions that support test and dev helps everyone. Having test teams build systems that work best with the dev and ops teams helps everyone.
Agile development should enable teams to work at a higher level of performance by granting them the agency to make the right decisions at the right time to deliver a better product by building what is needed in the correct timeline.
DevOps and agile fail where companies try to follow waterfall models whilst claiming agile processes. The goal with all these business and operating models is to improve efficiency. When that isn't happening then either you aren't applying the model correctly or you need to change the model.
However, you don't want config being Turing complete; that creates a host of other problems at a layer where you don't want them.
If your config is Turing complete and consumed as-is, then without a lot of discipline you can dig yourself into a hole, sure.
If you're producing YAML that is not Turing complete, that constraint means you have to code in a way that produces deterministic output. It's actually very safe, and YAML maps 1:1 to types in something like Python.
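A minimal sketch of that idea, assuming Python as the generator: the logic lives in ordinary code, and what gets shipped is plain data with no logic in it. The resource type and the `bucket`/`render` helpers are made up for illustration (they are not real CloudFormation types), and the output is JSON, which YAML 1.2 parsers generally accept, so no third-party YAML library is needed.

```python
import json


def bucket(name: str, versioned: bool) -> dict:
    """Build one hypothetical resource as plain data (dicts, strings, bools)."""
    return {
        "Type": "Example::Storage::Bucket",  # made-up type name for illustration
        "Properties": {"Name": name, "Versioned": versioned},
    }


def render(envs: list[str]) -> str:
    """Generate the config with normal code, then emit inert, sorted data."""
    resources = {
        f"{env.capitalize()}Logs": bucket(f"{env}-logs", versioned=(env == "prod"))
        for env in envs
    }
    # sort_keys makes the serialized text independent of insertion order,
    # so the same inputs always produce the same bytes.
    return json.dumps({"Resources": resources}, indent=2, sort_keys=True)


if __name__ == "__main__":
    print(render(["dev", "prod"]))
```

The loops and conditionals stay on the generator side; whatever consumes the file only ever sees maps, lists and scalars.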
My favourite go-to example is for AWS CloudFormation:
Think bigger, it's not something you are using today. The next config language should have schemas built in and support for modules/imports so we can do sharing/caring. It should look and feel like config languages and interoperate with all of those that we currently use. It will be a single configuration fabric across the SDLC.
This exists today for you to try, with CUE
I've been cooking up something the last few weeks for those interested, CUE + Dagger
https://github.com/hofstadter-io/hof/tree/_next/examples/env
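To make the "schemas built in" point concrete without assuming any CUE syntax, here is a rough Python sketch of checking a config against a declared shape before anything consumes it. The `SCHEMA` fields and the `validate` helper are hypothetical, and this is far cruder than what a real schema-aware config language provides (no nesting, constraints, or imports).

```python
# Hypothetical schema: field name -> expected Python type.
SCHEMA = {"name": str, "replicas": int, "ports": list}


def validate(config: dict, schema: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    for field, expected in schema.items():
        if field not in config:
            errors.append(f"missing field: {field}")
        elif not isinstance(config[field], expected):
            actual = type(config[field]).__name__
            errors.append(f"{field}: expected {expected.__name__}, got {actual}")
    # Reject fields the schema does not know about, to catch typos early.
    for field in config:
        if field not in schema:
            errors.append(f"unknown field: {field}")
    return errors


if __name__ == "__main__":
    for problem in validate({"name": "web", "replicas": "3"}, SCHEMA):
        print(problem)
```

The point is only that the mistake (`"3"` as a string, `ports` missing) is caught at config time instead of at deploy time.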
For comments, I use a _comment field for my custom JSON reading apps
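A small sketch of how that convention might look on the reading side, assuming Python and the standard `json` module. The `strip_comments` helper and the sample keys are illustrative only; the one thing taken from the comment above is the `_comment` field name.

```python
import json


def strip_comments(node, key="_comment"):
    """Recursively drop every `key` entry so the app only sees real settings."""
    if isinstance(node, dict):
        return {k: strip_comments(v, key) for k, v in node.items() if k != key}
    if isinstance(node, list):
        return [strip_comments(item, key) for item in node]
    return node


raw = """
{
  "_comment": "retry budget agreed with ops",
  "retries": 3,
  "db": {"_comment": "primary only", "host": "localhost"}
}
"""

config = strip_comments(json.loads(raw))

if __name__ == "__main__":
    print(config)
```

Since JSON has no comment syntax, the comment has to travel as data; stripping it at load time keeps it from leaking into code that iterates over the keys.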
How are people not embarrassed by this complete lack of quality in their work?
The current popular config choices cause a lot of extra work, bugs, and effort. Is improving the status quo not a worthy goal anymore? Are we at a point in history throwing our hands up and saying meh, I deal with this... is basically where people are today? (I'm somewhat a believer of this based on anecdata and vibes)
1. already installed everywhere,
2. easy to parse in every language,
3. supported by editors/linters/CI tools,
4. stable enough that vendors bet on them.
It seems to have become: "we turned ops into coding too, so now the ops team needs to be good at software engineering"
My personal experience says that the best way is that the Ops team should not be repurposed as Developers; rather, put the experienced Developers into Production Support (incident management, that's intense Ops, working in shifts and weekends, etc.). And rotate them whenever needed. Over a period of time, you'll invariably see fewer defects and issues percolating down from the Devs. Then, after both sides are stable and working well together with less friction and fewer open tickets, some of the more tech savvy Ops members can be rotated into Development teams as rookie devs to help reduce costs a bit (as there'll invariably be some natural attrition among the Devs and Ops, so this gives an alternative career path to the Ops team (who are usually less paid, and more stressed), and pushes the Devs not to become complacent). Such an approach is doable and productive.
Like everything, the original intentions must have been noble. But as we can see, looking back, it got popular and popular enough to get to the enterprise types.
Nothing really survives that.
PS: I have witnessed a sysadmin team being renamed DevOps and then SRE with not much other meaningful changes. I couldn't believe it at the time.
The problem in your case is not the dev vs ops split, it's a company culture thing which I'm sure you see play out in more places than this current focus
DevOps is a methodology. DevOps as a role or team name is a fantasy from people who do not understand the methodology.
If you want DevOps to work, your Ops must be members of the development team, take part in the sprints, etc. But many companies do not want to do that because they want to separate ops and dev budget/accounting and do not want to hire enough people with ops skills.
I saw first hand, at AWS devDays, an AI giving SIGWINCH as the "root cause" of an Apache error in a containerized process in EKS, for a backend FCGI process connection error. It has been extremely hard since that demo to trust any AI for system level debugging.
(2) AWS is not a leader, if even a contender, in the AI space. I would not evaluate the potential based on a demo they produced
I'm so sick of this nonsense. "Devops" isn't failing, isn't an issue, you can rename it whatever you want, but throughout my career the devops engineers (the ones you don't skimp on) are the best, highest paid professionals at the company.
I don't know why I keep reading these completely crazy think-pieces hemming and hawing about a system (having a few engineers who master performance/backups/deployments/oncall/retros) that seems to be wildly successful. It would be nice if more engineers understood under-the-hood, but most companies choose not to exclusively hire at that caliber.
DevOps, shift left, full stack dev, all reminds me of the Futurama episode where Hermes Conrad successfully reorgs the slave camp he's sent to, so that all physical labour is done by a single Australian man
Speaking darker, there is a kind of - well, perhaps not misanthropy, but certainly a not-so-well-meaning dismissiveness, to the "silo breaking" philosophy that looks at complex fields and says "well these should all just be lumped together as one thing, the important stuff is simple, I don't know why you're making all these siloes, man" - assuming that ops specialists, sysadmins, programmers, DBAs, frontend devs, mobile devs, data engineers and testers have just invented the breadth and depth and subtleties of their entire fields, only as a way of keeping everybody else out
But modern systems are complex, they are only getting more so, and the further you buy into the shift-left everyone-is-everything computer-jobs-are-all-the-same philosophy the harder and harder it will get to find employees who can straddle the exhausting range of knowledge to master
I don’t think this is the right take. “Silos” is an ill-defined term, but let’s look at a couple of the negative aspects. “Lack of communication”, and “Lack of shared understanding” (or different models of the world). I’m going to use a different industry example, as I think it helps think about the problem more abstractly.
In the world of biomedical engineering, the types of products you are making require the expertise of two very different groups of people. Engineers and Doctors. Each of these groups has an in-group language, and there is an inherent power differential between them. Doctors are more “important” than engineers. But to get anything made, you need the expertise of both.
One way to handle this is to keep the engineers and doctors separate and to communicate primarily via documents. The doctor will attempt to detail exactly how a certain component should work. The engineer will attempt to detail the constraints and request clarifications.
The problem with this approach is that the engineer cannot speak “doctorese” nor can the doctor speak “engineerese”; and the consequence is a model in each person’s head that differs significantly from the other. There is no shared model; and the real world product suffers as a result.
The alternative is to attempt to “break the silos”; force the engineers and doctors to sit with each other, learn each other’s language, and build a shared mental model of what is being created. This creates a far better product; one that is much closer to the “physical reality” it must inhabit.
The same is true across all kinds of business groups. If different groups of people are required to collaborate, in order to do something, those people are well served by learning each other’s languages and building a shared mental model. That’s what breaking silos is about. It is not “everyone is the same”, it’s “breaking down the communication barriers”.
I don't think anyone thinks siloes are themselves a good thing, but they might be a necessary consequence of having specialists. Shift-left is mostly designed to reduce conversations between groups, by having individuals straddle across tasks. It's actually kind of anti-collaboration, or at least pessimistic that collaboration can happen
I am arguing that all such people, whether developers or ops or ux designers or product managers, need to engage in this learning as they collaborate. This doesn’t mean that we want the DevPM as a resultant title, just that siloing these different groups will lead to perverse outcomes.
Dev and ops have been traditionally siloed. DevOps was a silly attempt to address it.
Hence my vitriol: https://news.ycombinator.com/item?id=46662287.
Also: could he please avoid doing it by illustrating his nonsense with graphs that are both childish and nonsensical?
DevOps is a mess of our own making - embracing K8s created complexity for little gain for nearly all companies.
And I don't want to trivialize the reality of enterprise platforms where bespoke connectors rule. I have dealt with migrations of platforms that are business critical and managing version compatibility and ensuring none of the integrations regressed was par for the course. I am not even saying that that makes me qualified to replicate Honeycomb.io. But I do think someone with a deep technical background in building observability platforms armed with Claude Code or Codex and armed with the right set of MCPs and all the necessary tooling should be able to build a clone of Honeycomb.io.
Maybe it won't be a fast turnaround like a typical vibe-coded project, but even if it is a month-long project to get to 60% feature parity, these vendors will have to sit up and pay attention.
Eventually a bureaucrat becomes the manager of the team, and seeks to expand the set of things under DevOps' control. This makes the team a single point of failure for more and more things, while driving more and more developer processes towards mediocrity. Velocity slows, while the DevOps bottlenecks are used as a reason to hire.
It's an organizational problem, not a talent or knowledge problem. Allowing a group to hire and grow within an organization, which is not directly accountable for the success of the other parts of the organization that it was intended to support, is creating a cancer, definitionally.