Given Amazon and Google have their own secret sauce for replacing etcd, I am wondering if Netflix does anything special?
Netflix actually has far fewer cells than you'd expect btw. Their special sauce IMO is federation and using only a small subset of the k8s APIs.
That is my point. I work at a large multinational and we run tens of thousands of Kubernetes nodes on-prem, and I'm pretty sure running that in EKS would cost hundreds of millions of dollars per year. Our on-prem nodes are roughly equivalent to c6a.32xlarge, and even with 2-year reserved pricing you are looking at $17k/year/node. At 20,000 nodes that is $340 million/year, not including egress fees or any other AWS service charges (such as EBS).
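For anyone who wants to check the arithmetic, here is a minimal sketch in Python. It only uses the figures quoted in this comment (about $17k/year/node, 20,000 nodes); the function name is made up for illustration, and egress, EBS, and other service charges are deliberately left out.

```python
def annual_compute_cost(nodes: int, cost_per_node_per_year: float) -> float:
    """Yearly compute spend only; ignores egress, EBS, and other service charges."""
    return nodes * cost_per_node_per_year


# Figures quoted in the comment: ~$17k/year/node on 2-year reserved pricing.
total = annual_compute_cost(20_000, 17_000)
print(f"${total / 1e6:.0f} million/year")  # $340 million/year
```

Swap in your own per-node price and node count to see where the estimate lands for a different fleet.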
I can tell you with certainty that the all-in costs of running Kubernetes on-prem (including staffing) are a lot less than $340 million/year, AND we don't have vendor lock-in. In total we have 7 full-time engineers building and running on-prem Kubernetes. The more nodes you have, the more sense it makes, since team size is mostly independent of node count: that same team of 7 could run 40,000 nodes without issues. The cost becomes dominated by the capex to purchase hardware. I would say team size grows roughly like log(nodes).
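To put a rough number on the log(nodes) guess, here is a small Python sketch. It assumes team size is proportional to the natural log of the node count, anchored at the 7 engineers for 20,000 nodes mentioned above. That model and the function name are illustrative assumptions, not a measurement.

```python
import math


def scaled_team_size(base_team: float, base_nodes: int, target_nodes: int) -> float:
    """Team size if headcount grew like log(nodes), anchored at a known data point."""
    return base_team * math.log(target_nodes) / math.log(base_nodes)


# Anchor: 7 engineers at 20,000 nodes. Doubling the fleet to 40,000 nodes:
estimate = scaled_team_size(7, 20_000, 40_000)
print(f"{estimate:.2f} engineers")  # about 7.5, i.e. well under one extra hire
```

Under that assumption, doubling the fleet adds less than one engineer, which is consistent with the claim that the same team could absorb the growth.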
For a company at the scale of Netflix, I would assume the math is similar, especially since they already have the in-house expertise to run their own hardware. But maybe they get a very steep discount from AWS.
Has something changed in the world that shifts the cloud vs on-prem trade-off calculus now, compared to how it looked over the last 15 years?
(I’m as anti-cloud-overspend as the next guy on HN btw. Just trying to make sense of your comment’s worldview.)