56 points by dalvrosa 5 days ago | 5 comments
  • lukax 3 days ago
    It's refreshing to see a tech article that isn't about AI. It feels like 5 years ago.
  • __turbobrew__ 3 days ago
    Anyone know if Netflix does anything for the k8s storage layer? I imagine they are at the scale where etcd starts to go kaboom? Or maybe they have enough cells where that isn’t a problem?

    Given Amazon and Google have their own secret sauce for replacing etcd, I am wondering if Netflix does anything special?

    • scripni 3 days ago
      This runs on AWS-managed EKS these days. This talk goes into more detail about Netflix's special sauce around the k8s control plane: https://www.youtube.com/watch?v=vaTOiXR2KSM

      Netflix actually has far fewer cells than you'd expect, btw. Their special sauce, IMO, is federation and using a small subset of k8s APIs.

      • __turbobrew__ 2 days ago
        I am surprised a company at that scale is running on managed EKS, maybe I underestimate how large the clusters are.
        • zbentley 2 days ago
          EKS can get pretty damn big, well into the thousands of nodes without much special tuning, and beyond that with some care and control plane monitoring. Expensive, though.
          • __turbobrew__ a day ago
            > Expensive, though.

            That is my point. I work at a large multinational and we run tens of thousands of Kubernetes nodes on-prem, and I'm pretty sure that would cost hundreds of millions of dollars per year to run in EKS. We run on-prem nodes roughly equivalent to c6a.32xlarge, and even with 2-year reserved pricing you are looking at $17k/year/node. At 20,000 nodes that is $340 million/year, not including egress fees or any other AWS service charges (such as EBS).

            I can tell you with certainty that the all-in costs to run Kubernetes on-prem (including staffing costs) are a lot less than $340 million/year AND we don’t have vendor lock-in. In total we have 7 full-time engineers building and running on-prem Kubernetes. The more nodes you have, the more it makes sense, as the team size is mostly independent of the number of nodes, so that team of 7 could also run 40,000 nodes without issues. The cost becomes dominated by the capex to purchase hardware. I would say team size is log(nodes).

            For a company the scale of Netflix, I would assume the math is similar, especially since they already have in-house expertise to run their own hardware, but maybe they get a very steep discount from AWS.
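
            As a rough sketch of the arithmetic in this comment: the $17k/year/node figure and the node counts are the ones quoted here, while the constant and function names are made up for illustration. It covers only the per-node compute estimate, not egress, EBS, staffing, or any negotiated discount.

```python
# Back-of-the-envelope EKS compute cost, using the estimate quoted in the comment.
# COST_PER_NODE_PER_YEAR is the commenter's figure for a c6a.32xlarge-equivalent
# node with 2-year reserved pricing; it is an assumption, not an AWS price quote.
COST_PER_NODE_PER_YEAR = 17_000  # USD


def annual_compute_cost(nodes: int, cost_per_node: int = COST_PER_NODE_PER_YEAR) -> int:
    """Return the yearly compute bill in USD for a fleet of `nodes` nodes."""
    return nodes * cost_per_node


if __name__ == "__main__":
    for nodes in (20_000, 40_000):
        millions = annual_compute_cost(nodes) / 1e6
        print(f"{nodes} nodes: ${millions:.0f} million/year")
```

            The cloud bill in this model grows linearly with node count, which is the contrast being drawn with a platform team whose size stays roughly flat.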

    • stackskipton 3 days ago
      It's possible they are using kine: https://github.com/k3s-io/kine
  • whinvik 3 days ago
    I see Netflix pumping out tech articles but can't help but notice how much worse the UI experience is getting. Video erroring out, general slowness etc.

    Did they just give up?

    • pjmlp 2 days ago
      Easy answer, not the same team.
  • jamesblonde 3 days ago
    It certainly feels like Netflix is now a k8s shop. And it is probably only a matter of time until they start repatriating workloads to optimize for costs. Then the world will sit up and notice.
    • beng-nl 2 days ago
      I don’t get what you’re implying. What is repatriating? Do you think they will move their workloads to on-prem?

      Has something changed in the world that shifts the cloud vs. on-prem trade-off calculus from how it was over the last 15 years?

      (I’m as anti-cloud-overspend as the next guy on hn btw. Just trying to make sense of your comment’s worldview.)

      • jamesblonde 2 days ago
        Yes, coding agents have reduced the skills/knowledge required to operate workloads on virtualized hardware. K8s and its ecosystem have changed so that they now provide 90% of what you need from the public cloud providers. These are big changes that enable 8-15X savings from running your own workloads. I think it will be the big players who move first, as they have the most to save and have the resources to make it happen.
  • scripni 3 days ago
    Congrats, this is awesome!