In the real world, however, bursts can be correlated, due to factors like timeouts/retries and thundering herds.
So the real economics of a load-balanced system is a simple reliability story: being able to reasonably serve the peak traffic, which leads to over-provisioning of those systems.
Using the cloud allows some form of scaling resources up and down, but doesn't completely solve the problem. I think migrating away from synchronous systems towards async systems and letting clients gradually absorb the delays is a better approach (rather than forcing infrastructure to be dynamically scaled up/down and being billed per request-second by your cloud provider).
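To put a rough number on the retry part: a minimal sketch, assuming every attempt fails independently with a fixed probability and clients retry a bounded number of times. The names `expected_attempts` and `offered_load` and the example rates are made up for illustration. Real systems are worse than this, since the failure probability itself tends to rise with load, which is exactly what makes the bursts correlated.

```python
def expected_attempts(p_fail: float, max_retries: int) -> float:
    """Expected attempts per logical request.

    Assumes every attempt fails independently with probability p_fail and
    the client gives up after max_retries retries (illustrative model only).
    """
    # Attempt i (0-based) is only made if the previous i attempts all failed.
    return sum(p_fail**i for i in range(max_retries + 1))


def offered_load(base_rps: float, p_fail: float, max_retries: int) -> float:
    """Requests per second the backend actually sees, retries included."""
    return base_rps * expected_attempts(p_fail, max_retries)


if __name__ == "__main__":
    # Hypothetical baseline of 1000 rps with up to 3 retries per request.
    for p in (0.0, 0.1, 0.5, 0.9):
        print(f"p_fail={p:.1f} -> {offered_load(1000, p, 3):.0f} rps offered")
```

The point is only that the peak you have to provision for is the baseline times an amplification factor, and that factor grows precisely when the system is already struggling.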
See also the gamedev technique of having sacrificial assets or code, so when you need to free up space late in the schedule to ship, you have something you can actually shed.
Very true, as application-layer load-balancing often explicitly pre-bakes the traffic schedule to several hundred distributed IPs for data locality, essentially bypassing the functional need for DNS and local round-robin traffic balancers.
One trades concurrent bandwidth for slightly higher latency, and dynamically adapted capacity as traffic load changes. =3
The global edge networks that I’m aware of all use L4 LBs and L7 LBs. Cloudflare picks anycast over DNS LB, but DNS LB is still widely used.
I don’t see these things changing.
> Of course, this assumes independent events. World Cup, Super Bowl, etc. break these assumptions.
Yes, this is very true. The model here works for Poisson arrivals and exponentially distributed service times (the M/M), which are poor approximations of real-world traffic patterns (which tend to be non-stationary and non-ergodic, and include substantial seasonality). However, the frequency of that seasonality is typically rather low (e.g. daily cycles), and so these stronger assumptions are quite defensible for short time periods.
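For reference, the M/M/c numbers can be computed directly with the textbook Erlang C formula. This is a minimal sketch, not the code behind any particular tool; the function names and the example rates are my own.

```python
from math import factorial


def erlang_c(servers: int, offered: float) -> float:
    """Probability that an arrival has to queue in an M/M/c system.

    offered is the offered load a = lambda / mu (in Erlangs).
    """
    if offered >= servers:
        return 1.0  # unstable: the queue grows without bound
    rho = offered / servers
    top = offered**servers / factorial(servers) / (1.0 - rho)
    bottom = sum(offered**k / factorial(k) for k in range(servers)) + top
    return top / bottom


def mean_wait(servers: int, lam: float, mu: float) -> float:
    """Mean time spent queueing (excluding service) in an M/M/c system."""
    offered = lam / mu
    if offered >= servers:
        return float("inf")
    return erlang_c(servers, offered) / (servers * mu - lam)


if __name__ == "__main__":
    # Same 80% utilization per server, growing pool size (hypothetical rates).
    for c in (1, 2, 4, 8, 16):
        print(c, round(mean_wait(c, lam=0.8 * c, mu=1.0), 4))
```

At fixed per-server utilization the mean wait drops as the pool grows, which is the pooling effect the model predicts under these assumptions.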
A better approach is to do simulation with real traffic patterns, or even with more sophisticated parametric models, and get better answers (e.g. https://stability-sim.systems/). The good news is that kind of simulation is cheaper to do than ever before.
As in between the service and the load balancer? There's already an infinite queue in the load balancer. You can try that out on https://stability-sim.systems/ to see the effect, but the short version is that (in this model) it makes things worse.
If you're saying that the queue in the load balancer should be limited in size to reduce tail latency, then I agree.
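A rough way to see the bounded-queue trade-off is the single-server M/M/1/K model, where at most K requests can be in the system and the rest are rejected. This is only a sketch: one server rather than a pool, mean latency rather than tail latency, and `mm1k` is a name I made up.

```python
def mm1k(lam: float, mu: float, k: int) -> tuple[float, float]:
    """Return (reject_probability, mean_latency_of_accepted_requests).

    M/M/1/K queue: Poisson arrivals at rate lam, exponential service at
    rate mu, at most k requests in the system (queued plus in service).
    """
    rho = lam / mu
    if abs(rho - 1.0) < 1e-12:
        probs = [1.0 / (k + 1)] * (k + 1)
    else:
        # Normalizing constant: 1 + rho + ... + rho**k
        norm = (1.0 - rho ** (k + 1)) / (1.0 - rho)
        probs = [rho**n / norm for n in range(k + 1)]
    reject = probs[k]  # arrivals that find the system full are dropped
    mean_in_system = sum(n * p for n, p in enumerate(probs))
    accepted_rate = lam * (1.0 - reject)
    # Little's law applied to accepted requests only.
    return reject, mean_in_system / accepted_rate


if __name__ == "__main__":
    # Hypothetical 20% overload: an unbounded queue would grow forever.
    for k in (1, 5, 20, 50):
        reject, latency = mm1k(lam=1.2, mu=1.0, k=k)
        print(f"K={k:3d} reject={reject:.3f} mean latency={latency:.2f}")
```

Under overload a small K keeps latency for accepted requests bounded at the cost of rejecting more traffic, while a large K rejects little less but lets latency grow roughly in proportion to K.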
Still, queuing theory is so cool.
I think that the issue is in part due to the variables. Plotting the mean request time is less intuitive than plotting throughput.
If you plot throughput vs number of servers, it'll be a straight line. And if you asked people, I think most would guess a straight line. But who knows!