Anyone offering $2 GPUs is either losing money on DC space/power, or running something so sketchy under the covers that they do their best to hide it. It's one thing to play around with $2 GPUs and another to run a business on them. If you're doing the latter, you're not considering how much you're risking your business on unreliable compute.
AWS really warped people's perception of what it takes to run high-end enterprise GPU infrastructure like this. People got used to the reliability hyperscalers offer. They don't consider what 99.9999% uptime + 45kW+ rack infrastructure truly costs.
There is absolutely no way anyone is going to be making any money offering $2 H100s unless they stole them and they get free space/power...
Assuming you mean 99.9999%: your hyperscaler isn't giving you that either. MTBF is comparable.
It's hardware at the end of the day; the VM hypervisor isn't buying you anything on GPU instances, because GPU instances can't be live-migrated (even normal VMs are really tricky to live-migrate).
In a country with a decent power grid and a UPS (or if you use a colo provider), a single machine is going to give you the same availability guarantee, maybe even slightly higher because there are fewer moving parts.
I think this "cloud is god" mentality betrays the fact that server hardware is actually hugely reliable once it's working; and the cloud model literally depends on this fact. The reliability of cloud is simply the reliability of hardware; they only provided an abstraction on management not on reliability.
Our industry has really lost sight of reality and the goals we're trying to achieve.
Sufficient scalability, sufficient performance, and as much developer productivity as we can manage given the other two constraints.
That is the goal, not a bunch of cargo-culty complex infra. If you can achieve it with a single machine, fucking do it.
A monolith-ish app, running on e.g. an Epyc with 192 cores and a couple TB of RAM?! Are you kidding me? That is so much computing power that for a lot of scenarios it can replace giant chunks of complex cloud infrastructure.
And for something approaching a majority of businesses it can probably replace all of it.
(Yes, I know you need at least one other "big honkin server", located elsewhere, for failover. And yes, this doesn't work for all sets of requirements, etc)
I manage an infrastructure with tens of thousands of VMs. Everyone is obsessed with auto-scaling, clustering, and every other thing the vendor sales dept shoved down their throats, while simultaneously failing to realize that they could spend <5% of what we currently do by just using the datacenter cages we _already have_ and a big fat rack of 2S 9754 1U servers.
The kicker? These VMs are never more than 8 cores apiece, and applications never scale to more than 3 or 4 in a set, with sub-40% CPU utilization each. Most arguments against cloud abuse like this get ignored because VPs see Microsoft (Azure in this case) as some holy grail for everything, and I frankly don't have it in me to keep fighting application dev teams that don't know anything about server admin.
And that's without getting into absolutely asinine price/perf SaaS offerings like Cosmos DB.
Once management has been convinced by salespeople or consultants, any technical argument can be brushed away as not seeing the strategic big picture of managing enterprise infrastructure.
I worked for a payments company (think credit cards). We designed the system to maintain very high availability in the payment flow: multi-region, multi-AZ in AWS. But all the other flows, such as user registration, customer care, or even bill settlement, had to stop during that one incident when our main datacenter lost power after a switchover test. The outage lasted for three hours and it happened exactly once in five years.
In that specific case, investing in higher availability by architecting in more redundancy would not have been worth it. We had more downtime caused by bad code and poorly thought-out deployments. But that risk equation will be different for everyone.
This isn't really true. I mean it's true in the sense that you could get the same reliability on-premise given a couple decades of engineer hours, but the vast majority of on-premise deployments I have seen have significantly lower reliability than clouds and have few plans to build out those capabilities.
E.g. if I exclude public cloud operator employers, I've never worked for a company that could mimic an AZ failover on-prem, and I've worked for a couple of F500s. As far as I can recall, none of them had even segmented their network beyond the management plane having its own hardware. The rest of the DC network was centralized; I recall one of them in particular because an STP loop screwed up half of it at one point.
Part of paying for the cloud is centralizing the costs of thinking up and implementing platform-level reliability features. Some of those things are enormously expensive and not really practical for smaller economies of scale.
Just one random example is tracking hardware-level points of failure and exposing that to the scheduler. E.g. if a particular datacenter has 4 supplies from mains and each rack is only connected to a single one of those supplies, when I schedule 4 jobs to run there it will try to put each job in a rack with a separate power supply to minimize the impact of losing a mains. Ditto with network, storage, fire suppression, generators, etc, etc, etc.
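For the curious, a minimal sketch of what that kind of failure-domain-aware placement looks like (an illustration only, not any provider's actual scheduler; the rack-to-feed map and job names are made up):

    # Hypothetical sketch: spread jobs across racks hanging off different mains feeds.
    rack_to_feed = {"r1": "A", "r2": "A", "r3": "B", "r4": "C", "r5": "D"}

    def place(jobs, rack_to_feed):
        placement, feeds_used = {}, set()
        for job in jobs:
            free = [r for r in sorted(rack_to_feed) if r not in placement.values()]
            # prefer a rack on a mains feed we haven't used yet
            fresh = [r for r in free if rack_to_feed[r] not in feeds_used]
            rack = (fresh or free)[0]
            placement[job] = rack
            feeds_used.add(rack_to_feed[rack])
        return placement

    print(place(["job1", "job2", "job3", "job4"], rack_to_feed))
    # each job lands behind a different mains feed where possible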
That kind of thing makes 0 economic sense for an individual company to implement, but it starts to make a lot of sense for a company who does basically nothing other than manage hardware failures.
Some of the cloud providers don't even do live migration. They adhere to the cloud mantra of "oh well, it's up to the customer to spin up and carry on elsewhere".
I have it on good authority that some of them don't even take A+B feeds to their DC suites - and then have the chutzpah to shout at the DC provider when their only feed goes down, but that's another story... :)
Yeah, we've already had about a day's worth of downtime this year on Office 365, and Microsoft is definitely a hyperscaler. So that's 99.3% at best.
This is not the first time that "low yield" karma comments have seen sporadic changes to their votes.
It seems unlikely at the rate of change (roughly 3-5 point changes per hour) that two people would simultaneously (within a minute) have the same desire to flag a comment, so I can only speculate that:
A) Some people's flag is worth -2
B) Some people, passionate about this topic, have multiple accounts
C) There are bots that try to remain undetected by making only small adjustments to the conversation periodically.
I'm aware that some people's jobs depend very strongly on the cloud, but nothing I said could be considered off-topic or controversial: cloud GPU compute relies on hardware reliability just like everything else does. This is fact. Regardless, the voting behaviour on comments of mine such as this is extremely suspicious.
At the highest power settings, H100s consume 400 W. Add another 200 W for CPU/RAM. Assume you have an incredibly inefficient cooling system, so you also need 600 W of cooling.
Google tells me US energy prices average around 17 cents/kWh - even if you don't locate your data centre somewhere with cheap electricity.
17 cents/kWh * 1200 watts * 1 hour is only 20.4 cents/hour.
Factor in space, networking, cooling, security, etc., and $2 really doesn't seem undoable.
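For anyone who wants to redo the arithmetic, here's the same worst-case estimate as a tiny script (all figures are the assumptions above, not measured data):

    # Restating the worst-case power math above.
    gpu_w, host_w, cooling_w = 400, 200, 600        # watts
    price_per_kwh = 0.17                            # USD, rough US average
    cost_per_hour = (gpu_w + host_w + cooling_w) / 1000 * price_per_kwh
    print(f"power: ${cost_per_hour:.3f}/hour")               # ~ $0.204/hour
    print(f"left over out of $2: ${2 - cost_per_hour:.2f}")  # ~ $1.80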
Depending on how fast their value depreciates, selling them might recoup more money than renting them out, and it avoids being exposed to 3 years of various risks.
Selling now at a 40% loss gets you back the equivalent of 60c/h over three years, without incurring the other costs (DC, power, network, security) and risks.
Imagine I own a factory, and I've just spent $50k on a widget-making machine. The machine has a useful life of 25,000 widgets.
In addition to the cost of the machine, each widget needs $0.20 of raw materials and operator time. So $5k over the life of the machine - if I choose to run the machine.
But it turns out the widget-making machine was a bad investment. The market price of widgets is now only $2.
If I throw the machine in the trash on day 1 without having produced a single widget, I've spent $50k and earned $0 so I've lost $50k.
If I buy $5k of raw materials and produce 25k widgets which sell for $50k, I've spent $55k and earned $50k so I've lost $5k. It's still a loss, sure, but a much smaller one.
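Spelled out as a sunk-cost comparison (numbers straight from the example above):

    machine_cost = 50_000              # sunk either way
    widgets, unit_cost, price = 25_000, 0.20, 2.00
    scrap_on_day_one = -machine_cost                          # -$50,000
    run_it = (price - unit_cost) * widgets - machine_cost     # -$5,000
    print(scrap_on_day_one, run_it)
    # Running the machine still loses money, just $45k less, because only the
    # $0.20 marginal cost matters once the $50k is sunk.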
Rackspace. Networking. Physical safety. Physical security. Sales staff. Support staff. Legal. Finance. HR. Support staff for those folks.
That’s just off the top of my head. Sitting down for a couple of days at the very least, like a business should, would likely reveal significant further costs that $2 won’t cover.
The market doesn't care how much you're losing, it will set a price and it's up to you to take it, or leave it.
There are very few data centers left that can do 45kW+ rack density, which translates to 32 H100/MI300x GPUs in a rack.
In most datacenters you're looking at 1 or 2 boxes of 8 GPUs per rack. As a result, it isn't just the price of power; it's whatever the data center wants to charge you.
Then you factor in cooling on top of that...
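For a rough sense of where the 45 kW figure comes from (the ~10 kW per 8-GPU box is a typical vendor max spec, give or take, not something stated above):

    kw_per_8gpu_box = 10.2        # assumption: HGX/DGX-class H100 box, max spec
    boxes_per_rack = 4            # 4 x 8 = 32 GPUs
    print(kw_per_8gpu_box * boxes_per_rack)   # ~40.8 kW before switches etc.
    # Most facilities top out at 1-2 such boxes per rack, i.e. 10-20 kW racks.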
Well yes, because for GPU datacentres fixed/capital costs make up a much higher fraction of the total than power and other expenses do for CPUs, to such an extent that power usage barely even matters. A $20k GPU that uses 1 kW (which is way more than it would in reality) 24x7 would cost about $1.3k per year to run at $0.15 per kWh; that's almost insignificant compared to depreciation.
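Rough numbers behind that (the 3-year straight-line depreciation is an illustrative assumption; the rest are the figures in the comment above):

    gpu_price = 20_000
    power_kw, price_per_kwh = 1.0, 0.15
    power_per_year = power_kw * price_per_kwh * 24 * 365   # ~$1,314
    depreciation_per_year = gpu_price / 3                  # ~$6,667
    print(power_per_year, depreciation_per_year)
    # Even with exaggerated power draw, electricity is ~1/5 of depreciation.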
The premise is that nobody could make any money by renting H100s for $2 even if they got them for free, unless they also had free power. That makes no sense whatsoever when you can get 2x AMD EPYC™ 9454P servers at 2x408 W (for the full system) for around $0.70/hour in a German data center.
Launching any multitenant system is HARD. Many of them are held together with bubble gum and good intentions….
That’s essentially what the OP says. But once you’ve already invested in the H100s you’re still better off renting them out for $2 per hour rather than having them idle at $0 per hour.
For datacentre GPUs the energy, infrastructure and other variable costs seem to be relatively insignificant to fixed capital costs. Nvidia's GPUs are just extremely expensive relative to how much power they use (compared to CPUs).
> H100s you’re still better off renting them out for $2 per hour rather than having them idle at $0 per hour.
If you're barely breaking even at $2, then immediately selling them would seem like the only sensible option (depreciation alone is significantly higher than the cost of power for running an H100 24x365 at 100% utilization).
If you can then probably yes. But why would someone else buy them (at the price you want), when they can rent at $2 per hour instead?
I've said it before and I'll say it again...
Read the cloud provider small-print before you go around boasting about how great their SLAs are.
Most of the time they are not worth the paper they are written on.
https://azure.microsoft.com/files/Features/Reliability/Azure...
If you read the article, such prices happen because a lot of companies bought hardware reservations for the next few years. Instead of keeping the hardware idle (since they pay for it anyway), they rent it out on the cheap to recoup something.
This company TensorWave covered by TechCrunch [0] this week sounds very similar, I almost thought it was the same! Anyway, best of luck, we need more AMD GPU compute.
[0] https://techcrunch.com/2024/10/08/tensorwave-claims-its-amd-...
What do you mean by "risking your business on unreliable compute"? Is there a reason not to use one of these to train whatever neural nets one's business needs?
In rural or low-population areas it takes forever for fiber to roll out, and if you're selling access to your hardware infrastructure, you really want a direct connection to the nearest IX so you can offer customers the best speed for accessing data; the IX is probably one of the few places where you might be able to get 400G or higher direct fiber. But if you're hooking up to an IX, chances are you're not an end user but an autonomous system, already moving money around and signing NDAs to peer with other autonomous systems in the exchange and make BGP announcements.
(Source: my old high-school networking class, where I got sick of my shitty internet and looked into how I could get fiber from an exchange. I'm probably mistaken on some of this; it was years ago, and it's either wrong or outdated by now.)
In NW Washington state at least, the rural counties (Whatcom, Island, Skagit, etc) have had a robust market in dark fiber for over two decades.
The normal telcos weren't responsive to need, so private carriers picked up the slack. When I was last involved in this market, you could get a P2P strand, including reasonable buildout, for less than the cost of a T1 line with a two-year commit.
The tiny four-branch credit union I worked for had dedicated fiber loops between all our locations, no big deal. It was great.
Surely someone in the trillion dollar datacenter industry can figure out a way to take waste heat and use it in a profitable way, right?
I know that's basically impossible to answer generically, especially given that the recurring cost is likely already zero, since the GPUs are already paid for...
TL;DR: VC money is being burnt/lost.
I think that's the point. Trying to buy and run H100s now either for yourself or for someone else to rent it is a terrible investment because of oversupply.
And prices you can get for compute are not enough to cover the costs.
If you could get hold of H100s and had an operational data center, you essentially had the keys to an infinite money printer at anything above $3.50/hr.
Of course, because we live in a world of efficient markets, that was never going to last forever. But they are still profitable at $2.00, assuming they have cheap electricity/infra/labor.
But yes, the sfcompute home page is now quoting $0.95/hr average. Wild.
If a store advertised $0.50 burgers, but when you visit, they say they're not for sale, wouldn't you consider that a scam?
We don't have a limited number of slots!
We just go down a lot. It's VERY beta at the moment; we literally take the whole thing down about once a week. So if we know of some major problem, or we're down, we just don't let people on (since they'll have a bad experience).
You're right though that the prices are probably lower because of this. That's why we have a thing on our website that says "*Prices are from the sfcompute private beta and don’t represent normal market conditions."
If you'd like on anyway, I can let you on, just email me at evan at sfcompute, but it may literally break!
Also, I'm really impressed at how great your replies about your product are! You're a gem.
Yup, shall do!
> Also, I'm really impressed at how great your replies about your product are! You're a gem.
Thank you! :D
It's really not that hard to validate this claim; you can just rent for 4 hours at $1.50, which comes in under $50.
Also like I said, they are *not* the only one, shop around
I think you're right about the small private beta resulting in relatively low demand. But it's also a different value prop. If you need a large cluster for a reasonable period of time, you're not paying $1/hr. But if you can use the remnants of someone who contracted for a large allocation, but doesn't need part of it, they can offer it into the market and recoup what would otherwise just be wasted hours.
Currently they have some issues around stability, and spin up times are longer than ideal (ca. 15 min), but the team is super responsive and all of these are likely to be resolved in the near future. (No affiliation, just happy users rooting for the sfcompute team).
With promotional pricing it can be $0 for qualified customers.
Note also, how the author shows screenshots for invites for private alpha access. It can be mutually beneficial for the data center to provide discounted alpha testing access. The developer gets discounted access, the data center gets free/realistic alpha testing workflows.
Now it's public: SFCompute lists it on their main page - https://sfcompute.com/
And they are *not* the only one
The article says $2, which is quite consistent for a small cluster.
If you look at lambda one click clusters they state $4.49/H100/hr
In general, the $2 GPUs come with a catch: a PE/VC-backed venture losing money, long contracts, huge quantities, PCIe-only, slow (<400G) networking, or some other limitation, like unreliable uptime from some bitcoin miner that decided to pivot into the GPU space and has zero experience running these more complicated systems.
Basically, all the things that if you decide to build and risk your business on these sorts of providers, you "get what you pay for".
We're not getting Folding@Home-style distributed training any time soon, are we?
If it's big enough for foundation model training from scratch, it's worth ~$3+/hr. Otherwise the price drops hard.
Problem is "big enough" is a moving goal post now, what was big, becomes small
Of course it would still cost a lot to do... but if the difference is $2/hr vs $4.49/hr, then there's some size at which it makes sense.
It's a riskier move than just taxing the excess compute now and printing money on the margins from the bag holders.
Bag holders, do not want to be shouting to the world they are bag holders.
At best 100, and this number will go down as many fail to make money. Even 100 traditional software development companies would have a very low success rate, and here we're talking about products that themselves work probabilistically all the way down.
So yea, it would be rough
Has anyone actually trained a model actually worth all this money? Even OpenAI is struggling to staunch the outflow of cash. Even if you can get a profitable model (for what?), how many billion-dollar models does the world support? And everyone is throwing money into the pit and just hoping that there's no technical advance that obsoletes everything from under them, or commoditisation leading to a "good enough" competitor that does it cheaper.
I mean, I get that everyone and/or their investors have FOMO about not being the ones holding the AGI demigod at the end of the day. But from a distance it mostly looks like a huge speculative cash bonfire.
I would say Meta has (though not a startup) justified the expenditure.
By freely releasing Llama they undercut a huge swath of competition that can get funded during the hype. Then when the hype dies they can pick up what the real size of the market is, with much better margins than if there were a competitive market. Watch as one day they stop releasing free versions and start rent-seeking on version N+1.
And even if AI as we know it today is still relevant and useful in that future, and the marginal value per training-dollar stays (becomes?) positive, will they be able to defend that position against lesser, cheaper, but more agile AIs? And what would the position even be, such that Llama2030 or whatever is worth that much?
Like, I know that The Market says the expected payoff is there, but what is it?
Ironically, by supporting the LLM community with free compute-intense models, they’re decreasing demand (and price) for the compute.
I suspect they’ll never directly monetize LLAMA as a public service.
I imagine admongers like Meta and Google have data that shows they are right to think they have a winning ticket in their AI behemoths, but if my YouTube could present any less relevant ads to me, I'd be actually impressed. They're intrusive, but actually they're so irrelevant that I can't even be bothered to block them, because I'm not going to start online gambling or order takeaways.
There’s a lot more that goes into the ad space than just picking which ad to show you, and it obviously depends on who wants to reach you. For example, probabilistic attribution is an important component in confirming that you actually got the ad and took the action across multiple systems.
Also, since you mentioned it, TV ads tend to be less targeted because they’re not direct-action ads. Direct action ads exist in a medium where you can interact with the ad immediately. Those ads are targeted to you more, because they’re about getting you to click immediately.
TV ads are more about brand recognition or awareness. It’s about understanding the demographic who watches the show, and showing general ads to that group. Throw a little tracking in there for good measure, but it’s generally about reaching a large group of people with a common message.
All that said, I am an enthusiastic paying customer of YouTube Premium and Music, Colab (I love Colab), and sometimes GCP. For many years I have happily told Google my music and YouTube preferences for content. I like to ask myself what I am getting for giving up privacy in a hopefully targeted and controlled way.
For other people that that sentence didn't make sense for at first glance: "by supporting the LLM community with free compute-intense models [to run on their own hardware] they’re decreasing demand (and price) for the compute [server supply]."
They’re decreasing demand for expensive GPUs that would be required to train a model. Fine-tuning and inference are less compute intense, so overall demand for top-end GPU performance is decreased even if inference compute demand is increased.
Basically, why train an LLM from scratch, and spend millions on GPUs, when you can fine tune LLAMA and spend hundreds instead.
It will be primarily gas, maybe some coal. The nuclear thing is largely a fantasy; the lead time on a brand new nuclear plant is realistically a decade, and it is implausible that the bubble will survive that long.
Without the squeeze there'd be a risk of some AI company getting enough cash to buy out Facebook just for the user data. If you want to keep the status quo, it's good to undercut in the cradle anyone who could eventually take over your business.
So it might cost Meta a pretty penny, but it's a mitigation for an existential risk.
If you've climbed to the top of the wealth-and-influence ladder, you should spend all you can to kick the ladder away. It's always going to be worth it, unless you still fall because it wasn't enough.
Don't underestimate the power of the ego...
Look at their bonfire, we need one like that but bigger and hotter
https://nitter.poast.org/edzitron/status/1841529117533208936
Cries in sadness that my university lab was unable to buy compute from 2020 onward, when all the interesting research in AI was taking off. Now that AI is finally going into winter, compute will be cheap again.
Their research was obsolete before they were halfway through.
Usually some PhD students get depressed, but these four had awful timing. Their professors were stuck on 3-10 year grants doing things like BERT fine-tuning, convolutions, or basic AI work - stuff that, as soon as GPT-3 came out, was clearly obsolete, but nobody could admit that and lose the grants. In other cases, their work had value but drew less attention than it should have, because all attention went to GPT-3, or people assumed it was just some wrapper technology.
The nature of academia and its incentive system caused this; academia is a cruise ship that is hard to turn. If the lighthouse light of attention moves off your ship and onto another, fancier ship, your only bet is the lifeboats (industry), or hoping the light and your ship intersect again.
The professors have largely decided to steer either right into Generative AI and using the larger models (which they could never feasibly train themselves) for research, or gone even deeper into basic AI.
The problem? The research grants are all about LLMs, not basic AI.
So basically a slew of researchers willing and able to take on basic AI research are leaving the field now. As many are entering as usual, of course, but largely on the LLM bandwagon.
That may be fine. The history of AI winters suggests putting all the chips on the same game like this is folly.
I recall the journals of the 90s and 2000s (my time in universities came after they were published, but I read them): the distribution of AI research was broad. Some GOFAI, some neural nets, many papers about filters or visual scene detection, etc. Today it's largely LLM or LM papers. There isn't much of a "counterweight underdog" playing the role neural networks played in the 90s/00s.
At the same time, for people working in the fields you mention, double-check the proportion of research money going into companies vs institutions. While it's true that things like TortoiseTTS[1] were an individual effort, that kind of thing is now a massive exception. Instead, companies like OpenAI/Google literally have 1000+ researchers each developing the cutting edge in about 5 fields. Universities have barely any chance.
This is how the DARPA AI winter went, to my understanding (I listened to one of the few people who "survived via hibernation" during my undergraduate): over-promising, central focus on one technology, then company development of projects, government involvement, disappointment, cancellation.
Those same industry companies are GPU rich too, unlike most of academia (though Christopher Manning claims that Princeton has lots of GPUs even though Stanford doesn't!)
In The Prize: The Epic Quest for Oil, Money & Power, Daniel Yergin explains the boom-and-bust cycle in the oil industry as a recurring pattern driven by shifts in supply and demand. Key elements include:
1. Boom Phase: High oil prices and increased demand encourage significant investment in exploration and production. This leads to a surge in oil output, as companies seek to capitalize on the favorable market.
2. Oversupply: As more oil floods the market, supply eventually exceeds demand, causing prices to fall. This oversupply is exacerbated by the long lead times required for oil development, meaning that new oil from earlier investments continues to come online even as demand weakens.
3. Bust Phase: Falling prices result in lower revenues for oil producers, leading to cuts in exploration, production, and jobs. Smaller or higher-cost producers may go bankrupt, and oil-dependent economies suffer from reduced income. Investment in new production declines during this phase.
4. Correction and Recovery: Eventually, the cutbacks in production lead to reduced supply, which helps stabilize or raise prices as demand catches up. This sets the stage for a new boom phase, and the cycle repeats.
Yergin highlights how this cycle has shaped the global oil industry over time, driven by technological advances, geopolitical events, and market forces, while creating periods of both rapid growth and sharp decline.
As open source models improve, OpenAI needs to keep improving their models to stay ahead of them. Over time though, if it hasn't already happened, the advantages of OpenAI will not matter to most. Will OpenAI be forced to bleed money training? What does it mean for them over the next few years?
Who did?
(This is not financial advice.)
Yes, H100s are getting cheaper, but I can see the cheap price drawing in a wave of fine-tuning interest, which will result in more GPU demand for both training and inferencing. Then there's the ever-present need for bigger data centers for foundational model training, which the article described as completely separate from public auction prices of H100s.
I don’t think the world has more GPU compute than it knows what to do with. I think it’s still the opposite. We don’t have enough compute. And when we do, it will simply drive a cycle of more GPU compute demand.
They contacted (and we spoke with) several of the largest partners they had, including education/research institutions and some private firms, and could not find ANYONE that could accommodate our needs.
AWS also did not have the capacity, at least for spot instances since that was the only way we could have afforded it.
We ended up rolling our own solution with (more but lower-end) GPUs we sourced ourselves that actually came out cheaper than renting a dozen "big iron" boxes for six months.
It sounds like currently that capacity might actually be available now, but at the time we could not afford to wait another year to start the job.
I think we're splitting hairs here, it was more about choosing a good combination of least effort, time and money involved. When you're spending that amount of money, things are not so black and white... rented H100s get the job done faster and easier than whatever we can piece together ourselves. L40 (cheaper but no FP64) was also brand new at the time. Also our code was custom OpenCL and could have taken advantage of FP64 to go faster if we had the devices for it.
Not convinced anything has burst yet. Or will for that matter. The hype may be bubble like but clearly we will need a lot of compute.
Another question: what is the maximum size of model I can fine-tune on 1 H100?
I assume that anyone doing good work in the AI space is being "sniffed" on, and if not, then the relevant "sniffers" are failing to do their jobs!
* walks past gnabgib's desk
"Good morning!"
"Who are you talking to? Me? You haven't specified who you're interacting with. Which morning? Today? What metric are you measuring by good? This is too confusing for me."
The original article title is:
> $2 H100s: How the GPU Bubble Burst
Either AI is super dead, or a new alien GPU rained from the sky
I think AI is not gonna die even in its current stochastic-parrot incarnation. It is a useful tool for some tasks and, albeit not as transformative as some CEOs claim, I believe it's gonna stay.
At most I believe we will enter another AI winter until there's the next algorithmic breakthrough.
There's also cheaper NVIDIA L40/L40S if you don't need FP64.
Here's one for ~$18 inc shipping with 6GB GDDR5:
https://www.ebay.com/itm/Nvidia-Tesla-K20X-6GB-90Y2351-C7S15...
That said, could I see them being offloaded in bulk for pennies on the dollar if the (presumed) AI bubble pops? Quite possibly, if it collapses into a black hole of misery and bad investments. In that case, it’s entirely plausible that some enterprising homelabs could snatch one up for a few grand and experiment with model training on top-shelf (if a generation old) kit. The SXMs are going for ~$26-$40k already, which is cheaper than the (worse performing) H100 Add-In Card when brand new; that’s not the pricing you’d expect from a “red hot” marketplace unless some folk are already cutting their losses and exiting positions.
Regardless, interesting times ahead. We either get AI replacing workers en masse, or a bust of the tech industry not seen since the dot-com bubble. Either way, it feels like we all lose.
https://www.reddit.com/r/nvidia/comments/1fw68rl/retiring_a_...
It's a little bit slower, but while I wait for the text to generate I have another cup of coffee.
Sometimes I even prompt myself to generate some text while I'm waiting.
My cat loved it, too. She'd lay on my desk right behind my computer and get blasted by the heat.
I was in an apartment that used resistive heat, so the crypto I mined was effectively free since energy consumed by my GPU meant using the heater less.
Is the AI infra bubble already bursting?
With cheap compute for everyone to finetune :)
- sincerely, all of the cloud providers
At $2/hr, that's 2.8 years to RoI. And that's just for the GPU and not the other hardware you'll need to plug it into, and doesn't include the power, and also assumes you're using it 100% of the time. Really, you're probably looking at 3.5+ years to RoI.
I'd rather rent than buy in that scenario.
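For reference, the payback arithmetic behind those numbers (the ~$49k all-in price is inferred from the 2.8-year figure, and the 80% utilization is just an example):

    def years_to_payback(purchase_price, rate_per_hour=2.0, utilization=1.0):
        return purchase_price / (rate_per_hour * 24 * 365 * utilization)

    print(years_to_payback(49_000))                    # ~2.8 years at 100% util
    print(years_to_payback(49_000, utilization=0.8))   # ~3.5 years at 80% util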
Only the ones who can give you below MSRP essentially
What is the expected hardware operation lifespan in hours of this system?
How much would the hardware cost have to drop for the economic of $2/hour to work?
Better question: what support contract does the provider have with their manufacturers? For example, we buy Dell pro support 3 year next business day contracts on all of our gear.
But reality is not 100% utilization, so I would argue you need at least a 25% or even 50% drop in the H100 price (approx. $50k each, after factoring in other overheads).
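A quick sensitivity check on that guess (the 80% utilization and the rough 3-year payback framing are illustrative assumptions, not from the thread):

    all_in_price = 50_000                       # H100 plus overheads, per above
    revenue_per_year = 2.0 * 24 * 365 * 0.8     # $2/hr at 80% utilization
    for drop in (0.0, 0.25, 0.5):
        years = all_in_price * (1 - drop) / revenue_per_year
        print(f"{drop:.0%} price drop -> {years:.1f} years to pay back")
    # 0% -> ~3.6y, 25% -> ~2.7y, 50% -> ~1.8y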
Predicting the future is very difficult, especially in an unprecedented revolution like this. As Nobel Prize winner Parisi said: "No matter how hard you try to predict the future, the future will surprise you"