$2 H100s: How the GPU Rental Bubble Burst(www.latent.space)

403 pointsby swyx9 months ago27 comments

latchkey9 months ago
I am building a bare metal mi300x service provider business.
Anyone offering $2 GPUs is either losing money on DC space/power, or their service is so sketchy under the covers, which they do their best to hide. It is one thing to play around with $2 gpus and another to run a business. If you're trying to do the latter, you're not considering how you are risking your business on unreliable compute.
AWS really twerked people's perception of what it takes to run high end enterprise GPU infrastructure like this. People got used to the reliability hyperscaler offers. They don't consider what 999999% uptime + 45kW+ rack infrastructure truly costs.
There is absolutely no way anyone is going to be making any money offering $2 H100s unless they stole them and they get free space/power...
- dijit9 months ago
  > 999999% uptime
  Assuming you mean 99.9999%; your hyperscaler isn't giving you that. MTBF is comparable.
  It's hardware at the end of the day, the VM hypervisor isn't giving you anything on GPU instances because those GPU instances aren't possible to live-migrate. (even normal VMs are really tricky).
  In a country with a decent power grid and a UPS (or if you use a colo-provider) you're going to get the same availability guarantee of a machine, maybe even slightly higher because less moving parts.
  I think this "cloud is god" mentality betrays the fact that server hardware is actually hugely reliable once it's working; and the cloud model literally depends on this fact. The reliability of cloud is simply the reliability of hardware; they only provided an abstraction on management not on reliability.
  - llm_trw9 months ago
    I think people just don't realize how big computers have gotten since 2006. A t2.micro was an ok desktop computer back then. Today you can have something 1000 times as big for a few tens of thousands. You can easily run a company that serves the whole of the US out of a closet.
    JohnBooty9 months ago
    It's just wild to me how seemingly nobody is exploiting this.
    Our industry has really lost sight of reality and the goals we're trying to achieve.
    Sufficient scalability, sufficient performance, and as much developer productivity as we can manage given the other two constraints.
    That is the goal, not a bunch of cargo-culty complex infra. If you can achieve it with a single machine, fucking do it.
    A monolith-ish app, running on e.g. an Epyc with 192 cores and a couple TB of RAM???? Are you kidding me? That is so much computing power, to the point where for a lot of scenarios it can replace giant chunks of complex cloud infrastructure.
    And for something approaching a majority of businesses it can probably replace all of it.
    (Yes, I know you need at least one other "big honkin server", located elsewhere, for failover. And yes, this doesn't work for all sets of requirements, etc)
    uid655349 months ago
    I feel this every day I talk with cloud-brained coworkers.
    I manage an infrastructure with tens of thousands of VMs and everyone is obsessed with auto scaling and clustering and every other thing the vendor sales dept shoved down their throats while simultaneously failing to realize that they could spend <5% of what we currently do and just use the datacenter cages we _already have_ and a big fat rack of 2S 9754 1U servers.
    The kicker? These VMs are never more than 8 cores a piece, and applications never scale to more than 3 or 4 in a set. With sub 40% CPU utilization each. Most arguments against cloud abuse like this get ignored because VPs see Microsoft (Azure in this case) as some holy grail for everything and I frankly don't have it in me to keep fighting application dev teams that don't know anything about server admin.
    And that's without getting into absolutely asinine price/perf SaaS offerings like Cosmos DB.
    JohnBooty9 months ago
    I'm going to borrow the term "cloud-brained."
    _3u109 months ago
    Servers that big rarely fail anyway, everything is hotswap and redundant.
    latchkey9 months ago
    The GPUs/OAM,UBB are not redundant and they do fail.
    From what I hear, Nvidia has an exceptionally high failure rate.
    geodel9 months ago
    Well the problem nowadays is what can be done has become what must be done. totally bypassing on question of what should be done. So now instead of single service serving 5 million requests in a business is replaced by 20 micro services generating traffic of 150 million requests with distributed transactions, logging (MBs of log per request), monitoring, metrics and so on. All leading to massive infrastructure bloat. Do it for dozen more applications and future is cloudy now.
    Once management is convinced by sales people or consultants any technical argument can be brushed away as not seeing the strategic big picture of managing enterprise infrastructure.
    dartos9 months ago
    Well you’d probably also at least want a cdn in each region, so like 3 closets.
    jgalt2129 months ago
    Cloudflare caching of static resources is cheap, so back to one closet. But three if you want to be pure and totally cloudless.
    felixgallo9 months ago
    With one closet you can also lose the entire business if one water pipe breaks or one wire goes bad in drywall. Back to three closets.
    llm_trw9 months ago
    If only it were possible to make backups. Alas no such technology exists.
    LoganDark9 months ago
    Yes. That's what the other closets are for. Redundancy.
    dartos9 months ago
    backups do not prevent downtime.
    krab9 months ago
    Yes. That's a risk assessment every company must make. What's the probability of downtime vs the development slowdown and the operating costs of a fully redundant infrastructure?
    I worked for a payments company (think credit cards). We designed the system to maintain very high availability in the payment flow. Multi-region, multi-AZ in AWS. But all other flows such as user registration, customer care or even bill settlement had to stop during that one incident when our main datacenter lost power after a testing switch. The outage lasted for three hours and it happened exactly once in five years.
    In that specific case, investing into higher availability by architecting in more redundancy would not be worth it. We had more downtime caused by bad code and not well thought out deployments. But that risk equation will be different for everyone.
    llm_trw9 months ago
    The 2018 server in my garage has had a better uptime than aws in the last 6 years.
    packetlost9 months ago
    Very few businesses are living and breathing by their system uptime. Sure, it's bad, but having a recovery plan and good backups (or modest multi-site redundancy, if you're really worried) is sufficient for most.
    dartos9 months ago
    Let’s call it a day and just go for a single colo.
  - zaptrem9 months ago
    As someone who has done a bunch of large scale ML on hyperscaler hardware I will say the uptime is nowhere near 99.9999%. Given a cluster of only a few hundred GPUs one or multiple failures is a near certainty to the point where we spend a bunch of time on recovery time optimization.
  - everforward9 months ago
    > The reliability of cloud is simply the reliability of hardware; they only provided an abstraction on management not on reliability.
    This isn't really true. I mean it's true in the sense that you could get the same reliability on-premise given a couple decades of engineer hours, but the vast majority of on-premise deployments I have seen have significantly lower reliability than clouds and have few plans to build out those capabilities.
    E.g. if I exclude public cloud operator employers, I've never worked for a company that could mimick an AZ failover on-prem and I've worked for a couple F500s. As far as I can recall, none of them have even segmented their network beyond the management plane having its own hardware. The rest of the DC network was centralized; I recall one of them in specific because an STP loop screwed up half of it at one point.
    Part of paying for the cloud is centralizing the costs of thinking up and implementing platform-level reliability features. Some of those things are enormously expensive and not really practical for smaller economies of scale.
    Just one random example is tracking hardware-level points of failure and exposing that to the scheduler. E.g. if a particular datacenter has 4 supplies from mains and each rack is only connected to a single one of those supplies, when I schedule 4 jobs to run there it will try to put each job in a rack with a separate power supply to minimize the impact of losing a mains. Ditto with network, storage, fire suppression, generators, etc, etc, etc.
    That kind of thing makes 0 economic sense for an individual company to implement, but it starts to make a lot of sense for a company who does basically nothing other than manage hardware failures.
  - traceroute669 months ago
    > instances aren't possible to live-migrate
    Some of the cloud providers don't even do live-migration. They adhere to the cloud mantra of "oh well, its up to the customer to spin up and carry on elsewhere".
    I have it on good authority that some of them don't even take A+B feeds to their DC suites - and then have the chutzpah to shout at the DC provider when their only feed goes down, but that's another story... :)
  - yencabulator9 months ago
    > (even normal VMs are really tricky)
    For what it's worth, GCP routinely live-migrates customer VMs to schedule hardware for maintenance/decommissioning when hardware sensors start indicating trouble. It's standard everyday basic functionality by now, but only for the vendors who built the feature in from the beginning.
    dijit9 months ago
    I’m aware, but it won’t work for gpu accelerated workloads.
  - wkat42429 months ago
    > Assuming you mean 99.9999%; your hyperscaler isn't giving you that. MTBF is comparable.
    Yeah we've already had about a day's worth of downtime this year on office 365 and Microsoft is definitely a hyperscaler. So that's 99.3% at best.
  - dijit9 months ago
    meta: I'm always interested how the votes go on comments like this. I've been watching periodically and it seems like I get "-2" at random intervals.
    This is not the first time that "low yield" karma comments have sporadic changes to their votes.
    It seems unlikely at the rate of change (roughly 3-5 point changes per hour) that two people would simultaneously (within a minute) have the same desire to flag a comment, so I can only speculate that:
    A) Some people's flag is worth -2
    B) Some people, passionate about this topic, have multiple accounts
    C) There's bots that try to remain undetected by making only small adjustments to the conversation periodically.
    I'm aware that some peoples job very strongly depends on the cloud, but nothing I said could be considered off topic or controversial: Cloud for GPU compute relies on hardware reliability just like everything else does. This is fact. Regardless of this, the voting behaviour on my comments such as this are extremely suspicious.
- michaelt9 months ago
  > There is absolutely no way anyone is going to be making any money offering $2 H100s unless they stole them and they get free space/power...
  At the highest power settings, H100s consume 400 W. Add another 200 W for CPU/RAM. Assume you have an incredibly inefficient cooling system, so you also need 600 W of cooling.
  Google tells me US energy prices average around 17 cents/kWh - even if you don't locate your data centre somewhere with cheap electricity.
  17 cents/kWh * 1200 watts * 1 hour is only 20.4 cents/hour.
  - ckastner9 months ago
    That's just the power. If one expects a H100 to run for three years at full load, 24 x 365 x 3 = 26280. Assuming a price of $25K per H100, that means about $1/h to amortize costs. Hence the unless they stole them, I guess.
    Factor in space, networking, cooling, security, etc., and $2 really do seem undoable.
    Negitivefrags9 months ago
    None of that matters if you already bought the H100 and have no use for it. You might as well recoup as much money as you can on it.
    ckastner9 months ago
    > You might as well recoup as much money as you can on it.
    Depending on how fast their value depreciates, selling them might recoup more money then renting them away. And being exposed to 3y of various risks.
    Selling now at a 40% loss gets you back the equivalent of 60c/h over three years, and without having other costs (DC, power, network, security) and risks.
    dwattttt9 months ago
    If you already have the H100s, renting access to them at a loss isn't better. Throwing them in the trash will lose you less money.
    michaelt9 months ago
    That's not how this works.
    Imagine I own a factory, and I've just spent $50k on a widget-making machine. The machine has a useful life of 25,000 widgets.
    In addition to the cost of the machine, each widget needs $0.20 of raw materials and operator time. So $5k over the life of the machine - if I choose to run the machine.
    But it turns out the widget-making machine was a bad investment. The market price of widgets is now only $2.
    If I throw the machine in the trash on day 1 without having produced a single widget, I've spent $50k and earned $0 so I've lost $50k.
    If I buy $5k of raw materials and produce 25k widgets which sell for $50k, I've spent $55k and earned $50k so I've lost $5k. It's still a loss, sure, but a much smaller one.
    listenallyall9 months ago
    The concept you're looking for is "marginal cost". The initial $50,000 for the machine has already been spent - the only calculation left is that each new widget costs 20 cents to make (that's the marginal cost) and generates $2.00 in revenue. At this point, making widgets is highly profitable.
    adgjlsfhk19 months ago
    and for GPUs, the math is even more stark because rather than having a 25k item lifespan, the lifespan is the time until GPUs improve enough to make the current one irrelevant.
    ericpauley9 months ago
    GGP already showed the marginal power cost is well below $2.
    cheschire9 months ago
    There is so much more to lifecycle sustainment cost than that.
    Rackspace. Networking. Physical safety. Physical security. Sales staff. Support staff. Legal. Finance. HR. Support staff for those folks.
    That’s just off the top of my head. Sitting down for a couple days at the very least, like a business should, would likely reveal significant depths that $2 won’t cover.
    ericpauley9 months ago
    These are all costs of any server hosting business. Other commenters have already shown that $2/hr for a racked 1U server at 400W is perfectly sustainable.
    dwattttt9 months ago
    Just because you have all of those costs already doesn't make them go away. If you're cross-subsidising the H100 access with the rest of a profitable business, that's a choice you can make, but it doesn't mean it's suddenly profitable at $2: you still need the profitable rest of the business in order to lose money here.
    H8crilA9 months ago
    So you terminate all of the above right now, or continue selling at a loss (which still extends the runway) and wait for better times? Also, do you know that similar situations occasionally occur in pretty much any market out there?
    The market doesn't care how much you're losing, it will set a price and it's up to you to take it, or leave it.
    dragonwriter9 months ago
    No, if its only a “loss” due to counting amortization of the sunk cost of initial acquisition, throwing them in the trash will lose you more money. The only way you can avoid the key cost is to travel back in time and not buy them, and, yeah, if you can do that instead, maybe you should (but, the time travel technology will make you more money than the H100s would ever cost, so maybe don't bother.)
    swyx9 months ago
    amortization curves for gpus are 5-7 years per my gpu rich contacts. even after they cease to be top of the line they are still useful for inference. so you can halve that $1/h
    stogot9 months ago
    Haven’t electric costs been increasing though? Eventually those two curves should death cross
  - latchkey9 months ago
    You are not looking at the full economics of the situation.
    There are very few data centers left that can do 45kW+ rack density, which translates to 32 H100/MI300x GPUs in a rack.
    Most datacenters, you're looking at 1 or 2 boxes of 8 GPU, a rack. As a result, it isn't just the price of power, it is whatever the data center wants to charge you.
    Then you factor in cooling on top of that...
  - sandworm1019 months ago
    For the fuller math one has to include the cost of infrastructure financing, which is tied to interest rates. Given how young most of these H100 shops are, I assume that they pay more to service their debts than for power.
    Wytwwww9 months ago
    > I assume that they pay more to service their debts than for power.
    Well yes, because for GPU datacentres fixed/capital costs make up a much higher fraction than power and other expenses than for CPUs. To such an extent that power usage barely even matters. A $20k that uses 1 kW ( which is way more than it would in reality ) 24x7 would cost $1.3k to run per year at 0.15$ per kWh, that's almost insignificant compared to depreciation.
    The premise is that nobody could make any money by renting H100s for 2$ even if they got them for free unless they only had free power. That makes no sense whatsoever when you can get 2x AMD EPYC™ 9454P servers at 2x408 W (for full system) for around $0.70 in a German data center.
- neom9 months ago
  This reads exactly like what people said about DigitalOcean when we launched it.
  - count9 months ago
    To be fair, DO was muuuch sketchier in the past (eg https://news.ycombinator.com/item?id=6983097).
    Launching any multitenant system is HARD. Many of them are held together with bubble gum and good intentions….
    neom9 months ago
    Boy I'm never going to live that one down around here huh? Hackernews always going to keep you honest, ha. :D
  - imglorp9 months ago
    How was DO able to provide what AWS didn't want to? Was it purely margins?
    neom9 months ago
    AWS just really didn't want to, very different market segment. They were doing a pure enterprise play, looking to capture most of the enterprise. We were doing a b2c play that we presumed over time would suck us up into the SMB. My theory was we had like 1% risk from them. From what I could tell Jeff and Jassy had zero interest in our segment. I left just before the IPO but when we started it, the margin was about 60%, after we figured out how many VMs we could comfortable fit on the box, Ben U just did napkin math and said "50% seems like a fine enough margin to start"
  - 9 months ago
    undefined
- bjornsing9 months ago
  > There is absolutely no way anyone is going to be making any money offering $2 H100s unless they stole them and they get free space/power...
  That’s essentially what the OP says. But once you’ve already invested in the H100s you’re still better off renting them out for $2 per hour rather than having them idle at $0 per hour.
  - Wytwwww9 months ago
    Then how come you can still get several last gen EPYC or Xeon systems that would use the same amount of power for under $1 per hour?
    For datacentre GPUs the energy, infrastructure and other variable costs seem to be relatively insignificant to fixed capital costs. Nvidia's GPUs are just extremely expensive relative to how much power they use (compared to CPUs).
    > H100s you’re still better off renting them out for $2 per hour rather than having them idle at $0 per hour.
    If you're barely breaking even at $2 then immediately selling them would seem like the only sensible option (depreciation alone is significantly higher than the cost power of running a H100 24x365 at 100% utilization).
    bjornsing9 months ago
    > If you're barely breaking even at $2 then immediately selling them would seem like the only sensible option (depreciation alone is significantly higher than the cost power of running a H100 24x365 at 100% utilization).
    If you can then probably yes. But why would someone else buy them (at the price you want), when they can rent at $2 per hour instead?
    Wytwwww9 months ago
    I don't think the why matters as long as people are buying them at very high prices, which they seemingly still are.
    bjornsing9 months ago
    What makes you think they are?
    Wytwwww9 months ago
    Nvidia's quarterly income statements?
    TacticalCoder9 months ago
    I'm not saying NVidia sales are slowing down (the books are complete for quite a while AIUI) but... Where would we hear first about a slowing down in sales? From a NVidia quarterly statement or from the market of renting GPU compute?
    Wytwwww9 months ago
    Used GPU prices still seem to be pretty high and availability is low? But yes, if the GPU compute rental market is highly unprofitable (I'm not sure it is, though) while HW prices are still high that indicates a clear inefficiency in the market. Meaning that you should sell ASAP before it self corrects.
- traceroute669 months ago
  > 999999% uptime
  I've said it before and I've said it again....
  Read the cloud provider small-print before you go around boasting about how great their SLAs are.
  Most of the time they are not worth the paper they are written on.
  - kjs39 months ago
    This is beyond true. Read and understand what your cloud SLAs are, not what you think they are or what you think they should be. There was significant consternation generated when I pointed out that the SLA for availability for an Azure storage blob was only 4 nines with zone redundancy.
    https://azure.microsoft.com/files/Features/Reliability/Azure...
  - latchkey9 months ago
    Not just the fine print, but also look at how they present themselves. A provider with pictures of equipment and detailed specifications is always going to be more interesting than a provider with just a marketing website and a "contact us" page.
- marcyb5st9 months ago
  But it is about minimizing losses, not making profits.
  If you read the article, such prices happen because a lot of companies bought hardware reservations for the next few years. Instead of keeping the hardware idle (since they pay for it anyway), they rent it out on the cheap to recoup something.
- rajnathani9 months ago
  From your bio, your company is Hot Aisle.
  This company TensorWave covered by TechCrunch [0] this week sounds very similar, I almost thought it was the same! Anyway, best of luck, we need more AMD GPU compute.
  [0] https://techcrunch.com/2024/10/08/tensorwave-claims-its-amd-...
  - latchkey9 months ago
    Thanks! Definitely not the same at all.
- tasuki9 months ago
  > If you're trying to do the latter, you're not considering how you are risking your business on unreliable compute.
  What do you mean by "risking your business on unreliable compute"? Is there a reason not to use one of these to train whatever neural nets one's business needs?
  - oefrha9 months ago
    Well, someone who’s building a GPU renting service right now obviously wants to scare you into using expensive and “reliable” services; the market crashing is disastrous for them. The reality is high price is hardly an indicator of reliability, and the article very clearly explains why H100 hours are being sold at $2 or less, and it’s not because of certain providers lacking reliability.
    latchkey9 months ago
    Nah, don't be silly. No need to scare anyone into anything. Use whatever you want to use. My point in saying and of this is to simply point out that we offer this service to people who value these things.
  - lazide9 months ago
    If it crashes half way through, you don’t get a useful model, and you’re still on the hook for the rental costs to get there maybe?
    tasuki9 months ago
    That's... possible? But a little unlikely.
    I think I'll take that risk over paying more for your allegedly more reliable GPUs anytime.
    latchkey9 months ago
    Depends on the SLA.
- dx0349 months ago
  Since most applications aren't latency sensitive, space and power can be nearly free by setting up the data center in a place where it's cold, there's nearly free electricity and few people live. Leaves you with cost for infrastructure and connectivity, but I guess electricity prices shouldn't be the issue?
  - tonetegeatinst9 months ago
    I'd think cost of internet would be the big issue even if can afford the AI hardware.
    In rural areas or even with low population it takes forever to get fiber to roll out and if your selling access to your hardware infrastructure then you really want to get a direct connection to the nearest IX so you can offer customers the best speed for accessing data and the IX would probably be one of the few places you might be able to get 400G or higher direct fiber. But if your hooking up to a IX chances are your not an end user but a autonomous system and already are shoving moving and signing NDA's to be a peer with other Autonomous Systems in the exchange and be able to bgp announce.
    (Source - my old highschool networking class where I got sick of my shitty internet and looked into how I could get fiber from an exchange. I'm probably mistaken on stuff here as it was years ago and its either wrong or outdated from all those years ago.)
    oasisbob9 months ago
    Assuming rural areas have less fiber availability isn't always a good assumption.
    In NW Washington state at least, the rural counties (Whatcom, Island, Skagit, etc) have had a robust market in dark fiber for over two decades.
    The normal telcos weren't responsive to need, so private carriers picked up the slack. When I was last involved in this market, you could get a P2P strand, including reasonable buildout, for less than a cost of a T1 line with a two-year commit.
    The tiny four-branch credit union I worked for had dedicated fiber loops between all our locations, no big deal. It was great.
  - serjester9 months ago
    Ambient cooling can only go so far. At the end of the day if you have a rack of GPU’s using 6000 watts per node, you’re going to need some very serious active cooling regardless of your location. You’ll save a little but it’s a small percentage of your overall costs.
    pie4209 months ago
    in industrial manufacturing, recovering waste heat is a very common junior engineer task, usually a great first year project from recent grads to do a simple, $50-100k project that has a 1-2 year payback period.
    Surely someone in the trillion dollar datacenter industry can figure out a way to take waste heat and use it in a profitable way, right?
    coredog649 months ago
    I’d guess that there’s not enough energy density in the waste heat to do anything useful, especially once you bring it away from the clean areas of the facility where it’s produced to someplace you could actually use it at scale.
- foobiekr9 months ago
  You should consider the possibility that one outcome is that no one is going to make money offering H100s.
- fhars9 months ago
  I think this is what they are insinuating with the "Hot the Bubble Burst" in the headline. You are not expected to make money if you have invested in a bursting bubble.
- wolfgangK9 months ago
  For training, doesn't checkpoint saving make high reliability a moot point ? Why pay for 99.99999? uptime when you can restart your training from last/best model ?
- scotty799 months ago
  > There is absolutely no way anyone is going to be making any money offering $2 H100s unless they stole them and they get free space/power...
  I think that's the point. Trying to buy and run H100s now either for yourself or for someone else to rent it is a terrible investment because of oversupply.
  And prices you can get for compute are not enough to cover the costs.
- acd10j9 months ago
  May be their business model is running compute at loss and stealing ip/code from people using platform?
- hnaccount_rng9 months ago
  Can you elaborate on the cost basis? With how little could a very lean operation still make money?
  I know that's basically impossible to answer generically, especially given that the recurring cost is likely already zero, given that the GPUs are already paid...
- pico_creator9 months ago
  Someone is losing the money. It’s elaborated in the article how and why this happens
  TLDR, VC money, is being burnt/lost
  - shermantanktop9 months ago
    Tons of VC money burned in pursuit of low-probability success. It’s no wonder that some people find it easier to scam VCs than it is to build a real business.
TechDebtDevin9 months ago
I've been saying this would happen for months. There (was) a giant arbitrage for data centers that already have the infra.
If you could get a hold H100s and had an operational data center you essentially had the keys to an infinate money printer on anything above $3.50/hr.
Of course, because we live in a world of effecient markets that was never going to last forever. But they are still profitible at $2.00 assuming they have cheap electricity/infra/labor.
- pico_creator9 months ago
  Problem is - u can find some at $1
  - startupsfail9 months ago
    The screenshot there is 1xH100 PCIE, for $1.604. Which is likely promotional pricing to get customers onboarded.
    With promotional pricing it can be $0 for qualified customers.
    Note also, how the author shows screenshots for invites for private alpha access. It can be mutually beneficial for the data center to provide discounted alpha testing access. The developer gets discounted access, the data center gets free/realistic alpha testing workflows.
    pico_creator9 months ago
    When I did the screenshot a month ago, it wasn't public info yet.
    Now its public: SFCompute list it on their main page - https://sfcompute.com/
    And they are *not* the only one
    ipsum29 months ago
    Okay, but you can't actually buy it at that price, it's a pure marketing ploy.
    pico_creator9 months ago
    Not at $0.5 (which the lower bound in their marketing), but $1.5 is very doable on right times (done so multiple times)
    The article says $2. Which is quite consistent for a small cluster
    ipsum29 months ago
    The average consumer cannot. Only those who have access to sfcompute's private beta can access those prices. Once it opens up to the public, the price will increase.
    zaptrem9 months ago
    Running preprocessing jobs on a $0.5 SFCompute H100 node RN (though price usually bounces up to what you mentioned).
    electronbeam9 months ago
    The PCIE has much lower perf than even a 1x slice of an SXM
    pico_creator9 months ago
    I really suggest shopping around. <$2 SXM is a real thing, if your patient enough on the schedule.
    shrubble9 months ago
    So are you thinking that the lower price is to get the customer in the door and then when they need the Infiniband connected GPUs to charge them more?
    9 months ago
    undefined
  - swyx9 months ago
    original title i wrote for this piece was "$1 H100s" but i deleted because even i thought it was so ridiculously low lol
    but yes sfcompute home page is now quoting $0.95/hr average. wild.
    ipsum29 months ago
    sfcompute is a scam. You can't buy GPUs at that price. They're running a "private beta" where people can bid for a spot GPU, but they let a limited number of people into the beta, so the prices are artificially low.
    neom9 months ago
    As an advisor to those guys, I take a great deal of objection with you calling it a scam. It's not a scam. They're testing things out, so the price is low and not many people can use it... because they're testing. That isn't a scam.
    ipsum29 months ago
    They are advertising a ridiculously low price for their GPUs that can't be rented.
    If a store advertised $0.50 burgers, but when you visit, they say they're not for sale, wouldn't you consider that a scam?
    flaque9 months ago
    Hi! I run sfcompute.
    We don't have a limited number of slots!
    We just go down a lot. It's VERY beta at the moment; we literally take the whole thing down about once a week. So if we know of some major problem, or we're down, we just don't let people on (since they'll have a bad experience).
    You're right though that the prices are probably lower because of this. That's why we have a thing on our website that says "*Prices are from the sfcompute private beta and don’t represent normal market conditions."
    If you'd like on anyway, I can let you on, just email me at evan at sfcompute, but it may literally break!
    Schiendelman9 months ago
    If I may recommend: put a note where people will see those prices so that they understand those prices are unlikely to remain. If the outcome of your current UX is people thinking you're a scam, you have a problem that will last as you start to scale. It's hard to measure now, but it's harder to fix later.
    Also, I'm really impressed at how great your replies about your product are! You're a gem.
    flaque9 months ago
    > put a note where people will see those prices so that they understand those prices are unlikely to remain.
    Yup, shall do!
    > Also, I'm really impressed at how great your replies about your product are! You're a gem.
    Thank you! :D
    authorfly9 months ago
    They might be and thanks for warning about that one company - but if this is anything like renting 3090s (ignoring the period of time during crypto rises), the prices really can go low to a loss level, I guess sunk cost crisis for the owners or the inertia of not pulling them out and selling them hits hard.
    pico_creator9 months ago
    I actually signed up for separate new account, to double check that my business account was not being favored or rigged in "private beta"
    Its really not that hard to validate this claim, you can just rent for 4 hours at $1.50 - which is under $50
    Also like I said, they are *not* the only one, shop around
    ipsum29 months ago
    I signed up and don't have access currently. My point that the prices are low because demand is limited because of lack of users still stands. Once people sign up and hear about it, the price will increase substantially.
    qeternity9 months ago
    We are actively using sfcompute at the moment. It's a great product for us where we have a backlog of R&D workloads that can be incrementally run in short bursts.
    I think you're right about the small private beta resulting in relatively low demand. But it's also a different value prop. If you need a large cluster for a reasonable period of time, you're not paying $1/hr. But if you can use the remnants of someone who contracted for a large allocation, but doesn't need part of it, they can offer it into the market and recoup what would otherwise just be wasted hours.
    Currently they have some issues around stability, and spin up times are longer than ideal (ca. 15 min), but the team is super responsive and all of these are likely to be resolved in the near future. (No affiliation, just happy users rooting for the sfcompute team).
electronbeam9 months ago
The real money is in renting infiniband clusters, not individual gpus/machines
If you look at lambda one click clusters they state $4.49/H100/hr
- latchkey9 months ago
  I'm in the business of mi300x. This comment nails it.
  In general, the $2 GPUs are either PE venture losing money, long contracts, huge quantities, pcie, slow (<400G) networking, or some other limitation, like unreliable uptime on some bitcoin miner that decided to pivot into the GPU space and has zero experience on how to run these more complicated systems.
  Basically, all the things that if you decide to build and risk your business on these sorts of providers, you "get what you pay for".
  - jsheard9 months ago
    > slow (<400G) networking
    We're not getting Folding@Home style distributed training any time soon, are we.
    krasin9 months ago
    Distributed training data creating & curation is more useful and feasible. Training gets cheaper 1.5x every year, but data is just as expensive, if not more, given that the era of "free web crawls of human knowledge" is over.
- marcyb5st9 months ago
  I agree with you, but as the article mentioned, if you need to finetune a small/medium model you really don't need clusters. Getting a whole server with 8/16x H100s is more than enough. And I also believe with the article when it states that most companies are finetuning some version of llama/open-weights models today.
- pico_creator9 months ago
  Exactly, it covered in the article that there is a segmentation happening via GPU cluster size.
  Is it big enough for foundation model training from scratch = ~$3+ Otherwise it drops hard
  Problem is "big enough" is a moving goal post now, what was big, becomes small
  - swyx9 months ago
    so why not buy up all the little h100s and enough together for a cluster? seems like a decent rollup strategy?
    ofcourse it woudl still cost a lot to do... but if the difference is $2/hr vs $4.49/hr then there's some size where it makes sense
    ipsum29 months ago
    Only if they're networked with Infiniband.
    pico_creator9 months ago
    Makes sense, though only folks like runpod / sfcompute / etc, have enough visibility to maybe pull this off?
    Its a risker move - then just taxing the excess compute now, and print money on the margins from bag holders
    latchkey9 months ago
    Correct me if I'm wrong, but if I recall, neither of those two companies own their own compute. They are marketplaces.
    pico_creator9 months ago
    Yup, but they at-least know where all these "small unused clusters" are.
    Bag holders, do not want to be shouting to the world they are bag holders.
    qeternity9 months ago
    I think sfcompute does own a lot or most of the current compute on their platform? Not entirely sure though.
ranger_danger9 months ago
Last year we reached out to a major GPU vendor for a need to get access to a seven figure dollar amount worth of compute time.
They contacted (and we spoke with) several of the largest partners they had, including education/research institutions and some private firms, and could not find ANYONE that could accommodate our needs.
AWS also did not have the capacity, at least for spot instances since that was the only way we could have afforded it.
We ended up rolling our own solution with (more but lower-end) GPUs we sourced ourselves that actually came out cheaper than renting a dozen "big iron" boxes for six months.
It sounds like currently that capacity might actually be available now, but at the time we could not afford to wait another year to start the job.
- chronogram9 months ago
  If you were able to make do with cheaper GPUs, then you didn't need FP64 so you didn't need H100s in the first place right? Then you made the right choice in buying a drill for your screw work instead of renting a jackhammer even if the jackhammer would've seemed cooler to you at the time.
  - KeplerBoy9 months ago
    Does anyone doing AI need FP64, and yet they sell well.
  - ranger_danger9 months ago
    > didn't need H100s
    I think we're splitting hairs here, it was more about choosing a good combination of least effort, time and money involved. When you're spending that amount of money, things are not so black and white... rented H100s get the job done faster and easier than whatever we can piece together ourselves. L40 (cheaper but no FP64) was also brand new at the time. Also our code was custom OpenCL and could have taken advantage of FP64 to go faster if we had the devices for it.
- InkCanon9 months ago
  [dead]
wg09 months ago
> Collectively there are less than <50 teams worldwide who would be in the market for 16 nodes of H100s (or much more), at any point in time, to do foundation model training
At best 100 and this number will go down as many would fail to make money. Even traditional 100 software development companies would have a very low success rate and here we're talking about products that themselves work probabilistically all the way down.
- pico_creator9 months ago
  Im quite sure there is more than a 100 clusters even. Though that would be harder to prove.
  So yea, it would be rough
Der_Einzige9 months ago
I just want to observe that there are a lot of people paying huge amounts of money for consulting about this exact topic and that this article is jam packed with more recent and relevant information than almost any of these consultants have.
- pico_creator9 months ago
  Feel free to forward to the clients of "paid consultant". Also how do i collect my cut.
- swyx9 months ago
  author @pico_creator is in here actively replying in case u have any followups.. i just did the editing
  - 9 months ago
    undefined
- pico_creator9 months ago
  Also: how many of those consultants, have actually rented GPU's - used them for inference - or used them to finetune / train
  - aurareturn9 months ago
    I’m guessing most of them are advising Wallstreet on AI demand.
grues-dinner9 months ago
> For all the desperate founders rushing to train their models to convince their investors for their next $100 million round.
Has anyone actually trained a model actually worth all this money? Even OpenAI is s struggling to staunch the outflow of cash. Even if you can get a profitable model (for what?) how many billion dollar models does the world support? And everyone is throwing money into the pit and just hoping that there's no technical advance that obsoletes everything from under them, or commiditisation leading to a "good enough" competitor that does it cheaper.
I mean, I get that everyone and/or they investors has got the FOMO for not being the guys holding the AGI demigod at the end of the day. But from a distance it mostly looks like a huge speculative cash bonfire.
- justahuman749 months ago
  > For all the desperate founders rushing to train their models to convince their investors for their next $100 million round.
  I would say Meta has (though not a startup) justified the expenditure.
  By freely releasing llama they undercut every a huge swath of competition who can get funded during the hype. Then when the hype dies they can pick up what the real size of the market is, with much better margins than if there were a competitive market. Watch as one day they stop releasing free versions and start rent seeking on N+1
  - grues-dinner9 months ago
    Right, but that is all predicated that, when they get to the end, having spent tons of nuclear fuel, container shiploads of GPUs and whole national GDPs on the project, there will be some juice worth all that squeeze.
    And even if AI as we know it today is still relevant and useful in that future, and the marginal value per training-dollar stays (becomes?) positive, will they be able to defend that position against lesser, cheaper, but more agile AIs? What will the position even be that Llama2030 or whatever will be worth that much?
    Like, I know that The Market says the expected payoff is there, but what is it?
    vineyardmike9 months ago
    As the article suggests, the presence of LLAMA is decreasing demand for GPUs. Which are critical to Metas ad recommendation services.
    Ironically, by supporting the LLM community with free compute-intense models, they’re decreasing demand (and price) for the compute.
    I suspect they’ll never directly monetize LLAMA as a public service.
    grues-dinner9 months ago
    With all these billions upon billions in AI hardware screaming along, are ads actually that much better targeted than they used to be?
    I imagine admongers like Meta and Google have data that shows they are right to think they have a winning ticket in their AI behemoths, but if my YouTube could present any less relevant ads to me, I'd be actually impressed. They're intrusive, but actually they're so irrelevant that I can't even be bothered to block them, because I'm not going to start online gambling or order takeaways.
    vineyardmike9 months ago
    A better question, with a growing push for privacy, how can they keep ads from regressing?
    There’s a lot more that goes into the ad space than just picking which ad to show you, and it’ obviously depends on who wants to reach you. For example, probabilistic attribution is an important component on confirming that you actually got the ad and took the action across multiple systems.
    Also, since you mentioned it, TV ads tend to be less targeted because they’re not direct-action ads. Direct action ads exist in a medium where you can interact with the ad immediately. Those ads are targeted to you more, because they’re about getting you to click immediately.
    TV ads are more about brand recognition or awareness. It’s about understanding the demographic who watches the show, and showing general ads to that group. Throw a little tracking in there for good measure, but it’s generally about reaching a large group of people with a common message.
    mark_l_watson9 months ago
    You ask a great question, and I wonder how the push for more privacy will pan out (pardon the gold mining analogy). I am almost done with the very good new book The Tech Coup by Marietje Schaake, and I have also read Privacy is Power and Surveilance Capitalism. I think more of the public is waking up to the benefits of privacy.
    All that said, I am an enthusiastic paying customer of YouTube Prime and Music, Colab (I love Colab), and sometimes GCP. For many years I have happily have told Google my music and YouTube preferences for content. I like to ask myself what I am getting for giving up privacy in a hopefully targeted and controlled way.
    jorvi9 months ago
    > Ironically, by supporting the LLM community with free compute-intense models, they’re decreasing demand (and price) for the compute.
    For other people that that sentence didn't make sense for at first glance: "by supporting the LLM community with free compute-intense models [to run on their own hardware] they’re decreasing demand (and price) for the compute [server supply]."
    vineyardmike9 months ago
    Sorry, I should have been more clear.
    They’re decreasing demand for expensive GPUs that would be required to train a model. Fine-tuning and inference are less compute intense, so overall demand for top-end GPU performance is decreased even if inference compute demand is increased.
    Basically, why train an LLM from scratch, and spend millions on GPUs, when you can fine tune LLAMA and spend hundreds instead.
    jorvi9 months ago
    Thank you for the extra clarification, I hadn’t even thought of inference vs training!
    fragmede9 months ago
    How fungible is that compute though? Having even a single H100 is different than having a bunch of 4090's, nevermind a properly networked supercomputer of H100s.
    vineyardmike9 months ago
    That’s the point. You can run inference on a 4090 but training is better on a H100. If you use llama, you don’t need to train on an H100, so you can free that supply up for meta.
    fragmede9 months ago
    I haven't been following llama closely but I thought the latest model was too big for inference on 4090's, and that you can't fine tune on 4090's either, but furthermore, the other question is if the market is there for running inference on 4090s.
    vineyardmike9 months ago
    Well, (1) there are a ton of GPUs out there of various specs, and you can also use an inference provider who can use a H100 or similar to serve multiple inference requests at once. (2) there are a ton of LLAMA sizes, from 1b, 2b, 8b, 70b, and 400b. The smaller ones can even run on phone GPUs.
    rsynnott9 months ago
    > having spent tons of nuclear fuel
    It will be primarily gas, maybe some coal. The nuclear thing is largely a fantasy; the lead time on a brand new nuclear plant is realistically a decade, and it is implausible that the bubble will survive that long.
    scotty799 months ago
    > there will be some juice worth all that squeeze.
    Without the squeeze there'd be a risk for some AI company getting enough cash to buy out Facebook just for the user data. If you want to keep status quo it's good to undercut someone in the cradle that could eventually take over your business.
    So it might cost Meta pretty penny but it's a mitigation for existential risk.
    If you climbed up to the top of wealth and influence ladder you should spend all you can to kick off the ladder. It's gonna be always worth it. Unless you still fall because it wasn't enough.
  - pico_creator9 months ago
    Given their rising stock price trend, due to their moves in AI. Definitely worth it for them
  - mlinhares9 months ago
    Given meta hasn’t been able to properly monetize WhatsApp I seriously doubt they can monetize this.
    fragmede9 months ago
    Who says they haven't?
- jordwest9 months ago
  > I get that everyone and/or they investors has got the FOMO for not being the guys holding the AGI demigod at the end of the day
  Don't underestimate the power of the ego...
  Look at their bonfire, we need one like that but bigger and hotter
  - bugbuddy9 months ago
    I spit out my tea when I read your last sentence. You should consider standup comedy.
    wyclif9 months ago
    It's a bonfire, turn the lights out
- Aeolun9 months ago
  Isn’t OpenAI profitable if they stop training right at this moment? Just because they’re immediately reinvesting all that cash doesn’t mean they’re not profitable.
  - Attach61569 months ago
    And if they stop training right now their "moat" (which I think is only o1 as of today) would last a good 3 to 6 months lol, and then to the Wendy's it is.
    Aeolun9 months ago
    That is similarly true for all other AI companies. It’s why they don’t do that. But everyone is still happy to give them more money because their offering is good as it is.
    wyclif9 months ago
    >and then to the Wendy's it is
    I didn't really catch that pop culture reference. What does that mean?
    slater9 months ago
    My guess: The competition catches up, you lose all paying clients, and you get to apply for jobs at Wendy's...?
    wyclif9 months ago
    Or does it mean it's a trivial decision, as trivial as deciding what fast food joint to choose?
    slater9 months ago
    Guess we'll never know :(
  - 0xDEAFBEAD9 months ago
    This guy claims they are losing billions of dollars on free ChatGPT users:
    https://nitter.poast.org/edzitron/status/1841529117533208936
    fragmede9 months ago
    Ed Zitron's analysis hinges on a lot of assumptions. Much of it comes down to the question of how much it actually costs to run a single inference of ChatGPT. That $20/month pro subscription could be a loss-leader or it could be making money, depending on the numbers you want to use. If you play with the numbers, and compare it to, say, $2/hr for an H100 currently on the front page, $20/$2/hr gets you 10 hours of GPU time before it costs more in hardware than your subscription, and then factoring in overhead on top, it's just not clear.
    Aeolun9 months ago
    You’d need to know how much they are using for that. I only use the API and the $20 I bought a year ago aren’t gone yet.
- elcomet9 months ago
  Not everyone is doing LLM training. I know plenty of startups selling AI products for various image tasks (agriculture, satellite, medical...)
  - mark_l_watson9 months ago
    Yes, a lot of the money to be made is in the middleware and application sides of development. I find even small models like Llama 3.2 2B to be extremely useful and fine tuning and integration with existing businesses can have a large potential payoff for smaller investments.
- hackernewds9 months ago
  Lots of companies have. Most recently Character AI trained an internal model and did raise $100M early last year. They didn't release any benchmarks since the founding team and Noam taken to Google
- tonetegeatinst9 months ago
  Pretty sure anthropic has
anshulbhide9 months ago
This reminds me of the boom and bust oil cycle as outlined in The Prize: The Epic Quest for Oil, Money & Power by Daniel Yergin.
- swyx9 months ago
  care to summarize key points for the class?
  - dplgk9 months ago
    It seems appropriate, in this thread, to have ChatGPT provide the summary:
    In The Prize: The Epic Quest for Oil, Money & Power, Daniel Yergin explains the boom-and-bust cycle in the oil industry as a recurring pattern driven by shifts in supply and demand. Key elements include:
    1. Boom Phase: High oil prices and increased demand encourage significant investment in exploration and production. This leads to a surge in oil output, as companies seek to capitalize on the favorable market.
    2. Oversupply: As more oil floods the market, supply eventually exceeds demand, causing prices to fall. This oversupply is exacerbated by the long lead times required for oil development, meaning that new oil from earlier investments continues to come online even as demand weakens.
    3. Bust Phase: Falling prices result in lower revenues for oil producers, leading to cuts in exploration, production, and jobs. Smaller or higher-cost producers may go bankrupt, and oil-dependent economies suffer from reduced income. Investment in new production declines during this phase.
    4. Correction and Recovery: Eventually, the cutbacks in production lead to reduced supply, which helps stabilize or raise prices as demand catches up. This sets the stage for a new boom phase, and the cycle repeats.
    Yergin highlights how this cycle has shaped the global oil industry over time, driven by technological advances, geopolitical events, and market forces, while creating periods of both rapid growth and sharp decline.
    DebtDeflation9 months ago
    This isn't just the story of GPUs or Oil, this is the entire story of capitalism going back to the early Industrial Revolution in the 1700s. The economist Hyman Minsky added asset prices and debt financing to it to round out a compelling theory of the business cycle including the extreme bubbles and depressions sometimes seen.
    automatic61319 months ago
    Aren't these both simply cases of the bullwhip effect?
    https://en.wikipedia.org/wiki/Bullwhip_effect
    DebtDeflation9 months ago
    That's a supply chain specific example. If you're looking for something more fundamental, they're all examples of unstable systems with positive feedback loops.
    bgnn9 months ago
    or bistable systems
    swyx9 months ago
    have you ever read a good expanation of why Minsky Moments happen? it always occured to me if you can time them right you can make a ton of money on the way up and on the way down
    immibis9 months ago
    If they could be accurately predicted, they wouldn't happen.
authorfly9 months ago
Haha.
Cries in sadness that my university lab was unable to buy compute from 2020+ when all the interesting research in AI was jumping up and now AI is going into winter finally compute will be cheap again.
- 77341289 months ago
  I don't feel any winter yet.
  - thelastparadise9 months ago
    At least not until LLM gains hit a wall. So far every open weight model has far surpassed the previous releases at the same model size.
    danpalmer9 months ago
    But closed models are clearly slowing. It seems reasonable to expect that as open weight models reach the closed weight model sizes they’ll see the same slowdown.
  - alecco9 months ago
    If you remove LLMs, there is absolutely an AI winter.
    kkzz999 months ago
    Audio generation (music, tts, voice cloning), Video and Image generation, multi-modal models, protein simulation... where is the winter?
    authorfly9 months ago
    Well, it's in academia, in traditional universities, any way. I think corporates are still thriving. I can say from an academic point of view, I knew 4 PhDs who started in 2018/2019, all 4 got depressed and left the field.
    Their research was obsolete before they were halfway through.
    Usually some PhD students get depressed, but these 4 had awful timing. Their professors were stuck on 3-10 year grants doing things like BERT finetuning or convolution or basic level AI work - stuff that as soon as GPT-3 came out, was clearly obsolete, but nobody could admit that and lose the grants.. In other cases, their work had value, but drew less attention than it should have became all attention went to GPT-3 or people assumed it was just some wrapper technology.
    The nature of academia and the incentive system caused this; academia is a cruise ship which is hard to turn. If the lighthouse light of attention moves off your ship on to another fancy ship, your only best is lifeboats(industry) or hoping the light and your ship intersect again.
    The professors have largely decided to steer either right into Generative AI and using the larger models (which they could never feasibly train themselves) for research, or gone even deeper into basic AI.
    The problem? The research grants are all about LLMs, not basic AI.
    So basically a slew of researchers willing and able to take on basic AI research are leaving the field now. As many are entering as usual ofcourse, but largely on the LLM bandwagon.
    That may be fine. The history of AI winters suggests putting all the chips on the same game like this is folly.
    I recall journals in the 90s and 2000s (my time in universities was after they were released, but I read them), the distribution of AI was broad. Some GOFAI, some neural nets, many papers about filters or clear visual scene detection etc. Today it's largely LLM or LM papers. There is not much of a "counterweight underdog" like neural networks served the role off in the 90s/00s.
    At the same time, for people working in the fields you mention, double check the proportion of research money going into companies vs institutions. While it is true things like TortoiseTTS[1] were an individual effort, that kind of thing is now a massive exception. In stead companies like OpenAI/Google literally have 1000+ researchers each developing the cutting edge in about 5 fields. Universities have barely any chance.
    This is how the DARPA AI winter went to my understanding(and I listened to one of the few people who "survived via hibernation" during my undergraduate); over promising - central focus on one technology - then company development of projects - government involvement - disappointment - cancellation.
    [1] https://github.com/neonbjb/tortoise-tts
    KaoruAoiShiho9 months ago
    Technology progressing too far is the opposite of a winter, this sounds like a "too hot" problem rather than the opposite.
    Der_Einzige9 months ago
    Why care about research grants? It's all about publishing at NeurIPS/competitors or ACL/competitors. Let the industry pay you 3x what you'd fight for in grants and reap the rewards of lots of citations.
    Those same industry companies are GPU rich too, unlike most of academia (though Christopher Manning claims that Princeton has lots of GPUs even though Stanford doesn't!)
physicsguy9 months ago
Open models like Llama make it pointless for the majority of companies to train from scratch. It was obvious this would happen.
- 77341289 months ago
  Inference should always be more significant than training in the end though.
  - Tepix9 months ago
    There are more options for inference.
- bjornsing9 months ago
  True. The hard part is timing it.
kristopolous9 months ago
This sounds like bad news for the gpu renter farms. Am reading this right?
- swyx9 months ago
  the marketplaces like sfcompute do great, bc so much cheap supply and theres lots of demand. its the foundation model startups who locked into peak hype contracts for access that are eating a lot of losses right now... (which perhaps explains why the bigcos are acquiring only the founders and not assuming the liabilities of the oldco...)
  - sgu9999 months ago
    > which perhaps explains why the bigcos are acquiring only the founders and not assuming the liabilities of the oldco...
    Who did?
murtio9 months ago
Enjoyed the article and I was ready to try the promoted featherless.ai. I signed up and spent 15 minutes trying load or chat with Llama 3 models. All attempts failed. Naturally I would ask, if it's so cheap to run GPU's, why I would need to sign up to try a model?
ctrlGsysop9 months ago
A good in depth mkt analysis. While it’s not crypto, many of the key points are rinse and repeat of mining - things like insatiable demand and projected ROI. Markets and tech solve high costs all the time. Great point made about the $4/hr number that was most likely a top bullet in a 1000 pitch decks citing NVIDIA. Bagholders could just be all the nations buying all the billionaire’s stories.
- pico_creator9 months ago
  Yea, the older GPU providers, were pushing 3-5 year commits for a reason. They seen this before
- bugbuddy9 months ago
  There is one big exception in the list of all nations. I don’t know what to make of it. Irony?
- aurareturn9 months ago
  The only difference is that LLMs have a real world value.
- wmf9 months ago
  Yeah, I did this same kind of math all the time back during the early ASIC mining days except it was accelerated; you had to break even in 9 months or never due to the exponentially growing difficulty.
evbogue9 months ago
I was surprised recently when I fired up ollama on my refurbished Thinkpad -- a laptop that doesn't even have a GPU. All of the hype had me convinced that I couldn't run any of this LLM stuff at home!
It's a little bit slower, but while I wait for the text to generate I have another cup of coffee.
Sometimes I even prompt myself to generate some text while I'm waiting.
- heiploy9 months ago
  training is the phase that needs all that compute
  - evbogue9 months ago
    This is good to know. I had read somewhere (that was probably on the Internet) that every time I submitted a prompt at the Meta AI web site that I was vaporizing an entire bottle of water, so imagine how thrilled I was to be saving so much water by prompting AI at home! But alas, the water was already vaporized. The climate? Already changed.
    gloflo9 months ago
    Nope, climate is changing to even worse. It's not a "oops, OK now we live with this new reality" but "oh fuck, the rollercoaster is getting steeper AND is accelerating more and more, the breaks are lose and we already lost half of the wagons".
    evbogue9 months ago
    Maybe with enough H100s we can next word predict a solution to this global issue.
- m3kw99 months ago
  Current 1b model will do you no good, just rotate through all the free stuff and it would cover most of you usecases
  - evbogue9 months ago
    I will admit that Llama3.1 70B does make my old Thinkpad pretty cranky. But winter is coming, so if I can change the climate of my bedroom while I'm waiting that's always a bonus this time of year.
    Sohcahtoa829 months ago
    Heh, back in 2014, I heated my room with an AMD R9 290 by mining crypto.
    My cat loved it, too. She'd lay on my desk right behind my computer and get blasted by the heat.
    I was in an apartment that used resistive heat, so the crypto I mined was effectively free since energy consumed by my GPU meant using the heater less.
    evbogue9 months ago
    This is begging for a distributed work algorithm that favors GPUs in cold bedrooms to render your next greatest hallucination.
bjornsing9 months ago
Thanks for the heads-up. I just increased my short position in NVDA a tiny bit. The peak should be near.
(This is not financial advice.)
- aurareturn9 months ago
  I would not bet against Nvidia right now.
  Yes, H100s are getting cheaper, but I can see the cheap price drawing in a wave of fine tuning interest, which will result in more GPU demand for both training and inferencing. Then there’s the ever need for bigger data centers for foundational model training, which the article described as completely separate from public auction prices of H100s.
  I don’t think the world has more GPU compute than it knows what to do with. I think it’s still the opposite. We don’t have enough compute. And when we do, it will simply drive a cycle of more GPU compute demand.
  - bjornsing9 months ago
    I don’t think I’m betting against Nvidia. I’m betting against Nvidia being worth 3.3 trillion.
    Der_Einzige9 months ago
    Still a bad bet. Their moat is deeper now than it was in 2022. The engineers you need to poach are all paid well over 1+ million USD per year now. The number of people capable of writing quality CUDA code to optimize transformer language models world-wide is likely less than 10000, and I'm being very generous. Nvidia holds a significant portion of that group, and some of the others you'll never find in the market at all since they hide behind discord profile pictures and mental illness.
    bjornsing9 months ago
    Only reason that is hard is that CUDA is general purpose. You don’t need a graphics + GPGPU platform to run transformers. Nvidia is eating its own moat.
- KaoruAoiShiho9 months ago
  I just went balls deep into long positions including calls and 2x etfs.
  - bjornsing9 months ago
    Interesting… What’s your thesis?
- alecco9 months ago
  "Markets can stay irrational for longer than you can stay solvent"
  - bjornsing9 months ago
    I know. :) That’s why I keep it small. And I’m long semiconductors as a whole.
    9 months ago
    undefined
yalogin9 months ago
What does it mean for OpenAI?
As open source models improve, OpenAI needs to keep on improving their models to stay ahead of them. Over time though, if it hasn’t already happeened, the advantages of OpenAI will not matter to most. Will OpenAI be forced to bleed money training? What does it mean for them over the next few years?
Havoc9 months ago
There is also the small matter of a new gen coming out…
Not convinced anything has burst yet. Or will for that matter. The hype may be bubble like but clearly we will need a lot of compute.
h_tbob9 months ago
I'm really hoping Jim Keller and Tenstorrent crew drop it even further with GDDR backed AI compute.
sva_9 months ago
I've been wondering if any state actors might seem it favorable to offer gpus and sniff on the training data/model architectures
- Der_Einzige9 months ago
  I'm sure this is happening. Hell, weights and biases was doing this years ago to early free users (maybe they never stopped).
  I assume that anyone doing good work in the AI space is being "sniffed" on, and if not, than the relevant "sniffers" are failing to do their jobs!
hamilyon29 months ago
Is this the most computational bang for buck one ever seen?
Another question: what is the maximum size of model I can fine-tune on 1 H100?
bsder9 months ago
So, where can a plebian like me buy a (or 10) used H100?
- wmf9 months ago
  I don't expect them to hit the used market before 2026-2027. Data centers will start replacing H100 with R100 at that time.
askl9 months ago
$2/h rental, not $2 sales price. Pretty misleading.
- squigz9 months ago
  Misleading? Anyone who read this title and thought it was referring to the full purchase price might deserve to be misled.
  - hackernewds9 months ago
    That is what the title says explicitly. That's how click bait works
    squigz9 months ago
    It also explicitly says "rental", so I'm not sure how one can possibly arrive at the conclusion that they meant "$2 to own an H100"
    gnabgib9 months ago
    It didn't say that at the time, the article still has the submitted title: $2 H100s: How the GPU Bubble Burst
    squigz9 months ago
    Even so, I genuinely don't see how anyone who might be clicking this article could possibly interpret it the way GP is saying.
    cuu5089 months ago
    Well, case in point, I did. When I read the title I thought – "IIRC these were going for thousands, could they have really dropped so hard? Well, sometimes companies, cars, real estate properties cost $1, but there's always of course a catch. Let's see what the catch is here... <click> ah, it's a 4x reduction of rental price, boring"
    nottorp9 months ago
    Anyone who isn't an "AI" fanatic can and will interpret the title as the sale price :)
    bongodongobob9 months ago
    Holy fuck.
    * walks past gnabgib's desk
    "Good morning!"
    "Who are you talking to? Me? You haven't specified who you're interacting with. Which morning? Today? What metric are you measuring by good? This is too confusing for me."
    CapeTheory9 months ago
    "Do you wish me a good morning, or mean that it is a good morning whether I want it or not; or that you feel good this morning; or that it is a morning to be good on?”
    kibibu9 months ago
    The HN title has been editorialized, perhaps recently.
    The original article title is:
    > $2 H100s: How the GPU Bubble Burst
- pico_creator9 months ago
  If we $2 H100 this year or next.
  Either AI is super dead, or a new alien GPU rained from the sky
  - marcyb5st9 months ago
    There's option 3: current capacity is enough for our AI needs and so GPUs now the market is flooded.
    I think AI is not gonna die even in its current stocastic parrot incarnation. It is a useful tool for some tasks and, albeit not transformative like some CEOs, I believe it's gonna stay.
    At most I believe we will enter another AI winter until there's the next algorithmic breakthrough.
    dangsux9 months ago
    [dead]
    friendzis9 months ago
    Current stochastic parrots do not have to be transformative, they have to appear smart enough for a critical mass of dumb enough people. And judging anecdotally from scanning social media - they already do. Even here, on HN, you find numerous comments of the shape: "${my favorite gpt} says this: <insert some gibberish>"
  - ranger_danger9 months ago
    Blackwell B100/B200 did kinda rain down, also the AMD MI300X and increased availability of H200.
    There's also cheaper NVIDIA L40/L40S if you don't need FP64.
  - askl9 months ago
    I'm hoping for the first one
- qingcharles9 months ago
  Some of the Tesla GPUs are almost at this price per unit on eBay now. I've seen them go for under $15 online.
  Here's one for ~$18 inc shipping with 6GB DDR5:
  https://www.ebay.com/itm/Nvidia-Tesla-K20X-6GB-90Y2351-C7S15...
  - pico_creator9 months ago
    That hurts - i used those GPUs before at their peak Now any random GPU in the computer store murders it
    chessgecko9 months ago
    Not just gpus, the k20 was at 3.9 Tflops (fp32) and the new iPhone is at 4.3 (fp16). If you don’t need the precision it got passed by the phones
  - dplgk9 months ago
    It appears this GPU cost $7700 when it launched in 2012? GPUs have gotten that much better that this thing isn't even with $100?
  - barrenko9 months ago
    Are these a viable buy?
    chessgecko9 months ago
    You’d get better perf training on a current gen phone than that gpu, but it probably functions
    pico_creator9 months ago
    Only if ur a collector (so no if ur plugging it in)
- stego-tech9 months ago
  Agreed, and I doubt we’ll see one retail at that price even on the secondhand market anytime soon.
  That said, could I see them being offloaded in bulk for pennies on the dollar if the (presumed) AI bubble pops? Quite possibly, if it collapses into a black hole of misery and bad investments. In that case, it’s entirely plausible that some enterprising homelabs could snatch one up for a few grand and experiment with model training on top-shelf (if a generation old) kit. The SXMs are going for ~$26-$40k already, which is cheaper than the (worse performing) H100 Add-In Card when brand new; that’s not the pricing you’d expect from a “red hot” marketplace unless some folk are already cutting their losses and exiting positions.
  Regardless, interesting times ahead. We either get AI replacing workers en masse, or a bust of the tech industry not seen since the dot-com bubble. Either way, it feels like we all lose.
- osigurdson9 months ago
  2 bucks for a GPU? Maybe a PIC microcontroller.
  - askl9 months ago
    They don't even have HDMI ports so they are pretty useless, but I'd buy one at $2 as a desk ornament.
    qingcharles9 months ago
    GPU display stand:
    https://www.reddit.com/r/nvidia/comments/1fw68rl/retiring_a_...
    dchftcs9 months ago
    You don't need an HDMI port, you just need a driver to support running the right graphics calculations and producing image to funnel to another output port. The GPU may lack some features, may have an architecture that is bad for rendering, and may be suboptimal in delivering the performance per watt. Exactly like how a CPU doesn't have a display port.
  - qingcharles9 months ago
    https://news.ycombinator.com/item?id=41806396
- two_handfuls9 months ago
  Agreed, "$2/h" would be the correct unit, "$2" reads to me like a typo.
- renewiltord9 months ago
  Bruh
amelius9 months ago
Does that include electricity?
lamontcg9 months ago
so, time to short NVDA?
- pico_creator9 months ago
  Hard to say, i mean A100's had the same freefall - and nvidia just grew with H100's
  - swyx9 months ago
    can you do a quick rerun of the ROI math with BH200 numbers now that we know them? minus the fp4 shenanigans ofc
    pico_creator9 months ago
    Do we have actual fp8 numbers? (or i could proxy it by /2 the fp4)
- aurareturn9 months ago
  The fact that cheaper GPU prices have drawn so much interest here should tell you that prices will bounce back. The lower the price, the more people will experiment with fine tuning and inferencing.
- Ekaros9 months ago
  Old adage still stands. But I would certainly unload some if I had any.
  - _sys491529 months ago
    past 4 years have taught me to bet on irrational
    andreasmetsala9 months ago
    That works until it doesn’t.
hislaziness9 months ago
TLDR: Don’t buy H100s. The market has flipped from shortage ($8/hr) to oversupplied ($2/hr), because of reserved compute resales, open model finetuning, and decline in new foundation model co’s. Rent instead.
Is the AI infra bubble already bursting?
- pico_creator9 months ago
  I’m hopping more for an open weights AI boom
  With cheap compute for everyone to finetune :)
- TechDebtDevin9 months ago
  No, but the prices will likely converge with MSRP pricing. A lot of datacenter were filled with h100s that cost a premium to get ahold of.
  - pico_creator9 months ago
    Covered in the article. They are below MSRP essentially
  - hislaziness9 months ago
    It is not just MSRP, management and operations cost too. The article goes into the details of this.
    pico_creator9 months ago
    Q_Q yes - ur right on that - and i wrote the article (about a month ago)
- swyx9 months ago
  (editor here) we've been commenting on the Winds of AI Winter for a while now :) https://latent.space/p/mar-jun-2024
- justahuman749 months ago
  Yes, please only rent instead
  - sincerely, all of the cloud providers
  - Sohcahtoa829 months ago
    An H100 is what, $50,000 MSRP?
    At $2/hr, that's 2.8 years to RoI. And that's just for the GPU and not the other hardware you'll need to plug it into, and doesn't include the power, and also assumes you're using it 100% of the time. Really, you're probably looking at 3.5+ years to RoI.
    I'd rather rent than buy in that scenario.
  - pico_creator9 months ago
    ~Cough~ not all cloud provider (there are many still willing to charge you an arm and a leg)
    Only the ones who can give you below MSRP essentially
bugbuddy9 months ago
At $2 per hour, factoring in the overall hardware cost, labor, electricity, and other sunk costs like floor space and bandwidth, how many total hours does it take to break even?
What is the expected hardware operation lifespan in hours of this system?
How much would the hardware cost have to drop for the economic of $2/hour to work?
- hislaziness9 months ago
  The details are in the article. They have done the math.
  - bugbuddy9 months ago
    There was no answer to my last question which I think is the most important thing when considering if we are going to have another GFC this year or next year.
    rsynnott9 months ago
    Does "GFC" stand for "global financial crisis" here? It seems implausible that the collapse of the LLM bubble will cause one; it might have a pretty dramatic impact on the markets, but it's unclear how it would cause the sort of systemic failure that we saw in the noughties.
- latchkey9 months ago
  > What is the expected hardware operation lifespan in hours of this system?
  Better question: what support contract does the provider have with their manufacturers? For example, we buy Dell pro support 3 year next business day contracts on all of our gear.
- pico_creator9 months ago
  You could technically break even at $2, assuming 100% allocation, and cheap electricity.
  But reality is not 100%, so I would argue at-least 25% or even 50% drop in the H100 price (approx 50k each, after factoring other overheads)
- 9 months ago
  undefined
frhack9 months ago
Artificial Intelligence will replace many jobs and business. So the race is on to become the main AI providers of the future. For the big players this is an opportunity and a necessity. The question is: - how long will this race last? - how long will NVIDIA be the main GPU provider and beneficiary of this race?
Predicting the future is very difficult, especially in an unprecedented revolution like this. As Nobel Prize winner Parisi said: "No matter how hard you try to predict the future, the future will surprise you"