For infra systems, this is great: code against the GCS API, and let the user choose the cost/latency/durability tradeoffs that make sense for their use case.
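For instance, a minimal sketch with the google-cloud-storage Python client (bucket and object names here are hypothetical): the read/write code is identical whether the bucket is single-region, dual-region, or a colder storage class - the tradeoff is picked at bucket-creation time, not in the application code.

    from google.cloud import storage

    client = storage.Client()

    # Same application code works against any bucket; only the bucket's
    # location / storage class (chosen when it was created) differs.
    blob = client.bucket("my-app-data").blob("checkpoints/step-1000.bin")
    blob.upload_from_filename("step-1000.bin")
    data = blob.download_as_bytes()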
Absurd claim. S3 Express launched last year.
S3 offers some multi-region replication facilities, but as far as I’ve seen they all come at the cost of inconsistent reads - which greatly complicates application code. GCS dual-region buckets offer strongly consistent metadata reads across multiple regions, transparently fetch data from the source region where necessary, and offer clear SLAs for replication. I don’t think the S3 offerings are comparable. But maybe I’m wrong - I’d love more competition here!
https://cloud.google.com/blog/products/storage-data-transfer...
Entirely different claim.
- single-zone object storage buckets
- regional object storage buckets
- transparently replicated, dual region object storage buckets
I agree that AWS has two of the three. AFAIK AWS does not have multi-region buckets - the closest they have is canned replication between single-region buckets.
S3 does have replication, but it is far from transparent and fraught with gotchas.
And it certainly doesn't have all of that with a single API.
To be honest I'm not actually sure how different the API is. I've never used it. I just frequently trip over the existence of parallel APIs for directory buckets (when I'm doing something niche, mostly; I think GetObject/PutObject are the same.)
s3: https://aws.amazon.com/pm/serv-s3
s3 express: https://aws.amazon.com/s3/storage-classes/express-one-zone/
cross-region replication: https://docs.aws.amazon.com/AmazonS3/latest/userguide/replic...
It’s much, much easier to code against a dual-region GCS bucket because the bucket namespace and object metadata are strongly consistent across regions.
Did find some interesting recent (March 28th, 2025) reads though!
Colossus under the hood: How we deliver SSD performance at HDD prices https://cloud.google.com/blog/products/storage-data-transfer...
I kind of thought you meant ZNS / https://zonedstorage.io/ at first, or its more recent, better, awesomer counterpart, Host Directed Placement (HDP). I wish someone would please please advertise support for HDP; it sounds like such a free win, tackling so many write amplification issues for so little extra complexity: just say which stream you want to write to, and writes to that stream will go onto the same superblock. Duh, simple, great.
They charge $20/TB/month for basic cloud storage. You can build storage servers for $20/TB flat. If you add 10% for local parity, 15% free space, 5% in spare drives, and $2000/rack/month overhead, then triple everything for redundancy purposes, then over a 3 year period the price of using your own hard drives is $115/TB and google's price is $720. Over 5 years it's $145 versus $1200. And that's before they charge you massive bandwidth fees.
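For what it's worth, here's a rough sketch of that arithmetic in Python; the rack capacity is my own assumption (the comment doesn't state one), picked so the per-TB rack overhead lands near the quoted $115/$145 figures:

    hw_per_raw_tb = 20.0                  # $/TB flat for drives + server
    overhead = 0.10 + 0.15 + 0.05         # parity + free space + spares
    replication = 3                       # "triple everything"
    raw_tb_per_rack = 7000                # ASSUMPTION: raw TB per rack
    rack_month = 2000.0                   # $/rack/month overhead

    raw_per_user_tb = (1 + overhead) * replication               # ~3.9 raw TB per usable TB
    capex = hw_per_raw_tb * raw_per_user_tb                      # ~$78/TB up front
    opex_month = rack_month * raw_per_user_tb / raw_tb_per_rack  # ~$1.1/TB/month

    for months in (36, 60):
        diy = capex + opex_month * months
        cloud = 20.0 * months
        print(f"{months} months: DIY ~${diy:.0f}/TB vs cloud ${cloud:.0f}/TB")
    # -> 36 months: DIY ~$118/TB vs cloud $720/TB
    # -> 60 months: DIY ~$145/TB vs cloud $1200/TB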
* hetzner storage box starts from $4/month for 1TB, and then goes down to $2.4/TB/month if you rent a 10TB box.
* mega starts from €10/month for 2TB, and goes down to €2/TB/month if you get a 16TB plan
* backblaze costs (starts from?) $6/TB/month
I was looking for cheap cloud storage recently, so I have a list of these numbers :)
Moreover, these are not even the cheapest ones. The cheapest one I found has prices starting at $6.5/month for 5TB, going down to $0.64/TB/month for plans starting at 25TB (it's called uloz, but I haven't tested them yet).
Also, looking at lowendbox you can find a VPS in Canada with 2TB storage for $5/month and run whatever you want there.
How does all that compare to $20/TB/month?!
Please feel free to correct me if I'm comparing apples to oranges, though. But I can't believe all of these offers are scams or so-called "promotional" offers that cost the companies more than you pay for them.
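Normalizing the plans quoted above to a rough $/TB/month (euro prices treated as roughly dollar-equivalent for this comparison):

    plans = {
        "hetzner storage box (10TB)": 2.40,
        "mega (16TB plan)": 2.00,
        "backblaze": 6.00,
        "uloz (25TB+ plan)": 0.64,
        "lowendbox VPS (2TB for $5)": 5 / 2,
        "big-cloud standard (as quoted)": 20.00,
    }
    for name, per_tb in plans.items():
        print(f"{name}: ~${per_tb:.2f}/TB/month")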
Now you could use it with a Synology NAS, and it is a lot cheaper than doing RAID 5 or ZFS/BTRFS with multi-drive redundancy.
I wonder if there are any NASes that do that automatically? Any drawbacks? Also wonder if the price could go down to $5/TB in a few years' time.
BTW at Hetzner you can rent servers with very large (hundreds of TB) non-redundant storage for an effective price of about $1.50/TB/month. If you want to build a cloud storage product, that seems like a good starting point - of course, once you take into account redundancy, spare capacity, and paying yourself, the prices you charge to your customers will end up closer to the price of Backblaze at a minimum.
And thus, market efficiency feels like a myth. This feels most true when it comes to cloud services: they're way overpriced in multiple common cases at the big providers.
Also, don't forget the hidden cost/risk of giving a third party full access to your data.
Building a server, keeping it secure and up to date, and fixing hardware issues takes significant time.
Cloud is more than bare metal, but plenty of folks discount the cost benefits of elasticity.
I agree that elasticity is great, for example. But at the same time, to me, it sounds like bad engineering. Why do you need to store terabytes of data and then delete it - couldn't it be processed continuously, streamed, compressed, with only the changes processed, and so on? A lot of engineering today is incredibly wasteful. Maybe your data source doesn't care and just provides you with terabyte CSV files, and you have no choice, but for engineers who care about efficiency, it reeks.
It might make a lot of sense in a highly corporate context where everything is hard, nobody cares, and the cost of inefficiency is just passed on to the customer (i.e. often government and taxpayers). But the real problem here is that customers aren't demanding more efficiency.
And plenty of use cases have natural growth. I do not throw away my pictures for example.
Data also grows with the number of users. More users, more 'live' data.
We have such a huge advantage with digital; we need to stop thinking it's wasteful. Everything we do digitally (pictures, finance data, etc.) is so much more energy and space efficient than what we had 20 years ago. We shouldn't delete data just because we feel it's wasteful.
Most instances of a cloud ___ created in a region are allocated and exist at the zonal level (i.e. a specific zone of a region).
A physical "region" usually consists of three or more availability zones, and each zone is physically separated from other zones, limiting the potential for foreseeable disaster events from affecting multiple zones simultaneously. Zones are close enough networking-wise to have high throughput and low latency interconnection, but not as fast as same-rack, same-cluster communications.
Systems requiring high availability (or replication) generally attain this by placing instances (or replicas) in multiple availability zones.
Systems requiring high availability generally start with multi-zone replication, and systems with even higher availability requirements may use multi-region replication, which comes at greater cost.
NB: I am on the rapid storage team.
There are a number of zones in a region. Region usually means city. Zone can mean data center. Rarely, it just means some sort of isolation (separate power/network).
SSDs with high random I/O speeds are a significant contributor to the advantage. I think 20M writes per second are likely distributed over a network of drives to make that kind of speed possible.
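Rough arithmetic (the per-drive IOPS figure is my assumption, not from the announcement): even at ~500k random-write IOPS per NVMe SSD, 20M writes/s needs dozens of drives for the IOPS alone, before any replication or headroom - hence a network of drives.

    writes_per_second = 20_000_000
    iops_per_ssd = 500_000   # assumption; varies a lot by drive and write size
    print(writes_per_second / iops_per_ssd)   # ~40 drives minimum, ignoring replication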
> 20x faster random-read data loading than a Cloud Storage regional bucket.
[1] https://www.warpstream.com/blog/s3-express-is-all-you-need
I mean, surely a Mac Studio with an M4 Max must be the best, right? It's an entire CPU generation ahead and it's maximum! Of course, it's not... the M3 Ultra is the best.
Naming things is hard.
Rapid Storage: A new Cloud Storage zonal bucket that enables you to colocate your primary storage with your TPUs or GPUs for optimal utilization. It provides up to 20x faster random-read data loading than a Cloud Storage regional bucket.
(Normally we wouldn't allow a post like this which cherry-picks one bit of a larger article, but judging by the community response it's clear that you've put your finger on something important, so thanks! We're always game to suspend the rules when doing so is interesting.)
(I was just adding some explanation for more seasoned users who might wonder why we were treating this a bit differently.)
Also, welcome to posting on HN and we hope you'll continue!
> If you saw that code, you wouldn't _defy_ it
(I work on Google storage)
Rapid Storage will have all of your data local and fast, including writes. It also adds the ability to have fast durable appends, which is something you can't get from the standard buckets.
(I work on Google storage)
"Today's innovation isn't born in a lab or at a drafting board; it's built on the bedrock of AI infrastructure. "
Uhh..No. Even as an AI developer I can tell that is some AI Comms person tripping over.
Google aren't the only company that consistently mess this up, but given how they built a $1.95 trillion company on top of crawling URLs on the web, they really should have an internal culture that values giving things unique URLs!
[I had to learn this lesson myself: I used to blog "weeknotes" every week or two where I'd bundle all of my project announcements together and it sucked not being able to link to them as individual posts]
In this case, marketing seems to have moved faster than documentation, though, since I can't find any mention of this in the main GCS docs: https://cloud.google.com/search?hl=en&q=rapid%20storage
No link and no details though.
>if you're reading from, like, big Parquet files, that probably means lots of random reads
and it also usually means that you shouldn't use S3 in the first place for workloads like this, because it is usually very inefficient compared to a distributed FS. Unless you have some prefetch/cache layer, you will get both bad latencies and higher costs.
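To make the "lots of random reads" point concrete, here's a sketch of how a Parquet reader touches an object in GCS using the google-cloud-storage client's ranged downloads (bucket/object names are hypothetical):

    from google.cloud import storage

    client = storage.Client()
    blob = client.bucket("my-bucket").get_blob("data/part-0000.parquet")
    size = blob.size

    # Read the 8-byte trailer (4-byte footer length + "PAR1" magic): one small ranged GET.
    trailer = blob.download_as_bytes(start=size - 8, end=size - 1)
    footer_len = int.from_bytes(trailer[:4], "little")

    # Read the Thrift-encoded footer metadata: another small ranged GET.
    footer = blob.download_as_bytes(start=size - 8 - footer_len, end=size - 9)

    # A real reader then issues yet another ranged GET per projected column chunk in
    # every row group, so one scan can mean hundreds of small random reads per file -
    # which is where object-store per-request latency (and cost) hurts.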
I don't know that it took "AI branding" to convince anybody. I think these workloads potentially enabled additional demand/market for such a product that may not have been there before.
One of the challenges with exposing native Colossus was always that it's just different enough from how people elsewhere are used to using storage that there was a lot of uncertainty about the addressable market of a "native" Colossus offering. It's not a POSIX file system. Some of the specific differences (e.g. no random writes) are part of what makes Colossus powerful and performant on HDDs, but it means you have to write your application to work well within its constraints. Google has been doing that for a long time. If you haven't, even if it's an amazing product, is it worth rewriting your applications or middleware?
Rapid Storage basically addresses this by adding the object store API on top of it (TIL from this thread that there's a lower-abstraction client in the works as well).
Anyway, the team behind this is awesome. Awesome tech, awesome people. Seeing this launched at Next and seeing some appreciation on HN makes me very grateful.
I always assumed (from outside Google) that the problem was that Colossus had to make a "no malicious actors" assumption in its design in order to make the performance/scaling guarantees it does; and that therefore just exposing it directly to the public would make it possible for someone to DoS-attack the Colossus cluster.
My logic was that there's actually nothing forcing [the public GCP service of] BigTable to require that a full copy of the dataset be kept hot across the nodes, with pre-reserved storage space — rather than mostly decoupling origin storage from compute† — unless it was to prevent some DoS vector.
As for exactly what that DoS vector is... maybe GC/compaction policy-engine logic? (AFAICT, Colossus has pluggable "send compute to data" GC, which internal-BigTable and GCS both use. But external-BigTable forces the GC to be offloaded to the client [i.e. to the BigTable compute nodes the user has allocated] so that the user can't just load down the system with so many complex GC policies that the DC-scale Colossus cluster itself starts to fall behind its GC time budget.)
---
† Where by "decouple storage from compute", I mean:
• Each compute node gets a fixed-sized DAS diskset, like GCE local NVMe SSDs;
• each disk in that diskset gets partitioned up at some fixed ratio, into two virtual disksets;
• one virtual diskset gets RAID6'ed or ZFS'ed together, and is used as storage for non-Colossus-synced tablet-LDB nursery level SSTs;
• the other virtual diskset gets RAID0'ed or LVM-JBOD-ed together and is used as a bounded-size LFU read-through cache of the Colossus-synced tablets — just like BigQuery compute nodes presumably have.
(AFAIK the LDB nursery levels already get force-compacted into "full" [128MiB] Colossus-synced tablets after some quite-short finality interval, so it's not like this increases data loss likelihood by much. And BigTable doesn't guarantee durability for non-replicated keys anyway.)
I haven't thought deeply about it, but could this be solved with more nuanced billing designs?
What jeffbee is talking about is Google's proprietary Colossus File System, and all its transitive dependencies.
It looks like every other clustered file system. What's special about Google's Colossus?
- You can only append to an object, and each object can only have one writer at a time. This is useful for distributed systems - you could have one process adding records to the end of a log, and readers pulling new records from the end.
- It's also possible to "finalize" an object, meaning that it can't be appended to any more.
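To make those semantics concrete, here's a toy model in Python (not any real client API - just an illustration of the append-only, single-writer, finalize contract):

    class AppendOnlyObject:
        """Toy stand-in for an append-only, finalizable object."""

        def __init__(self):
            self._chunks = []        # appended records, in order
            self._finalized = False

        def append(self, data: bytes) -> int:
            """The single writer appends records; returns the new object length."""
            if self._finalized:
                raise ValueError("object is finalized; no more appends allowed")
            self._chunks.append(data)
            return sum(len(c) for c in self._chunks)

        def finalize(self) -> None:
            """After this, the object can never be appended to again."""
            self._finalized = True

        def read_from(self, offset: int) -> bytes:
            """Readers can tail new records from any offset (e.g. the end of a log)."""
            return b"".join(self._chunks)[offset:]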
(I work on Rapid Storage.) On the other hand, if you've been building your applications against expectations of different semantics (like POSIX), retrofitting this into your existing application is really hard, and potentially awkward. This is (IMO) why there hasn't been an overtly Colossus-based Google Cloud offering previously. (Though it's well publicized that both Persistent Disk and GCS use Colossus in their implementation.)
One of the reasons why it would be extremely hard to just set up or build CFS elsewhere, or on a different abstraction level, is that while it may look quite achievable to implement the high-level architecture, there is vast complexity on the practical implementation side: the tremendous user isolation it affords for a multi-tenant system, the resilience it has against various types of failures and high-throughput planned maintenance, and the specialization it and its dependencies have to use specific hardware optimally.
(I work on Google storage part time, I am not a Colossus developer.)
They say it comes in two configurations, 256 chips or 9,216 chips. They also say that the maximal configuration of 9,216 chips delivers 24x the compute power of the world's largest supercomputer (which they say is called El Capitan). They say that this comes to 42.6 exaFLOPs.
This implies that the 9,216 chip configuration doesn't actually exist in any form in reality, or else it would now be the world's largest supercomputer (by flops) by a huge margin.
Am I massively misunderstanding what the claims being made are about the TPU and the 42.6 exaFLOPs? I feel like this would be much bigger news if this was fully legit.
Edit: The flops being benchmarked are not the same as regular supercomputer flops.
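A quick back-of-the-envelope check (assuming the 42.6 exaFLOPS figure is low-precision AI compute, while El Capitan's ~1.74 exaFLOPS Rmax is FP64 Linpack, so the two aren't directly comparable):

    pod_exaflops = 42.6          # claimed for the 9,216-chip configuration
    el_capitan_exaflops = 1.74   # approximate Top500 Rmax, FP64

    print(pod_exaflops / el_capitan_exaflops)   # ~24.5x, matching the "24x" claim
    print(pod_exaflops * 1e18 / 9216 / 1e12)    # ~4,620 TFLOPS per chip at low precision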
If all you care about is an 8-bit AI workload (there's definitely a market for that), it's nice to have 24x the speed.
AFAIK, there wasn't a faster 8-bit super computer to compare to - which is why they made the comparison.
I'm guessing the $300 of Google Cloud credit offered in this webpage wouldn't go very far using any of this stuff?
You can bring data in and out of the GPU quickly and improve utilization.