It turned out Uber engineers just loved Redis. Need to distribute your work? Throw it at Redis. I remember debating with some infra engineers about why we couldn't just throw more redis/memcached nodes at our telemetry system to scale it, but I digress. So, the price service we built was based on Redis. The service fanned out millions of requests per second to Redis clusters to get information about individual hexagons of a given city, and then computed dynamic areas. We would need dozens of servers just to compute a single city. I forget the exact number, but let's say it was 40 servers per average-sized city. Now multiply that by the 200+ cities we had. It was just prohibitively expensive, not to mention the other scalability bottlenecks of managing that many nodes.
The solution was actually pretty simple. I took a look at the algorithms we used, and it was really just that we needed to compute multiple overlapping shapes. So, I wrote an algorithm that used work-stealing to compute the shapes in parallel per city on a single machine, and used Elasticsearch to retrieve hexagons by a number of attributes -- it was actually a perfect use case for a search engine, because the retrieval requires boolean queries over multiple attributes. The rationale was pretty simple too: we needed to compute repeatedly over the same set of data, so we should retrieve the data only once for multiple computations. The algorithm was merely dozens of lines, and it was implemented and deployed to production over a weekend by this amazing engineer Isaac, who happens to be the author of the H3 library. As a result, we were able to compute dynamic areas for 40 cities, give or take, on a single machine, and the launch was unblocked.
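To give a flavor of the approach (all types and names below are hypothetical; the real algorithm and data model were Uber-internal), the per-city computation could be sketched with Java's ForkJoinPool, which is a work-stealing scheduler: fetch the city's hexagons once, then fan the overlapping-shape computations out over the shared in-memory data.

```java
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.Collectors;

public class DynamicAreaJob {
    // Hypothetical types standing in for the real (internal) data model.
    record Hexagon(String h3Index, double supply, double demand) {}
    record DynamicArea(String seedHexagon, List<Hexagon> members) {}

    // Fetch every hexagon for the city once (e.g. one Elasticsearch bool query),
    // then reuse the same in-memory data for all overlapping-shape computations.
    public List<DynamicArea> computeCity(List<Hexagon> cityHexagons, List<String> seeds) {
        ForkJoinPool pool = ForkJoinPool.commonPool(); // work-stealing scheduler
        return pool.submit(() ->
                seeds.parallelStream()                 // one task per overlapping shape
                     .map(seed -> growArea(seed, cityHexagons))
                     .collect(Collectors.toList())
        ).join();
    }

    private DynamicArea growArea(String seed, List<Hexagon> hexagons) {
        // Placeholder: the real logic grows an overlapping shape around the seed
        // by walking neighboring hexagons and applying the pricing rules.
        List<Hexagon> members = hexagons.stream()
                .filter(h -> h.h3Index().compareTo(seed) >= 0)  // stand-in predicate
                .collect(Collectors.toList());
        return new DynamicArea(seed, members);
    }
}
```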
Google's S2 is simpler to implement, but it has the same overall benefits as H3, such as a hierarchical data structure.
Uber internally had extensive research on what kind of grid system to use. In fact, we started with S2 and geohash, but H3 is superior. Long story short, hexagons are like discretized circles, and therefore offer more symmetry than S2 cells[1]. Consequently, hexagons offer more uniform shapes when we compose hierarchical structures. Besides, H3 cells have more consistent sizes at different latitudes, which is very important for Uber when computing the supply and demand of cars.
[1] One of the complications is that H3 has to include pentagons to tile the entire world, just like a soccer ball. You can see why from Euler's characteristic formula: with three cells meeting at each vertex, V − E + F = 2 can't be satisfied by hexagons alone, and it forces exactly twelve pentagons.
For anyone doing geo queries it's a powerful tool.
Using it, you're introducing network latency and serialization overhead. Sometimes that's worth it, especially if your database is falling over, but a lot of the time people use it and it just makes everything more complex and worse.
If you need to share cached data across processes or nodes, sometimes you have to use it, but a lot of the stuff I work with is partitioned anyway. If your data is already partitioned, you know what works well a lot of the time? A boring, regular hashmap.
Pretty much every language has some thread-safe hashmap in there, and a lot of them have pretty decent libraries to handle invalidation and expiration if you need those. In Java, for example, you have ConcurrentHashMap for simple stuff, and Guava Caches or Caffeine Caches for more advanced stuff.
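As a minimal sketch (the loader and sizes below are made-up placeholders, not from any particular codebase), a Caffeine cache with bounded size and expiration is only a few lines:

```java
import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;
import java.time.Duration;

public class LocalCacheExample {
    private final Cache<String, String> cache = Caffeine.newBuilder()
            .maximumSize(100_000)                      // bound memory use
            .expireAfterWrite(Duration.ofMinutes(10))  // automatic expiration
            .build();

    public String get(String key) {
        // Compute-if-absent: loads on miss, returns the cached value on hit.
        return cache.get(key, this::loadFromDb);
    }

    private String loadFromDb(String key) {
        // Placeholder for the expensive lookup you're caching.
        return "value-for-" + key;
    }
}
```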
Even the slowest [1] local caching implementation will almost certainly be faster than anything that hits the network; in my own testing [2] Caffeine caches have sub-microsecond `put` times, and you don't pay any serialization or deserialization cost. I don't think you're likely to get much better than maybe sub-millisecond times with Redis, even in the same data center, not to mention if you're caching locally that's one less service that you have to babysit.
Again, I don't hate Redis, there are absolutely cases where it's a good fit, I just think it's overused.
[1] Realistic ones, I mean; obviously any of us could artificially construct something as slow as we want.
[2] https://blog.tombert.com/posts/2025-03-06-microbenchmark-err... This is my own blog, feel free to not click it. Not trying to plug myself, just citing my data.
There’s nothing worse than when someone does the latter. I had to write a tool to remove deletes from the AOF log because someone fucked up ordering of operations big time trying to pretend they had proper transactions.
I'm using redis only for temp state data like a session (when I can't use a jwt).
Or when I have to scale and need a warmed up cache
Is that bad now?
I'm also wondering right now why there is no local cache with p2p self discovery and sync. Should be easier than deploying an extra piece of software.
Why not use a regular database for this (can be as simple as an sqlite file, depending on your needs), or the default thingy that comes with your framework or programming language? This is built into everything I've ever used, no need to reinvent session storage or overengineer the situation with jwt or some other distributed cryptographic system and key management
Ah but in trendy microservices world, it isn’t in many micro frameworks, you have to reinvent it
The whole design space for this type of API is weirdly under-explored, but there are some well-supported mainstream solutions out there.
Fundamentally, Redis ought to be a NuGet library, a Rust crate, or something like it. It's just a distributed hash table, putting it onto its own servers is a bit bizarre if the only need is caching.
Microsoft's Service Fabric platform and the Orleans library both implement distributed hash tables as fundamental building blocks. Both can trivially be used "just" as a cache to replace Redis, and both support a relatively rich set of features if you need more advanced capabilities.
Of course, there's Scala's Akka and the Akka.NET port also.
It is a JVM-based "shared cache", so it can be used to transparently share results of expensive queries -- but also to share sessions. It mostly just works, but the free version has some issues when one upgrades data models.
I know half the people here probably loathe the JVM, but once you're aware of one implementation, I guess it should be possible to find similar things for .NET and maybe also Go and Python.
Microsoft could do better than that!
For example, Azure App Service could use an out-of-process shared cache feature so that web apps could have local low-latency caches that survive app restarts.
The thing that bothers me is people adding it in places that don't make sense; I mentioned in a sibling thread that I've seen people use it as a glorified global variable in stuff like Kafka streaming. Kafka's stuff is already partitioned, you likely don't gain anything from Redis compared to just keeping a local map, and at that point you can just use a Guava Cache and let it handle invalidation in-process.
But that doesn't work for caching non-trivial calculations or intermediate state. There's a sweet spot for transitory persistence.
You could throw a bunch of your production data in SSAS Tabular and there you go, you have an in-memory cache. I've actually deployed that as a solution and the speed is crazy.
You could store the key->version separately, and read the said version. If the cached version is lower, it's a cache miss.
Of course, evicting something from the cache (due to memory constraints) is a bit harder (or less efficient) in such a setup.
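A rough sketch of that versioned-lookup idea (all names are hypothetical): keep the authoritative key→version map separate from the cached entries, and treat any entry built from an older version as a miss.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class VersionedCache<K, V> {
    private record Entry<V>(long version, V value) {}

    private final Map<K, Long> versions = new ConcurrentHashMap<>();   // authoritative key -> version
    private final Map<K, Entry<V>> cache = new ConcurrentHashMap<>();  // value + version it was built from

    public void bumpVersion(K key) {
        versions.merge(key, 1L, Long::sum);  // writers invalidate by incrementing the version
    }

    public V get(K key, Function<K, V> loader) {
        long current = versions.getOrDefault(key, 0L);
        Entry<V> cached = cache.get(key);
        if (cached == null || cached.version() < current) {
            // Cached version is older than the authoritative one: treat as a miss.
            V fresh = loader.apply(key);
            cache.put(key, new Entry<>(current, fresh));
            return fresh;
        }
        return cached.value();
    }
}
```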
for example, if the problem we're talking about is related to slow _writes_, not slow reads, the typical usage of a cache isn't going to help you at all. implementing write-through caching is certainly possible, but has additional pitfalls related to things like transactional integrity between your cache and your authoritative data store.
Could be worse: you could have met me! I used to laugh at caching and thought that if your website is so slow that you need a caching layer (WordPress comes to mind), you're just doing it wrong: perhaps you're missing indexes on your database or you simply can't code properly and made it more complex than necessary (I was young, once). Most of my projects are PHP scripts invoked by Apache, so they have no state and compute everything fresh. This is fine (think <30ms typical page generation time) for 95% of the types of things I make, but in more recent years I had two projects where I really struggled with that non-pragmatic mentality and spent long hours experimenting with different writing strategies (so data wouldn't change as often and MariaDB's built-in optimizations would kick in better), indexes on low-cardinality columns, indexes on combined columns in specific orders, documenting with each query which index it requires and maps to, optimizing the query itself of course, in one experiment writing my own on-disk index file to search through some gigabytes of data much faster than the database seemed to be able to do for geospatial information, upgrading the physical hardware from HDD to SSD...
Long story short, I now run Redis and the website is no longer primarily bound by computation power but, instead, roughly equally by bandwidth
I'm still very wary of introducing Redis to projects lest I doom them: it'll inevitably outgrow RAM if I indiscriminately stick things in there, which means turning them off (so far, nearly no links or tools on my website ever turned 404 because they're all on a "keep it simple" WAMP/LAMP stack that can do its thing for many years, perhaps search-and-replacing something like mysql_query() with mysqli->query() every five years but that's about the extent of the maintenance)
So anyway I think we're in agreement about "apply where appropriate" but figured I'd share the counter-example of how one can also be counterproductive in the other direction and that there is something to be said for the pragmatic people that consider/try a cache, which often does help even if there's often a different underlying problem and my perfectionism wouldn't like it
Then when you lose a cache node, the DB gets slammed and falls over, because when the DB team implemented service-based rate-limiting, the teams cried that they were violating their SLOs so the rate limits were bumped waaaay up.
It's an interview though. Most people just watch youtube videos and "copy and paste" the answer.
In a way it's the format of the interview that's the problem. Similar to leetcode-style interviews, a lot of the time we're not checking for what we actually need.
btw, "scale up" is the second most common answer from those who can't provide better solutions. :)
My point isn't that the interview can't weed out bad candidates. That's in a way the easy part. The problem is it can't identify not-bad candidates.
The interview is broken because of how standardized it is. It's like a certain game genre and most people will play it the same way. It's more like a memory test.
> In an interview there is no "the answer", it's a dialogue.
It pretends to be or you assume it is. There are numerous 'tutorials' / videos / guides on system design; it's >90% rehearsed. So again, my point is the interviewee is trained and will give you the standard answer even if you deviate some. There are just too many risks otherwise. If I had a more novel approach I'd risk the interviewer not understanding, or taking longer than the allocated time to finish.
Especially in big tech - interviewers are trained to look for "signals" and not whether you're good or bad. They need to tick certain boxes. Even if you have a "better" answer if it's outside the box it fails.
Then I would have to explain: "no, we have caching stuff 'in process', just use that; our app will use more RAM but that's what we need".
But I've seen people use Redis as a glorified "global variable" for stuff like Kafka streaming. The data is already partitioned, it's not going to be used across multiple nodes, and now you've introduced another service to look at and made everything slower because of the network. A global hashmap (or cache library, like previously mentioned) would do the job faster, with less overhead, and the code would be simpler.
Polyglot teams: when you have a big data pipeline running in Java but need to share data with services written in Node/Python.
If you don't have multiple isolated microservices, then Redis is not needed.
Things like: counters, news feeds, chat messages, etc
The cost of delivering these things well with an LSM-based DB or RDB might actually be higher than with Redis. Meaning: you would need more CPU/memory to deliver this functionality, at scale, than you would with Redis, because of all the overhead of the underlying DB engine.
But for 99% of places that aren’t FAANG, that is fine actually. Anything under like 10k QPS and you can do it in MySQL in the dumbest way possible and no one would ever notice.
It's not fine. I feel like you're really stretching it thin here in an almost hand-waving way. There are so many cases at far smaller scale where latency is still a primary bottleneck and a crucial metric for valuable and competitive throughput, where the definitively higher latency of pretty much any comparable set of operations performed in a DBMS (like MySQL) will result in large performance loss when compared to a proper key-value store.
An example I personally ran into a few years ago was a basic antispam mechanism (a dead simple rate-limiter) in a telecoms component seeing far below 10k items per second ("QPS"), fashioned exactly as suggested by using already-available MySQL for the counters' persistence: a fast and easy case of SELECT/UPDATE without any complexity or logic in the DQL/DML. Moving persistence to a proper key-value store cut latency to a fraction and more than doubled throughput, allowing for actually processing many thousands of SMSes per second for only an additional $15/month for the instance running Redis. Small operation, nowhere near "scale", huge impact to performance and ability to process customer requests, increased competitiveness. Every large customer noticed.
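For a sense of how small that kind of counter is, here's a minimal fixed-window sketch with Jedis (key scheme, window, and limit are invented for illustration; this is not the original component's code):

```java
import redis.clients.jedis.JedisPooled;

public class SmsRateLimiter {
    private final JedisPooled redis = new JedisPooled("localhost", 6379);
    private static final long LIMIT_PER_MINUTE = 100;

    /** Returns true if the sender is still under its per-minute quota. */
    public boolean allow(String sender) {
        long window = System.currentTimeMillis() / 60_000;  // fixed one-minute window
        String key = "rl:" + sender + ":" + window;
        long count = redis.incr(key);                       // atomic counter in Redis
        if (count == 1) {
            redis.expire(key, 120);                         // let old windows evaporate
        }
        return count <= LIMIT_PER_MINUTE;
    }
}
```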
That said, I agree that if you need a KV store, use a KV store. Though of course, Postgres can get you close out of the box with something like `CREATE UNLOGGED TABLE cache (data hstore);`.
The vast majority of companies never need to deal with even one thousand of anything per second. Your situation was absolutely an unusually large scale.
Did you profile the issue?
MySQL's query optimizer/planner/parser perform a lot more "gyrations" than Redis or MemcacheDB do before finally reaching the point of touching the datastore to be read/written, even in the case of prepared statements. Their respective complexities are not really comparable.
I reevaluated it for a job processing context a couple of years ago and opted for websockets instead because what I really needed was something that outlived an HTTP timeout.
I've never actually seen it used in a case where it wasn't an architecture smell. The codebase itself is pretty clean and the ideas it has are good, but the idea of externalizing datastructures like that just doesn't seem that useful if you're building something correctly.
I’ve used Redis for leaderboards and random matchmaking though, stuff which is doable in postgres but is seriously write-heavy and a bit of a faff. Gives you exactly the sort of goodies you need on top of a K/V store without being difficult to set up.
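Sorted sets are what make that pleasant. A minimal leaderboard sketch with Jedis (assuming Jedis 4.x; the key and method names are illustrative):

```java
import java.util.List;
import redis.clients.jedis.JedisPooled;
import redis.clients.jedis.resps.Tuple;

public class Leaderboard {
    private final JedisPooled redis = new JedisPooled("localhost", 6379);
    private static final String KEY = "leaderboard:global";

    public void recordScore(String player, double score) {
        redis.zadd(KEY, score, player);              // O(log n) insert/update
    }

    public List<Tuple> topTen() {
        return redis.zrevrangeWithScores(KEY, 0, 9); // highest scores first
    }

    public Long rankOf(String player) {
        return redis.zrevrank(KEY, player);          // 0-based rank, null if absent
    }
}
```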
As for caching - it’s nice to use as an engineer for sure, but pretty pricey. It wouldn’t be my default choice any more.
I like the sidekiq guy and wish him the best, but for me, the ubiquitous Redis dependency on my Rails apps is forever gone. Unless I actually need a KV store, but even for that, I can get away with PG and not know the difference.
Unfortunately there are still some CTOs out there who haven't updated their knowledge and are still partying like it's 2015.
Like you have to push those kinds of use cases if you’re trying to build a business around it, because a process that runs on your server with your other stuff isn’t a SaaS and everyone wants to sell SaaS, but it’s far enough outside its ideal niche that I don’t understand why it got popular to use that way.
By the time it was clear we would have been better off with Redis’ sharding solution the team was comfortable with the devil they knew.
I actually agree with the author that Redis was not the right solution for the situations he was presented with, but he's far from proving it is not the solution for a whole host of other problems.
e.g. MySQL 8.0.1+ adds the SKIP LOCKED modifier to SELECT ... FOR UPDATE.
Then you can increment the first available row, otherwise insert a new row. On read aggregate the values.
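A sketch of that pattern over JDBC (table and column names are hypothetical): grab any unclaimed counter row with SKIP LOCKED and bump it, insert a new row if every shard is locked, and have readers SUM across the shards.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class ShardedCounter {
    /** Increment a counter without contending on a single hot row (MySQL 8.0.1+). */
    public void increment(Connection conn, String counterName) throws SQLException {
        conn.setAutoCommit(false);
        try (PreparedStatement select = conn.prepareStatement(
                "SELECT id FROM counter_shards WHERE name = ? " +
                "ORDER BY id LIMIT 1 FOR UPDATE SKIP LOCKED")) {
            select.setString(1, counterName);
            try (ResultSet rs = select.executeQuery()) {
                if (rs.next()) {
                    // Got an unlocked shard: bump it in place.
                    try (PreparedStatement update = conn.prepareStatement(
                            "UPDATE counter_shards SET value = value + 1 WHERE id = ?")) {
                        update.setLong(1, rs.getLong("id"));
                        update.executeUpdate();
                    }
                } else {
                    // All shards were locked (or none exist yet): add another row.
                    try (PreparedStatement insert = conn.prepareStatement(
                            "INSERT INTO counter_shards (name, value) VALUES (?, 1)")) {
                        insert.setString(1, counterName);
                        insert.executeUpdate();
                    }
                }
            }
        }
        conn.commit();
    }

    /** Readers aggregate across shards. */
    public long read(Connection conn, String counterName) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(
                "SELECT COALESCE(SUM(value), 0) FROM counter_shards WHERE name = ?")) {
            ps.setString(1, counterName);
            try (ResultSet rs = ps.executeQuery()) {
                rs.next();
                return rs.getLong(1);
            }
        }
    }
}
```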
In the software world in the mid 00's, the trend started to work around the latency, cost and complexity of expensive servers and difficult databases by relying on the speed of modern networks and RAM. This started with Memcached and moved on to other solutions like Redis.
(this later evolved into NoSQL, when developers imagined that simply doing away with the complexity of databases would somehow magically remove their applications' need to do complex things... which of course it didn't, it's the same application, needing to do a complex thing, so it needs a complex solution. computers aren't magic. we have thankfully passed the hype cycle of NoSQL, and moved on to... the hype cycle for SQLite)
But the tradeoff was always working around one limitation by adding another limitation. Specifically it was avoiding the cost of big databases and the expertise to manage them, and accepting the cost of dealing with more complex cache control.
Fast forward to 2025 and databases are faster (but not a ton faster) and cheaper (but not a ton cheaper) and still have many of the same limitations (because dramatically reinventing the database would have been hard and boring, and no software developer wants to do hard and boring things, when they can do hard and fun things, or ignore the hard things with cheap hacks and pretend there is no consequence to that).
So people today just throw a cache in front of the database, because 1) databases are still kind of stupid and hard (very very useful, but still stupid and hard) and 2) the problems of cache complexity can be ignored for a while, and putting off something hard/annoying/boring until later is a human's favorite thing.
No, you don't need Redis. Nobody needs Redis. It's a hack to avoid dealing with stateless applications using slow queries on an un-optimized database with no fast read replicas and connection limits. But that's normal now.
Hence, fads dominate. I hate to sound so cynical but that has been my experience in every instance of commercial software development.
Drop Redis, replace with in-memory SQLite.
But for real, the :memory: feature is actually pretty awesome!
If you want to access a bloom filter, cuckoo filter, list, set, bitmap, etc... from multiple instances of the same service, Redis (slash valkey, memorydb, etc...) is really your only option
It also has arrays, sets, and bitstrings, though for the latter you can just as easily (and with less space consumed) map it in your app, and store an integer.
this seems like a classic case of impedance mismatch, trying to implement a Redis-ism using an RDBMS.
for a shared list in a relational database, you could implement it like you've said, using an array type or a jsonb column or whatever, and simulate how it works in Redis.
but to implement a "shared list" in a way that meshes well with the relational model...you could just have a table, and insert a row into the table. there's no need for a read-modify-write cycle like you've described.
or, if you really need it to be a column in an existing table for whatever reason, it's still possible to push the modification to the database without the heavy overhead. for example [0]:
> The concatenation operator allows a single element to be pushed onto the beginning or end of a one-dimensional array. It also accepts two N-dimensional arrays, or an N-dimensional and an N+1-dimensional array.
0: https://www.postgresql.org/docs/current/arrays.html#ARRAYS-M...
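Concretely, the push can be a single statement instead of a read-modify-write round trip; a minimal JDBC sketch against a hypothetical playlists table with a text[] column:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class SharedList {
    /** Append one element to a Postgres array column in a single statement. */
    public void push(Connection conn, long playlistId, String trackId) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(
                "UPDATE playlists SET track_ids = array_append(track_ids, ?) WHERE id = ?")) {
            ps.setString(1, trackId);
            ps.setLong(2, playlistId);
            ps.executeUpdate();
        }
    }
}
```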
You're right that managing lists in RDBMSes is easy-ish, if you don't have too many of them, and they're not too large. But, like I mentioned in my original comment, Redis really shines as a complex data structure server. I wouldn't want to implement my own cuckoo filter in Postgres!
As a cache or ephemeral store like a throttling/rate limiting, lookup tables, or perhaps even sessions store, it's great; but it's impossible to rely on the persistence options (RDB, AOF) for production data stores.
You usually only see this tendency with junior devs, though. It might be a case of "when all you have is a hammer, all you see are nails", or of someone discovering Redis (or MongoDB during its hype cycle ten years ago) and finding it in perfect alignment with their language's datatypes; but perhaps this is mostly because junior devs don't have as many production-ready databases (from SQL like PostgreSQL, CockroachDB, and Yugabyte, to New/NoSQL like ScyllaDB, YDB, and Aerospike) to fall back on.
Redis shines as a cache for small data values (probably switch to memcache for larger values, which is simpler key-value but generally 3 to 10 times faster for that more narrow use case, although keep an eye on memory fragmentation and slab allocation)
Just think carefully before storing long-term data in it. Maybe don't store your billing database in it :)
Sure, don't introduce a data store into your stack unless you need it. But if you had to introduce one, Redis still seems like one of the best to introduce? It has fantastic data structures (like sorted sets, hash maps), great performance, robust key expiry, low communication overhead, low runtime overhead... I mean, the list goes on.
https://redis.io/docs/latest/develop/data-types/probabilisti...
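For example, a shared bloom filter takes a couple of calls (this assumes a Redis Stack / RedisBloom-enabled server and a recent Jedis that exposes the BF commands; the key name is made up):

```java
import redis.clients.jedis.JedisPooled;

public class SeenUrls {
    private final JedisPooled redis = new JedisPooled("localhost", 6379);
    private static final String KEY = "seen:urls";

    /** Records the URL; returns true if it was (probably) not seen before by any instance. */
    public boolean firstTime(String url) {
        // BF.ADD creates the filter with default parameters if it doesn't exist yet.
        return redis.bfAdd(KEY, url);
    }

    /** May return false positives, never false negatives. */
    public boolean maybeSeen(String url) {
        return redis.bfExists(KEY, url);
    }
}
```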
Redis as transactional, distributed and/or durable storage is pretty poor. Their "active active" docs on conflict resolution, for example, don't fill me with confidence given there is no formalism, just vague examples. But this comes from people who not only don't know how to do distributed locks, but refuse to learn when issues are pointed out to them: https://martin.kleppmann.com/2016/02/08/how-to-do-distribute...
Every time I find code that claims to do something transactional in Redis which is critical for correctness, not just latency optimization, I get worried.
I thought Redis was single threaded running on a single core.
Having multiple cores provides no benefit (and arguably could hurt since large multicore systems typically have a lower clock)
And one other thing is you should be able to fall back if your cache is invalidated.
In our case we keep a bunch of metadata in redis that’s relatively expensive (in cost and speed) to pull/walk in realtime. And it needs to be fast and support lots of clients. The latter sort of obviates direct-to-database options.
I'm assuming based on the rest of the article that the author and team knew what they were doing, but if you aren't familiar with Postgres' UPDATE strategy and HOT updates [0], you should familiarize yourself before attempting this, otherwise, you're going to generate a massive amount of WAL traffic and dead tuples at scale.
[0]: https://www.postgresql.org/docs/current/storage-hot.html
I think people forget (or don’t know) that adding data storage system to your architecture also involves management, scaling, retention, and backup. It’s not free.
And second, sometimes you do need to invest in storage to permit failover or to minimize latency but people do it for traffic when they really have little traffic. A server from 20 years ago could push a lot of traffic and systems have gotten only beefier.
BAAAHHH HAAHA HA HA HA HHHAA. Ok, look. Just because it's in Redis does not get you around the clause in the geo service's license that says you may NOT permanently store the data. The author did not say that's what they were doing, but a very similar thing came up at a previous workplace of mine, and we chose to call the service for every lookup, the way the service company expected their customers to do. I recall the suggestion of using a caching database as a workaround, and it was not made by the engineering team. Sorry, I'm still chuckling at this one...
Then I decided to give up and use it only as an ephemeral cache. I have a large number of standalone Redis instances (actually, now they are Valkey), no storage, only memory, and have an Envoy proxy on top of them for monitoring and sharding. And I'm really happy with it, storing hundreds of GBs of data there; if one goes down, only a small part of the data needs to be reloaded from the primary source, and with the Envoy proxy, applications see it as a single Redis server. I was considering just replacing it with memcached, but the Redis data model is richer, so I kept using it, just not expecting anything put there to actually be stored forever.
And you're almost certainly still caching it in RAM; not sure if you're aware, but the kernel keeps in-memory copies of recently read/written pieces of files (the page cache).
It's easy to work with and does its job so well that, for a certain type of dev, it's easier to implement caching than to understand enough SQL to add proper indices.
You can use Postgres instead of them, but then it can easily become a bottleneck.
REDIS tradeoffs have been so widely discussed because many, many engineers disagree with them. REDIS is so lowly regarded that some companies ban it completely, making their engineers choose between memcached or something more enterprisey (hazelcast these days, Coherence some time ago).
Why are you writing Redis in all caps like it's an acronym? Reminds me of those old dog C programmers who write long rants about RUST with inaccuracies that belie the fact they've never actually used it.
Core Redis is a solid piece of technology, afaik
Distributed systems is hard, and what seems like a trivial choice about design constraints can quickly balloon in cost. =3
I have a dedicated server that's running an instance of redis on each of the 32 cores and it's been up for over a year now. Each core is doing about 30k QPS