I suppose we need a new rule, "Any sufficiently successful data store eventually sprouts at least one ad hoc, informally-specified, inconsistency-ridden, slow implementation of half of a relational database"
PS: I've worked at Elastic for a long time, so it is fun to see the arguments for a young product.
But that's beside the point. When people say "RDBMS" or "filesystem" they mean the full suite of SQL queries and POSIX semantics, neither of which you get with KV stores like BigTable or distributed storage like Colossus.
The simplest example of POSIX semantics that gets discarded quickly is the "fast folder move" operation. This is difficult or impossible to achieve when your keys encode the full path of the file, and is generally easier to implement with hierarchical directory entries. However, many applications are absolutely fine with "write entire file, read file, delete file" semantics, which enables huge simplifications and optimizations!
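To make the cost concrete, here is a toy sketch (a plain Python dict, not any real store's API) of why a "folder move" is expensive when keys are full paths: every object under the old prefix has to be rewritten.

    # Toy path-keyed store: keys are full paths, values are object bytes.
    kv = {
        "/photos/2023/a.jpg": b"...",
        "/photos/2023/b.jpg": b"...",
        "/docs/readme.txt": b"...",
    }

    def move_folder(kv, old_prefix, new_prefix):
        # O(number of objects under the prefix): copy each entry to the new
        # key, then delete the old one. There is no single "rename" to do.
        for key in [k for k in kv if k.startswith(old_prefix)]:
            kv[new_prefix + key[len(old_prefix):]] = kv.pop(key)

    move_folder(kv, "/photos/", "/archive/photos/")
    # With hierarchical directory entries this would be one metadata update.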
Spanner backing GCS actually explains how public Google Cloud Storage always had consistent object listings, while S3 only implemented that around 2020. I always suspected there must be some very hard-to-implement piece that AWS didn't have until 2020. Makes sense now that that piece was Spanner.
Problems we faced using Elasticsearch as the main db (ecommerce company): high load and high RAM usage; the db went down and more RAM was needed. Luckily we had ES experts on the infra team, who helped us a lot.
To write and then read your data back, you need to refresh the index or wait for a refresh. More inserts means more index refreshes, which ES is not designed for, so inserts become slow. You need to find a way to insert in bulk.
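For example, a minimal bulk-indexing sketch with the official Python client's bulk helper (elasticsearch-py, 8.x-style kwargs); the index name and documents are made up:

    from elasticsearch import Elasticsearch, helpers

    es = Elasticsearch("http://localhost:9200")

    def index_orders(orders):
        # One bulk request instead of thousands of single index calls,
        # so you don't pay refresh/indexing overhead per document.
        actions = (
            {"_index": "orders", "_id": o["id"], "_source": o}
            for o in orders
        )
        helpers.bulk(es, actions)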
The API started, couldn't find the ES alias because of a connection issue, and created a new alias (our code did that when it couldn't find the alias; bad idea). Oops, the whole dataset behind the alias was gone.
The most important thing when using ES as the main db is to use the "keyword" type for every field you don't full-text search.
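A minimal mapping sketch along those lines (8.x Python client; index and field names are made up): "keyword" for fields you only filter, sort, or aggregate on, "text" only where you actually need full-text search.

    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")

    es.indices.create(
        index="products",
        mappings={
            "properties": {
                "sku":         {"type": "keyword"},  # exact match, sortable, aggregatable
                "category":    {"type": "keyword"},
                "price":       {"type": "double"},
                "description": {"type": "text"},     # the only analyzed, full-text field
            }
        },
    )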
No transactions: if the second insert fails you need to delete the first insert by hand, which makes the code look ugly.
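A sketch of what that manual compensation tends to look like (index names and documents are made up):

    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")

    def create_order(order, invoice):
        es.index(index="orders", id=order["id"], document=order)
        try:
            es.index(index="invoices", id=invoice["id"], document=invoice)
        except Exception:
            # Second write failed: undo the first one by hand, since there is
            # no multi-document transaction to roll back.
            es.delete(index="orders", id=order["id"])
            raise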
Advantages: you can search, every field is indexed, reads are super fast. Fast development, easy to learn. We never faced data loss, even when the db crashed.
Anyone in engineering who recommends using a search engine as a primary data store is taking on risk of data loss for their organization that most non-engineering people do not understand.
In one org I worked for, we put the search engine in front of the database for retrieval, but we also made sure that the data was going to Postgres.
It is true that Elasticsearch was not designed for it, but there is no reason why another "search engine" designed for that purpose couldn't fit that role.
I used it about 7 years ago. Text search was not that heavily used, but we utilized keyword filters heavily. It's like having a database where you can throw any query at it and get a response in reasonable time, because you're effectively creating an index on every field.
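The kind of filter-only query this enables looks roughly like this (8.x Python client; index and fields are made up): a bool filter over keyword fields, with no text analysis or scoring involved.

    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")

    resp = es.search(
        index="products",
        query={
            "bool": {
                "filter": [
                    {"term": {"category": "shoes"}},       # exact match on a keyword field
                    {"range": {"price": {"lte": 100}}},
                ]
            }
        },
    )
    for hit in resp["hits"]["hits"]:
        print(hit["_source"]["sku"])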
My take is that ES is good for exploration and faster development, but you should switch to SQL as soon as the product is successful if you're using it as the main db.
They messed up a $30 million project big time at a previous company. My CTO swore never to recommend them.
I’ve either been involved with or adjacent to dozens of Accenture projects at 5 companies over the last 20 years, and not a single one had a satisfactory outcome.
I’ve never heard a single story of “Accenture came in, and we got what we wanted, on time and on budget.” Cases of “we got a minimum viable solution for $100m instead of $30m, and it was four years late” seem more typical.
I've also found they do a good job of cultivating a cadre of executives who float between companies and hire them wherever they land, while getting wined and dined.
If you hire your own people, you can make them feel invested in how well the business is doing, get features out the door tomorrow, and build toward the larger thing over time.
What does a $30 million mess-up look like?
"A $30 million mess-up" can look like (at least) two things. It can be $30 million was spent on a project that earned $0 revenue and was ultimately canceled, or it can look like $x was spent on a project to win a $30 million contract but a competitor won the contract instead.
Even if they didn't understand what ES is versus a "normal" database, I'm sure some of those people ran into issues where their "db" either got corrupted or lost data even while testing and building their system around it. This was general knowledge at the time; it was no secret that from time to time things got corrupted and indexes needed to be rebuilt.
It doesn't happen all the time, but way more than zero times, and it's understandable: Lucene is not a DB engine or "DB grade" storage engine; they had other, more important things to solve in their domain.
So when I read stories of data loss and things going south, I don't have sympathy for anyone involved other than the unsuspecting final clients. These people knew, or more or less knew, and chose to ignore it and be lazy.
I agree.
It's been a while since I touched it, but as far as I can remember ES has never pretended to be your primary store of information. It was mostly juniors who reached for it for transaction processing, and I had to disabuse them of the notion that it was fit for purpose there.
ES is for building a searchable replica of your data. Every ES deployment I made or consulted on sourced its data from some other durable store, and the only things that wrote to it were replication processes or backfills.
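A minimal sketch of that pattern, assuming Postgres as the system of record and a periodic backfill job (the table, columns, and connection details are made up):

    import psycopg2
    from elasticsearch import Elasticsearch, helpers

    pg = psycopg2.connect("dbname=shop")
    es = Elasticsearch("http://localhost:9200")

    def backfill_products():
        # Postgres stays the system of record; this job only pushes a
        # searchable copy into ES and can be re-run from scratch at any time.
        with pg.cursor() as cur:
            cur.execute("SELECT id, sku, category, description FROM products")
            actions = (
                {
                    "_index": "products",
                    "_id": row[0],
                    "_source": {"sku": row[1], "category": row[2], "description": row[3]},
                }
                for row in cur
            )
            helpers.bulk(es, actions)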
The best example is the IoT marketing, as if it could handle the load without a bazillion shards, and since when does a text engine want telemetry?
> things got corrupted and indexes needed to be rebuilt.
How are Postgres and Elastic any different here?
These new data stores don't usually require that level of durability or reliability.
I would never. Ever. Bet my savings on ES being stable enough to always be online to take in data, or predictable in retaining the data it took in.
It feels very best-effort and as a consultant, I recommend orgs use some other system for retaining their logs, even a raw filesystem with rolling zips, before relying on ES unless you have a dedicated team constantly monitoring it.
- No edge case is thrown at them
- No part of the system is stressed (software modules, OS, firmware, hardware)
- No plug is pulled
Crank the requests to 11 or import a billion rows of data with another billion relations and watch what happens. The main problem isn't the system refusing to serve a request or throwing "No soup for you!" errors, it's data corruption and/or wrong responses.
Now I work for a company whose log storage product has ES inside, and it seems to shit the bed more often than it should - again, could be bugs, could be running "clusters" of 1 or 2 instead of 3.
Storing logs in ElasticSearch is just stupid, as it does not preserve order:
Elastic’s own consultants will tell you this …
Turns out running complicated large distributed systems requires a bit more than a ./apply, who would have guessed it?
Which is why you supply the parameter

refresh: "wait_for"

in your writes. This makes the write wait until a refresh has made it searchable before the request completes.

> "schema migrations require moving the entire system of record into a new structure, under load, with no safety net"
Use index aliases. Create a new index with the new mapping, then make a reindex request from the old index to the new one. When it finishes, change the alias to point to the new index.
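A sketch of that alias-swap migration with the 8.x Python client (index and alias names are made up):

    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")

    # 1. Create the new index with the new mapping.
    es.indices.create(
        index="products_v2",
        mappings={"properties": {"sku": {"type": "keyword"}}},
    )

    # 2. Copy everything over server-side.
    es.reindex(
        source={"index": "products_v1"},
        dest={"index": "products_v2"},
        wait_for_completion=True,
    )

    # 3. Atomically repoint the alias that readers and writers use.
    es.indices.update_aliases(actions=[
        {"remove": {"index": "products_v1", "alias": "products"}},
        {"add": {"index": "products_v2", "alias": "products"}},
    ])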
The other criticisms are more valid, but not entirely: for example, no database ”just works” without carefully tuning the memory-related configuration for your workload, schema and data.
Outside of memory, log_duration, temp_file_limit, a good query plan visualizer and some backup and replication (e.g. PGBackrest and Patroni) are also generally recommended if self-hosting. Patroni doesn't even need an external config store anymore, which is great since you can just run it on 3-4 nodes and get a high-quality, easy-to-manage HA PostgreSQL cluster.
But those two parameters are pretty much all it takes to have PostgreSQL process thousands of transactions per second without further tuning. Even our larger DBs hosting simple REST applications (as opposed to ETL/data warehousing) had to grow quite a lot before further configuration was necessary, if at all.
Checkpointing probably becomes the next issue then, but modern PostgreSQL actually has great logging there -- "Checkpoints occur too frequently, consider these 5 config parameters to look at". And don't kill VACUUM jobs; as a consultant once joked, he sometimes earns thousands of dollars just to say "You killed a VACUUM job? Don't do that".
So yeah, actually running PostgreSQL takes a few considerations, but compared to 10 - 15 years ago, you can get a lot with little effort.
The difference is that waiting for a refresh ensures the data will be available for search after the "insert" finishes; forcing a refresh might be a footgun that will cripple the servers.
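A sketch of the difference on a single write (8.x Python client; index and document are made up):

    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")
    doc = {"sku": "abc-123", "price": 42.0}

    # Default: returns immediately; the doc becomes searchable on the next
    # scheduled refresh.
    es.index(index="products", id="1", document=doc)

    # Blocks until a refresh has made the doc searchable; safer under load.
    es.index(index="products", id="1", document=doc, refresh="wait_for")

    # Forces an immediate refresh; fine on a toy cluster, a footgun at scale.
    es.index(index="products", id="1", document=doc, refresh=True)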
A database is a big proposition: transactions, indexes, query processing, replication, distribution, etc. A fair number of use cases are just "Take this data and give it back to me when I ask for it".
ES (or any other not-a-database) might not be a full-bore DBMS. But it might be what you need.
The one thing relational databases don't have, that you might need, is scaling. Maintaining data consistency implies a certain level of non-concurrency. Conversely, maintaining perfect concurrency implies a certain level of data inconsistency.
You could maybe consider Rel if you have a particular type of workload, but, realistically, just use a tablational database. It will be a lot easier and is arguably better.
I've seen some examples of people using ES as a database, which I'd advise against for pretty much the reasons TFA brings up, unless I can get by with just YAGNI reasoning.
Feel like the Christmas Story kid --
>simplicity, and world-class performance, get started with XXXXXXXX.
A crummy commercial?
And it was good for a lot of things.
Metrics are much more efficient and are the tool of choice for longer term storage and debugging.
I work with a cluster that holds 500+ TB of logs (most are stored for a year, some for 5 years because of regulations) in searchable snapshots backed by a locally hosted S3 solution. I can do filtering across most of the data in less than 10 seconds.
Some especially gnarly searches may take around 60-90 seconds on the first run as the searchable snapshots are mounted and cached, but subsequent searches in the cached dataset are obviously as fast as any other search in hot data.
Obviously Elasticsearch isn't without its quirks and drawbacks, but I have yet to come across anything that performs better and is more flexible for logs — especially in terms of architectural freedom and bang-for-the-buck.
It should obviously NOT be a "main" database but part of an ETL pipeline for search purposes for instance.
It is much too late for that, but you're right that we'd be wise to put effort into undoing that. This is exactly how you end up with people using Elasticsearch as a primary datastore. When someone hears that they need a database, a database is what you are going to see them pick.
If we regularly used the proper terminology with appropriate specificity, then those without the deep technical knowledge required to understand all the different kinds of databases and the tradeoffs that come with them would be able to narrow their search to the solutions that fit the specification.
I think their API is great and I have had amazing results with it. Their recent innovations around quantization (BBQ) have been amazing for my use case: building an agentic movie database for discovering movies and getting personalized movie recommendations.
There are benefits to not using your database for everything, even if it adds a bit of complexity by introducing another dependency. If the benefits outweigh the cost of that complexity, reaching for Elastic has almost always been worth it for me.
(Obviously I'm referring to a famous YouTube video on the subject)
https://clickhouse.com/docs/engines/table-engines/mergetree-...