The custom database is extraordinarily fast at some things, so it's a complex decision. I needed benchmarks.
I chose something that mirrors how data is accessed in the custom database. The existing database does have several tables, but it's hard to use, so most accesses are single-table, indexed by rowid, with maybe 100,000,000 rows. So I benchmarked a single table, indexed on primary key only, with 100,000,000 small rows. Since I could run the same SQL against both, it was easy to benchmark both SQLite3 and PostgreSQL.
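A minimal sketch of what that benchmark might have looked like - the column layout is a guess, since the comment only specifies small rows keyed by the primary key - but the same statements run unmodified on both SQLite and PostgreSQL:

```
-- Hypothetical benchmark table: ~100,000,000 small rows, primary key only.
CREATE TABLE bench (
    id      INTEGER PRIMARY KEY,  -- in SQLite this is an alias for the rowid
    payload TEXT NOT NULL
);

-- The access patterns being compared: single-row insert, fetch, and update.
INSERT INTO bench (id, payload) VALUES (42, 'some small value');
SELECT payload FROM bench WHERE id = 42;
UPDATE bench SET payload = 'another small value' WHERE id = 42;
```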
I expected SQLite3 to beat PostgreSQL on single-row accesses / updates, and PostgreSQL to get the upper hand on multi-row accesses / updates. I was surprised to see SQLite3 was about twice as fast as PostgreSQL at everything, including inserting the 100,000,000 records. It was always a single writer. Had I tested multiple writers, I expect I would have seen PostgreSQL really flex its muscles.
In case you're wondering, the custom database is 100 to 1,000 times faster than SQLite3 at accessing a single record. Getting that performance required some big tradeoffs, which make it so hard to code for that it creates reliability problems.
For example, when you insert a row in SQLite, that results in a write to the WAL. That's a data safety measure the custom database is unlikely to be taking.
You can configure Postgres to perform better with different tradeoffs - here are some things to investigate (a sketch of the settings follows the list):
- synchronous_commit=off
- wal_level=minimal
- wal_writer_delay=200ms
- checkpoint_completion_target=0.9
- autovacuum=off
- bgwriter_delay=10000ms
- shared_buffers=512MB
- work_mem=64MB
- UNLOGGED tables
- run from RAM Disk (/dev/shm)
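Most of these can be applied with ALTER SYSTEM (or in postgresql.conf). A sketch only, not a recommendation - several of them deliberately trade crash safety for speed:

```
-- Relax durability and background-writer behaviour (data loss possible on crash).
ALTER SYSTEM SET synchronous_commit = off;
ALTER SYSTEM SET wal_level = minimal;              -- needs max_wal_senders = 0 and a restart
ALTER SYSTEM SET wal_writer_delay = '200ms';
ALTER SYSTEM SET checkpoint_completion_target = 0.9;
ALTER SYSTEM SET autovacuum = off;
ALTER SYSTEM SET bgwriter_delay = '10000ms';
ALTER SYSTEM SET shared_buffers = '512MB';         -- needs a restart
ALTER SYSTEM SET work_mem = '64MB';
SELECT pg_reload_conf();                           -- picks up the reloadable settings

-- Skip WAL entirely for one table (it gets truncated on crash recovery):
CREATE UNLOGGED TABLE bench_unlogged (id bigint PRIMARY KEY, payload text);
```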
It's probably unlikely that you're consuming the upstream SQLite C library directly with its intentionally antiquated defaults. You're much more likely to be using a provider that has tweaked the defaults to more modern standards.
Do you know any details about those tweaks you mention? It would be very nice to know what edge cases I gotta watch out for when using any of these.
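For context, the defaults most commonly changed by wrappers are the journal mode and sync level - something along these lines, with the main edge cases noted inline:

```
PRAGMA journal_mode = WAL;    -- upstream default is DELETE (rollback journal);
                              -- WAL doesn't work reliably on network filesystems
PRAGMA synchronous = NORMAL;  -- default is FULL; in WAL mode NORMAL can lose the
                              -- most recent commits on power failure (no corruption)
PRAGMA foreign_keys = ON;     -- off by default for backwards compatibility
PRAGMA busy_timeout = 5000;   -- default 0 means a concurrent writer fails
                              -- immediately with SQLITE_BUSY instead of waiting
```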
Looking at their GitHub stats, open issues, and PRs, the project looks healthy.
I have a use case where our application has events that mark a good time to trigger disk IO. The changes that occur between those app-level checkpoints can totally be ephemeral.
- Create an unlogged table named events_active.
- At app_checkpoint time, in one transaction:
  1. Rename events_active to events_pending_write.
  2. Create a new unlogged table, events_active, to handle writes for the next app_checkpoint.
- In a new transaction:
  1. Set events_pending_write to LOGGED. This writes the entire table into the WAL. [1]
  2. Rename events_pending_write to events_part_20250304.
  3. Attach events_part_20250304 to a partitioned events table.
For stronger consistency, combine the two transactions, but you'll block incoming writes until the transaction completes.

Unlogged tables don't guarantee Postgres uses RAM for the table. If you need a stronger guarantee, mount a directory on a ramdisk and create a tablespace to hold the events_active table. Then, to promote to a logged table, use a create-table-as statement to copy the table into a new partition.
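A sketch of that rotation in SQL, assuming the partitioned events table already exists and is range-partitioned by date (the partition bounds here are made up):

```
-- Transaction 1: swap in a fresh unlogged table so writers barely notice.
BEGIN;
ALTER TABLE events_active RENAME TO events_pending_write;
CREATE UNLOGGED TABLE events_active (LIKE events_pending_write INCLUDING ALL);
COMMIT;

-- Transaction 2: persist the snapshot and attach it as a partition.
BEGIN;
ALTER TABLE events_pending_write SET LOGGED;      -- writes the whole table to the WAL
ALTER TABLE events_pending_write RENAME TO events_part_20250304;
ALTER TABLE events
    ATTACH PARTITION events_part_20250304
    FOR VALUES FROM ('2025-03-04') TO ('2025-03-05');
COMMIT;
```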
Apart from that, everything was done in "reasonable sized transactions", with the same size being used for both Postgresql and Sqlite3. Necessarily so, because I ran the same SQL statements over both. For single row fetches or updates the transaction was a single row. When I was testing how fast I could update a batch of 10 rows, all 10 were done in a single transaction. (Aside: the transaction overhead far outweighed the cost of one statement. Updating 100 rows only took twice the time of updating 1.)
When creating the database of 100,000,000 rows, they were inserted in transactions of 10,000 rows. I was emulating how our system inserts records into this 100,000,000-row table. It uses an almost-ascending primary key, which makes life easier for the underlying b-tree, and I've found emulating that sort of thing is important for realistic benchmarks. On this particular table the primary key is (current-unix-time, 32-bit-random-int): current-unix-time was monotonically increasing, but within any given second some 100 randomly ordered 32-bit-random-int's were used. From memory, SQLite3 inserted 100k records per second, and PostgreSQL somewhat less than that. But don't pay much attention to the absolute numbers, as the test ran on spinning rust while the real system was doing other things (again to match the real system's constraints). On my MacBook with NVMe it was several times faster in absolute terms, but SQLite3 and PostgreSQL kept their relative positions.
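For what it's worth, a sketch of what that emulation looks like - the payload column and the literal values are invented, but the key shape and the 10,000-row transactions are as described:

```
-- Almost-ascending composite primary key, as described above.
CREATE TABLE events (
    created_at    BIGINT  NOT NULL,  -- current unix time, monotonically increasing
    discriminator INTEGER NOT NULL,  -- random 32-bit int; ~100 per second, random order
    payload       TEXT    NOT NULL,
    PRIMARY KEY (created_at, discriminator)
);

-- Rows were inserted 10,000 per transaction:
BEGIN;
INSERT INTO events (created_at, discriminator, payload) VALUES
    (1741046400,   193822443, '...'),
    (1741046400,  -871002145, '...');
    -- ...9,998 more rows per batch...
COMMIT;
```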
The obvious caveat here is any situation where you need global tables. I've never worked on a product where 100% of the schema could be partitioned like this. I've done a ton of SQLite-per-<entity type>, but there's always been a singleton SQLite database above this tracking the metadata.
Looking up a user session via token is not an analytics task. Neither is checking if an email address is already registered. These are very pedestrian queries that demand global visibility, ideally with hard transactional guarantees.
SaaS for B2B works well with this: you partition by organization, and each organization has its own DB. Most organizations are small (no more than 100 users), which greatly simplifies things. However, it's still problematic with large organizations (200k-300k users is where it starts to fall apart).
Create a separate DWH storage where all necessary data from all databases is pulled and processed/aggregated in the background (via change data capture) and saved in a format that is more suitable for analytical queries. The downside is that it requires additional preparation and some upfront design.
>Similarly, admin tools for customer support etc easily become very hard to build.
Our CRM understands the concept of organizations/tenants, so when a customer files a ticket, we know their organization, and from there, for a tech support person, accessing "coolcompany.mysaas.com" is no different from accessing "mysaas.com". In my opinion, it makes tech support easier, because you have far less data/fewer logs to analyze/sift through (only a specific organization's DB) when investigating a problem.
Nile (https://www.thenile.dev/) is trying to address this use case with fully isolated PG databases. Though I don't know how they handle scaling/sharding.
I had a service that was adding 30 ms to TTFB because most of the other decisions we made during page render were predicated on this response. I would later find about a quarter of this time was in bookkeeping code that got out of hand, but that's a story for another day.
The biggest problem was that this data requires a capital-P Process to change, so it took about 20 minutes to change it and fifteen seconds to roll it back. There had been plans to have the service do more but in the end it looked more like a remote request for a feature toggle. We were already broadcasting feature toggle changes through Consul, which is great for this kind of data. So I did more of the same and got to decom a cluster.
The moral of the story is that it matters what kind of global data your system requires, and that's the sort of fact to keep in mind while designing it, because you can pick a design that will scale or one that absolutely won't - one where you've given users features that make the entire system 10x more expensive per request.
A lot of people still end up storing data that's not frequently updated in a traditional OLTP database like Postgres.
However:
I think it always helps to think about these problems as "how would you do it in Cassandra/DynamoDB?"
In the case of Cassandra/DynamoDB, the relevant data (e.g. user ID, channel ID, etc) is always in the partitioning key.
For Durable Objects, you can do the same thing by building a key that's something like:
```
// for simple keys:
env.USER_DO.idFromName(userId);

// or for composite keys (assumes user A and B are sorted):
env.DIRECT_MESSAGE_CHANNEL_DO.idFromName(`${userAId}:${userBId}`);
```
I've spoken with a lot of companies using _only_ this architecture for Durable Objects and it's working well.
If you have hundreds of services hammering the datastore concurrently, then the ability to offer transactional guarantees seems absolutely mandatory for me.
> Don’t they need to coordinate themselves outside of transactions?
I think we're back at the beginning of our journey here.
That being said: my understanding is we're always going to have something that needs to maintain its own state that's global, and you're naming that problem as well.
For example, let's say we partition users based on the first letter of their email addresses.
This works great for most user-specific queries (e.g., fetching a user profile).
But what happens when someone registers a new account?
At that point, we must ensure the email is globally unique.
A purely partitioned approach won't help here—we'll need some kind of global database or service maintaining a single source of truth for email uniqueness checks.
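In SQL terms that single source of truth can be as small as one global directory table with a unique constraint - a sketch, with invented names:

```
-- Global directory: the one piece of state that can't live inside a partition.
CREATE TABLE user_directory (
    email     TEXT PRIMARY KEY,   -- global uniqueness enforced here
    user_id   TEXT NOT NULL,
    partition TEXT NOT NULL       -- e.g. 'j' for john.smith@example.com
);

-- Registration fails atomically if the email is already taken:
INSERT INTO user_directory (email, user_id, partition)
VALUES ('john.smith@example.com', 'user-8f2c01', 'j');
```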
(then it gets complicated, because of the simple level at which I can understand and communicate about it. Why not just partition based on the first letter of an email? Well, yes, then we just have to deal with emails changing. Maybe a better example is session tokens, because they don't come with an email. But we could require that, or do some bespoke thing...there's security concerns there but they seem overrated...but to your point, you end up adding a ton of complexity just so you can fit a square peg in a round hole)
For example, following your partitioning logic, if the user registers as john.smith@example.com, we'd need to query only partition j.
My broad understanding is that you can always "patch" or "work around" any single objection to partitioning or sharding—like using extra coordination services, adding more layers, or creating special-case code.
But each of these patches adds complexity, reduces flexibility, and constrains your ability to cleanly refactor or adapt later. Sure, partitioning email addresses might neatly solve registration checks initially, but then email changes require extra complexity (such as maintaining global indices and coordinating between partitions).
In other words, the real issue isn't that partitioning fails in a single obvious way—it usually doesn’t—but rather that global state always emerges somewhere, inevitably. You can try to bury this inevitability with clever workarounds and layers, but eventually you find yourself buried under a mountain of complexity.
At some point, the question becomes: are we building complexity to solve genuine problems, or just to preserve the appearance that we're fully partitioned?
(My visceral objection to it is, coming from client-side dev virtually my entire career: if you don't need global state, why do you have the server at all? Just give me a .sqlite file for my account and store it for me on S3 for retrieval at will. And if you do need global state... odds are you or a nearby experienced engineer has Seen Some Shit, i.e. the horror that arises in a codebase worked on over years, doubling down on a seemingly small, innocuous initial decision, and knows it'll never just be one neat design decision or patch.)
Check the other partition for the user name. Create the new user with the same pointer (uuid, etc.) to the user's sqlite file, then delete the old user in the other partition. Simple user name change. Not really that complex, to be honest. (After thinking this through, I'm probably going to suggest we change to sqlite at work...)
> if you don't need global state, why do you have the server at all?
2 reasons I can think of right off of the top of my head are:
- validation (preventing bad actors, or just bad input)
- calls to external services
Wrote a bit about it here: https://news.ycombinator.com/item?id=43246212
If a single request frequently touches multiple partitions, your use cases may not work well.
It's the same deal as Cassandra & DynamoDB: use cases like chat threads or social feeds fit really well because there's a clear ownership hierarchy. e.g. message belongs to a single thread partition, or a social post belongs to a feed partition.
What I’d really like is an easy way to sync the SQLite database state to a cloud service. Most existing options expect you to query against a remotely hosted database and charge per read/write.
Since the database will have around 100,000 rows and you're typically working with all the data at once, streaming parts of it doesn’t make sense for my use case.
The closest I’ve found is Turso, which has offline writes in private beta, and SQLite Cloud, which lists local-first and offline sync as "coming soon."
The simplest approach might be letting users push to S3 storage with versioning. Ideally, it would also support point-in-time restores, tracking incremental updates alongside full snapshots.
Even better, I’d manage minimal server-side infrastructure and just pull the SQLite database from a service that handles syncing and management.
You can save conflicts to another changeset. There is also a rebaser to help deal with multi-way syncing.
https://www.sqlite.org/sessionintro.html - overview
https://www.sqlite.org/session/sqlite3changeset_apply.html - conflict information
It's https://vlcn.io/docs/cr-sqlite/intro , and I find it amazing that this is doable in sqlite. It's perfect for small-scale collaboration imho, but it also works to sync across a local client and remote server (for a single-db-per-user scenario).
Right now, the proof-of-concept they've provided seems simplistic. Their progress seems to have shifted from cr-sqlite into "Zero" instead. I'm guessing it has something to do with CRDTs being quite app-specific and hard to generalize.
I would want to see this library used in production first before hyping it
The term you're looking for is "eventual consistency".
I’ve also flirted with the idea of forking litestream and stripping it down dramatically. The reason why is that I don’t like the idea of the production server being in charge of rotation and deletion. It seems like the thing getting backed up shouldn’t have the privilege of deleting backups in case it gets compromised. I might even go so far as to propose that the “even liter stream” process merely writes to a different local volume and then some other process does the uploading but I haven’t gotten beyond the daydream stage.
> It seems like the thing getting backed up shouldn’t have the privilege of deleting backups in case it gets compromised.
For backups, I added a nightly cron job which exports my SQLite db to a write-only S3 bucket.
What's the additional risk?
> It seems like the thing getting backed up shouldn’t have the privilege of deleting backups in case it gets compromised.
(agreed)
> For backups, I added a nightly cron job which exports my SQLite db to a write-only S3 bucket.
Why not only do this and use an s3 sync instead? You can safely backup SQLite databases while they're being written to, so no need to export (dump) them; just copy the files themselves.
This might mean that your entire backup/restore strategy is just to copy some files. If so, that's ideal.
(Of course, s3 sync does require reading as well as writing, so perhaps just increase your cron job to run more often so it fits within your RPO)
Implementing snapshot backups on top of that would be as simple as a VACUUM and S3 PUT
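For reference, the SQLite half of that is a one-liner (VACUUM INTO has been available since SQLite 3.27; the path is made up, and the S3 PUT of the resulting file happens outside SQLite):

```
-- Write a consistent, defragmented snapshot of the live database to a new file,
-- which is then uploaded with a plain S3 PUT.
VACUUM INTO '/backups/app-2025-03-04.db';
```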
For point-in-time restores, mvsqlite is a cool solution: https://github.com/losfair/mvsqlite
It implements CRDTs as a SQLite extension.
Not an easy problem for sure, but the web platform is surprisingly capable these days.
The one I thought of (mostly because I worked there before they went under/bought by MongoDB) is RealmDB: https://en.wikipedia.org/wiki/Realm_(database)
I have long since lost touch with the state of it, but at the time the syncing to their server was fast and had worked with a long list of environments/languages.
The one thing I will caution: their model was that you almost had to have a database-per-customer. You could have a second one that contained common information, but they had no concept of syncing only part of a database based on some logic. So many customer implementations had the clients syncing multiple databases, and then a back-end client that would aggregate the data from all of those databases into one for backend processes. Extra complexity that I always thought was a real killer.
Don't do this, but an application I used to work on (to replace it) copied the sqlite file to a folder then used rsync to sync it with a backup node. Apparently it worked and was good enough for that use case (inefficient php backend application with at most a dozen concurrent users).
100,000 rows is only a few megabytes at most, right? Should be fine.
What's wrong with that? Of course it will work fine; SQLite, with or without WAL, has a ton of protections against corruption from writes-in-progress, which is what makes hot backups work.
Each user would have their own database-file which contains only information about that user. Then 1 shared database-file which contains info needed for all users.
Users would update their own data, which is a small database file which can be easily uploaded. They would not need to update the shared data.
Not knowing your app I don't know what the shared data would contain, presumably something. Perhaps the shared data-file would be updated on the server based on what individual user-data the users upload.
Developers should expect users to connect to the service using multiple devices (clients).
AFAIK bare SQLite doesn't offer synchronization mechanisms between multiple SQLite instances.
I believe Turso offers some solution of this kind, but not sure if that's open source or not.
But probably a single user only uses one device at a time?
My thought, and it is just a thought, here is that instead of trying to provide a GENERAL solution for all kinds of data-update patterns, it is often possible to think in terms of what my current application specifically needs. It is easier to come up with such a solution with SQLite per app because SQLite is so "lite".
I can't speak for the "general solution" except to say that many times you don't need an all-encompassing general solution, just a solution for your current app.
It depends on your expectations of concurrent use. Computer + tablet + phone means many users may use different devices within seconds of each other. If you want to support offline-first usage, concurrent updates from different clients for the same user becomes more likely.
Synchronization may introduce a lot more problems, especially when you want to automatically sync the database to some other place. You will need to deal with sync errors, inconsistency, version conflicts, rollbacks...
If your users could accept that, a simple full version backup is the best solution.
Like, the article's three sections are:
1. The challenges of sharding
2. The benefits of these new sharded Sqlite solutions over conventional Sqlite
3. A list conflating the benefits of SQL databases generally with the benefits of Sqlite
None of which answer the question of "Why should I use sharded Sqlite instead of, say, sharded Postgres, for hyperscale?".
Agreed — I think adding some comparisons to other database partitioning strategies would be helpful.
My 2 cents, specifically about manually partitioning Postgres/MySQL (rather than using something like Citus or Vitess):
SQLite-on-the-server works similarly to Cassandra/DynamoDB in how it partitions data. The number of partitions is decoupled from the number of databases you're running, since data is automatically rebalanced for you. If you're curious, Dagster has a good post on data rebalancing: https://dagster.io/glossary/data-rebalancing.
With manual partitioning, compared to automatic partitioning, you end up writing a lot of extra complex logic for:
- Determining which database each piece of data lives on (as opposed to using partitioning keys which do that automatically)
- Manually rebalancing data, which is often difficult and error-prone
- Adding partitions manually as the system grows
- (Anecdotally) Higher operational costs, since matching node count to workload is tricky
Manual partitioning can work fine for companies like Notion, where teams are already invested in Postgres and its tooling. But overall, I think it introduces more long-term problems than using a more naturally partitioned system.
To be clear: OLTP databases are great — you don’t always need to reach for Cassandra, DynamoDB, or SQLite-on-the-server depending on your workload. But I do think SQLite-on-the-server offers a really compelling blend of the developer experience of Postgres with the scalability of Cassandra.
Most sharding databases use consistent hashing.
> - Manually rebalancing data, which is often difficult and error-prone
Not really. When you set up the database, you choose a highly divisible number of shards and then use consistent hashing to spread the data across them. Each server hosts N shards, where N changes as your data (and server count) grows.
> - Adding partitions manually as the system grows
Not really. Just choose a reasonably high number of shards and divide them across your servers.
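A sketch of the fixed-logical-shard idea in plain SQL (names and the shard count are illustrative; in practice the hash and the shard map usually live in application code):

```
-- 1,024 logical shards mapped onto however many physical servers exist today.
-- Growing the cluster means reassigning whole shards, never re-splitting rows.
CREATE TABLE shard_map (
    shard_id INTEGER PRIMARY KEY,  -- 0 .. 1023
    server   TEXT NOT NULL         -- physical server currently owning this shard
);

-- Routing: map the key to a shard number, then look up its current owner.
-- (A numeric key keeps the example simple; text keys get hashed first.)
SELECT server FROM shard_map WHERE shard_id = 1234567 % 1024;
```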
> - (Anecdotally) Higher operational costs, since matching node count to workload is tricky
This could be true, but also, there is overhead to managing a ton of SQLite databases too. I think there are tradeoffs here.
My solution? The app is a webpage, which reads SQLite. If the user wants to use the app, the database is downloaded, unpacked, and used on the user's device.
Links:
- https://github.com/rumca-js/Internet-Places-Database - search.html provides a preview for my database file (but the code also supports reading a zip file).
- https://rumca-js.github.io/search?file=top&page=1&search=neo... - uses JSON files stored in a zip file. Will be replaced soon with zip + sqlite file
- https://rumca-js.github.io/search?file=music&view_display_ty... - example showing my fav music. As above uses JSON files, in a zip file
you can do kinda magic things like this and build websites that connect to multiple different databases around the web and... well, i'll leave the rest up to your imagination.
go here: https://just.billywhizz.io/sqlite/squeel/
hit CTRL/CMD + Q on your keyboard.
paste in this sql
```
attach database 'https://raw.githubusercontent.com/just-js/just.billywhizz.io...' as chinook;

select * from albums;
```
and hit CTRL/CMD + g to run the queries.
Keyword being "barely".
There are organization benefits if you can structure your data into a DB, instead of having each page redundantly hold the same header & metadata info.
Extracting data from it also becomes really really easy with selects. Otherwise I would have to implement, or reuse some algorithms to filter JSON data, etc.
In Telegram Bot API, each TDLib instance handles more than 24,000 active bots simultaneously.[1]
> Apple runs the world's largest known number of Cassandra/ScyllaDB instances with roughly 300,000 nodes as of 2022
The Big Tech approach to web services is like everyone piling on a cruise ship to commute to work. Certainly brings some conveniences, but the complexity is insane and overall it feels absurd.
The relatively new DB-per-tenant type approaches described in this article are more like renting/leasing a car, or maybe riding the bus to work. Definitely a step in the right direction.
The future will be a mix of these solutions, and selfhosting/indiehosting. Indiehosting is like owning a car. Once the software is as reliable as cars are, many more people will do it.
I've chatted with a few medium-sized companies looking at Durable Objects for this reason. DB-per-tentant removes much of the need for another dedicated team to provision & maintain infrastructure for the services. It's almost like what microservices were trying to be but fell woefully short of achieving.
It's disappointing (but understandable) that "serverless" received a bad rap. It's never going to fully replace traditional infrastructure, but it does solve a lot of problems.
The damn thing depends on mongodb and, worse, an old version of mongodb. I found a script that installed it all but it seems a system like that Ubiquiti controller really ought to use sqlite for a "just works" experience. For a while I thought maybe mongo was necessary if you had a really big installation with 10,000+ access points but if this article is right, sqlite ought to be good for the biggest installations out there.
I've been pleased at work to see some on-prem server apps switching to SQLite recently. Most recent that comes to mind is EFT Server (Windows SFTP/FTPS server), which switched from SQL Server Express to SQLite. SSE always makes me groan: huge install, pain to manage, etc. SQLite is just so easy to copy files around and make backups. I'm sure it's probably faster also.
That's a slightly misleading headline, because it can be interpreted as implying SQLite is BAD at micro-scale. That's not an argument the rest of the piece makes - it's more about how SQLite is great at large scale and people who focus on the small-scale aspects might miss that.
I'm aware that there a bunch of sqlite-compatible implementations and extensions that add MVCC. How close are those to actually being SQLite? Are they just a totally different table storage and transaction engine sitting behind SQLite's SQL parser/compiler and VM?
If you want global consistency, then you'll have to either spend some time at runtime to achieve it, or have complicated protocols, fast networking, and synchronized clocks.
Does this look like actor model (from Erlang) if you squint a bit?
In fact – Durable Objects talks a bit about its parallels with the actor model here: https://developers.cloudflare.com/durable-objects/what-are-d...
You might also appreciate this talk on building a loosely related architecture using Erlang, though it doesn't implement an actor-per-database pattern – https://www.youtube.com/watch?v=huGVdGLBJEo
SQLite is doing very well for my use cases. Especially using the experimental Node.js version - all I have been using. So I would highly recommend using it when you get to the point of moving data around.
On the other hand there is one big thing. And this may be only the Node.js version. But several times when I have been doing development work my file based database has become corrupted. I violated some constraint, the sql was poorly formed, etc. Always my fault. I simply remove the file that SQLite is using and create a new one or use a backup copy.
The thing is that I have never seen a PostgreSQL DB become corrupt. No matter what horrendous code I wrote. Never. I have not really appreciated how completely bullet proof it has been for me.
Just my 2 cents worth.
That really shouldn't be able to happen unless you're using the API wrong or have weakened safety with pragmas or database configurations that trade off resilience for performance. Even with crashes, attempts to violate constraints, poorly formed SQL, etc, the file should never become corrupt. If it does, that's a bug with SQLite, not you. Corruption bugs are exceptionally rare in SQLite these days.
My point is NOT that users should not adopt SQLite, just the opposite. I have found that using SQL helps me build things much more easily. And better. It helps you think about your code more effectively.
My recent adoption of SQLite and using it in smaller, lighter situations has worked extremely well and I will keep on using it. But... my limited experience and comparison of the two products suggests that as you would expect given the Lite, there are some differences that have real life consequences.
I develop a third party software and I receive reports of corrupted SQLite databases roughly every 2 months.
However, without any reproducibility, it might very well be due to hardware faults etc.
Still beats having end-users install something like postgres.
However, I suspect the infrastructure will provide this natively as it matures:
- Cloudflare will probably eventually add read replicas for Durable Objects. They're already rolling it out for D1 (their other SQLite database offering). [1]
- Turso has their own story for read replicas. [2]
[1] https://blog.cloudflare.com/building-d1-a-global-database/#s... [2] https://docs.turso.tech/features/embedded-replicas/introduct...
And of course there's the Phoenix LiveView solution where the write contains the new data, gets injected back into the page that contained the edit link. That doesn't quite cover the save-reload loop that some users with well-earned trust issues follow.
From the chat logs example in your article: how do you cope with a requirement such as "I as a user want to see all my latest thread activity in one place, across all my chat rooms?"
In this case I think you can let them become inconsistent in the face of, e.g., write errors.
> In this case I think you can let them become inconsistent in the face of, e.g., write errors.
For devs using CF Durable Objects, people frequently use CF Queues or CF Workflows to ensure that everything is eventually consistent without significant overhead.
It's a similar pattern to what large cos already do at scale with keeping data up to date across multiple partitions with Cassandra/DynamoDB.
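A sketch of the shape this usually takes inside each user's own database (names invented): a denormalized activity table that the queue consumer upserts into, so retries are harmless and the feed is eventually consistent with the per-room partitions.

```
CREATE TABLE thread_activity (
    room_id      TEXT NOT NULL,
    thread_id    TEXT NOT NULL,
    last_message TEXT NOT NULL,
    updated_at   INTEGER NOT NULL,  -- unix timestamp of the latest message
    PRIMARY KEY (room_id, thread_id)
);

-- Queue consumer: safe to retry because the write is an idempotent upsert.
INSERT INTO thread_activity (room_id, thread_id, last_message, updated_at)
VALUES ('room-42', 'thread-7', 'hello there', 1741046400)
ON CONFLICT (room_id, thread_id) DO UPDATE
SET last_message = excluded.last_message,
    updated_at   = excluded.updated_at;
```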
create a copy, for example
Cheers
Do you use any special tools to manage all these separate databases, track performance and debug problems?
From the companies I've talked to, most developers using this architecture are building quick scripts to do this in-house. Both Turso and Durable Objects SQLite already see a surprising amount of usage that people don't talk about much publicly yet, so I suspect some of this tooling will start to be published in the next year.
If running under an orchestration framework like Kubernetes, each container in a pod will have its own SQLite database. How can we ensure the data is consistent across pods?
But they're slow (or incapable) of doing inserts, updates, and deletes, because the columnar formats are typically immutable. They're also relatively slow at operations that need to look at all of the data for a particular row.
OLTP databases are much better for use cases where you're frequently inserting, updating, and accessing individual rows, as for the database backing a web application.
A common pattern is to use an OLTP database (like postgres) to back your application, then replicate the data to an OLAP store like Clickhouse or a data lake to run analytical queries that would overwhelm postgres.
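To make the access-pattern difference concrete (table and column names invented): the first query touches one row through an index and belongs on the OLTP side; the second scans a couple of columns across every row, which is the kind of query you'd ship to the OLAP store.

```
-- OLTP: fetch everything about a single entity (one row, via the primary key).
SELECT * FROM orders WHERE order_id = 123456;

-- OLAP: aggregate a column or two across all rows (a full columnar scan).
SELECT customer_id, sum(total_amount) AS lifetime_value
FROM orders
GROUP BY customer_id
ORDER BY lifetime_value DESC
LIMIT 100;
```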
DuckDB also has a WASM target: https://duckdb.org/docs/stable/clients/wasm/overview.html
I don't know enough about DuckDB to understand the tradeoffs it made compared to SQLite to achieve this performance.
https://benchmark.clickhouse.com/#eyJzeXN0ZW0iOnsiQWxsb3lEQi...
So which is faster depends on your access pattern. There are dumb stupid terrible names for "access all of one row" (OLTP) and "access all of one column" (OLAP) type access patterns.
Maybe there's some edge case they've found where SQLite is badly optimized and DuckDB is totally optimized, but that's absolutely not the general case.
Databases are primarily limited by disk/IO speed. Yes there are plenty of optimizations but they result in 10% improvements, not 915x.
But how does WASM DuckDB store files in IndexedDB? Any info on that?
As I recall duckdb’s concurrency model did not sound viable for a web server but I may be behind the times or outright wrong.
I've looked at Turso before and this is exactly what came to mind. I do see some use cases where it could work, but for most real-world applications, this is an absolute red flag that shouldn't be overlooked.
However, this isn't so much different from Cassandra/DynamoDB which have a similar problem. You _can_ query cross-partition, but it's strongly discouraged and will strain any reasonably sized cluster.
More companies than people realize are already building and scaling with DO SQLite or Turso internally. Almost every company I've talked to that chooses Postgres hits scaling issues around Series A — these companies aren’t.
"SQLite but distributed" is like NoSQL during its hype cycle: a good choice for one or two use cases, a bad choice for most every other use case, yet people are choosing it for every use case, because it's trendy and has lots of marketing hype. It appears to work at first, so people rejoice and make bold claims about it solving every problem in the universe. It's only later that the poor design choice bites them in the butt and they go looking for the next trendy thing.
The biggest strength of using SQLite here is that it provides the benefits of a familiar SQL environment with the scaling benefits of Cassandra/DynamoDB.
DBs other than sqlite need their own dedicated servers, it says.
A good way to store online chats is db-per-chat, it says.
Sqlite scales great with [fusillade of very specific proprietary cloud services], it says.
Sqlite is great but companies like this seem determined to ruin it. MongoDB all over again.
To clarify — is your concern that the only scaling options I listed are proprietary services?
If so, I completely agree. This article was inspired by a tool we're building internally, based on the same architecture. We knew this was the right approach, but we refuse to rely on proprietary databases, so we built our own in-house.
We’re planning to open-source it soon.