On STRICT mode: I've asked this elsewhere and never gotten an answer: does anyone have a loose-typing example application where SQLite's non-strict, different-type-allowed-in-each-row behavior has been a big benefit? I love the simplicity of SQLite's small number of column types, but the any-type-allowed-anywhere design always seemed a little strange.
This flexibility is convenient when only one application reads and writes to the table. But if multiple applications access the same tables, the lack of a strictly enforced schema becomes a liability. The same is true when using generic tools to process data in SQLite tables, because such tools don't know what type of data to expect. The column type may be X but the actual data may be of type Y.
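The column-type-X, data-type-Y mismatch is easy to demonstrate with Python's built-in sqlite3 module (table and column names here are made up for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (id INTEGER, payload TEXT)")

# Non-strict SQLite accepts a value of any type in any column: a string
# that can't be coerced to an integer is simply stored as text.
con.execute("INSERT INTO events VALUES (?, ?)", ("not-a-number", 42))

row = con.execute("SELECT typeof(id), typeof(payload) FROM events").fetchone()
print(row)  # ('text', 'text') -- the INTEGER column now holds text
```

A generic tool reading this table has no way to know the `id` column might contain text.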
I think the naive approach is to tokenize the input and append "*" for prefix matching. I'm not too experienced and this can probably be improved a lot. There are many settings like different tokenizers, stemming, etc. Additionally, a lot can be built on top like weighting, boosting exact matches, etc.
Does anyone know good resources for this to learn and draw inspiration from?
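The naive tokenize-and-append-"*" approach maps directly onto FTS5 prefix queries; here's a minimal sketch using Python's sqlite3 module (assumes your bundled SQLite was built with FTS5, which is the default in CPython's binaries; the sample documents are made up):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE VIRTUAL TABLE docs USING fts5(body)")
con.executemany(
    "INSERT INTO docs (body) VALUES (?)",
    [("sqlite full text search",), ("postgres search quality",)],
)

def prefix_query(user_input: str) -> str:
    # Naive approach: split on whitespace, quote each token so FTS5
    # query operators in user input are treated literally, and append
    # '*' for prefix matching.
    return " ".join(f'"{tok}"*' for tok in user_input.split())

rows = con.execute(
    "SELECT body FROM docs WHERE docs MATCH ?",
    (prefix_query("sqli sear"),),
).fetchall()
print(rows)  # only the row where every prefix matches
```

Ranking with the built-in `bm25()` function and custom tokenizers would be the next layers to experiment with.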
Manning's "Relevant Search" isn't the worst here, but you'll be porting/writing a fair bit yourself.
Is there a reason why something more custom built, like ParadeDB Community edition won't meet your needs?
I understand you're speaking about SQLite, while ParadeDB is PostgreSQL but as you know, it's non-trivial to get good search quality, so I'm trying to understand your situation and needs.
I don't think I've ever done much with SQLite's JSON functions, but I have on one or two occasions used a constraint to enforce a TEXT column contains valid JSON, which would have been very tedious to do otherwise.
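That constraint is a one-liner with the built-in json_valid() function (core since SQLite 3.38, commonly compiled in as the JSON1 extension before that); a sketch via Python's sqlite3 module, with illustrative names:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# json_valid() rejects anything that isn't well-formed JSON, so the
# CHECK constraint enforces validity on every insert and update.
con.execute("CREATE TABLE docs (body TEXT CHECK (json_valid(body)))")

con.execute("INSERT INTO docs VALUES (?)", ('{"ok": true}',))  # accepted
rejected = False
try:
    con.execute("INSERT INTO docs VALUES (?)", ("not json",))
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True -- the malformed row never lands in the table
```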
I very, very much did. I was using a Python package that used a lot of NumPy internally, and sometimes its return values would be Python integers, and sometimes they'd be NumPy integers.
The Python integers would get written to SQLite as SQLite integers. The NumPy integers would get written to SQLite as binary blobs, which prevents you from doing even simple things like comparing values for equality.
Setting to STRICT caused an error whenever my code tried to insert a binary blob into an integer column, so I knew where in the code I needed to explicitly convert the values to Python integers when necessary.
Curious if others here are running SQLite in production and if you would be interested in something like this.
They recently landed multi-writer support for their Rust SQLite re-implementation; the lack of multi-writer support has personally been the biggest issue I've had using SQLite for high-concurrency applications.
`PRAGMA journal_mode = 'mvcc';`
https://docs.turso.tech/tursodb/concurrent-writes
Very excited to see if SQLite responds by adding native support, I'm hoping competition here will spur improvements on both sides.
Granted, it was a small DB (a few megabytes). I also wanted to avoid collecting changes one by one; I simply wanted a diff since the last time.
And ON CONFLICT, which can help dedupe, among other things, in a simple and performant way:

insert into item (key, description)
values ('thing', 'a thing')
on conflict do nothing;

insert into user_note (uid, key, note)
values (123, 'thing', 'I like this thing')
on conflict (uid, key) do update set note = excluded.note;

I did this recently for a fork of the main MiSTer executable because of a few disagreements with how Sorg runs the project. It was actually pretty easy to change out the save-file features to use SQLite, and now things are a little more resistant to crashes and sudden power loss than you'd get with the terrible raw-dogged writing it was doing before.
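That upsert pattern can be exercised from Python's built-in sqlite3 module; a minimal sketch (upserts need SQLite 3.24+, and the table and helper names are illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    """CREATE TABLE user_note (
           uid  INTEGER,
           key  TEXT,
           note TEXT,
           PRIMARY KEY (uid, key)
       )"""
)

def save_note(uid: int, key: str, note: str) -> None:
    # Insert, or on a (uid, key) collision overwrite the existing note.
    con.execute(
        """INSERT INTO user_note (uid, key, note) VALUES (?, ?, ?)
           ON CONFLICT (uid, key) DO UPDATE SET note = excluded.note""",
        (uid, key, note),
    )

save_note(123, "thing", "I like this thing")
save_note(123, "thing", "changed my mind")  # dedupes instead of erroring
notes = con.execute("SELECT note FROM user_note").fetchall()
print(notes)  # [('changed my mind',)]
```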
It's fast, well documented, and easy to use, and yeah it has a lot more features than people realize.
I did not know SQLite allows writing data that does not match the column type. Yuck. Now I need to review anything I built and fix it.
I understand why they wouldn’t, but STRICT should be the default.
Why is it a problem that it allows data that does not match the column type? SQLite is intended for embedded databases, where only your application reads and writes from the tables. In this scenario, as long as you write data that matches the column's data type, data in the table does match the column type.
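In the well-behaved single-writer case, column affinity even does some coercion for you; a small sketch with Python's sqlite3 module (names made up):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE m (n INTEGER)")

# Column affinity coerces values that look like the declared type:
# the text '42' is stored as the integer 42.
con.execute("INSERT INTO m VALUES ('42')")
row = con.execute("SELECT n, typeof(n) FROM m").fetchone()
print(row)  # (42, 'integer')
```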
Does anyone have any other favorite modern bloom-filter-based search solutions that don't need to store copies of all the documents in the search DB? Ideally something that can run in WASM too, so we can ship a tiny search index to the browser. I found https://github.com/tinysearch/tinysearch but haven't tried it yet.
However, I will concede that the article doesn't mention at all, and far fewer people are aware, that you can build HA, cross-region replicated SQLite using purely OSS software, provided you architect your software around it. Now that would be a really good `Modern SQLite: Features You Didn't Know It Had` article!
Another interesting discussion point is how far self-hosted PostgreSQL and pgBackRest can get you toward a near-zero-data-loss setup (low RPO and RTO). It's simply amazing we can self-host all this.
All of the listed features except for strict tables and generated columns have been in SQLite for 10+ years, and those two are certainly not new. The JSON APIs were not made part of the standard distribution until 3.38 (2022-02) but were added in 3.9 (2015-10) and widely used long before they were upgraded from an optional extension to a core feature.
- Generated columns: 3.31 (2020-01)
- Strict tables: 3.37 (2021-11)
Correct.
As I mentioned in my GP, these features have been around a long, long time, and in the current age of AI, which would happily tell you these features exist if you even remotely hint at them, I would assume one would really have to go out of their way to be ignorant of them.
It doesn't hurt to have an article like this reiterate all that information though, I just would have loved the same level of effort put into something that's not as easily available yet.
A gap exists that the article doesn't mention at all: actually configuring solutions like litestream, rqlite, and dqlite (not the same thing!) to build HA, cross-region replicated SQLite using purely OSS software, provided you architect your software around it. Now that would be a really good "Modern SQLite: Features You Didn't Know It Had" article!
Another interesting discussion point is how far self-hosted PostgreSQL and pgBackRest can get you toward a near-zero-data-loss setup (low RPO and RTO). It just blows my mind that we can self-host all this on consumer-grade hardware without having to touch a DC or "the cloud" at all.
SELECT payload->>'$.user.id' AS user_id FROM events ...
This works a little more consistently than json_extract(): the -> operator always returns a JSON representation (as text), while the ->> operator (used here) always returns a SQL scalar (text, number, or null). More details: https://sqlite.org/json1.html#jptr

I just got hit badly by DreamHost, which is still running MySQL 8.0. That version has a "JSON extension" that is almost totally useless. I designed something before discovering how different MySQL 8.4 and MySQL 8.0 are.