Interested here: for me it works for out-of-core workloads. Where is the limit? On a related note: do you need to handle concurrency restrictions?
"Performance Does DuckDB use SIMD? DuckDB does not use explicit SIMD (single instruction, multiple data) instructions because they greatly complicate portability and compilation. Instead, DuckDB uses implicit SIMD, where we go to great lengths to write our C++ code in such a way that the compiler can auto-generate SIMD instructions for the specific hardware. As an example why this is a good idea, it took 10 minutes to port DuckDB to the Apple Silicon architecture."
It needs manual tuning to avoid those errors, and I couldn't find the right incantation. Nor should I need to: memory management is the job of the db, not me. Far too flaky for any production usage.
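For what it's worth, the tuning knobs in question are DuckDB's documented settings; something along these lines is the usual starting point (the specific values and spill path here are just illustrative):

```sql
-- Cap DuckDB's memory use and give it somewhere to spill to disk
SET memory_limit = '4GB';
SET temp_directory = '/tmp/duckdb_spill';  -- hypothetical path
SET threads = 4;
```

Whether these settings actually prevent the OOMs for a given workload is exactly the point under dispute in this thread.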
Search the issues on the duckdb GitHub repo: there are at least 110 open and closed OOM (out-of-memory) issues, and maybe 400 to 500 that reference "memory".
Ah, missed this the first time around. Will check this out. And yes, I noticed that DuckDB rather aggressively tries to use the resources of your computer.
SQLite isn’t small and crashy, it’s small and reliable.
There’s something fundamentally wrong with the codebase/architecture if there are so many memory problems.
And the absolute baseline requirement for a production database is no crashes.
I basically use it to load CSV, JSONL, Parquet, etc. formats and do arbitrary transformations. Are people doing something else with it?
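Concretely, that workflow looks something like this (file names are made up; the reader functions are DuckDB's built-ins):

```sql
-- Ingest a pile of CSVs into a table in one shot
CREATE TABLE events AS
SELECT * FROM read_csv_auto('events_*.csv');

-- JSONL and Parquet work the same way, and you can transform inline
SELECT user_id, count(*) AS n
FROM read_json_auto('logs.jsonl')
GROUP BY user_id;

SELECT * FROM read_parquet('data/*.parquet');
```

The nice part is that the same SQL works whether the source is a file glob on disk or an already-loaded table.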
Some downsides: no unique constraints with indexes (you can accidentally shoot yourself in the foot with double ingestion), and writing is a bit cumbersome if you already have Parquet files.
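Since you can't lean on a unique index there, the usual workaround is to dedupe by hand at insert time; a sketch (table and column names are hypothetical):

```sql
-- Guard against double ingestion with an anti-join instead of a unique index
INSERT INTO events
SELECT src.*
FROM read_parquet('new_batch.parquet') AS src
WHERE NOT EXISTS (
    SELECT 1 FROM events e WHERE e.event_id = src.event_id
);

-- Writing back out to Parquet uses COPY
COPY (SELECT * FROM events) TO 'events_out.parquet' (FORMAT PARQUET);
```

It works, but it's exactly the kind of boilerplate a unique constraint would normally save you from.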