2 points by ManfredMacx 2 hours ago | 1 comment

ManfredMacx 2 hours ago
Two architectural decisions worth explaining for anyone curious about the internals:
Arrow-backed storage: cell data lives in Apache Arrow arrays, organized as stripes (blocks of rows × columns). Range operations like SUMIFS, VLOOKUP, and XLOOKUP receive typed numeric slices (&[f64]) directly rather than iterating cell-by-cell. This is what makes criteria aggregates over large ranges fast: the inner loop runs over contiguous f64 values instead of unboxing one cell value at a time.
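To make the slice idea concrete, here is a rough sketch of a single-criterion, SUMIF-style kernel over typed slices. The function names, the "greater than" criterion, and the representation of a stripe as a pair of slices are all made up for illustration; this is not the engine's actual API.

```rust
/// Hypothetical SUMIF-style kernel (illustrative name, not the engine's API).
/// Sums `values[i]` for every row where `criteria[i] > threshold`.
pub fn sum_if_greater(criteria: &[f64], values: &[f64], threshold: f64) -> f64 {
    assert_eq!(criteria.len(), values.len(), "ranges must have equal length");
    criteria
        .iter()
        .zip(values)
        .filter(|(c, _)| **c > threshold) // test the criterion column
        .map(|(_, v)| *v)                 // keep the matching value
        .sum()
}

/// Assumed stripe layout: each stripe contributes one (criteria, values)
/// pair of slices, and per-stripe partial sums are added together.
pub fn sum_if_greater_striped(stripes: &[(&[f64], &[f64])], threshold: f64) -> f64 {
    stripes
        .iter()
        .map(|(c, v)| sum_if_greater(c, v, threshold))
        .sum()
}
```

The point of the shape is that the hot loop only ever sees plain `f64` values, so there is no per-cell type dispatch inside it.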
Incremental dependency graph: every formula registers its precedents at parse time. When a cell is edited, we walk the reverse edges (precedent to dependent) to build a dirty set and re-evaluate only the affected subgraph. For a model with many formulas, a single-cell edit typically touches a small fraction of them. Formal benchmarks are in progress across linear chains, fan-out/fan-in, SUMIFS-heavy, and spill-heavy workload shapes.
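For anyone who wants the dirty-set propagation spelled out, a minimal sketch follows. `CellId`, `DepGraph`, `register`, and `dirty_set` are hypothetical names chosen for this example, and the breadth-first walk is just one straightforward way to follow reverse edges; the real implementation may differ.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Hypothetical cell identifier; a real engine would likely encode
/// sheet, row, and column.
pub type CellId = u32;

#[derive(Default)]
pub struct DepGraph {
    /// Reverse edges: precedent cell -> formulas that read it.
    dependents: HashMap<CellId, Vec<CellId>>,
}

impl DepGraph {
    /// Called at parse time: record that `formula` reads each precedent.
    pub fn register(&mut self, formula: CellId, precedents: &[CellId]) {
        for &p in precedents {
            self.dependents.entry(p).or_default().push(formula);
        }
    }

    /// Breadth-first walk over reverse edges starting at the edited cell.
    /// Returns every formula that depends on it, directly or transitively.
    pub fn dirty_set(&self, edited: CellId) -> HashSet<CellId> {
        let mut dirty = HashSet::new();
        let mut queue = VecDeque::from([edited]);
        while let Some(cell) = queue.pop_front() {
            if let Some(deps) = self.dependents.get(&cell) {
                for &d in deps {
                    // `insert` returns false for cells already marked,
                    // so each formula is enqueued at most once.
                    if dirty.insert(d) {
                        queue.push_back(d);
                    }
                }
            }
        }
        dirty
    }
}
```

With a chain where cell 2 reads cell 1 and cell 3 reads cell 2, editing cell 1 dirties cells 2 and 3, while an unrelated formula elsewhere in the graph is left alone. This sketch only finds the affected set; an engine would still need to evaluate that set in dependency order.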