2 points by ManfredMacx 2 hours ago | 1 comment
  • ManfredMacx 2 hours ago
    Two architectural decisions worth explaining for anyone curious about the internals:

    Arrow-backed storage: cell data lives in Apache Arrow arrays, organized into stripes (blocks of rows × columns). Range operations such as SUMIFS, VLOOKUP, and XLOOKUP receive typed numeric slices (&[f64]) directly instead of iterating cell by cell. That is what makes criteria aggregates over large ranges fast: they scan contiguous typed data rather than looping over boxed values.
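    For anyone who wants a concrete picture of "aggregates over typed slices", here is a minimal sketch of a SUMIFS-style kernel that works purely on &[f64]. It is not our actual implementation: the `sumifs` function and `Criterion` enum are hypothetical, it assumes the columns have already been exposed as plain f64 slices (for example from an Arrow Float64 value buffer), it ignores nulls, and it supports only numeric comparisons, whereas real SUMIFS criteria also cover text and wildcards.

```rust
/// Hypothetical numeric criterion; real SUMIFS criteria (">10", "a*", text
/// matches) are richer than this.
#[derive(Clone, Copy)]
enum Criterion {
    Gt(f64),
    Lt(f64),
    Eq(f64),
}

impl Criterion {
    fn matches(self, x: f64) -> bool {
        match self {
            Criterion::Gt(t) => x > t,
            Criterion::Lt(t) => x < t,
            Criterion::Eq(t) => x == t,
        }
    }
}

/// Sum `sum_range[i]` for every row i where every (criteria_range, criterion)
/// pair matches. Operates directly on typed slices: no per-cell boxing.
fn sumifs(sum_range: &[f64], criteria: &[(&[f64], Criterion)]) -> f64 {
    // All ranges must be the same shape, as in the spreadsheet function.
    assert!(criteria.iter().all(|(r, _)| r.len() == sum_range.len()));
    sum_range
        .iter()
        .enumerate()
        .filter(|&(i, _)| criteria.iter().all(|(r, c)| c.matches(r[i])))
        .map(|(_, v)| v)
        .sum()
}

fn main() {
    let amounts = [10.0, 20.0, 30.0, 40.0];
    let region = [1.0, 2.0, 1.0, 1.0];
    let qty = [5.0, 1.0, 7.0, 2.0];
    // Equivalent in spirit to =SUMIFS(amounts, region, 1, qty, ">3").
    let criteria: [(&[f64], Criterion); 2] =
        [(&region[..], Criterion::Eq(1.0)), (&qty[..], Criterion::Gt(3.0))];
    println!("{}", sumifs(&amounts, &criteria)); // prints 40 (rows 0 and 2)
}
```

    Because each criterion column is a contiguous slice, the inner loop is a simple indexed scan that the compiler can optimize well; a cell-by-cell loop over boxed values would instead pay for a type check and a pointer chase per cell.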

    Incremental dependency graph: every formula registers its precedents at parse time. On an edit, we propagate a dirty set along the reverse (dependent) edges and re-evaluate only the affected subgraph. In a model with many formulas, a single-cell edit typically touches only a small fraction of them. Formal benchmarks are in progress across four workload shapes: linear chains, fan-out/fan-in, SUMIFS-heavy, and spill-heavy.
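    The dirty-propagation idea can be sketched in a few dozen lines. This is an illustrative model, not our engine's code: `DepGraph`, `CellId`, and the method names are hypothetical, cells are plain integer ids, re-registering a formula does not clean up stale edges, and cycles are not detected. It collects the dirty set with a BFS over reverse edges, then orders just that subgraph with Kahn's algorithm so each formula is evaluated after its dirty precedents.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

type CellId = u32; // hypothetical: a flat id per cell

#[derive(Default)]
struct DepGraph {
    precedents: HashMap<CellId, Vec<CellId>>, // formula -> cells it reads
    dependents: HashMap<CellId, Vec<CellId>>, // reverse edges: cell -> formulas reading it
}

impl DepGraph {
    /// Called once per formula at parse time with the cells it references.
    fn register(&mut self, formula: CellId, refs: &[CellId]) {
        self.precedents.insert(formula, refs.to_vec());
        for &p in refs {
            self.dependents.entry(p).or_default().push(formula);
        }
    }

    /// BFS along reverse edges: every formula that transitively reads `edited`.
    fn dirty_set(&self, edited: CellId) -> HashSet<CellId> {
        let mut dirty = HashSet::new();
        let mut queue = VecDeque::from([edited]);
        while let Some(cell) = queue.pop_front() {
            for &d in self.dependents.get(&cell).into_iter().flatten() {
                if dirty.insert(d) {
                    queue.push_back(d);
                }
            }
        }
        dirty
    }

    /// Kahn's algorithm restricted to the dirty subgraph: in-degrees count only
    /// precedents that are themselves dirty. Clean formulas are never visited.
    fn eval_order(&self, edited: CellId) -> Vec<CellId> {
        let dirty = self.dirty_set(edited);
        let mut indeg: HashMap<CellId, usize> = dirty
            .iter()
            .map(|&c| {
                let n = self.precedents[&c].iter().filter(|&&p| dirty.contains(&p)).count();
                (c, n)
            })
            .collect();
        let mut ready: VecDeque<CellId> =
            indeg.iter().filter(|&(_, &n)| n == 0).map(|(&c, _)| c).collect();
        let mut order = Vec::with_capacity(dirty.len());
        while let Some(c) = ready.pop_front() {
            order.push(c);
            for &d in self.dependents.get(&c).into_iter().flatten() {
                if let Some(n) = indeg.get_mut(&d) {
                    *n -= 1;
                    if *n == 0 {
                        ready.push_back(d);
                    }
                }
            }
        }
        order
    }
}

fn main() {
    let mut g = DepGraph::default();
    // Hypothetical ids: 1 = A1, 2 = B1 (=A1*2), 3 = C1 (=B1+A1), 4 = D1 (=E1), 5 = E1.
    g.register(2, &[1]);
    g.register(3, &[2, 1]);
    g.register(4, &[5]);
    // Editing A1 re-evaluates B1 then C1; D1 is never touched.
    println!("{:?}", g.eval_order(1)); // prints [2, 3]
}
```

    The key property is that work is proportional to the size of the dirty subgraph, not the whole workbook, which is why a single-cell edit in a large model usually stays cheap.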