3 points by loganboyd 2 hours ago | 1 comment

loganboyd 2 hours ago
I made i, a tensor computation language with a simple explicit scheduling model. The only scheduling concepts in i are loop splits, loop ordering, and input producer staging (where one intermediate value is computed inside the loop nest of another).
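To make the three concepts concrete, here is a rough sketch in plain Python. It is not i syntax and not what the compiler emits; the function names, the `tile` parameter, and the squared-row-sum example are all made up for illustration. It splits the `j` loop, moves the outer half outermost, and stages a producer (the squares) inside the consumer's loop nest so it only needs a tile-sized buffer.

```python
def row_sums_reference(m):
    # Unscheduled form: the producer materializes every square first,
    # then the consumer reduces each row.
    squares = [[v * v for v in row] for row in m]
    return [sum(row) for row in squares]


def row_sums_scheduled(m, tile=2):
    # Hypothetical hand-scheduled form of the same computation.
    rows, cols = len(m), len(m[0])
    out = [0] * rows
    # Loop split: j becomes (jo, ji). Loop ordering: jo is placed outermost.
    for jo in range(0, cols, tile):
        for i in range(rows):
            # Producer staging: the squares are computed inside the
            # consumer's loop nest, so the buffer holds one tile only.
            buf = [m[i][j] * m[i][j] for j in range(jo, min(jo + tile, cols))]
            for v in buf:  # ji loop of the consumer
                out[i] += v
    return out
```

Both functions return the same row sums; only the loop structure and the size of the intermediate buffer differ.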
Those three concepts are enough to write a numerically stable, online, blockwise FlashAttention. Loop tiling, loop fusion, storage folding, and (critically) online reduction rewriting fall out as _predictable_ consequences of the lowering.
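For anyone unfamiliar with the online reduction rewrite, here is a minimal plain-Python sketch of the standard online-softmax trick for a single query, with scalar values to keep it short. It is an illustration of the general technique, not i code and not the output of the lowering; the function names and the `block` parameter are assumptions.

```python
import math


def attention_reference(scores, values):
    # Two-pass softmax-weighted sum: needs the global max before any exp.
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def attention_online(scores, values, block=2):
    # Single pass over blocks, carrying a running max, a running
    # normalizer, and a running weighted sum.
    m = -math.inf  # running max of the scores seen so far
    z = 0.0        # running sum of exp(score - m)
    acc = 0.0      # running sum of exp(score - m) * value
    for start in range(0, len(scores), block):
        s_blk = scores[start:start + block]
        v_blk = values[start:start + block]
        m_new = max(m, max(s_blk))
        # Rescale the old partial results to the new max.
        # On the first block this factor is exp(-inf) == 0.0.
        scale = math.exp(m - m_new)
        w_blk = [math.exp(s - m_new) for s in s_blk]
        z = z * scale + sum(w_blk)
        acc = acc * scale + sum(w * v for w, v in zip(w_blk, v_blk))
        m = m_new
    return acc / z
```

The two functions agree up to floating-point rounding, including for large scores where a naive `exp(score)` would overflow.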
The hope is that this scheduling model makes it easier for people and search algorithms to find performant schedules.
Current status: working proof-of-concept. Compiler/runtime in Rust, C backend, small Python front-end, zero dependencies. Not fast (yet).