However, everything you do sounds very interesting, useful, and well thought out. Please keep doing it; I'd encourage others to work in the same direction too.
I hope more of us can find the time for more than best wishes in the near future.
HybridAttention O(n·W + n·D): 0.35s / 286.6 tok/s
So I needed to make fundamental architecture changes and do some KV cache tricks.
Then I had to prove with benchmarks that the new architecture was faster and that its perplexity was still acceptable.
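The two numbers being compared (tokens/sec and perplexity) can be measured with a very small harness. A sketch under assumptions: `generate_step` is a hypothetical callable standing in for one decode step of the model, returning the log-probability assigned to the reference token.

```python
import time
import math

def benchmark(generate_step, n_tokens):
    """Time a decode loop and report (tokens/sec, perplexity).

    `generate_step` is a hypothetical stand-in for one model decode
    step; it returns the log-probability of the reference token, so
    perplexity is exp of the mean negative log-likelihood.
    """
    logps = []
    start = time.perf_counter()
    for _ in range(n_tokens):
        logps.append(generate_step())
    elapsed = time.perf_counter() - start
    tok_per_s = n_tokens / elapsed
    ppl = math.exp(-sum(logps) / len(logps))
    return tok_per_s, ppl

# usage with a dummy step that assigns log p = -2.0 to every token
tps, ppl = benchmark(lambda: -2.0, 1000)
```

Running both the baseline and the new architecture through the same harness on identical prompts is what makes the "0.35s / 286.6 tok/s" style comparison above fair.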