What's novel here is the extremely small KV cache memory usage per long context windows, like 0.77GB with 512K, a 90% memory usage reduction compare to the already really small KV cache memory usage of Deepseek V4 Flash.