There have also been proposals to use flash memory instead of DRAM in inference accelerators. You can build high-bandwidth flash with the same die-stacking technique used for HBM DRAM.
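Flash is clearly unsuitable for training because of its limited write endurance: training rewrites the weights on every step, and flash cells wear out after a limited number of program/erase cycles. Inference only reads the weights, so that matters much less there. The read bandwidth is decent, and the capacity per dollar is much better than DRAM's.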
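
To put a rough number on the write-cycle problem, here is a back-of-the-envelope sketch. Everything in it is an assumption for illustration: the function names, the idea that every optimizer step rewrites all weights once, perfect wear leveling, and the endurance figure in the usage note. None of it comes from an actual product spec.

```python
def steps_until_wearout(pe_cycles: float,
                        rewrites_per_step: float = 1.0,
                        fill_fraction: float = 1.0) -> float:
    """Training steps before the flash reaches its program/erase limit.

    Assumptions (illustrative, not from any datasheet):
      - pe_cycles: rated program/erase cycles per cell
      - rewrites_per_step: how many times the weights are rewritten per step
      - fill_fraction: share of the flash capacity occupied by the weights;
        with perfect wear leveling, a half-full device lasts twice as long
    """
    return pe_cycles / (rewrites_per_step * fill_fraction)


def lifetime_hours(pe_cycles: float,
                   steps_per_second: float,
                   rewrites_per_step: float = 1.0,
                   fill_fraction: float = 1.0) -> float:
    """Wall-clock hours of training before wear-out, under the same assumptions."""
    steps = steps_until_wearout(pe_cycles, rewrites_per_step, fill_fraction)
    return steps / steps_per_second / 3600.0
```

With a hypothetical endurance of 3,000 cycles and one step per second, `lifetime_hours(3000, 1.0)` comes out under an hour. Inference never rewrites the weights after loading them, so the same limit is basically irrelevant there.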