The matrix version is just an implementation detail to store the tree in a less tree-like shape so you don't need as many pointers.
Popcount works great in this context, but it only gives you a constant-factor speedup: you still scan every word. Doing rank/select in O(1) instead of O(N) is a bigger win, and you get that by precomputing superblocks.
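To make the difference concrete, here is a rough sketch of rank with precomputed counts. It is simplified to one level (a running count per 64-bit word), whereas real implementations usually use two levels (superblocks plus smaller blocks) to save space. The names `RankBitVector`, `rank1`, and `rank1_scan` are made up for illustration and don't come from any particular library.

```python
WORD = 64  # bits per word (an illustrative choice)


class RankBitVector:
    """Bit vector with precomputed prefix counts for constant-time rank."""

    def __init__(self, bits):
        self.n = len(bits)
        # Pack the bits into 64-bit words, least significant bit first.
        self.words = []
        for start in range(0, self.n, WORD):
            w = 0
            for offset, b in enumerate(bits[start:start + WORD]):
                w |= (b & 1) << offset
            self.words.append(w)
        # prefix[k] = number of 1 bits in all words before word k.
        self.prefix = [0]
        for w in self.words:
            self.prefix.append(self.prefix[-1] + bin(w).count("1"))

    def rank1(self, i):
        """Count 1 bits in positions [0, i): one lookup plus one popcount."""
        word, offset = divmod(i, WORD)
        if offset == 0:
            return self.prefix[word]
        mask = (1 << offset) - 1
        return self.prefix[word] + bin(self.words[word] & mask).count("1")

    def rank1_scan(self, i):
        """Same answer using popcount only: faster than bit-by-bit, still O(N)."""
        word, offset = divmod(i, WORD)
        total = sum(bin(w).count("1") for w in self.words[:word])
        if offset:
            total += bin(self.words[word] & ((1 << offset) - 1)).count("1")
        return total
```

Both methods return the same count. `rank1_scan` touches every word before position `i`, while `rank1` does a fixed amount of work regardless of `i`, at the cost of the extra `prefix` table.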
> Or are they used with 4x4 Matrix operators? Are wavelets good for that kind of math?
Nope, different kind of matrix. It just refers to a nicer packing of a wavelet tree, without the space wasted on bookkeeping pointers between tree nodes.
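For anyone curious what that packing looks like, here is a minimal sketch, assuming non-negative integers that fit in `bit_width` bits. Each level is one flat bit sequence over the whole input (most significant bit first), and the values are stably partitioned into zeros-then-ones between levels, so no node pointers are needed. To keep it short, plain prefix-count lists stand in for the rank-enabled bit vectors a real implementation would use, and the names `WaveletMatrix`, `access`, and `rank` are illustrative only.

```python
class WaveletMatrix:
    """Pointer-free wavelet tree layout: one flat bit sequence per level."""

    def __init__(self, values, bit_width):
        self.n = len(values)
        self.bit_width = bit_width
        self.levels = []  # per level: prefix[i] = number of 1 bits in [0, i)
        self.zeros = []   # per level: total number of 0 bits
        current = list(values)
        for level in range(bit_width - 1, -1, -1):  # most significant bit first
            bits = [(v >> level) & 1 for v in current]
            prefix = [0]
            for b in bits:
                prefix.append(prefix[-1] + b)
            self.levels.append(prefix)
            self.zeros.append(self.n - prefix[-1])
            # Stable partition: values with a 0 bit first, then those with a 1 bit.
            current = ([v for v, b in zip(current, bits) if b == 0]
                       + [v for v, b in zip(current, bits) if b == 1])

    def access(self, i):
        """Recover the original value at position i."""
        value = 0
        for prefix, z in zip(self.levels, self.zeros):
            bit = prefix[i + 1] - prefix[i]
            # Follow the element to its position in the next level.
            i = z + prefix[i] if bit else i - prefix[i]
            value = (value << 1) | bit
        return value

    def rank(self, symbol, i):
        """Count occurrences of symbol in the original positions [0, i)."""
        lo, hi = 0, i
        for depth, (prefix, z) in enumerate(zip(self.levels, self.zeros)):
            bit = (symbol >> (self.bit_width - 1 - depth)) & 1
            if bit:
                lo, hi = z + prefix[lo], z + prefix[hi]
            else:
                lo, hi = lo - prefix[lo], hi - prefix[hi]
        return hi - lo
```

Every query walks `bit_width` levels and does a couple of rank lookups per level, which is why fast rank on the underlying bit vectors matters so much here.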
No.
Many people don't know what you would use wavelets for or where they really shine. I for example know wavelets are used in image compression algorithms but that's about it. I am curious where else this could be applied.