It would be interesting if the performance analysis in the readme also showed the false positive rate, with the data structures you're comparing held to identical memory use.
(You may have addressed this in your thesis; if so, feel free to tell me to go RTFD ;)
That said, with such massive datasets you'll already be bottlenecked by the necessary communication between GPUs (sadly, even with NVLink), since the queried data always lives on the GPU.
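
To make the equal-memory comparison concrete, here's a rough sketch of the kind of numbers I mean. I'm assuming a standard Bloom filter and a cuckoo filter purely for illustration (I don't know whether those match the structures in your readme). The function names, the bucket size of 4, and the 95% load factor are my own placeholders, and the formulas are the usual textbook approximations, not measurements.

```python
import math


def bloom_fpr(bits_per_key, num_hashes=None):
    """Approximate false positive rate of a standard Bloom filter.

    Uses the textbook estimate (1 - e^(-k / bits_per_key))^k.
    """
    if num_hashes is None:
        # Near-optimal number of hash functions for this budget (at least 1).
        num_hashes = max(1, round(bits_per_key * math.log(2)))
    return (1.0 - math.exp(-num_hashes / bits_per_key)) ** num_hashes


def cuckoo_fpr(bits_per_key, bucket_size=4, load_factor=0.95):
    """Approximate upper bound on the false positive rate of a cuckoo filter.

    Assumption: the memory budget is spent on fingerprints only, so the
    fingerprint length is floor(bits_per_key * load_factor) bits.
    """
    fingerprint_bits = math.floor(bits_per_key * load_factor)
    # A lookup compares against up to 2 * bucket_size stored fingerprints.
    return min(1.0, 2 * bucket_size / 2 ** fingerprint_bits)


if __name__ == "__main__":
    # Same memory budget for both structures, in bits per stored key.
    for budget in (8, 12, 16):
        print(f"{budget:>2} bits/key  "
              f"bloom={bloom_fpr(budget):.2e}  "
              f"cuckoo={cuckoo_fpr(budget):.2e}")
```

A table like this next to the throughput numbers, ideally with measured rather than theoretical rates, would show whether the faster structure is paying for its speed in accuracy.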