Does this need depth data capture as well? The phrase “casual captures” makes it sound like it only needs images, but apparently they are using depth data too.
I think it does use depth data, judging from the parameters in the docs: python infer_shape.py --input_pkl <sample.pkl> (the depth input could possibly be generated using software like MapAnything). I also believe it is CUDA only.
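One way to check this directly would be to open the sample .pkl and look at what it contains. The snippet below is only a rough sketch: it assumes the file unpickles to a dict, and the helper names (summarize_pickle, depth_like_keys) and the "depth" substring match are my own guesses, not anything from the project's docs.

```python
import pickle
from pathlib import Path


def summarize_pickle(path):
    """Return {key: description} for the top-level entries of a pickled dict."""
    with Path(path).open("rb") as f:
        # Only unpickle files you trust: pickle can execute arbitrary code.
        data = pickle.load(f)
    if not isinstance(data, dict):
        # Not a dict: just report the type of the root object.
        return {"<root>": type(data).__name__}
    summary = {}
    for key, value in data.items():
        desc = type(value).__name__
        shape = getattr(value, "shape", None)  # arrays/tensors usually expose .shape
        if shape is not None:
            desc += f" shape={tuple(shape)}"
        summary[str(key)] = desc
    return summary


def depth_like_keys(summary):
    # Heuristic only: the real key names in the sample file are unknown to me.
    return [k for k in summary if "depth" in k.lower()]
```

Calling summarize_pickle("sample.pkl") and then depth_like_keys(...) on the result should show whether anything depth-like is stored. If the file holds NumPy arrays or PyTorch tensors, those packages need to be installed for the load to succeed.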