We started an Ollama container on a MacBook. There's no NVIDIA GPU, no CUDA toolkit, and macOS doesn't even have CUDA drivers. Ollama found an NVIDIA GPU anyway: a 128 GB Blackwell GPU on a DGX Spark across the network.
Our approach enables this by intercepting CUDA calls and forwarding them to a remote server. It takes one command, requires no code changes, and the application has no idea the GPU isn't local.
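To make the intercept-and-forward idea concrete, here is a toy sketch in Python. It is not our implementation and it does not touch CUDA at all: the call names (`get_device_count`, `get_device_name`), the JSON wire format, and the in-process `FakeRemoteGpuServer` are all hypothetical stand-ins chosen for illustration. The only point is the shape of the pattern: the application calls what looks like a local library, and a stub serializes each call, ships it to the machine that owns the GPU, and returns the reply.

```python
import json


class FakeRemoteGpuServer:
    """Hypothetical stand-in for the remote machine that owns the GPU."""

    def __init__(self):
        # Assumed call names for illustration only; these are not real CUDA entry points.
        self._handlers = {
            "get_device_count": lambda: 1,
            "get_device_name": lambda index: "remote-gpu-%d" % index,
        }

    def handle(self, request_bytes):
        # Decode the request, run the named call, and encode the result.
        request = json.loads(request_bytes.decode("utf-8"))
        result = self._handlers[request["call"]](*request["args"])
        return json.dumps({"result": result}).encode("utf-8")


class ForwardingStub:
    """Looks like a local library, but forwards every call over a transport."""

    def __init__(self, transport):
        # transport: any callable that takes request bytes and returns response bytes.
        # A real system would use a network connection here.
        self._transport = transport

    def __getattr__(self, name):
        # Invoked only for attributes that are not defined on the stub itself,
        # so every unknown method name becomes a forwarded call.
        def forward(*args):
            request = json.dumps({"call": name, "args": list(args)}).encode("utf-8")
            response = self._transport(request)
            return json.loads(response.decode("utf-8"))["result"]

        return forward


server = FakeRemoteGpuServer()
gpu = ForwardingStub(server.handle)

print(gpu.get_device_count())   # 1
print(gpu.get_device_name(0))   # remote-gpu-0
```

The calling code never sees the transport, which is why an application written against the local interface needs no changes. A real system also has to deal with things this sketch ignores, such as device memory, asynchronous execution, and network latency.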