I would like to add some anecdata to this.
When I was a PhD student, I already had 12 years of using and administering Linuxes as my personal OS, and I'd already had my share of package manager and dependency woes.
But managing Python, PyTorch, and CUDA dependencies was relatively new to me. Sometimes I'd lose an evening here or there to something silly. But I had one week especially dominated by these woes, to the point where I'd have dreams about package management problems at the terminal.
They were mundane dreams but I'd chalk them up as nightmares. The worst was having the pleasant dream where those problems went away forever, only to wake up to realize that was not the case.
> When I was a PhD student, I already had 12 years of using and administering Linuxes as my personal OS, and I'd already had my share of package manager and dependency woes.
I'm in a very similar boat (just defended a few months ago). More than once I had installed PyTorch into a new environment and subsequently spent hours trying to figure out why things suddenly weren't working. Turns out, PyTorch had just uploaded a bad wheel.
Weirdly I feel like CUDA has become easier yet Python has become worse. It's all package management. Honestly, I find myself wanting to use package managers less and less because of Python. Of course `pip install` doesn't work, and that is probably a good thing. But the result of this is that any time you install a package it adds the module as a system module, which I thought was the whole thing we were trying to avoid. So what? Do I edit every package build now so that it runs a uv venv? If I do that, then this seems to just get more complicated as I have to keep better track of my environments. I'd rather be dealing with environment modules than that. I'd rather things be wrapped up in a systemd service or nspawn than that!
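To make the complaint concrete, the per-project dance looks roughly like this (a sketch assuming uv; `torch` here is just a stand-in for whatever the package actually needs):

  # create an isolated environment inside the project directory
  uv venv .venv
  # activate it for this shell session (Linux/macOS)
  source .venv/bin/activate
  # install into the venv instead of the system site-packages
  uv pip install torch

Multiply that by every project, plus remembering which environment goes with which checkout, and "simple" stops feeling like the right word.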
I mean I just did an update and upgrade and I had 13 Python packages and 193 Haskell modules, out of 351 packages! This shit is getting insane.
People keep telling me to keep things simple, but I don't think any of this is simple. It really looks like a lot of complexity created by a lot of things being simplified. I mean, isn't every big problem created out of a bunch of little problems? That's how we solve big problems -- break them down into small problems -- right? Did we forget the little things matter? If you don't think they do, did you question whether this comment was written by an LLM because I used a fucking em dash? Seems like you latched onto something small. It's hard to know when the little things matter and when they don't; often we just don't realize the little things are part of the big things.
But the point is more that, for me, this is a somewhat rare instance where I think using the term "nightmare" in the title is justified.
> Public index servers SHOULD NOT allow the use of direct references in uploaded distributions. Direct references are intended as a tool for software integrators rather than publishers.
This means that PyPI will not accept your project metadata as you currently have it configured. See https://github.com/pypi/warehouse/issues/7136 for more details.
cpu = [
"torch @ <https://download.pytorch.org/whl/cpu/torch-2.7.1%2Bcpu-cp312-cp312-manylinux_2_28_x86_64.whl> ; python_version == '3.12'",
"torch @ <https://download.pytorch.org/whl/cpu/torch-2.7.1%2Bcpu-cp313-cp313-manylinux_2_28_x86_64.whl> ; python_version == '3.13'",
]
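For what it's worth, the usual way around that restriction is to keep plain version requirements in the metadata and point the installer at PyTorch's own index instead, something like (a sketch of the standard PyTorch install command, not necessarily what the author settled on):

  # pull the CPU-only build from PyTorch's package index rather than PyPI
  pip install torch --index-url https://download.pytorch.org/whl/cpu

The catch is that this pushes the index choice onto whoever runs `pip install`, which is exactly the kind of thing the extras in the snippet above were trying to hide.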
:-/ It reminds me of Microsoft calling their thing "cross platform" because it works on several copies of Windows.
In all seriousness, I get the impression that pytorch is such a monster PITA to manage because it cares so much about the target hardware. It'd be like a blog post saying "I solved the assembly language nightmare".
If you do not care about performance and would rather have portability, use an alternative like tinygrad that does not optimize for every accelerator under the sun.
This need for hardware-specific optimization is also why the assembly language analogy is a little imprecise. Nobody expects one binary to run on every CPU or GPU with peak efficiency, unless you are talking about something like Redbean which gets surprisingly far (the creator actually worked on the TensorFlow team and addressed similar cross-platform problems).
So maybe the blog post you're looking for is https://justine.lol/redbean2/.
Or, looked at a different way, Torch has to work this way because Python packaging has too narrow an understanding of platforms, one that treats many materially different platforms as the same platform.
That is: there's nothing stopping the author from building on the approach he shares to also include Windows/FreeBSD/NetBSD/whatever.
It's his project (FileChat), and I would guess he uses Linux. It's natural that he'd solve this problem for the platforms he uses, and for which wheels are readily available.
So, you're doubling down on OP's misnomer of "cross platform means whatever platforms I use", eh?
You should be specific about which distributions you have in mind.
In the comparative table, they claim that conda doesn't support:
* lock file: which is false, you can freeze your environment (see the sketch after this list)
* task runner: I don't need my package manager to be a task runner
* project management: You can do 1 env per project? I don't see the problem here...
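To spell out the lock-file point, the freezing I have in mind is just this (a rough sketch; both commands are stock conda, though exact reproducibility depends on the channels staying put):

  # full environment spec, including pip-installed packages
  conda env export > environment.yml
  # exact package URLs; recreate later with `conda create --name myenv --file spec-file.txt`
  conda list --explicit > spec-file.txt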
So no, please, just use conda/mamba and conda-forge.
`pip install torchruntime`
`torchruntime install torch`
It figures out the correct torch to install on the user's PC, factoring in the OS (Win, Linux, Mac), the GPU vendor (NVIDIA, AMD, Intel) and the GPU model (especially for ROCm, whose configuration varies per generation and ROCm version).
And it tries to support quite a number of older GPUs as well, which are pinned to older versions of torch.
It's used by a few cross-platform torch-based consumer apps, running on quite a number of consumer installations.
This ends up wasting space and slowing down installation :(
Speaking of PyTorch and CUDA, I wish the Vulkan backend would become stable, but that seems like a far-off dream...
https://docs.pytorch.org/executorch/stable/backends-vulkan.h...
It doesn't solve how you package your wheels specifically; that problem is still pushed onto your downstream users because of boneheaded packaging decisions by PyTorch themselves. But as the consumer, Pixi softens the blow. The conda-forge builds of PyTorch are also a bit more sane.
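In case it helps, the Pixi flow I mean is roughly this (a sketch from memory, so treat the exact commands and defaults as assumptions; `my-project` is just a placeholder):

  # create a project with a pixi.toml manifest
  pixi init my-project
  cd my-project
  # add PyTorch from the configured conda channels (conda-forge by default); this resolves and writes the lock file
  pixi add pytorch

The per-project lock file is what softens the blow: downstream users get the same resolved environment without having to care which wheel index PyTorch happens to publish to.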
That's why people will go to stupid lengths to convert models from PyTorch / TensorFlow with onnxtools / coremltools to avoid touching the model / weights themselves.
The only one that escaped this is llama.cpp; weirdly, despite the difficulty of model conversion with ggml, people seem to do it anyway.
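For anyone who hasn't done it, the conversion people put up with looks roughly like this (a sketch; the script and binary names have been renamed across llama.cpp versions, so treat them as assumptions):

  # convert a Hugging Face checkpoint directory to GGUF
  python convert_hf_to_gguf.py /path/to/hf-model --outfile model-f16.gguf
  # optionally quantize to shrink it for local inference
  ./llama-quantize model-f16.gguf model-q4_k_m.gguf Q4_K_M

Clunky, but once it's done the result runs from a single self-contained binary plus a weights file, which is exactly the property the PyTorch stack makes so hard to get.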