orbanlevi 6 hours ago
I have one DGX Spark and am running models with vLLM too. Out of curiosity, why not use llama.cpp, TensorRT-LLM, or one of the other alternatives?
awedisee 3 hours ago
Oh thank god. Finally a man of the people who can show us how to optimize 10k worth of equipment.
Because we all have at least two of these. Shout out to OP!!
TechPreacher 6 hours ago
[flagged]