| user: | waybarrios |
| created: | Apr 11, 2024 |
| karma: | 2 |
| about: | Hey HN! I built vLLM-MLX because vLLM falls back to CPU-only mode on macOS, which is painfully slow on Apple Silicon machines. vLLM-MLX brings native GPU acceleration using Apple's MLX framework.
Quick start:
pip install -e .
vllm-mlx serve mlx-community/Llama-3.2-3B-Instruct-4bit

Works with the standard OpenAI SDK. Happy to answer questions! |