  • YounElh · 3 hours ago
    Hey HN, I'm Younes, cofounder of Overshoot (YC W26). We're building inference infrastructure for vision models.

    We put this survey together because we needed it ourselves. When you're trying to figure out which VLM to actually deploy, the information is scattered across HuggingFace cards, GitHub READMEs, and Twitter threads. We wanted one table that answers: what's out there, how big is it, does it actually work with vLLM/SGLang, and how much context can I give it.

    The survey covers every significant open-source VLM released since Dec 2024: Qwen3.5, Qwen3-VL, InternVL3/3.5, Molmo2, GLM-4.6V, Kimi-VL, MiniCPM-V, Keye-VL, Tarsier2. For each model we tracked parameter counts, context lengths, release dates, framework support, and weekly downloads.

    We also published deployment guides for 30 of these models (https://blog.overshoot.ai/blog/deploying-vlms-through-vllm): tested vLLM commands, flags, and troubleshooting notes covering what actually breaks and how to fix it, all verified on H200s.
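    For context, the guides cover commands in this general shape — a minimal vLLM serving sketch (the model name, context limit, and flag values here are illustrative examples, not recommendations from the survey; check the per-model guide for specifics):

    ```shell
    # Hypothetical example: serve a VLM behind an OpenAI-compatible API with vLLM.
    # Model name and flag values are placeholders, not survey recommendations.
    vllm serve Qwen/Qwen2.5-VL-7B-Instruct \
      --max-model-len 32768 \
      --limit-mm-per-prompt image=4 \
      --tensor-parallel-size 1
    ```

    Once it's up, the server exposes /v1/chat/completions, so any OpenAI-compatible client can send image + text prompts to it.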

    Happy to answer questions about any specific model or deployment issues. We've hit most of the sharp edges at this point.