Research

Aliyun’s CAP has a piece on picking an inference engine that narrows the field to four: Ollama, vLLM, SGLang, and Hugging Face Pipeline. In 2024, that framing was fine. By 2026, it’s missing half the map. NVIDIA’s TensorRT-LLM has completed its “PyTorch-ification.” SGLang made its name as the first open-source project to reproduce DeepSeek’s large-scale deployment. Hugging Face slapped a “maintenance mode” banner on TGI and told users to switch to vLLM. And the real throughline of the entire 2025 inference landscape can be summed up in one word: disaggregate.