This article introduces Alibaba Cloud’s Best Paper win at FAST 2026 for “Latte,” its new local–cloud converged storage architecture, and the breakthrough it represents in the evolution of cloud-native storage.
This article shows how SGLang RBG + Mooncake enable production-grade, cloud-native LLM inference with prefill/decode (PD) disaggregation.
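To make the PD-disaggregation idea concrete, here is a minimal conceptual sketch: prefill workers run the compute-bound prompt pass and produce a KV cache, a transfer layer (the role Mooncake plays) ships that cache to a separate pool of decode workers, and decoding streams tokens from there. All class and method names below are illustrative placeholders, not the SGLang, RBG, or Mooncake API.

```python
# Toy sketch of prefill/decode (PD) disaggregation; names are illustrative only.
from dataclasses import dataclass

@dataclass
class Request:
    prompt: str
    max_new_tokens: int

class PrefillWorker:
    """Runs the compute-bound prefill phase and produces the KV cache."""
    def prefill(self, req: Request) -> dict:
        # In a real system this is a forward pass over the full prompt;
        # here a placeholder object stands in for the KV cache.
        return {"prompt": req.prompt, "kv": f"kv-cache({len(req.prompt)} chars)"}

class KVTransfer:
    """Stands in for a KV-cache transfer layer such as Mooncake, which moves
    the cache from prefill to decode nodes (e.g. over RDMA)."""
    def send(self, kv_state: dict) -> dict:
        return kv_state  # zero-copy in this toy version

class DecodeWorker:
    """Runs the memory-bound decode phase, token by token."""
    def decode(self, kv_state: dict, max_new_tokens: int) -> str:
        return "".join(f"<tok{i}>" for i in range(max_new_tokens))

def serve(req: Request) -> str:
    prefill, transfer, decode = PrefillWorker(), KVTransfer(), DecodeWorker()
    kv = transfer.send(prefill.prefill(req))      # 1. prefill, 2. ship KV cache
    return decode.decode(kv, req.max_new_tokens)  # 3. decode on a separate pool

print(serve(Request(prompt="What is PD disaggregation?", max_new_tokens=4)))
```

Separating the two phases lets each pool be sized and scheduled for its own bottleneck: compute for prefill, memory bandwidth for decode.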
This article offers a decision framework for choosing between self-hosted GPUs and MaaS (Model-as-a-Service) for LLM inference by weighing trade-offs in cost, data, engineering, and scalability.
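The cost leg of that comparison reduces to a break-even calculation: at what utilization does an hourly GPU beat per-token MaaS pricing? The sketch below works through the arithmetic; every price and throughput figure in it is a made-up placeholder, so substitute your own quotes before drawing conclusions.

```python
# Illustrative break-even estimate between self-hosted GPUs and MaaS.
# All prices and throughput figures are assumed placeholders.

GPU_HOURLY_COST = 4.00          # assumed $/hour for one GPU instance
GPU_TOKENS_PER_SECOND = 2_500   # assumed sustained throughput per GPU
MAAS_PRICE_PER_M_TOKENS = 2.00  # assumed $ per million tokens

def self_hosted_cost_per_m_tokens(utilization: float) -> float:
    """Effective $/1M tokens on a self-hosted GPU at a given utilization (0-1]."""
    tokens_per_hour = GPU_TOKENS_PER_SECOND * 3600 * utilization
    return GPU_HOURLY_COST / tokens_per_hour * 1_000_000

for util in (0.1, 0.3, 0.6, 0.9):
    cost = self_hosted_cost_per_m_tokens(util)
    cheaper = "self-hosted" if cost < MAAS_PRICE_PER_M_TOKENS else "MaaS"
    print(f"utilization {util:>4.0%}: ${cost:5.2f}/1M tokens -> {cheaper} wins")
```

With these placeholder numbers, self-hosting only wins above roughly 20% sustained utilization, which is why spiky or low-volume workloads tend to favor MaaS even before factoring in engineering effort.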
This article introduces the precision-mode, prefix-cache-aware routing in ACK GIE (Gateway with Inference Extension), which maximizes KV cache hit rates for distributed LLM inference.
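The core idea behind prefix-cache-aware routing is that requests sharing a long common prefix (e.g. the same system prompt) should land on the replica that already holds that prefix's KV cache. The following toy router illustrates the principle by picking the replica with the longest cached prefix overlap, breaking ties on load; it is a simplification for intuition, not the ACK GIE algorithm.

```python
# Toy sketch of prefix-cache-aware routing; not the ACK GIE implementation.

def shared_prefix_len(a: str, b: str) -> int:
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

class Router:
    def __init__(self, replicas: list[str]):
        # For each replica, remember prompts it recently served (its warm KV cache).
        self.cache: dict[str, list[str]] = {r: [] for r in replicas}
        self.load: dict[str, int] = {r: 0 for r in replicas}

    def route(self, prompt: str) -> str:
        def score(replica: str) -> tuple[int, int]:
            best = max((shared_prefix_len(prompt, p) for p in self.cache[replica]),
                       default=0)
            # Prefer the longest cached prefix; break ties on lowest load.
            return (best, -self.load[replica])
        chosen = max(self.cache, key=score)
        self.cache[chosen].append(prompt)
        self.load[chosen] += 1
        return chosen

router = Router(["replica-a", "replica-b"])
system = "You are a helpful assistant.\n"
print(router.route(system + "Summarize this doc."))  # lands on some replica
print(router.route(system + "Translate this doc."))  # shared prefix -> same replica
```

A production router additionally has to track cache evictions and bound the imbalance this affinity creates, which is where the "precision mode" trade-off between hit rate and load spreading comes in.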
This article explains how to use the cloud-native AI suite to integrate the open-source inference serving framework KServe and quickly deploy NVIDIA NIM in an ACK cluster.
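As a rough sketch of what such a deployment looks like, the snippet below creates a KServe InferenceService wrapping an NVIDIA NIM container using the KServe Python SDK (`pip install kserve`). The NIM image tag, the `ngc-secret` secret name, and the resource sizes are placeholders assumed for illustration; consult the NIM and ACK documentation for the values that apply to your account.

```python
# Hedged sketch: a KServe InferenceService running an NVIDIA NIM container.
# Image tag, secret name, and resources are placeholders, not verified values.

from kubernetes import client
from kserve import (
    KServeClient,
    V1beta1InferenceService,
    V1beta1InferenceServiceSpec,
    V1beta1PredictorSpec,
    constants,
)

nim_container = client.V1Container(
    name="nim",
    image="nvcr.io/nim/meta/llama3-8b-instruct:latest",  # placeholder NIM image
    env=[client.V1EnvVar(
        name="NGC_API_KEY",  # NIM containers read the NGC key from the env
        value_from=client.V1EnvVarSource(
            secret_key_ref=client.V1SecretKeySelector(
                name="ngc-secret", key="api-key")))],
    resources=client.V1ResourceRequirements(
        limits={"nvidia.com/gpu": "1"}),  # one GPU per replica
)

isvc = V1beta1InferenceService(
    api_version=constants.KSERVE_GROUP + "/v1beta1",
    kind="InferenceService",
    metadata=client.V1ObjectMeta(name="nim-llama3", namespace="default"),
    spec=V1beta1InferenceServiceSpec(
        predictor=V1beta1PredictorSpec(containers=[nim_container])),
)

KServeClient().create(isvc)
```

Once the InferenceService reports ready, KServe exposes the NIM endpoint behind its standard inference URL and handles autoscaling of the GPU-backed replicas.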