LLM Inference

Alibaba Cloud Storage Wins Best Paper Award at Global Top Conference FAST

This article covers Alibaba Cloud's Best Paper Award at FAST 2026 for "Latte", its new local–cloud converged storage architecture, and the evolution of its cloud-native storage.

Building a Production-Grade Cloud-Native Large Model Inference Platform with SGLang RBG + Mooncake

This article shows how SGLang RBG + Mooncake enable production-grade, cloud-native LLM inference with prefill/decode (PD) disaggregation.

Self-Hosted GPU or Model-as-a-Service? A Strategic Guide for AI Leaders

This article offers a framework for choosing between self-hosted GPUs and MaaS for LLM inference by weighing cost, data, engineering, and scalability tradeoffs.

Caching is Efficiency: Achieving Precise LLM Cache Hits with Alibaba Cloud ACK GIE

This article introduces ACK GIE's precision-mode prefix cache-aware routing that maximizes KV-Cache hit rates for distributed LLM inference.

Use NVIDIA NIM to Accelerate LLM Inference in Alibaba Cloud ACK

This article describes how to use the cloud-native AI suite to integrate the open-source inference framework KServe and quickly deploy NVIDIA NIM in an ACK cluster.