LLM Inference

Alibaba Cloud Storage Wins Best Paper Award at Global Top Conference FAST

This article covers Alibaba Cloud's Best Paper Award at FAST 2026 for "Latte", its new local–cloud converged storage architecture, and the evolution of its cloud-native storage.

Building a Production-Grade Cloud-Native Large Model Inference Platform with SGLang RBG + Mooncake

This article shows how SGLang RBG + Mooncake enable production-grade, cloud-native LLM inference with prefill/decode (PD) disaggregation.

Self-Hosted GPU or Model-as-a-Service? A Strategic Guide for AI Leaders

This article offers a framework for choosing between self-hosted GPUs and MaaS for LLM inference by weighing cost, data, engineering, and scalability tradeoffs.

Caching is Efficiency: Achieving Precise LLM Cache Hits with Alibaba Cloud ACK GIE

This article introduces ACK GIE's precision-mode prefix cache-aware routing that maximizes KV-Cache hit rates for distributed LLM inference.

Use NVIDIA NIM to Accelerate LLM Inference in Alibaba Cloud ACK

This article describes how to use the cloud-native AI suite to integrate the open-source inference framework KServe and quickly deploy NVIDIA NIM in an ACK cluster.