SGLang

Alibaba Cloud Tair KVCache Simulation Analysis: High-Precision Computational and Caching Simulation Design and Implementation

This article introduces Tair-KVCache-HiSim, a high-fidelity CPU-based simulator for optimizing multi-tier KV Cache configurations in LLM inference.

ApsaraDB May 22, 2026 1,880

SGLang Hierarchical Sparse Attention

This article introduces hierarchical sparse attention: the full KV Cache is stored on the CPU, while the GPU keeps only a Top-k LRU Buffer.

ApsaraDB May 22, 2026 1,820

Hybrid Model Support | SGLang's Support Scheme for Hybrid Architecture Models like Mamba-Transformer

This article introduces an inference framework with dual memory pools that enables efficient execution of hybrid Transformer-Mamba models by resolving their conflicting caching mechanisms.

ApsaraDB February 4, 2026 8,225

Building a Production-Grade Cloud-Native Large Model Inference Platform with SGLang RBG + Mooncake

This article shows how SGLang RBG + Mooncake enable production-grade, cloud-native LLM inference with prefill-decode (PD) disaggregation.

OpenAnolis March 10, 2026 5,715

Alibaba Cloud Tair Partners with SGLang to Build HiCache: Constructing a New Cache Paradigm for "Agentic Inference"

This article introduces HiCache, a hierarchical KVCache infrastructure developed by Alibaba Cloud Tair and SGLang to optimize performance and memory capacity for long-context "agentic" LLM inference.

ApsaraDB December 29, 2025 10,848
