KV Cache

I Tested 19 LLM API Workloads on Real Calls and Cut Costs 79% — Here's the Data

518 real API calls. $33.99 → $7.06 in a single run. The same parameter change projects $15,667/year saved on a healthcare workload — here's the exact code, the math, and every scenario I measured.

Farruh June 9, 2026 377

Alibaba Cloud Tair KVCache Simulation Analysis: High-Precision Computational and Caching Simulation Design and Implementation

This article introduces Tair-KVCache-HiSim, a high-fidelity CPU-based simulator for optimizing multi-tier KV Cache configurations in LLM inference.

ApsaraDB May 22, 2026 1,929

Understanding GPU Memory Requirements for LLMs: A Complete Guide

Are you planning to deploy a Large Language Model (LLM) but don't know how much GPU memory it needs? Or the AI model you are us...

Alibaba Cloud Indonesia January 22, 2026 1,186

LLM Inference Acceleration: GPU Optimization for Attention in the Decode Phase (2)

This article briefly discusses how to further improve the calculation performance of MMHA in this interval.

Alibaba Cloud Community October 31, 2024 9,854
