×
KV Cache

I Tested 19 LLM API Workloads on Real Calls and Cut Costs 79% — Here's the Data

518 real API calls. $33.99 → $7.06 in a single run. The same parameter change projects $15,667/year saved on a healthcare workload — here's the exact code, the math, and every scenario I measured.

Alibaba Cloud Tair KVCache Simulation Analysis: High-Precision Computational and Caching Simulation Design and Implementation

This article introduces Tair-KVCache-HiSim, a high-fidelity CPU-based simulator for optimizing multi-tier KV Cache configurations in LLM inference.

Memahami Kebutuhan GPU Memory untuk LLM: Panduan Lengkap

Apakah kamu berencana melakukan deployment Large Language Model (LLM) tapi nggak tahu berapa GPU memory yang dibutuhkan? atau model AI yang kamu gunak...

LLM Inference Acceleration: GPU Optimization for Attention in the Decode Phase (2)

This article briefly discuss how to further improve the calculation performance of MMHA in this interval.