×
Multi Head Attention

LLM Inference Acceleration: GPU Optimization for Attention in the Decode Phase

This article introduces how the Attention in the decode phase is optimized on GPU based on RTP-LLM practices.