
LLM Inference Acceleration: GPU Optimization for Attention in the Decode Phase (2)

This article briefly discusses how to further improve the computational performance of MMHA in this phase.

LLM Inference Acceleration: GPU Optimization for Attention in the Decode Phase

This article introduces how Attention in the decode phase is optimized on GPUs, based on practical experience with RTP-LLM.