This article briefly discusses how to further improve the computational performance of MMHA in this interval.
This article introduces how attention in the decode phase is optimized on GPUs, based on RTP-LLM practice.