This article describes how attention in the decode phase is optimized on GPUs, drawing on practical experience from RTP-LLM.