This article introduces SAPO, a new reinforcement learning method that stabilizes and improves policy optimization for training large language models.