SAPO: A Stable and Performant Reinforcement Learning Method for Training Large Language Models

This article introduces SAPO, a new reinforcement learning method that stabilizes and improves policy optimization for training large language models.