SWiRL: Step-Wise Reinforcement Learning for Reasoning & Tool Use

Paper