SWiRL: Step-Wise Reinforcement Learning for Reasoning & Tool Use