- A recipe to fine-tune DeepSeekMath-Base 7B to act as a "reasoning agent" that can solve mathematical problems via a mix of natural language reasoning and the use of the Python REPL to compute intermediate results.
- A novel decoding algorithm for tool-integrated reasoning (TIR) with code execution feedback to generate solution candidates during inference.

- For each problem, copy the input N times to define the initial batch of prompts to feed vLLM. This effectively defines the number of candidates one uses for majority voting.
- Sample N diverse completions until a complete block of Python code is produced.
- Execute each Python block and append its output, including any traceback, to the corresponding generation.
- Repeat M times to produce a batch of generations of size N and depth M, allowing the model to self-correct code errors using the traceback. If a sample fails to produce sensible outputs (e.g., incomplete code blocks), prune that result.
- Postprocess the solution candidates and then apply majority voting to select the final answer.
- A variety of internal validation sets that we used to guide model selection and avoid overfitting to the public leaderboard.
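
To make the decoding steps listed earlier (copy the prompt N times, sample until a code block appears, execute it, feed the output back, repeat M times, then vote) more concrete, here is a minimal sketch in plain Python. It is an illustration under assumptions, not the authors' implementation: `generate` is a hypothetical stand-in for the batched vLLM call, `fake_generate` is a toy model used only for the demo, the `\boxed{...}` answer format and the `output` fence label are assumed conventions, and code is run with an in-process `exec` where a real system would use a sandbox with timeouts.

```python
import contextlib
import io
import re
import traceback
from collections import Counter

FENCE = "`" * 3  # markdown code fence, built here to keep this listing readable
CODE_RE = re.compile(FENCE + r"python\n(.*?)" + FENCE, re.DOTALL)
ANSWER_RE = re.compile(r"\\boxed\{(\d+)\}")  # assumed final-answer format


def run_python(code):
    """Run a code block; return its stdout, plus the traceback if it fails."""
    buf = io.StringIO()
    try:
        with contextlib.redirect_stdout(buf):
            exec(code, {})  # illustration only: not sandboxed, no timeout
    except Exception:
        return buf.getvalue() + traceback.format_exc()
    return buf.getvalue()


def tir_decode(problem, generate, n=4, m=2):
    """Sketch of TIR decoding with width n and depth m.

    `generate` (hypothetical) maps a list of prompts to a list of
    continuations, each stopping after a code block or a final answer.
    """
    candidates = [problem] * n  # copy the input N times
    answers = []
    for _ in range(m):  # repeat M times
        if not candidates:
            break
        completions = generate(candidates)  # sample one completion per prompt
        survivors = []
        for prompt, completion in zip(candidates, completions):
            found = ANSWER_RE.findall(completion)
            if found:  # candidate finished with an answer
                answers.append(found[-1])
                continue
            blocks = CODE_RE.findall(completion)
            if not blocks:  # no complete code block: prune this sample
                continue
            output = run_python(blocks[-1])  # execute; keep any traceback
            survivors.append(
                prompt + completion + "\n" + FENCE + "output\n" + output + FENCE + "\n"
            )
        candidates = survivors
    if not answers:
        return None
    return Counter(answers).most_common(1)[0][0]  # majority vote


def fake_generate(prompts):
    """Toy model: writes code first, then reads the answer from the output."""
    outs = []
    for i, prompt in enumerate(prompts):
        seen = re.search(FENCE + r"output\n(\d+)\n" + FENCE, prompt)
        if seen:
            outs.append(" So the answer is \\boxed{" + seen.group(1) + "}.")
        elif i == 0:
            outs.append(" I am not sure.")  # will be pruned
        else:
            outs.append("\n" + FENCE + "python\nprint(6 * 7)\n" + FENCE + "\n")
    return outs


result = tir_decode("What is 6 * 7?", fake_generate, n=4, m=2)
print(result)  # prints 42
```

In this toy run, one of the four candidates is pruned for never producing a code block, and the remaining three agree on the answer. Candidates that have not produced an answer after M rounds are simply dropped here, which is a simplification chosen for the sketch.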