Inference-only MCTS: expansion samples N children per node; evaluation combines a self-generated LM score with a self-consistency score (max % agreement)
On failure, also generates a Reflection to inform subsequent expansions
Like RAP, runs MCTS over an LLM, but acts in actual envs rather than using the LLM as a world model
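Rough sketch of the expansion / evaluation / reflection pieces noted above, to make the moving parts concrete. Everything here is an assumption for illustration, not the paper's code: the names `Node`, `expand`, `evaluate`, `self_consistency`, and `on_failure` are hypothetical, the LLM calls are passed in as plain callables, the mixing weight `lam` is assumed, and "max % agree" is read as the share of samples matching the most common answer.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Node:
    state: str
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)
    value: float = 0.0


def self_consistency(answers: List[str]) -> float:
    """Share of samples that match the most common answer (assumed reading)."""
    if not answers:
        return 0.0
    _, top_count = Counter(answers).most_common(1)[0]
    return top_count / len(answers)


def evaluate(lm_score: float, answers: List[str], lam: float = 0.5) -> float:
    """Blend the LM's self-assigned score with the agreement score.

    lam is an assumed mixing weight, not a value taken from the notes.
    """
    return lam * lm_score + (1.0 - lam) * self_consistency(answers)


def expand(
    node: Node,
    sample_action: Callable[[str, List[str]], str],
    n: int,
    reflections: List[str],
) -> List[Node]:
    """Sample n children; past reflections are passed as extra context."""
    for _ in range(n):
        action = sample_action(node.state, reflections)
        node.children.append(Node(state=node.state + action, parent=node))
    return node.children


def on_failure(
    trajectory: str,
    reflect: Callable[[str], str],
    reflections: List[str],
) -> None:
    """After a failed rollout, store a reflection for later expansions."""
    reflections.append(reflect(trajectory))
```

In a full search loop, `sample_action` and `reflect` would wrap LLM calls, and `evaluate` would set `Node.value` before backpropagation; selection and backup are left out here.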
Comparisons
"Since our method is based on Monte Carlo Tree Search and is model-free, one limitation of LATS on decision-making tasks is that it requires the agent to be able to revert to earlier states in the environments... this reversion property is feasible in many real-world applications (despite being not universally applicable in all possible environments)"