See also Deep/formal reasoning
References
Concepts
Quiet-STaR, 2024
Uses REINFORCE to learn “thoughts” (internal rationales) that make the text that follows easier to predict
Thoughts look like this
[2402.14083] Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping, Meta 2024
Self-Taught Reasoner (STaR), 2022: bootstraps a rationale-labeled dataset from a few rationale examples plus a larger dataset that has no rationales
[2211.09066] Teaching Algorithmic Reasoning via In-context Learning
[2303.04910] Baldur: Whole-Proof Generation and Repair with Large Language Models
[2202.01344] Formal Mathematics Statement Curriculum Learning
[2401.08967] ReFT: Reasoning with Reinforced Fine-Tuning
[2401.00757] A & B == B & A: Triggering Logical Reasoning Failures in Large Language Models
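A rough sketch of the STaR bootstrapping step noted above: generate a rationale for each question, and keep it only when its final answer matches the known label. The names `star_round` and `toy_generate` are hypothetical, the generator is a stand-in for a few-shot prompted model, and the fine-tuning that follows each round is left out.

```python
from typing import Callable, List, Tuple

Example = Tuple[str, str]        # (question, gold answer)
Labeled = Tuple[str, str, str]   # (question, rationale, gold answer)


def star_round(
    generate: Callable[[str], Tuple[str, str]],
    dataset: List[Example],
) -> List[Labeled]:
    """One filtering pass: keep rationales that reach the gold answer."""
    kept: List[Labeled] = []
    for question, answer in dataset:
        rationale, predicted = generate(question)
        if predicted == answer:
            kept.append((question, rationale, answer))
    return kept


def toy_generate(question: str) -> Tuple[str, str]:
    # Hypothetical stand-in for a model: it adds correctly only when
    # both operands are single digits, and guesses "0" otherwise.
    a, b = question.split("+")
    if len(a) == 1 and len(b) == 1:
        total = int(a) + int(b)
        return f"{a} plus {b} is {total}", str(total)
    return "guess", "0"


dataset = [("2+3", "5"), ("10+5", "15"), ("4+4", "8")]
kept = star_round(toy_generate, dataset)
```

The kept triples would then serve as fine-tuning data, and the pass would be repeated with the improved model.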
Self-critique
Self-refine
Self-consistency chain of thought
Tree of thoughts
Self-reflection
Least-to-most decomposition
Domain adaptation
Self-improve
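To make one of the concepts above concrete, here is a minimal sketch of self-consistency chain of thought: sample several reasoning paths, then take a majority vote over their final answers. The helper `self_consistent_answer` and the canned answers are hypothetical; a real setup would sample from a model at nonzero temperature and parse the final answer out of each chain.

```python
from collections import Counter
from typing import Callable


def self_consistent_answer(sample_answer: Callable[[], str], n_samples: int = 5) -> str:
    """Draw n_samples final answers and return the most frequent one."""
    answers = [sample_answer() for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]


# Hypothetical stand-in for five sampled chains of thought,
# reduced to the final answer each one produced.
canned = iter(["18", "17", "18", "18", "20"])
result = self_consistent_answer(lambda: next(canned), n_samples=5)
```

Here three of the five sampled answers agree, so the vote returns that answer even though two chains went wrong.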