concretely explain the core technique, illustrating with examples
The core technique of "Reasoning to Learn from Latent Thoughts" is a data-efficient approach to language model pretraining that augments observed data with inferred latent thoughts. Let me explain this concretely:
The key insight is that human-written text is "compressed": it is the final outcome of an underlying thought process that includes reasoning steps, background knowledge, and contextual information. The authors propose that explicitly modeling and inferring these latent thoughts can significantly improve learning efficiency.
The approach models the text generation process as a latent-variable model: for each document, latent thoughts Z (the reasoning steps, background knowledge, and context behind the writing) are generated first, and the observed text X is generated conditioned on them, so the model targets the marginal likelihood p(X) = Σ_Z p(Z) p(X | Z). Training then alternates EM-style between inferring thoughts for observed text and learning from the thought-augmented data. For example:
Raw text: "The second step is to find the next axis of greatest variance. How do we find the best such axis in principal component analysis?"
Inferred latent thought (illustrative): "The first principal component was chosen as the direction of maximal variance. The second axis must be orthogonal to the first while capturing the most remaining variance, so we look for the eigenvector of the covariance matrix with the next-largest eigenvalue."
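
To make this concrete, below is a minimal sketch of one round of the EM-style bootstrap, assuming a HuggingFace causal LM; the delimiter strings, prompt wording, and helper names (`THOUGHT_OPEN`, `infer_latent_thought`, `num_candidates`) are my own illustrative choices, not the paper's exact format. The E-step samples a few candidate thoughts Z for a document X and keeps the one under which the joint sequence is most likely; the M-step takes a gradient step on the thought-augmented sequence (Z, X):

```python
# A minimal sketch of one E-step / M-step round of latent-thought
# bootstrapping, assuming a small HuggingFace causal LM. The delimiter
# strings, prompt wording, and helper names below are illustrative
# assumptions, not the paper's exact format.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
tok = AutoTokenizer.from_pretrained("gpt2")
tok.pad_token = tok.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained("gpt2").to(device)

THOUGHT_OPEN, THOUGHT_CLOSE = "<thought>", "</thought>"  # assumed delimiters

def log_likelihood(text: str) -> float:
    """Total log-probability the model assigns to `text`."""
    ids = tok(text, return_tensors="pt").input_ids.to(device)
    with torch.no_grad():
        out = model(ids, labels=ids)
    # HF reports mean NLL over predicted tokens; convert to a total log-prob.
    return -out.loss.item() * (ids.shape[1] - 1)

def infer_latent_thought(x: str, num_candidates: int = 4) -> str:
    """E-step: propose candidate thoughts Z conditioned on the text X,
    then keep the candidate with the highest joint log-likelihood
    log p(Z, X) -- equivalent to ranking candidates by p(Z | X)."""
    prompt = f"Passage: {x}\nBackground and reasoning the writer relied on:"
    inputs = tok(prompt, return_tensors="pt").to(device)
    outputs = model.generate(
        **inputs, do_sample=True, max_new_tokens=64,
        num_return_sequences=num_candidates, pad_token_id=tok.eos_token_id,
    )
    prompt_len = inputs.input_ids.shape[1]
    thoughts = [tok.decode(o[prompt_len:], skip_special_tokens=True)
                for o in outputs]
    return max(thoughts, key=lambda z: log_likelihood(
        f"{THOUGHT_OPEN}{z}{THOUGHT_CLOSE} {x}"))

def m_step(x: str, z: str) -> None:
    """M-step: one gradient step on the joint sequence (Z, X), so the
    model learns both to produce thoughts and to use them."""
    joint = f"{THOUGHT_OPEN}{z}{THOUGHT_CLOSE} {x}"
    ids = tok(joint, return_tensors="pt").input_ids.to(device)
    model(ids, labels=ids).loss.backward()
    # optimizer.step() / optimizer.zero_grad() omitted for brevity

doc = ("The second step is to find the next axis of greatest variance. "
       "How do we find the best such axis in principal component analysis?")
z = infer_latent_thought(doc)
m_step(doc, z)
```

Iterating this loop is what lets the model bootstrap: better inferred thoughts yield better training data, which in turn yields a model that infers better thoughts.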