At the start of training, the recurrently computed hidden states are essentially random noise.
The Bootstrap Problem and Its Solution
Initial Training Dynamics
At the very beginning:
- The hidden states z_L and z_H are indeed gibberish
- The networks receive: (garbage_state, other_garbage_state, actual_input)
- But crucially, they still have access to the actual input x̃ (see the sketch after this list)
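
To make the setup concrete, here is a minimal PyTorch sketch of a single early-training step in an HRM-style two-module recurrence. The choice of GRU cells, the dimensions, and the `input_embedding` encoder are illustrative assumptions rather than the paper's actual architecture; the only point is that each update sees two noisy states plus the real input x̃.

```python
import torch
import torch.nn as nn

hidden_dim = 256

# Stand-ins for the low-level (f_L) and high-level (f_H) recurrent modules.
f_L = nn.GRUCell(input_size=2 * hidden_dim, hidden_size=hidden_dim)
f_H = nn.GRUCell(input_size=hidden_dim, hidden_size=hidden_dim)
input_embedding = nn.Linear(64, hidden_dim)   # hypothetical input encoder

x = torch.randn(8, 64)             # a batch of raw inputs
x_tilde = input_embedding(x)       # the actual input signal x̃ the modules always see

# At the start of training, the carried states are essentially random noise.
z_L = torch.randn(8, hidden_dim)   # "garbage_state"
z_H = torch.randn(8, hidden_dim)   # "other_garbage_state"

# One recurrent step: the low-level module receives (its own noisy state,
# the other module's noisy state, the real input). The only informative
# part of what it sees is x_tilde.
z_L = f_L(torch.cat([z_H, x_tilde], dim=-1), z_L)
z_H = f_H(z_L, z_H)
```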
What likely happens early in training:
- The networks learn to mostly ignore the noisy hidden states
- They focus on the actual input signal x̃
- They essentially behave as feedforward networks at first
- They gradually learn to incorporate hidden-state information as it becomes meaningful (the sketch below shows one rough way to probe this)
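
One rough way to check the "feedforward at first" intuition is to compare how much a module's output moves when the noisy hidden state is resampled versus when the input is perturbed. The helper below is a hypothetical diagnostic built on the sketch above (the `f_L`, `z_L`, `z_H`, `x_tilde` names are carried-over assumptions), not something from the paper.

```python
import torch

def state_vs_input_sensitivity(step_fn, x_tilde, z, n_samples=8):
    """step_fn(z, x_tilde) -> new state; returns (state_sensitivity, input_sensitivity)."""
    with torch.no_grad():
        base = step_fn(z, x_tilde)
        # Resample the hidden state: if the network ignores it, outputs barely move.
        state_diffs = [
            (step_fn(torch.randn_like(z), x_tilde) - base).norm()
            for _ in range(n_samples)
        ]
        # Perturb the input as a point of comparison.
        input_diffs = [
            (step_fn(z, x_tilde + torch.randn_like(x_tilde)) - base).norm()
            for _ in range(n_samples)
        ]
    return (torch.stack(state_diffs).mean().item(),
            torch.stack(input_diffs).mean().item())

# Usage with the modules from the previous sketch (hypothetical):
# s, i = state_vs_input_sensitivity(
#     lambda z, x: f_L(torch.cat([z_H, x], dim=-1), z), x_tilde, z_L)
# Early in training one would expect s << i; as the states become
# meaningful, s should grow relative to i.
```

This is only a crude probe (the perturbation scales are not matched carefully), but it captures the claim: if resampling the state changes almost nothing while perturbing x̃ changes a lot, the module is effectively acting as a feedforward map of the input.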
Why This Can Work