Paper

image.png

image.png

image.png

At the start of training, the recurrently computed hidden states are essentially random noise.

The Bootstrap Problem and Its Solution

Initial Training Dynamics

At the very beginning:

What likely happens early in training:

  1. Networks learn to mostly ignore the noisy hidden states
  2. They focus on the actual input signal x̃
  3. Essentially become feedforward networks at first
  4. Gradually learn to incorporate hidden state information as it becomes meaningful

Why This Can Work