Mixture-of-Depths

The key innovation is a method for dynamically allocating compute across the tokens of a transformer: some tokens skip certain transformer blocks entirely and pass through unchanged via the residual connection. Here's how it works:
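
Before the step-by-step breakdown, a minimal pure-Python sketch of the routing idea for a single block may help. It assumes a linear router that gives each token one scalar score, a fixed per-block budget expressed as a hypothetical `capacity_fraction`, and top-k selection over the sequence. `mod_block`, `router_w`, `capacity_fraction`, and the toy `block` function are illustrative names, not code from the paper. The toy block acts on each token independently, whereas a real transformer block would also run attention among the selected tokens.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def mod_block(
    tokens: Sequence[Vector],
    router_w: Sequence[float],
    block: Callable[[Vector], Vector],
    capacity_fraction: float = 0.125,  # hypothetical budget: share of tokens allowed through
) -> List[Vector]:
    """Apply `block` only to the top-k tokens picked by a linear router.

    Unselected tokens skip the block and are returned unchanged
    (the residual path). Toy sketch: `block` sees one token at a time.
    """
    n = len(tokens)
    # Budget: a fixed number k of tokens may use this block (rounding rule is an assumption).
    k = max(1, math.floor(capacity_fraction * n))
    # Router: one scalar score per token (dot product with a weight vector).
    scores = [sum(w * x for w, x in zip(router_w, t)) for t in tokens]
    # Top-k over the sequence; sorted() is stable, so ties keep sequence order.
    chosen = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    # Default for every token: identity, i.e. skip the block.
    out = [list(t) for t in tokens]
    for i in chosen:
        f = block(tokens[i])
        # Residual update with the block output scaled by the router score,
        # which keeps the router on the gradient path during training.
        out[i] = [x + scores[i] * y for x, y in zip(tokens[i], f)]
    return out


if __name__ == "__main__":
    toks = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0], [0.5, 0.5]]
    # With a 50% budget on 4 tokens, only the 2 highest-scoring tokens are processed.
    print(mod_block(toks, [1.0, 0.0], lambda v: [1.0 for _ in v], capacity_fraction=0.5))
```

With the example inputs, the router scores are 1.0, 0.0, 2.0 and 0.5, so tokens 2 and 0 go through the block and tokens 1 and 3 are passed through untouched. Because the compute budget k is fixed in advance, the cost of the block is known ahead of time regardless of which tokens are chosen. Note that top-k over the whole sequence looks at future tokens; this sketch ignores how that is handled during autoregressive sampling.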

  1. Budget Setting: