Muon, Keller Jordan 2024

SOAP, Harvard 2024

Overview:

  1. Background and Problem
  2. Main Contribution: SOAP Algorithm
  3. Key Results
  4. Implementation Details
  5. Future Work

SOAP is significant because it combines the benefits of first-order methods (Adam) and second-order methods (Shampoo) in a principled way: it runs Adam's update in the eigenbasis of Shampoo's preconditioner. The authors report that the resulting optimizer is both more efficient and easier to tune than either Adam or Shampoo alone, and their experiments on large language model training show substantial practical gains.
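To make the "Adam plus Shampoo" combination concrete, here is a minimal NumPy sketch of one SOAP-style update for a single 2-D weight matrix. It is a simplified illustration, not the authors' reference implementation. The names `init_soap_state` and `soap_step`, the state dictionary layout, and the default hyperparameters are all assumptions made for this example. The sketch also omits bias correction and weight decay, and recomputes the eigenbasis with a full `numpy.linalg.eigh` call every `precond_every` steps, which is a simplification.

```python
import numpy as np

def init_soap_state(shape):
    """Hypothetical helper: allocate optimizer state for one m x n weight."""
    m, n = shape
    return {
        "t": 0,
        "L": np.zeros((m, m)),   # Shampoo left statistic, EMA of G G^T
        "R": np.zeros((n, n)),   # Shampoo right statistic, EMA of G^T G
        "QL": np.eye(m),         # eigenvectors of L
        "QR": np.eye(n),         # eigenvectors of R
        "M": np.zeros(shape),    # Adam first moment (original basis)
        "V": np.zeros(shape),    # Adam second moment (rotated basis)
    }

def soap_step(W, G, state, lr=1e-3, beta1=0.9, beta2=0.999,
              eps=1e-8, precond_every=10):
    """One simplified SOAP-style step (no bias correction, no weight decay)."""
    state["t"] += 1

    # Shampoo part: accumulate the left/right gradient statistics.
    state["L"] = beta2 * state["L"] + (1 - beta2) * (G @ G.T)
    state["R"] = beta2 * state["R"] + (1 - beta2) * (G.T @ G)

    # Refresh the eigenbasis only occasionally to amortize its cost.
    if state["t"] == 1 or state["t"] % precond_every == 0:
        _, state["QL"] = np.linalg.eigh(state["L"])
        _, state["QR"] = np.linalg.eigh(state["R"])
    QL, QR = state["QL"], state["QR"]

    # Adam part: momentum in the original basis, then rotate.
    state["M"] = beta1 * state["M"] + (1 - beta1) * G
    G_rot = QL.T @ G @ QR
    M_rot = QL.T @ state["M"] @ QR

    # Second moment is tracked elementwise in the rotated basis.
    state["V"] = beta2 * state["V"] + (1 - beta2) * G_rot ** 2
    N_rot = M_rot / (np.sqrt(state["V"]) + eps)

    # Rotate the normalized update back and apply it.
    return W - lr * (QL @ N_rot @ QR.T)
```

A toy usage, assuming the quadratic loss 0.5 * ||W - T||^2 whose gradient is W - T: create `state = init_soap_state(W.shape)` once, then call `W = soap_step(W, W - T, state)` in a loop. The point of the design is that the expensive second-order information (the eigenvectors) changes slowly and can be refreshed rarely, while the cheap Adam statistics are updated every step.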

Shampoo