• Collectives


  • Implementing collectives

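    • Regardless of tree vs ring all-reduce, there are 2(n-1) transfers. In the ring case each transfer carries S/n, so every rank sends 2S(n-1)/n in total, where S is the array size and n is the number of ranks. busbw is the algorithm bandwidth S/t (t = time for the collective) scaled by the same factor: busbw = (S/t) · 2(n-1)/n (source)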
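      A minimal sketch of the bus-bandwidth arithmetic, assuming the nccl-tests convention that busbw is the algorithm bandwidth S/t scaled by 2(n-1)/n. The function name and the example numbers are hypothetical, chosen only for illustration.

```python
def allreduce_busbw(size_bytes: float, seconds: float, n_ranks: int) -> float:
    """Bus bandwidth (bytes/s) for an all-reduce of size_bytes across n_ranks."""
    # Algorithm bandwidth: array size divided by wall-clock time.
    algbw = size_bytes / seconds
    # Each rank moves 2(n-1) chunks of size S/n, hence the 2(n-1)/n factor.
    return algbw * 2 * (n_ranks - 1) / n_ranks


if __name__ == "__main__":
    # Hypothetical measurement: 1 GB all-reduced across 8 ranks in 10 ms.
    print(allreduce_busbw(1e9, 0.010, 8) / 1e9, "GB/s")
```

      The factor approaches 2 as n grows, so for large n busbw is roughly twice the algorithm bandwidth.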

    • Ring all-reduce: first a reduce-scatter, then an all-gather

      image.png
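      The two phases can be sketched as a single-process simulation. This is an illustrative model, not how any particular library implements it: `ring_all_reduce` and its list-of-lists input are hypothetical, and the array is assumed to divide evenly into n chunks.

```python
def ring_all_reduce(data):
    """Simulate a sum all-reduce on a ring. data[r] is rank r's local array."""
    n = len(data)
    size = len(data[0])
    assert size % n == 0, "sketch assumes the array splits evenly into n chunks"
    c = size // n
    bufs = [list(d) for d in data]

    def chunk(i):
        # Index range of chunk i (mod n) within a rank's buffer.
        i %= n
        return range(i * c, (i + 1) * c)

    # Phase 1: reduce-scatter. At step s, rank r sends chunk (r - s) to
    # rank r + 1, which adds it into its own copy of that chunk.
    for s in range(n - 1):
        # Snapshot outgoing payloads so all sends in a step are simultaneous.
        sends = [(r, r - s, [bufs[r][i] for i in chunk(r - s)]) for r in range(n)]
        for r, ci, payload in sends:
            dst = (r + 1) % n
            for i, v in zip(chunk(ci), payload):
                bufs[dst][i] += v
    # Now rank r holds the fully reduced chunk (r + 1) % n.

    # Phase 2: all-gather. At step s, rank r forwards chunk (r + 1 - s) to
    # rank r + 1, which overwrites its stale copy.
    for s in range(n - 1):
        sends = [(r, r + 1 - s, [bufs[r][i] for i in chunk(r + 1 - s)]) for r in range(n)]
        for r, ci, payload in sends:
            dst = (r + 1) % n
            for i, v in zip(chunk(ci), payload):
                bufs[dst][i] = v
    return bufs
```

      Each rank sends one chunk of size S/n per step, and there are n-1 steps per phase, which gives the 2(n-1) transfers per rank.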

    • Tree all-reduce:

      image.png
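      A minimal sketch of a tree all-reduce, assuming a single binary tree laid out heap-style (rank 0 is the root, the parent of rank r is (r - 1) // 2). That layout and the name `tree_all_reduce` are assumptions for illustration; real libraries may use differently shaped or multiple trees and pipeline the data in chunks.

```python
def tree_all_reduce(data):
    """Simulate a sum all-reduce on a binary tree; returns (buffers, transfers)."""
    n = len(data)
    bufs = [list(d) for d in data]
    transfers = 0

    # Reduce phase: visit ranks from the leaves toward the root. By the time
    # rank r sends, its children (2r + 1 and 2r + 2) have already sent to it.
    for r in range(n - 1, 0, -1):
        parent = (r - 1) // 2
        bufs[parent] = [a + b for a, b in zip(bufs[parent], bufs[r])]
        transfers += 1

    # Broadcast phase: visit ranks from the root toward the leaves. Each
    # non-root rank copies the full result from its parent.
    for r in range(1, n):
        bufs[r] = list(bufs[(r - 1) // 2])
        transfers += 1

    return bufs, transfers
```

      In this model every transfer carries the whole array, and the count comes out to n-1 up the tree plus n-1 down, i.e. 2(n-1).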

    • Ring all-gather

      image.png