Full Bandwidth Broadcast, Reduction and Scan with Only Two Trees

Jesper Larsson Träff
Seminar

We present a new, simple algorithmic idea that exploits the bidirectional
communication capability of many modern interconnects to implement the
collective MPI operations broadcast, reduction and scan. Our algorithms
achieve up to twice the bandwidth of most previous and commonly used
algorithms. In particular, our algorithms for reduction and scan are the
best currently known. Experiments on clusters with Myrinet and InfiniBand
interconnects show significant reductions in running time for broadcast and
reduction; for reduction, the improvement comes close to the best possible
factor of two.
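The abstract does not spell out how the two trees are built. As an illustration only, and not necessarily the construction used in the paper, the following Python sketch shows one way to obtain two binary trees over the same n processes such that every process is an interior (forwarding) node in exactly one tree and a leaf in the other. It assumes n is even and numbers the processes 1..n. The first tree places position j at the level given by the number of trailing zero bits of j, so odd positions are leaves and even positions are interior nodes; the second tree has the same shape with every rank shifted by one. All function names are hypothetical.

```python
def inorder_tree(n):
    """Parent map for a binary tree over in-order positions 1..n (root -> 0).

    Position j lives at level tz(j) (trailing zero bits of j), so odd
    positions are leaves and even positions are interior nodes.
    """
    def up(j):
        # Parent of j in the untruncated complete in-order binary tree.
        k = (j & -j).bit_length() - 1
        return j - (1 << k) if (j >> (k + 1)) & 1 else j + (1 << k)

    root = 1 << (n.bit_length() - 1)  # largest power of two <= n
    parent = {root: 0}
    for j in range(1, n + 1):
        if j != root:
            p = up(j)
            while p > n:  # ancestors beyond n do not exist; adopt the next one up
                p = up(p)
            parent[j] = p
    return parent


def shift_by_one(parent, n):
    """Second tree: same shape, but position j is held by rank j % n + 1."""
    def rank(j):
        return j % n + 1
    return {rank(j): (rank(p) if p else 0) for j, p in parent.items()}


def interior(parent):
    """Ranks that have at least one child, i.e. that must forward data."""
    return set(parent.values()) - {0}


def two_trees(n):
    """Return (T1, T2) as parent maps over ranks 1..n; assumes n is even."""
    if n < 2 or n % 2:
        raise ValueError("this sketch assumes an even number of processes")
    t1 = inorder_tree(n)
    return t1, shift_by_one(t1, n)


if __name__ == "__main__":
    t1, t2 = two_trees(6)
    print(sorted(interior(t1)), sorted(interior(t2)))
```

For n = 6 the script prints `[2, 4, 6] [1, 3, 5]`, the interior ranks of the two trees. If the data is split into two halves, one per tree, each process forwards in only one tree (at most two children, half the data each) while receiving a half in each tree. Under a simple model in which a process can send and receive at the same time, this is the intuition behind the factor-of-two bandwidth gain claimed above. The shift trick relies on n being even; an odd n would need a different pairing of trees, which this sketch does not cover.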