Nonblocking and Sparse Collective Operations on Petascale Computers

Torsten Hoefler
Seminar

This talk introduces new classes of collective operations from both an implementation and an application programmer's perspective. We discuss issues with schedule generation, caching, and progression, and how these influence the application programmer. Then we focus on simple strategies, such as loop tiling, pipelining, and simple code movement, that can be used to optimize application performance with nonblocking collectives. We also discuss how the new semantics can be used to design new, asymptotically optimal algorithms for one-level termination detection, which is important for data-driven algorithms. The second part of the talk focuses on sparse collective operations and the static binding of communication topologies. We discuss a possible interface for MPI-3, several productivity and performance issues, and present performance results as well as potential for future work and architectures.
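
The overlap strategies mentioned above (loop tiling, pipelining, code movement) amount to starting a collective early and filling the time until its completion with independent work. The C sketch below illustrates the pattern with MPI_Iallreduce, the form MPI-3 eventually standardized (nonblocking collectives were available earlier through libraries such as LibNBC); the tile size and the compute_tile routine are illustrative placeholders, not material from the talk.

    #include <mpi.h>

    enum { TILE = 1024 };

    /* Hypothetical local work on one tile; it must not depend on the
     * reduction result for the overlap to be legal. */
    static void compute_tile(double *block, int len)
    {
        for (int i = 0; i < len; ++i)
            block[i] = block[i] * 0.5 + 1.0;
    }

    /* Start the reduction, overlap it with independent tiles, then wait. */
    void reduce_with_overlap(const double *local, double *global,
                             double *work, int ntiles)
    {
        MPI_Request req;

        MPI_Iallreduce(local, global, TILE, MPI_DOUBLE, MPI_SUM,
                       MPI_COMM_WORLD, &req);

        for (int t = 0; t < ntiles; ++t)
            compute_tile(&work[t * TILE], TILE);   /* work overlapped with communication */

        MPI_Wait(&req, MPI_STATUS_IGNORE);         /* 'global' is valid only after completion */
    }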
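
Sparse collective operations restrict communication to a statically declared set of neighbors bound to a communicator. As a rough illustration of that idea, the sketch below uses the interface MPI-3 later adopted as neighborhood collectives (MPI_Dist_graph_create_adjacent plus MPI_Neighbor_alltoall); the ring topology is an assumption for the example, and the interface proposal discussed in the talk may differ in detail.

    #include <mpi.h>

    /* Bind a sparse topology (here: a ring) to a communicator once,
     * then reuse it for data exchanges restricted to the neighbors. */
    void sparse_exchange(MPI_Comm comm)
    {
        int rank, size;
        int sources[2], destinations[2];
        double sendbuf[2], recvbuf[2];
        MPI_Comm graph_comm;

        MPI_Comm_rank(comm, &rank);
        MPI_Comm_size(comm, &size);

        /* Each rank communicates only with its left and right neighbor. */
        sources[0] = destinations[0] = (rank + size - 1) % size;
        sources[1] = destinations[1] = (rank + 1) % size;

        MPI_Dist_graph_create_adjacent(comm, 2, sources, MPI_UNWEIGHTED,
                                       2, destinations, MPI_UNWEIGHTED,
                                       MPI_INFO_NULL, 0, &graph_comm);

        /* Sparse exchange: one element to and from each neighbor. */
        sendbuf[0] = sendbuf[1] = (double)rank;
        MPI_Neighbor_alltoall(sendbuf, 1, MPI_DOUBLE,
                              recvbuf, 1, MPI_DOUBLE, graph_comm);

        MPI_Comm_free(&graph_comm);
    }

Because the topology is known when the communicator is created, the library can precompute and cache communication schedules, which is one of the productivity and performance points the abstract alludes to.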