Achieving MPI Performance Portability through Annotation-based Compiler Transformations

Ziaul Haque
Seminar

The Message Passing Interface (MPI) is widely considered the de facto standard for portable parallel programming on high-end computing systems. However, the MPI standard provides only functional portability across platforms, not performance portability. That is, while an MPI application can be executed anywhere, it is hard to predict the relative performance of different MPI operations on a given platform. For example, how much data does the MPI implementation need to coalesce before pushing it to the network for optimal performance? Is a bulk-synchronous communication model better or worse than an asynchronous PUT/GET-based model on a given platform? Can RMA PUT/GET operations safely be replaced with local load/store operations for local buffers on all platforms? The answers to all these questions are specific to the MPI implementation and to the hardware architecture. The goal of our work is to design a compiler-assisted framework that uses user annotations, hardware architecture information, and program analysis to transform input MPI code into better-optimized, architecture-specific output MPI code. I will describe the design of our framework and several issues that the framework must work around to achieve optimal performance while maintaining correct behavior. I will also present preliminary performance results demonstrating the improvements that our framework enables.

Short Bio:
Md. Ziaul Haque is a PhD student in the Department of Computer Science at the University of Texas at San Antonio (UTSA). His PhD advisor is Dr. Qing Yi, UTSA. He is currently working on MPI-Refactoring under the supervision of Dr. Pavan Balaji, Argonne National Laboratory. His research interests include high-performance computing, compiler optimization, and automatic source-to-source transformation.