Efficient Inter-node Communication in High Performance GPU Systems

Ashwin Aji
Seminar

Graphics Processing Units (GPUs) are becoming increasingly common in High Performance Computing (HPC) systems due to their unprecedented raw performance in a variety of domains, including molecular modeling, biological sequence analysis, and linear solvers. Moreover, three of the current top five most powerful supercomputers in the world use GPUs as their primary compute accelerators. However, the efficiency of these heterogeneous systems has remained below par, mainly because of explicit and slow data transfers over the PCIe bus between their disjoint memory subsystems. The programmability of these systems has also remained a challenge, because efficiently managing the different memories within each node is difficult and error-prone. As part of this internship, we extend MPICH2, one of the most widely used implementations of the Message Passing Interface (MPI), to natively support data transfers between any two memory regions, whether on the host or on a GPU device, over the network. Internally, we pipeline data transfers among the GPU, the host, and the network, so that the PCIe latency between the GPU and the host is hidden for improved performance. The GPU programming interface can be either CUDA or OpenCL, and our design can be extended to support additional programming models and device types if required. Our experiments over an InfiniBand network indicate that our pipelined design achieves up to a 22% improvement in two-sided (GPU-to-GPU) communication over default blocking data transfers performed at the user level, along with improved programmability. We plan to make our implementation available to the HPC community by releasing it with a future version of MPICH2.
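To illustrate the programmability gain, the following is a minimal sketch in C of what GPU-to-GPU communication looks like when the MPI library accepts device buffers directly. It assumes a CUDA device pointer can be passed straight to the standard MPI calls; the actual MPICH2 extension may expose GPU buffers differently (for example, through buffer attributes), and error checking is omitted for brevity. The commented-out staging code shows the traditional host-managed alternative that such an extension removes.

    #include <mpi.h>
    #include <cuda_runtime.h>

    int main(int argc, char **argv)
    {
        int rank;
        const int N = 1 << 20;   /* 1M floats */
        float *d_buf;            /* buffer in GPU device memory */

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        cudaMalloc((void **)&d_buf, N * sizeof(float));

        if (rank == 0) {
            /* GPU-aware path (assumed interface): send straight from GPU
               memory; the library pipelines PCIe and network transfers
               internally. */
            MPI_Send(d_buf, N, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(d_buf, N, MPI_FLOAT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        }

        /* Traditional path, for contrast: the application must stage the
           data through a host buffer before calling MPI.

           float *h_buf = malloc(N * sizeof(float));
           cudaMemcpy(h_buf, d_buf, N * sizeof(float),
                      cudaMemcpyDeviceToHost);
           MPI_Send(h_buf, N, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);
        */

        cudaFree(d_buf);
        MPI_Finalize();
        return 0;
    }

With the extension, the staging buffer, the explicit cudaMemcpy calls, and any hand-rolled overlap logic disappear from application code, and the library can overlap the PCIe and network stages internally, which is what yields the reported latency improvement.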

Bio: Ashwin Aji is a second-year PhD candidate in the Department of Computer Science at Virginia Tech. He is advised by Prof. Wu Feng, whose projects include the Green500 list and mpiBLAST, among others. Dr. Feng also advised his Master's work at Virginia Tech, completed in 2008, for which he received the Outstanding M.S. Student award. Ashwin's research interests include high-performance and parallel computing, emerging parallel architectures, and performance modeling. He hopes to complete his doctoral degree by May 2013.
Web links:
Ashwin Aji: http://people.cs.vt.edu/~aaji
Dr. Wu Feng: http://people.cs.vt.edu/~feng