Lift Single Node Memory Limit Using PGAS-like Pragmas for MPI

Jichi Guo
Seminar

As the problem sizes of scientific applications grow, the memory an application needs locally may become too large to fit on a single node. Developers then usually have to modify the source code by hand to redistribute that memory across several nodes. If the memory is accessed irregularly, which makes it difficult to tile, the resulting code can become hundreds of times slower than the original sequential version because of inter-node communication latency. The Partitioned Global Address Space (PGAS) programming model is widely used to express irregular memory accesses over distributed shared memory. However, some PGAS languages, such as UPC, lack sub-communicator support. As a result, they do not let the number of nodes that share memory be set at runtime to fewer than the total number of nodes.

To address this problem, we propose a source-to-source translation framework with a PGAS-like programming interface. It automatically converts local memory accesses into remote memory accesses and requires only minor changes to the source. Most of these changes can be expressed as non-intrusive pragmas, so the same source code can still be compiled to run sequentially on a single node. The generated code uses MPI one-sided communication, which supports sub-communicators for selecting the nodes that share memory. We evaluate our approach on several kernels. Our best implementation keeps the runtime within the same order of magnitude as the sequential single-node implementation.
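To make "converting local memory access to remote memory access" concrete, here is a minimal sketch of the index translation such a framework might perform. It assumes a simple block distribution of one global array over `p` ranks, where `p` would be the size of a sub-communicator, for example one built with `MPI_Comm_split`. Every access `a[i]` becomes a read from the owning rank at a local offset. All names here (`block_dist`, `owner_of`, `local_offset`, `global_read`, `distribute`, `gather_sum`) are hypothetical and are not the framework's actual interface. To keep the example self-contained, the partitions are plain arrays in one process. In generated MPI code, `global_read` would presumably be an `MPI_Get` on a window exposing each rank's partition.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical block distribution: n global elements over p ranks
   (p = size of the sub-communicator that shares the array). */
typedef struct {
    size_t n; /* global number of elements */
    int p;    /* number of ranks sharing the array */
} block_dist;

/* Elements per rank, rounded up so every global index has an owner. */
static size_t block_size(block_dist d) {
    return (d.n + (size_t)d.p - 1) / (size_t)d.p;
}

/* Rank that owns global index i. */
static int owner_of(block_dist d, size_t i) {
    return (int)(i / block_size(d));
}

/* Position of global index i inside its owner's partition
   (the target displacement a one-sided read would use). */
static size_t local_offset(block_dist d, size_t i) {
    return i % block_size(d);
}

/* Single-process stand-in for a remote read: parts[r] plays the role
   of rank r's window memory. With MPI this would be a one-sided get
   from rank owner_of(d, i) at displacement local_offset(d, i). */
static double global_read(double *const parts[], block_dist d, size_t i) {
    return parts[owner_of(d, i)][local_offset(d, i)];
}

/* Allocate p partitions and scatter a global array into them.
   Returns NULL if any allocation fails. */
static double **distribute(const double *global, block_dist d) {
    size_t bs = block_size(d);
    double **parts = malloc((size_t)d.p * sizeof *parts);
    if (!parts) return NULL;
    for (int r = 0; r < d.p; r++) {
        parts[r] = calloc(bs, sizeof **parts);
        if (!parts[r]) {
            while (r--) free(parts[r]); /* release partitions already allocated */
            free(parts);
            return NULL;
        }
    }
    for (size_t i = 0; i < d.n; i++)
        parts[owner_of(d, i)][local_offset(d, i)] = global[i];
    return parts;
}

static void free_parts(double **parts, block_dist d) {
    for (int r = 0; r < d.p; r++) free(parts[r]);
    free(parts);
}

/* A sequential irregular loop `sum += a[idx[k]]` after translation:
   each indirect access becomes a global_read. */
static double gather_sum(double *const parts[], block_dist d,
                         const size_t *idx, size_t m) {
    double sum = 0.0;
    for (size_t k = 0; k < m; k++)
        sum += global_read(parts, d, idx[k]);
    return sum;
}
```

For example, with `n = 10` and `p = 3`, each rank holds 4 slots. Global index 9 maps to rank 2, offset 1. Summing `a[7] + a[2] + a[9] + a[0] + a[5]` touches all three partitions. That scattered pattern is exactly why per-element remote reads are costly and why irregular kernels are hard to tile.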

Bio:
Jichi Guo is a PhD student working on compilers and program optimization at the University of Colorado at Colorado Springs. He completed his MS in computer science at the University of Texas at San Antonio in 2012. His current research interests lie in compiler optimization, analytical performance modeling, and parallel programming.