Design and implementation of an optimized one-sided communication run-time for Global Arrays on Blue Gene/P

Sreeram Potluri
Seminar

The Partitioned Global Address Space (PGAS) models bring together the best features of the shared memory programming model and the message passing model. While providing simple operations for one-sided communication, they expose data distribution information to applications so that they can exploit locality. Though these models provide an efficient interface, their performance largely depends on how efficiently their implementations exploit the features provided by the underlying systems. Some of these implementations rely on other run-time systems to provide the abstraction for remote memory access (RMA) operations. In such cases, the API that the run-time system exposes and its implementation play a crucial role in the overall performance of the PGAS model. This work focuses on Global Arrays, a library-based PGAS model, and its run-time system, ARMCI, on the Blue Gene/P. First, the various features offered by the BG/P system and its low-level communication API, DCMF, were explored and evaluated using micro-benchmarks. The impact of shared system resources on performance was studied at scale. Then, the limitations of the interface and implementation of the current ARMCI run-time were analyzed. Finally, the A1 library, a superset of ARMCI, was designed, providing an API that allows Global Arrays to take advantage of the features and performance offered by the underlying system.

Biography: Sreeram Potluri is a graduate student in the Department of Computer Science and Engineering at The Ohio State University. He is a member of the Network-Based Computing Laboratory led by Dr. D. K. Panda. He received his Bachelor's degree in Computer Science and Engineering from the Jawaharlal Nehru Technological University, Hyderabad, India. His research interests include high-speed interconnects, parallel programming models, and high-end computing applications. His recent work includes optimizing AWP-ODC, a widely used seismic modeling application, using MPI-1 non-blocking and MPI-2 RMA semantics on large-scale InfiniBand clusters. This work was published at ICS'10 and is part of the application's entry as a finalist for the 2010 Gordon Bell Prize. Sreeram is involved in the design and development of MVAPICH2, an open-source high-performance implementation of MPI-2 over InfiniBand and 10GigE/iWARP. This software is currently used by over 1,100 organizations in 56 countries.