Analyzing MPI-3.0 Process-Level Shared Memory: A Case Study with Stencil Computations

Xiaomin Zhu
Seminar

Intranode communication plays an important role in many applications. MPI adopts nearly the same approach for intranode communication as for internode communication, which requires multiple memory copies and is suboptimal. The recent MPI-3.0 standard introduced a process-level shared-memory interface that lets processes on the same node directly access one another's memory. We used the five-point stencil computation as an example to investigate how to use MPI-3.0 shared memory efficiently to achieve true zero-copy intranode communication. We analysed the overheads and proposed solutions to them. With these solutions, the extra overhead of computing over shared memory is eliminated. More importantly, communication performance improves by 40% to 90% on various platforms compared with a version implemented with MPI_Send/Recv.

Bio:

Xiaomin Zhu has been a visiting scholar at Argonne since last March. He is an associate research fellow at the National SuperComputer Center in Jinan, China, where he leads the center's research group. His research interests include parallel computing, parallel I/O, and application performance optimization. Before joining the National SuperComputer Center in Jinan, he received a Ph.D. from the Institute of Computing Technology, Chinese Academy of Sciences, in 2010.