Characterizing the Implications of Intra-

Li Rao
Seminar

Recent advances in computer architecture have led to significant changes in the degree of intra-node parallelism in supercomputing systems. Rapid increases in the number of cores per chip and the use of multi-socket nodes have introduced significant non-uniformity into the latency of memory accesses and on-node communication operations. Hierarchical, node-aware collective communication algorithms have been incorporated into many MPI implementations; however, these algorithms assume uniform intra-node communication latencies between all cores. In this work, we develop a NUMA-aware performance model for intra-node communication and use it to optimize the performance of MPI collective communication operations. We evaluate these NUMA-aware collective communication operations on multi-socket Intel and AMD node architectures and demonstrate that the choice of communication algorithm and topology can yield significant performance gains.

Bio: Li Rao is a master's student at the Institute of Software, Chinese Academy of Sciences (ISCAS), and is expected to graduate in July 2012. His mentor is Dr. Yunquan Zhang, the main organizer of the China TOP100 List of High Performance Computer. Li Rao will continue this work at Argonne for his master's thesis.