Fast and Faithful Performance Prediction of MPI Applications with SimGrid/SMPI: the HPL Case Study

Arnaud Legrand, Le Centre National de la Recherche Scientifique (CNRS/Inria)
Supercomputer showdown

Abstract: Finely tuning MPI applications (number of processes, granularity, collective operation algorithms, topology and process placement) is critical to obtain good performance on supercomputers. With the rising cost of modern supercomputers, running parallel applications at scale solely to optimize their performance is extremely expensive. Inexpensive but faithful predictions of expected performance could therefore be a great help for researchers and system administrators. The methodology we propose captures the complexity of adaptive applications by emulating the MPI code while skipping insignificant parts. I will demonstrate this capability with High Performance Linpack (HPL), the benchmark used to rank supercomputers in the TOP500, which requires careful tuning. I will explain (1) how we slightly modified the open-source version of HPL to allow a fast emulation on a single commodity server at the scale of a supercomputer and (2) how to model the different components (network, BLAS, ...) of the system. I will show that careful modeling of both spatial and temporal node variability allows us to obtain predictions within a few percent of real experiments, which thus provide faithful baselines to compare against.

Bio: Arnaud Legrand has been a senior researcher for CNRS at University Grenoble Alpes since 2004 and has led the Inria POLARIS team since 2016. His research targets the management (mostly from an algorithmic point of view, i.e., scheduling, load balancing, fairness, game theory, online learning, ...) and performance evaluation (in particular through simulation, visualization, statistical analysis, ...) of large-scale distributed computing infrastructures such as clusters, grids, desktop grids, volunteer computing platforms, clouds, ... when used for scientific computing. He is one of the main developers of the SimGrid project, and in the last decade he has been particularly active in promoting better experimental and research practices through tutorials, keynotes, lectures and even a MOOC on reproducible research.

BlueJeans Link: https://bluejeans.com/185367202