A Systematic Approach for the Characterization of Performance Variability

Tom Cornebize
Seminar

HPC production systems are becoming increasingly complex, both in terms of hardware and software. As a result, applications deployed on such systems exhibit performance variability: two executions of the same application with the same parameters will generally not take the same time.

To provide a better understanding of this performance variability across all its dimensions, we present a systematic approach for identifying the different sources of variability and quantifying their contribution to the observed variation. We illustrate this approach with early results from a study of multi-threaded implementations of dgemm.