Compilers in HPC - On Parallel Abstraction Penalties and Domain Knowledge Expecting Performance Estimation

Johannes Doerfert, Postdoctoral Appointee, Argonne National Laboratory

Especially in HPC, applications are commonly tuned to achieve good performance. However, given the freedom that general-purpose languages offer, and the non-trivial effect that any source-level difference, even a syntactic one, can have on performance, this is hardly an easy task. In this talk, we highlight different ongoing efforts in the LLVM compiler framework (which includes Clang and Flang) that directly target HPC applications and developers. We describe how programmers can manually improve performance when OpenMP is used for parallelism or accelerator offloading, before presenting compiler optimizations that automate these transformations. This not only lifts the maintenance burden from the developer, but also encourages code designs centered on the scientific problem rather than on the execution model of the hardware.

In the second part of the talk, we introduce a tool that guides developers towards static domain knowledge that, once manifested in the program, will actually increase performance. We show that there are various opportunities to encode such knowledge, e.g., through restrict-qualified pointers in C/C++, but most of them will not change the performance, often not even the resulting binary. Our tool allows semi-automatic exploration of this vast space by producing suggestions for probably correct and definitively beneficial source-code annotations. The prototype was able to propose minimal code additions or changes that improved the performance of multiple DOE proxy applications by up to 20%.
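To illustrate the restrict-pointer annotation mentioned above (a generic example, not one of the tool's actual suggestions): without `restrict`, the compiler must assume the two pointers may alias, which can force conservative code; the qualifier encodes the domain knowledge that they never overlap, enabling optimizations such as vectorization.

```c
#include <stddef.h>

// Hypothetical example: "restrict" promises the compiler that dst and src
// do not overlap for the duration of the call, so loads from src and stores
// to dst can be reordered and vectorized freely.
void scale(size_t n, float * restrict dst, const float * restrict src, float s) {
  for (size_t i = 0; i < n; ++i)
    dst[i] = s * src[i];
}
```

Whether such an annotation actually changes the generated code depends on the loop and the target, which is precisely the exploration problem the described tool addresses.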

Miscellaneous Information: 

The seminar will be streamed; see details at