Debugging and Correctness Tools on Aurora

JaeHyuk Kwack, ALCF
Webinar
October Dev Session Graphic featuring title, date, and image of speaker.

Debugging in HPC environments requires specialized tools because of the complexity of parallel and distributed systems and the various components on each powerful compute node. In this webinar, ALCF's JaeHyuk Kwack will provide an overview of the debugging and correctness tools available on Aurora, along with instructions on how to use them with applications.

On Aurora, application developers can use Intel oneAPI GDB to inspect GPU kernel execution and identify bugs in GPU-accelerated codes. It includes commands and features tailored for GPU debugging that are not present in standard GDB. Linaro DDT is an advanced parallel debugger for optimizing complex HPC applications at scale on Aurora. Its intuitive graphical interface enables developers to easily identify bugs across thousands of processes. Intel Sanitizer is a correctness tool that detects addressability issues, memory leaks, use of uninitialized memory, data races, and deadlocks in GPU-accelerated applications on Aurora.

JaeHyuk Kwack is a computational scientist in the performance engineering group at the Argonne Leadership Computing Facility (ALCF). He leads the performance profiling and debugging tools effort for ALCF computing resources and is responsible for ensuring that several major scientific applications are ready to run performantly on the Aurora exascale system.