Study of the Behavior of HPC Applications under Checkpointing

Event Sponsor: 
Mathematics and Computer Science Division Seminar
Start Date: 
Dec 5 2018 - 10:45am
Building/Room: 
Building 240/Room 3178
Location: 
Argonne National Laboratory
Speaker(s): 
Shu-Mei Tseng
Speaker(s) Title: 
University of California, Irvine
Host: 
Bogdan Nicolae

This talk presents the experiences and lessons learned in studying the behavior of several HPC applications with and without checkpointing. After a brief introduction to checkpointing, it focuses on several approaches for profiling applications and extracting behavior patterns with respect to CPU, memory, and network utilization. These approaches are tailored for HPC machines such as Theta, where monitoring certain resources such as networking is non-trivial. In the second part, the talk focuses on the results obtained by applying these approaches to two representative HPC applications (HACC and LatticeQCD). In particular, it discusses several findings related to the periodicity of resource utilization and to interference. It concludes with a series of future research directions that exploit these findings.

Miscellaneous Information: 

This seminar will be streamed; see details at https://anlpress.cels.anl.gov/mcs-streaming-seminars.
