ATP and STAT on Theta

Introduction

ATP and STAT are tools for debugging abnormal program terminations such as segfaults. ATP (Abnormal Termination Processing) monitors a program while it runs. If the program crashes, ATP invokes STAT (the Stack Trace Analysis Tool) to merge the stack backtraces of the application processes into a single output file, "atpMergedBT.dot". This merged backtrace file can then be viewed with STAT's visualization tool, stat-view.

Using ATP with stat-view

Scenario:

Your application segfaults when you run it. After the run, the job's stderr file (which defaults to $COBALT_JOBID.error) contains:

user@thetalogin6:~> cat $COBALT_JOBID.error
_pmiu_daemon(SIGCHLD): [NID 03834] [c7-1c2s14n2] [Sat Aug 18 03:21:19 2018] \
                                   PE RANK 30 exit signal Segmentation fault
[NID 03834] 2018-08-18 03:21:19 Apid 4938801: initiated application termination

ATP and stat-view can be used to look into the segfault.

  1. Compile/Link Setup

To use ATP, the atp module must be loaded before you link your application. It is loaded by default on Theta; to verify this, run module list and check that the atp module appears in the output.

user@thetalogin6:~> module list
Currently Loaded Modulefiles:
  1) modules/3.2.10.6
  2) intel/18.0.0.128 
  3) craype-network-aries
  ...     
  16) atp/2.1.2
  17) perftools-base/7.0.2
  ...
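
The check above can also be scripted. The following is a minimal sketch, not an official ALCF recipe: has_atp is a hypothetical helper name, and the sketch assumes that the module command prints its listing to stderr (hence the 2>&1) and that the module is simply named atp.

```shell
#!/bin/bash
# has_atp (hypothetical helper): succeeds if the module listing on stdin
# contains an entry whose name starts with "atp/".
has_atp() {
    grep -Eq '(^|[[:space:]])atp/'
}

# Only query the module system where it exists (e.g. on a Theta login node).
if type module >/dev/null 2>&1; then
    # Assumption: "module list" writes to stderr, so merge it into stdout.
    module list 2>&1 | has_atp || module load atp
else
    echo "module command not available; skipping ATP check"
fi
```

If the atp module is missing, the sketch loads it so that the subsequent link step picks it up.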

Then rebuild your application so that it is linked with the atp module loaded:

user@thetalogin6:~> make

  2. Running the code

Next, the environment variable ATP_ENABLED must be set in the job script to enable ATP.
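
As a rough illustration, a job script might set the variable just before launching the application. This is a sketch under stated assumptions: the value 1 is the conventional way to switch ATP on, and the aprun rank counts and the ./a.out binary name are placeholders, not values from this page.

```shell
#!/bin/bash
# Enable ATP for every application launched from this job script.
export ATP_ENABLED=1

# Placeholder launch line: 64 ranks in total, 32 per node, binary ./a.out.
# The guard lets the script exit cleanly on machines without aprun.
if command -v aprun >/dev/null 2>&1; then
    aprun -n 64 -N 32 ./a.out
else
    echo "aprun not found; ATP_ENABLED=${ATP_ENABLED}"
fi
```

The variable has to be exported before the launch line so that the application processes inherit it.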
