Automatic Performance Collection (AutoPerf)

A library for the automatic collection of hardware performance counter and MPI information is available on the ALCF BG/Q machines (Mira, Cetus, and Vesta). This library transparently collects performance data from running jobs and saves it to files at job completion.

AutoPerf is enabled by default on Cetus, Mira, and Vesta; no action is needed to use the library. Executables compiled or linked on these machines are automatically linked with the AutoPerf library. To disable compiling with the AutoPerf library, set the environment variable


in your shell before compiling. AutoPerf may also be disabled at run time by setting the environment variable AP_DISABLE=1 as part of your qsub command when submitting your job.
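As an illustration of passing a run-time variable such as AP_DISABLE=1 through the submission command, the following minimal Python sketch assembles a qsub argument list. It is a sketch under stated assumptions, not an official ALCF tool: the helper name qsub_command is hypothetical, and the qsub options used here (-n for node count, -t for wall time in minutes, --env with colon-separated NAME=VALUE pairs) are assumed from common Cobalt usage and should be checked against the qsub documentation on your machine.

```python
def qsub_command(executable, nodes, minutes, env=None):
    """Build a qsub argument list (hypothetical helper, not part of AutoPerf).

    Assumption: the scheduler's qsub accepts -n (nodes), -t (minutes) and
    --env NAME=VALUE[:NAME=VALUE...]; verify these on your system.
    """
    args = ["qsub", "-n", str(nodes), "-t", str(minutes)]
    if env:
        # Join the variables into a single colon-separated --env value.
        pairs = ["{}={}".format(name, value) for name, value in env.items()]
        args += ["--env", ":".join(pairs)]
    args.append(executable)
    return args


if __name__ == "__main__":
    # Example: request that AutoPerf be disabled for this run.
    print(" ".join(qsub_command("./a.out", 512, 30, {"AP_DISABLE": 1})))
```

The function only builds the argument list and does not submit anything, so the resulting command can be inspected before it is run by hand or passed to subprocess.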

Codes compiled prior to the installation of AutoPerf will need to be recompiled or relinked in order to use AutoPerf.

To have a summary of the data collected by AutoPerf written to your run directory, set the environment variable "AP_OUTPUT_LOCAL=1" in the qsub command. Upon successful completion of a run, the library creates an output file named ap-<cobaltid>-<jobid> in the system directory /gpfs/{mira,vesta}-fs0/logs/autoperf/{year}/{month}/{day}. If the "AP_OUTPUT_LOCAL" environment variable was set, a local summary file is also created in the run directory.
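The naming scheme above can be used to locate the system-side output for a given job. The following Python sketch searches the log tree for files named ap-<cobaltid>-<jobid>. The function names and the root override parameter are hypothetical and exist only for illustration. Because the exact formatting of the {year}/{month}/{day} directories is not specified here, the sketch wildcards those three levels rather than guessing.

```python
import glob
import os


def autoperf_log_pattern(cobalt_id, filesystem="mira", root=None):
    """Return a glob pattern for the AutoPerf files of one Cobalt job.

    filesystem is "mira" or "vesta", matching /gpfs/{mira,vesta}-fs0.
    root is a hypothetical override of the log directory, useful for testing.
    """
    if root is None:
        root = "/gpfs/{}-fs0/logs/autoperf".format(filesystem)
    # The year, month and day directory levels are wildcarded (assumption:
    # their formatting is unknown, so no specific date layout is built).
    return os.path.join(root, "*", "*", "*", "ap-{}-*".format(cobalt_id))


def find_autoperf_logs(cobalt_id, filesystem="mira", root=None):
    """Return a sorted list of matching AutoPerf output files."""
    return sorted(glob.glob(autoperf_log_pattern(cobalt_id, filesystem, root)))


if __name__ == "__main__":
    # Example: list the output files for Cobalt job 123456 on Mira.
    for path in find_autoperf_logs(123456):
        print(path)
```

A job that runs the executable more than once may produce several files with the same Cobalt ID, which is why the function returns a list rather than a single path.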

AutoPerf output is in plain text and includes MPI usage and performance information indicating which MPI routines were called, how many times each routine was called, the time spent in each routine, and the number of bytes sent or received, if applicable. Data from the hardware performance counters is also collected and written.

The collection of performance data and the generation of performance data files require that the program use MPI, call MPI_Init() and MPI_Finalize(), terminate without error, and not use other performance tools or libraries that utilize the PMPI_ interface and the BGPM performance counter API. The impact of data collection on program run time is expected to be less than 1%.