Core File Settings

About Core Files

By default, a rank that aborts will dump core, and the control system will signal the other ranks to quit (without dumping core). Multiple core files will be generated only when several ranks abort almost simultaneously. The ranks that do not dump core are those that received the system signal to quit before encountering an error of their own. The settings listed below can modify this behaviour in various ways.

The core files generated will be in a lightweight format as opposed to a full binary image. The lightweight core file is in text format and is human-readable, although you may find it useful to employ a tool such as bgq_stack to convert the stack backtrace addresses into symbolic form. It is possible to generate full binary core files (see below), which may be examined with gdb. However, when working with binary core files, restrict the number to just one (or a very small number), because the volume of data puts severe stress on the system.
 

 

Options for Core File Generation

The following environment variables influence core file creation and contents. For regular (non-script) jobs, set them with the qsub argument --env (note: two dashes). For script jobs (--mode script), set them with the --envs or --exp_env options of runjob (note: two dashes in each case). For additional information about setting environment variables in your job, visit http://www.alcf.anl.gov/user-guides/running-jobs#environment-variables.
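
As a rough illustration of the non-script case, the sketch below assembles several core file settings into one value for the qsub --env argument. It assumes that multiple NAME=value pairs are joined with a colon; that separator, the helper name cobalt_env_arg, and the directory shown are assumptions for illustration, so check the environment-variable guide linked above before relying on them.

```python
def cobalt_env_arg(settings):
    """Build the value for a single qsub --env argument.

    Assumption: NAME=value pairs are joined with ':'. Check the
    environment-variable guide for the exact syntax on your system.
    """
    pairs = []
    for name, value in settings.items():
        text = str(value)
        if ":" in text or "=" in name:
            # A ':' inside a value would be mistaken for a pair separator.
            raise ValueError(f"cannot encode {name}={text!r}")
        pairs.append(f"{name}={text}")
    return ":".join(pairs)


if __name__ == "__main__":
    # Hypothetical directory; substitute a path you can write to.
    settings = {
        "BG_COREDUMPONERROR": 1,
        "BG_COREDUMPPATH": "/projects/MyProject/cores",
    }
    print("--env", cobalt_env_arg(settings))
```

The printed fragment is meant to be added to an otherwise complete qsub command line. Rejecting values that contain a colon is a deliberately cautious choice, since such a value could be split into two settings under the assumed syntax.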

 

BG_COREDUMPONEXIT=1

Creates a core file when the application exits. This is useful when the application calls exit() and the cause and location of that call are not known.

BG_COREDUMPONERROR=1

Creates a core file when the application exits with non-zero status.

BG_COREDUMPDISABLED=1

Disables creation of any core files.

BG_COREDUMPFILEPREFIX

Sets the filename prefix of the core files. The default is "core". The MPI task number is appended to this prefix to form the filename.

BG_COREDUMPPATH

Sets the directory for the core files.  (The default is the current working directory.)
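
To show how the prefix and path settings combine, the following sketch predicts where the core files for given ranks would be written. It assumes the MPI task number is joined to the prefix with a "." (for example core.0); that separator and the function name expected_core_paths are assumptions, not something this page specifies.

```python
import posixpath


def expected_core_paths(ranks, env, cwd=".", separator="."):
    """Predict core file locations from the job's environment settings.

    env is a dict of the variables passed to the job; cwd is the job's
    working directory. The '.' separator is an assumption.
    """
    prefix = env.get("BG_COREDUMPFILEPREFIX", "core")  # documented default
    directory = env.get("BG_COREDUMPPATH", cwd)        # default: working dir
    return [
        posixpath.join(directory, f"{prefix}{separator}{rank}")
        for rank in ranks
    ]


if __name__ == "__main__":
    # Hypothetical settings for illustration only.
    job_env = {"BG_COREDUMPFILEPREFIX": "mycore", "BG_COREDUMPPATH": "/tmp/cores"}
    for path in expected_core_paths([0, 17], job_env):
        print(path)
```

A helper like this can be handy in a post-run script that checks whether any rank left a core file behind.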

 

Format Controls

Lightweight Core Files

By default, the created core files are in a lightweight format rather than a full binary dump format. 

The following Boolean settings control whether or not register information is included in the lightweight core files:

BG_COREDUMP_REGS

The master switch for register information.

BG_COREDUMP_GPR

GPR (integer) registers.

BG_COREDUMPFPR

FPR (floating-point) registers.

BG_COREDUMP_SPR

SPR (special purpose) registers.

The following Boolean settings control fields in the lightweight core files:

BG_COREDUMP_PERS

The node's personality information (e.g., XYZ dimension location, memory size).

BG_COREDUMP_INTCOUNT

The number of interrupts handled by the node.

BG_COREDUMP_TLBS

TLB memory layout for the cores.

BG_COREDUMPSTACK

Application stack address information.

 

Full Binary Core Files

It is also possible for a job to dump a full binary core file, as shown below:

BG_COREDUMPBINARY=0

Generates a full core file only for rank 0.

IMPORTANT: A problem with this option has been found and reported to IBM.  At this time we recommend only using this on single-rank test jobs.

BG_COREDUMPBINARY=1,2,3

Generates full core files only for ranks 1, 2, and 3.

IMPORTANT: A problem with this option has been found and reported to IBM.  At this time we recommend only using this on single-rank test jobs.

BG_COREDUMPBINARY=*

Generates full core files for all ranks (NOT RECOMMENDED).

Use extreme caution when enabling full binary core-file generation. The volume of data written can be huge, so dumping more than a small number of ranks is not recommended.

The full binary core files may be examined using gdb.
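
The three value forms above (a single rank, a comma-separated list, or *) can be produced by a small helper such as the sketch below. The function name binary_core_setting, the max_ranks limit of 4, and the allow_all opt-in are illustrative choices meant to reflect the caution advised here; they are not part of the system.

```python
def binary_core_setting(ranks, max_ranks=4, allow_all=False):
    """Return a BG_COREDUMPBINARY=... assignment for the chosen ranks.

    ranks is an iterable of rank numbers, or the string '*' for all ranks.
    max_ranks is an arbitrary guard value, not a system limit.
    """
    if ranks == "*":
        if not allow_all:
            raise ValueError("dumping all ranks is not recommended")
        return "BG_COREDUMPBINARY=*"
    unique = sorted({int(rank) for rank in ranks})
    if not unique:
        raise ValueError("no ranks given")
    if unique[0] < 0:
        raise ValueError("rank numbers must be non-negative")
    if len(unique) > max_ranks:
        raise ValueError(f"refusing to request more than {max_ranks} binary cores")
    return "BG_COREDUMPBINARY=" + ",".join(str(rank) for rank in unique)


if __name__ == "__main__":
    print(binary_core_setting([0]))
    print(binary_core_setting([3, 1, 2]))
```

Requiring an explicit allow_all=True for the * form keeps the costly option from being selected by accident.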

For additional information, see the Blue Gene/Q Application Development Manual.