Darshan

Introduction

Darshan is a lightweight I/O instrumentation library that can be used to investigate the I/O behavior of production applications. It records statistics, such as the number of files opened, time spent performing I/O, and the amount of data accessed by an application.

XC40

Theta, a Cray XC40 system, includes the Darshan module in its default environment.

$ module list 2>&1 | grep darshan
21) darshan/3.1.4

In most cases, no additional steps are needed to enable Darshan instrumentation. Code compiled with the Cray compiler wrappers {cc, CC, ftn} will include the Darshan library by default. Dynamically linked applications are the most notable exception. See the “Dynamic Linking” section later in this document for instructions on how to enable Darshan instrumentation if you plan to use dynamic libraries.

When a Darshan-enabled job completes, it will generate a single output file containing I/O characterization results. Each output file is placed in the following directory based on the start time of the job:

  • /lus/theta-fs0/logs/darshan/theta/<YEAR>/<MONTH>/<DAY>

The name of the output file will be in the format:

  • <USERNAME>_<BINARY_NAME>_id<COBALT_JOB_ID>_<DATE>-<UNIQUE_ID>_<TIMING>.darshan
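
As an illustration of the naming format above, the following minimal Python sketch splits a log file name into its fields. The function name parse_log_name is hypothetical, and the pattern rests on two assumptions that the format description does not confirm: user names contain no underscore, and <DATE> is two numbers joined by a hyphen, as in the example file name used in this document.

```python
import re

# Pattern for <USERNAME>_<BINARY_NAME>_id<COBALT_JOB_ID>_<DATE>-<UNIQUE_ID>_<TIMING>.darshan
# Assumptions (not confirmed by the format description): the user name has no
# underscore, and <DATE> is two numbers joined by "-" (for example "7-27").
LOG_NAME = re.compile(
    r"^(?P<username>[^_]+)_(?P<binary>.+)_id(?P<job_id>\d+)"
    r"_(?P<date>\d+-\d+)-(?P<unique_id>\d+)_(?P<timing>\d+)"
    r"\.darshan(?:\.gz)?$"
)


def parse_log_name(file_name):
    """Return the fields of a Darshan log file name as a dict, or None if it does not match."""
    match = LOG_NAME.match(file_name)
    return match.groupdict() if match else None


fields = parse_log_name("carns_my-app_id114525_7-27-58921_19.darshan")
print(fields["username"], fields["binary"], fields["job_id"])
# prints: carns my-app 114525
```

The optional .gz suffix in the pattern is only there so that the same sketch also accepts compressed log names.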

A graphical summary of I/O behavior can be generated using the darshan-job-summary.pl utility. The utility should be available in your default path, but if not, it can be loaded using the module command:

$ module load darshan

The following example shows how to execute the utility:

$ darshan-job-summary.pl /lus/theta-fs0/logs/darshan/theta/<YEAR>/<MONTH>/<DAY>/carns_my-app_id114525_7-27-58921_19.darshan --output ~/job-summary.pdf

The entire contents of the output file can be translated into text format for more detailed analysis using the following command:

$ darshan-parser /lus/theta-fs0/logs/darshan/theta/<YEAR>/<MONTH>/<DAY>/carns_my-app_id114525_7-27-58921_19.darshan > ~/job-characterization.txt

Note: The resulting text file will be verbose. To interpret its contents, use the guidelines in the Guide to Darshan-parser Output.
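
Because the text output is verbose, a small script can help pull out a few totals. The sketch below is not part of Darshan; it assumes that each data line is whitespace-separated with the counter name in the fourth column and its value in the fifth, and that comment lines start with "#". The function name sum_counters and the default counter names POSIX_BYTES_READ and POSIX_BYTES_WRITTEN are assumptions to check against your own parser output.

```python
from collections import defaultdict


def sum_counters(lines, wanted=("POSIX_BYTES_READ", "POSIX_BYTES_WRITTEN")):
    """Sum the selected integer counters over all records in darshan-parser text output.

    Assumed line layout: <module> <rank> <record id> <counter> <value> <file name> ...
    """
    totals = defaultdict(int)
    for line in lines:
        # Skip blank lines and the "#" header/comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) < 5:
            continue
        counter, value = parts[3], parts[4]
        if counter in wanted:
            totals[counter] += int(value)
    return dict(totals)


# Possible usage on the file produced by the command above:
#
#     import os
#     with open(os.path.expanduser("~/job-characterization.txt")) as f:
#         print(sum_counters(f))
```

Only integer counters are handled; floating-point counters would need float() instead of int().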

Dynamic Linking

Darshan can also instrument dynamically linked applications, but in this case you must explicitly set environment variables in both your job script and your qsub command, as in the following example. The DARSHAN_PRELOAD variable is set automatically when the Darshan module is loaded; the commands below simply relay it to the application's runtime environment.

# job_script.sh
aprun -n <n> -N <N> -e LD_PRELOAD=$DARSHAN_PRELOAD <binary> <args>

$ qsub <..> --env DARSHAN_PRELOAD=$DARSHAN_PRELOAD job_script.sh
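
If a job is launched from a Python driver rather than a plain shell script, the same relay step can be sketched as follows. This is an illustrative helper only: the function name darshan_aprun_command and its parameters are hypothetical, and it uses just the aprun options shown above. It fails early when DARSHAN_PRELOAD has not reached the job environment, which would otherwise leave the run silently uninstrumented.

```python
import os


def darshan_aprun_command(binary, args, n_ranks, ranks_per_node, env=None):
    """Build an aprun argument list that preloads Darshan for a dynamically linked binary."""
    env = os.environ if env is None else env
    preload = env.get("DARSHAN_PRELOAD")
    if not preload:
        # Without this variable, LD_PRELOAD would be empty and Darshan would not be loaded.
        raise RuntimeError(
            "DARSHAN_PRELOAD is not set; load the darshan module and pass "
            "--env DARSHAN_PRELOAD=$DARSHAN_PRELOAD to qsub"
        )
    return [
        "aprun", "-n", str(n_ranks), "-N", str(ranks_per_node),
        "-e", f"LD_PRELOAD={preload}", binary, *args,
    ]
```

The returned list can be passed to subprocess.run() from within the job script.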

Possible Reasons for Missing Output Files

Darshan will not produce output files in the following scenarios:

  • Use of languages besides C, C++, or Fortran
  • Use of non-standard MPI libraries or linkers
  • Use of other MPI profilers that conflict with Darshan
  • Use of dynamic linking without using LD_PRELOAD
  • Job did not call MPI_Finalize(). Reasons may include:
    • Job hit wall time limit
    • Abnormal termination
    • The executable is not an MPI program

In such cases, contact ALCF Support for help. Depending on your situation, it may still be possible to use Darshan.
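
Before contacting support, it can help to confirm whether a log was written at all. The following sketch looks in the log directory for a given job start date and lists the files whose names match a user name and Cobalt job ID. The function name find_job_logs is hypothetical, and the sketch assumes that the <MONTH> and <DAY> directory names are not zero-padded, which matches the unpadded date in the example file name but should be checked against the actual directory listing.

```python
from pathlib import Path


def find_job_logs(log_root, year, month, day, username, job_id):
    """Return the Darshan logs for one job, given the date on which the job started."""
    # Assumption: directories are <YEAR>/<MONTH>/<DAY> without zero padding.
    day_dir = Path(log_root) / str(year) / str(month) / str(day)
    if not day_dir.is_dir():
        return []
    # File names follow <USERNAME>_<BINARY_NAME>_id<COBALT_JOB_ID>_...
    return sorted(day_dir.glob(f"{username}_*_id{job_id}_*.darshan*"))


# Possible usage on Theta (the date values here are placeholders):
#
#     logs = find_job_logs("/lus/theta-fs0/logs/darshan/theta", 2017, 7, 27, "carns", 114525)
#     print(logs if logs else "no Darshan log found for this job")
```

An empty result suggests that one of the scenarios listed above applies to the job.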

Disabling Darshan

We do not recommend disabling Darshan unless you have a specific problem or have been instructed by the ALCF support team to do so. Disabling Darshan limits the ALCF’s ability to assist in supporting your application, and Darshan instrumentation does not add significant overhead to execution time.

Disabling at Compile Time

Unload the Darshan module before building. When the application is then linked, the intercept library will no longer be included.

$ module unload darshan
$ make

Disabling at Runtime

Darshan can be disabled by setting the DARSHAN_DISABLE=1 environment variable on the aprun command. This does not require relinking the application, and Darshan can be deactivated on a case-by-case basis for existing executables.

# job_script.sh
aprun -n <n> -N <N> -e DARSHAN_DISABLE=1 <binary> <args>

Mira, Cetus, and Vesta

When a Darshan-enabled job completes, it will generate a single output file containing I/O characterization results. Each output file is placed in the following directory:

  • Mira or Cetus: /gpfs/mira-fs0/logs/darshan/mira/<YEAR>/<MONTH>/<DAY>
  • Vesta: /gpfs/vesta-fs0/logs/darshan/vesta/<YEAR>/<MONTH>/<DAY>

The name of the output file will be in the format:

  • <USERNAME>_<BINARY_NAME>_id<COBALT_JOB_ID>_<DATE>-<UNIQUE_ID>_<TIMING>.darshan.gz

A graphical summary of I/O behavior can be generated using the darshan-job-summary.pl utility. This utility is installed on Mira, Vesta, and the Cooley analytics cluster. To use it, first add the SoftEnv key +darshan to your ~/.soft file (on Mira or Vesta) or your ~/.soft.cooley file (on Cooley) if it is not already present, and then run the "resoft" command. The following example shows how to execute the utility.

# on Mira or Cooley login node:
darshan-job-summary.pl /gpfs/mira-fs0/logs/darshan/mira/<YEAR>/<MONTH>/<DAY>/carns_my-app_id114525_7-27-58921_19.darshan.gz --output ~/job-summary.pdf
# on Vesta login node:
darshan-job-summary.pl /gpfs/vesta-fs0/logs/darshan/vesta/<YEAR>/<MONTH>/<DAY>/carns_my-app_id114525_7-27-58921_19.darshan.gz --output ~/job-summary.pdf

The entire contents of the output file can be translated into text format for more detailed analysis using the following command, which is available on Mira, Vesta, and Cooley:

# on Mira or Cooley:
darshan-parser /gpfs/mira-fs0/logs/darshan/mira/<YEAR>/<MONTH>/<DAY>/carns_my-app_id114525_7-27-58921_19.darshan.gz > ~/job-characterization.txt
# on Vesta:
darshan-parser /gpfs/vesta-fs0/logs/darshan/vesta/<YEAR>/<MONTH>/<DAY>/carns_my-app_id114525_7-27-58921_19.darshan.gz > ~/job-characterization.txt

Note: The resulting text file will be verbose. To interpret its contents, use the guidelines in the Guide to Darshan-parser Output.

Disabling Darshan on Mira or Vesta

Disabling Darshan is discouraged unless you have a specific problem or have been instructed by the ALCF support team to do so. Disabling Darshan limits the ALCF’s ability to assist in supporting your application.

Darshan can be disabled by setting the DARSHAN_DISABLE=1 environment variable. If this variable is set at compile time, Darshan instrumentation will not be included in your executable at all. It can also be set at run time (in your job submission) to deactivate Darshan on a case-by-case basis for existing executables.

Possible Problems on Mira or Vesta

Darshan will not produce output files in the following scenarios:

  • Use of any language besides C, C++, or Fortran
  • Use of non-standard MPI libraries or linkers
  • Use of MPI profilers
    • Darshan defers to any other tool that uses the PMPI profiling interface
  • Use of dynamic linking
  • Job did not call MPI_Finalize(). Reasons may include:
    • Job hit wall time limit
    • Abnormal termination
    • The executable is not an MPI program

In such cases, contact ALCF Support for help. Depending on your situation, it may still be possible to use Darshan.

Cooley

Darshan is not automatically enabled for all jobs on Cooley. Unlike Mira and Vesta, all applications on Cooley are dynamically linked by default, which means that Darshan must be loaded at runtime using the LD_PRELOAD environment variable. In order to instrument a job on Cooley, you must first add the SoftEnv key +darshan to your ~/.soft.cooley file and run the “resoft” command. Then add the following to the mpirun command line in your job script:

   --env LD_PRELOAD=$DARSHAN_PRELOAD

Example

   # within Cooley job script
   mpirun --env LD_PRELOAD=$DARSHAN_PRELOAD -np <number of processes> -f $COBALT_NODEFILE ./app.exe <arguments>

After your job completes, you can find the Darshan output file in the following directory:

   /gpfs/mira-fs0/logs/darshan/tukey/<year>/<month>/<day>

Note that the path component is tukey, not cooley, even though the data is generated on Cooley.

The same tools described in the Mira, Cetus, and Vesta section above can be used to interpret Darshan output files generated on Cooley.