OpenSpeedShop is an open-source performance tool for analyzing applications. It provides:

  • Sampling Experiments
  • Support for Callstack Analysis
  • Hardware Performance Counters
  • MPI Profiling and Tracing
  • I/O Profiling and Tracing
  • Floating Point Exception Analysis

The individual experiments and what each one measures are listed below:

pcsamp     Periodically sampling the program counter gives a low-overhead view of where time is spent in the user application.
usertime   Periodically sampling the call path lets the user view inclusive and exclusive time spent in application routines, and see which routines called which other routines. Several views are available, including the “hot” path.
hwc        Hardware events (including clock cycles, graduated instructions, i- and d-cache and TLB misses, and floating-point operations) are counted at the machine instruction, source line, and function levels.
hwcsamp    Similar to hwc, except that sampling is based on time, not PAPI event overflows. Also, up to six events may be sampled during the same experiment.
hwctime    Similar to hwc, except that call path sampling is also included.
io         Accumulated wall-clock durations of I/O system calls: read, readv, write, writev, open, close, dup, pipe, creat, and others.
iot        Similar to io, except that more information is gathered, such as bytes moved and file names.
mpi        Captures the time spent in, and the number of calls to, each MPI function. A trace format option displays the data for each call, showing its start and end times.
mpit       Records each MPI function call event with specific data for display in the GUI or the command line interface (CLI).
fpe        Finds where each floating-point exception occurred. A trace records each exception with its type and the call stack contents. These measurements are exact, not statistical.



Using OpenSpeedShop

Because of the unique nature of the BG/Q platform, typical OpenSpeedShop usage models do not apply. On clusters, a user can gather and display a default view of the performance information in one step using a command such as:

osspcsamp "how you normally run your application"

On the BG/Q platform, however, gathering and displaying the same performance information takes several steps.

STEP 1: To include OpenSpeedShop's performance gathering collectors (one for each type of data collected), users must relink their application. First, load the OpenSpeedShop runtime environment:

resoft openspeedshop

OpenSpeedShop provides a script, osslink, that links in the OpenSpeedShop libraries needed to gather the performance information. An example using smg2000 follows: to create new build targets, we prepend osslink and the collector type (-c <collector type>) to the normal link command.

smg2000: smg2000.o
	@echo  "Linking" $@ "... "
	${CC} -o smg2000 smg2000.o ${LFLAGS}

# Target to create a program counter sampling experiment that 
# uses a timer to periodically interrupt the application and record where it is.
smg2000-pcsamp: smg2000.o
	@echo  "Linking" $@ "... "
	osslink -c pcsamp  ${CC} -o smg2000-pcsamp smg2000.o ${LFLAGS}

# Target to create a hardware counter sampling experiment that 
# uses PAPI to gather hardware counter event information for up to 6 events
smg2000-hwcsamp: smg2000.o
	@echo  "Linking" $@ "... "
	osslink -c hwcsamp  ${CC} -o smg2000-hwcsamp smg2000.o ${LFLAGS}

# Target to create a hardware counter overflow experiment that 
# uses PAPI to gather hardware counter event information for a single event
smg2000-hwc: smg2000.o
	@echo  "Linking" $@ "... "
	osslink -c hwc  ${CC} -o smg2000-hwc smg2000.o ${LFLAGS}
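The Makefile above defines one target per collector. If you want several instrumented variants at once, a short loop over make can build them. This is a minimal sketch assuming bash; build_variants is a hypothetical helper, not part of OpenSpeedShop, and it assumes the targets follow the smg2000-<collector> naming used above.

```shell
# Hypothetical helper (not an OpenSpeedShop command): build each
# instrumented smg2000 variant, assuming Makefile targets named
# smg2000-<collector>. Stops at the first target that fails.
build_variants() {
  local collector
  for collector in "$@"; do
    make "smg2000-${collector}" || return 1
  done
}

# Usage, from the directory containing the Makefile:
# build_variants pcsamp hwcsamp hwc
```

Stopping at the first failure keeps a broken link step from being hidden by the output of later targets.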

Then, to build an smg2000 executable that gathers program counter sampling information:

make smg2000-pcsamp
Linking smg2000-pcsamp ... 
osslink -c pcsamp /bgsys/drivers/ppcfloor/comm/gcc/bin/mpixlc -o smg2000-pcsamp smg2000.o -L. -L../struct_ls -L../struct_mv -L../krylov -L../utilities -lHYPRE_struct_ls -lHYPRE_struct_mv -lkrylov -lHYPRE_utilities -lm


Now that the OpenSpeedShop collectors are linked into the application, run it to gather the performance information. While the application runs, OpenSpeedShop writes raw data files to a shared file system; the OPENSS_RAWDATA_DIR environment variable tells it where. The user is responsible for passing this environment variable when submitting the job and for making sure the directory is empty before each execution. Here is a typical job submission for the smg2000 example from Step 1:

rm -rf /veas-fs0/jgalaro/RAW_PCSAMP_SMG2000
mkdir /veas-fs0/jgalaro/RAW_PCSAMP_SMG2000
qsub -A PEACEndStation -t 40 -n 128 --mode c2 --proccount 256 --env BG_SHAREDMEMSIZE=32MB:PAMID_VERBOSE=1:OPENSS_RAWDATA_DIR=/veas-fs0/jgalaro/RAW_PCSAMP_SMG2000 /veas-fs0/jgalaro/demos/smg2000/test/smg2000-pcsamp -n 50 50 50
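Because the raw data directory must be empty before every run, the two preparation commands above (rm -rf and mkdir) can be wrapped in a small function. This is a sketch assuming bash; reset_rawdata_dir is a hypothetical helper, not an OpenSpeedShop command, and it refuses an empty argument or "/" before calling rm -rf.

```shell
# Hypothetical helper (not an OpenSpeedShop command): make sure the
# OPENSS_RAWDATA_DIR location exists and is empty before each job.
reset_rawdata_dir() {
  local dir=$1
  # Guard against an empty argument or the root directory before rm -rf.
  if [ -z "$dir" ] || [ "$dir" = "/" ]; then
    echo "reset_rawdata_dir: refusing to reset '$dir'" >&2
    return 1
  fi
  rm -rf -- "$dir"
  mkdir -p -- "$dir"
}

# Usage, with the directory from the example above:
# reset_rawdata_dir /veas-fs0/jgalaro/RAW_PCSAMP_SMG2000
```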

After the application runs, the OpenSpeedShop raw data files will be created in the location specified by OPENSS_RAWDATA_DIR.
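Before converting the data, it can be worth confirming that the job actually wrote raw data files. A minimal sketch, assuming bash; has_rawdata is a hypothetical helper, not an OpenSpeedShop command.

```shell
# Hypothetical helper (not an OpenSpeedShop command): succeed only if the
# raw data directory exists and contains at least one entry.
has_rawdata() {
  local dir=$1
  [ -d "$dir" ] && [ -n "$(ls -A -- "$dir")" ]
}

# Usage: only run the conversion step if the job produced data.
# RAW=/veas-fs0/jgalaro/RAW_PCSAMP_SMG2000
# has_rawdata "$RAW" && ossutil "$RAW"
```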


Run the ossutil script to convert the raw data files into an OpenSpeedShop database file for viewing with the graphical user interface (GUI) or the interactive command line interface (CLI).

ossutil /veas-fs0/jgalaro/RAW_PCSAMP_SMG2000
Processing raw data for smg2000
Processing processes and threads ...
Processing performance data ...
Processing functions and statements ...

When ossutil finishes, it will have created an OpenSpeedShop database file containing all of the application's symbols and the performance information gathered. This database file can be viewed where it is or moved to another machine for viewing; the application itself is not needed to view the performance data. OpenSpeedShop database files have ".openss" as their suffix, and database files created by ossutil are named X.n.openss. You can give the database file a more meaningful name by renaming it with mv.
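If several runs have been converted in the same directory, it can help to rename the newest database right after ossutil finishes. A sketch assuming bash; rename_latest_db is a hypothetical helper, not an OpenSpeedShop command, and the new name in the usage line is illustrative.

```shell
# Hypothetical helper (not an OpenSpeedShop command): rename the most
# recently modified *.openss file in the current directory.
rename_latest_db() {
  local new=$1 f latest=
  for f in *.openss; do
    [ -e "$f" ] || continue          # no match: the glob stays literal
    if [ -z "$latest" ] || [ "$f" -nt "$latest" ]; then
      latest=$f                      # keep the newest file seen so far
    fi
  done
  if [ -z "$latest" ]; then
    echo "rename_latest_db: no .openss file found" >&2
    return 1
  fi
  mv -- "$latest" "$new"
}

# Usage, in the directory where ossutil wrote the database:
# rename_latest_db smg2000-pcsamp-128nodes.openss
```

Comparing files with -nt avoids parsing ls output, so unusual file names are handled safely.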


View the performance information using the graphical user interface (GUI) or the interactive command line interface (CLI). The OpenSpeedShop quick start guide is handy for showing some of the GUI and CLI features and commands.


Another useful tool is the osscompare script, which can be used on the BG/Q front end (FE). It compares database files to each other and creates a side-by-side comparison listing. For example, you can make a baseline run and compare it against runs made after source modifications to see whether those modifications affected application performance.
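osscompare is assumed here to take the databases to compare as a single quoted, comma-separated list; check the osscompare man page to confirm the exact syntax on your installation. A small hypothetical helper (oss_compare_list, not part of OpenSpeedShop, bash assumed) can build that argument; the database names in the usage line are illustrative.

```shell
# Hypothetical helper (not an OpenSpeedShop command): join database file
# names with commas, the list form osscompare is assumed to accept.
oss_compare_list() {
  local IFS=,
  # "$*" joins all arguments using the first character of IFS.
  printf '%s\n' "$*"
}

# Usage on the BG/Q front end (baseline run vs. modified source):
# osscompare "$(oss_compare_list smg2000-base.openss smg2000-modified.openss)"
```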

Man pages are available for osscompare, osspcsamp, ossusertime, etc. Even though you cannot use osspcsamp, ossusertime, etc. to gather data on BG/Q as you would on a standard cluster, their man pages contain a good deal of information about each experiment type that is still useful on the BG/Q platform.

The following environment variables control the sampling rate and what is gathered for each experiment. Pass these settings the same way as OPENSS_RAWDATA_DIR in the job submission example above.

For pcsamp: OPENSS_PCSAMP_RATE=100 (default: 100 samples per second)
For usertime: OPENSS_USERTIME_RATE=35 (default: 35 samples per second)
For hwc, hwctime: OPENSS_HWC_EVENT=PAPI_TOT_CYC (default PAPI event; you can specify your own event)
For hwcsamp: OPENSS_HWCSAMP_EVENTS=PAPI_TOT_CYC,PAPI_FP_OPS (you can specify your own list of events)
For io: OPENSS_IO_TRACED=<all I/O functions>; you can specify a subset
For iot: OPENSS_IOT_TRACED=<all I/O functions>; you can specify a subset
For mpi: OPENSS_MPI_TRACED=<all MPI functions>; you can specify a subset
For mpit: OPENSS_MPIT_TRACED=<all MPI functions>; you can specify a subset
For fpe: OPENSS_FPE_EVENT=<all FPE exceptions>; you can specify a subset
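In the job submission example above, qsub --env takes VAR=value pairs separated by colons, so a value that itself contains ':' would be split incorrectly. The sketch below (bash assumed; oss_env is a hypothetical helper, not an OpenSpeedShop command) joins the pairs and rejects values containing ':'. The hwcsamp directory name in the usage lines is illustrative.

```shell
# Hypothetical helper (not an OpenSpeedShop command): join VAR=value pairs
# with ':' for qsub --env, rejecting pairs that would break the list.
oss_env() {
  local pair
  for pair in "$@"; do
    case $pair in
      *:*) echo "oss_env: '$pair' contains ':', the --env separator" >&2; return 1 ;;
      *=*) ;;
      *)   echo "oss_env: '$pair' is not VAR=value" >&2; return 1 ;;
    esac
  done
  local IFS=:
  printf '%s\n' "$*"
}

# Usage for an hwcsamp run:
# ENV=$(oss_env BG_SHAREDMEMSIZE=32MB \
#   OPENSS_RAWDATA_DIR=/veas-fs0/jgalaro/RAW_HWCSAMP_SMG2000 \
#   OPENSS_HWCSAMP_EVENTS=PAPI_TOT_CYC,PAPI_FP_OPS)
# qsub -A PEACEndStation -t 40 -n 128 --mode c2 --proccount 256 \
#   --env "$ENV" /veas-fs0/jgalaro/demos/smg2000/test/smg2000-hwcsamp -n 50 50 50
```

Comma-separated values such as the hwcsamp event list pass through unchanged, since only ':' separates the pairs.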

Additional Information

Additional information may be found on the OpenSpeedShop website -


Known Issues on BG/Q

Of the experiment types that OpenSpeedShop supports, all except fpe currently work on BG/Q:

pcsamp    - Program counter sampling
hwc       - Hardware counter overflow
hwcsamp   - Hardware counter sampling
usertime  - Call path profiling, hot call path detection, inclusive and exclusive CPU time
hwctime   - Hardware counter overflow with call path profiling
io, iot   - I/O function tracing experiments
mpi, mpit - MPI function tracing experiments
fpe       - Floating point exception tracing (not currently working)