gdb

Using gdb

Preliminaries

You should prepare a debug version of your code:

  • Compile using -O0 -g
  • If you are using the XL compilers with OpenMP, also add -qsmp=omp:noopt:noauto.
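
As a minimal sketch of the list above, the flags can be collected in a small shell helper. The function name debug_flags and its two arguments are hypothetical and are not part of any ALCF tool; only the flags themselves come from this page.

```shell
#!/bin/sh
# Hypothetical helper: prints the debug flags listed above.
#   $1 = compiler family ("xl" for the XL compilers, anything else otherwise)
#   $2 = "omp" if the code uses OpenMP
debug_flags() {
    flags="-O0 -g"
    # The extra option applies only to XL compilers combined with OpenMP.
    if [ "$1" = "xl" ] && [ "$2" = "omp" ]; then
        flags="$flags -qsmp=omp:noopt:noauto"
    fi
    printf '%s\n' "$flags"
}

debug_flags xl omp    # prints: -O0 -g -qsmp=omp:noopt:noauto
debug_flags gnu omp   # prints: -O0 -g
```

The output would then be passed to whatever compiler command you normally use to build yourprogram.exe.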

Next, to use gdb you must allocate a partition by submitting an interactive job with "qsub -I" through the Cobalt queuing system. If you cannot get reasonable job turnaround for debugging, request a reservation for debugging. (See http://www.alcf.anl.gov/user-guides/reservations)

Debugging with gdb under Cobalt using "qsub -I"

The command “qsub -I” submits an interactive Cobalt job.  When the job reaches the head of the queue and runs, a partition is allocated for your use and you are given a shell prompt.  You may then issue a runjob command to start a debugging run.  For example:

qsub -I -q prod -t 30 -n 32 -A myproject

Wait until your job reaches the head of the queue and runs (use qstat in another window to check on it). When it starts, you will receive another shell prompt:

>qsub -I -t 30 -n 32
project: Performance
Wait for job 195057 to start...
Opening interactive session to VST-20420-31531-32
>

Note that the Cobalt interactive job will remain active until either the job's wallclock time expires, or you exit the shell.

Before proceeding, run the command "wait-boot" to ensure that your block has completed booting:

>wait-boot
Checking status of VST-22040-33151-32
Block state: B (please wait)
Block state: B (please wait)
Block state: B (please wait)
Block state: B (please wait)
Block state: I - READY TO RUN!
>

At this point, you can enter:

# Important: don't forget the --block $COBALT_PARTNAME
runjob --block $COBALT_PARTNAME --np 4 -p 16 --start-tool /sbin/gdbtool --tool-args "--rank=0 --listen_port=10000" : yourprogram.exe

# Note the underscore in "--listen_port"

The number of MPI tasks (--np) together with the mode (-p) can be anything that fits in the node count (-n) requested with the initial qsub arguments (in this case, 32). After runjob starts, you will get a prompt from the gdb server. Query it for the IP address that corresponds to the rank to which you want to attach the debugger; for example, to get the IP address for rank 0, type 0. You will paste this IP address, along with the port number you chose with --listen_port, into the gdb client's target command in a later step.

Enter a rank to see its associated I/O node's IP address, or press enter to start the job:
0
rank 0 uses I/O node Q02-I7-J01 at IP address 172.25.67.1

NOTE: Do not hit <return> on a blank line yet. This will start execution prematurely.
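
The requirement that --np and -p fit within the node count can be checked before runjob is invoked. The following is only a sketch: the function names are hypothetical, and it assumes that -p is the number of ranks per node, so that the capacity is the node count multiplied by the -p value. It prints the runjob command rather than running it.

```shell
#!/bin/sh
# Hypothetical pre-flight check: succeeds only if the MPI tasks fit.
#   $1 = nodes requested with qsub -n
#   $2 = ranks per node (runjob -p)
#   $3 = total MPI tasks (runjob --np)
fits_allocation() {
    capacity=$(( $1 * $2 ))
    [ "$3" -le "$capacity" ]
}

# Print (do not run) the runjob command from this page for a given layout.
#   $1 = nodes, $2 = ranks per node, $3 = MPI tasks, $4 = gdbtool listen port
# $COBALT_PARTNAME is printed literally so the shell expands it when the
# command is pasted inside the interactive job.
print_runjob_cmd() {
    nodes=$1; ppn=$2; np=$3; port=$4
    if ! fits_allocation "$nodes" "$ppn" "$np"; then
        echo "error: --np $np exceeds $nodes nodes x $ppn ranks per node" >&2
        return 1
    fi
    printf 'runjob --block $COBALT_PARTNAME --np %s -p %s --start-tool /sbin/gdbtool --tool-args "--rank=0 --listen_port=%s" : yourprogram.exe\n' \
        "$np" "$ppn" "$port"
}

# 4 tasks at 16 ranks per node on 32 nodes: 4 <= 512, so the command is printed.
print_runjob_cmd 32 16 4 10000
```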

In another shell window, run the gdb client (note: this is not the default gdb in your PATH):

/bgsys/drivers/ppcfloor/gnu-linux/bin/powerpc64-bgq-linux-gdb yourprogram.exe

Using the IP from the gdbserver output together with the PORT you specified with --listen_port, type:

target remote <IP>:<PORT>

for example:

target remote 172.25.67.1:10000
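
To avoid copy-and-paste mistakes, the target command can be derived from the gdb server's reply. This is a minimal sketch, assuming the reply line has exactly the form shown earlier on this page, with the IP address as its last field; the helper name target_remote_cmd is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read a gdb server reply line on stdin and print the
# matching gdb "target remote" command for the port given as $1.
# Lines that do not mention "IP address" are ignored.
target_remote_cmd() {
    awk -v port="$1" '/IP address/ { print "target remote " $NF ":" port }'
}

echo "rank 0 uses I/O node Q02-I7-J01 at IP address 172.25.67.1" \
    | target_remote_cmd 10000
# prints: target remote 172.25.67.1:10000
```

The printed line is what you would type at the gdb client prompt.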

After you hit return, the gdb client will pause (it is waiting for your program to start running).

Go back to the shell window with the runjob.

Now you can type <return> to start the executable.

The gdb client should now connect and show a PC location and prompt.

Now type whatever gdb commands you like (e.g., "cont").

Please note that when the wallclock time of the interactive Cobalt job expires, the runjob will be killed, but you will still retain the shell started by the qsub. You must exit this shell manually.

>exit
exit
Exiting interactive job 195057
>

In Case of Trouble

If a runjob command fails to start, it may be because your job's time has expired.  Check for this using the qstat command:

> qstat $COBALT_JOBID

(Empty output indicates the job is no longer running.)
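
The check above can be scripted. The sketch below only classifies the text that qstat printed; the helper name job_state and the labels it prints are hypothetical, and the real qstat call is left in a comment because it only works inside the interactive job.

```shell
#!/bin/sh
# Hypothetical helper: classify the text printed by "qstat $COBALT_JOBID".
# Empty output means the job is no longer running; any other output means
# the job is still listed by qstat.
job_state() {
    if [ -z "$1" ]; then
        echo "expired"
    else
        echo "listed"
    fi
}

# Inside the interactive shell this would be called as:
#   job_state "$(qstat "$COBALT_JOBID")"
job_state ""                       # prints: expired
job_state "(non-empty qstat text)" # prints: listed
```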

Advanced Usage

You can connect more than one gdb instance to your program. However, there is a system limit of four debug tools of any type that may connect to a job, and each gdb server tool counts separately.

To connect additional gdb tools, proceed as above as far as starting runjob. While runjob is waiting for you to “Enter a rank…”, use another shell window to issue these commands:

# Find out the runjob pid 
ps -u $USER
# Use the runjob pid, and specify a different RANK and PORT than any tools already started
start_tool --pid <RUNJOB_PID> --tool /sbin/gdbtool --args "--rank=<RANK> --listen_port=<PORT>"

Repeat this process if necessary for a third or fourth tool.
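
When attaching several tools, it helps to generate the start_tool commands so that every tool gets a distinct port. This is a sketch under stated assumptions: the helper name is hypothetical, the pid 12345 is a placeholder, and it assumes one gdb tool was already started by runjob, which leaves room for three more under the limit of four. It prints the commands rather than running them.

```shell
#!/bin/sh
# Hypothetical helper: print one start_tool command per additional rank.
#   $1 = runjob pid, $2 = first unused port, remaining args = ranks
# Ports are assigned consecutively so that no two tools share one.
print_start_tool_cmds() {
    pid=$1; port=$2; shift 2
    if [ "$#" -gt 3 ]; then
        echo "error: at most 3 additional tools fit under the limit of 4" >&2
        return 1
    fi
    for rank in "$@"; do
        printf 'start_tool --pid %s --tool /sbin/gdbtool --args "--rank=%s --listen_port=%s"\n' \
            "$pid" "$rank" "$port"
        port=$(( port + 1 ))
    done
}

# Placeholder pid; ranks 1 and 2 get ports 10001 and 10002.
print_start_tool_cmds 12345 10001 1 2
```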

You can see the status of all tools using:

tool_status --pid <RUNJOB_PID>

After all desired tools are started, query the rank for each tool at the runjob prompt (“Enter a rank to see its associated I/O node’s IP address…”). Then start a gdb client as above for each tool, each in a separate shell, and give each its appropriate target remote IP:PORT command. Finally, go back to the runjob and hit <return> to start the execution. All gdb clients should now connect, and each should show a PC location and prompt.