Cobalt Job Control on BG/Q

The queuing system used at ALCF is Cobalt. Cobalt has two ways to queue a run: the basic method and the script method.

Basic Method

In the basic method, you supply the information needed for mpirun on the qsub command line, and Cobalt invokes mpirun when the job starts. These are the most commonly used qsub options (for a complete list, run "man qsub").

    -A Project     - project (-A YourProject)

    -q queue       - queue (-q R.workshop)

    -t time        - running time (-t 5 for 5 minutes, -t 01:10:20 for 1 hr 10 min 20 sec)
                     includes partition boot - give at least 5 min
                     
    -n NN          - number of nodes (-n 64 for 64 nodes; each node runs 1 to 64 MPI tasks depending on how the --mode flag is set)

    --mode script/c1/c2/c4/c8/c16/c32/c64  - running mode (default c1)
                          (script for script mode, otherwise cN causes the nodes to run N processes per node)

    --proccount    - number of MPI tasks (ranks) for the run
                     (default is computed from -n and --mode)

    -O Name        - name your job and stdout/stderr (-O Job1)

    -i file        - give a file name to be used for stdin

    --env VAR1=1:VAR2=2:… - specify required environment variables

NOTE: Remember to give all options before the executable name.

Example:

    qsub -A YourProject -q R.workshop -n 256 --mode c16 --proccount 1024 -t 30  \
          --env MYVAR=value1 -i inputdata -O Project1_out program.exe progarg1

Script Method

Alternatively, Cobalt can run a job with a script. The syntax is slightly different from that of a PBS-style script; see the Cobalt scripting documentation for details.
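
Below is a minimal sketch of a script-mode job; the runjob options and the COBALT_PARTNAME variable shown here are typical but should be confirmed against that documentation.

   #!/bin/sh
   # myscript.sh: Cobalt sets COBALT_PARTNAME to the block it booted for this job.
   # Run 1024 ranks at 4 ranks per node on the 256 allocated nodes.
   runjob --np 1024 -p 4 --block $COBALT_PARTNAME : ./program.exe progarg1

The script is then submitted with --mode script:

   qsub -A YourProject -q R.workshop -n 256 -t 30 --mode script myscript.sh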

Ensemble Jobs

The Blue Gene/Q platform now provides users with the ability to allocate and boot blocks within a Cobalt resource allocation from their script. Unlike the Blue Gene/P platform, the Blue Gene/Q separates booting resources from running jobs. For ensemble jobs, the --disable_preboot flag must be added to the qsub submission line.

Additionally, the get-bootable-blocks utility provides a list of available blocks. This command takes a parent block as an argument, and also accepts --size and --geometry flags as constraints on the blocks returned. Please visit Cobalt's project website for more information on ensemble jobs: http://trac.mcs.anl.gov/projects/cobalt/wiki/BGQUserComputeBlockControl.
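
As a rough sketch (the exact commands and workflow should be checked against that page), an ensemble-style script might find a sub-block, boot it, run on it, and free it; the block sizes and runjob options here are illustrative:

   #!/bin/sh
   # Submitted with: qsub --mode script --disable_preboot -n 512 -t 60 -A YourProject ensemble.sh
   # Find a 256-node block inside the block Cobalt assigned to this job
   BLOCK=$(get-bootable-blocks --size 256 $COBALT_PARTNAME | head -n 1)
   boot-block --block $BLOCK                # boot the sub-block ourselves
   runjob --np 4096 -p 16 --block $BLOCK : ./program.exe
   boot-block --block $BLOCK --free         # free the sub-block when finished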
 

Queue Names and Scheduling Policy

Queue names and operations are described on the Job Scheduling Policy page.

Project Names

You can find active project names that your account is associated with by running the command:

sbank allocations

If an account is associated with more than one project, a job must be submitted by using a specific project name using -A, or by setting the environment variable COBALT_PROJ.
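
For example, using a hypothetical project name:

   export COBALT_PROJ=YourProject
   qsub -q prod -n 512 -t 60 a.out      # submitted under YourProject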

Submitted Job with the Wrong Arguments

If you submit a job with the wrong arguments, you can modify it without deleting and resubmitting it. Most settings can be changed using qalter.

For example:

  Usage: qalter [-d] [-v] -A <project name> -t <time in minutes> 
             -e <error file path> -o <output file path> 
             --dependencies <jobid1>:<jobid2>
             -n <number of nodes> -h --proccount <processor count> 
             -M <email address> --mode script/c1/c2/c4/c8/c16/c32/c64 <jobid1> <jobid2> 
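
For instance, to change the wall time and node count of an already queued job (job ID hypothetical):

   qalter -t 60 -n 512 123456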

Note: To change the queue, use qmove.

   Usage: qmove <queue name> <jobid> <jobid>

Changing Executable after Job Submission

When a job is submitted via qsub, Cobalt records the path to the executable or script, but it does not make a copy. As a result, deleting or modifying the executable or script after submission will affect any jobs already submitted that use it. To avoid confusion, it is generally best to avoid making changes after job submission.

Holding and Releasing Jobs

User Holds

To hold a job (prevent from running), use qhold. This will put the job in the "user_hold" state.

   qhold <jobid>

To release a job in a user hold (user_hold) state, use qrls:

   qrls <jobid>

A job may also be put into a user hold immediately upon submission by passing qsub the -h flag:

   qsub -n 512 -t 120 -A MyProject -h myExe

Dependency Holds

For jobs in the dep_hold or dep_fail state, please see the Job Dependencies section below.

Admin Holds

Jobs in the state admin_hold may only be released by a system administrator.

MaxRun Holds

Jobs may temporarily enter the state maxrun_hold if the user has reached the limit of per-user running jobs in a particular queue. No action is required; as running jobs complete, jobs in the maxrun_hold state will be automatically changed back to queued and eligible to run.

Job Dependencies

To submit a job that waits until another job or jobs have completed, use the dependencies argument to qsub. For example, to submit a job that depends on job 12345,

   qsub -q prod -n 512 -t 10 -A yourproject --dependencies 12345 a.out

For multiple dependencies, list and separate with colons:

   qsub -q prod -n 512 -t 10 -A yourproject --dependencies 12345:12346 a.out

Jobs submitted with dependencies will remain in the state dep_hold until all the dependencies are fulfilled, then will proceed to the state queued.

NOTE: In the event any of the dependencies do not complete successfully (nonzero exit status), the job will instead go into the state dep_fail. To manually release a job that is in either dep_hold or dep_fail:

   qrls --dependencies <jobid>

or alternatively change the job's dependencies setting to "none":

   qalter --dependencies none <jobid>

Customizing the Output of Qstat

Default fields displayed by the qstat command may be changed by setting the QSTAT_HEADER environment variable.

   > qstat

     JobID  User      WallTime  Nodes  State      Location
     =======================================================
     42342  user1     00:15:00  16     user hold  None
     45273  user2     00:35:00  1024   queued     None
     ...

   > export QSTAT_HEADER=JobId:JobName:User:WallTime:RunTime:Nodes:Mode:State:Queue
   > qstat

     JobId  JobName    User      WallTime  RunTime  Nodes  Mode    State      Queue
     ===================================================================================
     42342  -          user1     00:15:00  N/A      16     smp     user hold  short
     45273  -          user2     00:35:00  N/A      1024   smp     queued     medium

One may specify column headers via the --header flag to qstat.
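
For example, assuming the flag takes the same colon-delimited field list as QSTAT_HEADER:

   qstat --header JobId:User:WallTime:RunTime:Nodes:State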

Available field names can be seen by entering "qstat -fl <jobid>" for any current jobid.

Redirecting Standard Input

To redirect the standard input to a job, do not use the '<' redirection operator on the qsub command line. This simply redirects standard input to qsub, not the job itself. Instead, use the qsub option "-i".

   # The wrong way
   qsub -q queuename -t 10 -n 64 a.out < my_input_file.dat

   # The right way
   qsub -q queuename -t 10 -n 64 -i my_input_file.dat a.out

Sbank

The sbank database is updated hourly.  This means transactions against your account can take up to an hour before they show up.

Submitting into Backfill Partitions

Sometimes the scheduler drains partitions to clear room for a large job. During these times, even though relatively few jobs may be running, newly submitted jobs are not scheduled as expected.

At such times, backfill partitions may be available. For instance, suppose that 16 racks are being drained to allow a 16-rack job to run. Of the 16 racks, perhaps eight are empty and the other eight are running an eight-rack job that has two hours of wall time left. This creates an opportunity to run an eight-rack job of up to two hours in the backfill.

To discover available backfill, run the partlist command.

For example:

> partlist
Name                        Queue                                    State                            Backfill  Geometry
===========================================================================================================================
[...]
MIR-00000-7BFF1-49152       prod-capability:testing:backfill:R.pm    busy                             -         8x12x16x16x2
MIR-00000-77FF1-32768       prod-capability:testing:backfill:R.pm    blocked (MIR-00000-7BFF1-49152)  -         8x8x16x16x2 
MIR-00000-7BFF1-0100-32768  prod-capability:testing:backfill:R.pm    blocked (MIR-00000-7BFF1-49152)  -         8x8x16x16x2 
MIR-04000-7BFF1-32768       prod-capability:testing:backfill:R.pm    blocked (MIR-00000-7BFF1-49152)  -         8x8x16x16x2 
MIR-00000-3BFF1-24576       prod-capability:testing:backfill:R.pm    blocked (MIR-00000-7BFF1-49152)  -         4x12x16x16x2
[...]

In this example, a 4K-, 8K-, or 16K-node job with a maximum wall time of 45 minutes can be run during this backfill. The backfill times will not always be identical and will depend on the mix of jobs on the partitions that are being drained.
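
As an illustrative sketch, a job sized and timed to fit such a window could be submitted to the backfill queue that appears in the partlist output above (the project name and sizes here are placeholders):

   qsub -q backfill -A YourProject -n 8192 -t 45 --mode c16 program.exe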

Submitting to a Specific Partition

In rare cases, there may be a need to target a specific hardware partition. This may be accomplished using "--attrs location=".

For example:

  qsub -t 10 -n 8192 --attrs location=MIR-00000-333F1-2048 myprogram.exe

This will force the job to run on that specific location. Should that location become unschedulable, for instance due to a failed node, the job will not be allowed to run anywhere else until the location attribute is reset.

Running with a Group of Users

Sometimes it is useful to allow other users to run Cobalt commands, such as qhold, qrls, or qdel, on a given job. Other users can be allowed to run commands on your job by passing a colon-delimited list of users to qsub, cqsub, or qalter with the --run_users flag. The specified users need not be in the same project under which the job was submitted.

For example:

  qsub -A FellowShipOTR -n 512 -t 1:00 --run_users frodo:sam:pippin ./council

As a convenience, all users belonging to the project under which a job was submitted can be added to a list of users that may control a job by using the --run_project flag.
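
For example, to let everyone in the job's project control the job:

   qsub -A FellowShipOTR -n 512 -t 1:00 --run_project ./council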

Users who have been added to the list can run any command on the job that the submitter could run. This includes qhold, qrls, qalter, and qdel.

Group Running and File System Groups

While setting this list of users allows any of the listed users to run Cobalt commands on a job, it does nothing about the permissions of any files involved with the job. The users themselves must set appropriate permissions on their directories so that members of their group can read and write files as appropriate. If your project needs a group on the file system to share files, or a user needs to be added to one, email User Services (support@alcf.anl.gov).
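
As a sketch, a project directory could be opened to a file system group along these lines (the group and path names are illustrative):

   chgrp -R FellowShipOTR /projects/FellowShipOTR/shared
   chmod -R g+rwX /projects/FellowShipOTR/shared    # group read/write; search permission on directories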

More Information

For more information on Cobalt commands and their options, consult the man pages on the system. The same information may be found online in Cobalt's Command Reference.