Queueing a Job on BG/Q

Using the Job Resource Manager on BG/Q: Commands, Options and Examples

This document gives examples of how to submit jobs on our BG/Q system, along with commands for querying job status, partition availability, and related information. For an introduction to using the job resource manager and running jobs on BG/Q, see Running Jobs on the BG/Q System.

Submit a job request

Use qsub to submit a job. Scripts and interactive jobs are not supported at this time.

Run the compiled binary exe1 with 10 nodes for a maximum of 15 minutes:

   qsub -n 10 -t 15 exe1

To submit jobs to a particular queue, use qsub -q <queue_name>.

To run the compiled binary exe1 with 10 nodes for a maximum of 30 minutes in the production queue:

   qsub -q prod -n 10 -t 30 exe1
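The options above can be combined in a small wrapper that prints the command line before anything is submitted. This is only a sketch: build_qsub_cmd is a hypothetical helper name, not part of the resource manager, and it uses only the -q, -n, and -t options shown in this section.

```shell
# Hypothetical helper (not part of the resource manager): assemble a qsub
# command line from the options shown above and print it for review.
# Arguments: <nodes> <minutes> <binary> [queue]
build_qsub_cmd() {
    if [ -n "${4:-}" ]; then
        # A queue name was given, so add -q <queue_name>.
        echo "qsub -q $4 -n $1 -t $2 $3"
    else
        # No queue name: let the default queue apply.
        echo "qsub -n $1 -t $2 $3"
    fi
}
```

For example, the following prints the production-queue command shown above:

   build_qsub_cmd 10 30 exe1 prod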

Charge a job to a project

Use qsub -A <project_name> to charge a job to a particular project. If you are a member of only one project, you do not need to specify a project name.

To run the compiled binary exe1 with 10 nodes for a maximum of 15 minutes and charge the job to MyProject:

   qsub -n 10 -t 15 -A MyProject exe1

To see which projects you are a member of:

   projects

You can use the environment variable "COBALT_PROJ" to set your default project. qsub -A takes precedence over COBALT_PROJ.
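As a rough illustration of that precedence, the sketch below sets a default project and then resolves which project a job would be charged to. The function resolve_project is a hypothetical name used only for this example; it mirrors the rule described above and is not the resource manager's own logic.

```shell
# Set a default project for the current shell session.
# MyProject is the example project name used earlier in this section.
export COBALT_PROJ=MyProject

# Hypothetical helper: show which project name would apply.
# An explicit name (as passed to qsub -A) wins over COBALT_PROJ.
resolve_project() {
    if [ -n "${1:-}" ]; then
        echo "$1"
    else
        echo "${COBALT_PROJ:-}"
    fi
}
```

With COBALT_PROJ set as above, resolve_project with no argument prints MyProject, while resolve_project OtherProject prints OtherProject (OtherProject being a placeholder name).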

On Vesta, if you are a member of only one project besides your pilot project, that non-pilot project will be your default project.

Delete a job from the queue

To delete a job from the queue, use the qdel command.

Cancel job 34586:

   qdel 34586

If the job fails to cancel (which indicates that the resource manager cannot kill the mpirun processes cleanly), try again with the force option:

   qdel -f 34586

If you must forcibly delete a job, email support@alcf.anl.gov with the job ID so that the necessary cleanup can be done.
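The two-step cancellation above might be wrapped as follows. This is a sketch under one important assumption: that qdel returns a non-zero exit status when a cancel fails, which should be checked on your system before relying on it. The name cancel_job is hypothetical.

```shell
# Hypothetical wrapper: try a normal qdel first, then fall back to qdel -f.
# ASSUMPTION: qdel exits with a non-zero status when the cancel fails.
cancel_job() {
    if qdel "$1"; then
        echo "job $1 cancelled"
    elif qdel -f "$1"; then
        # A forced delete needs follow-up cleanup by Support.
        echo "job $1 force-deleted; report the job ID to Support"
    else
        echo "could not delete job $1" >&2
        return 1
    fi
}
```

For example:

   cancel_job 34586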

Query partition availability

To determine which partitions are currently available to the scheduler, use the partlist command. It lists each partition's name, queue, and state. For example:

   % partlist
   Name                      Queue                     State
   MIR-00000-7BFF1-49152     prod-capability           blocked
   MIR-04000-3BFF1-16384     prod-capability           idle
   MIR-00000-33FF1-8192      prod-capability:backfill  blocked
   MIR-04000-37FF1-8192      prod-capability:backfill  idle
   MIR-44800-77FF1-4096      prod-short:backfill       blocked
   MIR-04000-377F1-4096      prod-short:backfill       idle
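
To pick out only the idle partitions from a listing like the one above, the output can be filtered on the State column. This is a minimal sketch that assumes the three-column layout shown here (Name, Queue, State); idle_partitions is a hypothetical helper name.

```shell
# Hypothetical helper: read partlist output on standard input and print the
# names of partitions whose third column (State) is exactly "idle".
# The header row is skipped automatically because its third field is "State".
idle_partitions() {
    awk '$3 == "idle" { print $1 }'
}
```

For example:

   partlist | idle_partitions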