Queueing a Job on XC40

Using the Job Resource Manager: Commands, Options, and Examples

This document provides examples of how to submit jobs on our systems, along with commands for querying job status, partition availability, and related information. For an introduction to the job resource manager and to running jobs on a Cray XC40, see Running Jobs on XC40. For information on queues on the Cray XC40, see XC-40 queues. For information on priority and scheduling, see XC-40 priority and scheduling.

Submit a Job Request

Use qsub to submit a job. (Unlike jobs on the ALCF BlueGene systems, all jobs on Theta are either script or interactive.)

To run the script jobscript.sh on 10 nodes for a maximum of 15 minutes:

   qsub -n 10 -t 15 jobscript.sh

To submit jobs to a particular queue, use qsub -q <queue_name>.

To run jobscript.sh on 10 nodes for a maximum of 30 minutes in the debug queue for flat memory mode and quad NUMA mode:

   qsub -q debug-flat-quad -n 10 -t 30 jobscript.sh

Charge a Job to a Project

Use qsub -A <project_name> to charge a job to a particular project.

To run jobscript.sh with 10 nodes for a maximum of 15 minutes and charge the job to MyProject:

   qsub -n 10 -t 15 -A MyProject jobscript.sh

To see which projects you are a member of:

   projects

Set the environment variable COBALT_PROJ to specify your default project. A project given with qsub -A takes precedence over COBALT_PROJ.
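
The precedence rule can be illustrated with a small sketch. It assumes a POSIX shell; the helper effective_project is hypothetical and only mirrors the rule stated above (it does not call the scheduler), and OtherProject is a placeholder project name.

```shell
#!/bin/sh
# Hypothetical helper (not part of the scheduler): print the project
# that would be charged, following the rule that an explicit -A value
# takes precedence over COBALT_PROJ.
effective_project() {
    explicit="$1"                 # value that would be passed to -A; may be empty
    if [ -n "$explicit" ]; then
        printf '%s\n' "$explicit"
    else
        printf '%s\n' "${COBALT_PROJ:-}"
    fi
}

# Set a default project for this shell session (placeholder name).
export COBALT_PROJ=MyProject

effective_project ""              # no -A given: falls back to COBALT_PROJ
effective_project OtherProject    # -A OtherProject given: overrides COBALT_PROJ

# The corresponding submissions would look like:
#   qsub -n 10 -t 15 jobscript.sh                   # charged to MyProject
#   qsub -n 10 -t 15 -A OtherProject jobscript.sh   # charged to OtherProject
```

Placing the export line in a shell startup file would make the default apply to every login session; whether that is appropriate depends on how many projects you charge to.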

Delete a Job from the Queue

To delete a job from the queue, use the qdel command.

Cancel job 34586:

   qdel 34586

Depending on the stage of a job’s lifetime, qdel may not complete immediately, especially if it is issued during startup of a job that is changing memory modes and rebooting nodes. If the job does not ultimately terminate, contact support@alcf.anl.gov with the job ID so that an administrator can take appropriate cleanup actions and administratively terminate the job.

Query Partition Availability

To determine which partitions are currently available to the scheduler, use the nodelist command. This command lists each node's ID, name, queues, and state, as well as any backfill window. For example:

% nodelist
Node_id  Name         Queues   Status             MCDRAM  NUMA   Backfill
================================================================================
[...]
20       c0-0c0s5n0   default  cleanup-pending    flat    quad   4:59:44
21       c0-0c0s5n1   default  cleanup-pending    flat    quad   4:59:44
22       c0-0c0s5n2   default  busy               flat    quad   4:59:44
24       c0-0c0s6n0   default  busy               flat    quad   4:59:44
25       c0-0c0s6n1   default  busy               flat    quad   4:59:44
26       c0-0c0s6n2   default  busy               flat    quad   4:59:44
27       c0-0c0s6n3   default  busy               flat    quad   4:59:44
28       c0-0c0s7n0   default  idle               flat    quad   4:59:44
29       c0-0c0s7n1   default  idle               flat    quad   4:59:44
30       c0-0c0s7n2   default  idle               flat    quad   4:59:44
31       c0-0c0s7n3   default  idle               flat    quad   4:59:44
32       c0-0c0s8n0   default  idle               flat    quad   4:59:44
33       c0-0c0s8n1   default  idle               flat    quad   4:59:44
34       c0-0c0s8n2   default  idle               flat    quad   4:59:44
[...]
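
To get a quick summary from this listing, the output can be piped through a short filter. The following is a minimal sketch, assuming the column layout shown above (Status is the fourth whitespace-separated field and contains no spaces); the helper name count_by_status is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read nodelist-style output on stdin and print
# the number of nodes in each Status value (fourth column), sorted by name.
count_by_status() {
    awk '
        # Data rows start with a numeric Node_id; this skips the header,
        # the ==== rule, and any other non-data lines.
        $1 ~ /^[0-9]+$/ { count[$4]++ }
        END { for (s in count) print s, count[s] }
    ' | sort
}

# Usage on a login node (not run here):
#   nodelist | count_by_status
```

Applied only to the rows shown in the excerpt above, this would report 5 busy, 2 cleanup-pending, and 7 idle nodes. If the real output contains status values with embedded spaces, the field index would need adjusting.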