FAQs for Queueing and Running on Theta

Help Desk

Hours: 9:00am-5:00pm CT M-F
Telephone: 866-508-9181 (Toll-Free, US Only) or 630-252-3111
Email: support@alcf.anl.gov

Contents

  1. Where can I find the details of a job submission?
  2. Why is my job stuck in "starting" state?
  3. What are the "utime" and "stime" values printed at the bottom of the <jobid>.output file?
  4. Do #COBALT directives need to start on the second line of the job script?


Where can I find the details of a job submission?

Details of a job submission are recorded in the <jobid>.cobaltlog file, which contains the qsub command and the environment variables. By default, this file is written to the same location as the .output and .error files; use 'qsub --debuglog <path>' to place it elsewhere.
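
To review how a job was submitted after the fact, one option is to search its log for the recorded qsub command. The sketch below is a hypothetical helper (the name cobaltlog_qsub_lines is not part of Cobalt), and it assumes the log is a plain-text file named <jobid>.cobaltlog in the given directory, which defaults to the current one.

```shell
# Hypothetical helper, not a Cobalt command.
# Prints every line of <dir>/<jobid>.cobaltlog that mentions qsub.
# Assumption: the log is plain text and sits in <dir> (default: current directory).
cobaltlog_qsub_lines() {
    cq_jobid=$1
    cq_dir=${2:-.}
    cq_log="$cq_dir/$cq_jobid.cobaltlog"
    if [ ! -r "$cq_log" ]; then
        echo "cannot read $cq_log" >&2
        return 1
    fi
    grep 'qsub' "$cq_log"
}
```

For example, 'cobaltlog_qsub_lines 12345' searches ./12345.cobaltlog. If the log was redirected with --debuglog, pass its directory as the second argument.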

Why is my job stuck in "starting" state?

If you submit a job and qstat shows it in the "starting" state for 5 minutes or more, the most likely cause is that your memory/NUMA mode selection requires rebooting some or all of the nodes assigned to your job. The reboot takes about 15 minutes, during which the job remains in the "starting" state. When no reboot is required, the "starting" state lasts only a matter of seconds.

What are the "utime" and "stime" values printed at the bottom of the <jobid>.output file?

At the bottom of a <jobid>.output file, there is usually a line like:

Application 3373484 resources: utime ~6s, stime ~6s, Rss ~5036, inblocks ~0, outblocks ~8

The "utime" and "stime" values are the user CPU time and system CPU time reported by aprun and getrusage. They are approximate: rounded aggregate numbers scaled by the number of resources used. See the aprun man page for more information.
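
If you want these two numbers in a script, for example to log them per run, they can be pulled out of the summary line with sed. This is a minimal sketch that assumes the line has exactly the layout shown above (values written as ~<integer>s); aprun_cpu_times is a hypothetical function name, not an aprun feature.

```shell
# Hypothetical helper: print "<utime> <stime>" (whole seconds) taken from
# the last "resources: utime ..." line found in the given output file.
# Assumption: the line matches the layout of the example above.
aprun_cpu_times() {
    sed -n 's/.*utime ~\([0-9][0-9]*\)s, stime ~\([0-9][0-9]*\)s.*/\1 \2/p' "$1" |
        tail -n 1
}
```

Calling 'aprun_cpu_times <jobid>.output' prints the two values separated by a space. Because the values are rounded aggregates, treat them as rough indicators rather than precise timings.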

Do #COBALT directives need to start on the second line of the job script?

Yes. If #COBALT directives are used inside a job submission script, they must appear at the top of the script, starting on the line immediately after the interpreter line. #COBALT directives that follow a blank line are ignored. Attempting to qsub the following example script, which has a blank line before the directive, produces the error message shown after it.

> cat submit.csh
#!/bin/csh

#COBALT -n 128 -t 2:00:00 -q default

aprun -n 8192 -N 64 -d 1 -j 1 --cc depth ./my_app

> qsub submit.csh
Usage: qsub.py [options] <executable> [<excutable options>]
Refer to man pages for JOBID EXPANSION and SCRIPT JOB DIRECTIVES.
No required options provided

A correct submission script, with the blank line before the #COBALT directive removed, looks like the following.

> cat submit.csh
#!/bin/csh
#COBALT -n 128 -t 2:00:00 -q default

aprun -n 8192 -N 64 -d 1 -j 1 --cc depth ./my_app

> qsub submit.csh
12345
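
To catch this mistake before submitting, a script can be scanned for #COBALT lines that come after a blank line. The sketch below is a hypothetical pre-submission check, not part of Cobalt. It assumes that a whitespace-only line counts as blank, and it checks only the blank-line rule described above.

```shell
# Hypothetical pre-submission check, not a Cobalt tool.
# Reports each #COBALT directive that appears after a blank line and
# returns 1 if any are found, 0 otherwise.
# Assumption: whitespace-only lines are treated as blank.
check_cobalt_directives() {
    awk '
        BEGIN { bad = 0; blank_seen = 0 }
        /^[[:space:]]*$/ { blank_seen = 1 }
        /^#COBALT/ && blank_seen {
            printf "line %d: #COBALT directive follows a blank line\n", NR
            bad = 1
        }
        END { exit bad }
    ' "$1"
}
```

One way to use it is 'check_cobalt_directives submit.csh && qsub submit.csh', so the job is submitted only when no misplaced directive is reported.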

 
