Job Scheduling Policy for BG/Q Systems


Mira Job Scheduling

Queue Layout

Mira Queues

| User Queue | Queue | Nodes | Wall-clock Time (hours) | Maximum Running per User | Maximum Queued per User |
|---|---|---|---|---|---|
| prod | prod-short | 512 - 4096 | 0 - 6 | 5 | 20 |
| prod | prod-long | 512 - 4096 | >6 - 12 | 5 | 20 |
| prod | prod-capability | 4097 - 49152 | 0 - 24 | 5 | 20 |
| prod | backfill (‡) | 512 - 49152 | 0 - 6 | 5 | 20 |
| prod-1024-torus | prod-1024-torus | 1024 | 0 - 12 | 5 | 16 |
| prod-32768-torus | prod-32768-torus | 32768 | 0 - 24 | 1 | 20 |

‡: Depending on the type of project (INCITE, ALCC, Director's Discretionary), this queue may be automatically selected if a project's allocation is negative.

User Interface

Users will submit to the prod, prod-1024-torus, or prod-32768-torus queue. Jobs submitted to the prod queue will be re-routed automatically into the sub-queue matching the requested node count and wall-clock time.
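The automatic routing can be sketched as follows, assuming only the node-count and wall-clock boundaries from the table above; this is an illustration, not the scheduler's actual implementation:

```python
# A minimal sketch of prod-queue routing, using the boundaries from the
# Mira queue table above. Illustrative only; not the scheduler's code.

def route_prod_job(nodes, walltime_hours):
    """Return the sub-queue a prod job would be routed to."""
    if 4097 <= nodes <= 49152:
        return "prod-capability"
    if 512 <= nodes <= 4096:
        return "prod-short" if walltime_hours <= 6 else "prod-long"
    raise ValueError("prod jobs must request between 512 and 49152 nodes")

print(route_prod_job(2048, 4))    # prod-short
print(route_prod_job(2048, 10))   # prod-long
print(route_prod_job(8192, 20))   # prod-capability
```

Job priority in the queue is based on several criteria: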

  • a positive allocation balance for your project
  • the size (in nodes) of the job; larger jobs receive higher priority
  • the type of project (INCITE, ALCC, or Director's Discretionary)
  • the job duration; shorter jobs accumulate priority more quickly, so it is best to specify the job run time as accurately as possible

Note: Effective July 14, 2014, for the purpose of determining queue priority, all requested wall-clock times (job durations) greater than or equal to 12 hours are treated equivalently. It is still advantageous to accurately estimate your job's required wall-clock time, even for jobs that run for more than 12 hours, because shorter jobs are more likely to be scheduled in partitions being drained for a higher-priority job.

Note: Effective Oct. 28, 2015, capability jobs must request a wall-clock time of at least 30 minutes. This is to ensure sufficient time exists for the application to run once the required block is booted.
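To make the interplay of these rules concrete, here is a toy model of score accrual consistent with the bullets and notes above. The linear form and every constant are assumptions chosen for illustration; they are not the scheduler's actual algorithm or weights:

```python
# Toy score-accrual model: larger jobs accrue faster, shorter jobs accrue
# faster, and durations of 12 hours or more are treated equivalently.
# All weights here are hypothetical, chosen only to illustrate the policy.

def accrual_rate(nodes, walltime_hours):
    """Relative rate at which a queued job gains priority (toy model)."""
    size_factor = nodes / 512.0                    # larger jobs favored
    effective_hours = min(walltime_hours, 12.0)    # >=12 h treated alike
    duration_factor = 12.0 / effective_hours       # shorter jobs favored
    return size_factor * duration_factor

print(accrual_rate(4096, 6))    # 16.0: a big, short job accrues quickly
print(accrual_rate(512, 12))    # 1.0: a small, long job accrues slowly
print(accrual_rate(512, 24))    # 1.0: same as 12 h, per the 2014 note
```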

Additionally, based on which queue the job routes to, the job may be restricted to running on a subset of the available locations on the machine. The prod-long queue runs only on one row of Mira (one third of the machine, or 16,384 nodes).

[Diagram: Mira machine rows assigned to the prod-short, prod-long, and prod-capability queues]

Backfill Queue

Jobs in this queue run only when no other jobs can be scheduled. The queue is open only to projects whose allocation balance is already negative but that need a mechanism to continue their work.

INCITE/ALCC Overburn Policy

Capability jobs submitted by INCITE and ALCC projects will run in the regular prod-capability queue (instead of backfill) during part of the year, until the project has consumed some percentage, greater than 100%, of its allocation. The applicable months and the associated percentages are detailed in the tables below. Non-capability jobs from projects that have exhausted their allocation will continue to run in backfill. To be clear, this policy does not constitute a guarantee of extra time, and we reserve the right to prioritize the scheduling of jobs submitted by projects that have not yet used 100% of their allocations; the earlier an INCITE or ALCC project exhausts its allocation, the more likely it is to be able to take full advantage of this policy.

ALCC Overburn

| Month | Percentage |
|---|---|
| January | 200% |
| February | 200% |
| March | 150% |

 

INCITE Overburn

| Month | Percentage |
|---|---|
| July | 250% |
| August | 220% |
| September | 190% |
| October | 160% |
| November | 130% |
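The tables above can be read as a simple threshold lookup. The sketch below mirrors them; the helper function is a simplification for illustration, not the scheduler's implementation:

```python
# Overburn thresholds from the tables above, as fractions of the
# project allocation (2.00 == 200%). Illustrative only.

ALCC_OVERBURN = {"January": 2.00, "February": 2.00, "March": 1.50}
INCITE_OVERBURN = {"July": 2.50, "August": 2.20, "September": 1.90,
                   "October": 1.60, "November": 1.30}

def capability_runs_in_prod_capability(project_type, month, fraction_used):
    """fraction_used: allocation consumed so far (1.2 == 120%)."""
    if fraction_used <= 1.0:
        return True  # within allocation: normal prod-capability routing
    table = {"ALCC": ALCC_OVERBURN, "INCITE": INCITE_OVERBURN}
    limit = table.get(project_type, {}).get(month)
    return limit is not None and fraction_used < limit

print(capability_runs_in_prod_capability("INCITE", "August", 1.5))  # True: 150% < 220%
print(capability_runs_in_prod_capability("ALCC", "April", 1.1))     # False: no April overburn
```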

Big Run Monday

As part of our regular maintenance procedures, we will promote any jobs in the queued state in the prod-capability queue to the highest priority. Currently, planned maintenance days are every other Monday.

We may also, at our discretion, take the opportunity to promote the priority of capability jobs if the system has been drained of jobs for any other reason.

Cetus Job Scheduling

| User Queue | Partition Sizes in Nodes | Wall-clock Time (hours) | Maximum Running per User | Maximum Queued Node-Hours |
|---|---|---|---|---|
| default | 128, 256, 512, 1024, 2048 | 0 - 1 | 5 | 1024 |
| low | 128, 256, 512, 1024, 2048 | 0 - 1 | 3 | 2048 |

Cetus scheduling is designed to support application testing and debugging, not production work.

Most users on Cetus will submit to the default queue, which is intended to provide quick turnaround for application testing and debugging. Even if the job parameters allow it, production science runs should not be submitted to this queue. Because the goal is quick turnaround, the scoring algorithm gives preference to smaller and shorter jobs. There is also a low queue, similar in behavior to the backfill queue on Mira but without the automatic routing; it is intended for jobs that can tolerate longer waits than standard debugging jobs. Jobs in the low queue do not acquire score beyond their initial score. Note that due to the 1024 node-hour limit per user in the default queue, the maximum walltime for a 2048-node job is 30 minutes.
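The 30-minute figure follows directly from the node-hour budget. A small helper (hypothetical, not an ALCF-provided tool) makes the arithmetic explicit:

```python
# Maximum walltime permitted by a queued node-hours budget.
# The 1024 node-hour limit comes from the Cetus table above.

def max_walltime_hours(nodes, node_hour_limit=1024):
    """Largest walltime (hours) a single job may request."""
    return node_hour_limit / nodes

print(max_walltime_hours(2048))  # 0.5 hours == 30 minutes
print(max_walltime_hours(1024))  # 1.0 hour, the queue's walltime cap
```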

Due to the architecture of the Blue Gene/Q system and the I/O density of Cetus, jobs under 512 nodes will share I/O node resources with other jobs on the system. This sharing occurs at larger partition sizes than on Vesta.

Vesta Job Scheduling

| User Queue | Partition Sizes in Nodes | Wall-clock Time (hours) | Maximum Running per User | Maximum Queued Node-Hours |
|---|---|---|---|---|
| default | 32, 64, 128, 256, 512, 1024 | 0 - 2 | 5 | 1024 |
| singles | 32, 64, 128, 256, 512, 1024 | 0 - 2 | 1 | 1024 |
| low | 32, 64, 128, 256, 512, 1024 | 0 - 2 | None | 2048 |

  • The queues have uniform score accrual (first in, first out).
  • While there is no limit on the absolute number of jobs queued, you may only queue 1024 node-hours' worth of jobs in the singles queue and the default queue (one 2-hour 512-node job, one 1-hour 1024-node job, sixteen 1-hour 64-node jobs, etc.; see the sketch after this list).
  • The singles queue is intended for highly serialized runs; while it accrues score at the same speed as the default queue, it will run only one job at a time.
  • The low-priority queue has a higher limit; however, jobs in this queue will run only if nothing else can run on empty partitions that are not being drained for another job, though they are eligible for backfill. The low queue is also restricted to one half of the machine, similar to the restriction placed on prod-long on Mira.
  • There are two geometries for 1024-node blocks on Vesta: 4x4x4x8x2 and 4x4x8x4x2. The --geometry flag may be used to specify which geometry is needed; the geometry may also be detected at runtime. Please consult the Blue Gene/Q Redbooks for further information.
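The queued node-hours budget mentioned in the second bullet can be checked with a short helper. The function is hypothetical, but the example job mixes come straight from that bullet:

```python
# Sum queued node-hours and compare against Vesta's 1024 node-hour
# budget for the default and singles queues (from the bullets above).

def queued_node_hours(jobs):
    """jobs: iterable of (nodes, walltime_hours) tuples."""
    return sum(nodes * hours for nodes, hours in jobs)

print(queued_node_hours([(512, 2)]))              # 1024: at the limit
print(queued_node_hours([(64, 1)] * 16))          # 1024: at the limit
print(queued_node_hours([(64, 1)] * 17) <= 1024)  # False: over budget
```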

The I/O configuration on a Blue Gene/Q differs from that of a Blue Gene/P. Because of this, partitions smaller than 128 nodes do not have completely isolated I/O: jobs using 32- or 64-node partitions share I/O nodes, which can result in different timing and performance for an application. These smaller partitions are still very useful for scaling and CPU-focused testing.

General Scheduling Guidelines

We ask that all users follow these guidelines:

  1. Follow good etiquette.
  2. Clearly, some work will require use of the machine beyond these bounds. Please send email to support at alcf.anl.gov so that we can make arrangements for accommodations.
  3. Well-founded complaints from other users will result in the best course of action being taken; in some circumstances, a running job may have to be killed.

System Maintenance Day

  • Every other Monday, 9:00 AM until 5:00 PM Central Time
    • Use the "showres" command to see when the next maintenance period is scheduled. This reservation will be labeled "pm".
    • Notice is sent to the "notify" mailing lists (e.g., mira-notify, vesta-notify) in advance of scheduled maintenance
  • System testing and maintenance jobs will be performed during this period; no other jobs will run
  • If logins are allowed, services such as job submission may be disrupted periodically by maintenance tasks

Jobs on Hold

Jobs in the queue that are placed on hold (either by the user or by us) and that have been on hold for more than 90 days will be deleted from the queue. Please contact support if you would like to leave a job on hold for a longer period of time.

Reservations

  • For special needs, please send requests at least 5 business days in advance
  • See the reservations page for more details on requesting reservations, including what information to include in the reservation request

Please inform your Catalyst of your needs.