Overview: The Gronkulator, or just “The Gronk”, as the creators of the web-based tool at the Argonne Leadership Computing Facility (ALCF) call it, is a pictorial depiction of the jobs that are running on the computing platforms at ALCF. The tool also provides information about the jobs that are queued and about any reservations in place on the platforms. It can be accessed at https://status.alcf.anl.gov, which opens the following page listing the computing resources at ALCF.
Each running job is color coded and shown with the name of the project allocation that it is charging to, the length of time it has been running (i.e. “Running Time”), the requested “Wall Time”, its location on the machine (i.e. the specific rack partitions on which the job is running), the number of compute nodes on which it is running and the mode (i.e. the number of MPI ranks that the job is running on each compute node).
There are two other tabs on the page. The first (below) shows the number of jobs that are queued on the platform and their scores (i.e. the higher the score, the higher the likelihood of the job running before those with a lower score, should an appropriately sized partition become available).
The third tab, “Reservations”, shows whether any reservations have been placed. For instance, if you have submitted a fairly large job to the queue (large in the number of nodes on which it will run and/or the length of time for which it will run) and you notice that other jobs with a lower score are moving from the “Queued Jobs” state to the “Running Jobs” state, check whether an upcoming “reservation” is causing the system to “drain”. Note that there is almost always a reservation for “preventative maintenance” (as shown below).
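The interaction between scores and reservations described above can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not the scheduler's actual algorithm: the `QueuedJob` fields, the function name and all numbers are hypothetical, and the only rules modeled are that a job must fit in the free nodes, must finish before the next reservation starts, and that the highest score wins among the jobs that qualify.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class QueuedJob:
    # Hypothetical fields, chosen for illustration only.
    name: str
    score: float
    nodes: int
    wall_minutes: int


def next_job_to_start(
    queue: List[QueuedJob],
    free_nodes: int,
    minutes_until_reservation: int,
) -> Optional[QueuedJob]:
    """Return the highest-score job that fits, or None if nothing fits.

    A job "fits" when it needs no more nodes than are free and its
    requested wall time ends before the upcoming reservation begins.
    """
    candidates = [
        job
        for job in queue
        if job.nodes <= free_nodes
        and job.wall_minutes <= minutes_until_reservation
    ]
    return max(candidates, key=lambda job: job.score, default=None)


queue = [
    QueuedJob("large", score=900.0, nodes=8192, wall_minutes=720),
    QueuedJob("small", score=150.0, nodes=512, wall_minutes=60),
]

# With a reservation starting in two hours, the large job cannot finish
# in time, so the lower-score small job is the one that starts.
chosen = next_job_to_start(queue, free_nodes=8192, minutes_until_reservation=120)
print(chosen.name if chosen else "nothing can start")
```

In this toy model the large job starts only once the time remaining before the reservation is at least its requested wall time, which is why checking the “Reservations” tab can explain an otherwise puzzling wait.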
The IBM Blue Gene/Q platforms, Vesta and Cetus, have a layout similar to (but smaller than) that of Mira, while Cooley, the visualization cluster, has the following layout
where each of the squares under a Rack represents a compute node with 12 cores. Likewise, the Cray XC40 platform, Theta, has the following layout:
Note: All of the information shown on the Gronkulator is also available via the “qstat” command when one is logged into the system. In addition, the qstat command has options that are not available on the Gronkulator. Perhaps the main (if not only) reason to use the Gronkulator is being unable to log in to the platforms (to use qstat). This happens when the systems are under preventative maintenance, in which case the Gronkulator shows the following message for all of the ALCF platforms (as shown in the following example image for Mira).
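For those who prefer to script the qstat route when logged in, a minimal sketch follows. It assumes only that a `qstat` executable is on the `PATH` of the login node. No options are passed, because none are named here, and nothing is assumed about the output format. The helper name `run_qstat` is hypothetical.

```python
import shutil
import subprocess
from typing import Optional


def run_qstat(command: str = "qstat") -> Optional[str]:
    """Run the given command with no arguments and return its output.

    Returns None when the command is not on the PATH, for example when
    this is run on a machine other than an ALCF login node.
    """
    path = shutil.which(command)
    if path is None:
        return None
    result = subprocess.run([path], capture_output=True, text=True, check=False)
    return result.stdout


output = run_qstat()
if output is None:
    # Fall back to the web view when the command-line tool is unavailable.
    print("qstat not found; see https://status.alcf.anl.gov instead")
else:
    print(output)
```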
Using at ALCF: The Gronkulator