References

IBM Blue Gene Documents

IBM maintains openly available Blue Gene Redbooks. Below are links to a search of the IBM Redbooks site for 'Blue Gene' and to the main IBM Redbooks page, where you can see a list of the latest books and search for other Redbooks.

Blue Gene/P

IBM Compilers

Note that the documentation for the XL compilers on Blue Gene tends to cover only Blue Gene-specific issues:

  • Using the IBM XL Compilers for Blue Gene
  • This and other XL compiler documents are also available on the system at
    • /soft/apps/ibmcmp/xlf/bg/11.1/doc/en_US/pdf
    • /soft/apps/ibmcmp/vacpp/bg/9.0/doc/en_US/pdf

Consult the Linux versions of the documents for a more complete description of general XL functionality:

Driver and eFix Release Notes

Misc

 

Block and Job State Documentation

Block (partition) state and job state transitions are shown below. The letters in parentheses correspond to those reported by bg-listblocks, bg-listjobs, and mmcs_db_console.

 

 * Normal block state transition
 *
 *
 *                 (C)onfigure      ------------------------------>(D)eallocate
 *                  ^   |           ^             ^                 |
 *                  |   V           |             |                 V
 *   -->(F)ree <-->(A)llocated --> (B)ooting --> (I)nitialized --> (T)erminating
 *                                     ^          |                 |
 *                  ^               ^  |----------|                 |
 *                  |               |             V                 |
 *                  |               -------------(R)ebooting        |
 *                  |                                               |
 *                  |                                               |
 *                  |                                               |
 *                  -------------------------------------------------
 *
 * Note: In the above diagram, the reboot process uses the (R) status only
 *       when initiated by mpirun.  When reboot is done by an administrator, using
 *       mmcs_db_console, the block goes directly from (I) to (B).
 *
 * Abnormal block state transition
 *
 *   - any error accessing the database will put the block into an (E)rror state,
 *     from which the only state transition is to (F)ree state
 *   - fatal RAS errors while (B)ooting or (I)nitialized will set the block state
 *     to (D)eallocate
 *
 * Notes on (C)onfigure and (D)eallocate
 *
 *   - the (C)onfigure and (D)eallocate states may be set by external components
 *     such as mpirun. These states are monitored by mmcs, and used to trigger
 *     block allocations and deallocations. The other states are only set by mmcs.
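
The block states and rules above can be written down as a small lookup table, which may help when interpreting the single-letter codes reported by bg-listblocks. The following is only a sketch: the names BLOCK_STATES, EXTERNALLY_SETTABLE, NORMAL_TRANSITIONS, and is_allowed are hypothetical, and the transition table is one reading of the arrows in the diagram (endpoints inferred from column alignment), not an authoritative description of mmcs behavior.

```python
# State letters and names, taken from the block state diagram.
BLOCK_STATES = {
    "F": "Free",
    "A": "Allocated",
    "C": "Configure",
    "B": "Booting",
    "I": "Initialized",
    "R": "Rebooting",
    "D": "Deallocate",
    "T": "Terminating",
    "E": "Error",
}

# States that external components such as mpirun may set;
# per the notes, the other states are set only by mmcs.
EXTERNALLY_SETTABLE = {"C", "D"}

# Assumption: one reading of the arrows in the ASCII diagram.
NORMAL_TRANSITIONS = {
    "F": {"A"},
    "A": {"F", "C", "B"},
    "C": {"A"},
    "B": {"I", "D"},            # B -> D also covers fatal RAS errors
    "I": {"T", "D", "R", "B"},  # I -> B is the administrator reboot path
    "R": {"B"},                 # reboot initiated by mpirun
    "D": {"T"},
    "T": {"A"},                 # return arrow is drawn ending under (A)llocated
    "E": {"F"},                 # the only transition out of Error
}


def is_allowed(current, target):
    """Return True if the table above permits moving from current to target."""
    if current not in BLOCK_STATES or target not in BLOCK_STATES:
        raise ValueError("unknown block state letter")
    if target == "E":
        # Any database access error can put a block into the Error state.
        return current != "E"
    return target in NORMAL_TRANSITIONS[current]
```

For example, is_allowed("I", "R") and is_allowed("R", "B") both hold under this reading, while is_allowed("F", "I") does not, since a free block has to be allocated and booted first.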

 

 * Job State diagram:
 *
 *
 *  deleted  <--- (Q)ueued ----> ready to (S)tart ----> (R)unning ---> (T)erminated
 *                                                       |              ^
 *                                                       \/             |
 *                                                      (D)ying----------
 *
 *  The two diagrams above show that blocks and jobs can be deleted.
 *  A deleted block or job does not have a state, since it no longer exists.
 *  Deletion is included in the diagrams simply to show that blocks and
 *  jobs can only be removed when they are in the Free and Queued states, respectively.
 *
 * Job State diagram for debugging a program prior to running:
 *
 * (Q)ueued  --->  l(O)ad  --->  (L)oaded  --->  (B)egin  --->  (R)unning
 *                                 |   ^
 *                                 v   |
 *                               (A)ttach Debugger
 *
 * Job State diagram for debugging a running job:
 *
 * (Q)ueued  --->  ready to (S)tart  --->  (R)unning
 *                                           |   ^
 *                                           v   |
 *                                          Debu(G)
 *
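
The three job state diagrams above can be sketched as transition tables and used to check whether a sequence of state letters (for example, letters observed over time in bg-listjobs output) is consistent with one of the flows. This is an illustrative sketch only: the names JOB_FLOWS, follows, and can_delete_job are hypothetical, and the flow labels are made up for this example. Note that the job letters overlap with the block letters but have different meanings here, e.g. (B)egin and (D)ying.

```python
# Transition tables read from the three job state diagrams.
JOB_FLOWS = {
    # (Q)ueued -> ready to (S)tart -> (R)unning -> (T)erminated, with (D)ying
    "normal": {
        "Q": {"S"},
        "S": {"R"},
        "R": {"T", "D"},
        "D": {"T"},
    },
    # (Q)ueued -> l(O)ad -> (L)oaded -> (B)egin -> (R)unning,
    # with (A)ttach Debugger entered from and returning to (L)oaded
    "debug_before_run": {
        "Q": {"O"},
        "O": {"L"},
        "L": {"B", "A"},
        "A": {"L"},
        "B": {"R"},
    },
    # (Q)ueued -> ready to (S)tart -> (R)unning <-> Debu(G)
    "debug_running": {
        "Q": {"S"},
        "S": {"R"},
        "R": {"G"},
        "G": {"R"},
    },
}


def follows(flow, path):
    """Return True if every consecutive pair of letters in path is an arrow in flow."""
    table = JOB_FLOWS[flow]
    return all(b in table.get(a, set()) for a, b in zip(path, path[1:]))


def can_delete_job(state):
    """Jobs can only be removed while they are in the Queued state."""
    return state == "Q"
```

For instance, follows("normal", "QSRDT") holds, while follows("normal", "QRT") does not, because a queued job has to pass through ready to (S)tart before running.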