Theta Programming Environment Upgrade on July 27

July 27 Programming Environment Upgrade (follow-on to the OS Upgrade)

The OS upgrade was completed on July 22, but the expected programming environment (PE) changes were not implemented at that time. As of July 27, the PE has been upgraded to the latest Cray PE.

======= 

Major OS Upgrade

The Programming Environment (PE) was updated to 6.0.7, the latest release that Cray offers. This is necessary for full compatibility with the previously completed OS upgrade.

Users should note that:

  • With the new PE, when linking statically (the default), users may see the following warning message, which can safely be ignored:

/usr/bin/ld: /opt/cray/pe/atp/3.6.4/lib/libAtpSigHandler.a(libAtpSigHandler_la-libAtpSigHandler.o): in function '(anonymous namespace)::get_frontend_addrinfo(char const*, char const*)': /workspace/src/libAtpSigHandler/libAtpSigHandler.cpp:200: warning: Using 'getaddrinfo' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking

  • Packages previously built with spack will no longer work and will need to be rebuilt.
  • Anyone using spack to build should re-detect or otherwise update their compiler settings and should verify the version numbers of any external packages they are using.
  • User codes need to be rebuilt and/or relinked.
  • Intel compilers from 19.x still work, but older ones do not.
  • Statically linked applications should not be affected, unless they depend on the OS in some non-obvious way. The default will remain static linking.
  • Builds that use static linking (the default) will show the warning message quoted above from now on; you can disregard it.
  • Huge pages of various sizes will be available as loadable modules, with none loaded by default.
  • Codes running against the old Cray Programming Environment may not be affected if the software and module versions they use are still available.
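
As a starting point for the spack and huge page items above, the following shell sketch wraps the two checks in small helper functions. It is an illustration under stated assumptions rather than a supported procedure: the function names are hypothetical, `spack compiler find` and `spack compilers` are standard Spack subcommands, and the `craype-hugepages` module prefix is assumed from the usual Cray naming and is not confirmed by this notice. Compiler entries left over from before the upgrade may still need to be removed from your Spack configuration by hand.

```shell
# Re-detect compilers so that Spack picks up the compilers in the upgraded PE.
# Hypothetical helper name; it only wraps standard Spack subcommands.
refresh_spack_compilers() {
    if ! command -v spack >/dev/null 2>&1; then
        echo "spack not found in this shell; skipping" >&2
        return 1
    fi
    spack compiler find    # register compilers visible in the current environment
    spack compilers        # print the resulting list for manual review
}

# List the huge page modules on offer (none is loaded by default).
# Assumption: the modules share the prefix "craype-hugepages".
list_hugepage_modules() {
    if ! command -v module >/dev/null 2>&1; then
        echo "module command not available in this shell; skipping" >&2
        return 1
    fi
    module avail craype-hugepages 2>&1
}
```

Calling `refresh_spack_compilers` and then `list_hugepage_modules` from a login shell is enough to see what the upgraded environment provides before rebuilding.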

Improvements

  • Newer versions of system software will offer better support for user software.
  • Newer compiler versions are becoming available.
  • New builds will be valid through Theta's end-of-life.
  • Default routing has changed from ADAPTIVE_0 to ADAPTIVE_3 to reduce congestion effects.

Known impacts and remediations

We are taking measures to minimize impacts and ensure continuity of available software. The current list of known impacts and remediations:

  • With the new PE, when linking statically (the default), users may see the following warning message, which can safely be ignored:  /usr/bin/ld: /opt/cray/pe/atp/3.6.4/lib/libAtpSigHandler.a(libAtpSigHandler_la-libAtpSigHandler.o): in function '(anonymous namespace)::get_frontend_addrinfo(char const*, char const*)': /workspace/src/libAtpSigHandler/libAtpSigHandler.cpp:200: warning: Using 'getaddrinfo' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
  • Cray will no longer be supporting HDF5 1.8.x versions. We will deploy our own build of 1.8.16. 
  • Intel compilers and software products are not expected to be affected.
  • Python support under Intel Python should be minimally affected, as should conda or user-provided Python installations. However, if any of a code's underlying dependencies rely on packages from the Cray programming environment or on specific versions of system libraries, they *may* be affected. The upgraded OS supports both Python 2.7.17 and 3.6.10.
  • This upgrade was a clean wipe of the current programming environment, including spack-built software.
  • Many user codes will need to be rebuilt and/or relinked against the newer versions of the programming environment and spack-provided dependencies.
  • Some older versions of cray-built packages will no longer be available. In some instances this may require migrating to a newer version of a dependency, or sanctioning a spack-built replacement for the cray package.
  • Summary of Cray software versions that will NOT be available after the upgrade. Please contact support@alcf.anl.gov if any of these are essential for you:
    • ATP 2.1.0, 2.1.1, 2.1.2 
    • CCE 8.*.*, 9.*.*, 9.0.2
    • FFTW 3.3.4.11, 3.3.8.1, 3.3.8.2, 3.3.8.3
    • GA 5.3.0.7-9
    • HDF5 1.10.0, 1.10.2.0, 1.10.5.1
    • LGDB 3.0.5-7
    • LibSci 16.*, 17.*, 18.*, 19.06.1
    • MPT 7.5.x, 7.6.x, 7.7.6, 7.7.2-4, 7.7.10
    • NetCDF 4.4.*, 4.6.1.3, 4.6.3.1
    • PAPI 5.5.*, 5.6.*, 5.7.0.2
    • Perftools 6.5.*, 7.0.*, 7.1.1
    • PETSC 3.7.*, 3.8.*, 3.9.*
    • STAT 2.*, 3.0.*
    • TPSL 17.*, 18.*, 19.*
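
One way to find out whether your build or job scripts depend on a retired version is to list every module version they pin explicitly and compare the result with the list above by hand. The sketch below is a hypothetical helper, not an ALCF tool; it assumes the pins appear as `name/version` arguments on `module load`, `module swap`, or `module switch` lines.

```shell
# Print each explicitly pinned module (name/version) found on
# "module load|swap|switch" lines in the given script files.
# Hypothetical helper: the output is for manual comparison only.
find_pinned_modules() {
    grep -hE '^[[:space:]]*module[[:space:]]+(load|swap|switch)[[:space:]]' "$@" |
        tr -s '[:space:]' '\n' |
        grep -E '^[A-Za-z0-9_.+-]+/[0-9][A-Za-z0-9_.+-]*$' |
        sort -u
}
```

Run it as `find_pinned_modules build.sh submit.sh`, where the file names stand in for your own scripts. Modules loaded without a version follow the new defaults and are not reported.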

Here is the before-vs-after diff of the OS upgrade.

As much as is practical, we are encouraging users to migrate to newer versions. Our users are a tremendous resource, and we are reaching out for feedback during this process so that we can address and test migration issues in a timely manner.

Changes to adaptive routing

The default adaptive routing mode has been changed from ADAPTIVE_0 to ADAPTIVE_3. This change will improve overall performance for applications, especially those that are sensitive to network latency. Although we do not recommend it, any application may return to the previous default, ADAPTIVE_0, by unloading the adaptive-routing-a3 module:

module unload adaptive-routing-a3
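
To confirm which setting a shell or job script will pick up, one option is to check whether the module is present in the environment. The sketch below assumes that the module system exports the colon-separated LOADEDMODULES variable, as Environment Modules does; the function name a3_module_state is illustrative only.

```shell
# Report whether adaptive-routing-a3 is among the currently loaded modules.
# Assumption: the module system maintains LOADEDMODULES (colon-separated),
# with entries of the form "name" or "name/version".
a3_module_state() {
    case ":${LOADEDMODULES:-}:" in
        *:adaptive-routing-a3:* | *:adaptive-routing-a3/*) echo "loaded" ;;
        *) echo "not loaded" ;;
    esac
}

a3_module_state
```

If this prints "not loaded" and you want the new default back, `module load adaptive-routing-a3` should restore it, assuming the module is loaded in the same way it is unloaded.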

We appreciate any feedback on this change and how it may have impacted your application performance. Theta (Cray XC40) uses packet-level adaptive routing†, which steers packets across the network so that they avoid congested links where possible. This balances the network load across the available paths and achieves high network utilization even under heavy network load.

Adaptive routing on Aries comes in four flavors that differ in the weighting (bias) given to minimal versus nonminimal paths. Until now, the default adaptive routing on Theta was ADAPTIVE_0, which has no bias towards minimal or nonminimal paths. Our recent research found that ADAPTIVE_3, which has a strong bias towards minimal routing, is optimal for the majority of workloads as well as for overall system-level congestion management; hence a recommendation was made to switch the default routing mode on Theta to ADAPTIVE_3.

† https://www.cray.com/sites/default/files/resources/CrayXCNetwork.pdf