Each Aurora node contains a pair of Intel Xeon CPUs based on the Sapphire Rapids architecture. Most of the computational performance comes from six Intel Xe GPUs based on the Ponte Vecchio architecture. The GPUs are connected to one another by an all-to-all interconnect with high-bandwidth, low-latency links, and to the host CPUs over PCIe. A unified memory architecture provides a single address space across the node. Lastly, eight Cray Slingshot fabric endpoints connect each Aurora node to the Cray Shasta interconnect.
For the Xe architecture and the Ponte Vecchio GPU, Intel benefits from a long history of developing a widely used integrated graphics architecture, currently in its 11th generation. Intel is leveraging this experience in the Xe architecture to deliver an upcoming range of graphics-architecture accelerators for a variety of platforms, including the exascale Ponte Vecchio compute accelerator.
Intel Gen9 Architecture
The Gen9 GPU architecture is an example of a recent Intel GPU architecture that is scaled for integration with the CPU. It has a modular, hierarchical design in which cores, called Execution Units (EUs), are grouped into SubSlices (SS), which are in turn grouped into Slices. This scalability allows the configuration to vary with the product range being targeted.
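The Slice, SubSlice, and EU hierarchy described above can be sketched as a small data structure. This is only an illustration: the class name `GpuConfig`, its fields, and the counts used below (3 SubSlices per Slice, 8 EUs per SubSlice) are assumptions chosen for the example, and the figures for any specific product should be taken from Intel's documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuConfig:
    """Hypothetical description of a Slice / SubSlice / EU hierarchy."""

    slices: int
    subslices_per_slice: int
    eus_per_subslice: int

    @property
    def subslices(self) -> int:
        # Total SubSlices on the device.
        return self.slices * self.subslices_per_slice

    @property
    def eus(self) -> int:
        # Total Execution Units on the device.
        return self.subslices * self.eus_per_subslice


# Assumed, illustrative counts; only the number of Slices differs.
small = GpuConfig(slices=1, subslices_per_slice=3, eus_per_subslice=8)
large = GpuConfig(slices=3, subslices_per_slice=3, eus_per_subslice=8)

print(small.eus, large.eus)
```

Under these assumed counts, changing only the number of Slices scales the EU count linearly, which is the sense in which the design is modular.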
The EUs contain Floating Point Units (FPUs) with a flexible SIMD width. Like other GPU architectures on the market, they are optimized for high-throughput computation rather than for the latency of individual operations. To compensate, each EU hosts a group of hardware threads that exploit concurrency to hide the latency of long operations. These devices typically include a data cache (L1) shared among the EUs in a SubSlice and an L3 cache for the entire device. Finally, a global thread dispatcher load-balances and distributes instructions from the device-level command streamer to the various SubSlices.
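The way multiple threads per EU hide latency can be illustrated with a simple back-of-the-envelope model. The sketch below is not a description of the actual hardware scheduler: it assumes every thread alternates a fixed number of compute cycles with a fixed number of stall cycles, and that the EU can switch to any ready thread at no cost. The function names and the cycle counts are hypothetical.

```python
def eu_utilization(threads: int, compute_cycles: int, stall_cycles: int) -> float:
    """Fraction of cycles an EU is busy under an idealized interleaving model.

    Each thread repeats `compute_cycles` of work followed by `stall_cycles`
    of waiting (for example, on a memory access). While one thread waits,
    the remaining threads are assumed to be free to issue.
    """
    if threads < 1 or compute_cycles < 1 or stall_cycles < 0:
        raise ValueError("threads and compute_cycles must be >= 1, stall_cycles >= 0")
    return min(1.0, threads * compute_cycles / (compute_cycles + stall_cycles))


def threads_to_hide(compute_cycles: int, stall_cycles: int) -> int:
    """Smallest thread count that keeps the EU fully busy in this model."""
    if compute_cycles < 1 or stall_cycles < 0:
        raise ValueError("compute_cycles must be >= 1, stall_cycles >= 0")
    total = compute_cycles + stall_cycles
    return -(-total // compute_cycles)  # ceiling division


# Hypothetical workload: 10 cycles of compute per 60-cycle stall.
for n in (1, 2, 4, 7):
    print(n, round(eu_utilization(n, 10, 60), 3))
```

In this model a single thread leaves the EU idle for most of each 70-cycle period, while seven threads are enough to cover the stall completely. Real workloads have variable latencies, so the model gives only a rough intuition.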