Communication-optimal and Topology-aware Algorithms For Dense Matrix and Tensor Computations

Edgar Solomonik, University of California, Berkeley
Seminar

Low-dimensional mesh and torus topologies allow high-performance networks to scale to hundreds of thousands of nodes, and many Petascale-era supercomputers (BG/P, BG/Q, Cray XE6, K computer) employ interconnects with such topologies. These torus networks operate at much higher efficiency when applications communicate in a structured and well-mapped manner. In this seminar, I will present new topology-aware algorithms for dense matrix and tensor computations that run at much higher efficiency on torus networks.

I will introduce a novel class of '2.5D algorithms' for distributed dense linear algebra computations. These algorithms achieve optimal theoretical communication complexity and map onto a parameterized 3D logical topology. I will also present implementations of 2.5D matrix multiplication and LU factorization, which achieve speed-ups of up to ~3X over classical algorithms on 16,384 nodes of BG/P and reduce communication time by up to ~90%.
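The key idea behind 2.5D matrix multiplication can be illustrated without a parallel machine: the c replicated layers of the sqrt(p/c) x sqrt(p/c) x c processor grid each compute a disjoint 1/c fraction of the inner-dimension products, and a final sum-reduction across layers combines them. The sketch below (a serial emulation with numpy; the function name and partitioning scheme are illustrative, not the talk's actual implementation) shows only this layer structure:

```python
import numpy as np

def matmul_25d_layers(A, B, c):
    """Serially emulate the layer decomposition of 2.5D matrix multiplication.

    Each of the c "layers" computes a partial product over its slice of the
    inner (contraction) dimension; the final sum over layers plays the role
    of the inter-layer sum-reduction in the parallel algorithm.
    """
    k = A.shape[1]
    # Partition the contraction dimension among the c layers.
    bounds = np.linspace(0, k, c + 1, dtype=int)
    # Each layer computes its disjoint fraction of the products ...
    partials = [A[:, bounds[i]:bounds[i + 1]] @ B[bounds[i]:bounds[i + 1], :]
                for i in range(c)]
    # ... and a sum-reduction over layers yields the full product.
    return sum(partials)
```

Trading the extra memory for replication (larger c) against communication is what makes the algorithm "2.5D": c = 1 recovers the classical 2D algorithm, while c = p^(1/3) recovers the 3D algorithm.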

The topology-aware mapping ideas used in matrix multiplication generalize to higher-dimensional tensor contractions. Further, a cyclic decomposition allows seamless mapping of tensors stored in symmetric packed layouts. Using these ideas as building blocks, I will detail a library implementation that performs automatic topology-aware mapping for contractions of tensors of arbitrary dimension and symmetry onto k-ary n-cube networks of arbitrary dimension.
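Why a cyclic decomposition suits symmetric packed layouts can be seen in the two-index case. Under a cyclic layout on a p x p grid, processor (r, s) owns entries T[r::p, s::p]; for a symmetric T, each diagonal processor's local block is itself symmetric, and off-diagonal blocks come in transpose pairs, so packed storage carries over to the local pieces. A minimal sketch (numpy, with an illustrative function name; not the library's actual data structure):

```python
import numpy as np

def cyclic_blocks(T, p):
    """Cyclic decomposition of a symmetric matrix onto a p x p grid.

    Processor (r, s) owns T[r::p, s::p]. Because the assignment is cyclic
    rather than blocked, the block on a diagonal processor (r, r) is itself
    symmetric, and the blocks on (r, s) and (s, r) are transposes of each
    other -- the structure that lets symmetric packed layouts map seamlessly.
    """
    return {(r, s): T[r::p, s::p] for r in range(p) for s in range(p)}
```

A blocked (contiguous) decomposition would not preserve this property: a block straddling the diagonal is not symmetric as a submatrix, which is why the cyclic layout is the natural choice here.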