Communication-optimal and Topology-aware Algorithms For Dense Matrix and Tensor Computations

Edgar Solomonik, University of California, Berkeley
Seminar

Low-dimensional mesh and torus topologies allow high-performance networks to scale to hundreds of thousands of nodes, and many Petascale-era supercomputers (BG/P, BG/Q, Cray XE6, K computer) employ interconnects with such topologies. These torus networks operate at much higher efficiency when applications communicate in a structured and well-mapped manner. In this seminar, I will present new topology-aware algorithms for dense matrix and tensor computations that run at much higher efficiency on torus networks.

I will introduce a novel class of '2.5D algorithms' for distributed dense linear algebra computations. These algorithms achieve optimal theoretical communication complexity and map onto a parameterized 3D logical topology. I will also present implementations of 2.5D matrix multiplication and LU factorization, which achieve speed-ups of up to ~3X over classical algorithms on 16,384 nodes of BG/P and reduce communication time by up to ~90%.
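The key idea behind 2.5D matrix multiplication can be illustrated without a parallel machine: the c replicated layers of the sqrt(p/c) x sqrt(p/c) x c processor grid each compute a disjoint 1/c fraction of the inner-dimension products, and a final sum-reduction across layers combines them. The sketch below (a serial emulation with numpy; the function name and partitioning scheme are illustrative, not the talk's actual implementation) shows only this layer structure:

```python
import numpy as np

def matmul_25d_layers(A, B, c):
    """Serially emulate the layer decomposition of 2.5D matrix multiplication.

    Each of the c "layers" computes a partial product over its slice of the
    inner (contraction) dimension; the final sum over layers plays the role
    of the inter-layer sum-reduction in the parallel algorithm.
    """
    k = A.shape[1]
    # Partition the contraction dimension among the c layers.
    bounds = np.linspace(0, k, c + 1, dtype=int)
    # Each layer computes its disjoint fraction of the products ...
    partials = [A[:, bounds[i]:bounds[i + 1]] @ B[bounds[i]:bounds[i + 1], :]
                for i in range(c)]
    # ... and a sum-reduction over layers yields the full product.
    return sum(partials)
```

Trading the extra memory for replication (larger c) against communication is what makes the algorithm "2.5D": c = 1 recovers the classical 2D algorithm, while c = p^(1/3) recovers the 3D algorithm.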

The topology-aware mapping ideas used in matrix multiplication generalize to higher-dimensional tensor contractions. Further, a cyclic decomposition allows seamless mapping of tensors stored in symmetric packed layouts. Using these ideas as building blocks, I will detail a library implementation that performs automatic topology-aware mapping for contractions of tensors of arbitrary dimension and symmetry onto k-ary n-cube networks of arbitrary dimension.
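Why a cyclic decomposition suits symmetric packed layouts can be seen in the two-index case. Under a cyclic layout on a p x p grid, processor (r, s) owns entries T[r::p, s::p]; for a symmetric T, each diagonal processor's local block is itself symmetric, and off-diagonal blocks come in transpose pairs, so packed storage carries over to the local pieces. A minimal sketch (numpy, with an illustrative function name; not the library's actual data structure):

```python
import numpy as np

def cyclic_blocks(T, p):
    """Cyclic decomposition of a symmetric matrix onto a p x p grid.

    Processor (r, s) owns T[r::p, s::p]. Because the assignment is cyclic
    rather than blocked, the block on a diagonal processor (r, r) is itself
    symmetric, and the blocks on (r, s) and (s, r) are transposes of each
    other -- the structure that lets symmetric packed layouts map seamlessly.
    """
    return {(r, s): T[r::p, s::p] for r in range(p) for s in range(p)}
```

A blocked (contiguous) decomposition would not preserve this property: a block straddling the diagonal is not symmetric as a submatrix, which is why the cyclic layout is the natural choice here.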