Matrix Multiplications: A Lot of Ways of Handling a Few Nonzeros

Karl Rupp
Seminar

Optimizing the implementation of dense matrix-matrix multiplication kernels often boils down to finding a single, outstanding implementation for a particular machine. If the matrices involved have only few nonzeros, however, there is no longer a single, outstanding implementation. Instead, the fastest algorithm and the fastest implementation depend on details such as the particular location as well as the number of nonzeros of the matrices involved. As a consequence, libraries such as PETSc include implementations of different algorithms for matrix-matrix multiplications of such sparsely populated matrices.

This talk explains the concepts underlying the different algorithms for sparse matrix-matrix multiplications. The audience will learn how these concepts carry over to graph algorithms and how the use of hardware features such as vector registers may result in better performance. By considering a range of typical applications with characteristic nonzero patterns, an inexpensive heuristic is proposed to select a fast sparse matrix-matrix multiplication implementation for the matrices provided.