Numerical Behavior of GPU Matrix Multiply-Accumulate Hardware

Mantas Mikaitis, The University of Manchester

Description: Tensor cores and matrix engines are hardware units on the latest GPUs that perform dot product or matrix multiply-accumulate (MMA) operations. 127 of the TOP500 supercomputers contain these units, and many numerical libraries are beginning to utilize them in various scientific computing algorithms. Tensor cores and similar arithmetic units target low-precision machine learning workloads and are therefore not necessarily compliant with the IEEE 754 standard. Features such as rounding, normalization, order of operations, and subnormal number support can differ from a standard software implementation of matrix multiplication. In this talk I will discuss our recent work on determining various numerical features of MMAs, using NVIDIA tensor cores as an example test case. We determined the features of three generations of tensor cores with carefully constructed numerical test cases on the V100, T4, and A100 NVIDIA GPUs, and we have explored the effects those features have on applications.
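To give a flavour of what "carefully constructed numerical test cases" means here, the following is a minimal sketch (not the talk's actual test suite) of inputs chosen to expose two of the features mentioned above, subnormal support and rounding. NumPy's IEEE 754-compliant binary16 (float16) on the CPU stands in for the hardware unit under test; on real tensor cores the same inputs would be fed through an MMA operation instead.

```python
import numpy as np

# Subnormal support probe: 2^-12 is a normal binary16 value, and the
# exact product 2^-12 * 2^-12 = 2^-24 is the smallest positive binary16
# subnormal. A unit that flushes subnormal outputs to zero returns 0
# here; a unit with full subnormal support returns 2^-24.
prod = np.float16(2.0**-12) * np.float16(2.0**-12)

# Rounding probe: both operands are exact in binary16, and the exact
# sum 1 + 2^-11 + 2^-13 lies strictly between the adjacent binary16
# values 1 and 1 + 2^-10, closer to the latter. Round-to-nearest
# returns 1 + 2^-10; a unit that truncates returns 1.
s = np.float16(1.0) + np.float16(2.0**-11 + 2.0**-13)

print(float(prod))  # nonzero iff subnormal results are kept
print(float(s))     # 1 + 2^-10 under round-to-nearest; 1 under truncation
```

Because every operand and every reference result is exactly representable, any deviation from the expected outputs isolates a specific hardware feature rather than ordinary rounding error.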

Location: Zoom videoconferencing