BG/Q DGEMM Performance

The table below shows the percentage of peak performance achieved by the BLAS3 matrix-matrix multiply routine dgemm, as implemented in IBM’s ESSL library, on a single BG/Q Power A2 core. The library is threaded internally, and users should take advantage of this threading to get better performance from the Power A2 core. Although the core can issue only two instructions per cycle, each from a different hardware thread, the implementation benefits from using all four hardware threads simultaneously.

The data show that a single Power A2 core comes close to its peak performance of 12.8 GF/s.
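
As a rough illustration of how a percent-of-peak figure for dgemm can be computed, the Python sketch below counts 2·m·n·k floating-point operations for an m×k by k×n multiply and divides the achieved rate by the per-core peak. The hardware parameters (a 1.6 GHz clock and a 4-wide fused multiply-add unit) are assumptions that are not stated in this text; they are used because they reproduce the 12.8 GF/s peak quoted above. The function names and the timing in the usage lines are hypothetical and are not measured data.

```python
# Sketch: percent of peak for a dgemm call on one core.
# ASSUMPTIONS (not from the text): 1.6 GHz clock, 4-wide SIMD unit,
# one fused multiply-add (2 flops) per SIMD lane per cycle.
CLOCK_HZ = 1.6e9
SIMD_WIDTH = 4
FLOPS_PER_FMA = 2


def peak_gflops(clock_hz=CLOCK_HZ, simd_width=SIMD_WIDTH,
                flops_per_fma=FLOPS_PER_FMA):
    """Theoretical per-core peak in GF/s under the assumptions above."""
    return clock_hz * simd_width * flops_per_fma / 1e9


def dgemm_flops(m, n, k):
    """Floating-point operations in C = A*B with A (m x k) and B (k x n)."""
    return 2.0 * m * n * k


def percent_of_peak(m, n, k, seconds, peak=None):
    """Achieved dgemm rate as a percentage of the per-core peak."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    if peak is None:
        peak = peak_gflops()
    achieved_gflops = dgemm_flops(m, n, k) / seconds / 1e9
    return 100.0 * achieved_gflops / peak


if __name__ == "__main__":
    # Hypothetical timing, for illustration only.
    print(f"peak = {peak_gflops():.1f} GF/s")
    print(f"percent of peak = {percent_of_peak(1000, 1000, 1000, 0.2):.1f}%")
```

The wall-clock time passed to `percent_of_peak` would come from timing the actual dgemm call; the sketch only covers the arithmetic that turns that time into a percentage.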
