
In this session, we will learn how to optimize code for performance portability, so that we get the best performance whether the code is offloaded to a GPU or runs on a CPU. We will use Intel VTune Profiler and Intel Advisor Roofline analysis to examine the effect of various optimizations.
Agenda
- Optimize the matrix multiplication code for performance portability across CPU and GPU offload - 30 min
- Intel VTune profiling - 20 min
- Intel Advisor Roofline analysis - 30 min
This module is a part of the Aurora Learning Paths Series.
About the Presenter
Rakshith Krishnappa is a developer evangelist at Intel, focused on oneAPI, DPC++, and high-performance computing. For the last 16 years, he has worked on various Intel products, including CPUs, graphics, GPUs, HPC products, and software solutions.
