Performance, Portability, and Productivity
Developers of high-performance computing applications face an increasingly diverse range of computing platforms featuring multiple generations of CPUs, GPUs, FPGAs, and ASICs. Developing code that is both performant and portable across such a set of platforms is expensive: much of the cost is the time spent tuning for the best result on each platform.
This hands-on learning path explores the use of oneAPI and Data Parallel C++ (DPC++) to demonstrate one method of achieving performant, portable code across five different platforms available on the Intel DevCloud.
We will define Performance, Portability and Productivity as:
- Performance: A measurable quantity representing a characteristic of an application run on a platform, typically throughput or time-to-solution. It is often expressed as a percentage of the platform's peak performance.
- Portability: The ability of an application to run correctly on different platforms.
- Productivity: The ratio of application code output to invested development effort, where application code output represents features, optimizations, maintenance, etc.
Tying it together, we will use the following definition: "An application is performance portable if it achieves a consistent ratio of the actual time to solution to either the best-known or the theoretical best time to solution on each platform, with minimal platform-specific code required."
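To make the definition concrete, here is a minimal sketch of how the per-platform ratio could be computed and compared. The `PlatformTiming` struct, its field names, and the `ratio_spread` consistency measure are assumptions made for illustration; the learning path itself may use a different measure.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical record for one platform; names are illustrative only.
struct PlatformTiming {
    std::string platform;   // placeholder label for the platform
    double actual_seconds;  // measured time to solution
    double best_seconds;    // best-known or theoretical best time to solution
};

// Ratio from the definition: actual time / best time (1.0 is ideal).
double time_ratio(const PlatformTiming& t) {
    return t.actual_seconds / t.best_seconds;
}

// One possible consistency measure (an assumption, not from the source):
// largest ratio divided by smallest ratio across platforms.
// A value of 1.0 means the ratio is identical on every platform.
double ratio_spread(const std::vector<PlatformTiming>& timings) {
    if (timings.empty()) return 1.0;
    double lo = time_ratio(timings.front());
    double hi = lo;
    for (const auto& t : timings) {
        const double r = time_ratio(t);
        lo = std::min(lo, r);
        hi = std::max(hi, r);
    }
    return hi / lo;
}
```

With three platforms whose ratios are 2.0, 2.0, and 3.0, `ratio_spread` returns 1.5, which indicates that one platform is noticeably further from its best time than the others.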
To demonstrate this, we will explore several GEMM examples using DPC++ and introduce several techniques for measuring how effective the applications are across the platforms. We will use timer functions inside the applications to measure kernel and compute times, and use this information to determine each application's efficiency relative to a best implementation. In addition, we will use roofline analysis and VTune to measure the applications' performance across the represented platforms.
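The timing approach can be sketched in plain C++ as follows. This is a host-only stand-in, not the DPC++ code used in the learning path: `gemm_naive`, `time_seconds`, and `relative_efficiency` are hypothetical helper names, and in a DPC++ program the timed region would instead wrap the kernel submission and its completion.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Naive row-major GEMM: C = A * B, where all matrices are n x n.
void gemm_naive(const std::vector<float>& a, const std::vector<float>& b,
                std::vector<float>& c, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            float sum = 0.0f;
            for (std::size_t k = 0; k < n; ++k) {
                sum += a[i * n + k] * b[k * n + j];
            }
            c[i * n + j] = sum;
        }
    }
}

// Runs `work` once and returns the elapsed wall-clock time in seconds.
template <typename F>
double time_seconds(F&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Efficiency relative to a reference ("best") implementation:
// best time / measured time, so 1.0 means matching the reference.
double relative_efficiency(double measured_seconds, double best_seconds) {
    return best_seconds / measured_seconds;
}
```

A caller would time the kernel with something like `time_seconds([&] { gemm_naive(a, b, c, n); })` and pass the result, together with the time of the best available implementation on the same platform, to `relative_efficiency`. A single run is noisy, so in practice several runs are usually timed and the minimum or median is reported.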
This learning path is neither an exhaustive optimization exercise nor a benchmarking effort. Rather, we will focus on using DPC++ as a method for heterogeneous programming that lets developers run their code across CPUs and GPUs with minimal changes to the source. We will explore some techniques to improve the performance of the GEMM examples across platforms and introduce tools for gaining insight into an application.
About the Speakers
Rakshith Krishnappa is a developer evangelist at Intel, focused on oneAPI, DPC++ and High-Performance Computing. For the last 16 years he has worked on various Intel products including CPUs, Graphics, GPUs, HPC products and Software solutions.
Ben Odom is a developer evangelist at Intel, focused on highlighting and showcasing Intel products and tools and on training developers worldwide. Currently Ben is working to develop coursework for oneAPI and Data Parallel C++ (DPC++). Ben has been in the tech industry for over 20 years and has worked on delivering coursework and solutions for Intel’s developer ecosystem on Artificial Intelligence, IoT, and Android.