Evaluating and Utilizing Compute Capabilities of Parallel CPU-GPU Architectures

Lukasz G. Szafaryn
Seminar

The introduction of architectures such as multi-core CPUs, GPUs, accelerators and reconfigurable logic no longer restricts parallel computation to large-scale systems, but enables it at the level of a single node. In particular, GPUs provide significant computational capability in a single machine. It is therefore important to use benchmarks that can evaluate the relative performance of these architectures in exploiting different types of parallelism. This talk gives an overview of the Rodinia benchmark suite, developed at the University of Virginia, as the first comprehensive effort to implement a diverse set of codes written in C, OpenMP, CUDA and OpenCL for both CPUs and GPUs. The suite also includes a characterization of the benchmarks in terms of computation patterns, diversity and performance.
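To give a rough sense of what a paired CPU/GPU benchmark implementation involves, the sketch below expresses the same data-parallel computation once with OpenMP for a multi-core CPU and once as a CUDA kernel for a GPU. The vector-add kernel and the function names are hypothetical illustrations, not code taken from Rodinia itself.

#include <stddef.h>

/* CPU variant: a data-parallel loop annotated with OpenMP. */
void vec_add_omp(const float *a, const float *b, float *c, size_t n)
{
    #pragma omp parallel for
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}

/* GPU variant: the same computation as a CUDA kernel, launched by host
 * code with one thread per element (grid/block sizes chosen at launch). */
__global__ void vec_add_cuda(const float *a, const float *b, float *c, size_t n)
{
    size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        c[i] = a[i] + b[i];
}

Benchmarks of this kind expose how the same parallelism maps differently onto coarse-grained CPU threads and fine-grained GPU thread blocks, which is what makes cross-architecture comparison meaningful.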

Enabling better access to the large-scale computational capability provided by a parallel architecture such as a GPU requires simplifications in the programming paradigm, presumably in the direction of user-friendly, high-level, directive-based languages. This talk describes Trellis, a framework developed in collaboration with Lawrence Livermore National Laboratory that aims to maintain a common, high-level code base translated at the back end to OpenMP and OpenACC for CPU and GPU execution, respectively. The solution also tries to achieve performance portability by generating target CUDA code directly, avoiding OpenACC's inefficiencies in mapping parallel code structures to GPU hardware. The remainder of the talk briefly covers other research work and experience.
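As a minimal sketch of the single-source, directive-based idea, the fragment below keeps one loop in a common code base and selects the back end at build time. The annotation style and the TARGET_GPU macro are illustrative assumptions, not Trellis's actual syntax; the point is that one annotated loop can be lowered to OpenMP for the CPU and to OpenACC (or generated CUDA) for the GPU.

#include <stddef.h>

void scale(float *x, float alpha, size_t n)
{
#if defined(TARGET_GPU)
    /* GPU build: offload the loop with OpenACC; a framework like Trellis
     * may instead emit CUDA directly for a better mapping to the hardware. */
    #pragma acc parallel loop copy(x[0:n])
#else
    /* CPU build: run the same loop across cores with OpenMP. */
    #pragma omp parallel for
#endif
    for (size_t i = 0; i < n; i++)
        x[i] *= alpha;
}

Keeping the loop body identical in both builds is what allows a single code base to remain the source of truth while the translation layer handles the architecture-specific details.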