Evaluating and Utilizing Compute Capabilities of Parallel CPU-GPU Architectures

Lukasz G. Szafaryn
Seminar

The introduction of architectures such as multi-core CPUs, GPUs, accelerators and reconfigurable logic no longer restricts parallel computation to large-scale systems, but enables it at the level of a single node. In particular, GPUs provide significant computational capability in a single machine. It is therefore important to use benchmarks that can evaluate the relative performance of these architectures in exploiting different types of parallelism. This talk gives an overview of the Rodinia benchmark suite, developed at the University of Virginia, as the first comprehensive effort to implement a diverse set of codes written in C, OpenMP, CUDA and OpenCL for both CPUs and GPUs. The suite also includes a characterization of the benchmarks in terms of computation patterns, diversity and performance.
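To give a rough sense of what a paired CPU/GPU benchmark implementation involves, the sketch below expresses the same data-parallel computation once with OpenMP for a multi-core CPU and once as a CUDA kernel for a GPU. The vector-add kernel and the function names are hypothetical illustrations, not code taken from Rodinia itself.

#include <stddef.h>

/* CPU variant: a data-parallel loop annotated with OpenMP. */
void vec_add_omp(const float *a, const float *b, float *c, size_t n)
{
    #pragma omp parallel for
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}

/* GPU variant: the same computation as a CUDA kernel, launched by host
 * code with one thread per element (grid/block sizes chosen at launch). */
__global__ void vec_add_cuda(const float *a, const float *b, float *c, size_t n)
{
    size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        c[i] = a[i] + b[i];
}

Benchmarks of this kind expose how the same parallelism maps differently onto coarse-grained CPU threads and fine-grained GPU thread blocks, which is what makes cross-architecture comparison meaningful.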

Enabling better access to the large-scale computational capability provided by a parallel architecture such as a GPU requires simplifications in the programming paradigm, presumably in the direction of user-friendly, high-level, directive-based languages. This talk describes Trellis, a framework developed in collaboration with Lawrence Livermore National Laboratory that aims to maintain a common, high-level code base translated at the back end to OpenMP and OpenACC for CPU and GPU execution, respectively. The solution also tries to achieve performance portability by generating target CUDA code directly, avoiding OpenACC's inefficiencies in mapping parallel code structures to GPU hardware. The remainder of the talk briefly covers other research work and experience.
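As a minimal sketch of the single-source, directive-based idea, the fragment below keeps one loop in a common code base and selects the back end at build time. The annotation style and the TARGET_GPU macro are illustrative assumptions, not Trellis's actual syntax; the point is that one annotated loop can be lowered to OpenMP for the CPU and to OpenACC (or generated CUDA) for the GPU.

#include <stddef.h>

void scale(float *x, float alpha, size_t n)
{
#if defined(TARGET_GPU)
    /* GPU build: offload the loop with OpenACC; a framework like Trellis
     * may instead emit CUDA directly for a better mapping to the hardware. */
    #pragma acc parallel loop copy(x[0:n])
#else
    /* CPU build: run the same loop across cores with OpenMP. */
    #pragma omp parallel for
#endif
    for (size_t i = 0; i < n; i++)
        x[i] *= alpha;
}

Keeping the loop body identical in both builds is what allows a single code base to remain the source of truth while the translation layer handles the architecture-specific details.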