Exploring the Application-Specific Optimization Opportunities to Boost the Utilization of HPC Platforms' Capacity

Xiaodong Yu
Seminar

Modern HPC platforms are becoming increasingly complex and powerful. On such systems, however, coarse-grained parallelizations and implementations of specific applications rarely achieve the advertised peak efficiency. Moreover, emerging architectures are often slow to gain adoption because of insufficient programming interface support. Unleashing the power of existing HPC platforms is therefore as crucial as proposing new technologies and hardware. In this presentation, I will demonstrate how to explore architecture-aware, application-specific optimization opportunities and develop code-generating frameworks on various HPC platforms, and I will highlight a few of my contributions to GPU-based application acceleration. In our CT image reconstruction work, we leverage the application’s symmetry characteristics and sparsity pattern to further compress the matrix data and optimize data access for both SpMV and SpMV_T. Our design achieves up to 7.2x speedup over GPU counterparts. In our convolutional neural network work, we explore how different tensor data layouts impact performance and develop GPU primitives for different tensor configurations. I will also describe our development of a code-generating framework for the emerging Automata Processor that addresses its programmability and scalability issues.
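For readers unfamiliar with the two kernels named above, the following is a minimal sketch of SpMV (y = A x) and SpMV_T (y = A^T x) computed from a single stored copy of the matrix. It assumes the standard CSR format and is not the compressed, symmetry-aware format from the talk; the function names `spmv_csr` and `spmv_t_csr` and the small example matrix are hypothetical and chosen only for illustration. The point it shows is that the forward product gathers from x row by row, while the transposed product scatters into y, so the two kernels have different memory access patterns over the same data.

```python
def spmv_csr(row_ptr, col_idx, vals, x):
    """Compute y = A @ x for a matrix A stored in CSR form (gather per row)."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        # Nonzeros of row i live in vals[row_ptr[i]:row_ptr[i + 1]].
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += vals[k] * x[col_idx[k]]
    return y


def spmv_t_csr(row_ptr, col_idx, vals, x, n_cols):
    """Compute y = A.T @ x from the same CSR arrays (scatter per nonzero)."""
    y = [0.0] * n_cols
    for i in range(len(row_ptr) - 1):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # Entry A[i, col_idx[k]] contributes to output element col_idx[k].
            y[col_idx[k]] += vals[k] * x[i]
    return y


# Hypothetical 2x3 example matrix:
# A = [[1, 0, 2],
#      [0, 3, 0]]
ROW_PTR = [0, 2, 3]
COL_IDX = [0, 2, 1]
VALS = [1.0, 2.0, 3.0]
```

With this example, `spmv_csr(ROW_PTR, COL_IDX, VALS, [1.0, 1.0, 1.0])` multiplies by A, and `spmv_t_csr(ROW_PTR, COL_IDX, VALS, [1.0, 2.0], 3)` multiplies by its transpose without storing a second copy of the matrix.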