Using Runtime Energy Optimizations to Improve Energy Efficiency in High Performance Computing

Sridutt Bhalachandra
Seminar

Abstract:
Energy efficiency in high performance computing (HPC) will be critical to limiting operating costs and carbon footprints in future supercomputing centers. In the push to achieve Exascale performance, a commensurate increase in power is no longer feasible. With both hardware and software factors affecting energy usage, there is a need for dynamic power regulation to achieve energy savings. We identify two opportunities to improve energy efficiency: computational workload imbalance and waiting on a resource, most often memory. In modern HPC systems, power and thermal constraints affect each chip differently, so the on-chip mechanisms that control operating frequency also vary; performance therefore varies between cores even for perfectly balanced parallel applications. Memory operations in HPC applications are seldom explicit, making it difficult for the operating system to stall (or switch off) cores and reduce power while waiting on memory; the CPU remains active, wasting energy.

We also investigate the effect of power limits enforced by external agents on application performance. This thesis differentiates itself from prior work by employing adaptive runtime methods and processor power-control levers that have not been readily applied to the two scenarios above. The dissertation presents an adaptive runtime framework that allows processors capable of per-core power control to reduce power with little performance impact by dynamically adapting to workload characteristics. Different core-specific power controls can be employed either separately or in combination to enhance the effectiveness of the framework. Performance monitoring and power regulation are performed transparently within the MPI runtime system, so no changes are required in the application code. In the presence of workload imbalance, the runtime reduces the frequency of cores that are not on the critical path, thereby reducing power without degrading performance. Lowering the frequency of the non-critical cores is shown to reduce run-to-run performance variation and, in certain scenarios, to improve performance on both conventional and power-limited systems. For applications plagued by memory-related issues, we identify new memory metrics that facilitate lowering power without adversely impacting performance.
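
To make the idea of transparent runtime interception concrete, a minimal sketch of one way such a scheme could be structured is given below. This is an illustration only, not the framework presented in the talk: it assumes each MPI rank is pinned to a core, that the Linux cpufreq userspace governor exposes the sysfs file shown, and the slack threshold and frequency values are arbitrary placeholders.

/*
 * Minimal sketch (not the author's implementation): a PMPI wrapper that
 * estimates per-rank slack at a collective and lowers the local core's
 * frequency when the rank is consistently off the critical path.
 * The sysfs path, threshold, and frequencies are illustrative assumptions;
 * real per-core control could instead use MSR-based or vendor-specific levers.
 */
#define _GNU_SOURCE
#include <mpi.h>
#include <sched.h>
#include <stdio.h>

/* Hypothetical helper: write a target frequency (kHz) for the core this
 * rank is pinned to, via the cpufreq userspace governor. */
static void set_core_freq_khz(long khz)
{
    char path[128];
    snprintf(path, sizeof(path),
             "/sys/devices/system/cpu/cpu%d/cpufreq/scaling_setspeed",
             sched_getcpu());
    FILE *f = fopen(path, "w");
    if (f) { fprintf(f, "%ld", khz); fclose(f); }
}

/* Intercept MPI_Barrier through the standard PMPI profiling interface:
 * measure compute time since the last barrier and wait time at this one,
 * then adapt the local core frequency. No application changes needed. */
int MPI_Barrier(MPI_Comm comm)
{
    static double last_exit = 0.0;
    double arrive  = MPI_Wtime();
    double compute = (last_exit > 0.0) ? arrive - last_exit : 0.0;

    int rc = PMPI_Barrier(comm);          /* the actual collective */

    double slack = MPI_Wtime() - arrive;  /* time spent waiting */
    /* If this rank spent a large fraction of the interval waiting, it is
     * off the critical path; drop its frequency (example values). */
    if (compute > 0.0 && slack > 0.2 * (compute + slack))
        set_core_freq_khz(1200000);       /* e.g. 1.2 GHz */
    else
        set_core_freq_khz(2400000);       /* e.g. nominal 2.4 GHz */

    last_exit = MPI_Wtime();
    return rc;
}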

Speaker Bio:
Sridutt Bhalachandra is a Ph.D. student in the Computer Science department at UNC-Chapel Hill and a Research Assistant at Renaissance Computing Institute (RENCI). His research area is High Performance Computing (HPC) with a focus on energy efficiency; also he has spent some time looking at performance variability and reproducibility. His advisors are Dr. Allan Porterfield and Prof. Jan Prins. He is particularly interested in designing adaptive runtime energy optimization methods that do not degrade performance. He has interned at Sandia National Laboratories (Albuquerque) and Lawrence Livermore National Laboratory, and collaborated with the EEHPC Working Group. In future work, he is interested in leveraging his understanding of runtime systems and processor architectures to develop portable solutions that improve the efficiency of HPC systems. Previously, he has worked as a Systems Engineer at Infosys Labs, Bangalore. Sridutt has a Masters in Computer Science from UNC Chapel Hill and Bachelors in Computer Science Engineering from SDM College of Engineering & Technology, Dharwad under Visveswaraya Technological University, Belgaum. You can reach Sridutt by email at sriduttb@{cs.unc.edu, renci.org} or visit his website at www.cs.unc.edu/~sriduttb.