Balancing Performance and Energy in Computing Systems

Connor Imes
Seminar

Modern computing systems must meet multiple, often conflicting, goals, e.g., delivering high performance while respecting strict power constraints. To help meet such goals, systems expose knobs for tuning resources, like processor frequency or core allocation, that have a quantifiable impact on application performance and system power consumption. The optimal resource settings depend on both the application and the system, and often change during execution as applications progress through different processing phases. Requiring application developers to accurately tune these knobs for every workload and system is unrealistic, necessitating more general solutions capable of managing the diversity of both hardware and software.

We first address the problem of meeting application performance goals while minimizing energy consumption with POET, the Performance with Optimal Energy Toolkit. POET leverages control theory, which provides a formal framework for reasoning about dynamic systems, including convergence guarantees and robustness to model inaccuracies. In contrast, commonly used heuristic techniques provide no such guarantees, nor are they always portable. POET’s general design keeps it portable across applications and systems: it operates independently of particular knob types and their allowable settings. Building on this generality, we also demonstrate optimizing application performance under system power constraints.
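To make the control-theoretic idea concrete, the following is a minimal illustrative sketch, not POET’s actual controller: a simple integral feedback loop that nudges an abstract "speedup" knob until measured performance tracks a target rate. The class name, gain parameter, and rate units are invented for this example; the real toolkit manages discrete knob configurations and adapts its model online.

```python
# Hypothetical sketch of feedback control for a performance target.
# NOT POET's implementation; names and parameters are illustrative.

class PerfController:
    """Adjust an abstract 'speedup' knob so that measured performance
    converges to a target rate, despite model inaccuracy."""

    def __init__(self, target_rate, base_rate, gain=0.5):
        self.target = target_rate  # desired rate (e.g., iterations/sec)
        self.base = base_rate      # estimated rate at speedup = 1.0
        self.gain = gain           # controller gain, 0 < gain <= 1
        self.speedup = 1.0         # current knob setting

    def update(self, measured_rate):
        # Integral control step: accumulate a correction proportional
        # to the current error, normalized by the base rate estimate.
        error = self.target - measured_rate
        self.speedup += self.gain * error / self.base
        self.speedup = max(self.speedup, 0.1)  # keep knob in a valid range
        return self.speedup
```

Because the correction is driven by measured error rather than an open-loop model, the loop still converges when the base-rate estimate is wrong; this robustness to model inaccuracy is the property the abstract attributes to control-theoretic designs.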

We then address the problem of optimizing energy efficiency to minimize the cost of running applications. We use machine learning classifiers, driven by low-level hardware performance counters, to predict the most energy-efficient knob settings at runtime based on current resource utilization. We evaluate this approach in the High Performance Computing domain, trading performance for energy savings more aggressively than has historically been done and thereby reducing the cost of scientific insight. Extrapolating from empirical single-node performance and power results, we project that scaling the solution to hardware-over-provisioned, power-constrained clusters could increase total cluster throughput by up to 24%.
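The classification idea can be sketched as follows. This toy example, with invented counter features, training points, and frequency labels, uses nearest-neighbor matching to predict an energy-efficient CPU frequency from a workload's counter profile; the real system would train its classifiers offline on profiled applications.

```python
# Hypothetical sketch: predict an energy-efficient CPU frequency (GHz)
# from hardware performance-counter features via 1-nearest-neighbor.
# All data below is invented for illustration.

import math

# Feature vector: (instructions-per-cycle, memory-accesses-per-instruction)
# Label: frequency measured to minimize energy for similar workloads.
TRAINING = [
    ((2.1, 0.01), 3.0),  # compute-bound phase: high frequency pays off
    ((1.8, 0.02), 3.0),
    ((0.6, 0.30), 1.2),  # memory-bound phase: lower frequency saves energy
    ((0.4, 0.35), 1.2),
    ((1.1, 0.12), 2.0),  # mixed phase
]

def predict_frequency(features):
    """Return the frequency label of the nearest training sample."""
    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])
    _, freq = min((dist(features, f), label) for f, label in TRAINING)
    return freq
```

A runtime would periodically read the counters, call the classifier, and apply the predicted setting, reacting to application phase changes; compute-bound phases keep the processor fast, while memory-bound phases are slowed with little performance loss.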

Both projects dynamically adapt to changing application and system behavior at runtime, and are thus better able to provide the desired balance of performance and energy consumption than commonly used static resource scheduling techniques. Furthermore, their designs are independent of particular applications and systems, making them portable to a wide range of computing platforms.