10x10: A Customized Approach to Energy-Efficient Execution

Apala Guha
Seminar

In this talk, I will start by giving some background on my past research projects and how they connect to my current research. I will focus on the dynamic binary translation project that I worked on for my PhD thesis. The rest of the talk will be about my current research project, “10x10”. 10x10 is motivated by two current trends in computer architecture: a power wall that limits the power processors can consume, and the characteristics of transistor scaling. With every generation, transistor area decreases by a factor of two, which allows more micro-architectural features to be packed into the same chip area and makes signal routing faster. However, power consumption per transistor is decreasing at a slower rate, so the power consumed per unit of chip area increases. As a result, not all transistors on a chip can be switched at full frequency at the same time without hitting the power wall. This fact has led to the adoption of multi-cores (operated at relatively low frequency) and many-cores (e.g., GPUs). However, even these architectures fall short of exascale-era computing requirements because their energy efficiency does not scale sufficiently. Our solution for power and performance scaling stems from the insight that tailoring architectures to their target application set can lead to large improvements. Homogeneous cores waste energy powering logic and data movement that are not essential to the computation. Therefore, we are designing customized architectures for general-purpose workloads. We envision a chip that is an ensemble of heterogeneous cores, each tailored to certain application characteristics. These characteristics may include parallelism type, memory access pattern, communication pattern, compute intensity, and operation and datatype mix.

This research faces three fundamental challenges. The first is that general-purpose workloads span a large set of applications. We need to show that these workloads can be grouped based on architecturally exploitable features, and that the resulting groups are manageable in number, architecturally coherent, and distinct from one another. The second challenge is to design heterogeneous cores that target these application features, subject to constraints on chip area and on data movement between cores. The final challenge is programming for heterogeneous systems. Programming for multi-cores and many-cores is still viewed as a complex problem, and heterogeneous cores take it one level further: it is now necessary to first identify which core a piece of code should target before optimizing and generating code for it. I will present in-depth research on the first challenge and discuss the approaches that we are pursuing for the other two.