Characterizing and Optimizing User-Level Threads for Extremely Fine-Grained Concurrency

Event Sponsor: Mathematics and Computing Science Seminar
Start Date: Aug 21 2015 - 10:30am
Building 240/Room 4301
Argonne National Laboratory
Alex Brooks
Speaker(s) Title: University of Illinois at Urbana-Champaign
Abdelhalim Amer

Parallel applications and programming models increasingly rely on threading models to share node resources and to cope with growing core density and scarce per-core resources. To achieve high scalability on such systems, exposing massive concurrency is essential to fully utilize the computational resources and to hide communication and data movement latencies (memory accesses, network communication, I/O, etc.). Unfortunately, kernel threads are heavyweight because of their high scheduling costs. As a result, user-level threading (ULT) runtimes are becoming popular as a lightweight alternative to kernel threads. Despite their advantages, however, ULTs can still incur expensive scheduling costs in extremely fine-grained execution environments. Although a large body of work has been dedicated to ULTs, no lower bounds on those scheduling costs have been established in the context of modern multi- and many-core systems. In addition, it is unclear which features a ULT runtime must expose to applications and programming models in order to approach these bounds without losing required functionality.

In this work, we perform an in-depth characterization of the Argobots ULT runtime and derive lower bounds on context-switching costs, which constitute a large portion of scheduling overheads. In addition, we show that a minimalistic runtime using only the subset of Argobots features that is sufficient for certain execution environments can reduce the scheduling overhead by up to 67%. Furthermore, we demonstrate how exposing a thread-to-thread context-switching ability to the user can further reduce scheduling costs.

Miscellaneous Information: 

Alex Brooks is a third-year Computer Science Ph.D. student at the University of Illinois at Urbana-Champaign under the supervision of Professor Marc Snir. During his first two years in the program, Alex played a principal role in the design and development of a new parallel runtime system for distributed-memory machines. His research interests revolve around parallel runtime systems, and more specifically the interaction between communication runtimes and threading models.