Exploring Data Migrations for Deep-Memory Many-Core Systems

Loïc Pottier, École Normale Supérieure de Lyon (France)
Raphaël Jakse, Université Grenoble-Alpes (France)
Seminar

Abstract - Loïc Pottier
Nowadays, more and more supercomputers use many-core processors such as the Intel Xeon Phi. The goal of this work is to explore data migration problems in deep-memory many-core architectures. Such systems add a new level to the memory hierarchy: a high-bandwidth Multi-Channel DRAM (MCDRAM). We focus only on Intel's Knights Landing (KNL).

Intel provides two major modes for handling this memory: (i) cache mode and (ii) flat mode. In cache mode, the KNL treats the MCDRAM as a large, direct-mapped last-level cache. In flat mode, the MCDRAM is managed explicitly by the programmer: it is exposed as a new, high-bandwidth addressable memory space.
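To make the "direct mapping" of cache mode concrete, here is a minimal sketch of a direct-mapped cache in Python. The sizes and the function names (`slot_and_tag`, `simulate`) are assumptions chosen for illustration and do not reflect the KNL's actual parameters. The point it shows is that each memory line can live in exactly one cache slot, so two lines that map to the same slot keep evicting each other.

```python
# Toy direct-mapped cache. Sizes are hypothetical, not KNL's real parameters.
LINE_SIZE = 64   # bytes per cache line
NUM_SLOTS = 4    # number of lines the toy cache can hold


def slot_and_tag(address):
    """Split a byte address into (slot index, tag)."""
    line = address // LINE_SIZE
    return line % NUM_SLOTS, line // NUM_SLOTS


def simulate(addresses):
    """Return (hits, misses) for a sequence of byte addresses."""
    slots = [None] * NUM_SLOTS  # tag currently stored in each slot
    hits = misses = 0
    for address in addresses:
        slot, tag = slot_and_tag(address)
        if slots[slot] == tag:
            hits += 1
        else:
            misses += 1
            slots[slot] = tag  # the previous occupant is evicted
    return hits, misses
```

With these toy sizes, addresses 0 and 256 map to the same slot, so alternating between them never hits, whereas alternating between addresses 0 and 64 hits after the first two accesses.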

While Intel promotes cache mode, we think that flat mode could be much more interesting in some cases. Our goal is to show, either theoretically or experimentally, that flat mode can achieve better performance on particular workloads. We want to model the impact of data migrations on scheduling, and to do so we use a DAG-based model. This is ongoing work.

Bio - Loïc Pottier
Loïc Pottier is a first-year PhD student at École Normale Supérieure de Lyon (France) under the supervision of Anne Benoit and Yves Robert. His main research interests are co-scheduling, resilience, and data management.

Abstract - Raphaël Jakse
Context: In addition to a traditional (large) DDR4 memory, Knights Landing (KNL) machines have a smaller but faster on-package High Bandwidth Memory (HBM) called Multi-Channel DRAM (MCDRAM). For programs whose data do not fit in the HBM, strategies need to be found to use this new architecture efficiently. One idea is to determine how these programs use memory and to migrate data between memories at runtime so that, as far as possible, frequently accessed data is stored in the fastest memory at all times.
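As a rough illustration of the placement idea, the sketch below picks which pages to keep in fast memory from a list of observed page accesses. It is a hypothetical greedy policy (keep the most frequently accessed pages), not the strategy of this work, which is yet to be developed; the name `choose_fast_pages` and its parameters are assumptions.

```python
from collections import Counter


def choose_fast_pages(accessed_pages, fast_capacity):
    """Return the set of pages to place in fast memory.

    accessed_pages: iterable of page numbers, one entry per observed access.
    fast_capacity:  number of pages that fit in fast memory (hypothetical).
    """
    counts = Counter(accessed_pages)
    # Most frequently accessed pages first; ties broken by page number.
    ranked = sorted(counts, key=lambda page: (-counts[page], page))
    return set(ranked[:fast_capacity])
```

A runtime policy would re-evaluate this choice periodically and migrate only the pages whose placement changed.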

In this work, we built a set of tools, the Round Table Memory Tracer, to instrument programs, obtain a trace from this instrumentation, and add information to this trace. The tracer is written in C and can be used to analyze memory usage patterns and build a strategy specific to the instrumented program. Each piece of information in the trace is added by an independent part of the Round Table Memory Tracer, called a tool. These tools communicate with each other through a ring buffer in shared memory.
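For readers unfamiliar with ring buffers, the following is a minimal single-producer, single-consumer sketch in Python. It only illustrates the data structure: the actual tracer is written in C and uses shared memory, and the class and method names here are assumptions, not the tracer's API.

```python
class RingBuffer:
    """Fixed-capacity FIFO stored in a circular array (illustrative only)."""

    def __init__(self, capacity):
        self._slots = [None] * capacity
        self._capacity = capacity
        self._head = 0  # total number of items consumed so far
        self._tail = 0  # total number of items produced so far

    def push(self, item):
        """Append an item; return False if the buffer is full."""
        if self._tail - self._head == self._capacity:
            return False
        self._slots[self._tail % self._capacity] = item
        self._tail += 1
        return True

    def pop(self):
        """Remove and return the oldest item, or None if the buffer is empty."""
        if self._head == self._tail:
            return None
        item = self._slots[self._head % self._capacity]
        self._head += 1
        return item
```

Because the producer only advances the tail and the consumer only advances the head, this layout suits one producer and one consumer; supporting several producers requires extra coordination.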

So far, the tracer can instrument the execution of a program using the dynamic binary instrumentation provided by DynamoRIO (which is similar to Intel Pin) and trace each memory access. One tool adds blocking, that is, it maps each memory access to a group number (so accesses can be grouped by cache line or memory page). Another tool maps each memory access to the name of the function in which it happened, using symbol information given by DynamoRIO, and computes a reuse distance histogram.
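The two notions above, blocking and reuse distance, can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the tracer's implementation: the function names are hypothetical, and the reuse distance of an access is taken to be the number of distinct other blocks accessed since the previous access to the same block (first accesses are recorded under the key `None`).

```python
from collections import Counter


def block_of(address, block_size=64):
    """Map a byte address to its group number (e.g. cache line or page)."""
    return address // block_size


def reuse_distance_histogram(addresses, block_size=64):
    """Count how many accesses have each reuse distance."""
    stack = []  # distinct blocks, least recently used first
    histogram = Counter()
    for address in addresses:
        block = block_of(address, block_size)
        if block in stack:
            position = stack.index(block)
            # Blocks after `position` were accessed since the last use.
            distance = len(stack) - 1 - position
            stack.pop(position)
        else:
            distance = None  # first access to this block
        histogram[distance] += 1
        stack.append(block)
    return histogram
```

Passing `block_size=4096` groups accesses by 4 KiB page instead of by 64-byte cache line. The list-based stack is quadratic in the worst case, which is acceptable for a sketch but not for large traces.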

Future: The long-term intent of the Round Table Memory Tracer is to help develop memory placement and migration strategies for memory-intensive tasks on architectures involving different kinds or levels of memory. These strategies are yet to be developed. In the shorter term, some work remains to improve the performance of the ring buffer implementation, especially with more than one producer thread. Also, instrumentation as currently done is (inherently) slow. However, the tracer has been designed so that other means of instrumentation and other pieces of information can easily be added to the trace. One planned addition is instrumentation using the counters provided by recent CPUs, which can give precise information about memory usage. Other uses for the Round Table Memory Tracer can also be found, as it is not in any way specific to the KNL.

Bio - Raphaël Jakse
Raphaël Jakse has just completed his Master's degree at Université Grenoble-Alpes (France). He will begin a PhD in October on runtime verification of component-based systems described in the Behavior, Interaction, Priority (BIP) framework.