Dynamic Random Access Memory (DRAM) is the dominant storage technology for main memory. DRAM is susceptible to cosmic radiation, alpha particles, voltage variations, and aging, any of which can cause bit faults, word faults, and even faults of an entire chip. As desktops, servers, and supercomputers require more main memory, the likelihood of encountering DRAM faults also increases.
Redundant Array of Independent Memories (RAIM) is an approach to protecting main memory against such faults. By adding parity channels to the main memory, RAIM can tolerate the failure of an entire channel, which makes the memory system considerably more robust than one protected by ECC alone. In this seminar we examine the impact that RAIM schemes have on system performance and energy, and we propose a new RAIM implementation that reduces on- and off-chip traffic and energy consumption. Compared to contemporary RAIM implementations, our proposed scheme can reduce traffic and energy consumption by more than 2x.
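To illustrate the idea of channel-level parity, the following is a minimal sketch, assuming a single XOR parity channel in the style of RAID. It is not the scheme proposed in the seminar, and production RAIM designs combine parity with per-channel ECC. The function names `xor_parity` and `recover_channel`, the four-channel layout, and the sample bytes are hypothetical and chosen only for illustration.

```python
from functools import reduce

def xor_parity(channels):
    """Return the byte-wise XOR of equally sized channel blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*channels))

def recover_channel(surviving, parity):
    """Rebuild the single missing data channel from the survivors and the parity block."""
    # XOR of all surviving blocks and the parity cancels everything but the lost block.
    return xor_parity(list(surviving) + [parity])

# Hypothetical example: four data channels, each holding a two-byte block.
data_channels = [b"\x12\x34", b"\xab\xcd", b"\x0f\xf0", b"\x55\xaa"]
parity = xor_parity(data_channels)  # contents of the extra parity channel

failed = 2  # assume channel 2 fails entirely
surviving = [blk for i, blk in enumerate(data_channels) if i != failed]
rebuilt = recover_channel(surviving, parity)
```

Because every read in this simple layout touches all data channels plus the parity channel, it also hints at why RAIM adds traffic and energy overhead, which is the cost the seminar sets out to measure and reduce.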
The seminar will be streamed; see details at https://anlpress.cels.anl.gov/mcs-streaming-seminars.