Co-Design of Next Generation Storage System

Ning Liu
Seminar

As supercomputers evolve toward exascale, storage subsystems face increasingly pressing challenges. On the one hand, application scientists face continued pressure to minimize their interactions with the I/O system, a situation that is likely to result in missed discoveries. On the other hand, storage system architects must strike a balance between a cost-effective, green storage system and a reliable, resilient, and powerful system capable of handling skyrocketing parallelism and extreme bursts of I/O activity.

In this talk, I will discuss several aspects of the co-design of next-generation storage systems. First, I will present an exascale communication network model and its simulation. This model is validated using Little's Law and a series of P2P communication tests on the BG/L platform. We scaled the experiments up to 128K cores on Intrepid, the BG/P system. Then I will discuss the Co-Design of Exascale Storage System (CODES) framework for evaluating exascale storage system design points. We compared the results from our storage system simulator against measurements from the Intrepid storage system to verify the simulator. I will further describe enhancements to the storage system simulator that enable burst buffer simulations. We show that burst buffers can increase the application-perceived throughput to the external storage system and can reduce the amount of external storage bandwidth required to meet a desired application-perceived throughput goal. The models and simulations are built on a parallel discrete-event simulation platform, the Rensselaer Optimistic Simulation System (ROSS).
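
As a rough illustration of how Little's Law can serve as a sanity check on a queueing simulation, the sketch below simulates a single-server FIFO queue and compares the time-averaged number of packets in the system, L, with the product of the observed arrival rate and the mean time in system, λW. This is only a toy model under assumed exponential arrival and service times. It is not the network model or the ROSS-based simulator described in the talk, and the names simulate_fifo_queue and little_law_terms are hypothetical.

```python
import random


def simulate_fifo_queue(n_packets, arrival_rate, service_rate, seed=0):
    """Simulate a single-server FIFO queue (assumed exponential times).

    Returns the arrival and departure timestamps of every packet.
    """
    rng = random.Random(seed)
    now = 0.0
    server_free_at = 0.0
    arrivals, departures = [], []
    for _ in range(n_packets):
        now += rng.expovariate(arrival_rate)      # next arrival time
        start = max(now, server_free_at)          # wait if the server is busy
        server_free_at = start + rng.expovariate(service_rate)
        arrivals.append(now)
        departures.append(server_free_at)
    return arrivals, departures


def little_law_terms(arrivals, departures):
    """Return (L, lam, W) measured over [0, last departure]."""
    horizon = departures[-1]  # FIFO order: the last departure is the latest
    # Integrate the number of packets in the system over time.
    events = sorted([(a, 1) for a in arrivals] + [(d, -1) for d in departures])
    area, in_system, prev = 0.0, 0, 0.0
    for time, delta in events:
        area += in_system * (time - prev)
        in_system += delta
        prev = time
    L = area / horizon                        # time-averaged packets in system
    lam = len(arrivals) / horizon             # observed arrival rate
    W = sum(d - a for a, d in zip(arrivals, departures)) / len(arrivals)
    return L, lam, W


if __name__ == "__main__":
    arrivals, departures = simulate_fifo_queue(
        10_000, arrival_rate=0.8, service_rate=1.0
    )
    L, lam, W = little_law_terms(arrivals, departures)
    print(f"L = {L:.4f}, lambda * W = {lam * W:.4f}")
```

Because the measurement window starts and ends with an empty system, the two quantities should agree up to floating-point error. A mismatch in such a check would point to a bookkeeping bug in the simulator rather than to statistical noise.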