Scaling Distributed Logging using Application-Defined, Transactional Storage

Event Sponsor: 
Mathematics and Computer Science Division
Start Date: 
Jul 28 2017 - 11:00am
Building/Room: 
Building 240/Room 4301
Location: 
Argonne National Laboratory
Speaker(s): 
Pierre Matri
Speaker(s) Title: 
Universidad Politécnica de Madrid (Spain)
Host: 
Phil Carns

Abstract:
Post-petascale, leadership-class supercomputers increasingly leverage node-local non-volatile memory. These devices are key to solving the I/O challenges posed by the next generation of exascale supercomputers. Such devices enable deploying transient data services in user space that provide data-intensive applications with exactly the API they need, which facilitates application development and often increases I/O performance compared to traditional file-based storage. We argue that this promising trend towards application-defined storage paves the way to using primitives such as atomic operations and lightweight transactional semantics in HPC applications. Such features ease the development of data structures known to be challenging at large scale, such as distributed shared logs. Yet, these data structures are key to efficient live visualization of sequential data such as particle field simulations. We show on Theta that these operations push the performance of distributed shared logs orders of magnitude beyond the capabilities of traditional file-based storage.

Short Bio:
Pierre Matri has been a Ph.D. student in the Ontology Engineering Group of the Universidad Politécnica de Madrid (Spain) since March 2015. He is currently working in the context of the ETN BigStorage project, funded by the European Union. He received his M.S. degree from the University of Chambéry (France) in 2009. His broad research interests include high-performance data storage and span efficient data mining, data analytics, and compact data structures.