Scaling Distributed Logging using Application-Defined, Transactional Storage

Pierre Matri
Seminar

Abstract:
Post-petascale, leadership-class supercomputers increasingly leverage node-local non-volatile memory. These devices are key to solving the I/O challenges posed by the next generation of exascale supercomputers. They make it possible to deploy transient data services in userspace that provide data-intensive applications with exactly the API they need, which facilitates application development and often increases I/O performance compared to traditional file-based storage. We argue that this promising trend toward application-defined storage paves the way for using primitives such as atomic operations and lightweight transactional semantics in HPC applications. Such features ease the development of data structures that are known to be challenging to implement at large scale, such as distributed shared logs. These data structures are in turn key to the efficient live visualization of sequential data, such as particle field simulations. We demonstrate on Theta that these operations push the performance of distributed shared logs orders of magnitude beyond the capabilities of traditional file-based storage.

Short Bio:
Pierre Matri has been a Ph.D. student at the Ontology Engineering Group of the Universidad Politécnica de Madrid (Spain) since March 2015. He is currently working in the context of the ETN BigStorage project, funded by the European Union. He received his M.S. degree from the University of Chambéry (France) in 2009. His research interests include high-performance data storage and span efficient data mining, data analytics, and compact data structures.