An Architecture for Scaling Spark on HPC Systems

Event Sponsor: 
Mathematics and Computer Science Division Seminar
Start Date: 
Dec 1 2017 - 10:30am
Building/Room: 
Building 240/Room 4301
Location: 
Argonne National Laboratory
Speaker(s): 
Silvina Caíno-Lores
Speaker(s) Title: 
Carlos III University of Madrid
Host: 
Tom Peterka

Abstract:
The growing demand for efficient data analysis and visualization in modern HPC applications has increased efforts to leverage data-centric frameworks from the Big Data ecosystem. Nevertheless, we need to adapt these platforms to obtain the full benefits of HPC infrastructures. This seminar explores the possibility of layering the popular Spark application model and its higher-level tools (e.g., GraphX, Streaming, MLlib) on top of DIY, a highly scalable MPI-based library. We will present an architecture that maps Spark's RDD data abstraction and its task-oriented execution model onto the block-based nature of DIY and the underlying fabric of MPI processes. As a result, the data-intensive communication patterns implemented in DIY are transparently supported with only minor additions to the Spark programming model.

Bio:
Silvina Caíno-Lores is a PhD candidate at Carlos III University of Madrid under the supervision of Prof. Jesús Carretero and Prof. Florin Isaila.