Data Analytics for Numerical Simulations: Query-Based Visualization with Hadoop and In Situ Multi-Parametric Studies

Event Sponsor: 
Mathematics and Computer Science Division Seminar
Start Date: 
Nov 9 2016 - 10:30am
Building/Room: 
Building 240/Room 4301
Location: 
Argonne National Laboratory
Speaker(s): 
Bruno Raffin
Speaker(s) Title: 
INRIA, Univ. Grenoble Alpes, France
Host: 
Tom Peterka

This talk will present two recent works on the analysis of data produced by large numerical simulations. We will first introduce a query-based scientific visualization framework built on Big Data tools and developed in the context of the Velassco project. We will give a quick survey of existing Map/Reduce frameworks, from Hadoop to Flink, and their main functionalities. Through this work we will try to bring some answers to the question: could Big Data tools be suitable for analyzing numerical simulation results? In the second part of this talk we will present recent work focused on in situ statistics for parametric studies. The classical approach consists of running the simulation N times with different input parameters, storing the results to disk, and then computing statistics post hoc. We will see that by relying on iterative statistics computations we can eliminate the need for intermediate storage entirely, leading to a drastic reduction in storage needs and data processing time.
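To make the idea of iterative statistics concrete, here is a minimal sketch in Python. It is not taken from the talk or from the speaker's software; it assumes a Welford-style running update, one common way to compute a mean and variance incrementally. The names RunningStats and toy_simulation are hypothetical, and the "simulation" is a stand-in that returns a small list of values per run.

```python
from typing import List, Sequence


class RunningStats:
    """Per-cell running mean and sample variance across simulation runs.

    Assumption: a Welford-style update, chosen for illustration only.
    Memory use is independent of the number of runs, so no run result
    needs to be kept after it has been folded in.
    """

    def __init__(self, size: int) -> None:
        self.size = size
        self.count = 0
        self.mean: List[float] = [0.0] * size
        self.m2: List[float] = [0.0] * size  # sum of squared deviations

    def update(self, result: Sequence[float]) -> None:
        """Fold one run's result into the statistics, then discard it."""
        if len(result) != self.size:
            raise ValueError("result has the wrong number of cells")
        self.count += 1
        for i, x in enumerate(result):
            delta = x - self.mean[i]
            self.mean[i] += delta / self.count
            self.m2[i] += delta * (x - self.mean[i])

    def variance(self) -> List[float]:
        """Unbiased sample variance per cell (needs at least two runs)."""
        if self.count < 2:
            raise ValueError("variance needs at least two runs")
        return [m / (self.count - 1) for m in self.m2]


def toy_simulation(param: float) -> List[float]:
    """Hypothetical stand-in for a simulation: two output cells per run."""
    return [param, param * param]


def run_study(params: Sequence[float]) -> RunningStats:
    """Run the study, updating statistics as each result is produced."""
    stats = RunningStats(size=2)
    for p in params:
        stats.update(toy_simulation(p))  # nothing is written to disk
    return stats
```

In this sketch each run's output is consumed as soon as it is available, so the only state kept between runs is the count, the running mean, and the running sum of squared deviations.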

Short Bio: Bruno Raffin is the leader of the DataMove team at INRIA Grenoble, France. He holds a PhD from the Université d’Orléans on parallel programming language design (1997). After a two-year postdoc at Iowa State University, he refocused his research on high performance interactive computing. He led the development of the FlowVR middleware for large scale data-flow oriented parallel applications, used for virtual reality, telepresence and computational steering. He recently retargeted FlowVR at in situ analytics for large scale parallel applications. He has also worked on parallel algorithms and cache-efficient parallel data structures (cache-oblivious mesh layouts, parallel adaptive sorting), and on strategies for task-based programming of multi-CPU and multi-GPU machines.