Data Analytics for Numerical Simulations: Query-Based Visualization with Hadoop and In Situ Multi-Parametric Studies

Bruno Raffin
Seminar

This talk will present two recent works related to the analysis of data produced by large numerical simulations. We will first introduce a query-based scientific visualization framework built on Big Data tools and developed in the context of the Velassco project. We will give a quick survey of existing Map/Reduce frameworks, from Hadoop to Flink, and their main functionalities. Through this work we will try to provide some answers to the question: could Big Data tools be suitable for analyzing numerical simulation results? In the second part of this talk we will present recent work focused on in situ statistics for parametric studies. The classical approach consists of running the simulation N times with different input parameters, storing the results to disk, and then computing statistics post hoc. We will see that by relying on iterative statistics computations we can entirely suppress the need for intermediate storage, leading to a drastic reduction in storage needs and data processing time.
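To make the idea of iterative statistics concrete, here is a minimal Python sketch assuming Welford's one-pass update for the per-cell mean and variance; the abstract does not say which update formulas the work actually uses, and `IterativeStats`, `update` and `variance` are hypothetical names. Each run's output field is folded into running per-cell quantities as soon as it is produced, so memory stays proportional to one field rather than to N stored runs.

```python
from typing import List, Sequence


class IterativeStats:
    """Per-cell running mean and variance, updated one simulation run at a time.

    Hypothetical helper (assumption): uses Welford's one-pass update, so a
    run's output can be discarded as soon as it has been folded in.
    """

    def __init__(self, n_cells: int) -> None:
        self.count = 0                # number of runs folded in so far
        self.mean = [0.0] * n_cells   # running mean per cell
        self.m2 = [0.0] * n_cells     # running sum of squared deviations per cell

    def update(self, field: Sequence[float]) -> None:
        """Fold the field produced by one run into the running statistics."""
        if len(field) != len(self.mean):
            raise ValueError("field size does not match number of cells")
        self.count += 1
        for i, x in enumerate(field):
            delta = x - self.mean[i]            # deviation from the old mean
            self.mean[i] += delta / self.count  # updated mean
            self.m2[i] += delta * (x - self.mean[i])  # uses old and new mean

    def variance(self) -> List[float]:
        """Unbiased sample variance per cell (needs at least two runs)."""
        if self.count < 2:
            raise ValueError("need at least two runs")
        return [m / (self.count - 1) for m in self.m2]


# Three hypothetical runs, each producing a 3-cell field.
stats = IterativeStats(n_cells=3)
for run_output in ([1.0, 2.0, 3.0], [3.0, 2.0, 5.0], [2.0, 2.0, 4.0]):
    stats.update(run_output)  # nothing from this run needs to be written to disk

print(stats.mean)        # [2.0, 2.0, 4.0]
print(stats.variance())  # [1.0, 0.0, 1.0]
```

Welford's form is used in this sketch because it is more numerically stable than accumulating raw sums of x and x² and deriving the variance at the end.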

Short Bio: Bruno Raffin is the leader of the DataMove team at INRIA Grenoble, France. He received a PhD in parallel programming language design from the Université d’Orléans in 1997. After a two-year postdoc at Iowa State University, he refocused his research on high performance interactive computing. He led the development of the FlowVR middleware for large-scale, data-flow oriented parallel applications, used for virtual reality, telepresence and computational steering. He recently retargeted FlowVR toward in situ analytics for large-scale parallel applications. He has also worked on parallel algorithms and cache-efficient parallel data structures (cache-oblivious mesh layouts, parallel adaptive sorting), and on strategies for task-based programming of multi-CPU and multi-GPU machines.