Towards Generalized Parallel Programming Models for Large Scientific Data Analysis and Visualization

Wes Kendall
Seminar

Parallel programming models like MapReduce have allowed companies to easily process information at scales of hundreds of petabytes and beyond. MapReduce and other popular industrial approaches, however, do not suffice for scientific datasets and analysis problems. The increasing size of datasets creates a pressing need in the visualization community for a general and scalable framework. In this talk, I discuss recent advances in a data analysis framework that move it closer to a generalized parallel programming model. Key components include the framework's I/O techniques, which are also being used in other large-scale applications such as parallel particle tracing. I will also cover the parallel sorting functionality currently supported in the framework and how I scaled it to 32K processes on Blue Gene/P (BGP). Finally, I show how the current framework can be applied to various large-scale problems, such as climate analysis (using over a terabyte of MODIS satellite data) and flow field analysis.

Wes Kendall is a graduate research assistant under Jian Huang at the University of Tennessee, Knoxville. His primary interest is large data analysis and visualization. He has interned at Oak Ridge National Labs, spent two summers with Argonne National Labs, and recently spent the summer with the Google MapReduce team. He has been in close collaboration with Argonne National Labs since his internships and has authored publications with them at Supercomputing and the Eurographics Symposium on Parallel Graphics and Visualization.