Towards Efficient Big Data Processing on HPC Systems

Orcun Yildiz
Seminar

Big Data analytics frameworks (e.g., MapReduce, Hadoop and Spark) are increasingly used by companies and research labs to facilitate large-scale data analysis. As scientific applications make growing use of data analytics (e.g., data preprocessing, in situ processing, and filtering of simulation results), the HPC community is actively seeking ways to leverage these frameworks on HPC systems. In this talk, we present and discuss several strategies towards this goal. First, we present the results of an experimental study that characterizes the performance of Big Data applications on HPC systems. Then, we focus on the I/O interference problem, which can be a major performance bottleneck for Big Data applications. Finally, we present and evaluate an I/O management scheme that can enable efficient Big Data processing on HPC systems.