Scalable Data Analysis and Active Storage

Seung Woo Son
Seminar

Scientists and engineers must understand experiment- and simulation-generated data to gain insights and perform knowledge discovery. However, the analytic potential of large-scale systems may be difficult to realize because of limited I/O and file system performance and a lack of appropriate interfaces for data analysis. In this talk, I will present an active storage system for scalable data analysis that enables the end-to-end optimizations needed for performance and productivity gains on petascale systems. Specifically, I will present our design of an active storage node that allows data analysis, mining, and statistical operations to be executed from within the parallel I/O runtime system. I will then present the design of a parallel I/O runtime interface that uses customized active storage nodes to perform I/O operations. Lastly, I will present experimental results from a set of data analysis kernels running on our active storage prototype.