Scientists and engineers must understand experiment- and simulation-generated data to gain insights and discover knowledge. However, the analytic potential of large-scale systems can be difficult to realize because of limited I/O and file system performance and a lack of appropriate interfaces for data analysis. In this talk, I will present an active storage approach to scalable data analysis that enables the end-to-end optimizations needed for performance and productivity gains on petascale systems. Specifically, I will present our design of an active storage node that allows data analysis, mining, and statistical operations to be executed from within the parallel I/O runtime system. I will then present the design of a parallel I/O runtime interface that uses customized active storage nodes to perform I/O operations. Lastly, I will present experimental results from a set of data analysis kernels running on our active storage prototype.