High Performance Data Analysis Framework

Event Sponsor: 
Argonne Leadership Computing Facility Seminar
Start Date: 
Jul 26 2016 - 10:00am
Building/Room: 
Building 240/Room 4301
Location: 
Argonne National Laboratory
Speaker(s): 
Pragnesh Patel
Speaker(s) Title: 
Oak Ridge National Laboratory
Host: 
Venkat Vishwanath

The R programming language is known for its diversity and sophistication in data analysis, but its scalability to big data has been lacking. In our work, which resulted in the pbdR suite of R packages, the integration of scalable libraries and the development of ease-of-use components inside R are firmly rooted in best practices from the HPC community. This is a requirement for effective integration of the HPC components, and it is a departure from some traditional practice in R. We favor realigning R’s parallel computing infrastructure with standards in HPC rather than continuing non-standard developments that have limited scalability and limited ability to leverage results from the HPC community. We have developed several packages that provide a tight coupling of R with highly scalable libraries, enabling scalability to terabytes of data on tens of thousands of cores. We have released the core packages and application packages on CRAN. The packages naturally separate into four categories: General, I/O, Computation, and Application. I will present the pbdR packages along with applications.

Miscellaneous Information: 

Information: The conference room is located on the fourth floor. Turn right after you get off the elevator; the room will be on your left.
