Using R on HPC Clusters Webinar

This OLCF-hosted webinar tutorial teaches a basic workflow for using R on an HPC cluster. The tutorial focuses on parallel computing as a means to speed up R scripts on a cluster computer. Many R packages offer some form of parallel computing, yet they rely on a much smaller set of underlying approaches: multithreading in compiled code, the Unix fork, and MPI. The tutorial takes a narrow path, focusing on packages that directly engage these underlying approaches yet remain easy to use at a high level. This workshop is aimed at current users of OLCF, CADES, ALCF and NERSC. Users who do not already have accounts on those systems are welcome to attend the lectures but will not be able to participate in all of the hands-on activities.

Objectives 

  1. Learn a workflow to edit R code on your laptop and run it on an HPC cluster
  2. Learn how to use multicore and distributed parallel concepts in R on an HPC cluster system

Topics covered:

Day 1: Wednesday, August 17, 1:00pm - 4:00pm EDT

Hardware and software overview, and ways to use multiple cores on a single node: the mclapply function in the parallel package and multithreaded BLAS.
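As a rough illustration of the single-node, fork-based approach named above, here is a minimal sketch using mclapply from the parallel package. The function slow_square, the input 1:8, and the choice of two cores are assumptions made for this example and are not taken from the workshop materials.

```r
# Minimal sketch: fork-based parallelism on one node with parallel::mclapply.
# slow_square and the core count are hypothetical, for illustration only.
library(parallel)

# A stand-in for an expensive computation on one input value.
slow_square <- function(x) {
  Sys.sleep(0.01)
  x^2
}

# Forking is unavailable on Windows, where mc.cores must be 1.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# mclapply has the same interface as lapply, plus mc.cores.
results <- mclapply(1:8, slow_square, mc.cores = n_cores)

# Collapse the list of results into a numeric vector.
squares <- unlist(results)
print(squares)
```

On a cluster node, the core count would normally come from the batch allocation rather than being hard-coded as it is here.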

Day 2: Friday, August 19, 1:00pm - 4:00pm EDT

Distributed computing: hardware review and ways to use multiple nodes: MPI at a high level via the pbdMPI package, and matrix methods via the kazaam and pbdDMAT packages.
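To give a flavor of the high-level MPI style that pbdMPI supports, the following sketch has each rank sum its own share of the integers 1 to 100 and then combines the partial sums with allreduce. The problem, the variable names, and the way work is divided are assumptions chosen for this example, not the workshop's exercises. It assumes pbdMPI and an MPI library are installed.

```r
# Minimal SPMD sketch with pbdMPI: every rank runs this same script.
# The summation problem and variable names are hypothetical.
suppressMessages(library(pbdMPI))

init()

rank <- comm.rank()  # this process's id, starting at 0
size <- comm.size()  # total number of MPI processes

# Deal the values 1..100 out to the ranks round-robin.
values <- seq_len(100)
my_values <- values[(values - 1) %% size == rank]
local_sum <- sum(my_values)

# Combine the partial sums; every rank receives the total.
total <- allreduce(local_sum, op = "sum")

# By default comm.print prints from rank 0 only.
comm.print(total)

finalize()
```

Such a script is typically started through an MPI launcher, for example something like `mpirun -np 4 Rscript script.R`, although the exact launcher and its options depend on the cluster and its batch system.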

Hands-on exercises workflow:

Edit your code in RStudio on your laptop -> push the code to GitHub/GitLab -> pull the code to the cluster and submit it as a batch job -> look at your output and circle back to Edit.

This has the advantage of editing code in a familiar environment and running it in a common teaching environment. Other workflows are possible if you already know the tools.

We start with each user forking a GitHub exercise repository to their own GitHub account and working with it as described above. See the prerequisites that follow.

Prerequisites:

This workshop is aimed at users of OLCF, CADES, ALCF and NERSC. Users who do not have accounts on systems at those centers will be able to participate in the lectures and the laptop hands-on parts of the course, but will not be able to do the hands-on parts on the HPC clusters during the course. The course repo will be provided to all attendees. The workshop assumes that participants have done the following:

For the workshop, we will be allocating time on Theta for current Theta users (no new accounts will be made). Those with ALCF accounts are welcome to join the project r_workshop (PI: Paige Kinsley) to access Theta for the workshop.