Parallel R Workflows

Course developed for the Iridis High Performance Computing (HPC) facility, University of Southampton.

Welcome! 👋

Course Description

R was originally developed as a single-threaded application, which means that, by default, computations run on a single CPU core. While this is adequate for many computations in R, in some circumstances it results in long run times. Luckily, there are several approaches to better leverage the compute power available on modern hardware, from multiple CPU cores on a local machine to many cores spread across multiple nodes on High Performance Computing platforms (using MPI).
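To make the single-core default concrete, here is a minimal sketch using the base `parallel` package that ships with R. The function `slow_square()`, the input range and the choice of two workers are illustrative assumptions, not part of the course material.

```r
library(parallel)

# Hypothetical stand-in for an expensive computation
slow_square <- function(x) {
  Sys.sleep(0.1)  # simulate work
  x^2
}

inputs <- 1:8

# Default behaviour: elements are processed one after another on a single core
res_seq <- lapply(inputs, slow_square)

# The same computation shared between two local worker processes
cl <- makeCluster(2)
res_par <- parLapply(cl, inputs, slow_square)
stopCluster(cl)

# Both approaches should return the same values
identical(res_seq, res_par)
```

With only two workers and a small input the speed-up is modest, and starting the workers has its own cost, so timings will vary by machine.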

In this course, I will introduce you to the basics of parallelisation and present an overview of the approaches available in R. The focus is on the relatively recent futureverse collection of packages, which aims to provide a consistent API across parallel backends and lets users parallelise in their preferred R programming style, whether that’s the lapply family of functions, loops or tidyverse’s purrr package.
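As a rough illustration of what a consistent API across backends can look like, the sketch below uses `future_lapply()` from the `future.apply` package as a parallel counterpart to `lapply()`. It assumes the `future` and `future.apply` packages are installed. The function `slow_square()` and the worker count are hypothetical choices made for the example.

```r
# Assumption: the future and future.apply packages are installed
library(future)
library(future.apply)

# Hypothetical stand-in for an expensive computation
slow_square <- function(x) {
  Sys.sleep(0.1)  # simulate work
  x^2
}

# Select a backend: two background R sessions on the local machine
plan(multisession, workers = 2)
res_parallel <- future_lapply(1:8, slow_square)

# Switching backend changes only the plan; the computation code is untouched
plan(sequential)
res_sequential <- future_lapply(1:8, slow_square)

# Both backends should return the same values
identical(res_parallel, res_sequential)
```

Users who prefer purrr or foreach can follow the same pattern with `furrr::future_map()` or the `doFuture` package's `%dofuture%` operator, with the backend again chosen through `plan()`.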

We will explore concepts and approaches through practical demonstrations and by developing example scripts, which we will run both locally and on the Iridis cluster.

Course Objectives

After attending this course, participants will:

  • Understand general concepts and strategies for parallelisation and how they relate to the underlying hardware.

  • Understand how to identify tasks amenable to parallelisation and be familiar with best practices in code parallelisation.

  • Have a general overview of libraries and R packages available for parallelising computation in R, particularly the futureverse family of packages, and understand when and how to deploy them.

  • Be able to deploy parallel code on the Iridis cluster as a batch job.

Course Outline

Introduction to Parallelisation

  • General concepts

Introduction to parallelisation in R

  • Brief history of how the R package ecosystem has evolved towards approaches that generalise across operating systems and parallel backends.

  • Brief demo of the parallel package and its limitations.

  • Demo of foreach package.

  • Demo of futureverse packages

Developing Parallel R code locally

  • Work with a few example workflows to demonstrate parallelisation concepts and approaches using futureverse packages

  • Demonstrate best practice in developing portable projects locally that can also be run on the HPC cluster with ease.

Running parallel R code on Iridis

  • Run previously developed examples on the Iridis cluster.

  • Includes a refresher on submitting jobs and requesting resources:

    • Logging into the cluster

    • Transferring code and data

    • Contents of a job submission script

    • Submitting a job and accessing results

    • Monitoring jobs and checking for correct resource utilisation.
