Exercise 1b: Mapping General Health Categories across Local Authority Districts
Parallel R workflows on Iridis
Now it’s time to run our exercises on Iridis. We’ll start with the health data exercise.
As already mentioned, to run jobs on Iridis, we submit slurm batch jobs using the sbatch command.
We need to supply a .slurm script to the sbatch command which contains the configuration details required to run our job, including:
- the computational resources we need.
- the name of the job.
- where to write stdout and stderr messages.
- setting and exporting environment variables.
- setting an email address for job notifications.
- loading any modules required for the job to run.
- the final command to run to initiate the job.
Let’s examine this in more detail by looking at the first .slurm file we will submit to slurm, generate-maps-furrr.slurm. All the slurm files we will use for this exercise are in the health_data/slurm/ directory.
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=10
#SBATCH --mem=3G
#SBATCH --job-name=map_furrr
#SBATCH --time=00:05:00
#SBATCH --output=logs/R-%x.%j.out
#SBATCH --error=logs/R-%x.%j.err
#SBATCH --export=ALL,TZ=Europe/London
#SBATCH --mail-type=ALL
# send mail to this address
#SBATCH --mail-user=your.email@here.com
module load gdal/3.0.1
module load proj/6.1.1
module load geos/3.6.2
module load R/4.4.1-gcc
cd parallel-r-materials/
Rscript health_data/generate-maps-furrr.R 1 10
Anatomy of a .slurm file
To start off, notice the #!/bin/bash line at the top of the script. This indicates the script is actually a bash script. Bash scripts are to the unix command line what .R scripts are to the R language.
Older versions of the materials had #!/bin/sh instead of #!/bin/bash at the top of each .slurm file. This used to work on Iridis but a recent software update now requires #!/bin/bash at the top of each submission file for it to be executed correctly.
If you have #!/bin/sh at the top of any of your .slurm files, please change it to #!/bin/bash.
SBATCH section
Next we see a chunk of comments starting with #SBATCH. These will all be passed to the sbatch command as options.
You can examine the full list of sbatch options in the official sbatch documentation.
Let’s have a look at what each one does:
- --nodes=1: requests one node.
- --ntasks=10: requests 10 CPU cores.
- --mem=3G: requests 3G of memory in total for the job.
- --job-name=map_furrr: the name of the job.
- --time=00:05:00: the walltime the job is allowed to run for.
- --output=logs/R-%x.%j.out: path to the file where stdout messages will be written. Here we use the special slurm notations %x (job name) and %j (job ID) to create a file name that includes the job name and ID. Note that we are also storing logs in a logs/ directory in our home directory.
- --error=logs/R-%x.%j.err: as above but for stderr output.
- --export=ALL,TZ=Europe/London: export the environment variable TZ=Europe/London (to silence some annoying tictoc warning messages) in addition to all the variables available in the submission environment (ALL).
- --mail-type=ALL: send email notifications when a job begins, ends, fails or is requeued.
- --mail-user=your.email@here.com: send notifications to this email address.
Before moving on, let’s go ahead and replace the dummy email address in the file with your own email address.
Script section
What follows the SBATCH comments is effectively a bash script in which a sequence of commands is run before finally calling our workflow script.
Module load section
In this section we are loading all the software required for our job to run. We obviously need R and specify version R/4.4.1. We also need to load some additional geospatial libraries that the R package sf needs available.
Working directory setting
We also set the working directory of execution to the parallel-r-materials directory in our home directory using the command cd parallel-r-materials/.
It’s important to cd into the root of our project directory to ensure here() works correctly and therefore that paths within our scripts also resolve correctly.
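As a quick illustration, here is a minimal sketch of why this matters, assuming the here package is used for path building in the scripts (the paths in the comments are illustrative):
library(here)
# here() establishes the project root by searching upwards from the working
# directory for a project marker (e.g. an .Rproj file or a .git folder), so
# launching Rscript from parallel-r-materials/ means it resolves to that root
here()                                  # e.g. /home/userid/parallel-r-materials
here("health_data", "outputs", "maps")  # builds a path relative to the project root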
Running our script using Rscript
The final line of our script calls the Rscript command which runs the R script supplied as an argument.
Rscript health_data/generate-maps-furrr.R 1 10
In our case, it will run the health_data/generate-maps-furrr.R script.
The Rscript command allows us to run R scripts from the command line (as opposed to interactively through the console). It can also be used on your local machine and can often be much faster than running R code interactively.
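For example, here is a tiny hedged sketch you could save locally as, say, hello.R (an illustrative file name) and run with Rscript hello.R to see the non-interactive behaviour for yourself:
# A minimal script to try Rscript locally: it prints the working directory and
# confirms that code run via Rscript is non-interactive
message("Working directory: ", getwd())
message("Interactive session? ", interactive())  # FALSE when run via Rscript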
You may have noticed two trailing numbers, 1 and 10, after the script specification. These are command line arguments. They won’t do anything at the moment, but we’re now going to edit our generate-maps-furrr.R script to make use of these arguments, in particular to set idx_start and idx_end and therefore specify the range of LADs we want to produce maps for!
It’s good practice to test your workflow with a small subset of the data before running a full big job. This will allow us to specify the size of our test data during job submission without having to change our script every time.
Command Line arguments
Command line arguments are captured in R scripts using the function commandArgs() which, if trailingOnly = TRUE is set, returns a character vector of those arguments (if any) supplied after --args.
In their basic usage they are unnamed (but see this blog post for setting up named arguments) and individual arguments can be accessed by indexing into the vector returned by commandArgs(). If an indexed element is missing, NA is returned.
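As a quick illustration, here is a hedged sketch of a small script saved as, say, show-args.R (an illustrative name) and run with Rscript show-args.R 1 10:
# Capture only the arguments supplied after the script name
args <- commandArgs(trailingOnly = TRUE)
print(args)     # [1] "1"  "10"   (always a character vector)
print(args[3])  # [1] NA          (indexing a missing argument returns NA)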
Let’s edit our generate-maps-furrr.R file so that, if idx_start/idx_end have not already been set and a command line argument is provided, it is used to set idx_start/idx_end accordingly. Let’s also make it so that if idx_start/idx_end are not set in the script and no command line arguments are provided, maps are created for the full range of LADs.
To do so, replace:
# Create iteration indexes ----
idx_start <- 1
idx_end <- 5
idx <- idx_start:idx_end
with:
# Create iteration indexes ---
idx_start <- NULL
idx_end <- NULL
# Collect command line arguments if present
args <- commandArgs(trailingOnly = TRUE)
# If idx_* values have already been set (i.e. are not NULL) do nothing.
# If a command line argument is supplied (not NA) and no corresponding idx value
# has been set, convert it to integer and assign it.
# If any command arg is missing and idx_* value has not been set, set to 1 or
# total number of LADs.
lad_n <- length(unique(all_data$lad_code))

if (is.na(args[1])) {
  if (is.null(idx_start)) {
    idx_start <- 1L
  }
} else {
  idx_start <- as.integer(args[1])
}

if (is.na(args[2])) {
  if (is.null(idx_end)) {
    idx_end <- lad_n
  }
} else {
  idx_end <- as.integer(args[2])
}

idx <- idx_start:idx_end
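To convince ourselves the logic behaves as intended, here is a quick hedged sketch of the two situations we care about, using a made-up stand-in value for lad_n:
lad_n <- 300                              # illustrative stand-in, not the real count
# With `Rscript ... 1 10`, commandArgs(trailingOnly = TRUE) returns c("1", "10"):
args <- c("1", "10")
as.integer(args[1]):as.integer(args[2])   # 1:10 -> maps for the first 10 LADs
# With no trailing arguments, args is empty and args[1]/args[2] are both NA,
# so the fallbacks kick in and idx becomes 1:lad_n (the full range of LADs):
args <- character(0)
is.na(args[1]) && is.na(args[2])          # TRUE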
Synch changes to our local files
Before we head over to Iridis, let’s synch our files already on Iridis with the changes we’ve just made to our local files.
Run the following command either in Rstudio terminal on Linux/macOS or in your local shell session on mobaXterm:
rsync -hav --exclude={'.*/','*/outputs/*'} --progress ./* userid@iridis5.soton.ac.uk:/home/userid/parallel-r-materials/
As always, switch out userid with your own username.
Note we are not compressing files any more as we’ve already transferred our large file and compression is not needed for the smaller files we’re making changes to.
Submit generate-maps-furrr.slurm
Log into Iridis
Before we connect to Iridis we need to make sure we are connected to the VPN. Then use the ssh command we have been using to log into Iridis. On MobaXterm, click on the Iridis session.
Create logs/ directory
Before we submit our job, let’s create a logs directory in our home directory for the .err and .out log files to be written to, with the command:
mkdir logs
If you wanted, you could also keep project-related logs in a logs directory within a project. You would then need to make sure the paths specified in --output and --error in your .slurm files point to the right directory. In any case, avoid dumping logs directly into your home directory as that can get very messy fast!
Submit job
To submit our job, we use the command sbatch followed by the name of the .slurm job submission file.
sbatch parallel-r-materials/health_data/slurm/generate-maps-furrr.slurm
Iridis should respond with the ID of the job just submitted and you should also receive an email shortly confirming that the job has begun! 🚀
Monitoring Jobs
squeue
squeue allows you to view information about jobs in the Slurm scheduling queue. The -l option asks for output in long format and -u restricts the output to jobs belonging to the specified user.
squeue -lu userid
Here’s the output of the command when I run it while the map_furrr job is running:
squeue -lu ak1f23
JOBID PARTITION NAME USER STATE TIME TIME_LIMI NODES NODELIST(REASON)
3374906 serial map_furr ak1f23 RUNNING 0:15 5:00 1 gold52
Note that once a job is finished, it drops off the slurm queue and will not show up anymore.
scontrol
scontrol show job displays the state of a specified job and takes the Job ID of a currently running job as an argument. This gives detailed information about the job, including the full job configuration (provisioning).
scontrol show job <JOB ID>
Here’s the output for the map_furrr job.
scontrol show job 3374906
JobId=3374906 JobName=map_furrr
UserId=ak1f23(50709) GroupId=jf(339) MCS_label=N/A
Priority=91988 Nice=0 Account=soton QOS=normal
JobState=RUNNING Reason=None Dependency=(null)
Requeue=0 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
RunTime=00:00:10 TimeLimit=00:05:00 TimeMin=N/A
SubmitTime=2023-05-29T09:38:06 EligibleTime=2023-05-29T09:38:06
AccrueTime=2023-05-29T09:38:06
StartTime=2023-05-29T09:38:07 EndTime=2023-05-29T09:43:07 Deadline=N/A
SuspendTime=None SecsPreSuspend=0 LastSchedEval=2023-05-29T09:38:07 Scheduler=Main
Partition=serial AllocNode:Sid=cyan51:31196
ReqNodeList=(null) ExcNodeList=(null)
NodeList=gold52
BatchHost=gold52
NumNodes=1 NumCPUs=10 NumTasks=10 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
TRES=cpu=10,mem=3G,node=1,billing=10
Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
MinCPUsNode=1 MinMemoryNode=3G MinTmpDiskNode=0
Features=(null) DelayBoot=00:00:00
OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
Command=/mainfs/home/ak1f23/parallel-r-materials/health_data/slurm/generate-maps-furrr.slurm
WorkDir=/mainfs/home/ak1f23
StdErr=/mainfs/home/ak1f23/logs/R-map_furrr.3374906.err
StdIn=/dev/null
StdOut=/mainfs/home/ak1f23/logs/R-map_furrr.3374906.out
Power=
MailUser=your.email@here.com MailType=INVALID_DEPEND,BEGIN,END,FAIL,REQUEUE,STAGE_OUT
Assessing and Debugging jobs
While squeue and scontrol show job show useful information while a job is running, they are not so useful for digging into what has happened in a job or for debugging when things went wrong.
For this, we turn to the two files we’ve specified in the slurm script, the .out and .err files the job will create in our logs/ directory.
- .out: the output file contains information about our job as well as any messages or errors generated by slurm.
- .err: the error file contains any output from the script part of our slurm script as well as any output (messages, errors & warnings) generated by the execution of our R script.
First let’s have a look at the contents of our logs/ directory:
ls -l logs
We should see a .out and a .err file for the job we just ran:
-rw-r--r-- 1 ak1f23 jf 63517 May 29 08:48 R-map_furrr.3374906.err
-rw-r--r-- 1 ak1f23 jf 1125 May 29 08:48 R-map_furrr.3374906.out
.err file
Now let’s have a look at the .err file. To do so we use the cat command which prints the contents of files to the screen.
We can pass it the name of a specific file, e.g.:
cat logs/R-map_furrr.3374906.err
or we can use this clever little construction:
cat "$(ls -rt logs/*.err| tail -n1)"
This is effectively a unix way of using the output of one expression as the input to another. In this case:
- ls -rt logs/*.err lists all .err files in logs/ sorted by modification time, oldest first, so the most recently created file is listed last.
- | pipes the result to the next expression.
- tail -n1 takes the last element of the list of file names passed to it.
Finally, cat prints the contents of the file whose name is produced by the commands contained within $(...).
Using this expression will show the contents of the last .err file created in the logs/ directory:
Loading compiler version 2021.2.0
Loading mkl version 2021.2.0
Linking to GEOS 3.6.2, GDAL 3.0.1, PROJ 6.1.1; sf_use_s2() is TRUE
Attaching package: 'dplyr'
The following objects are masked from 'package:stats':
filter, lag
The following objects are masked from 'package:base':
intersect, setdiff, setequal, union
Loading required package: future
Rows: 178360 Columns: 5
-- Column specification --------------------------------------------------------
Delimiter: ","
chr (3): lsoa_code, lsoa_name, gen_health_cat
dbl (2): gen_health_code, observation
i Use `spec()` to retrieve the full column specification for this data.
i Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 35672 Columns: 4
-- Column specification --------------------------------------------------------
Delimiter: ","
chr (4): lsoa_code, lsoa_name, lad_code, lad_name
i Use `spec()` to retrieve the full column specification for this data.
i Specify the column types or set `show_col_types = FALSE` to quiet this message.
Joining with `by = join_by(lsoa_code, lsoa_name)`
Joining with `by = join_by(lsoa_code)`
Progress: ---------------------------------------------------------------- 100%v Map for LAD "E06000001 Hartlepool" completed successfuly on PID 54796 on gold52.cluster.local (26.98 sec elapsed)
v Map for LAD "E06000002 Middlesbrough" completed successfuly on PID 54793 on gold52.cluster.local (29.875 sec elapsed)
v Map for LAD "E06000003 Redcar and Cleveland" completed successfuly on PID 54795 on gold52.cluster.local (30.965 sec elapsed)
v Map for LAD "E06000004 Stockton-on-Tees" completed successfuly on PID 54789 on gold52.cluster.local (34.599 sec elapsed)
v Map for LAD "E06000005 Darlington" completed successfuly on PID 54790 on gold52.cluster.local (29.302 sec elapsed)
v Map for LAD "E06000006 Halton" completed successfuly on PID 54798 on gold52.cluster.local (102.076 sec elapsed)
v Map for LAD "E06000007 Warrington" completed successfuly on PID 54791 on gold52.cluster.local (35.781 sec elapsed)
v Map for LAD "E06000008 Blackburn with Darwen" completed successfuly on PID 54792 on gold52.cluster.local (30.237 sec elapsed)
v Map for LAD "E06000009 Blackpool" completed successfuly on PID 54794 on gold52.cluster.local (30.823 sec elapsed)
v Map for LAD "E06000010 Kingston upon Hull" completed successfuly on PID 54797 on gold52.cluster.local (35.394 sec elapsed)
-- Job Complete ----------------------------------------------------------------
v 10 maps written to '/mainfs/home/ak1f23/parallel-r-materials/health_data/outputs/maps'.
-- Total time elapsed: "44.65 sec elapsed" --
This shows us the full output of our R job. If any warnings or errors occurred in the R portion of our job, they would show up here.
As you can see, there is rich information in this file because we’ve added a lot of messaging to our script, using the cli and tictoc packages in particular. That’s why I highly recommend spending some time on this when you’re running scripts on HPC. Given you can’t interact with the job directly, such messages can be an invaluable source of information for assessing what happened during a job.
By examining the output we can indeed see that each map was generated in parallel, as each PID is unique.
We can also see that our command arguments were captured correctly, as only the first 10 maps were created.
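As an aside, here is a hedged sketch of the kind of per-map messaging that produces lines like the ones above, combining cli and tictoc (the function and variable names are illustrative, not the ones used in the actual script):
library(cli)
library(tictoc)

report_map_done <- function(lad_name) {
  tic()                                  # start a timer for this map
  Sys.sleep(0.1)                         # stand-in for the actual map-building work
  timing <- toc(quiet = TRUE)
  elapsed <- round(timing$toc - timing$tic, 2)
  cli_alert_success(
    "Map for LAD {.val {lad_name}} completed on PID {Sys.getpid()} ({elapsed} sec elapsed)"
  )
}

report_map_done("E06000001 Hartlepool")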
.out file
Now, let’s use a modified version of our previous command to view the contents of the last .out file:
cat "$(ls -rt logs/*.out| tail -n1)"
Job ID : 3374906
Job name : map_furrr
WorkDir : /mainfs/home/ak1f23
Command : /mainfs/home/ak1f23/parallel-r-materials/health_data/slurm/generate-maps-furrr.slurm
Partition : serial
Num hosts : 1
Num cores : 10
Num of tasks : 10
Hosts allocated : gold52
Job Output Follows ...
===============================================================================
==============================================================================
Running epilogue script on gold52.
Submit time : 2023-11-22T09:01:26
Start time : 2023-11-22T09:01:57
End time : 2023-11-22T09:03:08
Elapsed time : 00:01:11 (Timelimit=00:05:00)
Job ID: 5040191
Cluster: i5
User/Group: ak1f23/jf
State: COMPLETED (exit code 0)
Nodes: 1
Cores per node: 10
CPU Utilized: 00:00:29
CPU Efficiency: 4.08% of 00:11:50 core-walltime
Job Wall-clock time: 00:01:11
Memory Utilized: 329.87 MB
Memory Efficiency: 10.74% of 3.00 GB
This gives us information about the job.
Any error messages from slurm will show up here.
The output also includes a report of how much of the resources requested were utilised.
We can see that we’re actually using only a small portion of the total wall-clock time allocated for the job. That’s because a big portion of the time is spent on the initial sequential part of the job, loading the data in the first process. During this time, the other cores are idle. If we went ahead and created maps for the whole dataset, the proportion of time spent in the sequential part of the job would drop and the time efficiency would increase.
We can also see we’re using only around 11% of the memory allocated. That’s quite low and we should probably look to reduce our resource request if we run the job again.
This information is really useful because the more closely you can tailor the resources you request to the resources actually required, the sooner your jobs will make it through the slurm queue. But be careful to allow some margin for unexpected bottlenecks or jumps in memory use.
Check outputs
Now, let’s also check the contents of our health_data/outputs/maps/ directory:
ls parallel-r-materials/health_data/outputs/maps/
E06000001_hartlepool_2023-05-29.png E06000006_halton_2023-05-29.png
E06000002_middlesbrough_2023-05-29.png E06000007_warrington_2023-05-29.png
E06000003_redcar_and_cleveland_2023-05-29.png E06000008_blackburn_with_darwen_2023-05-29.png
E06000004_stockton_on_tees_2023-05-29.png E06000009_blackpool_2023-05-29.png
E06000005_darlington_2023-05-29.png E06000010_kingston_upon_hull_2023-05-29.png
We can see that the job has successfully created 10 PNG files in the appropriate directory 🎉.
Slurm Job Arrays
In the previous example, we’ve easily managed to complete all our work in a small job that was queued up by slurm pretty quickly.
What if you have a job that requires a lot of resources and would either take ages to get through the queue or needs more resources than individual user quotas allow for?
One way to manage such jobs, especially jobs where the parallel processes are completely independent of each other, as is the case with our current example, is to split them up into slurm job arrays.
Job arrays offer a mechanism for submitting and managing collections of similar jobs quickly and easily through a single sbatch command.
We will now rerun our workflow using a slurm job array by submitting the generate-maps-furrr-array.slurm script instead of generate-maps-furrr.slurm.
Let’s have a look at the file:
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=10
#SBATCH --mem=3G
#SBATCH --job-name=map_furrr_array
#SBATCH --time=00:05:00
#SBATCH --output=logs/R-%x.%j.out
#SBATCH --error=logs/R-%x.%j.err
#SBATCH --export=ALL,TZ=Europe/London
#SBATCH --array=1-4
#SBATCH --mail-type=ALL
# send mail to this address
#SBATCH --mail-user=youremail@here.com
module load gdal/3.0.1
module load proj/6.1.1
module load geos/3.6.2
module load R/4.4.1-gcc
cd parallel-r-materials/
Rscript health_data/generate-maps-furrr.R 1 40
The #SBATCH configuration at the top of the array submission script defines the computational resources required by each job in the array.
The fact that this is an array job is indicated by the additional option --array=1-4, which specifies that 4 array jobs should be created with array indexes 1, 2, 3 and 4.
The array index within each job is available through the environment variable SLURM_ARRAY_TASK_ID. This can be used to programmatically subset, in R, the portion of the work to be run within each array job.
So let’s go ahead and edit generate-maps-furrr.R to use the SLURM_ARRAY_TASK_ID environment variable to subset the workload accordingly.
Before we move on though, let’s edit the file and add a valid email address for notifications.
Enable array job sub-setting in generate-maps-furrr.R
Before we start editing our file, let’s exit from Iridis (in Rstudio terminal on macOS or Linux):
exit
In generate-maps-furrr.R, add the following code chunk just below the section we added to capture command line arguments and create idx.
# If running within a slurm array, split idx into chunks and subset the idxs for
# the particular task id.
array_id <- Sys.getenv('SLURM_ARRAY_TASK_ID')
array_task_n <- Sys.getenv('SLURM_ARRAY_TASK_COUNT')

if (array_id != "") {
  array_task_n <- as.integer(array_task_n)
  array_id <- as.integer(array_id)
  idx <- idx[furrr:::make_chunks(length(idx), array_task_n)[[array_id]]]
}
The code captures the ID of the current array job from the environment variable SLURM_ARRAY_TASK_ID and the total number of jobs in the array via SLURM_ARRAY_TASK_COUNT.
It then uses the internal furrr function furrr:::make_chunks to create chunks of indexes, using the total length of idx as currently set (after any subsetting has been performed via command line arguments) and the total number of array jobs.
Let’s have a look at how this works. Let’s say we want to map LADs 1 to 40 and we want to split this into 4 array jobs. Let’s also say we are currently in array job 2.
idx <- 1:40
array_task_n <- 4
array_id <- 2
chunks <- furrr:::make_chunks(length(idx), array_task_n)
chunks
[[1]]
[1] 1 2 3 4 5 6 7 8 9 10
[[2]]
[1] 11 12 13 14 15 16 17 18 19 20
[[3]]
[1] 21 22 23 24 25 26 27 28 29 30
[[4]]
[1] 31 32 33 34 35 36 37 38 39 40
chunks[[array_id]]
[1] 11 12 13 14 15 16 17 18 19 20
array_task_n is used to split idx into the correct number of chunks and array_id is used to select the correct chunk for the array job ID.
By using idx[furrr:::make_chunks(length(idx), array_task_n)[[array_id]]] we also ensure that the actual idx vector specified is chunked and subset. For example, say we had specified idx to be 41-80 through the command line:
idx <- 41:80
idx[furrr:::make_chunks(length(idx), array_task_n)[[array_id]]]
[1] 51 52 53 54 55 56 57 58 59 60
The above construct ensures the correct idxs are subset 👍
It’s useful to understand how this internal function works and how chunking can be controlled through the furrr_options arguments chunk_size and scheduling to achieve better load balancing. For more details have a look at the furrr Chunking input article.
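For instance, here is a minimal hedged sketch of how chunking could be controlled directly from future_map() via furrr_options() (the toy workload is purely illustrative):
library(furrr)
plan(multisession, workers = 2)

# scheduling = 2 asks for roughly two chunks of work per worker;
# chunk_size would instead fix the number of elements sent to a worker at a time
res <- future_map(
  1:20,
  ~ .x^2,
  .options = furrr_options(scheduling = 2)
)
Larger scheduling values create more, smaller chunks, which can improve load balancing when individual tasks vary a lot in run time (as with the Halton map in the log output above).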
Synch changes to our local files
Before we head over to Iridis, let’s synch our files already on Iridis with the changes we’ve just made to our local files. As always, switch out userid with your own username.
Run the following command either in Rstudio terminal on Linux/macOS or in your local shell session on mobaXterm:
rsync -hav --exclude={'.*/','*/outputs/*'} --progress ./* userid@iridis5.soton.ac.uk:/home/userid/parallel-r-materials/
Submit slurm Job Array
Before submitting, let’s revisit what the submission script is going to request:
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=10
#SBATCH --mem=3G
#SBATCH --job-name=map_furrr_array
#SBATCH --time=00:05:00
#SBATCH --output=logs/R-%x.%j.out
#SBATCH --error=logs/R-%x.%j.err
#SBATCH --export=ALL,TZ=Europe/London
#SBATCH --array=1-4
#SBATCH --mail-type=ALL
# send mail to this address
#SBATCH --mail-user=youremail@here.com
module load gdal/3.0.1
module load proj/6.1.1
module load geos/3.6.2
module load R/4.4.1-gcc
cd parallel-r-materials/
Rscript health_data/generate-maps-furrr.R 1 40
It’s going to request 4 array jobs, with array IDs 1, 2, 3, 4, each with 1 node and 10 CPU cores and each will run for a max of 5 mins.
The command line arguments are also specifying that the total number of maps produced will be 40 which will be split across the 4 array jobs. Each job will therefore generate 10 maps.
Now let’s log back into Iridis and submit our slurm array job with:
sbatch parallel-r-materials/health_data/slurm/generate-maps-furrr-array.slurm
Let’s check what’s going on in the slurm queue:
squeue -lu userid
JOBID PARTITION NAME USER STATE TIME TIME_LIMI NODES NODELIST(REASON)
3374900_1 serial map_furr ak1f23 RUNNING 0:02 5:00 1 gold52
3374900_2 serial map_furr ak1f23 RUNNING 0:02 5:00 1 gold53
3374900_3 serial map_furr ak1f23 RUNNING 0:02 5:00 1 gold53
3374900_4 serial map_furr ak1f23 RUNNING 0:02 5:00 1 gold53
We can see that our single sbatch submission has created 4 separate jobs and the initial Job ID is appended with the array job ID.
Once the array jobs are finished, let’s check the contents of our logs/ directory again:
ls -rt logs/
-rw-r--r-- 1 ak1f23 jf 598 May 29 09:03 R-map_furrr_array.3374891.out
-rw-r--r-- 1 ak1f23 jf 73399 May 29 09:06 R-map_furrr_array.3374892.err
-rw-r--r-- 1 ak1f23 jf 1161 May 29 09:06 R-map_furrr_array.3374892.out
-rw-r--r-- 1 ak1f23 jf 69111 May 29 09:06 R-map_furrr_array.3374893.err
-rw-r--r-- 1 ak1f23 jf 1161 May 29 09:06 R-map_furrr_array.3374893.out
-rw-r--r-- 1 ak1f23 jf 61577 May 29 09:06 R-map_furrr_array.3374894.err
-rw-r--r-- 1 ak1f23 jf 1161 May 29 09:06 R-map_furrr_array.3374894.out
We can see that 4 .err and .out files have been created, one for each array ID.
Let’s have a look at the last .err file created:
cat "$(ls -rt logs/*.err| tail -n1)"
Loading compiler version 2021.2.0
Loading mkl version 2021.2.0
Linking to GEOS 3.6.2, GDAL 3.0.1, PROJ 6.1.1; sf_use_s2() is TRUE
Attaching package: 'dplyr'
The following objects are masked from 'package:stats':
filter, lag
The following objects are masked from 'package:base':
intersect, setdiff, setequal, union
Loading required package: future
Rows: 178360 Columns: 5
-- Column specification --------------------------------------------------------
Delimiter: ","
chr (3): lsoa_code, lsoa_name, gen_health_cat
dbl (2): gen_health_code, observation
i Use `spec()` to retrieve the full column specification for this data.
i Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 35672 Columns: 4
-- Column specification --------------------------------------------------------
Delimiter: ","
chr (4): lsoa_code, lsoa_name, lad_code, lad_name
i Use `spec()` to retrieve the full column specification for this data.
i Specify the column types or set `show_col_types = FALSE` to quiet this message.
Joining with `by = join_by(lsoa_code, lsoa_name)`
Joining with `by = join_by(lsoa_code)`
Progress: ---------------------------------------------------------------- 100%v Map for LAD "E06000011 East Riding of Yorkshire" completed successfuly on PID 153296 on gold54.cluster.local (40.697 sec elapsed)
v Map for LAD "E06000012 North East Lincolnshire" completed successfuly on PID 153290 on gold54.cluster.local (33.212 sec elapsed)
v Map for LAD "E06000013 North Lincolnshire" completed successfuly on PID 153294 on gold54.cluster.local (32.445 sec elapsed)
v Map for LAD "E06000014 York" completed successfuly on PID 153293 on gold54.cluster.local (32.333 sec elapsed)
v Map for LAD "E06000015 Derby" completed successfuly on PID 153291 on gold54.cluster.local (33.919 sec elapsed)
v Map for LAD "E06000016 Leicester" completed successfuly on PID 153297 on gold54.cluster.local (36.511 sec elapsed)
v Map for LAD "E06000017 Rutland" completed successfuly on PID 153299 on gold54.cluster.local (28.89 sec elapsed)
v Map for LAD "E06000018 Nottingham" completed successfuly on PID 153298 on gold54.cluster.local (36.766 sec elapsed)
v Map for LAD "E06000019 Herefordshire" completed successfuly on PID 153292 on gold54.cluster.local (36.08 sec elapsed)
v Map for LAD "E06000020 Telford and Wrekin" completed successfuly on PID 153295 on gold54.cluster.local (31.837 sec elapsed)
-- Job Complete ----------------------------------------------------------------
v 10 maps written to '/mainfs/home/ak1f23/parallel-r-materials/health_data/outputs/maps'.
-- Total time elapsed: "45.952 sec elapsed" --
We can see that only 10 maps were created in this job, as expected. We can also see that each map was created in parallel in a different process within each job.
Let’s have a look at the last .out file:
cat "$(ls -rt logs/*.out| tail -n1)"
Running SLURM prolog script on gold53.cluster.local
===============================================================================
Job started on Mon May 29 09:17:01 BST 2023
Job ID : 3374902
Job name : map_furrr_array
WorkDir : /mainfs/home/ak1f23
Command : /mainfs/home/ak1f23/parallel-r-materials/health_data/slurm/generate-maps-furrr-array.slurm
Partition : serial
Num hosts : 1
Num cores : 10
Num of tasks : 10
Hosts allocated : gold53
Job Output Follows ...
===============================================================================
==============================================================================
Running epilogue script on gold53.
Submit time : 2023-05-29T09:16:04
Start time : 2023-05-29T09:16:57
End time : 2023-05-29T09:28:17
Elapsed time : 00:01:20 (Timelimit=00:05:00)
Job ID: 3374902
Array Job ID: 3374900_2
Cluster: i5
User/Group: ak1f23/jf
State: COMPLETED (exit code 0)
Nodes: 1
Cores per node: 10
CPU Utilized: 00:00:29
CPU Efficiency: 3.62% of 00:13:20 core-walltime
Job Wall-clock time: 00:01:20
Memory Utilized: 332.43 MB
Memory Efficiency: 10.82% of 3.00 GB
This gives us details about the array job with ID 2 (as indicated by the Array Job ID: entry).
Finally let’s have another look at our outputs/maps/ directory:
ls parallel-r-materials/health_data/outputs/maps/
E06000001_hartlepool_2023-05-29.png E06000021_stoke_on_trent_2023-05-29.png
E06000002_middlesbrough_2023-05-29.png E06000022_bath_and_north_east_somerset_2023-05-29.png
E06000003_redcar_and_cleveland_2023-05-29.png E06000023_bristol_2023-05-29.png
E06000004_stockton_on_tees_2023-05-29.png E06000024_north_somerset_2023-05-29.png
E06000005_darlington_2023-05-29.png E06000025_south_gloucestershire_2023-05-29.png
E06000006_halton_2023-05-29.png E06000026_plymouth_2023-05-29.png
E06000007_warrington_2023-05-29.png E06000027_torbay_2023-05-29.png
E06000008_blackburn_with_darwen_2023-05-29.png E06000030_swindon_2023-05-29.png
E06000009_blackpool_2023-05-29.png E06000031_peterborough_2023-05-29.png
E06000010_kingston_upon_hull_2023-05-29.png E06000032_luton_2023-05-29.png
E06000011_east_riding_of_yorkshire_2023-05-29.png E06000033_southend_on_sea_2023-05-29.png
E06000012_north_east_lincolnshire_2023-05-29.png E06000034_thurrock_2023-05-29.png
E06000013_north_lincolnshire_2023-05-29.png E06000035_medway_2023-05-29.png
E06000014_york_2023-05-29.png E06000036_bracknell_forest_2023-05-29.png
E06000015_derby_2023-05-29.png E06000037_west_berkshire_2023-05-29.png
E06000016_leicester_2023-05-29.png E06000038_reading_2023-05-29.png
E06000017_rutland_2023-05-29.png E06000039_slough_2023-05-29.png
E06000018_nottingham_2023-05-29.png E06000040_windsor_and_maidenhead_2023-05-29.png
E06000019_herefordshire_2023-05-29.png E06000041_wokingham_2023-05-29.png
E06000020_telford_and_wrekin_2023-05-29.png E06000042_milton_keynes_2023-05-29.png
We can see that the job array has successfully created all 40 maps requested across the 4 jobs! 🎉
Transferring results from Iridis
One thing we haven’t mentioned yet is transferring data from Iridis to our local computer. Let’s transfer the maps we just created to demonstrate.
First let’s exit Iridis (in Rstudio terminal):
exit
or move to the local shell session in mobaXterm.
Next we use rsync again but just flip the source and destination arguments.
Here we ask rsync to transfer the health_data/outputs/maps directory on Iridis to our local health_data/outputs directory. We also compress the files before transferring (-z) to speed up the transfer.
rsync -zhav userid@iridis5.soton.ac.uk:/home/userid/parallel-r-materials/health_data/outputs/maps health_data/outputs
receiving file list ... done
maps/
maps/E06000001_hartlepool_2023-11-22.png
maps/E06000002_middlesbrough_2023-11-22.png
maps/E06000003_redcar_and_cleveland_2023-11-22.png
maps/E06000004_stockton_on_tees_2023-11-22.png
maps/E06000005_darlington_2023-11-22.png
maps/E06000006_halton_2023-11-22.png
maps/E06000007_warrington_2023-11-22.png
maps/E06000008_blackburn_with_darwen_2023-11-22.png
maps/E06000009_blackpool_2023-11-22.png
maps/E06000010_kingston_upon_hull_2023-11-22.png
maps/E06000011_east_riding_of_yorkshire_2023-11-22.png
maps/E06000012_north_east_lincolnshire_2023-11-22.png
maps/E06000013_north_lincolnshire_2023-11-22.png
maps/E06000014_york_2023-11-22.png
maps/E06000015_derby_2023-11-22.png
maps/E06000016_leicester_2023-11-22.png
maps/E06000017_rutland_2023-11-22.png
maps/E06000018_nottingham_2023-11-22.png
maps/E06000019_herefordshire_2023-11-22.png
maps/E06000020_telford_and_wrekin_2023-11-22.png
maps/E06000021_stoke_on_trent_2023-11-22.png
maps/E06000022_bath_and_north_east_somerset_2023-11-22.png
maps/E06000023_bristol_2023-11-22.png
maps/E06000024_north_somerset_2023-11-22.png
maps/E06000025_south_gloucestershire_2023-11-22.png
maps/E06000026_plymouth_2023-11-22.png
maps/E06000027_torbay_2023-11-22.png
maps/E06000030_swindon_2023-11-22.png
maps/E06000031_peterborough_2023-11-22.png
maps/E06000032_luton_2023-11-22.png
maps/E06000033_southend_on_sea_2023-11-22.png
maps/E06000034_thurrock_2023-11-22.png
maps/E06000035_medway_2023-11-22.png
maps/E06000036_bracknell_forest_2023-11-22.png
maps/E06000037_west_berkshire_2023-11-22.png
maps/E06000038_reading_2023-11-22.png
maps/E06000039_slough_2023-11-22.png
maps/E06000040_windsor_and_maidenhead_2023-11-22.png
maps/E06000041_wokingham_2023-11-22.png
maps/E06000042_milton_keynes_2023-11-22.png
sent 902 bytes received 41.60M bytes 978.93K bytes/sec
total size is 42.63M speedup is 1.02
We can use ls again locally to inspect the updated contents of our local health_data/outputs/maps directory:
ls health_data/outputs/maps
E06000001_hartlepool_2023-11-22.png E06000021_stoke_on_trent_2023-11-22.png
E06000002_middlesbrough_2023-11-22.png E06000022_bath_and_north_east_somerset_2023-11-22.png
E06000003_redcar_and_cleveland_2023-11-22.png E06000023_bristol_2023-11-22.png
E06000004_stockton_on_tees_2023-11-22.png E06000024_north_somerset_2023-11-22.png
E06000005_darlington_2023-11-22.png E06000025_south_gloucestershire_2023-11-22.png
E06000006_halton_2023-11-22.png E06000026_plymouth_2023-11-22.png
E06000007_warrington_2023-11-22.png E06000027_torbay_2023-11-22.png
E06000008_blackburn_with_darwen_2023-11-22.png E06000030_swindon_2023-11-22.png
E06000009_blackpool_2023-11-22.png E06000031_peterborough_2023-11-22.png
E06000010_kingston_upon_hull_2023-11-22.png E06000032_luton_2023-11-22.png
E06000011_east_riding_of_yorkshire_2023-11-22.png E06000033_southend_on_sea_2023-11-22.png
E06000012_north_east_lincolnshire_2023-11-22.png E06000034_thurrock_2023-11-22.png
E06000013_north_lincolnshire_2023-11-22.png E06000035_medway_2023-11-22.png
E06000014_york_2023-11-22.png E06000036_bracknell_forest_2023-11-22.png
E06000015_derby_2023-11-22.png E06000037_west_berkshire_2023-11-22.png
E06000016_leicester_2023-11-22.png E06000038_reading_2023-11-22.png
E06000017_rutland_2023-11-22.png E06000039_slough_2023-11-22.png
E06000018_nottingham_2023-11-22.png E06000040_windsor_and_maidenhead_2023-11-22.png
E06000019_herefordshire_2023-11-22.png E06000041_wokingham_2023-11-22.png
E06000020_telford_and_wrekin_2023-11-22.png E06000042_milton_keynes_2023-11-22.png
We’ve successfully managed to:
- Submit parallel jobs to slurm
- Run our code in parallel on Iridis
- Adapt our code to accept command line arguments
- Adapt our code to chunk the workload across slurm job arrays
- Run our code in parallel across slurm job arrays
- Transfer files from Iridis to our local computers