Exercise 1b: Mapping General Health Categories across Local Authority Districts
Parallel R workflows on Iridis
Now it’s time to run our exercises on Iridis. We’ll start with the health data exercise.
As already mentioned, to run jobs on Iridis, we submit slurm batch jobs using the sbatch command.
We need to supply a .slurm script to the sbatch command which contains the configuration details required to run our job, including:
- the computational resources we need.
- the name of the job.
- where to write stdout and stderr messages.
- setting and exporting environment variables.
- setting an email address for job notifications.
- loading any modules required for the job to run.
- the final command to run to initiate the job.
Let’s examine this in more detail by looking at the first .slurm file we will submit to slurm, generate-maps-furrr.slurm. All the slurm files we will use for this exercise are in the health_data/slurm/ directory.
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=10
#SBATCH --mem=3G
#SBATCH --job-name=map_furrr
#SBATCH --time=00:05:00
#SBATCH --output=logs/R-%x.%j.out
#SBATCH --error=logs/R-%x.%j.err
#SBATCH --export=ALL,TZ=Europe/London
#SBATCH --mail-type=ALL
# send mail to this address
#SBATCH --mail-user=your.email@here.com
module load gdal/3.0.1
module load proj/6.1.1
module load geos/3.6.2
module load R/4.4.1-gcc
cd parallel-r-materials/
Rscript health_data/generate-maps-furrr.R 1 10
Anatomy of a .slurm file
To start off, notice the #!/bin/bash line at the top of the script. This indicates the script is actually a bash script. Bash scripts are to the unix command line what .R scripts are to the R language.
Older versions of the materials had #!/bin/sh instead of #!/bin/bash at the top of each .slurm file. This used to work on Iridis but a recent software update now requires #!/bin/bash at the top of each submission file for it to be executed correctly.
If you have #!/bin/sh at the top of any of your .slurm files, please change it to #!/bin/bash.
SBATCH section
Next we see a chunk of comments starting with #SBATCH. These will all be passed to the sbatch command as options.
You can examine the full list of sbatch options in the official sbatch documentation.
Let’s have a look at what each one does:
- --nodes=1: requests one node.
- --ntasks=10: requests 10 CPU cores.
- --mem=3G: requests 3G of memory in total for the job.
- --job-name=map_furrr: the name of the job.
- --time=00:05:00: the walltime the job is allowed to run for.
- --output=logs/R-%x.%j.out: path to the file where stdout messages will be written. Here we use the special slurm notations %x (job name) and %j (job ID) to create a file name that includes the job name and ID. Note that we are also storing logs in a logs/ directory in our home directory.
- --error=logs/R-%x.%j.err: as above but for stderr output.
- --export=ALL,TZ=Europe/London: export the environment variable TZ=Europe/London (to silence some annoying tictoc warning messages) in addition to all the variables available in the submission environment (ALL).
- --mail-type=ALL: send email notifications when a job begins, ends, fails or is requeued.
- --mail-user=your.email@here.com: send notifications to this email address.
Before moving on, let’s go ahead and replace the dummy email address in the file with your own email address.
Script section
What follows the SBATCH comments is effectively a bash script in which a sequence of commands is run before finally calling our workflow script.
Module load section
In this section we are loading all the software required for our job to run. We obviously need R and specify version R/4.4.1. We also need to load some additional geospatial libraries that the R package sf needs available.
Working directory setting
We also set the working directory of execution to the parallel-r-materials directory in our home directory using the command cd parallel-r-materials/.
It’s important to cd into the root of our project directory to ensure here() works correctly and therefore that paths within our scripts also resolve correctly.
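As a quick illustration, here is a minimal sketch of why this matters, assuming the here package is used for path building in the scripts (the paths in the comments are illustrative):
library(here)
# here() establishes the project root by searching upwards from the working
# directory for a project marker (e.g. an .Rproj file or a .git folder), so
# launching Rscript from parallel-r-materials/ means it resolves to that root
here()                                  # e.g. /home/userid/parallel-r-materials
here("health_data", "outputs", "maps")  # builds a path relative to the project root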
Running our script using Rscript
The final line of our script calls the Rscript command which runs the R script supplied as an argument.
Rscript health_data/generate-maps-furrr.R 1 10
In our case, it will run the health_data/generate-maps-furrr.R script.
The Rscript command allows us to run R scripts from the command line (as opposed to interactively through the console). It can also be used on your local machine and can often be much faster than running R code interactively.
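For example, here is a tiny hedged sketch you could save locally as, say, hello.R (an illustrative file name) and run with Rscript hello.R to see the non-interactive behaviour for yourself:
# A minimal script to try Rscript locally: it prints the working directory and
# confirms that code run via Rscript is non-interactive
message("Working directory: ", getwd())
message("Interactive session? ", interactive())  # FALSE when run via Rscript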
You may have noticed two trailing numbers, 1 and 10, after the script specification. These are command line arguments. They won’t do anything at the moment, but we’re now going to edit our generate-maps-furrr.R script to make use of these arguments, in particular to set idx_start and idx_end and therefore specify the range of LADs we want to produce maps for!
It’s good practice to test your workflow with a small subset of the data before running a full big job. This will allow us to specify the size of our test data during job submission without having to change our script every time.
Command Line arguments
Command line arguments are captured in R scripts using the function commandArgs() which, if trailingOnly = TRUE is set, returns a character vector of those arguments (if any) supplied after --args.
In their basic usage they are unnamed (but see this blog post for setting up named arguments) and individual arguments can be accessed by indexing into the vector returned by commandArgs(). If an indexed element is missing, NA is returned.
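As a quick illustration, here is a hedged sketch of a small script saved as, say, show-args.R (an illustrative name) and run with Rscript show-args.R 1 10:
# Capture only the arguments supplied after the script name
args <- commandArgs(trailingOnly = TRUE)
print(args)     # [1] "1"  "10"   (always a character vector)
print(args[3])  # [1] NA          (indexing a missing argument returns NA)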
Let’s edit our generate-maps-furrr.R file so that, if idx_start/idx_end have not already been set and a command line argument is provided, it is used to set idx_start/idx_end accordingly. Let’s also make it so that if idx_start/idx_end are not set in the script and no command line arguments are provided, maps are created for the full range of LADs.
To do so, replace:
# Create iteration indexes ----
idx_start <- 1
idx_end <- 5
idx <- idx_start:idx_end
with:
# Create iteration indexes ---
idx_start <- NULL
idx_end <- NULL
# Collect command line arguments if present
args <- commandArgs(trailingOnly = TRUE)
# If idx_* values have already been set (i.e. are not NULL) do nothing.
# If a command line argument is supplied (not NA) and no corresponding idx value
# has been set, convert it to integer and assign it.
# If any command arg is missing and idx_* value has not been set, set to 1 or
# total number of LADs.
lad_n <- length(unique(all_data$lad_code))

if (is.na(args[1])) {
  if (is.null(idx_start)) {
    idx_start <- 1L
  }
} else {
  idx_start <- as.integer(args[1])
}

if (is.na(args[2])) {
  if (is.null(idx_end)) {
    idx_end <- lad_n
  }
} else {
  idx_end <- as.integer(args[2])
}

idx <- idx_start:idx_end
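To convince ourselves the logic behaves as intended, here is a quick hedged sketch of the two situations we care about, using a made-up stand-in value for lad_n:
lad_n <- 300                              # illustrative stand-in, not the real count
# With `Rscript ... 1 10`, commandArgs(trailingOnly = TRUE) returns c("1", "10"):
args <- c("1", "10")
as.integer(args[1]):as.integer(args[2])   # 1:10 -> maps for the first 10 LADs
# With no trailing arguments, args is empty and args[1]/args[2] are both NA,
# so the fallbacks kick in and idx becomes 1:lad_n (the full range of LADs):
args <- character(0)
is.na(args[1]) && is.na(args[2])          # TRUE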
Synch changes to our local files
Before we head over to Iridis, let’s synch our files already on Iridis with the changes we’ve just made to our local files.
Run the following command either in Rstudio terminal on Linux/macOS or in your local shell session on mobaXterm:
rsync -hav --exclude={'.*/','*/outputs/*'} --progress ./* userid@iridis5.soton.ac.uk:/home/userid/parallel-r-materials/
As always, switch out userid with your own username.
Note we are not compressing files any more as we’ve already transferred our large file and compression is not needed for the smaller files we’re making changes to.
Submit generate-maps-furrr.slurm
Log into Iridis
Before we connect to Iridis we need to make sure we are connected to the VPN. Then use the ssh command we have been using to log into Iridis. On MobaXterm, click on the Iridis session.
Create logs/ directory
Before we submit our job, let’s create a logs directory in our home directory for the .err and .out log files to be written to, with the command:
mkdir logs
If you wanted, you could also keep project-related logs in a logs directory within a project. You would then need to make sure the paths specified in --output and --error in your .slurm files point to the right directory. In any case, avoid dumping logs directly into your home directory as that can get very messy fast!
Submit job
To submit our job, we use the command sbatch followed by the name of the .slurm job submission file.
sbatch parallel-r-materials/health_data/slurm/generate-maps-furrr.slurm
Iridis should respond with the ID of the job just submitted and you should also receive an email shortly confirming that the job has begun! 🚀
Monitoring Jobs
squeue
squeue allows you to view information about jobs in the Slurm scheduling queue. The -l option asks for output in long format and -u restricts the output to jobs belonging to the specified user.
squeue -lu userid
Here’s the output of the command when I run it while the map_furrr job is running:
squeue -lu ak1f23
JOBID PARTITION NAME USER STATE TIME TIME_LIMI NODES NODELIST(REASON)
3374906 serial map_furr ak1f23 RUNNING 0:15 5:00 1 gold52
Note that once a job is finished, it drops off the slurm queue and will not show up anymore.
scontrol
scontrol show job displays the state of a specified job and takes the Job ID of a currently running job as an argument. This gives detailed information about the job, including the full job configuration (provisioning).
scontrol show job <JOB ID>
Here’s the output for the map_furrr job.
scontrol show job 3374906
JobId=3374906 JobName=map_furrr
UserId=ak1f23(50709) GroupId=jf(339) MCS_label=N/A
Priority=91988 Nice=0 Account=soton QOS=normal
JobState=RUNNING Reason=None Dependency=(null)
Requeue=0 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
RunTime=00:00:10 TimeLimit=00:05:00 TimeMin=N/A
SubmitTime=2023-05-29T09:38:06 EligibleTime=2023-05-29T09:38:06
AccrueTime=2023-05-29T09:38:06
StartTime=2023-05-29T09:38:07 EndTime=2023-05-29T09:43:07 Deadline=N/A
SuspendTime=None SecsPreSuspend=0 LastSchedEval=2023-05-29T09:38:07 Scheduler=Main
Partition=serial AllocNode:Sid=cyan51:31196
ReqNodeList=(null) ExcNodeList=(null)
NodeList=gold52
BatchHost=gold52
NumNodes=1 NumCPUs=10 NumTasks=10 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
TRES=cpu=10,mem=3G,node=1,billing=10
Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
MinCPUsNode=1 MinMemoryNode=3G MinTmpDiskNode=0
Features=(null) DelayBoot=00:00:00
OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
Command=/mainfs/home/ak1f23/parallel-r-materials/health_data/slurm/generate-maps-furrr.slurm
WorkDir=/mainfs/home/ak1f23
StdErr=/mainfs/home/ak1f23/logs/R-map_furrr.3374906.err
StdIn=/dev/null
StdOut=/mainfs/home/ak1f23/logs/R-map_furrr.3374906.out
Power=
MailUser=your.email@here.com MailType=INVALID_DEPEND,BEGIN,END,FAIL,REQUEUE,STAGE_OUT
Assessing and Debugging jobs
While squeue and scontrol show job show useful information while a job is running, they are not so useful for digging into what has happened in a job or for debugging when things went wrong.
For this, we turn to the two files we’ve specified in the slurm script, the .out and .err files the job will create in our logs/ directory.
- .out: the output file contains information about our job as well as any messages or errors generated by slurm.
- .err: the error file contains any output from the script part of our slurm script as well as any output (messages, errors & warnings) generated by the execution of our R script.
First let’s have a look at the contents of our logs/ directory:
ls -l logs
We should see a .out and a .err file for the job we just ran:
-rw-r--r-- 1 ak1f23 jf 63517 May 29 08:48 R-map_furrr.3374906.err
-rw-r--r-- 1 ak1f23 jf 1125 May 29 08:48 R-map_furrr.3374906.out
.err file
Now let’s have a look at the .err file. To do so we use the cat command which prints the contents of files to the screen.
We can pass it the name of a specific file, e.g.:
cat logs/R-map_furrr.3374906.err
or we can use this clever little construction:
cat "$(ls -rt logs/*.err| tail -n1)"
This is effectively a unix way of using the output of one expression as the input to another. In this case:
- ls -rt logs/*.err lists all .err files in logs/ sorted by modification time, oldest first, so the most recently created file is listed last.
- | pipes the result to the next expression.
- tail -n1 takes the last element of the list of file names passed to it.
Finally, cat prints the contents of the file whose name is produced by the commands contained within $(...).
Using this expression will show the contents of the last .err file created in the logs/ directory:
Loading compiler version 2021.2.0
Loading mkl version 2021.2.0
Linking to GEOS 3.6.2, GDAL 3.0.1, PROJ 6.1.1; sf_use_s2() is TRUE
Attaching package: 'dplyr'
The following objects are masked from 'package:stats':
filter, lag
The following objects are masked from 'package:base':
intersect, setdiff, setequal, union
Loading required package: future
Rows: 178360 Columns: 5
-- Column specification --------------------------------------------------------
Delimiter: ","
chr (3): lsoa_code, lsoa_name, gen_health_cat
dbl (2): gen_health_code, observation
i Use `spec()` to retrieve the full column specification for this data.
i Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 35672 Columns: 4
-- Column specification --------------------------------------------------------
Delimiter: ","
chr (4): lsoa_code, lsoa_name, lad_code, lad_name
i Use `spec()` to retrieve the full column specification for this data.
i Specify the column types or set `show_col_types = FALSE` to quiet this message.
Joining with `by = join_by(lsoa_code, lsoa_name)`
Joining with `by = join_by(lsoa_code)`
Progress: ---------------------------------------------------------------- 100%v Map for LAD "E06000001 Hartlepool" completed successfuly on PID 54796 on gold52.cluster.local (26.98 sec elapsed)
v Map for LAD "E06000002 Middlesbrough" completed successfuly on PID 54793 on gold52.cluster.local (29.875 sec elapsed)
v Map for LAD "E06000003 Redcar and Cleveland" completed successfuly on PID 54795 on gold52.cluster.local (30.965 sec elapsed)
v Map for LAD "E06000004 Stockton-on-Tees" completed successfuly on PID 54789 on gold52.cluster.local (34.599 sec elapsed)
v Map for LAD "E06000005 Darlington" completed successfuly on PID 54790 on gold52.cluster.local (29.302 sec elapsed)
v Map for LAD "E06000006 Halton" completed successfuly on PID 54798 on gold52.cluster.local (102.076 sec elapsed)
v Map for LAD "E06000007 Warrington" completed successfuly on PID 54791 on gold52.cluster.local (35.781 sec elapsed)
v Map for LAD "E06000008 Blackburn with Darwen" completed successfuly on PID 54792 on gold52.cluster.local (30.237 sec elapsed)
v Map for LAD "E06000009 Blackpool" completed successfuly on PID 54794 on gold52.cluster.local (30.823 sec elapsed)
v Map for LAD "E06000010 Kingston upon Hull" completed successfuly on PID 54797 on gold52.cluster.local (35.394 sec elapsed)
-- Job Complete ----------------------------------------------------------------
v 10 maps written to '/mainfs/home/ak1f23/parallel-r-materials/health_data/outputs/maps'.
-- Total time elapsed: "44.65 sec elapsed" --
This shows us the full output of our R job. If any warnings or errors occurred in the R portion of our job, they would show up here.
As you can see, there is rich information in this file because we’ve added a lot of messaging to our script, using the cli and tictoc packages in particular. That’s why I highly recommend spending some time on this when you’re running scripts on HPC. Given you can’t interact with the job directly, such messages can be an invaluable source of information for assessing what happened during a job.
By examining the output we can indeed see that each map was generated in parallel, as each PID is unique.
We can also see that our command arguments were captured correctly, as only the first 10 maps were created.
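As an aside, here is a hedged sketch of the kind of per-map messaging that produces lines like the ones above, combining cli and tictoc (the function and variable names are illustrative, not the ones used in the actual script):
library(cli)
library(tictoc)

report_map_done <- function(lad_name) {
  tic()                                  # start a timer for this map
  Sys.sleep(0.1)                         # stand-in for the actual map-building work
  timing <- toc(quiet = TRUE)
  elapsed <- round(timing$toc - timing$tic, 2)
  cli_alert_success(
    "Map for LAD {.val {lad_name}} completed on PID {Sys.getpid()} ({elapsed} sec elapsed)"
  )
}

report_map_done("E06000001 Hartlepool")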
.out file
Now, let’s use a modified version of our previous command to view the contents of the last .out file:
cat "$(ls -rt logs/*.out| tail -n1)"
Job ID : 3374906
Job name : map_furrr
WorkDir : /mainfs/home/ak1f23
Command : /mainfs/home/ak1f23/parallel-r-materials/health_data/slurm/generate-maps-furrr.slurm
Partition : serial
Num hosts : 1
Num cores : 10
Num of tasks : 10
Hosts allocated : gold52
Job Output Follows ...
===============================================================================
==============================================================================
Running epilogue script on gold52.
Submit time : 2023-11-22T09:01:26
Start time : 2023-11-22T09:01:57
End time : 2023-11-22T09:03:08
Elapsed time : 00:01:11 (Timelimit=00:05:00)
Job ID: 5040191
Cluster: i5
User/Group: ak1f23/jf
State: COMPLETED (exit code 0)
Nodes: 1
Cores per node: 10
CPU Utilized: 00:00:29
CPU Efficiency: 4.08% of 00:11:50 core-walltime
Job Wall-clock time: 00:01:11
Memory Utilized: 329.87 MB
Memory Efficiency: 10.74% of 3.00 GB
This gives us information about the job.
Any error messages from slurm will show up here.
The output also includes a report of how much of the resources requested were utilised.
We can see that we’re actually using only a small portion of the total wall-clock time allocated for the job. That’s because a big portion of the time is spent on the initial sequential part of the job, loading the data in the first process. During this time, the other cores are idle. If we went ahead and created maps for the whole dataset, the proportion of time spent in the sequential part of the job would drop and the time efficiency would increase.
We can also see we’re using only around 11% of the memory allocated. That’s quite low and we should probably look to reduce our resource request if we run the job again.
This information is really useful because the more closely you can tailor the resources you request to the resources actually required, the sooner your jobs will make it through the slurm queue. But be careful to allow some margin for unexpected bottlenecks or jumps in memory use.
Check outputs
Now, let’s also check the contents of our health_data/outputs/maps/ directory:
ls parallel-r-materials/health_data/outputs/maps/
E06000001_hartlepool_2023-05-29.png E06000006_halton_2023-05-29.png
E06000002_middlesbrough_2023-05-29.png E06000007_warrington_2023-05-29.png
E06000003_redcar_and_cleveland_2023-05-29.png E06000008_blackburn_with_darwen_2023-05-29.png
E06000004_stockton_on_tees_2023-05-29.png E06000009_blackpool_2023-05-29.png
E06000005_darlington_2023-05-29.png E06000010_kingston_upon_hull_2023-05-29.png
We can see that the job has successfully created 10 PNG files in the appropriate directory 🎉.
Slurm Job Arrays
In the previous example, we’ve easily managed to complete all our work in a small job that was queued up by slurm pretty quickly.
What if you have a job that requires a lot of resources and would either take ages to get through the queue or needs more resources than individual user quotas allow for?
One way to manage such jobs, especially jobs where the parallel processes are completely independent of each other, as is the case with our current example, is to split them up into slurm job arrays.
Job arrays offer a mechanism for submitting and managing collections of similar jobs quickly and easily through a single sbatch command.
We will now rerun our workflow using a slurm job array by submitting the generate-maps-furrr-array.slurm script instead of generate-maps-furrr.slurm.
Let’s have a look at the file:
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=10
#SBATCH --mem=3G
#SBATCH --job-name=map_furrr_array
#SBATCH --time=00:05:00
#SBATCH --output=logs/R-%x.%j.out
#SBATCH --error=logs/R-%x.%j.err
#SBATCH --export=ALL,TZ=Europe/London
#SBATCH --array=1-4
#SBATCH --mail-type=ALL
# send mail to this address
#SBATCH --mail-user=youremail@here.com
module load gdal/3.0.1
module load proj/6.1.1
module load geos/3.6.2
module load R/4.4.1-gcc
cd parallel-r-materials/
Rscript health_data/generate-maps-furrr.R 1 40
The #SBATCH configuration at the top of the array submission script defines the computational resources required by each job in the array.
The fact that this is an array job is indicated by the additional option --array=1-4, which specifies that 4 array jobs should be created with array indexes 1, 2, 3 and 4.
The array index within each job is available through the environment variable SLURM_ARRAY_TASK_ID. This can be used to programmatically subset, in R, the portion of the work to be run within each array job.
So let’s go ahead and edit generate-maps-furrr.R to use the SLURM_ARRAY_TASK_ID environment variable to subset the workload accordingly.
Before we move on though, let’s edit the file and add a valid email address for notifications.
Enable array job sub-setting in generate-maps-furrr.R
Before we start editing our file, let’s exit from Iridis (in Rstudio terminal on macOS or Linux):
exit
In generate-maps-furrr.R, add the following code chunk just below the section we added to capture command line arguments and create idx.
# If running within a slurm array, split idx into chunks and subset the idxs for
# the particular task id.
array_id <- Sys.getenv('SLURM_ARRAY_TASK_ID')
array_task_n <- Sys.getenv('SLURM_ARRAY_TASK_COUNT')

if (array_id != "") {
  array_task_n <- as.integer(array_task_n)
  array_id <- as.integer(array_id)
  idx <- idx[furrr:::make_chunks(length(idx), array_task_n)[[array_id]]]
}
The code captures the ID of the current array job from the environment variable SLURM_ARRAY_TASK_ID and the total number of jobs in the array via SLURM_ARRAY_TASK_COUNT.
It then uses the internal furrr function furrr:::make_chunks to create chunks of indexes, using the total length of idx as currently set (after any subsetting has been performed via command line arguments) and the total number of array jobs.
Let’s have a look at how this works. Let’s say we want to map LADs 1 to 40 and we want to split this into 4 array jobs. Let’s also say we are currently in array job 2.
idx <- 1:40
array_task_n <- 4
array_id <- 2
chunks <- furrr:::make_chunks(length(idx), array_task_n)
chunks
[[1]]
[1] 1 2 3 4 5 6 7 8 9 10
[[2]]
[1] 11 12 13 14 15 16 17 18 19 20
[[3]]
[1] 21 22 23 24 25 26 27 28 29 30
[[4]]
[1] 31 32 33 34 35 36 37 38 39 40
chunks[[array_id]]
[1] 11 12 13 14 15 16 17 18 19 20
array_task_n is used to split idx into the correct number of chunks and array_id is used to select the correct chunk for the array job ID.
By using idx[furrr:::make_chunks(length(idx), array_task_n)[[array_id]]] we also ensure that the actual idx vector specified is chunked and subset. For example, say we had specified idx to be 41-80 through the command line:
idx <- 41:80
idx[furrr:::make_chunks(length(idx), array_task_n)[[array_id]]]
[1] 51 52 53 54 55 56 57 58 59 60
The above construct ensures the correct idxs are subset 👍
It’s useful to understand how this internal function works and how chunking can be controlled through the furrr_options arguments chunk_size and scheduling to achieve better load balancing. For more details have a look at the furrr Chunking input article.
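For instance, here is a minimal hedged sketch of how chunking could be controlled directly from future_map() via furrr_options() (the toy workload is purely illustrative):
library(furrr)
plan(multisession, workers = 2)

# scheduling = 2 asks for roughly two chunks of work per worker;
# chunk_size would instead fix the number of elements sent to a worker at a time
res <- future_map(
  1:20,
  ~ .x^2,
  .options = furrr_options(scheduling = 2)
)
Larger scheduling values create more, smaller chunks, which can improve load balancing when individual tasks vary a lot in run time (as with the Halton map in the log output above).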
Synch changes to our local files
Before we head over to Iridis, let’s synch our files already on Iridis with the changes we’ve just made to our local files. As always, switch out userid with your own username.
Run the following command either in Rstudio terminal on Linux/macOS or in your local shell session on mobaXterm:
rsync -hav --exclude={'.*/','*/outputs/*'} --progress ./* userid@iridis5.soton.ac.uk:/home/userid/parallel-r-materials/
Submit slurm Job Array
Before submitting, let’s revisit what the submission script is going to request:
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=10
#SBATCH --mem=3G
#SBATCH --job-name=map_furrr_array
#SBATCH --time=00:05:00
#SBATCH --output=logs/R-%x.%j.out
#SBATCH --error=logs/R-%x.%j.err
#SBATCH --export=ALL,TZ=Europe/London
#SBATCH --array=1-4
#SBATCH --mail-type=ALL
# send mail to this address
#SBATCH --mail-user=youremail@here.com
module load gdal/3.0.1
module load proj/6.1.1
module load geos/3.6.2
module load R/4.4.1-gcc
cd parallel-r-materials/
Rscript health_data/generate-maps-furrr.R 1 40
It’s going to request 4 array jobs, with array IDs 1, 2, 3, 4, each with 1 node and 10 CPU cores and each will run for a max of 5 mins.
The command line arguments are also specifying that the total number of maps produced will be 40 which will be split across the 4 array jobs. Each job will therefore generate 10 maps.
Now let’s log back into Iridis and submit our slurm array job with:
sbatch parallel-r-materials/health_data/slurm/generate-maps-furrr-array.slurm
Let’s check what’s going on in the slurm queue:
squeue -lu userid
JOBID PARTITION NAME USER STATE TIME TIME_LIMI NODES NODELIST(REASON)
3374900_1 serial map_furr ak1f23 RUNNING 0:02 5:00 1 gold52
3374900_2 serial map_furr ak1f23 RUNNING 0:02 5:00 1 gold53
3374900_3 serial map_furr ak1f23 RUNNING 0:02 5:00 1 gold53
3374900_4 serial map_furr ak1f23 RUNNING 0:02 5:00 1 gold53
We can see that our single sbatch submission has created 4 separate jobs and the initial Job ID is appended with the array job ID.
Once the array jobs are finished, let’s check the contents of our logs/ directory again:
ls -rt logs/
-rw-r--r-- 1 ak1f23 jf 598 May 29 09:03 R-map_furrr_array.3374891.out
-rw-r--r-- 1 ak1f23 jf 73399 May 29 09:06 R-map_furrr_array.3374892.err
-rw-r--r-- 1 ak1f23 jf 1161 May 29 09:06 R-map_furrr_array.3374892.out
-rw-r--r-- 1 ak1f23 jf 69111 May 29 09:06 R-map_furrr_array.3374893.err
-rw-r--r-- 1 ak1f23 jf 1161 May 29 09:06 R-map_furrr_array.3374893.out
-rw-r--r-- 1 ak1f23 jf 61577 May 29 09:06 R-map_furrr_array.3374894.err
-rw-r--r-- 1 ak1f23 jf 1161 May 29 09:06 R-map_furrr_array.3374894.out
We can see that 4 .err and .out files have been created, one for each array ID.
Let’s have a look at the last .err file created:
cat "$(ls -rt logs/*.err| tail -n1)"
Loading compiler version 2021.2.0
Loading mkl version 2021.2.0
Linking to GEOS 3.6.2, GDAL 3.0.1, PROJ 6.1.1; sf_use_s2() is TRUE
Attaching package: 'dplyr'
The following objects are masked from 'package:stats':
filter, lag
The following objects are masked from 'package:base':
intersect, setdiff, setequal, union
Loading required package: future
Rows: 178360 Columns: 5
-- Column specification --------------------------------------------------------
Delimiter: ","
chr (3): lsoa_code, lsoa_name, gen_health_cat
dbl (2): gen_health_code, observation
i Use `spec()` to retrieve the full column specification for this data.
i Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 35672 Columns: 4
-- Column specification --------------------------------------------------------
Delimiter: ","
chr (4): lsoa_code, lsoa_name, lad_code, lad_name
i Use `spec()` to retrieve the full column specification for this data.
i Specify the column types or set `show_col_types = FALSE` to quiet this message.
Joining with `by = join_by(lsoa_code, lsoa_name)`
Joining with `by = join_by(lsoa_code)`
Progress: ---------------------------------------------------------------- 100%v Map for LAD "E06000011 East Riding of Yorkshire" completed successfuly on PID 153296 on gold54.cluster.local (40.697 sec elapsed)
v Map for LAD "E06000012 North East Lincolnshire" completed successfuly on PID 153290 on gold54.cluster.local (33.212 sec elapsed)
v Map for LAD "E06000013 North Lincolnshire" completed successfuly on PID 153294 on gold54.cluster.local (32.445 sec elapsed)
v Map for LAD "E06000014 York" completed successfuly on PID 153293 on gold54.cluster.local (32.333 sec elapsed)
v Map for LAD "E06000015 Derby" completed successfuly on PID 153291 on gold54.cluster.local (33.919 sec elapsed)
v Map for LAD "E06000016 Leicester" completed successfuly on PID 153297 on gold54.cluster.local (36.511 sec elapsed)
v Map for LAD "E06000017 Rutland" completed successfuly on PID 153299 on gold54.cluster.local (28.89 sec elapsed)
v Map for LAD "E06000018 Nottingham" completed successfuly on PID 153298 on gold54.cluster.local (36.766 sec elapsed)
v Map for LAD "E06000019 Herefordshire" completed successfuly on PID 153292 on gold54.cluster.local (36.08 sec elapsed)
v Map for LAD "E06000020 Telford and Wrekin" completed successfuly on PID 153295 on gold54.cluster.local (31.837 sec elapsed)
-- Job Complete ----------------------------------------------------------------
v 10 maps written to '/mainfs/home/ak1f23/parallel-r-materials/health_data/outputs/maps'.
-- Total time elapsed: "45.952 sec elapsed" --
We can see that only 10 maps were created in this job, as expected. We can also see that each map was created in parallel in a different process within each job.
Let’s have a look at the last .out file:
cat "$(ls -rt logs/*.out| tail -n1)"
Running SLURM prolog script on gold53.cluster.local
===============================================================================
Job started on Mon May 29 09:17:01 BST 2023
Job ID : 3374902
Job name : map_furrr_array
WorkDir : /mainfs/home/ak1f23
Command : /mainfs/home/ak1f23/parallel-r-materials/health_data/slurm/generate-maps-furrr-array.slurm
Partition : serial
Num hosts : 1
Num cores : 10
Num of tasks : 10
Hosts allocated : gold53
Job Output Follows ...
===============================================================================
==============================================================================
Running epilogue script on gold53.
Submit time : 2023-05-29T09:16:04
Start time : 2023-05-29T09:16:57
End time : 2023-05-29T09:28:17
Elapsed time : 00:01:20 (Timelimit=00:05:00)
Job ID: 3374902
Array Job ID: 3374900_2
Cluster: i5
User/Group: ak1f23/jf
State: COMPLETED (exit code 0)
Nodes: 1
Cores per node: 10
CPU Utilized: 00:00:29
CPU Efficiency: 3.62% of 00:13:20 core-walltime
Job Wall-clock time: 00:01:20
Memory Utilized: 332.43 MB
Memory Efficiency: 10.82% of 3.00 GB
This gives us details about the array job with ID 2 (as indicated by the Array Job ID: entry).
Finally let’s have another look at our outputs/maps/ directory:
ls parallel-r-materials/health_data/outputs/maps/
E06000001_hartlepool_2023-05-29.png E06000021_stoke_on_trent_2023-05-29.png
E06000002_middlesbrough_2023-05-29.png E06000022_bath_and_north_east_somerset_2023-05-29.png
E06000003_redcar_and_cleveland_2023-05-29.png E06000023_bristol_2023-05-29.png
E06000004_stockton_on_tees_2023-05-29.png E06000024_north_somerset_2023-05-29.png
E06000005_darlington_2023-05-29.png E06000025_south_gloucestershire_2023-05-29.png
E06000006_halton_2023-05-29.png E06000026_plymouth_2023-05-29.png
E06000007_warrington_2023-05-29.png E06000027_torbay_2023-05-29.png
E06000008_blackburn_with_darwen_2023-05-29.png E06000030_swindon_2023-05-29.png
E06000009_blackpool_2023-05-29.png E06000031_peterborough_2023-05-29.png
E06000010_kingston_upon_hull_2023-05-29.png E06000032_luton_2023-05-29.png
E06000011_east_riding_of_yorkshire_2023-05-29.png E06000033_southend_on_sea_2023-05-29.png
E06000012_north_east_lincolnshire_2023-05-29.png E06000034_thurrock_2023-05-29.png
E06000013_north_lincolnshire_2023-05-29.png E06000035_medway_2023-05-29.png
E06000014_york_2023-05-29.png E06000036_bracknell_forest_2023-05-29.png
E06000015_derby_2023-05-29.png E06000037_west_berkshire_2023-05-29.png
E06000016_leicester_2023-05-29.png E06000038_reading_2023-05-29.png
E06000017_rutland_2023-05-29.png E06000039_slough_2023-05-29.png
E06000018_nottingham_2023-05-29.png E06000040_windsor_and_maidenhead_2023-05-29.png
E06000019_herefordshire_2023-05-29.png E06000041_wokingham_2023-05-29.png
E06000020_telford_and_wrekin_2023-05-29.png E06000042_milton_keynes_2023-05-29.png
We can see that the job array has successfully created all 40 maps requested across the 4 jobs! 🎉
Transferring results from Iridis
One thing we haven’t mentioned yet is transferring data from Iridis to our local computer. Let’s transfer the maps we just created to demonstrate.
First let’s exit Iridis (in Rstudio terminal):
exit
or move to the local shell session in mobaXterm.
Next we use rsync again but just flip the source and destination arguments.
Here we ask rsync to transfer the health_data/outputs/maps directory on Iridis to our local health_data/outputs directory. We also compress the files before transferring (-z) to speed up the transfer.
rsync -zhav userid@iridis5.soton.ac.uk:/home/userid/parallel-r-materials/health_data/outputs/maps health_data/outputs
receiving file list ... done
maps/
maps/E06000001_hartlepool_2023-11-22.png
maps/E06000002_middlesbrough_2023-11-22.png
maps/E06000003_redcar_and_cleveland_2023-11-22.png
maps/E06000004_stockton_on_tees_2023-11-22.png
maps/E06000005_darlington_2023-11-22.png
maps/E06000006_halton_2023-11-22.png
maps/E06000007_warrington_2023-11-22.png
maps/E06000008_blackburn_with_darwen_2023-11-22.png
maps/E06000009_blackpool_2023-11-22.png
maps/E06000010_kingston_upon_hull_2023-11-22.png
maps/E06000011_east_riding_of_yorkshire_2023-11-22.png
maps/E06000012_north_east_lincolnshire_2023-11-22.png
maps/E06000013_north_lincolnshire_2023-11-22.png
maps/E06000014_york_2023-11-22.png
maps/E06000015_derby_2023-11-22.png
maps/E06000016_leicester_2023-11-22.png
maps/E06000017_rutland_2023-11-22.png
maps/E06000018_nottingham_2023-11-22.png
maps/E06000019_herefordshire_2023-11-22.png
maps/E06000020_telford_and_wrekin_2023-11-22.png
maps/E06000021_stoke_on_trent_2023-11-22.png
maps/E06000022_bath_and_north_east_somerset_2023-11-22.png
maps/E06000023_bristol_2023-11-22.png
maps/E06000024_north_somerset_2023-11-22.png
maps/E06000025_south_gloucestershire_2023-11-22.png
maps/E06000026_plymouth_2023-11-22.png
maps/E06000027_torbay_2023-11-22.png
maps/E06000030_swindon_2023-11-22.png
maps/E06000031_peterborough_2023-11-22.png
maps/E06000032_luton_2023-11-22.png
maps/E06000033_southend_on_sea_2023-11-22.png
maps/E06000034_thurrock_2023-11-22.png
maps/E06000035_medway_2023-11-22.png
maps/E06000036_bracknell_forest_2023-11-22.png
maps/E06000037_west_berkshire_2023-11-22.png
maps/E06000038_reading_2023-11-22.png
maps/E06000039_slough_2023-11-22.png
maps/E06000040_windsor_and_maidenhead_2023-11-22.png
maps/E06000041_wokingham_2023-11-22.png
maps/E06000042_milton_keynes_2023-11-22.png
sent 902 bytes received 41.60M bytes 978.93K bytes/sec
total size is 42.63M speedup is 1.02
We can use ls again locally to inspect the updated contents of our local health_data/outputs/maps directory:
ls health_data/outputs/maps
E06000001_hartlepool_2023-11-22.png E06000021_stoke_on_trent_2023-11-22.png
E06000002_middlesbrough_2023-11-22.png E06000022_bath_and_north_east_somerset_2023-11-22.png
E06000003_redcar_and_cleveland_2023-11-22.png E06000023_bristol_2023-11-22.png
E06000004_stockton_on_tees_2023-11-22.png E06000024_north_somerset_2023-11-22.png
E06000005_darlington_2023-11-22.png E06000025_south_gloucestershire_2023-11-22.png
E06000006_halton_2023-11-22.png E06000026_plymouth_2023-11-22.png
E06000007_warrington_2023-11-22.png E06000027_torbay_2023-11-22.png
E06000008_blackburn_with_darwen_2023-11-22.png E06000030_swindon_2023-11-22.png
E06000009_blackpool_2023-11-22.png E06000031_peterborough_2023-11-22.png
E06000010_kingston_upon_hull_2023-11-22.png E06000032_luton_2023-11-22.png
E06000011_east_riding_of_yorkshire_2023-11-22.png E06000033_southend_on_sea_2023-11-22.png
E06000012_north_east_lincolnshire_2023-11-22.png E06000034_thurrock_2023-11-22.png
E06000013_north_lincolnshire_2023-11-22.png E06000035_medway_2023-11-22.png
E06000014_york_2023-11-22.png E06000036_bracknell_forest_2023-11-22.png
E06000015_derby_2023-11-22.png E06000037_west_berkshire_2023-11-22.png
E06000016_leicester_2023-11-22.png E06000038_reading_2023-11-22.png
E06000017_rutland_2023-11-22.png E06000039_slough_2023-11-22.png
E06000018_nottingham_2023-11-22.png E06000040_windsor_and_maidenhead_2023-11-22.png
E06000019_herefordshire_2023-11-22.png E06000041_wokingham_2023-11-22.png
E06000020_telford_and_wrekin_2023-11-22.png E06000042_milton_keynes_2023-11-22.png
We’ve successfully managed to:
- Submit parallel jobs to slurm
- Run our code in parallel on Iridis
- Adapt our code to accept command line arguments
- Adapt our code to chunk the workload across slurm job arrays
- Run our code in parallel across slurm job arrays
- Transfer files from Iridis to our local computers