# A tibble: 30 × 9
id_team prop_win name_team slug_team url_team_season_logo city_team
<dbl> <dbl> <chr> <chr> <chr> <chr>
1 1610612737 0.386 Atlanta Hawks ATL https://stats.nba.c… Atlanta
2 1610612738 0.601 Boston Celtics BOS https://stats.nba.c… Boston
3 1610612739 0.378 Cleveland Caval… CLE https://stats.nba.c… Cleveland
4 1610612740 0.464 New Orleans Pel… NOP https://stats.nba.c… New Orle…
5 1610612741 0.339 Chicago Bulls CHI https://stats.nba.c… Chicago
6 1610612742 0.454 Dallas Mavericks DAL https://stats.nba.c… Dallas
7 1610612743 0.605 Denver Nuggets DEN https://stats.nba.c… Denver
8 1610612744 0.578 Golden State Wa… GSW https://stats.nba.c… Golden S…
9 1610612745 0.578 Houston Rockets HOU https://stats.nba.c… Houston
10 1610612746 0.592 LA Clippers LAC https://stats.nba.c… LA
# ℹ 20 more rows
# ℹ 3 more variables: id_conference <dbl>, colors_team <chr>,
#   url_thumbnail_team <chr>
Exercise 2a: NBA Play-offs
Parallelising R workflows
Dataset background
In this exercise, we’ll be working with NBA data, downloaded and compiled via the nbastatR 📦.
Workflow objective
The objective of the workflow is to simulate the NBA Play-offs.
The NBA playoffs are split across two conferences, East and West. Within each conference, three rounds are played: the First Round, the Conference Semifinals and the Conference Finals.
The finalists from each conference then go on to play the Playoffs Final and the winner is the overall NBA playoffs winner!
The Play-offs offer a great opportunity for exploring the interaction between sequential and parallel aspects of a workflow as well as the concept of nested parallelisation.
Exercise materials
The materials for this exercise are contained in the nba/ directory in the course materials.
The data used in this exercise are contained in a single CSV file, nba_stats_summaries_19-21.csv, in the data/ sub-folder.
Let’s have a look at it:
Source: nbastatR 📦
Description: The dataset has been created using the nbastatR 📦 and contains information about individual NBA teams including the overall probability of winning a game through the 2019-2021 seasons.
The workflow is a typical base R workflow that uses lapply() and is made up of two scripts:
nba-playoffs.R
This script runs the main simulation workflow. The steps involved include:
- Source custom functions.
- Load data.
- Split data across the two conferences.
- Use lapply to play qualifiers in each conference.
- Use lapply to play all rounds in each conference.
- Play playoffs final once both conference winners have been obtained.
- Announce the winner and save the playoff match logs to a CSV.
# Source function ----
source(here::here("nba", "R", "playoff-functions.R"))
# Load data ----
nba_stats <- readr::read_csv(
  here::here(
    "nba", "data",
    "nba_stats_summaries_19-21.csv"
  )
)
# Play qualifiers ----
tictoc::tic(msg = "Total Play-offs Duration")
cli::cli_h1("Qualifiers have begun!")
split_confs <- split(nba_stats,
  f = nba_stats$id_conference
)
set.seed(5, kind = "L'Ecuyer-CMRG")
qualified_confs <- lapply(
  split_confs,
  play_qualifiers
)
cli::cli_h2("ALL Qualifying matches COMPLETE!")
# Play Conference Rounds ----
cli::cli_h1("Conference Rounds have begun!")
conf_winners <- lapply(
  X = qualified_confs,
  FUN = play_conference,
  nba_stats
)
attr(conf_winners, "match_logs") <- compile_match_logs(conf_winners)
cli::cli_h2("ALL Conference matches COMPLETE!")
# Play Play-offs FINAL ----
cli::cli_h1("Overall Playoff final has begun!")
playoff_winner <- play_round(conf_winners,
  nba_stats,
  round_name = "Play off Finals",
  seed = 7
)
# Announce Winner!
winner_stats <- nba_stats[nba_stats$slug_team == playoff_winner, ]
cli::cli_h1("Playoffs complete!")
cli::cli_alert_success("Winner: {.field {winner_stats$name_team}} ({winner_stats$slug_team})")
tictoc::toc()
# Write match logs to csv ----
fs::dir_create(here::here("nba", "outputs"))
write.csv(attr(playoff_winner, "match_logs"),
  file = here::here("nba", "outputs", "playoff_results.csv"),
  row.names = FALSE
)
R/playoff-functions.R
This file contains a number of custom functions used to simulate the NBA playoffs.
- play_conference(): Play all rounds for an individual conference.
- play_round(): Play a round of matches. This uses lapply to iterate over each match within the round. It also compiles the match_logs of all games played within the round and appends them to the logs of previous rounds.
- play_match(): Play an individual match and compile match logs. Winners are sampled according to their overall probability of winning.
- play_qualifiers(): Play qualifiers for an individual conference. This samples 8 teams from a given conference according to their probability of winning.
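To illustrate how a winner can be sampled according to win probability, here is a minimal base R sketch. It uses the prop_win values for BOS (0.601) and ATL (0.386) from the dataset preview; the helper name simulate_winner() and the number of repetitions are illustrative assumptions and are not part of the course scripts.

```r
# Hypothetical helper (not in the course materials): sample one winner,
# weighting each team by its overall proportion of wins.
simulate_winner <- function(slugs, prop_win) {
  # sample() rescales `prob` internally, so the weights need not sum to 1
  sample(slugs, size = 1, prob = prop_win)
}

set.seed(42)
slugs <- c("BOS", "ATL")
prop_win <- c(0.601, 0.386)

# Repeat the same match many times to see the weighting at work
winners <- replicate(10000, simulate_winner(slugs, prop_win))
bos_share <- mean(winners == "BOS")

# Theoretical share for BOS is 0.601 / (0.601 + 0.386), roughly 0.61
print(round(bos_share, 2))
```

A single match remains a random draw, so the weaker team can still win; the weighting only shows up over many repetitions.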
#' Play all rounds for an individual conference
#'
#' @param qualified a character vector of team slugs of conference qualifiers
#' @param nba_stats data.frame of nba stats data
#'
#' @return a list of length 1 containing the name slug of the round winner. Match logs
#' for each round are also compiled and assigned to attribute `match_logs` of the output.
play_conference <- function(qualified, nba_stats) {
  round_1 <- play_round(
    qualified,
    nba_stats
  )
  semis <- play_round(
    round_1,
    nba_stats
  )
  finals <- play_round(
    semis,
    nba_stats
  )
  return(finals)
}
#' Play a round of matches. Should either be used for matches of teams in a single
#' conference or for the play-offs final.
#'
#' @param teams list of character strings of team slugs of all teams competing in round.
#' @param nba_stats data.frame of nba stats data
#' @param round_name Round name. If `NULL` (default), round name is auto-detected
#' by number of competing teams. For play-off final should be `"Play off Finals"`.
#' @param seed seed to set for lapply function.
#'
#' @return returns a list of the round winners of each match. Match logs are also
#' compiled and appended to the input's logs as attribute `match_logs` of the output
play_round <- function(teams, nba_stats, round_name = NULL, seed = TRUE) {
  # Detect round name from number of teams
  if (is.null(round_name)) {
    round_id <- as.character(length(teams))
    round_name <- switch(round_id,
      "8" = "Conference Round 1",
      "4" = "Conference Semi finals",
      "2" = "Conference finals"
    )
  }
  if (round_name == "Play off Finals") {
    conf <- NA
    conf_msg <- ""
  } else {
    conf <- unique(nba_stats[nba_stats$slug_team %in% unlist(teams), ]$id_conference)
    conf_msg <- paste0("(conf ", conf, ") ")
  }
  # Signal start of round
  cli::cli_h2("{round_name} {conf_msg}started!")
  # Create list of round pair match ups
  round_pairs <- draw_match_pairs(unlist(teams))
  # Use lapply to play each match
  set.seed(seed, kind = "L'Ecuyer-CMRG")
  round_winners <- lapply(
    X = round_pairs,
    FUN = play_match,
    nba_stats,
    round_name
  )
  # Compile match logs and append them to input match logs attribute
  attr(round_winners, "match_logs") <- rbind(
    attr(teams, "match_logs"),
    compile_match_logs(round_winners)
  )
  # Signal round completion and return results
  cli::cli_h2("{round_name} {conf_msg}COMPLETE!")
  return(round_winners)
}
#' Play a single match
#'
#' @param matchup a character vector of length 2 containing the name slugs of teams
#' competing.
#' @param nba_stats data.frame of nba stats data
#' @param round_name Character string. Round name.
#'
#' @return a character vector of length 1 containing the name slug of the match winner.
#' Match logs are also appended as attribute `match_logs`.
play_match <- function(matchup, nba_stats, round_name) {
  if (round_name == "Play off Finals") {
    conf <- NA
    conf_msg <- ""
  } else {
    conf <- unique(nba_stats[nba_stats$slug_team %in% matchup, ]$id_conference)
    conf_msg <- paste0("(conf ", conf, ") ")
  }
  # Create list of probabilities for sampling game length
  probs <- list(
    c(0.9452, 0.0481, 0.0057, 6e-04, 4e-04),
    c(0.7007, 0.1452, 0.0902, 0.0413, 0.0225)
  )
  # Assign higher probabilities for longer games to conference 2
  if (length(conf) > 1) {
    prob <- probs[[1]]
  } else {
    prob <- probs[[conf]]
  }
  pid <- Sys.getpid()
  node <- replace_ip(system2("hostname", stdout = TRUE))
  # play game
  cli::cli_h3("Playing {round_name} {conf_msg}game: {.var {matchup[1]}} VS {.var {matchup[2]}}")
  cli::cli_alert_info("Game location: {.val {pid}} ({node})")
  # Sample game length
  game_length <- sample(c(2.40, 2.65, 2.90, 3.15, 3.40), 1,
    prob = prob
  )
  # Send system to sleep to simulate playing match
  Sys.sleep(game_length)
  # SAMPLE WINNER from probability of winning stats
  # subset nba_stats to only team matchup data
  match_df <- nba_stats[nba_stats$slug_team %in% matchup, ]
  # sample winner
  winner <- sample(match_df$slug_team, 1, prob = match_df$prop_win)
  # print messages
  cli::cli_alert_info("{matchup[1]} VS {matchup[2]} match complete in {game_length * 50} minutes")
  cli::cli_alert_success("Winner: {winner}")
  # Compile match information into match logs data.frame and append as attribute.
  match_logs <- data.frame(
    winner = winner,
    team_1 = matchup[1],
    team_2 = matchup[2],
    pid = pid,
    node = node,
    game_length = game_length * 50,
    date = Sys.time(),
    conf = conf,
    round_name = round_name
  )
  attr(winner, "match_logs") <- match_logs
  return(winner)
}
#' Play Conference qualifiers
#'
#' @param teams data.frame of nba stats data for the teams of a single conference
#' competing in qualifiers
#'
#' @return a character vector of team slugs of qualified teams.
play_qualifiers <- function(teams) {
  conf <- unique(teams$id_conference)
  # play season
  cli::cli_h2("Playing qualifiers for conference {.val {conf}} on {.var {Sys.getpid()}}")
  Sys.sleep(5)
  # sample qualifiers
  qualified <- sample(teams$slug_team, 8, prob = teams$prop_win)
  cli::cli_alert_success("Conference {.val {conf}} qualifying round complete")
  return(qualified)
}
#' Draw pair matches.
#'
#' @param teams a character vector of team slugs of teams competing in round
#'
#' @return a list half the size of `teams` containing match pairs.
draw_match_pairs <- function(teams) {
  n_teams <- length(teams)
  matches <- sample(rep(1:(n_teams / 2), each = 2), size = n_teams)
  match_pairs <- split(teams, matches)
  return(match_pairs)
}
# Compile and order match_logs from lists of results
compile_match_logs <- function(x) {
  logs_list <- lapply(x, function(x) {
    attr(x, "match_logs")
  })
  logs <- do.call(rbind, logs_list)
  logs[order(logs$date), ]
}
# Function replaces local IP addresses with fake IP address
replace_ip <- function(hostname) {
  # Regular expression for IPv4
  ipv4_pattern <- "\\b(?:[0-9]{1,3}\\.){3}[0-9]{1,3}\\b"
  # Regular expression for IPv6
  ipv6_pattern <- "\\b(?:[A-Fa-f0-9]{1,4}:){7}[A-Fa-f0-9]{1,4}\\b"
  # Replace detected IPv4 addresses with a fake address
  hostname <- gsub(ipv4_pattern, "192.0.2.0", hostname)
  # Replace detected IPv6 addresses with a fake address
  hostname <- gsub(ipv6_pattern, "2001:db8::", hostname)
  hostname
}
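As a quick illustration of how draw_match_pairs() turns a set of qualifiers into match-ups, the sketch below repeats the function definition so it can be run on its own. The eight slugs are simply stand-ins for one conference's qualifiers, and the seed value is an arbitrary assumption.

```r
# Copy of draw_match_pairs() from the functions script, repeated here so the
# example is self-contained.
draw_match_pairs <- function(teams) {
  n_teams <- length(teams)
  # Shuffle the match ids 1, 1, 2, 2, ... so each id is assigned to two teams
  matches <- sample(rep(1:(n_teams / 2), each = 2), size = n_teams)
  split(teams, matches)
}

set.seed(1)
# Eight slugs standing in for one conference's qualifiers
qualified <- c("WAS", "IND", "MIA", "MIL", "BKN", "PHI", "ORL", "TOR")
pairs <- draw_match_pairs(qualified)

length(pairs)   # number of matches in the round
lengths(pairs)  # number of teams in each match
print(pairs)
```

With eight teams this yields four matches of two teams each, and every team appears exactly once. Note that the function assumes an even number of teams.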
Run workflow
Let’s go ahead and work through nba-playoffs.R to see what the workflow entails.
Rows: 30 Columns: 9
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (6): name_team, slug_team, url_team_season_logo, city_team, colors_team, url_thumbnail_team
dbl (3): id_team, prop_win, id_conference
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
── Qualifiers have begun! ──────────────────────────────────────────────────────
── Playing qualifiers for conference 1 on `83228` ──
✔ Conference 1 qualifying round complete
── Playing qualifiers for conference 2 on `83228` ──
✔ Conference 2 qualifying round complete
── ALL Qualifying matches COMPLETE! ──
── Conference Rounds have begun! ───────────────────────────────────────────────
── Conference Round 1 (conf 1) started! ──
── Playing Conference Round 1 (conf 1) game: `TOR` VS `PHI`
ℹ Game location: 83228 (dcs33297-2.local)
ℹ TOR VS PHI match complete in 120 minutes
✔ Winner: TOR
── Playing Conference Round 1 (conf 1) game: `CLE` VS `MIA`
ℹ Game location: 83228 (dcs33297-2.local)
ℹ CLE VS MIA match complete in 120 minutes
✔ Winner: CLE
── Playing Conference Round 1 (conf 1) game: `IND` VS `NYK`
ℹ Game location: 83228 (dcs33297-2.local)
ℹ IND VS NYK match complete in 120 minutes
✔ Winner: IND
── Playing Conference Round 1 (conf 1) game: `MIL` VS `WAS`
ℹ Game location: 83228 (dcs33297-2.local)
ℹ MIL VS WAS match complete in 120 minutes
✔ Winner: MIL
── Conference Round 1 (conf 1) COMPLETE! ──
── Conference Semi finals (conf 1) started! ──
── Playing Conference Semi finals (conf 1) game: `CLE` VS `MIL`
ℹ Game location: 83228 (dcs33297-2.local)
ℹ CLE VS MIL match complete in 120 minutes
✔ Winner: MIL
── Playing Conference Semi finals (conf 1) game: `TOR` VS `IND`
ℹ Game location: 83228 (dcs33297-2.local)
ℹ TOR VS IND match complete in 120 minutes
✔ Winner: IND
── Conference Semi finals (conf 1) COMPLETE! ──
── Conference finals (conf 1) started! ──
── Playing Conference finals (conf 1) game: `MIL` VS `IND`
ℹ Game location: 83228 (dcs33297-2.local)
ℹ MIL VS IND match complete in 120 minutes
✔ Winner: MIL
── Conference finals (conf 1) COMPLETE! ──
── Conference Round 1 (conf 2) started! ──
── Playing Conference Round 1 (conf 2) game: `NOP` VS `DAL`
ℹ Game location: 83228 (dcs33297-2.local)
ℹ NOP VS DAL match complete in 120 minutes
✔ Winner: NOP
── Playing Conference Round 1 (conf 2) game: `SAS` VS `POR`
ℹ Game location: 83228 (dcs33297-2.local)
ℹ SAS VS POR match complete in 145 minutes
✔ Winner: SAS
── Playing Conference Round 1 (conf 2) game: `LAL` VS `UTA`
ℹ Game location: 83228 (dcs33297-2.local)
ℹ LAL VS UTA match complete in 132.5 minutes
✔ Winner: UTA
── Playing Conference Round 1 (conf 2) game: `OKC` VS `SAC`
ℹ Game location: 83228 (dcs33297-2.local)
ℹ OKC VS SAC match complete in 120 minutes
✔ Winner: OKC
── Conference Round 1 (conf 2) COMPLETE! ──
── Conference Semi finals (conf 2) started! ──
── Playing Conference Semi finals (conf 2) game: `SAS` VS `OKC`
ℹ Game location: 83228 (dcs33297-2.local)
ℹ SAS VS OKC match complete in 120 minutes
✔ Winner: OKC
── Playing Conference Semi finals (conf 2) game: `NOP` VS `UTA`
ℹ Game location: 83228 (dcs33297-2.local)
ℹ NOP VS UTA match complete in 145 minutes
✔ Winner: NOP
── Conference Semi finals (conf 2) COMPLETE! ──
── Conference finals (conf 2) started! ──
── Playing Conference finals (conf 2) game: `OKC` VS `NOP`
ℹ Game location: 83228 (dcs33297-2.local)
ℹ OKC VS NOP match complete in 120 minutes
✔ Winner: OKC
── Conference finals (conf 2) COMPLETE! ──
── Overall Playoff final has begun! ────────────────────────────────────────────
── Play off Finals started! ──
── Playing Play off Finals game: `MIL` VS `OKC`
ℹ Game location: 83228 (dcs33297-2.local)
ℹ MIL VS OKC match complete in 132.5 minutes
✔ Winner: MIL
── Play off Finals COMPLETE! ──
✔ Winner: Milwaukee Bucks (MIL)
Total Play-offs Duration: 48.822 sec elapsed
So the Milwaukee Bucks have been crowned champions (Go Giannis!!!), and it’s taken around 50s to run the whole Play-offs sequentially.
We can confirm it’s sequential because all games are played in the same process (PID 83228).
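The same check can be reproduced in isolation. The short base R sketch below records the process ID inside each lapply() iteration; the three "games" are placeholders rather than real matches.

```r
# Run three placeholder "games" with lapply-style iteration and record the
# process each one ran in
pids <- vapply(1:3, function(game) Sys.getpid(), integer(1))

# Sequential execution: every iteration reports the PID of the current session
unique_pids <- unique(pids)
n_processes <- length(unique_pids)
print(n_processes)
```

Because lapply() and vapply() run in the calling R session, only one PID is ever reported, which is exactly what the game locations in the log show.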
Parallelise workflow with future.apply
Let’s move on to setting up our code so that it can run in parallel. Just like the furrr 📦 offers drop-in replacements for functions in the purrr 📦, the future.apply 📦 offers drop-in replacements for the *apply family of functions.
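As a small illustration of what "drop-in replacement" means in practice, the sketch below runs the same toy computation with lapply() and future_lapply(). The toy function is an assumption made for the example; it relies on future_lapply() forwarding extra arguments to FUN in the same way lapply() does.

```r
library(future.apply)

# Start from a sequential plan so the comparison is like for like
plan(sequential)

# Base R version: extra arguments after FUN are passed on to the function
powers_base <- lapply(1:3, function(x, p) x^p, 2)

# future.apply version: same arguments, same call shape
powers_future <- future_lapply(1:3, function(x, p) x^p, 2)

same_result <- identical(powers_base, powers_future)
print(same_result)
```

Only the function name changes, which is why the edits needed in the exercise scripts are so small.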
First let’s make copies of the files we’re going to modify.
Modify functions script
Let’s start with R/playoff-functions.R.
Copy R/playoff-functions.R
Let’s make a copy of R/playoff-functions.R for us to edit in the same nba/R directory and name it playoff-future_apply-functions.R.
Next, let’s start editing R/playoff-future_apply-functions.R.
Make sure you are editing R/playoff-future_apply-functions.R and not R/playoff-functions.R. To be sure, it might be easiest to close R/playoff-functions.R.
Modify R/playoff-future_apply-functions.R
Replace lapply() in play_round()
In the play_round() function, replace:
# Use lapply to play each match
set.seed(seed, kind = "L'Ecuyer-CMRG")
round_winners <- lapply(
  X = round_pairs,
  FUN = play_match,
  nba_stats,
  round_name
)
with:
# Use future_lapply to play each match
round_winners <- future.apply::future_lapply(
  X = round_pairs,
  FUN = play_match,
  nba_stats,
  round_name,
  future.seed = seed
)
That’s the only modification we need to make to our functions script, so let’s move on to the workflow script.
Modify workflow script
Copy nba-playoffs.R
Let’s make a copy of nba-playoffs.R for us to edit in the same nba/ directory and name it nba-playoffs-future_apply.R.
Next, let’s start editing nba-playoffs-future_apply.R.
Make sure you are editing nba-playoffs-future_apply.R and not nba-playoffs.R. To be sure, it might be easiest to close nba-playoffs.R.
Modify nba-playoffs-future_apply.R
Load the future.apply library
Let’s make sure the future.apply library is loaded at the start of our workflow.
Add:
# Load Libraries ----
library(future.apply)
to the top of the script.
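Change the path to the function file being sourced
To make sure the modified functions using future_lapply are being loaded, change:
source(here::here("nba", "R", "playoff-functions.R"))
to:
source(here::here("nba", "R", "playoff-future_apply-functions.R"))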
Replace lapply() when playing conference qualifiers
In the section where conference qualifiers are being played, change:
qualified_confs <- lapply(
  split_confs,
  play_qualifiers
)
to:
plan(sequential)
qualified_confs <- future_lapply(
  split_confs,
  play_qualifiers,
  future.seed = 5
)
Replace lapply() when playing conference rounds
In the section where conference rounds are being played, change:
conf_winners <- lapply(
  X = qualified_confs,
  FUN = play_conference,
  nba_stats
)
to:
plan(sequential)
conf_winners <- future_lapply(
  X = qualified_confs,
  FUN = play_conference,
  nba_stats,
  future.seed = 8
)
Add a plan() call to the finals round
Let’s also add a plan() call before the final play_round() call.
Replace:
playoff_winner <- play_round(conf_winners,
  nba_stats,
  round_name = "Play off Finals",
  seed = 7
)
with:
plan(sequential)
playoff_winner <- play_round(conf_winners,
  nba_stats,
  round_name = "Play off Finals",
  seed = 7
)
That’s it! We’ve now replaced all instances of lapply() with the future_lapply() version, which allows us to run the workflow both sequentially and in parallel by switching the plan.
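To make the idea of switching the plan concrete, here is a minimal sketch that runs identical code under two plans and records where the work happened. The helper task_pids() and the choice of two workers are assumptions for illustration only.

```r
library(future.apply)

# Hypothetical helper: report the unique PIDs that four small tasks ran on
task_pids <- function() {
  unique(unlist(future_lapply(1:4, function(i) Sys.getpid())))
}

# Same code, sequential plan: everything runs in the current session
plan(sequential)
pids_sequential <- task_pids()

# Same code, multisession plan: tasks are sent to background R sessions
plan(multisession, workers = 2)
pids_parallel <- task_pids()

# Switch back to sequential, which also shuts down the background workers
plan(sequential)

main_pid <- Sys.getpid()
print(pids_sequential == main_pid)
# Expected to be FALSE, since the tasks ran on background workers
print(main_pid %in% pids_parallel)
```

Nothing inside task_pids() changes between the two runs; only the plan() call decides where the futures are resolved.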
Run nba-playoffs-future_apply.R workflow sequentially
Now let’s run our modified nba-playoffs-future_apply.R script to check that it works.
...
── Conference Rounds have begun! ───────────────────────────────────────────────
── Conference Round 1 (conf 1) started! ──
── Playing Conference Round 1 (conf 1) game: `WAS` VS `IND`
ℹ Game location: 9665 (dcs33297-2.local)
ℹ WAS VS IND match complete in 120 minutes
✔ Winner: IND
── Playing Conference Round 1 (conf 1) game: `MIA` VS `MIL`
ℹ Game location: 9665 (dcs33297-2.local)
ℹ MIA VS MIL match complete in 120 minutes
✔ Winner: MIL
── Playing Conference Round 1 (conf 1) game: `BKN` VS `PHI`
ℹ Game location: 9665 (dcs33297-2.local)
ℹ BKN VS PHI match complete in 120 minutes
✔ Winner: PHI
── Playing Conference Round 1 (conf 1) game: `ORL` VS `TOR`
ℹ Game location: 9665 (dcs33297-2.local)
ℹ ORL VS TOR match complete in 120 minutes
✔ Winner: TOR
── Conference Round 1 (conf 1) COMPLETE! ──
── Conference Semi finals (conf 1) started! ──
── Playing Conference Semi finals (conf 1) game: `IND` VS `PHI`
ℹ Game location: 9665 (dcs33297-2.local)
ℹ IND VS PHI match complete in 120 minutes
✔ Winner: PHI
── Playing Conference Semi finals (conf 1) game: `MIL` VS `TOR`
ℹ Game location: 9665 (dcs33297-2.local)
ℹ MIL VS TOR match complete in 120 minutes
✔ Winner: MIL
── Conference Semi finals (conf 1) COMPLETE! ──
── Conference finals (conf 1) started! ──
── Playing Conference finals (conf 1) game: `PHI` VS `MIL`
ℹ Game location: 9665 (dcs33297-2.local)
ℹ PHI VS MIL match complete in 120 minutes
✔ Winner: MIL
── Conference finals (conf 1) COMPLETE! ──
── Conference Round 1 (conf 2) started! ──
── Playing Conference Round 1 (conf 2) game: `POR` VS `OKC`
ℹ Game location: 9665 (dcs33297-2.local)
ℹ POR VS OKC match complete in 120 minutes
✔ Winner: POR
── Playing Conference Round 1 (conf 2) game: `NOP` VS `MIN`
ℹ Game location: 9665 (dcs33297-2.local)
ℹ NOP VS MIN match complete in 120 minutes
✔ Winner: NOP
── Playing Conference Round 1 (conf 2) game: `LAC` VS `PHX`
ℹ Game location: 9665 (dcs33297-2.local)
ℹ LAC VS PHX match complete in 120 minutes
✔ Winner: LAC
── Playing Conference Round 1 (conf 2) game: `DEN` VS `UTA`
ℹ Game location: 9665 (dcs33297-2.local)
ℹ DEN VS UTA match complete in 132.5 minutes
✔ Winner: DEN
── Conference Round 1 (conf 2) COMPLETE! ──
── Conference Semi finals (conf 2) started! ──
── Playing Conference Semi finals (conf 2) game: `LAC` VS `DEN`
ℹ Game location: 9665 (dcs33297-2.local)
ℹ LAC VS DEN match complete in 120 minutes
✔ Winner: DEN
── Playing Conference Semi finals (conf 2) game: `POR` VS `NOP`
ℹ Game location: 9665 (dcs33297-2.local)
ℹ POR VS NOP match complete in 145 minutes
✔ Winner: POR
── Conference Semi finals (conf 2) COMPLETE! ──
── Conference finals (conf 2) started! ──
── Playing Conference finals (conf 2) game: `DEN` VS `POR`
ℹ Game location: 9665 (dcs33297-2.local)
ℹ DEN VS POR match complete in 120 minutes
✔ Winner: POR
── Conference finals (conf 2) COMPLETE! ──
── ALL Conference matches COMPLETE! ──
── Overall Playoff final has begun! ────────────────────────────────────────────
── Play off Finals started! ──
── Playing Play off Finals game: `MIL` VS `POR`
ℹ Game location: 9665 (dcs33297-2.local)
ℹ MIL VS POR match complete in 145 minutes
✔ Winner: MIL
── Play off Finals COMPLETE! ──
── Playoffs complete! ──────────────────────────────────────────────────────────
✔ Winner: Milwaukee Bucks (MIL)
Total Play-offs Duration: 48.617 sec elapsed
We can see that the workflow runs sequentially (every game is played on the same PID, 9665) and that it takes about the same time as the lapply version.
Modify nba-playoffs-future_apply.R workflow to run in parallel
Let’s now modify nba-playoffs-future_apply.R to run in parallel.
Change every call to
plan(sequential)
to:
plan(multisession)
There should be three instances in the script.
Run nba-playoffs-future_apply.R
── Playing qualifiers for conference 1 on `50120` ──
✔ Conference 1 qualifying round complete
── Playing qualifiers for conference 2 on `50125` ──
✔ Conference 2 qualifying round complete
── ALL Qualifying matches COMPLETE! ──
── Conference Rounds have begun! ───────────────────────────────────────────────
── Conference Round 1 (conf 1) started! ──
── Playing Conference Round 1 (conf 1) game: `WAS` VS `IND`
ℹ Game location: 50490 (dcs33297-2.local)
ℹ WAS VS IND match complete in 120 minutes
✔ Winner: IND
── Playing Conference Round 1 (conf 1) game: `MIA` VS `MIL`
ℹ Game location: 50490 (dcs33297-2.local)
ℹ MIA VS MIL match complete in 120 minutes
✔ Winner: MIL
── Playing Conference Round 1 (conf 1) game: `BKN` VS `PHI`
ℹ Game location: 50490 (dcs33297-2.local)
ℹ BKN VS PHI match complete in 120 minutes
✔ Winner: PHI
── Playing Conference Round 1 (conf 1) game: `ORL` VS `TOR`
ℹ Game location: 50490 (dcs33297-2.local)
ℹ ORL VS TOR match complete in 120 minutes
✔ Winner: TOR
── Conference Round 1 (conf 1) COMPLETE! ──
── Conference Semi finals (conf 1) started! ──
── Playing Conference Semi finals (conf 1) game: `IND` VS `PHI`
ℹ Game location: 50490 (dcs33297-2.local)
ℹ IND VS PHI match complete in 120 minutes
✔ Winner: PHI
── Playing Conference Semi finals (conf 1) game: `MIL` VS `TOR`
ℹ Game location: 50490 (dcs33297-2.local)
ℹ MIL VS TOR match complete in 120 minutes
✔ Winner: MIL
── Conference Semi finals (conf 1) COMPLETE! ──
── Conference finals (conf 1) started! ──
── Playing Conference finals (conf 1) game: `PHI` VS `MIL`
ℹ Game location: 50490 (dcs33297-2.local)
ℹ PHI VS MIL match complete in 120 minutes
✔ Winner: MIL
── Conference finals (conf 1) COMPLETE! ──
── Conference Round 1 (conf 2) started! ──
── Playing Conference Round 1 (conf 2) game: `POR` VS `OKC`
ℹ Game location: 50491 (dcs33297-2.local)
ℹ POR VS OKC match complete in 120 minutes
✔ Winner: POR
── Playing Conference Round 1 (conf 2) game: `NOP` VS `MIN`
ℹ Game location: 50491 (dcs33297-2.local)
ℹ NOP VS MIN match complete in 120 minutes
✔ Winner: NOP
── Playing Conference Round 1 (conf 2) game: `LAC` VS `PHX`
ℹ Game location: 50491 (dcs33297-2.local)
ℹ LAC VS PHX match complete in 120 minutes
✔ Winner: LAC
── Playing Conference Round 1 (conf 2) game: `DEN` VS `UTA`
ℹ Game location: 50491 (dcs33297-2.local)
ℹ DEN VS UTA match complete in 132.5 minutes
✔ Winner: DEN
── Conference Round 1 (conf 2) COMPLETE! ──
── Conference Semi finals (conf 2) started! ──
── Playing Conference Semi finals (conf 2) game: `LAC` VS `DEN`
ℹ Game location: 50491 (dcs33297-2.local)
ℹ LAC VS DEN match complete in 120 minutes
✔ Winner: DEN
── Playing Conference Semi finals (conf 2) game: `POR` VS `NOP`
ℹ Game location: 50491 (dcs33297-2.local)
ℹ POR VS NOP match complete in 145 minutes
✔ Winner: POR
── Conference Semi finals (conf 2) COMPLETE! ──
── Conference finals (conf 2) started! ──
── Playing Conference finals (conf 2) game: `DEN` VS `POR`
ℹ Game location: 50491 (dcs33297-2.local)
ℹ DEN VS POR match complete in 120 minutes
✔ Winner: POR
── Conference finals (conf 2) COMPLETE! ──
>
> attr(conf_winners, "match_logs") <- compile_match_logs(conf_winners)
> cli::cli_h2("ALL Conference matches COMPLETE!")
── ALL Conference matches COMPLETE! ──
── Play off Finals started! ──
── Playing Play off Finals game: `MIL` VS `POR`
ℹ Game location: 50882 (dcs33297-2.local)
ℹ MIL VS POR match complete in 145 minutes
✔ Winner: MIL
── Play off Finals COMPLETE! ──
── Playoffs complete! ──────────────────────────────────────────────────────────
✔ Winner: Milwaukee Bucks (MIL)
Total Play-offs Duration: 29.459 sec elapsed
Great! We’ve successfully managed to run some of our workflow in parallel and brought execution time down to ~30s. The two conferences now run side by side on separate processes (PIDs 50490 and 50491), although the games within each conference are still played one after another.
Also note that we got the same winner as we did when we ran the future.apply workflow sequentially, Milwaukee Bucks (MIL). This reproducibility is an extremely important property of running stochastic processes correctly in parallel, and it is handled for us automatically by future.apply through the future.seed argument.
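The reproducibility provided by future.seed can be checked on a toy example. The sketch below draws random numbers under two different plans with the same seed; the helper draw_numbers(), the seed value and the number of workers are assumptions made for illustration.

```r
library(future.apply)

# Hypothetical helper: draw one random number per element, using a fixed
# future.seed so that each element gets its own reproducible RNG stream
draw_numbers <- function() {
  future_lapply(1:4, function(i) runif(1), future.seed = 5)
}

plan(sequential)
draws_sequential <- draw_numbers()

plan(multisession, workers = 2)
draws_parallel <- draw_numbers()

# Reset the plan and shut down the background workers
plan(sequential)

same_draws <- identical(draws_sequential, draws_parallel)
print(same_draws)
```

Because the random streams are assigned per element rather than per worker, the draws should not depend on how the work is split across processes, which is why the same teams win under every plan.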
Add nested parallelisation
To enable both the outer and inner levels of future_lapply to be run in parallel, we need to set up a nested parallelisation plan. This applies only to the Conference Rounds section.
Calculating the number of workers
To determine a nested plan, we need to calculate the number of workers we want to allocate to each level of parallelisation.
In this example, the outer layer (Eastern and Western conference) will require 2 cores.
outer_cores <- 2L
The rest we can allocate to the inner level.
To make sure we don’t allocate more cores than we have available, when calculating the cores available to each inner plan we:
- Use the availableCores() function from the parallelly package to determine the total number of cores left, omitting the 2 cores we have allocated to the outer plan.
- Use %/% (integer division) to divide the cores left by the number of outer cores. The use of %/% ensures that the result is a whole number.
inner_cores <- parallelly::availableCores(
  omit = outer_cores
) %/% outer_cores
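To see how this arithmetic plays out on machines of different sizes, here is a small base R sketch. The helper inner_cores_for() is hypothetical and takes the total core count as an argument instead of querying the machine; the max() mirrors the fact that availableCores() always returns at least 1.

```r
# Hypothetical helper mirroring the calculation above for a given core count.
# parallelly::availableCores() never returns fewer than 1, hence the max().
inner_cores_for <- function(total_cores, outer_cores = 2L) {
  cores_left <- max(1L, total_cores - outer_cores)
  cores_left %/% outer_cores
}

inner_cores_for(8L)   # (8 - 2) %/% 2 = 3
inner_cores_for(10L)  # (10 - 2) %/% 2 = 4
inner_cores_for(3L)   # (3 - 2) %/% 2 = 0: too few cores for a nested plan
```

On machines with fewer than four cores the result is 0, so it may be worth guarding the value (for example with max(1L, ...)) before passing it to a plan.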
Creating a nested parallelisation plan
Now we’re ready to create a nested execution plan and allocate the correct number of workers to each level.
To do so, we provide a list containing the evaluation strategy we want at each level of nesting. To set the appropriate number of workers on each one, we wrap each evaluation strategy definition in the function tweak(), which allows us to override the default values.
Also note that, because the Futureverse has built-in protection against nested parallelisation, we need to declare nested workers using the As-Is I(.) function, which basically tells the parallel framework “trust us, we know what we are doing”.
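For readers unfamiliar with I(), the base R sketch below shows what it does to a value: it only adds a class marker and leaves the number itself unchanged. The value of inner_cores is a hypothetical placeholder; how the Futureverse acts on the marker is as described above and is not demonstrated here.

```r
inner_cores <- 3L  # hypothetical value for illustration

# I() leaves the value untouched but prepends the "AsIs" class to it
protected <- I(inner_cores)

class(protected)                   # "AsIs" "integer"
inherits(protected, "AsIs")        # TRUE
unclass(protected) == inner_cores  # TRUE: the number itself is unchanged
```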
To do this, we replace:
plan(multisession)
with:
plan(list(
  tweak(multisession, workers = outer_cores),
  tweak(multisession, workers = I(inner_cores))
))
The only section which has changed should now look like this:
# Set up parallel plan
outer_cores <- 2L
inner_cores <- parallelly::availableCores(
  omit = outer_cores
) %/% outer_cores
plan(list(
  tweak(multisession, workers = outer_cores),
  tweak(multisession, workers = I(inner_cores))
))
This adds two levels of parallelisation to the plan by supplying a list of plans of length 2.
Note that, because the first level of multisession parallelisation can generate good defaults for allocating work across available workers, we could have left it as multisession without tweaking it and only configured the additional layers manually to specify the number of workers (processes) in each additional level.
So the following would have worked just as well:
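As a sketch of that untweaked variant (an assumption based on the description above, not code taken from the course materials), the outer level is left as plain multisession and only the inner level is tweaked. The value of inner_cores is a hypothetical placeholder so that the snippet runs on its own.

```r
library(future)

# Hypothetical placeholder; in the workflow this value comes from the
# availableCores() calculation
inner_cores <- 2L

plan(list(
  multisession,                                   # outer level: default workers
  tweak(multisession, workers = I(inner_cores))   # inner level: set explicitly
))

# Number of workers chosen by default for the outer level
outer_workers <- nbrOfWorkers()
print(outer_workers)

# Reset to a sequential plan, which also shuts down the background workers
plan(sequential)
```

The trade-off is that the outer level may start more workers than the two conferences can use, whereas the tweaked version starts exactly two.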
Run nba-playoffs-future_apply.R
...
── Conference Rounds have begun! ───────────────────────────────────────────────
── Conference Round 1 (conf 1) started! ──
── Playing Conference Round 1 (conf 1) game: `WAS` VS `IND`
ℹ Game location: 55019 (dcs33297-2.local)
ℹ WAS VS IND match complete in 120 minutes
✔ Winner: IND
── Playing Conference Round 1 (conf 1) game: `MIA` VS `MIL`
ℹ Game location: 55016 (dcs33297-2.local)
ℹ MIA VS MIL match complete in 120 minutes
✔ Winner: MIL
── Playing Conference Round 1 (conf 1) game: `BKN` VS `PHI`
ℹ Game location: 55017 (dcs33297-2.local)
ℹ BKN VS PHI match complete in 120 minutes
✔ Winner: PHI
── Playing Conference Round 1 (conf 1) game: `ORL` VS `TOR`
ℹ Game location: 55018 (dcs33297-2.local)
ℹ ORL VS TOR match complete in 120 minutes
✔ Winner: TOR
── Conference Round 1 (conf 1) COMPLETE! ──
── Conference Semi finals (conf 1) started! ──
── Playing Conference Semi finals (conf 1) game: `IND` VS `PHI`
ℹ Game location: 55019 (dcs33297-2.local)
ℹ IND VS PHI match complete in 120 minutes
✔ Winner: PHI
── Playing Conference Semi finals (conf 1) game: `MIL` VS `TOR`
ℹ Game location: 55016 (dcs33297-2.local)
ℹ MIL VS TOR match complete in 120 minutes
✔ Winner: MIL
── Conference Semi finals (conf 1) COMPLETE! ──
── Conference finals (conf 1) started! ──
── Playing Conference finals (conf 1) game: `PHI` VS `MIL`
ℹ Game location: 55019 (dcs33297-2.local)
ℹ PHI VS MIL match complete in 120 minutes
✔ Winner: MIL
── Conference finals (conf 1) COMPLETE! ──
── Conference Round 1 (conf 2) started! ──
── Playing Conference Round 1 (conf 2) game: `POR` VS `OKC`
ℹ Game location: 55140 (dcs33297-2.local)
ℹ POR VS OKC match complete in 120 minutes
✔ Winner: POR
── Playing Conference Round 1 (conf 2) game: `NOP` VS `MIN`
ℹ Game location: 55139 (dcs33297-2.local)
ℹ NOP VS MIN match complete in 120 minutes
✔ Winner: NOP
── Playing Conference Round 1 (conf 2) game: `LAC` VS `PHX`
ℹ Game location: 55137 (dcs33297-2.local)
ℹ LAC VS PHX match complete in 120 minutes
✔ Winner: LAC
── Playing Conference Round 1 (conf 2) game: `DEN` VS `UTA`
ℹ Game location: 55138 (dcs33297-2.local)
ℹ DEN VS UTA match complete in 132.5 minutes
✔ Winner: DEN
── Conference Round 1 (conf 2) COMPLETE! ──
── Conference Semi finals (conf 2) started! ──
── Playing Conference Semi finals (conf 2) game: `LAC` VS `DEN`
ℹ Game location: 55140 (dcs33297-2.local)
ℹ LAC VS DEN match complete in 120 minutes
✔ Winner: DEN
── Playing Conference Semi finals (conf 2) game: `POR` VS `NOP`
ℹ Game location: 55139 (dcs33297-2.local)
ℹ POR VS NOP match complete in 145 minutes
✔ Winner: POR
── Conference Semi finals (conf 2) COMPLETE! ──
── Conference finals (conf 2) started! ──
── Playing Conference finals (conf 2) game: `DEN` VS `POR`
ℹ Game location: 55140 (dcs33297-2.local)
ℹ DEN VS POR match complete in 120 minutes
✔ Winner: POR
── Conference finals (conf 2) COMPLETE! ──
── ALL Conference matches COMPLETE! ──
── Play off Finals started! ──
── Playing Play off Finals game: `MIL` VS `POR`
ℹ Game location: 55345 (dcs33297-2.local)
ℹ MIL VS POR match complete in 145 minutes
✔ Winner: MIL
── Play off Finals COMPLETE! ──
── Playoffs complete! ──────────────────────────────────────────────────────────
✔ Winner: Milwaukee Bucks (MIL)
Total Play-offs Duration: 21.398 sec elapsed
Success! Now each game in each round is played on a separate process and the total elapsed time is down to ~21s!
We can also confirm this by examining the match logs.
And yet again, correct handling of seeds means we have the same winner!
Rows: 15 Columns: 9
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (5): winner, team_1, team_2, node, round_name
dbl (3): pid, game_length, conf
dttm (1): date
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
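One way to check the match logs for parallel execution is to count the distinct PIDs used within each conference. The sketch below does this in base R on a small data frame built by hand from the Conference Round 1 games printed in the nested run; in practice the same columns would come from the saved playoff_results.csv, and only a subset of its columns is reproduced here.

```r
# PIDs and match-ups copied from the "Conference Round 1" games of the nested run
logs <- data.frame(
  conf = rep(c(1, 2), each = 4),
  team_1 = c("WAS", "MIA", "BKN", "ORL", "POR", "NOP", "LAC", "DEN"),
  team_2 = c("IND", "MIL", "PHI", "TOR", "OKC", "MIN", "PHX", "UTA"),
  pid = c(55019, 55016, 55017, 55018, 55140, 55139, 55137, 55138)
)

# Count the distinct processes used for the four games of each conference
procs_per_conf <- tapply(logs$pid, logs$conf, function(p) length(unique(p)))
print(procs_per_conf)
```

Four distinct PIDs for four games in each conference indicates that every first-round game ran on its own process.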
We’ve successfully managed to:
- Examine a more complex example which includes sequential and parallel sections of code as well as the potential for nested parallelisation.
- Use the future.apply 📦 to parallelise lapply workflows.
- Use lists of plans to specify nested parallelisation.
- Demonstrate the reproducibility of random seeds and stochastic processes across varying parallelisation plans using futureverse packages.