Exercise 2a: NBA Play-offs

Parallelising R workflows

Dataset background

In this exercise, we’ll be working with NBA data, downloaded and compiled via the nbastatR 📦.

Workflow objective

The objective of the workflow is to simulate the NBA Play-offs.

The NBA playoffs are split across two conferences, East and West. Within each conference, three rounds are played: the First Round, the Conference semifinals and the Conference finals.

The winner of each conference then goes on to play the Playoffs Final, and the winner of that game is the overall NBA playoffs winner!

The Play-offs offer a great opportunity for exploring the interaction between sequential and parallel aspects of a workflow as well as the concept of nested parallelisation.

Games within the same round can be played in parallel. However, each successive round requires the results of the previous round before being able to allocate games within the round. Thus each round must be played sequentially.

The two conferences are independent of each other, so their rounds can be played in parallel. As mentioned above, games within each round can also be played in parallel.
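The structure described above can be sketched in base R as a sequential loop over rounds with an lapply() over the games inside each round. This is a minimal illustration only: play_bracket() and its coin-flip winner are hypothetical stand-ins, not the functions used in the exercise.

```r
# Hypothetical sketch: rounds run one after another (outer loop), while the
# games inside a round are independent of each other (inner lapply).
play_bracket <- function(teams) {
  while (length(teams) > 1) {
    # Pair up neighbouring teams for this round
    pairs <- split(teams, rep(seq_len(length(teams) / 2), each = 2))
    # Each game only needs its own pair, so this step could run in parallel
    winners <- lapply(pairs, function(pair) sample(pair, 1))
    # The next round cannot start until every winner is known
    teams <- unlist(winners, use.names = FALSE)
  }
  teams
}

set.seed(1)
champion <- play_bracket(paste0("T", 1:8))
champion
```

The while loop is the part that has to stay sequential; the lapply() inside it is the part that a parallel backend could take over.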

Exercise materials

The materials for this exercise are contained in the nba/ directory in the course materials.

The data used in this exercise are contained in a single CSV file, nba_stats_summaries_19-21.csv, in the data/ sub-folder.

Let’s have a look at it:

# A tibble: 30 × 9
      id_team prop_win name_team        slug_team url_team_season_logo city_team
        <dbl>    <dbl> <chr>            <chr>     <chr>                <chr>    
 1 1610612737    0.386 Atlanta Hawks    ATL       https://stats.nba.c… Atlanta  
 2 1610612738    0.601 Boston Celtics   BOS       https://stats.nba.c… Boston   
 3 1610612739    0.378 Cleveland Caval… CLE       https://stats.nba.c… Cleveland
 4 1610612740    0.464 New Orleans Pel… NOP       https://stats.nba.c… New Orle…
 5 1610612741    0.339 Chicago Bulls    CHI       https://stats.nba.c… Chicago  
 6 1610612742    0.454 Dallas Mavericks DAL       https://stats.nba.c… Dallas   
 7 1610612743    0.605 Denver Nuggets   DEN       https://stats.nba.c… Denver   
 8 1610612744    0.578 Golden State Wa… GSW       https://stats.nba.c… Golden S…
 9 1610612745    0.578 Houston Rockets  HOU       https://stats.nba.c… Houston  
10 1610612746    0.592 LA Clippers      LAC       https://stats.nba.c… LA
# ℹ 20 more rows
# ℹ 3 more variables: id_conference <dbl>, colors_team <chr>,
#   url_thumbnail_team <chr>
Source: nbastatR 📦
Description:

The dataset has been created using the nbastatR 📦 and contains information about individual NBA teams, including the overall probability of winning a game (prop_win) through the 2019-2021 seasons.

The workflow is a typical base R workflow built around lapply and is made up of two scripts: nba-playoffs.R and R/playoff-functions.R.

nba-playoffs.R runs the main simulation workflow. The steps involved include:

  • Source custom functions.
  • Load data.
  • Split data across the two conferences.
  • Use lapply to play qualifiers in each conference.
  • Use lapply to play all rounds in each conference.
  • Play playoffs final once both conference winners have been obtained.
  • Announce the winner and save the playoff match logs to a CSV.
# Source function ----
source(here::here("nba", "R", "playoff-functions.R"))

# Load data ----
nba_stats <- readr::read_csv(
  here::here(
    "nba", "data",
    "nba_stats_summaries_19-21.csv"
  )
)


# Play qualifiers ----
tictoc::tic(msg = "Total Play-offs Duration")

cli::cli_h1("Qualifiers have begun!")
split_confs <- split(nba_stats,
  f = nba_stats$id_conference
)
set.seed(5, kind = "L'Ecuyer-CMRG")

qualified_confs <- lapply(
  split_confs,
  play_qualifiers
)

cli::cli_h2("ALL Qualifying matches COMPLETE!")

# Play Conference Rounds ----
cli::cli_h1("Conference Rounds have begun!")

conf_winners <- lapply(
  X = qualified_confs,
  FUN = play_conference,
  nba_stats
)

attr(conf_winners, "match_logs") <- compile_match_logs(conf_winners)
cli::cli_h2("ALL Conference matches COMPLETE!")

# Play Play-offs FINAL ----
cli::cli_h1("Overall Playoff final has begun!")

playoff_winner <- play_round(conf_winners,
  nba_stats,
  round_name = "Play off Finals",
  seed = 7
)

# Announce Winner!
winner_stats <- nba_stats[nba_stats$slug_team == playoff_winner, ]
cli::cli_h1("Playoffs complete!")
cli::cli_alert_success("Winner: {.field {winner_stats$name_team}} ({winner_stats$slug_team})")
tictoc::toc()

# Write match logs to csv ----
fs::dir_create(here::here("nba", "outputs"))
write.csv(attr(playoff_winner, "match_logs"),
  file = here::here("nba", "outputs", "playoff_results.csv"),
  row.names = FALSE
)

R/playoff-functions.R contains a number of custom functions used to simulate the NBA playoffs.

  • play_conference(): Play all rounds for an individual conference.
  • play_round(): Play a round of matches. This uses lapply to iterate over each match within the round. It also compiles the match_logs of all games played within the round and appends them to the logs of previous rounds.
  • play_match(): Play an individual match and compile match logs. Winners are sampled according to their overall probability of winning.
  • play_qualifiers(): Play qualifiers for an individual conference. This samples 8 teams from a given conference according to their probability of winning.
#' Play all rounds for an individual conference
#'
#' @param qualified a character vector of team slugs of conference qualifiers
#' @param nba_stats data.frame of nba stats data
#'
#' @return a list of length 1 containing the name slug of the round winner. Match logs
#' for each round are also compiled and assigned to attribute `match_logs` of the output.
play_conference <- function(qualified, nba_stats) {
  round_1 <- play_round(
    qualified,
    nba_stats
  )
  semis <- play_round(
    round_1,
    nba_stats
  )
  finals <- play_round(
    semis,
    nba_stats
  )
  return(finals)
}
#' Play a round of matches. Should either be used for matches of teams in a single
#' conference or for the play-offs final.
#'
#' @param teams list of character strings of team slugs of all teams competing in round.
#' @param nba_stats data.frame of nba stats data
#' @param round_name Round name. If `NULL` (default), round name is auto-detected
#' by number of competing teams. For play-off final should be `"Play off Finals"`.
#' @param seed seed to set for lapply function.
#'
#' @return returns a list of the round winners of each match. Match logs are also
#' compiled and appended to the input's logs as attribute `match_logs` of the output
play_round <- function(teams, nba_stats, round_name = NULL, seed = TRUE) {
  # Detect round name from number of teams
  if (is.null(round_name)) {
    round_id <- as.character(length(teams))

    round_name <- switch(round_id,
      "8" = "Conference Round 1",
      "4" = "Conference Semi finals",
      "2" = "Conference finals"
    )
  }
  if (round_name == "Play off Finals") {
    conf <- NA
    conf_msg <- ""
  } else {
    conf <- unique(nba_stats[nba_stats$slug_team %in% unlist(teams), ]$id_conference)
    conf_msg <- paste0("(conf ", conf, ") ")
  }
  # Signal start of round
  cli::cli_h2("{round_name} {conf_msg}started!")

  # Create list of round pair match ups
  round_pairs <- draw_match_pairs(unlist(teams))

  # Use lapply to play each match
  set.seed(seed, kind = "L'Ecuyer-CMRG")
  round_winners <- lapply(
    X = round_pairs,
    FUN = play_match,
    nba_stats,
    round_name
  )

  # Compile match logs and append them to input match logs attribute
  attr(round_winners, "match_logs") <- rbind(
    attr(teams, "match_logs"),
    compile_match_logs(round_winners)
  )

  # Signal round completion and return results
  cli::cli_h2("{round_name} {conf_msg}COMPLETE!")

  return(round_winners)
}


#' Play a single match
#'
#' @param matchup a character vector of length 2 containing the name slugs of teams
#' competing.
#' @param nba_stats data.frame of nba stats data
#' @param round_name Character string. Round name.
#'
#' @return a character string containing the name slug of the match winner. Match logs are also
#' appended as attribute `match_logs`.
play_match <- function(matchup, nba_stats, round_name) {
  if (round_name == "Play off Finals") {
    conf <- NA
    conf_msg <- ""
  } else {
    conf <- unique(nba_stats[nba_stats$slug_team %in% matchup, ]$id_conference)
    conf_msg <- paste0("(conf ", conf, ") ")
  }


  # Create list of probabilities for sampling game length
  probs <- list(
    c(0.9452, 0.0481, 0.0057, 6e-04, 4e-04),
    c(0.7007, 0.1452, 0.0902, 0.0413, 0.0225)
  )
  # Assign higher probabilities for longer games to conference 2
  if (length(conf) > 1) {
    prob <- probs[[1]]
  } else {
    prob <- probs[[conf]]
  }

  pid <- Sys.getpid()
  node <- replace_ip(system2("hostname", stdout = TRUE))

  # play game
  cli::cli_h3("Playing {round_name} {conf_msg}game: {.var {matchup[1]}} VS {.var {matchup[2]}}")
  cli::cli_alert_info("Game location: {.val {pid}} ({node})")

  # Sample game length
  game_length <- sample(c(2.40, 2.65, 2.90, 3.15, 3.40), 1,
    prob = prob
  )
  # Send system to sleep to simulate playing match
  Sys.sleep(game_length)

  # SAMPLE WINNER from probability of winning stats
  # subset nba_stats to only team matchup data
  match_df <- nba_stats[nba_stats$slug_team %in% matchup, ]
  # sample winner
  winner <- sample(match_df$slug_team, 1, prob = match_df$prop_win)


  # print messages
  cli::cli_alert_info("{matchup[1]} VS {matchup[2]} match complete in {game_length * 50} minutes")
  cli::cli_alert_success("Winner: {winner}")

  # Compile match information into match logs data.frame and append as attribute.
  match_logs <- data.frame(
    winner = winner,
    team_1 = matchup[1],
    team_2 = matchup[2],
    pid = pid,
    node = node,
    game_length = game_length * 50,
    date = Sys.time(),
    conf = conf,
    round_name = round_name
  )
  attr(winner, "match_logs") <- match_logs

  return(winner)
}
#' Play Conference qualifiers
#'
#' @param teams a data.frame of nba stats for the teams of a single conference competing in qualifiers
#'
#' @return a character vector of team slugs of qualified teams.
play_qualifiers <- function(teams) {
  conf <- unique(teams$id_conference)
  # play season
  cli::cli_h2("Playing qualifiers for conference {.val {conf}} on {.var {Sys.getpid()}}")
  Sys.sleep(5)
  # sample qualifiers
  qualified <- sample(teams$slug_team, 8, prob = teams$prop_win)

  cli::cli_alert_success("Conference {.val {conf}} qualifying round complete")

  return(qualified)
}
#' Draw pair matches.
#'
#' @param teams a character vector of team slugs of teams competing in round
#'
#' @return a list half the size of `teams` containing match pairs.
draw_match_pairs <- function(teams) {
  n_teams <- length(teams)
  matches <- sample(rep(1:(n_teams / 2), each = 2), size = n_teams)
  match_pairs <- split(teams, matches)
  return(match_pairs)
}
# Compile and order match_logs from lists of results
compile_match_logs <- function(x) {
  logs_list <- lapply(x, function(x) {
    attr(x, "match_logs")
  })
  logs <- do.call(rbind, logs_list)
  logs[order(logs$date), ]
}
# Function replaces local IP addresses with fake IP address
replace_ip <- function(hostname) {
  # Regular expression for IPv4
  ipv4_pattern <- "\\b(?:[0-9]{1,3}\\.){3}[0-9]{1,3}\\b"
  # Regular expression for IPv6
  ipv6_pattern <- "\\b(?:[A-Fa-f0-9]{1,4}:){7}[A-Fa-f0-9]{1,4}\\b"

  # Replace detected IPv4 addresses with a fake address
  hostname <- gsub(ipv4_pattern, "192.0.2.0", hostname)
  # Replace detected IPv6 addresses with a fake address
  hostname <- gsub(ipv6_pattern, "2001:db8::", hostname)

  hostname
}

While the code is run on independent subsets of the data, we do need to aggregate the results at each step. So here we are dealing with a classic map-reduce problem.
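As a small illustration of the map-reduce pattern in base R, the sketch below applies a function to each independent subset (map) and then binds the pieces back together (reduce). The scores data frame is toy data invented for this example, not the exercise dataset.

```r
# Toy data (hypothetical): two conferences with two teams each
scores <- data.frame(
  conf = c(1, 1, 2, 2),
  team = c("A", "B", "C", "D"),
  prop_win = c(0.6, 0.4, 0.7, 0.3)
)

# Map: run the same function on each independent subset
per_conf <- lapply(split(scores, scores$conf), function(df) {
  df[which.max(df$prop_win), c("conf", "team")]
})

# Reduce: combine the per-subset results into a single data.frame
best <- do.call(rbind, per_conf)
best
```

This mirrors what compile_match_logs() does with the match logs: an lapply() to extract each piece, followed by do.call(rbind, ...) to aggregate them.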

Run workflow

Let’s go ahead and work through nba-playoffs.R to see what the workflow entails.

Rows: 30 Columns: 9
── Column specification ───────────────────────────────────────────────────────────────────
Delimiter: ","
chr (6): name_team, slug_team, url_team_season_logo, city_team, colors_team, url_thumbnail_team
dbl (3): id_team, prop_win, id_conference

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

── Qualifiers have begun! ────────────────────────────────────────────────────────────────

── Playing qualifiers for conference 1 on `83228` ──

✔ Conference 1 qualifying round complete

── Playing qualifiers for conference 2 on `83228` ──

✔ Conference 2 qualifying round complete

── ALL Qualifying matches COMPLETE! ──

── Conference Rounds have begun! ──────────────────────────────────────────────────────────

── Conference Round 1 (conf 1) started! ──

── Playing Conference Round 1 (conf 1) game: `TOR` VS `PHI`
ℹ Game location: 83228 (dcs33297-2.local)
ℹ TOR VS PHI match complete in 120 minutes
✔ Winner: TOR

── Playing Conference Round 1 (conf 1) game: `CLE` VS `MIA`
ℹ Game location: 83228 (dcs33297-2.local)
ℹ CLE VS MIA match complete in 120 minutes
✔ Winner: CLE

── Playing Conference Round 1 (conf 1) game: `IND` VS `NYK`
ℹ Game location: 83228 (dcs33297-2.local)
ℹ IND VS NYK match complete in 120 minutes
✔ Winner: IND

── Playing Conference Round 1 (conf 1) game: `MIL` VS `WAS`
ℹ Game location: 83228 (dcs33297-2.local)
ℹ MIL VS WAS match complete in 120 minutes
✔ Winner: MIL

── Conference Round 1 (conf 1) COMPLETE! ──

── Conference Semi finals (conf 1) started! ──

── Playing Conference Semi finals (conf 1) game: `CLE` VS `MIL`
ℹ Game location: 83228 (dcs33297-2.local)
ℹ CLE VS MIL match complete in 120 minutes
✔ Winner: MIL

── Playing Conference Semi finals (conf 1) game: `TOR` VS `IND`
ℹ Game location: 83228 (dcs33297-2.local)
ℹ TOR VS IND match complete in 120 minutes
✔ Winner: IND

── Conference Semi finals (conf 1) COMPLETE! ──

── Conference finals (conf 1) started! ──

── Playing Conference finals (conf 1) game: `MIL` VS `IND`
ℹ Game location: 83228 (dcs33297-2.local)
ℹ MIL VS IND match complete in 120 minutes
✔ Winner: MIL

── Conference finals (conf 1) COMPLETE! ──

── Conference Round 1 (conf 2) started! ──

── Playing Conference Round 1 (conf 2) game: `NOP` VS `DAL`
ℹ Game location: 83228 (dcs33297-2.local)
ℹ NOP VS DAL match complete in 120 minutes
✔ Winner: NOP

── Playing Conference Round 1 (conf 2) game: `SAS` VS `POR`
ℹ Game location: 83228 (dcs33297-2.local)
ℹ SAS VS POR match complete in 145 minutes
✔ Winner: SAS

── Playing Conference Round 1 (conf 2) game: `LAL` VS `UTA`
ℹ Game location: 83228 (dcs33297-2.local)
ℹ LAL VS UTA match complete in 132.5 minutes
✔ Winner: UTA

── Playing Conference Round 1 (conf 2) game: `OKC` VS `SAC`
ℹ Game location: 83228 (dcs33297-2.local)
ℹ OKC VS SAC match complete in 120 minutes
✔ Winner: OKC

── Conference Round 1 (conf 2) COMPLETE! ──

── Conference Semi finals (conf 2) started! ──

── Playing Conference Semi finals (conf 2) game: `SAS` VS `OKC`
ℹ Game location: 83228 (dcs33297-2.local)
ℹ SAS VS OKC match complete in 120 minutes
✔ Winner: OKC

── Playing Conference Semi finals (conf 2) game: `NOP` VS `UTA`
ℹ Game location: 83228 (dcs33297-2.local)
ℹ NOP VS UTA match complete in 145 minutes
✔ Winner: NOP

── Conference Semi finals (conf 2) COMPLETE! ──

── Conference finals (conf 2) started! ──

── Playing Conference finals (conf 2) game: `OKC` VS `NOP`
ℹ Game location: 83228 (dcs33297-2.local)
ℹ OKC VS NOP match complete in 120 minutes
✔ Winner: OKC

── Conference finals (conf 2) COMPLETE! ──

── Overall Playoff final has begun! ─────────────────────────────────────────────────────

── Play off Finals started! ──

── Playing Play off Finals game: `MIL` VS `OKC`
ℹ Game location: 83228 (dcs33297-2.local)
ℹ MIL VS OKC match complete in 132.5 minutes
✔ Winner: MIL

── Play off Finals COMPLETE! ──

✔ Winner: Milwaukee Bucks (MIL)

Total Play-offs Duration: 48.822 sec elapsed

So the Milwaukee Bucks have been crowned champions (Go Giannis!!!), and it has taken around 50s to run the whole Playoffs sequentially.

We can confirm it’s sequential because all games are played in the same process, i.e. every game reports the same PID.
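The same check can be reproduced in isolation. The following minimal base R sketch (not part of the exercise scripts) records the process ID seen by each lapply-style iteration:

```r
# Under plain lapply()/vapply(), every iteration runs in the current R
# session, so each call reports the same process ID.
pids <- vapply(1:3, function(i) Sys.getpid(), integer(1))

unique(pids)
length(unique(pids)) == 1
```

A single unique PID is the signature of sequential execution; once work is spread over several R processes, more than one PID shows up.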

Parallelise workflow with future.apply

Let’s move on to setting up our code to be able to run in parallel. Just like the furrr 📦 offers drop-in replacements for functions in the purrr 📦, the future.apply 📦 offers drop-in replacements for the *apply family of functions.

First let’s make copies of the files we’re going to modify.

Modify functions script.

Let’s start with R/playoff-functions.R.

Copy R/playoff-functions.R

Let’s make a copy of R/playoff-functions.R for us to edit in the same nba/R directory and name it playoff-future_apply-functions.R.

Next, let’s start editing R/playoff-future_apply-functions.R.

Important

Make sure you are editing R/playoff-future_apply-functions.R and not R/playoff-functions.R. To be sure it might be easiest to close R/playoff-functions.R.

Modify R/playoff-future_apply-functions.R

Replace lapply() in play_round()

In the play_round() function, replace:

  # Use lapply to play each match
  set.seed(seed, kind = "L'Ecuyer-CMRG")
  round_winners <- lapply(
    X = round_pairs,
    FUN = play_match,
    nba_stats,
    round_name
  )

with:

  # Use future_lapply to play each match
  round_winners <- future.apply::future_lapply(
    X = round_pairs,
    FUN = play_match,
    nba_stats,
    round_name,
    future.seed = seed
  )

That’s the only modification we need to make to our functions script so let’s move on to the workflow script.

Modify workflow script

Copy nba-playoffs.R

Let’s make a copy of nba-playoffs.R for us to edit in the same nba/ directory and name it nba-playoffs-future_apply.R.

Next, let’s start editing nba-playoffs-future_apply.R.

Important

Make sure you are editing nba-playoffs-future_apply.R and not nba-playoffs.R. To be sure it might be easiest to close nba-playoffs.R.

Modify nba-playoffs-future_apply.R

Load the future.apply library

Let’s make sure the future.apply library is loaded at the start of our workflow.

Add:

# Load Libraries ----
library(future.apply)

to the top of the script.

Change the path to the function file being sourced.

To make sure the modified functions using future_lapply are being loaded, change:

source(here::here("nba", "R", "playoff-functions.R"))

to:

source(here::here("nba", "R", "playoff-future_apply-functions.R"))

Replace lapply() when playing conference qualifiers.

In the section where conference qualifiers are being played, change:

set.seed(5, kind = "L'Ecuyer-CMRG")

qualified_confs <- lapply(
    split_confs,
    play_qualifiers)

to:

plan(sequential)
qualified_confs <- future_lapply(
    split_confs,
    play_qualifiers,
    future.seed = 5)

Replace lapply() when playing conference rounds.

In the section where conference rounds are being played, change:

conf_winners <- lapply(
    X = qualified_confs,
    FUN = play_conference,
    nba_stats)

to:

plan(sequential)
conf_winners <- future_lapply(
    X = qualified_confs,
    FUN = play_conference,
    nba_stats,
    future.seed = 8)

Add a plan() call to the finals round.

Let’s also add another plan call to the final play_round() call.

Replace:

playoff_winner <- play_round(conf_winners,
                             nba_stats,
                             round_name = "Play off Finals",
                             seed = 7)

with:

plan(sequential)
playoff_winner <- play_round(conf_winners,
                             nba_stats,
                             round_name = "Play off Finals",
                             seed = 7)

That’s it! We’ve now replaced all instances of lapply with the future_lapply() version which allows us to run the workflow both sequentially and in parallel by switching the plan.

Run nba-playoffs-future_apply.R workflow sequentially.

Now let’s run our modified nba-playoffs-future_apply.R script to check that it works.

...

── Conference Rounds have begun! ───────────────────────────────────────────────────────────────────

── Conference Round 1 (conf 1) started! ──

── Playing Conference Round 1 (conf 1) game: `WAS` VS `IND`
ℹ Game location: 9665 (dcs33297-2.local)
ℹ WAS VS IND match complete in 120 minutes
✔ Winner: IND

── Playing Conference Round 1 (conf 1) game: `MIA` VS `MIL`
ℹ Game location: 9665 (dcs33297-2.local)
ℹ MIA VS MIL match complete in 120 minutes
✔ Winner: MIL

── Playing Conference Round 1 (conf 1) game: `BKN` VS `PHI`
ℹ Game location: 9665 (dcs33297-2.local)
ℹ BKN VS PHI match complete in 120 minutes
✔ Winner: PHI

── Playing Conference Round 1 (conf 1) game: `ORL` VS `TOR`
ℹ Game location: 9665 (dcs33297-2.local)
ℹ ORL VS TOR match complete in 120 minutes
✔ Winner: TOR

── Conference Round 1 (conf 1) COMPLETE! ──

── Conference Semi finals (conf 1) started! ──

── Playing Conference Semi finals (conf 1) game: `IND` VS `PHI`
ℹ Game location: 9665 (dcs33297-2.local)
ℹ IND VS PHI match complete in 120 minutes
✔ Winner: PHI

── Playing Conference Semi finals (conf 1) game: `MIL` VS `TOR`
ℹ Game location: 9665 (dcs33297-2.local)
ℹ MIL VS TOR match complete in 120 minutes
✔ Winner: MIL

── Conference Semi finals (conf 1) COMPLETE! ──

── Conference finals (conf 1) started! ──

── Playing Conference finals (conf 1) game: `PHI` VS `MIL`
ℹ Game location: 9665 (dcs33297-2.local)
ℹ PHI VS MIL match complete in 120 minutes
✔ Winner: MIL

── Conference finals (conf 1) COMPLETE! ──

── Conference Round 1 (conf 2) started! ──

── Playing Conference Round 1 (conf 2) game: `POR` VS `OKC`
ℹ Game location: 9665 (dcs33297-2.local)
ℹ POR VS OKC match complete in 120 minutes
✔ Winner: POR

── Playing Conference Round 1 (conf 2) game: `NOP` VS `MIN`
ℹ Game location: 9665 (dcs33297-2.local)
ℹ NOP VS MIN match complete in 120 minutes
✔ Winner: NOP

── Playing Conference Round 1 (conf 2) game: `LAC` VS `PHX`
ℹ Game location: 9665 (dcs33297-2.local)
ℹ LAC VS PHX match complete in 120 minutes
✔ Winner: LAC

── Playing Conference Round 1 (conf 2) game: `DEN` VS `UTA`
ℹ Game location: 9665 (dcs33297-2.local)
ℹ DEN VS UTA match complete in 132.5 minutes
✔ Winner: DEN

── Conference Round 1 (conf 2) COMPLETE! ──

── Conference Semi finals (conf 2) started! ──

── Playing Conference Semi finals (conf 2) game: `LAC` VS `DEN`
ℹ Game location: 9665 (dcs33297-2.local)
ℹ LAC VS DEN match complete in 120 minutes
✔ Winner: DEN

── Playing Conference Semi finals (conf 2) game: `POR` VS `NOP`
ℹ Game location: 9665 (dcs33297-2.local)
ℹ POR VS NOP match complete in 145 minutes
✔ Winner: POR

── Conference Semi finals (conf 2) COMPLETE! ──

── Conference finals (conf 2) started! ──

── Playing Conference finals (conf 2) game: `DEN` VS `POR`
ℹ Game location: 9665 (dcs33297-2.local)
ℹ DEN VS POR match complete in 120 minutes
✔ Winner: POR

── Conference finals (conf 2) COMPLETE! ──

── ALL Conference matches COMPLETE! ──

── Overall Playoff final has begun! ───────────────────────────────────

── Play off Finals started! ──

── Playing Play off Finals game: `MIL` VS `POR`
ℹ Game location: 9665 (dcs33297-2.local)
ℹ MIL VS POR match complete in 145 minutes
✔ Winner: MIL

── Play off Finals COMPLETE! ──

── Playoffs complete! ─────────────────────────────────────────────────
✔ Winner: Milwaukee Bucks (MIL)
Total Play-offs Duration: 48.617 sec elapsed

We can see that the workflow is run sequentially and that it takes about the same time as the lapply version.

Modify nba-playoffs-future_apply.R workflow to run in parallel

Let’s now modify nba-playoffs-future_apply.R to run it in parallel.

Change every call to

plan(sequential)

to:

plan(multisession)

There should be three instances in the script.

Run nba-playoffs-future_apply.R

── Playing qualifiers for conference 1 on `50120` ──

✔ Conference 1 qualifying round complete

── Playing qualifiers for conference 2 on `50125` ──

✔ Conference 2 qualifying round complete

── ALL Qualifying matches COMPLETE! ──

── Conference Rounds have begun! ──────────────────────────────────────────────────────

── Conference Round 1 (conf 1) started! ──

── Playing Conference Round 1 (conf 1) game: `WAS` VS `IND`
ℹ Game location: 50490 (dcs33297-2.local)
ℹ WAS VS IND match complete in 120 minutes
✔ Winner: IND

── Playing Conference Round 1 (conf 1) game: `MIA` VS `MIL`
ℹ Game location: 50490 (dcs33297-2.local)
ℹ MIA VS MIL match complete in 120 minutes
✔ Winner: MIL

── Playing Conference Round 1 (conf 1) game: `BKN` VS `PHI`
ℹ Game location: 50490 (dcs33297-2.local)
ℹ BKN VS PHI match complete in 120 minutes
✔ Winner: PHI

── Playing Conference Round 1 (conf 1) game: `ORL` VS `TOR`
ℹ Game location: 50490 (dcs33297-2.local)
ℹ ORL VS TOR match complete in 120 minutes
✔ Winner: TOR

── Conference Round 1 (conf 1) COMPLETE! ──

── Conference Semi finals (conf 1) started! ──

── Playing Conference Semi finals (conf 1) game: `IND` VS `PHI`
ℹ Game location: 50490 (dcs33297-2.local)
ℹ IND VS PHI match complete in 120 minutes
✔ Winner: PHI

── Playing Conference Semi finals (conf 1) game: `MIL` VS `TOR`
ℹ Game location: 50490 (dcs33297-2.local)
ℹ MIL VS TOR match complete in 120 minutes
✔ Winner: MIL

── Conference Semi finals (conf 1) COMPLETE! ──

── Conference finals (conf 1) started! ──

── Playing Conference finals (conf 1) game: `PHI` VS `MIL`
ℹ Game location: 50490 (dcs33297-2.local)
ℹ PHI VS MIL match complete in 120 minutes
✔ Winner: MIL

── Conference finals (conf 1) COMPLETE! ──


── Conference Round 1 (conf 2) started! ──

── Playing Conference Round 1 (conf 2) game: `POR` VS `OKC`
ℹ Game location: 50491 (dcs33297-2.local)
ℹ POR VS OKC match complete in 120 minutes
✔ Winner: POR

── Playing Conference Round 1 (conf 2) game: `NOP` VS `MIN`
ℹ Game location: 50491 (dcs33297-2.local)
ℹ NOP VS MIN match complete in 120 minutes
✔ Winner: NOP

── Playing Conference Round 1 (conf 2) game: `LAC` VS `PHX`
ℹ Game location: 50491 (dcs33297-2.local)
ℹ LAC VS PHX match complete in 120 minutes
✔ Winner: LAC

── Playing Conference Round 1 (conf 2) game: `DEN` VS `UTA`
ℹ Game location: 50491 (dcs33297-2.local)
ℹ DEN VS UTA match complete in 132.5 minutes
✔ Winner: DEN

── Conference Round 1 (conf 2) COMPLETE! ──

── Conference Semi finals (conf 2) started! ──

── Playing Conference Semi finals (conf 2) game: `LAC` VS `DEN`
ℹ Game location: 50491 (dcs33297-2.local)
ℹ LAC VS DEN match complete in 120 minutes
✔ Winner: DEN

── Playing Conference Semi finals (conf 2) game: `POR` VS `NOP`
ℹ Game location: 50491 (dcs33297-2.local)
ℹ POR VS NOP match complete in 145 minutes
✔ Winner: POR

── Conference Semi finals (conf 2) COMPLETE! ──

── Conference finals (conf 2) started! ──

── Playing Conference finals (conf 2) game: `DEN` VS `POR`
ℹ Game location: 50491 (dcs33297-2.local)
ℹ DEN VS POR match complete in 120 minutes
✔ Winner: POR

── Conference finals (conf 2) COMPLETE! ──

> 
> attr(conf_winners, "match_logs") <- compile_match_logs(conf_winners)
> cli::cli_h2("ALL Conference matches COMPLETE!")

── ALL Conference matches COMPLETE! ──

── Play off Finals started! ──


── Playing Play off Finals game: `MIL` VS `POR`
ℹ Game location: 50882 (dcs33297-2.local)
ℹ MIL VS POR match complete in 145 minutes
✔ Winner: MIL
── Play off Finals COMPLETE! ──

── Playoffs complete! ──────────────────────────────────────────────────────────────────────────────

✔ Winner: Milwaukee Bucks (MIL)

Total Play-offs Duration: 29.459 sec elapsed

Great! We’ve successfully managed to run some of our workflow in parallel and brought execution time down to ~30s.

Also note that we got the same winner as we did when we ran the workflow sequentially, Milwaukee Bucks (MIL). This reproducibility is an extremely important property of running stochastic processes correctly in parallel which is handled for us automatically by future.apply through the future.seed argument.
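As a hedged illustration of this reproducibility, the sketch below draws random numbers with future_lapply() under two different plans while passing the same future.seed. The draw() helper and the seed value 42 are hypothetical and not part of the exercise; the sketch assumes the future.apply package is installed.

```r
library(future.apply)

# Hypothetical stochastic task: draw one number between 1 and 100
draw <- function(i) sample(1:100, 1)

# Run sequentially with a fixed future.seed
plan(sequential)
seq_draws <- future_lapply(1:4, draw, future.seed = 42)

# Run the same call on two background R sessions with the same seed
plan(multisession, workers = 2)
par_draws <- future_lapply(1:4, draw, future.seed = 42)

# Reset the plan, then compare the two sets of draws
plan(sequential)
identical(seq_draws, par_draws)
```

Because each element gets its own pre-generated random number stream, the draws do not depend on which worker ends up processing which element.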

Only the outer level of calls to future_lapply were run in parallel (i.e. the ones splitting the workload over conferences). The inner level, i.e. matches within the same round, has been run sequentially.

Add nested parallelisation

To enable both the outer and inner levels of future_lapply to be run in parallel, we need to set up a nested parallelisation plan. This applies only to the Conference Rounds section.

Calculating the number of workers

To determine a nested plan, we need to calculate the number of workers we want to allocate to each level of parallelisation.

In this example, the outer layer (Eastern and Western conference) will require 2 cores.

outer_cores <- 2L

The rest we can allocate to the inner level.

To make sure we don’t allocate more cores than we have available, when calculating the cores available to each inner plan we:

  • Use the availableCores() function from the parallelly package to determine the number of cores left after omitting the 2 cores we have allocated to the outer plan.
  • Use %/% to divide the cores left by the number of outer cores. Integer division with %/% ensures that the result is a whole number.

inner_cores <- parallelly::availableCores(
    omit = outer_cores) %/% outer_cores
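To make the arithmetic concrete, here is a minimal base R sketch that replays the calculation for a given total number of cores. inner_cores_for() and total_cores are hypothetical names introduced for illustration; the sketch assumes the cores left after omission are clamped to at least one, and it adds a final max(1L, ...) guard that the exercise code does not have, so that small machines never end up with zero inner workers.

```r
# Hypothetical helper mirroring the calculation above for a known core count
inner_cores_for <- function(total_cores, outer_cores = 2L) {
  # Cores left after reserving the outer workers (assumed to be at least 1)
  cores_left <- max(1L, total_cores - outer_cores)
  # Integer division shares the remaining cores between the outer workers;
  # the extra guard keeps the result at 1 or more
  max(1L, cores_left %/% outer_cores)
}

inner_cores_for(8L)  # (8 - 2) %/% 2 = 3
inner_cores_for(2L)  # 1 %/% 2 = 0, raised to 1 by the guard
```

On an 8-core machine this gives 2 outer workers with 3 inner workers each, i.e. at most 6 games running at the same time.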

Creating a nested parallelisation plan

Now we’re ready to create a nested execution plan and allocate the correct number of workers to each.

To do so we provide a list containing the evaluation strategy we want at each level of nested-ness. To set the appropriate number of workers on each one, we wrap each evaluation strategy definition in function tweak() which allows us to override the default values.

Also note that, because the Futureverse has built-in protection against nested parallelism, we need to declare nested workers using the As-Is I(.) function, which basically tells the parallel framework “trust us, we know what we are doing”.

To do this, we replace:

plan(multisession)

with:

plan(list(
  tweak(multisession, workers = outer_cores),
  tweak(multisession, workers = I(inner_cores))
))

The only section which has changed should now look like this:

# Set up parallel plan
outer_cores <- 2L
inner_cores <- parallelly::availableCores(
  omit = outer_cores
) %/% outer_cores

plan(list(
  tweak(multisession, workers = outer_cores),
  tweak(multisession, workers = I(inner_cores))
))

This adds two levels of parallelisation to the plan by supplying a list of plans of length 2.

Note that, because the first level of multisession parallelisation can generate good defaults for allocating work across available workers, we could have left it as multisession without tweaking it and only configure the additional layers manually to specify the number of workers (processes) in each additional level.

So the following would have worked just as well:

plan(list(
  multisession,
  tweak(multisession, workers = I(inner_cores))
))
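To see a nested plan at work outside the playoff workflow, the following minimal sketch nests two future_lapply() calls and records the process ID of each inner task. The worker counts (2 outer, 2 inner) are arbitrary assumptions chosen to keep the example small, and the sketch assumes the future.apply package is installed.

```r
library(future.apply)

# Two outer workers, each allowed to start two inner workers of its own
plan(list(
  tweak(multisession, workers = 2L),
  tweak(multisession, workers = I(2L))
))

# Outer level: one task per "conference"; inner level: one task per "game"
pids <- future_lapply(1:2, function(conf) {
  future_lapply(1:2, function(game) Sys.getpid())
})

# Reset the plan and inspect where the inner tasks ran
plan(sequential)
inner_pids <- unlist(pids)
inner_pids
```

None of the reported PIDs should match the PID of the main session, since every inner task runs in a background R process started by one of the outer workers.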

Run nba-playoffs-future_apply.R

...

── Conference Rounds have begun! ───────────────────────────────────────────────────────────────────

── Conference Round 1 (conf 1) started! ──


── Playing Conference Round 1 (conf 1) game: `WAS` VS `IND`
ℹ Game location: 55019 (dcs33297-2.local)
ℹ WAS VS IND match complete in 120 minutes
✔ Winner: IND

── Playing Conference Round 1 (conf 1) game: `MIA` VS `MIL`
ℹ Game location: 55016 (dcs33297-2.local)
ℹ MIA VS MIL match complete in 120 minutes
✔ Winner: MIL

── Playing Conference Round 1 (conf 1) game: `BKN` VS `PHI`
ℹ Game location: 55017 (dcs33297-2.local)
ℹ BKN VS PHI match complete in 120 minutes
✔ Winner: PHI

── Playing Conference Round 1 (conf 1) game: `ORL` VS `TOR`
ℹ Game location: 55018 (dcs33297-2.local)
ℹ ORL VS TOR match complete in 120 minutes
✔ Winner: TOR
── Conference Round 1 (conf 1) COMPLETE! ──

── Conference Semi finals (conf 1) started! ──


── Playing Conference Semi finals (conf 1) game: `IND` VS `PHI`
ℹ Game location: 55019 (dcs33297-2.local)
ℹ IND VS PHI match complete in 120 minutes
✔ Winner: PHI

── Playing Conference Semi finals (conf 1) game: `MIL` VS `TOR`
ℹ Game location: 55016 (dcs33297-2.local)
ℹ MIL VS TOR match complete in 120 minutes
✔ Winner: MIL
── Conference Semi finals (conf 1) COMPLETE! ──

── Conference finals (conf 1) started! ──


── Playing Conference finals (conf 1) game: `PHI` VS `MIL`
ℹ Game location: 55019 (dcs33297-2.local)
ℹ PHI VS MIL match complete in 120 minutes
✔ Winner: MIL
── Conference finals (conf 1) COMPLETE! ──


── Conference Round 1 (conf 2) started! ──


── Playing Conference Round 1 (conf 2) game: `POR` VS `OKC`
ℹ Game location: 55140 (dcs33297-2.local)
ℹ POR VS OKC match complete in 120 minutes
✔ Winner: POR

── Playing Conference Round 1 (conf 2) game: `NOP` VS `MIN`
ℹ Game location: 55139 (dcs33297-2.local)
ℹ NOP VS MIN match complete in 120 minutes
✔ Winner: NOP

── Playing Conference Round 1 (conf 2) game: `LAC` VS `PHX`
ℹ Game location: 55137 (dcs33297-2.local)
ℹ LAC VS PHX match complete in 120 minutes
✔ Winner: LAC

── Playing Conference Round 1 (conf 2) game: `DEN` VS `UTA`
ℹ Game location: 55138 (dcs33297-2.local)
ℹ DEN VS UTA match complete in 132.5 minutes
✔ Winner: DEN
── Conference Round 1 (conf 2) COMPLETE! ──

── Conference Semi finals (conf 2) started! ──


── Playing Conference Semi finals (conf 2) game: `LAC` VS `DEN`
ℹ Game location: 55140 (dcs33297-2.local)
ℹ LAC VS DEN match complete in 120 minutes
✔ Winner: DEN

── Playing Conference Semi finals (conf 2) game: `POR` VS `NOP`
ℹ Game location: 55139 (dcs33297-2.local)
ℹ POR VS NOP match complete in 145 minutes
✔ Winner: POR
── Conference Semi finals (conf 2) COMPLETE! ──

── Conference finals (conf 2) started! ──


── Playing Conference finals (conf 2) game: `DEN` VS `POR`
ℹ Game location: 55140 (dcs33297-2.local)
ℹ DEN VS POR match complete in 120 minutes
✔ Winner: POR
── Conference finals (conf 2) COMPLETE! ──

── ALL Conference matches COMPLETE! ──

── Play off Finals started! ──


── Playing Play off Finals game: `MIL` VS `POR`
ℹ Game location: 55345 (dcs33297-2.local)
ℹ MIL VS POR match complete in 145 minutes
✔ Winner: MIL
── Play off Finals COMPLETE! ──


── Playoffs complete! ──────────────────────────────────────────────────────────────────────────────

✔ Winner: Milwaukee Bucks (MIL)

Total Play-offs Duration: 21.398 sec elapsed

Success! Now each game in each round is played on a separate process and the total elapsed time is down to ~21s!

We can confirm this by also examining match logs.

And once again, correct handling of seeds means we get the same winner!

View(readr::read_csv(here::here("nba", "outputs", "playoff_results.csv")))
Rows: 15 Columns: 9
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (5): winner, team_1, team_2, node, round_name
dbl  (3): pid, game_length, conf
dttm (1): date

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Summary

We’ve successfully managed to:

  • Examine a more complex example which includes sequential and parallel sections of code as well as the potential for nested parallelisation.

  • Use the future.apply 📦 to parallelise lapply workflows.

  • Use lists of plans to specify nested parallelisation.

  • Demonstrate the reproducibility of random seeds and stochastic processes across varying parallelisation plans using futureverse packages.
