Assessing R Code Performance
Basics
Benchmarking
Benchmarking refers to timing the execution of our code, although aspects of memory utilisation can also be compared during benchmarking. Benchmarking is really useful for assessing performance improvements when optimising code, and also for getting an idea of what resources you'll need to request when running your code on the cluster.
There are many ways to benchmark code in R:
Simple benchmarking:

- `system.time()`: base R function that takes an expression and returns the CPU time used to evaluate it.
- 📦 `tictoc`: the `tictoc` package provides similar functionality to `system.time()`, with some additional useful features. To time code execution, you wrap it between calls to `tic()` and `toc()`.
Formal benchmarking:

- 📦 `microbenchmark`: serves as a more accurate replacement for `system.time()` and allows us to compare multiple expressions with much shorter execution times.
- 📦 `bench`: similar to `microbenchmark`, with some additional features (like testing expressions across a grid of parameters), which I generally prefer.

Both styles are illustrated in the sketch below.
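As a quick, hedged illustration of the options above (assuming the `tictoc` and `bench` packages are installed; the toy matrix and row-mean comparison are just made-up examples), here is a minimal sketch that times the same computation with `system.time()`, `tictoc`, and `bench::mark()`:

```r
# Minimal benchmarking sketch on a toy matrix
library(tictoc)
library(bench)

m <- matrix(runif(1e6), nrow = 1e3)

## Simple benchmarking ----
# Base R: CPU and elapsed time for a single evaluation of the expression
system.time(apply(m, 1, mean))

# tictoc: wrap the code to be timed between tic() and toc()
tic("row means via apply()")
apply(m, 1, mean)
toc()

## Formal benchmarking ----
# bench::mark() runs each expression many times and also records memory
# allocations; microbenchmark::microbenchmark() could be used similarly.
bench::mark(
  apply      = apply(m, 1, mean),
  vectorised = rowMeans(m)
)
```

Because `bench::mark()` (like `microbenchmark()`) repeats each expression many times, it gives far more stable estimates for fast-running code than a single `system.time()` call.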
Profiling
The best way to approach code optimisation, and to identify parallelisable parts of your code, is to profile it and pinpoint the bottlenecks that actually need attention before beginning any optimisation or parallelisation.
TOP TIP:

> The real problem is that programmers have spent far too much time worrying about efficiency in the wrong places and at the wrong times; premature optimisation is the root of all evil (or at least most of it) in programming.

— Donald Knuth, legendary computer scientist
`profvis()`

The best tool for this in R is the `profvis()` function from the `profvis` package.

Profvis is a tool for helping you to understand how R spends its time. It provides an interactive graphical interface for visualizing data from `Rprof`, R's built-in tool for collecting profiling data.
It is so fundamental that it ships with RStudio and can even be accessed directly from the IDE through the Profile menu.
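As a small, hedged example of what this looks like in practice (the deliberately slow loop below is just an illustrative stand-in, not code from this course), wrapping code in `profvis()` opens an interactive flame graph where the vector-growing loop shows up as the main bottleneck:

```r
# Minimal profiling sketch: profvis() wraps the code to be profiled and
# opens an interactive flame graph (in RStudio's viewer or the browser).
library(profvis)

profvis({
  dat <- data.frame(x = rnorm(5e4), y = rnorm(5e4))

  # Deliberately inefficient: growing a vector one element at a time
  out <- numeric(0)
  for (i in seq_len(nrow(dat))) {
    out <- c(out, dat$x[i] * dat$y[i])
  }

  # A vectorised alternative, for comparison in the profile
  out2 <- dat$x * dat$y
})
```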
If you profile your code and find you are spending a lot of time in `*apply` calls, 📦 `purrr` `map_*`/`walk_*` calls, or for loops where each iteration does not depend on the result of the previous one, you have identified an excellent candidate for parallelisation!
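To make that pattern concrete, here is a hedged sketch using only the base `parallel` package (the `slow_task()` helper is a made-up stand-in for real per-iteration work, not part of this course's material): an independent-iteration `lapply()` and one possible parallel counterpart.

```r
# Sketch: iterations that do not depend on each other can be split
# across workers without changing the result.
library(parallel)

slow_task <- function(i) {
  Sys.sleep(0.1)   # stand-in for real per-iteration work
  sqrt(i)
}

# Serial version: each iteration is independent of the others
serial_res <- lapply(1:8, slow_task)

# Parallel version: spread the iterations over a small cluster of workers
cl <- makeCluster(2)
parallel_res <- parLapply(cl, 1:8, slow_task)
stopCluster(cl)

identical(serial_res, parallel_res)  # results should match
```

The key property is that no iteration reads the result of another, which is exactly what makes the loop safe to split across workers.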