Iterative procedures in R are usually accomplished serially: First, step one is completed, then step two is completed, then step three is completed, and so on. But if the completion of step two doesn’t rely on the completion of step one, then another—faster—approach is on the table: parallelization, the simultaneous execution of multiple processes or computations. We can accomplish this by dividing an iterative task, such as a bootstrap, into chunks and passing the chunks to multiple computer cores using the “parallel” R package. Parallelization can slash runtimes dramatically, and it’s an important technique for anyone who implements time-intensive computational processes like resampling. The workshop will discuss the parallelization of iterative processes implemented with the *apply() function family and with for loops; the simultaneous execution of arbitrary code across computer cores; operating system–specific considerations to be aware of; and best practices (e.g., managing random-number generation across cores and ensuring the reproducibility of parallelized processes).
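As a rough illustration of the chunk-and-distribute idea described above, the sketch below bootstraps a sample mean across two worker processes using the “parallel” package. The simulated data, the helper name boot_mean, the worker count, and the seed values are all assumptions made for this example; they are not taken from the workshop materials.

```r
library(parallel)

# Hypothetical data: 200 draws from a normal distribution
set.seed(1)
x <- rnorm(200, mean = 10, sd = 2)

# One bootstrap replicate: resample with replacement and return the mean.
# The index argument i is unused; it only lets parLapply() iterate.
boot_mean <- function(i, data) {
  mean(sample(data, replace = TRUE))
}

n_boot <- 2000

# Start a socket cluster; two workers is an arbitrary choice for illustration
cl <- makeCluster(2)

# Give each worker its own reproducible L'Ecuyer-CMRG random-number stream
clusterSetRNGStream(cl, iseed = 123)

# Split the replicates into chunks, one chunk per worker, and collect results
boot_par <- unlist(parLapply(cl, seq_len(n_boot), boot_mean, data = x))

# Always release the workers when finished
stopCluster(cl)

# Percentile confidence interval for the mean
quantile(boot_par, c(0.025, 0.975))
```

The socket cluster created by makeCluster() works on Windows, macOS, and Linux. On Unix-like systems the fork-based mclapply() is an alternative, but it cannot use more than one core on Windows. Because each replicate is independent of the others, the order in which workers finish does not affect the result.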
Familiarity with R is a prerequisite for this workshop. Contact jacobgg@virginia.edu with questions.
To encourage robust Q&A, we do not record live workshops. However, you can always find the full materials for all of our workshops at https://library.virginia.edu/data/training/past-workshops and work through them at your own pace. We also encourage you to reach out to the instructor at any time for a one-on-one consultation or with specific or general questions about any of the topics we cover.