2. Goals for This Course


Throughout this course, the hands-on portions will run and build upon a single example R script. It is meant to be a common thread through the course and to show you a typical project life cycle that we see on the Yen cluster.

Introducing the R Example

In this course, we will work with an R script that runs a bootstrap analysis: a subset of the iris dataset is resampled 50,000 times, a generalized linear model is fit to each sample, and the results are combined into a table (one row per sample).
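To make the idea concrete, here is a minimal sketch of a single bootstrap replicate in base R, run serially. It assumes the same iris subset and model as the course script; the seed and the names fit and coefs are illustrative choices and are not part of the course code.

```r
# One bootstrap replicate, run serially with base R only.
set.seed(1)  # illustrative seed so the sketch is reproducible

# Same subset as the course script: drop "setosa", keep Sepal.Length and Species
x <- iris[which(iris[, 5] != "setosa"), c(1, 5)]

# Draw row indices with replacement, as many as there are rows
ind <- sample(nrow(x), nrow(x), replace = TRUE)

# Fit a binomial GLM (logistic regression) of species on sepal length
fit <- glm(x[ind, 2] ~ x[ind, 1], family = binomial(logit))

# Each replicate yields two numbers: an intercept and a slope
coefs <- coefficients(fit)
print(coefs)
```

The full script repeats this step 50,000 times and stacks the coefficient pairs into one table.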

Save the following code to a script on your local machine called iris-parallel-bootstrap.R.

library(foreach)
library(doParallel)

# set the number of cores here
ncore <- detectCores()

print(paste('running on', ncore, 'cores'))

# register parallel backend to limit threads to the value specified in ncore variable
registerDoParallel(ncore)

# get subset of Iris data set
x <- iris[which(iris[,5] != "setosa"), c(1,5)]

# number of bootstrap computations
trials <- 50000

# time the foreach loop
system.time({
  r <- foreach(icount(trials), .combine=rbind) %dopar% {

    # resample with replacement for one bootstrap computation
    ind <- sample(100, 100, replace=TRUE)

    # fit a logistic regression (binomial GLM) on the resampled rows
    result <- glm(x[ind, 2] ~ x[ind, 1], family = binomial(logit))
    coefficients(result)
  }
})
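If you want to see what the combined result looks like before running all 50,000 trials, the following scaled-down serial sketch may help. It uses only base R (lapply and rbind in place of foreach and .combine=rbind), and the helper name one_fit and the value of 100 trials are assumptions made for illustration.

```r
# Scaled-down, serial version of the bootstrap loop (base R only).
x <- iris[which(iris[, 5] != "setosa"), c(1, 5)]
trials <- 100  # far fewer than the 50000 used in the course script

# Hypothetical helper: one resample plus one model fit
one_fit <- function(i) {
  ind <- sample(100, 100, replace = TRUE)
  coefficients(glm(x[ind, 2] ~ x[ind, 1], family = binomial(logit)))
}

# Stack the per-trial coefficient vectors row by row
r <- do.call(rbind, lapply(seq_len(trials), one_fit))

print(dim(r))      # one row per trial, one column per coefficient
print(head(r, 3))
```

The result is a matrix with one row per trial and two columns (intercept and slope), which is the shape the parallel script produces as well.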

Run it on your local machine

For the example to work, you will need R itself (if you do not have it yet) and two R packages, foreach and doParallel.

To install R packages without using RStudio, open a terminal or a Git Bash window.

Mac OS X

Open the Terminal application (it is in the Utilities folder inside the Applications folder). You can also press Command + Space to bring up the Spotlight Search bar, start typing terminal, and press Return to open a new terminal window.

Windows

Windows does not come with a Unix-style terminal application, but there are plenty of free and paid terminal emulators. Personally, I like to use Git Bash, but feel free to explore and find the terminal app that works best for you. When installing Git Bash, the default options are fine. Once it is installed, open a new Git Bash window.

Once you have a terminal open and R installed, launch the R interactive console by typing R.

You should see something like the following (the version and platform lines will vary with your installation):

R version 4.0.3 (2020-10-10) -- "Bunny-Wunnies Freak Out"
Copyright (C) 2020 The R Foundation for Statistical Computing
Platform: x86_64-apple-darwin17.0 (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

>

Next, we will install the two R packages:

> install.packages('foreach')
> install.packages('doParallel')

Select a CRAN mirror when asked (anywhere in the US is fine) and let the installation complete.
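While still in the R console, you can check that both packages are available. This is a small sketch using base R's requireNamespace; the variable names pkgs and installed are illustrative.

```r
# Check whether each required package can be loaded.
pkgs <- c("foreach", "doParallel")

# requireNamespace() returns TRUE if the package is installed, FALSE otherwise
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

print(installed)
```

If either entry prints FALSE, rerun the corresponding install.packages() call and look for error messages in its output.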

Once both packages are installed successfully, let’s run the R code on the command line.

Quit R to get back to the terminal:

> q()
Save workspace image? [y/n/c]: n

Run the R script:

Rscript iris-parallel-bootstrap.R

You should see output similar to the following printed to the terminal (the core count and timings depend on your machine):

Loading required package: iterators
Loading required package: parallel
[1] "running on 12 cores"
   user  system elapsed
138.953   2.138  15.957
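One way to read the timing line: user is CPU time summed over all worker processes, while elapsed is wall-clock time. Their ratio gives a rough estimate of how many cores were busy on average. The sketch below plugs in the numbers from the sample output above. Treat it as an approximation, since startup and the work of combining results are included in both figures.

```r
# Timings copied from the sample output above (seconds)
user    <- 138.953  # CPU time summed across worker processes
elapsed <- 15.957   # wall-clock time

# Rough average number of busy cores during the run
ratio <- user / elapsed
print(round(ratio, 1))
```

With these numbers the ratio is about 8.7, somewhat below the 12 cores reported, which is typical when part of the run is not parallel.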