2. Run R Script Locally


Throughout this course, we will run and build upon an example R script in the hands-on portions. It is meant to be a unifying thread through the course and to show you a typical project life cycle that we see on the Yen cluster.

Introducing the R Example

In this course, we will work with an R script that runs a bootstrap analysis. The swiss data set is resampled 50,000 times (each resample draws 10 of the 47 rows with replacement), a linear model is fit to each resample, and the resulting R squared values are used to plot a histogram and to estimate a 90% confidence interval for the R squared of the linear model fit.
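To make the analysis concrete, here is a minimal serial sketch of the same bootstrap in base R, assuming only the built-in swiss data set. It uses a much smaller, illustrative trial count and a fixed seed, and writes the model as a formula (`Fertility ~ .`), which fits the same five predictors as the script's `data.matrix(swiss[ind, 2:6])` call. The 90% interval is taken as the 5th and 95th percentiles of the bootstrap R squared values, matching the script's `quantile()` call.

```r
# Serial sketch of the bootstrap, using only base R and the built-in swiss data.
library(datasets)

set.seed(42)            # fixed seed so reruns give the same resamples
trials <- 1000          # illustrative count, far fewer than the script's 50,000
n_rows <- nrow(swiss)   # 47 provinces

r2 <- vapply(seq_len(trials), function(i) {
  # draw 10 of the 47 rows with replacement, as the course script does
  ind <- sample(n_rows, size = 10, replace = TRUE)
  # Fertility ~ . uses the other five columns as predictors
  fit <- lm(Fertility ~ ., data = swiss[ind, ])
  # silence warnings such as "essentially perfect fit" on tiny resamples
  suppressWarnings(summary(fit)$r.squared)
}, numeric(1))

# 90% percentile interval: the 5th and 95th percentiles of the R^2 values
ci <- quantile(r2, c(0.05, 0.95))
print(ci)
```

The course script does the same work per trial but replaces `vapply()` with `foreach(...) %dopar%` so the trials are spread across cores.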

If you followed the Setup Guide, you should already have this script in the intro-to-yens folder on your Desktop.

If you don’t have the script downloaded already, you can save the following code to a script on your local machine called swiss-parallel-bootstrap.R.

# Run bootstrap computations on swiss data set
# Plot histogram of R^2 values and compute C.I. for R^2
# Modified: 2021-09-01
library(foreach)
library(doParallel)
library(datasets)

# suppress warning messages for this script
options(warn=-1)

# set the number of cores here
ncore <- detectCores()

print(paste('running on', ncore, 'cores'))

# register parallel backend to limit threads to the value specified in ncore variable
registerDoParallel(ncore)

# Swiss data: Standardized fertility measure and socio-economic indicators for each of 47 French-speaking provinces of Switzerland at about 1888.
# head(swiss)
#             Fertility Agriculture Examination Education Catholic Infant.Mortality
#Courtelary        80.2        17.0          15        12     9.96             22.2           
#Delemont          83.1        45.1           6         9    84.84             22.2
#Franches-Mnt      92.5        39.7           5         5    93.40             20.2
#Moutier           85.8        36.5          12         7    33.77             20.3
#Neuveville        76.9        43.5          17        15     5.16             20.6
#Porrentruy        76.1        35.3           9         7    90.57             26.6

# dim(swiss)
# [1] 47  6

# number of bootstrap computations
trials <- 50000

# time the bootstrap loop
system.time({
  boot <- foreach(icount(trials), .combine = rbind) %dopar% {

    # resample with replacement for one bootstrap computation:
    # draw 10 of the 47 row indices
    ind <- sample(x = 47, size = 10, replace = TRUE)

    # fit a linear model of Fertility on the other five variables
    fit <- lm(swiss[ind, "Fertility"] ~ data.matrix(swiss[ind, 2:6]))
    summary(fit)$r.squared
  }
})

# Plot histogram of R^2 values from bootstrap 
hist(boot[, 1], xlab="r squared", main="Histogram of r squared")

# Compute 90% Confidence Interval for R^2
print('90% C.I. for R^2:')
quantile(boot[, 1], c(0.05,0.95))
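The script sets `ncore` from `detectCores()`, which uses every core R can see. One alternative, sketched below under the assumption of a hypothetical environment variable named `NCORE` (our own choice, not something R or doParallel reads by itself), lets you pick the core count when you launch the script instead of editing it. The helper name `get_ncore` is also hypothetical.

```r
# Hypothetical helper: read the core count from the NCORE environment variable,
# falling back to all detected cores when it is unset or not a positive integer.
library(parallel)

get_ncore <- function(var = "NCORE") {
  default <- detectCores()
  if (is.na(default)) default <- 1L  # detectCores() can return NA on some systems
  val <- Sys.getenv(var, unset = "")
  n <- if (nzchar(val)) suppressWarnings(as.integer(val)) else NA_integer_
  if (is.na(n) || n < 1L) n <- default
  as.integer(n)
}

ncore <- get_ncore()
print(paste('running on', ncore, 'cores'))
```

With this in place, you could replace `ncore <- detectCores()` with `ncore <- get_ncore()` and run, for example, `NCORE=4 Rscript swiss-parallel-bootstrap.R` from a terminal or Git Bash window.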

Run it on your local machine in RStudio

We need to have R and RStudio installed before we can install the R packages foreach and doParallel.

Open RStudio and install the two packages by running install.packages(c("foreach", "doParallel")) in the Console pane.
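If you are not sure whether the packages are already installed, a small sketch like the following lists the ones R cannot find. The helper name `missing_pkgs` is our own; it relies on `requireNamespace()`, which tries to load a package's namespace and returns `FALSE` instead of raising an error when the package is absent.

```r
# Hypothetical helper: return the requested packages that are not installed.
missing_pkgs <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

todo <- missing_pkgs(c("foreach", "doParallel"))
print(todo)  # character(0) means both packages are already available
```

You could then install only the missing ones with `if (length(todo) > 0) install.packages(todo)`.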

Once the packages are installed, open the script swiss-parallel-bootstrap.R in RStudio and run it.

Run it on your local machine without RStudio

To install R packages without using RStudio, open a terminal window (Mac) or a Git Bash window (Windows).

Mac OS X

Open the Terminal application (it’s in the Utilities folder of the Applications folder).


Windows

Windows does not come with a Unix-style terminal, but there are plenty of free and paid terminal emulators. Personally, I like to use Git Bash, but feel free to explore and find the terminal app that works best for you. Once it is installed, open a new Git Bash window.

Once you have a terminal open and R installed, launch the interactive R console by typing R and pressing Enter.

You should see output similar to the following:

R version 4.1.2 (2021-11-01) -- "Bird Hippie"
Copyright (C) 2021 The R Foundation for Statistical Computing
Platform: x86_64-apple-darwin17.0 (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

>

Next, you can install the R packages:

> install.packages(c('foreach', 'doParallel'))

Select a CRAN mirror when asked (anywhere in the US is fine) and let the installation complete.
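To skip the mirror prompt, you can set a CRAN repository for the session before calling `install.packages()`. This sketch assumes the `https://cloud.r-project.org` mirror; any CRAN mirror URL works the same way.

```r
# Set the CRAN mirror for this R session so install.packages() does not prompt.
options(repos = c(CRAN = "https://cloud.r-project.org"))
print(getOption("repos"))

# Then install as before, for example:
# install.packages(c("foreach", "doParallel"))
```

Placing the same `options(repos = ...)` line in a `.Rprofile` file in your home directory typically applies it to every new R session.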

You can always check that the packages load without errors:

> library(foreach)
> library(doParallel)
Loading required package: iterators
Loading required package: parallel

Also, check where the R package library is located on your local machine:

> .libPaths()
[1] "/Library/Frameworks/R.framework/Versions/4.1/Resources/library"
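By default, `install.packages()` installs into the first folder returned by `.libPaths()`. To confirm which library folder each package was actually loaded from, and which version it is, a short sketch using `find.package()` and `packageVersion()` could look like this (the helper name `pkg_location` is our own):

```r
# Hypothetical helper: return the library folder holding package p, or NA if absent.
pkg_location <- function(p) {
  path <- find.package(p, quiet = TRUE)  # character(0) when p is not installed
  if (length(path) == 0) NA_character_ else dirname(path[1])
}

for (p in c("foreach", "doParallel")) {
  loc <- pkg_location(p)
  if (is.na(loc)) {
    cat(p, "is not installed\n")
  } else {
    cat(p, as.character(packageVersion(p)), "is installed in", loc, "\n")
  }
}
```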

This is similar to how we will install packages on the Yens. Once both packages are installed successfully, let's run the R script from the command line.

Quit R to get back to the terminal:

> q()
Save workspace image? [y/n/c]: n

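Change into the folder that contains the script (for example, the intro-to-yens folder on your Desktop) and run it with Rscript:

$ cd ~/Desktop/intro-to-yens
$ Rscript swiss-parallel-bootstrap.R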

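You should see output similar to the following printed to the terminal. The number of cores, the timings, and the confidence interval will differ on your machine, because the resampling is random and the script does not set a seed: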

Loading required package: iterators
Loading required package: parallel
[1] "running on 12 cores"
   user  system elapsed
 96.996   1.573  13.175
[1] "90% C.I. for R^2:"
       5%       95%
0.6581048 0.9892969

When you run the script with Rscript, R runs non-interactively, so the hist() call does not open a plot window. Instead, the plot is saved as a PDF file (Rplots.pdf) in the current working directory, that is, the folder you ran Rscript from. Find it and open it to see the histogram.
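If you would rather choose the file name yourself than rely on the default Rplots.pdf, you can open a PDF device before plotting and close it afterward. In this sketch the file name is hypothetical, and `r2` holds random stand-in values so the block runs on its own; in the course script you would plot `boot[, 1]` instead.

```r
# Stand-in values so this sketch runs by itself; in the script, use boot[, 1].
set.seed(1)
r2 <- rbeta(1000, shape1 = 8, shape2 = 2)

out_file <- "r-squared-histogram.pdf"  # hypothetical file name
pdf(out_file)                          # open a PDF graphics device
hist(r2, xlab = "r squared", main = "Histogram of r squared")
invisible(dev.off())                   # close the device so the file is written
cat("Histogram saved to", normalizePath(out_file), "\n")
```

`png()` works the same way if you prefer an image file.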