2. Run R Script Locally
Throughout this course, we will run and build upon an example R script in the hands-on portions. It is meant to be a unifying thread through the course and to show you a typical project life cycle that we see on the Yen cluster.
Introducing R Example
In this course, we will work with an R script that runs a bootstrap analysis. The swiss data set is resampled 50,000 times, a linear model is fit on each sample, and the results are used to plot a histogram of R squared values and to estimate a 90% confidence interval for the R squared of the linear model fit.
If you followed the Setup Guide, you should have this script in the intro-to-yens folder on your Desktop.
If you don’t have the script downloaded already, you can save the following code to a script on your local machine called swiss-parallel-bootstrap.R.
# Run bootstrap computations on swiss data set
# Plot histogram of R^2 values and compute C.I. for R^2
# Modified: 2021-09-01
library(foreach)
library(doParallel)
library(datasets)
options(warn=-1)
# set the number of cores here
ncore <- detectCores()
print(paste('running on', ncore, 'cores'))
# register parallel backend to limit threads to the value specified in ncore variable
registerDoParallel(ncore)
# Swiss data: Standardized fertility measure and socio-economic indicators for each of 47 French-speaking provinces of Switzerland at about 1888.
# head(swiss)
# Fertility Agriculture Examination Education Catholic Infant.Mortality
#Courtelary 80.2 17.0 15 12 9.96 22.2
#Delemont 83.1 45.1 6 9 84.84 22.2
#Franches-Mnt 92.5 39.7 5 5 93.40 20.2
#Moutier 85.8 36.5 12 7 33.77 20.3
#Neuveville 76.9 43.5 17 15 5.16 20.6
#Porrentruy 76.1 35.3 9 7 90.57 26.6
# dim(swiss)
# [1] 47 6
# number of bootstrap computations
trials <- 50000
# time the for loop
system.time({
  boot <- foreach(icount(trials), .combine=rbind) %dopar% {
    # resample with replacement for one bootstrap computation
    ind <- sample(x = 47, size = 10, replace = TRUE)
    # build a linear model
    fit <- lm(swiss[ind, "Fertility"] ~ data.matrix(swiss[ind, 2:6]))
    # return R^2 for this resample
    summary(fit)$r.squared
  }
})
# Plot histogram of R^2 values from bootstrap
hist(boot[, 1], xlab="r squared", main="Histogram of r squared")
# Compute 90% Confidence Interval for R^2
print('90% C.I. for R^2:')
quantile(boot[, 1], c(0.05,0.95))
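The ncore <- detectCores() line in the script is the one place where the number of cores is decided. As a minimal sketch of taking that value from the command line instead, the snippet below reads the first argument passed to Rscript. The helper name parse_ncore and the default of 1 core are assumptions made for illustration; they are not part of the original script.

```r
# Hypothetical helper (not in the original script): read the core count
# from the first command-line argument, falling back to a default.
parse_ncore <- function(args = commandArgs(trailingOnly = TRUE), default = 1L) {
  if (length(args) == 0) {
    return(default)
  }
  # as.integer() yields NA (with a warning) for non-numeric input
  ncore <- suppressWarnings(as.integer(args[1]))
  if (is.na(ncore) || ncore < 1L) {
    stop("number of cores must be a positive integer, got: ", args[1])
  }
  ncore
}

ncore <- parse_ncore()
print(paste("running on", ncore, "cores"))
```

With this in place of detectCores(), a call such as Rscript swiss-parallel-bootstrap.R 4 would set ncore to 4, and the value can be passed to registerDoParallel(ncore) exactly as before.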
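The script sets the number of cores with detectCores(). When we transfer this script to the yens, it is a very bad idea to use that function, because it reports every core on the machine rather than the number you should be using on a shared system. We will instead specify how many cores to use, either in the R script or from the command line as a user-specified argument.
Run it on your local machine in RStudio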
R and RStudio need to be installed before we can install the R libraries foreach and doParallel.
Open RStudio and install the two packages from the console panel with the command:
install.packages(c("foreach", "doParallel"))
Once the packages are installed, open the script swiss-parallel-bootstrap.R in RStudio and run it.
Run it on your local machine without RStudio
To install R packages without using RStudio, open a terminal or a Git Bash window.
Mac OS X
Open the Terminal application (it’s in the Utilities folder of the Applications folder).
Windows
Windows does not come with a Terminal application, but there is plenty of free and paid terminal emulation software. Personally, I like to use Git Bash, but feel free to explore and find the terminal app that works best for you. Once installed, open a new Git Bash window.
Once you have a terminal open and have R installed, launch the R interactive console by typing R.
You should see the following:
R version 4.1.2 (2021-11-01) -- "Bird Hippie"
Copyright (C) 2021 The R Foundation for Statistical Computing
Platform: x86_64-apple-darwin17.0 (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
Natural language support but running in an English locale
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
>
Next, you can install the R packages:
> install.packages(c('foreach', 'doParallel'))
Select a CRAN mirror when asked (anywhere in the US is fine) and let the installation complete.
You can always check that the packages load without errors:
> library(foreach)
> library(doParallel)
Loading required package: iterators
Loading required package: parallel
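The same check can be scripted so that it reports what is missing instead of stopping at the first failed library() call. This is a minimal sketch using base R's requireNamespace(); the helper name missing_packages is hypothetical and chosen only for illustration.

```r
# Hypothetical helper: return the names of packages that are not installed.
# requireNamespace() loads a package's namespace without attaching it and
# returns FALSE (rather than raising an error) when the package is absent.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

needed <- c("foreach", "doParallel")
not_found <- missing_packages(needed)

if (length(not_found) > 0) {
  message("not installed: ", paste(not_found, collapse = ", "))
} else {
  message("all required packages are installed")
}
```

Anything listed as not installed can then be passed to install.packages() as shown above.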
Also, check where on your local machine the R library is:
> .libPaths()
[1] "/Library/Frameworks/R.framework/Versions/4.1/Resources/library"
This is similar to how we install packages on the yens. Once both packages are installed successfully, let’s run the R code on the command line.
Quit R to get back to the terminal:
> q()
Save workspace image? [y/n/c]: n
Run R code:
$ Rscript swiss-parallel-bootstrap.R
You should see the output printed to the terminal:
Loading required package: iterators
Loading required package: parallel
[1] "running on 12 cores"
user system elapsed
96.996 1.573 13.175
[1] "90% C.I. for R^2:"
5% 95%
0.6581048 0.9892969
If you do not see a plot pop up from the hist() call while the script is running, the plot is saved as a PDF file (Rplots.pdf) in the directory you ran the script from, which is the script's folder if you followed the steps above. Find it and open it to see the histogram plot.
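If you would rather choose the file name and location yourself than rely on the default Rplots.pdf, you can open a PDF graphics device explicitly before plotting. The sketch below is an illustration under stated assumptions: it uses a small sequential bootstrap of 200 resamples as a stand-in for the parallel loop, a fixed seed, and a hypothetical output path in a temporary directory.

```r
library(datasets)  # provides the swiss data set

set.seed(42)  # assumption: fixed seed so the sketch is reproducible

# Small sequential stand-in for the parallel bootstrap: 200 resamples
# of 10 rows each, keeping the R^2 of every fit.
r2 <- replicate(200, {
  ind <- sample(nrow(swiss), size = 10, replace = TRUE)
  fit <- lm(Fertility ~ ., data = swiss[ind, ])
  summary(fit)$r.squared
})

# Hypothetical output location; change it to wherever you want the plot.
out_file <- file.path(tempdir(), "r-squared-hist.pdf")

pdf(out_file)         # open a PDF graphics device
hist(r2, xlab = "r squared", main = "Histogram of r squared")
invisible(dev.off())  # close the device so the file is written

print(out_file)
```

The pdf() and dev.off() pair works the same way around the hist(boot[, 1], ...) call in the full script.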