12. Monitoring Usage


Monitoring Your Resource Footprint

Certain parts of the GSB research computing infrastructure provide isolated cloud resources (like CloudForest, where there is generally only one user per system) or are environments already managed by a scheduler (like Sherlock). In these cases it is not necessary for individuals to monitor resource usage themselves.

However, when working on systems like the yens where resources like CPU, RAM, and disk space are shared among many researchers, it is important that all users be mindful of how their work impacts the larger community.

CPU & RAM

Per our Community Guidelines, CPU usage should always be limited to 12 CPU cores/threads per user at any one time on yen1-4 and to 48 CPU cores on yen5. Some software (R and RStudio, for example) defaults to claiming all available cores unless told otherwise. These defaults should always be overridden when running R code on the yens. Similarly, when working with multiprocessing code in languages like Python, take care that your code does not grab every core it sees. Please refer to our parallel processing Topic Guides for information about how to limit resource consumption when using common packages.
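For Python, one way to stay within the limit is to set the worker count explicitly rather than rely on the default, which sizes the pool from the CPU count the system reports. The following is a minimal sketch using the standard library's multiprocessing.Pool. The 12-core ceiling comes from the yen1-4 guideline above; the names MAX_CORES, capped_workers, and square are hypothetical and chosen only for illustration.

```python
import multiprocessing as mp

# Per-user ceiling on yen1-4, taken from the guideline above (48 on yen5).
MAX_CORES = 12


def capped_workers(requested, limit=MAX_CORES):
    """Clamp a requested worker count to the range 1..limit."""
    return max(1, min(int(requested), limit))


def square(x):
    """Stand-in for real per-task work."""
    return x * x


if __name__ == "__main__":
    n_workers = capped_workers(4)
    # Passing processes= explicitly keeps Pool from sizing itself to the
    # machine's full core count.
    with mp.Pool(processes=n_workers) as pool:
        results = pool.map(square, range(10))
    print(n_workers, results)
```

Routing every requested count through one clamping function means a typo such as 64 is reduced to the ceiling rather than launching 64 workers.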

One easy method of getting a quick snapshot of your CPU and memory usage is via the htop command line tool. Running htop shows usage graphs and a process list that is sortable by user, top CPU, top RAM, and other metrics. Please use this tool liberally to monitor your resource usage, especially if you are running multiprocessing code on shared systems for the first time.

The htop console looks like this:

[Screenshot: htop output for well-behaved code]

The userload command lists the total resources that all of your tasks are consuming.

$ userload

Disk

Unlike personal home directories, which have a 50 GB quota, faculty project directories on yens/ZFS are currently uncapped. Disk storage is a finite resource, however, so to allow us to continue providing uncapped project space, please always be aware of your disk footprint. This includes compressing files when you can and removing intermediate and/or temp files whenever possible. See the yen file storage page for more information about file storage options.
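As an illustration of that housekeeping, the sketch below totals the size of the files under a directory and replaces an intermediate file with a gzip-compressed copy, using only the Python standard library. The function names and the sample file are hypothetical. The total is the apparent file size, which may differ from what the storage system itself accounts for.

```python
import gzip
import os
import shutil
import tempfile


def directory_size_bytes(root):
    """Sum the sizes of all regular files under root (symlinks skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def gzip_and_remove(path):
    """Write path + '.gz', then delete the uncompressed original."""
    gz_path = path + ".gz"
    with open(path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    os.remove(path)
    return gz_path


if __name__ == "__main__":
    # Demonstrate on a throwaway directory so no real data is touched.
    with tempfile.TemporaryDirectory() as workdir:
        sample = os.path.join(workdir, "intermediate.csv")
        with open(sample, "w") as fh:
            fh.write("0,0,0,0\n" * 100_000)
        before = directory_size_bytes(workdir)
        gzip_and_remove(sample)
        after = directory_size_bytes(workdir)
        print(f"before: {before} bytes, after: {after} bytes")
```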

Disk quotas on all yen servers can be reviewed by using the gsbquota command. It produces output like this:

nrapstin@yen1:~$ gsbquota
/home/users/nrapstin: currently using 39% (20G) of 50G available

You can also check the size of your project space by passing its full path to the gsbquota command:

nrapstin@yen1:~$ gsbquota /zfs/projects/students/<my-project-dir>/
/zfs/projects/students/<my-project-dir>/: currently using 39% (78G) of 200G available
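If you want to act on these numbers in a script, for instance to warn yourself before a directory fills up, a line of this shape can be parsed with a regular expression. The sketch below assumes the output always matches the two samples above; the function name and the 90% warning threshold are hypothetical.

```python
import re

# Pattern for lines shaped like the sample gsbquota output above.
QUOTA_LINE = re.compile(
    r"^(?P<path>\S+): currently using (?P<percent>\d+)% "
    r"\((?P<used>\S+)\) of (?P<total>\S+) available$"
)


def parse_quota_line(line):
    """Return path, percent used, used size, and total size, or None."""
    match = QUOTA_LINE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["percent"] = int(fields["percent"])
    return fields


if __name__ == "__main__":
    sample = "/home/users/nrapstin: currently using 39% (20G) of 50G available"
    info = parse_quota_line(sample)
    # The 90% threshold is an arbitrary choice for this illustration.
    if info is not None and info["percent"] >= 90:
        print("nearly full:", info["path"])
    else:
        print(info)
```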

Example

We will continue with the same R example, running it on multiple cores while monitoring our resource consumption.

# Run bootstrap computations on swiss data set
# Plot histogram of R^2 values and compute C.I. for R^2
# Modified: 2021-09-01
library(foreach)
library(doParallel)
library(datasets)

options(warn=-1)

# set the number of cores here
ncore <- 1

print(paste('running on', ncore, 'cores'))

# register parallel backend to limit threads to the value specified in ncore variable
registerDoParallel(ncore)

# Swiss data: Standardized fertility measure and socio-economic indicators for each of 47 French-speaking provinces of Switzerland at about 1888.
# head(swiss)
#             Fertility Agriculture Examination Education Catholic Infant.Mortality
#Courtelary        80.2        17.0          15        12     9.96             22.2
#Delemont          83.1        45.1           6         9    84.84             22.2
#Franches-Mnt      92.5        39.7           5         5    93.40             20.2
#Moutier           85.8        36.5          12         7    33.77             20.3
#Neuveville        76.9        43.5          17        15     5.16             20.6
#Porrentruy        76.1        35.3           9         7    90.57             26.6

# dim(swiss)
# [1] 47  6
# number of bootstrap computations
trials <- 50000

# time the foreach loop
system.time({
  boot <- foreach(icount(trials), .combine=rbind) %dopar% {

    # resample with replacement for one bootstrap computation
    ind <- sample(x = 47, size = 10, replace = TRUE)

    # build a linear model
    fit <- lm(swiss[ind, "Fertility"] ~ data.matrix(swiss[ind, 2:6]))
    summary(fit)$r.squared
  }
})

# Plot histogram of R^2 values from bootstrap
hist(boot[, 1], xlab="r squared", main="Histogram of r squared")

# Compute 90% Confidence Interval for R^2
print('90% C.I. for R^2:')
quantile(boot[, 1], c(0.05,0.95))

To monitor the resource usage while running a program, we will need a second terminal window that is connected to the same yen server.

Check which yen you are connected to in the first terminal:

$ hostname

Then ssh to the same yen in the second terminal window. For example, if the first terminal is on yen1, open a new terminal window and ssh to yen1 so that you can monitor your resources once the R program starts running there.

$ ssh yen1.stanford.edu

Once you have two terminal windows connected to the same yen, load the R module in one of the terminals and run the swiss-parallel-bootstrap.R program:

$ ml R/4.2.1
$ Rscript swiss-parallel-bootstrap.R

Once the program is running, monitor your usage with the htop command in the second window:

$ htop -u <SUNetID>

where -u filters the running processes to those owned by your user.

While the program is running you should see only one R process running because we specified 1 core in our R program.
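htop is interactive, so for a scripted check the same count can be approximated from ps. The sketch below runs ps -u USER -o comm= (standard procps options that print one command name per line) and tallies the names. It assumes the R workers appear under the command name R; the function names are hypothetical.

```python
import getpass
import subprocess
from collections import Counter


def count_commands(ps_output):
    """Count how many times each command name appears, one name per line."""
    names = (line.strip() for line in ps_output.splitlines())
    return Counter(name for name in names if name)


def commands_for_user(user):
    """Run `ps -u USER -o comm=` and tally the command names it prints."""
    result = subprocess.run(
        ["ps", "-u", user, "-o", "comm="],
        capture_output=True, text=True, check=True,
    )
    return count_commands(result.stdout)


if __name__ == "__main__":
    try:
        counts = commands_for_user(getpass.getuser())
    except (OSError, KeyError, subprocess.CalledProcessError) as err:
        print("could not list processes:", err)
    else:
        # Assumes R worker processes show up with the command name "R".
        print("R processes:", counts.get("R", 0))
```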

Let’s modify the number of cores to 8:

# Run bootstrap computations on swiss data set
# Plot histogram of R^2 values and compute C.I. for R^2
# Modified: 2021-09-01
library(foreach)
library(doParallel)
library(datasets)

options(warn=-1)

# set the number of cores here
ncore <- 8

print(paste('running on', ncore, 'cores'))

# register parallel backend to limit threads to the value specified in ncore variable
registerDoParallel(ncore)

# Swiss data: Standardized fertility measure and socio-economic indicators for each of 47 French-speaking provinces of Switzerland at about 1888.
# head(swiss)
#             Fertility Agriculture Examination Education Catholic Infant.Mortality
#Courtelary        80.2        17.0          15        12     9.96             22.2
#Delemont          83.1        45.1           6         9    84.84             22.2
#Franches-Mnt      92.5        39.7           5         5    93.40             20.2
#Moutier           85.8        36.5          12         7    33.77             20.3
#Neuveville        76.9        43.5          17        15     5.16             20.6
#Porrentruy        76.1        35.3           9         7    90.57             26.6

# dim(swiss)
# [1] 47  6
# number of bootstrap computations
trials <- 50000

# time the foreach loop
system.time({
  boot <- foreach(icount(trials), .combine=rbind) %dopar% {

    # resample with replacement for one bootstrap computation
    ind <- sample(x = 47, size = 10, replace = TRUE)

    # build a linear model
    fit <- lm(swiss[ind, "Fertility"] ~ data.matrix(swiss[ind, 2:6]))
    summary(fit)$r.squared
  }
})

# Plot histogram of R^2 values from bootstrap
hist(boot[, 1], xlab="r squared", main="Histogram of r squared")

# Compute 90% Confidence Interval for R^2
print('90% C.I. for R^2:')
quantile(boot[, 1], c(0.05,0.95))

Then rerun:

$ Rscript swiss-parallel-bootstrap.R

You should see:

Loading required package: iterators
Loading required package: parallel
[1] "running on 8 cores"
   user  system elapsed
 50.551   0.517  10.142
[1] "90% C.I. for R^2:"
       5%       95%
0.6593025 0.9892563

This run finishes faster because it uses 8 cores instead of 1. While the program is running, you should see 8 R processes in the htop output because we specified 8 cores in our R program.

The last modification we are going to make is to pass the number of cores as a command line argument to our R script. Save the following to a new script called swiss-par-command-line-args.R.

#!/usr/bin/env Rscript
############################################
# This script accepts a user specified argument to set the number of cores to run on
# Run from the command line:
#
#      Rscript swiss-par-command-line-args.R 4
#
# this will execute on 4 cores
###########################################
# accept command line arguments and save them in a character vector called args
args <- commandArgs(trailingOnly=TRUE)
library(foreach)
library(doParallel)
library(datasets)

options(warn=-1)

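# set the number of cores from the first command line argument.
# Avoid using detectCores(), which reports every core on the machine.
if (length(args) < 1) {
  stop("usage: Rscript swiss-par-command-line-args.R <ncore>")
}
ncore <- as.integer(args[1])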

print(paste('running on', ncore, 'cores'))

# register parallel backend to limit threads to the value specified in ncore variable
registerDoParallel(ncore)

# Swiss data: Standardized fertility measure and socio-economic indicators for each of 47 French-speaking provinces of Switzerland at about 1888.
# head(swiss)
#             Fertility Agriculture Examination Education Catholic Infant.Mortality
#Courtelary        80.2        17.0          15        12     9.96             22.2
#Delemont          83.1        45.1           6         9    84.84             22.2
#Franches-Mnt      92.5        39.7           5         5    93.40             20.2
#Moutier           85.8        36.5          12         7    33.77             20.3
#Neuveville        76.9        43.5          17        15     5.16             20.6
#Porrentruy        76.1        35.3           9         7    90.57             26.6

# dim(swiss)
# [1] 47  6
# number of bootstrap computations
trials <- 50000

# time the foreach loop
system.time({
  boot <- foreach(icount(trials), .combine=rbind) %dopar% {

    # resample with replacement for one bootstrap computation
    ind <- sample(x = 47, size = 10, replace = TRUE)

    # build a linear model
    fit <- lm(swiss[ind, "Fertility"] ~ data.matrix(swiss[ind, 2:6]))
    summary(fit)$r.squared
  }
})

# Plot histogram of R^2 values from bootstrap
hist(boot[, 1], xlab="r squared", main="Histogram of r squared")

# Compute 90% Confidence Interval for R^2
print('90% C.I. for R^2:')
quantile(boot[, 1], c(0.05,0.95))

Now we can run this script with a varying number of cores. Per the Community Guidelines, we still limit the number of cores to 12 on yen1-4 and to 48 on yen5.
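The same pattern carries over to Python scripts: take the core count as an argument and reject values above the limit for the current host. The sketch below uses argparse. It assumes the host name looks like yen5 or yen5.stanford.edu and that unlisted hosts fall back to the smaller limit. CORE_LIMITS, core_limit_for_host, and parse_cores are hypothetical names.

```python
import argparse
import socket

# Limits stated above; hosts not listed fall back to the smaller value,
# which is an assumption of this sketch.
CORE_LIMITS = {"yen5": 48}
DEFAULT_LIMIT = 12


def core_limit_for_host(hostname):
    """Look up the per-user core limit from the short host name."""
    short_name = hostname.split(".")[0]
    return CORE_LIMITS.get(short_name, DEFAULT_LIMIT)


def parse_cores(argv, hostname):
    """Parse the core count from argv and reject values over the limit."""
    parser = argparse.ArgumentParser(description="Run with a fixed core count.")
    parser.add_argument("ncore", type=int, help="number of cores to use")
    args = parser.parse_args(argv)
    limit = core_limit_for_host(hostname)
    if not 1 <= args.ncore <= limit:
        parser.error(f"ncore must be between 1 and {limit} on {hostname}")
    return args.ncore


if __name__ == "__main__":
    # A fixed argument list keeps the demo self-contained; a real script
    # would pass sys.argv[1:] instead.
    ncore = parse_cores(["4"], socket.gethostname())
    print("running on", ncore, "cores")
```

Failing immediately on an out-of-range value is a deliberate choice here: an explicit error is easier to notice than a silently reduced core count.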

For example, to run with 4 cores:

$ Rscript swiss-par-command-line-args.R 4

You should see:

Loading required package: iterators
Loading required package: parallel
[1] "running on 4 cores"
   user  system elapsed
 49.547   0.375  16.040
[1] "90% C.I. for R^2:"
       5%       95%
0.6574049 0.9891781

While the program is running, monitor your CPU usage in the other terminal window with htop (try userload as well).
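The timing lines printed by system.time give a rough after-the-fact check on what htop showed. Dividing CPU time (user plus system) by elapsed time estimates how many cores were busy on average. The sketch below applies this to the two sample runs above, assuming the reported user and system times include the worker processes; effective_cores is a hypothetical helper name.

```python
def effective_cores(user, system, elapsed):
    """CPU seconds consumed per wall-clock second."""
    return (user + system) / elapsed


# Timings copied from the two sample runs above, keyed by requested cores.
runs = {
    8: {"user": 50.551, "system": 0.517, "elapsed": 10.142},
    4: {"user": 49.547, "system": 0.375, "elapsed": 16.040},
}

for ncore, t in runs.items():
    busy = effective_cores(t["user"], t["system"], t["elapsed"])
    print(f"{ncore} cores requested: about {busy:.1f} cores busy on average")
```

For these samples the ratios are roughly 5 and 3, both below the requested core counts. That is plausible because start-up, combining results, and plotting do not run in parallel.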