13. Monitoring Usage


Monitoring Your Resource Footprint

Certain parts of the GSB research computing infrastructure provide isolated cloud resources (like CloudForest, where there is generally only one user per system), or are environments that are already managed by a scheduler (like Sherlock). In these cases it is not necessary for individuals to monitor resource usage themselves.

However, when working on systems like the yens where resources like CPU, RAM, and disk space are shared among many researchers, it is important that all users be mindful of how their work impacts the larger community.

CPU & RAM

Per our Community Guidelines, CPU usage should always be limited to 12 CPU cores/threads per user at any one time. Some software (R and RStudio, for example) defaults to claiming all available cores unless told otherwise. These defaults should always be overridden when running R code on the yens. Similarly, when working with multiprocessing code in languages like Python, take care that your code does not claim every core it detects. Please refer to our parallel processing Topic Guides for information about how to limit resource consumption when using common packages.
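Apart from the worker processes you start explicitly, some numerical libraries start threads of their own. The following is a minimal sketch of capping that implicit threading from the shell before launching a job. It assumes your program is linked against OpenMP, OpenBLAS, or MKL, which read these environment variables; whether that is true of a given R or Python build on the yens is an assumption, and the value 4 is only an example.

```shell
# Cap implicit threading before launching a job.
# Assumption: the program uses OpenMP, OpenBLAS, or MKL, which read these variables.
# The value 4 is only an example; stay at or below the 12-core limit.
export OMP_NUM_THREADS=4
export OPENBLAS_NUM_THREADS=4
export MKL_NUM_THREADS=4

echo "thread cap: OMP=$OMP_NUM_THREADS OPENBLAS=$OPENBLAS_NUM_THREADS MKL=$MKL_NUM_THREADS"
```

These variables do not limit worker processes that your own code creates (for example through a parallel backend); that number still has to be set in the code itself.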

One easy method of getting a quick snapshot of your CPU and memory usage is via the htop command line tool. Running htop shows usage graphs and a process list that is sortable by user, top CPU, top RAM, and other metrics. Please use this tool liberally to monitor your resource usage, especially if you are running multiprocessing code on shared systems for the first time.

The htop console looks like this:

[Figure: htop output for well-behaved code]

The userload command will list the total amount of resources all your tasks are consuming.

userload
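If you want a similar total in a script, one possible approach is to sum the columns of a process listing. This is only a sketch and is not how userload itself is implemented; the helper name sum_usage is hypothetical, and it assumes input with a header line followed by "PID %CPU %MEM COMMAND" rows.

```shell
# Sum the %CPU and %MEM columns of a process listing read from standard input.
# Expected input: a header line, then lines of "PID %CPU %MEM COMMAND".
sum_usage() {
  awk 'NR > 1 { cpu += $2; mem += $3; n++ }
       END { printf "processes=%d cpu=%.1f%% mem=%.1f%%\n", n, cpu, mem }'
}
```

On a Linux system with the procps version of ps, it could be fed with `ps -u "$USER" -o pid,pcpu,pmem,comm | sum_usage`. Note that the %CPU value reported by ps is averaged over the lifetime of each process, so it will not match the live readings in htop exactly.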

Disk

Unlike personal home directories, which have a 25 GB quota, faculty project directories on yens/IFS are currently uncapped. Disk storage is a finite resource, however, so to allow us to continue to provide uncapped project space, please always be aware of your disk footprint. This includes compressing files when you are able, and removing intermediate and/or temp files whenever possible. See the yen file storage page for more information about file storage options.
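To find out where your disk footprint comes from, it can help to list the largest entries in a directory. The sketch below uses the standard du, sort, and head tools; the function name largest_items is hypothetical.

```shell
# Print the n largest entries directly inside a directory, largest first.
# Usage: largest_items <directory> [n]
largest_items() {
  dir=${1:-.}
  n=${2:-10}
  # -s summarizes each entry; -k reports sizes in kilobytes so sort -n can order them
  du -sk -- "$dir"/* 2>/dev/null | sort -rn | head -n "$n"
}
```

For example, `largest_items . 5` lists the five largest files or subdirectories of the current directory. A large file that you still need can then be compressed in place with `gzip <file>`, which replaces it with `<file>.gz`.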

Disk quotas on all yen servers can be reviewed by using the gsbquota command. It produces output like this:

jbponce@yen1:~$ gsbquota
/home/users/jbponce: currently using 53% (16G) of 30G available
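The percentage in that output can also be checked from a script. This is a rough sketch that assumes gsbquota prints a single line in exactly the format of the sample above; the function names and the 90% threshold are arbitrary choices for illustration.

```shell
# Extract the usage percentage from one line of gsbquota output.
# Assumption: the line looks like the sample above, e.g.
#   /home/users/jbponce: currently using 53% (16G) of 30G available
quota_percent() {
  printf '%s\n' "$1" | sed -n 's/.*currently using \([0-9][0-9]*\)%.*/\1/p'
}

# Print a warning when usage is at or above a threshold (default 90, an arbitrary choice).
quota_check() {
  pct=$(quota_percent "$1")
  limit=${2:-90}
  if [ -z "$pct" ]; then
    echo "could not parse quota line"
    return 1
  elif [ "$pct" -ge "$limit" ]; then
    echo "WARNING: ${pct}% of quota used"
  else
    echo "OK: ${pct}% of quota used"
  fi
}

quota_check '/home/users/jbponce: currently using 53% (16G) of 30G available'
```

On a yen server the live value could be checked with `quota_check "$(gsbquota)"`, under the same assumption about the output format.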

Example

We are going to continue using the same R example, saved as iris-parallel-bootstrap.R, and experiment with running it on multiple cores while monitoring our resource consumption.

library(foreach)
library(doParallel)

# set the number of cores here
ncore <- 1

print(paste('running on', ncore, 'cores'))

# register parallel backend to limit threads to the value specified in ncore variable
registerDoParallel(ncore)

# get subset of Iris data set
x <- iris[which(iris[,5] != "setosa"), c(1,5)]

# number of bootstrap computations
trials <- 50000

# time the for loop
system.time({
  r <- foreach(icount(trials), .combine=rbind) %dopar% {

    # resample with replacement for one bootstrap computation
    ind <- sample(100, 100, replace=TRUE)

    # build a linear model
    result <- glm(x[ind, 2] ~ x[ind, 1], family = binomial(logit))
    coefficients(result)
  }
})

To monitor the resource usage while running a program, we will need a second terminal window that is connected to the same yen server.

Check what yen you are connected to in the first terminal:

hostname

Then ssh to the same yen in the second terminal. For example, if you are on yen4, open a new terminal window and ssh to the yen4 server so that you can monitor your resources once the R program starts running on yen4.

ssh yen4.stanford.edu

Once you have two terminal windows connected to the same yen, run the iris-parallel-bootstrap.R program after loading the R module in one of the terminals:

ml R/4.0.0
Rscript iris-parallel-bootstrap.R

Once the program is running, monitor your usage with the htop command:

htop -u <SUNetID>

where -u filters the process list to show only the given user's processes. Replace <SUNetID> with your own SUNetID.

While the program is running you should see only one R process running because we specified one core in our R program.

Let’s modify the number of cores to 8:

library(foreach)
library(doParallel)

# set the number of cores here
ncore <- 8

print(paste('running on', ncore, 'cores'))

# register parallel backend to limit threads to the value specified in ncore variable
registerDoParallel(ncore)

# get subset of Iris data set
x <- iris[which(iris[,5] != "setosa"), c(1,5)]

# number of bootstrap computations
trials <- 50000

# time the for loop
system.time({
  r <- foreach(icount(trials), .combine=rbind) %dopar% {

    # resample with replacement for one bootstrap computation
    ind <- sample(100, 100, replace=TRUE)

    # build a linear model
    result <- glm(x[ind, 2] ~ x[ind, 1], family = binomial(logit))
    coefficients(result)
  }
})

Then rerun:

Rscript iris-parallel-bootstrap.R

While the program is running (the process will run faster since we are using 8 cores instead of 1), you should see 8 R processes running in the htop output because we specified 8 cores in our R program.
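The same check can be made without an interactive tool by counting matching lines in a process listing. The helper below is a sketch with a hypothetical name, count_procs; it assumes a listing with one header line whose last column is the command name, and that the workers show up under the command name R.

```shell
# Count lines of a process listing (read from standard input) whose last
# column, the command name, equals the given name. The header line is skipped.
count_procs() {
  awk -v name="$1" 'NR > 1 && $NF == name { n++ } END { print n + 0 }'
}
```

With the procps version of ps, it could be used as `ps -u "$USER" -o pid,comm | count_procs R` in the second terminal. The count may include the parent R process in addition to the workers.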

The last modification we are going to make is to pass the number of cores as a command line argument to our R script. Save the following to a new script called iris-par-command-line-args.R.

#!/usr/bin/env Rscript
############################################
# This script accepts a user specified argument to set the number of cores to run on
# Run from the command line:
#
#      Rscript iris-par-command-line-args.R 4
#
# this will execute on 4 cores
###########################################
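# accept command line arguments and save them in a character vector called args
args <- commandArgs(trailingOnly = TRUE)

library(foreach)
library(doParallel)

# set the number of cores from the first command line argument. Avoid using the detectCores() function.
ncore <- as.integer(args[1])

# stop early if the argument is missing or is not a positive integer
if (is.na(ncore) || ncore < 1) stop("usage: Rscript iris-par-command-line-args.R <ncore>")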

print(paste('running on', ncore, 'cores'))

# register parallel backend to limit threads to the value specified in ncore variable
registerDoParallel(ncore)

# get subset of Iris data set
x <- iris[which(iris[,5] != "setosa"), c(1,5)]

# number of bootstrap computations
trials <- 50000

# time the for loop
system.time({
  r <- foreach(icount(trials), .combine=rbind) %dopar% {

    # resample with replacement for one bootstrap computation
    ind <- sample(100, 100, replace=TRUE)

    # build a linear model
    result <- glm(x[ind, 2] ~ x[ind, 1], family = binomial(logit))
    coefficients(result)
  }
})

Now we can run this script with a varying number of cores. We will still limit the number of cores to 12, per the Community Guidelines.
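One way to make the 12-core limit hard to exceed by accident is to clamp the requested value before passing it to the script. The following is a sketch only; clamp_cores and MAX_CORES are hypothetical names, and the loop prints the commands it would run rather than executing them.

```shell
MAX_CORES=12  # per-user limit from the Community Guidelines

# Print the requested core count, reduced to MAX_CORES if it is larger.
# Anything that is not a positive integer falls back to 1.
clamp_cores() {
  case "$1" in
    ''|*[!0-9]*) echo 1; return ;;
  esac
  if [ "$1" -lt 1 ]; then
    echo 1
  elif [ "$1" -gt "$MAX_CORES" ]; then
    echo "$MAX_CORES"
  else
    echo "$1"
  fi
}

# Dry run: show the command that would be executed for each requested value.
for requested in 4 12 32; do
  echo "requested $requested -> Rscript iris-par-command-line-args.R $(clamp_cores "$requested")"
done
```

To actually run the script, replace the echo line with `Rscript iris-par-command-line-args.R "$(clamp_cores "$requested")"`.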

For example, to run with 4 cores:

Rscript iris-par-command-line-args.R 4

Monitor your CPU usage while the program is running in the other terminal window with htop (try userload as well).