6. Yen File System


If you are new to using the Unix shell, please go over the Shell Introduction first.

If you’re already comfortable manipulating files and directories (using ls, pwd, cd, mv, rm commands), you probably want to explore the next lesson: Shell Extras, to learn about searching for files with grep and find, and writing simple shell loops and scripts.

Unix Shell on the Yens

When you log in to the Yens via ssh, you interact with the remote Yen machines by typing commands in the shell, which passes them to the operating system (Ubuntu in this case) to execute.

See which shell is set as your default:

$ echo $SHELL

When I run the above command, I see:

/bin/bash

which means my shell is Bash, but your default shell might be different.

If your machine is set up to use something other than Bash, you can switch to using Bash by typing bash.
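
If you would rather check this in a script than by eye, the following is a minimal sketch. The helper name is_bash is an assumption made for this example and is not something provided on the Yens; it only inspects the path stored in $SHELL.

```shell
# is_bash is a hypothetical helper: succeed if a shell path
# (by default the value of $SHELL) points at Bash.
is_bash() {
    case "${1:-${SHELL:-}}" in
        */bash | bash) return 0 ;;
        *) return 1 ;;
    esac
}

if is_bash; then
    echo "Default shell is Bash"
else
    echo "Default shell is ${SHELL:-unknown}; type bash to start a Bash session"
fi
```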

Home Directory

Every user on the Yens has a home directory. This is where you land when you log in to the system. Check its absolute path with:

$ pwd

This will print your working directory (where <SUNetID> is your SUNet ID):

/home/users/<SUNetID>

To see this schematically, here is a visualization of the home directory on the file system:

The squares with ... in them indicate more directories that are not shown in the graph.

The path to your home directory is stored in the $HOME environment variable. To see it, run:

$ echo $HOME

The echo command prints the value of the $HOME environment variable, which stores the path to your home directory (where <SUNetID> is your SUNet ID):

/home/users/<SUNetID>

The home directory is not meant for storing large files or for writing large output files while working on a project. It is a good place to store small files such as scripts and text files. Your home directory storage space is capped at 50 GB.

To see how much space you have used in your home directory, run:

$ gsbquota

You should see your home directory usage:

/home/users/<SUNetID>: currently using X% (XG) of 50G available

where X% and XG will be actual percent used and gigabytes used, respectively.
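
gsbquota only reports the total. To get a rough idea of what is taking up the space, a sketch along the following lines can help. It uses the standard du, sort, and head utilities; the helper names usage_percent and largest_items are assumptions made for this example, and gsbquota remains the authoritative number.

```shell
# usage_percent is a hypothetical helper: print the share of a quota
# (given in GB) used by a directory tree, truncated to a whole percent.
usage_percent() {
    dir=$1
    quota_gb=$2
    used_kb=$(du -sk "$dir" | cut -f1)        # total size in KB
    quota_kb=$((quota_gb * 1024 * 1024))      # GB -> KB
    echo $((used_kb * 100 / quota_kb))
}

# largest_items is a hypothetical helper: list up to ten of the largest
# non-hidden items directly under a directory, sizes in KB, largest first.
largest_items() {
    du -sk "$1"/* 2>/dev/null | sort -rn | head -n 10
}
```

For example, to check your home directory against the 50 GB cap and see what is largest in it, you could run:

$ usage_percent "$HOME" 50
$ largest_items "$HOME"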

Project Directory

If you are a GSB researcher who is interested in starting a new project on the Yens, please complete and submit DARC’s new project request form. This form allows you to estimate disk usage and to specify any collaborators who should be added to the shared access list. ZFS project access is granted by workgroups.

The project directories live on ZFS and currently do not have space quotas. However, we ask that you be responsible and delete what you no longer need, such as intermediate files.

Schematically, we can visualize the path to the project directory as follows:

The absolute path to your project space is:

/zfs/projects/students/<your-project-dir>

where <your-project-dir> is the name of your project directory (created for you by the DARC team after you submit the new project request form). If you are a faculty member, your project will live under /zfs/projects/faculty instead.

Let’s say we want to navigate to our project directory (a faculty project in this example). To “change directories”, use the cd command:

$ cd /zfs/projects/faculty/<your-project-dir>

Every directory has a link to its parent directory. To navigate from a directory one level up, use:

$ cd ..

To go two levels up, use:

$ cd ../..

If you want to return to the directory where you previously were, use:

$ cd -

To come back to your home directory, type cd without any arguments:

$ cd

Another shortcut for home is ~. Again, you can navigate to home from anywhere on the file system with:

$ cd ~

Or you can use the environment variable $HOME to return to the home directory:

$ cd $HOME

Experiment with the cd command by navigating to different places on the file system, then returning home with any of the commands above.
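
If you want a safe place to practice, the sketch below builds a throwaway directory tree and walks through the shortcuts described above. The directory names (projects, demo, data) are made up for this example, and the tree is created in a temporary location rather than in a real project space.

```shell
# Build a throwaway directory tree (the names are hypothetical) to practice on.
base=$(mktemp -d)
base=$(cd "$base" && pwd -P)      # resolve symlinks so the printed paths are exact
mkdir -p "$base/projects/demo/data"

cd "$base/projects/demo/data"
cd ..                             # up one level: now in .../projects/demo
up_one=$(pwd -P)
cd ../..                          # up two levels: now in $base
up_two=$(pwd -P)
cd - > /dev/null                  # back to the previous directory (cd - also prints it)
previous=$(pwd -P)
cd                                # no arguments: back to the home directory

echo "after cd ..    : $up_one"
echo "after cd ../.. : $up_two"
echo "after cd -     : $previous"

rm -rf "$base"                    # remove the practice tree
```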

Downloading files from the web

Sometimes you want to download a file from the web. Instead of downloading it to your local machine and then transferring it to the Yens, you can download it directly to the Yens. First, use the cd command to navigate to the directory where you want the file to be saved. Then use wget to download the file. Usually you need to copy the link to the file on the web that you are trying to download.

For example, let’s say we want to install the Rmpi R package from source. We would navigate to the CRAN website and copy the web link to the package source file. Then run:

$ wget https://cran.r-project.org/src/contrib/Rmpi_0.6-9.2.tar.gz

which will download the package source tarball to your current directory on the Yens. If everything worked, you should see output similar to:

--2022-01-19 08:25:23--  https://cran.r-project.org/src/contrib/Rmpi_0.6-9.2.tar.gz
Resolving cran.r-project.org (cran.r-project.org)... 137.208.57.37
Connecting to cran.r-project.org (cran.r-project.org)|137.208.57.37|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 106030 (104K) [application/x-gzip]
Saving to: ‘Rmpi_0.6-9.2.tar.gz’

Rmpi_0.6-9.2.tar.gz             100%[====================================================>] 103.54K   197KB/s    in 0.5s

2022-01-19 08:25:26 (197 KB/s) - ‘Rmpi_0.6-9.2.tar.gz’ saved [106030/106030]

When you list the files in the current directory you will see Rmpi_0.6-9.2.tar.gz present.
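
If you prefer not to cd first, wget can also save into a directory you name. The following is a minimal sketch: fetch_to is a hypothetical helper name chosen for this example, and it assumes the server does not redirect to a differently named file and that no file of the same name already exists in the destination.

```shell
# fetch_to is a hypothetical helper: download a URL into a target
# directory and print the path of the saved file.
fetch_to() {
    dest_dir=$1
    url=$2
    mkdir -p "$dest_dir" || return 1
    # -P sets the directory to save into; -nv shortens wget's log output
    wget -nv -P "$dest_dir" "$url" || return 1
    echo "$dest_dir/$(basename "$url")"
}

url="https://cran.r-project.org/src/contrib/Rmpi_0.6-9.2.tar.gz"
saved_name=$(basename "$url")     # the file name wget is expected to use
echo "wget would save this URL as: $saved_name"
```

To actually download the tarball into a downloads directory under your home directory (the directory name is only an example), you would run:

$ fetch_to "$HOME/downloads" "$url"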

Compressing and uncompressing files / directories

Files that you download or transfer may be compressed with zip, in which case they have a .zip extension. Transferring compressed files is recommended for faster transfer speeds.

For example, if you downloaded or transferred a zipped folder named test.zip, run:

$ unzip test.zip

to unzip it.

tar is a Unix utility for collecting files together into one archive file (commonly called a tarball). The name tar comes from “tape archive”: it was originally used to archive a series of file objects together as one collection on tape.

To untar the tarball (use options x to extract, v for verbose and f to give the name of the tarball you want to untar), run:

$ tar -xvf my_tar.tar

Commonly, the tarball is also compressed with gzip (giving it a .tar.gz or .tgz extension) or bzip2 (giving it a .bz2 extension). To untar a gzip-compressed tarball, add the -z option as well (for bzip2, use -j instead). Let’s practice with the downloaded Rmpi tarball:

$ tar -zxvf Rmpi_0.6-9.2.tar.gz

Uncompressing the tarball creates a new directory Rmpi with the source code for this R package.
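
The reverse direction works the same way. The sketch below creates a small gzip-compressed tarball, lists its contents without extracting, and then extracts it somewhere else. The file and directory names are made up for this example, and everything happens in a temporary directory.

```shell
work=$(mktemp -d)
mkdir -p "$work/results"
echo "alpha" > "$work/results/a.txt"
echo "beta"  > "$work/results/b.txt"

# c = create, z = compress with gzip, f = name of the archive file.
# -C changes into $work first so the archive stores relative paths.
tar -czf "$work/results.tar.gz" -C "$work" results

# t = list the contents without extracting anything.
listing=$(tar -tzf "$work/results.tar.gz")
echo "$listing"

# x = extract; -C chooses the destination directory.
mkdir -p "$work/restored"
tar -xzf "$work/results.tar.gz" -C "$work/restored"
restored=$(cat "$work/restored/results/a.txt")
echo "restored a.txt contains: $restored"

rm -rf "$work"
```

Listing with -t before extracting is a cheap way to see which directory a tarball will create.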

Finally, let’s clean up the home directory by removing all the files and directories we made in this tutorial. This is a good thing to do when you are working on real projects too. Keep your working space tidy! Note that the -r option removes a directory and everything inside it, and the -f option forces deletion without prompting for confirmation, so use rm -rf with caution.

$ rm *.gz
$ rm -rf Rmpi 
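
Because rm with a wildcard deletes everything the pattern matches, it can be worth previewing the match first. The following is a small sketch of that habit, run in a temporary directory with made-up file names so that nothing real is touched.

```shell
work=$(mktemp -d)
touch "$work/Rmpi_demo.tar.gz" "$work/notes.txt"
mkdir -p "$work/Rmpi/src"

# Preview what the pattern matches before deleting anything.
ls -d -- "$work"/*.gz

rm -- "$work"/*.gz       # removes only the files the preview listed
rm -r -- "$work/Rmpi"    # -r is needed for directories; add -f only to skip prompts

remaining=$(ls "$work")
echo "left behind: $remaining"

rm -rf "$work"
```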

File Storage

You have several options for where to store your research files (data sets, programs, output files, and so forth).

ZFS Directories

The GSB now has about 612 TB of high-performance storage available from the Yen servers under the path /zfs. Currently all Yen user home directories (~/) reside on ZFS. In addition to user home directories, Stanford GSB faculty (and in some cases students, see below) may request additional project space on ZFS.

GSB Faculty

All GSB faculty may request project space on ZFS, and DARC will set up a corresponding Stanford workgroup that you may use to add and remove collaborators for your project. Currently there are no quotas for faculty ZFS directories. However, since it is costly for the GSB to expand capacity there, this policy might change in the future if users consume that space too quickly. Kindly do your part and be a good steward of the commons:

  • zip up files when you can
  • remove intermediate files that you no longer need to use regularly
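
One way to follow both suggestions at once is to pack a directory of intermediate files into a compressed tarball and delete the original only after the archive has been read back successfully. The sketch below is one possible approach, not a DARC-provided tool; archive_and_remove and the directory names are assumptions made for this example.

```shell
# archive_and_remove is a hypothetical helper: pack a directory into
# <dir>.tar.gz next to it, and delete the directory only if the
# archive can be listed afterwards.
archive_and_remove() {
    target=${1%/}                              # drop a trailing slash, if any
    parent=$(dirname "$target")
    name=$(basename "$target")
    tar -czf "$target.tar.gz" -C "$parent" "$name" || return 1
    tar -tzf "$target.tar.gz" > /dev/null || return 1
    rm -rf "$target"
}

# Try it on a throwaway directory.
demo=$(mktemp -d)
mkdir -p "$demo/intermediate"
echo "step 1 output" > "$demo/intermediate/step1.csv"

archive_and_remove "$demo/intermediate"
left=$(ls "$demo")
echo "left in demo directory: $left"

rm -rf "$demo"
```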

GSB Students

While ZFS is primarily a resource for Stanford faculty, under certain conditions Stanford graduate students may be granted workspace in ZFS. If you feel you are in need of project space on ZFS please contact us at gsb_darcresearch@stanford.edu.

Backups

Files on ZFS are backed up as “snapshots” and can be restored manually by any user. Please see the page How Do I Recover ZFS Files for instructions on recovering files. There is currently an off-site disaster recovery solution implemented as well for both ZFS and home directories.

Yen Home Directories

Home directories on the Yen servers are also stored on our ZFS storage system. Your home directory has a quota of 50 GB.

Local Disk

On each Yen machine, there is a local scratch space mounted at /scratch. All yen users are free to make use of this space. Much like a hard drive on your laptop, this can be accessed only from that single Yen machine.
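
Since /scratch is shared by all users of a machine, it helps to work inside a directory of your own. The sketch below shows one way to do that. The per-user layout (/scratch/<username>/<job name>) and the names make_scratch_dir and SCRATCH_ROOT are assumptions made for this example, not a documented Yen convention.

```shell
# make_scratch_dir is a hypothetical helper: create and print a per-user
# working directory under SCRATCH_ROOT (default: /scratch).
make_scratch_dir() {
    root=${SCRATCH_ROOT:-/scratch}
    dir="$root/$(id -un)/$1"
    mkdir -p "$dir" && echo "$dir"
}

# For this demonstration, point SCRATCH_ROOT at a temporary directory so the
# sketch runs anywhere. On the Yens you would leave SCRATCH_ROOT unset.
SCRATCH_ROOT=$(mktemp -d)
workdir=$(make_scratch_dir demo_job)
echo "scratch directory on $(uname -n): $workdir"

rm -rf "$SCRATCH_ROOT"
```

Because scratch space is visible only on the machine where it was created, remember which Yen you used and copy any results you want to keep to your project directory.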

AFS Volumes

When you log in to the Yen servers, you automatically land in your home directory, which is located at /home/users/<SUNetID> (shortcut: ~). You can access your former AFS home directory by following the afs-home symlink inside your home directory.

You have a personal AFS volume that is named according to your SUNetID. For example if your SUNetID is johndoe13, then the path to your AFS directory is: /afs/ir/users/j/o/johndoe13. The two individual letters are the first two letters of the SUNetID.
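
The naming rule above can be written as a short shell function. This is only an illustration of the rule; afs_home_path is a hypothetical helper name, and the function builds the path string without checking that the volume exists.

```shell
# afs_home_path is a hypothetical helper: build the personal AFS path
# for a SUNetID from its first two letters.
afs_home_path() {
    id=$1
    first=$(printf '%s' "$id" | cut -c1)     # first letter of the SUNetID
    second=$(printf '%s' "$id" | cut -c2)    # second letter of the SUNetID
    echo "/afs/ir/users/$first/$second/$id"
}

afs_home_path johndoe13    # prints /afs/ir/users/j/o/johndoe13
```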

You may have access to other AFS volumes set up for specific projects, or other people may give you access to a specific directory in their AFS volume. To access other AFS volumes, you need to know what the path to them is. For example, the path might be something like /afs/ir/data/gsb/nameofyourdirectory.

How to access an AFS volume

There are two options for transferring files to and from AFS:

  1. From your desktop using OpenAFS, a free download available from Stanford. This software will mount your AFS directory so that you can access it using an Explorer (Windows) or Finder (Mac) window as you do with other files.

  2. Through a web interface: https://afs.stanford.edu/. When you go to this url, it will take you to your home directory. To go to a different directory, click the Change button at the top of the page under Current AFS Directory Path.

How to create an AFS volume

If you are working with a faculty member on a project that uses AFS, chances are that person already has an AFS directory created for that project. Just ask the faculty member what the path to the directory is, and to grant you permissions to use it.

Size Limitations

As of this writing, AFS volumes at Stanford can be as large as 256 GB. However, it is possible to chain multiple volumes together in one Linux directory using symbolic links.
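
As a rough illustration of the chaining idea, the sketch below uses two temporary directories as stand-ins for two separate AFS volumes and links the second one into the first with ln -s. The directory and link names are made up for this example; with real volumes you would use their /afs paths instead.

```shell
root=$(mktemp -d)
mkdir -p "$root/volume1" "$root/volume2"    # stand-ins for two separate volumes
echo "2021 data" > "$root/volume2/data.txt"

# Link volume2 into volume1 so that both appear under one directory.
ln -s "$root/volume2" "$root/volume1/more_data"

# Files in volume2 are now reachable through the link.
content=$(cat "$root/volume1/more_data/data.txt")
link_target=$(readlink "$root/volume1/more_data")
echo "read through the link: $content"
echo "link points to: $link_target"

rm -rf "$root"
```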

Backups

All AFS directories are backed up nightly. Any file or directory that existed for at least 24 hours before it was deleted can be restored by submitting a HelpSU request.