Storage Solutions
GSB researchers work with datasets that often exceed the capacity of personal machines. To support this work, Stanford provides several storage platforms optimized for performance and collaboration. This page summarizes the storage options available and when to use each one.
Check Data License and Platform
Before transferring data to any platform, make sure the data is licensed for your use and that the storage platform meets the security requirements for that data.
New Storage System: VAST
High-Performance Storage for the Yen Cluster
The Yen cluster now runs on a new 1 PB all-flash VAST Data storage system, replacing the legacy ZFS backend. This upgrade significantly improves performance, reliability, and scalability for data-intensive research.
Designed for data-intensive research, VAST provides:
- All-flash performance for fast reads/writes
- Scalability for multi-TB and multi-user workloads
- User-accessible snapshots for self-service file recovery
Not for High Risk Data
The Yen servers are not approved for high risk data. VAST is mounted only on the Yen servers; you cannot access it from Sherlock, FarmShare, or any other system.
All home directories and project spaces on the Yens now live on VAST.
Use VAST when:
- You are running analyses or workflows on the Yen cluster
- You need fast access to large datasets
- You want a shared project directory for collaboration
ZFS Paths
Although VAST now provides the underlying storage, the existing /zfs/... paths remain in place. You do not need to change your workflows, scripts, or file paths — everything continues to function exactly as before.
Home Directory
Every Yen user receives a personal home directory:
/home/users/<SUNetID>
Your home directory is your private working space on the Yens. It’s best used for small scripts and utilities.
Do NOT Store Large Files in Home
Home directories have a strict 50 GB limit. They are not intended for large datasets, large outputs, or collaborative projects. If your home directory is full, you won't be able to access JupyterHub.
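If your home directory is near its limit, the usual first step is to find what is taking up the space. The sketch below builds a small demo directory so it can run anywhere; on the Yens you would point du at $HOME instead.

```shell
# A minimal sketch of finding the largest items in a directory. The demo
# directory stands in for $HOME on the Yens; substitute $HOME in practice.
demo=$(mktemp -d)
mkdir -p "$demo/data" "$demo/scripts"
head -c 1048576 /dev/zero > "$demo/data/big.bin"    # 1 MB placeholder file
printf 'echo hi\n' > "$demo/scripts/run.sh"
# List top-level items by size (in KB), largest first:
du -sk "$demo"/* | sort -rn | head -n 10
```

Large items that turn up this way usually belong in a project directory rather than in home.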
Project Directory
Requesting New Project Space
To create a new project directory, submit the Project Space Request Form. The form lets you estimate disk usage and specify any collaborators who should be added to the shared access list. Access to the directory is controlled through Stanford workgroups.
Project directories provide shared, scalable storage for research conducted on the Yens. Each project space is created by the DARC team in collaboration with the system administrators and mounted at one of the following paths:
/zfs/projects/faculty/<your-project-dir>
/zfs/projects/students/<your-project-dir>
Project directories are ideal for:
- Large datasets
- Analysis outputs and intermediate files
- Collaborative research with multiple users
- Long-running or compute-intensive workflows
Default Quotas
- Faculty-led projects: 1 TB
- Student-led projects: 200 GB
Additional space can be requested if needed.
Help Keep Shared Storage Healthy
Please archive inactive data and delete unused intermediate files. Yen storage is a shared resource for the entire research community.
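One common way to archive inactive data is to compress a finished subdirectory into a tarball before deleting the original. This sketch uses a temporary directory standing in for a real project path, so the names are illustrative.

```shell
# Sketch of archiving an inactive subdirectory to free shared project space.
# The temp directory stands in for a path like /zfs/projects/.../my-project.
proj=$(mktemp -d)
mkdir -p "$proj/old_run"
echo "intermediate output" > "$proj/old_run/step1.txt"
tar -czf "$proj/old_run.tar.gz" -C "$proj" old_run   # compress the directory
tar -tzf "$proj/old_run.tar.gz" > /dev/null          # verify archive is readable
rm -rf "$proj/old_run"                               # only then remove the original
```

Verifying the archive before deleting the source is the important step; a truncated tarball is not a backup.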
If you would like to discuss specific storage solutions for your project, please email the DARC team to discuss further.
Temporary Storage
Some workflows require fast, short-term storage for intermediate results or temporary files. The Yen cluster provides two such locations: node-local temporary storage and shared scratch space. These areas are not intended for permanent data and will be purged intermittently.
/tmp - Node-Local Storage
Each Yen compute node has a local disk, over 1 TB in size, mounted at /tmp.
Use /tmp when you need:
- Very fast read/write performance
- Temporary files used only by a job running on a single node
Characteristics:
- Accessible only from the node where your job is running
- Cleared when the node reboots
- Not backed up
Avoid Filling /tmp
Jobs running on the same node may fail if /tmp fills up.
Always clean up temporary files after your job finishes.
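One pattern for reliable cleanup is to give each job its own directory under /tmp and remove it with an exit trap, so the files disappear even if the script fails partway. The job name below is illustrative.

```shell
#!/bin/bash
# Sketch: per-job scratch directory under /tmp, removed automatically on exit.
# mktemp avoids collisions with other users' jobs on the same node.
tmpdir=$(mktemp -d "${TMPDIR:-/tmp}/myjob.XXXXXX")
trap 'rm -rf "$tmpdir"' EXIT          # cleanup runs on normal exit or error
echo "intermediate result" > "$tmpdir/partial.dat"
# ... analysis steps that read and write files in "$tmpdir" ...
```

Because the trap fires on any exit path, the node's /tmp is left clean for the next job.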
/scratch/shared — Cluster-Wide Scratch
There is a large, 100 TB scratch space accessible from any Yen server at /scratch/shared. Reads and writes to this space are slower than to node-local /tmp storage.
Not for Long-Term Storage
/scratch/shared is not backed up and may be periodically cleared by administrators. Move any important results to your project directory as soon as possible.
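A safe habit is to copy results out of scratch and delete the scratch copy only after the transfer succeeds. The temp directories below stand in for paths like /scratch/shared/... and /zfs/projects/..., so the names are illustrative.

```shell
# Sketch of moving finished results from scratch into project space.
# The && ensures the scratch copy is deleted only if the copy succeeded.
scratch=$(mktemp -d)   # stands in for a directory under /scratch/shared
proj=$(mktemp -d)      # stands in for a directory under /zfs/projects
echo "final numbers" > "$scratch/results.csv"
cp -r "$scratch" "$proj/results_backup" && rm -rf "$scratch"
```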
How to Check Home and Project Space Quota
Temporarily Unavailable
Currently, the gsbquota command does not display usage properly because of the VAST storage system upgrade. Our system administrators are working on restoring it. Thank you for your patience.
To determine how much of the quota in your home directory /home/users/<SUNet ID>/ you have used, simply type:
gsbquota
The resulting output displays the percentage and gigabytes used, as shown below:
/home/users/<SUNet ID>: currently using 50% (25G) of 50G available
You can also check the size of your project space by passing its full path to the gsbquota command:
gsbquota /zfs/projects/students/<my-project-dir>/
/zfs/projects/students/<my-project-dir>/: currently using 39% (78G) of 200G available
Email Us If You Encounter Issues
Please report any issues with gsbquota to the DARC team.
How to Recover Deleted Files
Files on the Yens are backed up in snapshots, so if you accidentally delete something, you can still recover it.
Here is an example of something that might happen:
- You are working in the directory /zfs/projects/faculty/hello-world/ and have two files named results.csv and results_temp.csv that you made yesterday. The latter file was clearly temporary, so you want to remove it to clean things up:
rm /zfs/projects/faculty/hello-world/results.csv
- Oops, you accidentally deleted the file you wanted to keep! Only results_temp.csv remains. Luckily, since you made the files yesterday, there are likely snapshots available.
Note that snapshots are retained according to specific intervals. The current snapshot retention policy is as follows:
- Hourly: 1 day of hourly snapshots retained
- Daily: 1 week of daily snapshots retained
- Weekly: 2 months of weekly snapshots retained
- Monthly: 1 year of monthly snapshots retained
You can find them at the top level of any project folder, so for our example:
/zfs/projects/faculty/hello-world/.snapshot
Snapshots Are Still Being Populated
Snapshots are still being populated on the new file system; eventually a full year of snapshots will be available.
We recommend not relying on snapshots, since they may not always be available. If you often need them, you may not yet have a good backup/versioning workflow in place. Think of snapshots as an "oh thank goodness, I didn't mean to delete that" safety net.
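Recovering the file from the example above amounts to listing .snapshot, picking a snapshot taken before the deletion, and copying the file back. The sketch below builds a demo directory so it can run anywhere; the snapshot name is made up, so list your project's .snapshot directory to see the real ones.

```shell
# Sketch of restoring a deleted file from a snapshot. The demo directory
# stands in for a real project path such as /zfs/projects/faculty/hello-world,
# and the snapshot name "daily.0000" is illustrative.
proj=$(mktemp -d)
mkdir -p "$proj/.snapshot/daily.0000"
echo "x,y" > "$proj/.snapshot/daily.0000/results.csv"
ls "$proj/.snapshot"                                   # pick a snapshot from before the deletion
cp "$proj/.snapshot/daily.0000/results.csv" "$proj/"   # copy the file back
```

Snapshots are read-only, so copying from them never risks further damage to the live directory.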
Other Storage Options
In addition to VAST, several other storage platforms support research workflows at Stanford GSB. The best option depends on your dataset size, security requirements, compute needs, and level of collaboration.
Redivis
Redivis allows users to deploy datasets in a web-based environment (GCP backend) and provides a powerful query GUI for users who don't have a strong background in SQL.
Google Drive
Available to all users at Stanford. Google Drive is approved for low, medium and high risk data. It supports up to 400,000 files and has a daily upload limit of 750 GB, making it ideal for storing audio, video, PDFs, images, and flat files. Google Drive is great for sharing with external collaborators and is also suitable for archiving research data.
Oak
Oak is a High-Performance Computing (HPC) storage system available to research groups and projects at Stanford for research data. The monthly cost is approximately $45 per 10 TB. Oak does not provide local or remote data backup by default, and should be considered as a single copy. However, backups are available for an additional fee. Oak is the preferred storage location for Sherlock, but can be mounted on the Yen cluster by request using an NFS gateway.
Cloud Platforms
Storing data in the cloud is an effective way to inexpensively archive data, serve files publicly, and integrate with cloud-native query and compute tools. With the growing number of cloud storage options and security risks, we advise caution when choosing to store your data on any cloud platform. If you are considering cloud solutions for storage, please contact DARC to discuss your needs.
Legacy Stanford Platforms
The following storage platforms are still supported, but users are discouraged from relying on them for ongoing research:
- AFS: Andrew File System is a distributed, networked file system that allows users to access and share files. UIT no longer automatically provisions new faculty and staff members with AFS user volumes as the service is being sunsetted by the university.
- Box: Stanford University Box provided basic document management and collaboration through Box.com. As of February 28, 2023, university IT retired the Box service and has taken steps to migrate Box content to a new folder on Google Drive or Microsoft OneDrive.
AFS Volumes
You may have a personal AFS volume named after your SUNet ID. For example, if your SUNet ID is johndoe13, the path to your AFS directory is /afs/ir/users/j/o/johndoe13; the two single-letter subdirectories are the first two letters of the SUNet ID.
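The path can be derived mechanically from the SUNet ID, as this sketch shows; the ID here is the example from the text, so substitute your own.

```shell
# Sketch: derive an AFS home path from a SUNet ID. The two intermediate
# directories are the ID's first two letters.
sunet=johndoe13
c1=$(printf '%s' "$sunet" | cut -c1)      # first letter
c2=$(printf '%s' "$sunet" | cut -c2)      # second letter
echo "/afs/ir/users/$c1/$c2/$sunet"       # → /afs/ir/users/j/o/johndoe13
```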
You may have access to other AFS volumes set up for specific projects, or other people may give you access to a specific directory in their AFS volume. To access other AFS volumes, you need to know what the path is. For example, the path might be something like /afs/ir/data/gsb/nameofyourdirectory.
How to access an AFS volume
You can transfer files to and from AFS from your desktop using OpenAFS, a free download available from Stanford. This software mounts your AFS directory so that you can access it in an Explorer (Windows) or Finder (Mac) window as you do other files.
AFS is NOT Mounted on the Yen Servers
AFS is no longer mounted on the Yen servers. If you still wish to access your AFS space (afs-home), you can SSH into SRC's rice nodes. These nodes are part of the University's FarmShare system; access them with ssh <SUNetID>@rice.stanford.edu.
WebAFS has been retired and is no longer available to use. For its alternatives, visit this page.