8. How to Transfer Files to/from the Yen Servers


In your data processing pipeline, you will need to transfer data to the Yen servers to carry out your analysis and then transfer the resulting files back to your local machine.

Using scp

Transfer files to the Yens

To transfer a single file, use:

scp mydata.txt <SUNetID>@yen.stanford.edu:/ifs/project/<my_project_space>

This copies mydata.txt to your project space on the Yens. Because scp uses SSH to transfer files, you will be asked for your password and Duo authentication every time you run it.
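If you copy files to the same project space often, a small shell function can save retyping the remote target. The sketch below is not part of any Yen tooling: to_yen is a hypothetical name, and setting the hypothetical DRY_RUN variable makes it print the scp command instead of running it.

```shell
# Hypothetical helper (not provided on the Yens): copy local files to a
# directory on the Yens with scp.
# Usage: to_yen <SUNetID> <remote_dir> <file>...
to_yen() {
  if [ "$#" -lt 3 ]; then
    echo "usage: to_yen <SUNetID> <remote_dir> <file>..." >&2
    return 2
  fi
  local sunetid="$1" remote_dir="$2"
  shift 2   # the remaining arguments are the local files to copy

  # When DRY_RUN is set, "echo" is prepended so the command is printed, not run.
  ${DRY_RUN:+echo} scp "$@" "${sunetid}@yen.stanford.edu:${remote_dir}"
}
```

For example, DRY_RUN=1 to_yen <SUNetID> /ifs/project/<my_project_space> mydata.txt prints the scp command it would run; drop DRY_RUN=1 to perform the transfer.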

To transfer all CSV files in your current local directory, use the * wildcard:

scp *.csv <SUNetID>@yen.stanford.edu:/ifs/project/<my_project_space>
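The wildcard is expanded by your local shell before scp starts, so scp only receives the list of matching file names; if nothing matches, the command fails instead of copying anything. The sketch below uses a hypothetical list_csv function (written for bash) to preview which files the pattern would send.

```shell
# Hypothetical helper: list the .csv files that "scp *.csv ..." would send
# from a directory (default: the current directory).
list_csv() {
  local dir="${1:-.}" f found=0
  for f in "$dir"/*.csv; do
    # If nothing matches, bash leaves the pattern unexpanded; skip it.
    [ -e "$f" ] || continue
    printf '%s\n' "$f"
    found=1
  done
  if [ "$found" -eq 0 ]; then
    echo "no .csv files in $dir" >&2
    return 1
  fi
}
```

A typical use is list_csv && scp *.csv <SUNetID>@yen.stanford.edu:/ifs/project/<my_project_space>, which only starts the transfer when at least one CSV file is present.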

Example

Let’s transfer our R script example from the local machine to the Yen servers. On your local machine, in a terminal or GitBash window, run:

scp iris-parallel-bootstrap.R <SUNetID>@yen.stanford.edu:~

Here ~ is a shortcut for your home directory on the Yens. Enter your SUNet ID password and complete Duo authentication for the transfer to finish.

Transfer folders to the Yens

On your local machine, open a new terminal or GitBash window and navigate to the parent directory of the folder that you want to transfer to the Yens with the cd command.

For example, if you want to copy a folder from your Dropbox, use (on your local machine):

cd ~/Dropbox

On Windows, you can also navigate to the parent directory in File Explorer, then right-click and choose “Open GitBash here” to open a new GitBash window in that directory.

From that parent directory, run the following to copy the folder to the Yens:

scp -r my_folder/ <SUNetID>@yen.stanford.edu:/ifs/project/<my_project_space>

Here the -r flag copies the folder recursively (the folder and everything in it), <SUNetID> is your SUNet ID, and <my_project_space> is your project directory on IFS (or any other destination path on the Yens).
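scp does not print a summary of what was copied. One way to confirm that a folder arrived intact is to record checksums before the transfer and check them on the Yens afterwards. This is a sketch, assuming GNU coreutils sha256sum (standard on Linux systems such as the Yens; on macOS you may need to substitute shasum -a 256). make_manifest, check_manifest and MANIFEST.sha256 are hypothetical names, and the functions must be defined in the shell on each machine where you call them.

```shell
# Hypothetical helpers: record SHA-256 checksums of every file in a folder,
# then verify them after the folder has been copied.
# Assumes GNU coreutils sha256sum.

# Write MANIFEST.sha256 inside the folder, with paths relative to the folder.
make_manifest() {
  local dir="$1"
  ( cd "$dir" &&
    find . -type f ! -name MANIFEST.sha256 -exec sha256sum {} + > MANIFEST.sha256 )
}

# Recompute the checksums and compare them with the manifest.
# Prints only mismatches; returns non-zero if any file is missing or changed.
check_manifest() {
  local dir="$1"
  ( cd "$dir" && sha256sum -c --quiet MANIFEST.sha256 )
}
```

Run make_manifest my_folder locally before scp -r my_folder/ ..., then check_manifest /ifs/project/<my_project_space>/my_folder on the Yens once the copy finishes.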

Let’s run a short example. We will go to the local home directory (cd with no arguments), create an empty folder called test_from_local, and transfer it to the home directory on the Yens:

cd

mkdir test_from_local

scp -r test_from_local/ <SUNetID>@yen.stanford.edu:~

Transfer files from the Yens

Similarly, you can copy files from the Yens back to your local machine. Open a new terminal on your local machine, without connecting to the Yens, and use cd to navigate to where you want to copy the files. Then run:

scp -r <SUNetID>@yen.stanford.edu:/ifs/project/<my_project_space>/results .

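Here we copy the results folder from the Yens’ IFS file system to your current local directory (. means “here”). To copy files rather than directories, omit the -r flag. To copy several files at once, use the * wildcard and put the remote path in quotes, for example '<SUNetID>@yen.stanford.edu:/ifs/project/<my_project_space>/results/*.csv', so that your local shell does not try to expand the wildcard itself (zsh, the default shell on macOS, otherwise stops with “no matches found”).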
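When you copy several files from the Yens with a wildcard, the * has to reach scp unexpanded so that it is matched against files on the Yens rather than on your laptop. The sketch below wraps that rule in a hypothetical from_yen function; as in any shell call, pass the pattern in single quotes. The DRY_RUN variable is also a hypothetical convention that prints the command instead of running it.

```shell
# Hypothetical helper: copy files matching a wildcard from the Yens to a
# local directory (default: the current directory).
# Usage: from_yen <SUNetID> '<remote_path_with_wildcard>' [local_dir]
from_yen() {
  local sunetid="$1" remote_pattern="$2" dest="${3:-.}"
  local src="${sunetid}@yen.stanford.edu:${remote_pattern}"
  # "$src" is quoted, so the local shell never expands the *;
  # scp matches it against files on the Yens instead.
  ${DRY_RUN:+echo} scp "$src" "$dest"
}
```

For example, from_yen <SUNetID> '/ifs/project/<my_project_space>/results/*.csv' copies every CSV file in results to the current local directory.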

Using rsync

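Alternatively, we can use rsync to transfer files to/from the Yens. rsync only transfers files that are missing or have changed at the destination, so re-running the same command after an interruption or an update skips data that is already up to date.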

Transfer files to the Yens

To transfer a file (for example, myfile.csv) from your local machine to the Yens, use:

rsync -aP myfile.csv <SUNetID>@yen.stanford.edu:/ifs/project/<my_project_space>

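Here -a (archive mode) copies recursively and preserves file timestamps, permissions, and symbolic links, and -P shows progress and keeps partially transferred files so that an interrupted transfer can resume. You will be asked to enter your password and complete Duo two-step authentication.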

Transfer folders to the Yens

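Because -a already implies recursion, the same flags also copy folders; the extra -r below is redundant but harmless. The trailing slash on myfolder/ tells rsync to copy the contents of myfolder into the destination myfolder on the Yens (without the slash, rsync would create myfolder/myfolder):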

rsync -aPr myfolder/ <SUNetID>@yen.stanford.edu:/ifs/project/<my_project_space>/myfolder
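The trailing slash on the rsync source is easy to get wrong. The function below is a local-only sketch (no Yen connection needed, but rsync must be installed; demo_trailing_slash is a hypothetical name) that copies the same folder with and without the slash and lists what each destination ends up containing.

```shell
# Local demo: how a trailing slash on the SOURCE changes what rsync creates.
demo_trailing_slash() {
  local work
  work=$(mktemp -d)
  mkdir -p "$work/myfolder" "$work/with_slash" "$work/without_slash"
  touch "$work/myfolder/data.csv"

  # Source WITH a trailing slash: copy the contents of myfolder.
  rsync -a "$work/myfolder/" "$work/with_slash/"
  # Source WITHOUT a trailing slash: copy the folder itself.
  rsync -a "$work/myfolder" "$work/without_slash/"

  # List every file relative to the scratch directory (C-locale sort for a
  # stable order), then clean up.
  ( cd "$work" && find with_slash without_slash -type f | LC_ALL=C sort )
  rm -rf "$work"
}
```

Running demo_trailing_slash prints with_slash/data.csv followed by without_slash/myfolder/data.csv: with the slash only the contents are copied, without it the folder itself is created inside the destination.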

Transfer files from the Yens

To transfer a file (for example, myfile.csv) from the Yens to your local machine, use:

rsync -aP <SUNetID>@yen.stanford.edu:/ifs/project/<my_project_space>/myfile.csv .

Transfer folders from the Yens

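Downloading a folder (myfolder) from the Yens works the same way. Again, -a already implies -r, and the trailing slash on the remote myfolder/ copies the folder’s contents into the local myfolder/: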

rsync -aPr <SUNetID>@yen.stanford.edu:/ifs/project/<my_project_space>/myfolder/ myfolder/

Using rclone to Google Drive

Another useful tool for data transfer is rclone, which can be faster than rsync for some transfers. Unlike Globus, which only transfers data between configured endpoints, rclone can copy files to and from a wide range of storage services, such as Google Drive, Dropbox, and cloud storage.

Using rclone locally

On Windows: download rclone from the official rclone website.

On Mac: install rclone with:

curl https://rclone.org/install.sh | sudo bash

Using rclone on the Yens

rclone is available on the Yens, so you can push files or directories, or sync data folders, directly from the Yens to remote locations such as Google Drive, Amazon S3, and Dropbox. Load the module first:

module load rclone

Setting up rclone

Before you can push data from the Yens to Google Drive, you need to configure rclone (a one-time step):

rclone config

The configuration menu will be presented:

2021/02/10 08:36:44 NOTICE: Config file "/home/users/nrapstin/.config/rclone/rclone.conf" not found - using defaults
No remotes found - make a new one
n) New remote
s) Set configuration password
q) Quit config
n/s/q> n
name> nrapstinGoogleDrive

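Enter n to make a new remote and give it a name when prompted. A convenient convention is your SUNet ID followed by GoogleDrive (for example, nrapstinGoogleDrive above). On the Yens, the shell variable $USER holds your SUNet ID, so in shell commands this name can be written as ${USER}GoogleDrive.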

Next, select the number corresponding to Google Drive. The numbering changes between rclone versions, so check the menu carefully; you can also type drive instead of the number.

15 / Google Drive
   \ "drive"

For the next two prompts (client_id and client_secret), leave the values blank and press Enter.

The next menu asks which permissions (scope) to give rclone. Choose 1 for full read-write access. Then leave the next two prompts blank and press Enter.

Choose n to not edit advanced config:

Edit advanced config? (y/n)
y) Yes
n) No (default)
y/n> n

Choose n again since we are working on the remote Yen server:

Remote config
Use auto config?
 * Say Y if not sure
 * Say N if you are working on a remote or headless machine
y) Yes (default)
n) No
y/n> n

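rclone then prints a unique URL after the line “Please go to the following link:”; the URL starts with https://accounts.google.com/. Newer rclone versions may instead ask you to run an rclone authorize command on a computer that has a web browser (for example, your laptop with rclone installed locally) and paste the resulting token back into the Yen terminal; in that case, follow the on-screen instructions.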

Copy and paste the full link into your browser. Log in with your Stanford account.

Once you authorize rclone, Google displays a verification code. Use the copy button and paste the code into the terminal at the prompt:

Log in and authorize rclone for access
Enter verification code>

Next, you will be asked whether to configure this remote as a team drive. Enter y if you are connecting to a shared drive (Google’s newer name for team drives) or n if you are connecting to your own Google Drive.

Configure this as a team drive?
y) Yes
n) No (default)

Finally, press Enter to complete the config.

y) Yes this is OK (default)
e) Edit this remote
d) Delete this remote
y/e/d>

At the final menu, enter q to quit. rclone should now be set up to push files from the Yens to your Google Drive.

Using rclone

Here is a list of common rclone commands you can use to push files directly from the Yens to your Google Drive.

To list remote connections where you can push:

rclone listremotes

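Create a remote folder on Google Drive. Note that this creates the folder at the top level of your Google Drive. The braces in ${USER}GoogleDrive are needed so that the shell expands $USER and then appends GoogleDrive; this assumes you named the remote after your SUNet ID as suggested above (otherwise, use the name shown by rclone listremotes).

rclone mkdir ${USER}GoogleDrive:GoogleDriveFolderName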
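If the remote name is typed as $USERGoogleDrive in a shell command, the shell does not read it as $USER followed by GoogleDrive: it looks up a single variable named USERGoogleDrive, which is normally unset, so rclone receives only :GoogleDriveFolderName. The snippet below shows the difference using a stand-in variable; sunet and the ID jdoe are made up for illustration.

```shell
# "sunet" stands in for $USER (your SUNet ID); "jdoe" is a made-up ID.
sunet="jdoe"

# Without braces the shell looks up a variable named "sunetGoogleDrive",
# which is unset, so only ":myFolder" is left.
unbraced="$sunetGoogleDrive:myFolder"

# With braces the name ends at "}", and "GoogleDrive" is appended as text.
braced="${sunet}GoogleDrive:myFolder"

printf 'unbraced: %s\nbraced:   %s\n' "$unbraced" "$braced"
```

This prints unbraced: :myFolder and braced:   jdoeGoogleDrive:myFolder, which is why the remote is written as ${USER}GoogleDrive in commands.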

Alternatively, you can specify a path to the new directory on Google Drive:

rclone mkdir ${USER}GoogleDrive:myFolder/subfolder/data

In this example, myFolder already exists on Google Drive and already contains subfolder, so the command creates only the new subdirectory data.

To list the contents of a remote folder on Google Drive (rclone ls lists files recursively, with their sizes):

rclone ls ${USER}GoogleDrive:GoogleDriveFolderName

To upload a directory to Google Drive, use copy:

rclone copy /Path/To/Folder/ ${USER}GoogleDrive:GoogleDriveFolderName/

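where /Path/To/Folder/ is the path on the Yens to the directory you want to upload. rclone copies the contents of that directory into GoogleDriveFolderName and skips files that are already identical on Google Drive.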
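rclone also accepts plain local paths, which makes it possible to check what rclone copy does before pointing it at Google Drive. The sketch below (demo_rclone_copy is a hypothetical name; it assumes rclone is on your PATH, for example after module load rclone) copies a scratch folder locally and lists what arrived.

```shell
# Local-only demo (no Google Drive needed): "rclone copy SRC DST" copies the
# CONTENTS of SRC into DST. Assumes rclone is on the PATH.
demo_rclone_copy() {
  local work
  work=$(mktemp -d)
  mkdir -p "$work/src/sub"
  echo "hello" > "$work/src/sub/a.txt"

  rclone copy "$work/src" "$work/dst"   # dst is created if it does not exist

  # List what arrived, relative to the destination folder, then clean up.
  ( cd "$work/dst" && find . -type f )
  rm -rf "$work"
}
```

It prints ./sub/a.txt: the sub folder and its file land directly inside the destination, with no extra src level.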

To download from Google Drive to the Yens:

rclone copy ${USER}GoogleDrive:GoogleDriveFolderName /Path/To/Local/Download/Folder

where /Path/To/Local/Download/Folder is the path on the Yens (or on your local machine, if you run rclone there) where you want to copy the files.

See the official rclone documentation for more details.