Migrating Processes From JupyterHub to Yen-Slurm

JupyterHub and the interactive Yens are great resources for developing and debugging code, but they are not intended to be the final stop for your research computing needs. If your process requires more resources than the technical limits of JupyterHub and Yen1-5 allow, migrating it to the yen-slurm scheduler will give you access to more.

Am I using too many resources?

Currently, JupyterHub’s resource restrictions match the restrictions on Yen1-5.

If you believe your workflow might exceed these limits, here are some recommendations:

  • Run your workflow on a fraction of the data, and keep an eye on the memory usage. You can expect your memory usage to scale with your data.
  • Run your workflow and keep an eye on the CPU usage. If your usage is going above the CPU limit, check out our topic guides for help on limiting cores.
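As a rough illustration of the first recommendation, the sketch below runs a workload on a fraction of the data, records the peak memory of the Python process, and extrapolates linearly to the full dataset. The function names, the toy workload, and the linear-scaling rule are all assumptions made for illustration; real workflows may scale differently, and the estimate also scales up the interpreter's baseline memory, so treat it as an upper-leaning guess. It uses the standard-library resource module, which reports ru_maxrss in kilobytes on Linux.

```python
import resource


def peak_memory_mb():
    """Peak resident memory of this process so far, in MB (Linux reports KB)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024


def estimate_full_memory_mb(peak_mb, fraction):
    """Naive linear extrapolation from a run on `fraction` of the data."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return peak_mb / fraction


def toy_workload(n_rows):
    """Hypothetical stand-in for your real workflow: builds n_rows floats."""
    data = [float(i) for i in range(n_rows)]
    return sum(data)


if __name__ == "__main__":
    fraction = 0.1  # hypothetical: run on 10% of the data
    toy_workload(int(1_000_000 * fraction))
    peak = peak_memory_mb()
    print(f"Peak memory on {fraction:.0%} of the data: {peak:.1f} MB")
    print(f"Naive full-data estimate: {estimate_full_memory_mb(peak, fraction):.1f} MB")
```

If the estimate comes out close to or above the limit, that is a sign the full run belongs on yen-slurm.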

This article gives some help on how to check your resources.

The topbyuser command will give you a good snapshot of your current usage.

bchivers@yen2:~$ topbyuser
          %MEM    %CPU
USER
suzienoh  11.4  1176.0
ajirana    9.9   200.0
jiarui98   0.0   100.0
codycook   0.0    23.8
rodonn     0.8    19.0
jjares     0.5     9.6
167        0.3     4.8
surya21    0.0     4.8
aoshotse   1.5     0.0
curiki     1.7     0.0
eturkel    0.2     0.0
mlinegar  11.7     0.0
pgarg1     5.0     0.0
wenjiaba   1.4     0.0

The htop and htop -u $USER commands allow you to watch your usage change as your workflow advances.
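To relate the %CPU column to your own code, one option is to compare CPU time with wall-clock time from inside the process. A ratio near 1 means roughly one core is busy, while a value like 1176% in the table above corresponds to about 12 cores. This is a minimal sketch using only the standard library; average_cores_used and the toy workload are hypothetical names, and time.process_time() does not count CPU time spent in child processes.

```python
import time


def average_cores_used(func, *args, **kwargs):
    """Run func and return (result, CPU seconds / wall-clock seconds)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()  # user + system CPU time of this process
    result = func(*args, **kwargs)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return result, (cpu / wall if wall > 0 else 0.0)


def toy_workload(n):
    """Hypothetical stand-in for one step of your real workflow."""
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    _, cores = average_cores_used(toy_workload, 2_000_000)
    print(f"Average cores used: {cores:.2f} (about {cores * 100:.0f}% CPU)")
```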

Migrating from JupyterHub to yen-slurm

The biggest hurdle in migrating your process from a JupyterHub notebook to yen-slurm will be managing package dependencies. Generally, for any process, the following steps will help make a smooth transition:

  • Create a virtual environment for your process to run in
  • Install any packages needed in that environment
  • Set up that environment in JupyterHub
  • Test your process
  • Write a submit script to run your process on yen-slurm using your working environment

Python Virtual Environments

For Python, you can use virtualenv or conda to create an environment that can be shared across Yen1-5, yen-slurm, and JupyterHub. See this page for information on setting up a Python virtual environment.

Virtualenv


Activate your environment and install ipykernel

source my_env/bin/activate
pip install ipykernel

The ipykernel package is required for JupyterHub.

Set up that environment in JupyterHub

The following command should install the environment my_env as a kernel in JupyterHub:

python -m ipykernel install --user --name=my_env

In JupyterHub, you should see the following:

Virtual Env in JupyterHub
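If the kernel does not appear, it can help to confirm where the kernel spec was written and which interpreter it points to. The sketch below assumes the default per-user location on Linux (~/.local/share/jupyter/kernels/<name>/kernel.json); that path can differ, for example if JUPYTER_DATA_DIR is set, and kernel_python is a hypothetical helper name.

```python
import json
from pathlib import Path


def kernel_python(name, kernels_dir=None):
    """Return the interpreter path recorded in a kernel spec, or None if missing."""
    if kernels_dir is None:
        # Assumed default location for per-user kernel specs on Linux.
        kernels_dir = Path.home() / ".local" / "share" / "jupyter" / "kernels"
    spec_file = Path(kernels_dir) / name / "kernel.json"
    if not spec_file.is_file():
        return None
    with spec_file.open() as fh:
        spec = json.load(fh)
    # argv[0] is the executable Jupyter launches for this kernel.
    return spec["argv"][0]


if __name__ == "__main__":
    python_path = kernel_python("my_env")
    print(python_path or "No kernel spec named my_env found in the assumed location")
```

The printed path should sit inside my_env/bin/ if the kernel was installed from the activated environment.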

Test your process

Try running your JupyterHub notebook using the venv kernel you just installed. You can change the kernel of an existing notebook by going to Kernel->Change Kernel…

Change Kernel

If it works on this kernel, your next step is to migrate these commands to a .py script. You can test the script by activating your venv environment on the Yens and running it via python my_script.py.
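One difference to keep in mind when moving from a notebook to a script: a notebook displays the last expression of each cell, while a script has to print or save its results explicitly. The sketch below shows one possible layout for my_script.py; run_analysis, the sample values, and the results.json file name are placeholders rather than anything prescribed by the Yens.

```python
import json
import sys
from pathlib import Path


def run_analysis(values):
    """Hypothetical stand-in for the code that used to live in notebook cells."""
    return {"n": len(values), "mean": sum(values) / len(values)}


def main(output_path="results.json"):
    results = run_analysis([1.0, 2.0, 3.0, 4.0])
    # Save results to a file, since nothing is displayed automatically.
    Path(output_path).write_text(json.dumps(results, indent=2))
    print(f"Wrote {output_path} using {sys.executable}")
    return results


if __name__ == "__main__":
    main()
```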

Write a submit script to run your process on yen-slurm using your working environment

The specifics of writing a submit script are outlined here. In addition, you’ll need to make sure your submit script runs the correct Python environment. There are two ways to do that.

First, you can run

source my_env/bin/activate

before you run python. Afterwards, you can add the line

echo $(which python);

to print out which python your script is using, to be sure it’s in my_env/bin/.

The second method is to explicitly call out the python instance you want to use. In your submit script, instead of using the command

python my_script.py

you would use

my_env/bin/python3 my_script.py

to be sure the python instance in my_env/bin/ is being used.
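The same check can also be made from inside the script itself, which leaves a record in the job's output. This is a minimal sketch: in_virtualenv and running_from are hypothetical helper names, and the relative path my_env assumes the job starts in the directory that contains the environment.

```python
import sys
from pathlib import Path


def in_virtualenv():
    """True when running inside a venv/virtualenv rather than the base Python."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def running_from(env_dir):
    """True when the active environment's root directory is env_dir."""
    return Path(sys.prefix).resolve() == Path(env_dir).resolve()


if __name__ == "__main__":
    print(f"Interpreter: {sys.executable}")
    print(f"Inside a virtual environment: {in_virtualenv()}")
    print(f"Running from my_env: {running_from('my_env')}")
```

Placing these lines near the top of my_script.py makes a wrong environment visible in the first few lines of the job log.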

Conda


For starters, if you do not have conda installed on the Yens, see this page for more info.

Activate your environment and install ipykernel

conda activate my_conda_env
conda install ipykernel

The ipykernel package is required for JupyterHub.

Set up that environment in JupyterHub

The following command should install the environment my_conda_env as a kernel in JupyterHub:

python -m ipykernel install --user --name=my_conda_env

In JupyterHub, you should see the following:

Conda in JupyterHub

Test your process

Try running your JupyterHub notebook using the conda kernel you just installed. You can change the kernel of an existing notebook by going to Kernel->Change Kernel…

Change Kernel

If it works on this kernel, your next step is to migrate these commands to a .py script. You can test the script by activating your conda environment on the Yens and running it via python my_script.py.

Write a submit script to run your process on yen-slurm using your working environment

The specifics of writing a submit script are outlined here. In addition, you’ll need to make sure your submit script runs the correct Python environment. There are two ways to do that.

First, you can run

conda activate my_conda_env

before you run python. Afterwards, you can add the line

echo $(which python);

to print out which python your script is using, to be sure it’s in something like path/to/conda/envs/my_conda_env/bin.

The second method is to explicitly call out the python instance you want to use. In your submit script, instead of using the command

python my_script.py

you would use

path/to/conda/envs/my_conda_env/bin/python my_script.py

to be sure the python instance in the conda environment directory is being used.
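For conda, a similar check can be run from inside the script. The sketch below relies on two conventions: conda environments contain a conda-meta directory, and conda activate sets the CONDA_DEFAULT_ENV variable. The helper name conda_env_info is hypothetical. Note that when you call the interpreter by its full path without activating, CONDA_DEFAULT_ENV may be unset even though the right Python is running.

```python
import os
import sys
from pathlib import Path


def conda_env_info():
    """Collect a few facts about the Python environment running this script."""
    prefix = Path(sys.prefix)
    return {
        "python": sys.executable,
        "prefix": str(prefix),
        # Conda environments keep package metadata in a conda-meta directory.
        "looks_like_conda": (prefix / "conda-meta").is_dir(),
        # Set by `conda activate`; None if the environment was not activated.
        "active_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }


if __name__ == "__main__":
    for key, value in conda_env_info().items():
        print(f"{key}: {value}")
```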

Non-Python Code Migration

If there’s a package manager for the programming language you are using, try it out!

Otherwise, DARC recommends running your program from the interactive Yen command line before moving to yen-slurm. This is a good test of whether your process will succeed or not.