Intermediate Yens
5. Submit Your First Job to Run on Yen-Slurm
We are going to copy the scripts directory, which includes example Python scripts and submission scripts.
Make a directory named after your user name inside /zfs/gsb/intermediate-yens/. Then copy the scripts folder, which contains the scripts for this class, from scratch to your working directory so you can modify and run them.
# copy scripts to your working dir
$ cd /zfs/gsb/intermediate-yens/
$ mkdir $USER
$ cd $USER
$ cp -r /scratch/darc/intermediate-yens/scripts .
$ cd scripts
Simple Example
For our first scheduled job, we will run a simple Python script.
Consider the following example, yahtzee.py, which rolls 5 dice until they’re all the same.
import random
import sys

def dice(n = 1):
    vals = [1,2,3,4,5,6]
    return([random.choice(vals) for i in range(n)])

def rollsUntilYahtzee(d = 5):
    yahtzee = False
    rolls = 0
    while not yahtzee:
        roll = dice(d)
        rolls+=1
        yahtzee = len(set(roll))==1
    return(rolls)

def main(attempts):
    for a in range(attempts):
        print(rollsUntilYahtzee())

if __name__ == "__main__":
    main(attempts=int(sys.argv[1]))
We want to sample from this distribution 100 times to see how many rolls it takes to get a Yahtzee. We can run the script like this:
$ python3 yahtzee.py 100 > result.csv
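As a sanity check on the numbers the script produces, we can work out what to expect. The sketch below is not part of the class scripts, and the function names are made up for illustration. It assumes fair six-sided dice: one roll of 5 dice is a Yahtzee with probability 6/6^5, and the number of rolls until the first success follows a geometric distribution whose mean is 1/p.

```python
from fractions import Fraction

def yahtzee_probability(dice=5, sides=6):
    """Probability that one roll of `dice` fair dice shows a single face."""
    # `sides` favourable outcomes out of sides**dice equally likely rolls
    return Fraction(sides, sides ** dice)

def expected_rolls(dice=5, sides=6):
    """Mean of the geometric distribution with success probability p: 1 / p."""
    return 1 / yahtzee_probability(dice, sides)

if __name__ == "__main__":
    print(yahtzee_probability())  # 1/1296
    print(expected_rolls())       # 1296
```

The average of the values in result.csv should therefore land somewhere near 1296, although with only 100 samples it can be noticeably off.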
Submit Serial Script to the Scheduler
We’ll prepare a submit script called yahtzee.slurm and submit it to the scheduler. Edit the slurm script to include your email address.
#!/bin/bash
# Example of running python script in a batch mode
#SBATCH -J yahtzee
#SBATCH -p normal,dev
#SBATCH -c 1 # CPU cores (up to 256 on normal partition)
#SBATCH -t 10:00
#SBATCH -o rollcount-%j.csv
#SBATCH --mail-type=ALL
#SBATCH --mail-user=your_email@stanford.edu
# Run python script
python3 yahtzee.py 10000
Then submit the script:
$ sbatch yahtzee.slurm
You should see output similar to this:
Submitted batch job 44097
Monitor your job:
$ squeue
The script should take less than a minute to complete. Look at the slurm emails after the job is finished.
Make sure the resulting csv file contains 10,000 lines as expected.
$ wc -l rollcount-*.csv
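If you would rather check the output from Python, the following is a minimal sketch. It is not one of the class scripts, and the function name summarize is hypothetical. It assumes each result is written on its own line as a plain integer, and it skips any line that is not one.

```python
import glob
import statistics

def summarize(path):
    """Return (number of roll counts, mean roll count) for one output file."""
    with open(path) as handle:
        # Keep only lines that are plain integers, in case other text
        # ended up in the job's output file
        counts = [int(line) for line in handle if line.strip().isdigit()]
    mean = statistics.mean(counts) if counts else None
    return len(counts), mean

if __name__ == "__main__":
    # Same file pattern as the wc command above
    for path in sorted(glob.glob("rollcount-*.csv")):
        n, mean = summarize(path)
        print(path, n, mean)
```

Run it from the scripts directory after the job finishes. The count it reports should match the number of attempts passed to yahtzee.py.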
Multiprocessing Script
We can modify our script to use Python’s multiprocessing package. Consider a slightly modified program, yahtzee_multi.py, which takes an argument for the number of Yahtzees to throw, as well as the number of cores to use.
import random
import sys
import multiprocessing

def dice(n = 1):
    vals = [1,2,3,4,5,6]
    return([random.choice(vals) for i in range(n)])

def rollsUntilYahtzee(_):
    d = 5
    yahtzee = False
    rolls = 0
    while not yahtzee:
        roll = dice(d)
        rolls+=1
        yahtzee = len(set(roll))==1
    print(rolls)

def main(attempts,cores):
    pool = multiprocessing.Pool(cores)
    results = pool.map(rollsUntilYahtzee, range(attempts))
    pool.close()
    pool.join()

if __name__ == "__main__":
    main(attempts=int(sys.argv[1]), cores=int(sys.argv[2]))
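In the program above, each worker process prints its own result and the list returned by pool.map is not used. One possible variation, shown here only as a sketch and not as the class version, has each worker return its count and lets the parent process do all the printing. That way a single process writes the output file. The names rolls_until_yahtzee and run, and the fallback defaults for missing arguments, are assumptions made for this example.

```python
import multiprocessing
import random
import sys

def rolls_until_yahtzee(_):
    """Roll 5 dice until all match; return the number of rolls needed."""
    rolls = 0
    while True:
        roll = [random.randint(1, 6) for _ in range(5)]
        rolls += 1
        if len(set(roll)) == 1:
            return rolls

def run(attempts, cores):
    """Run `attempts` independent trials across `cores` worker processes."""
    # The context manager cleans up the pool when the block exits
    with multiprocessing.Pool(cores) as pool:
        return pool.map(rolls_until_yahtzee, range(attempts))

if __name__ == "__main__":
    # Fall back to small defaults (an assumption for this sketch) when
    # no command-line arguments are given
    attempts = int(sys.argv[1]) if len(sys.argv) > 1 else 10
    cores = int(sys.argv[2]) if len(sys.argv) > 2 else 2
    for count in run(attempts, cores):
        print(count)
```

It takes the same two arguments as yahtzee_multi.py: the number of attempts, then the number of cores.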
We’ll then make our submit script, noting that we request --cpus-per-task=10 (you can also use the shorthand -c 10) to run on 10 cores in parallel.
Let’s save it as yahtzee_multi.slurm.
#!/bin/bash
# Example of running multiprocessing python script in a batch mode
#SBATCH -J yahtzeemp
#SBATCH -p normal
#SBATCH -c 10 # CPU cores (up to 256 on normal partition)
#SBATCH -t 10:00
#SBATCH -o rollcount_mp-%j.csv
#SBATCH --mail-type=ALL
#SBATCH --mail-user=your_email@stanford.edu
# Run python script
python3 yahtzee_multi.py 10000 10
To submit this script, we run:
$ sbatch yahtzee_multi.slurm
Monitor the queue:
$ squeue
The job should take less than 10 seconds now that we are using 10 cores.
Make sure the resulting csv file contains 10,000 lines as expected.
$ wc -l rollcount_mp-*.csv
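The submit script above states the core count twice: once in the #SBATCH -c line and once as the second argument to yahtzee_multi.py. Slurm sets the SLURM_CPUS_PER_TASK environment variable inside a job when -c (or --cpus-per-task) is given, so a script can read the allocation instead of repeating the number. The helper below is a hypothetical sketch, not part of the class scripts.

```python
import os

def allocated_cores(default=1):
    """Return the core count Slurm granted, or `default` outside a job."""
    # SLURM_CPUS_PER_TASK is only present when the job requested
    # --cpus-per-task (-c); otherwise fall back to `default`
    value = os.environ.get("SLURM_CPUS_PER_TASK")
    return int(value) if value else default

if __name__ == "__main__":
    print(allocated_cores())
```

With a helper like this, the pool size could follow whatever the submit script requests, so the two numbers cannot drift apart.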