4. Scheduler

What is a scheduler?

Unlike the interactive yens (yen1, yen2, yen3, yen4 or yen5), you do not log in to the yen-slurm cluster directly. Instead, you access it through the Slurm workload manager, also known as a job scheduler or batch scheduler. Researchers submit jobs to the scheduler, requesting a certain amount of resources (CPU cores, memory, and time). Slurm then manages the queue of jobs based on what resources are available. In general, jobs that request fewer resources start sooner than jobs that request more.

Why use a scheduler?

A job scheduler has many advantages over the directly shared environment of the interactive yens:

  • Run jobs with a guaranteed amount of resources (CPU cores, memory, time)
  • Set up multiple jobs to run automatically
  • Run jobs that exceed the community guidelines on the interactive nodes
  • Gold standard for using high performance computing resources around the world

Queue of jobs

See all of the jobs in the yen-slurm queue with:

$ squeue

You will see a list of currently running jobs and a queue of pending jobs. Pending jobs start as resources become available and according to their priority in the queue.

             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
              1043    normal    a_job    user1 PD       0:00      1 (Resources)
              1042    normal    job_2    user2  R    1:29:53      1 yen10
              1041    normal     bash    user3  R    3:17:08      1 yen11
              1044       dev     bash    user3  R    1:00:08      1 yen12

  • JOBID lists a unique numeric job ID for this job.
  • PARTITION lists the partition the job is submitted to (normal, dev or long).
  • NAME lists the job name that the user specified in the submission script (if no name is supplied, the name of the submission batch script is used). Job names do not have to be unique.
  • USER indicates the yen user who submitted the job.
  • ST lists the job state. R means the job is running and PD means the job is pending in the queue.
  • TIME lists the time the job has been running. Pending jobs will have time 0:00 until they start running.
  • NODES lists how many machines (nodes) the job is running on. A value of 1 means the job is running on a single node (yen10, yen11, yen12, yen13 or yen14), 2 means it spans two nodes, and so on.
  • NODELIST(REASON) lists the hostname of the node that the job is running on (yen10, yen11, yen12, yen13 or yen14). For pending jobs, it shows the reason why the job has not started yet. Common reasons are (Resources), when the job is waiting for resources such as CPU cores or memory to become available, and (Priority), when resources are available but the job has lower priority than other jobs in the queue.

To display only your own running and queued jobs, filter the output by user:

$ squeue -u $USER

where $USER is your SUNet ID.
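
If you have many jobs, a per-state count can be easier to read than the full list. The following is a minimal sketch, not part of the yen tooling: count_states is a hypothetical helper name, the sample input is made up for illustration, and the squeue call in the comment assumes the standard -h (no header) and -o "%T" (full state name) options.

```shell
# Hypothetical helper: reads one job state per line on stdin
# and prints "STATE count" for each distinct state.
count_states() {
    sort | uniq -c | awk '{ print $2, $1 }'
}

# On the cluster you could feed it real data (assumes Slurm is available):
#   squeue -h -u "$USER" -o "%T" | count_states

# Illustrative input so the sketch runs anywhere:
printf 'RUNNING\nPENDING\nRUNNING\n' | count_states
```

With the illustrative input, the helper reports one pending job and two running jobs.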

How do I check how busy the machines are?

You can pass format options to the sinfo command as follows:

$ sinfo --format="%m | %C"
  MEMORY | CPUS(A/I/O/T)
  1031612+ | 86/394/0/480 

where MEMORY shows the minimum memory per node in megabytes (the smallest node has about 1 TB) and CPUS(A/I/O/T) shows the number of CPU cores that are allocated / idle / other / total. For example, 86/394/0/480 means that 86 CPU cores are allocated and 394 are idle (free) out of 480 total. The "other" count should be 0 unless a node is down for maintenance.
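
The A/I/O/T string can also be turned into a quick percentage. The sketch below is only an illustration: cpu_summary is a hypothetical helper name, and the commented sinfo call assumes the standard -h (no header) option and that sinfo prints a single line for the cluster, as in the output above.

```shell
# Hypothetical helper: takes a CPUS(A/I/O/T) string such as "86/394/0/480"
# and prints the allocated, idle and total core counts plus the idle
# percentage (truncated to a whole number).
cpu_summary() {
    printf '%s\n' "$1" | awk -F/ '$4 > 0 {
        printf "allocated=%d idle=%d total=%d idle_pct=%d\n", $1, $2, $4, 100 * $2 / $4
    }'
}

# On the cluster (assumes Slurm is available):
#   cpu_summary "$(sinfo -h --format='%C')"

# Using the value from the example output:
cpu_summary "86/394/0/480"
```

For the example value, 394 of 480 cores are idle, which the helper reports as idle_pct=82.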

We can also add formatting options to the squeue command:

$ squeue -o "%.18i %.9P %.8j %.8u %.8T %.10M %.10l %.4C %.7m %.15R"

            JOBID PARTITION     NAME     USER    STATE       TIME TIME_LIMIT CPUS MIN_MEM NODELIST(REASON)
            157022   normal     job1    user1  PENDING       0:00 2-00:00:00   30      1T     (Resources)
            155217   normal     job2    user2  RUNNING    8:59:39 1-00:00:00    4     80G           yen10
            157027     long     job3    user3  RUNNING    7:30:34 7-00:00:00    4     70G           yen11
          157026_1   normal job_array   user3  RUNNING       4:11    4:00:00    4     70G           yen12

where the format string selects the columns to display, adding columns for the time limit, CPU cores and minimum memory that each job requested.
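
Custom formats also make the output easy to post-process. As a hedged sketch, the helper below totals the requested CPU cores per user. cpus_by_user is a hypothetical name, the sample lines mirror the USER and CPUS columns of the example output, and the commented squeue call assumes the standard %u (user) and %C (CPU count) format fields.

```shell
# Hypothetical helper: reads "USER CPUS" lines on stdin and prints the
# total number of requested CPU cores per user, sorted by user name.
cpus_by_user() {
    awk '{ total[$1] += $2 } END { for (u in total) print u, total[u] }' | sort
}

# On the cluster (assumes Slurm is available):
#   squeue -h -o "%u %C" | cpus_by_user

# Sample data taken from the USER and CPUS columns of the example output:
printf 'user1 30\nuser2 4\nuser3 4\nuser3 4\n' | cpus_by_user
```

Note that this counts both running and pending jobs, so the totals reflect what was requested, not what is currently allocated.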

When will my job start?

You can ask the scheduler for an estimate with squeue --start and look at the START_TIME column.

$ squeue --start

             JOBID PARTITION     NAME     USER ST          START_TIME  NODES SCHEDNODES    NODELIST(REASON)
               112    normal yahtzeem  astorer PD 2020-03-05T14:17:40      1 yen10          (Resources)
               113    normal yahtzeem  astorer PD 2020-03-05T14:27:00      1 yen10          (Priority)
               114    normal yahtzeem  astorer PD 2020-03-05T14:37:00      1 yen10          (Priority)
               115    normal yahtzeem  astorer PD 2020-03-05T14:47:00      1 yen10          (Priority)
               116    normal yahtzeem  astorer PD 2020-03-05T14:57:00      1 yen10          (Priority)
               117    normal yahtzeem  astorer PD 2020-03-05T15:07:00      1 yen10          (Priority)
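
To find which of your pending jobs is expected to start first, the estimates can be sorted. This is a minimal sketch under stated assumptions: earliest_start is a hypothetical helper name, the commented squeue call assumes the standard %i (job ID) and %S (start time) format fields, and jobs without an estimate are assumed to show N/A.

```shell
# Hypothetical helper: reads "JOBID START_TIME" lines on stdin, skips jobs
# with no estimate ("N/A"), and prints the job with the earliest start time.
# ISO-style timestamps sort correctly as plain text.
earliest_start() {
    awk '$2 != "N/A"' | sort -k2,2 | head -n 1
}

# On the cluster (assumes Slurm is available):
#   squeue --start -h -u "$USER" -o "%i %S" | earliest_start

# Sample data based on the example output, plus one job without an estimate:
printf '113 2020-03-05T14:27:00\n112 2020-03-05T14:17:40\n999 N/A\n' | earliest_start
```

Keep in mind that these start times are estimates and can change as other jobs finish early or new jobs enter the queue.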