Slurm Interactive Sessions

A SLURM interactive session reserves resources on compute nodes allowing you use them interactively as you would the login node.

There are two main commands that can be used to make a session, run and salloc, both of which use most of the same options available to sbatch (see our Slurm Reference Sheet). 

Warning

An interactive session will, once it starts, use the entire requested block of CPU time and other resources unless earlier exited from, even if unused. To avoid unnecessary charges to your project, don't forget to exit an interactive session once finished.

Using 'srun --pty bash'

srun will add your resource request to the queue. When the allocation starts, a new bash session will start up on one of the granted nodes.

For example;

srun --job-name "InteractiveJob" --cpus-per-task 8 --mem-per-cpu 1500 --time 24:00:00 --pty bash

 You will receive a message.

srun: job 10256812 queued and waiting for resources

And when the job starts;

srun: job 10256812 has been allocated resources
[wbn079 ~ SUCCESS ]$

Note the host name in the prompt has changed to the compute node wbn079.

For a full description of srun and its options, see here.

Using 'salloc'

salloc functions similarly srun --pty bash in that it will add your resource request to the queue. However the allocation starts, a new bash session will start up on the login node. This is useful for running a GUI on the login node, but your processes on the compute nodes.

For example;

salloc --job-name "InteractiveJob" --cpus-per-task 8 --mem-per-cpu 1500 --time 24:00:00

 You will receive a message.

salloc: Pending job allocation 10256925
salloc: job 10256925 queued and waiting for resources

And when the job starts;

salloc: job 10256925 has been allocated resources
salloc: Granted job allocation 10256925 [mahuika01~ SUCCESS ]$

Note the that you are still on the login node mahuika01, however you will now have permission to ssh to any node you have a session on .

For a full description of srun and its options, see here.

Requesting a postponed start

salloc lets you specify that a job is not to start before a specified time, however the job may still be delayed if requested resources are not available. You can request a start time using the --begin flag.

The --begin flag takes either absolute or relative times as values.

Warning

If you specify absolute dates and/or times, Slurm will interpret those according to your environment's current time zone. Ensure that you know what time zone your environment is using, for example by running date in the same terminal session.

  • --begin=16:00 means start the job no earlier than 4 p.m. today. (Seconds are optional, but the time must be given in 24-hour format.)
  • --begin=11/05/20 means start the job on (or after) 5 November 2020. Note that Slurm uses American date formats. --begin=2020-11-05 is another Slurm-acceptable way of saying the same thing, and possibly easier for a New Zealander.
  • --begin=2020-11-05T16:00:00 means start the job on (or after) 4 p.m. on 5 November 2020.
  • --begin=now+1hour means wait at least one hour before starting the job.
  • --begin=now+60 means wait at least one minute before starting the job.

If no --begin argument is given, the default behaviour is to start as soon as possible.

While you wait

It's quite common to have to wait for some time before your interactive session starts, even if you specified that the job is to start as soon as possible.

While you're waiting, you will not have use of that shell prompt. Do not use Ctrl-C to get the prompt back, as doing so will cancel the job. If you need a shell prompt, detach your tmux or screen session, or switch to (or open) another terminal session to the same cluster's login node.

In the same way, before logging out (for example, if you choose to shut down your workstation at the end of the working day), be sure to detach the tmux or screen session. In fact, we recommend detaching whenever you leave your workstation unattended for a while, in case your computer turns off or goes to sleep or its connection to the internet is disrupted while you're away.

 

Setting up a detachable terminal

Warning

If you don't request your interactive session from within a detachable terminal, any interruption to the controlling terminal, for example by your computer rebooting or losing its connection to the internet, will permanently cancel that interactive session and remove it from the queue, whether it has started or not.

  1. Log in to a Mahuika, Māui or Māui-ancil login node.
  2. Start up tmux or screen.

Modifying an existing interactive session

Whether your interactive session is already running or is still waiting in the queue, you can make a range of changes to it using the scontrol command. Some changes are off limits for ordinary users, such as increasing the maximum permitted wall time, or unsafe, like decreasing the memory request. But many other changes are allowed.

Postponing the start of an interactive job

Suppose you submitted an interactive job just after lunch, and it's already 4 p.m. and you're leaving in an hour. You decide that even if the job starts now, you won't have time to do everything you need to do before the office shuts and you have to leave. Even worse, the job might start at 11 p.m. after you've gone to bed, and you'll get to work at 9:00 the next morning nd find that it has wasted ten wall-hours of time.

Slurm offers an easy solution: Identify the job, and use scontrol to postpone its start time.

Note

Job IDs are unique to each cluster but not across the whole of NeSI. Therefore, scontrol must be run on a node belonging to the cluster where the job is queued.

The following command will delay the start of the job with numeric ID 12345678 until 9:30 a.m. the next day:

scontrol update jobid=12345678 StartTime=tomorrowT09:30:00

This variation, if run on a Friday, will delay the start of the same job until 9:30 a.m. on Monday:

scontrol update jobid=12345678 StartTime=now+3daysT09:30:00

Warning

Don't just set StartTime=tomorrow with no time specification unless you like the idea of your interactive session starting at midnight or in the wee small hours of the morning.

Bringing forward the start of an interactive job

In the same way, you can use scontrol to set a job's start time to earlier than its current value. A likely application is to allow a job to start immediately even though it stood postponed to a later time:

scontrol update jobid=12345678 StartTime=now

Other changes using scontrol

There are many other changes you can make by means of scontrol. For further information, please see the scontrol documentation (off site).

Modifying multiple interactive sessions at once

In the same way, if you have several interactive sessions waiting to start on the same cluster, you might want to postpone them all using a single command. To do so, you will first need to identify them, hence the earlier suggestion to something specific to interactive jobs in the job name.

For example, if all your interactive job names start with the text "IJ", you could do this:

# -u $(whoami) restricts the search to my jobs only.
# The --states=PD option restricts the search to pending jobs only.
#
# Each <tab> string should be replaced with a literal tab character. If you
# can't insert one by pressing the tab key on your keyboard, you should be
# able to insert one by pressing Ctrl-V followed immediately by Ctrl-I.
#
squeue -u $(whoami) --states=PD -o "%A<tab>%j" | grep "<tab>IJ"

The above command will return a list of your jobs whose names start with the text "IJ". In this respect, it's more flexible than the -n option to squeue, which requires the entire job name string in order to identify a match.

In order to use scontrol, we need to throw away all of the line except for the job ID, so let's use awk to do this, and send the output to scontrol via xargs:

squeue -u $(whoami) --states=PD -o "%A<tab>%j" | grep "<tab>IJ" | \
awk '{print $1}' | \
xargs -I {} scontrol update jobid={} StartTime=tomorrowT09:30:00

If you want to do this automatically every working day and you have a consistent element that you use in the name of all your interactive jobs, you can set up cron jobs on Māui, Mahuika and/or Māui-ancil login nodes. This is left as an exercise for the reader, having regard to the following:

  • Time zone: Even if your environment is set up to use a different time zone (commonly New Zealand time, which adjusts for daylight saving as needed), time schedules in the crontab itself are interpreted in UTC. So if you want something to run at 4:30 p.m. New Zealand time regardless of the time of year, the cron job will need to run at 4:30 a.m. UTC (during winter) or 3:30 a.m. UTC (during summer), and you will need to edit the crontab every six months or so.
  • Weekends: If you just have a single cron job that postpones pending interactive jobs until the next day, interactive jobs pending on a Friday afternoon will be postponed until Saturday morning, which is probably not what you want. Either your cron job detects the fact of a Friday and postpones jobs until Monday, or you have two cron jobs, one that runs on Mondays to Thursdays, and a different cron job running on Fridays.

Cancelling an interactive session

You can cancel a pending interactive session by attaching the relevant session, putting the job in the foreground (if necessary) and pressing Ctrl-C on your keyboard.

To cancel all your queued interactive sessions on a cluster in one fell swoop, a command like the following should do the trick:

squeue -u $(whoami) --states=PD -o "%A<tab>%j" | grep "<tab>IJ" | \
awk '{print $1}' | \
xargs -I {} scancel {}

If you frequently use interactive jobs, we recommend doing this before you go away on leave or fieldwork or other lengthy absence.

Was this article helpful?
0 out of 0 found this helpful
a.homepage:before