Slurm Usage - a Primer

Slurm Scripts

Slurm scripts are text files you create in order to submit a job to the scheduler. A Slurm script starts with #!/bin/bash and contains a set of directives (lines starting with #SBATCH), followed by the commands to run (typically via srun):

#!/bin/bash
#SBATCH --job-name=JobName      # job name (shows up in the queue)
#SBATCH --account=nesi99999     # Project Account
#SBATCH --time=00:10:00         # Walltime (HH:MM:SS)
#SBATCH --mem-per-cpu=4096      # memory/cpu (in MB)
#SBATCH --ntasks=2              # number of tasks (e.g. MPI)
#SBATCH --cpus-per-task=4       # number of cores per task (e.g. OpenMP)
#SBATCH --partition=long        # specify a partition
#SBATCH --qos=debug             # debug jobs have increased priority but tighter restrictions
#SBATCH --hint=nomultithread    # don't use hyperthreading

srun [srun options] <executable> [executable arguments]

You don't have to specify every directive, only the ones relevant to your job.

Launching Jobs with srun

The srun command runs the executable along with its options, within the resources allocated to the job.

For MPI jobs, srun sets up the MPI runtime environment needed to run the parallel program, launching it on multiple CPUs, which can be on different nodes. srun should be used in place of any other MPI launcher, such as aprun or mpirun.
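For example, a program you might previously have started with mpirun is launched like this inside a job (my_mpi_program is a placeholder for your own executable):

# instead of: mpirun -np 16 ./my_mpi_program
srun ./my_mpi_program    # task count is taken from --ntasks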

Commonly Used Slurm Environment Variables

These can be useful within Slurm scripts:

  • $SLURM_JOB_ID (job id)
  • $SLURM_NNODES (number of nodes)
  • $SLURM_NTASKS (number of MPI tasks)
  • $SLURM_CPUS_PER_TASK (CPUs per MPI task)
  • $SLURM_SUBMIT_DIR (directory job was submitted from)
  • $SLURM_ARRAY_JOB_ID (job id for the array)
  • $SLURM_ARRAY_TASK_ID (job array index value)
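For instance, these variables can replace hard-coded values inside a script; a minimal sketch (the results directory name is purely illustrative):

#!/bin/bash
#SBATCH --job-name=envdemo
#SBATCH --time=00:01:00
#SBATCH --ntasks=2

# Create a per-job results directory under the submission directory
mkdir -p "$SLURM_SUBMIT_DIR/results_$SLURM_JOB_ID"
echo "Job $SLURM_JOB_ID: $SLURM_NTASKS tasks on $SLURM_NNODES node(s)"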

MPI Jobs

For MPI jobs you need to set --ntasks to a value larger than 1, or, if you want more control over task layout, set --ntasks-per-node and --nodes instead, as in the sketch below.
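A minimal MPI job script, following the template above, might look like this (my_mpi_program stands in for your own executable):

#!/bin/bash
#SBATCH --job-name=mpi-job
#SBATCH --account=nesi99999
#SBATCH --time=00:30:00
#SBATCH --ntasks=16             # 16 MPI tasks
#SBATCH --mem-per-cpu=2048      # memory/cpu (in MB)

srun ./my_mpi_program           # srun starts one copy per task

The same layout could instead be requested explicitly with, for example, --nodes=2 and --ntasks-per-node=8.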

OpenMP Jobs

For OpenMP jobs you need to set --cpus-per-task to a value larger than 1. Our Slurm prolog will then set OMP_NUM_THREADS equal to that number.
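A corresponding OpenMP sketch (my_openmp_program is a placeholder):

#!/bin/bash
#SBATCH --job-name=omp-job
#SBATCH --account=nesi99999
#SBATCH --time=00:30:00
#SBATCH --cpus-per-task=8       # 8 OpenMP threads

# The Slurm prolog sets OMP_NUM_THREADS=8 automatically
srun ./my_openmp_program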

Submitting a job

Use sbatch <script> to submit the job. All Slurm directives can alternatively be specified at the command line, e.g. sbatch --account=nesi12345 <script>.

Try submitting a simple job

Submit job helloworld.sl:

#!/bin/bash
#SBATCH --job-name=hello
#SBATCH --time=00:02:00

srun echo "Hello, World!"

with sbatch --account=nesi12345 helloworld.sl where nesi12345 is your NeSI project’s code. If you only have one project then you don’t need to specify it.
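On success, sbatch prints the new job's ID, and by default the job's output is written to slurm-<jobid>.out in the directory you submitted from. An illustrative session (the job ID here is made up):

sbatch --account=nesi12345 helloworld.sl
Submitted batch job 14310

cat slurm-14310.out
Hello, World!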

Checking completed jobs with sacct

Another useful Slurm command is sacct which retrieves information about completed jobs. For example:

sacct -j 14309

where the argument passed to -j is the job ID, will show us something like:

       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
14309        problem.sh       NeSI  nesi99999         80  COMPLETED      0:0
14309.batch       batch             nesi99999         80  COMPLETED      0:0
14309.0         yourapp             nesi99999         80  COMPLETED      0:0

By default sacct lists all of your jobs which were (or are) running on the current day. Each job shows as more than one line (unless -X is specified): an initial line for the job as a whole, then an additional line for each job step, i.e. the batch step (your executing script) and each of the srun commands it executes.
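With -X, only the allocation line for each job appears; for the job above the output would reduce to:

sacct -j 14309 -X
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
14309        problem.sh       NeSI  nesi99999         80  COMPLETED      0:0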

By changing the displayed columns you can gain information about the CPU and memory utilisation of the job, for example:

sacct -j 14309 --format=jobid,jobname,elapsed,avecpu,totalcpu,alloccpus,maxrss,state
      JobID    JobName    Elapsed     AveCPU   TotalCPU  AllocCPUS     MaxRSS      State
------------ ---------- ---------- ---------- ---------- ---------- ---------- ----------
14309        problem.sh   00:12:42             00:00.012         80             COMPLETED
14309.batch       batch   00:12:42   00:00:00  00:00.012         80      1488K  COMPLETED
14309.0         yourapp   00:12:41   00:12:03   16:00:03         80    478356K  COMPLETED