Slurm scripts are text files you need to create in order to submit a job to the scheduler. A Slurm script starts with #!/bin/bash (with optional flags) and contains a set of directives (which start with #SBATCH), followed by commands (at least one of which will usually start with srun):
```
#!/bin/bash -e
#SBATCH --job-name=JobName              # job name (shows up in the queue)
#SBATCH --account=nesi99999             # Project Account
#SBATCH --time=00:10:00                 # Walltime (HH:MM:SS)
#SBATCH --mem-per-cpu=4096              # memory/cpu (in MB)
#SBATCH --ntasks=2                      # number of tasks (e.g. MPI)
#SBATCH --cpus-per-task=4               # number of cores per task (e.g. OpenMP)
#SBATCH --partition=long                # specify a partition
#SBATCH --hint=nomultithread            # don't use hyperthreading
#SBATCH --output=%x-%j.out              # %x and %j are replaced by job name and ID
#SBATCH --mail-type=ALL                 # Optional: send email notifications
#SBATCH --mail-user=email@example.com   # Use with --mail-type option

srun [options] <executable> [options]
```
We strongly recommend using #!/bin/bash -e instead of plain #!/bin/bash, so that a command throwing an error will cause your job to stop immediately, rather than wasting your project's CPU core hours by continuing to work on potentially erroneous intermediate data.
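The difference can be seen with a quick local experiment (plain bash, no scheduler needed):

```shell
# Without -e, bash carries on past a failing command; with -e it stops there.
bash -c 'false; echo "kept going"'
bash -e -c 'false; echo "kept going"' || echo "stopped at the failing command"
```

The first line prints "kept going"; the second stops at `false` and the fallback message is printed instead.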
Not all directives need to be specified, just the ones you need.
Launching executables with srun
The srun command runs the executable, along with its options, within the resources allocated to the job. Although srun can be used from the command line in certain circumstances, it is best used within a Slurm job script, to be executed when the job is run by the scheduler.
For MPI jobs, srun sets up the MPI runtime environment needed to run the parallel program, launching it on multiple CPUs, which can be on different nodes. srun should be used in place of any other MPI launcher, such as mpirun.
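As a sketch, a minimal MPI job script might look like the following; the program name my_mpi_prog is a placeholder, and the account code should be replaced with your own project:

```shell
#!/bin/bash -e
#SBATCH --job-name=mpi-example
#SBATCH --account=nesi99999   # replace with your project code
#SBATCH --time=00:10:00
#SBATCH --mem-per-cpu=2048    # MB per CPU
#SBATCH --ntasks=8            # 8 MPI tasks, possibly spread over several nodes

# srun launches all 8 copies of the program and sets up the MPI environment;
# no mpirun/mpiexec is needed.
srun ./my_mpi_prog
```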
Commonly Used Slurm Environment Variables
These can be useful within Slurm scripts:
$SLURM_NNODES (number of nodes)
$SLURM_NTASKS (number of MPI tasks)
$SLURM_CPUS_PER_TASK (CPUs per MPI task)
$SLURM_SUBMIT_DIR (directory the job was submitted from)
$SLURM_ARRAY_JOB_ID (job ID for the array)
$SLURM_ARRAY_TASK_ID (job array index value)
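For instance, a job script can log its allocation using these variables. The snippet below is a sketch; the :-1 defaults are only there so it also runs outside a Slurm job, where the variables are unset:

```shell
#!/bin/bash
# Print the resources Slurm allocated to this job
# (the defaults after :- apply when run outside Slurm).
echo "Nodes:          ${SLURM_NNODES:-1}"
echo "MPI tasks:      ${SLURM_NTASKS:-1}"
echo "CPUs per task:  ${SLURM_CPUS_PER_TASK:-1}"
echo "Submitted from: ${SLURM_SUBMIT_DIR:-$PWD}"
```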
For MPI jobs you need to set --ntasks to a value larger than 1; alternatively, if you want all nodes to run the same number of tasks, set --ntasks-per-node instead.
For OpenMP jobs you need to set --cpus-per-task to a value larger than 1. Our Slurm prolog will then set OMP_NUM_THREADS to equal that number. Along with --cpus-per-task, you should set --ntasks (or --ntasks-per-node) to ensure that threading behaves correctly. For a simple OpenMP job (one that doesn't also use MPI), --ntasks=1 should suffice.
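A corresponding OpenMP-only job script sketch (the program name my_openmp_prog is a placeholder):

```shell
#!/bin/bash -e
#SBATCH --job-name=omp-example
#SBATCH --account=nesi99999   # replace with your project code
#SBATCH --time=00:10:00
#SBATCH --ntasks=1            # a single task...
#SBATCH --cpus-per-task=8     # ...with 8 cores for its OpenMP threads

# The Slurm prolog sets OMP_NUM_THREADS=8 to match --cpus-per-task.
srun ./my_openmp_prog
```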
Submitting a job
Use sbatch <script> to submit the job. All Slurm directives can alternatively be specified at the command line, e.g. sbatch --account=nesi12345 <script>. Options given on the command line override the corresponding directives in the script.
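For example, the same script can be resubmitted with different resources without editing it; the command-line values take precedence (the script name myscript.sl is a placeholder):

```shell
sbatch myscript.sl                                     # uses the #SBATCH directives as written
sbatch --time=02:00:00 --mem-per-cpu=8192 myscript.sl  # overrides walltime and memory
```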
Try submitting a simple job
```
#!/bin/bash -e
#SBATCH --job-name=hello
#SBATCH --time=00:02:00

srun echo "Hello, World!"
```
Save the script as helloworld.sl and submit it with sbatch --account=nesi12345 helloworld.sl, where nesi12345 is your NeSI project's code. If you only have one project, you don't need to specify it.
Submitting a job using GPGPU nodes
To submit to the general purpose GPU nodes, you need to add the following to your Slurm script:

```
#SBATCH -p gpu
```
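Combined with a GPU request, the directives might look like the following sketch; --gres=gpu:1 is the generic Slurm way to ask for one GPU, but the exact resource name here is an assumption, so check the cluster documentation:

```shell
#SBATCH -p gpu
#SBATCH --gres=gpu:1   # request one GPU (resource name assumed)
```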
Submitting a job between Māui and Māui_Ancil
Māui consists of the XC50 part and the CS500 (Māui_Ancil) part. To submit a job from the XC50 part (including the Māui login nodes) to the CS500 part you need to add:

```
#SBATCH --partition=nesi_prepost
```

Thus a prepost job submitted to the CS500 nodes from the Māui login node would look like:

```
#!/bin/bash -e
#SBATCH --job-name=hello
#SBATCH --time=00:02:00
#SBATCH --partition=nesi_prepost

module load Anaconda2
```
--clusters also needs to be specified for the other Slurm tools to monitor jobs on the other part.
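For example, from a Māui (XC50) login node you might query the CS500 part like this (assuming it is addressed as maui_ancil):

```shell
squeue --clusters=maui_ancil -u $USER   # queued/running jobs on the CS500 part
sacct --clusters=maui_ancil -j 14309    # accounting for a job on the CS500 part
```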
You can use squeue -u $USER to monitor your job status.
Checking completed jobs with sacct
Another useful Slurm command is sacct, which retrieves information about completed jobs. For example:

```
sacct -j 14309
```

where the argument passed to -j is the job ID. This will show us something like:
```
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
14309        problem.sh       NeSI  nesi99999         80  COMPLETED      0:0
14309.batch       batch             nesi99999         80  COMPLETED      0:0
14309.0         yourapp             nesi99999         80  COMPLETED      0:0
```
sacct will list all of your jobs which were (or are) running on the current day. Each job shows as more than one line (unless -X is specified): an initial line for the job as a whole, followed by an additional line for each job step, i.e. the batch process (your executing script) and each of the srun commands it executes.
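For instance, to suppress the per-step lines and show only the job-level summary:

```shell
sacct -X -j 14309
```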
By changing the displayed columns you can gain information about the CPU and memory utilisation of the job, for example:

```
sacct -j 14309 --format=jobid,jobname,elapsed,avecpu,totalcpu,alloccpus,maxrss,state
```

```
       JobID    JobName    Elapsed     AveCPU   TotalCPU  AllocCPUS     MaxRSS      State
------------ ---------- ---------- ---------- ---------- ---------- ---------- ----------
14309        problem.sh   00:12:42             00:00.012         80             COMPLETED
14309.batch       batch   00:12:42   00:00:00  00:00.012         80      1488K  COMPLETED
14309.0         yourapp   00:12:41   00:12:03   16:00:03         80    478356K  COMPLETED
```