Important
Partitions on these systems that may be used for NeSI workloads carry the prefix nesi_.
Definitions
CPU: A logical core, also known as a hardware thread, and referred to as a "CPU" in the Slurm documentation. Since Hyperthreading is enabled there are two CPUs per physical core, so every task, and therefore every job, is allocated an even number of CPUs.
Job: A running batch script and any other processes which it might launch with srun.
Node: A single computer within the cluster with its own CPUs and RAM (memory), and sometimes also GPUs. A node is analogous to a workstation (desktop PC) or laptop.
Walltime: Real-world elapsed time, as opposed to CPU time (walltime x CPUs). These terms are illustrated in the sketch below.
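As an illustration of how these terms map onto a batch script, here is a minimal sketch; the job name and the program launched with srun are placeholders, not anything specific to Māui:

```bash
#!/bin/bash -e
#SBATCH --job-name=example      # the job: this script plus anything it launches with srun
#SBATCH --nodes=1               # one node, i.e. one computer in the cluster
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2       # Slurm CPUs are logical cores; with Hyperthreading, 2 CPUs = 1 physical core
#SBATCH --time=01:00:00         # walltime: one hour of real-world time

srun ./my_program               # hypothetical executable
```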
Māui (XC50) Slurm Partitions
Nodes are not shared between jobs on Māui, so the minimum charging unit is node-hours, where 1 node-hour is 40 core-hours, or 80 Slurm CPU-hours.
There is only one partition available to NeSI jobs:
| Name | Nodes | Max Walltime | Avail / Node | Max / Account | Description |
| --- | --- | --- | --- | --- | --- |
| nesi_research | 316 | 24 hours | 80 CPUs, 90 or 180 GB RAM | 240 nodes, 1200 node-hours running | Standard partition for all NeSI jobs. |
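As a sketch of what a whole-node job on this partition might look like (the job name and executable are placeholders), together with the charging arithmetic from the 1 node-hour = 40 core-hours = 80 Slurm CPU-hours rule above:

```bash
#!/bin/bash -e
#SBATCH --job-name=research_example
#SBATCH --partition=nesi_research
#SBATCH --nodes=4                   # nodes are not shared, so charging is per node
#SBATCH --time=10:00:00             # must fit within the 24 hour partition limit

# Charging: 4 nodes x 10 hours = 40 node-hours
#           = 40 x 40 = 1600 core-hours
#           = 40 x 80 = 3200 Slurm CPU-hours
srun ./my_program                   # hypothetical executable
```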
Limits
As a consequence of the above limit on the node-hours reserved by your running jobs (GrpTRESRunMins in the Slurm documentation, shown in squeue output as the pending reason "AssocGrpCPURunMinutes" when you hit it), you can occupy more nodes simultaneously if your jobs request a shorter time limit:
| Nodes | Hours | Node-hours | Limits reached |
| --- | --- | --- | --- |
| 1 | 24 | 24 | 24 hours |
| 50 | 24 | 1200 | 1200 node-hours, 24 hours |
| 100 | 12 | 1200 | 1200 node-hours |
| 240 | 5 | 1200 | 1200 node-hours, 240 nodes |
| 240 | 1 | 240 | 240 nodes |
Most of the time, job priority will be the main influence on how long your jobs have to wait; the above limits are just backstops to ensure that Māui's resources are not all committed too far into the future, so that debug and other higher-priority jobs can start reasonably quickly.
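If a pending job has hit one of these limits, the reason appears in the last column of the default squeue output; for example:

```bash
# List your own jobs on the partition; for a job held by the running
# node-hours limit, the NODELIST(REASON) column shows "AssocGrpCPURunMinutes".
squeue -u $USER -p nesi_research
```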
Debug QoS
Each job has a "QoS", with the default QoS for a job being determined by the allocation class of its project. Specifying --qos=debug will override that and give the job very high priority, but it is subject to strict limits: 15 minutes per job, at most 2 nodes per job, and only 1 debug job at a time per user.
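For example, a short test run under the debug QoS might look like this sketch (job name and executable are placeholders):

```bash
#!/bin/bash -e
#SBATCH --job-name=debug_example
#SBATCH --qos=debug         # overrides the default QoS and raises priority
#SBATCH --nodes=1           # debug jobs may use at most 2 nodes
#SBATCH --time=00:15:00     # debug jobs may run for at most 15 minutes

srun ./my_program           # hypothetical executable
```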
Māui_Ancil (CS500) Slurm Partitions
| Name | Nodes | Max Walltime | Avail / Node | Max / Job | Max / User | Description |
| --- | --- | --- | --- | --- | --- | --- |
| nesi_prepost | 4 | 24 hours | 80 CPUs, 720 GB RAM | 20 CPUs, 700 GB RAM | 80 CPUs, 700 GB RAM | Pre- and post-processing tasks. |
| nesi_gpu | 4 to 5 | 72 hours | 4 CPUs, 12 GB RAM, 1 P100 GPU* | 4 CPUs, 12 GB RAM, 1 P100 GPU | 4 CPUs, 12 GB RAM, 1 P100 GPU | GPU jobs and visualisation. |
| nesi_igpu | 0 to 1 | 2 hours | 4 CPUs, 12 GB RAM, 1 P100 GPU* | 4 CPUs, 12 GB RAM, 1 P100 GPU | 4 CPUs, 12 GB RAM, 1 P100 GPU | Interactive GPU access, 7am - 8pm. |
* NVIDIA Tesla P100 PCIe 12GB card
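For interactive use of the nesi_igpu partition during its 7am - 8pm window, something along the following lines should work; this is only a sketch, and the exact options (for example --gpus-per-node versus the older --gres=gpu:1 syntax) may need adjusting for the installed Slurm version:

```bash
# Request an interactive shell on an nesi_igpu node with one GPU for up to 2 hours
srun --partition=nesi_igpu --gpus-per-node=1 --cpus-per-task=4 --mem=12G \
     --time=02:00:00 --pty bash
```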
Requesting GPUs
Nodes in the nesi_gpu partition have 1 P100 GPU card each. You can request it using:
```bash
#SBATCH --partition=nesi_gpu
#SBATCH --gpus-per-node=1
```
Note that you need to specify the name of the partition. You also need to specify a number of CPUs and amount of memory small enough to fit on these nodes.
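Putting that together, a fuller request might look like this sketch; the job name, walltime, and executable are placeholders:

```bash
#!/bin/bash -e
#SBATCH --job-name=gpu_example
#SBATCH --partition=nesi_gpu
#SBATCH --gpus-per-node=1
#SBATCH --cpus-per-task=4       # at most 4 CPUs per job on these nodes
#SBATCH --mem=12G               # at most 12 GB RAM per job
#SBATCH --time=12:00:00         # within the 72 hour limit

srun ./my_gpu_program           # hypothetical executable
```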
See GPU use on NeSI for more details about Slurm and CUDA settings.