Mahuika Slurm Partitions

Definitions

CPU - A logical core, also known as a hardware thread; referred to as a "CPU" in the Slurm documentation. Since hyperthreading is enabled, there are two CPUs per physical core, and every job is allocated an even number of CPUs.

Fairshare Weight - CPU hours are multiplied by this factor to determine usage for the purpose of calculating the fair-share score.

Node - A single computer within the cluster with its own CPUs, memory and sometimes GPUs.

Walltime - Real-world (elapsed) time, as opposed to CPU time (walltime × CPUs).

General Limits

  • No individual job can request more than 20,000 CPU hours.
  • No user can have more than 1,000 jobs in the queue at a time. This limit can be relaxed for those who need to submit large numbers of jobs, provided that they undertake to do so with job arrays (see the sketch below).
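
For example, many similar jobs can be submitted as a single job array rather than as separate jobs. The following is a minimal sketch; the program name and the way SLURM_ARRAY_TASK_ID selects an input file are illustrative:

#!/bin/bash -e
#SBATCH --job-name      array_example
#SBATCH --time          01:00:00          # Walltime per array task
#SBATCH --array         1-100             # Submit 100 tasks as one array job
# Each array task receives its own value of SLURM_ARRAY_TASK_ID,
# which can be used to select its input:
srun ./my_program input_${SLURM_ARRAY_TASK_ID}.dat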

Selecting a Partition

The partition can be specified via the appropriate sbatch option, e.g.:

#SBATCH --partition=long

If a partition is not specified, the large partition will be used.
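
The partition can also be chosen on the command line when submitting (the script name here is illustrative):

sbatch --partition=long my_job.sl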

Name       | Max Walltime | Nodes | CPUs/Node | Available Mem/CPU | Available Mem/Node | Fairshare Weight | Description
-----------|--------------|-------|-----------|-------------------|--------------------|------------------|------------
large      | 3 days       | 226   | 72        | 1500 MB           | 108 GB             | 1                | Standard partition.
long       | 3 weeks      | 69    | 72        | 1500 MB           | 108 GB             | 1                | For jobs that need to run for longer than 3 days.
prepost    | 3 hours      | 5     | 72        | 6800 MB           | 480 GB             | 1                | Use for pre- and post-processing tasks in a workflow.
bigmem     | 7 days       | 4     | 72        | 6800 MB           | 480 GB             | 2                | Partition for jobs requiring large amounts of memory.
hugemem    | 7 days       | 0.5   | 128       | 30 GB             | 4,000 GB           | 4                | Can be used to run jobs that need up to 2 TB of memory.
gpu        | 3 days       | 4     | 8         | 13500 MB          | 108 GB             | 56 / GPU         | See below for more info.
ga_bigmem  | 7 days       | 1     | 72        | 6800 MB           | 480 GB             | 2                | Only available to Genomics Aotearoa.
ga_hugemem | 7 days       | 1     | 128       | 30 GB             | 4,000 GB           | 4                | Only available to Genomics Aotearoa.

Debug QoS

Orthogonal to the partitions, each job has a "Quality of Service", with the default QoS for a job being determined by the allocation class of its project. Specifying --qos=debug overrides that and gives the job very high priority, but debug jobs are subject to strict limits: a maximum walltime of 15 minutes, only one job at a time per user, and no more than two nodes per job.
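
For example, a short test run could be submitted with a header such as the following (the job name and walltime are illustrative; the walltime must not exceed 15 minutes):

#!/bin/bash -e
#SBATCH --job-name      debug_example
#SBATCH --qos           debug             # High priority, strict limits apply
#SBATCH --time          00:10:00          # Walltime, at most 15 minutes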

Accounting for Memory

For the purposes of project accounting, jobs which use more memory per CPU than is indicated in the above table will be counted as having occupied the equivalent number of CPUs.

For example, a job requesting 4 CPUs and 12 GB of memory on the large partition (1.5 GB per CPU) is equivalent to requesting 8 CPUs (12 GB / 1.5 GB).
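
A minimal sketch of that arithmetic, assuming the charged usage is simply the larger of the CPUs requested and the requested memory divided by the partition's available memory per CPU (values are those from the example above):

# Values from the example above; mem_per_cpu_gb is the large partition's 1.5 GB
cpus=4
mem_gb=12
mem_per_cpu_gb=1.5
awk -v c="$cpus" -v m="$mem_gb" -v p="$mem_per_cpu_gb" \
    'BEGIN { e = m / p; if (e < c) e = c; print "Charged as the equivalent of", e, "CPUs" }'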

Requesting GPUs

  • In order to utilise GPUs you must use the gpu partition:
    #SBATCH --partition gpu
  • You must also specify the number of GPUs you wish to use, either 1 or 2:
    #SBATCH --gres gpu:1
  • In addition to GPUs, you can request up to four CPUs and up to 54 GB of RAM.
  • You may not request more than two GPUs per job.
  • One GPU hour is equivalent to 56 CPU hours.

Example SLURM header for using GPUs

#!/bin/bash -e
#SBATCH --job-name      GPU_example
#SBATCH --time          01:00:00          # Walltime
#SBATCH --mem           20G               # memory
#SBATCH --gres          gpu:1             # Number of GPUs to use
#SBATCH --partition     gpu               # Must use the gpu partition

Mahuika Infiniband Islands

Mahuika is divided into “islands” of 26 nodes (or 1,872 CPUs). Communication between two nodes on the same island is faster than between two nodes on different islands. MPI jobs placed entirely within one island will often perform better than those split among multiple islands.

You can request that a job runs within a single InfiniBand island by adding:

#SBATCH --switches=1

Slurm will then run the job within one island, provided that doing so does not delay starting the job by more than the maximum switch waiting time, currently configured to be 5 minutes. That waiting time limit can be reduced by adding @<time> after the number of switches, e.g.:

#SBATCH --switches=1@00:30:00
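
For example, a complete header for an MPI job that prefers a single island, but will wait at most two minutes for one, might look like this (the job name, task count and program are illustrative):

#!/bin/bash -e
#SBATCH --job-name      MPI_example
#SBATCH --time          02:00:00          # Walltime
#SBATCH --ntasks        144               # Number of MPI tasks
#SBATCH --switches      1@00:02:00        # Prefer one island, wait up to 2 minutes
srun ./my_mpi_program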