Mahuika Slurm Partitions

Definitions

CPU - A logical core, also known as a hardware thread, and referred to as a "CPU" in the Slurm documentation. Since Hyperthreading is enabled, there are two CPUs per physical core and every job is allocated an even number of CPUs.

Fairshare Weight - CPU hours are multiplied by this factor to determine usage when calculating a project's fair-share score.

Node - A single computer within the cluster with its own CPUs, memory and sometimes GPUs.

Walltime - Real-world elapsed time, as opposed to CPU time (walltime x CPUs).
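
These definitions map directly onto sbatch resource requests. A minimal sketch follows; the job name and values are only illustrative:

#!/bin/bash -e
#SBATCH --job-name      example
#SBATCH --time          02:00:00          # Walltime
#SBATCH --cpus-per-task 4                 # CPUs (logical cores)
#SBATCH --mem           6G                # Memory for the whole job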

General Limits

  • No individual job can request more than 20,000 CPU hours.
  • No user can have more than 1,000 jobs in the queue at a time. This limit can be relaxed for users who need to submit large numbers of jobs, provided they undertake to do so with job arrays (see the sketch below).
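
A job array lets a large number of similar jobs be submitted from a single script. A minimal sketch, where the program name and array range are only illustrative:

#!/bin/bash -e
#SBATCH --job-name   array_example
#SBATCH --time       01:00:00             # Walltime per array task
#SBATCH --array      1-100                # Run tasks numbered 1 to 100
srun ./my_program ${SLURM_ARRAY_TASK_ID}  # Each task receives its own index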

Selecting a Partition

The partition can be specified via an sbatch option, e.g.:

#SBATCH --partition=long

If no partition is specified, the large partition will be used.

 

Name       | Max Walltime | Nodes | CPUs/Node | Available Mem/CPU | Available Mem/Node | Fairshare Weight | Description
-----------|--------------|-------|-----------|-------------------|--------------------|------------------|---------------------------------------------------------
large      | 3 days       | 226   | 72        | 1500 MB           | 108 GB             | 1                | Standard partition.
long       | 3 weeks      | 69    | 72        | 1500 MB           | 108 GB             | 1                | For jobs that need to run for longer than 3 days.
prepost    | 3 hours      | 5     | 72        | 6800 MB           | 480 GB             | 1                | Use for pre- and post-processing tasks in a workflow.
bigmem     | 7 days       | 4     | 72        | 6800 MB           | 480 GB             | 2                | Partition for jobs requiring large amounts of memory.
hugemem    | 7 days       | 0.5   | 128       | 30 GB             | 4,000 GB           | 4                | Can be used to run jobs that need up to 2 TB of memory.
gpu        | 3 days       | 4     | 8         | 13500 MB          | 108 GB             | 56 / GPU         | See below for more info.
ga_bigmem  | 7 days       | 1     | 72        | 6800 MB           | 480 GB             | 2                | Only available to Genomics Aotearoa.
ga_hugemem | 7 days       | 1     | 128       | 30 GB             | 4,000 GB           | 4                | Only available to Genomics Aotearoa.

Debug QoS

Orthogonal to the partitions, each job has a "Quality of Service" (QoS), with the default QoS for a job determined by the allocation class of its project. Specifying --qos=debug overrides that and gives the job very high priority, but debug jobs are subject to strict limits: 15 minutes per job, only 1 job at a time per user, and no more than 2 nodes per job.
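
For example, a short test job could request the debug QoS as follows; the walltime must fit within the 15-minute limit:

#SBATCH --qos           debug
#SBATCH --time          00:10:00          # Must not exceed 15 minutes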

Accounting for Memory

For the purposes of project accounting, a job that uses more memory per CPU than the amount shown in the table above is counted as having occupied the equivalent number of CPUs.

For example, a job requesting 4 CPUs and 12 GB of memory on the large partition (1.5 GB per CPU) is accounted as if it had requested 8 CPUs (12 GB / 1.5 GB = 8).
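
To keep a job's accounted usage in line with its CPU request, memory can be requested per CPU instead of per job, using Slurm's standard --mem-per-cpu option. A sketch for the large partition:

#SBATCH --partition     large
#SBATCH --cpus-per-task 4
#SBATCH --mem-per-cpu   1500              # 1500 MB per CPU, the large partition's ratio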

 

Requesting GPUs

  • In order to utilise GPUs you must use the gpu partition:
    #SBATCH --partition gpu
  • You must also specify the number of GPUs:
    #SBATCH --gres gpu:1
  • For each GPU requested you can request up to 4 CPUs and 54 GB of memory.
  • You may not request more than 2 GPUs at a time.
  • One GPU hour is equivalent to 56 CPU hours.

Example SLURM header for using GPUs

#!/bin/bash -e
#SBATCH --job-name      GPU_example
#SBATCH --time          01:00:00          # Walltime
#SBATCH --mem           20G               # memory
#SBATCH --gres          gpu:1             # Number of GPUs to use
#SBATCH --partition     gpu               # Partition to use

Mahuika Infiniband Islands

Mahuika is divided into “islands” of 26 nodes (1,872 CPUs); nodes communicate faster with other nodes in the same island, so MPI jobs placed entirely within one island will often perform better than those split across multiple islands.

You can request that a job runs within a single InfiniBand island by adding:

#SBATCH --switches=1

Slurm will then run the job within one island, provided that this does not delay starting the job by more than the maximum switch waiting time, currently configured to be 5 minutes. That waiting time limit can be reduced by adding @<time> after the number of switches, e.g.:

#SBATCH --switches=1@00:30:00