Mahuika Slurm Partitions

Hyperthreading and CPU counts

On Mahuika and its Ancillary Nodes, hyperthreading is enabled. Accordingly, by default Slurm schedules hyperthreads (logical cores, or "CPUs" in Slurm nomenclature), of which there are 72 on each compute and large-memory node. For some applications this improves performance, but where it does not, or even degrades performance, users can request that Slurm allocate physical cores instead. To turn hyperthreading off:

  • Use the srun option or sbatch directive  --hint=nomultithread
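
For example, a minimal sbatch script requesting physical cores only might look like the following; the job name, time limit, task count, and program name are placeholders:

    #!/bin/bash
    #SBATCH --job-name=example          # placeholder job name
    #SBATCH --time=01:00:00             # placeholder wall-time limit
    #SBATCH --ntasks=4                  # placeholder task count
    #SBATCH --hint=nomultithread        # allocate physical cores rather than hyperthreads

    srun my_program                     # "my_program" is a placeholder executable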

Even though hyperthreading is enabled, resources are generally allocated to jobs at the level of whole physical cores; two different jobs will not share a physical core. For example, a job requesting resources for three tasks (three CPUs) on a hyperthreaded node will be allocated two full physical cores, since each physical core provides two CPUs.

General Limits

  1. No individual job can request more than 20,000 CPU hours.
  2. By default, no user can have more than 1,000 jobs in the queue at a time. This limit will be relaxed for those who need to submit large numbers of jobs, provided they undertake to do so with job arrays.
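
By way of illustration, large numbers of similar jobs can be packaged into a single Slurm job array; the range, file-name pattern, and program below are placeholders:

    #!/bin/bash
    #SBATCH --job-name=example_array    # placeholder job name
    #SBATCH --time=00:30:00             # placeholder wall-time limit
    #SBATCH --array=1-100               # illustrative range: 100 array tasks

    # each array task works on a different input file, selected by its index
    srun my_program input_${SLURM_ARRAY_TASK_ID}.dat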

Partitions

Table 1: Batch Queues on Mahuika

Partition  Maximum time limit  CPU cores  Max cores per user  Memory/core (GB)  Brief description / Purpose
---------  ------------------  ---------  ------------------  ----------------  ---------------------------
large      3 days              8,424      1,024               3                 Standard partition; allows large core-count jobs. Because of the job size limit above, large jobs cannot also be very long.
long       3 weeks             1,872      720                 3                 Standard partition to corral long-duration jobs. Because of the job size limit above, long jobs cannot also be very large.
prepost    3 hours             36         2                   15                Short jobs only. Use for pre- and post-processing tasks in a workflow. More memory per CPU.
bigmem     7 days              108        108                 15                Standard partition for all other "large memory" jobs.
hugemem    7 days              64         64                  62                For jobs that need more than 500 GB of memory. This is a limited resource and should only be used when "huge" memory is required.
gpu        3 days              6          2                   3                 Provides access to 2 GPGPUs per (virtual) node.
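
To target a particular partition, add a --partition directive to the job script. As a sketch, a hypothetical job that needs more memory per CPU than the default might request bigmem; all values shown are illustrative:

    #!/bin/bash
    #SBATCH --job-name=example_bigmem   # placeholder job name
    #SBATCH --partition=bigmem          # run in the bigmem partition
    #SBATCH --time=12:00:00             # placeholder wall-time limit
    #SBATCH --ntasks=1
    #SBATCH --cpus-per-task=8           # illustrative CPU count
    #SBATCH --mem-per-cpu=6G            # illustrative memory request

    srun my_program                     # placeholder executable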

Quality of Service: Debug

Orthogonal to the partitions, each job has a "QoS", the default QoS for a job being determined by the allocation class of its project. Specifying --qos=debug overrides this and gives the job very high priority, but debug jobs are subject to strict limits: 15 minutes per job, only 1 job at a time per user, and at most 72 logical cores (36 physical cores) per job.
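
For example, a short test could be submitted to the debug QoS from the command line; the script name, time, and task count are placeholders:

    sbatch --qos=debug --time=00:10:00 --ntasks=2 test_job.sl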

Accounting Cores, Memory, and GPUs

For the purposes of project accounting, jobs that use more memory per CPU than the table above indicates are counted as having occupied a correspondingly larger number of CPUs: "bigmem" CPUs count as 2 ordinary CPUs, "hugemem" CPUs as 4, and each GPU as 56 CPUs.
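
As an illustrative calculation with hypothetical numbers, a 10-hour job on 8 bigmem CPUs and a 10-hour job using one GPU would be charged roughly as follows:

    8 bigmem CPUs  x  2 CPU-equivalents each  x  10 hours  =  160 CPU-hours
    1 GPU          =  56 CPU-equivalents      x  10 hours  =  560 CPU-hours (plus any ordinary CPUs the job uses)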

Mahuika Infiniband Islands

The design of Mahuika is optimised for capacity workloads. However, the fat-tree (CLOS) InfiniBand network on Mahuika provides fully non-blocking InfiniBand connectivity within "islands" of up to 26 nodes (936 physical cores). Accordingly, jobs that run entirely within a single InfiniBand island will achieve better application scaling performance than those that cross island boundaries.

Users can request that a job runs within a single InfiniBand island by adding the Slurm directive #SBATCH --switches=1 to their batch script; this defines the maximum number of switches desired for the job allocation. We strongly advise that you also set a maximum waiting time for the requested number of switches, e.g. #SBATCH --switches=1@01:00:00 will make the scheduler wait at most one hour before ignoring the switches request.

Caution: if Slurm finds an allocation containing more switches than the count specified, the job remains pending until it either finds an allocation with the desired switch count or the maximum switch-wait time expires. To determine the default maximum wait time, run scontrol show config | grep max_switch_wait.
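
Putting the above together, a hypothetical job script that prefers to stay within a single island but gives up waiting after one hour might contain:

    #!/bin/bash
    #SBATCH --job-name=example_island   # placeholder job name
    #SBATCH --ntasks=256                # illustrative task count, fits within one island
    #SBATCH --time=06:00:00             # placeholder wall-time limit
    #SBATCH --switches=1@01:00:00       # prefer one switch; wait at most one hour for it

    srun my_program                     # placeholder executable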
