Overview & Slurm Policies

All NeSI systems use the Slurm batch system for the submission, control and management of user jobs.

Slurm has three key functions:

  1. It allocates exclusive and/or non-exclusive access to resources (compute nodes) to users for some duration of time so they can perform work;
  2. It provides a framework for starting, executing, and monitoring work on the set of allocated nodes;
  3. It arbitrates contention for resources by managing a queue of pending work.

Slurm "partitions" can be thought of as job queues, each with its own set of constraints such as job size limits, job time limits, and which users are permitted to use it.
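The partitions available on a system, together with their limits, can be inspected directly from a login node. The following is a minimal sketch using standard Slurm commands; the exact partition names, limits and node counts shown will depend on the system you are logged in to.

    # List each partition with its time limit, node count and CPUs per node
    sinfo --format="%P %l %D %c"

    # Show your own queued and running jobs
    squeue -u $USER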

The Slurm implementation on Māui and Mahuika has been designed to support:

Features common to Māui and Mahuika

  1. Fast turnaround for small debug jobs
  2. Fast turnaround for pre- and post-processing jobs
  3. Projects that have exhausted their allocations may continue to submit jobs, but at the lowest priority and with no expectation of service.
  4. Compute resources are fairly shared across the Collaborator (60%), Merit (20%, including the Proposal Development and Postgraduate classes) and Subscription (20%, including Commercial) allocation classes.
  5. A simple “queue” structure, with different partitions according to job size, hardware, and wall-clock time limits.
  6. Jobs are prevented from starting if a Project’s /nesi/nobackup inode quota has been exceeded.
  7. Backfill is used to improve utilisation.
  8. At job termination, useful information on the job’s resource usage is provided (see the sacct sketch after this list), including:
    1. Account (i.e. Project ID)
    2. User
    3. Partition
    4. QoS
    5. Allocated Cores
    6. Maximum core count used
    7. Memory allocated
    8. Maximum memory used (on any node)
    9. Total core-hours consumed by the job
    10. Elapsed wall-clock time (job run time)
    11. Elapsed wall-clock time (from job submission to job completion)
    12. Job priority
    13. Maximum number of page faults
    14. Job Exit Code
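
Most of the per-job accounting information listed above can also be retrieved after a job finishes using Slurm’s sacct command. The sketch below uses standard sacct format fields (for example MaxRSS and TotalCPU); it is a generic Slurm query rather than a NeSI-specific report, so adjust the field list to taste.

    # Summarise a completed job; replace <jobid> with the numeric job ID
    sacct -j <jobid> \
          --format=JobID,Account,User,Partition,QOS,AllocCPUS,Elapsed,TotalCPU,ReqMem,MaxRSS,ExitCode \
          --units=G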

Features that differ between Māui and Mahuika

On Mahuika:

  1. Optimised for Capacity (High Throughput) Workloads (see the example job script after this list), i.e. for:
    1. jobs using a single core,
    2. small, tightly coupled jobs that will fit within an InfiniBand Island (up to 26 nodes),
    3. embarrassingly parallel jobs up to 504 cores, and
    4. jobs that make high demands on metadata services (e.g. jobs that write and/or read large numbers of small files, or "chatter" to log files);
  2. Topology-aware scheduling;
  3. Nodes are shared.
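
To illustrate the kind of small, shared-node job Mahuika is optimised for, here is a minimal sbatch script sketch. The project code is a placeholder and no partition is specified (so the system default is used); adjust the resource requests to your own workload.

    #!/bin/bash -e
    #SBATCH --job-name=example_small       # descriptive job name
    #SBATCH --account=nesi99999            # placeholder project code; use your own
    #SBATCH --time=01:00:00                # wall-clock limit (hh:mm:ss)
    #SBATCH --ntasks=1                     # a single-core, capacity-style job
    #SBATCH --mem=2G                       # memory request (nodes are shared on Mahuika)
    #SBATCH --output=example_small.%j.out  # %j expands to the job ID

    # The actual work goes here, e.g.:
    srun ./my_program input.dat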

On Māui:

  1. Optimised for Capability Workloads (i.e. jobs using 1 to many nodes);
  2. Makes optimal use of nodes with different memory sizes – enabling opportunities for in-memory pre-emption;
  3. Jobs should not make high demands on metadata services (e.g. they should not write and/or read large numbers of small files, or chatter to log files);
  4. Nodes are not shared (see the example job script after this list).
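
By contrast, a Māui capability job requests whole nodes. The sketch below is again illustrative only: the project code is a placeholder, and the tasks-per-node figure should be set to the number of cores actually available on the nodes you are allocated.

    #!/bin/bash -e
    #SBATCH --job-name=example_capability
    #SBATCH --account=nesi99999            # placeholder project code; use your own
    #SBATCH --time=02:00:00
    #SBATCH --nodes=4                      # whole nodes (Māui nodes are not shared)
    #SBATCH --ntasks-per-node=40           # set to the cores available per node
    #SBATCH --output=example_capability.%j.out

    # Launch an MPI program across all allocated nodes
    srun ./my_mpi_program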

On the Māui Ancillary Nodes:

  1. Optimised for jobs that make high demands on metadata services
  2. Nodes are shared.

NeSI Scheduling Policies Enforced at Job Submission

  • Is the Project (Account) allowed to use the specified partition?
  • Is the Project (Account) allowed to use the specified QoS?
  • Does the job exceed the maximum core (or node) count for that partition?
  • Has the Project exceeded its /nesi/nobackup inode quota?
  • Has the Project expired?
  • Are sufficient core-hour resources available to run the job?
  • Is the user “batch disabled” (i.e. blocked from submitting jobs)?
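
Several of these checks can be anticipated before a job is submitted. The commands below are a sketch: sbatch --test-only and sacctmgr are standard Slurm tools, while nn_check_quota is assumed to be the NeSI helper for quota reporting and its name or output may differ.

    # Validate a job script and estimate when it would start, without submitting it
    sbatch --test-only my_job.sl

    # Show the accounts, partitions and QoS values your user is associated with
    sacctmgr show associations where user=$USER format=Account,Partition,QOS

    # NeSI helper (assumed name): report /nesi/nobackup disk and inode quota usage
    nn_check_quota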
