Job prioritisation

The priority assigned to each job is determined by:

  • Whether your job is a debug job;
  • The allocation class your job is assigned to (see Table 1 below);
  • How much compute resource your project has used in the recent past, compared to its expected rate of use (see Fair Share for more details, and the note following this list);
  • How long your job has been waiting in the queue;
  • Whether your job is small enough to run without impacting other scheduled jobs, and
  • Whether you have exhausted your HPC project's core hour allocation (on Mahuika) or node hour allocation (on Maui).
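
As a rough guide to the fair-share factor mentioned above: under Slurm's classic fair-share algorithm (the exact algorithm, weights and decay window used on NeSI are site configuration, so treat this as indicative only), a project's fair-share factor decays exponentially as its effective recent usage U grows relative to its normalised share S of the machine:

    F_fairshare = 2^(-U / S)

A project that has used exactly its expected share has a factor of 0.5, a project with little recent usage has a factor close to 1, and heavy recent usage pushes the factor towards 0, lowering the priority of the project's pending jobs.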

Table 1: Base job priorities on NeSI Platforms (both Maui and Mahuika)

| Base Priority    | Project Class            | QoS                  | Comment |
|------------------|--------------------------|----------------------|---------|
| Highest Priority | Merit                    | merit                | This ensures that whichever partition your job runs in, it will get the highest base priority |
| High Priority    | Institution, Subscriber  | institution          | This is your default QoS if your job is accessing institutional resources, i.e. Collaborator or Subscriber |
| Low Priority     | Proposal Development     | proposal-development | This is your default QoS if your job is a proposal development project |
| Lowest Priority  | Postgraduate             | post-graduate        | This is your default QoS if your job is part of a Post Graduate project |

The base priorities in the above table do not mean that all jobs in high-priority classes will run ahead of all jobs in lower-priority classes, but they do contribute to overall priority: for example, a job using the Merit QoS will start before a job using the Institution QoS if both were submitted at the same time and are otherwise equal.

The order in which jobs run within a partition is based on their run-time priority, which combines the base priority with the project's fair-share score, how long the job has waited, its size, and whether it can run as backfill. The base priority is modulated by the following factors (a simplified sketch follows the list):

  1. Job priority decreases whenever the project uses more core-hours than expected, across all partitions. This "fair share" policy means that projects that have consumed many CPU core hours in the recent past compared to their expected rate of use (either by submitting and running many jobs, or by submitting and running large jobs) will have a lower priority, and projects with little recent activity compared to their expected rate of use will see their waiting jobs start sooner. We do not have a strict "first-in-first-out" queue policy.
  2. Job priority increases with job wait time in the partition (unless the project has exhausted its CPU core hour allocation, in which case this does not apply). After the history-based fair-share calculation in 1), the next most important factor in each job's priority is how long the job has already waited in the partition. Among the jobs belonging to a single project, scheduling therefore most closely follows a "first-in, first-out" policy.
  3. Within any partition, job priority increases with job size, in cores. This least important factor slightly favours larger jobs, as a means of somewhat countering the inherently longer wait time necessary for allocating more cores to a single job.
  4. Where a job can run in backfill, without holding up any higher priority job, it will run immediately.
  5. All projects have the same priority in the debug QoS, so jobs submitted using "debug" are scheduled on a first-in, first-out basis.
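
To make the interaction of these factors concrete, here is a minimal sketch in the spirit of Slurm's multifactor priority plugin. It is not NeSI's scheduler configuration: the weights, the wait-time cap and the partition size are invented purely for illustration.

```python
# Purely illustrative: a toy combination of the factors described above.
# The weights and helper values are invented and do NOT reflect NeSI's
# actual Slurm configuration.

WEIGHT_FAIRSHARE = 10000   # hypothetical site-configurable weights
WEIGHT_AGE = 1000
WEIGHT_JOBSIZE = 100

def fairshare_factor(recent_usage: float, expected_share: float) -> float:
    # Factor 1: heavy recent usage relative to the project's share drives this towards 0.
    return 2 ** (-recent_usage / expected_share)

def toy_priority(recent_usage, expected_share, hours_waited, max_wait_hours,
                 cores_requested, partition_cores):
    age_factor = min(hours_waited / max_wait_hours, 1.0)  # factor 2: grows with wait time
    size_factor = cores_requested / partition_cores       # factor 3: slightly favours larger jobs
    return (WEIGHT_FAIRSHARE * fairshare_factor(recent_usage, expected_share)
            + WEIGHT_AGE * age_factor
            + WEIGHT_JOBSIZE * size_factor)

# A lightly used project with a long-waiting job outranks a heavily used
# project that has only just submitted, all else being equal.
print(toy_priority(0.1, 0.5, hours_waited=48, max_wait_hours=168,
                   cores_requested=128, partition_cores=8192))
print(toy_priority(2.0, 0.5, hours_waited=1, max_wait_hours=168,
                   cores_requested=128, partition_cores=8192))
```

On the real systems, sprio (see below) reports how much each factor contributes to the priority of your pending jobs.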

To see the recent usage and current fair-share score of a project, you can use the command nn_corehour_usage.

To see the priorities of your currently pending jobs, type the following command: sprio -u $USER

For an overview of how Slurm is configured on Mahuika and Maui, see the Slurm Job Scheduler Design section.

Backfill

Backfill is a scheduling strategy that allows small, short jobs to run immediately, provided that doing so will not delay the expected start time of any higher-priority job. Since the expected start time of pending jobs depends on the expected completion time of running jobs, backfill works well only if users set reasonably accurate job time limits.
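
The backfill test can be pictured with a minimal sketch; the job records and the single reservation below are hypothetical simplifications, not Slurm internals.

```python
# Purely illustrative: the essence of a backfill check.
from dataclasses import dataclass

@dataclass
class PendingJob:
    cores: int
    time_limit_hours: float  # the limit requested with --time, not the actual runtime

def can_backfill(job: PendingJob, idle_cores: int, reservation_starts_in_hours: float) -> bool:
    # A conservative check: the job must fit in the cores that are idle right
    # now, and its time limit must guarantee it finishes before the
    # reservation held for the highest-priority pending job begins.
    fits_now = job.cores <= idle_cores
    finishes_in_time = job.time_limit_hours <= reservation_starts_in_hours
    return fits_now and finishes_in_time

# A 2-core, 1-hour job slips in ahead of a large job reserved to start in 3 hours...
print(can_backfill(PendingJob(cores=2, time_limit_hours=1.0),
                   idle_cores=16, reservation_starts_in_hours=3.0))   # True
# ...but the same work with a padded 5-day limit cannot be backfilled,
# which is why realistic time limits matter.
print(can_backfill(PendingJob(cores=2, time_limit_hours=120.0),
                   idle_cores=16, reservation_starts_in_hours=3.0))   # False
```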

Jobs small and short enough to be backfilled also receive a low job-size score, but in our experience the ability to be backfilled is, on the whole, more valuable for getting work done quickly on the HPCs.

More information about backfill can be found here.
