On the Mahuika cluster, we use a mechanism called "Fair Share" to compute the lion's share of a job's overall priority score. We also plan to deploy this mechanism on the Māui cluster in the near future.
What is Fair Share?
Fair Share is a mechanism to set job priorities. It is based on a share of the cluster, that is, a fraction of the cluster's overall computing capacity. We set a project's expected rate of use based on that project's percentage share of all then-current allocations on that cluster. That share is in turn derived from the sizes (in core hours) and durations (in days) of the various active allocations, and thus from their expected rates of use (in cores, where one core corresponds to 24 core hours per day).
- If the size of your allocation (in core hours per day) increases, your project's share of the cluster will increase. Conversely, if the size of your allocation decreases, your project's share of the cluster will decrease.
- If the size of another project's allocation (again, in core hours per day) increases, your project's share of the cluster will decrease: your allocation's size has remained the same, but the total size of all allocations has increased. Conversely, if the size of the other project's allocation decreases, your project's share of the cluster will increase.
- If the cluster gets larger (e.g. we purchase and install more computing capacity), your project's share of the cluster will not change, but that share of the cluster will correspond to a higher rate of core hour usage. This situation will only last until more allocations are issued, or existing allocations are made larger, to take advantage of the increased capacity. The opposite will occur if the cluster shrinks, though cluster shrinkage is not expected to occur.
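The relationship between allocations and shares described above can be sketched in a few lines of Python. This is a simplified illustration only, not the scheduler's actual calculation; the project names, allocation figures, and the function expected_shares are hypothetical.

```python
HOURS_PER_DAY = 24  # one core used continuously delivers 24 core hours per day


def expected_shares(allocations):
    """Map each project to its fraction of all active allocations.

    allocations: dict of project -> (core_hours, duration_days).
    A project's expected rate of use is core_hours / duration_days.
    """
    rates = {p: ch / days for p, (ch, days) in allocations.items()}
    total = sum(rates.values())
    return {p: rate / total for p, rate in rates.items()}


# Hypothetical allocations: (core hours, duration in days)
allocations = {
    "proj_a": (240_000, 100),  # 2,400 core hours per day, i.e. 100 cores
    "proj_b": (720_000, 100),  # 7,200 core hours per day, i.e. 300 cores
}
shares = expected_shares(allocations)
print(shares["proj_a"])  # 0.25

# Doubling proj_b's allocation shrinks proj_a's share,
# even though proj_a's own allocation is unchanged.
bigger = expected_shares({**allocations, "proj_b": (1_440_000, 100)})
print(f"{bigger['proj_a']:.3f}")  # 0.143
```

Note that the size of the cluster does not appear anywhere in this sketch, which is why adding hardware leaves each project's share unchanged.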
Fair Share is not designed to ensure that all project teams get the same share of the cluster.
How does Fair Share work?
The starting point for a Fair Share calculation is a comparison of the project's actual share of use to the expected share of use. This share of use is based on what all users of the cluster have actually used during the relevant period of time, not what the cluster was capable of delivering during that same period. Currently, each period is five minutes.
Because five minutes is a very short time, Fair Share aggregates the ratio of actual share to expected share since records began on that cluster. However, the further back in time a five-minute window lies, the less influence it has on Fair Share scores. Under our current configuration, after two weeks (that is, 4,032 successive five-minute windows), the ratio for a given five-minute window counts for only half of what it did initially; after four weeks, a quarter; after six weeks, one eighth; and so on. The effect of this decay curve is that overuse or underuse in the recent past has a greater effect on your project's Fair Share score than the same extent of overuse or underuse long ago.
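As a rough illustration of this decay curve, the weight given to a five-minute window can be modelled as an exponential with a two-week half-life. The sketch below assumes only the figures stated above; the names decay_weight and WINDOWS_PER_HALF_LIFE are hypothetical, and the scheduler's internal arithmetic may differ.

```python
WINDOW_MINUTES = 5
HALF_LIFE_WEEKS = 2

# Number of five-minute windows in one half-life (two weeks)
WINDOWS_PER_HALF_LIFE = HALF_LIFE_WEEKS * 7 * 24 * 60 // WINDOW_MINUTES
print(WINDOWS_PER_HALF_LIFE)  # 4032


def decay_weight(windows_ago):
    """Relative influence of a window that ended `windows_ago` windows ago."""
    return 0.5 ** (windows_ago / WINDOWS_PER_HALF_LIFE)


for weeks in (0, 2, 4, 6):
    windows = weeks * 7 * 24 * 60 // WINDOW_MINUTES
    print(weeks, decay_weight(windows))
# 0 1.0
# 2 0.5
# 4 0.25
# 6 0.125
```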
One important implication of Fair Share is that you cannot bank core hours by refraining from submitting work. If, for example, you expect to have a lot of computational work to carry out in September, you can't get a significant priority boost in September by refraining from carrying out computational work in March. In fact, you will get the best advantage from Fair Share by submitting work at close to a constant rate.
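To see why banking does not work, consider two hypothetical projects that both do 100 units of work this week, where only one of them also did 100 units of work 26 weeks ago (roughly the gap between March and September). The sketch below uses weekly rather than five-minute buckets for brevity and assumes the two-week half-life of our current configuration; the function decayed_usage and the usage figures are illustrative, not the scheduler's real bookkeeping.

```python
def decayed_usage(usage_by_weeks_ago, half_life_weeks=2.0):
    """Sum past usage, halving each entry's weight every `half_life_weeks`."""
    return sum(
        used * 0.5 ** (weeks_ago / half_life_weeks)
        for weeks_ago, used in usage_by_weeks_ago.items()
    )


# Keys are "weeks ago", values are units of work done in that week.
idle_in_march = {0: 100.0}
busy_in_march = {0: 100.0, 26: 100.0}

idle_total = decayed_usage(idle_in_march)
busy_total = decayed_usage(busy_in_march)

# Work done 26 weeks ago has been halved 13 times, so it barely registers.
print(f"{idle_total:.4f}")  # 100.0000
print(f"{busy_total:.4f}")  # 100.0122
```

Under these assumptions, the project that stayed idle in March is recorded as having used only about 0.01% less than the one that worked, so its priority in September would be almost identical.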
If you expect that your project team will need widely varying rates of computer use during your allocation period, please contact our support team to enquire about splitting your project's allocation up into parts. Please be aware that we cannot guarantee this option will be available for any given project, and that we are most likely to be able to accommodate such a request for projects that expect to use the cluster heavily on average, can predict when they will need their heaviest use with a high degree of confidence, and give us plenty of notice.
For full details on Slurm's Fair Share mechanism, please see the Slurm documentation (offsite).
How do I check my project's Fair Share score?
- The command nn_corehour_usage <project-id>, on a Mahuika or Māui login node, will show, along with other information, the current Fair Share score and ranking of the specified project.
- The sshare command, on a Mahuika login node, will show the fair share tree. A related command, nn_sshare_sorted, will show projects in order from the highest Fair Share score to the lowest.
In our current configuration, Fair Share scores are attached to projects, not to individual users.
My project's Fair Share score is too low. How can I improve it?
If you have just carried out an unusually large burst of work, your Fair Share score will naturally be lowered for a while, and should come back to normal after a few days.
If, on the other hand, you have more work to do than expected, please contact us to apply for a larger allocation.
If you believe your project's Fair Share score has become corrupted, please get in touch with our support team.