SLURM: Best Practice

Use Appropriate Bash Header

We recommend using #!/bin/bash -e instead of plain #!/bin/bash, so that a failing command stops your job immediately. Otherwise the job continues with potentially erroneous intermediate data and wastes your project's CPU core hours.
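As a minimal sketch of the difference, the snippet below runs the same two commands with and without -e. The inline commands (false followed by echo) are stand-ins chosen for illustration, not part of any real workflow. Placing -e in the header of a job script has the same effect as calling set -e at the top of the script.

```shell
#!/bin/bash

# Without -e: the failing command is ignored and the next line still runs.
plain=$(bash -c 'false; echo "continued"' || true)

# With -e: the script stops at the failing command, so nothing is printed.
strict=$(bash -e -c 'false; echo "continued"' || true)

echo "plain:  '${plain}'"
echo "strict: '${strict}'"
```

The strict run produces no output because execution stops at the first failing command.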

Submit Shorter Jobs

The scheduler has more opportunities to find a time slot for shorter jobs. Consider using job checkpointing or job arrays.

Use Appropriate Resources 

Don't request more resources (e.g. cores, memory, GPGPUs) than you need. In addition to using up your core hours faster, resource-intensive jobs take longer to queue. Use the information provided at the completion of your job to better define resource requirements.

Choose an Accurate Wall-time

Jobs that request a long wall-time will spend more time in the queue. Leave some headroom for safety and run-to-run variability on the system, but try to be as accurate as possible. Use the information provided at the completion of your job to better estimate the wall-time you need.
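One way to turn an observed runtime into a request is sketched below. The helper name walltime_with_headroom and the 20% default margin are assumptions made for illustration, not a site recommendation. It pads an elapsed time in seconds and prints it in the days-hours:minutes:seconds form accepted by --time.

```shell
#!/bin/bash -e

# Hypothetical helper: pad an observed runtime (in seconds) by a percentage
# and print it as D-HH:MM:SS for use with --time.
walltime_with_headroom() {
    local elapsed_s=$1
    # Assumed default margin of 20%; tune this to your own workload.
    local headroom_pct=${2:-20}
    local total_s=$(( elapsed_s * (100 + headroom_pct) / 100 ))
    local days=$(( total_s / 86400 ))
    local hours=$(( (total_s % 86400) / 3600 ))
    local mins=$(( (total_s % 3600) / 60 ))
    local secs=$(( total_s % 60 ))
    printf '%d-%02d:%02d:%02d\n' "$days" "$hours" "$mins" "$secs"
}

# A previous run took 1 h 40 min (6000 s); request that plus 20%.
walltime_with_headroom 6000   # prints 0-02:00:00
```

The printed value can then be used in the job script header, for example as #SBATCH --time=0-02:00:00.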

Understand Your Fairshare Score

A low fairshare score will reduce your jobs' priority in the queue. Learn more about how to use your allocation effectively here.

Use Job Arrays

Job arrays are an efficient mechanism for managing a collection of batch jobs with identical resource requirements. Most Slurm commands can manage job arrays either as individual elements (tasks) or as a single entity (e.g. delete an entire job array in a single command).
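A job array script might look like the following sketch. The job name, array range, resource values, and the input_N.txt naming scheme are all hypothetical. The script falls back to task index 1 when SLURM_ARRAY_TASK_ID is unset, so it can be dry-run outside Slurm.

```shell
#!/bin/bash -e
# Hypothetical job name.
#SBATCH --job-name=array-demo
# Three tasks with indices 1, 2 and 3.
#SBATCH --array=1-3
# Wall-time and memory apply to each task individually.
#SBATCH --time=00:10:00
#SBATCH --mem=1G
# %A is the array job ID, %a is the task index.
#SBATCH --output=array-demo_%A_%a.out

# Slurm sets SLURM_ARRAY_TASK_ID for each task; default to 1 for dry runs.
task_id=${SLURM_ARRAY_TASK_ID:-1}

# Hypothetical naming scheme: one input file per task.
input_file="input_${task_id}.txt"

echo "Task ${task_id} would process ${input_file}"
```

Once submitted with sbatch, scancel followed by the array job ID cancels every task, while appending _2 to the job ID cancels only task 2.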

Group Like Tasks

Consider putting related work into a single Slurm job with multiple job steps, both for performance reasons and for ease of management. Each Slurm job can contain a multitude of job steps, and the overhead in Slurm for managing job steps is much lower than that of individual jobs.
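Within a batch script, each srun call starts one job step. The sketch below groups three related commands into a single job. The job name, resource values, and step contents are placeholders, and the run_step wrapper is a hypothetical convenience that runs the command directly when srun is not available, so the script can be tried outside Slurm.

```shell
#!/bin/bash -e
# Hypothetical job name and resources.
#SBATCH --job-name=steps-demo
#SBATCH --ntasks=1
#SBATCH --time=00:30:00

# Launch a command as a job step via srun; fall back to running it
# directly when srun is not installed (dry run outside Slurm).
run_step() {
    if command -v srun >/dev/null 2>&1; then
        srun "$@"
    else
        "$@"
    fi
}

# Placeholder steps; replace echo with the real commands.
run_step echo "step 0: preprocess"
run_step echo "step 1: analyse"
run_step echo "step 2: summarise"
```

Because the header uses -e, a failing step stops the job before later steps run.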

Use Absolute File Paths

An absolute path such as /nesi/project/nesi99999/outputs is preferable to a relative one such as outputs, especially if your script changes directories. If your outputs are referenced multiple times, specifying the path as a variable can help keep things tidy.
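A minimal sketch of the variable approach follows. The directory is the example path from the text above, while the variable names and file names are hypothetical.

```shell
#!/bin/bash -e

# Example path from the text above; substitute your own project directory.
OUTPUT_DIR="/nesi/project/nesi99999/outputs"

# Hypothetical file names, for illustration only.
RESULT_FILE="${OUTPUT_DIR}/result.txt"
LOG_FILE="${OUTPUT_DIR}/run.log"

# Changing directory does not affect where these paths point.
cd "${TMPDIR:-/tmp}"
echo "results -> ${RESULT_FILE}"
echo "log     -> ${LOG_FILE}"
```

If the output location changes, only the OUTPUT_DIR line needs to be edited.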

 
