Useful Slurm Commands

For full details, see the Slurm Quick Start User Guide and/or the Slurm man pages.

Note: environment variables such as SQUEUE_FORMAT and SINFO_FORMAT can set the default output format of the corresponding commands.

Controlling Jobs 

Slurm command

Purpose / Comment

sbatch

is used to submit a job script for later execution. The script will typically contain one or more srun commands to launch parallel tasks.
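As a minimal sketch of such a job script, the following writes a small script to disk and submits it only if sbatch is found on the PATH. The file name demo_job.sh, the job name, and all resource values are illustrative placeholders, not site recommendations; your cluster may also require a partition or account option.

```shell
#!/usr/bin/env bash
# Write a minimal job script. All values below are illustrative placeholders.
script=demo_job.sh
cat > "$script" <<'EOF'
#!/bin/bash
# Job name shown by squeue
#SBATCH --job-name=demo
# One node, two tasks in total
#SBATCH --nodes=1
#SBATCH --ntasks=2
# Wall-clock limit (HH:MM:SS)
#SBATCH --time=00:05:00
# Output file: %x expands to the job name, %j to the job ID
#SBATCH --output=%x-%j.out

# Each srun call launches one job step inside the allocation.
srun hostname
EOF

# Submit only when the Slurm client tools are installed.
if command -v sbatch >/dev/null 2>&1; then
    sbatch "$script"
else
    echo "sbatch not found; wrote $script without submitting"
fi
```

On a cluster, the generated file can also be submitted directly with sbatch demo_job.sh.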

salloc

is used to allocate resources for a job in real time. Typically, this is used to allocate resources and spawn a shell. The shell is then used to execute srun commands to launch parallel tasks.

srun

is used to submit a job for execution or initiate job steps in real time. srun has a wide variety of options to specify resource requirements, including: minimum and maximum node count, processor count, specific nodes to use or not use, and specific node characteristics (amount of memory, disk space, required features, etc.). A job can contain multiple job steps executing sequentially or in parallel on independent or shared resources within the job's node allocation.

sattach

is used to attach standard input, output, and error plus signal capabilities to a currently running job or job step. One can attach to and detach from jobs multiple times.

sbcast

is used to transfer a file from local disk to local disk on the nodes allocated to a job. This can be used to effectively use diskless compute nodes or provide improved performance relative to a shared file system.

scancel

is used to cancel a pending or running job or job step. It can also be used to send an arbitrary signal to all processes associated with a running job or job step.
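The two uses of scancel described above might look as follows. This is a hedged sketch: the run helper and the job ID are hypothetical, and by default the helper only prints each command so that nothing is cancelled by accident. Set DRY_RUN=0 to actually execute the commands on a cluster.

```shell
#!/usr/bin/env bash
# Print each command instead of running it unless DRY_RUN=0 is set.
# `run` is a hypothetical helper for this example, not part of Slurm.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

jobid=123456                      # hypothetical job ID; take a real one from squeue
me="${USER:-$(id -un)}"           # current user name

run scancel "$jobid"                        # cancel one job
run scancel --signal=USR1 "$jobid"          # send SIGUSR1 instead of cancelling
run scancel --user="$me" --state=PENDING    # cancel all of your pending jobs
```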

scontrol

is the administrative tool used to view and/or modify Slurm state. Note that many scontrol commands can only be executed as user root. For example:

·  to list detailed information for a job (useful for troubleshooting):
scontrol show jobid -dd <job id>

·  to show details for all jobs:
scontrol show jobs
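When troubleshooting, it is often enough to pull one or two fields out of the scontrol output. The following is a minimal sketch, assuming the Key=Value layout that scontrol prints in one-line mode (-o); job_field is a hypothetical helper name and the sample line is a shortened, made-up record. Values that contain spaces are not handled.

```shell
#!/usr/bin/env bash
# Extract one Key=Value field from scontrol's one-line output (read from stdin).
job_field() {
    tr ' ' '\n' | awk -F= -v key="$1" '$1 == key { print substr($0, length(key) + 2); exit }'
}

# Hypothetical, shortened sample of `scontrol -o show job <job id>` output.
sample='JobId=123456 JobName=demo JobState=PENDING Reason=Resources Partition=batch'

echo "$sample" | job_field JobState   # prints: PENDING
echo "$sample" | job_field Reason     # prints: Resources

# On a cluster (assumes scontrol is installed):
#   scontrol -o show job <job id> | job_field JobState
```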

squeue

reports the state of jobs or job steps. It has a wide variety of filtering, sorting, and formatting options. By default, it reports the running jobs in priority order and then the pending jobs in priority order. For example:

·  to display only the jobs of a given user:
squeue -u <username>

·  to list all the jobs for <username> in partition <queue>:
squeue -u <username> -p <queue>

·  to add more details to the list, specify an output format, e.g.:

squeue -o "%.18i %.9P %.5Q %.8j %.8u %.8T %.10M %.11l %.6D %.4C %.6b %.20S %.20R %.8q" -u $USER

The output format can also be set permanently (e.g. in $HOME/.bashrc) using the environment variable SQUEUE_FORMAT:

export SQUEUE_FORMAT="%.18i %.9P %.5Q %.8j %.8u %.8T %.10M %.11l %.6D %.4C %.6b %.20S %.20R %.8q"

For the full list of format options, see man squeue.
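The formatting options also make squeue output easy to post-process. As a small sketch, the following counts jobs per state; count_states is a hypothetical helper, and the sample input stands in for what squeue might print with the %T (job state) specifier and the header suppressed.

```shell
#!/usr/bin/env bash
# Count jobs per state from a list of state names, one per line on stdin.
count_states() {
    sort | uniq -c | awk '{ print $2, $1 }'
}

# Hypothetical sample of what `squeue -h -u <username> -o "%T"` might print.
printf 'RUNNING\nPENDING\nRUNNING\n' | count_states
# prints:
# PENDING 1
# RUNNING 2

# On a cluster (assumes squeue is installed):
#   squeue -h -u "$USER" -o "%T" | count_states
```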

sinfo

reports the state of partitions and nodes managed by Slurm. It has a wide variety of filtering, sorting, and formatting options. For example:

·  to display time limits and available nodes for a particular partition:
sinfo -p <partition name>

Note: To print the full-length names of partitions etc., run: unset SINFO_FORMAT

sview

is a graphical user interface to get and update state information for jobs, partitions, and nodes managed by Slurm.

smap

reports state information for jobs, partitions, and nodes managed by Slurm, but graphically displays the information to reflect network topology. For example:

·  to show this information, updating every 2 seconds:
smap -i 2

strigger

is used to set, get or view event triggers. Event triggers include things such as nodes going down or jobs approaching their time limit.

 

Job Accounting Information

Slurm command

Purpose / Comment

sacct

is used to report job or job step accounting information about active or completed jobs. For example:

·  to display historical information about a completed job:
sacct --format=jobid,jobname,account,partition,ntasks,alloccpus,elapsed,state,exitcode -j <jobid>

·  to display statistics on completed jobs by jobid:
sacct -j <jobid> --format=JobID,JobName,MaxRSS,Elapsed

·  to view the same information for all jobs of a user:
sacct -u <username> --format=JobID,JobName,MaxRSS,Elapsed
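To find the job step with the highest memory use, the MaxRSS column can be post-processed. The sketch below assumes pipe-delimited sacct output (--parsable2 with --noheader) in the column order JobID, JobName, MaxRSS, Elapsed; peak_rss is a hypothetical helper, the sample records are made up, and values without a unit suffix are treated as KiB, which is an assumption.

```shell
#!/usr/bin/env bash
# Print the job step with the largest MaxRSS (third '|'-separated column).
peak_rss() {
    awk -F'|' '
    function to_kib(v,    n, u) {
        n = v + 0                      # numeric prefix, e.g. 10240 from "10240K"
        u = substr(v, length(v))       # last character is the unit suffix
        if (u == "M") return n * 1024
        if (u == "G") return n * 1024 * 1024
        return n                       # "K" or no suffix: treated as KiB (assumption)
    }
    $3 != "" { k = to_kib($3); if (k > max) { max = k; step = $1 } }
    END { if (step != "") print step, max "K" }
    '
}

# Hypothetical sample of:
#   sacct -j <jobid> --format=JobID,JobName,MaxRSS,Elapsed --parsable2 --noheader
sample='123456|demo||00:10:02
123456.batch|batch|10240K|00:10:02
123456.0|hostname|2048K|00:00:01'

printf '%s\n' "$sample" | peak_rss   # prints: 123456.batch 10240K
```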

sstat

reports accounting information on currently running jobs and job steps. For example:

·  to list the status for a currently running job:
sstat --format=AveCPU,AvePages,AveRSS,AveVMSize,JobID -j <jobid> --allsteps

sreport

reports resource usage by Slurm cluster, partition, user, account (project ID), etc.

sprio

displays the factors used to determine a job’s priority.

sshare

displays current fair-share information. For example:

·  to find out your fair-share score:
sshare -u <username>

 

Useful commands to help understand when jobs will run

Slurm command

Purpose / Comment

squeue

List all the jobs

squeue -u $USER

List all your jobs

sprio -w

To view the configured weight of each priority factor

sprio -u $USER

To view the priorities of your pending jobs

 

Advanced Job Accounting Information

Slurm command

Purpose / Comment

sacctmgr

is used to view or modify Slurm account information. For example:

·  to list Accounts:
sacctmgr list accounts

·  to list Account Associations:
sacctmgr list accounts withassoc

·  to show the Associations tree:
sacctmgr list associations cluster=cluster_name format=Account,Cluster,User,Fairshare tree withd

·  to show the available QoS:
sacctmgr show qos format=name,priority

 
