Submitting your first job

Environment Modules

Modules are a convenient way to provide access to applications on the cluster. They prepare the environment you need to run an application.

For a full list of module commands, run man module.

module spider                 Lists all available modules (Mahuika only).
module spider [module name]   Searches available modules for [module name] (Mahuika only).
module show [module name]     Shows information about [module name].
module load [module name]     Loads [module name].
module list                   Lists currently loaded modules.
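
As a rough illustration, the commands above might be combined as follows. This is only a sketch: "Python" is a hypothetical module name chosen for the example, and the guard around the commands is there so the snippet does nothing harmful on a machine where the module command is not available.

```shell
# "Python" is a hypothetical module name used only for illustration;
# replace it with a module that exists on the cluster.
module_name="Python"

if type module >/dev/null 2>&1; then
    module show "$module_name"   # inspect what the module would change
    module load "$module_name"   # prepare the environment
    module list                  # confirm the module is now loaded
else
    echo "module command not found; run this on the cluster" >&2
fi
```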


Testing

Before submitting a job to the scheduler, it is good practice to run a small test of your code to confirm there are no errors.

You are allowed to use the login node for this purpose, provided the resource usage is minimal.

Warning

Jobs running on the login node for long periods of time or using large numbers of CPUs will be killed.
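
One way to keep a test run short is to put a time limit on it yourself. The sketch below assumes the GNU coreutils timeout command is available (it is on most Linux systems); "sleep 1" is only a stand-in for a small test run of your own program, and the 30 second limit is an arbitrary choice.

```shell
# "sleep 1" is a stand-in for a small test run of your own program.
# timeout stops the command if it is still running after 30 seconds.
if timeout 30 sleep 1; then
    test_status=0
else
    test_status=$?
fi

# timeout exits with status 124 when the time limit was reached.
if [ "$test_status" -eq 124 ]; then
    echo "Test was stopped after reaching the time limit"
else
    echo "Test finished with exit status $test_status"
fi
```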

Slurm

Jobs on Mahuika/Maui are submitted in the form of a batch script containing the code you want to run and a header of information needed by our job scheduler, Slurm.

Creating a batch script

Create a new file and open it with nano myjob.sl

#!/bin/bash -e
#SBATCH --job-name=SerialJob   # Job name (shows up in the queue)
#SBATCH --time=00:01:00        # Walltime (HH:MM:SS)
#SBATCH --mem=3000             # Memory in MB

pwd                            # Prints working directory

Copy in the above text, then save and exit the text editor with Ctrl + X (nano will ask whether to save; press Y, then Enter to confirm the file name).

Note: if you are a member of multiple accounts, you should add the line #SBATCH --account=<projectcode> to the script header.
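
If you prefer not to use an editor, the same file could be written from the command line. The following is a sketch rather than the recommended method: it writes the script with a here-document, includes the optional account line (leave <projectcode> as a placeholder until you know your own project code), and then uses bash -n to check the file for syntax errors without running it.

```shell
# Write the batch script; the quoted 'EOF' stops the shell from
# expanding anything inside the here-document.
cat > myjob.sl << 'EOF'
#!/bin/bash -e
#SBATCH --job-name=SerialJob    # Job name (shows up in the queue)
#SBATCH --account=<projectcode> # Replace <projectcode> with your project code
#SBATCH --time=00:01:00         # Walltime (HH:MM:SS)
#SBATCH --mem=3000              # Memory in MB

pwd                             # Prints working directory
EOF

# Syntax check only; nothing in the script is executed.
bash -n myjob.sl && echo "myjob.sl: syntax OK"
```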

Submitting

Jobs are submitted to the scheduler using:

sbatch myjob.sl

You should receive output similar to:

Submitted batch job 1748836
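
The number at the end of that message is the job ID, which later commands such as scancel need. A minimal sketch of pulling it out in a script is shown below; it works on the sample message above so that it can be tried anywhere, and the variable names are only illustrative. On the cluster you would capture the real message with submit_output=$(sbatch myjob.sl), or use sbatch --parsable, which prints the job ID without the surrounding text.

```shell
# Sample message copied from above; on the cluster use:
#   submit_output=$(sbatch myjob.sl)
submit_output="Submitted batch job 1748836"

# Remove everything up to and including the last space,
# leaving only the job ID.
job_id=${submit_output##* }
echo "Job ID: $job_id"
```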

Job Queue

The currently queued jobs can be checked using:

squeue

You can filter to just your jobs by adding the -u flag:

squeue -u usr9999

where 'usr9999' is replaced with your username.
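
To avoid typing your username each time, the shell can fill it in for you. This sketch assumes your cluster username is the one you are logged in with; the guard only exists so the snippet fails gracefully on a machine without Slurm.

```shell
# Use the login name of the current user; fall back to id if USER is unset.
current_user=${USER:-$(id -un)}

if command -v squeue >/dev/null 2>&1; then
    squeue -u "$current_user"
else
    echo "squeue not found; run this on the cluster" >&2
fi
```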

 

You can check all jobs submitted by you in the past day using:

sacct

Or since a specified date using:

sacct -S YYYY-MM-DD

Each job will show as multiple lines: one line for the parent job, then an additional line for each job step.

Tips

sacct -X Shows only the parent job line for each job, hiding the job steps.

sacct --state=[state] Filters jobs by state, for example PENDING, RUNNING, FAILED, CANCELLED or TIMEOUT. Several states can be given as a comma-separated list.
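
These options can be combined. As a sketch, the following lists only the parent lines of jobs from the last seven days that failed or timed out. It assumes GNU date (for the -d option) to work out the start date; the seven day window and the chosen states are arbitrary examples.

```shell
# Work out the date seven days ago in YYYY-MM-DD form (GNU date).
start_date=$(date -d "7 days ago" +%F)
echo "Listing failed or timed-out jobs since $start_date"

if command -v sacct >/dev/null 2>&1; then
    sacct -X -S "$start_date" --state=FAILED,TIMEOUT
else
    echo "sacct not found; run this on the cluster" >&2
fi
```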

Canceling

scancel [jobid] Cancels the job with the ID [jobid]. The job ID can be obtained using sacct or squeue.

Tips

scancel -u [username] Cancels all jobs submitted by you, where [username] is your username.

scancel {[n1]..[n2]} Cancels all jobs with an ID from [n1] to [n2] inclusive. The braces are expanded by bash into a list of job IDs before scancel runs.
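
Because the range form cancels several jobs at once, it can be worth previewing what bash will expand it to. The sketch below is a dry run: putting echo in front prints the command instead of running it. The job IDs are hypothetical, and the second half shows seq as one possible alternative when the range limits are held in variables, since brace ranges do not expand variables.

```shell
# Dry run: echo prints the command after bash has expanded the braces.
preview=$(echo scancel {1748836..1748839})
echo "$preview"

# Brace expansion happens before variable expansion, so {$a..$b}
# is not expanded; seq is one alternative for variable limits.
first_id=1748836
last_id=1748839
preview_seq=$(echo scancel $(seq "$first_id" "$last_id"))
echo "$preview_seq"
```

Once the printed command lists the jobs you expect, remove the echo to actually cancel them.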

Job Output

When the job completes, the following files will have been added to the directory the job was submitted from:

slurm-[jobid].out containing standard output.

slurm-[jobid].err containing the standard error.
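
A quick way to look at both files is sketched below. The job ID is the sample one used earlier and should be replaced with your own; the check on each file means the snippet simply reports a file that is missing or empty rather than failing.

```shell
# Sample job ID from earlier; replace with your own.
job_id=1748836
out_file="slurm-${job_id}.out"
err_file="slurm-${job_id}.err"

for f in "$out_file" "$err_file"; do
    if [ -s "$f" ]; then
        echo "== $f =="
        cat "$f"
    else
        echo "$f is missing or empty"
    fi
done
```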

 

 
