MATLAB

MATLAB is a high-level language and interactive environment that enables you to perform computationally intensive tasks faster than with traditional programming languages such as C, C++, and Fortran.

Example script

Simple script.

#!/bin/bash -e
#SBATCH --job-name      MATLAB_job    # Name to appear in squeue
#SBATCH --time          01:00:00      # Max walltime
#SBATCH --mem           1500          # Max memory (MB)

module load MATLAB/2018b
# Run the MATLAB script MATLAB_job.m
matlab -nodisplay < MATLAB_job.m

Extended paths script.

#!/bin/bash -e
#SBATCH --job-name      MATLAB_job    # Name to appear in squeue
#SBATCH --time          06:00:00      # Max walltime
#SBATCH --mem           6000          # Max memory (MB)
#SBATCH --cpus-per-task 4             # 2 physical cores
#SBATCH --output        %x.log        # Location of output log

module load MATLAB/2018b

#Job run
matlab -nodisplay -r "addpath(genpath('../parentDirectory'));myFunction(5,20)"

 

Parallelism

MATLAB does not support MPI, so #SBATCH --ntasks should always be 1. However, given the necessary resources, some MATLAB functions can make use of multiple threads (--cpus-per-task) or GPUs (--gres gpu).
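For example, many built-in functions will run on a GPU when given gpuArray inputs. The snippet below is a minimal sketch, assuming the Parallel Computing Toolbox is available and the job was submitted with --gres gpu; the matrix size is just a placeholder.

A = gpuArray(rand(4000));   % Copy data to the GPU
B = A * A;                  % Matrix multiplication runs on the GPU
C = gather(B);              % Copy the result back to host memory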

Implicit parallelism.

Implicit parallelism requires no changes to your code. By default MATLAB will use multi-threading for a wide range of operations. Scalability will vary, but generally you will not be able to make use of more than 4-8 CPUs this way.
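As a minimal sketch, if you want to cap MATLAB's implicit multi-threading to match the CPUs allocated by SLURM (assuming SLURM_CPUS_PER_TASK is set in your job environment), maxNumCompThreads can be used:

nCPUs = str2num(getenv('SLURM_CPUS_PER_TASK'));   % CPUs allocated by SLURM
maxNumCompThreads(nCPUs);                         % Cap MATLAB's computational threads
disp(maxNumCompThreads)                           % Confirm the current limit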

Explicit parallelism.

Explicit parallelism is when you write your code specifically to make use of multiple CPUs. This can be done using MATLAB's parpool-based language constructs; MATLAB assigns each thread a 'worker' that can be given sections of code to execute.

MATLAB will make temporary files under your home directory (in ~/.matlab/local_cluster_jobs) for communication with worker processes. To prevent simultaneous parallel MATLAB jobs from interfering with each other you should tell them to each use their own job-specific local directories:

pc = parcluster('local');                             % Use the local cluster profile
pc.JobStorageLocation = getenv('TMPDIR');             % Keep worker files in a job-specific directory
parpool(pc, str2num(getenv('SLURM_CPUS_PER_TASK')));  % Start one worker per allocated CPU

Note

Parpool will throw a warning when started due to a difference in how the time zone is specified. To fix this, add the following line to your SLURM script: export TZ="Pacific/Auckland"
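Putting this together, a parallel MATLAB job script might look like the following sketch (the resource values and the script name parallel_example.m are placeholders, not recommendations):

#!/bin/bash -e
#SBATCH --job-name      MATLAB_parpool
#SBATCH --time          02:00:00
#SBATCH --mem           8000
#SBATCH --cpus-per-task 8                  # Workers for parpool

export TZ="Pacific/Auckland"               # Avoid the parpool time zone warning

module load MATLAB/2018b
# parallel_example.m should contain the parcluster/parpool setup shown above.
matlab -nodisplay < parallel_example.m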

The main ways to make use of parpool are:

parfor: Executes each iteration of a loop on a different worker, e.g.

parfor i=1:100
    % Your operation here.
end

parfor operates similarly to a SLURM job array and must be embarrassingly parallel. Therefore all variables need to be either local (used internally within one iteration of the loop) or static (not changing during execution of the loop), as in the sketch below.
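For illustration (the variable names here are hypothetical), scale is static, tmp is local to each iteration, and each iteration writes only to its own element of results:

scale = 2.5;                 % Static: read-only inside the loop
results = zeros(1, 100);     % Sliced output: each iteration writes its own element
parfor i = 1:100
    tmp = scale * i^2;       % Local: exists only within this iteration
    results(i) = tmp;
end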


parfeval:

parfeval is used to assign a particular function to a worker, allowing it to be run asynchronously, e.g.

my_coroutine = parfeval(@my_async_function, 2, in1, in2);

% Do something that doesn't require outputs from 'my_async_function'.

[out1, out2] = fetchOutputs(my_coroutine); % If 'my_coroutine' has not finished, execution will pause here.

function [out1, out2] = my_async_function(in1, in2)
    % Your operation here.
end

fetchOutputs is used to retrieve the values.



Determining which of these categories (local or static) your variables fall under is a good place to start when attempting to parallelise your code.

Note

If your code is parallel at a high level, it is preferable to use SLURM job arrays, as there is less computational overhead and the multiple smaller jobs will queue faster.
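For example, a job array script might look like the following sketch, where run_case.m is a hypothetical MATLAB function that takes the array index as its argument:

#!/bin/bash -e
#SBATCH --job-name   MATLAB_array
#SBATCH --time       01:00:00
#SBATCH --mem        1500
#SBATCH --array      1-100             # One MATLAB run per array index

module load MATLAB/2018b
# Pass the array index to the (hypothetical) function run_case.
matlab -nodisplay -r "run_case(${SLURM_ARRAY_TASK_ID}); exit()"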
