Permanent Cluster-Wide Directories
Data in these directories is available from everywhere in the cluster. Directory contents are persistent, and in many cases are backed up regularly. However, reading from and writing to these directories is relatively slow, and so they are typically better suited for medium-term storage of user and project data than as workspaces for active jobs.
As a user of the Pan cluster, you will have your own home directory. This directory is private (meaning only you and authorised NeSI staff will have access to it), and, unless we arrange otherwise with you, is regularly backed up. We will not ordinarily delete any data in your home directory as long as your cluster account remains active.
Your home directory is useful for the following types of data:
- Configuration files, including dot files (which are ordinarily hidden and have names starting with ".")
- Short shell scripts and other source code, job command templates, etc.
It should not, as a rule, be used for any research data, even if you're the only one working on the project concerned. Instead, we recommend that you either request a project with yourself as the sole member, or explore file permission options if the data relates to a project but needs to be kept secret from other project team members.
You can expect your home directory to have a disk space allocation of 2 GB. Because home directories are not intended to hold research data, we will only grant extra disk space to a home directory in exceptional circumstances.
Every currently active project on the Pan cluster has a project directory. Each project directory is accessible to members of the project team (normally including the project team's adviser) and to other relevant NeSI staff. Data in project directories is backed up regularly except where otherwise arranged.
You may have access to several different project directories, depending on which project teams you belong to.
Project directories are suitable for project data that you intend to keep on the cluster for further work (post-processing, analysis, or input into subsequent jobs), or as a staging area for copying to and from other systems such as a personal workstation or institutional file storage device.
A typical project directory on the Pan cluster has an initial allocation of 30 GB of disk space and 1,000,000 files (including subdirectories). If at any time you believe you need more disk space or more files than your current limits allow, please email our support team at firstname.lastname@example.org.
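If you want a rough idea of how close you are to these limits, the standard du and find tools can give an estimate (the cluster may also provide dedicated quota-reporting commands; ask support if unsure). The directory path below is a placeholder:

```shell
# Estimate usage of a project directory. PROJECT_DIR is a placeholder;
# on the cluster you would use your actual project directory path.
PROJECT_DIR=${PROJECT_DIR:-.}

# Total disk usage, human-readable.
du -sh "$PROJECT_DIR"

# Number of files and subdirectories, which also counts against the limit.
find "$PROJECT_DIR" | wc -l
```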
Temporary Cluster-Wide Directories
We have a parallel file system set up especially as a temporary file system for running jobs. This file system is designed to give faster performance than is available for home or project directories. However, because this increased performance comes at a higher cost per gigabyte of storage space, the temporary file system is relatively small.
The Scratch Directory
The Scratch directory (which can be accessed from running jobs as $SCRATCH_DIR) is a fast directory whose contents will be automatically deleted by the scheduler when the job ends or fails. Its ideal use is for data that does not need to be saved. Data written to the Scratch directory should be strictly temporary, having no value as output or logging even if the job fails, and not providing any useful troubleshooting data.
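A typical pattern is to point intermediate files at $SCRATCH_DIR from within the job script. A minimal sketch (the input data and commands are illustrative stand-ins for a real application; inside a real job the scheduler sets $SCRATCH_DIR for you, and the mktemp fallback below only exists so the sketch runs anywhere):

```shell
#!/bin/sh
# Sketch: keep intermediate files in the Scratch directory.
# The scheduler sets SCRATCH_DIR inside a real job; we fall back to a
# throwaway directory so this sketch also runs outside the cluster.
SCRATCH_DIR=${SCRATCH_DIR:-$(mktemp -d)}

# Stand-in for real input data (placeholder).
printf 'b\na\nb\n' > "$SCRATCH_DIR/input.txt"

# Intermediate file: worthless once the job ends, so it belongs in Scratch.
sort "$SCRATCH_DIR/input.txt" > "$SCRATCH_DIR/sorted.tmp"

# The final result would normally be written to a project directory instead.
uniq -c "$SCRATCH_DIR/sorted.tmp" > results.txt

# No cleanup needed: the scheduler deletes the Scratch contents at job end.
```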
The Checkpoint Directory
The Checkpoint directory (which can be accessed from running jobs as $CHK_DIR) is a fast directory whose contents will be retained for a short time after the job ends or fails. However, a cleanup program runs at regular intervals, removing all files in the Checkpoint directory that have not been modified within the last four weeks (28 days).
The Checkpoint directory is ideally suited for the following kinds of data:
- Job output data, especially output data that is frequently written to as the job progresses, and including log files
- Checkpoint or restart files (these may be known by various names depending on the terminology used by your application)
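The checkpoint-and-restart pattern in the list above can be sketched as a loop that saves its state to $CHK_DIR after each step and resumes from that state if a previous run was interrupted. The file name and the notion of "state" here are illustrative, not any real application's format, and the mktemp fallback only exists so the sketch runs outside the cluster:

```shell
#!/bin/sh
# Sketch: resume-from-checkpoint loop using the Checkpoint directory.
# The scheduler sets CHK_DIR inside a real job; fall back for local runs.
CHK_DIR=${CHK_DIR:-$(mktemp -d)}
CKPT="$CHK_DIR/restart.dat"    # illustrative checkpoint file name

# Resume from the last checkpoint if one survived a previous run.
if [ -f "$CKPT" ]; then
    step=$(cat "$CKPT")
else
    step=0
fi

while [ "$step" -lt 5 ]; do
    step=$((step + 1))
    # ... the real work for this step would happen here ...
    echo "$step" > "$CKPT"     # save restart state after each step
done
echo "finished at step $step"
```

If the job is killed mid-run, a resubmitted job picks up from the last saved step instead of starting over, as long as the checkpoint file has been modified within the cleanup window.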
Temporary Node-Bound Directories
In some cases (notably serial, i.e., single-core, or shared-memory jobs), it can make sense to write your temporary data to a directory on the node on which your job is running. This works well because directories on the local node can be accessed directly, without involving the network. On the other hand, storage space on the node is limited, and these constraints should be borne in mind before relying on local storage.
Once your job finishes, you will not be able to log in interactively to the relevant compute node to retrieve any data in node-bound directories. Such directories should therefore only be used for data that can safely be removed when the job completes, as is the case for $SCRATCH_DIR on the shared file system.
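Because node-bound data becomes unreachable when the job ends, any results produced there must be copied back to a cluster-wide directory as the last step of the job script. A minimal sketch (the directory fallbacks and file name are placeholders so the sketch runs anywhere; on the cluster $TMP_DIR is set on the node and the destination would be your project directory):

```shell
#!/bin/sh
# Sketch: work in node-local storage, then copy results back before exit.
# TMP_DIR is set on the node in a real job; PROJECT_DIR stands in for your
# project directory. Both fall back to throwaway paths for local runs.
TMP_DIR=${TMP_DIR:-$(mktemp -d)}
PROJECT_DIR=${PROJECT_DIR:-$(mktemp -d)}

# Do the work against fast local storage (placeholder for real computation).
echo "result data" > "$TMP_DIR/output.dat"

# Last step: copy anything worth keeping to cluster-wide storage, because
# node-local files cannot be retrieved after the job finishes.
cp "$TMP_DIR/output.dat" "$PROJECT_DIR/"
```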
The Temporary Directory
The temporary directory ($TMP_DIR) is a directory created on the local node. It resides on a physical disk and is shared not only with other jobs that may be running on that node and using its temporary directory, but also with operating system tasks. It has a total capacity of between 100 GB and 300 GB, depending on the node.
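Since $TMP_DIR is shared with other jobs and with the operating system, it can be worth checking free space before writing large files there. A small sketch using the standard POSIX df tool (the /tmp fallback is only there so the sketch runs outside a job):

```shell
#!/bin/sh
# Sketch: check available space on the node's temporary directory before use.
TMP_DIR=${TMP_DIR:-/tmp}   # set on the node in a real job; /tmp as fallback

# Available kilobytes on the file system holding TMP_DIR.
# With df -P, the fourth field of the second output line is "Available".
avail_kb=$(df -P "$TMP_DIR" | awk 'NR==2 {print $4}')
echo "Available space in $TMP_DIR: ${avail_kb} KB"
```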
The Shared-Memory Directory
The shared-memory directory ($SHM_DIR) is virtual hard disk space on the local node, created out of the RAM that has been assigned to your job. Because the data in $SHM_DIR is stored in memory and never written to a physical hard disk, reading and writing are very fast, and will not be contested by other running processes. On the other hand, the storage space is necessarily small (no more than the total memory assigned to your job), and any memory committed to $SHM_DIR is taken away from what your job can use as working memory.
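Using $SHM_DIR can be sketched as ordinary file I/O against an in-memory path; the key point is that every byte written there counts against the job's memory allocation. The file name below is illustrative, and the mktemp fallback only exists so the sketch runs outside a job (on the cluster, $SHM_DIR is the RAM-backed directory set up for your job):

```shell
#!/bin/sh
# Sketch: use the shared-memory directory for very fast temporary I/O.
# SHM_DIR is carved out of the job's RAM on a real node; we fall back to a
# throwaway directory here so the sketch runs anywhere.
SHM_DIR=${SHM_DIR:-$(mktemp -d)}

scratch="$SHM_DIR/fast-$$.tmp"       # $$ keeps the name unique per process
printf 'hot data\n' > "$scratch"     # on the cluster this consumes job memory

# ... repeated fast reads and writes would happen here ...
cat "$scratch"

rm -f "$scratch"                     # delete early to give the memory back
```

Deleting shared-memory files as soon as they are no longer needed, rather than at job end, frees that RAM for the rest of the job's work.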