NeSI File Systems and Quotas

Māui and Mahuika, along with all the ancillary nodes, share access to the same Spectrum Scale filesystems. (Spectrum Scale was previously known as GPFS, the General Parallel File System.) Where the default quotas for the /home and /nesi/project filesystems are unsuitable, they will be adjusted to meet requirements. The table below shows the default allocations on the /home, /nesi/project, /nesi/nobackup and /nesi/nearline filesystems.

Table 1: Specifications of filesystems available to users of Māui and Mahuika.

| Filesystem | /home | /nesi/project | /nesi/nobackup | /nesi/nearline |
|---|---|---|---|---|
| Default disk space quota | 20 GB | 100 GB (per project) | No limit | No limit |
| Default file count quota | 100,000 files | 100,000 files | 1,000,000 files | 500,000 files, each no smaller than 5 MB |
| Intended use | User-specific files such as configuration files, environment setup, source code, etc. | Persistent project-related data | Data created by compute jobs that is intended to be temporary | Long-term archive storage |
| Total capacity | 175 TB | 1,590 TB | 4,400 TB | >100 PB (media funded by projects) |
| Expiry | When the user is no longer a member of any active project | 90 days after the end of the project | When the Librarian Service is available, files will be deleted after being untouched for 60 days, or earlier if space is required (see note 3) | 365 days after the end of the project (unless agreed otherwise) |
| Data backup | Daily; the last 10 versions of any given file are retained for up to 90 days | Daily; the last 10 versions of any given file are retained for up to 90 days | None | Replicated to offsite tape library |
| Snapshots | Daily (retention period: 7 days) | None | None | None |
| Access speed | Moderate | Moderate | Fast | Slow; only accessible via the Librarian Service |

Notes:

  1. Please build large software projects that will not fit in /home under /nesi/project/<project-id>; do the same if the software is for shared use within a project.
  2. As the /nesi/nobackup filesystem provides the highest performance, input files should be moved or copied to it before starting any job that uses them, and job scripts should likewise write their output files to /nesi/nobackup. If you wish to keep your data for the long term, the final part of your job script can copy or move the output data to the /nesi/project filesystem; a minimal script illustrating this pattern is sketched after these notes.
  3. Keep in mind that data on /nesi/nobackup is not backed up, so users are advised to move valuable data to the /nesi/project filesystem (or to /nesi/nearline via the Librarian [Coming Soon]) as soon as batch jobs are completed. Please do not use the touch command to prevent the cleaning policy from removing files, because this behaviour would deprive the community of a shared resource. However, until the Librarian Service is available, no files will be deleted without discussion with the Project Owner.
  4. If you have accidentally deleted data, first check whether it is present in a recent snapshot. If you cannot find it in a snapshot, please ask us to recover it for you by emailing NeSI Support.
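
As an illustration of the pattern in note 2, here is a minimal Slurm batch script sketch; the project code nesi99999, the directory layout and the my_analysis program are placeholders, not real NeSI names.

```
#!/bin/bash -e
#SBATCH --job-name=stage-example
#SBATCH --time=01:00:00
#SBATCH --mem=4G

# Stage input data onto the fast (but not backed up) filesystem.
mkdir -p /nesi/nobackup/nesi99999/run01
cp /nesi/project/nesi99999/inputs/data.in /nesi/nobackup/nesi99999/run01/

# Run the job from /nesi/nobackup so all heavy I/O happens there.
cd /nesi/nobackup/nesi99999/run01
./my_analysis data.in > results.out

# Finally, copy results worth keeping back to persistent project storage.
cp results.out /nesi/project/nesi99999/results/
```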

/home

This file system is accessible from login, compute and ancillary nodes. Users should not run jobs from this filesystem. All home directories are backed up daily, both via the Spectrum Protect backup system, which retains the last 10 versions of all files for up to 90 days, and via Spectrum Scale snapshots. No cleaning policy is applied.

You cannot exceed your disk space (20 GB) or inode (100,000 files) quotas.
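
To see how close you are to these limits, standard commands are enough; for example:

```
# Total disk space used under your home directory.
du -sh ~

# Number of entries (files, directories, links) under your home directory.
find ~ | wc -l
```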

/nesi/project

This filesystem is accessible from all login, compute and ancillary nodes. Data are backed up daily via the Spectrum Protect backup system, which retains the last 10 versions of all files for 90 days. No cleaning policy is applied.

It provides intermediate storage for datasets, shared code and configuration scripts that need to be accessed by users within a project, and potentially by other projects. Read and write performance is better with larger files, so consider bundling collections of small files into archives with the tar utility.
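
For instance, a directory of many small files could be combined into a single compressed archive and unpacked again when needed (the directory name here is just a placeholder):

```
# Bundle a directory of small files into one compressed archive.
tar -czf my_dataset.tar.gz my_dataset/

# Extract the archive later, when the files are needed again.
tar -xzf my_dataset.tar.gz
```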

All NeSI projects will have a file quota allocation for /nesi/project/<project-id>, based on the requirements specified in the Project Proposal Application. Each project directory is allocated a quota that allows a maximum of 20,000 files per TB of disk space.
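
For example, under this rule a project allocated 5 TB of /nesi/project space could store at most 5 × 20,000 = 100,000 files.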

/nesi/nobackup

The /nesi/nobackup filesystem has the highest performance of all NeSI filesystems, with greater than 140 GB/s of bandwidth from compute nodes to disk. It provides access to a very large (4.4 PB) resource for short-term project usage. The only quota applied is the number of files that may be stored in a project folder (e.g. in /nesi/nobackup/uoa09090), as specified in the table.

To ensure this filesystem remains fit-for-purpose, the following data management policy will be applied once the Librarian Service is available:

  • Each day:
    1. All files older than the age specified in the table above will be automatically deleted.
    2. Files belonging to expired Projects will be deleted.
  • When /nesi/nobackup use reaches 75% of capacity, files will be deleted until used capacity falls to 50%. Oldest files will be deleted first.
  • If a race condition occurs (more than three users simultaneously generating or copying ~1 PB of data into /nesi/nobackup), this policy will need to be revised.

Please do not use the touch command to prevent the cleaning policy from removing files, because this behaviour would deprive the community of a shared resource.

The purpose of this policy is to ensure that any user will be able to analyse datasets up to 1PB in size.

/nesi/nearline

The /nesi/nearline filesystem is a data cache for the Hierarchical Storage Management System, which automatically manages the movement of files between high performance disk storage and magnetic tape storage in an Automatic Tape Library. Stub files for all data on tape remain on disk.

Data can be moved (from any other filesystem) to /nesi/nearline via the “Librarian Service” [Coming Soon] which allows users to:

  • list the data they have stored on /nesi/nearline
  • move (put) data on /nesi/nearline and
  • recover (get) data back from /nesi/nearline.

The minimum size of a file that may be put on /nesi/nearline is 5 MB. Accordingly, you should combine small files into an archive such as a tarball or zip file before requesting that they be moved to /nesi/nearline.
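
As a rough check before requesting a move, something like the following could be used to spot files below the 5 MB threshold and bundle a directory into a single archive (the directory name is a placeholder):

```
# List files smaller than the 5 MB nearline minimum.
find my_results/ -type f -size -5M

# Combine the whole directory into one archive well above the minimum.
tar -czf my_results.tar.gz my_results/
```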

If the data requested (via a get) is still in the /nesi/nearline cache, recovery is fast. If the data needs to be restored from the tape library, there will be a delay while the relevant tape or tapes are located and loaded into the tape drives and the data are copied back to disk. The read speed per tape drive will be of the order of 300 MB/s. Multiple tape drives will be used when recovering datasets that span more than one cartridge.

Contributions of Small Files Towards Quotas

The Spectrum Scale filesystem makes use of a feature called data-in-inode. Once a (non-encrypted) file's required metadata has been written to our metadata storage, if all of the file's data fits within the remaining inode space (4 KiB minus the metadata), the data is written there instead of to the data storage.

For files larger than 4 KiB (minus the space needed to store the file's metadata), the data written to disk is stored in one or more sub-blocks of 256 KiB each (1/32 of the filesystem block size), and the size allocated on disk is rounded up to the nearest 256 KiB. Users or projects requiring many small files may therefore find themselves using large amounts of disk space. Use of data-in-inode mitigates the effect of a large block size on such users and project teams.
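
The difference between a file's apparent size and the space allocated for it can be seen with standard tools; assuming a 256 KiB allocation granularity, a sketch might look like this:

```
# Create a 300 KiB file; with 256 KiB sub-blocks it should
# occupy two sub-blocks (512 KiB) on disk.
dd if=/dev/zero of=example.dat bs=1K count=300

ls -l example.dat   # apparent size: 307200 bytes
du -h example.dat   # allocated size, rounded up to sub-block multiples
```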

However, very small files, as well as zero-size entities such as directories and symbolic links, still count towards the relevant fileset's inode quota. Therefore, if you expect to store large numbers of very small files in your home directory or in a project's persistent storage, please contact our support team to discuss your storage needs.
