NeSI File Systems and Quotas

Māui and Mahuika, along with all the ancillary nodes, share access to the same Spectrum Scale file systems. (Spectrum Scale was previously known as GPFS, or General Parallel File System.)

The table below shows the default disk space and file count allocations for the /home, /nesi/project and /nesi/nobackup file systems.

You may query your actual usage and disk allocations using the following command:

$ nn_storage_quota

The values for 'nn_storage_quota' are updated approximately every hour and cached between updates.

Specifications of filesystems available to or planned for users of Māui and Mahuika

| Filesystem | /home | /nesi/project | /nesi/nobackup | /nesi/nearline (not yet available) |
|---|---|---|---|---|
| Default disk space quota | 20 GB per user | 100 GB per project | No limit (see note 1) | 0 TB (allocations will be made to project teams on a case-by-case basis) |
| Default file count (inode) quota | 100,000 files | 100,000 files | 1,000,000 files | No limit |
| Intended use | User-specific files such as configuration files, environment setup, source code, etc. | Persistent project-related data, project-related software, etc. | Data created or used by compute jobs that is intended to be temporary | Long-term archive storage |
| Total capacity | 175 TB | 1,590 TB | 4,400 TB | Will grow as tapes are purchased |
| Data retention time | 180 days after the user ceases to be a member of any active project | 90 days after the end of the project's last HPC Compute & Analytics allocation | With certain exceptions, individual files will be deleted after being untouched for 120 days. See Automatic cleaning of nobackup file system for more information. | 180 days after the end of the project's last nearline storage allocation |
| Data backup schedule (apart from snapshots) | Daily, last 10 versions of any given file retained for up to 90 days. | Daily, last 10 versions of any given file retained for up to 90 days. | None | Replicated between Wellington and Auckland tape libraries |
| Snapshots | Daily (retention period 7 days) | None | Weekly (retention period 28 days) | None |
| Access speed | Moderate | Moderate | Fast | Slow |
| Access interfaces | Native Spectrum Scale mounts; SCP; Globus data transfer | Native Spectrum Scale mounts; SCP | Native Spectrum Scale mounts; SCP; Globus data transfer | Nearline commands |

1. Within reason, we may ask you to delete excessively large files.

Notes:

  • If you need to compile or install a software package that is large or is intended for use by a project team, please build it in /nesi/project/<project_code> rather than /home/<username>.
  • As the /nesi/nobackup file system provides the highest performance, input files should be moved or copied to this file system before starting any job that makes use of them. Likewise, job scripts should be written so as to write output files to the /nesi/nobackup file system. If you wish to keep your data for the long term, you can include as a final part of your job script an operation to copy or move the output data to the /nesi/project file system.
  • Keep in mind that data on /nesi/nobackup is not backed up. Users are therefore advised to move valuable data to /nesi/project/<project_code> or, if the data is seldom used, to other storage such as an institutional storage facility, as soon as batch jobs are completed. Please do not use the touch command to prevent the cleaning policy from removing files, because this behaviour would deprive the community of a shared resource.
  • If you have accidentally deleted data, first check if it is present in a recent snapshot. If you cannot find it in a snapshot, please ask us to recover it for you by emailing NeSI Support.
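The stage-in, compute, stage-out pattern described in the notes above could be sketched roughly as follows. This is only an illustration under stated assumptions: the function name 'run_staged', the input/ and output/ subdirectory layout and the example paths are hypothetical, and any scheduler directives your job needs are left out.

```shell
#!/bin/sh
# Sketch of the stage-in / compute / stage-out pattern from the notes above.
# All names here are illustrative assumptions, not NeSI-provided tools.
#
# run_staged PROJECT_DIR SCRATCH_DIR COMMAND [ARGS...]
#   PROJECT_DIR  persistent directory, e.g. somewhere under /nesi/project/<project_code>
#   SCRATCH_DIR  fast working directory, e.g. somewhere under /nesi/nobackup/<project_code>
#   COMMAND      program to run; it is started inside SCRATCH_DIR
run_staged() {
    project_dir=$1
    scratch_dir=$2
    shift 2

    # Stage in: copy the input files to the fast file system.
    mkdir -p "$scratch_dir/input" "$scratch_dir/output" || return 1
    cp -R "$project_dir/input/." "$scratch_dir/input/" || return 1

    # Compute: run the command with SCRATCH_DIR as its working directory,
    # so that it reads ./input and writes ./output on the fast file system.
    ( cd "$scratch_dir" && "$@" ) || return 1

    # Stage out: keep the results on the persistent file system.
    mkdir -p "$project_dir/output" || return 1
    cp -R "$scratch_dir/output/." "$project_dir/output/"
}
```

A job script might then end with a call such as 'run_staged /nesi/project/<project_code>/myrun /nesi/nobackup/<project_code>/myrun ./my_program', where 'myrun' and 'my_program' are placeholders for your own directory and executable.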

/home

This file system is accessible from login, compute and ancillary nodes. Users should not run jobs from this filesystem. All home directories are backed up daily, both via the Spectrum Protect backup system, which retains the last 10 versions of all files for up to 90 days, and via Spectrum Scale snapshots. No cleaning policy will be applied to your home directory as long as your My NeSI account is active and you are a member of at least one active project.

/nesi/project

This filesystem is accessible from all login, compute and ancillary nodes. Contents are backed up daily, via the Spectrum Protect backup system, which retains the last 10 versions of all files for 90 days. No cleaning policy is applied.

It provides storage space for datasets, shared code or configuration scripts that need to be accessed by users within a project, and potentially by other projects. Read and write performance improves with larger files, so you should consider archiving small files with the tar utility.
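As a minimal sketch of the tar suggestion above, the helper below bundles a directory of small files into a single compressed archive. The function name 'archive_dir' and the choice of gzip compression are assumptions made for illustration.

```shell
# archive_dir DIR ARCHIVE
# Bundle DIR (and everything under it) into one gzip-compressed tar archive,
# so that the file system handles a single large file instead of many small ones.
archive_dir() {
    dir=$1
    archive=$2
    # -C changes into the parent directory first, so that paths stored in the
    # archive are relative (e.g. results/part1.txt) rather than absolute.
    tar -czf "$archive" -C "$(dirname "$dir")" "$(basename "$dir")"
}
```

The contents can later be listed with 'tar -tzf ARCHIVE' and unpacked with 'tar -xzf ARCHIVE'. It is worth checking the listing before deleting the original small files.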

Each NeSI project receives quota allocations for /nesi/project/<project_code>, based on the requirements specified in the Project Proposal Application, and separately covering disk space and number of files.

/nesi/nobackup

The /nesi/nobackup file system has the highest performance of all NeSI file systems, with greater than 140 GB/s bandwidth from compute nodes to disk. It provides access to a very large (4.4 PB) resource for short-term project usage. The only quota applied is the number of files that may be stored in a project's nobackup directory (e.g. in /nesi/nobackup/nesi12345) as specified in the table.

To ensure this file system remains fit for purpose, we will shortly begin applying a regular cleaning policy, as described in Automatic cleaning of nobackup filesystem.

Do not use the touch command or an equivalent to prevent the cleaning policy from removing unused files, because this behaviour would deprive the community of a shared resource.

The purpose of the cleaning policy is to ensure that any user will be able to analyse datasets up to 1 PB in size.
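To review which of your files have not been used recently, and so decide what to copy to /nesi/project, something like the following sketch may help. It assumes that "untouched" can be approximated as "neither accessed nor modified"; the actual criteria are those given in the cleaning policy article, and the function name 'list_stale_files' is hypothetical.

```shell
# list_stale_files DIR [DAYS]
# Print regular files under DIR whose last access AND last modification are
# both more than DAYS days old (default 120, the figure quoted for nobackup).
# ASSUMPTION: this only approximates "untouched"; the real cleaning policy
# may use different criteria.
list_stale_files() {
    find "$1" -type f -atime +"${2:-120}" -mtime +"${2:-120}" -print
}
```

For example, 'list_stale_files /nesi/nobackup/<project_code>' lists candidates under a project's nobackup directory. Listing files in this way does not change their timestamps.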

/nesi/nearline

Note

The nearline service, including its associated file systems, is on the NeSI road map but not yet available. We appreciate your patience as we develop, test and deploy this service.

The /nesi/nearline filesystem will be a data cache for the Hierarchical Storage Management System, which automatically manages the movement of files between high performance disk storage and magnetic tape storage in an Automatic Tape Library (ATL). We envisage that stub files for all data on tape will remain on disk.

The /nesi/nearline filesystem is not yet accessible to users, pending completion of a broker service which will help to ensure reasonable quality of service and efficient use of library resources, and will provide quota management for the tape library. This service will allow users to:

  • list the data they have stored on /nesi/nearline
  • move (put) data from other NeSI filesystems onto /nesi/nearline and
  • recover (get) data back from /nesi/nearline onto other NeSI filesystems.

The minimum size of a file that may be put on /nesi/nearline is 5 MB. Accordingly, you should combine small files into an archive such as a tarball or zip file before requesting that they be moved to /nesi/nearline.
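Before requesting a move, you could check whether a file (for example a tarball you have just created) meets the minimum size. The sketch below assumes that "5 MB" means 5,000,000 bytes; the function name 'meets_min_size' is hypothetical and the threshold can be overridden.

```shell
# meets_min_size FILE [MIN_BYTES]
# Exit status 0 if FILE is at least MIN_BYTES in size, 1 if it is smaller,
# 2 if FILE is not a regular file.
# MIN_BYTES defaults to 5000000, ASSUMING that "5 MB" means 5 x 10^6 bytes;
# pass 5242880 instead if the limit turns out to be 5 MiB.
meets_min_size() {
    file=$1
    min_bytes=${2:-5000000}
    [ -f "$file" ] || return 2
    # Arithmetic expansion strips any padding that wc adds on some systems.
    size=$(( $(wc -c < "$file") ))
    [ "$size" -ge "$min_bytes" ]
}
```

A typical use would be 'meets_min_size results.tar.gz || echo "too small for nearline"', where 'results.tar.gz' is a placeholder file name.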

If the data requested (via a get) is still in the /nesi/nearline cache, data recovery is fast. If it needs to be restored from the tape library, there will be a delay while the relevant tape (or tapes) are located and loaded into the tape drive(s) and the files are copied back to disk.

Contributions of Small Files Towards Quotas

The Spectrum Scale file system makes use of a feature called data-in-inode. This feature will ensure that, once all of a (non-encrypted) file's required metadata has been written to our metadata storage, if all the file's data is able to fit within the file's remaining inode space (4 KiB minus metadata), it will be written there instead of to the data storage.

For files larger than 4 KiB (minus the space needed to store the file's metadata), the data is stored in one or more sub-blocks of 256 KiB each (1/32 of the file system block size), so the space allocated on disk is rounded up to the next multiple of 256 KiB. Users or projects requiring many small files may therefore find themselves using large amounts of disk space. Data-in-inode mitigates the effect of the large block size for such users and project teams.
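The rounding described above can be worked through with a little shell arithmetic. This sketch only models files that are too large for data-in-inode storage; the exact threshold depends on the metadata size and is not modelled. The function name 'allocated_bytes' is hypothetical.

```shell
# Sub-block size stated in the text: 256 KiB (1/32 of an 8 MiB block).
SUBBLOCK_BYTES=$((256 * 1024))

# allocated_bytes SIZE_BYTES
# Print the disk space allocated to a file of SIZE_BYTES, rounded up to a
# whole number of sub-blocks.
# ASSUMPTION: the file does not fit in its inode, so data-in-inode does not apply.
allocated_bytes() {
    size=$1
    subblocks=$(( (size + SUBBLOCK_BYTES - 1) / SUBBLOCK_BYTES ))
    echo $(( subblocks * SUBBLOCK_BYTES ))
}
```

For example, 'allocated_bytes 10240' prints 262144: a 10 KiB file occupies a full 256 KiB sub-block. On that basis, 1,000 such files would hold about 9.8 MiB of data but consume 250 MiB of disk quota.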

However, very small files, as well as zero-size entities such as directories and symbolic links, still count towards the relevant fileset's inode quota. If you expect to store large numbers of very small files in your home directory or in a project's persistent storage, please contact our support team to discuss your storage needs.
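To get a rough idea of how many entries a directory tree contributes towards a file count quota, you could count them as in the sketch below. The function name 'count_entries' is hypothetical, and the '-print0' option is assumed to be available (it is in GNU and BSD find). Hard links to the same file are counted once per name, so the result may slightly overestimate inode use; 'nn_storage_quota' remains the authoritative figure.

```shell
# count_entries DIR
# Print the number of file system entries at or below DIR: regular files,
# directories (including DIR itself), symbolic links and so on.
count_entries() {
    # Emit one NUL byte per entry and count the NULs, so that file names
    # containing newlines are not miscounted.
    n=$(find "$1" -print0 | tr -dc '\000' | wc -c)
    echo $(( n ))
}
```

For instance, a directory containing one file, one symbolic link and one subdirectory that itself holds one file counts as five entries, including the top-level directory.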
