Automatic cleaning of nobackup file system

Service Status

The staged rollout of automatic cleaning of the nobackup file system resumed on Tuesday 24 March 2020, and several dozen projects are now enrolled. If your project is already enrolled, you will have heard from a member of our team, and should also be receiving automatic emails whenever your project data on nobackup is at risk of automatic removal.

Starting on 1 July 2020:

  • all new projects will be automatically enrolled as we provision them.
  • existing projects with exemptions will be re-enrolled when the reason for the exemption no longer applies. We will contact each of these project teams at the time.
  • existing projects that are not already enrolled or exempted will be progressively enrolled as they are issued new allocations of compute time.

Further information about this process is detailed below. Please contact us if you have any questions.

NOTE: We have removed the 1 TB threshold for automatic cleaning. Old files in enrolled projects will be removed even if the project takes up less than 1 TB of space on the nobackup filesystem.

The automatic cleaning feature is a programme of regular deletion of selected files from project directories in our nobackup file system. We do this to optimise the availability of this file system for active research computing workloads and to ensure NeSI can reliably support large-scale compute and analytics workflows.

Files are deleted if they meet all of the following criteria:

  • The file was first created more than 120 days ago, and has not been accessed, and neither its data nor its metadata has been modified, for at least 120 days.
  • The file was identified as a candidate for deletion two weeks previously, and as such is listed in the project's nobackup .policy directory.
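
The first criterion can be approximated for a single file with a short Python sketch. This is an illustration only, not the tool NeSI uses: the function names are hypothetical, and because Linux does not portably expose a creation time, the sketch assumes that the latest of the access, modification and metadata-change timestamps is a reasonable stand-in for "created, accessed or modified".

```python
import os
import time

SECONDS_PER_DAY = 86400
DELETE_AFTER_DAYS = 120  # deletion threshold stated in the criteria above


def days_since_last_activity(path, now=None):
    """Days since the latest of the file's access, modify and change times."""
    now = time.time() if now is None else now
    st = os.lstat(path)  # reads metadata only; does not follow symlinks
    latest = max(st.st_atime, st.st_mtime, st.st_ctime)
    return (now - latest) / SECONDS_PER_DAY


def is_idle_past_threshold(path, threshold_days=DELETE_AFTER_DAYS, now=None):
    """True if the file shows no activity for at least threshold_days."""
    return days_since_last_activity(path, now) >= threshold_days
```

A file is only deleted if it also satisfies the second criterion, so the .policy listing remains the authoritative record of what is scheduled for removal.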

The general process will follow a schedule as follows:

  • Notify (at 106 days), then two weeks later Delete (at 120 days).
  • Every fortnight on Tuesday morning, we review files stored in the nobackup filesystem and identify candidates for expiry.
  • Project teams will be notified by email if they have file candidates for deletion. Emails will be sent two weeks in advance of any deletion taking place.

    Warning

    Due to the nature of email, we cannot guarantee that any particular email message will be successfully delivered and received. For instance, our emails could be blocked by your mail server, or your inbox could be too full. We suggest that you check /nesi/nobackup/<project_code>/.policy (see below) for a list of deletion candidates for each of your projects, whether or not you received an email from us.

  • Immediately after deletion is complete, a new set of candidate files will be identified for expiry during the next automated cleanup. These candidate files are all files within the project's nobackup that have not been created, accessed or modified within the last 106 days.
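
The 106-day and 120-day thresholds in the schedule above can be worked through with simple date arithmetic. The sketch below is a hypothetical helper, not part of any NeSI tooling. It computes only the earliest dates on which a file could be listed and then deleted; the actual dates depend on the fortnightly Tuesday review, which is not modelled here.

```python
from datetime import date, timedelta

NOTIFY_AFTER_DAYS = 106  # file becomes a candidate and the team is notified
DELETE_AFTER_DAYS = 120  # file may be deleted, two weeks after notification


def earliest_dates(last_activity):
    """Return (earliest listing date, earliest deletion date) for a file."""
    notify_on = last_activity + timedelta(days=NOTIFY_AFTER_DAYS)
    delete_on = last_activity + timedelta(days=DELETE_AFTER_DAYS)
    return notify_on, delete_on
```

For a file last touched on 1 January 2020, earliest_dates(date(2020, 1, 1)) gives 16 April 2020 for listing and 30 April 2020 for deletion, 14 days apart.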

A file containing the list of candidates for deletion during the next cleanup, along with the date of the next cleanup, will be created in a directory called .policy/to_delete inside the project's nobackup directory. For example, the candidates for future deletion from the directory /nesi/nobackup/nesi12345 are recorded in /nesi/nobackup/nesi12345/.policy/to_delete/<date>.filelist.gz. Project team members are able to view the contents of .policy (but not delete or modify those contents). The gzip compressed filelist can be viewed and searched with the zless and zgrep commands respectively, e.g., zless /nesi/nobackup/nesi12345/.policy/to_delete/<date>.filelist.gz.
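
If you prefer to check the list from a script rather than with zgrep, a minimal Python sketch follows. The internal layout of the filelist is not described here, so the sketch assumes only that each record occupies one line and contains the file's path; the function name is hypothetical.

```python
import gzip


def find_candidates(filelist_path, needle):
    """Return the lines of a gzip-compressed file list that contain needle."""
    matches = []
    # "rt" decompresses and decodes to text; undecodable bytes are replaced
    with gzip.open(filelist_path, "rt", encoding="utf-8", errors="replace") as fh:
        for line in fh:
            if needle in line:
                matches.append(line.rstrip("\n"))
    return matches
```

Pass the path of the <date>.filelist.gz file as filelist_path and, for example, a subdirectory name as needle to see which files under that subdirectory are listed.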

Warning

Objects other than files, such as directories and symbolic links, are not deleted under this policy, even if at deletion time they are empty, broken, or otherwise redundant. These entities typically take up no disk space apart from a small amount of metadata, but still count towards the project's inode (file count) quota.
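
Because these leftovers still count towards the inode quota, you may wish to find and remove them yourself. The sketch below is one hypothetical way to list empty directories and broken symbolic links under a directory; it only reports them and deletes nothing.

```python
import os


def find_leftovers(root):
    """Return (empty directories, broken symlinks) found under root."""
    empty_dirs, broken_links = [], []
    # os.walk does not descend into symlinked directories by default
    for dirpath, dirnames, filenames in os.walk(root):
        # A directory holding only empty directories is not itself reported
        if not dirnames and not filenames:
            empty_dirs.append(dirpath)
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            # A symlink whose target does not exist is broken
            if os.path.islink(full) and not os.path.exists(full):
                broken_links.append(full)
    return empty_dirs, broken_links
```

Review the reported paths before removing anything, since an empty directory may still be expected by a running workflow.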

What should I do with expiring data on the nobackup filesystem?

If the data is transient and no longer required for continued processing on NeSI, then we would appreciate it if you deleted it yourself, but you can also let the automated process do this.

If you have files identified as candidates for deletion that you need to keep beyond the scheduled expiry date, you have the following options:

  • Move the file to your persistent project directory, e.g., /nesi/project/nesi12345. You may need to request more disk space, more inodes, or both, in your persistent project directory before you can do this. If so, submit a Support request; we assess such requests on a case-by-case basis.
  • Move or copy the file to a storage system outside NeSI, for example a research storage device at your institution. We expect most projects to do this for finalised output data and appreciate prompt egress of data once it is no longer used for processing.
  • Access or modify the file before the deletion date, in which case the file will not be deleted even though it is listed in .policy. This must only be done in cases where you expect to begin active use of the data again within the next month.

    Warning

    Doing this for large numbers of files, or for files that together take up a large amount of disk space, in your project's nobackup directory, without regard for your project's computational activity, constitutes a breach of NeSI's acceptable use policy.
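
Before moving files to the persistent project directory, it can help to know how many inodes and how much space they will need, so that any quota request is accurate. The following sketch is a hypothetical helper that tallies a list of paths. It uses os.lstat, which reads metadata without updating access times, and it reports apparent file sizes, which may differ from actual disk usage.

```python
import os


def tally(paths):
    """Return (file count, total apparent size in bytes) for existing paths."""
    count = 0
    total_bytes = 0
    for path in paths:
        try:
            st = os.lstat(path)  # metadata only; does not touch access time
        except FileNotFoundError:
            continue  # skip paths that have already gone
        count += 1
        total_bytes += st.st_size
    return count, total_bytes
```

The paths could come, for instance, from the entries you have picked out of the project's .policy file list.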

Where should I put my data?

| How often will my team's HPC jobs be accessing the data? | How often will my team's HPC jobs be modifying the data? | Recommended option |
| --- | --- | --- |
| Often | Often (at least once every two months) | Leave in the nobackup directory (but ensure key result data is copied to the persistent project directory) |
| Often | Seldom | Put in the persistent project directory |
| Seldom | Seldom | Store the data elsewhere (e.g. at your institution) |

In general, the persistent project directory should be used for reference data, tools, and job submission and management scripts. The nobackup directory should be used for holding large reference working datasets (e.g., an extraction of compressed input data) and as a destination for writing and modifying temporary data. It can also be used to build and edit code, provided that the code is under version control and changes are regularly checked into upstream revision control systems.

If I need a file that was deleted from nobackup, what should I do?

Please contact our support team as soon as possible after you find that the file is missing. To reduce the risk of this happening again, please contact us in advance in future so that we can discuss your data storage options with you.

I have research data on nobackup that I can't store in my project directory or at my institution right now. What should I do?

Please contact our support team without delay so we can discuss your short- and medium-term data storage needs. Our intention is to work with you to move your valuable data to an appropriate combination of:

  • persistent project storage on NeSI,
  • high performance /nobackup storage (temporary scratch space) on NeSI,
  • slow nearline storage (not yet released, but on our roadmap), and
  • institutional storage infrastructure.

User Webinars

On 14 and 26 November 2019, we hosted webinars to explain these upcoming changes and answer user questions. If you missed these sessions, the archived materials are available at the links below:

 
