Automatic cleaning of nobackup file system

Service Status

The staged rollout of automatic cleaning of the nobackup file system will resume on Tuesday 24 March. Further information about this process is detailed below. Please contact us if you have any questions.

The automatic cleaning feature is a programme of regular deletion of selected files from project directories in our nobackup file system. We do this to optimise the availability of this file system for active research computing workloads and to ensure NeSI can reliably support large-scale compute and analytics workflows.

Files are deleted if they meet all of the following criteria:

  • The file is in a nobackup project directory that takes up at least 1 TB of disk space.
  • The file was created more than 120 days ago and, for at least the last 120 days, has not been accessed and has had neither its data nor its metadata modified.
  • The file was identified as a candidate for deletion two weeks previously and is therefore listed in the project's nobackup .policy directory.

Note

If your project's nobackup directory contains at least 1 TB of data when the notification scan is run, and again when the deletion scan is run, files in that directory that meet the other criteria are still liable to deletion, even if the directory dropped below 1 TB between the scans.
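The three criteria, together with the note on the 1 TB threshold, can be combined into a single check. The following is a minimal Python sketch under stated assumptions, not NeSI's implementation: `FileRecord`, `project_in_scope` and `is_deletable` are hypothetical names, 1 TB is taken as 10**12 bytes (the policy does not say whether TB or TiB is meant), and "metadata modified" is modelled on the POSIX ctime.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumption: 1 TB taken as 10**12 bytes; the policy does not say TB or TiB.
TB = 10**12
EXPIRY = timedelta(days=120)


@dataclass
class FileRecord:
    """Hypothetical record of one file's timestamps (names are illustrative)."""
    created: datetime
    accessed: datetime           # like POSIX atime
    data_modified: datetime      # like POSIX mtime
    metadata_modified: datetime  # like POSIX ctime (assumption)
    listed_in_policy: bool       # listed in .policy at the scan two weeks earlier


def project_in_scope(usage_at_notify: int, usage_at_delete: int) -> bool:
    # The directory must hold at least 1 TB at both scans; a dip below
    # 1 TB between the two scans does not matter.
    return usage_at_notify >= TB and usage_at_delete >= TB


def is_deletable(f: FileRecord, usage_at_notify: int,
                 usage_at_delete: int, now: datetime) -> bool:
    last_use = max(f.accessed, f.data_modified, f.metadata_modified)
    return (
        project_in_scope(usage_at_notify, usage_at_delete)
        and now - f.created > EXPIRY   # created more than 120 days ago
        and now - last_use >= EXPIRY   # idle for at least 120 days
        and f.listed_in_policy         # flagged two weeks previously
    )
```

Note that a file created exactly 120 days ago is not yet eligible, because the creation criterion is "more than" 120 days while the idle criterion is "at least" 120 days.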

The general process follows this schedule:

  • Notify at 106 days, then delete two weeks later, at 120 days.
  • Every second Tuesday morning, we review files stored in the nobackup file system and identify candidates for expiry.
  • Project teams will be notified by email if they have file candidates for deletion. Emails will be sent two weeks in advance of any deletion taking place.

    Warning

    Due to the nature of email, we cannot guarantee that any particular email message will be successfully delivered and received; for instance, our emails could be blocked by your mail server, or your inbox could be full. We suggest that, for each of your projects, you check /nesi/nobackup/<project_code>/.policy (see below) for a list of deletion candidates, whether or not you received an email from us.

  • Immediately after deletion is complete, a new set of candidate files will be identified for expiry during the next automated cleanup. Every file that has not been created, accessed or modified within the last 106 days, and that is in a nobackup directory taking up at least 1 TB of disk space, is identified as a candidate.
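The timing of this cycle can be sketched in Python as follows. This is an illustration only: the scans are run by NeSI, the function names are hypothetical, and the first scan date is a parameter rather than an official calendar. It shows why listing at 106 days plus one two-week cycle lands on the 120-day deletion age.

```python
from datetime import date, timedelta

NOTIFY_AGE_DAYS = 106       # idle age at which a file is listed as a candidate
DELETE_AGE_DAYS = 120       # idle age at which a listed file is deleted
CYCLE = timedelta(days=14)  # scans run every second Tuesday

# Listing at 106 days plus one two-week cycle gives deletion at 120 days.
assert NOTIFY_AGE_DAYS + CYCLE.days == DELETE_AGE_DAYS


def scan_dates(first_scan: date, count: int) -> list:
    """Dates of `count` consecutive fortnightly scans starting at `first_scan`."""
    return [first_scan + i * CYCLE for i in range(count)]


def planned_deletion(notification_scan: date) -> date:
    """Candidates listed at one scan are due for deletion at the next scan."""
    return notification_scan + CYCLE


def is_candidate(days_idle: int, usage_bytes: int, tb: int = 10**12) -> bool:
    """Listed if not created, accessed or modified for 106+ days in a >= 1 TB directory.

    Assumption: 1 TB taken as 10**12 bytes.
    """
    return days_idle >= NOTIFY_AGE_DAYS and usage_bytes >= tb
```

Because every scan both deletes the previous candidates and lists new ones, a file that first reaches 106 idle days just after a scan can remain in place for up to a further two weeks before it is listed.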

Candidates for future deletion, along with the date of the next planned deletion, are recorded in a directory called .policy/to_delete inside the project's nobackup directory. For example, the candidates for future deletion from the directory /nesi/nobackup/nesi12345 are recorded in /nesi/nobackup/nesi12345/.policy/to_delete/<date>.filelist.gz. Project team members can view the contents of .policy but cannot delete or modify them. The gzip-compressed file list can be viewed with the zless command and searched with the zgrep command, e.g., zless /nesi/nobackup/nesi12345/.policy/to_delete/<date>.filelist.gz. The contents of .policy are overwritten every two weeks with a new deletion date and list of deletion candidates.
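If you prefer to search the candidate list from a script, the following Python sketch is a rough analogue of searching with zgrep. It assumes, without confirmation from this page, that each .filelist.gz holds one UTF-8 path per line; the function names are hypothetical.

```python
import gzip
from pathlib import Path
from typing import Iterable, List


def filter_paths(lines: Iterable[str], pattern: str) -> List[str]:
    """Return non-empty lines containing `pattern` (plain substring, not a regex)."""
    stripped = (line.rstrip("\n") for line in lines)
    return [p for p in stripped if p and pattern in p]


def candidates_matching(filelist: Path, pattern: str) -> List[str]:
    """Search a gzip-compressed candidate list, roughly like zgrep."""
    # Assumption: one path per line, UTF-8 encoded.
    with gzip.open(filelist, "rt", encoding="utf-8", errors="replace") as fh:
        return filter_paths(fh, pattern)


def filelists(project_nobackup: Path) -> List[Path]:
    """All *.filelist.gz files under .policy/to_delete, sorted by name."""
    return sorted((project_nobackup / ".policy" / "to_delete").glob("*.filelist.gz"))
```

For example, calling `candidates_matching` on each path returned by `filelists(Path("/nesi/nobackup/nesi12345"))` with the pattern `"results/"` would list the candidates whose paths contain `results/`. For a one-off check, zgrep does the same job.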

Warning

Objects other than files, such as directories and symbolic links, are not deleted under this policy, even if at deletion time they are empty, broken, or otherwise redundant. These entities typically take up no disk space apart from a small amount of metadata, but still count towards the project's inode (file count) quota.
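Because empty directories and broken symbolic links survive the cleaning but still use inodes, you may want to find and review them yourself. The following read-only Python sketch lists them; `redundant_entries` is a hypothetical helper, it never deletes anything, and its output should be checked before removing entries by hand.

```python
import os
from typing import List, Tuple


def redundant_entries(root) -> Tuple[List[str], List[str]]:
    """List empty directories and broken symlinks under `root` (read-only).

    Symlinks are never followed. Only directories that are empty right now
    are reported; a parent containing nothing but empty directories is not.
    The top-level `root` itself is never reported.
    """
    empty_dirs: List[str] = []
    broken_links: List[str] = []
    for dirpath, dirnames, filenames in os.walk(root):
        # With followlinks=False (the default), symlinks to directories are
        # listed in dirnames and broken symlinks are listed in filenames.
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) and not os.path.exists(path):
                broken_links.append(path)
        if not dirnames and not filenames and dirpath != os.fspath(root):
            empty_dirs.append(dirpath)
    return empty_dirs, broken_links
```

Removing a leaf directory can leave its parent empty, so running the sketch again after a cleanup may report further entries.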

What should I do with expiring data on the nobackup filesystem?

If the data is transient and no longer required for continued processing on NeSI, we would appreciate it if you deleted it yourself, but you can also let the automated process do this.

If you have files identified as candidates for deletion that you need to keep beyond the scheduled expiry date, you have four options:

  • Move the file to your persistent project directory, e.g., /nesi/project/nesi12345. You may need to request more disk space, more inodes, or both, in your persistent project directory before you can do this. Submit a Support request. We assess such requests on a case-by-case basis.
  • Move or copy the file to a storage system outside NeSI, for example a research storage device at your institution. We expect most projects to do this for finalised output data and appreciate prompt egress of data once it is no longer used for processing.
  • Delete or move other files in your project's nobackup directory and avoid creating large amounts of new data there, so that the total space taken by the nobackup directory is less than 1 TB at the time candidates for deletion are next identified. If you do this, no files will be automatically deleted from your project's nobackup directory during the next two cleanup runs.
  • Access or modify the file before the deletion date, in which case the file will not be deleted even though it is listed in .policy. This must only be done in cases where you expect to begin active use of the data again within the next month.

    Warning

    Doing this for large numbers of files, or for files that together take up a large amount of disk space, in your project's nobackup directory, without regard for your project's computational activity, constitutes a breach of NeSI's acceptable use policy.
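The third option depends on the total size of your project's nobackup directory. The following Python sketch gives a rough estimate and the amount you would need to remove to get below 1 TB. It is an approximation under stated assumptions: it sums apparent file sizes (allocated disk space can differ), counts hard-linked files once per link, takes 1 TB as 10**12 bytes, and may be slow on very large trees. The function names are hypothetical, and the figure used by the cleaning scans may differ.

```python
import os
import stat

# Assumption: 1 TB taken as 10**12 bytes; the policy does not say TB or TiB.
TB = 10**12


def apparent_usage_bytes(root) -> int:
    """Sum the apparent sizes (st_size) of regular files under `root`.

    Symlinks are not followed, and hard-linked files are counted once per
    link, so this only approximates the directory's disk usage.
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            if stat.S_ISREG(st.st_mode):
                total += st.st_size
    return total


def bytes_to_free(usage: int, threshold: int = TB) -> int:
    """Bytes to remove so that usage ends up strictly below the threshold."""
    return max(0, usage - threshold + 1)
```

The `+ 1` reflects that the directory must be less than 1 TB; exactly 1 TB still counts as "at least 1 TB".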

Where should I put my data?

Choose a location based on how often your team's HPC jobs will access and modify the data:

  • Accessed often, modified often (at least once every two months): leave the data in the nobackup directory, but ensure key result data is copied to the persistent project directory.
  • Accessed often, modified seldom: put the data in the persistent project directory.
  • Accessed seldom, modified seldom: store the data elsewhere (e.g., at your institution).

In general, the persistent project directory should be used for reference data, tools, and job submission and management scripts. The nobackup directory should be used for holding large reference working datasets (e.g., an extraction of compressed input data) and as a destination for writing and modifying temporary data. It can also be used to build and edit code, provided that the code is under version control and changes are regularly checked into upstream revision control systems.

If I need a file that was deleted from nobackup, what should I do?

Please contact our support team as soon as possible after you find that the file is missing. To reduce the risk of this happening again, please contact us in advance so that we can discuss your data storage options with you.

I have research data on nobackup that I can't store in my project directory or at my institution right now. What should I do?

Please contact our support team without delay so we can discuss your short- and medium-term data storage needs. Our intention is to work with you to move your valuable data to an appropriate combination of:

  • persistent project storage on NeSI,
  • high performance /nobackup storage (temporary scratch space) on NeSI,
  • slow nearline storage (not released yet, on our roadmap), and 
  • institutional storage infrastructure.

User Webinars

On 14 and 26 November 2019, we hosted webinars to explain these upcoming changes and answer user questions. If you missed these sessions, the archived materials are available at the links below:

 
