Nearline Storage

                                                                                                                                                                      

Service Status

The Nearline Storage service is in an Early Access Programme (EAP) phase and not yet in production. A few selected users will be given access for testing purposes. The functionality of the tool and syntax of the commands may change in future. The data migrated to the nearline system is not guaranteed to stay there persistently and may be removed at any time during testing. Please retain a copy of your data in your project or nobackup directory.

Please send feedback about your user experience via https://support.nesi.org.nz/hc/requests/new. Feedback may cover functionality issues, intuitive or counter-intuitive behaviours, behaviours or features that you like, suggestions for improvements, transfers taking too long, etc.

Nearline Storage service

NeSI's Nearline Storage service allows you to store your data on our hierarchical system, which consists of a staging area (disk) connected to a tape library. Users of this service gain access to more persistent storage space for their research data, in return for slower access to those files that are stored on tape. We recommend that you use this service for larger datasets that you will only need to access occasionally. The retrieval of data may be delayed, due to tape handling.

Nearline is intended for use with relatively large files and should not be used for a large number of small files. In fact, files smaller than a certain threshold size may not be moved to tape at all. Files smaller than ~100 MB should be combined into archive files (tarballs) using tar or a similar tool.
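As a minimal sketch of the bundling step described above, the following shell function packs a directory of small files into one compressed tarball before it is put on nearline. The function name bundle_dir and all example paths are illustrative assumptions; they are not part of the nearline tool.

```shell
# bundle_dir is a hypothetical helper, not part of the nearline client.
# It packs directory $1 into the gzip-compressed tarball $2.
bundle_dir() {
    src_dir=$1
    tarball=$2
    # -C changes into the parent directory so the archive stores relative paths.
    tar -czf "$tarball" -C "$(dirname "$src_dir")" "$(basename "$src_dir")"
}

# Example (paths are placeholders):
#   bundle_dir /nesi/nobackup/nesi12345/run01 /nesi/nobackup/nesi12345/To/Archive/run01.tar.gz
#   tar -tzf /nesi/nobackup/nesi12345/To/Archive/run01.tar.gz   # list contents to verify
#
# With GNU find, candidates for bundling (files under roughly 100 MiB) can be listed with:
#   find /nesi/nobackup/nesi12345/run01 -type f -size -100M
```

Write the tarball outside the directory being archived, so that the archive does not try to include itself.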

Note

The existing directory structure starting after /nesi/project/<projectID>/ or /nesi/nobackup/<projectID>/ will be mapped onto /nesi/nearline/<projectID>/. While retrieving data, the whole directory structure after /nesi/nearline/<projectID> will be mapped into the target directory. See the Put and Get sections below for details.

A nearline project is locked while data is being written to or deleted from it. Until this process has finished, no other write or delete operation can be performed on the same project, and the user will see a status message "project locked by none".

What you can do

The client allows you to carry out the following operations:

  • View files: View list of files stored in nearline.
  • Put: Copy files from your project or nobackup folder into nearline.
  • Get: Retrieve files from nearline into your project or nobackup folder, without deleting them from nearline.
  • Purge: Delete files stored in nearline.
  • View job status: View a list of jobs (put/get/purge) you have run, along with their status.
  • View quota: View your nearline quota and usage.

Getting started

Nearline has a common tool for access, with a set of nl commands, which are accessible by loading the following module:

module load nearline/.20190710

View files

With the following command, you can print the list of files and directories within the specified nearline directory:

nlls /nesi/nearline/<projectID>

or, for example:

nlls /nesi/nearline/<projectID>/path/to/results/

Furthermore, you can use the additional option -l to get the detailed list including mode, owner, group, filesize, and timestamp. The option -ls, an alternative to -l, will additionally show each file's migration status.

$ nlls -ls /nesi/nearline/<projectID>/results/
mode        s  owner               group      filesize    timestamp    filename
___________________________________________________________________________________________________________________________
-rw-rw----+ r  userName        nesi12345      33.93 MB       Jun 17    file1.tar.gz
-rw-rw----+ r  userName        nesi12345      33.93 MB       Jun 17    file2.tar.gz
-rw-rw----+ r  userName        nesi12345      34.03 MB       Jun 17    file3.tar.gz

Status ("s" column of the -ls output) legend:

  • migrated (m) - data of a specific nearline file is on tape (does not necessarily mean that the file is replicated across sites)
  • pre-migrated (p) - data of a specific nearline file is on both the staging filesystem and the tape.
  • resident (r) - data of a specific nearline file is only on the staging filesystem.

Warning

The option -ls shows only files, not directories.
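The status column can be filtered with standard text tools. The sketch below assumes the column layout of the -ls output shown above (status in the second column, file name in the last) and file names without spaces. The helper name list_by_status is hypothetical.

```shell
# list_by_status is a hypothetical helper, not part of the nearline client.
# It reads `nlls -ls` output on stdin and prints the names of files whose
# status column ("s") equals $1 (m, p, or r).
# Assumes the column layout shown in this article and file names without spaces.
list_by_status() {
    # Header and separator lines are skipped because their second field
    # is never m, p, or r.
    awk -v want="$1" '$2 == want { print $NF }'
}

# Example: list files that are still only on the staging filesystem.
#   nlls -ls /nesi/nearline/<projectID>/results/ | list_by_status r
```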

Put

Data can be copied to nearline using the nlput command. The syntax is:

nlput <projectID> { <src_dir> | <filelist> }

The source data needs to be located under /nesi/project/ or /nesi/nobackup/.

The data will be mapped into the same directory structure under /nesi/nearline/ (see below).

The recommended file size to archive is between 1 GB and 1 TB.

Warning

nlput takes only a directory or a filelist. A single file is treated as a filelist and read line by line, searching for valid file names. Single files can only be migrated using a filelist containing the full path of the file to be transferred.

Files and directories are checked for existence and only new files are transferred to nearline. Files will not be updated with newer source files. Thus, files that already exist on nearline (either on tape or on the staging disk) will be skipped in the migration process without notification.

Put - directory

All files and subdirectories within a specified directory will be transferred into nearline. The target location mirrors the source location. As an example:

nlput nesi12345 /nesi/nobackup/nesi12345/To/Archive/Results/

will copy all data within the Results directory into /nesi/nearline/nesi12345/To/Archive/Results/.

Warning

If you put /nesi/project/nesi12345/To/Archive/Results/ on nearline as well as /nesi/nobackup/nesi12345/To/Archive/Results/, the contents of both source locations (project and nobackup) will be merged into /nesi/nearline/nesi12345/To/Archive/Results/. Within /nesi/nearline/nesi12345/, files with the same name and path will be skipped.
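The mapping described above can be illustrated with a small shell function. This is only a sketch of the path rule as stated in this article; the helper name nearline_target is hypothetical and the function does not contact nearline.

```shell
# nearline_target is a hypothetical helper that only illustrates the mapping;
# it is not part of the nearline client.
# Usage: nearline_target <projectID> <source_path>
nearline_target() {
    project=$1
    src=$2
    case $src in
        /nesi/project/"$project"/*)  rel=${src#/nesi/project/"$project"/} ;;
        /nesi/nobackup/"$project"/*) rel=${src#/nesi/nobackup/"$project"/} ;;
        *) echo "source must be under the project or nobackup directory of $project" >&2
           return 1 ;;
    esac
    printf '/nesi/nearline/%s/%s\n' "$project" "$rel"
}

nearline_target nesi12345 /nesi/project/nesi12345/To/Archive/Results/
nearline_target nesi12345 /nesi/nobackup/nesi12345/To/Archive/Results/
# Both calls print /nesi/nearline/nesi12345/To/Archive/Results/
```

Because both source locations resolve to the same nearline path, checking the target in this way before running nlput can help to spot unintended merges.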

Put - filelist

The filelist is a file containing a list of files to be transferred. It can specify only one file per line and directories are ignored.

The target location will again mirror the source location, see above.

The filelist needs to be located in a project or nobackup directory.
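A filelist can be generated with find. The following is a minimal sketch; the helper name make_filelist and all paths are illustrative assumptions.

```shell
# make_filelist is a hypothetical helper, not part of the nearline client.
# It writes the path of every regular file under directory $1 to the
# filelist $2, one path per line (directories are left out).
# Pass an absolute directory so that find prints full paths, and keep the
# filelist outside that directory so it does not list itself.
make_filelist() {
    src_dir=$1
    filelist=$2
    find "$src_dir" -type f | sort > "$filelist"
}

# Example (paths are placeholders):
#   make_filelist /nesi/nobackup/nesi12345/To/Archive/Results /nesi/nobackup/nesi12345/results.filelist
#   nlput nesi12345 /nesi/nobackup/nesi12345/results.filelist
#
# A single file can be put the same way, with a one-line filelist:
#   echo /nesi/nobackup/nesi12345/To/Archive/big.tar.gz > /nesi/nobackup/nesi12345/single.filelist
#   nlput nesi12345 /nesi/nobackup/nesi12345/single.filelist
```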

Update

As a good practice, please migrate only large files (tarballs or files that are individually large), or directories containing exclusively large files. Do not modify a file in the source (nobackup or project) directory once it has been copied to nearline. Please keep a copy of your data on one of our online file systems (project or nobackup directory) during this early access testing phase.

If you need to update data on the nearline file system with a newer version of data from nobackup or project:

  1. compare content of the source and target directories (worst case, file by file)
  2. remove the older files on nearline and
  3. copy the newer files to the nearline file system
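The steps above can be sketched with filelists. The helper below is hypothetical, and it rests on an assumption that is not confirmed in this article: that an nlpurge filelist contains /nesi/nearline/... paths. Given a filelist of changed source files, it derives the matching nearline paths.

```shell
# write_purge_list is a hypothetical helper, not part of the nearline client.
# It reads full source paths of changed files (one per line) from $2 and
# writes the matching nearline paths to $3.
# Assumption: nlpurge filelists contain /nesi/nearline/... paths.
write_purge_list() {
    project=$1
    changed=$2
    purge_list=$3
    sed -E "s#^/nesi/(project|nobackup)/$project/#/nesi/nearline/$project/#" "$changed" > "$purge_list"
}

# Sketch of the update (all paths are placeholders):
#   write_purge_list nesi12345 /nesi/nobackup/nesi12345/changed.filelist /nesi/nobackup/nesi12345/purge.filelist
#   nlpurge nesi12345 /nesi/nobackup/nesi12345/purge.filelist     # step 2
#   nljobstatus                                                   # wait until the purge job is done
#   nlput nesi12345 /nesi/nobackup/nesi12345/changed.filelist     # step 3
```

The project is locked while the purge is running, so the nlput in step 3 should only be started once the purge job has finished.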

Get

Data can be retrieved from nearline using the nlget command. The syntax is:

nlget <projectID> { <src_dir> | <filelist> } <dest_dir> [ --nowait ]

Similar to nlput (see above), nlget accepts a directory src_dir (no single files accepted) or a file list filelist, defining the source of the data to be retrieved from nearline.

The destination dest_dir needs to be defined. The whole directory structure after /nesi/nearline/<projectID> will be created at the destination and the specified data written into it. For example,

nlget nesi00000 /nesi/nearline/nesi00000/dir/to/results/ /nesi/nobackup/

will create the directory structure /nesi/nobackup/nesi00000/dir/to/results/ if that directory structure does not already exist, and copy the data within the results directory into it.

Files that already exist in the destination directory will not be overwritten. Retrieving a file does not remove it from nearline; the nearline copy remains until purged.
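To see where retrieved data will land before running nlget, the mapping from the example above can be reproduced with a small shell function. This is a sketch based only on that example; the helper name nlget_destination is hypothetical and the function does not contact nearline.

```shell
# nlget_destination is a hypothetical helper that only illustrates the mapping;
# it is not part of the nearline client.
# Usage: nlget_destination <nearline_src_dir> <dest_dir>
nlget_destination() {
    src=$1
    dest=${2%/}                    # drop one trailing slash from dest_dir, if any
    case $src in
        /nesi/nearline/*) rel=${src#/nesi/nearline/} ;;
        *) echo "source must be under /nesi/nearline/" >&2
           return 1 ;;
    esac
    printf '%s/%s\n' "$dest" "$rel"
}

nlget_destination /nesi/nearline/nesi00000/dir/to/results/ /nesi/nobackup/
# prints /nesi/nobackup/nesi00000/dir/to/results/
```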

Warning

nlget takes only one directory or one file list. Single files are treated as a file list and read line by line, searching for valid file names. A single file can only be retrieved using a filelist specifying the full path of the file to be retrieved.

Purge

The nlpurge command permanently deletes the specified data on the nearline file system. The syntax is:

nlpurge <projectID> { <src_dir> | <filelist> }

A directory src_dir (no single files accepted) or a file list filelist needs to be specified (see nlput above).

View job status

The tool nljobstatus provides the current status of submitted (queued, running and completed) tasks. The syntax is:

nljobstatus [ -j <jobid> ]

If no job ID is specified the full list of submitted jobs is returned. In this list, each job looks like the following:


$ nljobstatus
+----------+------------+----------------------------+-----------+-------------+
|  Jobid   | Project ID |         Job Status         | Job Host  |  Job User   |
+----------+------------+----------------------------+-----------+-------------+
| 4e23f517 |     13     |   job done successfully    | librarian | userName    |
| -dfef-40 |            |                            |           |             |
| e9-a83c- |            |                            |           |             |
| 3da78b06 |            |                            |           |             |
|   0310   |            |                            |           |             |
+----------+------------+----------------------------+-----------+-------------+

With the -j flag and a job identifier jobid, information for a specific job can be listed:

$ nljobstatus -j 4e23f517-dfef-40e9-a83c-3da78b060310
+--------------------------------------+
|                Jobid                 |
+--------------------------------------+
| 4e23f517-dfef-40e9-a83c-3da78b060310 |
+--------------------------------------+
+------------+-----------------------+-----------+-------------+
| Project ID |      Job Status       | Job Host  |  Job User   |
+------------+-----------------------+-----------+-------------+
|     13     | job done successfully | librarian | userName    |
+------------+-----------------------+-----------+-------------+
+---------------------+---------------------+---------------------+
|   Job Start Time    |   Job Update Time   |    Job End Time     |
+---------------------+---------------------+---------------------+
| 2019-09-13T03:11:22 | 2019-09-13T03:11:44 | 2019-09-13T03:11:45 |
+---------------------+---------------------+---------------------+

If an nlput or nlpurge is running in that project, the project is locked until the task is finished.

If a job stays in one state for an unexpectedly long time, please contact NeSI Support.
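For scripted workflows, a job can be polled until it has finished. The following is a minimal sketch: the helper name wait_for_job is hypothetical, and it only recognises the "job done successfully" status shown above, since other status strings are not listed in this article. A job that fails will therefore simply run into the attempt limit.

```shell
# wait_for_job is a hypothetical helper, not part of the nearline client.
# It polls `nljobstatus -j <jobid>` until the output contains
# "job done successfully" or the attempts run out.
# Usage: wait_for_job <jobid> [max_attempts] [sleep_seconds]
wait_for_job() {
    jobid=$1
    max_attempts=${2:-60}
    interval=${3:-60}
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        if nljobstatus -j "$jobid" | grep -q "job done successfully"; then
            return 0
        fi
        sleep "$interval"
        attempt=$((attempt + 1))
    done
    echo "job $jobid not finished after $max_attempts checks" >&2
    return 1
}

# Example:
#   wait_for_job 4e23f517-dfef-40e9-a83c-3da78b060310 && echo "done"
```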

View quota

With the command nlquotalist, the usage and limits of a nearline project quota can be listed:

nlquotalist <projectID>

The output looks like:

$ nlquotalist nesi12345
Projectname                                       Available           Used                Inodes         IUsed
___________________________________________________________________________________________________________________________
nesi12345                                         30.00 TB            27.16 TB            1000000        412

This quota is different from the project quota on GPFS (/nesi/project/<projectID>).

Data management

If you have the same directory structure in your project and nobackup directories, be careful when archiving data from both. They will be merged in the nearline file system. Further, when retrieving data from nearline, keep in mind that the directory structure up to your projectID will be retrieved:

librarian_get_put.jpeg

Underlying mechanism

The nearline file system consists of two parts: disk, mainly for buffering data, and the tape library. The service itself consists of a client running on the login/compute node and a backend on the nearline file system. It is important to know that even if you cancel a client process, the corresponding backend process remains scheduled or keeps running until it has finished.

The decision about which data goes to tape, and when, is automated and is not something you will have control over. The service is designed to optimise interaction with the nearline filesystem and avoid problem workloads for the benefit of all users.

If your files are on tape, it will take time to retrieve them. Access to tape readers is on a first come first served basis, and the amount of wait time will vary dramatically depending on overall usage. We cannot guarantee access to your files within any particular timeframe, and indeed wait times could be hours or even in some cases more than a day.

Support contact

Please contact our support team with any queries or concerns you may have regarding this service. We welcome feedback from our users.
