Our Long-Term Storage Service is currently in an Early Access phase, and we encourage researchers using the service to verify their data before deleting it from the project directory (persistent storage) or nobackup directory (temporary storage).
Service Status
The verification options outlined below are intended to support the Early Access phase of Nearline development. Verification options may change as the Early Access Programme continues and as the Nearline service moves into production. We will update our documentation to reflect all such changes.
Your feedback on which verification options you think are necessary will help us decide on future directions for the Nearline service. Please contact our support team to request verification or to offer suggestions regarding this or any other aspect of our Nearline service.
There are several options for verification, depending on the level of assurance you require.
Level 1: Transfer status report
The most basic form of verification is to look at the results of nljobstatus. If all the Nearline job IDs associated with movement of data to Nearline (i.e. nlput commands) report "job done successfully", that gives you a basic level of confidence that the files were in fact copied over to nearline.
Warning
The above check is reliable only if all nlput commands were concerned solely with uploading new files to nearline. Because of the way nlput is designed, a command trying to update files that already existed on nearline will silently skip those files and still report success.
Level 2: File counts and sizes
You can get a higher level of assurance by checking the number of files, and their sizes and last modified times, in a particular directory on nearline, and optionally comparing them to those in the corresponding directory on /nesi/project or /nesi/nobackup. We can also enable comparisons of file permissions if requested, though differences in permissions and modification times do not necessarily suggest a problem. If you are interested in verifying file permissions, please contact our support team.
To get a list of file names, sizes and dates in a particular directory on nearline, run the following command with the necessary modifications. Note that the nltraverse command traverses all subdirectories within your chosen directory, and may therefore take some time if you verify a directory at the top of a complex directory tree.
nltraverse /nesi/nearline/nesi12345/foo_dir
To compare the output against the corresponding directory in /nesi/project or /nesi/nobackup, use the following:
find /nesi/nobackup/nesi12345/foo_dir \( -type f -o -type l \) -print0 | xargs -0 -I {} ls -l --time-style="+%Y_%m_%d_%H%M%S" {} | awk '{printf "%-17s %16s %-s\n", $6, $5, $7}' | sort -k3,3
find /nesi/project/nesi12345/foo_dir \( -type f -o -type l \) -print0 | xargs -0 -I {} ls -l --time-style="+%Y_%m_%d_%H%M%S" {} | awk '{printf "%-17s %16s %-s\n", $6, $5, $7}' | sort -k3,3
By redirecting the outputs of these three commands to files, you can compare the outputs using tools such as diff or vimdiff.
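The comparison step can be sketched roughly as follows. This is an illustration only: the helper names strip_prefix and compare_listings and the listing file names are hypothetical. It also assumes that nltraverse prints three columns (date, size, path) in the same layout as the find pipelines above, which you should confirm against real output. Because the nearline and project paths start with different directories, the sketch removes that leading prefix before comparing.

```shell
# Hypothetical helper: remove a leading directory prefix from the path
# column of a listing (assumed columns: date, size, path) read on stdin,
# then sort by path so that two listings line up.
strip_prefix() {
    awk -v p="$1" '{
        path = $0
        sub(/^[^ ]+ +[^ ]+ +/, "", path)      # drop the date and size columns
        if (index(path, p) == 1) path = substr(path, length(p) + 1)
        printf "%s %s %s\n", $1, $2, path
    }' | sort -k3
}

# Hypothetical helper: report whether two normalised listings are identical.
# Returns 0 if they match and 1 (after printing the differences) if not.
compare_listings() {
    if diff "$1" "$2" > /dev/null; then
        echo "listings match"
    else
        echo "listings differ:"
        diff "$1" "$2"
        return 1
    fi
}

# Example usage (file names are illustrative):
#   nltraverse /nesi/nearline/nesi12345/foo_dir > nearline_raw.txt
#   (run the find pipeline above)                > project_raw.txt
#   strip_prefix /nesi/nearline/ < nearline_raw.txt > nearline.txt
#   strip_prefix /nesi/project/  < project_raw.txt  > project.txt
#   compare_listings nearline.txt project.txt
```

If only the date column differs between the two listings, that does not necessarily indicate a problem, since modification times may be reported differently on nearline.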
Warning
The above check is reliable only if the corresponding files in /nesi/project and/or /nesi/nobackup have not been modified since they were copied to nearline. For this reason, if you want to carry out this level of checking, you should do so as soon as possible after you have established that the nlput operation completed successfully.
Level 3: Checksums
You can get a still higher level of assurance by asking our support team to run checksums (e.g. SHA256 sums) on files of particular importance. If the checksums come out identical, it is virtually certain that the files contain the same data, even if their modification dates and times are reported differently.
If you would like us to compare checksums of files, please provide a list of the specific files you would like to be verified in this way, according to their paths in /nesi/project and /nesi/nobackup. If you include a directory in the list, we will assume that you want checksums of all files in that directory. We will assess your request for feasibility and get back to you if we think you have asked for too many files or too much data to be compared.
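One way to prepare such a list, and to get a feel for how much data it covers before sending it, is sketched below. The function name checksum_request_list and the output file name are hypothetical, and the sketch assumes GNU stat (for the -c option). It is not a required format for requests.

```shell
# Hypothetical helper: print every regular file under the given paths,
# one per line, and report the file count and total size on stderr.
# Assumes GNU stat; file names containing newlines are not handled.
checksum_request_list() {
    find "$@" -type f | sort | {
        count=0
        total=0
        while IFS= read -r f; do
            size=$(stat -c %s "$f")        # file size in bytes
            total=$((total + size))
            count=$((count + 1))
            printf '%s\n' "$f"
        done
        printf '%d files, %d bytes in total\n' "$count" "$total" >&2
    }
}

# Example usage (paths are illustrative):
#   checksum_request_list /nesi/project/nesi12345/foo_dir > checksum_request.txt
```

The summary printed on stderr gives a quick indication of whether the request is likely to be small enough to be feasible.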
Warning
The above check is reliable only if the corresponding files in /nesi/project and/or /nesi/nobackup have not been modified since they were copied to nearline. For this reason, if you want to carry out this level of checking, you should contact our support team to request it as soon as possible after you have established that the nlput operation completed successfully. Also, this check is very expensive, so we will not be able to do it on large numbers of files or on files that collectively take up a lot of disk space. You should reserve this level of verification for your most valuable research data.