Verifying uploads to Nearline storage

Our Nearline data storage service is currently in an Early Access phase. We encourage researchers using the service to verify their data on Nearline before deleting it from their project directory (persistent storage) or nobackup directory (temporary storage).

Service Status

The verification options outlined below are intended to support the Early Access phase of Nearline development. Verification options may change as the Early Access Programme continues and as the Nearline service moves into production. We will update our documentation to reflect all such changes.

Your feedback on which verification options you think are necessary will help us decide on future directions for the Nearline service. Please contact our support team to request verification or to offer suggestions regarding this or any other aspect of our Nearline service.

There are several options for verification, depending on the level of assurance you require.

Level 1: Transfer status report

The most basic form of verification is to check the output of nljobstatus. If every Nearline job ID associated with moving data to Nearline (i.e. every nlput command) reports that the job was done successfully, you have a basic level of confidence that the files were in fact copied to Nearline.

Warning

The above check is reliable only if every nlput command was used solely to upload new files to Nearline. Because of the way nlput is designed, a command that tries to update files that already exist on Nearline will silently skip those files and still report success.

Level 2: File counts and sizes

You can get a higher level of assurance by checking the number, sizes and last-modified times of the files in a particular directory on Nearline, and optionally comparing them with those of the corresponding directory in /nesi/project or /nesi/nobackup. We can also enable comparisons of file permissions on request, though differences in permissions and modification times do not necessarily indicate a problem. If you are interested in verifying file permissions, please contact our support team.
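As a quick first pass before comparing individual files, you can count the files in a local directory and add up their sizes. The function below is a minimal sketch rather than a NeSI-provided tool: the name summarise_dir is hypothetical, and it assumes GNU find (for -printf), as found on most Linux systems.

```shell
#!/usr/bin/env bash
# Minimal sketch (hypothetical helper, not a NeSI tool): print
# "<file count> <total bytes>" for all regular files under a directory.
# Assumes GNU find, whose -printf '%s' prints a file's size in bytes.
summarise_dir() {
    local dir="$1"
    # awk counts the lines (one per file) and sums the sizes;
    # %.0f avoids integer overflow on very large totals in some awk versions.
    find "$dir" -type f -printf '%s\n' |
        awk '{ n++; total += $1 } END { printf "%d %.0f\n", n, total }'
}

# Example, using the placeholder path from this page:
# summarise_dir /nesi/project/nesi12345/foo_dir
```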

To get a list of file names, sizes and dates in a particular directory on Nearline, run the following command, replacing nesi12345 with your project code and foo_dir with your directory. Note that nltraverse traverses all subdirectories within your chosen directory, so it may take some time if you verify a directory at the top of a complex directory tree.

nltraverse /nesi/nearline/nesi12345/foo_dir

To produce a matching listing for the corresponding directory in /nesi/project or /nesi/nobackup, run whichever of the following commands applies:

find /nesi/nobackup/nesi12345/foo_dir -type f -print0 | xargs -0 -r ls -lh --time-style="+%Y_%m_%d_%H%M%S" | awk '{name = $0; for (i = 1; i <= 6; i++) sub(/^[^ ]+ +/, "", name); printf "%-17s   %16s   %-s\n", $6, $5, name}' | sort -k3
find /nesi/project/nesi12345/foo_dir -type f -print0 | xargs -0 -r ls -lh --time-style="+%Y_%m_%d_%H%M%S" | awk '{name = $0; for (i = 1; i <= 6; i++) sub(/^[^ ]+ +/, "", name); printf "%-17s   %16s   %-s\n", $6, $5, name}' | sort -k3

Redirect the output of nltraverse, and of the find command for the location you are comparing against, to files. You can then compare the two files using tools such as diff or vimdiff.
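Because each listing records full paths, a plain diff will flag every line if the Nearline paths begin with /nesi/nearline and the local paths with /nesi/project or /nesi/nobackup. The sketch below strips the top-level directory from both listings before comparing them. It assumes that nltraverse prints full paths in the same three-column layout as the find commands above; if its output looks different, adjust the sed expressions. The function name compare_listings is hypothetical.

```shell
#!/usr/bin/env bash
# Minimal sketch (hypothetical helper): diff two saved listings after
# replacing each one's top-level directory with a neutral marker.
# Assumption: both listings contain full paths beginning with the given
# prefix. Prefixes are used as sed regular expressions.
compare_listings() {
    local nearline_list="$1" local_list="$2"
    local nearline_prefix="$3" local_prefix="$4"
    local a b status
    a=$(mktemp) && b=$(mktemp) || return 2
    # '|' is the sed delimiter because paths contain '/'.
    # Sort on the path column (field 3 onwards) so both files are in the same order.
    sed "s|${nearline_prefix}|<top>|" "$nearline_list" | sort -k3 > "$a"
    sed "s|${local_prefix}|<top>|" "$local_list" | sort -k3 > "$b"
    diff "$a" "$b"          # exit status 0 means the listings match
    status=$?
    rm -f "$a" "$b"
    return "$status"
}

# Example, using the placeholder paths from this page:
# compare_listings nearline.txt project.txt \
#     /nesi/nearline/nesi12345/foo_dir /nesi/project/nesi12345/foo_dir
```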

Warning

The above check is reliable only if the corresponding files in /nesi/project and/or /nesi/nobackup have not been modified since they were copied to nearline. For this reason, if you want to carry out this level of checking, you should do so as soon as possible after you have established that the nlput operation completed successfully.
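One way to spot local files that have changed since your upload is to list files modified after the time you ran nlput. The sketch below assumes GNU find, whose -newermt test accepts a date string; the function name modified_since is hypothetical. A file can change without its modification time moving forward, so treat an empty result as reassurance rather than proof.

```shell
#!/usr/bin/env bash
# Minimal sketch (hypothetical helper): list regular files under a directory
# whose modification time is later than the given timestamp, for example
# the time at which you ran nlput. Requires GNU find (-newermt).
modified_since() {
    local dir="$1" since="$2"
    find "$dir" -type f -newermt "$since" -print
}

# Example, using the placeholder path from this page:
# modified_since /nesi/project/nesi12345/foo_dir "2024-05-01 09:30"
```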

Level 3: Checksums

You can get a still higher level of assurance by asking our support team to compute checksums (e.g. SHA256 sums) of files of particular importance. If the checksums are identical, it is virtually certain that the files contain the same data, even if their modification dates and times are reported differently.
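To illustrate why matching checksums are strong evidence even when timestamps differ, the sketch below compares the SHA256 checksums of two files you can read, such as two local copies, using sha256sum from GNU coreutils. It only demonstrates the principle: it does not read anything from Nearline and is not necessarily how our support team carries out the check. The function name same_sha256 is hypothetical.

```shell
#!/usr/bin/env bash
# Minimal sketch (hypothetical helper): succeed (exit status 0) only if both
# files can be read and have identical SHA256 checksums. File names and
# modification times play no part in the comparison.
same_sha256() {
    local a b
    # Reading from stdin makes sha256sum print "<hash>  -"; keep the hash only.
    a=$(sha256sum < "$1" | cut -d ' ' -f 1)
    b=$(sha256sum < "$2" | cut -d ' ' -f 1)
    # Guard against two unreadable files both yielding an empty hash.
    [ -n "$a" ] && [ "$a" = "$b" ]
}

# Example:
# same_sha256 results_v1.csv backup/results_v1.csv && echo "contents match"
```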

If you would like us to compare checksums, please provide a list of the specific files you would like verified in this way, identified by their paths in /nesi/project or /nesi/nobackup. If you include a directory in the list, we will assume that you want checksums of all files in that directory. We will assess your request for feasibility and let you know if we think you have asked for too many files or too much data to be compared.
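Before sending a request, you may want to see how many files and how much data it covers. The sketch below reads a list of paths (one per line), expands each directory into the regular files beneath it, and totals their sizes. It assumes GNU find and GNU stat, and it assumes that subdirectories count as part of a listed directory; please confirm with our support team if that matters for your request. The names expand_request_list and request_size are hypothetical.

```shell
#!/usr/bin/env bash
# Minimal sketch (hypothetical helpers). Paths containing newlines are not handled.

# Print every regular file named in list file $1; directories are expanded
# recursively (an assumption) and duplicates are removed.
expand_request_list() {
    local path
    while IFS= read -r path; do
        [ -z "$path" ] && continue
        if [ -d "$path" ]; then
            find "$path" -type f
        elif [ -f "$path" ]; then
            printf '%s\n' "$path"
        else
            printf 'not found: %s\n' "$path" >&2
        fi
    done < "$1" | sort -u
}

# Read file paths on stdin and print "<file count> <total bytes>".
request_size() {
    local f n=0 total=0
    while IFS= read -r f; do
        n=$((n + 1))
        total=$((total + $(stat -c %s -- "$f")))
    done
    printf '%d %d\n' "$n" "$total"
}

# Example:
# expand_request_list my_request.txt | request_size
```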

Warning

The above check is reliable only if the corresponding files in /nesi/project and/or /nesi/nobackup have not been modified since they were copied to nearline. For this reason, if you want to carry out this level of checking, you should contact our support team to request it as soon as possible after you have established that the nlput operation completed successfully.

Also, this check is computationally expensive, so we cannot run it on large numbers of files or on files that collectively take up a lot of disk space. Reserve this level of verification for your most valuable research data.
