NVIDIA GPU Containers

NVIDIA provides access to GPU-accelerated software through their NGC container registry: https://www.nvidia.com/en-us/gpu-cloud/containers/. Many of these containers can run under Singularity, which is supported on the NeSI platform. NVIDIA also specifies the GPU requirements for each container, i.e. whether it will run on our Pascal (sm60) GPUs.

The NVIDIA site provides instructions for converting their Docker images to Singularity images, but some small changes to those instructions are required on NeSI. As an example, here we show the steps required to run the NAMD image on NeSI, based on the NVIDIA instructions here: https://ngc.nvidia.com/catalog/containers/hpc:namd.

  1. Download the APOA1 benchmark data:
    • wget -O - https://gitlab.com/NVHPC/ngc-examples/raw/master/namd/2.13/get_apoa1.sh | bash
      cd apoa1
  2. To build the container we will use the Singularity remote build service, since you do not have root access on the HPC and therefore cannot run the build command there directly (alternatively, you could install Singularity on your own machine, build the image there, and upload it to NeSI).
    1. Sign up for an account and get a token here: https://cloud.sylabs.io/auth 
    2. Copy and paste your token into ~/.singularity/sylabs-token on the HPC; a minimal example is sketched after this list. If the remote build later fails because the token has expired, delete ~/.singularity/remote.yaml if it exists and then redo this step.
  3. Load the Singularity module:
    • module load Singularity
  4. Build the Singularity image. This step differs from the NVIDIA instructions in that it adds the --remote option, since we do not have root access on NeSI.
    • singularity build --remote namd_2.13-singlenode.simg docker://nvcr.io/hpc/namd:2.13-singlenode
  5. Copy the following into a Slurm script named run.sl. The main differences from the NVIDIA instructions are the additional bind mounts and the extended LD_LIBRARY_PATH, which make the directories containing the host CUDA drivers and libraries available inside the container (a quick GPU-visibility check is sketched after this list).
    • #!/bin/bash
      #SBATCH --job-name=namdgpu
      #SBATCH --time=00:10:00
      #SBATCH --ntasks=1
      #SBATCH --cpus-per-task=8
      #SBATCH --gres=gpu:1
      #SBATCH --mem=1G

      module load Singularity CUDA

      # name of the NAMD input file
      INPUT="apoa1.namd"

      # singularity command with required arguments
      # "-B /cm/local/apps/cuda" and "-B ${EBROOTCUDA}" are required for the
      # container to access the host CUDA drivers and libs
      SINGULARITY="$(which singularity) exec --nv -B $(pwd):/host_pwd \
      -B /cm/local/apps/cuda -B ${EBROOTCUDA} namd_2.13-singlenode.simg"

      # extend container LD_LIBRARY_PATH so it can find CUDA libs
      OLD_PATH=$(${SINGULARITY} printenv | grep LD_LIBRARY_PATH | awk -F= '{print $2}')
      export SINGULARITYENV_LD_LIBRARY_PATH="${OLD_PATH}:${LD_LIBRARY_PATH}"

      # run NAMD
      ${SINGULARITY} namd2 +ppn ${SLURM_CPUS_PER_TASK} +idlepoll ${INPUT}
  6. Submit the job:
    • sbatch run.sl
  7. View the standard output from the simulation in the Slurm .out file (see the example after this list).
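
For step 2.2, one way to set up the token on the HPC is sketched below. This is a minimal example, not the only way to do it; paste the token string copied from https://cloud.sylabs.io/auth into the file when the editor opens.

    mkdir -p ~/.singularity
    # remove any stale remote endpoint config left over from an expired token
    rm -f ~/.singularity/remote.yaml
    # open the token file and paste in the token copied from the Sylabs site
    nano ~/.singularity/sylabs-token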
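
Before submitting the full job in step 5, you can optionally check that the container can see a GPU. A minimal sketch, assuming namd_2.13-singlenode.simg is in the current directory and you are on a node with a GPU (for example inside an interactive Slurm session requested with --gres=gpu:1):

    module load Singularity CUDA
    # --nv exposes the host NVIDIA driver and nvidia-smi inside the container
    singularity exec --nv namd_2.13-singlenode.simg nvidia-smi

If the container can access the GPU, nvidia-smi will list it in its output.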
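
For step 7, by default Slurm writes standard output to a file named slurm-<jobid>.out in the submission directory, where <jobid> is the job ID printed by sbatch. For example:

    # check whether the job is still queued or running
    squeue -u $USER
    # once it has finished, view the NAMD output (replace <jobid> with your job ID)
    cat slurm-<jobid>.out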

We expect similar steps to work for other NGC containers.
