ABAQUS

Description

Abaqus Unified FEA (widely known as ABAQUS) is finite element analysis software for modelling, visualisation, and implicit and explicit dynamics simulation.

The ABAQUS home page is at http://www.3ds.com/products-services/simulia/products/abaqus/.

Available modules

Packages with modules

Module                          NeSI Cluster
ABAQUS/6.13.2-linux-x86_64      pan
ABAQUS/6.14.2                   pan

Licensing requirements

ABAQUS is made available to research groups, departments and institutions under the terms of closed-source, commercial licence agreements. If you have any questions regarding your eligibility to access this software package or any particular version of it, please contact our support desk.

Checking licence availability

ABAQUS licence tokens are checked out from remote licence servers. To check the availability of tokens on your licence server, please run the following commands:

module load ABAQUS/6.14.2        # or other version of ABAQUS as appropriate

# replace licenceserver.example.com (but not the @ sign) with the hostname of
# your licence server.
abaqus licensing lmstat -c @licenceserver.example.com -a

This is a list of ABAQUS licence servers that communicate with NeSI clusters.

Affiliation                                      ABAQUS licence server
University of Auckland Faculty of Engineering    abaqus.licenses.foe.auckland.ac.nz

If you need to check the availability of an ABAQUS licence on a NeSI cluster but your affiliation is not listed in the table, please email our support desk.

Example scripts

Example scripts for the Pan cluster

ABAQUS in SMP mode on one node without GPU support

#!/bin/bash -e

# ABAQUS job submission script
# Optimised to run parallel jobs without GPU support

#SBATCH --job-name      ABAQUS_job
#SBATCH --account       nesi99999
#SBATCH --time          02:00:00
#SBATCH --cpus-per-task 16
#SBATCH --mem-per-cpu   2G
#SBATCH --constraint    avx
#SBATCH --output        ABAQUS_job.%j.out
#SBATCH --error         ABAQUS_job.%j.err

# Load the module
module load ABAQUS/6.14.2

# Save the initial working directory
thisdir=$(pwd -P)

# Transfer input files to the local disk on the compute node
cp input/BigCavity_FullDisk.inp ${TMP_DIR}
cd ${TMP_DIR}

#  Run the Parallel Program
srun abaqus \
    job=BigCavity_FullDisk input=BigCavity_FullDisk.inp \
    cpus=$SLURM_CPUS_PER_TASK \
    -verbose 1 standard_parallel=all mp_mode=threads \
    interactive

# Transfer the results back to the starting directory, within a subdirectory
# named "output"
cp -arv --no-preserve=mode ${TMP_DIR} ${thisdir}/output
rm -rfv ${TMP_DIR}/*

ABAQUS in SMP mode on one node with GPU support

#!/bin/bash -e

# ABAQUS job submission script
# Optimised to run parallel jobs with GPU support

#SBATCH --job-name      ABAQUS_GPU_job
#SBATCH --account       nesi99999
#SBATCH --time          02:00:00   
#SBATCH --cpus-per-task 16
#SBATCH --mem-per-cpu   2G
#SBATCH --constraint    avx
#SBATCH --gres          gpu:2
#SBATCH --output        ABAQUS_GPU_job.%j.out
#SBATCH --error         ABAQUS_GPU_job.%j.err

# Load the module
module load ABAQUS/6.14.2

# Save the initial working directory
thisdir=$(pwd -P)

# Transfer input files to the local disk on the compute node
cp input/BigCavity_FullDisk.inp ${TMP_DIR}
cd ${TMP_DIR}

#  Run the Parallel Program
srun abaqus \
    job=BigCavity_FullDisk input=BigCavity_FullDisk.inp \
    cpus=$SLURM_CPUS_PER_TASK gpus=2 \
    -verbose 1 standard_parallel=all mp_mode=threads \
    interactive

# Transfer the results back to the starting directory, within a subdirectory
# named "output"
cp -arv --no-preserve=mode ${TMP_DIR} ${thisdir}/output
rm -rfv ${TMP_DIR}/*

ABAQUS best practices

To maximise the efficiency of the cluster and minimise issues that can arise with some ABAQUS workloads, the NeSI team has developed the following best practices.

Most suitable platform

We suggest requesting SMP mode, as it typically provides better efficiency than MPI and also reduces demand for licence tokens. If you wish to run a job across more than one node, the only supported way to do so is to submit the job to the nonsusp partition. This partition provides exclusive resources and, because its jobs cannot be suspended, no job preemption is applied. Since the partition is prioritised over others, it is limited to a maximum of 32 nodes (512 cores in total) at any given time. Multi-node ABAQUS jobs found in partitions other than nonsusp may be terminated by NeSI staff. Please also note that your model may not scale well: the more cores a job uses, the greater the communication overhead and the less efficiently the cluster is used. Please assess the scalability of your model before requesting a large number of CPUs.
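
If your model genuinely needs more than one node, the submission would broadly follow the sketch below. This is a minimal, untested sketch: the nonsusp partition is the one described above, but the job name, node and task counts, and input file are placeholders, and mp_mode=mpi is the MPI counterpart of the SMP examples elsewhere on this page. Depending on the site configuration, ABAQUS may also need the allocated host list supplied through an abaqus_v6.env file.

#!/bin/bash -e

# Hypothetical multi-node ABAQUS job on the nonsusp partition (MPI mode).
# All resource figures below are placeholders; adapt them to your model.

#SBATCH --job-name        ABAQUS_MPI_job
#SBATCH --account         nesi99999
#SBATCH --partition       nonsusp
#SBATCH --time            08:00:00
#SBATCH --nodes           2
#SBATCH --ntasks-per-node 16
#SBATCH --mem-per-cpu     2G

# Load the module
module load ABAQUS/6.14.2

# mp_mode=mpi spreads the solver processes across the allocated nodes;
# cpus= is the total number of analysis processes.
abaqus job=myModel input=myModel.inp \
    cpus=$SLURM_NTASKS mp_mode=mpi \
    -verbose 1 interactive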

GPU acceleration support

For smaller jobs, GPUs can actually be slower, because the time spent transferring data between main memory and GPU memory outweighs the computational gain, so we suggest using them only for larger jobs. Another benefit of GPUs, especially for larger jobs, is that a GPU takes only one licence token, whereas many CPU tokens are required to achieve a comparable speed.
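
As a rough illustration of how many CPU tokens are involved, the ABAQUS token count for an SMP job is commonly estimated as floor(5 × N^0.422), where N is the number of cores. The formula and the helper below are our own illustration rather than something guaranteed by this page, so check the numbers against your licence documentation.

# Rough token estimate for an SMP job (unofficial formula; verify locally).
ncores=16
awk -v n="$ncores" 'BEGIN { printf "%d cores -> about %d tokens\n", n, int(5 * n^0.422) }'
# Prints "16 cores -> about 16 tokens", compared with one token per GPU.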

Job preemption based on re-queueing mechanism

If your job requires more than one day of wall time in total, please consider using native Checkpointing & Restart (C&R). This feature can minimise the impact of a hardware issue by allowing the job to restart from the last checkpoint. Native C&R can also be used to take advantage of Slurm job preemption based on a requeueing mechanism. In general, C&R relies on a fast shared filesystem, since heavy use of C&R on a slow filesystem can introduce an unreasonable overhead. Feel free to contact us to set up a workflow and assess whether checkpointing and restart suits your jobs. We have already documented how to enable ABAQUS checkpointing and restart with the Slurm Workload Manager.
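
The linked page covers the Slurm side in detail. As a rough sketch of the ABAQUS side (the job names and write frequency below are placeholders, and the exact step and increment handling differs between Abaqus/Standard and Abaqus/Explicit), restart data is requested in the original input file and picked up by a later run via the oldjob option:

# In the original input file, ask ABAQUS to write restart data, for example:
#   *RESTART, WRITE, FREQUENCY=10

# Original run
abaqus job=myModel input=myModel.inp \
    cpus=$SLURM_CPUS_PER_TASK mp_mode=threads interactive

# Later run, continuing from the last restart data that was written
abaqus job=myModel_restart oldjob=myModel input=myModel_restart.inp \
    cpus=$SLURM_CPUS_PER_TASK mp_mode=threads interactive

For Abaqus/Standard the restart input file must itself contain a *RESTART, READ line; Abaqus/Explicit also offers the simpler recover option for picking up an interrupted analysis.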

GPGPU solver acceleration in an Abaqus analysis

This option specifies acceleration of the Abaqus/Standard direct solver. This option is meaningful only on computers equipped with appropriate GPGPU hardware. The direct sparse solver supports both shared memory nodes and distributed memory nodes. On shared memory computers or a single node of a computer cluster, thread-based parallelization is used for the direct sparse solver, and high-end graphics cards that support general processing (GPGPUs) can be used to accelerate the solution. On multiple compute nodes of a computer cluster, a hybrid MPI and thread-based parallelization is used. The direct sparse solver cannot be used on multiple nodes if:

  • the analysis also includes an eigenvalue extraction procedure, or
  • the analysis requires features for which MPI-based parallel execution of element operations is not supported.

In addition, the direct sparse solver cannot be used on multiple nodes for analyses that include any of the following:

  • multiple load cases with changing boundary conditions, or
  • the quasi-Newton nonlinear solution technique.

The direct sparse solver and GPGPU

The direct sparse solver supports GPGPU acceleration for the symmetric solver only; GPGPU acceleration cannot be used with the asymmetric solver. If you request GPGPU acceleration in an analysis that uses asymmetric matrix storage (equation solver), the acceleration will simply not take effect. Note that both the Standard Dynamic Implicit quasi-static analysis and the Standard Linear Perturbation frequency analysis use the asymmetric solver and therefore cannot be accelerated with GPGPU hardware.
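
Because the choice of solver is made in the input file, a quick pre-flight check can avoid requesting GPUs that will never be used. The check below is our own suggestion rather than an official ABAQUS tool, and assumes the asymmetric solver is requested via the UNSYMM parameter in the input deck:

# Warn if the input deck requests unsymmetric matrix storage
# (e.g. *STEP, UNSYMM=YES), in which case gpus=N will have no effect.
if grep -qiE 'unsymm *= *yes' myModel.inp; then
    echo "Unsymmetric solver requested: GPU acceleration will not be used."
fi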

The model itself

All of the above assumes that the model itself has already been optimised. Before submitting large models, or large numbers of models, here are some tips for reducing the problem size:

  1. Use the smallest and simplest model possible. For example, there is no need to include damage or fracture parameters or expensive contact simulations if the problem can first be solved with simpler approximations. Once you have a first result, you can decide whether the model needs to be made more complex.
  2. If you find that a model is too simple, add features to it incrementally. Fast feedback matters when building a model up: while refining the model you are also building a mental picture of how the FEA behaves, so you need to be able to change something, run it and see the results quickly. Too often a large model is built and submitted, and when the results come back hours or days later they are not as expected.
  3. Unless your model completes within a few hours, you will almost certainly benefit from requesting restart outputs.
  4. Request no more output data (in terms of both frequency and detail) than you will need for your planned analysis.
  5. Profile the speedup you achieve with different numbers of cores for your model. There is overhead in using more cores or GPUs, and the marginal benefit of additional resources diminishes; eventually, adding more cores produces only very limited gains. A rough way to measure this is sketched below.

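A rough way to run such a scaling study, assuming one of the SMP example scripts above has been saved as abaqus_job.sl (the script name and core counts here are placeholders), is to submit the same model several times and compare the elapsed times afterwards:

# Submit the same job with different core counts (sbatch options given on the
# command line override the #SBATCH directives inside the script).
for n in 1 2 4 8 16; do
    sbatch --cpus-per-task=$n --job-name=scaling_$n abaqus_job.sl
done

# Once the jobs have finished, compare wall times:
sacct --name=scaling_1,scaling_2,scaling_4,scaling_8,scaling_16 \
      --format=JobName,AllocCPUS,Elapsed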