Logging in
For external users, please connect to lander02.nesi.org.nz. Do not use lander.nesi.org.nz: at present that lander only provides access to Kupe, although in future it will become the lander for the HPCF in Wellington. Once on the lander node, or if you are on the NIWA network, simply ssh to login.maui.niwa.co.nz.
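For example, from an external machine the two hops look like this (a minimal sketch; replace username with your own login):

```sh
# External users: connect to the NeSI lander node first ...
ssh username@lander02.nesi.org.nz
# ... then hop to the Maui login node (this second step also works
# directly from within the NIWA network).
ssh username@login.maui.niwa.co.nz
```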
Programming Environment
The Programming Environment on Maui has been upgraded relative to Kupe. On the Maui_Ancil (CS500) nodes it now includes the Cray Programming Environment as well as the Intel and GNU toolchains. The Cray XC50 programming environments for CCE, GNU, and Intel have also been upgraded, with more recent versions of the Cray and GNU compilers, while the Intel compiler version is the same as on Kupe.
As noted here, the CS500 Programming Environment does not yet include the ability to swap whole programming environments in the way the XC50 does. This capability will be included in the next release of the Cray CS programming environment, due in October.
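For reference, swapping programming environments on the XC50 uses the standard Cray module mechanism; a typical swap looks like the sketch below (the exact module names depend on the installed PE release):

```sh
# On the maui XC50: swap from the default Cray compiler environment to GNU ...
module swap PrgEnv-cray PrgEnv-gnu
# ... or to the Intel toolchain.
module swap PrgEnv-cray PrgEnv-intel
```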
Software Stack
The software stack on maui, maui_ancil and the Virtual Labs has been rebuilt from scratch. While all relevant software has been migrated from kupe, kupe_mp and the Virtual Labs, a few important changes have been made:
- There is now a clearer separation between the maui_ancil and Virtual Labs (CS500) software stack and the maui (XC50) software stack: the maui software stack only contains software needed for large jobs (e.g., XIOS and the grib_api library), while general tools and libraries (e.g., Anaconda, NCO, and Mule) can now only be found on the maui_ancil and Virtual Lab machines (see the sketch after this list). This was necessary to make best use of system capabilities, such as InfiniBand file access on maui_ancil and the Virtual Labs, and to avoid single-core jobs running on maui (i.e., the XC50).
- The maui_ancil and Virtual Labs software stack now uses a more recent GNU toolchain based on GCC v7.1.0, to benefit from Intel Skylake architecture capabilities. Module names have therefore changed slightly with respect to Kupe.
- To avoid clutter, older versions of the same software have not been migrated, with the exception of NCL.
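As a sketch of the separation described above, general tools are loaded only on the maui_ancil or Virtual Lab side; the module names below are illustrative, so check module avail for the exact names and versions installed:

```sh
# On a maui_ancil login node or a Virtual Lab (not on the maui XC50):
module avail Anaconda      # list the Anaconda modules that are installed
module load Anaconda3      # hypothetical module name/version
module load NCO            # hypothetical module name/version
```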
Rose and Cylc users
Maui requires users to work in their project directories on the /nesi/project and /nesi/nobackup file systems; personal directories are no longer available. Rose and Cylc are configured to place the "cylc-run" directory in /nesi/nobackup/$PROJECT/$USER, so please add
export PROJECT=<your project number>
to your .profile on Maui. You may need to reset this variable for particular sessions if you work on multiple projects.
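For example, you could append the variable to your .profile and pick it up in the current session; the project codes below are placeholders, so substitute your own:

```sh
# Add your project code to ~/.profile on Maui (niwa00001 is a placeholder).
echo 'export PROJECT=niwa00001' >> ~/.profile
source ~/.profile

# If you work on multiple projects, override it for the current session as needed:
export PROJECT=niwa00099   # also a placeholder
```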
Other Key Differences between Kupe and Maui
For NIWA (and NeSI) users, the services available on Maui are similar to those on Kupe. However, based on our experience with Kupe, we have made some changes that mean you will need to adjust some of your work habits, job scripts, and workflows.
The following tables outline the differences between:
- the Kupe and Maui Cray XC50 hardware;
- the Kupe and Maui Cray CS500 ancillary nodes (termed "multi-purpose" nodes on Kupe and ancillary nodes on Maui);
- the Kupe and Maui storage sub-systems;
- the Slurm partitions on kupe and maui;
- the Slurm partitions on kupe_mp and maui_ancil; and
- the virtual lab and cylc/rose services available on the two environments.
Each of these differences is described in the following sections.
Kupe and Maui Cray XC50s
Table 1: Essential differences (and similarities) between Kupe and Maui Cray XC50 Supercomputers
| Component | kupe (XC50) | maui (XC50) | Comment |
| --- | --- | --- | --- |
| Login nodes (also known as eLogin nodes) | 80 cores in 2 × Skylake (Gold 6148, 2.4 GHz, dual socket, 20 cores per socket) nodes | 80 cores in 2 × Skylake (Gold 6148, 2.4 GHz, dual socket, 20 cores per socket) nodes | Identical |
| Compute nodes | 4,160 Skylake (Gold 6148, 2.4 GHz, dual socket, 20 cores per socket) cores | 18,560 Skylake (Gold 6148, 2.4 GHz, dual socket, 20 cores per socket) cores | Kupe is air cooled; Maui is liquid cooled |
| Hyperthreading | Enabled | Enabled | Identical |
| Theoretical peak performance | 0.32 PF | 1.43 PF | |
| Memory capacity per login (build) node | 768 GB | 768 GB | Identical |
| Compute node memory | All nodes have 96 GB | 232 nodes have 96 GB; 232 nodes have 192 GB | 50% of Maui’s nodes have “large” memory |
| Total system memory | 10.0 TB | 66.8 TB | |
| Interconnect | Cray Aries, Dragonfly topology | Cray Aries, Dragonfly topology | Identical |
| Workload manager | Slurm (Multi-Cluster) | Slurm (Multi-Cluster) | In Wellington, jobs can be submitted to all Slurm clusters (maui, maui_ancil and mahuika) |
| Operating system | Cray Linux Environment: SLES 12 SP2 and CLE 6.0 UP04 | Cray Linux Environment: SLES 12 SP2 and CLE 6.0 UP06 | Maui is on a later version of the Cray OS |
Kupe and Maui Cray CS500s
Table 2: Essential differences (and similarities) between Kupe and Maui Cray CS500 Ancillary Nodes.
| Component | kupe_mp | maui_ancil | Comment |
| --- | --- | --- | --- |
| Ancillary nodes | 440 cores in 11 × Skylake (Gold 6148, 2.4 GHz, dual socket, 20 cores per socket) nodes | 1,120 cores in 28 × Skylake (Gold 6148, 2.4 GHz, dual socket, 20 cores per socket) nodes | Some maui_ancil nodes are available to non-NIWA NeSI users |
| Hyperthreading | Enabled | Enabled | Identical |
| Local disk | 1.2 TB SSD | 1.2 TB SSD | Identical |
| Operating system | CentOS 7 | CentOS 7 | Identical |
| GPGPUs | 2 × Tesla P100 | 8 × Tesla P100 | |
| Remote visualisation | | | Identical |
| Memory per node | 768 GB | 768 GB | Identical |
| Interconnect | EDR (100 Gb/s) InfiniBand | EDR (100 Gb/s) InfiniBand | Identical |
| Workload manager | Slurm (Multi-Cluster) | Slurm (Multi-Cluster) | Identical |
| OpenStack | Used to provide Virtual Machines to users | Used to provide Virtual Machines to users | Identical |
Kupe and Maui Storage Systems (shared with Mahuika)
Table 3: Essential differences (and similarities) between Kupe and Maui Storage Subsystems.
| Storage | On: kupe & kupe_mp | On: maui, maui_ancil & mahuika | Comment |
| --- | --- | --- | --- |
| File system hardware | IBM ESS: 1 × GS4S, 2 × GL6S, ~70 GB/s bandwidth | IBM ESS: 1 × GS4S, 4 × GL6S, ~140 GB/s bandwidth; IBM ESS: 1 × GS4, 1 × GL6S | The GS4 and one GL6S are provided for NIWA forecast operations. |
| File system software | IBM Spectrum Scale v4.2.3 | IBM Spectrum Scale v5.0.1 | Scale v5 is faster than Scale v4. |
| /home and /project | 626 TB | 2,505 TB | Includes NIWA-only storage |
| /nobackup | 2,505 TB | 6,263 TB | Includes NIWA-only storage |
| /devoper | 470 TB | 626 TB | Available to NIWA only |
| /oper | 626 TB | 939 TB | Available to NIWA only |
| /nearline | 470 TB | 626 TB | Accessed via the Librarian |
| Offline storage | >100 PB | >100 PB | Maui uses LTO8 drives; Kupe uses LTO7 drives. |
More detailed information about the NeSI filesystems is available here.
Kupe and Maui Slurm Partitions
More detailed information about the NeSI Slurm partitions on the maui and maui_ancil Slurm clusters is available here.
Cray XC50 Slurm Clusters
Table 4: Main differences between the configurations of Slurm on Kupe and Maui (XC50s)
| Users | Kupe | Maui | Comment |
| --- | --- | --- | --- |
| NeSI | Debug | nesi_research | Single partition on maui. The QoS nesi_debug provides high-priority access to the system for debug jobs. |
| NIWA | NIWA_Research, Operations | niwa_research, niwa_operations | Two partitions on maui. The niwa_operations partition comprises 128 large-memory nodes. NIWA research jobs may run on the niwa_operations partition if they can be pre-empted and use less than ~85 GB of memory on each node. The QoS niwa_debug provides high-priority access to the system for debug jobs. |
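As a minimal sketch of submitting to the maui XC50 cluster, the job header below uses the partition and QoS names from Table 4; the account code, resource sizes, and executable are placeholders to adapt to your own work:

```sh
#!/bin/bash -l
#SBATCH --job-name=example
#SBATCH --partition=nesi_research   # the single NeSI partition on maui
#SBATCH --qos=nesi_debug            # optional: high-priority QoS for short debug jobs
#SBATCH --account=nesi12345         # placeholder project code
#SBATCH --nodes=1
#SBATCH --ntasks=40
#SBATCH --time=00:10:00

srun ./my_program                   # placeholder executable
```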
Cray CS500 Slurm Clusters
Table 5: Main differences between the configurations of Slurm on the Kupe and Maui Ancillary (CS500) nodes.
| Users | kupe_mp | maui_ancil | Comment (note name change from _mp to _ancil) |
| --- | --- | --- | --- |
| NeSI | N/A | nesi_gpu, nesi_igpu, nesi_prepost | Provides access to the 5 × P100 GPGPUs (each is a VM with 8 Slurm CPUs and 1 × P100 GPGPU). nesi_igpu provides interactive access to a VM with 8 Slurm CPUs and 1 × P100 GPGPU during business hours (7 AM to 8 PM, Monday to Friday). |
| NIWA | ec_cs, sat, general | ec_cs, ec_sat, niwa_work | Used for operational serial jobs (ec_cs and ec_sat) and to support serial processing required by NIWA virtual labs. |
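As a sketch, an interactive GPU session on maui_ancil via the nesi_igpu partition might be requested as shown below; the time limit is a placeholder, and the GPU resource flag may not be needed if the whole VM is allocated to the job:

```sh
# Request an interactive shell on a GPU VM (8 Slurm CPUs, 1 × P100), business hours only.
# The --gres flag is shown for completeness and may not be required on this partition.
srun --partition=nesi_igpu --cpus-per-task=8 --gres=gpu:1 --time=01:00:00 --pty bash
```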
Kupe and Maui Virtual Lab and Cylc Services
Table 6: Comparison of Virtual Lab and cylc service node names, where the fully qualified node name on Maui_Ancil is <VL-name>.maui.niwa.co.nz.

| kupe (VL-name) | maui (VL-name) | Comment |
| --- | --- | --- |
| hafs01 | w-nwp01 | Virtual lab for NWP focused research |
| clim01 | w-clim01 | Virtual lab for Climate Modelling focused research |
| | w-haz01 | Virtual lab for Hazards focused research |
| | w-mar01 | Virtual lab for Marine focused research (available in late September) |
| | w-cliops01 | Virtual lab for CLIDB (available in late September) |
| sat-login01 | w-sat-login01 | Virtual lab for satellite focused research (available in late September) |
| cylc01 | w-cylc01 | To provide (research) cylc services |
| | w-cylc02 | To provide (research) cylc service resilience |
| | w-cylc03 | To provide (research) cylc service resilience |
| rose01 | w-rose01 | For Rose services |
| ec-login01 | w-ec-login01 | Virtual lab for EcoConnect development (available in late September) |
| ec-satingest01 | w-ec-satingest01 | Satellite data ingest service (available in late September) |
| ec-cylc01 | w-ec-cylc01 | To provide (operational) cylc services (available in late September) |
| ec-cylc02 | w-ec-cylc02 | To provide (operational) cylc service resilience |
| ec-cylc03 | w-ec-cylc03 | To provide (operational) cylc service resilience |
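Once on the NIWA network or a Maui login node, a virtual lab or cylc node can be reached by its fully qualified name; for example (using the w-clim01 name from the table above, with a placeholder username):

```sh
ssh username@w-clim01.maui.niwa.co.nz
```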