NeSI provides a service for working with Jupyter Notebooks. As a first step, JupyterLab can be used on Mahuika nodes. JupyterLab is a single-user, web-based Notebook server that runs in user space. JupyterLab servers should preferably be started on a compute node, especially for compute-intensive or memory-intensive workloads. For less demanding work, the JupyterLab server can be started on a login or virtual lab node. After starting the server, you connect your local browser to it, which requires port forwarding to be set up correctly. The procedure will be simplified in future, but for now it requires the following steps, which are described in more detail below:
- Launch JupyterLab
Since JupyterLab is a web-based application, and at NeSI it is launched behind the firewall, a port needs to be forwarded to your local machine, where your browser connects. Ports are identified by numbers between 2000 and 65000, and the number you use needs to be unique on the machine in question. The default port for JupyterLab is 8888, but only one user can use it at a time.
To avoid having to modify the following procedure again and again, we suggest selecting a unique number (between 2000 and 65000) once. This number is used both when establishing the port forwarding and when launching JupyterLab. In the following we use the port number 15051 (please select another number).
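One way to pick a candidate number is to check whether a port is currently unused on the machine you are logged in to. The following Python sketch is an illustration only: the helper names `is_port_free` and `suggest_port` are hypothetical, and a port that is free now can still be taken by another user later or be in use on a different node.

```python
import random
import socket


def is_port_free(port, host="127.0.0.1"):
    """Return True if binding to host:port succeeds on this machine right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "address already in use": something else holds the port.
            return False
    return True


def suggest_port(low=2000, high=65000, attempts=100):
    """Try random ports in [low, high] and return the first one that is free."""
    for _ in range(attempts):
        candidate = random.randint(low, high)
        if is_port_free(candidate):
            return candidate
    raise RuntimeError("no free port found, try again")


if __name__ == "__main__":
    print(suggest_port())
```

Run it on the node where JupyterLab will listen, and keep the resulting number for all later steps.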
Setup SSH port forwarding
First, port forwarding needs to be enabled between your local machine and the NeSI system. This connects a local port to the remote port on the NeSI system. For simplicity, we keep both numbers the same (here 15051). The forwarding can be specified on the command line in a terminal or using the MobaXterm GUI.
SSH Command Line
The ssh command needs to be called with the following arguments, e.g. for Mahuika:
ssh -N -L 15051:localhost:15051 mahuika
Here -N is optional but recommended: it tells ssh not to execute a remote command, so the session only forwards the port.
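The argument to -L has the form local_port:target_host:remote_port, where target_host is resolved on the remote side. As an illustration, the small Python helper below assembles the same command for any port, which can help avoid typos when the number changes. The function name `forwarding_command` is hypothetical, and the `mahuika` host alias is assumed to exist in your .ssh/config.

```python
import shlex


def forwarding_command(port, host="mahuika", remote_port=None):
    """Build the ssh port-forwarding command as an argument list."""
    if not 2000 <= port <= 65000:
        raise ValueError("choose a port between 2000 and 65000")
    if remote_port is None:
        remote_port = port  # same number on both sides, as in the text
    return ["ssh", "-N", "-L", f"{port}:localhost:{remote_port}", host]


if __name__ == "__main__":
    # Prints: ssh -N -L 15051:localhost:15051 mahuika
    print(shlex.join(forwarding_command(15051)))
```

The list form can be passed directly to `subprocess.run` if you prefer to start the tunnel from a script.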
- In the following we assume you have already configured your .ssh/config to use the two-hop method as described in the Standard Terminal Setup.
- For Maui_Ancil, e.g. w-mauivlab01, you may want to add the following to your .ssh/config to avoid establishing the additional hop manually. <username> needs to be changed, and the hostname can be adapted for other nodes:
Host maui_vlab
    User <username>
    Hostname w-mauivlab01.maui.niwa.co.nz
    ProxyCommand ssh -W %h:%p maui
    ForwardX11 yes
    ForwardX11Trusted yes
    ServerAliveInterval 300
    ServerAliveCountMax 2
MobaXterm has a GUI to set up and launch sessions with port forwarding. Click 'Tools > MobaSSHTunnel (port forwarding)':
- specify lander.nesi.org.nz as the SSH server address (right, lower box, first line)
- specify your user name (right, lower box, second line)
- specify the remote server address, e.g. login.mahuika.nesi.org.nz (right, upper box, first line)
- specify the JupyterLab port number on the local side (left) and on the remote server (right, upper box, second line)
Launch the JupyterLab server
After successfully establishing the port forwarding, open another terminal and log in to the NeSI system in the usual way, e.g. by starting another ssh session.
On the NeSI system, the required Anaconda3 module needs to be loaded. If you want to use additional kernels you need to load additional modules, e.g. IRkernel (for R kernels) or Spark:
module load Anaconda3
module load IRkernel # optional
The JupyterLab server can then be started on the current node (login or virtual lab) or offloaded to a compute node. Please launch compute-intensive or memory-intensive tasks on a compute node.
On login nodes / virtual labs
For very small workloads (computationally cheap and with low memory use), JupyterLab can be started on the login or virtual lab node using:
jupyter lab --port 15051 --no-browser
--port 15051 specifies the port number selected above, and the --no-browser option prevents JupyterLab from trying to open a browser on the compute/login node side. Jupyter will print output as described in the next section, including the URL and a unique key, which need to be copied into your local browser.
On compute node
Notebooks with compute-intensive or memory-intensive tasks in particular should run on compute nodes. For this purpose, a script is provided that takes care of port forwarding to the compute node and launches JupyterLab. A 60-minute session on 1 core can be launched using:
srun --ntasks 1 -t 60 jupyter-compute 15051 # please change port number
After some general output, JupyterLab prints a URL containing a unique key and the network port number on which the web server is listening. This should look similar to:
...
[C 14:03:19.911 LabApp]
    To access the notebook, open this file in a browser:
        file:///scale_wlg_persistent/filesets/project/nesi99996/.local/share/jupyter/runtime/nbserver-503-open.html
    Or copy and paste one of these URLs:
        http://localhost:15051/?token=d122855ebf4d029f2bfabb0da03ae01263972d7d830d79c4
The last line will be needed in the browser later.
The Notebook and the tasks it contains are then executed on a compute node. You can double check this, e.g. by printing the node's hostname from within a notebook.
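One possible check, shown here as an assumption rather than as the command the original text intended, is to run the following in a notebook cell. It reports the name of the host the kernel runs on and, if present, the `SLURM_JOB_ID` environment variable, which Slurm sets inside a job. The function name `where_am_i` is hypothetical.

```python
import os
import socket


def where_am_i():
    """Report the hostname and the Slurm job ID (None outside a Slurm job)."""
    return {
        "hostname": socket.gethostname(),
        "slurm_job_id": os.environ.get("SLURM_JOB_ID"),
    }


print(where_am_i())
```

If the hostname is that of a login node, or no job ID is reported, the server was probably not started through srun.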
More resources can be requested, e.g. by using:
srun --ntasks 1 -t 60 --cpus-per-task 5 --mem 512MB jupyter-compute 15051
Here 5 cores are requested for threading. Note that --mem sets the total memory of the job, so the command above requests 512 MB. Please do not use multiprocessing.cpu_count() to decide how many workers to start, since it returns the total number of cores on the node rather than the number allocated to your job. Furthermore, if you use libraries that implement threading, set their number of threads (often called jobs) to the number of cores you requested; otherwise performance will suffer.
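A minimal sketch of an alternative to multiprocessing.cpu_count() is given below. It reads `SLURM_CPUS_PER_TASK`, which Slurm sets when --cpus-per-task is given, and otherwise falls back to the cores the process is allowed to run on. The function name `allocated_cpus` is hypothetical, and whether the affinity fallback reflects your allocation depends on how the cluster is configured.

```python
import os


def allocated_cpus(environ=None, default=1):
    """Return the number of cores available to this job, not the node total."""
    if environ is None:
        environ = os.environ
    value = environ.get("SLURM_CPUS_PER_TASK", "")
    if value.isdigit() and int(value) > 0:
        return int(value)
    # Fallback on Linux: count the cores this process is allowed to run on.
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return default


n_jobs = allocated_cpus()
print(f"using {n_jobs} worker(s)")
```

The result can then be passed on explicitly, for example as `multiprocessing.Pool(processes=n_jobs)`, so that the number of workers matches the request made to srun.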
JupyterLab in your local browser
Finally, open your local web browser and paste the URL printed by the JupyterLab server into the address bar. Once JupyterLab has initialized, you should see the JupyterLab start page.
The following JupyterLab kernels are installed:
To use the R kernel, verify that the module IRkernel is loaded:
module load IRkernel
pySpark and SparkR are supported in NeSI Jupyter notebooks. To use them, the Spark module needs to be loaded before starting Jupyter. Please run Spark workflows on compute nodes.
module load Spark
Anaconda3 provides a long list of default packages (list them all using !pip list), as does R (list them using installed.packages(.Library); note that the list is shortened). Additionally, there are 2 modules with additional packages for Geo support, called Anaconda3-Geo2 (the packages they contain can be listed with module help Anaconda3-Geo).
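If you want the package list as data inside a Python notebook, rather than as printed text, the standard library can provide it. The sketch below is one way to do this; the function name `installed_packages` is hypothetical, and it only sees packages visible to the running Python kernel.

```python
from importlib.metadata import distributions


def installed_packages():
    """Return a sorted list of (name, version) for packages visible to this kernel."""
    found = {}
    for dist in distributions():
        name = dist.metadata["Name"]
        if name:  # skip entries with broken or missing metadata
            found[name.lower()] = (name, dist.version)
    return [found[key] for key in sorted(found)]


for name, version in installed_packages()[:10]:
    print(name, version)
```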