JupyterLab

NeSI provides a service for working on Jupyter Notebooks. As a first step JupyterLab can be used on Mahuika nodes. JupyterLab is a single-user web-based Notebook server, running in the user space. JupyterLab servers should be started preferably on a compute node, especially for compute intensive or memory intensive workloads. For less demanding work the JupyterLab server can be started on a login or virtual lab node. After starting the server your local browser can be connected. Therefore port forwarding needs to be enabled properly. The procedure will be simplified in future, but now require the following steps, which are then described in more details:

Launch JupyterLab

Since JupyterLab is a web based application, and at NeSI launched behind the firewall, a port needs to be forwarded to your local machine, where your browser should connected. This ports are numbers between 2000 and 65000, which needs to be unique on the present machine. The default port for JupyterLab is 8888, but only one user can use this at a time.

To avoid the need for modifying the following procedure again and again, we suggest to (once) select a unique number (between 2000 and 65000). This number needs to be used while establishing the port forwarding and while launching JupyterLab. In the following we use the port number 15051 (please select another number).

 

Setup SSH port forwarding 

First, the port forwarding needs to be enabled between your local machine and the NeSI system. Therewith a local port will be connected to the remote port on the NeSI system. For simplicity, we kept both numbers the same (here 15051). This can be specified on the command line in the terminal or using the MobaXterm GUI.

SSH Command Line

The ssh command need to be called with following arguments, e.g. for Mahuika:

ssh -N -L 15051:localhost:15051 mahuika

The -N is option. But due unintended behaviors in the ssh session after closing the Jupyter server, we suggest to enable this option.

Requirements

  • In the following we assume you already configured your.ssh/config to use two hop method as described in the Standard Terminal Setup.

Tips

  • For Maui_Ancil, e.g. w-mauivlab01 you may want to add the following to your .ssh/config to avoid establishing the additional hop manually.
    Host maui_vlab
       User <username>
       Hostname w-mauivlab01.maui.niwa.co.nz
       ProxyCommand ssh -W %h:%p maui
       ForwardX11 yes
       ForwardX11Trusted yes
       ServerAliveInterval 300
       ServerAliveCountMax 2
    <username> needs to be changed. Hostnames can be adapted for other nodes, e.g. w-clim01

Tips

  • If you get a error message
    bind: No such file or directory
    unix_listener: cannot bind to path: 
    try to create the following directory:
    mkdir -P ~/.ssh/sockets

 

MobaXterm GUI

Tips

  • MobaXterm has an internal terminal which acts like a linux terminal and can be configured as described in the Standard Terminal Setup. Therewith the SSH command line approach above can be used.

MobaXterm has a GUI to setup and launch sessions with port forwarding, click 'Tools > MobaSSH Thunnel (port forwarding)':

  • specify the lander02.nesi.org.nz as SSH server address (right, lower box, first line)
  • specify your user name (right, lower box, second line)
  • specify the remote server address, e.g. login.mahuika.nesi.org.nz  (right, upper box first line)
  • specify the JupyterLab port number on the local side (left) and at the remote server (right upper box, second line)
  • Save

sshTunnel.PNG

Launch the JupyterLab server 

After successfully establishing the port forwarding, we need open another terminal and login to the NeSI system in the usual way, e.g. opening a new terminal and start another ssh session:

ssh mahuika

On the NeSI system, the required Anaconda3 module needs to be loaded. If you want to use additional kernels you need to load additional modules, e.g. IRkernel (for R kernels) or Spark:

module load Anaconda3
module load IRkernel # optional

The JupyterLab server then can be started on the present node (login or virtual lab) or offloaded to a compute node. Please launch compute or memory intensive tasks on a compute node.

On login nodes / virtual labs

For very small (computational cheap and small memory) the JupyterLab can be started on the login or virtual lab using: 

jupyter lab --port 15051 --no-browser

Where, --port 15051 specifies the above selected port number and --no-browser option prevents JupyterLab from trying to open a browser on the compute/login node side. Jupyter will present output as described in the next section including the URL and a unique key, which needs to be copied in your local browser.

On compute node

Especially notebooks with computational and memory intensive tasks should run on compute nodes. Therefore, a script is provided, taking care of port forwarding to the compute node and launching JupyterLab. A session with 60 min on 1 core can be launched using:

srun --ntasks 1 -t 60  jupyter-compute 15051  # please change port number

After general output, JupyterLab prints a URL with a unique key and the network port number where the web-server is listening, this should look similar to:

...
[C 14:03:19.911 LabApp]
  To access the notebook, open this file in a browser:
      file:///scale_wlg_persistent/filesets/project/nesi99996/.local/share/jupyter/runtime/nbserver-503-open.html
  Or copy and paste one of these URLs:
      http://localhost:15051/?token=d122855ebf4d029f2bfabb0da03ae01263972d7d830d79c4

The last line will be needed in the browser later.

Therewith the Notebook and its containing tasks are performed on a compute node. You can double check e.g. using

import os
os.open('hostname').read()

More resources can be requested, e.g. by using:

srun --ntasks 1 -t 60 --cpus-per-task 5 --mem 3G jupyter-compute 15051 

Where 5 cores are requested for threading and a total memory of 3GB. Please do not use multiprocessing.cpu_count() since this is returning the total amount of cores on the node. Furthermore, if you use libraries, which implement threading align the numbers of threads (often called jobs) to the selected number of cores (otherwise the performance will be affected).

JupyterLab in your local browser 

Finally, you need to open your local web browser and copy and paste the URL specified by the JupyterLab server into the address bar. After initializing Jupyter Lab you should see a page similar to:

Jupyter.PNG

Kernels

The following JupyterLab kernel are installed:

  • Python3
  • Spark

R

verify that the module IRkernel is loaded

module load IRkernel

Spark

pySpark and SparkR is supported in NeSI Jupyter notebooks. Therefore, the module Spark needs to be loaded before starting Jupyter. Please run Spark workflows on compute nodes.

module load Spark

Packages

There are a long list of default packages provided by Anaconda3 (list all using !pip list) and R (list using installed.packages(.Library), note the list is shortened). Additionally, there are 2 modules with additional packages for Geo support, called Anaconda3-Geo and Anaconda3-Geo2 (containing packages can be listed with module help Anaconda3-Geo).

Furthermore, you can install additional packages as described on the Python and R support page.

Labels: machine learning
Was this article helpful?
0 out of 0 found this helpful