Python

Availability

Licences

Description

Python is a programming language that lets you work more quickly and integrate your systems more effectively.

The Python home page is at http://python.org.

On the NeSI systems Python is available in various versions with already installed packages. On Mahuika you will find Python/2... and Python/3... modules as well as Anaconda3, on Maui cray-python and on Maui_ancil Anaconda2 and Anaconda3 is provided.

Licensing requirements

All versions of Python available on NeSI platforms are owned and licensed by the Python Software Foundation. Each version is released under a specific open-source licence. The licences are available on the Python documentation server.

Example scripts

#!/bin/bash -e

#SBATCH --job-name    MyPythonJob
#SBATCH --time        01:00:00
#SBATCH --mem         3G

module load Python/3.7.3-gimkl-2018b

python MyPythonScript.py

MPI Example

  #!/bin/bash -e
  #SBATCH --job-name=PythonMPI
#SBATCH --ntasks=2 # Number of MPI tasks #SBATCH --time=00:30:00 #SBATCH --mem-per-cpu=1500 # Memory per logical CPU

module load Python srun python PythonMPI.py # Executes ntasks copies of the script
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
size = comm.Get_size() # Total number of MPI tasks
rank = comm.Get_rank() # Rank of this MPI task

# Calculate the data (numbers 0-9) on the MPI ranks
rank_data = np.arange(rank, 10, size)

# perform some operation on the ranks data
rank_data += 1

# gather the data back to rank 0
data_gather = comm.gather(rank_data, root = 0)

# on rank 0 sum the gathered data and print both the sum of, 
# and the unsummed data if rank == 0: print('Gathered data:', data_gather) print('Sum:', sum(data_gather))

The above Python script will create a list of numbers (0-9) split between the MPI tasks (ranks). Each task will then add one to the numbers it has, those numbers will then be gathered back to task 0, where the numbers will be summed and both the sum of, and the unsummed data is printed.

Multiprocessing Example

  #!/bin/bash -e
  #SBATCH --job-name=PytonMultiprocessing
#SBATCH --cpus-per-task=2 #Number of logical CPUs #SBATCH --time=00:10:00 #SBATCH --mem-per-cpu=1500 #Memory per logical CPU

module load Python python PythonMultiprocessing.py
import multiprocessing

def calc_square(numbers, result1): for idx, n in enumerate(numbers): result1[idx] = n*n
def calc_cube(numbers, result2): for idx, n in enumerate(numbers): result2[idx] = n*n*n if __name__ == "__main__":
numbers = [2,3,4]
# Sets up the shared memory variables, allowing the variables to be
# accessed globally across processes result1 = multiprocessing.Array('i',3) result2 = multiprocessing.Array('i',3)
# set up the processes p1 = multiprocessing.Process(target=calc_square, args=(numbers,result1,)) p2 = multiprocessing.Process(target=calc_cube, args=(numbers,result2,))
# start the processes p1.start() p2.start()
# end the processes p1.join() p2.join()
print(result1[:]) print(result2[:])

The above Python script will calculated the square and cube of an array of numbers using multiprocessing and print the results from outside of those processes, safely circumventing Python's global interpreter lock.

For more in depth examples of and descriptions of Multiprocessing in Python you may find this Multithreading/Multiprocessing Youtube tutorial series helpful

Python Packages

Programmers around the world have written and released many packages for Python, which are not included with the core Python distribution and must be installed separately. Each Python environment module comes with its own particular suite of packages, and the system Python has its own installed packages.

The provided packages can be listed using

module load Python
python -c "help('modules')"

Adding packages

Additional packages can be requested via a NeSI support ticket if there is wider need for it. If you just need it for your self or want to provide additional packages for your project group, you can install it using the following methods:

install into your $HOME

this is the simplest way to install additional packages, but you might fill your $HOME quota and cannot share installations with collaborators.

A packages prodXY can be installed using:

pip install --user prodXY

install into project directory

This method is slightly more complicated, but provides the advantage that you do not fill up your $HOME  directory quota and you can easily share these installed packages with your collaborators. This method requires:

  • installing Python package into a "remote" location (your project directory
  • define the $PATH and $PYTHONPATH in your environment (e.g. using a module file)

In the following we install a python3 package called  prodXY into /nesi/project/<projectID>/PyPackages and create a module for it.

Using Easybuild

Easybuild is a Package provisioning tool, please read also here. Therefore you need a configuration file, e.g. for PyPI python packages (here cutadapt), which could be located in /nesi/project/<projectID>/easybuildinstall/easyconfigs and the following would be named as cutadapt-2.3-gimkl-2018b-Python-3.7.3.eb:

# Easybuild Python package template
# Author: Mandes Schoenherr
# NeSI - the New Zealand eScience Infrastructure

easyblock = "PythonPackage"

name = 'cutadapt' ### specify the name here
version = '2.3' ### specify the version

homepage = 'https://cutadapt.readthedocs.io/' ### add a reference URL and a description below
description = """ cutadapt removes adapter sequences
from high-throughput sequencing data. """

toolchain = {'name':'gimkl', 'version':'2018b'}
source_urls = [PYPI_SOURCE]
sources = [SOURCELOWER_TAR_GZ]

python = 'Python'
pyver = "3.7.3"
pyshortver = '.'.join(pyver.split('.')[:2])
versionsuffix = "-%s-%s" % (python, pyver)
dependencies = [ (python, pyver) ]
sanity_check_paths = {
'files': ['bin/cutadapt'],
'dirs': ['lib/python%s/site-packages/' % pyshortver],
}

moduleclass = 'bio' ### change the type of software

Then we can install this package using:

export NESI_EASYBUILD_PROJECT_ID=<projectID>
module load project
eb cutadpt-2.3-gimkl-2018b-Python-3.7.3.eb

After successful install we should see the module using:

module --ignore-cache avail cutadapt

And the package can be used by all project members (after specifying the projectID) using:

module load project
module load cutadapt

Manual install and provisioning

Depending on the package you can install packages using:

  • the Python Package Manager PIP
$ pip install --prefix /nesi/project/<projectID>/PyPackages cogent
  • use source code
$ python setup.py install --prefix /nesi/project/<projectID>/PyPackages

 

Python will not find the package by default, therefore you need to add the location of the package to the $PYTHONPATH. Furthermore, some packages provide binaries, which are accessible by adding the directory to $PATH

$ export PYTHONPATH=/nesi/project/<projectID>/PyPackages/lib/python3.6/site-packages:$PYTHONPATH
$ export PATH=/nesi/project/<projectID>/PyPackages/bin:$PATH

NOTE: If you install multiple packages in the same location, e.g. /nesi/project/<projectID>/PyPackages, you just need to set these environment variable once.

Creating a modulefile

These environment variables can be handled in a module file. Thus the packages (and others if desired) can be accessed by you and the project collaborators.

Therefore we create a modulefile let's say at /nesi/project/<projectID>/modulefiles/PyXtra and assume we installed a Python package into /nesi/project/<projectID>/PyPackages:

#%module
module load Python/3.7.3-gimkl-2018b
   # provide a description
whatis "The packageXY for python."
proc ModulesHelp { } {  puts stderr "This module loads the packageXY. It requires python3." }

set PKG_PREFIX /nesi/project/<projectID>/PyPackages
   # add the location of binaries to PATH, such they are immediately accessible
prepend-path PATH $PKG_PREFIX/bin
   # add to PYTHONPATH to access python packages
prepend-path PYTHONPATH $PKG_PREFIX/lib/python3.7/site-packages/

To use this module (and all other modules you create in that directory) you add the following to your $HOME/.bashrc (need to source it, or re-login to activate the changes):

module use /nesi/project/<projectID>/modulefiles

Then you can simple load the module via:

module load PyXtra

 

Further notes

iPython

iPython (interactive Python) is an enhanced tool for accessing a Python command line. It is available in many NeSI Python modules.

Starting iPython

To open an iPython console, simply run the ipython command:

[jblo123@build-wm ~]$ module load Python/3.6.3-gimkl-2017a
[jblo123@build-wm ~]$ ipython

Listing available functions

You can use iPython to list the functions available that start with a given string. Please note that if the string denotes a module (i.e., it has a full stop somewhere in it), that module (or the function you want from it) must first be imported, using either an "import X" statement or a "from X import Y" statement.

import os
os.<TAB>   # List all functions in the os module
os.O_<TAB> # List functions starting with "O_" from the os module
len<TAB>   # List functions starting with "len"

Here,

o<TAB>

or even

os<TAB>

and expect to see the methods and values provided by the os module - you have to put the full stop after the "os" if you want to do that.

Getting information about an object

In iPython, you can query any object by typing the object name followed by a question mark (?), then hitting Enter. For instance:

In [1]: x = 5
In [2]: x?
Type:        int
String form: 5
Docstring:
int(x=0) -> int or long
int(x, base=10) -> int or long

Convert a number or string to an integer, or return 0 if no arguments
are given.  If x is floating point, the conversion truncates towards zero.
If x is outside the integer range, the function returns a long instead.

If x is not a number or if base is given, then x must be a string or
Unicode object representing an integer literal in the given base.  The
literal can be preceded by '+' or '-' and be surrounded by whitespace.
The base defaults to 10.  Valid bases are 0 and 2-36.  Base 0 means to
interpret the base from the string as an integer literal.
>>> int('0b100', base=0)
4

You can also do this on functions (len?), methods (os.mkdir?) and modules (os.path?). If you try to do it on something that isn't defined yet, Python will tell you that the object in question couldn't be found.

Quitting iPython

Just enter the quit command at the iPython prompt.

Labels: mahuika tier1 app
Was this article helpful?
0 out of 0 found this helpful
a.homepage:before