TensorFlow on GPUs

TensorFlow is an open-source library for machine learning. It can train and run deep neural networks for tasks such as handwritten digit classification, image and word recognition, and natural language processing.

TensorFlow is callable from Python, with the numerically intensive parts of the algorithms implemented in C++ for efficiency. Here we'll show how to run TensorFlow version 1.10 with GPU support, which is installed on Mahuika. If you want to run TensorFlow on CPUs instead, have a look at our article TensorFlow on CPUs for tips on how to configure TensorFlow and Slurm for optimal performance.

Example: classifying flower pictures

Let's assume we want to classify pictures of flowers - the example below is based on the TensorFlow image retraining tutorial.

TensorFlow comes with deep neural network models that have been pre-trained with millions of images. The vast majority of the neural network weights of these models have already been optimised for feature detection and will often work well for a new classification problem. All we need to do is retrain the last neural network layer. This is known as "transfer learning".
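The idea can be sketched with a toy example (pure NumPy, made-up data, and a made-up frozen "feature extractor" standing in for the pre-trained network - not the actual Inception V3 model): the early layers are frozen, and gradient descent updates only the final softmax layer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend these are fixed, pre-trained feature-extractor weights (frozen).
W_frozen = rng.normal(size=(64, 16)) * 0.1

def extract_features(images):
    # Stand-in for the pre-trained network up to its second-to-last layer.
    return np.maximum(images @ W_frozen, 0.0)  # ReLU activation

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Toy "images" and labels for the 5 flower classes.
images = rng.normal(size=(200, 64))
labels = rng.integers(0, 5, size=200)
onehot = np.eye(5)[labels]

# Transfer learning: compute the frozen features once, then train
# ONLY the weights of the last layer.
features = extract_features(images)
W_last = np.zeros((16, 5))
for _ in range(200):
    probs = softmax(features @ W_last)
    grad = features.T @ (probs - onehot) / len(images)
    W_last -= 0.1 * grad  # gradient step on the cross-entropy loss
```

Because only the small last-layer weight matrix is updated, retraining needs far fewer images and much less compute than training the whole network from scratch.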

Initial setup

We'll need a script to retrain the model (a second script, used to classify images, is downloaded further below):

curl -LO https://github.com/tensorflow/hub/raw/master/examples/image_retraining/retrain.py

We'll also need additional functionality from tensorflow-hub, which we will install in our home directory using the following commands (on Mahuika):

ml TensorFlow/1.10.1-gimkl-2017a-Python-3.6.3
pip install tensorflow-hub==0.3.0 --user

Getting flower images from the web

Next we need a directory containing photos of flowers that have been labelled as daisies, dandelions, roses, sunflowers and tulips:

curl -LO http://download.tensorflow.org/example_images/flower_photos.tgz
tar xzf flower_photos.tgz
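As a quick sanity check of the download, you can count how many photos each label contains. A minimal sketch (the function name is ours, not part of TensorFlow):

```python
import os

def count_images_per_label(root):
    """Return {label: number_of_image_files} for each subdirectory of root."""
    counts = {}
    for label in sorted(os.listdir(root)):
        path = os.path.join(root, label)
        if os.path.isdir(path):
            counts[label] = len([f for f in os.listdir(path)
                                 if f.lower().endswith((".jpg", ".jpeg", ".png"))])
    return counts

# After extracting flower_photos.tgz:
# print(count_images_per_label("flower_photos"))
```

Each subdirectory name (daisy, dandelion, roses, sunflowers, tulips) becomes a class label during retraining.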


How to retrain TensorFlow 

The following Slurm script will retrain an existing neural network (Inception V3) on a Mahuika GPU. Note that we need to pass `--partition gpu` and `--gres gpu:1` to Slurm to launch the execution on a GPU. On a shared computer like Mahuika we want to make sure the results are saved in a directory that is shared across nodes, as opposed to /tmp, the default. Hence we pass additional command line options to retrain.py specifying the output directories. The script removes results from previous runs before starting, so it is safe to run it multiple times.


#!/bin/bash -e

#SBATCH --job-name TensorFlow-GPU
#SBATCH --partition gpu
#SBATCH --gres gpu:1
#SBATCH --ntasks 1
#SBATCH --cpus-per-task 1
#SBATCH --time 01:00:00
#SBATCH --mem 4G

module load TensorFlow/1.10.1-gimkl-2017a-Python-3.6.3

# we'll save the second-to-last layer weights in this directory, let's
# make sure the directory exists. Subsequent runs will be faster once
# the bottleneck directory is populated
mkdir -p bottleneck

# create directories that will hold the output and clean up previous
# results if present

rm -rf output
mkdir -p output/intermediate output/summaries output/chkpnt

# retrain the neural network. Type python retrain.py -h for a full
# list of options
srun python retrain.py --image_dir=flower_photos \
    --bottleneck_dir=bottleneck --output_graph=output/graph.pb \
    --output_labels=output/labels.txt \
    --intermediate_output_graphs_dir=output/intermediate \
    --summaries_dir=output/summaries --saved_model_dir=output/model

Copy and paste the above into a file flowers.sl and type `sbatch flowers.sl` to submit the job.

You will find the execution time to be significantly longer the first time you run the job, as the script saves the bottleneck values (the outputs of the second-to-last layer) to files. Subsequent runs should take about 10 minutes.
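The caching pattern behind this speed-up can be sketched as: compute each expensive bottleneck value once, save it to disk, and reload it on later runs. The function below is illustrative, not the actual retrain.py implementation:

```python
import os
import numpy as np

def get_bottleneck(image_id, bottleneck_dir, compute_fn):
    """Return cached bottleneck values, computing and saving them if missing."""
    path = os.path.join(bottleneck_dir, image_id + ".npy")
    if os.path.exists(path):
        return np.load(path)           # fast path: subsequent runs
    values = compute_fn(image_id)      # slow path: first run only
    os.makedirs(bottleneck_dir, exist_ok=True)
    np.save(path, values)
    return values
```

This is why the bottleneck directory must persist between runs (and why we created it with `mkdir -p bottleneck` rather than using /tmp).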

Check your output file and look for the "Final test accuracy" line, which should report between 85% and 95%. Each run will yield a slightly different value due to the randomness of the training process.

How to classify a flower picture

Once the network has been trained you can find out what type of flower is in a picture. First download the classification script:

curl -LO https://github.com/tensorflow/tensorflow/raw/master/tensorflow/examples/label_image/label_image.py

As of 2 December 2019, the downloaded script is no longer compatible with TensorFlow 1.10 used in this example. However, this can easily be fixed with the command

sed 's/\.compat\.v1//g' label_image.py > label_imageV1.py

Next, choose an image, for instance flower_photos/daisy/21652746_cc379e0eea_m.jpg, and type:

python label_imageV1.py --image=flower_photos/daisy/21652746_cc379e0eea_m.jpg \
--graph=output/graph.pb --labels=output/labels.txt \
--input_layer=Placeholder --output_layer=final_result

which should return something like

daisy 0.997859
sunflowers 0.00132799
dandelion 0.000446326
tulips 0.000278063
roses 8.88173e-05

Again, your results might be slightly different. In this case the model is more than 99 percent confident that the image contains a daisy.
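The scores in the output are softmax probabilities over the five labels, so they sum to one. A sketch of how such scores arise from the network's raw outputs (the logit values below are made up for illustration):

```python
import numpy as np

def softmax(logits):
    # Subtract the max for numerical stability; the result sums to 1.
    e = np.exp(logits - np.max(logits))
    return e / e.sum()

labels = ["daisy", "sunflowers", "dandelion", "tulips", "roses"]
logits = np.array([8.0, 1.4, 0.3, -0.2, -1.3])  # made-up raw network outputs

probs = softmax(logits)
for i in np.argsort(probs)[::-1]:  # print labels from most to least likely
    print(labels[i], probs[i])
```

Because the softmax exponentiates the logits, even a modest gap in raw outputs turns into a very lopsided probability, which is why the top label dominates so strongly.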

Recommended Resources

Users may want to consult https://www.tensorflow.org/guide/function - there are many environment variables that can be used to control execution speed.

