TensorFlow on GPUs

TensorFlow is an open source library for machine learning. TensorFlow can train and run deep neural networks for handwritten digit classification, image and word recognition, and natural language processing.

TensorFlow is callable from Python, with the numerically intensive parts of the algorithms implemented in C++ for efficiency. Here we'll show how to run the GPU-enabled build of TensorFlow 1.10 that is installed on Mahuika. If you want to run TensorFlow on CPUs instead, have a look at our article TensorFlow on CPUs for tips on how to configure TensorFlow and Slurm for optimal performance.

Example: classifying flower pictures

Let's assume we want to classify pictures of flowers. The example below is based on TensorFlow's image retraining tutorial.

TensorFlow comes with deep neural network models that have been pre-trained with millions of images. The vast majority of the neural network weights of these models have already been optimised for feature detection and will often work well for a new classification problem. All we need to do is to retrain the last neural network layer.
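To make the idea concrete, here is a minimal NumPy sketch of what "retraining the last layer" means. This is not TensorFlow code: the dimensions, data, and learning rate are invented, and the pre-trained layers are replaced by fixed random "bottleneck" features. All that gets trained is one final softmax layer on top of them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins: pretend the frozen pre-trained layers have already reduced
# each image to a 4-dimensional "bottleneck" feature vector.
n_samples, n_features, n_classes = 60, 4, 3
labels = rng.integers(0, n_classes, size=n_samples)
one_hot = np.eye(n_classes)[labels]
features = rng.normal(size=(n_samples, n_features))
features += one_hot @ (np.eye(n_classes, n_features) * 6.0)  # separate the classes

# "Retraining the last layer" amounts to fitting a single softmax layer
# on top of the fixed bottleneck features.
W = np.zeros((n_features, n_classes))
b = np.zeros(n_classes)
for _ in range(500):
    logits = features @ W + b
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    grad = (probs - one_hot) / n_samples      # cross-entropy gradient
    W -= 0.5 * features.T @ grad              # gradient-descent update
    b -= 0.5 * grad.sum(axis=0)

accuracy = float((logits.argmax(axis=1) == labels).mean())
print("training accuracy: %.2f" % accuracy)
```

Because only W and b are optimised while the features stay fixed, this is far cheaper than training the whole network, which is why retraining on a new dataset is fast.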

Initial setup

We'll need two scripts that will allow us to train and classify the images:

curl -LO https://github.com/tensorflow/hub/raw/master/examples/image_retraining/retrain.py
curl -LO https://github.com/tensorflow/tensorflow/raw/master/tensorflow/examples/label_image/label_image.py

We'll also need additional functionality from tensorflow-hub, which we will install into our home directory using the following commands:

ml TensorFlow/1.10.1-gimkl-2017a-Python-3.6.3
pip install tensorflow-hub==0.3.0 --user

Getting flower images from the web

Next we need a directory that contains photos of flowers which have been labelled as daisies, dandelions, roses, sunflowers and tulips:

curl -LO http://download.tensorflow.org/example_images/flower_photos.tgz
tar xzf flower_photos.tgz
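The directory layout matters: retrain.py infers the class labels from the names of the subdirectories under --image_dir. The following sketch illustrates that convention with an invented temporary directory and dummy files (the real dataset contains thousands of .jpg images per class):

```python
import os
import tempfile

# Build a miniature image directory in the same shape retrain.py expects:
# one subdirectory per class, containing that class's images.
root = tempfile.mkdtemp()
for label in ["daisy", "roses", "tulips"]:
    os.makedirs(os.path.join(root, label))
    # empty stand-in for a real photo
    open(os.path.join(root, label, "photo1.jpg"), "w").close()

# The class labels are simply the subdirectory names.
labels = sorted(d for d in os.listdir(root)
                if os.path.isdir(os.path.join(root, d)))
print(labels)
```

If you later add your own classes, it is enough to add a new subdirectory of images and retrain.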

 

How to retrain TensorFlow 

The following Slurm script retrains an existing neural network (Inception V3) on one of Mahuika's GPUs. Note that we need to pass `--partition gpu` and `--gres gpu:1` to Slurm to launch the job on a GPU. On a shared computer like Mahuika we want to make sure the results are saved in a directory that is shared across nodes, as opposed to /tmp, the default; hence we pass additional command line options to retrain.py specifying the output directory. Be sure to clean up if you have to run `retrain.py` multiple times: `rm -rf output; mkdir -p output/intermediate output/summaries`.

 

#!/bin/bash -e

#SBATCH --job-name TensorFlow-GPU
#SBATCH --partition gpu
#SBATCH --gres gpu:1
#SBATCH --ntasks 1
#SBATCH --cpus-per-task 1
#SBATCH --time 01:00:00
#SBATCH --mem 3G

module load TensorFlow/1.10.1-gimkl-2017a-Python-3.6.3

# we'll save the second-to-last layer weights in this directory; let's
# make sure the directory exists. Subsequent runs will be faster once the
# bottleneck directory is populated
mkdir -p bottleneck

# create directories that will hold the output and clean up previous
# results if present

rm -rf output; mkdir -p output/intermediate output/summaries

# retrain the neural network. Type python retrain.py -h for a full
# list of options
srun python retrain.py --image_dir=flower_photos \
    --bottleneck_dir=bottleneck --output_graph=output/graph.pb \
    --output_labels=output/labels.txt \
    --intermediate_output_graphs_dir=output/intermediate \
    --summaries_dir=output/summaries --saved_model_dir=output/model

Copy and paste the above into a file named flowers.sl, then type `sbatch flowers.sl` to submit the job.

You will find the execution time to be significantly longer the first time you run, as the script saves the bottleneck weights (the weights of the second-to-last layer) to files. Subsequent runs should take about 10 minutes.
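The speed-up comes from a simple compute-once, reuse-forever cache on disk. Here is a minimal sketch of that pattern, with an invented stand-in for the expensive per-image computation:

```python
import json
import os
import tempfile

# Cache directory plays the role of retrain.py's --bottleneck_dir.
cache_dir = tempfile.mkdtemp()
calls = {"count": 0}  # track how often the expensive path runs

def bottleneck(image_name):
    path = os.path.join(cache_dir, image_name + ".json")
    if os.path.exists(path):              # cheap path: reuse the cached result
        with open(path) as f:
            return json.load(f)
    calls["count"] += 1                   # expensive path: compute and save
    result = [len(image_name)] * 4        # stand-in for the real feature vector
    with open(path, "w") as f:
        json.dump(result, f)
    return result

first = bottleneck("daisy_001")
second = bottleneck("daisy_001")          # served from the cache this time
print("expensive computations:", calls["count"])
```

Because the cache lives on disk, it survives between jobs; deleting the bottleneck directory forces the slow first-run behaviour again.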

Check your output file for the "Final test accuracy", which should be between 85% and 95%. Each run will yield a different value due to the randomness of the training process.

How to classify a flower picture

Once the network has been trained you can find out what type of flower is in a picture. Choose for instance flower_photos/daisy/21652746_cc379e0eea_m.jpg and type:

python label_image.py --image=flower_photos/daisy/21652746_cc379e0eea_m.jpg \
--graph=output/graph.pb --labels=output/labels.txt \
--input_layer=Placeholder --output_layer=final_result

which should return something like

daisy 0.997859
sunflowers 0.00132799
dandelion 0.000446326
tulips 0.000278063
roses 8.88173e-05

Again, your results might be slightly different. In this case the model is more than 99 percent confident that the image contains a daisy.
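These confidence scores are the softmax of the final layer's outputs: one raw score (logit) per class, normalised so the values are positive and sum to 1. A small sketch with invented logit values:

```python
import math

# Hypothetical logits for the five flower classes; the actual values come
# from the retrained final layer.
logits = {"daisy": 8.1, "sunflowers": 1.5, "dandelion": 0.4,
          "tulips": -0.1, "roses": -1.2}

# Softmax: exponentiate each logit, then divide by the total.
exps = {k: math.exp(v) for k, v in logits.items()}
total = sum(exps.values())
probs = {k: v / total for k, v in exps.items()}

top = max(probs, key=probs.get)
print(top, round(probs[top], 3))
```

Because the exponential amplifies differences, even a modest gap in logits turns into a near-certain top prediction, as in the daisy output above.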

Recommended Resources

Users may want to consult https://www.tensorflow.org/guide/performance/overview, which describes many environment variables that can be used to control execution speed.
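As a starting point, environment variables are typically set near the top of the Slurm script, before the srun line. The sketch below shows two commonly used knobs; the variable names are standard OpenMP/TensorFlow settings, but the right values depend on your job shape, so treat this as an example rather than a recommendation:

```shell
# Match CPU-side threading to the cores Slurm allocated to the task
# (falls back to 1 outside a Slurm job).
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}

# Reduce TensorFlow's C++ log verbosity (0 = all messages, 1 = no INFO).
export TF_CPP_MIN_LOG_LEVEL=1
```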
