TensorFlow is an open-source library for machine learning. It can train and run deep neural networks for tasks such as handwritten digit classification, image and word recognition, and natural language processing.
TensorFlow is callable from Python, with the numerically intensive parts of its algorithms implemented in C++ for efficiency. Here we show how to run TensorFlow version 1.10 with GPU support, which is installed on Mahuika. If you want to run TensorFlow on CPUs instead, have a look at our article TensorFlow on CPUs for tips on how to configure TensorFlow and Slurm for optimal performance.
Example: classifying flower pictures
Let's assume we want to classify pictures of flowers - the example below is based on this.
TensorFlow comes with deep neural network models that have been pre-trained with millions of images. The vast majority of the neural network weights of these models have already been optimised for feature detection and will often work well for a new classification problem. All we need to do is to retrain the last neural network layer.
We'll need two scripts that will allow us to train and classify the images:
curl -LO https://github.com/tensorflow/hub/raw/master/examples/image_retraining/retrain.py
curl -LO https://github.com/tensorflow/tensorflow/raw/master/tensorflow/examples/label_image/label_image.py
We'll also need additional functionality from tensorflow-hub, which we will install in our home directory using the command
pip install tensorflow-hub==0.3.0 --user
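To confirm that Python can actually see the user-installed package, a quick check like the one below can be run after loading the TensorFlow module. This is a minimal sketch: the function name `python_has_module` is our own, and it assumes the loaded module provides `python3` (set the `PYTHON` variable to use a different interpreter). Note that the package is installed as `tensorflow-hub` but imported as `tensorflow_hub`.

```shell
# Hypothetical helper: succeed (exit status 0) if the given Python module
# can be imported. PYTHON defaults to python3; override it if needed.
python_has_module() {
    "${PYTHON:-python3}" -c "import $1" 2>/dev/null
}

# Example: check for tensorflow_hub (the import name of tensorflow-hub)
if python_has_module tensorflow_hub; then
    echo "tensorflow_hub is importable"
else
    echo "tensorflow_hub not found; was pip install --user run with the same Python?"
fi
```

If the check fails, the most likely cause is that `pip` and `python3` belong to different Python installations, for example because the TensorFlow module was not loaded when `pip install` was run.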
Getting flower images from the web
Next we need a directory that contains photos of flowers which have been labelled as daisies, dandelions, roses, sunflowers and tulips:
curl -LO http://download.tensorflow.org/example_images/flower_photos.tgz
tar xzf flower_photos.tgz
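The unpacked archive contains one subdirectory per label (for example flower_photos/daisy), and the name of each subdirectory serves as the label during retraining. The sketch below prints how many .jpg files each label directory holds, which is a quick way to check that the download and extraction worked. The function name `count_images_per_class` is hypothetical. Only subdirectories are counted, so any loose files at the top level are ignored.

```shell
# Hypothetical helper: print "<label> <number of .jpg files>" for each
# subdirectory of the image directory (one subdirectory per label).
count_images_per_class() {
    local image_dir="$1"
    local class_dir n
    for class_dir in "$image_dir"/*/; do
        [ -d "$class_dir" ] || continue
        # count .jpg/.JPG files directly inside this label directory
        n=$(find "$class_dir" -maxdepth 1 -type f -iname '*.jpg' | wc -l)
        echo "$(basename "$class_dir") $((n))"
    done
}
```

Usage: `count_images_per_class flower_photos` should print five lines, one per flower type.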
How to retrain TensorFlow
The following Slurm script retrains an existing neural network (Inception V3) on one of Mahuika's GPUs. Note that we need to pass both `--partition gpu` and `--gres gpu:1` to Slurm so that the job runs on a GPU node and has a GPU allocated to it. On a shared computer like Mahuika, we want the results to be saved in a directory that is shared across nodes, rather than in /tmp, which retrain.py uses by default. Hence we pass additional command line options to retrain.py that specify the output directories. Results from previous runs must be cleaned up if you run `retrain.py` multiple times, which is why the script starts with `rm -rf output; mkdir -p output/intermediate output/summaries`.
#!/bin/bash -e
#SBATCH --job-name TensorFlow-GPU
#SBATCH --partition gpu
#SBATCH --gres gpu:1
#SBATCH --ntasks 1
#SBATCH --cpus-per-task 1
#SBATCH --time 01:00:00
#SBATCH --mem 3G
module load TensorFlow/1.10.1-gimkl-2017a-Python-3.6.3
# retrain.py caches the bottleneck values (the outputs of the second-to-last
# layer for each image) in this directory, so make sure it exists. Subsequent
# runs will be faster once the bottleneck directory is populated
mkdir -p bottleneck
# create directories that will hold the output and clean up previous
# results if present
rm -rf output; mkdir -p output/intermediate output/summaries
# retrain the neural network. Type python retrain.py -h for a full
# list of options
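srun python retrain.py --image_dir=flower_photos \
--bottleneck_dir=bottleneck --output_graph=output/graph.pb \
--intermediate_output_graphs_dir=output/intermediate \
--summaries_dir=output/summaries --output_labels=output/labels.txt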
Copy and paste the above into a file named flowers.sl and type `sbatch flowers.sl` to submit the job.
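When a job requests a GPU through `--gres`, Slurm normally exports `CUDA_VISIBLE_DEVICES` to list the allocated device(s). Whether this happens depends on the site's GRES configuration, so assuming it holds on Mahuika is our own assumption. A defensive check like the one below could be added to flowers.sl before the `srun` line, so that a misconfigured job stops early instead of running without a GPU. The function name is hypothetical.

```shell
# Hypothetical helper: report the GPU(s) Slurm made visible to this job.
# Assumes Slurm exports CUDA_VISIBLE_DEVICES for --gres gpu allocations.
check_gpu_allocated() {
    if [ -n "${CUDA_VISIBLE_DEVICES:-}" ]; then
        echo "GPU(s) visible to this job: ${CUDA_VISIBLE_DEVICES}"
    else
        echo "warning: CUDA_VISIBLE_DEVICES is not set; did the job request --gres gpu:1?" >&2
        return 1
    fi
}

# In flowers.sl, before srun, stop the job if no GPU is visible:
# check_gpu_allocated || exit 1
```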
The first run takes significantly longer, because the script computes the bottleneck values (the outputs of the second-to-last layer for each image) and saves them to files. Subsequent runs should take about 10 minutes.
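To see whether the bottleneck cache has been populated before resubmitting, you can count the files under the directory passed via `--bottleneck_dir`. This sketch only counts regular files and makes no assumption about how retrain.py names them. The function name `count_bottlenecks` is hypothetical.

```shell
# Hypothetical helper: count the cached bottleneck files written by retrain.py.
# Prints 0 if the directory does not exist yet (i.e. before the first run).
count_bottlenecks() {
    local dir="${1:-bottleneck}"
    if [ -d "$dir" ]; then
        find "$dir" -type f | wc -l | tr -d ' '
    else
        echo 0
    fi
}
```

After a complete first run, `count_bottlenecks bottleneck` should report roughly one file per training image.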
Check your output file for the line "Final test accuracy". The value should be between 85% and 95%, and each run will yield a slightly different value because the training process is random.
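Unless told otherwise, Slurm writes a job's output and error streams to slurm-<jobid>.out in the directory the job was submitted from, so the accuracy line can be extracted with `grep`. This is a minimal sketch: the function name is hypothetical, and it matches only the text "Final test accuracy", not the rest of the line's format.

```shell
# Hypothetical helper: print the last "Final test accuracy" line from a
# Slurm output file such as slurm-<jobid>.out.
final_accuracy() {
    grep "Final test accuracy" "$1" | tail -n 1
}

# Usage (the job ID here is illustrative; use your own):
# final_accuracy slurm-1234567.out
```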
How to classify a flower picture
Once the network has been trained, you can find out what type of flower is in a picture. With the TensorFlow module loaded, choose for instance flower_photos/daisy/21652746_cc379e0eea_m.jpg and type:
python label_image.py --image=flower_photos/daisy/21652746_cc379e0eea_m.jpg \
--graph=output/graph.pb --labels=output/labels.txt \
--input_layer=Placeholder --output_layer=final_result
which should return something like
Again, your results might be slightly different. In this case the model is 99 percent confident that the image contains a daisy.
Users may want to consult https://www.tensorflow.org/guide/performance/overview, which describes many options, including environment variables, that can be used to control execution speed.
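One example is `OMP_NUM_THREADS`, which the performance guide discusses in the context of MKL-optimised CPU builds. Whether the Mahuika TensorFlow module honours it is an assumption. A common pattern is to derive it from `SLURM_CPUS_PER_TASK`, which Slurm exports only when `--cpus-per-task` is requested, so that the thread count matches the allocation. The function name below is hypothetical.

```shell
# Hypothetical helper: match the OpenMP thread count to the CPUs Slurm
# allocated to the task, falling back to 1 if SLURM_CPUS_PER_TASK is unset.
set_omp_threads() {
    export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-1}"
}

set_omp_threads
echo "OMP_NUM_THREADS=${OMP_NUM_THREADS}"
```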