Training a neural network using Mobilenets in TensorFlow for image classification on Android

Sumit Kumar Arora
6 min read · Jan 20, 2018

First, a few definitions for the uninitiated.

TensorFlow is an open-source library for numerical computation using dataflow graphs. It was developed by the Google Brain team, initially for internal machine learning and deep neural network research, before being released as open source. TensorFlow is causing quite a stir in research and development and is well on its way into mainstream machine learning.

About TensorFlow

In this tutorial, we are going to make an Android app that uses a neural network trained with TensorFlow. For our purpose, we will use a special class of convolutional neural networks called MobileNets. As the name suggests, MobileNets are “mobile-friendly”: they are designed to run with as little computing power as possible on a smartphone.

As you can imagine, there has to be a catch with something so customized for a minimal resource footprint. MobileNets are not as accurate as a full-fledged deep neural network. However, their accuracy is surprisingly high and good enough for many applications. The graph below plots accuracy against the number of calculations required for a selection of neural network architectures. Model size is shown as the size of each circle, and the different MobileNet sizes show the tradeoffs made when choosing model parameters. See this Google research blog post for more details. We will also come back to the choice of model parameters later.

Accuracy (y-axis) vs. the number of required operations (x-axis) for available configurations

Data collection

The image classifier that we are going to train in this example will be able to classify cars by their model. We are going to use the “Cars Dataset” made available by Stanford. [Link to dataset ~ 1.8GB]

This dataset consists of 16,185 images of 196 classes of cars. The label information for this dataset provides make, model and year for each of the 196 classes. The data is also available as separate train and test sets in a 50–50 split. But for this project, we will download the full dataset of 16,185 images, along with the label information.

Data Preprocessing

The image dataset from Stanford is organized as a single directory containing 16,185 images of cars. To use these images in our training step, we need to reorganize them so that each image sits inside a directory holding all the images of a single class. The name of each directory should also match the name of the corresponding class. We will use the label information available in the .mat file provided by Stanford. The Python script below performs this task.

Edit: As correctly pointed out by a reader, I missed specifying that the name for class 174, i.e. ‘Ram C/V Cargo Van Minivan 2012’ has a forward slash, ‘/’, in it and it should be renamed to remove the slash character.
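In case the script is not visible, here is a minimal sketch of the same reorganization step. It is not the original script: the function names are hypothetical, and it assumes you have already read the label information into a mapping from image file name to class name (for example with scipy.io.loadmat, which is not shown here). It replaces the slash in class names with a hyphen rather than removing it.

```python
import shutil
from pathlib import Path


def safe_class_name(name):
    """Make a class name usable as a directory name (no path separators)."""
    return name.replace("/", "-").strip()


def organize_by_class(image_dir, labels, output_dir):
    """Copy each image into output_dir/<class name>/ and return the count.

    labels maps an image file name to its class name. How that mapping is
    read from the .mat file is deliberately left out of this sketch.
    """
    image_dir, output_dir = Path(image_dir), Path(output_dir)
    copied = 0
    for file_name, class_name in labels.items():
        source = image_dir / file_name
        if not source.is_file():
            continue  # skip labels whose image file is missing
        target_dir = output_dir / safe_class_name(class_name)
        target_dir.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, target_dir / source.name)
        copied += 1
    return copied
```

Copying instead of moving keeps the downloaded directory intact, so the step can be rerun safely if something goes wrong.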

Organize the data to be used for training the classifier

Training the classifier

To train our image classifier, we are going to use transfer learning. Transfer learning is a supervised learning technique that takes advantage of an existing trained model that solves a similar problem. For our purpose, we will take TensorFlow’s fully trained ImageNet model and retrain just the last layer of the neural network on our Cars dataset. Although this approach is not as powerful as training a full model, it can provide surprisingly high accuracy for most related tasks. You can read more about this concept from this article:

As mentioned in the above article, you can clone the retrain scripts from this GitHub repository.
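To make the idea of retraining only the last layer concrete, here is a toy sketch in plain Python. It is not what the retrain script does internally: frozen_features is a hypothetical stand-in for the pretrained layers, and the “last layer” is a simple logistic regression trained with gradient descent. The point is only that the feature transform is never updated while the final layer’s weights are.

```python
import math


def frozen_features(x):
    """Stand-in for the pretrained layers: a fixed, never-updated transform."""
    return [x[0] + x[1], x[0] - x[1]]


def train_last_layer(samples, labels, steps=500, lr=0.5):
    """Fit a logistic-regression 'final layer' on top of the frozen features."""
    # Features are computed once and reused on every step.
    feats = [frozen_features(x) for x in samples]
    w, b = [0.0, 0.0], 0.0
    n = len(feats)
    for _ in range(steps):
        grad_w, grad_b = [0.0, 0.0], 0.0
        for f, y in zip(feats, labels):
            p = 1.0 / (1.0 + math.exp(-(w[0] * f[0] + w[1] * f[1] + b)))
            err = p - y  # gradient of the log loss with respect to the logit
            grad_w[0] += err * f[0]
            grad_w[1] += err * f[1]
            grad_b += err
        # Only the final layer's parameters are updated.
        w = [w[0] - lr * grad_w[0] / n, w[1] - lr * grad_w[1] / n]
        b -= lr * grad_b / n
    return w, b


def predict(x, w, b):
    """Return 1 or 0 for a raw input using the frozen features and trained layer."""
    f = frozen_features(x)
    return 1 if w[0] * f[0] + w[1] * f[1] + b > 0 else 0
```

Because the features are computed once up front, each training step is cheap, which is the same reason retraining a final layer finishes much faster than training a whole network.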

As we are only training the final layer of the neural network, the training will finish in a reasonable amount of time. TensorFlow’s retraining procedure lets you tune the training by tweaking certain parameters. The following two are probably the most important:

  1. Input image resolution: The value can be 128, 160, 192, or 224 px. As you can imagine, training with higher-resolution images takes longer but has a better chance of giving higher classification accuracy. Since we are only training the final layer and our dataset is not very large, we will keep this value at 224.
  2. Relative model size: This value represents the size of the model as a fraction of the largest MobileNet. It can be 1.0, 0.75, 0.50, or 0.25. The larger the model, the more accurate it will be. For our purpose, we will set this value to 0.75.

Both of the above parameters can be set as shell variables:

IMAGE_SIZE=224
ARCHITECTURE="mobilenet_0.75_${IMAGE_SIZE}"
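If you drive the training from Python instead of the shell, the same architecture string can be built and checked with a small helper. This is only a sketch: the function name is hypothetical, and the allowed values are simply the ones listed above.

```python
VALID_IMAGE_SIZES = (128, 160, 192, 224)
VALID_MODEL_SIZES = ("1.0", "0.75", "0.50", "0.25")


def mobilenet_architecture(model_size, image_size):
    """Build a string such as 'mobilenet_0.75_224' from the two parameters.

    model_size is passed as a string so that '0.50' keeps its trailing zero.
    """
    if model_size not in VALID_MODEL_SIZES:
        raise ValueError(f"model size must be one of {VALID_MODEL_SIZES}")
    if image_size not in VALID_IMAGE_SIZES:
        raise ValueError(f"image size must be one of {VALID_IMAGE_SIZES}")
    return f"mobilenet_{model_size}_{image_size}"


print(mobilenet_architecture("0.75", 224))
```

Validating up front catches a typo such as an unsupported resolution before a long training run starts.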

With these parameters set up, let’s run the retrain Python script provided by TensorFlow with the following options.

python -m scripts.retrain \
  --bottleneck_dir=tf_files/bottlenecks \
  --how_many_training_steps=5000 \
  --model_dir=tf_files/models/"${ARCHITECTURE}" \
  --summaries_dir=tf_files/training_summaries/"${ARCHITECTURE}" \
  --output_graph=tf_files/retrained_graph.pb \
  --output_labels=tf_files/retrained_labels.txt \
  --architecture="${ARCHITECTURE}" \
  --image_dir=tf_files/dataset

The above command sets up the directory paths for the bottleneck, model, and summary files. image_dir refers to the directory where our image data is stored. output_graph and output_labels give the paths where the trained model and the label information will be stored, respectively.

The how_many_training_steps parameter sets how many training steps the script runs; each step updates the final layer using a batch of training images. The command above uses 5000 steps. If you have time, you could increase this count to try for better results.
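To get a feel for what a given step count means, a rough back-of-the-envelope calculation helps. The sketch below assumes each step trains on a batch of 100 images, which I believe is the retrain script’s default batch size; treat that number as an assumption and check the script’s help output. It also ignores the images the script sets aside for validation and testing.

```python
def approx_passes(steps, batch_size, num_images):
    """Estimate how many times training sees the dataset, on average."""
    return steps * batch_size / num_images


# 5000 steps, an assumed batch of 100 images, 16,185 images in the dataset
passes = approx_passes(steps=5000, batch_size=100, num_images=16185)
print(f"about {passes:.1f} passes over the data")
```

Under these assumptions, 5000 steps works out to roughly 31 passes over the full dataset.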

The retraining script provides many more configuration options. Run the help command below to read about them.

python -m scripts.retrain -h

If the final size of your trained model is large, you could look into techniques that reduce the model size, such as making the model more compressible and quantizing the network weights. Refer to the article below to read more on this.
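The intuition behind weight quantization can be shown with a toy sketch. This is not TensorFlow’s own tooling: the function names are hypothetical, and the weights here are just a list of floats. Rounding every weight to one of 256 evenly spaced values leaves the raw size unchanged but makes the data far more repetitive, so it compresses better.

```python
import struct
import zlib


def quantize_weights(weights, levels=256):
    """Round each weight to the nearest of `levels` evenly spaced values."""
    lo, hi = min(weights), max(weights)
    if hi == lo:
        return list(weights)  # nothing to quantize
    step = (hi - lo) / (levels - 1)
    return [lo + round((w - lo) / step) * step for w in weights]


def compressed_size(weights):
    """Size in bytes of the weights stored as float32 and zlib-compressed."""
    raw = struct.pack(f"{len(weights)}f", *weights)
    return len(zlib.compress(raw))
```

Each weight moves by at most half a step, which is why the loss in accuracy is usually small compared with the saving in compressed size.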

Deploying the solution in Android app

The TensorFlow GitHub repository contains an Android project that you can load directly into Android Studio and compile to create an image classifier application. Simply copy the trained model (graph.pb) and label information (labels.txt) that we generated in the last step to the project’s “assets” directory and run the Gradle build. This will generate an .apk file that you can run on your computer in an emulator with camera access, or deploy to an Android phone.

Result

Android app in action

The resulting Android app uses the phone’s camera stream to classify each car it sees into one of the trained classes. By default, the application also shows the probability that the car belongs to the predicted class. You could take a look at the Android application that I published on the Google Play store to see this project in action.
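The step from raw model output to what appears on screen is simple enough to sketch. The snippet below is an illustration in Python, not the app’s actual code: the function name and the sample values are hypothetical. It pairs each probability with its label and keeps the most likely classes.

```python
def top_predictions(probabilities, labels, k=3):
    """Return the k most likely (label, probability) pairs, best first."""
    if len(probabilities) != len(labels):
        raise ValueError("need exactly one probability per label")
    ranked = sorted(zip(labels, probabilities), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical output for three classes
example = top_predictions([0.10, 0.85, 0.05], ["class a", "class b", "class c"], k=2)
print(example)
```

The labels are assumed to be in the same order as the model’s outputs, which is what the label file written during retraining is for.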

As you can see, transfer learning with TensorFlow makes it easy to quickly build your own classifiers. I hope you liked this article, and I’ll see you in the next one. :)

Written by Sumit Kumar Arora

Senior Machine Learning Engineer at Meta. Checkout my blog: https://blog.reachsumit.com/