Image Classification
Summary
In this project, we classify a series of images into ten classes using a multi-layer perceptron, a convolutional neural network, and transfer learning from a pre-trained model (the Inception model).
Data format
We used the CIFAR-10 data set, which consists of 60,000 color images of 32x32 pixels, each belonging to one of ten classes. We used 50,000 images for training and 10,000 for validation of our models.
Ten sample images representing the ten classes
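As a minimal sketch (assuming TensorFlow 2.x), the data could be loaded directly through tf.keras; the variable names below are illustrative:
import tensorflow as tf
# CIFAR-10: 50,000 training and 10,000 test images of shape 32x32x3, with integer labels 0-9
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
print(x_train.shape, y_train.shape)  # (50000, 32, 32, 3) (50000, 1)
print(x_test.shape, y_test.shape)    # (10000, 32, 32, 3) (10000, 1)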
Transfer learning
In this approach we computed the latent vectors (the output of running the images through a pre-trained network) ahead of time and used them as the input features to the last few dense layers. The pre-trained network used here is Inception.
To build a classification model using the Keras API, the following steps were taken:
- Load the Inception model: the following code snippet loads the model and discards its classification layer, which is not required for our purposes.
inception = tf.keras.applications.inception_v3.InceptionV3(include_top=True, input_shape=(299, 299, 3))
inception = tf.keras.Model([inception.input], [inception.layers[-2].output])  # manually discard prediction layer
- Upscale our images from 32x32 to Inception's native input shape of 299x299, using tf.image.resize, as sketched below.
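For example, a minimal sketch of the resizing step (variable names are illustrative; tf.keras.layers.Resizing assumes TF 2.6 or later, matching the Resizing layer that appears in the model summary further below):
import tensorflow as tf
# Eager resize of a batch of 32x32 images to Inception's 299x299 input size
resized_batch = tf.image.resize(x_train[:32], (299, 299))
# The same operation as a Keras layer, so it can be embedded directly in a model
resize_layer = tf.keras.layers.Resizing(299, 299)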
- Feed the resized images into the Inception model and save the calculated latent vectors to disk. Since we are not interested in re-training the Inception model, we freeze all of its layers:
for layer in inception.layers:
layer.trainable = False
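A rough sketch of how this step could be assembled, reusing the inception model loaded above and the Resizing layer from the previous step (the batch size, file names, and input preprocessing are illustrative assumptions, not taken from the original):
import numpy as np
import tensorflow as tf
# Wrapper model: 32x32 image -> resize to 299x299 -> frozen Inception -> 2048-d latent vector
# (Inception usually expects pixels scaled with inception_v3.preprocess_input; whether and where
#  that scaling is applied is an assumption left out of this sketch.)
inputs = tf.keras.Input(shape=(32, 32, 3))
resized = tf.keras.layers.Resizing(299, 299)(inputs)
latents = inception(resized)
feature_extractor = tf.keras.Model(inputs, latents)
# Run the frozen network once over the data and save the latent vectors to disk
train_features = feature_extractor.predict(x_train, batch_size=64)
test_features = feature_extractor.predict(x_test, batch_size=64)
np.save("train_features.npy", train_features)
np.save("test_features.npy", test_features)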
The summary of the latent-vector calculation model is shown in the snippet below:
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_3 (InputLayer)         [(None, 32, 32, 3)]       0
_________________________________________________________________
resizing_1 (Resizing)        (None, 299, 299, 3)       0
_________________________________________________________________
functional_1 (Functional)    (None, 2048)              21802784
=================================================================
Total params: 21,802,784
Trainable params: 0
Non-trainable params: 21,802,784
- Finally, we load the saved latent vectors as input to a model with a classification layer to make predictions.
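A plausible sketch of this final classifier (the hidden-layer size, dropout, optimizer, and batch size are assumptions, not taken from the original):
import tensorflow as tf
# Small classification head on top of the precomputed 2048-d latent vectors
head = tf.keras.Sequential([
    tf.keras.layers.Dense(256, activation="relu", input_shape=(2048,)),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(10, activation="softmax"),  # one output per CIFAR-10 class
])
head.compile(optimizer="adam",
             loss="sparse_categorical_crossentropy",
             metrics=["accuracy"])
# train_features / y_train come from the earlier steps; batch_size=60 gives roughly 834 steps
# per epoch over 50,000 training images, consistent with the log below
head.fit(train_features, y_train,
         epochs=10, batch_size=60,
         validation_data=(test_features, y_test))
Training for ten epochs produced the following log: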
Epoch 1/10
834/834 [==============================] - 1s 2ms/step - loss: 0.5941 - accuracy: 0.7976
Epoch 2/10
834/834 [==============================] - 1s 2ms/step - loss: 0.4592 - accuracy: 0.8406
Epoch 3/10
834/834 [==============================] - 2s 2ms/step - loss: 0.4161 - accuracy: 0.8552
Epoch 4/10
834/834 [==============================] - 2s 2ms/step - loss: 0.3979 - accuracy: 0.8605
Epoch 5/10
834/834 [==============================] - 1s 2ms/step - loss: 0.3686 - accuracy: 0.8704
Epoch 6/10
834/834 [==============================] - 1s 2ms/step - loss: 0.3533 - accuracy: 0.8754
Epoch 7/10
834/834 [==============================] - 2s 2ms/step - loss: 0.3348 - accuracy: 0.8822
Epoch 8/10
834/834 [==============================] - 2s 2ms/step - loss: 0.3225 - accuracy: 0.8843
Epoch 9/10
834/834 [==============================] - 2s 2ms/step - loss: 0.3077 - accuracy: 0.8901
Epoch 10/10
834/834 [==============================] - 2s 2ms/step - loss: 0.2949 - accuracy: 0.8935
The picture below shows the results of our classification model's predictions for a few images: