Fashion MNIST

In this practical application notebook, we will work with the Fashion MNIST dataset to carry out a classification exercise using artificial neural networks (ANNs).

Dataset


The dataset, Fashion MNIST, is a collection of grayscale apparel images, each belonging to one of 10 classes. The classes are numbered from 0 to 9 in the order listed below, with T-shirt/top as 0 and Ankle boot as 9.

class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat', 'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']


Objective


In this exercise, we will create a simple ANN model to classify each image into one of the 10 categories.


Toolkit


We will use TensorFlow and its built-in Keras API (tf.keras) on Google Colab for this exercise.

Loading the libraries

In [2]:
#!pip install tensorflow
In [3]:
import warnings
warnings.filterwarnings("ignore")
import tensorflow as tf
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
In [4]:
from google.colab import drive
drive.mount('/content/drive')
Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
In [5]:
tf.__version__
Out[5]:
'2.18.0'

Loading the Data

Let's import the data from the tf.keras.datasets and prepare the train and the test set.

In [6]:
# Load the data
(X_train, trainY), (X_test,testY) = tf.keras.datasets.fashion_mnist.load_data()
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-labels-idx1-ubyte.gz
29515/29515 ━━━━━━━━━━━━━━━━━━━━ 0s 0us/step
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-images-idx3-ubyte.gz
26421880/26421880 ━━━━━━━━━━━━━━━━━━━━ 0s 0us/step
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-labels-idx1-ubyte.gz
5148/5148 ━━━━━━━━━━━━━━━━━━━━ 0s 0us/step
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-images-idx3-ubyte.gz
4422102/4422102 ━━━━━━━━━━━━━━━━━━━━ 0s 0us/step
In [7]:
X_train.shape, X_test.shape
Out[7]:
((60000, 28, 28), (10000, 28, 28))
In [8]:
X_train.shape[1] * X_train.shape[2]
Out[8]:
784
  • This suggests that there are 60000 images of size 28*28 in the training set and 10000 images of size 28*28 in the test set.
  • Note that we will need to flatten these images before fitting an ANN model.
  • Let us now explore the classes present in the dataset.
In [9]:
np.unique(trainY)
Out[9]:
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=uint8)
  • This suggests that the train set has 10 classes where each class denotes one type of apparel.

Encoding the target variable

  • We need to one hot encode the target variable to be able to form the training target vector.
  • Hint: check tf.keras.utils.to_categorical() - https://www.tensorflow.org/api_docs/python/tf/keras/utils/to_categorical
In [10]:
y_train = tf.keras.utils.to_categorical(trainY,num_classes=10)
y_test = tf.keras.utils.to_categorical(testY,num_classes=10)

# Let's have a look at the shapes of all the datasets
X_train.shape, y_train.shape, X_test.shape, y_test.shape
Out[10]:
((60000, 28, 28), (60000, 10), (10000, 28, 28), (10000, 10))
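
To make the one-hot encoding concrete, here is a minimal NumPy sketch of the idea. The `one_hot` helper is hypothetical and written only for illustration; it is not the Keras implementation of `to_categorical`.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Hypothetical helper: one row per label, with a 1 in the label's column."""
    labels = np.asarray(labels)
    encoded = np.zeros((labels.size, num_classes), dtype="float32")
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

example_labels = np.array([9, 0, 3], dtype=np.uint8)  # Ankle boot, T-shirt/top, Dress
encoded = one_hot(example_labels, num_classes=10)
print(encoded.shape)           # (3, 10)
print(encoded.argmax(axis=1))  # recovers the integer labels: [9 0 3]
```

Each row has a single 1, so taking the argmax of a row gives back the original integer label.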
In [11]:
## Let's normalize the dataset. Since there are pixel values ranging from 0-255, let us divide by 255 to get the new ranges from 0-1
X_train = X_train/255
X_test = X_test/255
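
As a small illustration of what the division does, the sketch below applies it to a stand-in array. It assumes the raw pixels are stored as uint8 values in 0-255, as in the loaded data; the float32 cast is an optional extra and not part of the notebook.

```python
import numpy as np

# Stand-in for a tiny batch of raw pixels (assumption: uint8 values in 0-255)
raw = np.array([[0, 128, 255]], dtype=np.uint8)

scaled = raw / 255                   # true division promotes uint8 to float64
scaled32 = scaled.astype("float32")  # optional: halves memory relative to float64

print(scaled.dtype, scaled.min(), scaled.max())  # float64 0.0 1.0
```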

Visualization

  • Now, let us visualize the data items.
  • We will visualize the first 24 images in the training dataset.
In [12]:
class_names_list = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat', 'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

plt.figure(figsize=(8,8))
for i in range(24):
    plt.subplot(4,6,i+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(X_train[i], cmap=plt.cm.binary)
    plt.xlabel(class_names_list[trainY[i]])
plt.show()
[Figure: the first 24 training images in a 4 x 6 grid, each labelled with its class name]

Model Building

  • We will now start with the model building process.
  • We will create a model with:
  • A layer to flatten the input
  • A hidden layer with 64 nodes (You can play around with this number) and 'relu' activation.
  • Output layer
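
The three layers listed above can be sketched as a plain NumPy forward pass to show how the shapes change from layer to layer. This is an illustration under stated assumptions: the weights are random (an untrained network), the images are fake, and the 10-node softmax output is the choice made later in this notebook.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def softmax(z):
    # Subtract the row-wise max for numerical stability before exponentiating
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Illustrative random weights (assumption: untrained, not the fitted model)
W1, b1 = rng.normal(scale=0.05, size=(784, 64)), np.zeros(64)
W2, b2 = rng.normal(scale=0.05, size=(64, 10)), np.zeros(10)

images = rng.random((5, 28, 28))        # 5 fake images with values in [0, 1)
flat = images.reshape(len(images), -1)  # Flatten: (5, 28, 28) -> (5, 784)
hidden = relu(flat @ W1 + b1)           # hidden layer: (5, 64)
probs = softmax(hidden @ W2 + b2)       # output layer: (5, 10), rows sum to 1

print(flat.shape, hidden.shape, probs.shape)
```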

Model-1

Question 1: Add the output layer with the activation function and number of neurons required by the problem statement.

In [13]:
# Initialize sequential model

model_1 = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax'),  # Output layer: one node per class, softmax gives class probabilities

])
In [14]:
model_1.summary()
Model: "sequential"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ Layer (type)                         ┃ Output Shape                ┃         Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ flatten (Flatten)                    │ (None, 784)                 │               0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dense (Dense)                        │ (None, 64)                  │          50,240 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dense_1 (Dense)                      │ (None, 10)                  │             650 │
└──────────────────────────────────────┴─────────────────────────────┴─────────────────┘
 Total params: 50,890 (198.79 KB)
 Trainable params: 50,890 (198.79 KB)
 Non-trainable params: 0 (0.00 B)

Observations

  • The summary of the model shows each layer's name, type, output shape, and the number of parameters at that particular layer.
  • It also shows the total number of trainable and non-trainable parameters in the model. A parameter whose value is learned while training the model is called a trainable parameter; otherwise, it is called a non-trainable parameter.
  • The Flatten layer simply flattens each image into a size of 784 (28*28) and there is no learning or training at this layer. Hence, the number of parameters is 0 for the Flatten layer.
  • Each image, in the form of 784 nodes, is the input to the 'dense' layer. Each node of the previous layer is connected to each node of the current layer. Each connection has one weight to learn and each node has one bias, so the total number of parameters is (784*64)+64 = 50,240.
  • Similarly, the last layer, 'dense_1', has (64*10)+10 = 650 parameters.
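
The parameter arithmetic above can be checked with a few lines of plain Python. The helper names `dense_params` and `mlp_params` are hypothetical and exist only for this illustration.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

def mlp_params(layer_sizes):
    """Total parameters for fully connected layers of the given sizes (hypothetical helper)."""
    return sum(dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:]))

print(dense_params(784, 64))      # 50240
print(dense_params(64, 10))       # 650
print(mlp_params([784, 64, 10]))  # 50890
```

The Flatten layer contributes nothing to the total because it only reshapes its input.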

Let us now compile the model.

  • We will use the 'adam' optimizer with 'categorical_crossentropy' as the loss function, and track accuracy as the metric.
In [15]:
model_1.compile(optimizer='adam', loss='categorical_crossentropy',  metrics = ['accuracy'])
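
For intuition about the loss, here is a simplified NumPy sketch of categorical cross-entropy for one-hot targets. It is an assumption-laden illustration: the function and the clipping constant `eps` are chosen here for clarity, and the Keras implementation may differ in its details.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over samples of -sum(y_true * log(y_pred)); eps guards against log(0)."""
    y_pred = np.clip(y_pred, eps, 1.0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

y_true = np.array([[0.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0]])
y_pred = np.array([[0.1, 0.8, 0.1],   # true class gets 0.8: small loss
                   [0.6, 0.3, 0.1]])  # true class gets 0.1: large loss

loss = categorical_crossentropy(y_true, y_pred)
print(round(loss, 4))
```

Only the probability assigned to the true class enters the loss, so a confident wrong prediction is penalized much more heavily than a confident correct one is rewarded.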
In [16]:
# Let us now fit the model

fit_history = model_1.fit(X_train, y_train,validation_split=0.1, verbose=1, epochs=10, batch_size=64)
Epoch 1/10
844/844 ━━━━━━━━━━━━━━━━━━━━ 5s 4ms/step - accuracy: 0.7527 - loss: 0.7361 - val_accuracy: 0.8458 - val_loss: 0.4365
Epoch 2/10
844/844 ━━━━━━━━━━━━━━━━━━━━ 3s 3ms/step - accuracy: 0.8519 - loss: 0.4276 - val_accuracy: 0.8587 - val_loss: 0.4032
Epoch 3/10
844/844 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.8643 - loss: 0.3854 - val_accuracy: 0.8668 - val_loss: 0.3669
Epoch 4/10
844/844 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.8739 - loss: 0.3532 - val_accuracy: 0.8637 - val_loss: 0.3781
Epoch 5/10
844/844 ━━━━━━━━━━━━━━━━━━━━ 3s 3ms/step - accuracy: 0.8822 - loss: 0.3305 - val_accuracy: 0.8717 - val_loss: 0.3559
Epoch 6/10
844/844 ━━━━━━━━━━━━━━━━━━━━ 5s 3ms/step - accuracy: 0.8843 - loss: 0.3179 - val_accuracy: 0.8783 - val_loss: 0.3324
Epoch 7/10
844/844 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.8893 - loss: 0.3064 - val_accuracy: 0.8692 - val_loss: 0.3570
Epoch 8/10
844/844 ━━━━━━━━━━━━━━━━━━━━ 3s 3ms/step - accuracy: 0.8926 - loss: 0.2940 - val_accuracy: 0.8783 - val_loss: 0.3407
Epoch 9/10
844/844 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.8997 - loss: 0.2806 - val_accuracy: 0.8798 - val_loss: 0.3364
Epoch 10/10
844/844 ━━━━━━━━━━━━━━━━━━━━ 4s 4ms/step - accuracy: 0.9006 - loss: 0.2736 - val_accuracy: 0.8763 - val_loss: 0.3418

Observation

  • Training accuracy rises steadily, from about 0.75 in epoch 1 to about 0.90 in epoch 10. Validation accuracy improves more slowly and levels off around 0.87 to 0.88 from epoch 5 onward.

Evaluate the model on the test set

  • Let's predict using the test data. The .predict() method in Keras models returns the probabilities of each observation belonging to each class. We will choose the class where the predicted probability is the highest.
  • Also, let's build a function to print the classification report and confusion matrix.
In [17]:
def metrics_score(actual, predicted):
    from sklearn.metrics import classification_report
    from sklearn.metrics import confusion_matrix
    print(classification_report(actual, predicted))
    cm = confusion_matrix(actual, predicted)
    plt.figure(figsize=(8,5))
    sns.heatmap(cm, annot=True,  fmt='.0f', xticklabels=class_names_list, yticklabels=class_names_list)
    plt.ylabel('Actual')
    plt.xlabel('Predicted')
    plt.show()

Question 2: What is the test accuracy of Model-1?

In [18]:
model_1.evaluate(X_test, y_test, verbose = 1)
test_pred1 = np.argmax(model_1.predict(X_test), axis = -1)
test_pred1
313/313 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8663 - loss: 0.3692
313/313 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step
Out[18]:
array([9, 2, 1, ..., 8, 1, 5])
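
The step from predicted probabilities to class labels can be seen on a small made-up example. The probabilities below are invented for illustration and are not model output.

```python
import numpy as np

# Made-up probabilities for 3 images over 4 classes (each row sums to 1)
probs = np.array([
    [0.05, 0.10, 0.80, 0.05],
    [0.40, 0.35, 0.15, 0.10],
    [0.25, 0.25, 0.20, 0.30],
])

pred_class = np.argmax(probs, axis=-1)  # index of the largest probability per row
pred_proba = np.max(probs, axis=-1)     # the probability assigned to that class

print(pred_class)  # [2 0 3]
print(pred_proba)  # [0.8 0.4 0.3]
```

The third row shows that argmax always returns a class, even when the winning probability is low and the model is far from certain.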

Question 3: Which category has been classified most accurately by Model-1?

In [19]:
metrics_score(testY, test_pred1)
              precision    recall  f1-score   support

           0       0.78      0.85      0.82      1000
           1       0.98      0.97      0.97      1000
           2       0.82      0.72      0.77      1000
           3       0.83      0.91      0.87      1000
           4       0.81      0.75      0.78      1000
           5       0.93      0.98      0.95      1000
           6       0.66      0.67      0.67      1000
           7       0.95      0.92      0.94      1000
           8       0.97      0.95      0.96      1000
           9       0.96      0.95      0.95      1000

    accuracy                           0.87     10000
   macro avg       0.87      0.87      0.87     10000
weighted avg       0.87      0.87      0.87     10000

[Figure: confusion matrix heatmap for Model-1 on the test set, actual vs. predicted class]

Observations

  • Class 6 (Shirt) has the lowest precision (0.66) and recall (0.67), so the model struggles to identify shirts. The confusion matrix shows that shirts are mostly predicted as T-shirt/top, Pullover, or Coat, which is understandable because these items look similar.
  • Let's build a bigger network and train it for more epochs to see whether the model can pick up the subtler differences between similar items.
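
To show how the per-class precision and recall in the report relate to the confusion matrix, here is a sketch on a toy 3-class matrix. The counts are made up for illustration and are not taken from the model.

```python
import numpy as np

# Toy confusion matrix (made-up counts): rows = actual class, columns = predicted class
cm = np.array([
    [8, 1, 1],
    [3, 6, 1],
    [0, 1, 9],
])

true_positives = np.diag(cm)
recall = true_positives / cm.sum(axis=1)     # share of each actual class that was found
precision = true_positives / cm.sum(axis=0)  # share of each predicted class that was right

print(recall)                  # [0.8 0.6 0.9]
print(np.round(precision, 2))  # [0.73 0.75 0.82]
```

Off-diagonal entries in a row lower that class's recall, while off-diagonal entries in a column lower that class's precision.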

Further Iterations to model building

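  • Let's train for 30 epochs instead of 10 and observe the effect on accuracy.
  • Let's also build a bigger network, with 128 nodes in the hidden layer instead of 64. The learning rate is set explicitly to 0.001, which is the Adam default in Keras, so it is the same rate that Model-1 used.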

Model-2

In [20]:
# Initialize sequential model

model_2 = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    #tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10, activation = 'softmax')
])

model_2.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001), loss= 'categorical_crossentropy', metrics= ['accuracy'])
In [21]:
model_2.summary()
Model: "sequential_1"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ Layer (type)                         ┃ Output Shape                ┃         Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ flatten_1 (Flatten)                  │ (None, 784)                 │               0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dense_2 (Dense)                      │ (None, 128)                 │         100,480 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dense_3 (Dense)                      │ (None, 10)                  │           1,290 │
└──────────────────────────────────────┴─────────────────────────────┴─────────────────┘
 Total params: 101,770 (397.54 KB)
 Trainable params: 101,770 (397.54 KB)
 Non-trainable params: 0 (0.00 B)

Observation

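  • The layer structure is the same as in Model-1 (Flatten, one hidden Dense layer, and a Dense output layer), but the hidden layer now has 128 nodes instead of 64. The parameter count therefore roughly doubles, from 50,890 to 101,770.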
In [22]:
fit_history_2 = model_2.fit(X_train, y_train, epochs=30, validation_split=0.1, batch_size=64, verbose = 2)
Epoch 1/30
844/844 - 4s - 4ms/step - accuracy: 0.8162 - loss: 0.5298 - val_accuracy: 0.8440 - val_loss: 0.4440
Epoch 2/30
844/844 - 6s - 7ms/step - accuracy: 0.8625 - loss: 0.3883 - val_accuracy: 0.8645 - val_loss: 0.3723
Epoch 3/30
844/844 - 4s - 5ms/step - accuracy: 0.8738 - loss: 0.3491 - val_accuracy: 0.8633 - val_loss: 0.3801
Epoch 4/30
844/844 - 5s - 6ms/step - accuracy: 0.8825 - loss: 0.3251 - val_accuracy: 0.8733 - val_loss: 0.3457
Epoch 5/30
844/844 - 3s - 4ms/step - accuracy: 0.8883 - loss: 0.3047 - val_accuracy: 0.8760 - val_loss: 0.3313
Epoch 6/30
844/844 - 5s - 5ms/step - accuracy: 0.8933 - loss: 0.2894 - val_accuracy: 0.8802 - val_loss: 0.3203
Epoch 7/30
844/844 - 5s - 6ms/step - accuracy: 0.8978 - loss: 0.2769 - val_accuracy: 0.8835 - val_loss: 0.3239
Epoch 8/30
844/844 - 6s - 7ms/step - accuracy: 0.9013 - loss: 0.2666 - val_accuracy: 0.8793 - val_loss: 0.3421
Epoch 9/30
844/844 - 4s - 5ms/step - accuracy: 0.9056 - loss: 0.2554 - val_accuracy: 0.8902 - val_loss: 0.3163
Epoch 10/30
844/844 - 5s - 6ms/step - accuracy: 0.9106 - loss: 0.2442 - val_accuracy: 0.8888 - val_loss: 0.3177
Epoch 11/30
844/844 - 5s - 6ms/step - accuracy: 0.9124 - loss: 0.2385 - val_accuracy: 0.8915 - val_loss: 0.3186
Epoch 12/30
844/844 - 3s - 3ms/step - accuracy: 0.9154 - loss: 0.2311 - val_accuracy: 0.8883 - val_loss: 0.3177
Epoch 13/30
844/844 - 5s - 6ms/step - accuracy: 0.9171 - loss: 0.2235 - val_accuracy: 0.8878 - val_loss: 0.3395
Epoch 14/30
844/844 - 4s - 5ms/step - accuracy: 0.9191 - loss: 0.2187 - val_accuracy: 0.8862 - val_loss: 0.3336
Epoch 15/30
844/844 - 3s - 3ms/step - accuracy: 0.9208 - loss: 0.2113 - val_accuracy: 0.8903 - val_loss: 0.3443
Epoch 16/30
844/844 - 5s - 6ms/step - accuracy: 0.9254 - loss: 0.2043 - val_accuracy: 0.8912 - val_loss: 0.3157
Epoch 17/30
844/844 - 4s - 5ms/step - accuracy: 0.9264 - loss: 0.1988 - val_accuracy: 0.8943 - val_loss: 0.3190
Epoch 18/30
844/844 - 4s - 4ms/step - accuracy: 0.9275 - loss: 0.1952 - val_accuracy: 0.8897 - val_loss: 0.3445
Epoch 19/30
844/844 - 4s - 5ms/step - accuracy: 0.9306 - loss: 0.1879 - val_accuracy: 0.8897 - val_loss: 0.3435
Epoch 20/30
844/844 - 2s - 3ms/step - accuracy: 0.9305 - loss: 0.1829 - val_accuracy: 0.8847 - val_loss: 0.3600
Epoch 21/30
844/844 - 4s - 4ms/step - accuracy: 0.9339 - loss: 0.1786 - val_accuracy: 0.8875 - val_loss: 0.3539
Epoch 22/30
844/844 - 3s - 4ms/step - accuracy: 0.9342 - loss: 0.1756 - val_accuracy: 0.8972 - val_loss: 0.3281
Epoch 23/30
844/844 - 4s - 5ms/step - accuracy: 0.9381 - loss: 0.1697 - val_accuracy: 0.8940 - val_loss: 0.3397
Epoch 24/30
844/844 - 3s - 3ms/step - accuracy: 0.9381 - loss: 0.1657 - val_accuracy: 0.8890 - val_loss: 0.3539
Epoch 25/30
844/844 - 4s - 4ms/step - accuracy: 0.9395 - loss: 0.1627 - val_accuracy: 0.8957 - val_loss: 0.3529
Epoch 26/30
844/844 - 3s - 4ms/step - accuracy: 0.9402 - loss: 0.1602 - val_accuracy: 0.8940 - val_loss: 0.3682
Epoch 27/30
844/844 - 3s - 3ms/step - accuracy: 0.9424 - loss: 0.1536 - val_accuracy: 0.8902 - val_loss: 0.3696
Epoch 28/30
844/844 - 3s - 3ms/step - accuracy: 0.9439 - loss: 0.1505 - val_accuracy: 0.8855 - val_loss: 0.3650
Epoch 29/30
844/844 - 7s - 8ms/step - accuracy: 0.9458 - loss: 0.1471 - val_accuracy: 0.8947 - val_loss: 0.3557
Epoch 30/30
844/844 - 3s - 3ms/step - accuracy: 0.9457 - loss: 0.1453 - val_accuracy: 0.8898 - val_loss: 0.3854

Observations

  • Compared with Model-1 after 10 epochs, training accuracy has risen from about 0.90 to about 0.95 (roughly 4.5 percentage points), but validation accuracy has risen only from about 0.876 to about 0.890 (roughly 1.4 percentage points).
  • Validation loss bottoms out around 0.32 between epochs 6 and 16 and then climbs to about 0.39 by epoch 30, so the model is already starting to overfit. Training for even more epochs while keeping everything else the same would likely make this worse.
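
One common response to this pattern is early stopping: halt training once validation loss has not improved for a set number of epochs. The sketch below replays that rule on the Model-2 validation losses logged above. The `early_stop` helper is hypothetical and `patience=5` is an arbitrary choice for illustration.

```python
# Validation loss per epoch, copied from the Model-2 training log above
val_loss = [
    0.4440, 0.3723, 0.3801, 0.3457, 0.3313, 0.3203, 0.3239, 0.3421, 0.3163, 0.3177,
    0.3186, 0.3177, 0.3395, 0.3336, 0.3443, 0.3157, 0.3190, 0.3445, 0.3435, 0.3600,
    0.3539, 0.3281, 0.3397, 0.3539, 0.3529, 0.3682, 0.3696, 0.3650, 0.3557, 0.3854,
]

def early_stop(losses, patience):
    """Return (stop_epoch, best_epoch), both 1-based, for a patience rule on `losses`."""
    best, best_epoch, wait = float("inf"), 0, 0
    for epoch, loss in enumerate(losses, start=1):
        if loss < best:
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(losses), best_epoch

stop_epoch, best_epoch = early_stop(val_loss, patience=5)  # patience=5 is an arbitrary choice
print(stop_epoch, best_epoch)  # 14 9
```

With this rule, training on this particular run would have stopped at epoch 14 and kept the epoch 9 weights. In Keras, the `tf.keras.callbacks.EarlyStopping` callback (with `monitor`, `patience`, and `restore_best_weights` arguments) applies this kind of rule when passed to `fit` through `callbacks`.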
In [23]:
model_2.evaluate(X_test,y_test, verbose = 1)

test_pred2 = np.argmax(model_2.predict(X_test), axis  = -1)

metrics_score(testY, test_pred2)
313/313 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8824 - loss: 0.4026
313/313 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step
              precision    recall  f1-score   support

           0       0.80      0.86      0.83      1000
           1       0.99      0.97      0.98      1000
           2       0.78      0.85      0.81      1000
           3       0.87      0.90      0.89      1000
           4       0.79      0.84      0.82      1000
           5       0.98      0.96      0.97      1000
           6       0.78      0.60      0.68      1000
           7       0.91      0.98      0.94      1000
           8       0.97      0.97      0.97      1000
           9       0.98      0.93      0.95      1000

    accuracy                           0.88     10000
   macro avg       0.89      0.88      0.88     10000
weighted avg       0.89      0.88      0.88     10000

[Figure: confusion matrix heatmap for Model-2 on the test set, actual vs. predicted class]

Model-3

Question 4: For the above model, i.e., Model-2, add 1 hidden layer with 128 neurons and the ReLU activation function after the flatten layer. The test accuracy of this model lies between,

In [24]:
# Initialize sequential model

model_3 = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),  # Added hidden layer: 128 nodes, ReLU
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(10, activation = 'softmax')
])
In [25]:
model_3.summary()
Model: "sequential_2"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ Layer (type)                         ┃ Output Shape                ┃         Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ flatten_2 (Flatten)                  │ (None, 784)                 │               0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dense_4 (Dense)                      │ (None, 128)                 │         100,480 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dense_5 (Dense)                      │ (None, 64)                  │           8,256 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dense_6 (Dense)                      │ (None, 10)                  │             650 │
└──────────────────────────────────────┴─────────────────────────────┴─────────────────┘
 Total params: 109,386 (427.29 KB)
 Trainable params: 109,386 (427.29 KB)
 Non-trainable params: 0 (0.00 B)

Observations

  • The number of parameters (109,386) is about 2.15 times that of Model-1 (50,890) and about 7.5% more than that of Model-2 (101,770).
  • Increasing the number of parameters can significantly increase the training time of the model.
In [ ]:
model_3.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001), loss= 'categorical_crossentropy', metrics= ['accuracy'])

fit_history_3 = model_3.fit(X_train, y_train, epochs=30, validation_split=0.1, batch_size=64, verbose = 1)
Epoch 1/30
844/844 ━━━━━━━━━━━━━━━━━━━━ 5s 5ms/step - accuracy: 0.7599 - loss: 0.6924 - val_accuracy: 0.8547 - val_loss: 0.4064
Epoch 2/30
844/844 ━━━━━━━━━━━━━━━━━━━━ 4s 4ms/step - accuracy: 0.8620 - loss: 0.3888 - val_accuracy: 0.8662 - val_loss: 0.3795
Epoch 3/30
844/844 ━━━━━━━━━━━━━━━━━━━━ 5s 4ms/step - accuracy: 0.8741 - loss: 0.3455 - val_accuracy: 0.8723 - val_loss: 0.3450
Epoch 4/30
844/844 ━━━━━━━━━━━━━━━━━━━━ 4s 5ms/step - accuracy: 0.8830 - loss: 0.3192 - val_accuracy: 0.8802 - val_loss: 0.3443
Epoch 5/30
844/844 ━━━━━━━━━━━━━━━━━━━━ 4s 5ms/step - accuracy: 0.8891 - loss: 0.2995 - val_accuracy: 0.8825 - val_loss: 0.3293
Epoch 6/30
844/844 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.8957 - loss: 0.2810 - val_accuracy: 0.8805 - val_loss: 0.3311
Epoch 7/30
844/844 ━━━━━━━━━━━━━━━━━━━━ 5s 4ms/step - accuracy: 0.8999 - loss: 0.2634 - val_accuracy: 0.8842 - val_loss: 0.3247
Epoch 8/30
844/844 ━━━━━━━━━━━━━━━━━━━━ 5s 6ms/step - accuracy: 0.9030 - loss: 0.2629 - val_accuracy: 0.8838 - val_loss: 0.3393
Epoch 9/30
844/844 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.9070 - loss: 0.2469 - val_accuracy: 0.8837 - val_loss: 0.3285
Epoch 10/30
844/844 ━━━━━━━━━━━━━━━━━━━━ 5s 4ms/step - accuracy: 0.9088 - loss: 0.2424 - val_accuracy: 0.8827 - val_loss: 0.3477
Epoch 11/30
844/844 ━━━━━━━━━━━━━━━━━━━━ 6s 5ms/step - accuracy: 0.9140 - loss: 0.2320 - val_accuracy: 0.8913 - val_loss: 0.3231
Epoch 12/30
844/844 ━━━━━━━━━━━━━━━━━━━━ 4s 4ms/step - accuracy: 0.9128 - loss: 0.2311 - val_accuracy: 0.8903 - val_loss: 0.3259
Epoch 13/30
844/844 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.9198 - loss: 0.2136 - val_accuracy: 0.8882 - val_loss: 0.3362
Epoch 14/30
844/844 ━━━━━━━━━━━━━━━━━━━━ 5s 6ms/step - accuracy: 0.9205 - loss: 0.2116 - val_accuracy: 0.8845 - val_loss: 0.3503
Epoch 15/30
844/844 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.9226 - loss: 0.2043 - val_accuracy: 0.8897 - val_loss: 0.3241
Epoch 16/30
844/844 ━━━━━━━━━━━━━━━━━━━━ 5s 4ms/step - accuracy: 0.9269 - loss: 0.1937 - val_accuracy: 0.8925 - val_loss: 0.3309
Epoch 17/30
844/844 ━━━━━━━━━━━━━━━━━━━━ 6s 5ms/step - accuracy: 0.9275 - loss: 0.1951 - val_accuracy: 0.8902 - val_loss: 0.3333
Epoch 18/30
844/844 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.9297 - loss: 0.1908 - val_accuracy: 0.8868 - val_loss: 0.3480
Epoch 19/30
844/844 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.9321 - loss: 0.1837 - val_accuracy: 0.8917 - val_loss: 0.3301
Epoch 20/30
844/844 ━━━━━━━━━━━━━━━━━━━━ 7s 6ms/step - accuracy: 0.9355 - loss: 0.1719 - val_accuracy: 0.8882 - val_loss: 0.3459
Epoch 21/30
844/844 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.9361 - loss: 0.1706 - val_accuracy: 0.8875 - val_loss: 0.3548
Epoch 22/30
844/844 ━━━━━━━━━━━━━━━━━━━━ 5s 4ms/step - accuracy: 0.9376 - loss: 0.1651 - val_accuracy: 0.8825 - val_loss: 0.3784
Epoch 23/30
844/844 ━━━━━━━━━━━━━━━━━━━━ 6s 5ms/step - accuracy: 0.9388 - loss: 0.1639 - val_accuracy: 0.8908 - val_loss: 0.3658
Epoch 24/30
844/844 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.9417 - loss: 0.1568 - val_accuracy: 0.8922 - val_loss: 0.3587
Epoch 25/30
844/844 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.9423 - loss: 0.1535 - val_accuracy: 0.8793 - val_loss: 0.4636
Epoch 26/30
844/844 ━━━━━━━━━━━━━━━━━━━━ 4s 5ms/step - accuracy: 0.9430 - loss: 0.1507 - val_accuracy: 0.8868 - val_loss: 0.3975
Epoch 27/30
844/844 ━━━━━━━━━━━━━━━━━━━━ 4s 4ms/step - accuracy: 0.9453 - loss: 0.1461 - val_accuracy: 0.8910 - val_loss: 0.3819
Epoch 28/30
844/844 ━━━━━━━━━━━━━━━━━━━━ 5s 4ms/step - accuracy: 0.9495 - loss: 0.1352 - val_accuracy: 0.8875 - val_loss: 0.4024
Epoch 29/30
650/844 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 0.9509 - loss: 0.1312

Observations

  • In the logged epochs, training accuracy keeps climbing and reaches about 0.95 by epoch 28, while validation accuracy stays between roughly 0.88 and 0.89, about the same as Model-2. The gap between training and validation accuracy remains, so the model is still overfitting.
  • We can tune the hyperparameters of the model or try different layer structures to improve performance and reduce overfitting.
  • Accuracy on the training data keeps increasing with the number of epochs, but validation accuracy has been roughly constant since about epoch 10.
  • This indicates that the model fits the training data more closely after each epoch but cannot replicate that performance on the validation data, which is a sign of overfitting.
  • The same pattern shows up in the loss. Training loss keeps decreasing with more epochs, while validation loss stops improving after about epoch 10 and then drifts upward, from about 0.32 to about 0.40 by epoch 28.
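
The widening gap described above can be read directly off the log. The sketch below uses four (epoch, training accuracy, validation accuracy) rows copied from the Model-3 output; choosing these four epochs is just a convenience for illustration.

```python
# (epoch, training accuracy, validation accuracy) copied from the Model-3 log above
model_3_log = [
    (1, 0.7599, 0.8547),
    (10, 0.9088, 0.8827),
    (20, 0.9355, 0.8882),
    (28, 0.9495, 0.8875),
]

for epoch, train_acc, val_acc in model_3_log:
    gap = train_acc - val_acc  # a positive, growing gap suggests overfitting
    print(f"epoch {epoch:>2}: train={train_acc:.4f}  val={val_acc:.4f}  gap={gap:+.4f}")

gaps = [train_acc - val_acc for _, train_acc, val_acc in model_3_log]
```

The gap is negative in epoch 1, which is expected because the logged training accuracy is averaged over the epoch while the weights are still improving. It then grows to about 0.06 by epoch 28.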

Now, let's make final predictions on the test data using the last model we built.

Final Predictions on the Test Data

In [ ]:
final_pred = np.argmax(model_3.predict(X_test), axis  = -1)

metrics_score(testY, final_pred)
  • The precision and recall for class 6 (Shirt) have increased. The confusion matrix shows that the model is still not able to differentiate between T-shirt/top and Shirt but has become better at differentiating Shirt from Pullover and Coat.
  • The model has become even better at identifying Trouser. It has an f1-score of 98% for class 1 (Trouser).
  • The overall accuracy on the test data is approximately the same as the validation accuracy.

Let's visualize the images from the test data.

  • We will randomly select 24 images from the test data and visualize them.
  • The title of each image would show the actual and predicted label of that image and the probability of the predicted class.
  • The higher the probability, the more confident the model is about the prediction.
In [ ]:
rows = 4
cols = 6
# Predict once, outside the loop, and keep the highest class probability for each image
y_pred_test_max_probas = np.max(model_3.predict(X_test), axis=1)
fig = plt.figure(figsize=(15, 15))
for i in range(rows * cols):
    random_index = np.random.randint(0, len(testY))
    ax = fig.add_subplot(rows, cols, i + 1)
    ax.imshow(X_test[random_index, :])
    pred_label = class_names_list[final_pred[random_index]]
    true_label = class_names_list[testY[random_index]]
    pred_proba = y_pred_test_max_probas[random_index]
    ax.set_title("actual: {}\npredicted: {}\nprobability: {:.3}\n".format(
           true_label, pred_label, pred_proba
    ))
plt.show()

Comments

  • We trained 3 models, varying the hidden layer sizes, the number of hidden layers, and the number of epochs.
  • The per-epoch training logs track how accuracy and loss vary across epochs and let us judge how well these models generalize.
  • We observed good performance on the training set, but the models show some overfitting, which becomes more prominent as we increase the epochs.
  • We went ahead with Model-3 and evaluated it on the test data.
  • Finally, we visualized some of the images from the test data.
In [ ]:
# Convert notebook to html
!jupyter nbconvert --to html "/content/drive/MyDrive/MIT - Data Sciences/Colab Notebooks/Week_Six_-_Deep_Learning/Hand_On_Quiz_ANN/Hands_on_quiz_ANN.ipynb"