Classifying Dogs and Cats Using InceptionResNetV2: A Deep Dive

The task of classifying images of dogs and cats has long been a popular benchmark in the field of computer vision. With the advent of deep learning and convolutional neural networks (CNNs), achieving high accuracy on this problem has become more accessible. In this blog post, we'll explore how to leverage the power of the InceptionResNetV2 architecture to build a robust model for classifying dog and cat images. We'll walk through the code implementation step by step, discussing data preparation, model architecture, training, and evaluation.



Why InceptionResNetV2?

InceptionResNetV2 is a powerful convolutional neural network that combines the strengths of the Inception architecture and residual connections. It is available pre-trained on the ImageNet dataset, which contains over a million images across 1,000 classes. By using transfer learning, we can adapt this pre-trained model to our specific task with relatively little data and computational resources.

Data Preparation

Dataset

We use the classic Dogs vs. Cats dataset, which consists of 25,000 images of dogs and cats. The dataset is split into training and validation sets to evaluate the model's performance on unseen data.

Directory Structure

The dataset is organized into subdirectories for training and validation:

data/
├── train/
│   ├── cats/
│   └── dogs/
└── validation/
    ├── cats/
    └── dogs/
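The post does not show how the 25,000 images were split into this layout. As a minimal sketch, assuming the Kaggle file naming (cat.0.jpg, dog.0.jpg, ...) and a hypothetical 20% validation fraction, a stdlib-only script could build it like this; `split_dogs_vs_cats` is an illustrative helper, not the author's code:

```python
import random
import shutil
from pathlib import Path

def split_dogs_vs_cats(src_dir, dst_dir, val_fraction=0.2, seed=42):
    """Copy Kaggle-style files (cat.N.jpg / dog.N.jpg) into
    dst_dir/{train,validation}/{cats,dogs}/ and return per-split counts.

    Hypothetical helper: the naming scheme, validation fraction and seed
    are assumptions, not taken from the original post.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    counts = {}
    for prefix, class_name in (("cat", "cats"), ("dog", "dogs")):
        files = sorted(src.glob(f"{prefix}.*.jpg"))
        random.Random(seed).shuffle(files)  # deterministic shuffle per class
        n_val = round(len(files) * val_fraction)
        splits = {"validation": files[:n_val], "train": files[n_val:]}
        for split, split_files in splits.items():
            out_dir = dst / split / class_name
            out_dir.mkdir(parents=True, exist_ok=True)
            for f in split_files:
                shutil.copy2(f, out_dir / f.name)
            counts[(split, class_name)] = len(split_files)
    return counts

# Example: split_dogs_vs_cats("kaggle/train", "data")
```

Splitting each class separately keeps the cat/dog ratio the same in both sets.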


Data Augmentation

To improve the model's generalization, we apply data augmentation techniques using Keras' ImageDataGenerator. The following augmentations are applied:

  • Rescaling: Pixel values are rescaled from [0, 255] to [0, 1].
  • Rotation: Images are randomly rotated by up to 40 degrees in either direction.
  • Width and Height Shift: Images are shifted horizontally and vertically by up to 20% of their width and height.
  • Shear Transformation: Shear intensity is set to 0.2; Keras interprets this value as a shear angle in degrees, so the effect is very slight.
  • Zoom: Images are randomly zoomed in or out by up to 20%.
  • Horizontal Flip: Images are randomly flipped horizontally, with probability 0.5.
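To make these ranges concrete, the sketch below draws one set of transformation parameters following the ImageDataGenerator documentation: rotation and shear in degrees, shifts as fractions of the image size, and a zoom factor per axis drawn from [1 - 0.2, 1 + 0.2]. It is a simplified illustration with a hypothetical helper name, not Keras's own implementation:

```python
import numpy as np

def sample_transform_params(rng, rotation_range=40, width_shift_range=0.2,
                            height_shift_range=0.2, shear_range=0.2,
                            zoom_range=0.2, horizontal_flip=True,
                            img_size=(299, 299)):
    """Draw one random transformation (hypothetical helper for illustration)."""
    height, width = img_size
    # Zoom is sampled independently for each axis
    zoom_x, zoom_y = rng.uniform(1 - zoom_range, 1 + zoom_range, size=2)
    return {
        "rotation_deg": rng.uniform(-rotation_range, rotation_range),
        "shift_x_px": rng.uniform(-width_shift_range, width_shift_range) * width,
        "shift_y_px": rng.uniform(-height_shift_range, height_shift_range) * height,
        "shear_deg": rng.uniform(-shear_range, shear_range),
        "zoom_x": zoom_x,
        "zoom_y": zoom_y,
        "flip": bool(horizontal_flip and rng.random() < 0.5),
    }

rng = np.random.default_rng(0)
print(sample_transform_params(rng))
```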


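from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Training images: rescale to [0, 1] and apply random augmentation
train_datagen = ImageDataGenerator(
    rescale=1./255,          # map pixel values from [0, 255] to [0, 1]
    rotation_range=40,       # random rotation of up to +/-40 degrees
    width_shift_range=0.2,   # horizontal shift of up to 20% of the width
    height_shift_range=0.2,  # vertical shift of up to 20% of the height
    shear_range=0.2,         # shear angle in degrees
    zoom_range=0.2,          # random zoom factor in [0.8, 1.2]
    horizontal_flip=True)    # random left-right flip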
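A related detail: the ImageNet weights that Keras provides for InceptionResNetV2 expect inputs scaled to [-1, 1], which is what `tf.keras.applications.inception_resnet_v2.preprocess_input` does, whereas `rescale=1./255` produces [0, 1]. Training a new head on [0, 1] inputs can still work, but if you want to match the pretraining range, one option is to pass `preprocessing_function=preprocess_input` to ImageDataGenerator instead of `rescale`. The arithmetic of both mappings, sketched in NumPy with hypothetical function names:

```python
import numpy as np

def rescale_unit_range(x):
    """What ImageDataGenerator(rescale=1./255) does: [0, 255] -> [0, 1]."""
    return np.asarray(x, dtype="float32") * (1.0 / 255)

def rescale_symmetric_range(x):
    """Same arithmetic as inception_resnet_v2.preprocess_input: [0, 255] -> [-1, 1]."""
    return np.asarray(x, dtype="float32") / 127.5 - 1.0

pixels = np.array([0, 51, 255])
print(rescale_unit_range(pixels))       # approximately [0.0, 0.2, 1.0]
print(rescale_symmetric_range(pixels))  # approximately [-1.0, -0.6, 1.0]
```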


Data Generators

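We create data generators for both the training and validation sets. The validation images are only rescaled, not augmented, so that validation metrics are measured on unmodified images:

# Validation images: rescale only, no augmentation
validation_datagen = ImageDataGenerator(rescale=1./255)

batch_size = 8

train_generator = train_datagen.flow_from_directory(
    'data/train',
    target_size=(299, 299),    # InceptionResNetV2's default input size
    batch_size=batch_size,
    class_mode='categorical')  # one-hot labels, matching the 2-unit softmax output

validation_generator = validation_datagen.flow_from_directory(
    'data/validation',
    target_size=(299, 299),
    batch_size=batch_size,
    class_mode='categorical')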
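flow_from_directory derives the labels from the subdirectory names: classes are sorted alphanumerically and numbered from 0, and the resulting mapping is exposed as `train_generator.class_indices`. A stdlib sketch of that rule, with hypothetical helper names, shows why cats end up as index 0 and dogs as index 1, and what one categorical label looks like:

```python
from pathlib import Path

def infer_class_indices(directory):
    """Sorted subdirectory names -> consecutive label indices (starting at 0)."""
    classes = sorted(p.name for p in Path(directory).iterdir() if p.is_dir())
    return {name: index for index, name in enumerate(classes)}

def one_hot(index, num_classes):
    """Label vector for one image under class_mode='categorical'."""
    label = [0.0] * num_classes
    label[index] = 1.0
    return label

# For the layout above: infer_class_indices("data/train") -> {'cats': 0, 'dogs': 1}
# and a dog image gets the label one_hot(1, 2) -> [0.0, 1.0]
```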


Building the Model

Loading the Pre-trained Model

We load the InceptionResNetV2 model without the top classification layer, setting include_top=False. This allows us to add our own classifier tailored to the dogs vs. cats problem.

from tensorflow.keras.applications import InceptionResNetV2

# Convolutional base only: include_top=False drops the 1,000-class ImageNet classifier
base_model = InceptionResNetV2(weights='imagenet', include_top=False, input_shape=(299, 299, 3))


Adding Custom Layers

We add custom layers on top of the base model to adapt it to our binary classification task:

  • Global Average Pooling Layer: Averages each feature map of the base model's output into a single number.
  • Dense Layer: Adds a fully connected layer with 1024 units and ReLU activation.
  • Dropout Layer: Helps prevent overfitting by randomly setting 50% of its input units to 0 during training.
  • Output Layer: Two units with softmax activation, one per class (cats and dogs), so the outputs form a probability distribution over the two classes.
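The shapes involved can be checked with plain NumPy. This sketch assumes the frozen base outputs an 8x8x1536 feature map for a 299x299 input (what Keras's InceptionResNetV2 produces with include_top=False) and uses random stand-in weights instead of trained ones. It also counts the trainable parameters of the head and shows why a single softmax unit would not work:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def head_forward(features, w1, b1, w2, b2):
    """Inference-time forward pass of the classification head.
    features: (batch, 8, 8, 1536). Dropout is the identity at inference, so it is omitted."""
    pooled = features.mean(axis=(1, 2))         # GlobalAveragePooling2D -> (batch, 1536)
    hidden = np.maximum(pooled @ w1 + b1, 0.0)  # Dense(1024, relu)
    return softmax(hidden @ w2 + b2)            # Dense(2, softmax)

rng = np.random.default_rng(0)
w1, b1 = rng.normal(0, 0.01, (1536, 1024)), np.zeros(1024)
w2, b2 = rng.normal(0, 0.01, (1024, 2)), np.zeros(2)
features = rng.random((4, 8, 8, 1536))  # stand-in for 4 images' base-model output

probs = head_forward(features, w1, b1, w2, b2)
print(probs.shape)  # (4, 2); each row sums to 1

# Trainable parameters in the head (weights + biases) while the base is frozen:
head_params = w1.size + b1.size + w2.size + b2.size
print(head_params)  # 1536*1024 + 1024 + 1024*2 + 2 = 1575938

# A softmax over a single unit always outputs 1.0, whatever the logit
single_unit = softmax(np.array([[2.5], [-7.0]]))
```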

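import tensorflow as tf
from tensorflow.keras.layers import Dense, Dropout, GlobalAveragePooling2D
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import SGD

# Freeze the pre-trained base (base_model is the InceptionResNetV2 instance loaded above)
for layer in base_model.layers:
    layer.trainable = False

# New classification head
x = base_model.output
x = GlobalAveragePooling2D()(x)  # (batch, 8, 8, 1536) -> (batch, 1536)
x = Dense(1024, activation='relu')(x)
x = Dropout(0.5)(x)
output_layer = Dense(2, activation='softmax', name='softmax')(x)  # one unit per class

model = Model(inputs=base_model.input, outputs=output_layer)

# These metrics are computed element-wise over both one-hot output columns
defined_metrics = [
    tf.keras.metrics.BinaryAccuracy(name='accuracy'),
    tf.keras.metrics.Precision(name='precision'),
    tf.keras.metrics.Recall(name='recall'),
    tf.keras.metrics.AUC(name='auc'),
]

# lr = 1e-5 / (1 + 1e-6 * step): the schedule the legacy `decay=1e-6` argument applied.
# Newer Keras optimizers no longer support `decay`, so it is expressed as a schedule.
lr_schedule = tf.keras.optimizers.schedules.InverseTimeDecay(
    initial_learning_rate=1e-5, decay_steps=1, decay_rate=1e-6)
sgd = SGD(learning_rate=lr_schedule, momentum=0.9, nesterov=True)

print('Configuration Start -------------------------')
print(sgd.get_config())
print('Configuration End -------------------------')

model.compile(optimizer=sgd, loss='categorical_crossentropy', metrics=defined_metrics)
model.summary()  # summary() prints the table itself and returns None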
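The learning-rate decay is gentle, which is easy to check numerically. The legacy `decay=1e-6` argument (equivalently, `tf.keras.optimizers.schedules.InverseTimeDecay` with `decay_steps=1` and `decay_rate=1e-6`) sets the learning rate after t optimizer updates to 1e-5 / (1 + 1e-6 * t). A plain-Python sketch with illustrative step counts:

```python
def decayed_lr(step, initial_lr=1e-5, decay=1e-6):
    """Learning rate after `step` updates under inverse-time decay:
    lr = initial_lr / (1 + decay * step)."""
    return initial_lr / (1.0 + decay * step)

for step in (0, 10_000, 100_000, 1_000_000):
    print(f"{step:>9,} updates -> lr = {decayed_lr(step):.3e}")
```

Even after 100,000 updates the rate has only fallen by about 9%, to roughly 9.09e-6; it takes a million updates to halve it.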


Freezing Layers

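Initially, we freeze all layers of the base model so that only the new top layers are trained. This is the same loop as in the listing above; it must run before model.compile(), because changes to a layer's trainable attribute only take effect when the model is compiled (or recompiled):

for layer in base_model.layers:
    layer.trainable = False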


Training the Model

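We train the model for up to 100 epochs to fit the top layers. The EarlyStopping callback halts training once the validation loss has not improved for three consecutive epochs and restores the best weights, while ModelCheckpoint saves the model each time the validation accuracy reaches a new best.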

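from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

# Stop after 3 epochs without val_loss improvement and roll back to the best weights
early_stopping = EarlyStopping(
    monitor='val_loss',
    patience=3,
    restore_best_weights=True
)

# Save the model whenever val_accuracy reaches a new best
model_checkpoint = ModelCheckpoint(
    'best_model_resnet_v2.h5',
    monitor='val_accuracy',
    save_best_only=True
)

history = model.fit(
    train_generator,
    steps_per_epoch=train_generator.samples // train_generator.batch_size,
    epochs=100,
    validation_data=validation_generator,
    validation_steps=validation_generator.samples // validation_generator.batch_size,
    callbacks=[early_stopping, model_checkpoint]
)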
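How patience=3 plays out can be simulated without training anything. The sketch below applies the documented EarlyStopping rule for a monitored quantity that should decrease (stop once `patience` consecutive epochs fail to beat the best value so far) to a hypothetical list of per-epoch validation losses; `simulate_early_stopping` is an illustrative helper, not part of Keras:

```python
def simulate_early_stopping(val_losses, patience=3, min_delta=0.0):
    """Return (stop_epoch, best_epoch), both 0-based, for per-epoch val_loss values.
    stop_epoch is None if training would run through every epoch."""
    best, best_epoch, wait = float("inf"), None, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:  # improvement: remember it and reset the counter
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return None, best_epoch

# Hypothetical losses: the best value is at epoch 2, then three epochs without improvement
print(simulate_early_stopping([0.30, 0.22, 0.18, 0.19, 0.185, 0.21, 0.17]))  # (5, 2)
```

Training stops at epoch 5, so the later 0.17 is never reached, and restore_best_weights=True rolls the model back to the epoch-2 weights. Note that ModelCheckpoint monitors val_accuracy rather than val_loss, so the saved best_model_resnet_v2.h5 and the restored in-memory model may come from different epochs.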

After training completes, we can plot its accuracy:
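A minimal sketch of such a plot, assuming matplotlib is installed and using the `history` object returned by `model.fit` above (its `history.history` dict holds one list per metric, including the `val_` variants); `plot_metric` and the output file name are hypothetical:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this also works without a display
import matplotlib.pyplot as plt

def plot_metric(history_dict, metric="accuracy", out_path="accuracy.png"):
    """Plot a training metric and its validation counterpart per epoch and save it."""
    epochs = range(1, len(history_dict[metric]) + 1)
    fig, ax = plt.subplots()
    ax.plot(epochs, history_dict[metric], label=f"training {metric}")
    ax.plot(epochs, history_dict[f"val_{metric}"], label=f"validation {metric}")
    ax.set_xlabel("epoch")
    ax.set_ylabel(metric)
    ax.legend()
    fig.savefig(out_path)
    plt.close(fig)
    return out_path

# Usage after training: plot_metric(history.history)
# or, for the loss curves: plot_metric(history.history, metric="loss", out_path="loss.png")
```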


We obtain the following results for our trained model:

938/938 [==============================] - 124s 132ms/step - loss: 0.0221 - accuracy: 0.9937 - precision: 0.9937 - recall: 0.9937 - auc: 0.9993

loss :  0.022075723856687546
accuracy :  0.9937333464622498
precision :  0.9937333464622498
recall :  0.9937333464622498
auc :  0.9992535710334778
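Accuracy, precision, and recall come out identical, and that is expected with this setup rather than a coincidence. The metrics in defined_metrics treat each of the two one-hot output columns as a separate binary prediction thresholded at 0.5, so every correctly classified image adds one true positive and one true negative, and every misclassified image adds exactly one false positive and one false negative. A NumPy sketch of that bookkeeping on made-up predictions (`elementwise_metrics` is a hypothetical helper):

```python
import numpy as np

def elementwise_metrics(y_true, y_prob, threshold=0.5):
    """Accuracy, precision and recall over every element of one-hot labels
    and softmax outputs, thresholded like BinaryAccuracy/Precision/Recall."""
    y_pred = (y_prob > threshold).astype(int)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    accuracy = np.mean(y_pred == y_true)
    return accuracy, tp / (tp + fp), tp / (tp + fn)

# Hypothetical batch: 5 images, labels one-hot over (cats, dogs); the last image is misclassified
y_true = np.array([[1, 0], [0, 1], [1, 0], [0, 1], [1, 0]])
y_prob = np.array([[0.9, 0.1], [0.2, 0.8], [0.7, 0.3], [0.4, 0.6], [0.35, 0.65]])
print(elementwise_metrics(y_true, y_prob))  # all three equal 4/5 = 0.8
```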

Evaluating the Model

We then predict the classes of unlabeled images and visually confirm that the model is doing a good job:
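A minimal sketch of the bookkeeping for classifying a single image, assuming the same 1/255 rescaling as the generators and the `class_indices` mapping from `train_generator`. The helper names are hypothetical, and the TensorFlow part is left as comments because it needs the trained model and an image file:

```python
import numpy as np

def to_model_input(image_array):
    """Turn one RGB image (299, 299, 3) with values 0-255 into a batch of one,
    rescaled like the generators above (1/255)."""
    x = np.asarray(image_array, dtype="float32") / 255.0
    return x[np.newaxis, ...]  # shape (1, 299, 299, 3)

def label_from_probs(probs, class_indices):
    """Map one image's softmax output back to (class name, probability).
    class_indices is the generator's mapping, e.g. {'cats': 0, 'dogs': 1}."""
    index_to_class = {i: name for name, i in class_indices.items()}
    i = int(np.argmax(probs))
    return index_to_class[i], float(probs[i])

# With TensorFlow and the trained model available (file name is a placeholder):
# from tensorflow.keras.utils import load_img, img_to_array
# img = img_to_array(load_img("my_dog.jpg", target_size=(299, 299)))
# probs = model.predict(to_model_input(img))[0]
# print(label_from_probs(probs, train_generator.class_indices))
```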


Finally, I tested the model on a photo of my own dog to check that it works as expected, and it does:

Results

Our model achieves high accuracy on both the training and validation sets, indicating that it has learned to distinguish between dog and cat images effectively. Data augmentation and transfer learning from the pre-trained model, with only the new classification head trained on top of the frozen base, likely contribute significantly to this performance.

Conclusion

In this blog post, we demonstrated how to use the InceptionResNetV2 model for the classic dogs vs. cats image classification problem. By leveraging transfer learning, training a new classification head on top of the frozen, ImageNet-pretrained base, we built a model that achieves high accuracy with relatively little training data. This approach can be generalized to other image classification tasks, offering a powerful tool for computer vision applications.

Future Work

  • Hyperparameter Tuning: Experiment with different optimizers and learning rates.
  • More Augmentation: Apply more aggressive data augmentation to further improve generalization.
  • Cross-Validation: Use k-fold cross-validation for a more robust evaluation.
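As a starting point for the cross-validation item, here is a NumPy sketch of k-fold index generation over the 25,000 images: shuffle once, cut into k near-equal folds, and use each fold as the validation set exactly once, training a fresh model per split. `k_fold_indices` is a hypothetical helper; feeding the resulting file lists to Keras is not shown:

```python
import numpy as np

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs; every sample is validated exactly once."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), k)
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield train_idx, val_idx

for fold, (train_idx, val_idx) in enumerate(k_fold_indices(25_000, k=5)):
    print(fold, len(train_idx), len(val_idx))  # 20000 training / 5000 validation per fold
```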

Code Repository

The complete code for this project is available on GitHub:

Dogs vs. Cats Classification with InceptionResNetV2

Feel free to clone the repository and try it out yourself!

The trained model is also available here: https://github.com/JordiCorbilla/dogs-vs-cats-classification/releases/download/v1.0/best_model_resnet_v2.h5
