Cool dog! I wonder what kind it is?

13 min read · Dec 18, 2020

Image: Variety of Great Danes, depicting the various colors of the breed

Have you ever been out for a walk, come across a dog, and wondered what kind it was? I know I have. Wouldn’t it be cool if there was a quick way to identify the dog breed? Dogs come in a wide variety of shapes and sizes, from Chihuahuas to Great Danes. Even within just those two breeds, a variety of colors and other distinguishing features make it a challenge to figure out what breed a dog is. As the Variety of Great Danes image shows, Great Danes alone come in a wide range of colors. Some have cropped ears, others do not. Thinking about all this variability, some questions come to mind. Can machine learning be used to identify dogs and determine their breed? Can machine learning be used to identify humans in images and determine what dog breed they look like? Can a machine do this with a high level of accuracy?

Project overview

The goal of this capstone project for the Udacity Data Scientist Nanodegree is to classify images of dogs according to breed. Specifically, the project aims to:

Develop an algorithm that could be used by a web or mobile app
Classify images of dogs according to dog breed
Classify images of humans according to dog breed
Return an error if no dogs or humans were detected

Strategy

To accomplish these goals, I used a combination of pre-trained machine learning classification models and the keras neural network library to build a convolutional neural network (CNN) model. I first built a CNN from scratch and experimented with the types, number, and parameters of its layers. I then used transfer learning to combine pre-trained image classifiers with my CNN to see what impact they had. I tested the accuracy of each model with a set of test images. The project is contained in a Jupyter Notebook using Python.

Metrics

Because the breed model performs multi-class, single-label classification, it makes the most sense to use keras categorical cross-entropy as the loss function. As the model is fitted, the loss and accuracy are calculated for each epoch, and the parameters that show the lowest loss and highest accuracy are saved using ModelCheckpoint. The formula used to check the accuracy of the models is:

Accuracy = number correctly predicted / total number of predictions
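To make the two metrics concrete, here is a minimal pure-Python sketch of accuracy and of categorical cross-entropy for a single sample. The function names and the 3-class toy data are assumptions for illustration only; in the project itself these values are computed by keras.

```python
import math


def accuracy(true_labels, predicted_labels):
    """Number correctly predicted / total number of predictions."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def categorical_cross_entropy(one_hot_target, predicted_probs, eps=1e-7):
    """Loss for one sample: -sum(target_i * log(prob_i)).

    eps guards against log(0) when a probability is exactly zero.
    """
    return -sum(t * math.log(max(p, eps))
                for t, p in zip(one_hot_target, predicted_probs))


# Toy example with 3 classes (the project has 133 breeds).
print(accuracy([0, 2, 1, 1], [0, 2, 2, 1]))                    # 3 of 4 correct
print(categorical_cross_entropy([0, 1, 0], [0.2, 0.7, 0.1]))   # equals -log(0.7)
```

Only the probability assigned to the true class contributes to the loss, so a confident correct prediction is rewarded and a confident wrong one is penalized heavily.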

Data analysis and exploration

Before diving into building the models, it is important to get a sense of what the data looks like. Udacity provided training and label data for the project in the form of data sets related to dogs and humans.

Here is a breakdown of the training data provided for identifying dogs in images and then determining the breed.

Human training data

To train the model to detect humans in images, 13,233 human images were used.

The dog training data is a set of URLs to images of dogs. The target data is a one-hot encoded classification table with 133 columns, one per breed.

An array of dog names was also provided. Each index into the array corresponds to one of the 133 columns in the one-hot encoded classification table.

Looking at the training data distribution chart, it is easy to see that the data is not evenly distributed. In fact, some breeds have as much as 3 times the data of other breeds. The breed with the lowest representation has only 26 images and the breed with the highest representation has 77. The mean number of images per breed is 50. Having unbalanced data may result in the model being biased towards over-represented breeds. This will likely impact model accuracy.
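A per-breed count like the one behind that chart could be computed roughly as follows. This is a minimal sketch: the function name, the three breed names, and the tiny target table are made up for illustration, and the real data has 133 breeds.

```python
from collections import Counter


def images_per_breed(one_hot_targets, breed_names):
    """Count images per breed from rows of a one-hot encoded target table."""
    counts = Counter()
    for row in one_hot_targets:
        # Position of the largest value in the row (the "1") is the breed index.
        breed_index = max(range(len(row)), key=row.__getitem__)
        counts[breed_names[breed_index]] += 1
    return counts


# Hypothetical toy data: 3 breeds, 6 images.
names = ["Affenpinscher", "Great_Dane", "Chihuahua"]
targets = [[1, 0, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 0, 1], [0, 0, 1]]

counts = images_per_breed(targets, names)
most, least = max(counts.values()), min(counts.values())
print(counts.most_common())
print(f"imbalance ratio: {most / least:.1f}")
```

With the figures quoted above, the same ratio would be 77 / 26, or roughly 3.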

Another factor that will affect the model's ability to detect breeds is image quality. A random sample of the training images shows some anomalies. Some images are very dark, and if a dark dog is in a dark image, it is difficult to discern the breed. Some images are blurry, which also makes breed recognition a challenge. Another potential problem for model training is images with multiple dogs: which one should the machine focus on? All of these factors will reduce the accuracy of the model.

Project methodology

For this project, Udacity provided a guided framework in the form of a Jupyter notebook using Python. The notebook came with a number of packages, some code blocks, and a guided approach that directs the project in a logical order, with questions and suggestions for further exploration. A project workspace with GPUs was also provided, so there was no need for a high-powered machine to complete the project.

Project Plan:

Explore the data to understand size, shape, and contents
Build a model to identify human faces in images
Write an algorithm to indicate if an image contains any faces
Build a model to identify dogs in images
Write an algorithm to indicate if an image contains any dogs
Build a CNN from scratch to determine dog breed given an image with a minimum 1% accuracy within 5 epochs
Use transfer learning to build CNNs to determine dog breed
Evaluate and Refine
Find the most accurate CNN with a minimum accuracy of 60%
Write an algorithm to get the dog breed for an image
Write an algorithm to tie it all together by identifying human faces and dogs and then getting the breed of dog.
Test with a set of images including at least 2 dogs, 2 humans, and one with neither
Draw conclusions

Detecting humans in images

The first step was to use a pre-trained OpenCV Haar feature-based cascade classifier to detect faces in images. Documentation related to the pre-trained OpenCV model can be found here. The cv2 library was used to interact with the pre-trained model and to pre-process the images by converting them to grayscale. Using the detectMultiScale method, the classifier detects faces, and the number of faces found is returned.

The model used was haarcascade_frontalface_alt.xml. One hundred images each of humans and of dogs were pre-processed to convert to grayscale. Then they were processed to assess the accuracy of the classifier.

100% of the human images were identified as containing human faces
11% of the dog images were identified as containing human faces
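The detection step and the percentage check can be sketched as below. This is an assumption-laden illustration, not the notebook's exact code: `make_face_detector` and `detection_rate` are hypothetical names, the cascade path is a placeholder, and OpenCV is imported inside the function so that the percentage helper can be tried on its own.

```python
def make_face_detector(cascade_path):
    """Build a function that reports whether an image contains a face.

    cascade_path is assumed to point at an OpenCV Haar cascade XML file,
    for example haarcascade_frontalface_alt.xml.
    """
    import cv2  # imported lazily; requires the opencv-python package

    face_cascade = cv2.CascadeClassifier(cascade_path)

    def face_detector(img_path):
        img = cv2.imread(img_path)                     # loaded in BGR order
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)   # cascades work on grayscale
        faces = face_cascade.detectMultiScale(gray)    # one (x, y, w, h) box per face
        return len(faces) > 0

    return face_detector


def detection_rate(detector, img_paths):
    """Percentage of images for which detector(path) returns True."""
    hits = sum(1 for path in img_paths if detector(path))
    return 100.0 * hits / len(img_paths)


# Example usage (paths and variable names are hypothetical):
# detector = make_face_detector("haarcascades/haarcascade_frontalface_alt.xml")
# print(detection_rate(detector, human_image_paths[:100]))
# print(detection_rate(detector, dog_image_paths[:100]))
```

Building the classifier once and reusing it avoids reloading the XML file for every image.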

Out of curiosity, I also ran the same set of images through another pre-trained OpenCV model, haarcascade_profileface.xml. With the profile model, the detection rate on human images went down significantly, although fewer dog images were mistaken for humans.

50% of the human images were identified as containing human faces
2% of the dog images were identified as containing human faces

Detecting dogs in images

To detect dogs in images, a pre-trained ResNet-50 model was used. If you are curious, this is what the model looks like. The model was trained on 10 million image URLs linked to 1000 categories from ImageNet. Each input image is resized to 224x224 with 3 color channels, then converted to a 3D tensor and finally to a 4D tensor to get the data into the right shape for the model. The keras preprocess_input function was used to re-order the color channels, normalize the data, and prepare it for the model. The ResNet-50 model’s predict method is used to predict what the image contains, and np.argmax is used to get an integer value representing the predicted object class. In the ImageNet dictionary of possible output classes, dogs are in the range of 151–268, inclusive. If the model returns a number in this range, a dog was detected.

To measure the accuracy of the model, a set of 100 human images and 100 dog images was sent through the classifier. The results were:
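The dog detection steps can be sketched as follows, assuming TensorFlow's bundled Keras is installed. The helper names are hypothetical and this is not necessarily the notebook's exact code. TensorFlow and NumPy are imported inside the functions so that the class-range check can be run without them.

```python
FIRST_DOG_CLASS = 151   # ImageNet class indices for dogs,
LAST_DOG_CLASS = 268    # both ends inclusive


def is_dog_class(class_index):
    """True if an ImageNet class index falls in the dog range."""
    return FIRST_DOG_CLASS <= class_index <= LAST_DOG_CLASS


def path_to_tensor(img_path):
    """Load an image file as a 4D tensor of shape (1, 224, 224, 3)."""
    import numpy as np
    from tensorflow.keras.preprocessing import image

    img = image.load_img(img_path, target_size=(224, 224))
    x = image.img_to_array(img)           # 3D tensor: (224, 224, 3)
    return np.expand_dims(x, axis=0)      # 4D tensor: (1, 224, 224, 3)


def dog_detector(img_path, model):
    """True if the model's top ImageNet prediction is a dog class."""
    import numpy as np
    from tensorflow.keras.applications.resnet50 import preprocess_input

    batch = preprocess_input(path_to_tensor(img_path))
    class_index = int(np.argmax(model.predict(batch)))
    return is_dog_class(class_index)


# Example usage (the image path is hypothetical):
# from tensorflow.keras.applications.resnet50 import ResNet50
# model = ResNet50(weights="imagenet")
# print(dog_detector("images/some_dog.jpg", model))
```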

0% of the human images were detected as dogs
100% of the dog images were identified as dogs

Creating a convolutional neural network from scratch

Now for the fun part: creating a CNN from scratch. The provided training images are converted to tensors and divided by 255 to rescale each pixel. The output of the model is 133 categories, one for each of the 133 dog breeds that we have names for. Since I was building this from scratch, I started by creating the CNN example provided in the notebook. I trained it, checked its accuracy, and then started to experiment.

Input Shape: 224x224x3. This represents the shape of the images and 3 color channels
Output Shape: 133. A vector with the predicted breed

For a start, I decided to set padding to ‘same’ for all of the convolutional layers, because I wanted the full area of the image to be available. I had read that increasing the number of filters from layer to layer was generally good for image processing, so I squared the number of filters from one layer to the next. I used a larger kernel_size in the layers with fewer filters. Initially I kept all of the strides set to 1, but later, in an effort to increase accuracy, I decided to alternate between 1 and 2. I added pooling and dropout layers between the convolutional layers to reduce dimensionality and minimize over-fitting.

I initially struggled to get a CNN working because of a typo in my output size. Ugh! I scrapped everything and carefully recreated the example model provided. Once I had a working model, I started making adjustments. My first working model had 3 convolutional layers, 2 dense layers, and no dropout, and it had an accuracy of 5%. I tried adding more convolutional layers but noticed some overfitting, so I added dropout layers. My final model used 5 convolutional layers (including the input layer), with max pooling layers to reduce dimensionality and dropout layers to reduce overfitting. At the end there are 3 dense layers. I used an increasing number of filters to improve accuracy. Model checkpointing was used when fitting the model so that the best parameters were saved for future use.
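A model along these lines might look like the sketch below. The filter counts, kernel sizes, dropout rate, dense layer widths, and optimizer are hypothetical placeholders, since the exact values are not listed here. Only the overall shape comes from the description above: 5 convolutional layers with ‘same’ padding and alternating strides, pooling and dropout in between, and 3 dense layers ending in 133 outputs. Keras is imported inside the builder so that the size calculation can be run without it.

```python
import math

# Hypothetical settings per convolutional layer: (filters, kernel_size, strides).
CONV_SPECS = [(16, 5, 1), (32, 5, 2), (64, 3, 1), (128, 3, 2), (256, 3, 1)]
NUM_BREEDS = 133


def final_spatial_size(size=224):
    """Width/height left after every conv ('same' padding) and 2x2 max pool."""
    for _filters, _kernel, strides in CONV_SPECS:
        size = math.ceil(size / strides)   # 'same' padding: only strides shrink it
        size = size // 2                   # 2x2 max pooling halves it
    return size


def build_scratch_cnn(input_shape=(224, 224, 3), dropout_rate=0.2):
    """Assemble and compile the sketch model (requires TensorFlow)."""
    from tensorflow.keras import layers, models

    model = models.Sequential()
    model.add(layers.Input(shape=input_shape))
    for filters, kernel_size, strides in CONV_SPECS:
        model.add(layers.Conv2D(filters, kernel_size, strides=strides,
                                padding="same", activation="relu"))
        model.add(layers.MaxPooling2D(pool_size=2))
        model.add(layers.Dropout(dropout_rate))
    model.add(layers.Flatten())
    model.add(layers.Dense(256, activation="relu"))
    model.add(layers.Dense(128, activation="relu"))
    model.add(layers.Dense(NUM_BREEDS, activation="softmax"))
    model.compile(loss="categorical_crossentropy", optimizer="rmsprop",
                  metrics=["accuracy"])
    return model


print(final_spatial_size())   # spatial size fed into the dense layers

# Example usage (the file name is hypothetical):
# from tensorflow.keras.callbacks import ModelCheckpoint
# checkpoint = ModelCheckpoint("weights.best.from_scratch.hdf5", save_best_only=True)
# build_scratch_cnn().fit(train_x, train_y, validation_data=(valid_x, valid_y),
#                         epochs=5, callbacks=[checkpoint])
```

Tracking the spatial size by hand is a cheap way to catch shape mistakes, such as a wrong output size, before spending GPU time on training.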

It was interesting to note how much the accuracy varied each time I tried a different model. Even for the same model, accuracy could vary by as much as 4% between attempts, for example after I switched GPUs.

The goal of this model was to predict at a minimum accuracy of 1% within 5 epochs. The accuracy of the model is calculated as number-correct-predictions / total-predictions. My CNN had a test accuracy of 11.6029%. The accuracy charts show that overfitting was starting to come into play on this model at around 16 epochs.

Data augmentation was added to the model in an attempt to improve accuracy further. I added width shifting, height shifting, and horizontal flipping. The augmented CNN had a test accuracy of 15.3110%, an improvement of about 3.7 percentage points (roughly 32% in relative terms) over the base CNN.
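To show what these three transformations do to an image, here is a NumPy-only sketch. It is an illustration rather than the notebook's method: the function names and the 10% shift limit are assumptions, and vacated pixels are zero-filled for simplicity, whereas library implementations often fill them differently.

```python
import numpy as np


def horizontal_flip(img):
    """Mirror an (H, W, C) image left to right."""
    return img[:, ::-1, :]


def shift(img, dy, dx):
    """Shift an (H, W, C) image down by dy rows and right by dx columns.

    Negative values shift up or left. Vacated pixels are set to zero.
    """
    h, w = img.shape[:2]
    out = np.zeros_like(img)
    src_y, dst_y = slice(max(0, -dy), min(h, h - dy)), slice(max(0, dy), min(h, h + dy))
    src_x, dst_x = slice(max(0, -dx), min(w, w - dx)), slice(max(0, dx), min(w, w + dx))
    out[dst_y, dst_x] = img[src_y, src_x]
    return out


def random_augment(img, rng, max_shift=0.1):
    """Apply a random height/width shift and, half the time, a flip."""
    h, w = img.shape[:2]
    max_dy, max_dx = int(h * max_shift), int(w * max_shift)
    dy = int(rng.integers(-max_dy, max_dy + 1))
    dx = int(rng.integers(-max_dx, max_dx + 1))
    out = shift(img, dy, dx)
    if rng.random() < 0.5:
        out = horizontal_flip(out)
    return out


rng = np.random.default_rng(0)
image = rng.random((224, 224, 3))            # stand-in for a rescaled dog photo
print(random_augment(image, rng).shape)      # shape is unchanged by augmentation
```

Each epoch then sees slightly different versions of the same photos, which makes it harder for the network to memorize exact pixel positions.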

Transfer learning with pre-trained models

What kind of accuracy gains could be achieved with transfer learning from pre-trained models? I looked at 3 pre-trained models that Udacity provided. In each case, the output of the pre-trained model's last convolutional layer is fed into my model, which adds a global average pooling layer and a dense layer that outputs our 133 categories using Softmax activation.
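The small model placed on top of the pre-trained features can be sketched as below, assuming TensorFlow's bundled Keras. The function names and the optimizer choice are assumptions. The example bottleneck shape (7, 7, 512) is what VGG-16 produces for 224x224 inputs once its top layers are removed; other pre-trained models produce different shapes.

```python
NUM_BREEDS = 133


def head_param_count(channels, num_classes=NUM_BREEDS):
    """Trainable parameters in the dense layer: weights plus biases.

    Global average pooling has no parameters; it reduces each feature
    map to one number, so the dense layer sees `channels` inputs.
    """
    return channels * num_classes + num_classes


def build_transfer_head(bottleneck_shape):
    """Global average pooling + softmax dense layer (requires TensorFlow)."""
    from tensorflow.keras import layers, models

    model = models.Sequential([
        layers.Input(shape=bottleneck_shape),
        layers.GlobalAveragePooling2D(),
        layers.Dense(NUM_BREEDS, activation="softmax"),
    ])
    model.compile(loss="categorical_crossentropy", optimizer="rmsprop",
                  metrics=["accuracy"])
    return model


print(head_param_count(512))   # parameters to train on top of 512-channel features

# Example usage:
# model = build_transfer_head((7, 7, 512))
# model.summary()
```

Because only this small head is trained while the pre-trained layers stay fixed, each epoch has far fewer parameters to update than a CNN trained from scratch.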

The first pre-trained model I looked at was VGG-16. Training this model took seconds per epoch vs the minutes per epoch that my CNN took to train. The resulting test accuracy for VGG-16 was 41.9856%.

To add a pre-trained model for transfer learning, Udacity provided files to load bottleneck features from and create test, train, and validation data sets.

Using the same CNN structure on top of the ResNet-50 pre-trained model, I again found the training to be very fast and the accuracy improved even more. The test accuracy for ResNet-50 was 81.3397%.

I also looked at an InceptionV3 pre-trained model which performed similarly to the ResNet-50 model. The InceptionV3 test accuracy was 81.6986%.

Model Evaluation and Validation

The goal for the breed prediction CNN was to produce a model with at least 60% accuracy.

Three pre-trained models were evaluated: VGG-16, ResNet-50, and InceptionV3.
VGG-16 fell well short of the goal with a test accuracy of 41.9856%. ResNet-50 (81.3397%) and InceptionV3 (81.6986%) both exceeded it, and the results were pretty close between the two. Based on its slightly higher accuracy, I chose InceptionV3 for the algorithm. The InceptionV3 model has a size of 92MB, 23,851,784 parameters, and a depth of 159. Its size and accuracy scores, as listed in the Keras Applications documentation, reinforced my decision.

Putting it all together in an algorithm

Now that I had models to detect humans and dogs in images, and a CNN to predict which breed a dog or human most resembles, it was time to create an algorithm that would put it all together.

Pseudo-code for dog breed predictor algorithm
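In Python, the overall flow could look roughly like this minimal sketch. The function name, the message wording, and the choice of raising a ValueError as the "error" are assumptions. The three detector and predictor arguments stand in for the face detector, the dog detector, and the transfer learning breed model described earlier.

```python
def describe_image(img_path, face_detector, dog_detector, breed_predictor):
    """Return one message per dog/human found, or raise if neither is found.

    face_detector and dog_detector take an image path and return True/False.
    breed_predictor takes an image path and returns a breed name.
    """
    found_dog = dog_detector(img_path)
    found_human = face_detector(img_path)
    if not (found_dog or found_human):
        raise ValueError(f"No dog or human detected in {img_path}")

    breed = breed_predictor(img_path)
    messages = []
    if found_dog:
        messages.append(f"This dog looks like a {breed}.")
    if found_human:
        messages.append(f"This human looks like a {breed}.")
    return messages


# Stand-in detectors so the flow can be tried without any trained models.
def always(path):
    return True


def never(path):
    return False


def fake_breed(path):
    return "Great Dane"


print(describe_image("walk.jpg", always, always, fake_breed))
```

Passing the detectors in as arguments keeps the control flow easy to test with stand-ins like these before the real models are plugged in.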

Testing the algorithm

For my algorithm, I look for any images in an image folder and run them through the prediction algorithm. I used a set of images that include humans, dogs, sometimes both and sometimes neither. The following are a few of the images the algorithm was tested with.

Here the image contained both a human and a dog and since they looked alike, both were predicted with the same breed.

Error example when no dogs or humans were identified

The breed was not always correctly identified. For breeds that are not really distinct, incorrect predictions occurred.

Recap

Human Face detection used pre-trained OpenCV model and achieved a test accuracy of 100% given human images
Dog detection used a pre-trained ResNet-50 model and achieved a test accuracy of 100% given dog images
CNN created from scratch achieved a base test accuracy of 11.6029% surpassing the 1% minimum
CNN created from scratch with data augmentation achieved a test accuracy of 15.3110%
Transfer learning CNN based on InceptionV3 pre-trained model achieved a test accuracy of 81.6986% surpassing the 60% minimum
Accuracy = number correct predictions / total number of predictions
Accuracy could be improved more with parameter tuning and data augmentation

Conclusion

This project set out to create an algorithm that predicts dog breeds for humans and dogs found in images. It does this using a combination of ML classifier models and a CNN leveraging transfer learning. Classifying breeds of dogs is very tricky because dogs of the same breed can look very different from one another. Convolutional Neural Networks lend themselves well to working with images. Transfer learning with pre-trained models provided significant reductions in training time as well as improvements in test accuracy.

The impact of data augmentation on the CNN created from scratch was enlightening. It makes sense that if we vary which parts of an image are in focus and how the image is oriented, the machine will learn to recognize its subject in different ways. Raising the test accuracy of the base model from 11.6% to 15.3% was more than I expected.

Transfer learning was new to me. It provided huge improvements in both the amount of time it took to build the model and the accuracy of the model. This is definitely something to learn more about.

When creating a CNN from scratch, there are no hard and fast rules about how many layers to use or what types of layers to use, and there are only some general guidelines on how to set the parameters for these layers. This makes coming up with a highly accurate CNN very challenging.

Going into this project, I knew nothing about CNNs, not even what CNN stands for. I had never worked with images. I found the project very interesting and was glad to have the starter Jupyter notebook to provide some guidance.

Improvements to explore

More training images and labels especially for breeds with a lot of color and shape variety.
Improve the quality of the images used
Remove images containing multiple dogs
Try out the Xception pre-trained model with the CNN to see how it performs.
Add data augmentation to the CNN to see how that impacts accuracy.
The biggest area for further exploration is to study the impact of each type of layer in the CNN and how to tune the model parameters.

The Jupyter Notebook used for this project can be seen here in my GitHub repo.