Beyond RGB in Image Classification
I have always been a bit curious about the effect of other color spaces on computer vision tasks such as image classification. Perhaps it's my curiosity about how exactly color spaces are used in the real world, or perhaps a part of me just doesn't like approaching problems by conventional means. Either way, in this article you'll see that images are much more than just RGB, and the results may surprise you. So let's begin!
This article does not explain color spaces. If you want to learn about color spaces, see this blog post by Adrian Rosebrock.
This work can simply be seen as presenting the same data in different color spaces to a simple convolutional neural network, keeping all other conditions constant, and evaluating which color space works best. In case I missed something, you can always edit the notebooks and try a different dataset. Similar research has shown that other color spaces, or certain combinations of them, do perform better than RGB for image classification.
In this little study, I compared the validation accuracy of various color spaces on a butterfly species dataset. Here are some things to note:
- I ensured every possible random seed was set to 42 (I know, right? Why 42?); a seeding sketch follows this list.
- I ran the notebooks several times just to make sure the results were consistent.
- Ablation studies were conducted on different normalization methods, such as dividing by 255, and subtracting the mean and dividing by the standard deviation.
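For reproducibility, the seeding looked roughly like the snippet below. This is a minimal sketch assuming TensorFlow 2.x; older versions use tf.set_random_seed instead.

```python
import os
import random

import numpy as np
import tensorflow as tf

# Pin every source of randomness the notebooks touch.
os.environ['PYTHONHASHSEED'] = '42'  # ideally set before the interpreter starts
random.seed(42)
np.random.seed(42)
tf.random.set_seed(42)
```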
In all the notebooks I made use of the Keras preprocessing_function, which takes in a single image, applies any predefined operation to it, and outputs the result. OpenCV was also used to convert the RGB images to other color spaces.
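As a rough sketch of how the two fit together (the function name is mine, and YCrCb stands in for whichever target color space):

```python
import cv2
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

def to_ycrcb(image):
    """Convert a single RGB image of shape (H, W, 3) to YCrCb."""
    # The generator hands over a float array; this OpenCV conversion expects uint8.
    converted = cv2.cvtColor(image.astype(np.uint8), cv2.COLOR_RGB2YCrCb)
    return converted.astype(np.float32)

# The conversion is plugged into the data pipeline via preprocessing_function.
datagen = ImageDataGenerator(preprocessing_function=to_ycrcb)
```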
A simple convolutional neural network was used with the RMSprop optimizer and a learning rate of 0.0001, and it was trained for just 5 epochs in all cases. All experiments were carried out in Colab. Images were loaded using the Keras ImageDataGenerator with a batch size of 32 and resized to 150 by 150. No augmentation was done.
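The exact architecture lives in the notebooks; the sketch below is only a plausible stand-in matching the stated optimizer, learning rate, input size, batch size, and epoch count. num_classes and the data path are placeholders, and datagen is the generator from the sketch above.

```python
from tensorflow.keras import layers, models, optimizers

num_classes = 10  # hypothetical; set this to the number of butterfly species

model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(150, 150, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(128, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(512, activation='relu'),
    layers.Dense(num_classes, activation='softmax'),
])

model.compile(optimizer=optimizers.RMSprop(learning_rate=1e-4),
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Loading mirrors the described setup: batch size 32, 150x150, no augmentation.
train_gen = datagen.flow_from_directory('data/train', target_size=(150, 150),
                                        batch_size=32, class_mode='categorical')
model.fit(train_gen, epochs=5)
```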
Experiment 1
Here, all images were normalized by dividing by 255.
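With ImageDataGenerator, that normalization is a one-line rescale. Note that rescale runs after preprocessing_function, so the color-space conversion still sees raw [0, 255] pixels (to_ycrcb is the illustrative converter from earlier):

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# rescale is applied after preprocessing_function inside the generator.
datagen = ImageDataGenerator(rescale=1.0 / 255,
                             preprocessing_function=to_ycrcb)
```

Here are the results.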
From the above figure, we can see that the validation accuracy of YCrCb came out on top, followed by YUV and then HSV, before RGB. Let's see the second notebook.
Experiment 2
Here, all images were normalized by subtracting each image's mean from it and dividing by its standard deviation.
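A sketch of that per-image normalization as a preprocessing_function (Keras also exposes the same idea through the samplewise_center and samplewise_std_normalization flags):

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

def standardize(image):
    """Per-image normalization: subtract the image's mean, divide by its std."""
    image = image.astype(np.float32)
    return (image - image.mean()) / (image.std() + 1e-7)  # epsilon guards against a zero std

datagen = ImageDataGenerator(preprocessing_function=standardize)
```

Below are the results.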
This is where things start getting interesting. Notice RGB and RGB_Norm: this way of normalizing RGB images apparently works better. However, YCrCb still won this round, followed by RGB_Norm and HSV; YUV is nowhere to be found.
Experiment 3
For this round, after converting to the various color spaces, I split the images into their respective channels, normalized each channel using the method from experiment two (subtracting the mean and dividing by the standard deviation), and then merged them back together.
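A sketch of that per-channel variant, with the same epsilon caveat as before:

```python
import cv2
import numpy as np

def standardize_per_channel(image):
    """Normalize each channel with its own mean and std, then merge back."""
    channels = cv2.split(image.astype(np.float32))
    normalized = [(c - c.mean()) / (c.std() + 1e-7) for c in channels]
    return cv2.merge(normalized)
```

The cv2.split/cv2.merge pair mirrors the description; a NumPy reduction over axis=(0, 1) would do the same in one line. The results below will leave you with a few thoughts. Let's see.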
Yes! I was also surprised, because not only did YUV surpass YCrCb, it also had the highest validation and test accuracy across all experiments! YCrCb did come second, though, and HSV third.
Experiment 4
Here I decided to use a pre-trained model (ResNet50), and normalization was done as in experiment 1 (dividing by 255). I noticed that when the layers were frozen, RGB performed better, for obvious reasons: the model was pre-trained on RGB images.
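A minimal sketch of the transfer-learning setup, assuming TensorFlow 2.x (the classification head is my own guess; only the ResNet50 backbone is fixed by the experiment):

```python
from tensorflow.keras import layers, models, optimizers
from tensorflow.keras.applications import ResNet50

num_classes = 10  # hypothetical species count, as before

base = ResNet50(weights='imagenet', include_top=False, input_shape=(150, 150, 3))
base.trainable = False  # frozen run; flip to True for the unfrozen run

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(num_classes, activation='softmax'),
])
model.compile(optimizer=optimizers.RMSprop(learning_rate=1e-4),
              loss='categorical_crossentropy',
              metrics=['accuracy'])
```

I then unfroze the layers and retrained. Guess which color space came out on top?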
Notice that before the model started overfitting, YCrCb still took the lead, followed by YUV and then HSV.
Results and Observations
From the above table, we can say that YUV, YCrCb, and HSV are indeed better than RGB when it comes to image classification. Also, splitting the channels and normalizing each one separately by subtracting the mean and dividing by the standard deviation is a better way to normalize YUV images. That said, you can normalize RGB images this way and still get good performance.
Conclusion
From the above four experiments, you can agree that RGB isn't always the best option for image classification. Imagine the effect of color spaces on other computer vision tasks such as image segmentation, object detection, and so on. The question, however, still remains: why do we keep training our models on RGB images instead of exploring other color spaces?
Other questions worth answering: why does normalizing each channel separately work better, especially for YUV? And would these results hold on other benchmark datasets?