Reading Color Blindness Charts: Deep Learning and Computer Vision
There are plenty of online tutorials where you can learn to train a neural network to classify handwritten digits using the MNIST dataset, or to tell the difference between cats and dogs. We humans are very good at these tasks and can easily match or beat the performance of a computer.
However, there are some cases where computers can actually help humans do something we have difficulty with. For instance, I have mild red-green color blindness, so charts such as the one below have usually been difficult, if not impossible, for me to read:
What if I could make the computer take this test for me, so I would not have to squint and inevitably get the question wrong anyway?
Well, this task seems simple. Let’s take some images, split them into training and test sets, train a convolutional neural network, and bam, we are finished. Except… there is no dataset. Online, I was able to find only 54 different images, which is not enough for a training set, given that there are 9 classes (digits 1–9).
So what now? Well, we still have our good old MNIST dataset. We can use it to train a neural network that is amazing at classifying individual digits. With some OpenCV transformations, we can get our charts to look similar to MNIST, which looks like this:
Let’s do it!
Training Convolutional Neural Network on MNIST Dataset
There are many tutorials on this, but I will nonetheless give a high-level overview on how this is done.
First, we will need TensorFlow installed, which is available via pip.
pip install tensorflow
Or if you have a GPU:
pip install tensorflow-gpu
Now we will create a mnist.py file and get our data:
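Something like the following will do the job (a minimal sketch; the exact preprocessing details may differ):

# mnist.py -- load the MNIST dataset via tf.keras and prepare it for a CNN
import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# Scale pixel values from 0-255 down to 0-1 and add a channel
# dimension, since Conv2D expects input shape (28, 28, 1)
x_train = x_train.reshape(-1, 28, 28, 1).astype("float32") / 255.0
x_test = x_test.reshape(-1, 28, 28, 1).astype("float32") / 255.0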
Next, we will set up our convolutional neural network using one Conv2D layer, followed by MaxPooling and Dropout. Then, the 2D output is flattened and put through a Dense layer with 128 units, followed by our classification layer with 10 classes (digits 0–9). The output will be a vector of length 10 indicating the prediction. For instance, a 2 will be predicted like this:
[0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
since the 1 is at index 2. Here is the code:
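Below is a sketch matching that description; the filter count and kernel size are assumptions rather than the exact original values:

from tensorflow.keras import layers, models

model = models.Sequential([
    # One convolutional layer (32 filters, 3x3 kernel -- an assumption)
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(28, 28, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.25),
    # Flatten the 2D feature maps and classify into 10 digit classes
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(10, activation="softmax"),
])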
Now, we compile the model, run it on our training data, evaluate on our test data, and save it as an .h5 file in our directory:
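A sketch of this step (the optimizer, loss, and epoch count are assumptions):

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=5, batch_size=128)

# Evaluate on the held-out test set, then save the trained model
test_loss, test_acc = model.evaluate(x_test, y_test)
print("Test accuracy:", test_acc)
model.save("mnist.h5")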
When we run this code, we get a training accuracy of about 99% and a comparable test set accuracy.
Not bad! Now, you should see a mnist.h5 file in your directory. Now, let’s move to the next step.
OpenCV Chart Processing
For this part, we will need a few more libraries:
pip install opencv-python
pip install imutils
pip install numpy
pip install scikit-learn
pip install scikit-image
We need to convert our charts to look somewhat like the MNIST dataset. At first, I thought, let’s just convert the image to grayscale. Well, then this happens:
What number is that? I have no idea. As you see, color matters. We cannot just ignore it. Our goal is this:
So, after hours of experimentation, here is what processing we will need to do:
- Increase the contrast
- Apply median and Gaussian blurring
- Apply K-means color clustering
- Convert to grayscale
- Apply thresholding (this one will be tricky)
- More blurring and thresholding
- Morphological opening, closing, and erosion
- Skeletonizing
- Dilation
Wow, that is a lot. Let’s get started. First, contrast. To be honest, I copied a function online that takes in an image and applies customized brightness and contrast transformations. I put this in a file ContrastBrightness.py and made it a class:
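A sketch based on a widely shared brightness/contrast snippet; the exact constants in the function I copied may differ:

# ContrastBrightness.py
import cv2

class ContrastBrightness:
    def apply(self, image, brightness=0, contrast=0):
        out = image.copy()
        if brightness != 0:
            # Brightness: add a constant offset to every channel
            out = cv2.addWeighted(out, 1.0, out, 0, brightness)
        if contrast != 0:
            # Contrast: scale pixel values around the midpoint (127)
            f = 131 * (contrast + 127) / (127 * (131 - contrast))
            out = cv2.addWeighted(out, f, out, 0, 127 * (1 - f))
        return out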
I did not look too deeply into how this works, but at a high level, increasing brightness adds a constant to the RGB channels of an image, while increasing contrast multiplies the values by some factor. We will only use the contrast feature.
Another complex part of our algorithm is clustering. Again, I made a file Clusterer.py and put into it the necessary code, which I also found online:
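A sketch of that helper, assuming scikit-learn's MiniBatchKMeans (the code I actually found online may differ):

# Clusterer.py
import numpy as np
from sklearn.cluster import MiniBatchKMeans

class Clusterer:
    def cluster(self, image, n_clusters):
        # Treat every pixel as a 3D color point and cluster the colors
        h, w = image.shape[:2]
        pixels = image.reshape((h * w, 3))
        kmeans = MiniBatchKMeans(n_clusters=n_clusters)
        labels = kmeans.fit_predict(pixels)
        # Replace each pixel with the center of its cluster
        quantized = kmeans.cluster_centers_.astype("uint8")[labels]
        return quantized.reshape((h, w, 3))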
This code takes an image and a number as input. That number determines how many color clusters we will use. Now, let’s create our last file, main.py. We will start with imports:
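A plausible set of imports (a sketch; the original file may import slightly different names):

# main.py
import os
import cv2
import numpy as np
import tensorflow as tf
from imutils import paths
from skimage.morphology import skeletonize
from ContrastBrightness import ContrastBrightness
from Clusterer import Clusterer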
Notice that we are importing the two classes that we just created. Now, please download the images in the charts directory from my GitHub.
These have all been sorted (with the help of people without color blindness) into appropriate folders.
Now, we will loop through all the images in our path and apply transformations 1–4. I commented my code pretty extensively.
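A sketch of those first four steps; the charts/<digit>/ folder convention, the working size, and the blur and cluster parameters are all assumptions:

contraster = ContrastBrightness()
clusterer = Clusterer()
images, labels = [], []

for image_path in paths.list_images("charts"):
    # The folder name gives us the ground-truth digit (an assumption
    # about how the charts directory is organized)
    label = int(image_path.split(os.path.sep)[-2])
    image = cv2.imread(image_path)
    image = cv2.resize(image, (200, 200))
    image = contraster.apply(image, contrast=64)    # step 1: increase contrast
    image = cv2.medianBlur(image, 11)               # step 2: median blur...
    image = cv2.GaussianBlur(image, (11, 11), 0)    # ...then Gaussian blur
    image = clusterer.cluster(image, 4)             # step 3: K-means color clustering
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)  # step 4: grayscale
    # The loop continues with the thresholding steps below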
Here is what some of those images look like:
That is an obvious improvement. The digits are all clear. Two problems remain:
- They do not look very hand-written and are too thick
- They are not fully white on a black background
So, we will threshold. However, due to the varied coloring of the images, each one needs a different threshold to work. We will automate the search for the perfect threshold. I noticed that a digit typically takes up 10–28% of the total image, based on the number of pixels.
Thus, we will threshold until we reach that percent white. First, we will define a function that tells us what percent of an input image is white:
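A minimal sketch, defined above the for loop:

def percent_white(image):
    # Expects a binarized single-channel image (pixels are 0 or 255)
    return np.sum(image == 255) / image.size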
Now, we will start at a threshold of 0 and work up to 255 in increments of 10 until we are in the 0.1–0.28 zone (this code goes in our for loop):
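A sketch of that search, continuing inside the loop:

threshold = 0
while threshold <= 255:
    # Binarize at the current threshold and measure the white area
    _, thresh = cv2.threshold(gray, threshold, 255, cv2.THRESH_BINARY)
    if 0.1 <= percent_white(thresh) <= 0.28:
        break  # the white region is digit-sized, so keep this threshold
    threshold += 10
# If no threshold worked, the loop ends with threshold == 260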
Awesome! The finish is in sight. Now we get images like this (if we use the threshold we found):
Most images look pretty good! However, some are problematic. It turns out that some digits are darker than the background. Thus, thresholding makes them black, not white! The 0.1–0.28 zone is never reached.
We can check whether the search succeeded by looking at the final value of the threshold variable. If it is 260, the while loop ended without finding a workable threshold. For those images, we will have a separate procedure.
Essentially, we will
- Invert the image so the digit becomes brighter than the background
- Convert it to black and white
- Create a circular mask to remove the background (which went from black to white when we inverted)
Here is the visual process:
The last step is the most difficult, so I commented it in my code. Here is our whole function for this:
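A sketch of that function; the Otsu thresholding for the black-and-white step and the mask radius are assumptions:

def handle_dark_digit(gray):
    # Step 1: invert so the dark digit becomes brighter than its surroundings
    inverted = cv2.bitwise_not(gray)
    # Step 2: convert to black and white (Otsu picks the threshold automatically)
    _, bw = cv2.threshold(inverted, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # Step 3: the area outside the chart's circle also turned white when we
    # inverted, so keep only a centered circular region and zero out the rest
    h, w = bw.shape
    mask = np.zeros((h, w), dtype="uint8")
    cv2.circle(mask, (w // 2, h // 2), min(h, w) // 2 - 5, 255, -1)
    return cv2.bitwise_and(bw, bw, mask=mask)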
We will adjust our code to use the new function and also perform steps 6–7. These are all built-in OpenCV transformations, so nothing surprising here:
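A sketch of that adjustment, continuing inside the for loop (the kernel sizes are assumptions):

if threshold == 260:
    # The search failed, so the digit is darker than the background
    thresh = handle_dark_digit(gray)

# Step 6: more blurring, then re-threshold to smooth the digit's edges
thresh = cv2.GaussianBlur(thresh, (11, 11), 0)
_, thresh = cv2.threshold(thresh, 127, 255, cv2.THRESH_BINARY)

# Step 7: morphological opening, closing, and erosion to remove
# specks and fill small holes
kernel = np.ones((5, 5), dtype="uint8")
thresh = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, kernel)
thresh = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, kernel)
thresh = cv2.erode(thresh, kernel)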
Let’s see what the images look like!
Awesome! Clearly recognizable. Originally, our neural network’s accuracy was 11%, which is no better than randomly guessing among the 10 classes. If we stopped here, our accuracy would be about 63%, almost 6x better than random! However, we can do a little bit more.
We will skeletonize and dilate our images. This will give the digits a consistent stroke width and a more uniform look overall. Let’s do it:
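A sketch using scikit-image's skeletonize, still inside the loop (the dilation kernel size is an assumption):

# Steps 8-9: thin the digit down to a one-pixel skeleton, then dilate
# it back out to a uniform, handwriting-like stroke width
skeleton = skeletonize(thresh > 0)
skeleton = (skeleton * 255).astype("uint8")
digit = cv2.dilate(skeleton, np.ones((9, 9), dtype="uint8"))

# Shrink to MNIST's 28x28 format and store the result
digit = cv2.resize(digit, (28, 28))
images.append(digit)
labels.append(label)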
As a reminder, this code is the last code we put inside the big for loop. Here is what everything looks like:
Yeah, maybe a little uneven, but so is handwriting. It should be no problem for the neural network. Now, we just reshape our list, load the model, and evaluate:
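A sketch of that final step; the classification_report call is an addition of mine to show the per-digit recall discussed in the Results section:

from sklearn.metrics import classification_report

# Match the shape and scaling the network was trained on
images = np.array(images).reshape(-1, 28, 28, 1).astype("float32") / 255.0
labels = np.array(labels)

model = tf.keras.models.load_model("mnist.h5")
loss, accuracy = model.evaluate(images, labels)
print("Overall accuracy:", accuracy)

# Per-digit precision and recall
predictions = model.predict(images).argmax(axis=1)
print(classification_report(labels, predictions))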
Put this code below your for loop. As a recap, we load our model, reshape the data, and print the accuracy after evaluating. The code may take a while to run because it takes some time for all 54 images to be transformed.
Results
After running the code, here is what gets printed for me:
Let’s take a look! We got an overall accuracy of… 78%! That is 7–8 times better than random and probably a lot better than a person with medium to severe color blindness could do. This is outstanding!
If we look at the recall (the ratio of correctly predicted positive observations to all observations in the actual class) for each digit, we see great performance for 1–5 and 9, okay performance for 8, and real trouble with 6s and 7s.
This approach clearly has limitations, and the transformations I listed do not work for every possible color blindness image (there is actually one image in the dataset that fails after the thresholding step). Try printing all the processed 9s and you will see that, for that image, the thresholding step does reach a ratio between 0.1 and 0.28, but only because the background becomes partly white. I did not try to find a solution for this because it affected only a single image.
Conclusion
I hope this tutorial has shown how a similar dataset can be used to make predictions on a different one. I also hope it helps beginners become more comfortable with OpenCV, TensorFlow, and Python in general.
To view the complete code and download the images and model, check out my GitHub.