How I taught my computer to play Spot it! using OpenCV and Deep Learning
Some fun with computer vision and CNNs with a small dataset.
Apr 21 · 8 min read
A hobby of mine is playing board games, and because I have some knowledge about CNNs, I decided to build an application that can beat humans in a card game. I wanted to build the model from scratch with my own dataset, to see how well a model trained from scratch on a small dataset would perform. I chose to start with a not-too-hard game: Spot it! (a.k.a. Dobble).
In case you don't know Spot it! yet, here is a short explanation of the game: Spot it! is a simple pattern recognition game in which players try to find an image shown on two cards. Each card in the original Spot it! features eight different symbols, with the symbols varying in size from one card to the next. Any two cards have exactly one symbol in common. If you're the first to find that symbol, you win the card. Whoever has collected the most cards when the 55-card deck runs out wins.
Where to start?
The first step of any data science problem is gathering data. I took some pictures with my phone, six of each card. That makes a total of 330 pictures. Four of them are shown below. You might think: is this enough to build a perfect Convolutional Neural Network? I will get back to that!
Processing the images
Okay, we have the data, what's next? This is probably the most important part for success: processing the images. We need to extract the symbols shown on each card. There are some difficulties here. You can see in the pictures above that some symbols might be harder to extract: the snowman and ghost (third picture) and the igloo (fourth picture) have a light color, and the stains (second picture) and the exclamation mark (fourth picture) consist of multiple parts. To handle the light-colored symbols we add contrast to the images. After that we resize and save the image.
Adding contrast
We use the Lab color space to add contrast. L stands for lightness, a is the color component ranging from green to magenta, and b is the color component ranging from blue to yellow. We can extract these components easily with OpenCV:
import cv2
import imutils
import numpy as np  # needed later for the masks

imgname = 'picture1'
image = cv2.imread(f'{imgname}.jpg')
lab = cv2.cvtColor(image, cv2.COLOR_BGR2LAB)
l, a, b = cv2.split(lab)
Now we add contrast to the lightness component, merge the components back together, and convert the image back to BGR:
clahe = cv2.createCLAHE(clipLimit=3.0, tileGridSize=(8, 8))
cl = clahe.apply(l)
limg = cv2.merge((cl, a, b))
final = cv2.cvtColor(limg, cv2.COLOR_LAB2BGR)
Resize
Then we resize and save the image:
resized = cv2.resize(final, (800, 800))

# save the image
cv2.imwrite(f'{imgname}processed.jpg', resized)
Done!
Detecting the card and symbols
Now that the image is processed, we can start detecting the card in the image. We can find the outer contours with OpenCV. To do so, we convert the image to grayscale, choose a threshold (190 in this case) to create a black-and-white image, and find the contours. In code:
image = cv2.imread(f'{imgname}processed.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 190, 255, cv2.THRESH_BINARY)[1]

# find contours
cnts = cv2.findContours(thresh.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = imutils.grab_contours(cnts)

output = image.copy()

# draw contours on image
for c in cnts:
    cv2.drawContours(output, [c], -1, (255, 0, 0), 3)
If we sort the outer contours by area, we can find the contour with the biggest area: this is the card. We can create a white background to extract the symbols.
# sort by area, grab the biggest one
cnts = sorted(cnts, key=cv2.contourArea, reverse=True)[0]

# create mask with the biggest contour
mask = np.zeros(gray.shape, np.uint8)
mask = cv2.drawContours(mask, [cnts], -1, 255, cv2.FILLED)

# card in foreground
fg_masked = cv2.bitwise_and(image, image, mask=mask)

# white background (use inverted mask)
mask = cv2.bitwise_not(mask)
bk = np.full(image.shape, 255, dtype=np.uint8)
bk_masked = cv2.bitwise_and(bk, bk, mask=mask)

# combine back- and foreground
final = cv2.bitwise_or(fg_masked, bk_masked)
Now it's symbol detection time! We can use the last image to detect the outer contours again; these contours are the symbols. If we create a square around each symbol, we can extract that region. The code is a bit longer:
# just like before (with detecting the card)
gray = cv2.cvtColor(final, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 195, 255, cv2.THRESH_BINARY)[1]
thresh = cv2.bitwise_not(thresh)

cnts = cv2.findContours(thresh.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = imutils.grab_contours(cnts)
cnts = sorted(cnts, key=cv2.contourArea, reverse=True)[:10]

# handle each contour
i = 0
for c in cnts:
    if cv2.contourArea(c) > 1000:
        # draw mask, keep contour
        mask = np.zeros(gray.shape, np.uint8)
        mask = cv2.drawContours(mask, [c], -1, 255, cv2.FILLED)
        # white background
        fg_masked = cv2.bitwise_and(image, image, mask=mask)
        mask = cv2.bitwise_not(mask)
        bk = np.full(image.shape, 255, dtype=np.uint8)
        bk_masked = cv2.bitwise_and(bk, bk, mask=mask)
        finalcont = cv2.bitwise_or(fg_masked, bk_masked)
        # bounding rectangle around contour
        output = finalcont.copy()
        x, y, w, h = cv2.boundingRect(c)
        # squares instead of rectangles
        if w < h:
            x += int((w - h) / 2)
            w = h
        else:
            y += int((h - w) / 2)
            h = w
        # take out the square with the symbol
        roi = finalcont[y:y+h, x:x+w]
        roi = cv2.resize(roi, (400, 400))
        # save the symbol
        cv2.imwrite(f"{imgname}_icon{i}.jpg", roi)
        i += 1
Sorting the symbols
Now comes the boring part! It's time to sort the symbols. We need a train, test and validation directory, containing 57 directories each (we have 57 different symbols). The folder structure looks like this:
symbols
├── test
│   ├── anchor
│   ├── apple
│   │   ...
│   └── zebra
├── train
│   ├── anchor
│   ├── apple
│   │   ...
│   └── zebra
└── validation
    ├── anchor
    ├── apple
    │   ...
    └── zebra
It takes some time to put the extracted symbols (over 2,500) in the right directories! The code for creating the subfolders and for making the test and validation sets is on GitHub. Maybe it's better to do the sorting with a clustering algorithm next time…
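For reference, this is not the GitHub code itself, but a minimal sketch of how such a folder structure could be created; the symbol names here are placeholders for the full list of 57:

import os

base = 'symbols'
splits = ['train', 'validation', 'test']
# placeholder names; the real list contains all 57 symbols
symbol_names = ['anchor', 'apple', 'zebra']

for split in splits:
    for name in symbol_names:
        # create e.g. symbols/train/anchor, symbols/test/zebra, ...
        os.makedirs(os.path.join(base, split, name), exist_ok=True)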
Training a Convolutional Neural Network
After the boring part comes the cool part: let's build and train a CNN. You can find more information about CNNs in this post.
Model architecture
This is a multiclass, single-label classification problem: we want one label for every symbol. That's why the last layer needs a softmax activation with 57 nodes, and categorical crossentropy as the loss function.
The architecture of the final model looks like this:
# imports
from keras import layers
from keras import models
from keras import optimizers
from keras.preprocessing.image import ImageDataGenerator
import matplotlib.pyplot as plt

# layers, last activation layer with 57 nodes (one for every symbol)
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(400, 400, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(128, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(256, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(256, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(128, (3, 3), activation='relu'))
model.add(layers.Flatten())
model.add(layers.Dropout(0.5))
model.add(layers.Dense(512, activation='relu'))
model.add(layers.Dense(57, activation='softmax'))

model.compile(loss='categorical_crossentropy',
              optimizer=optimizers.RMSprop(lr=1e-4),
              metrics=['acc'])
Data augmentation
For better performance I used data augmentation. Data augmentation is the process of increasing the amount and diversity of input data, for example by rotating, shifting, zooming, cropping and flipping existing images. It's easy to perform data augmentation with Keras:
# specify the directories
train_dir = 'symbols/train'
validation_dir = 'symbols/validation'
test_dir = 'symbols/test'

# data augmentation with ImageDataGenerator from Keras (only for the train set)
train_datagen = ImageDataGenerator(rescale=1./255,
                                   rotation_range=40,
                                   width_shift_range=0.1,
                                   height_shift_range=0.1,
                                   shear_range=0.1,
                                   zoom_range=0.1,
                                   horizontal_flip=True,
                                   vertical_flip=True)
test_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(train_dir, target_size=(400, 400), batch_size=20, class_mode='categorical')
validation_generator = test_datagen.flow_from_directory(validation_dir, target_size=(400, 400), batch_size=20, class_mode='categorical')
In case you wondered, an augmented ghost looks like this:
Fit the model
Let’s fit the model, save it to use for predictions and check out the results.
history = model.fit_generator(train_generator,
                              steps_per_epoch=100,
                              epochs=100,
                              validation_data=validation_generator,
                              validation_steps=50)

# don't forget to save your model!
model.save('models/model.h5')
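The history object returned by fit_generator holds the metrics per epoch. The plotting code is not part of the original post, but a minimal sketch of how the accuracy and loss curves in the next section can be drawn (it assumes the 'acc' metric name used in model.compile above, and the matplotlib import from earlier):

acc = history.history['acc']
val_acc = history.history['val_acc']
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs = range(1, len(acc) + 1)

# accuracy curves
plt.plot(epochs, acc, label='train accuracy')
plt.plot(epochs, val_acc, label='validation accuracy')
plt.legend()
plt.figure()

# loss curves
plt.plot(epochs, loss, label='train loss')
plt.plot(epochs, val_loss, label='validation loss')
plt.legend()
plt.show()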
Results
The baseline model I trained had no data augmentation, no dropout and fewer layers. That model gave the following results:
You can clearly see this model is overfitting. The results of the final model (from the code in the earlier paragraphs) are a lot better. In the image below you can see the accuracy and loss of the training and validation sets.
On the test set this model made only one mistake: it predicted a bomb as a drop. I decided to stick with this model; its accuracy on the test set was 0.995.
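The test evaluation itself isn't shown in the original snippets; a minimal sketch of how the test accuracy could be computed, assuming the test_dir and test_datagen defined earlier:

test_generator = test_datagen.flow_from_directory(test_dir, target_size=(400, 400), batch_size=20, class_mode='categorical')

# evaluate the saved model on the held-out test set
test_loss, test_acc = model.evaluate_generator(test_generator, steps=len(test_generator))
print(f'test accuracy: {test_acc:.3f}')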
Predict the common symbol of two cards
Now it's possible to predict the common symbol of two cards. We can take two images, make predictions for each image separately, and intersect the results to see which symbol both cards have. This gives three possibilities:
- Something went wrong during prediction time: there are no common symbols found.
- There’s exactly one symbol in the intersection (can be wrong or right).
- There’s more than one symbol in the intersection. In this case I selected the symbol with the highest probability (mean of both predictions).
The code for predicting all combinations of two images in a directory is on GitHub, in the main.py file.
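The main.py file isn't reproduced here, but a rough sketch of the intersection logic for two cards could look like the following. It assumes the symbols of each card have already been extracted and resized to 400x400 as above; model, class_names (a list mapping output indices to symbol names) and the symbol image lists are hypothetical inputs:

import numpy as np

def predict_symbols(symbol_images, model, class_names):
    # map each predicted symbol name to its probability for one card
    predictions = {}
    for img in symbol_images:
        x = np.expand_dims(img.astype('float32') / 255, axis=0)
        probs = model.predict(x)[0]
        idx = int(np.argmax(probs))
        predictions[class_names[idx]] = float(probs[idx])
    return predictions

def common_symbol(symbols_card1, symbols_card2):
    # intersect the predicted labels of both cards
    common = set(symbols_card1) & set(symbols_card2)
    if not common:
        return None  # something went wrong during prediction
    # if more than one symbol remains, pick the highest mean probability
    return max(common, key=lambda s: (symbols_card1[s] + symbols_card2[s]) / 2)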
Some results:
Conclusions
Is this a perfectly performing model? Unfortunately, no! When I took new pictures of cards and let the model predict the common symbol, it had some issues with the snowman. Sometimes it predicted an eye or a zebra as a snowman! That gives some strange results:
Is this model better than humans? It depends: humans can do it perfectly, but the model is faster! I timed the computer: I gave it the 55-card deck and asked for the common symbol of every combination of two cards. That's a total of 1,485 combinations (55 × 54 / 2). The computer took less than 140 seconds. It made some mistakes, but it will definitely beat any human when it comes to speed!
I don't think it would be really hard to build a model that performs perfectly, for example by using transfer learning. To understand what the model is doing, we could also visualize the layers for a test image. Things to try next time!
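Neither of these is implemented in this post, but as a sketch, intermediate activations could be inspected with Keras roughly like this (test_symbol is a hypothetical preprocessed symbol image of shape (1, 400, 400, 3)):

from keras import models

# build a model that returns the outputs of the first eight layers
layer_outputs = [layer.output for layer in model.layers[:8]]
activation_model = models.Model(inputs=model.input, outputs=layer_outputs)

# run one test symbol through it and plot the first channel of the first layer
activations = activation_model.predict(test_symbol)
first_layer = activations[0]
plt.matshow(first_layer[0, :, :, 0], cmap='viridis')
plt.show()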
I hope you enjoyed reading this post! ❤