How I taught my computer to play Spot it! using OpenCV and Deep Learning
Some fun with computer vision and CNNs with a small dataset.
Apr 21 · 8 min read
A hobby of mine is playing board games, and because I have some knowledge about CNNs, I decided to build an application that can beat humans in a card game. I wanted to build the model from scratch with my own dataset, to see how well a model trained from scratch on a small dataset would perform. I chose to start with a not-too-hard game: Spot it! (a.k.a. Dobble).
In case you don't know Spot it! yet, here is a short explanation of the game: Spot it! is a simple pattern recognition game in which players try to find an image shown on two cards. Each card in the original Spot it! features eight different symbols, with the symbols varying in size from one card to the next. Any two cards have exactly one symbol in common. If you're the first to find that symbol, you win the card. Whoever has collected the most cards when the 55-card deck runs out wins.
Where to start?
The first step of any data science problem is gathering data. I took some pictures with my phone, six of each card. That makes a total of 330 pictures. Four of them are shown below. You might think: is this enough to build a perfect Convolutional Neural Network? I will get back to that!
Processing the images
Okay, we have the data, what's next? This is probably the most important part for success: processing the images. We need to extract the symbols shown on each card. There are some difficulties here. You can see in the pictures above that some symbols might be harder to extract: the snowman and ghost (third picture) and the igloo (fourth picture) have a light color, and the stains (second picture) and the exclamation mark (fourth picture) consist of multiple parts. To handle the light-colored symbols we add contrast to the images. After that we resize and save the image.
Adding contrast
We use the Lab color space to add contrast. L stands for lightness, a is the color component ranging from green to magenta, and b is the color component ranging from blue to yellow. We can extract these components easily with OpenCV:
import cv2
import imutils
import numpy as np  # needed later for the masks

imgname = 'picture1'
image = cv2.imread(f'{imgname}.jpg')
lab = cv2.cvtColor(image, cv2.COLOR_BGR2LAB)
l, a, b = cv2.split(lab)
Now we add contrast to the lightness component, merge the components back together, and convert the image back to BGR:
clahe = cv2.createCLAHE(clipLimit=3.0, tileGridSize=(8, 8))
cl = clahe.apply(l)
limg = cv2.merge((cl, a, b))
final = cv2.cvtColor(limg, cv2.COLOR_LAB2BGR)
Resize
Then we resize and save the image:
resized = cv2.resize(final, (800, 800))

# save the image
cv2.imwrite(f'{imgname}processed.jpg', resized)
Done!
Detecting the card and symbols
Now that the image is processed, we can start detecting the card in the image. We can find the outer contours with OpenCV. To do so, we convert the image to grayscale, choose a threshold (190 in this case) to create a black-and-white image, and find the contours. In code:
image = cv2.imread(f'{imgname}processed.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 190, 255, cv2.THRESH_BINARY)[1]

# find contours
cnts = cv2.findContours(thresh.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = imutils.grab_contours(cnts)

output = image.copy()

# draw contours on image
for c in cnts:
    cv2.drawContours(output, [c], -1, (255, 0, 0), 3)
If we sort the outer contours by area, we can find the contour with the biggest area: this is the card. We can create a white background to extract the symbols.
# sort by area, grab the biggest one
cnts = sorted(cnts, key=cv2.contourArea, reverse=True)[0]

# create mask with the biggest contour
mask = np.zeros(gray.shape, np.uint8)
mask = cv2.drawContours(mask, [cnts], -1, 255, cv2.FILLED)

# card in foreground
fg_masked = cv2.bitwise_and(image, image, mask=mask)

# white background (use inverted mask)
mask = cv2.bitwise_not(mask)
bk = np.full(image.shape, 255, dtype=np.uint8)
bk_masked = cv2.bitwise_and(bk, bk, mask=mask)

# combine back- and foreground
final = cv2.bitwise_or(fg_masked, bk_masked)
Now it's symbol detection time! We can use the last image to detect the outer contours again; these contours are the symbols. If we create a square around each symbol, we can extract that region. The code is a bit longer:
# just like before (with detecting the card)
gray = cv2.cvtColor(final, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 195, 255, cv2.THRESH_BINARY)[1]
thresh = cv2.bitwise_not(thresh)

cnts = cv2.findContours(thresh.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = imutils.grab_contours(cnts)
cnts = sorted(cnts, key=cv2.contourArea, reverse=True)[:10]

# handle each contour
i = 0
for c in cnts:
    if cv2.contourArea(c) > 1000:
        # draw mask, keep contour
        mask = np.zeros(gray.shape, np.uint8)
        mask = cv2.drawContours(mask, [c], -1, 255, cv2.FILLED)
        # white background
        fg_masked = cv2.bitwise_and(image, image, mask=mask)
        mask = cv2.bitwise_not(mask)
        bk = np.full(image.shape, 255, dtype=np.uint8)
        bk_masked = cv2.bitwise_and(bk, bk, mask=mask)
        finalcont = cv2.bitwise_or(fg_masked, bk_masked)
        # bounding rectangle around contour
        output = finalcont.copy()
        x, y, w, h = cv2.boundingRect(c)
        # squares instead of rectangles
        if w < h:
            x += int((w - h) / 2)
            w = h
        else:
            y += int((h - w) / 2)
            h = w
        # take out the square with the symbol
        roi = finalcont[y:y+h, x:x+w]
        roi = cv2.resize(roi, (400, 400))
        # save the symbol
        cv2.imwrite(f"{imgname}_icon{i}.jpg", roi)
        i += 1
Sorting the symbols
Now comes the boring part! It's time to sort the symbols. We need a train, test and validation directory, containing 57 directories each (we have 57 different symbols). The folder structure looks like this:
symbols
├── test
│   ├── anchor
│   ├── apple
│   │   ...
│   └── zebra
├── train
│   ├── anchor
│   ├── apple
│   │   ...
│   └── zebra
└── validation
    ├── anchor
    ├── apple
    │   ...
    └── zebra
It takes some time to put the extracted symbols (over 2,500) in the right directories! The code for creating the subfolders and for making the test and validation sets is on GitHub. Maybe it's better to do the sorting with a clustering algorithm next time…
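For reference, this is not the GitHub code itself, but a minimal sketch of how such a folder structure could be created; the symbol names here are placeholders for the full list of 57:

import os

base = 'symbols'
splits = ['train', 'validation', 'test']
# placeholder names; the real list contains all 57 symbols
symbol_names = ['anchor', 'apple', 'zebra']

for split in splits:
    for name in symbol_names:
        # create e.g. symbols/train/anchor, symbols/test/zebra, ...
        os.makedirs(os.path.join(base, split, name), exist_ok=True)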
Training a Convolutional Neural Network
After the boring part comes the cool part: let's build and train a CNN. You can find more information about CNNs in this post.
Model architecture
This is a multiclass, single-label classification problem: we want one label for every symbol. That's why the last layer needs a softmax activation with 57 nodes, and categorical crossentropy as the loss function.
The architecture of the final model looks like this:
# imports
from keras import layers
from keras import models
from keras import optimizers
from keras.preprocessing.image import ImageDataGenerator
import matplotlib.pyplot as plt

# layers, last activation layer with 57 nodes (one for every symbol)
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(400, 400, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(128, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(256, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(256, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(128, (3, 3), activation='relu'))
model.add(layers.Flatten())
model.add(layers.Dropout(0.5))
model.add(layers.Dense(512, activation='relu'))
model.add(layers.Dense(57, activation='softmax'))

model.compile(loss='categorical_crossentropy',
              optimizer=optimizers.RMSprop(lr=1e-4),
              metrics=['acc'])
Data augmentation
For better performance I used data augmentation. Data augmentation is the process of increasing the amount and diversity of input data, for example by rotating, shifting, zooming, cropping and flipping existing images. It's easy to perform data augmentation with Keras:
# specify the directories
train_dir = 'symbols/train'
validation_dir = 'symbols/validation'
test_dir = 'symbols/test'

# data augmentation with ImageDataGenerator from Keras (only for the train set)
train_datagen = ImageDataGenerator(rescale=1./255,
                                   rotation_range=40,
                                   width_shift_range=0.1,
                                   height_shift_range=0.1,
                                   shear_range=0.1,
                                   zoom_range=0.1,
                                   horizontal_flip=True,
                                   vertical_flip=True)
test_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(train_dir, target_size=(400, 400), batch_size=20, class_mode='categorical')
validation_generator = test_datagen.flow_from_directory(validation_dir, target_size=(400, 400), batch_size=20, class_mode='categorical')
In case you wondered, an augmented ghost looks like this:
Fit the model
Let’s fit the model, save it to use for predictions and check out the results.
history = model.fit_generator(train_generator,
                              steps_per_epoch=100,
                              epochs=100,
                              validation_data=validation_generator,
                              validation_steps=50)

# don't forget to save your model!
model.save('models/model.h5')
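The history object returned by fit_generator holds the metrics per epoch. The plotting code is not part of the original post, but a minimal sketch of how the accuracy and loss curves in the next section can be drawn (it assumes the 'acc' metric name used in model.compile above, and the matplotlib import from earlier):

acc = history.history['acc']
val_acc = history.history['val_acc']
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs = range(1, len(acc) + 1)

# accuracy curves
plt.plot(epochs, acc, label='train accuracy')
plt.plot(epochs, val_acc, label='validation accuracy')
plt.legend()
plt.figure()

# loss curves
plt.plot(epochs, loss, label='train loss')
plt.plot(epochs, val_loss, label='validation loss')
plt.legend()
plt.show()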
Results
The baseline model I trained had no data augmentation, no dropout and fewer layers. That model gave the following results:
You can clearly see this model is overfitting. The results of the final model (from the code in the earlier paragraphs) are a lot better. In the image below you can see the accuracy and loss of the training and validation sets.
On the test set this model made only one mistake: it predicted a bomb as a drop. I decided to stick with this model; its accuracy on the test set was 0.995.
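The test evaluation itself isn't shown in the original snippets; a minimal sketch of how the test accuracy could be computed, assuming the test_dir and test_datagen defined earlier:

test_generator = test_datagen.flow_from_directory(test_dir, target_size=(400, 400), batch_size=20, class_mode='categorical')

# evaluate the saved model on the held-out test set
test_loss, test_acc = model.evaluate_generator(test_generator, steps=len(test_generator))
print(f'test accuracy: {test_acc:.3f}')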
Predict the common symbol of two cards
Now it's possible to predict the common symbol of two cards. We can take two images, make predictions for each image separately, and intersect the results to see which symbol both cards have. This gives three possibilities:
- Something went wrong during prediction time: there are no common symbols found.
- There’s exactly one symbol in the intersection (can be wrong or right).
- There’s more than one symbol in the intersection. In this case I selected the symbol with the highest probability (mean of both predictions).
The code for predicting all combinations of two images in a directory is on GitHub, in the main.py file.
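The main.py file isn't reproduced here, but a rough sketch of the intersection logic for two cards could look like the following. It assumes the symbols of each card have already been extracted and resized to 400x400 as above; model, class_names (a list mapping output indices to symbol names) and the symbol image lists are hypothetical inputs:

import numpy as np

def predict_symbols(symbol_images, model, class_names):
    # map each predicted symbol name to its probability for one card
    predictions = {}
    for img in symbol_images:
        x = np.expand_dims(img.astype('float32') / 255, axis=0)
        probs = model.predict(x)[0]
        idx = int(np.argmax(probs))
        predictions[class_names[idx]] = float(probs[idx])
    return predictions

def common_symbol(symbols_card1, symbols_card2):
    # intersect the predicted labels of both cards
    common = set(symbols_card1) & set(symbols_card2)
    if not common:
        return None  # something went wrong during prediction
    # if more than one symbol remains, pick the highest mean probability
    return max(common, key=lambda s: (symbols_card1[s] + symbols_card2[s]) / 2)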
Some results:
Conclusions
Is this a perfectly performing model? Unfortunately, no! When I took new pictures of cards and let the model predict the common symbol, it had some issues with the snowman. Sometimes it predicted an eye or a zebra as a snowman! That gives some strange results:
Is this model better than humans? It depends: humans can do it perfectly, but the model is faster! I timed the computer: I gave it the 55-card deck and asked for the common symbol of every combination of two cards. That's a total of 1,485 combinations (55 × 54 / 2). The computer took less than 140 seconds. It made some mistakes, but it will definitely beat any human when it comes to speed!
I don't think it would be really hard to build a model that performs perfectly, for example by using transfer learning. To understand what the model is doing, we could also visualize the layers for a test image. Things to try next time!
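Neither of these is implemented in this post, but as a sketch, intermediate activations could be inspected with Keras roughly like this (test_symbol is a hypothetical preprocessed symbol image of shape (1, 400, 400, 3)):

from keras import models

# build a model that returns the outputs of the first eight layers
layer_outputs = [layer.output for layer in model.layers[:8]]
activation_model = models.Model(inputs=model.input, outputs=layer_outputs)

# run one test symbol through it and plot the first channel of the first layer
activations = activation_model.predict(test_symbol)
first_layer = activations[0]
plt.matshow(first_layer[0, :, :, 0], cmap='viridis')
plt.show()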
I hope you enjoyed reading this post! ❤