How I Taught My Computer to Play Spot it! Using OpenCV and Deep Learning

Category: IT Technology · Published: 4 years ago

Some fun with computer vision and CNNs with a small dataset.

One of my hobbies is playing board games, and because I have some knowledge of CNNs, I decided to build an application that can beat humans at a card game. I wanted to build the model from scratch with my own dataset, to see how well such a model would perform with only a small dataset. I chose to start with a not-too-hard game: Spot it! (a.k.a. Dobble).

In case you don't know Spot it! yet, here is a short explanation: Spot it! is a simple pattern recognition game in which players try to find the symbol that appears on both of two cards. Each card in the original Spot it! features eight different symbols, with the symbols varying in size from one card to the next. Any two cards have exactly one symbol in common. If you're the first to find that symbol, you win the card. Whoever has collected the most cards when the 55-card deck runs out wins.

Try it yourself: what’s the common symbol on the cards shown above?

Where to start?

The first step of any data science problem is gathering data. I took some pictures with my phone, six of each card. That makes a total of 330 pictures. Four of them are shown below. You might think: is this enough to build a perfect Convolutional Neural Network? I will get back to that!

Processing the images

Okay, we have the data. What's next? This is probably the most important step for success: processing the images. We need to extract the symbols shown on each card. There are some difficulties here. You can see in the pictures above that some symbols might be harder to extract: the snowman and ghost (third picture) and igloo (fourth picture) have a light color, and the stains (second picture) and exclamation mark (fourth picture) consist of multiple parts. To handle the light-colored symbols we add contrast to the images. After that we resize and save the image.

Adding contrast

We use the Lab color space for adding contrast. L stands for lightness, a is the color component ranging from green to magenta and b is the color component ranging from blue to yellow. We can extract these components easily with OpenCV:

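import cv2
import imutils
import numpy as np

imgname = 'picture1'

# read the image (OpenCV loads it in BGR order) and split it into Lab components
image = cv2.imread(f'{imgname}.jpg')
lab = cv2.cvtColor(image, cv2.COLOR_BGR2LAB)
l, a, b = cv2.split(lab)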

From left to right: original image, Light component, a component and b component

Now we add contrast to the Light component, merge the components back together and convert the image back to the BGR color space that OpenCV works in:

clahe = cv2.createCLAHE(clipLimit=3.0, tileGridSize=(8,8))
cl = clahe.apply(l)
limg = cv2.merge((cl,a,b))
final = cv2.cvtColor(limg, cv2.COLOR_LAB2BGR)

From left to right: original image, Light component, added contrast, convert back to RGB
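
To get a feeling for why stretching the Light component helps with pale symbols, here is a minimal NumPy sketch. Note that it is not CLAHE: it is plain global histogram equalization, a simpler relative that ignores the tiling and clip limit CLAHE uses. The function name `equalize_lightness` and the toy values are assumptions made up for this illustration.

```python
import numpy as np

def equalize_lightness(l_channel: np.ndarray) -> np.ndarray:
    """Globally equalize an 8-bit lightness channel (simplified stand-in for CLAHE)."""
    hist = np.bincount(l_channel.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]   # pixel count at the darkest level that occurs
    total = cdf[-1]
    if total == cdf_min:        # flat image: there is nothing to stretch
        return l_channel.copy()
    # Build a lookup table that spreads the occurring levels over 0..255.
    scaled = (cdf - cdf_min) / (total - cdf_min) * 255
    lut = np.round(np.clip(scaled, 0, 255)).astype(np.uint8)
    return lut[l_channel]

# Toy example: a pale symbol (225) on an almost white card (250).
card = np.full((6, 6), 250, dtype=np.uint8)
card[2:4, 2:4] = 225
stretched = equalize_lightness(card)
print(int(card.max()) - int(card.min()), int(stretched.max()) - int(stretched.min()))
```

The difference between symbol and background grows from 25 to the full 8-bit range, which is the effect we want before thresholding. CLAHE does something similar per tile and limits how far it amplifies, which keeps noise under control.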

Resize

Then we resize and save the image:

resized = cv2.resize(final, (800, 800))

# save the image
cv2.imwrite(f'{imgname}processed.jpg', resized)

Done!

Detecting the card and symbols

Now that the image is processed, we can start detecting the card in the image. OpenCV can find the outer contours for us. First we convert the image to grayscale, then we choose a threshold (190 in this case) to create a black-and-white image, and finally we find the contours. In code:

image = cv2.imread(f'{imgname}processed.jpg')
# imread returns BGR, so convert from BGR to grayscale
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 190, 255, cv2.THRESH_BINARY)[1]

# find contours
cnts = cv2.findContours(thresh.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = imutils.grab_contours(cnts)

# draw contours on image
output = image.copy()
for c in cnts:
    cv2.drawContours(output, [c], -1, (255, 0, 0), 3)

Processed image, converted to grayscale, thresholded, and with outer contours
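
If you wonder what the threshold step does to the pixels, the following NumPy sketch mimics `cv2.THRESH_BINARY`: every pixel strictly brighter than the threshold becomes the maximum value, everything else becomes 0. The helper name `threshold_binary` and the toy pixel values are assumptions for illustration only.

```python
import numpy as np

def threshold_binary(gray: np.ndarray, thresh: int = 190, maxval: int = 255) -> np.ndarray:
    """Pixels strictly brighter than `thresh` become `maxval`, all others become 0."""
    return np.where(gray > thresh, maxval, 0).astype(np.uint8)

# Toy 4x4 "photo": a bright card (230) lying on a darker table (90).
gray = np.full((4, 4), 90, dtype=np.uint8)
gray[1:3, 1:3] = 230
binary = threshold_binary(gray)
print(binary)
```

Only the four card pixels survive as white, so the card shows up as one big white blob whose outline is easy to trace.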

If we sort the outer contours by area, we can find the contour with the biggest area: this is the card. We can create a white background to extract the symbols.

# sort by area, grab the biggest one
cnts = sorted(cnts, key=cv2.contourArea, reverse=True)[0]

# create mask with the biggest contour
mask = np.zeros(gray.shape, np.uint8)
mask = cv2.drawContours(mask, [cnts], -1, 255, cv2.FILLED)

# card in foreground
fg_masked = cv2.bitwise_and(image, image, mask=mask)

# white background (use inverted mask)
mask = cv2.bitwise_not(mask)
bk = np.full(image.shape, 255, dtype=np.uint8)
bk_masked = cv2.bitwise_and(bk, bk, mask=mask)

# combine back- and foreground
final = cv2.bitwise_or(fg_masked, bk_masked)

Mask, background, foreground, combined
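
The mask, background and foreground steps boil down to one rule: keep the pixel where the mask is 255, paint it white everywhere else. As a rough sketch, assuming a mask that only contains 0 and 255, the same result can be written in a single NumPy expression. The helper name `keep_inside_mask` is made up for this example.

```python
import numpy as np

def keep_inside_mask(image: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Keep image pixels where mask == 255 and paint all other pixels white."""
    white = np.full_like(image, 255)
    # mask is (H, W); add a channel axis so it broadcasts over the 3 color channels
    return np.where(mask[..., None] == 255, image, white)

# Toy 2x2 color image with a diagonal mask.
image = np.full((2, 2, 3), 10, dtype=np.uint8)
mask = np.array([[255, 0], [0, 255]], dtype=np.uint8)
combined = keep_inside_mask(image, mask)
print(combined[:, :, 0])
```

The bitwise version used in this post gives the same picture; the one-liner is just a compact way to see what is going on.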

Now it's symbol detection time! We can use the last image to detect outer contours again; this time the contours are the symbols. If we draw a square around each symbol, we can extract that region. The code is a bit longer:

# just like before (with detecting the card)
gray = cv2.cvtColor(final, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 195, 255, cv2.THRESH_BINARY)[1]
thresh = cv2.bitwise_not(thresh)

cnts = cv2.findContours(thresh.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = imutils.grab_contours(cnts)
cnts = sorted(cnts, key=cv2.contourArea, reverse=True)[:10]

# handle each contour
i = 0
for c in cnts:
    if cv2.contourArea(c) > 1000:
        # draw mask, keep contour
        mask = np.zeros(gray.shape, np.uint8)
        mask = cv2.drawContours(mask, [c], -1, 255, cv2.FILLED)

        # white background
        fg_masked = cv2.bitwise_and(image, image, mask=mask)
        mask = cv2.bitwise_not(mask)
        bk = np.full(image.shape, 255, dtype=np.uint8)
        bk_masked = cv2.bitwise_and(bk, bk, mask=mask)
        finalcont = cv2.bitwise_or(fg_masked, bk_masked)

        # bounding rectangle around contour
        output = finalcont.copy()
        x, y, w, h = cv2.boundingRect(c)

        # use squares instead of rectangles
        if w < h:
            x += int((w - h) / 2)
            w = h
        else:
            y += int((h - w) / 2)
            h = w
        # keep the square's origin inside the image
        x, y = max(x, 0), max(y, 0)

        # take out the square with the symbol
        roi = finalcont[y:y+h, x:x+w]
        roi = cv2.resize(roi, (400, 400))

        # save the symbol
        cv2.imwrite(f"{imgname}_icon{i}.jpg", roi)
        i += 1

Thresholded image, contours found, ghost symbol and heart symbol (symbols extracted with masks)
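
The trickiest bit of the loop is turning the bounding rectangle into a square. Here it is pulled out into a small standalone function so you can try it with plain numbers. The name `square_box` is an assumption for this sketch, and so is the final clamp, which keeps the origin from going negative for symbols close to the image border.

```python
def square_box(x: int, y: int, w: int, h: int) -> tuple:
    """Grow a bounding rectangle into a square centred on the same symbol."""
    if w < h:
        x += int((w - h) / 2)   # shift left by half of the missing width
        w = h
    else:
        y += int((h - w) / 2)   # shift up by half of the missing height
        h = w
    # Assumption: clamp the origin so slicing never uses a negative index.
    return max(x, 0), max(y, 0), w, h

tall = square_box(100, 50, 40, 80)    # tall symbol, e.g. an exclamation mark
wide = square_box(10, 100, 60, 20)    # wide symbol
edge = square_box(5, 50, 20, 60)      # tall symbol near the left border
print(tall, wide, edge)
```

Using squares means the later resize to 400 by 400 does not stretch the symbols.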

Sorting the symbols

Now comes the boring part! It's time to sort the symbols. We need a train, test and validation directory, containing 57 directories each (we have 57 different symbols). The folder structure looks like this:

symbols
 ├── test
 │   ├── anchor
 │   ├── apple
 │   │   ...
 │   └── zebra
 ├── train
 │   ├── anchor
 │   ├── apple
 │   │   ...
 │   └── zebra
 └── validation
     ├── anchor
     ├── apple
     │   ...
     └── zebra

It takes some time to put the extracted symbols (over 2500) in the right directories! My code for creating the subfolders and the test and validation sets is on GitHub. Maybe it's better to do the sorting with a clustering algorithm next time…
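
To give an idea of what such a split can look like, here is a minimal sketch using only the standard library. It is not the code from the repository: the function name `split_class_folder`, the 20% validation and 20% test fractions and the fixed seed are all assumptions chosen for this example.

```python
import random
import shutil
from pathlib import Path

def split_class_folder(src_dir, dst_root, label, val_frac=0.2, test_frac=0.2, seed=42):
    """Copy the .jpg files of one symbol into <dst_root>/{train,validation,test}/<label>."""
    files = sorted(Path(src_dir).glob("*.jpg"))
    random.Random(seed).shuffle(files)          # reproducible shuffle
    n_val = int(len(files) * val_frac)
    n_test = int(len(files) * test_frac)
    parts = {
        "validation": files[:n_val],
        "test": files[n_val:n_val + n_test],
        "train": files[n_val + n_test:],        # everything that is left
    }
    for split, members in parts.items():
        target = Path(dst_root) / split / label
        target.mkdir(parents=True, exist_ok=True)
        for f in members:
            shutil.copy2(f, target / f.name)
    return {split: len(members) for split, members in parts.items()}

# Hypothetical usage, assuming the hand-sorted ghosts live in 'sorted/ghost':
# split_class_folder('sorted/ghost', 'symbols', 'ghost')
```

Calling it once per symbol folder would produce the directory tree shown above.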

Training a Convolutional Neural Network

After the boring part comes the cool part. Let's build and train a CNN. You can find information about CNNs in this post.

Model architecture

This is a multiclass, single-label classification problem: we want exactly one label for every symbol. That's why the last layer needs a softmax activation with 57 nodes, and the loss function is categorical crossentropy.
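
For the curious, this is roughly what softmax and categorical crossentropy compute for a single symbol, written out in NumPy. It is a sketch of the math, not of how Keras implements it, and the raw scores are invented for the example.

```python
import numpy as np

NUM_CLASSES = 57  # one class per symbol

def softmax(logits: np.ndarray) -> np.ndarray:
    """Turn raw scores into probabilities that sum to 1."""
    exp = np.exp(logits - np.max(logits))   # subtract the max for numerical stability
    return exp / exp.sum()

def categorical_crossentropy(one_hot: np.ndarray, probs: np.ndarray, eps: float = 1e-7) -> float:
    """Minus the log of the probability assigned to the true class."""
    return float(-np.sum(one_hot * np.log(np.clip(probs, eps, 1.0))))

# Invented scores: the network strongly favors class index 3.
logits = np.zeros(NUM_CLASSES)
logits[3] = 5.0
probs = softmax(logits)

right = np.zeros(NUM_CLASSES)
right[3] = 1.0      # the true label is class 3
wrong = np.zeros(NUM_CLASSES)
wrong[10] = 1.0     # the true label is class 10

loss_right = categorical_crossentropy(right, probs)
loss_wrong = categorical_crossentropy(wrong, probs)
print(loss_right, loss_wrong)
```

The loss is small when the confident prediction is correct and large when it is not. A model that guesses uniformly over all 57 symbols gets a loss of ln(57), about 4.04, which is a handy number to compare the first epochs against.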

The architecture of the final model looks like this:

# imports
from keras import layers
from keras import models
from keras import optimizers
from keras.preprocessing.image import ImageDataGenerator
import matplotlib.pyplot as plt

# layers, activation layer with 57 nodes (one for every symbol)
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(400, 400, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(128, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(256, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(256, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(128, (3, 3), activation='relu'))
model.add(layers.Flatten())
model.add(layers.Dropout(0.5))
model.add(layers.Dense(512, activation='relu'))
model.add(layers.Dense(57, activation='softmax'))

# note: newer Keras releases name the `lr` argument `learning_rate`
model.compile(loss='categorical_crossentropy',
              optimizer=optimizers.RMSprop(lr=1e-4),
              metrics=['acc'])
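
To see how the 400 by 400 input shrinks on its way through the network, the sizes can be traced by hand. The sketch below assumes the Keras defaults used above: 3x3 convolutions without padding (each one removes 2 pixels per side length) and 2x2 max pooling that rounds down. The helper names are made up for this example.

```python
def conv_valid(size: int, kernel: int = 3) -> int:
    """Side length after a convolution without padding."""
    return size - kernel + 1

def max_pool(size: int, pool: int = 2) -> int:
    """Side length after max pooling (rounded down)."""
    return size // pool

size = 400
trace = [size]
for _ in range(5):            # the five Conv2D + MaxPooling2D pairs
    size = max_pool(conv_valid(size))
    trace.append(size)

size = conv_valid(size)       # the last Conv2D (128 filters) has no pooling after it
flat = size * size * 128      # number of values coming out of Flatten
dense_params = flat * 512 + 512   # weights plus biases of the Dense(512) layer

print(trace, size, flat, dense_params)
```

Most of the trainable weights sit in the first dense layer, which is one reason the dropout layer is placed right before it.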

Data augmentation

For better performance I used data augmentation. Data augmentation is the process of increasing the amount and diversity of input data. This is possible by rotating, shifting, zooming, cropping and flipping existing images. It’s easy to perform data augmentation with Keras:

# specify the directories
train_dir = 'symbols/train'
validation_dir = 'symbols/validation'
test_dir = 'symbols/test'

# data augmentation with ImageDataGenerator from Keras (only train)
train_datagen = ImageDataGenerator(rescale=1./255, rotation_range=40,
                                   width_shift_range=0.1, height_shift_range=0.1,
                                   shear_range=0.1, zoom_range=0.1,
                                   horizontal_flip=True, vertical_flip=True)
test_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(train_dir, target_size=(400,400), batch_size=20, class_mode='categorical')
validation_generator = test_datagen.flow_from_directory(validation_dir, target_size=(400,400), batch_size=20, class_mode='categorical')

In case you wondered, an augmented ghost looks like this:

Original ghost on the left, augmented ghosts on the other images
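
The flips are the easiest part of augmentation to reproduce yourself. The sketch below covers only the random horizontal and vertical flips, using NumPy slicing; the rotation, shift, shear and zoom that Keras also applies are left out. The function name `random_flip` and the 50% probabilities are assumptions for this illustration.

```python
import numpy as np

def random_flip(image: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Randomly mirror an (H, W, C) image left-right and/or top-bottom."""
    if rng.random() < 0.5:
        image = image[:, ::-1]    # horizontal flip: reverse the columns
    if rng.random() < 0.5:
        image = image[::-1, :]    # vertical flip: reverse the rows
    return image

rng = np.random.default_rng(0)
ghost = np.arange(2 * 2 * 3).reshape(2, 2, 3)   # tiny stand-in for a symbol image
flipped = random_flip(ghost, rng)
print(flipped.shape)
```

Flipping is safe here because a mirrored ghost is still a ghost; for symbols where orientation matters, you would switch it off.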

Fit the model

Let’s fit the model, save it to use for predictions and check out the results.

history = model.fit_generator(train_generator, steps_per_epoch=100, epochs=100,
                              validation_data=validation_generator, validation_steps=50)

# don't forget to save your model!
model.save('models/model.h5')

Perfect predictions!

Results

The baseline model I trained had no data augmentation, no dropout and fewer layers. This model gave the following results:

Results of the baseline model

You can clearly see this model is overfitting. The results of the final model (from the code in earlier paragraphs) are a lot better. In the image below you can see the accuracy and loss of the train and validation set.

Results of the final model

On the test set this model made only one mistake: it predicted a bomb as a drop. With an accuracy of 0.995 on the test set, I decided to stick with this model.

Predict the common symbol of two cards

Now it’s possible to predict the common symbol on two cards. We can use two images, make predictions for each image separately and use an intersection to see what symbol the cards both have. This gives three possibilities:

  • Something went wrong during prediction time: there are no common symbols found.
  • There’s exactly one symbol in the intersection (can be wrong or right).
  • There’s more than one symbol in the intersection. In this case I selected the symbol with the highest probability (mean of both predictions).
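
The three cases above can be sketched in a few lines of plain Python. This is not the code from main.py: the function name `common_symbol` and the input format (a list of label and probability pairs per card, one pair per detected symbol) are assumptions for this example.

```python
def common_symbol(card_a, card_b):
    """Return the label both cards share, or None if the predictions have no overlap.

    Each card is a list of (label, probability) pairs, one per detected symbol.
    """
    best_a, best_b = {}, {}
    for best, card in ((best_a, card_a), (best_b, card_b)):
        for label, prob in card:
            # if a label was predicted twice on one card, keep its highest probability
            best[label] = max(prob, best.get(label, 0.0))

    shared = set(best_a) & set(best_b)
    if not shared:
        return None                      # case 1: something went wrong
    # cases 2 and 3: pick the shared label with the highest mean probability
    return max(shared, key=lambda s: (best_a[s] + best_b[s]) / 2)

card_1 = [("ghost", 0.9), ("anchor", 0.8), ("zebra", 0.6)]
card_2 = [("ghost", 0.7), ("apple", 0.95), ("zebra", 0.95)]
print(common_symbol(card_1, card_2))
```

Here both ghost and zebra show up on both cards, so the mean probabilities decide: 0.8 for the ghost against 0.775 for the zebra.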

The code for predicting all combinations of two images in a directory is on GitHub, in the main.py file.

Some results:

Conclusions

Is this a perfectly performing model? Unfortunately, no! When I took new pictures of cards and let the model predict the common symbol, it had some issues with the snowman. Sometimes it predicted an eye or a zebra as a snowman! That gives some strange results:

Snowman? Where?

Is this model better than humans? It depends: humans can do it perfectly, but the model is faster! I timed the computer: I gave it the 55-card deck and asked for the common symbol of every combination of two cards. That's a total of 1485 combinations, which took the computer less than 140 seconds. The computer made some mistakes, but it will definitely beat any human when it comes to speed!
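
A quick check of those numbers, using nothing but the standard library. The 140 seconds is the upper bound mentioned above, so the time per pair is an upper bound as well.

```python
from itertools import combinations
from math import comb

cards = range(55)
pairs = list(combinations(cards, 2))   # every unordered pair of two different cards

seconds_per_pair = 140 / len(pairs)    # upper bound, since the run took less than 140 s
print(len(pairs), comb(55, 2), round(seconds_per_pair, 3))
```

That is under a tenth of a second per pair of cards.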

I don't think it's really hard to build a model that performs at 100%. It could be done with transfer learning, for example. To understand what the model is doing, we could visualize the layers for a test image. Things to try next time!

I hope you enjoyed reading this post! ❤

