How I Learned my Computer to Play Spot it! using OpenCV and Deep Learning

Category: IT Technology · Published: 4 years ago

Some fun with computer vision and CNNs with a small dataset.

One of my hobbies is playing board games, and because I have some knowledge of CNNs, I decided to build an application that can beat humans at a card game. I wanted to build the model from scratch with my own dataset, to see how well such a model would perform with only a small amount of data. I chose to start with a not-too-hard game: Spot it! (a.k.a. Dobble).

In case you don’t know Spot it! yet, here is a short explanation: Spot it! is a simple pattern recognition game in which players try to find the one symbol that appears on both of two cards. Each card in the original Spot it! features eight different symbols, with the symbols varying in size from one card to the next. Any two cards have exactly one symbol in common. If you’re the first to find that symbol, you win the card. Whoever has collected the most cards when the 55-card deck runs out wins.

Try it yourself: what’s the common symbol on the cards shown above?

Where to start?

The first step of any data science problem is gathering data. I took some pictures with my phone, six of each card. That makes a total of 330 pictures. Four of them are shown below. You might think: is this enough to build a perfect Convolutional Neural Network? I will get back to that!

Processing the images

Okay, we have the data, what’s next? This is probably the most important part in order to succeed: processing the images. We need to extract the symbols shown on each card. There are some difficulties here. You can see in the pictures above that some symbols might be harder to extract: the snowman and ghost (third picture) and igloo (fourth picture) have a light color, and the stains (second picture) and exclamation mark (fourth picture) consist of multiple parts. To handle the light-colored symbols we add contrast to the images. After that we resize and save the image.

Adding contrast

We use the Lab color space for adding contrast. L stands for lightness, a is the color component ranging from green to magenta and b is the color component ranging from blue to yellow. We can extract these components easily with OpenCV:

import cv2
import imutils
import numpy as np  # used further down for the masks

imgname = 'picture1'

# cv2.imread returns the image in BGR channel order
image = cv2.imread(f'{imgname}.jpg')
lab = cv2.cvtColor(image, cv2.COLOR_BGR2LAB)
l, a, b = cv2.split(lab)

From left to right: original image, Light component, a component and b component

Now we add contrast to the lightness component, merge the components back together and convert the image back to BGR, OpenCV's default channel order:

clahe = cv2.createCLAHE(clipLimit=3.0, tileGridSize=(8,8))
cl = clahe.apply(l)
limg = cv2.merge((cl,a,b))
final = cv2.cvtColor(limg, cv2.COLOR_LAB2BGR)

From left to right: original image, lightness component, added contrast, converted back to BGR

Resize

Then we resize and save the image:

resized = cv2.resize(final, (800, 800))

# save the resized image
cv2.imwrite(f'{imgname}processed.jpg', resized)

Done!

Detecting the card and symbols

Now that the image is processed, we can start detecting the card in the image. OpenCV can find the outer contours for us: we convert the image to grayscale, choose a threshold (190 in this case) to create a black-and-white image, and find the contours. In code:

image = cv2.imread(f'{imgname}processed.jpg')
# note: cv2.imread returns BGR, so COLOR_RGB2GRAY swaps the red and blue weights;
# kept as-is so the threshold values behave as in the original
gray = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)
thresh = cv2.threshold(gray, 190, 255, cv2.THRESH_BINARY)[1]

# find contours
cnts = cv2.findContours(thresh.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = imutils.grab_contours(cnts)
output = image.copy()

# draw contours on image
for c in cnts:
    cv2.drawContours(output, [c], -1, (255, 0, 0), 3)

Processed image, converted to grayscale, thresholded, and with outer contours
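
To make the thresholding step concrete, here is a minimal NumPy sketch of what cv2.THRESH_BINARY computes: every pixel brighter than the threshold becomes 255, everything else becomes 0. The helper name binary_threshold and the tiny 3x3 array are made up for illustration; the real pipeline works on the 800x800 grayscale photo.

```python
import numpy as np

def binary_threshold(gray, thresh=190, maxval=255):
    """NumPy version of cv2.threshold(gray, thresh, maxval, cv2.THRESH_BINARY)[1]."""
    # pixels strictly above the threshold become maxval, all others 0
    return np.where(gray > thresh, maxval, 0).astype(np.uint8)

# toy grayscale "image": bright card pixels on a darker background
gray = np.array([[ 30, 200, 250],
                 [190, 191,  10],
                 [255,   0, 189]], dtype=np.uint8)

binary = binary_threshold(gray)
print(binary)
```

Note that a pixel with value exactly 190 ends up black, because the comparison is strict.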

If we sort the outer contours by area, we can find the contour with the biggest area: this is the card. We can create a white background to extract the symbols.

# sort by area, grab the biggest one
cnts = sorted(cnts, key=cv2.contourArea, reverse=True)[0]

# create mask with the biggest contour
mask = np.zeros(gray.shape, np.uint8)
mask = cv2.drawContours(mask, [cnts], -1, 255, cv2.FILLED)

# card in foreground
fg_masked = cv2.bitwise_and(image, image, mask=mask)

# white background (use inverted mask)
mask = cv2.bitwise_not(mask)
bk = np.full(image.shape, 255, dtype=np.uint8)
bk_masked = cv2.bitwise_and(bk, bk, mask=mask)

# combine back- and foreground
final = cv2.bitwise_or(fg_masked, bk_masked)

Mask, background, foreground, combined
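
The mask/foreground/background combination can also be read as a single selection: keep the pixel where the mask is set, otherwise paint it white. Here is a small NumPy sketch of that idea, assuming a mask that only contains 0 and 255. The function name white_background and the toy 2x2 image are illustrative, not part of the original code.

```python
import numpy as np

def white_background(image, mask):
    """Keep image pixels where mask == 255, make everything else white."""
    keep = (mask == 255)[..., None]      # add a channel axis so it broadcasts over B, G, R
    return np.where(keep, image, 255).astype(np.uint8)

image = np.full((2, 2, 3), 100, dtype=np.uint8)          # uniform dark-gray toy image
mask = np.array([[255, 0], [0, 255]], dtype=np.uint8)    # keep only the diagonal pixels
result = white_background(image, mask)
print(result[:, :, 0])
```

For a binary mask this gives the same picture as the bitwise_and / bitwise_or steps above, just in one expression.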

Now it's symbol detection time! We can use the last image to detect outer contours again; this time the contours are the symbols. If we create a square around each symbol, we can extract that region. The code is a bit longer:

# just like before (with detecting the card)
gray = cv2.cvtColor(final, cv2.COLOR_RGB2GRAY)
thresh = cv2.threshold(gray, 195, 255, cv2.THRESH_BINARY)[1]
thresh = cv2.bitwise_not(thresh)

cnts = cv2.findContours(thresh.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = imutils.grab_contours(cnts)
# keep the ten largest contours
cnts = sorted(cnts, key=cv2.contourArea, reverse=True)[:10]

# handle each contour
i = 0
for c in cnts:
    # skip small specks
    if cv2.contourArea(c) > 1000:
        # draw mask, keep contour
        mask = np.zeros(gray.shape, np.uint8)
        mask = cv2.drawContours(mask, [c], -1, 255, cv2.FILLED)

        # white background
        fg_masked = cv2.bitwise_and(image, image, mask=mask)
        mask = cv2.bitwise_not(mask)
        bk = np.full(image.shape, 255, dtype=np.uint8)
        bk_masked = cv2.bitwise_and(bk, bk, mask=mask)
        finalcont = cv2.bitwise_or(fg_masked, bk_masked)

        # bounding rectangle around contour
        output = finalcont.copy()
        x, y, w, h = cv2.boundingRect(c)

        # use squares instead of rectangles
        if w < h:
            x += int((w - h) / 2)
            w = h
        else:
            y += int((h - w) / 2)
            h = w
        # keep the square's origin inside the image (negative indices would wrap around)
        x, y = max(x, 0), max(y, 0)

        # take out the square with the symbol
        roi = finalcont[y:y+h, x:x+w]
        roi = cv2.resize(roi, (400, 400))

        # save the symbol
        cv2.imwrite(f"{imgname}_icon{i}.jpg", roi)
        i += 1

Thresholded image, contours found, ghost symbol and heart symbol (symbols extracted with masks)
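
The step that turns the bounding rectangle into a square is easy to get wrong near the image border, so here it is as a standalone sketch. The function square_box is a hypothetical helper, not the code used above: it grows the shorter side around the same center and then shifts the square so it stays inside an image of the given size.

```python
def square_box(x, y, w, h, img_w, img_h):
    """Grow a bounding rectangle into a square around the same center,
    then shift it so that it stays inside the image."""
    side = max(w, h)
    x -= (side - w) // 2      # move left by half of the added width
    y -= (side - h) // 2      # move up by half of the added height
    # clamp so that slicing never gets a negative or out-of-range start
    x = max(0, min(x, img_w - side))
    y = max(0, min(y, img_h - side))
    return x, y, side, side

# a tall symbol in the middle of an 800x800 image
print(square_box(100, 50, 40, 80, 800, 800))
# a tall symbol close to the left border
print(square_box(5, 300, 20, 100, 800, 800))
```

The region can then be cut out with finalcont[y:y+side, x:x+side], as in the loop above.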

Sorting the symbols

Now comes the boring part! It's time to sort the symbols. We need a train, test and validation directory, containing 57 directories each (we have 57 different symbols). The folder structure looks like this:

symbols
 ├── test
 │   ├── anchor
 │   ├── apple
 │   │   ...
 │   └── zebra
 ├── train
 │   ├── anchor
 │   ├── apple
 │   │   ...
 │   └── zebra
 └── validation
     ├── anchor
     ├── apple
     │   ...
     └── zebra

It takes some time to put the extracted symbols (over 2500) in the right directories! The code for creating the subfolders and the test and validation sets is on GitHub. Maybe it's better to do the sorting with a clustering algorithm next time…
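
For readers who want to build the same folder structure, here is a rough sketch of how such a split could be scripted. It is not the code from the GitHub repository. It assumes the symbols were first sorted by hand into one folder per symbol (src/anchor, src/apple, ...), and the function name split_dataset and the 15% validation and test fractions are illustrative choices.

```python
import random
import shutil
from pathlib import Path

def split_dataset(src, dst, val_frac=0.15, test_frac=0.15, seed=0):
    """Copy src/<symbol>/*.jpg into dst/{train,validation,test}/<symbol>/."""
    rng = random.Random(seed)                 # fixed seed for a reproducible split
    counts = {"train": 0, "validation": 0, "test": 0}
    for symbol_dir in sorted(p for p in Path(src).iterdir() if p.is_dir()):
        files = sorted(symbol_dir.glob("*.jpg"))
        rng.shuffle(files)
        n_val = int(len(files) * val_frac)
        n_test = int(len(files) * test_frac)
        parts = {
            "validation": files[:n_val],
            "test": files[n_val:n_val + n_test],
            "train": files[n_val + n_test:],  # everything that is left
        }
        for split, members in parts.items():
            target = Path(dst) / split / symbol_dir.name
            target.mkdir(parents=True, exist_ok=True)
            for f in members:
                shutil.copy2(f, target / f.name)
            counts[split] += len(members)
    return counts
```

Calling split_dataset('sorted', 'symbols') would then produce the symbols/train, symbols/validation and symbols/test directories shown above, where 'sorted' is a placeholder for wherever the hand-sorted symbols live.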

Training a Convolutional Neural Network

After the boring part comes the cool part. Let’s build and train a CNN. You can find information about CNNs in this post.

Model architecture

This is a multiclass, single-label classification problem: we want exactly one label for every symbol. That’s why the last layer needs 57 nodes with a softmax activation, combined with a categorical crossentropy loss function.
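
To see why softmax and categorical crossentropy belong together, here is a small NumPy sketch for a single sample. The logits are invented for illustration (the network "favours" class 3); Keras does the same computation internally on batches.

```python
import numpy as np

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits)      # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

def categorical_crossentropy(one_hot, probs, eps=1e-7):
    """Loss for one sample: -sum(y_true * log(y_pred))."""
    return float(-np.sum(one_hot * np.log(np.clip(probs, eps, 1.0))))

n_classes = 57                             # one output node per symbol
logits = np.zeros(n_classes)
logits[3] = 5.0                            # made-up scores: class 3 stands out
probs = softmax(logits)

target = np.zeros(n_classes)
target[3] = 1.0                            # one-hot label: class 3 is the true symbol
loss = categorical_crossentropy(target, probs)
print(probs[3], loss)
```

Because the label is one-hot, the loss reduces to minus the log of the probability assigned to the correct symbol, so it shrinks as the network grows more confident in the right class.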

The architecture of the final model looks like this:

# imports
from keras import layers
from keras import models
from keras import optimizers
from keras.preprocessing.image import ImageDataGenerator
import matplotlib.pyplot as plt

# layers, output layer with 57 nodes (one for every symbol)
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(400, 400, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(128, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(256, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(256, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(128, (3, 3), activation='relu'))
model.add(layers.Flatten())
model.add(layers.Dropout(0.5))
model.add(layers.Dense(512, activation='relu'))
model.add(layers.Dense(57, activation='softmax'))

# newer Keras releases name this argument learning_rate instead of lr
model.compile(loss='categorical_crossentropy',
              optimizer=optimizers.RMSprop(lr=1e-4),
              metrics=['acc'])
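
It can help to trace how the 400x400 input shrinks on its way through the network. The sketch below redoes that arithmetic in plain Python, assuming the Keras defaults that the model code relies on: 'valid' padding and stride 1 for Conv2D, and a stride equal to the pool size for MaxPooling2D.

```python
def conv_out(size, kernel=3):
    """Output width of a 'valid' convolution with stride 1."""
    return size - kernel + 1

def pool_out(size, pool=2):
    """Output width of max pooling with stride equal to the pool size."""
    return size // pool

size = 400
filters = [32, 64, 128, 256, 256, 128]
for i, f in enumerate(filters):
    size = conv_out(size)
    if i < len(filters) - 1:               # the last Conv2D is not followed by pooling
        size = pool_out(size)
    print(f"after conv block {i + 1}: {size}x{size}x{f}")

flat = size * size * filters[-1]           # length of the vector produced by Flatten
dense_params = flat * 512 + 512            # weights plus biases of Dense(512)
print(flat, dense_params)
```

Under these assumptions the flattened vector has 8 * 8 * 128 = 8192 entries, so the Dense(512) layer alone holds about 4.2 million parameters.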

Data augmentation

For better performance I used data augmentation. Data augmentation is the process of increasing the amount and diversity of input data. This is possible by rotating, shifting, zooming, cropping and flipping existing images. It’s easy to perform data augmentation with Keras:

# specify the directories
train_dir = 'symbols/train'
validation_dir = 'symbols/validation'
test_dir = 'symbols/test'

# data augmentation with ImageDataGenerator from Keras (only train)
train_datagen = ImageDataGenerator(rescale=1./255, rotation_range=40,
                                   width_shift_range=0.1, height_shift_range=0.1,
                                   shear_range=0.1, zoom_range=0.1,
                                   horizontal_flip=True, vertical_flip=True)
test_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(train_dir, target_size=(400, 400), batch_size=20, class_mode='categorical')
validation_generator = test_datagen.flow_from_directory(validation_dir, target_size=(400, 400), batch_size=20, class_mode='categorical')

In case you wondered, an augmented ghost looks like this:

Original ghost on the left, augmented ghosts on the other images
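
As a rough idea of what happens to each image, here is a stripped-down NumPy sketch of two of the transformations: random flips and the 1/255 rescaling. It is not how ImageDataGenerator is implemented, and it leaves out rotation, shifting, shearing and zooming. The helper names and the toy 2x2 image are made up.

```python
import numpy as np

def random_flip(image, rng):
    """Randomly mirror an image horizontally and/or vertically."""
    if rng.random() < 0.5:
        image = image[:, ::-1]             # horizontal flip (mirror left-right)
    if rng.random() < 0.5:
        image = image[::-1, :]             # vertical flip (mirror top-bottom)
    return image

def rescale(image):
    """Map uint8 pixel values from [0, 255] to floats in [0, 1]."""
    return image.astype(np.float32) / 255.0

rng = np.random.default_rng(seed=0)
symbol = np.arange(2 * 2 * 3, dtype=np.uint8).reshape(2, 2, 3)   # toy 2x2 color image
batch = np.stack([rescale(random_flip(symbol, rng)) for _ in range(4)])
print(batch.shape)
```

Every image in the batch contains the same pixels as the original, only rearranged, which is what makes augmentation a cheap way to get more variety out of a small dataset.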

Fit the model

Let’s fit the model, save it to use for predictions and check out the results.

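# fit_generator is deprecated in newer Keras/TensorFlow releases, where model.fit accepts generators
history = model.fit_generator(train_generator, steps_per_epoch=100, epochs=100,
                              validation_data=validation_generator, validation_steps=50)

# don't forget to save your model!
model.save('models/model.h5')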

Perfect predictions!

Results

The baseline model I trained had no data augmentation, no dropout and fewer layers. This model gave the following results:

Results of the baseline model

You can clearly see this model is overfitting. The results of the final model (from the code in earlier paragraphs) are a lot better. In the image below you can see the accuracy and loss of the train and validation set.

Results of the final model

On the test set this model made only one mistake: it predicted a bomb as a drop. With an accuracy of 0.995 on the test set, I decided to stick with this model.

Predict the common symbol of two cards

Now it’s possible to predict the common symbol on two cards. We can use two images, make predictions for each image separately and use an intersection to see what symbol the cards both have. This gives three possibilities:

  • Something went wrong during prediction time: there are no common symbols found.
  • There’s exactly one symbol in the intersection (can be wrong or right).
  • There’s more than one symbol in the intersection. In this case I selected the symbol with the highest probability (mean of both predictions).
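
The three cases above can be sketched in a few lines. This is not the main.py from the repository. It assumes each card's predictions are available as a dictionary from symbol name to predicted probability, and both the function name common_symbol and the example probabilities are invented.

```python
def common_symbol(card_a, card_b):
    """card_a and card_b map predicted symbol name -> predicted probability.
    Returns the most likely shared symbol, or None if nothing overlaps."""
    shared = set(card_a) & set(card_b)
    if not shared:
        return None                        # case 1: something went wrong during prediction
    # cases 2 and 3: one or more candidates, pick the highest mean probability
    return max(shared, key=lambda s: (card_a[s] + card_b[s]) / 2)

card_1 = {"ghost": 0.99, "anchor": 0.97, "snowman": 0.55}
card_2 = {"ghost": 0.98, "zebra": 0.96, "snowman": 0.60}
print(common_symbol(card_1, card_2))
```

In this example both "ghost" and "snowman" appear in the intersection, and "ghost" wins because its mean probability is higher.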

The code for predicting all combinations of two images in a directory is on GitHub, in the main.py file.

Conclusions

Is this a perfectly performing model? Unfortunately, no! When I took new pictures of cards and let the model predict the common symbol, it had some issues with the snowman. Sometimes it predicted an eye or a zebra as a snowman! That gives some strange results:

Snowman? Where?

Is this model better than humans? It depends: humans can do it perfectly, but the model is faster! I timed the computer: I gave it the 55-card deck and asked for the common symbol of every combination of two cards. That’s a total of 1485 combinations. This took the computer less than 140 seconds. The computer made some mistakes, but it will definitely beat any human when it comes to speed!
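
The number of combinations is easy to check: choosing 2 cards out of 55 gives 55 * 54 / 2 = 1485 pairs. A short sketch with placeholder card names:

```python
from itertools import combinations
from math import comb

deck = [f"card_{i:02d}" for i in range(55)]   # placeholder names for the 55 cards
pairs = list(combinations(deck, 2))           # every unordered pair of two cards

assert len(pairs) == comb(55, 2) == 1485
seconds_per_pair = 140 / len(pairs)           # upper bound, the full run took under 140 s
print(f"{len(pairs)} pairs, under {seconds_per_pair:.3f} s per pair")
```

That works out to less than a tenth of a second per pair of cards.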

I don’t think it would be really hard to build a model that scores 100%. It could, for example, be done with transfer learning. To understand what the model is doing, we could visualize the layers for a test image. Things to try next time!

I hope you enjoyed reading this post! ❤

