Understanding Crossword Puzzles with OpenCV, OCR, and DNNs

Category: IT Technology · Published: 5 years ago


This post was originally published on my Medium blog.

Introduction

Recently I was given the task of creating an algorithm to extract all possible metadata from a photo of a crossword puzzle. This seemed like an interesting task, so I decided to give it a try. These are the topics covered in this blog post:

  1. Crossword cell detection and extraction with OpenCV
  2. Crossword cell classification with a PyTorch CNN
  3. Cell metadata extraction

You can find the full code implementation on my GitHub.

Crossword cell detection

First things first: to extract the metadata, you have to know where it is located. For this purpose, I used simple OpenCV heuristics to identify the lines of the crossword puzzle and to form a cell grid out of these lines. The input image needs to be sufficiently large so that all lines can be detected reliably.
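The post doesn't spell out its line-detection heuristics, so here is a minimal sketch of one common approach, projection profiles, which is an assumption rather than the author's code. It assumes the photo has already been binarized so grid-line pixels are 1 (in OpenCV this would typically come from `cv2.adaptiveThreshold` with `cv2.THRESH_BINARY_INV`) and treats any row or column that is at least `min_fill` dark as a line. The function name and the `min_fill` threshold are hypothetical.

```python
import numpy as np


def find_line_positions(binary, axis, min_fill=0.5):
    """Return the centre index of each grid line found along one axis.

    binary: 2-D array, 1 where a pixel belongs to a grid line, 0 elsewhere.
    axis=1 averages each row    -> y positions of horizontal lines.
    axis=0 averages each column -> x positions of vertical lines.
    min_fill: assumed fraction of dark pixels a row/column needs to count as a line.
    """
    fill = np.asarray(binary, dtype=float).mean(axis=axis)
    candidates = np.flatnonzero(fill >= min_fill)
    if candidates.size == 0:
        return []
    positions = []
    # A thick line covers several adjacent indices; collapse each run to its centre.
    run_start = prev = int(candidates[0])
    for idx in candidates[1:]:
        idx = int(idx)
        if idx != prev + 1:
            positions.append((run_start + prev) // 2)
            run_start = idx
        prev = idx
    positions.append((run_start + prev) // 2)
    return positions


if __name__ == "__main__":
    # Synthetic 3x3 grid with 2-pixel-thick lines starting at 0, 20, 40 and 60.
    grid = np.zeros((62, 62), dtype=np.uint8)
    for p in (0, 20, 40, 60):
        grid[p:p + 2, :] = 1
        grid[:, p:p + 2] = 1
    print(find_line_positions(grid, axis=1))  # [0, 20, 40, 60]
```

Row and column sums smear out when a photographed puzzle is rotated, so this sketch assumes a roughly axis-aligned grid; for skewed photos, a perspective correction first or a segment detector such as `cv2.HoughLinesP` would be more robust.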


Afterward, for cell detection, I found the intersections between the lines and formed the cells from these intersection points.
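Forming cells from intersection points can be sketched as follows. This is an illustrative version, not the author's implementation: it assumes intersections have already been snapped to exact shared x and y coordinates, and the helper names are hypothetical. Requiring all four corners means a cell is only formed where the grid is actually closed.

```python
from itertools import product


def intersections_from_lines(xs, ys):
    """Every (x, y) where a vertical line at x crosses a horizontal line at y.

    Assumes each detected line spans the whole grid, so every pair meets.
    """
    return set(product(xs, ys))


def cells_from_intersections(points):
    """Form cells from intersection points given as (x, y) tuples.

    A cell is emitted only if all four of its corners are present, so a
    missing intersection (e.g. from a broken line) leaves a gap instead of
    a bogus cell. Returns (row, col, (x0, y0, x1, y1)) in row-major order.
    """
    point_set = set(points)
    xs = sorted({x for x, _ in point_set})
    ys = sorted({y for _, y in point_set})
    cells = []
    for row, (y0, y1) in enumerate(zip(ys, ys[1:])):
        for col, (x0, x1) in enumerate(zip(xs, xs[1:])):
            corners = {(x0, y0), (x1, y0), (x0, y1), (x1, y1)}
            if corners <= point_set:
                cells.append((row, col, (x0, y0, x1, y1)))
    return cells
```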


Finally, each cell is cut from the image and saved as a separate file for further processing.
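Cutting the cells out is plain NumPy slicing. In this sketch, `crop_cells` and its `margin` value are assumptions; the margin trims a few pixels on each side so the grid lines don't leak into the classifier input.

```python
import numpy as np


def crop_cells(image, cells, margin=2):
    """Cut every cell out of the image.

    cells: (row, col, (x0, y0, x1, y1)) boxes in pixel coordinates.
    margin: assumed number of pixels trimmed on each side to drop grid lines.
    Returns {(row, col): cropped array}.
    """
    crops = {}
    for row, col, (x0, y0, x1, y1) in cells:
        # NumPy images are indexed [y, x], i.e. rows first, then columns.
        crops[(row, col)] = image[y0 + margin:y1 - margin,
                                  x0 + margin:x1 - margin].copy()
    return crops
```

To save each crop as its own file, one option is `cv2.imwrite(f"cell_{row}_{col}.png", crop)`; that naming scheme is hypothetical.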


Crossword cell classification with a PyTorch CNN

For cell classification, everything was really straightforward. The problem was modeled as a multiclass classification problem with the following targets:

```python
{0: 'both', 1: 'double_text', 2: 'down', 3: 'inverse_arrow', 4: 'other', 5: 'right', 6: 'single_text'}
```
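Because the network emits one score per target, mapping its output back to these names is an argmax lookup. A minimal NumPy sketch (the helper name is hypothetical); with a PyTorch tensor, `logits.argmax(dim=1)` gives the same indices.

```python
import numpy as np

# Class-index mapping from the post.
CLASSES = {0: 'both', 1: 'double_text', 2: 'down', 3: 'inverse_arrow',
           4: 'other', 5: 'right', 6: 'single_text'}


def decode_predictions(logits):
    """Map a (batch, 7) array of model outputs to class names via argmax."""
    return [CLASSES[int(i)] for i in np.argmax(np.asarray(logits), axis=1)]
```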

I manually labeled around 100 cells for each target class. Afterward, I fitted a simple PyTorch CNN model with the following architecture:

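```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    # PyTorch CNN: four conv layers followed by four fully connected layers
    def __init__(self):
        super().__init__()
        # Convolutional feature extractor (input: 3-channel colour cell image)
        self.conv1 = nn.Conv2d(3, 6, 3)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 3)

        self.conv3 = nn.Conv2d(16, 32, 5)
        self.conv4 = nn.Conv2d(32, 64, 5)

        self.dropout = nn.Dropout(0.3)

        # 64 channels x 11 x 11 after the second pooling step;
        # a 64x64 input image produces exactly this size
        self.fc1 = nn.Linear(64 * 11 * 11, 512)
        self.bnorm1 = nn.BatchNorm1d(512)

        self.fc2 = nn.Linear(512, 128)
        self.bnorm2 = nn.BatchNorm1d(128)

        self.fc3 = nn.Linear(128, 64)
        self.bnorm3 = nn.BatchNorm1d(64)

        # One output logit per target class (7 classes)
        self.fc4 = nn.Linear(64, 7)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = self.pool(F.relu(self.conv2(x)))

        x = F.relu(self.conv3(x))
        x = self.pool(F.relu(self.conv4(x)))

        # Flatten everything except the batch dimension
        x = torch.flatten(x, 1)
        x = self.dropout(x)
        x = F.relu(self.bnorm1(self.fc1(x)))
        x = F.relu(self.bnorm2(self.fc2(x)))
        x = F.relu(self.bnorm3(self.fc3(x)))
        # Raw logits, suitable for nn.CrossEntropyLoss (which applies log-softmax itself)
        return self.fc4(x)
```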
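The post doesn't state the input size, but it can be recovered from the `64*11*11` in `fc1`. Each unpadded convolution with kernel size k shrinks a side by k - 1, and each 2×2 pool with stride 2 roughly halves it. The small calculation below traces one spatial side through the layers in `forward()` order; the helper names are illustrative.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length of Conv2d/MaxPool2d along one spatial dimension (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def feature_size(side):
    """Trace one spatial side of the input through the layers of Net."""
    side = conv_out(side, 3)         # conv1, kernel 3
    side = conv_out(side, 3)         # conv2, kernel 3
    side = conv_out(side, 2, 2)      # pool, 2x2 with stride 2
    side = conv_out(side, 5)         # conv3, kernel 5
    side = conv_out(side, 5)         # conv4, kernel 5
    side = conv_out(side, 2, 2)      # pool, 2x2 with stride 2
    return side


if __name__ == "__main__":
    print(feature_size(64))  # 11
```

A 64×64 input gives exactly 11 (64 → 62 → 60 → 30 → 26 → 22 → 11). Because pooling floors odd sizes, sides of 65 to 67 also reach 11, so 64×64 is the most likely resize target but remains an inference.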

The resulting model's predictions were decent, and the model generalized well even to crossword puzzles in different formats.

Cell metadata extraction

My final step was to extract all metadata from the labeled cells. For this purpose, I first built a pandas DataFrame that represents each image cell together with its predicted class.
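The post doesn't show the DataFrame's columns, so the layout below is an assumption: one tidy row per cell with hypothetical `row`, `col` and `cell_class` columns, plus a pivot that lays the predicted classes out like the puzzle grid.

```python
import pandas as pd


def cells_to_frame(predictions):
    """Turn {(row, col): class_name} into a tidy DataFrame, one row per cell."""
    df = pd.DataFrame(
        [{"row": r, "col": c, "cell_class": cls} for (r, c), cls in predictions.items()]
    )
    # Row-major order makes the frame read like the puzzle, left to right, top to bottom.
    return df.sort_values(["row", "col"]).reset_index(drop=True)


def class_grid(df):
    """Pivot the tidy frame into a row x col grid of predicted classes."""
    return df.pivot(index="row", columns="col", values="cell_class")
```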


Finally, depending on the cell class, I either extracted the text from the cell image using pytesseract or, if the cell was classified as one of the arrow cells, extracted the arrow's coordinates and direction.
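The class-based dispatch might look like the sketch below. Which classes count as text and which as arrows is assumed here from the class names, and the returned dictionary shape is hypothetical. The OCR function is injectable so the logic can run without Tesseract; by default it falls back to `pytesseract.image_to_string`, which needs the Tesseract binary installed.

```python
# Assumed grouping of the post's class names; anything else is tagged "other".
TEXT_CLASSES = {"single_text", "double_text"}
ARROW_CLASSES = {"both", "down", "inverse_arrow", "right"}


def extract_cell_metadata(cell_class, position, image, ocr=None):
    """Return a metadata dict for one classified cell (hypothetical layout).

    ocr: callable taking a cell image and returning its text; when omitted,
    pytesseract.image_to_string is used (requires the Tesseract binary).
    """
    position = list(position)
    if cell_class in TEXT_CLASSES:
        if ocr is None:
            import pytesseract  # imported lazily so a stubbed OCR needs no install
            ocr = pytesseract.image_to_string
        return {"type": "text", "position": position, "label": ocr(image).strip()}
    if cell_class in ARROW_CLASSES:
        # The class name encodes the arrow type; resolving where the answer
        # starts needs puzzle-specific rules that the post doesn't describe.
        return {"type": "arrow", "position": position, "arrow": cell_class}
    return {"type": "other", "position": position}
```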

The script's output, in JSON format, looked like this:

```json
{"definitions": [
  {"label": "F Faitune |", "position": [0, 2], "solution": {"startPosition": [0, 3], "direction": "down"}},
  {"label": "anceur", "position": [0, 4], "solution": {"startPosition": [1, 4], "direction": "down"}}
]}
```
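Assembling that output is mostly bookkeeping. The sketch below produces the JSON layout shown above from (label, position, start position, direction) tuples; the tuple format and helper name are assumptions, while the key names come from the sample output.

```python
import json


def build_output(definitions):
    """Serialize (label, position, start_position, direction) tuples to JSON."""
    payload = {
        "definitions": [
            {
                "label": label,
                "position": list(position),
                "solution": {"startPosition": list(start), "direction": direction},
            }
            for label, position, start, direction in definitions
        ]
    }
    # ensure_ascii=False keeps accented OCR text readable instead of \u escapes.
    return json.dumps(payload, ensure_ascii=False)
```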

Conclusion

This work was a great experience for me and a good opportunity to dive into a task that combined simple OpenCV heuristics with more cutting-edge techniques such as OCR and DNN-based image classification. Thanks for reading!

