This post was originally published on my Medium blog.
Recently I was given the task of creating an algorithm to extract all possible metadata from a photo of a crossword puzzle. It seemed like an interesting task, so I decided to give it a try. These are the topics covered in this blog post:
- Crossword cell detection and extraction with OpenCV
- Crossword cell classification with PyTorch CNN
- Cell metadata extraction
You can find the full code implementation on my GitHub.
Crossword cell detection
First things first: to extract the metadata, you have to know where it is located. For this purpose, I used simple OpenCV heuristics to identify the lines of the crossword puzzle and to form a cell grid out of these lines. The input image needs to be sufficiently large so that all lines can be detected easily.
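The exact OpenCV heuristics are not reproduced in this post. As a rough stand-in that needs no OpenCV at all, the sketch below uses a projection profile, which is a different and simpler technique: a row or column counts as a grid line when most of its pixels are dark. The function name `find_line_positions`, the `min_fraction` threshold, and the tiny synthetic image are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def find_line_positions(
    binary: Sequence[Sequence[int]], min_fraction: float = 0.8
) -> Tuple[List[int], List[int]]:
    """Return (row_indices, col_indices) that look like grid lines.

    `binary` is a 2D grid where 1 marks a dark (ink) pixel and 0 a light one.
    A row or column qualifies when at least `min_fraction` of it is dark.
    """
    height = len(binary)
    width = len(binary[0])
    rows = [y for y in range(height) if sum(binary[y]) >= min_fraction * width]
    cols = [
        x
        for x in range(width)
        if sum(binary[y][x] for y in range(height)) >= min_fraction * height
    ]
    return rows, cols


# Tiny synthetic "puzzle": 7x7 pixels with grid lines at 0, 3 and 6.
LINES = {0, 3, 6}
image = [[1 if (x in LINES or y in LINES) else 0 for x in range(7)] for y in range(7)]

rows, cols = find_line_positions(image)
print(rows, cols)
```

On a real photo the lines are several pixels thick and slightly skewed, so the returned indices come in clusters and the threshold needs tuning, which is one reason a larger input image helps.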
Afterward, for cell detection, I found the intersections between the lines and formed the cells from the intersection points.
Finally, at this stage, each cell is cut from the image and saved as a separate file for further processing.
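The step from detected lines to cropped cells can be sketched in plain Python. This is a minimal illustration, not the code from the repository: it assumes the line detector has already produced x coordinates for vertical lines and y coordinates for horizontal lines, possibly with near-duplicates. The helper names `merge_close`, `grid_cells` and `crop`, the 5-pixel tolerance, and the sample coordinates are all hypothetical.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1)


def merge_close(values: List[int], tolerance: int = 5) -> List[int]:
    """Group sorted coordinates whose neighbours differ by at most
    `tolerance` pixels and replace each group by its rounded mean."""
    merged: List[int] = []
    group: List[int] = []
    for v in sorted(values):
        if group and v - group[-1] > tolerance:
            merged.append(round(sum(group) / len(group)))
            group = []
        group.append(v)
    if group:
        merged.append(round(sum(group) / len(group)))
    return merged


def grid_cells(xs: List[int], ys: List[int]) -> List[List[Box]]:
    """Build one box per pair of neighbouring line intersections, row by row."""
    return [
        [(x0, y0, x1, y1) for x0, x1 in zip(xs, xs[1:])]
        for y0, y1 in zip(ys, ys[1:])
    ]


def crop(image, box: Box):
    """Cut a cell out of a nested-list image.
    With a NumPy array this would be image[y0:y1, x0:x1]."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]


# Hypothetical detector output with duplicated lines.
xs = merge_close([0, 2, 49, 51, 100])
ys = merge_close([1, 49, 51, 100])
cells = grid_cells(xs, ys)

image = [[(x + y) % 256 for x in range(100)] for y in range(100)]
patch = crop(image, cells[0][0])
print(xs, ys, len(cells), len(cells[0]), len(patch), len(patch[0]))
```

Keeping the cells in a row-by-row structure preserves each cell's grid position, which is needed later when positions are written to the output.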
Crossword cell classification with PyTorch CNN
For cell classification, everything was really straightforward. The problem was modeled as a multiclass classification problem with the following targets:
{0: 'both', 1: 'double_text', 2: 'down', 3: 'inverse_arrow', 4: 'other', 5: 'right', 6: 'single_text'}
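To make the mapping concrete, here is a small sketch of how a 7-element score vector from a classifier could be turned back into a class name and a confidence value. The `softmax` and `decode` helpers and the example scores are made up for illustration; only the class dictionary comes from the post.

```python
import math
from typing import List, Tuple

CLASSES = {0: 'both', 1: 'double_text', 2: 'down', 3: 'inverse_arrow',
           4: 'other', 5: 'right', 6: 'single_text'}


def softmax(scores: List[float]) -> List[float]:
    """Convert raw scores into probabilities (shifted by the max for stability)."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def decode(scores: List[float]) -> Tuple[str, float]:
    """Return the most likely class name and its probability."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASSES[best], probs[best]


# Hypothetical raw scores for one cell; index 2 ('down') is the largest.
label, confidence = decode([0.1, -1.2, 3.4, 0.0, 0.3, 1.1, -0.5])
print(label, round(confidence, 3))
```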
I manually labeled around 100 cells for each target class. Afterward, I fitted a simple PyTorch CNN model with the following architecture:
    import torch.nn as nn
    import torch.nn.functional as F


    class Net(nn.Module):
        # PyTorch CNN model class
        def __init__(self):
            super(Net, self).__init__()
            # Convolutional feature extractor (no padding, stride 1)
            self.conv1 = nn.Conv2d(3, 6, 3)
            self.pool = nn.MaxPool2d(2, 2)
            self.conv2 = nn.Conv2d(6, 16, 3)
            self.conv3 = nn.Conv2d(16, 32, 5)
            self.conv4 = nn.Conv2d(32, 64, 5)
            self.dropout = nn.Dropout(0.3)
            # Fully connected classifier head; 64*11*11 is the flattened
            # conv output, which is what e.g. a 64x64 input produces
            self.fc1 = nn.Linear(64*11*11, 512)
            self.bnorm1 = nn.BatchNorm1d(512)
            self.fc2 = nn.Linear(512, 128)
            self.bnorm2 = nn.BatchNorm1d(128)
            self.fc3 = nn.Linear(128, 64)
            self.bnorm3 = nn.BatchNorm1d(64)
            self.fc4 = nn.Linear(64, 7)  # one output per target class

        def forward(self, x):
            x = F.relu(self.conv1(x))
            x = self.pool(F.relu(self.conv2(x)))
            x = F.relu(self.conv3(x))
            x = self.pool(F.relu(self.conv4(x)))
            x = x.view(-1, 64*11*11)  # flatten to (batch, features)
            x = self.dropout(x)
            x = F.relu(self.bnorm1(self.fc1(x)))
            x = F.relu(self.bnorm2(self.fc2(x)))
            x = F.relu(self.bnorm3(self.fc3(x)))
            x = self.fc4(x)  # raw class scores (logits)
            return x
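The `64*11*11` in `fc1` ties the network to a particular cell image size, which the post does not state. The arithmetic sketch below follows the standard output-size formulas for unpadded, stride-1 convolutions and 2x2 max pooling to work out which square input sizes are compatible. The helper names are hypothetical, and the resulting sizes are an inference from the layer definitions, not a statement about how the cells were actually resized.

```python
def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output side length of a convolution (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size: int, kernel: int = 2, stride: int = 2) -> int:
    """Output side length of max pooling with floor rounding."""
    return (size - kernel) // stride + 1


def flattened_features(side: int, channels: int = 64) -> int:
    """Number of values entering fc1 for a square input of the given side."""
    side = conv_out(side, 3)   # conv1: 3x3
    side = conv_out(side, 3)   # conv2: 3x3
    side = pool_out(side)      # 2x2 max pool
    side = conv_out(side, 5)   # conv3: 5x5
    side = conv_out(side, 5)   # conv4: 5x5
    side = pool_out(side)      # 2x2 max pool
    return channels * side * side


# Square input sizes whose flattened output matches fc1's 64*11*11 inputs.
compatible = [s for s in range(32, 129) if flattened_features(s) == 64 * 11 * 11]
print(flattened_features(64), compatible)
```

For a 64x64 input the side length shrinks 64 → 62 → 60 → 30 → 26 → 22 → 11, so resizing every cell to 64x64 before classification is one choice that fits the model.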
The resulting model's predictions were reasonably decent and generalized well, even to crossword puzzles of different formats.
Cell metadata extraction
My final step was to extract all metadata from the labeled cells. For this purpose, I first created a classified representation of each image cell in a Pandas DataFrame.
Finally, based on the cell class, I either extracted text from the image using Pytesseract, or I extracted arrow coordinates and direction if the cell was classified as one of the arrow cells.
The resulting output of the script looked like this in JSON format:
{"definitions": [
    {"label": "F Faitune |", "position": [0, 2], "solution": {"startPosition": [0, 3], "direction": "down"}},
    {"label": "anceur", "position": [0, 4], "solution": {"startPosition": [1, 4], "direction": "down"}}
]}
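How classified cells might be combined into this structure can be sketched as follows. Everything beyond the output shape is an assumption: the split into text and arrow classes, the rule that an arrow cell belongs to the text cell directly to its left or, failing that, directly above it, and the `read_text` callable, which stands in for an OCR call such as a wrapper around `pytesseract.image_to_string`. The `both` and `inverse_arrow` classes are left out because the post does not describe how they are handled.

```python
import json
from typing import Callable, Dict, Tuple

Position = Tuple[int, int]  # (row, column)

# Assumed grouping of the classifier's labels.
TEXT_CLASSES = {"single_text", "double_text"}
ARROW_DIRECTIONS = {"down": "down", "right": "right"}


def build_definitions(
    classes: Dict[Position, str], read_text: Callable[[Position], str]
) -> dict:
    """Link every arrow cell to a neighbouring text cell and emit one
    definition per link, in the output format shown above."""
    definitions = []
    for (row, col), cls in sorted(classes.items()):
        direction = ARROW_DIRECTIONS.get(cls)
        if direction is None:
            continue  # not an arrow cell handled by this sketch
        # Assumed linking rule: clue to the left first, then the clue above.
        for clue in ((row, col - 1), (row - 1, col)):
            if classes.get(clue) in TEXT_CLASSES:
                definitions.append({
                    "label": read_text(clue),
                    "position": list(clue),
                    "solution": {"startPosition": [row, col],
                                 "direction": direction},
                })
                break
    return {"definitions": definitions}


# Toy input: class per grid position, plus canned OCR results instead of
# a real OCR call.
classes = {(0, 2): "single_text", (0, 3): "down",
           (0, 4): "single_text", (1, 3): "other", (1, 4): "down"}
canned_ocr = {(0, 2): "F Faitune |", (0, 4): "anceur"}

result = build_definitions(classes, canned_ocr.__getitem__)
print(json.dumps(result, ensure_ascii=False, indent=2))
```

Passing the OCR step in as a callable keeps the linking logic testable without Tesseract installed.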
This work was a great experience for me and a good opportunity to dive into a task that mixes simple OpenCV heuristics with more cutting-edge concepts such as OCR and DNNs for image classification. Thank you for reading!