The why/when/what/where/which of CV datasets in the age of AI
Just like we need material such as textbooks/blogs/videos to learn new skills and test our knowledge, machine learning algorithms need datasets to do the same thing.
The choice of a dataset is crucial. It’s precisely what stands between an outstanding machine learning model and just another experiment.
There are plenty of excellent articles about text-based datasets. Over the past years of lecturing on computer vision topics, I have noticed students struggling to get their heads around the why/when/what/where/which of computer vision datasets.
So here’s the primer I usually give to those getting started:
- Why do we need a dataset?
- When do we need a dataset?
- What do we measure?
- Which datasets are available?
- Where do we find datasets?
Let’s start.
1- Why do we need a dataset?
By definition, a dataset is a collection of related examples that are used to train and test a model. This can be a selection of examples belonging to a particular topic or domain, and a dataset generally aims to cater to one or more applications. A dataset may be labelled, and therefore ideal for training and testing supervised models. However, there are also unlabelled datasets that are used to train unsupervised models.
Train and Test
From a machine learning perspective, we need datasets to train models and subsequently test them. This process requires us to choose a part of the dataset (e.g. 70% of it) and ‘show’ it to the machine learning algorithm for learning purposes. We then take the remaining unseen examples (e.g. the remaining 30%) and use them to test how well the model learned. It is crucial that we don’t test with examples that were already used for training: the model would be predicting something it already knows, so the test score would be misleadingly optimistic and would hide overfitting (a model memorising its training data instead of learning patterns that generalise). A model evaluated this way is likely to fail once it is used on different data. There are various methods for organising the train-test split, and you can take a look at these examples.
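As a minimal sketch of the idea, here is a 70/30 split with scikit-learn. The dataset and classifier choice here are illustrative assumptions, not anything prescribed by the article:

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)  # small built-in image dataset

# Hold out 30% of the examples; the model never sees them during training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Scoring on the held-out set estimates how well the model generalises.
print("Test accuracy:", model.score(X_test, y_test))
```

Scoring the same model on `X_train` instead would illustrate exactly the trap described above: a flattering number that says nothing about unseen data.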
Benchmarking
Datasets also serve as a measurement tool when it comes to the performance of machine learning techniques. A selection of models performing the same task needs to be compared fairly. This is carried out by running the different methods on the same range of datasets, so the performance measurements are directly comparable and allow for a neat comparison of results.
Ali Borji carried out and published an outstanding set of benchmarking exercises on Saliency techniques. These are some of his papers that I recommend to my students:
- Salient object detection: A survey (2019)
- Revisiting Video Saliency: A Large-scale Benchmark and a New Model (2018)
- Salient Object Detection: A Benchmark (2015)
Sidenote: Understand Bias
Bias is a vast topic within itself. There are some critical matters that we need to keep in mind.
Just like any other source of information, datasets carry within them an inherent level of bias.
This might not necessarily have negative implications, especially if you want your model to survive the test of relevance in an already biased world. However, it’s very important that we are aware of any bias and measure its implications.
2- When do we need a dataset?
The aim of this article is not to focus on specific computer vision techniques. However, I’ll quickly walk you through a selection of topics and highlight the need for a dataset.
Object Detection and Recognition
Object detection deals with identifying and locating objects of certain classes in an image. The object localisation can be interpreted in various ways. A commonly used approach in dataset annotation is drawing a bounding box or polygon around the object, as discussed below. Such an annotation allows the dataset to be used for detection. The same dataset can then be used for recognition if every annotation is accompanied by a label. Once objects are outlined, the annotation can also mark every pixel in the image that belongs to the object (segmentation).
Object Segmentation
Segmentation is the process of partitioning an image into multiple segments (sets of pixels) that correspond to a specific region or object. This can be achieved with classical thresholding techniques such as Otsu’s method.
Segmentation can also make use of features. Modern approaches use deep learning, where models are trained over datasets containing thousands of pixel-level annotated labels. These approaches include semantic segmentation (region selection accompanied by a label) and instance segmentation (semantic segmentation that identifies multiple separate objects per class).
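As a quick illustration of the classical route, here is Otsu’s thresholding with OpenCV. The input file name is an assumption; any image that converts to grayscale will do:

```python
import cv2

# Hypothetical input image; replace with your own file.
image = cv2.imread("coins.jpg", cv2.IMREAD_GRAYSCALE)

# Otsu's method automatically picks the threshold that best separates
# the foreground and background intensity distributions.
threshold, mask = cv2.threshold(
    image, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

print("Otsu threshold:", threshold)
cv2.imwrite("coins_mask.png", mask)
```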
Visual Saliency
Visual saliency is a less popular area in computer vision that answers the following question: which part of the image attracts the most attention? Saliency detection techniques receive a colour image as input and return an 8-bit saliency map in which brighter pixel values (up to 255) indicate more salient pixels. Visual saliency is used in different applications ranging from data compression to product placement and image manipulation. Datasets such as the MSRA10K featured below provide a binary image as ground truth, indicating which pixels are salient and which are not.
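For a feel of what a saliency map looks like, here is a sketch using the classical spectral residual detector from OpenCV’s contrib saliency module. An assumption: this needs the opencv-contrib-python package, and the file name is illustrative:

```python
import cv2

image = cv2.imread("scene.jpg")  # hypothetical input image

# Classical spectral-residual saliency; needs no training or dataset.
detector = cv2.saliency.StaticSaliencySpectralResidual_create()
success, saliency_map = detector.computeSaliency(image)

# computeSaliency returns floats in [0, 1]; scale to an 8-bit map
# so that brighter pixels (up to 255) mark more salient regions.
saliency_8bit = (saliency_map * 255).astype("uint8")
cv2.imwrite("scene_saliency.png", saliency_8bit)
```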
3- What do we measure?
The type and quality of annotations available in a dataset are crucial to its relevance. In this section, I’ll quickly walk you through the main types of annotations. Credit goes to @jiayin_Supahands for her neat outline of this aspect, and I encourage you to read her article. Here, I’m only giving an overview of the most commonly used annotations and their relation to the topic.
Bounding Boxes
The bounding box approach is the simplest type of annotation and naturally involves drawing a rectangular box around an object of interest. It is generally defined by a pair of coordinates and the corresponding width and height. The bounding box often needs to be accompanied by a label if used for classification or recognition. The main drawback of a bounding box is that any background pixels caught inside it are labelled in the same way as the target object’s pixels. From an error-metric perspective, it can therefore be helpful for tracking recall but weak for precision, hence generating the need for something more specific.
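Predicted and ground-truth boxes are typically compared with the intersection-over-union (IoU) metric. A minimal sketch, assuming boxes in the (x, y, width, height) format described above:

```python
def iou(box_a, box_b):
    """Intersection over union of two (x, y, width, height) boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b

    # Corners of the overlapping rectangle, if the boxes overlap at all.
    x1, y1 = max(ax, bx), max(ay, by)
    x2, y2 = min(ax + aw, bx + bw), min(ay + ah, by + bh)
    inter = max(0, x2 - x1) * max(0, y2 - y1)

    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

# A detection is commonly accepted as correct when IoU >= 0.5.
print(iou((10, 10, 100, 50), (20, 15, 100, 50)))
```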
Polygons
The limitation of bounding boxes brings along the need for something more precise: polygon annotation. The idea is similar to the bounding box, but the polygon allows for better pixel precision by reducing the number of background pixels being mislabelled. A tool such as LabelMe is required for such an annotation. LabelMe is an open-source online annotation tool for building image databases for computer vision research, and it also offers its own datasets.
Line Annotations
As the name implies, this approach uses lines to annotate specific regions in an image. Lines are useful in situations where a bounding box would take up a substantial area of pixels. Lane detection is an obvious use case for this kind of annotation, as sketched below; it can also be used for monitoring queues and for quality-control scenarios.
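To make the lane-detection example concrete, here is a rough sketch of finding line-like structures with OpenCV’s probabilistic Hough transform. The file name and all thresholds are illustrative assumptions:

```python
import cv2
import numpy as np

frame = cv2.imread("road.jpg")  # hypothetical dashcam frame
edges = cv2.Canny(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY), 50, 150)

# Each detected segment comes back as (x1, y1, x2, y2) -- exactly the
# kind of line annotation a lane-detection dataset would store.
lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=50,
                        minLineLength=100, maxLineGap=10)
if lines is not None:
    for x1, y1, x2, y2 in lines[:, 0]:
        cv2.line(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
cv2.imwrite("road_lines.png", frame)
```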
Point Annotations
These annotations specify groups of keypoints on an image, often carrying a semantic connotation. The approach is very commonly used for pose estimation and facial recognition: the geometrical relationships between different points are used as features, and machine learning algorithms are trained on those features. This approach was used in our recent work titled “Detecting abnormal human behaviour through a video generated model”, published in 2019.
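As a small sketch of turning keypoints into such geometrical features, here are pairwise distances computed with NumPy. The keypoint values are made up for illustration:

```python
import numpy as np

# Hypothetical (x, y) keypoints, e.g. from a pose estimator.
keypoints = np.array([[120, 80], [135, 78], [128, 110], [126, 150]])

# Pairwise Euclidean distances between keypoints form a simple,
# translation-invariant feature vector for a downstream classifier.
diffs = keypoints[:, None, :] - keypoints[None, :, :]
distances = np.linalg.norm(diffs, axis=-1)
i, j = np.triu_indices(len(keypoints), k=1)
features = distances[i, j]

print(features)  # one value per unordered pair of keypoints
```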
4- Which datasets are available?
Well, plenty :)
There are dozens of remarkable computer vision datasets that were crucial to the development of models that are changing the world. In this section, I am focusing on a selection of landmark datasets that every computer vision professional should know about.
Image-Net
Official Website : http://www.image-net.org/
Image-Net is the legendary computer vision dataset that contributed to the rise of deep learning. It is an image database organised according to the WordNet hierarchy, where each meaningful concept in WordNet, possibly described by multiple words, is called a “synonym set” or “synset”. Image-Net is generally used for object classification/recognition. The dataset contains a total of 14,197,122 images, 1,034,908 of which have bounding box annotations.
This dataset gained its popularity through the Image-Net competition, through which deep learning gained traction after AlexNet won in 2012. The project was founded by Fei-Fei Li, and she shared the remarkable journey behind this dataset in the TED talk I’m featuring below:
No matter how experienced you think you are in computer vision, I strongly advise you to invest some time in listening to this inspirational talk. Even though techniques have advanced since its release in 2015, the mindset and humbleness presented in this video are still highly relevant.
MNIST
- Original Numbers MNIST: http://yann.lecun.com/exdb/mnist/
- Fashion MNIST: https://github.com/zalandoresearch/fashion-mnist
The original MNIST dataset, led by Yann LeCun, consists of a large volume of handwritten digit images. It served the vital role of providing a much-needed, easy-access benchmark for early convolutional neural networks. By 2017, CNNs were consistently achieving outstanding accuracy (over 99%) on MNIST, and the need for a more challenging benchmark arose. This served as the motivation for the Fashion-MNIST dataset. The latter includes a training set of 60,000 examples and a test set of 10,000 examples, where every example is a 28x28px grayscale image of a fashion item from one of 10 classes.
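Fashion-MNIST ships with Keras, so inspecting those splits takes only a few lines. A sketch, assuming TensorFlow is installed:

```python
from tensorflow.keras.datasets import fashion_mnist

# Downloads the dataset on first use and returns the standard splits.
(x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()

print(x_train.shape)  # (60000, 28, 28): 60,000 28x28px training images
print(x_test.shape)   # (10000, 28, 28): 10,000 test images
print(set(y_train))   # 10 class labels, 0-9
```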
CIFAR-10
Official Website : https://www.cs.toronto.edu/~kriz/cifar.html
This dataset was released by the Canadian Institute For Advanced Research (CIFAR) and probably gained some of its popularity through the involvement of Geoffrey Hinton and his associates. The CIFAR-10 dataset contains 60,000 32x32px colour images in 10 different classes. It is used for training and testing object recognition models.
COCO
Official Website : http://cocodataset.org/
The Common Objects in Context (COCO) dataset is an object detection, segmentation, and captioning dataset. The COCO 2017 release has a combined training and validation collection of 123,287 images containing a total of 886,284 instances, spread over 80 object categories.
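COCO annotations are usually read through the official pycocotools API. A minimal sketch, assuming the 2017 annotation file has already been downloaded to the path shown:

```python
from pycocotools.coco import COCO

# Hypothetical local path to the downloaded 2017 annotations.
coco = COCO("annotations/instances_val2017.json")

# Look up a category, then the images and instances annotated with it.
cat_ids = coco.getCatIds(catNms=["dog"])
img_ids = coco.getImgIds(catIds=cat_ids)
ann_ids = coco.getAnnIds(imgIds=img_ids[0], catIds=cat_ids)

# Each annotation carries a bounding box plus segmentation data.
for ann in coco.loadAnns(ann_ids):
    print(ann["bbox"], ann["category_id"])
```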
Face2Text
Official Website : https://rival.research.um.edu.mt/
There is a significant number of datasets covering different sorts of facial data. Here, I chose to feature a new and innovative dataset compiled by my colleagues at the University of Malta. Unlike other facial detection or recognition datasets, this one is annotated with descriptive text. This allows machine learning models to be trained to return a textual description of a face given just an image. The full details of the publication introducing this dataset can be found here, and the dataset itself may be acquired by filling in the contact form on the project’s official website.
MSRA10K
Official Website : https://mmcheng.net/msra10k/
This is a salient-object image database. Every image in this dataset has a mask for the most salient region in the image. The MSRA10K dataset gained its relevance from the volume of images it contains: 10,000 colour images, each with a corresponding binary mask for the salient object.
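Ground-truth masks like MSRA10K’s make saliency evaluation straightforward. One common score is the mean absolute error between a predicted map and the mask; a sketch with hypothetical file names:

```python
import cv2
import numpy as np

# Hypothetical predicted saliency map and MSRA10K-style binary mask.
pred = cv2.imread("prediction.png", cv2.IMREAD_GRAYSCALE) / 255.0
mask = cv2.imread("ground_truth.png", cv2.IMREAD_GRAYSCALE) / 255.0

# Mean absolute error: 0 means a perfect match with the ground truth.
mae = np.mean(np.abs(pred - mask))
print("MAE:", mae)
```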
MSR 3D
Official Website : https://www.microsoft.com/en-us/download/details.aspx?id=52358
The Microsoft Research (MSR) 3D dataset includes a sequence of 100 images (colour and depth) captured from 8 cameras, showing breakdancing and ballet scenes. Every frame of each scene has a colour image and a high-quality grayscale depth image captured by an infrared camera.
COTS
Official Website : www.cotsdataset.info
This is a dataset I carefully designed and built last year to evaluate image manipulation techniques. One such application is inpainting, where an object is removed from an image. Inpainting techniques are usually evaluated using subjective or opinion-based approaches because datasets tend to lack adequate ground truth. This served as the motivation behind this dataset, which consists of a series of progressive scenes as demonstrated below. Further details about this dataset and the experience behind its construction will be shared in separate work.
5- Where do we find datasets?
In academia, you’ll typically come across datasets in peer-reviewed publications about your topic of interest. However, sometimes you just need to browse your options, and for that you need a good platform. Here are my 4 favourite sources:
Google Dataset Search
- Pros : Very extensive
- Cons : Easy to get lost when comparing different datasets.
VisualData
- Pros : Focused on Computer Vision datasets, Excellent interface, easy to use and quick to get to direct repositories.
- Cons : Still limited in terms of selection of available datasets.
Kaggle
- Pros : Variety of datasets across different domains, active community, competitions.
- Cons : Can take longer to see what each dataset offers.
Tensorflow
- Pros : An extensive selection of straight to the point pages for every dataset. Every dataset is also accompanied by excellent usage resources.
- Cons : For completeness, I need to squeeze out a drawback. In this case, the disadvantage is that (obviously) this website only provides Tensorflow resources.
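As an example of how direct those catalogue pages are in practice, TensorFlow Datasets loads any listed dataset by name. A sketch, assuming the tensorflow-datasets package is installed:

```python
import tensorflow_datasets as tfds

# Any dataset name from the catalogue works the same way.
ds, info = tfds.load("fashion_mnist", split="train", with_info=True)

print(info.features)        # image/label types and shapes
for example in ds.take(1):  # a tf.data.Dataset of dict examples
    print(example["image"].shape, example["label"])
```

That one-line loading convenience is precisely the “straight to the point” quality mentioned above.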