Preparation Guide for Video Classification
To help you learn new skills and compete for prize money online while working from home, we at MathWorks are launching a data science competition.
Teaser: The Datathon will be live in May. Sign up for a DrivenData account to receive the launch announcement. Request complimentary MATLAB licenses here: Advance Alzheimer’s Research with Stall Catchers
The dataset will consist of image stacks (3D images) taken from a live mouse brain, showing blood vessels and blood flow. Each stack will have an outline drawn around a target vessel segment and will be converted to an .mp4 video file. The task is to classify the target vessel segment as either flowing or stalled. The challenge will be online, globally accessible, and free to enter. You can use any approach to solve the problem.
In this story, I will talk about the concepts and methods I learned while setting up this problem. I will also point you to documents you can refer to as you start preparing for the challenge.
Working with Data
Video Data
Working with videos is an extension of working with images: in addition to the static content of each image, we must consider the dynamic nature of the video. A video can be defined as a stack of images, also called frames, arranged in a specific order. Each frame is meaningful on its own, but the order matters too. Hence both the spatial and the temporal content of the frames need to be captured.
So, the first step is extracting frames from the video. Make sure the extracted frames keep their original order, so that a model can still learn the sequence and reason about how the scene changes over time.
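As a minimal sketch of order-preserving frame selection (written in Python purely for illustration, not MATLAB and not the challenge's benchmark code), the helper below picks evenly spaced frame indices in increasing order. The function name `sample_frame_indices` and the clip length are assumptions made for this example.

```python
def sample_frame_indices(num_frames, clip_length):
    """Return up to `clip_length` evenly spaced frame indices in increasing order."""
    if num_frames <= 0 or clip_length <= 0:
        raise ValueError("num_frames and clip_length must be positive")
    if clip_length >= num_frames:
        # Short video: keep every frame rather than repeating any.
        return list(range(num_frames))
    # Place one index at the centre of each of `clip_length` equal segments.
    step = num_frames / clip_length
    return [int(step * i + step / 2) for i in range(clip_length)]


indices = sample_frame_indices(num_frames=100, clip_length=5)
print(indices)  # [10, 30, 50, 70, 90]
```

Because the indices are always increasing, the frames read at those positions stay in temporal order, which is what any sequence model downstream relies on.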
Process Data
Another challenge in working with videos is the large size of the dataset. In MATLAB, you can use a datastore to create a repository for collections of data that are too large to fit in memory. A datastore allows you to read and process data stored in multiple files on a disk, a remote location, or a database as a single entity.
Documents to refer to:
- Understand the concept of datastore: Getting Started with Datastore
- Create datastores for images, text, audio, files, and more: Datastore for different File Format or Application
- Use built-in datastores directly as input for a deep learning network: Datastores for Deep Learning
- Implement a custom datastore for file-based data: Develop Custom Datastore
- The data for the challenge will be stored in AWS, so learn how to access data from an S3 bucket
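To make the datastore idea concrete, here is a small conceptual sketch in Python. It is not MATLAB's datastore API; the class name `FileBatchStore` and its methods are hypothetical and only loosely mirror the `hasdata`/`read`/`reset` pattern. It treats many files as one collection while holding only one batch in memory at a time.

```python
import os
import tempfile


class FileBatchStore:
    """Treat many files as one collection, loading only one batch at a time."""

    def __init__(self, paths, batch_size):
        self.paths = sorted(paths)   # fixed, reproducible order
        self.batch_size = batch_size
        self.position = 0            # index of the next unread file

    def has_data(self):
        return self.position < len(self.paths)

    def read(self):
        """Load and return the next batch of file contents as bytes."""
        batch_paths = self.paths[self.position:self.position + self.batch_size]
        self.position += len(batch_paths)
        batch = []
        for path in batch_paths:
            with open(path, "rb") as handle:
                batch.append(handle.read())
        return batch

    def reset(self):
        self.position = 0


def demo():
    # Stand-in data: five tiny files instead of real .mp4 videos.
    with tempfile.TemporaryDirectory() as folder:
        paths = []
        for i in range(5):
            path = os.path.join(folder, f"clip_{i}.bin")
            with open(path, "wb") as handle:
                handle.write(bytes([i]) * 4)
            paths.append(path)

        store = FileBatchStore(paths, batch_size=2)
        batch_sizes = []
        while store.has_data():
            batch_sizes.append(len(store.read()))
        return batch_sizes


print(demo())  # [2, 2, 1]
```

The calling loop never needs to know how many files exist or where they live, which is the property that makes this pattern useful for datasets that do not fit in memory.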
Video Classification Methods
Once the data is ready, you can use any of the five methods below to proceed with classification. I will cover the most commonly used video classification methods, from a basic non-deep-learning approach to more advanced ones. I would encourage you to use the deep learning approaches, given the size of the data and the need to extract features from each frame in a timely manner.
Classical Computer Vision Methods
Method 1: Optical Flow, Object Tracking & Cascade Classifier
Optical flow, activity recognition, motion estimation, and tracking are key techniques you can use to identify objects in a video and follow their movement across adjacent frames.
Resources to refer to:
- To learn how to implement optical flow using the Horn-Schunck, Farneback, and Lucas-Kanade methods, check out this tutorial video: Computer Vision Training, Motion Estimation
- More examples and documentation for Tracking & Motion Estimation
- To learn object tracking using histogram-based tracking, and tracking occluded or hidden objects using a Kalman filter, check out this tutorial video: Computer Vision Training, Object Tracking
- Example showing how to perform automatic detection and motion-based tracking of moving objects in a video: Motion-Based Multiple Object Tracking
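To give a feel for what an optical flow estimate is, the sketch below implements the core of the Lucas-Kanade idea for a single window, in plain Python rather than MATLAB. It assumes one constant displacement for the whole window, and the synthetic frames from `make_frame` are an assumption of this example, not challenge data. Real toolboxes add image pyramids, per-pixel windows, and smoothing.

```python
import math


def make_frame(width, height, shift_x=0.0, shift_y=0.0):
    """Smooth synthetic image, optionally translated by (shift_x, shift_y)."""
    return [
        [math.sin(0.3 * (x - shift_x)) + math.cos(0.2 * (y - shift_y))
         for x in range(width)]
        for y in range(height)
    ]


def lucas_kanade(frame1, frame2):
    """Estimate a single (u, v) displacement for the whole window."""
    height, width = len(frame1), len(frame1[0])
    sxx = sxy = syy = sxt = syt = 0.0
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Spatial gradients: central differences averaged over both frames.
            ix = (frame1[y][x + 1] - frame1[y][x - 1]
                  + frame2[y][x + 1] - frame2[y][x - 1]) / 4.0
            iy = (frame1[y + 1][x] - frame1[y - 1][x]
                  + frame2[y + 1][x] - frame2[y - 1][x]) / 4.0
            it = frame2[y][x] - frame1[y][x]   # temporal gradient
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
            sxt += ix * it
            syt += iy * it
    # Solve the 2x2 normal equations [sxx sxy; sxy syy] [u; v] = -[sxt; syt].
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        raise ValueError("window has too little texture to estimate flow")
    u = (-sxt * syy + syt * sxy) / det
    v = (-syt * sxx + sxt * sxy) / det
    return u, v


frame_a = make_frame(24, 24)
frame_b = make_frame(24, 24, shift_x=0.5, shift_y=0.25)
u, v = lucas_kanade(frame_a, frame_b)
print(round(u, 2), round(v, 2))
```

The estimate should land close to the true shift of (0.5, 0.25) pixels. For the stalled-versus-flowing task, the magnitude of such motion inside the outlined vessel is the kind of signal a classical pipeline could build on.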
Another approach is to use local features of an image, such as blobs, corners, and edge pixels. The cascade classifier supports local features such as Haar, local binary patterns (LBP), and histograms of oriented gradients (HOG).
Resources to refer to:
- Computer Vision using Features
- Train a Cascade Object Detector
- Image Retrieval with Bag of Visual Words
- Image Classification with Bag of Visual Words
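As an illustration of one of the local features mentioned above, here is a minimal Python sketch of a basic 3x3 local binary pattern. The neighbour ordering (clockwise from the top-left) and the `>=` comparison are conventions assumed for this example; libraries differ on both.

```python
def lbp_code(image, row, col):
    """8-bit local binary pattern code for the pixel at (row, col)."""
    center = image[row][col]
    # Neighbours visited clockwise, starting at the top-left corner.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        if image[row + dr][col + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(image):
    """Histogram of LBP codes over all interior pixels (a 256-bin feature)."""
    histogram = [0] * 256
    for row in range(1, len(image) - 1):
        for col in range(1, len(image[0]) - 1):
            histogram[lbp_code(image, row, col)] += 1
    return histogram


patch = [
    [9, 9, 9],
    [1, 5, 1],
    [1, 1, 1],
]
print(lbp_code(patch, 1, 1))  # 7: only the three top neighbours are >= the centre
```

The histogram of these codes over a region is the texture descriptor that a classifier would then consume.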
Deep Learning Methods
Method 2: Convolutional Neural Network (CNN) + Long short-term memory network (LSTM)
In this method, you convert the videos to sequences of feature vectors, using a pre-trained convolutional neural network to extract features from each frame. Then you train a long short-term memory (LSTM) network on the sequences to predict the video labels. As a final step, you combine layers from both networks to assemble a final network that classifies videos directly.
To learn the steps of this complete workflow, see this document: Classify Videos Using Deep Learning
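To show how an LSTM consumes a sequence of per-frame feature vectors, here is a small forward-pass sketch in plain Python. It rests on stated assumptions: the feature vectors are random stand-ins for CNN outputs, the weights are random and untrained, and the names (`make_lstm_weights`, `lstm_forward`) are made up for this example. It is not the workflow from the linked document.

```python
import math
import random


def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))


def make_lstm_weights(input_size, hidden_size, rng):
    """Random (untrained) weights for the input, forget, cell, and output gates."""
    def matrix(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
    return {gate: {"w": matrix(hidden_size, input_size),
                   "r": matrix(hidden_size, hidden_size),
                   "b": [0.0] * hidden_size}
            for gate in ("i", "f", "g", "o")}


def gate_preactivation(params, x, h):
    """Compute W x + R h + b for one gate."""
    return [sum(w * xv for w, xv in zip(w_row, x))
            + sum(r * hv for r, hv in zip(r_row, h))
            + bias
            for w_row, r_row, bias in zip(params["w"], params["r"], params["b"])]


def lstm_forward(sequence, weights, hidden_size):
    """Run an LSTM over a sequence of feature vectors; return the last hidden state."""
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    for x in sequence:  # frames are consumed in temporal order
        i = [sigmoid(a) for a in gate_preactivation(weights["i"], x, h)]
        f = [sigmoid(a) for a in gate_preactivation(weights["f"], x, h)]
        g = [math.tanh(a) for a in gate_preactivation(weights["g"], x, h)]
        o = [sigmoid(a) for a in gate_preactivation(weights["o"], x, h)]
        c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
        h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
    return h


rng = random.Random(0)
# 12 frames, each summarised by an 8-dimensional stand-in feature vector.
features = [[rng.random() for _ in range(8)] for _ in range(12)]
weights = make_lstm_weights(input_size=8, hidden_size=4, rng=rng)
video_embedding = lstm_forward(features, weights, hidden_size=4)
print(len(video_embedding))  # 4
```

In a trained network, the final hidden state would feed a fully connected layer and a softmax to produce the video label. Reversing the frame order changes the result, which is the temporal sensitivity a per-frame classifier lacks.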
Method 3: Large-scale video classification with CNN
If video classification is like image classification, why not just use a convolutional neural network?
To answer this, recall the temporal component of video that I talked about earlier. You can use a CNN to capture both the temporal and spatial aspects, but you need to structure the network differently.
This paper from Stanford, Large-scale Video Classification with Convolutional Neural Networks, discusses the challenges of using a basic CNN for videos. It then describes the different CNN models you can use to fuse features from multiple frames.
Method 4: Two-stream CNN
Another approach, explained by the researchers in the paper Two-Stream Convolutional Networks for Action Recognition in Videos, uses two ConvNets: one for the spatial aspect and one for the temporal aspect.
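Once each stream has produced class scores, they have to be combined. The Python sketch below shows the simplest option, averaging the softmax probabilities of the two streams. The raw scores are hypothetical numbers chosen for illustration, and this covers only the fusion step, not the two networks themselves.

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_two_streams(spatial_scores, temporal_scores):
    """Late fusion: average the class probabilities of the two streams."""
    spatial = softmax(spatial_scores)
    temporal = softmax(temporal_scores)
    return [(s + t) / 2.0 for s, t in zip(spatial, temporal)]


CLASSES = ["flowing", "stalled"]
# Hypothetical raw scores from each stream for one video.
fused = fuse_two_streams(spatial_scores=[0.2, 0.1], temporal_scores=[-1.0, 2.0])
prediction = CLASSES[fused.index(max(fused))]
print(prediction)  # stalled
```

Here the spatial stream is nearly undecided while the temporal stream is confident, so the fused prediction follows the motion evidence.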
Documents to refer to when developing a CNN architecture in MATLAB:
- Define Custom Deep Learning Layers
- Specify Layers of Convolutional Neural Network
- Set Up Parameters and Train Convolutional Neural Network
- Options for training deep learning neural network
- Deep Learning Tips and Tricks
Method 5: Using a 3D convolution network
3D ConvNets are a natural choice for video classification since they inherently apply convolutions and max pooling in 3D space. In the paper Learning Spatiotemporal Features with 3D Convolutional Networks, researchers propose C3D (Convolutional 3D), which offers compact features and efficient computation.
Documents to refer to:
- Design a 3D ConvNet in MATLAB using functions like image3dInputLayer, convolution3dLayer, and maxPooling3dLayer
- Design the network using Deep Network Designer
- Check the complete list of Deep Learning layers in MATLAB here: List of Deep Learning Layers
- Example of working with 3-D medical images: 3-D Brain Tumor Segmentation Using Deep Learning
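To see what a 3D convolution does to a video volume, here is a naive single-channel sketch in plain Python. It assumes no padding and a stride of 1, and like most deep learning layers it computes a cross-correlation (the kernel is not flipped). It is meant to show the sliding window over time as well as space, not to stand in for the MATLAB layers listed above.

```python
def conv3d_valid(volume, kernel):
    """Naive single-channel 3D convolution (cross-correlation), no padding, stride 1."""
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    kd, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for z in range(depth - kd + 1):
        plane = []
        for y in range(height - kh + 1):
            row = []
            for x in range(width - kw + 1):
                total = 0.0
                for dz in range(kd):
                    for dy in range(kh):
                        for dx in range(kw):
                            total += volume[z + dz][y + dy][x + dx] * kernel[dz][dy][dx]
                row.append(total)
            plane.append(row)
        out.append(plane)
    return out


# A 4-frame, 5x5 "video" of ones and a 3x3x3 averaging kernel.
video = [[[1.0] * 5 for _ in range(5)] for _ in range(4)]
kernel = [[[1.0 / 27] * 3 for _ in range(3)] for _ in range(3)]
result = conv3d_valid(video, kernel)
print(len(result), len(result[0]), len(result[0][0]))  # 2 3 3
```

Each output value mixes information from three consecutive frames, so temporal structure is captured in the very first layer rather than being added afterwards.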
Next Steps
If you do not have a MATLAB license, start your preparation by requesting complimentary MATLAB licenses here: Advance Alzheimer’s Research with Stall Catchers.
Stay tuned for further updates in my next blog post in May, on the competition launch day. That post will include the benchmark code for the problem along with all other details.
Feel free to leave your feedback or any questions in the comments below.