Teaching Computers to Recognize Human Actions in Videos

PREDICT and CLUSTER: Unsupervised Skeleton Based Action Recognition

By Eli Shlizerman

Photo by bruce mars on Unsplash

Identifying the various actions that people make with their bodies just from watching a video is a natural, simple task for humans. For example, most people could easily identify a subject as, say, “jumping back and forth” or “hitting a ball with their foot.” This remains easy even if the subject in the footage changes or the footage was recorded from a different viewpoint. What if we would like a computer system, or a gaming console such as an Xbox or PlayStation, to be able to do the same? Would that be possible?

For an artificial system, this seemingly basic task does not come as naturally as it does for humans. It requires several layers of artificial intelligence capabilities, such as (i) knowing which specific “features” to track when making decisions, and (ii) the ability to name, or label, a particular action.

With regard to (i), research in visual perception and computer vision has shown that, at least for the human body, the 3D coordinates of the joints, i.e. skeleton features, are sufficient for identifying different actions. Additionally, current robust algorithms, e.g. OpenPose [1], are able to track these features in real time from nearly any video source.

Photo by Sam Sabourin on Unsplash with skeleton features marked
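
To make the idea of skeleton features concrete, here is a minimal sketch of how a tracked skeleton could be stored as a time series. The type names, the toy coordinates, and the optional step of centering every joint on a root joint (index 0 here) are assumptions made for illustration; they are not taken from OpenPose or from the paper.

```python
from typing import List, Tuple

Joint = Tuple[float, float, float]   # (x, y, z) of one joint
Frame = List[Joint]                  # all joints at one time step
Sequence = List[Frame]               # frames ordered in time


def center_on_root(frame: Frame, root_index: int = 0) -> Frame:
    """Express every joint relative to a root joint (hypothetical choice: index 0)."""
    rx, ry, rz = frame[root_index]
    return [(x - rx, y - ry, z - rz) for x, y, z in frame]


def frame_to_vector(frame: Frame) -> List[float]:
    """Flatten J joints x 3 coordinates into one feature vector of length 3 * J."""
    return [coord for joint in frame for coord in joint]


def sequence_to_features(sequence: Sequence, root_index: int = 0) -> List[List[float]]:
    """Turn a skeleton sequence into a T x (3 * J) time series of feature vectors."""
    return [frame_to_vector(center_on_root(f, root_index)) for f in sequence]


# Toy sequence: 2 frames with 3 joints each (values are made up).
toy_sequence: Sequence = [
    [(0.0, 1.0, 2.0), (0.5, 1.5, 2.0), (0.0, 2.0, 2.0)],
    [(0.1, 1.0, 2.0), (0.6, 1.6, 2.0), (0.1, 2.1, 2.0)],
]
features = sequence_to_features(toy_sequence)
print(len(features), len(features[0]))  # 2 time steps, 9 values per step
```

Each action then becomes a sequence of such vectors, which is the kind of input the rest of this article refers to as a collection of points over time.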

Teaching a computer system to make predictive associations between collections of points and actions using these features turns out to be a much more challenging task than selecting the features themselves. This is because the system is expected to group sequences of features into “classes” and subsequently associate these with the names of the corresponding actions.

Skeleton-based action recognition: predictive association between collections of points (time series) and actions

Existing deep learning systems try to learn this type of association through a process called “supervised learning,” where the system learns from several given examples, each with an explanation of the action it represents. This technique also requires camera and depth inputs (RGB+D) at each step. While supervised action recognition has shown promising advancement, it relies on the annotation of a large number of sequences, and the annotation needs to be redone each time a new subject, viewpoint, or action is considered.

Photo by Raymond Rasmusson on Unsplash

It’s of particular interest, then, to instead create systems that attempt to imitate the perceptual ability of humans, who learn to make these associations in an unsupervised way.

In our recent research, entitled “Predict & Cluster: Unsupervised skeleton based action recognition” [2], we developed such an unsupervised system. We proposed that, rather than teaching the computer to catalog the sequences with their actions, the system should instead learn how to predict the sequences through “encoder-decoder” learning. Such a system is fully unsupervised: it operates on the inputs alone and does not require labelling of actions at any stage.

Predict & Cluster: Unsupervised Skeleton Based Action Recognition
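
The point that such a system needs no labels can be illustrated with a small sketch of the training signal: the target of the prediction is the input sequence itself. The use of mean squared error and the two stand-in "models" below are assumptions made for illustration; a real system would use a trained encoder-decoder network in their place.

```python
from typing import Callable, List

Vector = List[float]
Series = List[Vector]  # T time steps, each a feature vector


def reconstruction_error(original: Series, reconstructed: Series) -> float:
    """Mean squared difference between a sequence and its reconstruction."""
    total, count = 0.0, 0
    for x_t, y_t in zip(original, reconstructed):
        for x, y in zip(x_t, y_t):
            total += (x - y) ** 2
            count += 1
    return total / count


def training_signal(series: Series, model: Callable[[Series], Series]) -> float:
    """The sequence is its own target, so no action label is involved."""
    return reconstruction_error(series, model(series))


def perfect_model(series: Series) -> Series:
    """Stand-in for an ideal encoder-decoder: returns the input unchanged."""
    return [list(v) for v in series]


def zero_model(series: Series) -> Series:
    """Stand-in for an untrained model: predicts zeros everywhere."""
    return [[0.0] * len(v) for v in series]


series = [[1.0, 2.0], [3.0, 4.0]]
print(training_signal(series, perfect_model))  # 0.0
print(training_signal(series, zero_model))     # 7.5
```

Training would adjust the model to drive this error down, and at no point does the name of the action enter the computation.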

In particular, the encoder-decoder neural network system learns to encode each sequence into a code, which the decoder then uses to regenerate the same sequence. It turns out that in the process of learning to encode and then to decode, the Seq2Seq deep neural network self-organizes the sequences into distinct clusters. We developed a way to make sure that learning is optimal for creating such an organization (by fixing the weights or states of the decoder), and we developed tools to read this organization and associate each cluster with an action.

Schematics of Predict & Cluster
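
One simple way to read such an organization, sketched here under stated assumptions, is to compare the code of a new sequence with the codes of a few sequences whose actions are known and to take the label of the nearest one. The two-dimensional codes and the action names below are made up for illustration; in a real system the codes would come from the trained encoder, and this is not necessarily the exact procedure used in [2].

```python
import math
from typing import List, Sequence, Tuple

Code = Sequence[float]


def euclidean(a: Code, b: Code) -> float:
    """Straight-line distance between two codes of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_label(query: Code, labelled_codes: List[Tuple[Code, str]]) -> str:
    """Return the action name attached to the code closest to the query."""
    _, best_label = min(labelled_codes, key=lambda item: euclidean(query, item[0]))
    return best_label


# Hypothetical encoder codes that already fall into two clusters.
reference = [
    ((0.9, 0.1), "jumping"),
    ((1.1, 0.0), "jumping"),
    ((-1.0, 0.8), "kicking"),
    ((-0.9, 1.2), "kicking"),
]

print(nearest_label((1.0, 0.2), reference))   # jumping
print(nearest_label((-1.1, 1.0), reference))  # kicking
```

Labels appear here only to name clusters that have already formed; they play no part in training the encoder-decoder.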

We are able to obtain action recognition results that outperform both previous unsupervised and supervised approaches. Our findings pave the way to a novel way of learning any type of action from any set of input features. This might include anything from recognizing actions in the flight patterns of flying insects to identifying malicious actions in internet activity.

For more information, see the overview video below and the paper [2].

With Kun Su and Xiulong Liu.

References

[1] OpenPose: https://github.com/CMU-Perceptual-Computing-Lab/openpose

[2] Su, Kun, Xiulong Liu, and Eli Shlizerman. “Predict & Cluster: Unsupervised skeleton based action recognition.” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020.

