Why the RepNet is so important

Category: IT Technology · Published: 5 years ago

Using Deep Learning to Count Repetitions

Photo by Efe Kurnaz on Unsplash

In our daily lives, repeating actions occur frequently. They range from organic cycles such as heartbeats and breathing, through programming and manufacturing, to planetary cycles like day-night rotation and the seasons.

Recognising these repetitions, for example in videos, calls for a system that can both identify and count them. Think of exercising: how many repetitions are you doing?

The Unsolved Problem

Isolating repeating actions is a difficult task. I know, it seems pretty straightforward when you see someone in front of you jumping up and down, but framing that as a machine learning problem makes it much harder. How do you teach a computer what a jumping jack looks like from every angle? How do you generalise any inference from video?

Previous work in the space analysed videos at a fine-grained level, applying cycle-consistency constraints across different videos of the same action. Reading the paper on that earlier model, you can see that you're basically building a model that compares frames across a collection of videos:

Temporal Cycle-Consistency Learning: [ source ]

In the real world, however, videos come with problems such as camera motion, objects in the field of view that obstruct the scene, and changes in the form of the repeating action, so the model has to compute features that are invariant to such noise. The existing process also required a lot of work to 'densely label data', and it would be far better if an algorithm could learn a sequence from a single video.

That’s where RepNet comes in

RepNet solves the problem of counting repetitions in real-world videos, coping with noise that includes camera motion, obscured views, drastic scale differences and changes in form.

Unlike earlier approaches, which addressed the problem directly by comparing pixel intensities across frames, RepNet works from a single video that contains periodic action and returns the number of repetitions in that video.

A RepNet is composed of three components: a frame encoder, a temporal self-similarity matrix as an intermediate representation, and a period predictor.

Its frame encoder generates embeddings by feeding each frame of the video through the encoder of the ResNet architecture.

Then the temporal self-similarity matrix (TSM) can be calculated by comparing each frame with every other frame in the video.
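To make the comparison step concrete, here is a minimal sketch in plain Python. It assumes the similarity between two frames is the negative squared Euclidean distance between their embeddings, followed by a row-wise softmax. That choice, the `temperature` parameter and the toy two-dimensional embeddings are illustrative assumptions, not details taken from this article.

```python
import math

def self_similarity_matrix(embeddings, temperature=1.0):
    """Return a row-softmaxed temporal self-similarity matrix.

    embeddings: list of per-frame embedding vectors (lists of floats).
    Similarity is the negative squared Euclidean distance (an assumption).
    """
    n = len(embeddings)
    tsm = []
    for i in range(n):
        # Raw similarity of frame i to every frame j.
        sims = [
            -sum((a - b) ** 2 for a, b in zip(embeddings[i], embeddings[j]))
            / temperature
            for j in range(n)
        ]
        # Row-wise softmax, shifted by the row maximum for numerical stability.
        peak = max(sims)
        exps = [math.exp(s - peak) for s in sims]
        total = sum(exps)
        tsm.append([e / total for e in exps])
    return tsm

# Toy "video": 8 frames whose 2-D embeddings repeat every 4 frames.
frames = [
    [math.cos(2 * math.pi * t / 4), math.sin(2 * math.pi * t / 4)]
    for t in range(8)
]
tsm = self_similarity_matrix(frames)
```

In this toy case, frame 0 is as similar to frame 4 as it is to itself, so the repetition shows up as bright stripes parallel to the main diagonal of the matrix.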

The result is a matrix that is easy for subsequent modules to analyse when counting repetitions. Transformers are then applied directly to the sequence of similarities in the TSM to predict the period.
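To see why a self-similarity matrix is convenient to analyse, consider the hand-written heuristic below. It is not RepNet's learned Transformer predictor, only a simple stand-in: it picks the frame lag whose off-diagonal has the highest mean similarity. The function name `estimate_period` and the toy matrix are hypothetical.

```python
def estimate_period(tsm):
    """Return the lag whose off-diagonal has the highest mean similarity.

    A simple stand-in heuristic, not the learned predictor used by RepNet.
    """
    n = len(tsm)
    best_lag, best_score = None, float("-inf")
    for lag in range(1, n // 2 + 1):
        # Similarity of every frame to the frame `lag` steps later.
        diagonal = [tsm[i][i + lag] for i in range(n - lag)]
        score = sum(diagonal) / len(diagonal)
        if score > best_score:  # strict ">" keeps the smallest lag on ties
            best_lag, best_score = lag, score
    return best_lag

# Hypothetical 12-frame TSM in which frames match every 4 steps.
toy_tsm = [
    [1.0 if (i - j) % 4 == 0 else 0.0 for j in range(12)]
    for i in range(12)
]
period = estimate_period(toy_tsm)
```

A fixed rule like this breaks down once the speed of the action changes mid-video, which is one reason a learned predictor is the better fit.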

Once the period is obtained, the per-frame count follows from dividing the number of frames captured in a periodic segment by the period length.

An advantage of this representation is that the model's interpretability is baked into the network architecture: the network is forced to predict the period from the self-similarity matrix only, rather than inferring it from latent high-dimensional features (such as from the frames themselves).
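The arithmetic can be sketched as follows, reading the step as: every frame inside a periodic segment contributes 1 / period to the count, and the contributions are summed over the video. The summation, the function name `count_repetitions` and the example predictions are assumptions made for illustration.

```python
def count_repetitions(period_lengths, periodicity):
    """Sum per-frame counts: each periodic frame contributes 1 / period.

    period_lengths: predicted period (in frames) for each frame.
    periodicity: True where the frame belongs to a repeating segment.
    """
    per_frame = [
        (1.0 / p) if is_periodic and p > 0 else 0.0
        for p, is_periodic in zip(period_lengths, periodicity)
    ]
    return per_frame, sum(per_frame)

# Hypothetical predictions for a 12-frame clip: 8 periodic frames with a
# period of 4 frames, followed by 4 frames with no repetition.
period_lengths = [4] * 8 + [0] * 4
periodicity = [True] * 8 + [False] * 4
per_frame, total = count_repetitions(period_lengths, periodicity)
```

Here 8 periodic frames divided by a period of 4 frames gives 2 repetitions, and the non-periodic tail adds nothing.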

Temporal Self Similarity: [ Source ]

Note that this architecture also lets the model handle changes in the speed of the repetition, as well as anything that obscures the repeating sequence (e.g. a video that rotates whilst showing a repeated task). That matters because it shows a model that is generalising. Models that can generalise can be applied to a much wider array of problems, which is a big step forward in ML.

You can use the following resource for more information, including a downloadable pre-trained RepNet: source

