Background Matting: The World is Your Green Screen

Using deep learning and GANs to enable professional quality background replacement from your own home

Apr 9 · 7 min read

Do you wish that you could make professional quality videos without a full studio? Or that Zoom’s virtual background function worked better during your video conferences?

Our recently published paper [1] in CVPR 2020 provides a new and easy method to replace your background for a wide variety of applications. You can do this at home in everyday settings, using a fixed or handheld camera. Our method is also state-of-the-art and gives outputs comparable to professional results. In this article we walk through the motivation, technical details, and usage tips for our method.

You can also check out our project page and codebase.

What is Matting?

Matting is the process of separating an image into foreground and background so you can composite the foreground onto a new background. This is the key technique behind the green screen effect, and it is widely used in video production, graphics, and consumer apps. To model this problem, we represent every pixel in the captured image as a combination of foreground and background:

$$C = \alpha F + (1 - \alpha) B$$

The Matting Equation

Our problem is to solve for the foreground (F), background (B), and transparency (alpha) of every pixel, given a captured image (C). This is clearly highly underdetermined: since images have RGB channels, we must solve for 7 unknowns (3 for F, 3 for B, and 1 for alpha) from only 3 observed values.
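To make the matting equation concrete, here is a minimal NumPy sketch of the forward (compositing) direction. The function name `composite` and the toy pixel values are illustrative assumptions and are not taken from our codebase.

```python
import numpy as np

def composite(fg, bg, alpha):
    """Apply the matting equation C = alpha * F + (1 - alpha) * B per pixel.

    fg, bg: float arrays of shape (H, W, 3) with values in [0, 1].
    alpha:  float array of shape (H, W) with values in [0, 1].
    """
    a = alpha[..., None]  # add a channel axis so alpha broadcasts over RGB
    return a * fg + (1.0 - a) * bg

# Toy 1x3 image: pure background, half-transparent (e.g. a hair strand), pure foreground.
fg = np.full((1, 3, 3), 0.8)
bg = np.full((1, 3, 3), 0.2)
alpha = np.array([[0.0, 0.5, 1.0]])

captured = composite(fg, bg, alpha)

# Each pixel gives 3 observations (the RGB of C) but has 3 + 3 + 1 = 7 unknowns.
unknowns_per_pixel = fg.shape[-1] + bg.shape[-1] + 1
observations_per_pixel = captured.shape[-1]
```

Going from F, B, and alpha to C is a one-line computation; matting is the inverse problem of recovering F and alpha when only C (and, in our setting, a rough B) is known.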

The Problem with Segmentation

One possible approach is to use segmentation to separate foreground for compositing. Although segmentation has made huge strides in recent years, it does not solve the full matting equation. Segmentation assigns a binary (0,1) label to each pixel in order to represent foreground and background instead of solving for a continuous alpha value. The effects of this simplification are visible in the following example:

This example shows why segmentation does not solve the compositing problem. The segmentation was performed with DeepLab v3+ [2].

The areas around the edge, particularly in the hair, have a true alpha value between 0 and 1. Therefore, the binary nature of segmentation creates a harsh boundary around the foreground, leaving visible artifacts. Solving for the partial transparency and foreground color allows much better compositing in the second frame.
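The effect can also be seen numerically. The sketch below assumes a single row of pixels whose true alpha ramps from 0 to 1 across a soft edge, and it uses a simple 0.5 threshold as a stand-in for a binary segmentation mask (this is not DeepLab itself). All values are made up for illustration.

```python
import numpy as np

# True alpha across a soft edge such as hair: 0 = background, 1 = foreground.
true_alpha = np.array([0.0, 0.25, 0.5, 0.75, 1.0])

# A binary mask keeps only which side of the threshold each pixel falls on.
binary_alpha = (true_alpha >= 0.5).astype(float)

fg_color = 0.9      # bright foreground (single channel for brevity)
new_bg_color = 0.1  # dark replacement background

def composite(alpha):
    # Matting equation applied with a constant foreground and background color.
    return alpha * fg_color + (1.0 - alpha) * new_bg_color

soft = composite(true_alpha)
hard = composite(binary_alpha)

# The error is zero where alpha is exactly 0 or 1 and nonzero at the edge pixels.
error = np.abs(soft - hard)
```

In this toy case the two composites agree in the interior and differ only at the partially transparent pixels, which is where the harsh boundary shows up in practice.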

Using A Casually Captured Background

Because matting is a harder problem than segmentation, additional information is often used to solve this underconstrained problem, even when using deep learning.

Many existing methods [3][4][5] use a trimap: a hand-annotated map of known foreground, known background, and unknown regions. Although annotating a single image this way is feasible, annotating every frame of a video is extremely time-consuming, so trimaps are not a practical direction for this problem.

We choose instead to use a captured background as an estimate of the true background. This makes it easier to solve for the foreground and alpha value. We call it a “casually captured” background because it can contain slight movements, color differences, slight shadows, or colors similar to the foreground.

Our capture process. When the subject leaves the scene we capture the background behind them to help the algorithm.

The figure above shows how we can easily provide a rough estimate of the true background. As the person leaves the scene, we capture the background behind them. The figure below shows what this looks like:

Example of captured input, captured background, and composite on a new background.

Notice how this image is challenging because it has a very similar background and foreground color (particularly around the hair). It was also recorded with a handheld phone and contains slight background movements.

Tips for Capturing

Although our method tolerates some background perturbations, it still works better when the background is constant, and it works best indoors. For example, it does not work in the presence of highly noticeable shadows cast by the subject, moving backgrounds (e.g. water, cars, trees), or large exposure changes.

Failure case. The person was filmed in front of a moving fountain.

We also recommend capturing the background by having the person leave the scene at the end of the video and pulling that frame from the continuous video, rather than taking a separate photo. Many phones use different zoom and exposure settings when you switch from video mode to photo mode. You should also enable auto-exposure lock when filming with a phone.

The ideal capture scenario. The background is indoors, not moving, and the subject does not cast a shadow.

A summary of the capture tips:

  1. Choose the most constant background you can find.
  2. Don’t stand too close to the background so you don’t cast a shadow.
  3. Enable auto-exposure and auto-focus locks on the phone.

Is This Method Like Background Subtraction?

Another natural question is whether this is like background subtraction. First, if it were easy to use any background for compositing, the movie industry would not have been spending thousands of dollars on green screens all these years.

Background subtraction doesn’t work well with casually captured backgrounds

In addition, background subtraction does not solve for partial alpha values, so it gives the same hard edge as segmentation. It also does not work well when the foreground and background colors are similar or when there is any motion in the background.
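A naive version of background subtraction can be sketched in a few lines, which makes both limitations easy to see. The function name, the per-channel difference measure, and the 0.1 threshold are assumptions chosen for illustration; real background subtraction pipelines vary.

```python
import numpy as np

def background_subtraction_mask(image, background, threshold=0.1):
    """Label a pixel foreground when it differs enough from the captured background.

    image, background: float arrays of shape (H, W, 3) in [0, 1].
    Returns a binary mask of shape (H, W); there is no partial alpha.
    """
    diff = np.abs(image - background).max(axis=-1)  # largest per-channel difference
    return (diff > threshold).astype(float)

background = np.full((1, 4, 3), 0.5)
image = background.copy()
image[0, 1] = 0.9    # foreground pixel that clearly differs from the background
image[0, 2] = 0.55   # foreground pixel whose color is close to the background
image[0, 3] = 0.65   # background pixel that changed (slight motion or shadow)

mask = background_subtraction_mask(image, background)
```

In this toy example the similar-colored foreground pixel is missed, the perturbed background pixel is wrongly kept, and every output value is exactly 0 or 1.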

Network Details

The network consists of a supervised step followed by an unsupervised refinement. We’ll briefly summarize them here, but for full details you can always check out the paper.

Supervised Learning

To train the network initially, we use the Adobe Composition-1k dataset, which contains 450 carefully annotated ground truth alpha mattes. We train the network in a fully supervised way, with a per-pixel loss on the output.

The supervised portion of our network. We use several input cues, then we output an alpha matte and the predicted foreground color. We train on the Adobe Composition-1k dataset with ground truth results provided.

Notice that we take several inputs, including the image, background, soft segmentation, and temporal motion information. Our novel Context Switching Block also ensures robustness to poor inputs.
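As a rough illustration of what a per-pixel loss on the two outputs looks like, here is a minimal NumPy sketch. The choice of an L1 (mean absolute) difference, the visibility mask on the foreground term, and the equal weighting of the two terms are all assumptions made for this example; the paper gives the exact loss terms we use.

```python
import numpy as np

def per_pixel_l1(pred, target):
    """Mean absolute difference over every pixel (and channel)."""
    return float(np.mean(np.abs(pred - target)))

# Toy 2x2 ground truth, standing in for an annotated training example.
gt_alpha = np.array([[0.0, 0.5], [1.0, 1.0]])
gt_fg = np.full((2, 2, 3), 0.8)

# Hypothetical network outputs.
pred_alpha = np.array([[0.1, 0.5], [0.9, 1.0]])
pred_fg = np.full((2, 2, 3), 0.7)

alpha_loss = per_pixel_l1(pred_alpha, gt_alpha)

# Only penalize foreground color where the foreground is actually visible.
visible = gt_alpha > 0
fg_loss = per_pixel_l1(pred_fg[visible], gt_fg[visible])

total_loss = alpha_loss + fg_loss
```

Because every pixel has a ground truth value, this kind of loss needs annotated mattes, which is exactly what limits the supervised stage.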

Unsupervised Refinement with GANs

The problem with supervised learning is that the Adobe dataset only contains 450 ground truth outputs, which is not nearly enough to train a good network. Obtaining more data is extremely difficult because it involves hand-annotating the alpha matte of an image.

To solve this problem, we use a GAN refinement step. We take the output alpha matte from the supervised network and composite it on a new background. The discriminator then tries to tell if it is a real or fake image. In response, the generator learns to update the alpha matte so the resulting composite is as real as possible in order to fool the discriminator.

Our unsupervised GAN refinement step. We put the foreground on a new background, then a GAN tries to tell if it is real or fake.

The important part here is that we don’t need any labelled training data. The discriminator was trained with thousands of real images, which are very easy to obtain.
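The adversarial objective described above can be sketched without any labels. In the example below, `toy_discriminator` is a hypothetical stand-in for a trained network, and the least-squares loss is one common GAN formulation used here as an assumption; neither is taken from our codebase.

```python
import numpy as np

def composite(fg, bg, alpha):
    a = alpha[..., None]  # broadcast alpha over the RGB channels
    return a * fg + (1.0 - a) * bg

def toy_discriminator(image):
    """Hypothetical stand-in for a trained network: a 'realness' score in [0, 1].

    It only penalizes the largest horizontal jump in the image. A real
    discriminator is a learned network; this just gives the loss a signal.
    """
    largest_jump = np.abs(np.diff(image, axis=1)).max()
    return float(1.0 - min(1.0, largest_jump))

def generator_loss(d_fake):
    # The generator wants its composite to be scored as real (1).
    return (d_fake - 1.0) ** 2

def discriminator_loss(d_real, d_fake):
    # The discriminator wants real images scored 1 and composites scored 0.
    return (d_real - 1.0) ** 2 + d_fake ** 2

fg = np.full((1, 4, 3), 0.9)
new_bg = np.full((1, 4, 3), 0.1)

hard_alpha = np.array([[0.0, 0.0, 1.0, 1.0]])  # harsh edge
soft_alpha = np.array([[0.0, 0.3, 0.7, 1.0]])  # smoother edge

loss_hard = generator_loss(toy_discriminator(composite(fg, new_bg, hard_alpha)))
loss_soft = generator_loss(toy_discriminator(composite(fg, new_bg, soft_alpha)))
```

Note that no ground truth alpha appears anywhere in the loss: the only supervision is the discriminator's score of the composite, which is why real images alone are enough to train this stage.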

Training the GAN on Your Data

What’s also useful about the GAN is that you can train the generator on your own images to improve results at test time. Suppose you run the network and the output is not very good. You can update the weights of the generator on that exact data in order to better fool the discriminator. This will overfit to your data, but will improve the results on the images you provided.

Future Work

Although the results we see are quite good, we are continuing to make this method more accurate and easier to use.

In particular, we would like to make this method more robust to circumstances like background motion, camera movement, and shadows. We are also looking at ways to make this method work in real time and with fewer computational resources. This could enable a wide variety of use cases in areas like video streaming or mobile apps.

If you have any questions, feel free to reach out to me, Vivek Jayaram, or Soumyadip Sengupta.

