Virtual Background For Video Conferencing In Python and OpenCV - A Silly Approach

Category: IT Technology · Published: 4 years ago


Virtual backgrounds are one of the hot topics among people who work remotely right now. With many of us isolated because of the Covid-19 pandemic, a lot of people have to take video calls in order to carry on with their work. Some video conferencing tools allow setting a virtual background so that users can create a friendlier atmosphere for these calls.

Photo by timJ / Unsplash

Interested in more stories like this? Follow me on Twitter at @b_dmarius and I'll post every new article there.

As a programmer, I was naturally intrigued the first time I used such a virtual background. How does it work, I wondered. Can I build such a virtual background? And if yes, how can I do it? Spoiler: it did not go well! Still, I think it was a good educational exercise, and I didn't find much information on this topic while researching it. Therefore, as I do with everything I learn, I decided to document it here; maybe someone else will benefit from it.

So in this tutorial we are going to try a basic approach for building a virtual background with Computer Vision techniques, using Python and OpenCV.

Introduction

The goal of this project is to take a video, try to figure out which part of it is the background and which part is the foreground, remove the background and replace it with a picture: the virtual background. Because we are going to use trivial methods in this project, we need to assume that the foreground will, in general, have colors different from the background. But first, let's look at our tools.

Computer Vision

Computer Vision is an interdisciplinary field that deals with how computers can process and (maybe) understand images and videos. We say it is an interdisciplinary field because it borrows a lot of concepts from different disciplines (computer science, algebra, geometry and so on) and combines them to solve many different and complex tasks, like object tracking, object detection, object recognition and object segmentation in images and videos.

OpenCV

OpenCV is a library built for solving computer vision tasks. It is open-source and available for several programming languages, including Python and C++. It has a tremendous number of features for computer vision, some of them based on maths and statistical approaches, and others based on Machine Learning.

Python

If you've made it this far in this article, you probably know what Python is 😀

Building a virtual background

The approach I tried for this was the following. I'll show code snippets for every step and at the end of the article you'll have the full code.

1. Import dependencies

import numpy as np
import cv2

2. Load the video from the local environment and initialize data

cap = cv2.VideoCapture('video6.mp4')
ret = True
frameCounter = 0
previousFrame = None
nextFrame = None
iterations = 0

3. Load the substitute background image from the local environment

backgroundImage = cv2.imread("image1.jpg")

4. Split the video frame by frame

while ret:
    ret, frame = cap.read()
    if not ret:
        break  # no more frames: frame is None here, so stop before using it

5. Take every pair of two frames

        if frameCounter % 2 == 1:
            nextFrame = frame

        if frameCounter % 2 == 0:
            frameCounter = 0
            previousFrame = frame

        frameCounter = frameCounter + 1
        iterations = iterations + 1

6. Find the absolute difference between the two frames and convert it to grayscale to obtain a mask.

        if iterations > 2:
            diff = cv2.absdiff(previousFrame, nextFrame)
            mask = cv2.cvtColor(diff, cv2.COLOR_BGR2GRAY)

Every image consists of pixels. You can imagine it as a 2D matrix with rows and columns, where every cell is one pixel of the image (color images have an extra dimension for the color channels, but for simplicity we can ignore that here).

We obtain the difference by going pixel by pixel through the first image (cell by cell in the first matrix), subtracting the corresponding pixel of the other image (the corresponding cell in the other matrix), and keeping the absolute value of the result.

Now here's the trick: if a pixel has not changed between the two frames, the result for that pixel is 0. How can a pixel differ between two frames? If the video is completely static (nothing moves in the image), the difference is 0 for every pixel of every pair of frames, because nothing has changed. But if something moves, we can identify where it moved by detecting the pixel differences. And we can assume that, in a video conference, the things that move are in the foreground (that's you) and the static part is the background.

And what's so important about this 0? A pixel with the value 0 is shown as black, and we are going to use that to our advantage.
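To make the subtraction concrete, here is a small sketch in plain NumPy on two made-up 2x3 grayscale "frames". The arrays and the helper name `abs_difference` are illustrative and not part of the original code. The helper mimics what `cv2.absdiff` computes for `uint8` images, and the last lines show why a plain `a - b` is not a good substitute: unsigned 8-bit values wrap around instead of going negative.

```python
import numpy as np

def abs_difference(a, b):
    """Per-pixel absolute difference for uint8 images (illustrative helper)."""
    # Widen to a signed type first so the subtraction cannot wrap around.
    wide = a.astype(np.int16) - b.astype(np.int16)
    return np.abs(wide).astype(np.uint8)

# Two tiny made-up grayscale frames: only the top row changes.
previous_frame = np.array([[10, 10, 200],
                           [10, 10, 10]], dtype=np.uint8)
next_frame = np.array([[10, 12, 50],
                       [10, 10, 10]], dtype=np.uint8)

diff = abs_difference(previous_frame, next_frame)
print(diff)  # unchanged pixels are 0 (black), changed pixels are non-zero

# Naive uint8 subtraction wraps around: 10 - 12 does not give 2 or -2.
naive = previous_frame - next_frame
print(naive)
```

The small value in the middle of the top row is the kind of tiny change that camera noise can produce, which is why the next step applies a threshold instead of testing for exactly 0.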

7. Find the cells in the mask that are over a threshold value - I've chosen 3 as a threshold, but you can play with different values. A larger value will remove more from the background, but may also remove more from the foreground.

            th = 3
            isMask = mask > th
            nonMask = mask <= th
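As a rough illustration of how the threshold changes the outcome, the sketch below counts how many pixels of a made-up difference mask would be treated as foreground for a few threshold values. The mask values and the helper `foreground_pixels` are assumptions made up for this example; on a real video you would inspect the actual `mask` instead.

```python
import numpy as np

# Made-up grayscale difference mask: small values stand for sensor noise,
# large values stand for real movement.
mask = np.array([[0, 1, 2, 40],
                 [3, 4, 90, 120],
                 [0, 2, 5, 200]], dtype=np.uint8)

def foreground_pixels(mask, th):
    """Count the pixels that would be kept as foreground for threshold th."""
    return int(np.count_nonzero(mask > th))

# A higher threshold keeps fewer pixels: less noise, but also less foreground.
for th in (0, 3, 50):
    print(th, foreground_pixels(mask, th))
```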

8. Create an empty image (0 for every cell) with the same size as the frames.

result = np.zeros_like(nextFrame, np.uint8)

9. Resize the background image so that it has the same size as the frames.

resized = cv2.resize(backgroundImage, (result.shape[1], result.shape[0]), interpolation = cv2.INTER_AREA)

10. For every cell of the mask that is greater than the threshold, copy the pixel from the original frame.

result[isMask] = nextFrame[isMask]

11. For every cell of the mask that is at or below the threshold, copy the pixel from the substitute background image.

result[nonMask] = resized[nonMask]
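Steps 10 and 11 rely on NumPy boolean indexing: a 2D boolean mask applied to a 3-channel image selects whole pixels, with all three color values at once. Here is a tiny sketch with made-up 2x2 images (the names `frame`, `background` and `is_mask` are only for this example) that also shows an equivalent one-liner with `np.where`.

```python
import numpy as np

# A 2x2 "frame" that is all white and a 2x2 "background" that is all blue.
frame = np.full((2, 2, 3), 255, dtype=np.uint8)
background = np.zeros((2, 2, 3), dtype=np.uint8)
background[:, :, 0] = 255  # OpenCV stores channels in BGR order

# 2D boolean mask: True where movement was detected.
is_mask = np.array([[True, False],
                    [False, True]])
non_mask = ~is_mask

result = np.zeros_like(frame)
result[is_mask] = frame[is_mask]          # moving pixels come from the frame
result[non_mask] = background[non_mask]   # the rest comes from the background
print(result)

# Equivalent formulation: broadcast the mask over the channel axis.
same = np.where(is_mask[:, :, None], frame, background)
print(np.array_equal(result, same))
```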

12. Save the result frame to the local environment.

cv2.imwrite("output" + str(iterations) + ".jpg", result)

Results and conclusion

So what are the results? Honestly, I was a bit disappointed by them. Then I did more research and the reason became more obvious. You need a more advanced approach for this, and it's no surprise that big companies invest lots of resources in this type of problem.

Here's a screenshot of the video I tried. It's basically a video of my hand moving in front of a wall.

And here's a screenshot of the output image. For the background I used a photo of me in Rasnov, Romania.

As I said, I am not very satisfied with the result. But I am satisfied with what I learned from this project. It was a fun learning experience and a nice way to spend my time working with concepts I am not yet comfortable with.

Other approaches to creating a virtual background

If you think a problem is very complicated and requires a level of intelligence beyond what you usually see in computer software, then the answer might be Machine Learning. 😀

There are already Deep Learning models out there that can perform this sort of task. But such a model requires large datasets to train on and lots of processing power, neither of which I had at the moment of writing this article. The task solved by such a deep learning model is called image segmentation.

Another approach would be a computer vision method for finding the distance between the camera and the objects in the image. Then you would establish a threshold for separating the foreground from the background. After that, you can use the same kind of mask I used to remove the background and introduce a new one.
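Here is a minimal sketch of what the distance-based idea could look like, assuming a per-pixel depth map in meters is already available. How to obtain that depth map is outside the scope of this article, and the function `composite_by_depth`, the `max_distance_m` parameter and the sample values are all hypothetical.

```python
import numpy as np

def composite_by_depth(frame, depth_m, background, max_distance_m=1.5):
    """Keep pixels closer than max_distance_m and replace the rest (sketch)."""
    is_foreground = depth_m < max_distance_m   # 2D boolean mask
    result = background.copy()
    result[is_foreground] = frame[is_foreground]
    return result

# Made-up 2x2 scene: the left column is a person about 1 m away,
# the right column is a wall about 3 m away.
frame = np.full((2, 2, 3), 200, dtype=np.uint8)
background = np.zeros((2, 2, 3), dtype=np.uint8)
depth_m = np.array([[1.0, 3.0],
                    [0.9, 3.1]])

out = composite_by_depth(frame, depth_m, background)
print(out[:, :, 0])  # first channel only, to keep the output small
```

Unlike frame differencing, this kind of mask does not depend on movement, so a person sitting still would stay in the foreground.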

Thank you so much for reading this.
