Real-Time Head Pose Estimation in Python

Category: IT Technology · Published: 5 years ago


Requirements

For this project, we need OpenCV and TensorFlow, so let's install them.

# Using pip
pip install opencv-python
pip install tensorflow

# Using conda
conda install -c conda-forge opencv
conda install -c conda-forge tensorflow

Face Detection

Our first step is to find the faces in the image, on which we can then locate facial landmarks. For this task, we will use a Caffe model with OpenCV's DNN module. If you are wondering how it fares against other models like Haar Cascades or Dlib's frontal face detector, or you want to know more about it in depth, you can refer to this article:

You can download the required models from my GitHub repository.

import cv2
import numpy as np

# Paths to the Caffe weights and the network definition
modelFile = "models/res10_300x300_ssd_iter_140000.caffemodel"
configFile = "models/deploy.prototxt.txt"
net = cv2.dnn.readNetFromCaffe(configFile, modelFile)

img = cv2.imread('test.jpg')
h, w = img.shape[:2]
# Resize to 300x300 and subtract the per-channel mean values
blob = cv2.dnn.blobFromImage(cv2.resize(img, (300, 300)), 1.0,
                             (300, 300), (104.0, 117.0, 123.0))
net.setInput(blob)
faces = net.forward()

# Draw a box around every face detected with confidence above 0.5
for i in range(faces.shape[2]):
    confidence = faces[0, 0, i, 2]
    if confidence > 0.5:
        # Box coordinates are fractions of the image size, so scale them back
        box = faces[0, 0, i, 3:7] * np.array([w, h, w, h])
        (x, y, x1, y1) = box.astype("int")
        cv2.rectangle(img, (x, y), (x1, y1), (0, 0, 255), 2)

Load the network using cv2.dnn.readNetFromCaffe, passing it the file that defines the model's layers (the .prototxt file) and the file that holds its weights (the .caffemodel file). The model performs best on images resized to 300×300.
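To make the shape of the network output clearer, here is a minimal sketch that decodes it into pixel boxes without drawing anything. It assumes the (1, 1, N, 7) layout used in the loop above, where column 2 is the confidence and columns 3 to 6 are the box corners as fractions of the image size. The function name decode_detections and the clipping step are my own additions for illustration, and the input here is a synthetic array rather than real model output.

```python
import numpy as np

def decode_detections(detections, width, height, threshold=0.5):
    """Turn a raw (1, 1, N, 7) detection array into a list of pixel boxes."""
    boxes = []
    for i in range(detections.shape[2]):
        confidence = float(detections[0, 0, i, 2])
        if confidence <= threshold:
            continue
        # Columns 3..6 hold x1, y1, x2, y2 as fractions of the image size.
        scale = np.array([width, height, width, height])
        x1, y1, x2, y2 = (detections[0, 0, i, 3:7] * scale).astype(int)
        # Clip to the image, since a predicted box can extend past the border
        # (this clipping is an illustrative addition).
        x1, y1 = max(0, int(x1)), max(0, int(y1))
        x2, y2 = min(width - 1, int(x2)), min(height - 1, int(y2))
        boxes.append((x1, y1, x2, y2, confidence))
    return boxes

# Synthetic stand-in for net.forward(): one confident face, one weak one.
fake = np.zeros((1, 1, 2, 7), dtype=np.float32)
fake[0, 0, 0] = [0, 1, 0.9, 0.25, 0.25, 0.75, 0.75]
fake[0, 0, 1] = [0, 1, 0.2, 0.10, 0.10, 0.30, 0.30]
boxes = decode_detections(fake, width=400, height=200)
print(boxes)
```

Only the first detection passes the 0.5 threshold, so the list holds a single box.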

Facial Landmark Detection

The most commonly used option is Dlib's facial landmark detector, which gives us 68 landmarks; however, its accuracy is not good. Instead, we will use a facial landmark detector provided by Yin Guobing in this GitHub repo. It also gives 68 landmarks, and it is a TensorFlow CNN trained on 5 datasets! The pre-trained model can be found here. The author has also written a series of posts covering the background, dataset, preprocessing, model architecture, training, and deployment, which can be found here. I have provided a very brief summary here, but I would strongly encourage you to read them.

In the first post of the series, he describes the problem of facial landmark stability in videos, then lays out existing solutions like OpenFace and Dlib's facial landmark detection, along with the available datasets. The third article is all about data preprocessing and making the data ready to use. In the next two articles, the work is to extract the faces and apply facial landmarks to them so they are ready for training a CNN, and to store them as TFRecord files. In the sixth article, a model is trained using TensorFlow. Here we can see how important loss functions are in training: he first used tf.losses.mean_pairwise_squared_error, which uses the relationships between points as the basis for optimization when minimizing loss, and it could not generalize well. In contrast, tf.losses.mean_squared_error worked well. In the final article, the model is exported as an API and he shows how to use it in Python.
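To illustrate what "relationships between points" means for a loss, here is a small NumPy sketch of the two ideas. It is a simplified version written for this summary, not TensorFlow's implementation, and its normalization is an assumption. It shows one property of a pairwise loss: if every predicted coordinate is off by the same amount, the differences between points are unchanged, so the pairwise loss sees no error at all while the plain mean squared error does.

```python
import numpy as np

def mean_squared_error(labels, predictions):
    """Average squared error per element."""
    return float(np.mean((predictions - labels) ** 2))

def mean_pairwise_squared_error(labels, predictions):
    """Simplified pairwise loss over all ordered pairs of elements.

    For a pair (i, j) it penalises
    (pred_i - pred_j) - (label_i - label_j), which equals d_i - d_j
    where d is the per-element error.
    """
    d = predictions - labels
    pair_diffs = d[:, None] - d[None, :]
    n = d.size
    return float(np.sum(pair_diffs ** 2) / (n * (n - 1)))

labels = np.array([10.0, 20.0, 30.0, 40.0])
shifted = labels + 2.0  # every coordinate is wrong by the same offset

mse = mean_squared_error(labels, shifted)
pairwise = mean_pairwise_squared_error(labels, shifted)
print(mse, pairwise)
```

Here the mean squared error reports the offset, while the pairwise version is blind to it.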

The model takes square boxes of size 128×128 that contain faces and returns 68 facial landmarks. The code provided below is taken from here, and it can also be used to draw 3D annotation boxes on the image. The code is modified to draw facial landmarks on all the faces, unlike the original code, which would draw on only one.

This code will draw facial landmarks on the faces.

Drawing facial landmarks

Using draw_annotation_box(), we can also draw the annotation box, as shown below.

With annotation box
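Since the landmark model expects a square crop, a detected face box usually has to be squared before it is cut out and resized to 128×128. The helper below is one possible way to do that, written as a sketch; the name to_square_box and the choice to centre the square and shift it back inside the image are my assumptions, not the approach used in the linked code.

```python
def to_square_box(box, image_width, image_height):
    """Expand a face box (x1, y1, x2, y2) into a square that fits the image."""
    x1, y1, x2, y2 = box
    # Use the longer edge as the side, capped by the image dimensions.
    side = min(max(x2 - x1, y2 - y1), image_width, image_height)
    cx, cy = (x1 + x2) // 2, (y1 + y2) // 2
    # Centre the square on the original box, then shift it inside the image.
    sx1 = min(max(cx - side // 2, 0), image_width - side)
    sy1 = min(max(cy - side // 2, 0), image_height - side)
    return sx1, sy1, sx1 + side, sy1 + side

square = to_square_box((100, 50, 200, 250), image_width=640, image_height=480)
print(square)
```

The resulting region can then be cropped from the frame and resized to 128×128, for example with cv2.resize, before being passed to the landmark model.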

Pose Estimation

This is a great article on Learn OpenCV that explains head pose detection on images, with a lot of math on converting the points to 3D space and using cv2.solvePnP to find the rotation and translation vectors. A quick read-through of that article will help you understand the inner workings, so I will cover it only briefly here.

We need six points of the face: the nose tip, the chin, the extreme left and right points of the lips, the left corner of the left eye, and the right corner of the right eye. We take standard 3D coordinates of these facial landmarks and try to estimate the rotation and translation vectors at the nose tip. For an accurate estimate, we need the intrinsic parameters of the camera: focal length, optical center, and radial distortion parameters. We can estimate the first two and assume the last one is absent to make our work easier. After obtaining the required vectors, we can project those 3D points onto a 2D surface, which is our image.
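The camera approximation and the projection step described above can be sketched in plain NumPy. The 3D coordinates below are generic face-model values of the kind commonly used with this approach; they are an assumption for illustration and are not taken from this article. Taking the focal length as the image width and the optical center as the image center is likewise a common approximation. In practice, cv2.solvePnP(model_points, image_points, camera_matrix, dist_coeffs) would estimate the rotation and translation vectors; here a pose is simply assumed in order to show the projection.

```python
import numpy as np

# Generic 3D face model in arbitrary units, nose tip at the origin
# (illustrative values, not from this article).
MODEL_POINTS = np.array([
    (0.0, 0.0, 0.0),           # nose tip
    (0.0, -330.0, -65.0),      # chin
    (-225.0, 170.0, -135.0),   # left corner of the left eye
    (225.0, 170.0, -135.0),    # right corner of the right eye
    (-150.0, -150.0, -125.0),  # left corner of the mouth
    (150.0, -150.0, -125.0),   # right corner of the mouth
])

def approximate_camera_matrix(image_width, image_height):
    """Focal length ~ image width, optical center ~ image center."""
    focal_length = float(image_width)
    cx, cy = image_width / 2.0, image_height / 2.0
    return np.array([[focal_length, 0.0, cx],
                     [0.0, focal_length, cy],
                     [0.0, 0.0, 1.0]])

def project(points_3d, rotation_matrix, translation, camera_matrix):
    """Pinhole projection with no lens distortion."""
    camera_points = points_3d @ rotation_matrix.T + translation
    uvw = camera_points @ camera_matrix.T
    return uvw[:, :2] / uvw[:, 2:3]

K = approximate_camera_matrix(640, 480)
# Assumed pose: no rotation, face 1000 units in front of the camera.
points_2d = project(MODEL_POINTS, np.eye(3), np.array([0.0, 0.0, 1000.0]), K)
print(points_2d[0])  # image position of the nose tip
```

With this assumed pose the nose tip lands at the optical center of the image, which is a quick sanity check on the camera matrix.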

If we use only the code available and find the angle with the x-axis, we obtain the result shown below.

Result

It works great for recording the head moving up and down, but not left or right. So how do we do that? Well, above we saw an annotation box on the face. Perhaps we can use it to measure the left and right movements.

With annotation box

We can take the line midway between the two dark blue lines as our pointer and measure its angle with the y-axis to get the angle of movement.

Result
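The two angle measurements can be sketched with a little trigonometry. This is a minimal illustration, assuming each line is given as a pair of (x, y) pixel endpoints; the function names and the sign conventions are my own choices here and may differ from the code in the repository.

```python
import math

def angle_with_x_axis(p1, p2):
    """Angle in degrees between the line p1 -> p2 and the x-axis."""
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    if dx == 0:
        return 90.0  # vertical line, slope undefined
    return math.degrees(math.atan(dy / dx))

def midline(line_a, line_b):
    """Line whose endpoints are the midpoints of the matching endpoints."""
    (a1, a2), (b1, b2) = line_a, line_b
    start = ((a1[0] + b1[0]) / 2, (a1[1] + b1[1]) / 2)
    end = ((a2[0] + b2[0]) / 2, (a2[1] + b2[1]) / 2)
    return start, end

def angle_with_y_axis(p1, p2):
    """Angle in degrees between the line p1 -> p2 and the y-axis."""
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    if dy == 0:
        return 90.0  # horizontal line
    return math.degrees(math.atan(dx / dy))

# Up/down: a pointer line from the nose tip to a projected point.
vertical_angle = angle_with_x_axis((320, 240), (420, 340))

# Left/right: midline of two box edges, measured against the y-axis.
start, end = midline(((300, 200), (280, 300)), ((340, 200), (320, 300)))
horizontal_angle = angle_with_y_axis(start, end)
print(vertical_angle, horizontal_angle)
```

Remember that image y-coordinates grow downward, so the sign of each angle depends on which endpoint is passed first.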

Combining the two, we can tell which direction the head is facing. The complete code can also be found at my GitHub repository, along with various other sub-models for an online proctoring solution.
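As a rough sketch of how the two angles might be combined into a direction, the function below applies a threshold to each one. The threshold of 30 degrees, the label strings, and the mapping from the sign of each angle to a direction are all hypothetical; they depend on the angle convention and would need tuning against real footage.

```python
def head_direction(vertical_angle, horizontal_angle, threshold=30.0):
    """Map two angles in degrees to direction labels (hypothetical mapping)."""
    labels = []
    if vertical_angle >= threshold:
        labels.append("down")
    elif vertical_angle <= -threshold:
        labels.append("up")
    if horizontal_angle >= threshold:
        labels.append("right")
    elif horizontal_angle <= -threshold:
        labels.append("left")
    # Neither angle crossed the threshold: treat the head as facing forward.
    return labels or ["forward"]

print(head_direction(45.0, -40.0))
print(head_direction(5.0, 3.0))
```

Keeping the two checks independent lets a single frame report a combined direction, such as down and left at the same time.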

