Automating Online Proctoring Using AI


Semi-automated proctoring with vision- and audio-based capabilities to prevent cheating in online exams and monitor multiple students at a time.


Photo by Everyday basics on Unsplash

With the advent of COVID-19, remote learning has blossomed. Schools and universities may have been shut down, but they switched to applications like Microsoft Teams to finish their academic years. However, there has been no good solution for examinations. Some institutions have converted exams into assignments, which students can simply copy and paste from the internet, while others have canceled them outright. If the way we are living now is to be the new norm, there needs to be a solution.

ETS, which conducts the TOEFL and GRE among other exams, is allowing students to take exams from home, monitored by a human proctor for the entire duration. Implementing this scheme at a large scale is not feasible because of the workforce required. So let’s create an AI in Python that can monitor students using just the webcam and laptop microphone, enabling teachers to monitor multiple students at once. The entire code can be found in my GitHub repo.

The AI will have four vision-based capabilities which are combined using multithreading so that they can work together:

  1. Gaze tracking
  2. Mouth open or close
  3. Person Counting
  4. Mobile phone detection

Apart from this, speech from the microphone will be recorded, converted to text, and compared with the text of the question paper to report the number of common words spoken by the test-taker.

Requirements

  • OpenCV
  • Dlib
  • TensorFlow
  • SpeechRecognition
  • PyAudio
  • NLTK

Vision-Based Techniques

Gaze Tracking


Photo by S N Pattenden on Unsplash

We shall aim to track the test-taker’s eyeballs and report whether they are looking to the left, right, or up, which they might do to glance at a notebook or signal to someone. This can be done using Dlib’s facial keypoint detector and OpenCV for further image processing. I have already written an article on real-time eye-tracking that explains in detail the methods we will use later on.
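As a rough illustration of the idea, here is a minimal sketch assuming the standard shape_predictor_68_face_landmarks.dat model; the threshold values and the eye_centroid() helper are illustrative assumptions, not the repo’s exact code:

```python
import cv2
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

LEFT_EYE = list(range(36, 42))  # dlib's 68-point indices for the left eye contour

def eye_centroid(gray, landmarks, indices):
    # Crop a bounding box around the eye landmarks
    pts = np.array([(landmarks.part(i).x, landmarks.part(i).y) for i in indices],
                   dtype=np.int32)
    x, y, w, h = cv2.boundingRect(pts)
    eye = gray[y:y + h, x:x + w]
    # Threshold so the dark pupil/iris becomes the foreground blob
    _, thresh = cv2.threshold(eye, 60, 255, cv2.THRESH_BINARY_INV)
    M = cv2.moments(thresh)
    if M["m00"] == 0:
        return None
    return int(M["m10"] / M["m00"]), w  # centroid x inside the crop, crop width

cap = cv2.VideoCapture(0)
ret, frame = cap.read()
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
for face in detector(gray):
    landmarks = predictor(gray, face)
    result = eye_centroid(gray, landmarks, LEFT_EYE)
    if result:
        cx, w = result
        ratio = cx / w  # roughly 0.5 when looking straight ahead
        if ratio < 0.35:
            print("Looking left")
        elif ratio > 0.65:
            print("Looking right")
cap.release()
```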

Mouth Detection


Mouth Tracking Results

This is very similar to eye detection. Dlib’s facial keypoints are again used for this task: the test-taker is required to sit straight (as they would in the test), and the distances between the lip keypoints (5 outer pairs and 3 inner pairs) are recorded for 100 frames and averaged.

If the user opens their mouth, the distances between the points increase, and if the increase is more than a certain value for at least three outer pairs and two inner pairs, an infringement is reported.
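A minimal sketch of this check, assuming landmarks comes from the same 68-point predictor as in the gaze-tracking snippet; the margin value and the baseline lists are illustrative:

```python
import numpy as np

OUTER_PAIRS = [(49, 59), (50, 58), (51, 57), (52, 56), (53, 55)]  # 5 outer lip pairs
INNER_PAIRS = [(61, 67), (62, 66), (63, 65)]                      # 3 inner lip pairs

def lip_distances(landmarks, pairs):
    # Vertical distance between each top/bottom lip keypoint pair
    return [abs(landmarks.part(t).y - landmarks.part(b).y) for t, b in pairs]

def mouth_open(landmarks, baseline_outer, baseline_inner, margin=3):
    # baseline_outer / baseline_inner are assumed to hold the distances
    # averaged over the 100 calibration frames with the mouth closed
    outer = lip_distances(landmarks, OUTER_PAIRS)
    inner = lip_distances(landmarks, INNER_PAIRS)
    outer_hits = sum(d > b + margin for d, b in zip(outer, baseline_outer))
    inner_hits = sum(d > b + margin for d, b in zip(inner, baseline_inner))
    # Report an infringement only when at least 3 outer and 2 inner pairs widen
    return outer_hits >= 3 and inner_hits >= 2
```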

Person Counting and Mobile Phone Detection

I used the pre-trained weights of YOLOv3, trained on the COCO dataset, to detect people and mobile phones in the webcam feed. For an in-depth explanation of how to use YOLOv3 in TensorFlow 2 and how to perform people counting, you can refer to this article.

If the person count is not equal to one, an alarm can be raised. The index of mobile phones in the COCO dataset is 67, so we also check whether any detected class index equals 67 and, if so, report a mobile phone.
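The following sketch uses OpenCV’s DNN module rather than the TensorFlow 2 pipeline from the linked article, purely for brevity; the yolov3.cfg and yolov3.weights paths and the check_frame() helper are assumptions:

```python
import cv2
import numpy as np

net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")
layer_names = net.getUnconnectedOutLayersNames()

PERSON, CELL_PHONE = 0, 67  # COCO class indices

def check_frame(frame, conf_threshold=0.5):
    blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416),
                                 swapRB=True, crop=False)
    net.setInput(blob)
    outputs = net.forward(layer_names)
    person_count, phone_seen = 0, False
    for output in outputs:
        for detection in output:
            scores = detection[5:]
            class_id = int(np.argmax(scores))
            if scores[class_id] < conf_threshold:
                continue
            # In practice, apply non-max suppression before counting,
            # since YOLO emits several overlapping boxes per object
            if class_id == PERSON:
                person_count += 1
            elif class_id == CELL_PHONE:
                phone_seen = True
    if person_count != 1:
        print("Alert: expected exactly one person, found", person_count)
    if phone_seen:
        print("Alert: mobile phone detected")
```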

Combining using Multithreading

Let’s dive into the code now. As eye-tracking and mouth detection are both based on Dlib, we can create a single thread for them; another thread handles the YOLOv3 tasks: people counting and mobile phone detection.

First, we import all the necessary libraries along with the helper functions. Then the Dlib and YOLO models are loaded. In the eyes_mouth() function, we find the facial keypoints and work on them. For mouth detection, the original distances between the outer and inner lip points are already defined, and we calculate the current ones. If a certain number of them exceed the predefined ones, the proctor is notified. For the eyes, we find their centroids as shown in the article linked above and check which facial keypoints they are closest to. If both are off to a side, it is reported accordingly.

In the count_people_and_phone() function, YOLOv3 is applied to the webcam feed. The classes of the detected objects are then checked, and appropriate action is taken if more than one person or a mobile phone is detected.

These functions are run in separate threads and contain infinite loops, which the proctor can break by pressing ‘q’ twice.
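The thread wiring itself is straightforward; a sketch, assuming eyes_mouth() and count_people_and_phone() are defined as described above:

```python
import threading

# Run the Dlib-based checks and the YOLOv3-based checks concurrently
t1 = threading.Thread(target=eyes_mouth)
t2 = threading.Thread(target=count_people_and_phone)
t1.start()
t2.start()
t1.join()
t2.join()
```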

Audio


Photo by Ethan McArthur on Unsplash

The idea is to record audio from the microphone and convert it to text using Google’s speech recognition API. The API cannot accept a continuous stream from the microphone, so the audio is recorded in chunks. Recording in chunks also keeps the storage requirement low (a ten-second WAV file weighed in at about 1.5 MB, so a three-hour exam stored whole would take roughly 1.6 GB). A separate thread is used to call the API so that recording can continue without interruption; the API processes the last chunk stored, appends its text to a file, and then deletes the chunk to save space.

After that, using NLTK, we remove the stopwords from it. The question paper (in text format) is processed the same way, and their contents are compared. We assume that if someone wants to cheat, they will speak something from the question paper. Finally, the common words, along with their frequencies, are presented to the proctor, who can also look at the text file containing all the words spoken by the candidate during the exam.

Up to line 85 of the code, we continuously record, convert, and store text data in a file. The function read_audio(), as its name suggests, records audio using a stream passed to it by stream_audio(). The function convert() uses the API to convert each chunk to text and appends it to the file test.txt along with a blank space. This part runs for the entire duration of the examination.
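A simplified sketch of that record-and-transcribe loop using the SpeechRecognition library; the 10-second chunk length and the exam_running flag are illustrative assumptions, not the repo’s exact code:

```python
import threading
import speech_recognition as sr

recognizer = sr.Recognizer()
exam_running = True  # in the real code, this flag would be toggled by the proctor

def convert(audio):
    # Runs in its own thread so recording continues uninterrupted
    try:
        text = recognizer.recognize_google(audio)
        with open("test.txt", "a") as f:
            f.write(text + " ")
    except sr.UnknownValueError:
        pass  # silence or unintelligible speech in this chunk

with sr.Microphone() as source:
    while exam_running:
        # Record one 10-second chunk, then hand it off for transcription
        audio = recognizer.record(source, duration=10)
        threading.Thread(target=convert, args=(audio,)).start()
```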

After this, using NLTK, we convert the stored text to tokens and remove the stopwords. The same is done for a text file of the question paper, and then the common words are found and reported to the proctor.
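A minimal sketch of this comparison step, with the file names test.txt and paper.txt assumed:

```python
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

nltk.download("punkt", quiet=True)
nltk.download("stopwords", quiet=True)
stop_words = set(stopwords.words("english"))

def content_words(path):
    # Tokenize the file and keep only non-stopword alphabetic tokens
    with open(path) as f:
        tokens = word_tokenize(f.read().lower())
    return [t for t in tokens if t.isalpha() and t not in stop_words]

spoken = content_words("test.txt")        # everything the candidate said
paper = set(content_words("paper.txt"))   # the question paper's words

# Common words and how often each was spoken, for the proctor's report
common = {w: spoken.count(w) for w in set(spoken) & paper}
print(common)
```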

