Semi-automated proctoring with vision- and audio-based capabilities to prevent cheating in online exams and monitor multiple students at a time.
With the advent of COVID-19, remote learning has blossomed. Schools and universities may have shut down, but they switched to applications like Microsoft Teams to finish their academic years. However, there has been no good solution for examinations. Some institutions have converted exams into assignments, from which students can simply copy and paste answers off the internet, while others have canceled them outright. If the way we are living now is to be the new norm, there needs to be some solution.
ETS, which conducts the TOEFL and GRE among other exams, is allowing students to take tests from home, monitored by a proctor for the whole duration of the exam. Implementing this scheme at a large scale is not feasible because of the workforce required. So let’s create an AI in Python that can monitor students using only the webcam and the laptop microphone, enabling teachers to monitor multiple students at once. The entire code can be found on my GitHub repo.
The AI will have four vision-based capabilities which are combined using multithreading so that they can work together:
- Gaze tracking
- Mouth open or close
- Person Counting
- Mobile phone detection
Apart from this, the speech from the microphone will be recorded, converted to text, and will also be compared to the text of the question paper to report the number of common words spoken by the test-taker.
Requirements
- OpenCV
- Dlib
- TensorFlow
- SpeechRecognition
- PyAudio
- NLTK
Vision-Based Techniques
Gaze Tracking
We shall aim to track the eyeballs of the test-taker and report whether they are looking to the left, right, or up, which they might do to glance at a notebook or signal to someone. This can be done using Dlib’s facial keypoint detector and OpenCV for further image processing. I have already written an article on real-time eye-tracking which explains in detail the methods that will be used later on.
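To make the keypoint step concrete, here is a minimal sketch of locating the eyes with Dlib, assuming the standard 68-point model file shape_predictor_68_face_landmarks.dat is available; the centroid comparison is illustrative rather than the author’s exact code:

```python
import cv2
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

# Indices of the eye contours in the 68-point annotation scheme
LEFT_EYE = list(range(36, 42))
RIGHT_EYE = list(range(42, 48))

def eye_center(landmarks, indices):
    """Centroid of one eye's contour points."""
    pts = np.array([(landmarks.part(i).x, landmarks.part(i).y) for i in indices])
    return pts.mean(axis=0)

cap = cv2.VideoCapture(0)
ret, frame = cap.read()
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
for face in detector(gray):
    landmarks = predictor(gray, face)
    left, right = eye_center(landmarks, LEFT_EYE), eye_center(landmarks, RIGHT_EYE)
    # Comparing each centroid's position against the eye-corner keypoints
    # indicates whether the gaze is to the left, right, or center.
cap.release()
```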
Mouth Detection
This is very similar to eye detection. Dlib’s facial keypoints are again used for this task: the test-taker is required to sit straight (as they would in the test), and the distances between the lip keypoints (5 outer pairs and 3 inner pairs) are recorded for 100 frames and averaged.
If the user opens their mouth, the distances between the points increase, and if the increase is more than a certain value for at least three outer pairs and two inner pairs, an infringement is reported.
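A hedged sketch of this calibrate-then-compare logic is below; the lip pairs follow the standard 68-point scheme (5 vertical outer pairs, 3 inner), while the margin value and function names are illustrative assumptions:

```python
import numpy as np

# Vertical lip pairs in the 68-point dlib scheme: 5 outer, 3 inner (assumed pairing)
OUTER_PAIRS = [(49, 59), (50, 58), (51, 57), (52, 56), (53, 55)]
INNER_PAIRS = [(61, 67), (62, 66), (63, 65)]

def pair_distances(landmarks, pairs):
    """Euclidean distance between each upper/lower lip keypoint pair."""
    return np.array([np.hypot(landmarks.part(a).x - landmarks.part(b).x,
                              landmarks.part(a).y - landmarks.part(b).y)
                     for a, b in pairs])

# Calibration: collect both arrays for 100 frames while the test-taker
# sits straight, then average them to get baseline_outer and baseline_inner.

def mouth_open(landmarks, baseline_outer, baseline_inner, margin=2.0):
    """Report an infringement if enough pairs widened past their baseline."""
    outer = pair_distances(landmarks, OUTER_PAIRS)
    inner = pair_distances(landmarks, INNER_PAIRS)
    return (np.sum(outer - baseline_outer > margin) >= 3 and
            np.sum(inner - baseline_inner > margin) >= 2)
```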
Person Counting and Mobile Phone Detection
I used the pre-trained weights of YOLOv3, trained on the COCO dataset, to detect people and mobile phones in the webcam feed. For an in-depth explanation of how to use YOLOv3 in TensorFlow 2 and how to perform people counting, you can refer to this article:
If the person count is not equal to one, an alarm can be raised. The index of mobile phones in the COCO dataset is 67, so if any detected class index equals 67, a mobile phone can be reported as well.
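As a sketch of that check, assuming classes and scores arrays produced by a COCO-trained YOLOv3 (in the 80-class COCO list, index 0 is “person” and index 67 is “cell phone”; the confidence threshold and alert messages are illustrative):

```python
PERSON, CELL_PHONE = 0, 67  # COCO class indices

def check_frame(classes, scores, threshold=0.5):
    """Raise alerts based on the classes detected in one webcam frame."""
    detected = [int(c) for c, s in zip(classes, scores) if s > threshold]
    person_count = detected.count(PERSON)
    if person_count != 1:
        print(f"Alert: {person_count} people detected in frame")
    if CELL_PHONE in detected:
        print("Alert: mobile phone detected")
```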
Combining using Multithreading
Let’s dive into the code now. As eye-tracking and mouth detection are based on dlib we can create a single thread for them and another thread can be used for the YOLOv3 tasks: people counting and mobile detection.
First, we import all the necessary libraries along with the helper functions. Then the dlib and YOLO models are loaded. Now, in the eyes_mouth() function, we find the facial keypoints and work on them. For mouth detection, the original distances between the outer and inner points are already defined, and we calculate the current ones. If the current distances exceed the predefined ones by a certain amount, the proctor is notified. For the eyes, we find their centroids as shown in the article linked above and then check which facial keypoints they are closest to. If both centroids are off to the sides, it is reported accordingly.
In the count_people_and_phone() function, YOLOv3 is applied to the webcam feed. The classes of the detected objects are then checked, and appropriate action is taken if more than one person is detected or if a mobile phone is detected.
These functions are run in separate threads, each containing an infinite loop which the proctor can break by pressing ‘q’ twice.
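A minimal sketch of that setup, assuming eyes_mouth() and count_people_and_phone() are defined as described above:

```python
import threading

# Each capability runs its own infinite capture loop; the proctor stops
# them by pressing 'q' twice (once per loop).
t1 = threading.Thread(target=eyes_mouth)
t2 = threading.Thread(target=count_people_and_phone)
t1.start()
t2.start()
t1.join()
t2.join()
```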
Audio
The idea is to record audio from the microphone and convert it to text using Google’s speech recognition API. The API needs a continuous voice stream from the microphone, which is not plausible, so the audio is recorded in chunks; this also keeps the storage requirement modest (a ten-second wave file is about 1.5 MB, so a three-hour exam, i.e. 1,080 such chunks, would come to roughly 1.6 GB if everything were kept). A different thread is used to call the API so that recording can continue without interruptions; it processes the last chunk stored, appends its text to a text file, and then deletes the chunk to save space.
After that, using NLTK, we remove the stopwords from the transcript. The question paper (in text format) is processed the same way, its stopwords are removed, and the two sets of contents are compared. We assume that if someone wants to cheat, they will speak something from the question paper. Finally, the common words, along with their frequencies, are presented to the proctor. The proctor can also look at the text file, which contains all the words spoken by the candidate during the exam.
Until line 85 in the code, we are continuously recording, converting, and storing text data in a file. The function read_audio(), as its name suggests, records audio using a stream passed to it by stream_audio(). The function convert() uses the API to convert each chunk to text and appends it to the file test.txt along with a blank space. This part runs for the entire duration of the examination.
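Here is a hedged sketch of the conversion step using the speech_recognition library; the chunk file name is hypothetical, and the real code splits the work across read_audio(), stream_audio(), and convert() as described:

```python
import os
import threading
import speech_recognition as sr

recognizer = sr.Recognizer()

def convert(wav_path):
    """Transcribe one recorded chunk with Google's API, append the text
    to test.txt, then delete the chunk to save disk space."""
    with sr.AudioFile(wav_path) as source:
        audio = recognizer.record(source)
    try:
        text = recognizer.recognize_google(audio)
        with open("test.txt", "a") as f:
            f.write(text + " ")
    except sr.UnknownValueError:
        pass  # silence or unintelligible speech; skip this chunk
    os.remove(wav_path)

# Conversion runs in its own thread so recording continues uninterrupted;
# "chunk_001.wav" is a hypothetical name for the last saved chunk.
threading.Thread(target=convert, args=("chunk_001.wav",)).start()
```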
After this, using NLTK, we convert the stored text into tokens and remove the stopwords. The same is done for a text file of the question paper, and then the common words are found and reported to the proctor.
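A minimal sketch of that comparison, assuming the transcript is in test.txt and the question paper in a file such as paper.txt (both file names and the tokenizer choice are illustrative):

```python
from collections import Counter
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

nltk.download("punkt")
nltk.download("stopwords")
STOPWORDS = set(stopwords.words("english"))

def content_words(path):
    """Tokenize a text file and drop stopwords and non-alphabetic tokens."""
    with open(path) as f:
        tokens = word_tokenize(f.read().lower())
    return [t for t in tokens if t.isalpha() and t not in STOPWORDS]

spoken = content_words("test.txt")
paper = set(content_words("paper.txt"))
common = Counter(w for w in spoken if w in paper)
print(common.most_common())  # common words with their frequencies
```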