- Install OpenCV
- Multiprocessing VideoReader
- Tensorflow model Megadetector
- Batches
- Possible optimizations: Graph Optimize, TensorRT
As I didn't have the data, resources, or time to train my own animal detection neural network, I searched the web for what was already available. It turned out that the task, even with state-of-the-art neural networks and data gathered from all over the world, is not as simple as it seemed.
Of course, there are products and research projects doing animal detection. Still, there is one main difference from what I was looking for: they detect creatures in shots from photo cameras or smartphone cameras, and such shots differ in color, shape, and quality from what you get from motion detection cameras.
But whatever. There are still projects doing exactly what I was after. My searches led me to the CameraTraps project from Microsoft. As I understood, they are building an Image Recognition API using data collected from wildlife cameras all over the world. Along the way, they open-sourced a pre-trained model, called MegaDetector, that detects whether an animal or a human is present in an image.
The main limitation of that model follows from its name: it is only a detector, not a classifier.
Even with limitations like that, such an approach fit me perfectly.
The model is trained to detect three different classes:
- Animal
- Person
- Vehicle
In most blog posts about video object detection, you'll find real-time video being described. My case was a bit different: as input, I had a huge pile of video files produced by the camera, and as output, I also wanted video files.
For reading and writing video files in Python, the de facto standard today is the OpenCV library. It is also my favorite image manipulation package.
The logic of running inference on video file is quite straightforward:
- Read: Get a frame from the video
- Detect: Run inference on the image
- Write: Save the frame to the new file, with detections drawn if there are any.
- Repeat: Run steps 1-3 until the end of the video
It can be implemented with a short code sample.
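Below is a minimal sketch of that loop, not the exact script I used: it assumes a TF 1.x frozen MegaDetector graph with the standard TensorFlow Object Detection API tensor names (image_tensor:0, detection_boxes:0, detection_scores:0); the paths, score threshold, and drawing logic are illustrative.

import cv2 as cv
import numpy as np
import tensorflow as tf

def process_video(graph_path, in_path, out_path, score_threshold=0.5):
    # Load the frozen detection graph (TF 1.x style)
    graph_def = tf.GraphDef()
    with tf.gfile.GFile(graph_path, 'rb') as f:
        graph_def.ParseFromString(f.read())
    graph = tf.Graph()
    with graph.as_default():
        tf.import_graph_def(graph_def, name='')

    reader = cv.VideoCapture(in_path)
    fps = reader.get(cv.CAP_PROP_FPS)
    width = int(reader.get(cv.CAP_PROP_FRAME_WIDTH))
    height = int(reader.get(cv.CAP_PROP_FRAME_HEIGHT))
    writer = cv.VideoWriter(out_path, cv.VideoWriter_fourcc(*'mp4v'), fps, (width, height))

    with tf.Session(graph=graph) as sess:
        image_tensor = graph.get_tensor_by_name('image_tensor:0')
        boxes_t = graph.get_tensor_by_name('detection_boxes:0')
        scores_t = graph.get_tensor_by_name('detection_scores:0')

        while True:
            ok, frame = reader.read()                      # 1. Read
            if not ok:
                break                                      # 4. Repeat until the end of the video
            rgb = cv.cvtColor(frame, cv.COLOR_BGR2RGB)     # the model expects RGB, OpenCV gives BGR
            boxes, scores = sess.run(
                [boxes_t, scores_t],
                feed_dict={image_tensor: np.expand_dims(rgb, axis=0)})   # 2. Detect
            for box, score in zip(boxes[0], scores[0]):    # 3. Draw detections and write
                if score < score_threshold:
                    continue
                y1, x1, y2, x2 = box                       # normalized [ymin, xmin, ymax, xmax]
                cv.rectangle(frame, (int(x1 * width), int(y1 * height)),
                             (int(x2 * width), int(y2 * height)), (0, 255, 0), 2)
            writer.write(frame)

    reader.release()
    writer.release()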
Even though such a straightforward approach has several bottlenecks, such as reading and writing happening in the same thread, it works. So, if you are looking for code to try your model on a video, check that script.
It took me around 10 minutes to process a one-minute FullHD 10 FPS video file.
Detection took 9 minutes and 18.18 seconds. Average detection time per frame: 0.93 seconds
But you can find many tutorials like that, telling you how to run vanilla OpenCV/TensorFlow inference. The challenging part is making that code run continuously and with decent performance.
I/O blocks
With the code provided, reading frames, detecting, and writing back all happen in the same loop, which means that sooner or later one of the operations becomes a bottleneck, for example, reading video files from not-very-stable network storage.
To get rid of that, I used instructions from a fantastic computer vision blogger, Adrian Rosebrock, and his library imutils. He suggests splitting frame reading and frame processing into separate threads, and such an approach gave me a pre-populated queue of frames ready to be processed.
It doesn't affect inference time much, but it helps with slow drives, which are often used for video storage.
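Roughly, the reading side ends up looking like this; the input path is illustrative, and the inference/writing part is left as a comment:

from imutils.video import FileVideoStream
import time

# FileVideoStream starts a daemon thread that keeps a queue of decoded frames,
# so slow disk or network reads no longer block the detection loop.
fvs = FileVideoStream('input.mp4', queue_size=128).start()   # path is illustrative
time.sleep(1.0)    # give the reader thread a head start to pre-populate the queue

while fvs.more():
    frame = fvs.read()
    if frame is None:
        break
    # ... run inference on `frame` and write the annotated result here ...

fvs.stop()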
Optimization: Graph analysis
Another thing I had heard about was optimizing models for deployment. I followed a guide I discovered here: https://towardsdatascience.com/optimize-nvidia-gpu-performance-for-efficient-model-inference-f3e9874e9fdc and managed to achieve some improvement by assigning layers without GPU support to be processed on the CPU.
[INFO] :: Detection took 8 minutes and 39.91 seconds. Average detection time per frame: 0.86 seconds
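The core idea, as I understood it from the guide, is to inspect where each op runs and pin the ones without GPU kernels to the CPU. A rough sketch of that approach; the file name and the op filter are illustrative, not the exact ones from my graph:

import tensorflow as tf

# Load the frozen graph and rewrite the device field of selected nodes (TF 1.x)
graph_def = tf.GraphDef()
with tf.gfile.GFile('megadetector_frozen_graph.pb', 'rb') as f:   # path is an assumption
    graph_def.ParseFromString(f.read())

for node in graph_def.node:
    # Which ops to move is model-specific; NonMaxSuppression is just an example
    # of a postprocessing op that is often cheaper to keep on the CPU.
    if node.op.startswith('NonMaxSuppression'):
        node.device = '/device:CPU:0'

graph = tf.Graph()
with graph.as_default():
    tf.import_graph_def(graph_def, name='')

# allow_soft_placement lets TF fall back to CPU for any remaining conflicts
config = tf.ConfigProto(allow_soft_placement=True)
sess = tf.Session(graph=graph, config=config)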
Batch inference
Based on my previous experience, one of the bottlenecks in deep learning training is data transfer from disk to GPU; to minimize that time, so-called batches are used, where the GPU receives several images at once.
I wondered if it was possible to do the same batch processing at inference time. Luckily, it was, according to a StackOverflow answer.
I just needed to find the largest acceptable batch size and pass an array of frames for inference. For that, I extended the FileVideoStream class with batch functionality.
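A sketch of that extension (not the exact class from the project), built on top of imutils' FileVideoStream; the commented usage shows how the whole batch goes through the graph in a single sess.run call:

from imutils.video import FileVideoStream

class BatchVideoStream(FileVideoStream):
    # Thin extension that pulls up to batch_size frames off the pre-populated queue.
    def read_batch(self, batch_size):
        frames = []
        while len(frames) < batch_size and self.more():
            frame = self.read()
            if frame is None:        # some imutils versions enqueue None at end of stream
                break
            frames.append(frame)
        return frames

# Usage (illustrative):
# stream = BatchVideoStream(video_path).start()
# while True:
#     frames = stream.read_batch(batch_size=8)
#     if not frames:
#         break
#     batch = np.stack([cv.cvtColor(f, cv.COLOR_BGR2RGB) for f in frames])
#     boxes, scores = sess.run([boxes_t, scores_t], feed_dict={image_tensor: batch})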
[INFO] :: Detection took 8 minutes and 1.12 second. Average detection time per frame: 0.8 seconds
Optimization: Compiling from sources
Another important part, when we are talking about running heavy, time-consuming computations, is squeezing the most out of the hardware.
One of the most straightforward approaches is using packages optimized for your machine type. Here is a message every TensorFlow user has seen:
tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
It means that TensorFlow is underutilizing the hardware because it ignores the built-in CPU optimizations. The reason is that the generic package was installed, which has to work on any type of x86 machine.
One way of increasing performance is to install an optimized package from third parties, such as https://github.com/lakshayg/tensorflow-build, https://github.com/mind/wheels or https://github.com/yaroslavvb/tensorflow-community-wheels/issues
Another way is to follow the instructions from Google and build the package from source: https://www.tensorflow.org/install/source#tensorflow_1x. But keep in mind that this can be a bit difficult if you haven't done it before, and it is quite a time- and RAM-consuming process (last time it took 3.5 hours on my six-core CPU).
The same goes for OpenCV, but that is an even more complex topic, so I'm not covering it here. There are handy guides by Adrian Rosebrock; if you are interested in that topic, please follow them.
Share
The final piece is a small Python application that waits for incoming videos with detections and, as a video arrives, posts it to my Telegram channel. I reused my previous project, which was resending incoming videos to my Telegram channel.
The app is configured like this and continuously monitors a folder for new files using the watchdog library:
{
"xiaomi_video_watch_dir" : PATH_TO_WATCH,
"xiaomi_video_temp_dir" : PATH_TO_STORE_TEMP_FILES,
"xiaomi_video_gif_dir" : PATH_WITH_OUTPUT_GIFS,
"tg_key" : TELEGRAM_KEY
}
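A minimal sketch of the watcher itself could look like the following; the config path is an assumption, and the print call stands in for the code that converts the detection video and posts it to the channel:

import json
import time
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

class NewVideoHandler(FileSystemEventHandler):
    # Fires for every file created in the watched folder.
    def on_created(self, event):
        if event.is_directory:
            return
        # Placeholder: here the detection video gets converted and posted to Telegram
        print('New video with detections:', event.src_path)

with open('config.json') as f:           # path to the config above is an assumption
    config = json.load(f)

observer = Observer()
observer.schedule(NewVideoHandler(), config['xiaomi_video_watch_dir'], recursive=False)
observer.start()
try:
    while True:
        time.sleep(1)                    # keep the main thread alive; watchdog runs in the background
except KeyboardInterrupt:
    observer.stop()
observer.join()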
What didn't work out?
This project brought me lots of new learnings, and even though I managed to reach my final goal, I went through some failed trials. I think that is one of the most important parts of any project.
Image Enhancing
During my research, I came across several reports from iWildCam Kaggle competition participants. They quite often mentioned applying the CLAHE algorithm to input images for histogram equalization. I tried that algorithm and several others, but with no success: applying image modifications dropped the number of successful detections. To be honest, though, the night camera images did look sharper and crisper.
import cv2 as cv

# Assumed module-level setup: the white-balance object requires opencv-contrib-python,
# and the CLAHE parameters here are illustrative
wb = cv.xphoto.createSimpleWB()
clahe = cv.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))

def enhance_image(frame):
    # White-balance first, then equalize only the lightness channel in Lab space
    img_wb = wb.balanceWhite(frame)
    img_lab = cv.cvtColor(img_wb, cv.COLOR_BGR2Lab)
    l, a, b = cv.split(img_lab)
    img_l = clahe.apply(l)
    img_clahe = cv.merge((img_l, a, b))
    return cv.cvtColor(img_clahe, cv.COLOR_Lab2BGR)