Image Segmentation Using Mask R-CNN
A simple tutorial to perform instance segmentation using Python and OpenCV
Jul 12 ·6min read
Computer Vision as a field of research has seen a lot of development in recent years. Ever since the introduction of Convolutional Neural Networks, the state of the art in domains such as classification, object detection and image segmentation has constantly been challenged. With the aid of sophisticated hardware providing very high computational power, these neural network models are being deployed in real time in emerging fields such as Autonomous Navigation.
Our topic of focus today is a sub-field of Computer Vision known as Image Segmentation. To be more precise, we’ll be performing Instance Segmentation on an image or video. Okay, that was a lot of technical jargon. Let’s reel it back a bit and understand what those terms mean. Each of these would require a post in itself, so we won’t dive too deep into them now; I might write separate articles on each of them in the near future. For now, let’s understand what Image Segmentation is, in a simplified manner. Also, if you do not know what Object Detection is, I suggest you read up on it, as it will make the upcoming concepts easier to understand. I have also written a concise article on implementing an Object Detection algorithm. Should you be interested in it, you can find it on my profile.
What is Image Segmentation?
Image segmentation is the process of classifying each pixel in the image as belonging to a specific category. Though there are several types of image segmentation methods, the two types of segmentation that are predominant when it comes to the domain of Deep Learning are:
- Semantic Segmentation
- Instance Segmentation
Let me draw a concise comparison between the two. In Semantic Segmentation, every pixel belonging to a particular class is given the same label/color value, no matter which object it is part of. In Instance Segmentation, on the other hand, each individual object of a class gets its own label/color value, so two objects of the same class can be told apart. Take a look at the image below and read the previous two sentences once again to understand it clearly. I hope it makes sense now :)
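To make the difference concrete, here is a tiny NumPy sketch. It is only an illustration on a made-up 4x6 grid with two objects of the same class; the class id and the variable names are hypothetical and not taken from the project code.

```python
import numpy as np

# Toy 4x6 "image" containing two objects of the same class (say, two books).
book_a = np.zeros((4, 6), dtype=bool)
book_a[1:3, 0:2] = True
book_b = np.zeros((4, 6), dtype=bool)
book_b[1:3, 4:6] = True

BOOK_CLASS_ID = 1  # hypothetical class id, 0 is treated as background

# Semantic segmentation: every "book" pixel gets the same class label.
semantic = np.zeros((4, 6), dtype=np.int32)
semantic[book_a | book_b] = BOOK_CLASS_ID

# Instance segmentation: each book gets its own instance id.
instance = np.zeros((4, 6), dtype=np.int32)
for instance_id, mask in enumerate([book_a, book_b], start=1):
    instance[mask] = instance_id

print(semantic)
print(instance)
```

In the semantic map both books share one value, while in the instance map they carry different values even though they belong to the same class.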
Implementation!
Mask R-CNN (Mask Region-based Convolutional Neural Network) is an Instance Segmentation model. In this tutorial, we’ll see how to implement it in Python with the help of the OpenCV library. If you are interested in learning more about the inner workings of this model, I’ve given a few links in the references section down below. They will help you understand how these models work in great detail.
We begin by cloning (or downloading) the given repository.
Make sure you have all the dependencies listed in requirements.txt installed in your Python environment. After that, don’t forget to run the command python setup.py install. We’ll be using a model pre-trained on the COCO dataset. The weights can be downloaded from here, and the class names can be obtained from the coco_classes.txt file in my repository. Now let’s start by creating a file called mask.py within the cloned or downloaded repository and importing the required libraries.
We will be using our own custom config class, which inherits from the existing CocoConfig class and overrides the values of its variables. Note that you can set these values according to the capability of your GPU.
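A minimal sketch of such a config is given here. The import path for CocoConfig is an assumption (it matches the layout of Matterport's samples folder when the repository root is on sys.path), the fallback class is a stand-in so the snippet runs on its own, and the name InferenceConfig and the chosen values are illustrative rather than the ones used in the project.

```python
try:
    # Assumption: the repository root is on sys.path, so Matterport's sample
    # COCO config can be imported like this. Adjust to your repo layout.
    from samples.coco.coco import CocoConfig
except ImportError:
    class CocoConfig:
        """Stand-in with a few of the real fields, so this sketch runs alone."""
        NAME = "coco"
        GPU_COUNT = 1
        IMAGES_PER_GPU = 2
        NUM_CLASSES = 1 + 80  # background + 80 COCO classes
        DETECTION_MIN_CONFIDENCE = 0.7


class InferenceConfig(CocoConfig):
    # Batch size is GPU_COUNT * IMAGES_PER_GPU. Keeping it at 1 lets us
    # pass a single image to detect() at a time.
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1
    # Detections scoring below this confidence are dropped.
    DETECTION_MIN_CONFIDENCE = 0.7
```

If your GPU has plenty of memory and you want to process several images per call, IMAGES_PER_GPU is the value to raise; the list of images handed to detect() then has to match the resulting batch size.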
Now we’ll create a function called prepare_mrcnn_model() that takes care of reading the class labels as well as initializing the model object according to our custom config. We also specify a color-to-object mapping so that semantic segmentation can be implemented too (if required). We’ll talk about this in a bit. The mode of our model should be set to ‘inference’, as we are going to use it directly for testing. The path to the pre-trained weights is supplied so that they can be loaded into the model.
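One possible shape for this function is sketched here; it is not the exact code from the repository. The helper names load_class_names and make_class_colors, the argument names and the default model_dir are assumptions made for illustration. The sketch also assumes the class file lists one label per line in class-id order. The MaskRCNN constructor and load_weights call follow Matterport's mrcnn package.

```python
import colorsys
import random


def load_class_names(path):
    """Read one class label per line, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def make_class_colors(num_classes, seed=42):
    """Return one distinct RGB colour (0-255 ints) per class."""
    hsv = [(i / num_classes, 1.0, 1.0) for i in range(num_classes)]
    colors = [tuple(int(c * 255) for c in colorsys.hsv_to_rgb(*h)) for h in hsv]
    random.Random(seed).shuffle(colors)  # seeded, so colours stay stable between runs
    return colors


def prepare_mrcnn_model(class_path, weights_path, config, model_dir="logs"):
    # Imported lazily so the helpers above work without TensorFlow installed.
    from mrcnn import model as modellib

    class_names = load_class_names(class_path)
    class_colors = make_class_colors(len(class_names))  # index = class id
    model = modellib.MaskRCNN(mode="inference", config=config, model_dir=model_dir)
    model.load_weights(weights_path, by_name=True)
    return model, class_names, class_colors
```

Keeping the colours in a list indexed by class id means the same class always maps to the same colour, which is what the semantic segmentation option needs.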
The next function is the most crucial part of this tutorial. Don’t be scared by its length; it’s the simplest one yet! We pass the test image to the detect function of our model object, which performs all the detection and segmentation of objects for us. We now have two options: we can either assign every object of a class the same color, or assign every object a distinct color irrespective of its class. Since this post is about implementing instance segmentation, I chose the latter. I have given you the flexibility to do both so that you can strengthen your understanding of the difference. This can be done by toggling the instance_segmentation parameter.
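The colouring choice can be sketched as follows. This covers only the mask-blending step, not the whole function from the repository, and the function name color_masks and its arguments are hypothetical. It relies on the result layout returned by Matterport's detect(): a dict whose "masks" entry is a boolean array of shape (H, W, N) and whose "class_ids" entry has length N.

```python
import numpy as np


def color_masks(image, result, class_colors, instance_colors,
                instance_segmentation=True, alpha=0.5):
    """Blend every detected mask into a copy of `image` and return it."""
    output = image.astype(np.float32)
    masks, class_ids = result["masks"], result["class_ids"]
    for i in range(masks.shape[-1]):
        if instance_segmentation:
            # One colour per detected object, regardless of its class.
            color = instance_colors[i % len(instance_colors)]
        else:
            # One colour per class: all objects of a class look the same.
            color = class_colors[int(class_ids[i])]
        mask = masks[:, :, i].astype(bool)
        color = np.asarray(color, dtype=np.float32)
        output[mask] = (1 - alpha) * output[mask] + alpha * color
    return output.astype(np.uint8)
```

With a loaded model, the result would typically come from something like model.detect([rgb_image], verbose=0)[0], since detect() takes a list of images and returns one result dict per image.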
This function draws the bounding box of each object while also segmenting it at the pixel level. It also shows the class the object belongs to along with its confidence score. I’ve also given you the option to visualize the output using the built-in visualization function in Matterport’s MRCNN implementation. This can be enabled using the mrcnn_visualize boolean parameter.
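For the box and label part, a rough sketch with OpenCV could look like this. The function names, the label format and the fixed green colour are assumptions for illustration. The "rois" entries are read as (y1, x1, y2, x2), which is how Matterport's detect() reports boxes.

```python
def format_label(class_name, score):
    """Build the text drawn next to a box, e.g. the class and its score."""
    return f"{class_name}: {score:.2f}"


def draw_detections(image, result, class_names, color=(0, 255, 0)):
    """Draw a box and a class/score label for every detection, in place."""
    import cv2  # imported lazily; requires the opencv-python package

    for (y1, x1, y2, x2), class_id, score in zip(
            result["rois"], result["class_ids"], result["scores"]):
        top_left, bottom_right = (int(x1), int(y1)), (int(x2), int(y2))
        cv2.rectangle(image, top_left, bottom_right, color, 2)
        label = format_label(class_names[int(class_id)], float(score))
        # Keep the text inside the image when the box touches the top edge.
        text_origin = (top_left[0], max(top_left[1] - 5, 10))
        cv2.putText(image, label, text_origin,
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 1)
    return image
```

The built-in alternative mentioned above is display_instances in mrcnn.visualize, which takes the image, the boxes, the masks, the class ids, the class names and optionally the scores, and renders them with matplotlib.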
Now that we have our model and the code to process the output it produces, we only need to read the input image and pass it in. This is done by the perform_inference_image() function given below. The save_enable parameter allows us to save the processed image with the objects segmented.
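A hedged sketch of such a function is shown here; the signature is a guess for illustration and may differ from the one in the repository. The segmented_path helper, its "_segmented" suffix and the process_output callback (any function that takes the image and the detection result and returns the annotated image) are hypothetical.

```python
import os


def segmented_path(image_path, suffix="_segmented"):
    """Return the output file name, e.g. test.jpg becomes test_segmented.jpg."""
    root, ext = os.path.splitext(image_path)
    return f"{root}{suffix}{ext}"


def perform_inference_image(image_path, model, process_output, save_enable=False):
    import cv2  # imported lazily; requires the opencv-python package

    bgr = cv2.imread(image_path)
    if bgr is None:  # imread returns None instead of raising on failure
        raise FileNotFoundError(f"Could not read image: {image_path}")
    # OpenCV loads images as BGR, while the model is normally fed RGB.
    rgb = cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB)
    result = model.detect([rgb], verbose=0)[0]
    output = process_output(bgr, result)
    if save_enable:
        cv2.imwrite(segmented_path(image_path), output)
    return output
```

Writing the result next to the input under a different name keeps the original image untouched, which is handy when you want to compare the two.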
This can also be easily extended to video and live webcam feeds, as shown by the perform_inference_video() function here.
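The idea behind the video version is simply a loop over frames. Here is a minimal sketch of that loop, written against the read() and release() interface of cv2.VideoCapture and the write() and release() interface of cv2.VideoWriter. The function name run_on_video_stream and its arguments are hypothetical and stand in for the repository's own function.

```python
def run_on_video_stream(capture, process_frame, writer=None, max_frames=None):
    """Apply `process_frame` to each frame from `capture`; return the frame count.

    `capture` must offer read() -> (ok, frame) and release(), like
    cv2.VideoCapture. `writer`, if given, must offer write(frame) and
    release(), like cv2.VideoWriter.
    """
    count = 0
    try:
        while max_frames is None or count < max_frames:
            ok, frame = capture.read()
            if not ok:  # end of the file, or the camera stopped delivering frames
                break
            output = process_frame(frame)
            if writer is not None:
                writer.write(output)
            count += 1
    finally:
        # Always free the camera or file handle, even if processing fails.
        capture.release()
        if writer is not None:
            writer.release()
    return count
```

For a video file you would pass cv2.VideoCapture with the file path, and for a webcam cv2.VideoCapture(0); process_frame would then wrap the detection and drawing steps for a single frame.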
As you can see in the image above, every object is segmented. The model detects multiple instances of the class book and hence assigns each instance a separate color.
That is it guys! You now have a working instance segmentation pipeline to put to use. The entire code for this project, as well as a clean and easy-to-use interface, can be found in my repository given here. Additional steps to use it can also be found there. I do realize that some of you might not have a CUDA-compatible GPU, or any GPU at all, so I have also provided a Colab notebook that can be used to run the code. Integrating webcam usage in Colab still eludes me, and hence inference can only be done on images and video files for now. Matterport’s Mask R-CNN code supports TensorFlow 1.x by default. If you are using TF 2.x, you are better off forking/cloning my repository directly, as I have ported the code to support TF 2.x.
I suggest that you read up on the R-CNN architectures (especially Faster R-CNN) to completely understand the working of Mask R-CNN. I’ve given links to some good resources in the references section down below. Also, having a sound understanding of how Object Detection works would make it easy to comprehend the crux of Image Segmentation.
~~ SowmiyaNarayanan G
Mind Bytes:-
“The secret of getting ahead is getting started.” — Mark Twain.
PS:-
Feel free to reach out to me if you have any doubts; I am here to help. I am open to any sort of criticism that helps me improve my work so that I can better cater to the needs of explorers such as yourself in the future. Don’t hesitate to let me know what you think. You can also connect with me on LinkedIn.
References:-
- The original ‘Mask R-CNN’ paper — https://arxiv.org/pdf/1703.06870.pdf
- Matterport’s Mask R-CNN Repository (super important)
- R-CNN, Fast R-CNN, Faster R-CNN, YOLO — Object Detection Algorithms
- The code to my YOLOv3 implementation