Build a Custom-Trained Object Detection Model With 5 Lines of Code

Category: IT Technology · Published: 4 years ago

These days, machine learning and computer vision are all the rage. We’ve all seen the news about self-driving cars and facial recognition and probably imagined how cool it’d be to build our own computer vision models. However, it’s not always easy to break into the field, especially without a strong math background. Libraries like PyTorch and TensorFlow can be tedious to learn if all you want to do is experiment with something small.

In this tutorial, I present a simple way for anyone to build fully-functional object detection models with just a few lines of code. More specifically, we’ll be using Detecto, a Python package built on top of PyTorch that makes the process easy and open to programmers at all levels.

Quick and easy example

To demonstrate how simple it is to use Detecto, let’s load in a pre-trained model and run inference on the following image:

[Image: the sample photo to save as fruit.jpg]

First, install the Detecto package using pip:

pip3 install detecto

Then, save the image above as “fruit.jpg” and create a Python file in the same folder as the image. Inside the Python file, write these 5 lines of code:

from detecto import core, utils, visualize

image = utils.read_image('fruit.jpg')
model = core.Model()

labels, boxes, scores = model.predict_top(image)
visualize.show_labeled_image(image, boxes, labels)

After running this file (it may take a few seconds if you don’t have a CUDA-enabled GPU on your computer; more on that later), you should see something similar to the plot below:

[Image: the sample photo with the pre-trained model’s labeled bounding boxes]

Awesome! We did all that with just 5 lines of code. Here’s what we did in each:

  1. Imported Detecto’s modules
  2. Read in an image
  3. Initialized a pre-trained model
  4. Generated the top predictions on our image
  5. Plotted our predictions
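
Step 4 relies on predict_top, which Detecto’s documentation describes as returning the top prediction for each label in the image. The plain-Python sketch below illustrates that idea on made-up detections; it is not Detecto’s implementation, and top_per_label is a hypothetical name.

```python
# Illustrative sketch (not Detecto's source): keep only the highest-scoring
# detection for each label, the behaviour documented for predict_top.
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def top_per_label(labels: List[str], boxes: List[Box], scores: List[float]):
    """Return (labels, boxes, scores) with one detection per label."""
    best = {}  # label -> index of its highest-scoring detection so far
    for i, (label, score) in enumerate(zip(labels, scores)):
        if label not in best or score > scores[best[label]]:
            best[label] = i
    keep = sorted(best.values())  # preserve the original detection order
    return (
        [labels[i] for i in keep],
        [boxes[i] for i in keep],
        [scores[i] for i in keep],
    )


# Made-up detections for illustration only.
print(top_per_label(
    ["apple", "orange", "apple"],
    [(0, 0, 10, 10), (5, 5, 20, 20), (30, 30, 40, 40)],
    [0.91, 0.88, 0.97],
))
# -> (['orange', 'apple'], [(5, 5, 20, 20), (30, 30, 40, 40)], [0.88, 0.97])
```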

Detecto uses a Faster R-CNN ResNet-50 FPN from PyTorch’s model zoo, which is able to detect about 80 different kinds of objects, such as animals, vehicles, and kitchen appliances. However, what if you wanted to detect custom objects, like Coke vs. Pepsi cans, or zebras vs. giraffes?

You’ll be glad to know that training a Detecto model on a custom dataset is just as easy; again, all you need is 5 lines of code, as well as either an existing dataset or some time spent labeling images.

Building a custom dataset

In this tutorial, we’ll start from scratch by building our own dataset. I recommend that you do the same, but if you want to skip this step, you can download a sample dataset here (modified from Stanford’s Dog Dataset).

For our dataset, we’ll be training our model to detect an underwater alien, bat, and witch from the RoboSub competition, as shown below:

[Image: the RoboSub alien, bat, and witch targets]

Ideally, you’ll want at least 100 images of each class. The good thing is that you can have multiple objects in each image, so you could theoretically get away with 100 total images if each image contains every class of object you want to detect. Also, if you have video footage, Detecto makes it easy to split that footage into images that you can then use for your dataset:

from detecto.utils import split_video

split_video('video.mp4', 'frames/', step_size=4)

The code above takes every 4th frame in “video.mp4” and saves it as a JPEG file in the “frames” folder.
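
The sampling rule is easy to reason about: with step_size=4, one frame in four is kept. The sketch below assumes counting starts at the first frame (index 0); sampled_frame_indices is a hypothetical helper, not part of Detecto. By this rule, a 60-second clip at 30 frames per second (1,800 frames) would give you 450 images.

```python
# Hypothetical sketch of the sampling rule: keep every step_size-th frame,
# assuming counting starts at frame 0.
from typing import List


def sampled_frame_indices(total_frames: int, step_size: int) -> List[int]:
    """Indices of the frames that would be saved as images."""
    if step_size < 1:
        raise ValueError("step_size must be at least 1")
    return list(range(0, total_frames, step_size))


print(sampled_frame_indices(10, 4))          # -> [0, 4, 8]
print(len(sampled_frame_indices(1800, 4)))   # -> 450
```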

Once you’ve produced your training dataset, you should have a folder that looks something like the following:

images/
|   image0.jpg
|   image1.jpg
|   image2.jpg
|   ...

If you want, you can also have a second folder containing a set of validation images.
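
If you’d rather carve a validation set out of your existing images than collect new ones, something like the following works. It’s a sketch with a hypothetical helper name (split_validation), an assumed 20% split, and the validation_images/ folder name used later in this tutorial. It moves each chosen image together with its same-named .xml file if one exists, so you can run it before or after labeling.

```python
# Hypothetical helper (not part of Detecto): move a random fraction of the
# .jpg images into a validation folder, carrying along matching .xml files.
import random
import shutil
from pathlib import Path
from typing import List


def split_validation(image_dir: str, val_dir: str,
                     val_fraction: float = 0.2, seed: int = 0) -> List[str]:
    """Move about val_fraction of the images to val_dir; return their names."""
    src, dst = Path(image_dir), Path(val_dir)
    dst.mkdir(parents=True, exist_ok=True)
    images = sorted(src.glob("*.jpg"))
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    chosen = rng.sample(images, int(len(images) * val_fraction))
    for img in chosen:
        shutil.move(str(img), str(dst / img.name))
        xml = img.with_suffix(".xml")
        if xml.exists():  # keep each image and its annotation together
            shutil.move(str(xml), str(dst / xml.name))
    return sorted(p.name for p in chosen)


# Example (back up your images first): split_validation("images/", "validation_images/")
```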

Now comes the time-consuming part: labeling. Detecto supports the PASCAL VOC format, in which you have XML files containing label and position data for each object in your images. To create these XML files, you can use the open-source LabelImg tool as follows:

pip3 install labelImg    # Download LabelImg using pip
labelImg                 # Launch the application

You should now see a window pop up. On the left, click the “Open Dir” button and select the folder of images that you want to label. If things worked correctly, you should see something like this:

[Image: LabelImg with the image folder opened]

To draw a bounding box, click the icon in the left menu bar (or use the keyboard shortcut “w”). You can then drag a box around your objects and write/select a label:

[Image: drawing and labeling a bounding box in LabelImg]

When you’ve finished labeling an image, use CTRL+S or CMD+S to save your XML file (for simplicity and speed, you can just use the default file location and name that LabelImg auto-fills). To label the next image, click “Next Image” (or use the keyboard shortcut “d”).
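
Because LabelImg’s default file name mirrors the image name (image0.jpg gets image0.xml), it’s easy to spot images you skipped. The helper below is a hypothetical sketch, assuming the XML files are saved in the same folder as the .jpg images.

```python
# Hypothetical helper: list .jpg images that have no same-named .xml file yet.
from pathlib import Path
from typing import List


def unlabeled_images(image_dir: str) -> List[str]:
    """Names of images in image_dir without a matching PASCAL VOC .xml file."""
    folder = Path(image_dir)
    return sorted(
        img.name
        for img in folder.glob("*.jpg")
        if not img.with_suffix(".xml").exists()
    )


# Example: print(unlabeled_images("images/"))
```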

Once you’re done with the entire dataset, your folder should look something like this:

images/
|   image0.jpg
|   image0.xml
|   image1.jpg
|   image1.xml
|   ...
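
For reference, each XML file is a small PASCAL VOC document: one object element per box, holding a name (the label) and a bndbox with xmin, ymin, xmax, and ymax in pixels. The sketch below reads one with Python’s standard library; the sample annotation is hand-written with made-up values, not produced from a real image, and parse_voc is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Hand-written sample annotation (made-up values) in the PASCAL VOC layout.
SAMPLE_XML = """<annotation>
  <filename>image0.jpg</filename>
  <size><width>1280</width><height>720</height><depth>3</depth></size>
  <object>
    <name>alien</name>
    <bndbox><xmin>569</xmin><ymin>204</ymin><xmax>1003</xmax><ymax>658</ymax></bndbox>
  </object>
  <object>
    <name>bat</name>
    <bndbox><xmin>276</xmin><ymin>144</ymin><xmax>580</xmax><ymax>509</ymax></bndbox>
  </object>
</annotation>"""


def parse_voc(root: ET.Element) -> Tuple[str, List[Tuple[str, Tuple[int, ...]]]]:
    """Return (filename, [(label, (xmin, ymin, xmax, ymax)), ...])."""
    objects = []
    for obj in root.iter("object"):
        bndbox = obj.find("bndbox")
        # VOC coordinates are usually integers; parse via float to be lenient.
        box = tuple(int(float(bndbox.findtext(tag)))
                    for tag in ("xmin", "ymin", "xmax", "ymax"))
        objects.append((obj.findtext("name"), box))
    return root.findtext("filename"), objects


print(parse_voc(ET.fromstring(SAMPLE_XML)))
# -> ('image0.jpg', [('alien', (569, 204, 1003, 658)), ('bat', (276, 144, 580, 509))])
# For a file on disk: parse_voc(ET.parse("images/image0.xml").getroot())
```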

We’re almost ready to start training our object detection model!

Getting access to a GPU

First, check whether your computer has a CUDA-enabled GPU. Since deep learning uses a lot of processing power, training on a typical CPU can be very slow. Thankfully, most modern deep learning frameworks like PyTorch and TensorFlow can run on GPUs, making things much faster. Make sure you have PyTorch installed (you should already have it if you installed Detecto), and then run the following 2 lines of code:

import torch

print(torch.cuda.is_available())

If it prints True, great! You can skip to the next section. If it prints False, don’t fret. Follow the steps below to create a Google Colaboratory notebook, an online coding environment that comes with a free, usable GPU. For this tutorial, you’ll just be working from within a Google Drive folder rather than on your computer.

1. Log in to Google Drive

2. Create a folder called “Detecto Tutorial” and navigate into this folder

3. Upload your training images (and/or validation images) to this folder

4. Right-click, go to “More”, and click “Google Colaboratory”:

[Image: the Google Drive right-click menu, More -> Google Colaboratory]

You should now see an interface like this:

[Image: a new Google Colaboratory notebook]

5. Give your notebook a name if you want, and then go to Edit -> Notebook settings -> Hardware accelerator and select GPU

6. Type the following code to “mount” your Drive, change directory to the current folder, and install Detecto:

import os
from google.colab import drive

drive.mount('/content/drive')

os.chdir('/content/drive/My Drive/Detecto Tutorial')

!pip install detecto

To make sure everything worked, you can create a new code cell and type !ls to check that you’re in the right directory.

Train a custom model

Finally, we can now train a model on our custom dataset! As promised, this is the easy part. All it takes is 4 lines of code:

from detecto import core, utils, visualize

dataset = core.Dataset('images/')
model = core.Model(['alien', 'bat', 'witch'])

model.fit(dataset)

Let’s again break down what we’ve done with each line of code:

  1. Imported Detecto’s modules
  2. Created a Dataset from the “images” folder (containing our JPEG and XML files)
  3. Initialized a model to detect our custom objects (alien, bat, and witch)
  4. Trained our model on the dataset
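
As far as we can tell, Detecto matches the label names in your XML files against the class list passed to core.Model, so a stray typo in LabelImg (say, Bat instead of bat) is worth catching before a long training run. The check below is a hypothetical helper, not a Detecto API: it counts annotated objects per label, lists labels missing from the class list, and lists classes with no examples at all.

```python
# Hypothetical pre-flight check: compare the labels used in the VOC XML files
# with the class list you intend to pass to core.Model.
import xml.etree.ElementTree as ET
from collections import Counter
from pathlib import Path
from typing import List, Tuple


def label_report(folder: str, classes: List[str]) -> Tuple[Counter, List[str], List[str]]:
    """Return (counts per label, labels not in classes, classes never used)."""
    counts = Counter()
    for xml_path in sorted(Path(folder).glob("*.xml")):
        root = ET.parse(xml_path).getroot()
        for obj in root.iter("object"):
            counts[obj.findtext("name", default="")] += 1
    unknown = sorted(label for label in counts if label not in classes)
    missing = [c for c in classes if counts[c] == 0]  # lookup does not add keys
    return counts, unknown, missing


# Example (assumes the labeled images/ folder from above):
# counts, unknown, missing = label_report("images/", ["alien", "bat", "witch"])
# print(counts, unknown, missing)
```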

Training can take anywhere from 10 minutes to over an hour, depending on the size of your dataset. Because the trained model only lives in memory, make sure your program doesn’t exit as soon as these statements finish; for example, use a Jupyter or Colab notebook, which preserves state while it’s active.

Using the trained model

Now that you have a trained model, let’s test it on some images. To read images from a file path, you can use the read_image function from the detecto.utils module (you could also use an image from the Dataset you created above):

# Specify the path to your image
image = utils.read_image('images/image0.jpg')
predictions = model.predict(image)

# predictions format: (labels, boxes, scores)
labels, boxes, scores = predictions

# ['alien', 'bat', 'bat']
print(labels) 

#           xmin       ymin       xmax       ymax
# tensor([[ 569.2125,  203.6702, 1003.4383,  658.1044],
#         [ 276.2478,  144.0074,  579.6044,  508.7444],
#         [ 277.2929,  162.6719,  627.9399,  511.9841]])
print(boxes)

# tensor([0.9952, 0.9837, 0.5153])
print(scores)

As you can see, the model’s predict method returns a tuple of 3 elements: labels, boxes, and scores. In the above example, the model predicted an alien (labels[0]) at the coordinates [569, 204, 1003, 658] (boxes[0]) with a confidence level of 0.995 (scores[0]).

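
Before plotting, you may want to filter the raw output. In the example above, the third box (a bat at 0.515) sits almost on top of the second one: their intersection over union (IoU) works out to about 0.81, which suggests a duplicate detection of the same object. The sketch below applies an arbitrary 0.6 confidence threshold and computes IoU in plain Python, using the numbers from the example output; with real predictions, you’d convert the tensors first (for example with boxes.tolist() and scores.tolist()). The helper names are hypothetical.

```python
# Sketch: post-process (labels, boxes, scores) in plain Python.
# Values copied from the example output above; the 0.6 threshold is arbitrary.
from typing import Sequence

Box = Sequence[float]  # (xmin, ymin, xmax, ymax) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_by_score(labels, boxes, scores, threshold: float = 0.6):
    """Keep only detections whose confidence is at least `threshold`."""
    keep = [i for i, s in enumerate(scores) if s >= threshold]
    return [labels[i] for i in keep], [boxes[i] for i in keep], [scores[i] for i in keep]


labels = ["alien", "bat", "bat"]
boxes = [
    [569.2125, 203.6702, 1003.4383, 658.1044],
    [276.2478, 144.0074, 579.6044, 508.7444],
    [277.2929, 162.6719, 627.9399, 511.9841],
]
scores = [0.9952, 0.9837, 0.5153]

print(filter_by_score(labels, boxes, scores))  # drops the 0.5153 "bat"
print(round(iou(boxes[1], boxes[2]), 2))       # the two "bat" boxes overlap heavily
```
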
From these predictions, we can plot the results using the detecto.visualize module. For example:

visualize.show_labeled_image(image, boxes, labels)

Running the above code with the image and predictions you received should produce something that looks like this:

[Image: a test image with the custom model’s labeled predictions]

If you have a video, you can run object detection on it:

visualize.detect_video(model, 'input.mp4', 'output.avi')

This takes in a video file called “input.mp4” and produces an “output.avi” file with the given model’s predictions. If you open this file with VLC or some other video player, you should see some promising results!

Lastly, you can save and load models from files, allowing you to save your progress and come back to it later:

model.save('model_weights.pth')

# ... Later ...

model = core.Model.load('model_weights.pth', ['alien', 'bat', 'witch'])
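
Since the class list has to be supplied again when loading, and the names correspond to the model’s output indices, it’s safest to reuse exactly the same list in the same order. One option, sketched below with hypothetical helper names, is to store the list in a small JSON file next to the weights.

```python
# Hypothetical helpers (not part of Detecto): keep the class list in a JSON
# "sidecar" file next to the saved weights.
import json
from pathlib import Path
from typing import List


def _classes_path(weights_path: str) -> Path:
    # model_weights.pth -> model_weights.classes.json (same folder)
    p = Path(weights_path)
    return p.parent / (p.stem + ".classes.json")


def save_classes(weights_path: str, classes: List[str]) -> Path:
    """Write the class list next to the weights file and return its path."""
    path = _classes_path(weights_path)
    path.write_text(json.dumps(classes))
    return path


def load_classes(weights_path: str) -> List[str]:
    """Read back the class list written by save_classes."""
    return json.loads(_classes_path(weights_path).read_text())
```

Usage would then be model.save('model_weights.pth') followed by save_classes('model_weights.pth', ['alien', 'bat', 'witch']), and later core.Model.load('model_weights.pth', load_classes('model_weights.pth')).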

Advanced usage

You’ll be happy to know that Detecto isn’t limited to 5 lines of code. Let’s say, for example, that the model didn’t do as well as you hoped. We can try to increase its performance by augmenting our dataset with torchvision transforms and defining a custom DataLoader:

from torchvision import transforms

augmentations = transforms.Compose([
    transforms.ToPILImage(),
    transforms.RandomHorizontalFlip(0.5),
    transforms.ColorJitter(saturation=0.5),
    transforms.ToTensor(),
    utils.normalize_transform(),
])

dataset = core.Dataset('images/', transform=augmentations)

loader = core.DataLoader(dataset, batch_size=2, shuffle=True)

This code applies random horizontal flips and saturation changes to images in our dataset, increasing the diversity of our data. We then define a DataLoader object with batch_size=2; we’ll pass this to model.fit instead of the Dataset to tell our model to train on batches of 2 images rather than the default of 1.

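
Note that a horizontal flip moves the objects in the image, so their bounding boxes have to be mirrored too (ColorJitter only changes pixel colors, so it leaves boxes alone). Detecto’s Dataset appears to handle this for RandomHorizontalFlip, but it’s worth confirming in the documentation for your version. The sketch below just shows the arithmetic, with hflip_box as a hypothetical helper.

```python
# Sketch of the box arithmetic behind a horizontal flip: x-coordinates are
# mirrored around the image width, y-coordinates are unchanged.
from typing import Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def hflip_box(box: Box, image_width: float) -> Box:
    xmin, ymin, xmax, ymax = box
    # The old right edge becomes the new left edge, so xmin and xmax swap roles.
    return (image_width - xmax, ymin, image_width - xmin, ymax)


print(hflip_box((10, 20, 50, 80), 200))  # -> (150, 20, 190, 80)
```
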
If you created a separate validation dataset earlier, now is the time to load it in during training. By providing a validation dataset, the fit method returns a list of the losses at each epoch, and if verbose=True, then it will also print these out during the training process itself. The following code block demonstrates this as well as customizes several other training parameters:

import matplotlib.pyplot as plt

val_dataset = core.Dataset('validation_images/')

losses = model.fit(loader, val_dataset, epochs=10, learning_rate=0.001, 
                   lr_step_size=5, verbose=True)
                   
plt.plot(losses)
plt.show()

The resulting plot of the losses should be more or less decreasing:

[Image: plot of the validation losses per epoch]
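
The lr_step_size parameter controls how often the learning rate is reduced. Assuming it behaves like PyTorch’s StepLR scheduler, stepped once per epoch with a decay factor (gamma) of 0.1, which we believe is Detecto’s default, the fit call above trains for 5 epochs at 0.001 and then 5 epochs at 0.0001. The sketch below computes that schedule; step_lr_schedule is a hypothetical name.

```python
# Sketch of step decay, assuming lr_step_size behaves like PyTorch's StepLR
# stepped once per epoch, with an assumed gamma (decay factor) of 0.1.
from typing import List


def step_lr_schedule(base_lr: float, epochs: int, step_size: int,
                     gamma: float = 0.1) -> List[float]:
    """Learning rate used in each epoch (epoch 0 first)."""
    return [base_lr * gamma ** (epoch // step_size) for epoch in range(epochs)]


# Mirrors the fit call above: epochs=10, learning_rate=0.001, lr_step_size=5.
for epoch, lr in enumerate(step_lr_schedule(0.001, 10, 5)):
    print(f"epoch {epoch}: lr = {lr:g}")
```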

For even more flexibility and control over your model, you can bypass Detecto altogether; the model.get_internal_model method returns the underlying torchvision model used, which you can mess around with as much as you see fit.

Conclusion

In this tutorial, we showed that computer vision and object detection don’t need to be challenging. All you need is a bit of time and patience to come up with a labeled dataset.

If you’re interested in further exploration, check out Detecto on GitHub or visit the documentation for more tutorials and use cases!

Previously published at https://medium.com/@alankbi/build-a-custom-trained-object-detection-model-with-5-lines-of-code-713ba7f6c0fb
