These days, machine learning and computer vision are all the rage. We’ve all seen the news about self-driving cars and facial recognition and probably imagined how cool it’d be to build our own computer vision models. However, it’s not always easy to break into the field, especially without a strong math background. Libraries like PyTorch and TensorFlow can be tedious to learn if all you want to do is experiment with something small.
In this tutorial, I present a simple way for anyone to build fully-functional object detection models with just a few lines of code. More specifically, we’ll be using Detecto, a Python package built on top of PyTorch that makes the process easy and open to programmers at all levels.
Quick and easy example
To demonstrate how simple it is to use Detecto, let’s load in a pre-trained model and run inference on the following image:
First, download the Detecto package using pip:
pip3 install detecto
Then, save the image above as “fruit.jpg” and create a Python file in the same folder as the image. Inside the Python file, write these 5 lines of code:
from detecto import core, utils, visualize

image = utils.read_image('fruit.jpg')
model = core.Model()

labels, boxes, scores = model.predict_top(image)
visualize.show_labeled_image(image, boxes, labels)
After running this file (it may take a few seconds if you don’t have a CUDA-enabled GPU on your computer; more on that later), you should see something similar to the plot below:
Awesome! We did all that with just 5 lines of code. Here’s what each line does:
- Imported Detecto’s modules
- Read in an image
- Initialized a pre-trained model
- Generated the top predictions on our image
- Plotted our predictions
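Assuming “top predictions” means keeping only the best-scoring detection for each label, the idea can be sketched in plain Python. This is not Detecto’s implementation of predict_top; the helper `top_per_label` is hypothetical, and the fruit labels, boxes, and scores below are made-up example values.

```python
def top_per_label(labels, boxes, scores):
    """Keep only the highest-scoring prediction for each label.

    Hypothetical helper illustrating the idea; not Detecto's code.
    """
    best = {}  # label -> (box, score)
    for label, box, score in zip(labels, boxes, scores):
        if label not in best or score > best[label][1]:
            best[label] = (box, score)
    # Dicts keep insertion order, so labels come back in first-seen order
    top_labels = list(best)
    top_boxes = [best[label][0] for label in top_labels]
    top_scores = [best[label][1] for label in top_labels]
    return top_labels, top_boxes, top_scores


# Made-up predictions: two overlapping "apple" detections and one "banana"
labels = ['apple', 'banana', 'apple']
boxes = [[10, 20, 50, 60], [70, 15, 140, 55], [12, 22, 48, 58]]
scores = [0.91, 0.88, 0.97]
print(top_per_label(labels, boxes, scores))
# (['apple', 'banana'], [[12, 22, 48, 58], [70, 15, 140, 55]], [0.97, 0.88])
```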
Detecto uses a Faster R-CNN ResNet-50 FPN from PyTorch’s model zoo, which can detect about 80 different kinds of objects, such as animals, vehicles, and kitchen appliances. However, what if you wanted to detect custom objects, like Coke vs. Pepsi cans, or zebras vs. giraffes?
You’ll be glad to know that training a Detecto model on a custom dataset is just as easy; again, all you need is a handful of lines of code, as well as either an existing dataset or some time spent labeling images.
Building a custom dataset
In this tutorial, we’ll start from scratch by building our own dataset. I recommend that you do the same, but if you want to skip this step, you can download a sample dataset here (modified from the Stanford Dogs Dataset).
For our dataset, we’ll be training our model to detect an underwater alien, bat, and witch from the RoboSub competition, as shown below:
Ideally, you’ll want at least 100 images of each class. The good thing is that you can have multiple objects in each image, so you could theoretically get away with 100 total images if each image contains every class of object you want to detect. Also, if you have video footage, Detecto makes it easy to split that footage into images that you can then use for your dataset:
from detecto.utils import split_video

split_video('video.mp4', 'frames/', step_size=4)
The code above takes every 4th frame in “video.mp4” and saves it as a JPEG file in the “frames” folder.
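To estimate how many training images a clip will yield for a given step size, the arithmetic can be sketched as follows. This assumes counting starts at the first frame (frame 0); `frames_to_keep` is a hypothetical helper, not part of Detecto’s split_video.

```python
def frames_to_keep(total_frames, step_size=4):
    """Indices of the frames kept when saving every `step_size`-th frame.

    Hypothetical illustration of the step_size idea, assuming counting
    starts at frame 0; not Detecto's split_video implementation.
    """
    if step_size < 1:
        raise ValueError("step_size must be at least 1")
    return list(range(0, total_frames, step_size))


# A 10-second clip at 30 frames per second has 300 frames
kept = frames_to_keep(300, step_size=4)
print(len(kept), kept[:5])  # 75 [0, 4, 8, 12, 16]
```

Consecutive video frames are often near-duplicates, which is why skipping frames with a larger step size can still give a useful variety of images.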
Once you’ve produced your training dataset, you should have a folder that looks something like the following:
images/
|   image0.jpg
|   image1.jpg
|   image2.jpg
|   ...
If you want, you can also have a second folder containing a set of validation images.
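If you only have one set of images, a simple way to get a validation set is to hold out a random fraction of them. A minimal sketch, assuming a hypothetical helper `split_train_val` and a 20% validation share (both are illustrative choices, not Detecto defaults); you would then move the returned validation files into their own folder. If you split after labeling, move each image’s XML file along with it.

```python
import random


def split_train_val(filenames, val_fraction=0.2, seed=0):
    """Randomly split image filenames into (train, validation) lists."""
    files = sorted(filenames)   # sort first so the split doesn't depend on input order
    rng = random.Random(seed)   # a fixed seed makes the split reproducible
    rng.shuffle(files)
    n_val = int(len(files) * val_fraction)
    return files[n_val:], files[:n_val]


names = [f"image{i}.jpg" for i in range(10)]
train, val = split_train_val(names)
print(len(train), len(val))  # 8 2
```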
Now comes the time-consuming part: labeling. Detecto supports the PASCAL VOC format, in which you have XML files containing label and position data for each object in your images. To create these XML files, you can use the open-source LabelImg tool as follows:
pip3 install labelImg  # Download LabelImg using pip
labelImg               # Launch the application
You should now see a window pop up. On the left, click the “Open Dir” button and select the folder of images that you want to label. If things worked correctly, you should see something like this:
To draw a bounding box, click the icon in the left menu bar (or use the keyboard shortcut “w”). You can then drag a box around your objects and write/select a label:
When you’ve finished labeling an image, use CTRL+S or CMD+S to save your XML file (for simplicity and speed, you can just use the default file location and name that LabelImg auto-fills). To label the next image, click “Next Image” (or use the keyboard shortcut “d”).
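To see what a saved PASCAL VOC file holds, you can read it with Python’s standard library. A minimal sketch that parses only the fields used here (each object’s `name` and `bndbox`); the helper `read_voc_objects` is hypothetical, and the example XML, including its image size and coordinates, is made up for illustration rather than copied from a real LabelImg file.

```python
import xml.etree.ElementTree as ET


def read_voc_objects(xml_text):
    """Return (label, xmin, ymin, xmax, ymax) for every object in a VOC annotation."""
    root = ET.fromstring(xml_text)
    objects = []
    for obj in root.iter("object"):
        label = obj.findtext("name")
        box = obj.find("bndbox")
        # Coordinates are stored as text; float() accepts integer or decimal strings
        coords = [float(box.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")]
        objects.append((label, *coords))
    return objects


example = """<annotation>
  <filename>image0.jpg</filename>
  <size><width>1280</width><height>720</height><depth>3</depth></size>
  <object>
    <name>alien</name>
    <bndbox><xmin>569</xmin><ymin>204</ymin><xmax>1003</xmax><ymax>658</ymax></bndbox>
  </object>
  <object>
    <name>bat</name>
    <bndbox><xmin>276</xmin><ymin>144</ymin><xmax>580</xmax><ymax>509</ymax></bndbox>
  </object>
</annotation>"""

print(read_voc_objects(example))
# [('alien', 569.0, 204.0, 1003.0, 658.0), ('bat', 276.0, 144.0, 580.0, 509.0)]
```

For a file on disk, `ET.parse(path).getroot()` gives the same root element.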
Once you’re done with the entire dataset, your folder should look something like this:
images/
|   image0.jpg
|   image0.xml
|   image1.jpg
|   image1.xml
|   ...
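Before training, it can be worth confirming that every image in the folder actually has a matching XML file, since an unlabeled image is easy to miss. A minimal sketch, assuming annotations share the image’s file name stem (as LabelImg’s defaults do) and that images use the hypothetical set of suffixes below; `images_missing_labels` is a hypothetical helper.

```python
from pathlib import Path

# Assumed image extensions; adjust to match your dataset
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def images_missing_labels(folder):
    """Return names of image files in `folder` with no XML file of the same stem."""
    files = list(Path(folder).iterdir())
    xml_stems = {p.stem for p in files if p.suffix.lower() == ".xml"}
    return sorted(
        p.name
        for p in files
        if p.suffix.lower() in IMAGE_SUFFIXES and p.stem not in xml_stems
    )


# Example usage (assumes the "images/" folder from this tutorial exists):
# print(images_missing_labels("images/"))
```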
We’re almost ready to start training our object detection model!
Getting access to a GPU
First, check whether your computer has a CUDA-enabled GPU. Since deep learning uses a lot of processing power, training on a typical CPU can be very slow. Thankfully, most modern deep learning frameworks like PyTorch and TensorFlow can run on GPUs, making things much faster. Make sure you have PyTorch installed (you should already have it if you installed Detecto), and then run the following 2 lines of code:
import torch

print(torch.cuda.is_available())
If it prints True, great! You can skip to the next section. If it prints False, don’t fret. Follow the steps below to create a Google Colaboratory notebook, an online coding environment that comes with a free, usable GPU. For this tutorial, you’ll just be working from within a Google Drive folder rather than on your computer.
1. Log in to Google Drive
2. Create a folder called “Detecto Tutorial” and navigate into this folder
3. Upload your training images (and/or validation images) to this folder
4. Right-click, go to “More”, and click “Google Colaboratory”:
You should now see an interface like this:
5. Give your notebook a name if you want, and then go to Edit -> Notebook settings -> Hardware accelerator and select GPU
6. Type the following code to “mount” your Drive, change directory to the current folder, and install Detecto:
import os
from google.colab import drive

drive.mount('/content/drive')
os.chdir('/content/drive/My Drive/Detecto Tutorial')

!pip install detecto
You can then run !ls to check that you’re in the right directory.
Train a custom model
Finally, we can now train a model on our custom dataset! As promised, this is the easy part. All it takes is 4 lines of code:
from detecto import core, utils, visualize

dataset = core.Dataset('images/')
model = core.Model(['alien', 'bat', 'witch'])

model.fit(dataset)
Let’s again break down what we’ve done with each line of code:
- Imported Detecto’s modules
- Created a Dataset from the “images” folder (containing our JPEG and XML files)
- Initialized a model to detect our custom objects (alien, bat, and witch)
- Trained our model on the dataset
This can take anywhere from 10 minutes to over an hour to run, depending on the size of your dataset, so make sure your program doesn’t exit immediately after these statements finish (e.g., use a Jupyter or Colab notebook, which preserves state while active).
Using the trained model
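Now that we have a trained model, let’s test it on some images. To read an image from a file path, you can use the read_image function from the detecto.utils module (you could also use an image from the Dataset you created above):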
# Specify the path to your image
image = utils.read_image('images/image0.jpg')
predictions = model.predict(image)

# predictions format: (labels, boxes, scores)
labels, boxes, scores = predictions

# ['alien', 'bat', 'bat']
print(labels)

#           xmin       ymin       xmax       ymax
# tensor([[ 569.2125,  203.6702, 1003.4383,  658.1044],
#         [ 276.2478,  144.0074,  579.6044,  508.7444],
#         [ 277.2929,  162.6719,  627.9399,  511.9841]])
print(boxes)

# tensor([0.9952, 0.9837, 0.5153])
print(scores)
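The model’s predict method returns three elements: labels, boxes, and scores. In the example above, the model predicted an alien (labels[0]) at the coordinates [569, 204, 1003, 658] (boxes[0]) with a confidence level of 0.995 (scores[0]).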
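In the output above, the third prediction has a much lower score (0.515) than the other two, and its box largely overlaps the second “bat” box. A common post-processing step is to drop predictions below a confidence threshold. A minimal sketch on plain Python lists using the rounded values printed above; the helper `filter_by_score` and the 0.6 threshold are hypothetical choices for illustration (with real model output, you could first convert the box and score tensors to lists with `.tolist()`).

```python
def filter_by_score(labels, boxes, scores, threshold=0.6):
    """Keep only predictions whose confidence score is at least `threshold`."""
    kept_labels, kept_boxes, kept_scores = [], [], []
    for label, box, score in zip(labels, boxes, scores):
        if score >= threshold:
            kept_labels.append(label)
            kept_boxes.append(box)
            kept_scores.append(score)
    return kept_labels, kept_boxes, kept_scores


# Rounded values from the prediction output above; box format is [xmin, ymin, xmax, ymax]
labels = ['alien', 'bat', 'bat']
boxes = [[569, 204, 1003, 658], [276, 144, 580, 509], [277, 163, 628, 512]]
scores = [0.9952, 0.9837, 0.5153]
print(filter_by_score(labels, boxes, scores))
# (['alien', 'bat'], [[569, 204, 1003, 658], [276, 144, 580, 509]], [0.9952, 0.9837])
```

The right threshold depends on your data; a higher value trades missed detections for fewer false positives.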
To see these predictions, you can plot them using the detecto.visualize module. For example:
visualize.show_labeled_image(image, boxes, labels)
Running the above code with the image and predictions you received should produce something that looks like this:
If you have a video, you can run object detection on it:
visualize.detect_video(model, 'input.mp4', 'output.avi')
This takes in a video file called “input.mp4” and produces an “output.avi” file with the given model’s predictions. If you open this file with VLC or some other video player, you should see some promising results!
Lastly, you can save and load models from files, allowing you to save your progress and come back to it later:
model.save('model_weights.pth')

# ... Later ...

model = core.Model.load('model_weights.pth', ['alien', 'bat', 'witch'])
Advanced usage
You’ll be happy to know that Detecto isn’t limited to 5 lines of code. Say, for example, that the model didn’t do as well as you hoped. We can try to increase its performance by augmenting our dataset with torchvision transforms and defining a custom DataLoader:
from torchvision import transforms

augmentations = transforms.Compose([
    transforms.ToPILImage(),
    transforms.RandomHorizontalFlip(0.5),
    transforms.ColorJitter(saturation=0.5),
    transforms.ToTensor(),
    utils.normalize_transform(),
])

dataset = core.Dataset('images/', transform=augmentations)

loader = core.DataLoader(dataset, batch_size=2, shuffle=True)
Here, we’ve also created a DataLoader with batch_size=2; we’ll pass it to model.fit instead of the Dataset to tell our model to train on batches of 2 images rather than the default of 1.
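Geometric augmentations such as the RandomHorizontalFlip in the augmentations above move objects within the image, so each bounding box has to be mirrored along with the pixels. Whether that is handled for you depends on the library (check Detecto’s documentation for which transforms it adjusts boxes for); the arithmetic itself is short. A minimal sketch with a hypothetical helper `hflip_box` and an assumed image width of 1280 pixels:

```python
def hflip_box(box, image_width):
    """Mirror an [xmin, ymin, xmax, ymax] box for a horizontally flipped image.

    Hypothetical helper; assumes pixel x-coordinates run from 0 to image_width.
    """
    xmin, ymin, xmax, ymax = box
    # A point at x moves to image_width - x, so the old right edge becomes the new left edge
    return [image_width - xmax, ymin, image_width - xmin, ymax]


print(hflip_box([569, 204, 1003, 658], image_width=1280))
# [277, 204, 711, 658]
```

Flipping twice returns the original box, which is a handy sanity check if you write your own augmentation code.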
If you created a separate validation dataset earlier, you can pass it in during training as well. When given a validation dataset, the fit method returns a list of the losses at each epoch, and if verbose=True, it also prints them out during training. The following code block demonstrates this and customizes several other training parameters:
import matplotlib.pyplot as plt

val_dataset = core.Dataset('validation_images/')

losses = model.fit(loader, val_dataset, epochs=10, learning_rate=0.001,
                   lr_step_size=5, verbose=True)

plt.plot(losses)
plt.show()
The resulting plot of the losses should be more or less decreasing:
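The lr_step_size=5 argument in the training call above controls a step learning-rate schedule. Assuming it behaves like PyTorch’s StepLR, where the learning rate is multiplied by a decay factor gamma every lr_step_size epochs (gamma=0.1 here is an assumption, not a value stated in this tutorial), the learning rate for each of the 10 epochs can be computed with a hypothetical helper `step_lr`:

```python
def step_lr(base_lr, epoch, step_size, gamma=0.1):
    """Learning rate at `epoch` (0-indexed) under a step decay schedule.

    Hypothetical helper mirroring StepLR's rule: multiply by gamma every step_size epochs.
    """
    return base_lr * gamma ** (epoch // step_size)


# learning_rate=0.001 and lr_step_size=5, as in the training call above
for epoch in range(10):
    print(epoch, step_lr(0.001, epoch, step_size=5))
```

Under these assumptions, epochs 0 to 4 train at 0.001 and epochs 5 to 9 at 0.0001, so a visible flattening or drop in the loss curve around epoch 5 would be expected.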
For even more control, the model.get_internal_model method returns the underlying torchvision model, which you can modify as much as you see fit.
Conclusion
In this tutorial, we showed that computer vision and object detection don’t need to be challenging. All you need is a bit of time and patience to come up with a labeled dataset.
If you’re interested in further exploration, check out Detecto on GitHub or visit the documentation for more tutorials and use cases!
Previously published at https://medium.com/@alankbi/build-a-custom-trained-object-detection-model-with-5-lines-of-code-713ba7f6c0fb