A Thorough Breakdown of EfficientDet for Object Detection




In this post, we do a deep dive into the structure of EfficientDet for object detection, focusing on the model’s motivation, design, and architecture.

Recently, the Google Brain team published their EfficientDet model for object detection with the goal of crystallizing architecture decisions into a scalable framework that can be easily applied to other use cases in object detection. The paper concludes that EfficientDet outperforms similarly sized models on benchmark datasets. At Roboflow, we found that the base EfficientDet model generalizes well to custom datasets hosted on our platform. (For an in-depth tutorial on implementing EfficientDet, please see this blog post on how to train EfficientDet and this Colab Notebook on how to train EfficientDet.)

In this blog post, we explore the rationale behind the decisions that were made in forming the final EfficientDet model, how EfficientDet works, and how EfficientDet compares to popular object detection models like YOLOv3, Faster R-CNN, and MobileNet.

Exploring the rationale inspiring EfficientDet’s creation.

Challenges in Deep Learning EfficientDet Addresses

Before exploring the model, here are some key areas that have prevented object detection systems from being deployed to real-life use cases.

  1. Data Collection — With model architecture and pretrained checkpoints, EfficientDet cuts down on the amount of data required to generalize to a new domain.
  2. Model Design and Hyper Parameterization — Once the data has been collected, machine learning engineers need to carefully set up the model design and tune a number of hyper parameters.
  3. Training Time — the amount of time required to train the model on the gathered dataset. In the EfficientDet paper, compute cost is measured in FLOPs (floating point operations).
  4. Memory Footprint — Once the model is trained, how much memory is required to store the model weights when called upon for inference?
  5. Inference Time — When the model is invoked, can it perform predictions quick enough to be used in a production setting?

Image Features and CNNs

The first step in training an object detection model is to translate the pixels of an image into features that can be fed through a neural network. Major progress has been made in computer vision by using convolutional neural networks to create learnable features from an image. Convolutional neural networks mix and pool image features at different levels of granularity, allowing the model a choice of possible combinations to focus on when learning the detection task at hand. However, the exact manner in which a convolutional neural network (ConvNet) creates features has been a keen area of interest in the research community for some time. ConvNet releases have included ResNet, NASNet, YOLOv3, Inception, DenseNet, and more, and each has sought to increase detection performance by scaling ConvNet model size and tweaking the ConvNet design. These ConvNet models are released at multiple scales, so programmers can deploy a larger model to improve performance if their resources allow.

EfficientNet: Motivation and Design

Recently, the Google Brain team released their own ConvNet model called EfficientNet. EfficientNet forms the backbone of the EfficientDet architecture, so we will cover its design before continuing to the contributions of EfficientDet. EfficientNet set out to study the scaling process of ConvNet architectures. It turns out there are many ways to add more parameters to a ConvNet.

You can make each layer wider, you can make the number of layers deeper, you can input images at a higher resolution, or you can make a combination of these improvements. As you can imagine, exploring all of these possibilities can be quite tedious for machine learning researchers. EfficientNet set out to define an automatic procedure for scaling ConvNet model architectures. The paper seeks to optimize downstream performance given free rein over depth, width, and resolution, while staying within target memory and FLOPs constraints. They find that their scaling methodology improves previous ConvNets as well as their own EfficientNet architecture.
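The joint scaling rule the EfficientNet paper lands on can be sketched in a few lines. The constants below (alpha, beta, gamma) are the values the paper reports from its grid search; a real implementation also rounds the resulting layer and channel counts, which is omitted here for clarity.

```python
# Sketch of EfficientNet's compound scaling rule. The constants come from
# the EfficientNet paper's small grid search, chosen so that
# alpha * beta**2 * gamma**2 ~= 2, i.e. FLOPs roughly double per step of phi.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15

def compound_scale(phi):
    """Return (depth, width, resolution) multipliers for scaling coefficient phi."""
    depth = ALPHA ** phi        # more layers
    width = BETA ** phi         # wider layers (more channels)
    resolution = GAMMA ** phi   # larger input images
    return depth, width, resolution

# Scaling from B0 (phi=0) toward B1 (phi=1):
d, w, r = compound_scale(1)
print(f"depth x{d:.2f}, width x{w:.2f}, resolution x{r:.2f}")
# -> depth x1.20, width x1.10, resolution x1.15
```

A single coefficient phi then controls the whole family: rather than tuning three knobs independently, you dial up phi until you hit your memory or FLOPs budget.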

EfficientNet Network, Scaling and Evaluation

Creating a new model scaling technique is a big step forward, and the authors take their findings a step further by creating a new ConvNet architecture that pushes their state-of-the-art results even higher. The new model architecture is discovered through neural architecture search, which optimizes for accuracy under a fixed FLOPs budget and results in a baseline ConvNet called EfficientNet-B0. Using the scaling search, EfficientNet-B0 is scaled up to EfficientNet-B1. The scaling function from EfficientNet-B0 to EfficientNet-B1 is saved and applied to subsequent scalings through EfficientNet-B7, because additional search becomes prohibitively expensive.

The new family of EfficientNet networks is evaluated on the ImageNet leaderboard, which is an image classification task. Note: there have been some improvements since the original release of EfficientNet, including tweaks that fix the train-test resolution discrepancy and self-training that deploys EfficientNet as both teacher and student.

Pretty sweet! The EfficientNet looks like a good backbone to build upon. It efficiently scales with model size and outperforms other ConvNet backbones.

So far, we have covered the EfficientNet backbone, the first portion of the EfficientDet network:

Introducing EfficientDet

Now, we will move on to the contributions of EfficientDet, which seeks to answer the following questions: how exactly should we combine the features of ConvNets for object detection? And how should we scale our model’s architecture once we have developed this combination process?

EfficientDet Feature Fusion

Feature fusion seeks to combine representations of a given image at different resolutions. Typically, the fusion uses the last few feature layers from the ConvNet, but the exact neural architecture may vary.

In the above image, FPN is a baseline way to fuse features with a top-down flow. PANet allows the feature fusion to flow both backwards and forwards, from smaller to larger resolution and back. NAS-FPN is a feature fusion technique that was discovered through neural architecture search, and it certainly does not look like the first design one might think of. The EfficientDet paper uses “intuition” (and presumably many, many development sets) to edit the structure of NAS-FPN and settle on the BiFPN, a bidirectional feature pyramid network. The EfficientDet model stacks these BiFPN blocks on top of each other; the number of blocks varies in the model scaling procedure. Additionally, the authors hypothesize that different input features and feature channels contribute unequally to the final prediction, so they attach a learnable scalar weight to each input feature at fusion time.
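The weighted fusion step can be sketched as follows. This is the “fast normalized fusion” variant described in the EfficientDet paper: weights are clamped non-negative with a ReLU and normalized by their sum rather than a softmax. A real implementation operates on tensors with learnable parameters; plain Python lists are used here to keep the sketch dependency-free.

```python
# Minimal sketch of BiFPN's "fast normalized fusion": each input feature
# gets a learnable scalar weight; weights are kept non-negative via ReLU
# and normalized to sum to ~1 before the weighted sum.
def fast_normalized_fusion(features, weights, eps=1e-4):
    """features: equal-length lists (flattened feature maps);
    weights: one learnable scalar per input feature."""
    w = [max(x, 0.0) for x in weights]        # ReLU keeps each weight >= 0
    total = sum(w) + eps                      # eps avoids division by zero
    norm = [x / total for x in w]             # normalize (cheaper than softmax)
    return [sum(n * f[i] for n, f in zip(norm, features))
            for i in range(len(features[0]))]

# Fuse two 4-element feature maps with learned weights 2.0 and 1.0:
p_top_down = [1.0, 1.0, 1.0, 1.0]   # e.g. an upsampled coarser feature
p_lateral = [0.0, 0.0, 0.0, 0.0]    # e.g. the same-level lateral feature
fused = fast_normalized_fusion([p_top_down, p_lateral], [2.0, 1.0])
```

Because the weights are learned, the network can decide per fusion node how much each resolution should contribute, at far lower cost than a softmax over every channel.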

EfficientDet Model Scaling

Previous work on model scaling for object detection generally scaled portions of the network independently. For example, ResNet-based detectors scale only the size of the backbone network. A joint scaling function had not yet been explored, and EfficientDet’s approach here is very reminiscent of the joint scaling work done to create EfficientNet.

The authors set up a scaling problem to vary the size of the backbone network, the BiFPN network, the class/box network, and the input resolution. The backbone network scales up directly with the pretrained checkpoints of EfficientNet-B0 through EfficientNet-B6. The BiFPN network’s width and depth are varied along with the number of BiFPN stacks.
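The resulting compound scaling can be sketched as a small config generator. The formulas below follow the scaling rules reported in the EfficientDet paper; the released code additionally rounds widths to hardware-friendly values, which is omitted here.

```python
# Sketch of EfficientDet's compound scaling rules for coefficient phi
# (phi=0 is EfficientDet-D0, phi=6 is D6), per the formulas in the paper.
# The official implementation rounds channel widths; this sketch does not.
def efficientdet_config(phi):
    return {
        "backbone": f"EfficientNet-B{phi}",       # pretrained backbone checkpoint
        "bifpn_width": int(64 * (1.35 ** phi)),   # channels per BiFPN layer
        "bifpn_depth": 3 + phi,                   # number of stacked BiFPN blocks
        "box_class_depth": 3 + phi // 3,          # layers in the class/box heads
        "input_resolution": 512 + phi * 128,      # input image size
    }

cfg = efficientdet_config(0)   # EfficientDet-D0
# -> {'backbone': 'EfficientNet-B0', 'bifpn_width': 64, 'bifpn_depth': 3,
#     'box_class_depth': 3, 'input_resolution': 512}
```

As with EfficientNet, one coefficient moves every part of the detector in lockstep, so choosing a model size is a single decision rather than four.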

EfficientDet Model Evaluation and Discussion

The EfficientDet model is evaluated on the COCO (Common Objects in Context) dataset, which contains 80 annotated object classes across more than 100,000 images. COCO is considered the general-purpose challenge for object detection: if a model performs well in this general domain, it will likely do well on more specific tasks. EfficientDet outperforms previous object detection models under a number of constraints. Below, we look at the performance of the model as a function of FLOPs.

Here we can see that the model does quite well relative to other model families under similar constraints. The authors also evaluate the model on semantic segmentation on Pascal VOC and find that they achieve state-of-the-art results there too. Hey, why not?

Simply, Why is EfficientDet Useful?

Taking a step back from the implementation details, it is pretty incredible to think about what the open-sourced checkpoints of EfficientDet mean for computer vision engineers. The pretrained checkpoints of EfficientNet crystallize all of the findings and automation that the researchers at Google Brain put into building a ConvNet, along with all of the supervision that image classification on ImageNet can provide. The EfficientNet checkpoints are further leveraged with feature fusion, and all components of the architecture are efficiently scaled. Finally, these model weights are pretrained on COCO, a generalized object detection dataset. As a user, there are few decisions left up to question beyond the type of data to provide the model.

Nice Breakdown — How Do I Use EfficientDet?

At Roboflow, we have provided a tutorial in this blog post on how to train EfficientDet and this Colab Notebook on how to train EfficientDet. Through Roboflow, you can host your dataset with annotations, feed a new data download link into our example, and get results. After training, the notebook exports the trained weights for deployment to an application!

Roboflow is free to use, and you can host your images and annotations for up to 1GB of data.

Stay in Touch

If you have some cool results with EfficientDet and want to share, drop us a line!

