You don’t need “Big Data” to apply deep learning


You don’t always have to train new models from scratch

Mar 21 · 4 min read

Source: Pexels

Disclaimer: The following is based on my observations of machine learning teams — not an academic survey of the industry. For context, I’m a contributor to Cortex, an open source platform for deploying models in production.

For years, the biggest bottleneck to production deep learning was simple: we needed models that worked. And over the last decade—thanks to companies with access to unprecedented amounts of data and computer power, as well as new model architectures—we’ve largely cleared that hurdle.

We may not have fully autonomous vehicles or Blade Runner-esque AI, but when you call an Uber, you get an accurate ETA prediction. When you open an email in Gmail, you get a contextually appropriate suggestion from Smart Compose. When you turn on Netflix, you get a personalized curation of movies and shows.

As an unintended consequence of this development, people commonly believe that to use deep learning, you need the same resources as Google or Netflix: teams of researchers, endless funding, and tons of data.

This isn’t true. Even in niches like natural language processing and computer vision, there are often ways to make a small amount of data work.

You don’t need to train a better model than Google

Let me be clear: in pretty much all domains, training a state-of-the-art deep learning model from scratch requires a massive amount of data. For example, state-of-the-art language models like OpenAI’s GPT-2 require tens of gigabytes of text and weeks of training.

What many miss, however, is that training a model from scratch is often unnecessary. A few months ago, an ML-driven choose-your-own-adventure game called AI Dungeon went viral:

The game generates state-of-the-art responses because it is built on a state-of-the-art model. But instead of being built by a team of researchers with gigabytes of data, the AI Dungeon model is the product of one engineer who trained a model on 30 MB of text for about 12 hours.

Nick Walton, the creator of AI Dungeon, was able to create such a powerful model precisely because he didn’t build it from scratch. Instead, he took OpenAI’s GPT-2 and fine-tuned it on his own data.

This process, transfer learning, is one of several techniques for leveraging the “knowledge” of an existing neural network to train a new one more efficiently. Transfer learning works because the lower layers in a neural network are responsible for identifying more primitive features—in a computer vision model, they’d recognize things like edges, colors, and contours, for example. This knowledge is typically applicable to many domains, not just the one the model was originally trained for.

In the case of AI Dungeon, this meant that Walton could take GPT-2’s general understanding of English and fine-tune it to the choose-your-own-adventure genre with a comparatively small dataset.
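To make the idea concrete, here is a minimal sketch of that kind of fine-tuning using the open source Hugging Face transformers and datasets libraries. It is an illustrative example, not Walton’s actual training code; the corpus file adventures.txt, the output directory, and the hyperparameters are placeholders.

```python
# Illustrative sketch only (not the AI Dungeon training code). Assumes the
# Hugging Face `transformers` and `datasets` libraries; "adventures.txt" is a
# placeholder for a small, domain-specific corpus.
from datasets import load_dataset
from transformers import (DataCollatorForLanguageModeling, GPT2LMHeadModel,
                          GPT2TokenizerFast, Trainer, TrainingArguments)

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token          # GPT-2 has no pad token by default
model = GPT2LMHeadModel.from_pretrained("gpt2")    # start from the pretrained weights

# Load and tokenize a few megabytes of text -- no "Big Data" required.
dataset = load_dataset("text", data_files={"train": "adventures.txt"})
tokenized = dataset["train"].map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="gpt2-adventure", num_train_epochs=3,
                           per_device_train_batch_size=2),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),  # causal LM objective
)
trainer.train()
```

The heavy lifting — the billions of tokens and the weeks of compute — already went into the pretrained weights; the fine-tuning step only has to nudge them toward the new genre.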

Anecdotally, many Cortex users who deploy deep learning models in their products use this same approach to train their models. Robert Lucian, who recently built a popular DIY license plate identifier, used a similar approach: he deployed a computer vision model (YOLOv3) that had been fine-tuned on a small number of license plate images, and it worked.
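The same pattern works for vision. Here is an illustrative sketch (not Lucian’s actual pipeline): take a backbone pretrained on ImageNet, freeze the lower layers that already know about edges and contours, and retrain only a small head on your own images. The data/plates folder and the two-class setup are assumptions for the example.

```python
# Illustrative transfer-learning sketch for vision (not the license plate project).
# Assumes torchvision; "data/plates" is a placeholder folder with one
# subdirectory per class.
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)  # pretrained on ImageNet
for param in model.parameters():
    param.requires_grad = False                  # keep the pretrained lower layers
model.fc = nn.Linear(model.fc.in_features, 2)    # new head, e.g. "plate" vs "no plate"

data = datasets.ImageFolder(
    "data/plates",
    transform=transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],  # ImageNet statistics
                             std=[0.229, 0.224, 0.225]),
    ]),
)
loader = torch.utils.data.DataLoader(data, batch_size=16, shuffle=True)

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)  # only the head is trained
loss_fn = nn.CrossEntropyLoss()

model.train()
for epoch in range(5):
    for images, labels in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()
        optimizer.step()
```

A few hundred labeled images is often enough for this kind of head-only retraining, because the frozen layers already encode the generic visual features described above.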

A whole wave of new ML-native products are being launched on top of deep learning models that were not originally designed or trained by the companies using them. Instead, as in most areas of software, companies are building on top of open source technologies to create things they do not have the resources to build from scratch.

You can build entire products with deep learning and “small data”

If your knee-jerk reaction to deep learning is “We don’t have enough data,” I invite you to reconsider. So many applications of deep learning—recommendation engines, image parsers, conversational agents, sentiment analyzers, and more—can be built on top of open source, pre-trained models with a small amount of data.
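In the simplest cases you don’t even need to fine-tune anything. A sketch, assuming the Hugging Face transformers library, where a pretrained sentiment model is used off the shelf with zero training data:

```python
# Minimal sketch: a pretrained sentiment analyzer with no training data at all.
# Assumes the Hugging Face `transformers` library; the pipeline downloads a
# default pretrained model the first time it runs.
from transformers import pipeline

classifier = pipeline("sentiment-analysis")
print(classifier("The fine-tuned model worked far better than we expected."))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```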

As a matter of fact, there’s now an entire industry of transfer-learning-as-a-service platforms that allow you to upload data and fine-tune models:

  • Rasa creates contextual AI assistants by fine-tuning language models.
  • Owkin allows doctors to fine-tune models with their own medical images.
  • TwentyBN is a computer vision-focused platform that allows users to fine-tune models to their own domains.

As the field continues to mature, it is only going to get easier to use deep learning with small amounts of data. At the same time, expertise will always be needed—but you can learn about deep learning (or hire someone who is already experienced) much more easily than you can build a massive proprietary dataset.

