You don’t need “Big Data” to apply deep learning


You don’t always have to train new models from scratch

Mar 21 · 4 min read

Source: Pexels

Disclaimer: The following is based on my observations of machine learning teams — not an academic survey of the industry. For context, I’m a contributor to Cortex, an open source platform for deploying models in production.

For years, the biggest bottleneck to production deep learning was simple: we needed models that worked. And over the last decade—thanks to companies with access to unprecedented amounts of data and computer power, as well as new model architectures—we’ve largely cleared that hurdle.

We may not have fully autonomous vehicles or Blade Runner-esque AI, but when you call an Uber, you get an accurate ETA prediction. When you open an email in Gmail, you get a contextually appropriate suggestion from Smart Compose. When you turn on Netflix, you get a personalized curation of movies and shows.

As an unintended consequence of this development, people commonly believe that to use deep learning, you need the same resources as Google or Netflix: teams of researchers, endless funding, and tons of data.

This isn’t true. Even in niches like natural language processing and computer vision, there are often ways to make a small amount of data work.

You don’t need to train a better model than Google

Let me be clear: in pretty much all domains, training a state-of-the-art deep learning model from scratch requires a massive amount of data. For example, state-of-the-art language models like OpenAI’s GPT-2 require tens of gigabytes of text and weeks of training.

What many miss, however, is that training a model from scratch is often unnecessary. A few months ago, an ML-driven choose-your-own-adventure game called AI Dungeon went viral:

The game generates state-of-the-art responses because it is built on a state-of-the-art model. But instead of being built by a team of researchers with gigabytes of data, the AI Dungeon model is the product of one engineer who trained a model on 30 MB of text for about 12 hours.

Nick Walton, the creator of AI Dungeon, was able to create such a powerful model precisely because he didn’t build it from scratch. Instead, he took OpenAI’s GPT-2 and fine-tuned it on his own data.

This process, transfer learning, is one of several techniques for leveraging the “knowledge” of an existing neural network to train a new one more efficiently. Transfer learning works because the lower layers in a neural network are responsible for identifying more primitive features—in a computer vision model, they’d recognize things like edges, colors, and contours, for example. This knowledge is typically applicable to many domains, not just the one the model was originally trained for.

In the case of AI Dungeon, this meant that Walton could take GPT-2’s general understanding of English and fine-tune it to the choose-your-own-adventure genre with a comparatively small dataset.
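To make the idea concrete, here is a minimal sketch of that kind of fine-tuning using the open source Hugging Face transformers and datasets libraries. It is an illustrative example, not Walton’s actual training code; the corpus file adventures.txt, the output directory, and the hyperparameters are placeholders.

```python
# Illustrative sketch only (not the AI Dungeon training code). Assumes the
# Hugging Face `transformers` and `datasets` libraries; "adventures.txt" is a
# placeholder for a small, domain-specific corpus.
from datasets import load_dataset
from transformers import (DataCollatorForLanguageModeling, GPT2LMHeadModel,
                          GPT2TokenizerFast, Trainer, TrainingArguments)

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token          # GPT-2 has no pad token by default
model = GPT2LMHeadModel.from_pretrained("gpt2")    # start from the pretrained weights

# Load and tokenize a few megabytes of text -- no "Big Data" required.
dataset = load_dataset("text", data_files={"train": "adventures.txt"})
tokenized = dataset["train"].map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="gpt2-adventure", num_train_epochs=3,
                           per_device_train_batch_size=2),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),  # causal LM objective
)
trainer.train()
```

The heavy lifting — the billions of tokens and the weeks of compute — already went into the pretrained weights; the fine-tuning step only has to nudge them toward the new genre.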

Anecdotally, many Cortex users who deploy deep learning models in their products use this same approach to train their models. Robert Lucian, who recently built a popular DIY license plate identifier, used a similar approach: he deployed a computer vision model (YOLOv3) that had been fine-tuned on a small number of license plate images, and it worked.
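The same pattern works for vision. Here is an illustrative sketch (not Lucian’s actual pipeline): take a backbone pretrained on ImageNet, freeze the lower layers that already know about edges and contours, and retrain only a small head on your own images. The data/plates folder and the two-class setup are assumptions for the example.

```python
# Illustrative transfer-learning sketch for vision (not the license plate project).
# Assumes torchvision; "data/plates" is a placeholder folder with one
# subdirectory per class.
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)  # pretrained on ImageNet
for param in model.parameters():
    param.requires_grad = False                  # keep the pretrained lower layers
model.fc = nn.Linear(model.fc.in_features, 2)    # new head, e.g. "plate" vs "no plate"

data = datasets.ImageFolder(
    "data/plates",
    transform=transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],  # ImageNet statistics
                             std=[0.229, 0.224, 0.225]),
    ]),
)
loader = torch.utils.data.DataLoader(data, batch_size=16, shuffle=True)

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)  # only the head is trained
loss_fn = nn.CrossEntropyLoss()

model.train()
for epoch in range(5):
    for images, labels in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()
        optimizer.step()
```

A few hundred labeled images is often enough for this kind of head-only retraining, because the frozen layers already encode the generic visual features described above.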

A whole wave of new ML-native products are being launched on top of deep learning models that were not originally designed or trained by the companies using them. Instead, as in most areas of software, companies are building on top of open source technologies to create things they do not have the resources to build from scratch.

You can build entire products with deep learning and “small data”

If your knee-jerk reaction to deep learning is “We don’t have enough data,” I invite you to reconsider. So many applications of deep learning—recommendation engines, image parsers, conversational agents, sentiment analyzers, and more—can be built on top of open source, pre-trained models with a small amount of data.
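In the simplest cases you don’t even need to fine-tune anything. A sketch, assuming the Hugging Face transformers library, where a pretrained sentiment model is used off the shelf with zero training data:

```python
# Minimal sketch: a pretrained sentiment analyzer with no training data at all.
# Assumes the Hugging Face `transformers` library; the pipeline downloads a
# default pretrained model the first time it runs.
from transformers import pipeline

classifier = pipeline("sentiment-analysis")
print(classifier("The fine-tuned model worked far better than we expected."))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```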

As a matter of fact, there’s now an entire industry of transfer-learning-as-a-service platforms that allow you to upload data and fine-tune models:

  • Rasa creates contextual AI assistants by fine-tuning language models.
  • Owkin allows doctors to fine-tune models with their own medical images.
  • TwentyBN is a computer vision-focused platform that allows users to fine-tune models to their own domains.

As the field continues to mature, it is only going to get easier to use deep learning with small amounts of data. At the same time, expertise will always be needed—but you can learn about deep learning (or hire someone who is already experienced) much more easily than you can build a massive proprietary dataset.

