Everyone can use deep learning now

How transfer learning changed applied machine learning

Jun 18 · 6 min read

A year ago, a few of us started working on Cortex , an open source platform for building machine learning APIs. At the outset, we assumed all of our users—and all of the companies actually applying ML in production, for that matter—would be large companies with mature data science teams.

We were wrong.

Over the last year, we’ve seen students, solo engineers, and small teams ship models to production. Just as surprisingly, these users frequently deploy large, state of the art deep learning models for use in real world applications.

A team of two, for example, recently spun up a 500 GPU inference cluster to support their application’s 10,000 concurrent users.

Not long ago, this kind of thing only happened at companies with large budgets and lots of data. Now, any team can do it. This shift is the result of many factors, but one in particular stands out: transfer learning.

What is transfer learning?

Let’s start with a high-level explanation of transfer learning (if you’re already familiar, feel free to skip ahead to the next section).

Broadly speaking, transfer learning refers to techniques for “transferring” the knowledge of a deep neural network trained for one task to a different network, trained for a related task.

For example, someone might use transfer learning to take a model trained for object detection, and “fine tune” it to detect something more specific—like hot dogs—using a small amount of data.

These techniques work because of the architecture of deep neural nets. The lower layers of a network are responsible for more basic knowledge, while more task-specific knowledge is typically contained at the top layers:

[Figure: a deep neural network's layers, with general features in the lower layers and task-specific features in the top layers. Source: Stanford]

With the lower layers already trained, the higher layers can be fine tuned with far less data. An object detection model like YOLOv4, for example, can be fine tuned to recognize something specific, like license plates, with a very small dataset (one such demo was fine tuned with fewer than 1,000 images).
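The mechanics look roughly like this. As a minimal sketch (using torchvision's pretrained ResNet-50 as a stand-in backbone and a hypothetical two-class task, not the YOLOv4 demo mentioned above), you freeze the pretrained layers and train only a new head:

```python
import torch
import torch.nn as nn
from torchvision import models

# Load a network pretrained on ImageNet (ResNet-50 as a stand-in backbone).
model = models.resnet50(pretrained=True)

# Freeze the lower layers -- they already encode general visual features.
for param in model.parameters():
    param.requires_grad = False

# Swap in a new task-specific head (hypothetical 2-class task, e.g. plate / no plate).
model.fc = nn.Linear(model.fc.in_features, 2)

# Only the new head's parameters are trained, so a small dataset goes a long way.
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
```

Because only the small head is learned from scratch, the data and compute requirements drop dramatically compared to training the whole network.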

The techniques for transferring knowledge between networks vary, but recently, there have been many new projects aimed at making this simpler. gpt-2-simple, for example, is a library that allows anyone to fine tune GPT-2 and generate predictions with a few Python functions:

https://gist.github.com/caleb-kaiser/dd40d16647b1e4cda7545837ea961272
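The basic workflow, roughly as documented in the library's README, looks like this (the model name, dataset path, and prompt below are illustrative):

```python
import gpt_2_simple as gpt2

# Download a pretrained GPT-2 checkpoint (the small 124M-parameter model here).
gpt2.download_gpt2(model_name="124M")

# Fine tune it on a plain-text corpus (path is illustrative).
sess = gpt2.start_tf_sess()
gpt2.finetune(sess, dataset="my_corpus.txt", model_name="124M", steps=1000)

# Generate text from the fine tuned model.
gpt2.generate(sess, prefix="Once upon a time", length=100)
```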

How transfer learning unblocks machine learning

Most teams aren’t blocked from using machine learning because of a lack of knowledge. If you’re building an image classifier, for example, there are many well-known models that accomplish the task, and modern frameworks make it fairly straightforward to train one.

For most teams, machine learning is never considered a realistic option because of its cost.

Let’s use GPT-2, the (until recently) best-of-its-kind language model from OpenAI, as an example.

GPT-2's training cost alone is estimated to be over $40,000, assuming you use a public cloud. Beyond that cloud bill, GPT-2 was trained on 40 GB of text (over 20 million pages, conservatively). Scraping and wrangling that much text is a massive project in and of itself.

For most teams, this puts training their own GPT-2 out of reach. But what if you fine tuned it? Let’s look at a project that did that.

AI Dungeon is a choose-your-own-adventure game, styled after old command line dungeon crawlers. The game works about as you’d expect—the player inputs commands, and the game responds by advancing their adventure, except in this case, the adventure is written by a GPT-2 model trained to write choose-your-own-adventure texts.

AI Dungeon was developed by a single engineer, Nick Walton, who fine tuned GPT-2 with gpt-2-simple and text scraped from chooseyourstory.com. According to Walton, fine tuning GPT-2 took 30 MB of text and about 12 hours of training time on a DGX-1 — roughly $374.62 on AWS’s equivalent instance type, the p3dn.24xlarge.
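(That figure checks out: the p3dn.24xlarge ran at roughly $31.22 per hour on demand, and 12 hours × $31.22/hour ≈ $375.)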

While $40,000 in cloud bills and 40 GB of scraped text might be beyond most teams, $375 and 30 MB is doable even for the smallest projects.

And the applications of transfer learning go beyond language models. In drug discovery, there often isn’t enough data on particular diseases to train a model from scratch. DeepScreening is a free platform that solves this problem, allowing users to upload their own datasets, fine tune a model, and then use it to screen libraries of compounds for potential interactions.

Training a model like this from scratch would be beyond the resources of most individual researchers, but because of transfer learning, it is suddenly accessible to everyone.

The new generation of deep learning models relies on transfer learning

It’s important to note here that though my examples so far have focused on the economic benefits, transfer learning isn’t just a scrappy tool for small teams. Teams of all sizes use transfer learning to train deep learning models. In fact, new models are being released specifically for transfer learning.

For example, OpenAI recently released GPT-3, the appropriately named successor to GPT-2. The initial demos are impressive:

[Figure: GPT-3 demo. Source: OpenAI]

Remember that when GPT-2 was first released, its raw size generated headlines. A 1.5 billion parameter model was unheard of. GPT-3, however, dwarfs GPT-2, clocking in at 175 billion parameters.

Training a 175 billion parameter language model is beyond the scope of just about every company besides OpenAI. Even deploying a model that large is questionable. So, OpenAI broke their tradition of releasing open source, pretrained versions of new models, and instead released GPT-3 as an API—which, of course, enables users to fine tune GPT-3 with their own data.

In other words, GPT-3 is so large that transfer learning isn’t just an economical way to train it for new tasks, it is the only way.

This transfer-learning-first approach is becoming increasingly common. Google just released Big Transfer, an open source repository of state of the art computer vision models. While computer vision models have typically remained smaller than their language model counterparts, they’re starting to catch up — the pretrained ResNet-152x4 was trained on 14 million images and is 4.1 GB.

As the name suggests, Big Transfer was built to encourage the use of transfer learning with these models. As part of the repository, Google has also provided the code to easily fine tune each model.
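The pretrained weights are also published on TensorFlow Hub, so fine tuning can be as simple as bolting a new head onto the hub module. A minimal sketch follows (the hub handle, class count, and hyperparameters are assumptions; check tfhub.dev and the Big Transfer repository for the exact model names and the recommended settings):

```python
import tensorflow as tf
import tensorflow_hub as hub

NUM_CLASSES = 10  # the downstream task's classes

# Pretrained BiT backbone from TF Hub (handle assumed; see tfhub.dev/google for exact names).
backbone = hub.KerasLayer("https://tfhub.dev/google/bit/m-r50x1/1", trainable=True)

# New zero-initialized head for the downstream task.
model = tf.keras.Sequential([
    backbone,
    tf.keras.layers.Dense(NUM_CLASSES, kernel_initializer="zeros"),
])

model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=0.003, momentum=0.9),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)

# model.fit(train_ds, epochs=...)  # fine tune on a small labeled dataset
```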

As the chart below shows, models are only getting bigger over time (GPT-3, were it charted here, would increase the chart’s size 10x):

[Figure: parameter counts of recent deep learning models over time. Source: Microsoft]

If this trend continues, and there are no signs that it won’t, transfer learning will be the primary way teams use cutting edge deep learning.

Designing a platform to handle massive models

We’re biased, but when we look at charts like the above, we immediately think “How are we going to deploy this?”

As models have gotten bigger, and as transfer learning has made them accessible to every team, the number of huge deep learning models going into production has shot up. Serving these models is a challenge—they require quite a bit of space and memory just to serve inference, and they typically can’t handle many requests at once.

Already, we’ve introduced major features to Cortex specifically because of these models (GPU/ASIC inference, request-based autoscaling, spot instance support), and we’re constantly working on more as models get bigger.
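To give a flavor of what serving one of these fine tuned models looks like, here is a rough sketch of a Cortex Python predictor (the exact interface varies by Cortex version, and the model loading code is illustrative):

```python
# predictor.py -- sketch only; the interface may differ across Cortex versions
import gpt_2_simple as gpt2

class PythonPredictor:
    def __init__(self, config):
        # Load the fine tuned GPT-2 checkpoint once, when the API starts up.
        self.sess = gpt2.start_tf_sess()
        gpt2.load_gpt2(self.sess)

    def predict(self, payload):
        # Generate a continuation for the prompt sent in the request body.
        return gpt2.generate(
            self.sess, prefix=payload["text"], length=100, return_as_list=True
        )[0]
```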

Still, the difficulty of the infrastructure challenges is minuscule compared to the potential of a world in which every engineer can solve problems using state of the art deep learning.

