How transfer learning changed applied machine learning
A year ago, a few of us started working on Cortex, an open source platform for building machine learning APIs. At the outset, we assumed all of our users—and all of the companies actually applying ML in production, for that matter—would be large companies with mature data science teams.
We were wrong.
Over the last year, we’ve seen students, solo engineers, and small teams ship models to production. Just as surprisingly, these users frequently deploy large, state-of-the-art deep learning models for use in real-world applications.
A team of two, for example, recently spun up a 500 GPU inference cluster to support their application’s 10,000 concurrent users.
Not long ago, this kind of thing only happened at companies with large budgets and lots of data. Now, any team can do it. The shift is the result of many different factors, but one in particular, transfer learning, stands out.
What is transfer learning?
Let’s start with a high-level explanation of transfer learning (if you’re already familiar, feel free to skip ahead to the next section).
Broadly speaking, transfer learning refers to techniques for “transferring” the knowledge of a deep neural network trained for one task to a different network, trained for a related task.
For example, someone might use transfer learning to take a model trained for object detection, and “fine tune” it to detect something more specific—like hot dogs—using a small amount of data.
These techniques work because of the architecture of deep neural nets. The lower layers of a network learn general, reusable features, while more task-specific knowledge is typically concentrated in the top layers.
Because the lower layers are already trained, the higher layers can be fine tuned with much less data. An object detection model like YOLOv4, for example, can be fine tuned to recognize something specific, like license plates, with a very small dataset (fewer than 1,000 images).
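To make that pattern concrete, here is a minimal sketch in PyTorch. It uses a pretrained torchvision ResNet-50 classifier rather than YOLOv4 (detection fine tuning has more moving parts), and the class count and learning rate are placeholder choices, not a recommended recipe:

```python
import torch
import torch.nn as nn
from torchvision import models

# Load a network pretrained on ImageNet; its lower layers already encode
# general visual features (edges, textures, shapes).
model = models.resnet50(pretrained=True)

# Freeze the pretrained layers so the small dataset only has to teach
# the task-specific part of the network.
for param in model.parameters():
    param.requires_grad = False

# Replace the final classification layer with one sized for the new task,
# e.g. "license plate" vs. "no license plate" (placeholder class count).
num_classes = 2
model.fc = nn.Linear(model.fc.in_features, num_classes)

# Only the new layer's parameters are passed to the optimizer.
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# A standard training loop over the small labeled dataset would go here:
# forward pass, criterion(outputs, labels), backward pass, optimizer.step().
```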
The techniques for transferring knowledge between networks vary, but recently, there have been many new projects aimed at making the process simpler. gpt-2-simple, for example, is a library that allows anyone to fine tune GPT-2 and generate predictions with a few Python functions:
https://gist.github.com/caleb-kaiser/dd40d16647b1e4cda7545837ea961272
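The gist above is the authoritative example; as a rough sketch of the same workflow, fine tuning and generating with gpt-2-simple looks something like the following (the dataset file name and step count are placeholders):

```python
import gpt_2_simple as gpt2

# Download the 124M-parameter GPT-2 checkpoint (the smallest public model).
gpt2.download_gpt2(model_name="124M")

sess = gpt2.start_tf_sess()

# Fine tune on your own corpus; "corpus.txt" stands in for whatever
# plain-text dataset you've collected.
gpt2.finetune(sess, dataset="corpus.txt", model_name="124M", steps=1000)

# Generate text from the fine tuned model.
gpt2.generate(sess)
```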
How transfer learning unblocks machine learning
Most teams aren’t blocked from using machine learning because of a lack of knowledge. If you’re building an image classifier, for example, there are many well-known models that accomplish the task, and modern frameworks make it fairly straightforward to train one.
For most teams, machine learning is never considered a realistic option because of its cost.
Let’s use GPT-2, the (until recently) best-of-its-kind language model from OpenAI, as an example.
GPT-2's training cost alone is estimated to be over $40,000, assuming you use a public cloud. Beyond that cloud bill, GPT-2 was trained on 40 GB of text (over 20 million pages, conservatively). Scraping and wrangling that much text is a massive project in and of itself.
For most teams, this puts training their own GPT-2 out of reach. But what if you fine tuned it? Let’s look at a project that did that.
AI Dungeon is a choose-your-own-adventure game, styled after old command line dungeon crawlers. The game works about as you’d expect—the player inputs commands, and the game responds by advancing their adventure, except in this case, the adventure is written by a GPT-2 model trained to write choose-your-own-adventure texts.
AI Dungeon was developed by a single engineer, Nick Walton, who fine tuned GPT-2 with gpt-2-simple and text scraped from chooseyourstory.com. According to Walton, fine tuning GPT-2 took 30 MB of text and about 12 hours of training time on a DGX-1, or roughly $374.62 on AWS’s equivalent instance type, the p3dn.24xlarge.
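For context, that figure is roughly 12 hours at the p3dn.24xlarge’s on-demand rate, which was around $31.22/hour in us-east-1 at the time: 12 × $31.22 ≈ $375. Exact pricing varies by region and changes over time, so treat it as an estimate.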
While $40,000 in cloud bills and 40 GB of scraped text might be beyond most teams, $375 and 30 MB is doable even for the smallest projects.
And the applications of transfer learning go beyond language models. In drug discovery, there often isn’t enough data on particular diseases to train a model from scratch. DeepScreening is a free platform that solves this problem, allowing users to upload their own datasets, fine tune a model, and then use it to screen libraries of compounds for potential interactions.
Training a model like this from scratch would be beyond the resources of most individual researchers, but because of transfer learning, it is suddenly accessible to everyone.
The new generation of deep learning models relies on transfer learning
It’s important to note here that, though my examples so far have focused on the economic benefits, transfer learning isn’t just a scrappy tool for small teams. Teams of all sizes use transfer learning to train deep learning models. In fact, new models are being released specifically for transfer learning.
For example, OpenAI recently released GPT-3, the appropriately named successor to GPT-2. The initial demos are impressive.
Remember that when GPT-2 was first released, its raw size generated headlines. A 1.5 billion parameter model was unheard of. GPT-3, however, dwarfs GPT-2, clocking in at 175 billion parameters.
Training a 175 billion parameter language model is beyond the scope of just about every company besides OpenAI. Even deploying a model that large is impractical for most teams. So, OpenAI broke their tradition of releasing open source, pretrained versions of new models, and instead released GPT-3 as an API—which, of course, enables users to fine tune GPT-3 with their own data.
In other words, GPT-3 is so large that transfer learning isn’t just an economical way to train it for new tasks; it is the only way.
This transfer-learning-first approach is becoming increasingly common. Google just released Big Transfer, an open source repository of state-of-the-art computer vision models. While computer vision models have typically remained smaller than their language model counterparts, they’re starting to catch up: the pretrained ResNet-152x4 was trained on 14 million images and weighs in at 4.1 GB.
As the name suggests, Big Transfer was built to encourage the use of transfer learning with these models. As part of the repository, Google has also provided the code to easily fine tune each model.
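As an illustration of what that looks like in practice, the sketch below fine tunes a BiT backbone with Keras. It assumes the model is loaded from TensorFlow Hub (check the hub for the exact module handle), and the class count and hyperparameters are placeholders rather than Google’s published recipe:

```python
import tensorflow as tf
import tensorflow_hub as hub

# Assumed TensorFlow Hub handle for a BiT-M ResNet-50x1 feature extractor;
# verify the path and version on tfhub.dev before relying on it.
MODULE_HANDLE = "https://tfhub.dev/google/bit/m-r50x1/1"

num_classes = 10  # placeholder: size of your small custom dataset's label set

model = tf.keras.Sequential([
    hub.KerasLayer(MODULE_HANDLE, trainable=True),          # pretrained backbone
    tf.keras.layers.Dense(num_classes, kernel_initializer="zeros"),  # new head
])
model.build([None, 224, 224, 3])

model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=0.003, momentum=0.9),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)

# model.fit(train_ds, epochs=...)  # fine tune on your (small) labeled dataset
```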
Models are only getting bigger over time, and GPT-3, were it charted alongside earlier models, would stretch the scale of the chart by 10x.
If this trend continues, and there are no signs that it won’t, transfer learning will be the primary way teams use cutting edge deep learning.
Designing a platform to handle massive models
We’re biased, but when we look at numbers like these, we immediately think “How are we going to deploy this?”
As models have gotten bigger, and as transfer learning has made them accessible to every team, the number of huge deep learning models going into production has shot up. Serving these models is a challenge: they require a significant amount of disk space and memory just to run inference, and a single replica typically can’t handle many requests at once.
Already, we’ve introduced major features to Cortex specifically because of these models (GPU/ASIC inference, request-based autoscaling, spot instance support), and we’re constantly working on more as models get bigger.
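For a rough sense of what serving one of these models looks like, a Cortex predictor (as of the 0.x releases) is essentially a Python class that loads the model once per replica and then answers requests; the model loading and payload handling below are placeholders, not a drop-in implementation:

```python
# predictor.py -- a rough sketch of a Cortex Python predictor.
import torch


class PythonPredictor:
    def __init__(self, config):
        # Runs once per replica: load the (large) fine tuned model into
        # GPU memory up front so individual requests don't pay the loading cost.
        self.device = "cuda" if torch.cuda.is_available() else "cpu"
        self.model = torch.load("fine_tuned_model.pt", map_location=self.device)
        self.model.eval()

    def predict(self, payload):
        # Runs per request: turn the JSON payload into a tensor, run
        # inference, and return a JSON-serializable result.
        inputs = torch.tensor(payload["inputs"]).to(self.device)
        with torch.no_grad():
            outputs = self.model(inputs)
        return outputs.tolist()
```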
Still, the difficulty of these infrastructure challenges is minuscule compared to the potential of a world in which every engineer can solve problems using state-of-the-art deep learning.