GPT-3: The New Mighty Language Model from OpenAI

Pushing Deep Learning to the Limit with 175B Parameters

Introduction

OpenAI recently released a pre-print describing its new mighty language model, GPT-3. It is a much bigger and better version of its predecessor, GPT-2. In fact, with close to 175B trainable parameters, GPT-3 is far larger than anything else out there. Here is a comparison of the number of parameters of recent popular pre-trained NLP models; GPT-3 clearly stands out.

What’s New?

After the success of BERT, the field of NLP is increasingly moving in the direction of pre-trained language models: models trained on a huge text corpus (in an unsupervised way) that are later fine-tuned on specific tasks such as translation or question answering using much smaller task-specific datasets.

While this type of transfer learning obviates the need for task-specific model architectures, you still need task-specific datasets, which are a pain to collect, to achieve good performance.
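To make that fine-tuning recipe concrete, here is a minimal sketch of the usual approach (the model choice, the toy dataset, and the training loop are illustrative assumptions, not details from the GPT-3 paper): load a pre-trained model such as BERT and update its weights on a small labelled dataset for one specific task.

```python
# Minimal fine-tuning sketch using the Hugging Face transformers library.
# The "dataset" is a hard-coded toy list purely for illustration.
import torch
from torch.optim import AdamW
from transformers import BertForSequenceClassification, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Tiny task-specific dataset: (text, label) pairs for sentiment classification.
train_data = [("I loved this movie", 1), ("Terrible, a waste of time", 0)]

optimizer = AdamW(model.parameters(), lr=2e-5)
model.train()
for epoch in range(3):
    for text, label in train_data:
        batch = tokenizer(text, return_tensors="pt")
        loss = model(**batch, labels=torch.tensor([label])).loss
        loss.backward()          # gradient update: the step GPT-3 tries to avoid
        optimizer.step()
        optimizer.zero_grad()
```

The point of the sketch is the gradient step: every new task means collecting labels and running an update loop like this, which is exactly the cost GPT-3 tries to remove.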

Humans, by contrast, learn in a very different way and can pick up a new task from very few examples. GPT-3 aims to address this specific pain point: it is a task-agnostic model that needs zero to very few examples to do well, and it achieves close to state-of-the-art performance on a number of NLP tasks.

Terminology

Before we dive deep, it may be useful to define some commonly used terms:

  • NLP Tasks: tasks that have to do with human language, for example Language Translation, Text Classification (e.g. sentiment extraction), Reading Comprehension, and Named Entity Recognition (e.g. recognizing person, location, and company names in text)
  • Language Models: models that can predict the most likely next words (and their probabilities) given a sequence of words (think Google query auto-complete). It turns out these models are useful for a host of other tasks, even though they are trained on mundane next-word prediction (see the sketch after this list)
  • Zero / One / Few-shot learning: a model's ability to learn a new task after seeing zero / one / a few examples of that task
  • Transfer Learning: the notion in Deep Learning of training a model for one task (for example, object detection in images) and then leveraging and building upon it for a different task (for example, assessing MRI scans). After massive success in Computer Vision, it is now in vogue in NLP
  • Transformer Models: a family of deep learning models, used primarily in NLP, that forms the basic building block of most state-of-the-art NLP architectures these days
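To make the "next word prediction" idea from the Language Models bullet concrete, here is a minimal sketch using the smaller, publicly available GPT-2 through the Hugging Face transformers library (GPT-3's weights are not public, so GPT-2 stands in purely for illustration).

```python
# Sketch: ask a pre-trained language model for the most likely next words.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

prompt = "The capital of France is"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    logits = model(input_ids).logits          # shape: (1, seq_len, vocab_size)

# Probability distribution over the vocabulary for the next word.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(next_token_probs, k=5)
for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode([token_id.item()]):>10s}  {prob.item():.3f}")
```

This mundane objective, scaled up to GPT-3's size, is what ends up being useful for so many downstream tasks.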

The Approach

The model is built using the standard concepts of Transformers, Attention, etc., and trained on the typical Common Crawl, Wikipedia, and Books corpora along with some additional data sources. A lot of things (the pre-training approach, the model, the data) are similar to GPT-2, but everything (model size, data size, training time) is just a lot bigger. In fact, its humongous size is what drives most of the model's benefits.

The following graph shows the gain in accuracy on various zero / one / few-shot tasks as a function of the number of model parameters; clearly, the major gains come from the scaled-up size.

Source: Paper

Most of the numbers involved in the model are so huge (for example, 96 attention layers, a batch size of 3.2M, and 175B parameters) that they are unlike anything in the past. The model is roughly 10x larger, in terms of parameter count, than the next closest thing (Microsoft's Turing-NLG, with 17B parameters).

There is no need to do gradient / parameter updates (fine-tuning) to use GPT-3 for various tasks. You can just interact with the model in natural language and/or provide a few examples of the task you are trying to do, and the model will do it!
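The "provide a few examples" part is literally just text: the few-shot setup puts a handful of solved examples and one unfinished example into the prompt and lets the model continue it. Here is a minimal sketch of that prompting style; it uses the public GPT-2 through the transformers pipeline purely to show the mechanics, since GPT-3 itself is only reachable through OpenAI's hosted API.

```python
# Few-shot prompting sketch: the "training examples" live entirely in the prompt;
# no gradients are computed and no weights are updated.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = (
    "Translate English to French:\n"
    "sea otter => loutre de mer\n"
    "peppermint => menthe poivrée\n"
    "cheese =>"
)

# The model simply continues the text; at GPT-3 scale the continuation tends to
# follow the pattern set by the in-context examples (a small model like GPT-2
# will usually not manage this).
print(generator(prompt, max_new_tokens=5, do_sample=False)[0]["generated_text"])
```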

Source: Paper

What Does All This Mean?

Not requiring large, custom, task-specific datasets, in addition to not requiring task-specific model architectures, is a huge step toward making cutting-edge NLP more accessible.

While GPT-3 delivers great performance on a lot of NLP tasks, for example word prediction and common sense reasoning, it doesn't do equally well on everything. For instance, it doesn't do great on text synthesis and some reading comprehension tasks. In addition, it suffers from bias in its training data, which may lead the model to generate stereotyped or prejudiced content. So there is more work to be done here.

On top of all this, the huge size of GPT-3 puts it out of reach for almost everyone except a select few companies and research labs in the world. According to the authors, the model is very versatile and contains a very wide range of skills that are not needed for any single task, so there may be scope for creating smaller, more manageable task-specific models using the concept of distillation.
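Distillation here means training a small "student" model to imitate the output distribution of the large "teacher". A minimal sketch of the standard distillation loss is below; the temperature, the weighting, and the idea of applying it to GPT-3 are illustrative assumptions, not something the paper spells out.

```python
# Sketch of a knowledge-distillation loss: the student is pushed to match the
# teacher's softened output distribution in addition to the usual hard labels.
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft targets: KL divergence between softened teacher and student distributions.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: ordinary cross-entropy against the true labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```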

It will be exciting to see how this evolves in the future.

