GPT-3: The New Mighty Language Model from OpenAI


Pushing Deep Learning to the Limit with 175B Parameters

Introduction

OpenAI recently released a pre-print describing its new mighty language model, GPT-3. It's a much bigger and better version of its predecessor, GPT-2. In fact, with close to 175B trainable parameters, GPT-3 is far larger than anything else out there. Here is a comparison of the number of parameters of recent popular pre-trained NLP models; GPT-3 clearly stands out.

What’s New?

After the success of BERT, the field of NLP is increasingly moving in the direction of creating pre-trained language models, trained on huge text corpora (in an unsupervised way), which are later fine-tuned on specific tasks such as translation, question answering, etc., using much smaller task-specific datasets.

While this type of transfer learning obviates the need for task-specific model architectures, you still need task-specific datasets, which are a pain to collect, to achieve good performance.

Humans, by contrast, learn in a very different way and can pick up a new task from very few examples. GPT-3 aims to address this specific pain point: it is a task-agnostic model that needs zero to very few examples to do well and achieve close to state-of-the-art performance on a number of NLP tasks.

Terminologies

Before we dive deeper, it may be useful to define some commonly used terms:

  • NLP Tasks: These are tasks that have to do with human language, for example language translation, text classification (e.g. sentiment extraction), reading comprehension, and named entity recognition (e.g. recognizing person, location, and company names in text)
  • Language Models: These are models that can predict the most likely next words (and their probabilities) given a sequence of words (think of Google query auto-complete). It turns out these models are useful for a host of other tasks even though they are trained on the mundane objective of next-word prediction (see the short sketch after this list)
  • Zero / One / Few-shot learning: Refers to a model's ability to learn a new task after seeing zero / one / a few examples of that task
  • Transfer Learning: Refers to the idea in deep learning of training a model for one task (for example, object detection in images) and then leveraging and building upon it for a different task (for example, assessing MRI scans). After massive success in computer vision, it is in vogue in NLP these days
  • Transformer Models: A family of deep learning models, used primarily in NLP, that forms the basic building block of most state-of-the-art NLP architectures these days
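
As a quick illustration of the Language Models bullet above, here is a minimal sketch of next-word prediction using a publicly available pre-trained model. It assumes the Hugging Face transformers library and the open GPT-2 checkpoint (GPT-3 itself is only served through OpenAI's API); the point is only to show what "predicting the next word" means in code.

```python
# Minimal sketch: predict the most likely next words with a pre-trained
# language model. Assumes the Hugging Face `transformers` library and the
# public GPT-2 checkpoint; GPT-3 itself is only accessible via OpenAI's API.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = "The capital of France is"
input_ids = tokenizer.encode(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(input_ids).logits          # shape: (1, seq_len, vocab_size)

# Probability distribution over the vocabulary for the *next* token
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(next_token_probs, k=5)
for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode([int(token_id)])!r}  p={prob.item():.3f}")
```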

The Approach

The model is built using the standard concepts of Transformers, attention, etc., and is trained on the typical Common Crawl, Wikipedia, and Books corpora plus some additional data sources. A lot of things (pre-training, model, data) are similar to GPT-2, but everything (model size, data size, training time) is just a lot bigger. In fact, its humongous size is what drives most of the model's benefits.
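
For readers curious what those standard concepts look like in practice, here is a bare-bones, illustrative sketch of the scaled dot-product attention operation that Transformer blocks are built from. This is the textbook formulation in plain NumPy, not OpenAI's implementation; GPT-3 stacks 96 such attention layers (each with many heads, plus feed-forward layers) to reach its 175B parameters.

```python
# Illustrative sketch of scaled dot-product attention, the core operation
# inside every Transformer layer. Textbook formulation, NumPy only.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q, K, V: (seq_len, d_k) matrices of queries, keys and values."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                # query-key similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V                             # weighted average of values

# Toy example: 4 tokens with 8-dimensional representations
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)  # (4, 8)
```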

The following graph shows the gain in accuracy on various zero / one / few-shot tasks as a function of the number of model parameters; clearly, the major gains come from the scaled-up size.

Source: Paper

Most of the numbers involved in the model are so huge — for example, 96 attention layers, a batch size of 3.2M, 175B parameters — that they are unlike anything in the past. The model is ~10x larger in terms of number of parameters than the next closest thing (Microsoft's Turing-NLG, with 17B parameters).

There is no need to do gradient / parameter updates (fine-tuning) to use the GPT-3 model for various tasks. You can just interact with the model using natural language and/or provide a few examples of the task you are trying to do, and the model will do it!
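
To make that concrete, here is a rough sketch of what zero-shot and few-shot prompts look like for a translation task, in the spirit of the examples in the paper. The complete() helper is a hypothetical placeholder for whatever endpoint actually serves the model; it is not a real API.

```python
# Sketch of zero-shot vs. few-shot prompting: the task is described (and
# optionally demonstrated) entirely in the input text, with no fine-tuning.

zero_shot_prompt = """Translate English to French.

English: sea otter
French:"""

few_shot_prompt = """Translate English to French.

English: cheese
French: fromage

English: house
French: maison

English: sea otter
French:"""

def complete(prompt: str) -> str:
    """Hypothetical stand-in for a call to a large language model endpoint;
    it would return the model's continuation of the prompt."""
    raise NotImplementedError("replace with a real model or API call")

# print(complete(few_shot_prompt))  # the model is expected to continue with the translation
```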

Source: Paper

What Does All this Mean?

Not requiring large, custom, task-specific datasets, in addition to not requiring task-specific model architectures, is a huge step in the direction of making cutting-edge NLP more accessible.

While GPT-3 delivers great performance on many NLP tasks (for example, word prediction and common-sense reasoning), it doesn't do equally well on everything. For instance, it struggles with some text synthesis and reading comprehension tasks. In addition, it suffers from bias in its training data, which may lead the model to generate stereotyped or prejudiced content. So there is more work to be done here.

In addition to all this, the huge size of GPT-3 puts it out of reach for almost everyone except a select few companies and research labs. As per the authors, the model is very versatile and contains a very wide range of skills not needed for specific tasks, so there may be scope for creating smaller, more manageable task-specific models using the concept of distillation.
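
As a rough illustration of that last point, here is a minimal sketch of knowledge distillation, in which a small "student" model is trained to match the softened output distribution of a large "teacher". This is the generic recipe rather than anything specific to GPT-3, and it assumes PyTorch.

```python
# Minimal sketch of knowledge distillation: the student is trained to match
# the teacher's softened output distribution. Generic recipe, PyTorch assumed.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between softened teacher and student distributions."""
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(log_soft_student, soft_teacher,
                    reduction="batchmean") * temperature ** 2

# Toy usage: a batch of 3 examples over a 10-token vocabulary
teacher_logits = torch.randn(3, 10)                      # from the big model
student_logits = torch.randn(3, 10, requires_grad=True)  # from the small model
loss = distillation_loss(student_logits, teacher_logits)
loss.backward()
print(loss.item())
```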

It will be exciting to see how this evolves in the future.

