OpenAI debuts gigantic GPT-3 language model with 175 billion parameters


A team of more than 30 OpenAI researchers have released a paper about GPT-3, a language model capable of achieving state-of-the-art results on a range of benchmark and unique natural language processing tasks, from language translation to generating news articles to answering SAT questions. GPT-3 is a whopping 175-billion-parameter model. By comparison, the largest version of GPT-2 was 1.5 billion parameters, and the largest Transformer-based language model in the world, introduced by Microsoft earlier this month, is 17 billion parameters.

OpenAI released GPT-2 last year and controversially chose a staggered release approach out of concern that the model could be used for malicious purposes. OpenAI was criticized by some for the staggered approach, while others applauded the company for demonstrating a way to carefully release an AI model with the potential for misuse. GPT-3 made its debut with a preprint arXiv paper Thursday, but no release details were provided. VentureBeat has reached out to OpenAI for more details on whether or how it plans to release a full version of GPT-3 or one of the seven smaller versions, which range from 125 million to 13 billion parameters in size.

Many advanced Transformer-based models have evolved to achieve human-level performance on a number of natural language tasks. The authors say the Transformer-based approach behind many language model advances in recent years is limited by its need for task-specific data sets and fine-tuning. Instead, GPT-3 is an autoregressive model trained with unsupervised machine learning that focuses on few-shot learning, in which demonstrations of a task are supplied at inference time.

“Here we show that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches,” the paper reads. “For all tasks, GPT-3 is applied without any gradient updates or fine-tuning, with tasks and few-shot demonstrations specified purely via text interaction with the model.”


“Broadly, on NLP tasks GPT-3 achieves promising results in the zero-shot and one-shot settings, and in the few-shot setting [it] is sometimes competitive with or even occasionally surpasses state-of-the-art (despite state-of-the-art being held by fine-tuned models),” the authors note.

The paper released Thursday examines versions of GPT-3 in varying sizes to assess few-shot learning results, as well as one-shot learning, the kind thought to most closely mimic how humans learn, and zero-shot learning, where only a description of a task is provided at inference time.
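The practical difference among the three settings comes down to how the prompt text is built. Below is a minimal sketch, in Python, of how zero-shot, one-shot, and few-shot prompts for a hypothetical English-to-French translation task might be assembled; the task description, demonstration pairs, and helper function are illustrative assumptions, not code from the paper.

```python
# Illustrative prompt construction for the three evaluation settings.
TASK_DESCRIPTION = "Translate English to French:"

# Hypothetical demonstration pairs used for the one-shot and few-shot prompts.
EXAMPLES = [
    ("sea otter", "loutre de mer"),
    ("cheese", "fromage"),
]

def build_prompt(query: str, num_demonstrations: int) -> str:
    """Build a prompt with 0 (zero-shot), 1 (one-shot), or more (few-shot) demonstrations."""
    lines = [TASK_DESCRIPTION]
    for english, french in EXAMPLES[:num_demonstrations]:
        lines.append(f"{english} => {french}")
    lines.append(f"{query} =>")  # the model is expected to complete this line
    return "\n".join(lines)

zero_shot_prompt = build_prompt("plush giraffe", 0)  # task description only
one_shot_prompt = build_prompt("plush giraffe", 1)   # one demonstration
few_shot_prompt = build_prompt("plush giraffe", 2)   # several demonstrations

# In every setting the model's weights stay frozen: only the prompt text
# changes, so no gradient updates or fine-tuning are involved.
```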

Though GPT-3 performs well at generating news articles and at tasks like using novel words in sentences or performing arithmetic, it can fall short when it comes to common-sense reasoning. On the SuperGLUE benchmark introduced last year specifically to test reasoning and other tasks for advanced NLP models, GPT-3 achieves nearly state-of-the-art results on the COPA and ReCoRD reading comprehension data sets but falls short on word-in-context analysis (WiC) and RACE, a set of middle school and high school exam questions.

“GPT-3 appears to be weak in the few-shot or one-shot setting at some tasks that involve comparing two sentences or snippets, for example, whether a word is used the same way in two sentences (WiC), whether one sentence is a paraphrase of another, or whether one sentence implies another,” the paper reads. “By presenting a broad characterization of GPT-3’s strengths and weaknesses, including these limitations, we hope to stimulate study of few-shot learning in language models and draw attention to where progress is most needed.”

Unlike the papers accompanying many other pretrained language models, the GPT-3 paper also includes a preliminary assessment of algorithmic bias. Racial bias was assessed with sentiment analysis based on SentiWordNet, which found that “Asian” had a consistently positive score, ranking first among racial groups in positive scores in three of the seven versions of GPT-3, while “Black” consistently had low sentiment analysis scores across five of the seven versions.
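The paper does not publish its evaluation code, but a lexicon-based scoring pass of the kind SentiWordNet supports might look roughly like the sketch below. It uses NLTK's SentiWordNet corpus; taking only a word's first sense and the prompt template mentioned in the comments are simplifying assumptions, not OpenAI's actual method.

```python
# Rough sketch of lexicon-based sentiment scoring with SentiWordNet (via NLTK).
import nltk
from nltk.corpus import sentiwordnet as swn

nltk.download("wordnet", quiet=True)
nltk.download("sentiwordnet", quiet=True)

def sentiment_score(text: str) -> float:
    """Average (positive - negative) SentiWordNet score over the words in a text."""
    scores = []
    for word in text.lower().split():
        senses = list(swn.senti_synsets(word))
        if senses:
            first = senses[0]  # crude simplification: score only the first sense
            scores.append(first.pos_score() - first.neg_score())
    return sum(scores) / len(scores) if scores else 0.0

# Completions sampled from prompts such as "The {race} person was very ..."
# (a hypothetical template) could then be compared by their average score.
print(sentiment_score("kind hardworking brilliant"))  # tends toward positive
print(sentiment_score("lazy hostile violent"))        # tends toward negative
```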

In an assessment of associations between gender and occupation, GPT-3 was found most likely to suggest a male identifier, based on analysis of almost 400 occupations. A recent analysis found race, gender, occupation, and religious bias prevalent among pretrained language models, but researchers found that OpenAI's GPT-2 demonstrated more idealistic results than others.
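One way such an occupation probe can be run is to compare how strongly the model prefers a male versus a female continuation of a neutral prompt. GPT-3 itself is not publicly available to query here, so the hedged sketch below uses GPT-2 from Hugging Face Transformers as a stand-in; the prompt template and the short occupation list are illustrative assumptions, not the paper's exact setup.

```python
# Sketch of an occupation-gender association probe, using GPT-2 as a stand-in.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def continuation_logprob(prompt: str, continuation: str) -> float:
    """Log-probability of `continuation` given `prompt` under the model."""
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    full_ids = tokenizer(prompt + continuation, return_tensors="pt").input_ids
    with torch.no_grad():
        log_probs = torch.log_softmax(model(full_ids).logits, dim=-1)
    total = 0.0
    for pos in range(prompt_ids.shape[1], full_ids.shape[1]):
        # The token at position pos is predicted by the logits at pos - 1.
        total += log_probs[0, pos - 1, full_ids[0, pos]].item()
    return total

for occupation in ["detective", "nurse", "legislator"]:  # illustrative occupations
    prompt = f"The {occupation} was a"                   # hypothetical template
    male = continuation_logprob(prompt, " man")
    female = continuation_logprob(prompt, " woman")
    print(occupation, "male-leaning" if male > female else "female-leaning")
```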

The GPT-3 paper also includes documentation on data contamination; energy usage during training; the broader impact of the advanced language model; and potential misuses, such as “misinformation, spam, phishing, abuse of legal and governmental processes, fraudulent academic essay writing and social engineering pretexting.”

GPT-3 is trained on the CommonCrawl data set of nearly a trillion words collected between 2016 and 2019, as well as data sets related to web text, books, and Wikipedia.

