Bootstrapping cutting-edge NLP models

Category: IT Technology · Published: 6 years ago


How to get up and running with XLNet and PyTorch in 5 mins



What is XLNet

XLNet is a modern NLP language model based on the Transformer architecture, the same family as BERT, RoBERTa, TinyBERT, etc. Its results on various Natural Language Understanding tasks are approaching human performance. XLNet can generate text at roughly the level of a high-schooler, and it can answer simple questions. It can comprehend that a dog isn’t the same as a cat, but that both of them are pets to humans.

Overall, XLNet is a model that builds on the advances of BERT.

XLNet solves NLP problems in 3 broad categories: classification, sequence labeling, and text generation.

Classification:

Classification tasks are the most common type of task in NLP.

Categorization (aka classification) tasks assign a category to a piece of text. More broadly, they answer the question: given a section of text, which category does the text belong to?

Tasks in the classification domain commonly answer questions like the ones below:

What medical billing code should we use for this visit? (description of visit provided)
Is this text spam? (text is provided)
Is this interesting to this user? (content and user profile provided)
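To make the "assign a category" step concrete, here is a minimal sketch of how a classifier's raw scores are usually turned into a category. A classification model typically emits one score per category; softmax converts the scores into probabilities, and the highest one wins. The label names and scores below are made up for illustration and stand in for real model output.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pick_category(scores, labels):
    """Return the most probable label and its probability."""
    probs = softmax(scores)
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], probs[best]


# Hypothetical labels and made-up scores standing in for model output.
LABELS = ["spam", "not spam"]
example_scores = [2.0, 0.5]

label, confidence = pick_category(example_scores, LABELS)
print(label, round(confidence, 3))
```

The same two functions work unchanged for more than two categories, such as a list of billing codes.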

Sequence labeling:

Another type of problem in NLP is sequence labeling. In sequence labeling, we try to find something enclosed in the text provided. Commonly this type of task includes finding persons in the text provided (named entity recognition, NER) or finding all co-references of an entity. For example, given the sentences “Mary jumped over a toad. It didn’t move.”, the algorithm would work out that “It” refers to the toad, not Mary. Another example of sequence labeling is detecting which ticker is associated with each mention of a company:

NVDA is scheduled to report second-quarter fiscal 2020 results on Aug 15.

In the trailing four quarters, the company’s (NVDA) earnings surpassed the Zacks Consensus Estimate thrice and missed the same (Zacks) once, the average positive surprise being 3.94%.
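One common way to represent sequence labeling output is a tag per token, with `B-` marking the start of a span, `I-` its continuation, and `O` meaning "no label". The sketch below assumes that convention (the article does not specify one) and uses hand-written tags in place of real model predictions; the tag names `TICKER` and `DATE` are hypothetical.

```python
def extract_spans(tokens, tags):
    """Group consecutive B-/I- tagged tokens into (text, label) spans."""
    spans, current, label = [], [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new span starts; close any span that was still open.
            if current:
                spans.append((" ".join(current), label))
            current, label = [token], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == label:
            # Continuation of the span that is currently open.
            current.append(token)
        else:
            # "O" or a stray I- tag: close the open span, if any.
            if current:
                spans.append((" ".join(current), label))
            current, label = [], None
    if current:
        spans.append((" ".join(current), label))
    return spans


tokens = ["NVDA", "is", "scheduled", "to", "report", "second-quarter",
          "fiscal", "2020", "results", "on", "Aug", "15"]
# Hand-written tags standing in for per-token model predictions.
tags = ["B-TICKER"] + ["O"] * 9 + ["B-DATE", "I-DATE"]

print(extract_spans(tokens, tags))
```

Keeping one tag per token means the labels always line up with the input, which is what distinguishes sequence labeling from assigning a single category to the whole text.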

Text generation:

The third and last way XLNet can be used is text generation. Here, given a short snippet of context, XLNet predicts the next word, and it keeps predicting the next word until instructed to stop. In the example below, given the input “The quick brown”, XLNet would first predict “fox”, then look at the context as a whole and predict the next word, “jumped”, and so on.

The quick brown <fox> <jumped> <over> …
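The predict-append-repeat loop described above can be sketched in a few lines. This is only an illustration of the loop, not of XLNet itself: the lookup table `NEXT_WORD` is a hypothetical stand-in for the model, and unlike a real model it looks only at the last word rather than the whole context.

```python
# Toy stand-in for a language model: maps the last word to the next word.
# A real model would score every vocabulary word given the whole context.
NEXT_WORD = {
    "brown": "fox",
    "fox": "jumped",
    "jumped": "over",
    "over": "the",
    "the": "lazy",
    "lazy": "dog",
}


def predict_next(context):
    """Return the predicted next word, or None if there is no prediction."""
    return NEXT_WORD.get(context[-1])


def generate(prompt, max_new_words=3):
    """Append one predicted word at a time until told to stop."""
    words = prompt.split()
    for _ in range(max_new_words):
        next_word = predict_next(words)
        if next_word is None:  # the toy model has nothing more to say
            break
        words.append(next_word)  # the new word becomes part of the context
    return " ".join(words)


print(generate("The quick brown"))
```

Here "instructed to stop" is modeled two ways: a cap on the number of new words, and the stand-in model returning no prediction.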
