Bootstrapping cutting-edge NLP models



How to get up and running with XLNet and PyTorch in 5 mins

Daulet Nurmanbetov

Feb 20 · 4 min read

Bootstrapping cutting-edge NLP models — Photo by Pietro Jeng on Unsplash

What is XLNet

XLNet is a modern NLP language model based on the Transformer architecture, in the same family as BERT, RoBERTa, and TinyBERT. XLNet's results on various natural language understanding tasks are approaching human performance. It can generate text at roughly the level of a high-schooler and answer simple questions. It can comprehend that a dog isn’t the same as a cat, but that both are pets to humans.

Overall, XLNet is a model that builds on the advances of BERT.

XLNet solves NLP problems in 3 broad categories: classification, sequence labeling, and text generation.

Classification:

Classification tasks are the most common type of tasks in NLP.

Categorization (aka classification) tasks assign a category to a piece of text. More broadly, they answer a request of the form: given a section of text, tell me which category it belongs to.

Tasks in the classification domain commonly answer questions like the ones below:

What medical billing code should we use for this visit? (description of visit provided)

Is this text spam? (text is provided)

Is this interesting to this user? (content and user profile provided)
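The shape of a classification task can be sketched in a few lines of Python. The snippet below is not XLNet: `classify_spam` and its keyword list are hypothetical stand-ins for a trained model, used only to show the input (a piece of text) and the output (one category from a fixed set).

```python
# Toy stand-in for a text classifier: text in, one label out.
# The keyword list is an illustrative assumption, not a trained model.
SPAM_KEYWORDS = {"free", "winner", "prize", "click"}


def classify_spam(text: str) -> str:
    """Return "spam" if at least two spam keywords occur, else "not spam"."""
    # Normalize each word: drop surrounding punctuation and lowercase it.
    words = {w.strip(".,!?").lower() for w in text.split()}
    hits = len(words & SPAM_KEYWORDS)
    return "spam" if hits >= 2 else "not spam"


print(classify_spam("Click now to claim your free prize!"))
print(classify_spam("The meeting is moved to Thursday."))
```

A model like XLNet would replace the keyword rule with a learned decision, but the interface stays the same: a text goes in and a category comes out.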

Sequence labeling:

Another type of problem in NLP is sequence labeling. In sequence labeling, we try to find something within the text provided. Commonly, this type of task includes finding persons in the text (named entity recognition, NER) or finding all co-references of an entity. For example, in “Mary jumped over a toad. It didn’t move.” the algorithm would find out that ‘it’ refers to the toad, not Mary. Another example of sequence labeling is detecting which ticker is associated with each mention of a company:

NVDA is scheduled to report second-quarter fiscal 2020 results on Aug 15.

In the trailing four quarters, the company’s (NVDA) earnings surpassed the Zacks Consensus Estimate thrice and missed the same (Zacks) once, the average positive surprise being 3.94%.
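To make the output format concrete, here is a minimal sketch of sequence labeling in Python. It is not XLNet: `MENTION_TO_TICKER` is a hypothetical lookup table introduced for illustration, and `label_tokens` simply assigns one label per token, with "O" meaning "no ticker".

```python
from typing import List, Tuple

# Toy sequence labeler: one label per token, "O" for tokens that are not a mention.
# MENTION_TO_TICKER is a hypothetical lookup; a real model would use the context.
MENTION_TO_TICKER = {"nvda": "NVDA", "nvidia": "NVDA"}


def label_tokens(sentence: str) -> List[Tuple[str, str]]:
    """Pair every whitespace-separated token with a ticker label or "O"."""
    labeled = []
    for token in sentence.split():
        key = token.strip(".,()").lower()
        labeled.append((token, MENTION_TO_TICKER.get(key, "O")))
    return labeled


for token, label in label_tokens("NVDA is scheduled to report results on Aug 15."):
    print(f"{token}\t{label}")
```

A lookup table like this cannot tell that “the company’s” in the second sentence also refers to NVDA. Resolving that kind of mention requires the surrounding context, which is where a model such as XLNet comes in.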

Text generation:

The third and last way XLNet can be used is text generation. Here, given a short snippet of context, XLNet predicts the next word, and it continues predicting the next word until instructed to stop. In the example below, given the input “The quick brown”, XLNet would first predict “fox”, then look at the context as a whole and predict the next word, “jumped”, and so on.

The quick brown <fox> <jumped> <over> …
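The predict-append-repeat loop described above can be sketched as follows. This is a toy, not XLNet: `NEXT_WORD` is a hypothetical lookup keyed on the last word only, whereas a real model would score candidates against the whole context.

```python
# Toy autoregressive loop: predict one word, append it, repeat until told to stop.
# NEXT_WORD is a hypothetical lookup keyed on the last word only; a model such as
# XLNet would condition on the whole context instead.
NEXT_WORD = {"brown": "fox", "fox": "jumped", "jumped": "over"}


def generate(prompt: str, max_new_words: int = 3) -> str:
    """Extend the prompt one word at a time, up to max_new_words."""
    words = prompt.split()
    if not words:  # nothing to condition on
        return prompt
    for _ in range(max_new_words):
        next_word = NEXT_WORD.get(words[-1])
        if next_word is None:  # no prediction available: stop early
            break
        words.append(next_word)
    return " ".join(words)


print(generate("The quick brown"))
```

Here `max_new_words` plays the role of the instruction to stop. Each predicted word becomes part of the input for the next prediction, which is what makes the loop autoregressive.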
