Google ‘BigBird’ Achieves SOTA Performance on Long-Context NLP Tasks

The phenomenal success of Google’s BERT and other natural language processing (NLP) models based on transformers isn’t accidental. Behind all the SOTA performances lies transformers’ innovative self-attention mechanism, which enables networks to capture contextual information from an entire text sequence. However, the memory and computational requirements of self-attention grow quadratically with sequence length, making it very expensive to use transformer-based models for processing long sequences.

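To see where the quadratic cost comes from, here is a minimal NumPy sketch of single-head full self-attention (not BERT’s or BigBird’s actual implementation; the projection weights are random stand-ins for learned parameters). The intermediate score matrix holds one value per query–key pair, so it is n × n for a sequence of n tokens.

```python
import numpy as np

def full_self_attention(x, d_k=64, seed=0):
    """x: (n, d) token embeddings; returns (n, d_k) context vectors."""
    rng = np.random.default_rng(seed)
    n, d = x.shape
    # Random projections standing in for the learned query/key/value weights.
    w_q, w_k, w_v = (rng.standard_normal((d, d_k)) for _ in range(3))
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    # The score matrix has one entry per query-key pair: n x n values,
    # so memory and compute grow quadratically with sequence length n.
    scores = (q @ k.T) / np.sqrt(d_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

tokens = np.random.default_rng(1).standard_normal((1024, 256))
out = full_self_attention(tokens)  # the scores array alone holds 1024 * 1024 entries
```
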
To alleviate the quadratic dependency of transformers, a team of researchers from Google Research recently proposed a new sparse attention mechanism dubbed BigBird. In their paper Big Bird: Transformers for Longer Sequences, the team demonstrates that despite being a sparse attention mechanism, BigBird preserves all known theoretical properties of quadratic full attention models. In experiments, BigBird is shown to dramatically improve performance across long-context NLP tasks, producing SOTA results in question answering and summarization.

The researchers designed BigBird to satisfy all known theoretical properties of full transformers, building three main components into the model:

  • A set of g global tokens that attend to all parts of the sequence.
  • For each query q_i, a set of r random keys that the query attends to.
  • A window of w local neighbours, so that each node attends to its local structure (see the mask sketch after this list).

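To make these three patterns concrete, the sketch below builds a toy boolean attention mask that combines global tokens, random keys, and a sliding window. This is only an illustration under assumptions: the sizes g, r, and w are placeholder values, not the ones tuned in the paper, and the authors’ blocked, hardware-efficient implementation is not reproduced here.

```python
import numpy as np

def sparse_attention_mask(n, g=2, r=3, w=3, seed=0):
    """Boolean mask where mask[i, j] = True means query i may attend to key j."""
    rng = np.random.default_rng(seed)
    mask = np.zeros((n, n), dtype=bool)
    # 1) Global tokens: the first g tokens attend everywhere, and every token attends to them.
    mask[:g, :] = True
    mask[:, :g] = True
    # 2) Random keys: each query attends to r keys chosen uniformly at random.
    for i in range(n):
        mask[i, rng.choice(n, size=r, replace=False)] = True
    # 3) Sliding window: each query attends to its w-token local neighbourhood.
    half = w // 2
    for i in range(n):
        mask[i, max(0, i - half):i + half + 1] = True
    return mask

mask = sparse_attention_mask(16)
print(int(mask.sum()), "of", mask.size, "query-key pairs kept")  # far fewer than n * n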
These innovations enable BigBird to handle sequences up to eight times longer than what was previously possible using standard hardware.

Additionally, inspired by BigBird’s capability to handle long contexts, the team introduced a novel application of attention-based models for extracting contextual representations of genomics sequences such as DNA. In experiments, BigBird proved beneficial in processing these longer input sequences and also delivered improved performance on downstream tasks such as promoter-region and chromatin-profile prediction.

The paper Big Bird: Transformers for Longer Sequences is on arXiv.

Reporter: Fangyu Cai | Editor: Michael Sarazen

Synced Report | A Survey of China’s Artificial Intelligence Solutions in Response to the COVID-19 Pandemic — 87 Case Studies from 700+ AI Vendors

This report offers a look at how the Chinese government and business owners have leveraged artificial intelligence technologies in the battle against COVID-19. It is also available on Amazon Kindle.
