Google ‘BigBird’ Achieves SOTA Performance on Long-Context NLP Tasks

Category: IT Technology · Published: 4 years ago


The phenomenal success of Google’s BERT and other natural language processing (NLP) models based on transformers isn’t accidental. Behind all the SOTA performances lies transformers’ innovative self-attention mechanism, which enables networks to capture contextual information from an entire text sequence. However, the memory and computational requirements of self-attention grow quadratically with sequence length, making it very expensive to use transformer-based models for processing long sequences.
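
To make the quadratic growth concrete, here is a minimal sketch that counts the query-key scores one full self-attention head computes and the memory they occupy. The function names and the 4-byte (float32) value size are assumptions chosen for illustration, not figures from the paper.

```python
def full_attention_scores(seq_len: int) -> int:
    """Number of query-key score entries in one full self-attention head."""
    return seq_len * seq_len


def score_matrix_bytes(seq_len: int, bytes_per_value: int = 4) -> int:
    """Memory for one score matrix, assuming 4-byte (float32) values."""
    return full_attention_scores(seq_len) * bytes_per_value


# Doubling the sequence length quadruples the number of scores.
for n in (512, 1024, 4096):
    print(n, full_attention_scores(n), score_matrix_bytes(n))
```

Going from 512 to 4096 tokens multiplies the sequence length by 8 but the score matrix by 64, which is why long inputs quickly become impractical with full attention.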

To alleviate the quadratic dependency of transformers, a team of researchers from Google Research recently proposed a new sparse attention mechanism dubbed BigBird. In their paper Big Bird: Transformers for Longer Sequences, the team demonstrates that despite being a sparse attention mechanism, BigBird preserves all known theoretical properties of quadratic full attention models. In experiments, BigBird is shown to dramatically improve performance across long-context NLP tasks, producing SOTA results in question answering and summarization.


The researchers designed BigBird to satisfy all known theoretical properties of full transformers, building three main components into the model:

A set of g global tokens that attend to all parts of the sequence.
For each query q_i, a set of r random keys that the query attends to.
A block of w local neighbours, so that each token attends to its local structure.
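
The three components above can be sketched as a boolean attention mask. This is a simplified, token-level illustration and not the authors' implementation: the function name `bigbird_style_mask`, the choice of the first g positions as global tokens, the treatment of w as a total window width, and the default values are all assumptions made for the example.

```python
import random


def bigbird_style_mask(n, g=2, r=3, w=3, seed=0):
    """Return an n x n mask; mask[i][j] is True when query i may attend to key j.

    Assumptions (illustrative only): the first g tokens are global, w is the
    total window width (odd), and r random keys are drawn per query.
    """
    rng = random.Random(seed)
    half = w // 2
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        # Local window: neighbours within `half` positions on each side.
        for j in range(max(0, i - half), min(n, i + half + 1)):
            mask[i][j] = True
        # Random keys: up to r positions sampled without replacement.
        for j in rng.sample(range(n), min(r, n)):
            mask[i][j] = True
        # Global tokens: every query attends to them.
        for j in range(min(g, n)):
            mask[i][j] = True
    # The global tokens themselves attend to every position.
    for i in range(min(g, n)):
        mask[i] = [True] * n
    return mask


mask = bigbird_style_mask(16)
attended = sum(sum(row) for row in mask)
print(f"{attended} of {16 * 16} query-key pairs are kept")
```

In this sketch each non-global query attends to at most g + r + w keys, so the number of kept pairs grows linearly with sequence length rather than quadratically.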

These innovations enable BigBird to handle sequences up to eight times longer than what was previously possible using standard hardware.

Additionally, inspired by the capability of BigBird to handle long contexts, the team introduced a novel application of attention-based models for extracting contextual representations of genomics sequences like DNA. In experiments, BigBird proved to be beneficial in processing the longer input sequences and also delivered improved performance on downstream tasks such as promoter-region and chromatin profile prediction.

The paper Big Bird: Transformers for Longer Sequences is on arXiv.

Reporter: Fangyu Cai | Editor: Michael Sarazen

