The phenomenal success of Google’s BERT and other natural language processing (NLP) models based on transformers isn’t accidental. Behind all the SOTA performances lies transformers’ innovative self-attention mechanism, which enables networks to capture contextual information from an entire text sequence. However, the memory and computational requirements of self-attention grow quadratically with sequence length, making it very expensive to use transformer-based models for processing long sequences.
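To make the quadratic cost concrete, here is a minimal NumPy sketch of vanilla scaled dot-product attention (illustrative only, not code from the paper): the n × n score matrix every query-key pair produces is what grows quadratically with sequence length.

```python
import numpy as np

# Illustrative sketch: full self-attention materializes an n x n score matrix,
# so memory and compute grow quadratically with sequence length n.
def full_attention(Q, K, V):
    n, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)                    # shape (n, n): the quadratic bottleneck
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the keys
    return weights @ V                               # shape (n, d)

Q = K = V = np.random.randn(8, 4)
out = full_attention(Q, K, V)
# Doubling n quadruples the score matrix:
# n = 512 -> 262,144 entries; n = 4,096 -> 16,777,216 entries.
```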
To alleviate the quadratic dependency of transformers, a team of researchers from Google Research recently proposed a new sparse attention mechanism dubbed BigBird. In their paper Big Bird: Transformers for Longer Sequences, the team demonstrates that despite being a sparse attention mechanism, BigBird preserves all known theoretical properties of quadratic full attention models. In experiments, BigBird is shown to dramatically improve performance across long-context NLP tasks, producing SOTA results in question answering and summarization.
The researchers designed BigBird to satisfy all known theoretical properties of full transformers, building three main components into the model:
- A set of g global tokens that attend to all parts of the sequence.
- For each query q_i, a set of r random keys that the query will attend to.
- A window of w local neighbours, so that each node attends to its local structure.
These innovations enable BigBird to handle sequences up to eight times longer than what was previously possible using standard hardware.
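The sketch below illustrates how these three components combine into a sparse attention pattern. It is a token-level toy version under assumed parameters (num_global, window, num_random are illustrative choices, not the paper’s settings), whereas the actual BigBird implementation operates on blocks of tokens for hardware efficiency.

```python
import numpy as np

def bigbird_style_mask(seq_len, num_global=2, window=3, num_random=2, seed=0):
    """Boolean attention mask combining global, windowed and random attention.

    Token-level illustration only; BigBird itself uses a block-sparse layout.
    """
    rng = np.random.default_rng(seed)
    mask = np.zeros((seq_len, seq_len), dtype=bool)

    # 1) Global tokens: the first num_global tokens attend everywhere
    #    and every token attends to them.
    mask[:num_global, :] = True
    mask[:, :num_global] = True

    # 2) Local window: each query attends to `window` neighbours on each side.
    for i in range(seq_len):
        lo, hi = max(0, i - window), min(seq_len, i + window + 1)
        mask[i, lo:hi] = True

    # 3) Random keys: each query additionally attends to a few random positions.
    for i in range(seq_len):
        mask[i, rng.choice(seq_len, size=num_random, replace=False)] = True

    return mask

mask = bigbird_style_mask(seq_len=64)
print(mask.sum(), "of", mask.size, "query-key pairs kept")  # well below the 4,096 of full attention
```

Because the number of kept pairs per query stays roughly constant as the sequence grows, the cost scales close to linearly rather than quadratically, which is what makes the longer contexts tractable.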
Additionally, inspired by BigBird’s capability to handle long contexts, the team introduced a novel application of attention-based models for extracting contextual representations of genomic sequences such as DNA. In experiments, BigBird proved beneficial for processing longer input sequences and also delivered improved performance on downstream tasks such as promoter-region and chromatin-profile prediction.
The paper Big Bird: Transformers for Longer Sequences is on arXiv.
Reporter: Fangyu Cai | Editor: Michael Sarazen