Applying Linearly Scalable Transformers to Model Longer Protein Sequences


In a bid to make transformer models even better for real-world applications, researchers from Google, the University of Cambridge, DeepMind and the Alan Turing Institute have proposed a new transformer architecture called “Performer,” based on what they call fast attention via orthogonal random features (FAVOR).


First proposed in 2017 and believed to be particularly well suited to language understanding tasks, the transformer is a neural network architecture based on a self-attention mechanism. To date, in addition to achieving SOTA performance in natural language processing and neural machine translation tasks, transformer models have also performed well across other machine learning (ML) tasks such as document generation and summarization, time series prediction, image generation, and analysis of biological sequences.

Neural networks usually process language by generating fixed- or variable-length vector-space representations. A transformer, however, performs only a small, constant number of steps; in each step, it applies a self-attention mechanism that directly models relationships between all words in a sentence, regardless of their respective positions.
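As a rough illustration (not the authors' code), the following NumPy sketch shows standard scaled dot-product self-attention, in which every token's query is scored against every other token's key; the function name and toy dimensions are ours:

```python
import numpy as np

def softmax_attention(Q, K, V):
    """Standard scaled dot-product attention (illustrative sketch only).

    Q, K, V: arrays of shape (L, d) for a sequence of L tokens.
    The score matrix has shape (L, L): every token attends to every
    position in the sequence, regardless of distance.
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                    # (L, L) pairwise scores
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                               # (L, d) outputs

# Toy usage: 8 tokens with 4-dimensional representations.
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((8, 4)) for _ in range(3))
print(softmax_attention(Q, K, V).shape)              # (8, 4)
```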

Although the attention mechanism can capture complex dependencies between the elements of an input sequence, learning these dependencies between distant inputs can be prohibitively expensive. The mechanism also limits transformers’ scalability to longer sequences, as its time and memory requirements generally scale quadratically with the number of tokens in the input sequence.
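Concretely, for a sequence of L tokens with per-head dimension d, dense attention materializes an L × L score matrix, so the cost grows roughly as follows (a standard accounting for dense attention, not a result specific to this paper):

```latex
A = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d}}\right) \in \mathbb{R}^{L \times L},
\qquad
\text{time: } O(L^{2} d), \qquad \text{space: } O(L^{2} + L d).
```

Doubling the sequence length therefore roughly quadruples the cost of every attention layer.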

To alleviate transformers’ quadratic dependency, various studies have proposed solutions that exploit the structure and sparsity of the learned attention matrix. According to the Performer team, however, these solutions do not aim to approximate regular attention, but rather propose simpler and more tractable attention mechanisms, “often by incorporating additional constraints or by trading regular attention with sparse attention using more layers.”

Real-world applications such as biological sequence analysis often involve long sequences. Adding constraints to attention mechanisms can lead to failures in capturing long-distance correlations and thus impede such applications.

To address this challenge, the FAVOR-based Performer scales linearly rather than quadratically in the number of tokens in the sequence. This new type of transformer, the researchers explain, is characterized by sub-quadratic space complexity and does not incorporate any sparsity pattern priors. “It is also backwards-compatible with pre-trained regular Transformers.”
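The idea behind this kind of linear-time attention, sketched very loosely below, is to approximate the softmax kernel with a dot product of random feature maps so that the matrix products can be re-associated and the L × L attention matrix is never formed. The sketch uses plain trigonometric random Fourier features and invented names such as favor_style_attention; it is not the authors' FAVOR implementation, which additionally relies on orthogonal random features to reduce the variance of the approximation.

```python
import numpy as np

def favor_style_attention(Q, K, V, m=64, seed=0):
    """Linear-time attention via random feature maps (illustrative sketch).

    Approximates exp(q . k) by phi(q) . phi(k) using trigonometric random
    features, so attention becomes phi(Q) @ (phi(K).T @ V) and the L x L
    score matrix is never materialized. Cost is O(L * m * d) in time and
    O(L * m + m * d) in extra space, i.e. linear in the sequence length L.
    """
    L, d = Q.shape
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((d, m))                  # random projection directions

    def phi(X):
        # Random Fourier feature identity: E[phi(x) . phi(y)] = exp(x . y)
        proj = X @ W                                 # (L, m)
        scale = np.exp((X ** 2).sum(-1, keepdims=True) / 2) / np.sqrt(m)
        return scale * np.concatenate([np.cos(proj), np.sin(proj)], axis=-1)

    # Fold the usual 1/sqrt(d) softmax temperature into Q and K symmetrically.
    Qp, Kp = phi(Q / d ** 0.25), phi(K / d ** 0.25)  # (L, 2m) each

    KV = Kp.T @ V                                    # (2m, d): summed over the sequence
    num = Qp @ KV                                    # (L, d)
    den = Qp @ Kp.sum(axis=0)                        # (L,): approximate softmax normalizer
    return num / den[:, None]

# Toy usage: 16 tokens with 8-dimensional heads.
rng = np.random.default_rng(1)
Q, K, V = (0.5 * rng.standard_normal((16, 8)) for _ in range(3))
print(favor_style_attention(Q, K, V).shape)          # (16, 8)
```

Plain trigonometric features can produce noisy, or occasionally even negative, estimates of attention weights; reducing such estimator variance is precisely what the orthogonal random features in FAVOR are designed to help with.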

Recent work has demonstrated that transformers can learn to accurately predict information about protein structure and function and to generate new sequences with specific properties. While these models show initial promise, their applicability beyond the design of single proteins is limited, mainly because they truncate sequences to 512 or 1024 amino acids.

Therefore, the ability to scale to longer sequences without imposing sparsity constraints would enable transformers to jointly model multiple concatenated protein sequences and the interactions between them. This is why the proposed linearly scalable attention mechanism holds considerable potential for modern protein modelling.

The researchers demonstrated that the Performer can model multiple concatenated protein sequences as required and predict interactions among groups of proteins from sequence data. Compared to a baseline transformer, the Performer trains more efficiently and is able to train continuously — increasing its performance as training progresses.


The researchers say their FAVOR mechanism also provides strong theoretical guarantees. “Our mechanism is to our knowledge the first unbiased estimation of the original algorithm with linear space and time complexity.”
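That unbiasedness can be understood through a standard random-feature identity for the softmax kernel, which the paper builds on (sampling the projections w as orthogonal random features):

```latex
\exp(q^{\top} k)
  = e^{\|q\|^{2}/2}\, e^{\|k\|^{2}/2}\;
    \mathbb{E}_{w \sim \mathcal{N}(0, I_d)}
    \big[\cos(w^{\top} q)\cos(w^{\top} k) + \sin(w^{\top} q)\sin(w^{\top} k)\big].
```

Averaging over m sampled projections gives an unbiased Monte Carlo estimate of each attention score, and re-associating the resulting matrix products yields the linear space and time complexity the authors describe.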

Designed for long input sequences, the FAVOR mechanism effectively approximates regular attention without the simplifying structural priors that some previous approaches required, enabling greater flexibility.

With small amounts of fine-tuning, the Performer is backwards-compatible with pretrained regular transformers. According to the researchers, it can also be used beyond the transformer scope as a more scalable replacement for regular attention, which has a wide variety of uses in computer vision, reinforcement learning, and even combinatorial optimization.

The paper Masked Language Modeling for Proteins via Linearly Scalable Long-Context Transformers is on arXiv.

Reporter: Yuan Yuan | Editor: Michael Sarazen



