Applying Linearly Scalable Transformers to Model Longer Protein Sequences


In a bid to make transformer models even better for real-world applications, researchers from Google, the University of Cambridge, DeepMind and the Alan Turing Institute have proposed a new transformer architecture called "Performer," based on what they call fast attention via orthogonal random features (FAVOR).


Proposed in 2017 and initially believed to be particularly well suited for language understanding tasks, the transformer is a neural network architecture based on a self-attention mechanism. To date, in addition to achieving SOTA performance on natural language processing and neural machine translation tasks, transformer models have also performed well across other machine learning (ML) tasks such as document generation/summarization, time series prediction, image generation, and analysis of biological sequences.

Neural networks usually process language by generating fixed- or variable-length vector-space representations. A transformer, however, performs only a small, constant number of steps: at each step, it applies a self-attention mechanism that can directly model relationships between all words in a sentence, regardless of their positions.

Although the attention mechanism can capture complex dependencies between the elements of an input sequence, learning these dependencies between distant inputs can be prohibitively expensive. The mechanism also limits transformers' scalability to longer sequences, because its compute and memory costs generally scale quadratically with the number of tokens in the input sequence.
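To make the quadratic cost concrete, here is a minimal NumPy sketch of standard softmax attention (our own illustration with arbitrary shapes, not code from the paper); the L x L score matrix it materializes is what dominates memory and compute as sequences grow.

```python
# A minimal sketch of standard softmax attention (illustrative only, not
# code from the paper). The L x L score matrix below is what makes the
# cost quadratic in sequence length.
import numpy as np

def softmax_attention(Q, K, V):
    # Q, K, V: arrays of shape (L, d)
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                   # (L, L): quadratic in L
    scores -= scores.max(axis=-1, keepdims=True)    # for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                              # (L, d)

L, d = 1024, 64
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((L, d)) for _ in range(3))
out = softmax_attention(Q, K, V)                    # time and memory grow as L**2
```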

To alleviate transformers’ quadratic dependency, various studies have proposed solutions that exploit the structure and sparsity of the learned attention matrix. According to the Performer team however, these solutions do not aim to approximate regular attention, but rather propose simpler and more tractable attention mechanisms, “often by incorporating additional constraints or by trading regular attention with sparse attention using more layers.”

Real-world applications such as biological sequence analysis often involve long sequences. Adding constraints to attention mechanisms can lead to failures in capturing long-distance correlations and thus impede such applications.

To address this challenge, the FAVOR-based Performer scales linearly rather than quadratically with the number of tokens in the sequence. This new type of transformer has sub-quadratic space complexity and does not incorporate any sparsity-pattern priors, the researchers explain. "It is also backwards-compatible with pre-trained regular Transformers."
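As a rough illustration of how linear scaling can be achieved, the sketch below maps queries and keys through a positive random-feature map and reorders the matrix products so that no L x L matrix is ever formed. This is a simplified stand-in under our own assumptions (the feature map `phi` and all shapes are ours), not the exact FAVOR construction, which uses orthogonal random features and further refinements.

```python
# A simplified sketch of kernelized linear attention (an illustrative
# stand-in, not the exact FAVOR mechanism from the paper).
import numpy as np

def phi(X, W):
    # Positive random features for the (unscaled) softmax kernel:
    # E_w[phi(q) . phi(k)] = exp(q . k). One common construction; FAVOR's
    # orthogonal-feature variant differs in detail.
    proj = X @ W.T                                          # (L, m)
    sq_norm = 0.5 * np.sum(X ** 2, axis=-1, keepdims=True)  # (L, 1)
    return np.exp(proj - sq_norm) / np.sqrt(W.shape[0])

def linear_attention(Q, K, V, W):
    Qp, Kp = phi(Q, W), phi(K, W)          # (L, m) each
    kv = Kp.T @ V                          # (m, d): cost linear in L
    normalizer = Qp @ Kp.sum(axis=0)       # (L,): row sums of the implicit kernel matrix
    return (Qp @ kv) / normalizer[:, None] # (L, d), no (L, L) matrix formed

L, d, m = 4096, 64, 256
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((L, d)) for _ in range(3))
W = rng.standard_normal((m, d))            # random projection directions
out = linear_attention(Q, K, V, W)
```

Because matrix multiplication is associative, this reordering produces the same kernelized attention output in time roughly proportional to L·m·d and space proportional to L·m + m·d, instead of the L-squared terms incurred when the attention matrix is built explicitly.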

Recent work has demonstrated that transformers can learn to accurately predict information about protein structure and function and can generate new sequences with specific properties. While these models show initial promise, their applicability beyond the design of single proteins is limited, mainly because they truncate sequences to 512 or 1,024 amino acids.

The ability to scale to longer sequences without imposing sparsity constraints would therefore enable transformers to jointly model multiple concatenated protein sequences and the interactions between them, which is why the researchers see strong potential for their linearly scalable attention mechanism in modern protein modelling.

The researchers demonstrated that the Performer can model multiple concatenated protein sequences as required and predict interactions among groups of proteins from sequence data. Compared to a baseline transformer, the Performer trains more efficiently and can be trained continuously, with performance improving as training progresses.


The researchers say their FAVOR mechanism also provides strong theoretical guarantees. “Our mechanism is to our knowledge the first unbiased estimation of the original algorithm with linear space and time complexity.”
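To illustrate where an unbiased, linear-complexity estimator can come from (this is our own illustration using a positive random-feature identity, not necessarily the paper's exact construction), note the Gaussian expectation

$$\exp(q^{\top}k) \,=\, \mathbb{E}_{\omega \sim \mathcal{N}(0, I_d)}\left[\exp\left(\omega^{\top}q - \tfrac{\lVert q\rVert^{2}}{2}\right)\exp\left(\omega^{\top}k - \tfrac{\lVert k\rVert^{2}}{2}\right)\right].$$

Averaging over m sampled random vectors therefore gives an unbiased Monte Carlo estimate of every unnormalized softmax-attention entry, and because the resulting feature matrices have shape L x m, the attention output can be assembled in time and space linear in the sequence length L.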

Designed for long input sequences, the FAVOR mechanism approximates regular attention effectively without simplifying it via the various structural priors that some previous approaches required, enabling greater flexibility.

With a small amount of fine-tuning, the Performer is backwards-compatible with pretrained regular transformers. According to the researchers, FAVOR can also be used beyond the transformer scope as a more scalable replacement for regular attention, which has a wide variety of uses in computer vision, reinforcement learning, and even combinatorial optimization.

The paper Masked Language Modeling for Proteins via Linearly Scalable Long-Context Transformers is on arXiv.

Reporter: Yuan Yuan | Editor: Michael Sarazen



