Applying Linearly Scalable Transformers to Model Longer Protein Sequences

In a bid to make transformer models even better for real-world applications, researchers from Google, the University of Cambridge, DeepMind and the Alan Turing Institute have proposed a new transformer architecture called “Performer,” based on what they call fast attention via orthogonal random features (FAVOR).

Believed to be particularly well suited for language understanding tasks when it was proposed in 2017, the transformer is a neural network architecture based on a self-attention mechanism. To date, in addition to achieving SOTA performance in Natural Language Processing and Neural Machine Translation tasks, transformer models have also performed well across other machine learning (ML) tasks such as document generation/summarization, time series prediction, image generation, and the analysis of biological sequences.

Neural networks usually process language by generating fixed- or variable-length vector-space representations. A transformer, however, performs only a small, constant number of steps; in each step, it applies a self-attention mechanism that can directly model relationships between all words in a sentence, regardless of their positions.

Although the attention mechanism can capture complex dependencies between the elements of each input sequence, training it to learn these dependencies between distant inputs can be prohibitively expensive. The mechanism also limits transformers’ scalability to longer sequences, as its cost generally scales quadratically with the number of tokens in the input sequence.
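
To make that cost concrete, here is a minimal NumPy sketch of standard softmax self-attention (a generic illustration, not code from the paper); the (n, n) score matrix it materializes is what makes time and memory grow quadratically with the sequence length n.

```python
import numpy as np

def softmax_attention(Q, K, V):
    """Standard softmax self-attention for one head.

    Q, K, V: arrays of shape (n, d) for a sequence of n tokens.
    Materializing the (n, n) score matrix costs O(n^2) time and memory.
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                 # (n, n) -- quadratic in n
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                            # (n, d)

# Toy usage: 8 tokens with 4-dimensional embeddings
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(8, 4)) for _ in range(3))
print(softmax_attention(Q, K, V).shape)           # (8, 4)
```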

To alleviate transformers’ quadratic dependency, various studies have proposed solutions that exploit the structure and sparsity of the learned attention matrix. According to the Performer team, however, these solutions do not aim to approximate regular attention, but rather propose simpler and more tractable attention mechanisms, “often by incorporating additional constraints or by trading regular attention with sparse attention using more layers.”

Real-world applications such as biological sequence analysis often involve long sequences. Adding constraints to attention mechanisms can lead to failures in capturing long-distance correlations and thus impede such applications.

To address this challenge, the FAVOR-based Performer scales linearly rather than quadratically in the number of tokens in the sequence. This new type of transformer is characterized by sub-quadratic space complexity and does not incorporate any sparsity pattern priors, explain the researchers. “It is also backwards-compatible with pre-trained regular Transformers.”
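
As a rough sketch of where the linear scaling comes from (heavily simplified; the actual FAVOR mechanism uses orthogonal random features and further stabilization described in the paper), attention can be rewritten with a feature map phi so that phi(Q) is multiplied against the small matrix phi(K)^T V, and the (n, n) attention matrix is never formed. The function names below are illustrative only.

```python
import numpy as np

def feature_map(X, W):
    """Positive random features such that phi(q) @ phi(k) estimates the
    softmax kernel exp(q @ k). Here W is plain i.i.d. Gaussian; FAVOR
    draws orthogonal random features to reduce the estimator's variance."""
    m = W.shape[0]
    return np.exp(X @ W.T - (X ** 2).sum(-1, keepdims=True) / 2) / np.sqrt(m)

def linear_attention(Q, K, V, num_features=256, seed=0):
    """Attention in O(n * m * d) time and O(m * d) extra memory:
    the (n, n) attention matrix is never formed explicitly."""
    n, d = Q.shape
    W = np.random.default_rng(seed).normal(size=(num_features, d))
    Qf = feature_map(Q / d ** 0.25, W)   # (n, m); scaling both sides by
    Kf = feature_map(K / d ** 0.25, W)   # d^(-1/4) gives the 1/sqrt(d) factor
    KV = Kf.T @ V                        # (m, d) -- computed before touching Q
    normalizer = Qf @ Kf.sum(axis=0)     # (n,)   row-wise softmax denominator
    return (Qf @ KV) / normalizer[:, None]
```

Because the (m, d) summary Kf.T @ V is shared by every query, doubling the sequence length roughly doubles the work instead of quadrupling it.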

Recent work has demonstrated that transformers can learn to accurately predict information about protein structure and function and generate new sequences with specific properties. While these models provide initial promise, their applicability beyond the design of single proteins is limited, mainly because they truncate sequences to 512 or 1024 amino acids.

Therefore, the ability to scale to longer sequences without imposing sparsity constraints would enable the use of transformers to jointly model multiple concatenated protein sequences and the interactions between them. This is why the proposed linearly scalable mechanism holds considerable potential for modern protein modelling.

The researchers demonstrated that the Performer can model multiple concatenated protein sequences as required and predict interactions among groups of proteins from sequence data. Compared to a baseline transformer, the Performer trains more efficiently and can continue training productively, with its performance improving as training progresses.

The researchers say their FAVOR mechanism also provides strong theoretical guarantees. “Our mechanism is to our knowledge the first unbiased estimation of the original algorithm with linear space and time complexity.”
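
That unbiasedness claim can be sanity-checked numerically on a single query-key pair: averaging the product of random features over many draws should recover the exact softmax-kernel value exp(q·k). The toy check below uses plain i.i.d. Gaussian features rather than the orthogonal construction described in the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
d, m = 16, 200_000
q = rng.normal(size=d) * 0.3
k = rng.normal(size=d) * 0.3

W = rng.normal(size=(m, d))                        # i.i.d. Gaussian features
phi_q = np.exp(W @ q - q @ q / 2) / np.sqrt(m)
phi_k = np.exp(W @ k - k @ k / 2) / np.sqrt(m)

print(phi_q @ phi_k, "vs. exact", np.exp(q @ k))   # estimate vs. exact kernel
```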

Designed for long input sequences, the FAVOR mechanism approximates regular attention effectively without simplifying it via the various structural priors that some previous approaches required, enabling greater flexibility.

With a small amount of fine-tuning, the Performer is backwards-compatible with pretrained regular transformers. It can also be used beyond the transformer scope as a more scalable replacement for regular attention, which has a wide variety of uses in computer vision, reinforcement learning, and even combinatorial optimization, according to the researchers.

The paper Masked Language Modeling for Proteins via Linearly Scalable Long-Context Transformers is on arXiv.

Reporter: Yuan Yuan | Editor: Michael Sarazen
