Applying Linearly Scalable Transformers to Model Longer Protein Sequences

In a bid to make transformer models even better for real-world applications, researchers from Google, the University of Cambridge, DeepMind and the Alan Turing Institute have proposed a new transformer architecture called “Performer”, based on what they call fast attention via orthogonal random features (FAVOR).

Proposed in 2017 and initially considered particularly well suited to language understanding tasks, the transformer is a neural network architecture based on a self-attention mechanism. Beyond achieving state-of-the-art performance in natural language processing and neural machine translation, transformer models have since performed well on other machine learning (ML) tasks such as document generation and summarization, time series prediction, image generation, and the analysis of biological sequences.

Neural networks usually process language by generating fixed- or variable-length vector-space representations. Recurrent neural networks build these representations by reading a sentence one word at a time, so the number of sequential steps grows with the sentence’s length. A transformer, however, performs only a small, constant number of steps. In each step, it applies a self-attention mechanism that directly models relationships between all words in a sentence, regardless of their respective positions.
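To make the idea concrete, here is a minimal Python sketch of single-head scaled dot-product self-attention. It is an illustration under simplifying assumptions (no learned query/key/value projections, made-up toy vectors, hypothetical function names), not code from the Performer paper. Every token is scored against every other token in a single step, whatever the distance between them.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention over a list of vectors.

    Simplifying assumption: queries, keys and values are the raw token
    vectors (no learned projection matrices).
    Returns (outputs, weights), where weights[i][j] is how strongly
    token i attends to token j.
    """
    n, d = len(tokens), len(tokens[0])
    scale = 1.0 / math.sqrt(d)
    outputs, weights = [], []
    for q in tokens:
        # Compare this query with every key, regardless of distance.
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in tokens]
        w = softmax(scores)
        weights.append(w)
        # Each output is an attention-weighted average of all value vectors.
        outputs.append([sum(w[j] * tokens[j][c] for j in range(n)) for c in range(d)])
    return outputs, weights


# Four toy 2-dimensional "word" vectors (made-up values).
sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
out, attn = self_attention(sentence)
print(len(attn), "x", len(attn[0]), "attention weights")  # 4 x 4 attention weights
```

Because a single layer already links the first and last tokens directly, no long chain of sequential steps is needed; the price is the full n-by-n table of weights.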

Although the attention mechanism can capture complex dependencies between the elements of an input sequence, learning these dependencies between distant inputs can be prohibitively expensive. Because every token attends to every other token, the time and memory cost of attention generally grows quadratically with the number of tokens in the input sequence, which limits how well transformers scale to longer sequences.
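The quadratic growth is easy to quantify. The sketch below assumes one attention head, one layer, batch size 1 and 4-byte float32 scores (all illustrative assumptions) and computes the memory needed just to store the full attention matrix.

```python
# Back-of-the-envelope memory cost of storing a full attention matrix.
# Illustrative assumptions: 1 head, 1 layer, batch size 1, float32 scores.

BYTES_PER_FLOAT32 = 4


def attention_matrix_bytes(seq_len: int) -> int:
    """Bytes needed for one seq_len x seq_len matrix of float32 scores."""
    return seq_len * seq_len * BYTES_PER_FLOAT32


for n in (512, 1024, 8192, 32768):
    mib = attention_matrix_bytes(n) / 2**20  # convert bytes to MiB
    print(f"{n:>6} tokens -> {n * n:>13,} scores, {mib:>8,.0f} MiB")
```

Under these assumptions the matrix takes 1 MiB at 512 tokens but 256 MiB at 8,192 tokens: every doubling of sequence length quadruples the cost, before multiplying by the number of heads, layers and batch entries.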

To alleviate transformers’ quadratic dependency, various studies have proposed solutions that exploit the structure and sparsity of the learned attention matrix. According to the Performer team, however, these solutions do not aim to approximate regular attention; instead, they propose simpler and more tractable attention mechanisms, “often by incorporating additional constraints or by trading regular attention with sparse attention using more layers.”
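A sliding-window (local) mask is one common example of the kind of sparsity prior described here. In this hedged sketch, the window size and helper names are hypothetical and are not taken from any specific published approach: each token may only attend to neighbours within `window` positions of itself.

```python
# Illustrative sliding-window ("local") attention pattern, one common kind
# of sparsity prior. The window size is a hypothetical parameter chosen for
# illustration; it is not taken from any specific paper.


def local_attention_mask(seq_len: int, window: int) -> list[list[bool]]:
    """mask[i][j] is True when token i may attend to token j (|i - j| <= window)."""
    return [[abs(i - j) <= window for j in range(seq_len)] for i in range(seq_len)]


def allowed_pairs(mask: list[list[bool]]) -> int:
    """Number of (query, key) pairs that are actually scored."""
    return sum(sum(row) for row in mask)


seq_len, window = 1000, 8
mask = local_attention_mask(seq_len, window)
print(allowed_pairs(mask), "scored pairs vs", seq_len * seq_len, "for full attention")
# Within a single layer, the first token cannot see the last one at all.
print(mask[0][seq_len - 1])  # False
```

For a fixed window, the number of scored pairs grows only linearly with sequence length (16,928 pairs here instead of 1,000,000), but within one layer the first token cannot reach the last; information has to be relayed across extra layers, which is the trade-off the Performer team describes.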

Real-world applications such as biological sequence analysis often involve long sequences. Adding constraints to attention mechanisms can lead to failures in capturing long-distance correlations and thus impede such applications.

To address this challenge, the FAVOR-based Performer scales linearly rather than quadratically in the number of tokens in the sequence. This new type of transformer is characterized by sub-quadratic space complexity and does not incorporate any sparsity pattern priors, explain the researchers. “It is also backwards-compatible with pre-trained regular Transformers.”
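The linear scaling comes from never forming the explicit L-by-L attention matrix. Below is a minimal sketch of the general idea behind kernelised attention, assuming a positive random-feature map φ (an illustrative choice; FAVOR’s exact feature map may differ, and all function names are hypothetical). If each softmax similarity is approximated by a dot product φ(q)·φ(k), the keys and values can be summarised once and reused for every query, giving the same outputs without the pairwise matrix.

```python
import math
import random

# Kernelised ("linear") attention sketch. If softmax similarities are
# approximated as phi(q) . phi(k), then
#   out_i = sum_j (phi(q_i).phi(k_j)) v_j / sum_j phi(q_i).phi(k_j)
# can be computed from a one-off summary of all keys and values.
# phi below uses positive random features: an illustrative assumption.


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def feature_map(x, projections):
    """phi(x)_r = exp(w_r . x - |x|^2 / 2) / sqrt(m), with w_r ~ N(0, I)."""
    m = len(projections)
    sq = dot(x, x) / 2.0
    return [math.exp(dot(w, x) - sq) / math.sqrt(m) for w in projections]


def quadratic_attention(qf, kf, values):
    """O(L^2): build every pairwise score explicitly, then normalise."""
    out = []
    for q in qf:
        scores = [dot(q, k) for k in kf]
        total = sum(scores)
        out.append([sum(s * v[c] for s, v in zip(scores, values)) / total
                    for c in range(len(values[0]))])
    return out


def linear_attention(qf, kf, values):
    """O(L): summarise keys/values once, then reuse the summary for every query."""
    m, dv = len(kf[0]), len(values[0])
    kv = [[0.0] * dv for _ in range(m)]  # sum_j phi(k_j) v_j^T, shape m x dv
    k_sum = [0.0] * m                    # sum_j phi(k_j), shape m
    for k, v in zip(kf, values):
        for r in range(m):
            k_sum[r] += k[r]
            for c in range(dv):
                kv[r][c] += k[r] * v[c]
    out = []
    for q in qf:
        norm = dot(q, k_sum)
        out.append([sum(q[r] * kv[r][c] for r in range(m)) / norm for c in range(dv)])
    return out


random.seed(0)
d, m, seq_len = 4, 64, 50
projections = [[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(m)]
# Scale inputs by d**-0.25 so phi(q).phi(k) estimates exp(q.k / sqrt(d)).
queries = [[random.gauss(0.0, 1.0) * d ** -0.25 for _ in range(d)] for _ in range(seq_len)]
keys = [[random.gauss(0.0, 1.0) * d ** -0.25 for _ in range(d)] for _ in range(seq_len)]
values = [[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(seq_len)]

qf = [feature_map(q, projections) for q in queries]
kf = [feature_map(k, projections) for k in keys]
slow = quadratic_attention(qf, kf, values)
fast = linear_attention(qf, kf, values)
print(max(abs(a - b) for ra, rb in zip(slow, fast) for a, b in zip(ra, rb)))
```

Both functions return the same outputs up to floating-point rounding; `quadratic_attention` does work proportional to L², while `linear_attention` does work proportional to L times the number of random features.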

Recent work has demonstrated that transformers can learn to accurately predict information about protein structure and function and to generate new sequences with specific properties. While these models show initial promise, their applicability beyond the design of single proteins is limited, mainly because they truncate sequences to 512 or 1,024 amino acids.

The ability to scale to longer sequences without imposing sparsity constraints would therefore allow transformers to jointly model multiple concatenated protein sequences and the interactions between them. This is why the linearly scalable mechanism the researchers propose holds considerable potential for modern protein modelling.

The researchers demonstrated that the Performer can model multiple concatenated protein sequences and predict interactions among groups of proteins from sequence data. Compared with a baseline transformer, the Performer trains more efficiently and can keep training, with its performance continuing to improve as training progresses.

The researchers say their FAVOR mechanism also provides strong theoretical guarantees. “Our mechanism is to our knowledge the first unbiased estimation of the original algorithm with linear space and time complexity.”
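The “unbiased” claim can be checked numerically. The following Monte Carlo sketch uses positive random features, for which E[exp(w·q − |q|²/2)·exp(w·k − |k|²/2)] = exp(q·k) when w ~ N(0, I), and compares i.i.d. Gaussian projections with orthogonal ones (blocks of mutually orthogonal directions rescaled so that each is still marginally Gaussian). The feature map, the sampler, the helper names and the input vectors are illustrative assumptions, not the paper’s exact construction.

```python
import math
import random

# Monte Carlo check that positive random features give an unbiased
# estimate of the softmax kernel exp(q . k):
#   E_w[exp(w.q - |q|^2/2) * exp(w.k - |k|^2/2)] = exp(q . k),  w ~ N(0, I).
# Feature map and orthogonal sampler are illustrative assumptions.


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def gaussian(d):
    return [random.gauss(0.0, 1.0) for _ in range(d)]


def orthogonal_block(d):
    """d mutually orthogonal directions, each rescaled by the length of a fresh
    Gaussian vector so that every row is still marginally N(0, I)."""
    basis = []
    for _ in range(d):
        v = gaussian(d)
        for u in basis:  # Gram-Schmidt: remove components along earlier rows
            p = dot(v, u)
            v = [vi - p * ui for vi, ui in zip(v, u)]
        n = math.sqrt(dot(v, v))
        basis.append([vi / n for vi in v])
    rows = []
    for u in basis:
        g = gaussian(d)
        scale = math.sqrt(dot(g, g))  # same length distribution as a Gaussian vector
        rows.append([scale * ui for ui in u])
    return rows


def estimate_kernel(q, k, projections):
    """Average of phi_w(q) * phi_w(k) over the sampled projections w."""
    sq, sk = dot(q, q) / 2.0, dot(k, k) / 2.0
    total = sum(math.exp(dot(w, q) - sq) * math.exp(dot(w, k) - sk) for w in projections)
    return total / len(projections)


random.seed(1)
d, num_features = 4, 20000  # num_features is a multiple of d
q = [0.3, -0.2, 0.1, 0.4]
k = [0.2, 0.1, -0.3, 0.25]
exact = math.exp(dot(q, k))

iid = [gaussian(d) for _ in range(num_features)]
ortho = [row for _ in range(num_features // d) for row in orthogonal_block(d)]
print("exact      ", exact)
print("iid est.   ", estimate_kernel(q, k, iid))
print("orth. est. ", estimate_kernel(q, k, ortho))
```

Both estimates should land close to the exact value; orthogonal features typically reduce the variance of such estimators, which is the motivation for the “orthogonal” in FAVOR’s name.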

Designed for long input sequences, FAVOR approximates regular attention effectively without simplifying it through the structural priors that some previous approaches required, which gives it greater flexibility.

When combined with small amounts of fine-tuning, the Performer is backwards-compatible with pretrained regular transformers and can also be used beyond the transformer scope as a more scalable replacement for regular attention, which has a wide variety of uses in computer vision, reinforcement learning, and even combinatorial optimization, according to the researchers.

The paper “Masked Language Modeling for Proteins via Linearly Scalable Long-Context Transformers” is on arXiv.

Reporter: Yuan Yuan | Editor: Michael Sarazen
