AI Speeds Drug Discovery to fight COVID-19

Category: IT Technology · Published: 4 years ago


An inside look at AI’s role in the race for COVID-19 treatment and drug discovery, and TCS’ first-hand research and experimentation

Apr 29 · 6 min read


SARS-CoV-2 is the virus behind the global COVID-19 pandemic, which has severely affected the health and economies of many countries. Multiple studies are in progress, employing diverse approaches to design novel therapeutics against potential target proteins in SARS-CoV-2. One of the best-studied coronavirus protein targets is the chymotrypsin-like (3CL) protease, which is responsible for the post-translational processing of viral polyproteins that is essential for the virus's survival and replication in the host. Several ongoing projects aim to find inhibitors of the SARS-CoV-2 3CL protease.

Recent studies have demonstrated the effectiveness of artificial intelligence (AI) techniques in learning the known chemical space and generating novel small molecules. To be usable as potential drugs, these small molecules have to satisfy several physicochemical property constraints. AI-based methods make it possible to design novel small molecules with the desired drug-like properties. At Tata Consultancy Services (TCS), we employed deep neural network-based generative and predictive models for de novo design of small molecules capable of inhibiting the 3CL protease. The generated small molecules were filtered and screened against the binding site of the 3CL protease structure of SARS-CoV-2. Based on the screening results and further analysis, we identified 31 potential compounds as candidates for synthesis and testing against SARS-CoV-2.

The AI-driven revolution in drug discovery

Finding a new drug takes a decade or more, and the success rate is very low. Advances in data curation and management have fueled the emergence of an AI-driven revolution in drug discovery.

AI-based methods are emerging as promising tools for exploring the vast chemical space from which drug-like molecules can be sampled. AI models can learn feature representations from existing drugs and use them to search the chemical space for more drug-like molecules. This gives the drug design community an opportunity to overcome many challenges, including the global antibiotic-resistance crisis. Most importantly, an AI-based approach can reduce the initial phase of the drug-discovery process from years to a few days.

TCS capability in terms of algorithms

In this study, we used our in-house deep neural network-based generative and predictive models to design novel drug-like small molecules (new chemical entities, or NCEs). These models and algorithms have been validated on a multitude of drug design tasks in which compounds are tailored to a specific protein target of interest. The validated, pre-trained models were used to generate novel small molecules capable of inhibiting the 3CL protease of SARS-CoV-2, using advanced training techniques such as transfer learning and regularized reinforcement learning.

An overview of dataset collection and pre-processing

ChEMBL is a public database that maintains one of the most comprehensive collections of drug-like small molecules. The generative model was initially trained on a dataset of ~1.6 million drug-like molecules from the ChEMBL database. The molecules were represented in Simplified Molecular Input Line Entry System (SMILES) format, a text notation that lets the model learn the features needed to design novel drug-like small molecules.
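To make the SMILES pre-processing step concrete, here is a minimal sketch of how SMILES strings might be split into symbols and mapped to a vocabulary. The regular expression, the function names, and the `<start>`/`<end>` tokens are assumptions for illustration; the article does not describe the tokenizer that was actually used.

```python
import re

# Assumption: a simple regex tokenizer, not the one used in the study.
# Bracket atoms, two-letter halogens and two-digit ring closures stay whole;
# everything else is treated as a single-character symbol.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")


def tokenize(smiles: str) -> list[str]:
    """Split a SMILES string into its symbols."""
    return SMILES_TOKEN.findall(smiles)


def build_vocab(smiles_list: list[str]) -> dict[str, int]:
    """Map every symbol seen in the dataset to an integer index."""
    symbols = sorted({tok for s in smiles_list for tok in tokenize(s)})
    # Hypothetical special tokens marking the start and end of a molecule.
    return {tok: i for i, tok in enumerate(["<start>", "<end>"] + symbols)}


aspirin = "CC(=O)Oc1ccccc1C(=O)O"
tokens = tokenize(aspirin)
vocab = build_vocab([aspirin, "ClCCBr", "C[NH3+]"])
print(tokens)
print(vocab)
```

Keeping multi-character symbols such as `Cl` or `[NH3+]` as single tokens matters because the model treats each symbol as one unit; splitting `Cl` into `C` and `l` would change the chemistry the string describes.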

Training the deep learning models

The pre-processed SMILES dataset was used to train a recurrent neural network (RNN)-based generative model. The problem of learning the SMILES grammar and reproducing it to generate novel small molecules was cast as a classification problem. Each SMILES string was treated as a time series in which every position, or symbol, is a time point. The symbols in the SMILES vocabulary are the classes. At a given time point, the generative model was trained to predict the class of the next symbol given the class distributions of the previous symbols in the time series. The model thus learns a probability distribution over the classes at each time point. The problem was cast this way so that it resembles the natural language processing (NLP) problems for which sophisticated AI models and architectures have been developed over the years.
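The next-symbol classification setup described above can be sketched as follows. This only illustrates how training examples and the per-time-point loss might be formed; the RNN itself is omitted, a uniform distribution stands in for its output, and the function names and the tiny vocabulary are assumptions rather than the authors' code.

```python
import math


def next_symbol_pairs(tokens, vocab):
    """Turn one tokenized SMILES into (previous classes, next class) examples."""
    ids = [vocab["<start>"]] + [vocab[t] for t in tokens] + [vocab["<end>"]]
    # At time point t the model sees ids[:t] and must classify ids[t].
    return [(ids[:t], ids[t]) for t in range(1, len(ids))]


def cross_entropy(probs, target):
    """Classification loss for one time point: -log p(correct next symbol)."""
    return -math.log(probs[target])


# Hypothetical five-class vocabulary, enough to encode formaldehyde ("C=O").
vocab = {"<start>": 0, "<end>": 1, "C": 2, "O": 3, "=": 4}
pairs = next_symbol_pairs(["C", "=", "O"], vocab)

# Stand-in for the model output: an untrained model that assigns the same
# probability to every class at every time point.
uniform = [1 / len(vocab)] * len(vocab)
loss = sum(cross_entropy(uniform, target) for _, target in pairs) / len(pairs)
print(pairs)
print(loss)
```

With a uniform prediction the average loss equals the logarithm of the vocabulary size; training lowers it by concentrating probability on the symbols that actually follow each prefix.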

Our trained generative model has a state-of-the-art accuracy of 96.6%, calculated from the chemical and synthetic feasibility of the drug-like molecules generated by the model. This general model, capable of exploring the chemical space, served as our prior model and was further adapted with transfer learning to generate small molecules specific to a target of interest. To bias the model towards the 3CL protease of SARS-CoV-2, a dataset of protease inhibitor molecules was manually curated from the ChEMBL database. This dataset of 2,515 protease inhibitor molecules was used to re-train the generative model using transfer learning, which biases the model towards a smaller subset of the chemical space. Further, regularized reinforcement learning was used to steer the generative model towards molecules with optimized physicochemical properties (Fig. 1).


Figure 1: Approach used to generate novel compounds targeting the 3CL protease of SARS-CoV-2.

Filtering the potential drug-like molecules against SARS-CoV-2

The trained generative model was used to sample 50,000 small molecules from the learned chemical space. After removing duplicates and molecules identical to entries already in the ChEMBL database, 42,484 molecules remained. These were subjected to stringent physicochemical property filters, including drug-likeness, octanol-water partition coefficient (logP), hydrogen bond donor and acceptor counts, molecular weight, bioactivity and synthetic accessibility, which left a set of 3,960 molecules. These molecules were further filtered based on their affinity towards the SARS-CoV-2 3CL protease. After virtual screening, a total of 1,333 small molecules remained as potential inhibitors.
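The first stages of this funnel (removing duplicates and known molecules, then applying property filters) could look roughly like the sketch below. The thresholds are Lipinski's rule-of-five limits, used here only as an assumption because the article does not state its cut-offs. The `Candidate` class, the `SMILES_A`-style identifiers, and the property values are illustrative placeholders; in practice the properties would come from a cheminformatics toolkit, and SMILES strings would need to be canonicalized before being compared.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    smiles: str         # assumed to be canonicalized already
    mol_weight: float   # g/mol
    logp: float         # octanol-water partition coefficient
    h_donors: int       # hydrogen bond donor count
    h_acceptors: int    # hydrogen bond acceptor count


def passes_filters(c: Candidate) -> bool:
    # Assumed thresholds (Lipinski's rule of five); the article does not
    # state the cut-offs it used.
    return (c.mol_weight <= 500 and c.logp <= 5
            and c.h_donors <= 5 and c.h_acceptors <= 10)


def filter_candidates(generated, known_smiles):
    """Drop duplicates and known molecules, then apply property filters."""
    seen = set()
    kept = []
    for c in generated:
        if c.smiles in seen or c.smiles in known_smiles:
            continue  # duplicate, or already present in the reference set
        seen.add(c.smiles)
        if passes_filters(c):
            kept.append(c)
    return kept


# Placeholder candidates with made-up property values, for illustration only.
generated = [
    Candidate("SMILES_A", 320.0, 2.1, 2, 5),
    Candidate("SMILES_A", 320.0, 2.1, 2, 5),  # duplicate of the first
    Candidate("SMILES_B", 610.0, 3.0, 1, 6),  # exceeds the weight limit
    Candidate("SMILES_C", 410.0, 4.2, 3, 7),  # already in the known set
    Candidate("SMILES_D", 450.0, 4.9, 4, 9),
]
kept = filter_candidates(generated, known_smiles={"SMILES_C"})
print([c.smiles for c in kept])
```

The remaining filters named in the text (drug-likeness, bioactivity, synthetic accessibility) and the docking-based affinity screen would be further predicates applied in the same way, each requiring its own predictive model or scoring tool.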

We also observed that the generative model could generate small molecules that are similar to HIV-protease inhibitors but bind better to the SARS-CoV-2 3CL protease. The complete set of promising small molecules can be found here, so that anyone can test these molecules against SARS-CoV-2 in this hour of need.

Other AI-based applications for COVID-19 drug design research

Several companies and startups have reoriented their research goals to use AI in accelerating the search for a cure for COVID-19. The European AI-centered startup Molecule.one has made its patented synthesis planning platform freely available to the scientific community, in an effort to help researchers rapidly synthesize and test potential candidate molecules against COVID-19. IBM has applied its AI generative frameworks to three COVID-19 targets and has generated 3,000 novel molecules. These molecules have been released under the Creative Commons License (CCL) to the scientific community for synthesis, testing and optimization. The Hong Kong-based pharmaceutical research company InSilico Medicine has released a list of 97 candidate small molecules designed to inhibit the 3CL protease of SARS-CoV-2. Several AI-based rapid virtual screening models have also been developed and tested against COVID-19, using existing public and commercial virtual screening compound libraries as primary databases. In essence, several directions of research have incorporated AI-based models to produce potential therapeutics for COVID-19 at an unprecedented pace.

