AI Speeds Drug Discovery to fight COVID-19

An inside look at AI’s role in the race for COVID-19 treatment and drug discovery, and TCS’ first-hand research and experimentation

Apr 29

The virus named “SARS-CoV-2” is the cause of the global COVID-19 pandemic, which has severely affected the health and economies of several countries. Multiple studies are in progress, employing diverse approaches to design novel therapeutics against potential target proteins of SARS-CoV-2. One of the well-studied protein targets in coronaviruses is the 3-chymotrypsin-like (3CL) protease, which is responsible for the post-translational processing (cleavage) of the viral polyproteins that the virus needs to survive and replicate in the host. Various ongoing projects aim to find inhibitors of the SARS-CoV-2 3CL protease.

Recent studies have demonstrated the effectiveness of artificial intelligence (AI) techniques in understanding the known chemical space and generating novel small molecules. To be usable as potential drugs, these small molecules must satisfy several physicochemical property constraints. AI-based methods make it possible to design novel small molecules with the desired drug-like properties. At Tata Consultancy Services (TCS), we employed deep neural network-based generative and predictive models for the de novo design of small molecules capable of inhibiting the 3CL protease. The generated small molecules were filtered and screened against the binding site of the SARS-CoV-2 3CL protease structure. Based on the screening results and further analysis, we identified 31 compounds as promising candidates for synthesis and testing against SARS-CoV-2.

The AI-driven revolution in drug discovery

Bringing a new drug to market typically takes a decade or more, with a very low success rate. Advances in data curation and management have fueled an AI-driven revolution in drug discovery.

AI-based methods are emerging as promising tools for exploring the vast chemical space from which drug-like molecules can be sampled. AI models can learn feature representations from existing drugs and use them to search the chemical space for new drug-like molecules. This has opened new opportunities for the drug design community to tackle many challenges, including the global antibiotic-resistance crisis. Most importantly, an AI-based approach can shorten the initial phase of the drug discovery process from years to a few days.

TCS’ algorithmic capabilities

In this study, we used our in-house deep neural network-based generative and predictive models to design novel drug-like small molecules (new chemical entities, or NCEs). These models and algorithms have been validated on a multitude of drug design tasks that tailor compounds to a specific protein target of interest. We applied these validated, pre-trained, state-of-the-art models, together with advanced training techniques such as transfer learning and regularized reinforcement learning, to generate novel small molecules capable of inhibiting the SARS-CoV-2 3CL protease.

An overview of dataset collection and pre-processing

ChEMBL is a public database that maintains one of the most comprehensive collections of bioactive, drug-like small molecules. The generative model was initially trained on a dataset of ~1.6 million drug-like molecules from the ChEMBL database. The molecules were represented in the Simplified Molecular Input Line Entry System (SMILES) format, which encodes a molecule's structure as a string of characters and enables the model to learn the features needed to design novel drug-like small molecules.
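
Before an RNN can treat a SMILES string as a sequence of classes, the string has to be split into symbols and each distinct symbol mapped to a class index. The post does not describe its tokenizer; the sketch below assumes a common regular-expression approach in which bracket atoms such as `[nH]` and two-letter halogens (`Cl`, `Br`) count as single symbols, and it adds hypothetical `<start>`/`<end>` markers to the vocabulary.

```python
import re

# Assumed tokenization pattern (not necessarily the one TCS used): bracket atoms,
# two-letter halogens, organic-subset atoms, bonds/branches, %nn and single-digit ring labels.
SMILES_TOKEN_RE = re.compile(
    r"(\[[^\]]+\]|Br|Cl|@@|%\d{2}|[BCNOPSFIbcnops]|[=#\-+\\/()~@.*$:]|\d)"
)

def tokenize(smiles: str) -> list[str]:
    """Split a SMILES string into symbols; raise if any characters are left unmatched."""
    tokens = SMILES_TOKEN_RE.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"Could not fully tokenize {smiles!r}")
    return tokens

def build_vocabulary(smiles_list):
    """Map every symbol in the dataset (plus hypothetical start/end markers) to a class index."""
    symbols = {"<start>", "<end>"}
    for s in smiles_list:
        symbols.update(tokenize(s))
    return {sym: i for i, sym in enumerate(sorted(symbols))}
```

For example, `tokenize("CC(=O)Oc1ccccc1C(=O)O")` (aspirin) yields 21 symbols, while `tokenize("CCl")` keeps chlorine as the single symbol `Cl` rather than a carbon followed by an unknown `l`.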

Training the deep learning models

The pre-processed SMILES dataset was used to train a recurrent neural network (RNN)-based generative model. Learning the SMILES grammar and reproducing it to generate novel small molecules was cast as a classification problem. Each SMILES string was treated as a time series in which every position (symbol) is a time point, and the distinct symbols in the SMILES vocabulary are the classes. At each time point, the generative model was trained to predict the class of the next symbol given the preceding symbols in the sequence. The model thus learns a probability distribution over the classes at every time point. Framing the problem this way makes it analogous to natural language processing (NLP) problems, for which sophisticated AI models and architectures have been developed over the years.
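
The next-symbol classification framing can be sketched in a few lines. The code shows how one tokenized SMILES string becomes a sequence of (prefix, next-symbol) training examples, the per-string cross-entropy loss a generative model minimizes, and autoregressive sampling until an end marker appears. Because a full RNN would not fit here, a bigram model that conditions only on the last symbol stands in for it; the `<start>`/`<end>` markers and all function names are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict

START, END = "<start>", "<end>"  # hypothetical sequence markers

def training_pairs(tokens):
    """Turn one tokenized SMILES into (prefix, next-symbol) classification examples."""
    seq = [START] + tokens + [END]
    return [(tuple(seq[: t + 1]), seq[t + 1]) for t in range(len(seq) - 1)]

def sequence_cross_entropy(tokens, next_symbol_probs):
    """Sum of per-time-point negative log-likelihoods of the true next symbol."""
    return -sum(math.log(next_symbol_probs(prefix)[target])
                for prefix, target in training_pairs(tokens))

def sample(next_symbol_probs, max_len=100, rng=random):
    """Draw symbols one at a time from the predicted class distribution until <end>."""
    prefix = [START]
    while len(prefix) <= max_len:
        probs = next_symbol_probs(tuple(prefix))
        symbols, weights = zip(*probs.items())
        nxt = rng.choices(symbols, weights=weights)[0]
        if nxt == END:
            break
        prefix.append(nxt)
    return "".join(prefix[1:])

def fit_bigram(token_lists):
    """Toy stand-in for the RNN: next-symbol probabilities conditioned on the last symbol only."""
    counts = defaultdict(Counter)
    for tokens in token_lists:
        for prefix, target in training_pairs(tokens):
            counts[prefix[-1]][target] += 1

    def next_symbol_probs(prefix):
        c = counts[prefix[-1]]
        total = sum(c.values())
        return {sym: n / total for sym, n in c.items()}

    return next_symbol_probs
```

An RNN replaces `fit_bigram` with a network whose output layer is a softmax over the whole vocabulary and whose hidden state summarizes the entire prefix, but the training pairs, loss and sampling loop keep the same shape.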

Our trained generative model achieves a state-of-the-art accuracy of 96.6%, calculated from the chemical and synthetic feasibility of the drug-like molecules generated by the model. This general model, capable of exploring the chemical space, served as our prior model, which we then adapted through transfer learning to generate small molecules specific to a target of interest. To bias the model toward the 3CL protease of SARS-CoV-2, we manually curated a dataset of 2,515 protease inhibitor molecules from the ChEMBL database and used it to re-train the generative model via transfer learning. Transfer learning biases the generative model toward a smaller subset of the chemical space. Finally, regularized reinforcement learning was used to steer the generative model toward molecules with optimized physicochemical properties (Fig. 1).
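
The post does not specify the form of its regularized reinforcement learning. One well-known formulation, the augmented-likelihood objective of the REINVENT method, anchors the fine-tuned "agent" model to the frozen prior so that chasing a reward does not destroy the learned SMILES grammar. The sketch below assumes that formulation; the reward function `property_score`, its thresholds and the value of `sigma` are hypothetical, and log-likelihoods are passed in as plain numbers rather than computed by a network.

```python
from statistics import fmean

def property_score(mol_weight, logp, hbd, hba):
    """Hypothetical reward in [0, 1]: fraction of Lipinski-style limits satisfied."""
    checks = [mol_weight <= 500, logp <= 5, hbd <= 5, hba <= 10]
    return sum(checks) / len(checks)

def augmented_log_likelihood(logp_prior, score, sigma):
    """Prior log-likelihood of a sequence, shifted upward by the scaled reward."""
    return logp_prior + sigma * score

def regularized_rl_loss(batch, sigma=20.0):
    """Mean squared gap between the agent's and the augmented prior's log-likelihoods.

    batch: iterable of (logp_prior, logp_agent, score) for SMILES sampled from
    the agent. In a real training loop the prior stays frozen and gradients
    flow only through logp_agent; sigma=20.0 is an arbitrary illustrative value.
    """
    return fmean(
        (augmented_log_likelihood(lp_prior, score, sigma) - lp_agent) ** 2
        for lp_prior, lp_agent, score in batch
    )
```

A high-scoring molecule pulls the agent's log-likelihood up toward `logp_prior + sigma * score`, while a zero-scoring one pulls it back toward the prior's own log-likelihood; that pull toward the prior is what makes the reinforcement learning "regularized".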

Figure 1: Approach used for generating novel compounds targeting the 3CL protease of SARS-CoV-2.

Filtering the potential drug-like molecules against SARS-CoV-2

The trained generative model was used to sample 50,000 small molecules from the learned chemical space. After removing duplicates and molecules already present in the ChEMBL database, 42,484 molecules remained. These were subjected to stringent physicochemical property filters, including drug-likeness, octanol-water partition coefficient (logP), hydrogen bond donor and acceptor counts, molecular weight, bioactivity and synthetic accessibility, which left 3,960 molecules. These molecules were then filtered by their predicted binding affinity for the SARS-CoV-2 3CL protease. Virtual screening yielded a total of 1,333 small molecules that could act as potential inhibitors.
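
The filtering funnel can be sketched as a chain of list filters. The sketch assumes each sampled molecule already carries a canonical SMILES string (so string equality detects duplicates), precomputed descriptors, and a docking score from a virtual screening run. The `Candidate` fields, the rule-of-five limits used as property thresholds, and the QED, synthetic-accessibility and docking cut-offs are illustrative assumptions, not the values TCS used; the bioactivity filter mentioned in the text is omitted.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Candidate:
    smiles: str           # assumed canonical SMILES
    mol_weight: float     # Da
    logp: float           # octanol-water partition coefficient
    hbd: int              # hydrogen bond donors
    hba: int              # hydrogen bond acceptors
    qed: float            # drug-likeness score in [0, 1]
    sa_score: float       # synthetic accessibility, 1 (easy) to 10 (hard)
    docking_score: float  # kcal/mol; more negative = stronger predicted binding

def passes_property_filters(c: Candidate) -> bool:
    # Hypothetical thresholds: Lipinski's rule of five plus illustrative QED/SA cut-offs.
    return (c.mol_weight <= 500 and c.logp <= 5 and c.hbd <= 5 and c.hba <= 10
            and c.qed >= 0.5 and c.sa_score <= 4.0)

def screen(sampled, known_smiles, docking_cutoff=-8.0):
    """Funnel: dedupe -> drop known molecules -> property filters -> docking cut-off."""
    seen, novel = set(), []
    for c in sampled:
        if c.smiles in seen or c.smiles in known_smiles:
            continue
        seen.add(c.smiles)
        novel.append(c)
    drug_like = [c for c in novel if passes_property_filters(c)]
    hits = [c for c in drug_like if c.docking_score <= docking_cutoff]
    return {"novel": novel, "drug_like": drug_like, "hits": hits}
```

Applied to the counts reported above, roughly 85.0% of the 50,000 samples survive deduplication (42,484), about 9.3% of those pass the property filters (3,960), and about 33.7% of those pass virtual screening (1,333), so about 2.7% of the original samples remain.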

We also observed that the generative model could generate small molecules that are similar to HIV protease inhibitors but show better predicted binding to the SARS-CoV-2 3CL protease. We have made the complete set of promising small molecules publicly available so that anyone can test them against SARS-CoV-2 in this hour of need.
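
The post does not say how similarity to HIV protease inhibitors was measured. A common choice in cheminformatics is the Tanimoto coefficient between molecular fingerprints; the sketch below assumes fingerprints have already been computed (for example with a cheminformatics toolkit) and are represented as sets of on-bit indices. The reference names and bit sets are hypothetical, and returning 0.0 for two empty fingerprints is a convention chosen here.

```python
def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of on-bit indices."""
    if not fp_a and not fp_b:
        return 0.0  # convention for two empty fingerprints
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)

def most_similar(query: set[int], references: dict[str, set[int]]) -> tuple[str, float]:
    """Return (name, similarity) of the reference fingerprint closest to the query."""
    return max(((name, tanimoto(query, fp)) for name, fp in references.items()),
               key=lambda pair: pair[1])
```

Comparing each generated molecule against a panel of known HIV protease inhibitor fingerprints this way would flag the generated compounds whose nearest neighbor is an HIV protease inhibitor.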

Other AI-based applications for COVID-19 drug design research

Several companies and startups have redirected their research in innovative ways, using AI to accelerate the search for a cure for COVID-19. The European AI-centered startup Molecule.one has made its patented synthesis planning platform freely available to the scientific community to help researchers rapidly synthesize and test potential candidate molecules against COVID-19. IBM has applied its AI generative frameworks to three COVID-19 targets and generated 3,000 novel molecules, which it has released to the scientific community under a Creative Commons license for synthesis, testing and optimization. The Hong Kong-based pharmaceutical research company InSilico Medicine has released a list of 97 candidate small molecules designed to inhibit the 3CL protease of SARS-CoV-2. Several AI-based rapid virtual screening models have also been developed and tested against COVID-19, using existing public and commercial virtual screening compound libraries as their primary databases. In short, many research efforts have incorporated AI-based models to propose potential COVID-19 therapeutics at an unprecedented pace.
