Modelling the language of the immune system with machine learning (first steps)



See also our improved statistical classifier for immune repertoires, Dynamic Kernel Matching.

Statistical classifiers for diagnosing disease from immune repertoires

LABORATORY OF DR. LINDSAY COWELL

Description

The full set of antibodies and immune receptors in an individual contains traces of past and current immune responses. These traces can serve as biomarkers for diseases mediated by the adaptive immune system (e.g. infectious disease, organ rejection, autoimmune disease, cancer). Of all the immune receptors that can be sequenced from a patient, only a handful are expected to contain these traces. Here we present the source code for a method that elucidates these traces.

First, the CDR3 is parsed from every antibody sequence in a patient (see VDJ Server). The CDR3 is then cut into fixed-length subsequences that we call snippets; these are simply the k-mers of the CDR3. The amino acid residues of each snippet are then described by their biochemical properties in a position-dependent manner using Atchley factors.
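A minimal sketch of the snippet step in Python with NumPy. The function names are hypothetical and are not taken from `dataplumbing.py`. Published Atchley factors give five numbers per amino acid. The two-number `TOY_FACTORS` table below is made up purely for illustration and should be replaced with the real values.

```python
import numpy as np

# HYPOTHETICAL two-dimensional descriptors, for illustration only.
# Real Atchley factors have five values per amino acid.
TOY_FACTORS = {"C": [1.0, 0.0], "A": [0.0, 1.0], "S": [0.5, 0.5]}


def snippets(cdr3: str, k: int) -> list[str]:
    """Return every contiguous length-k subsequence (k-mer) of a CDR3."""
    return [cdr3[i:i + k] for i in range(len(cdr3) - k + 1)]


def encode_snippet(snippet: str, factors: dict) -> np.ndarray:
    """Concatenate per-residue descriptors position by position.

    Residue j's descriptors occupy their own slice of the output vector,
    so the same amino acid at different positions gives different features.
    """
    return np.concatenate([np.asarray(factors[aa], dtype=float) for aa in snippet])


def encode_cdr3(cdr3: str, k: int, factors: dict) -> np.ndarray:
    """Stack all snippet encodings into an (n_snippets, k * d) array.

    Raises ValueError if the CDR3 is shorter than k (no snippets to stack).
    """
    return np.stack([encode_snippet(s, factors) for s in snippets(cdr3, k)])
```

For example, `encode_cdr3("CASS", 3, TOY_FACTORS)` produces two snippets, `CAS` and `ASS`, and therefore a 2 × 6 array (3 positions × 2 toy descriptors). Concatenating the per-position descriptors, rather than averaging them, is what keeps the encoding position-dependent.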

The main idea is to score every snippet by its biochemical features with a detector function and to aggregate the scores into a single value that can represent a diagnosis. Because only a handful of snippets are expected to have a high score in patients with a disease, we aggregate the scores by taking the maximum. The maximum score is then used to predict the probability that a patient has a positive diagnosis: a high score suggests a positive diagnosis, and the absence of any high score suggests a negative one. The parameters of the detector function are fitted by maximizing the log-likelihood that each diagnosis is correct (equivalently, by minimizing the cross-entropy error).
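The scoring and aggregation can be sketched as follows. The sketch assumes a linear detector (a weight vector `w` and bias `b` applied to each snippet's feature vector) and a logistic link from the maximum score to a probability. The detector in `model.py` may have a different functional form. `detector`, `predict_proba`, and `cross_entropy` are illustrative names.

```python
import numpy as np


def detector(X: np.ndarray, w: np.ndarray, b: float) -> np.ndarray:
    """ASSUMED linear detector: one score per snippet (row) of X."""
    return X @ w + b


def predict_proba(X: np.ndarray, w: np.ndarray, b: float) -> float:
    """Aggregate snippet scores by their maximum, then apply a logistic sigmoid."""
    max_score = np.max(detector(X, w, b))
    return 1.0 / (1.0 + np.exp(-max_score))


def cross_entropy(patients, labels, w, b, eps=1e-12) -> float:
    """Mean negative log-likelihood of the diagnoses (labels are 0 or 1).

    `patients` is a list of (n_snippets, n_features) arrays, one per patient.
    Probabilities are clipped to avoid log(0).
    """
    p = np.array([predict_proba(X, w, b) for X in patients])
    p = np.clip(p, eps, 1.0 - eps)
    y = np.asarray(labels, dtype=float)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))
```

With `w = [1, -1]` and `b = 0`, a patient whose snippets are `[[0, 0], [3, 0]]` gets a maximum score of 3 and a probability of about 0.95. A patient whose snippets are `[[0, 1], [0, 2]]` gets a maximum score of -1 and a probability of about 0.27. Only the top-scoring snippet reaches the output, so a single strongly scoring snippet is enough for a positive call, however many uninformative snippets the patient carries.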

The model is fitted to the training data using gradient-based optimization. First, an initial value is drawn at random for each parameter. Then 2,500 steps of gradient-based optimization are used to find a locally optimal fit to the data. We find that the fitting procedure must be repeated hundreds of thousands of times to find a good fit to the training data. Using TensorFlow, the fitting procedure is run repeatedly in parallel on a GPU. We call each parallel copy a "replica". The replica with the best fit to the training data is then scored on held-out data that was not used during fitting.
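The replica strategy can be sketched in NumPy by storing one parameter vector per replica and updating all replicas in the same vectorized step. This is a stand-in built on several assumptions, not the TensorFlow code in `train.py`. It assumes a linear detector and a logistic link, uses plain (sub)gradient descent with a hypothetical learning rate `lr` (the repository's optimizer may differ), and defaults to 64 replicas rather than hundreds of thousands. The gradient of the maximum flows only through the snippet that attains it.

```python
import numpy as np


def replica_losses(patients, y, W, b, eps=1e-12):
    """Mean cross-entropy on the training set, one value per replica."""
    loss = np.zeros(W.shape[0])
    for X, target in zip(patients, y):
        m = np.max(X @ W.T + b, axis=0)              # max snippet score per replica
        p = np.clip(1.0 / (1.0 + np.exp(-m)), eps, 1.0 - eps)
        loss -= target * np.log(p) + (1.0 - target) * np.log(1.0 - p)
    return loss / len(patients)


def fit_replicas(patients, labels, n_replicas=64, n_steps=2500, lr=0.1, seed=0):
    """Fit many randomly initialized replicas in parallel and keep the best one.

    ASSUMPTIONS: linear detector, logistic link, plain subgradient descent.
    Returns (w_best, b_best, per-replica training losses).
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(labels, dtype=float)
    n, d = len(patients), patients[0].shape[1]
    W = rng.normal(size=(n_replicas, d))             # one weight vector per replica
    b = rng.normal(size=n_replicas)                  # one bias per replica
    cols = np.arange(n_replicas)
    for _ in range(n_steps):
        grad_W = np.zeros_like(W)
        grad_b = np.zeros_like(b)
        for X, target in zip(patients, y):
            scores = X @ W.T + b                     # (n_snippets, n_replicas)
            top = np.argmax(scores, axis=0)          # winning snippet per replica
            m = scores[top, cols]                    # max score per replica
            err = 1.0 / (1.0 + np.exp(-m)) - target  # dLoss/dm for cross-entropy
            grad_W += err[:, None] * X[top]          # only the max snippet gets gradient
            grad_b += err
        W -= lr * grad_W / n
        b -= lr * grad_b / n
    losses = replica_losses(patients, y, W, b)
    best = int(np.argmin(losses))                    # best fit to the training data
    return W[best], b[best], losses
```

Most replicas can settle in a poor local optimum, because the snippet that wins the maximum depends on the starting parameters. Drawing many independent starts and keeping the one with the lowest training loss is what makes the search practical. The selected replica would then be evaluated on held-out data.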

For a complete description of this approach, see our publication in BMC Bioinformatics:

Requirements

Download

  • Download: zip
  • Git: git clone https://github.com/jostmey/MaxSnippetModel

Primary Files

  • model.py
  • train.py
  • score.py
  • dataplumbing.py (Data used to develop the approach cannot be made available at this time)
  • dataplumbing_synthetic_data.py (Overwrite dataplumbing.py with this file to see how the model performs on synthetic data)

Update

Improved repertoire classification models are published under:

