Machine Learning on Graphs: Why Should You Care?

Category: IT Technology · Published: 4 years ago

A basic overview of graphs and their intersection with machine learning.

A few years ago, “Balboa Creole French” was considered a language on the verge of disappearing [1]. Balboa Island is located in Newport Beach, California. People there speak a modified version of French because many French families moved to the island after the First World War and started to learn English, German, and Spanish, until a new language was formed. Around 20 people still speak that language.

Of course, everything I just said is a complete hoax, but people did not realize that until someone actually went to the island to learn the language and found that it did not exist in the first place (at least that’s what the rumors say).

Now, you might ask: what does this have to do with machine learning on graphs? Well, around 4 years ago, researchers at Stanford University [2] built classifiers that detected such hoaxes on Wikipedia with an accuracy of 86%, compared to a human-level accuracy of 66%!

The classifier they used was a Random Forest, which is an ensemble of decision trees. The interesting part was how they crafted the features.

Graph Diagrams for Real and Hoax Wikipedia Articles

One of the key ideas in the paper was that real articles link more coherently than false ones. A Wikipedia article contains markup pointing to other Wikipedia articles. For a real article, the articles it points to are themselves linked to one another more often than for a hoax, and this turned out to be a key factor in detecting Wikipedia hoaxes.
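
To make the idea of "coherent" linking concrete, here is a minimal sketch of one way to score it: the fraction of ordered pairs of an article's outlinks in which the first page links to the second. This is an illustration only, not necessarily the exact feature used in the paper. The function name `outlink_coherence`, the article names, and the toy link graph are all hypothetical.

```python
from itertools import permutations


def outlink_coherence(article, links):
    """Fraction of ordered pairs (a, b) of distinct pages linked from
    `article` such that page a also links to page b."""
    neighbors = links.get(article, set()) - {article}
    if len(neighbors) < 2:
        return 0.0  # fewer than two outlinks: no pairs to compare
    pairs = list(permutations(neighbors, 2))
    linked = sum(1 for a, b in pairs if b in links.get(a, set()))
    return linked / len(pairs)


# Hypothetical toy link graph: page -> set of pages it links to.
links = {
    "Real article": {"Paris", "France", "Seine"},
    "Paris": {"France", "Seine"},
    "France": {"Paris", "Seine"},
    "Seine": {"Paris", "France"},
    "Hoax article": {"Paris", "Quantum physics", "Basket weaving"},
    "Quantum physics": set(),
    "Basket weaving": set(),
}

real_score = outlink_coherence("Real article", links)
hoax_score = outlink_coherence("Hoax article", links)
print(real_score, hoax_score)
```

In this toy graph the pages linked from the real article all link to each other, while the pages linked from the hoax article are unrelated, so the real article scores higher. A score like this could then be fed to a classifier as one feature among many.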

Now, go to Google and type a question like “When did Leonardo Da Vinci die?”. You will get a lot of results for your search, but at the top you will see a small box with the answer inside. How did Google know what we wanted?

Back in 2012, Google released its Knowledge Graph, which models entities in the world and the relationships between them as a graph. So the string you type is not treated as just a string, but is matched to a node in a huge graph. Leonardo Da Vinci is one node of this graph. Another node is May 2, 1519, which is his date of death. A link connects these two nodes, and the link’s name, or relation, is Date of Death.
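
The node-relation-node structure described above can be sketched as a tiny store of (subject, relation, object) triples. This is only a toy illustration of the data model, assuming exact string matching on entity names. It says nothing about how Google's Knowledge Graph is actually implemented, and the `TripleStore` class is a hypothetical name.

```python
from collections import defaultdict


class TripleStore:
    """Toy knowledge graph: stores (subject, relation, object) triples."""

    def __init__(self):
        # Maps (subject, relation) -> list of objects.
        self._index = defaultdict(list)

    def add(self, subject, relation, obj):
        self._index[(subject, relation)].append(obj)

    def query(self, subject, relation):
        # .get avoids creating empty entries for unknown keys.
        return list(self._index.get((subject, relation), []))


kg = TripleStore()
kg.add("Leonardo Da Vinci", "Date of Death", "May 2, 1519")

answer = kg.query("Leonardo Da Vinci", "Date of Death")
print(answer)
```

Answering the question then amounts to finding the node for the entity and following the edge labeled with the right relation.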

Of course, querying this graph and finding ways to embed its nodes and relations is another story, which I will not tackle here!

Another interesting application of machine learning on graphs is predicting the side effects of taking multiple drugs together. Many patients have to take more than one drug at a time. Each drug affects a certain set of proteins, so we can build a graph whose nodes are drugs and proteins, where an arrow from a drug to a protein indicates that the drug affects that protein. We already know the effects of some drugs taken together. The problem is that we do not know the effects of all pairs of drugs: there are over 13,000 drugs, and running an experiment for each pair is time-consuming.

Drug and Protein Graph

The alternative is to use machine learning to predict these side effects. In the graph, drugs are represented by triangles and proteins by circles. A link from a drug to a protein indicates that the protein is affected by that drug. A link between two drugs indicates that a side effect occurs if the two drugs are taken together. Notice that if drug #1 and drug #2 are taken together, nausea occurs. What happens if drug #2 and drug #3 are taken together? This task is called Link Prediction: we aim to predict whether there is a link between two nodes by taking advantage of the other links in the graph! Several side effects have been predicted using machine learning, without running time-consuming experiments.
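
As a rough illustration of link prediction, the sketch below ranks drug pairs with no known link by how much their protein targets overlap (Jaccard similarity). This is a simple hand-written heuristic and not the learned model used in actual side-effect prediction research. The drug names, protein names, and the single known side effect are made-up toy data, and `score_unknown_pairs` is a hypothetical helper.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical toy data: drug -> set of proteins it affects.
targets = {
    "drug1": {"p1", "p2"},
    "drug2": {"p2", "p3"},
    "drug3": {"p3", "p4"},
    "drug4": {"p5"},
}

# Known drug-drug links (for example, drug1 + drug2 causes nausea).
known_side_effects = {frozenset({"drug1", "drug2"})}


def score_unknown_pairs(targets, known):
    """Score every drug pair without a known link, highest score first."""
    drugs = sorted(targets)
    scores = {}
    for i, a in enumerate(drugs):
        for b in drugs[i + 1:]:
            if frozenset({a, b}) in known:
                continue  # link already known, nothing to predict
            scores[(a, b)] = jaccard(targets[a], targets[b])
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


ranking = score_unknown_pairs(targets, known_side_effects)
for pair, score in ranking:
    print(pair, round(score, 3))
```

Here drug2 and drug3 share a protein target, so that pair ranks first among the unknown pairs. A learned model would replace the fixed Jaccard score with one trained on the known links.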

To conclude, graphs have been gaining increasing attention over the past couple of years, especially in the machine learning community. They are a language for describing complex data across various domains. Combined with machine learning, they have had a great impact on social networking, drug design, AI reasoning, and much more.

I have given a basic overview of the applications of graphs in machine learning. I am thinking of publishing articles that tackle both the theoretical and practical sides. I will cover basic graph theory, social networks, random graph models, spectral clustering, graph neural networks, and deep generative models for graphs. I will also accompany these with code implementations. But first, I need to know if there is an audience for this. If you are interested, please let me know what you think!

Thanks for your time!

