A Keras-Based Autoencoder for Anomaly Detection in Sequences

栏目: IT技术 · 发布时间: 4年前

内容简介:Use Keras to develop a robust NN architecture that can be used to efficiently recognize anomalies in sequencesSuppose that you have a very long list of string sequences, such as a list of amino acid structures (‘PHE-SER-CYS’, ‘GLN-ARG-SER’,…), product seri

Use Keras to develop a robust NN architecture that can be used to efficiently recognize anomalies in sequences

Jan 16 ·6min read

A Keras-Based Autoencoder for Anomaly Detection in Sequences

Photo by Markus Spiske on Unsplash

Suppose that you have a very long list of string sequences, such as a list of amino acid structures (‘PHE-SER-CYS’, ‘GLN-ARG-SER’,…), product serial numbers (‘AB121E’, ‘AB323’, ‘DN176’…), or users UIDs, and you are required to create a validation process of some kind that will detect anomalies in this sequence. An anomaly might be a string that follows a slightly different or unusual format than the others (whether it was created by mistake or on purpose) or just one that is extremely rare. To make things even more interesting, suppose that you don't know what is the correct format or structure that sequences suppose to follow.

This is a relatively common problem (though with an uncommon twist) that many data scientists usually approach using one of the popular unsupervised ML algorithms, such as DBScan, Isolation Forest, etc. Many of these algorithms typically do a good job in finding anomalies or outliers by singling out data points that are relatively far from the others or from areas in which most data points lie. Although autoencoders are also well-known for their anomaly detection capabilities, they work quite differently and are less common when it comes to problems of this sort.

A Keras-Based Autoencoder for Anomaly Detection in Sequences

Photo by Mika Baumeister on Unsplash

I will leave the explanations of what is exactly an autoencoder to the many insightful and well-written posts, and articles that are freely available online. Very very briefly (and please just read on if this doesn't make sense to you), just like other kinds of ML algorithms, autoencoders learn by creating different representations of data and by measuring how well these representations do in generating an expected outcome; and just like other kinds of neural network, autoencoders learn by creating different layers of such representations that allow them to learn more complex and sophisticated representations of data (which on my view is exactly what makes them superior for a task like ours). Autoencoders are a special form of a neural network, however, because the output that they attempt to generate is a reconstruction of the input they receive . An autoencoder starts with input data (i.e., a set of numbers) and then transforms it in different ways using a set of mathematical operations until it learns the parameters that it ought to use in order to reconstruct the same data (or get very close to it). In this learning process, an autoencoder essentially learns the format rules of the input data. And, that's exactly what makes it perform well as an anomaly detection mechanism in settings like ours.

Using autoencoders to detect anomalies usually involves two main steps:

First, we feed our data to an autoencoder and tune it until it is well trained to reconstruct the expected output with minimum error. An autoencoder that receives an input like 10,5,100 and returns 11,5,99, for example, is well-trained if we consider the reconstructed output as sufficiently close to the input and if the autoencoder is able to successfully reconstruct most of the data in this way.

Second, we feed all our data again to our trained autoencoder and measure the error term of each reconstructed data point. In other words, we measure how “far” is the reconstructed data point from the actual datapoint. A well-trained autoencoder essentially learns how to reconstruct an input that follows a certain format, so if we give a badly formatted data point to a well-trained autoencoder then we are likely to get something that is quite different from our input, and a large error term.


以上所述就是小编给大家介绍的《A Keras-Based Autoencoder for Anomaly Detection in Sequences》,希望对大家有所帮助,如果大家有任何疑问请给我留言,小编会及时回复大家的。在此也非常感谢大家对 码农网 的支持!

查看所有标签

猜你喜欢:

本站部分资源来源于网络,本站转载出于传递更多信息之目的,版权归原作者或者来源机构所有,如转载稿涉及版权问题,请联系我们

明解C语言(第3版)

明解C语言(第3版)

[日] 柴田望洋 / 管杰、罗勇、杜晓静 / 人民邮电出版社 / 2015-11-1 / 79.00元

本书是日本的C语言经典教材,自出版以来不断重印、修订,被誉为“C语言圣经”。 本书图文并茂,示例丰富,第3版从190段代码和164幅图表增加至205段代码和220幅图表,对C语言的基础知识进行了彻底剖析,内容涉及数组、函数、指针、文件操作等。对于C语言语法以及一些难以理解的概念,均以精心绘制的示意图,清晰、通俗地进行讲解。原著在日本广受欢迎,始终位于网上书店C语言著作排行榜首位。一起来看看 《明解C语言(第3版)》 这本书的介绍吧!

JSON 在线解析
JSON 在线解析

在线 JSON 格式化工具

随机密码生成器
随机密码生成器

多种字符组合密码

RGB HSV 转换
RGB HSV 转换

RGB HSV 互转工具