A Keras-Based Autoencoder for Anomaly Detection in Sequences


Use Keras to develop a robust NN architecture that can be used to efficiently recognize anomalies in sequences

Jan 16 · 6 min read


Photo by Markus Spiske on Unsplash

Suppose that you have a very long list of string sequences, such as a list of amino acid structures (‘PHE-SER-CYS’, ‘GLN-ARG-SER’,…), product serial numbers (‘AB121E’, ‘AB323’, ‘DN176’…), or user UIDs, and you are required to create a validation process of some kind that will detect anomalies in this sequence. An anomaly might be a string that follows a slightly different or unusual format than the others (whether it was created by mistake or on purpose) or just one that is extremely rare. To make things even more interesting, suppose that you don't know what correct format or structure the sequences are supposed to follow.

This is a relatively common problem (though with an uncommon twist) that many data scientists usually approach using one of the popular unsupervised ML algorithms, such as DBSCAN, Isolation Forest, etc. Many of these algorithms typically do a good job of finding anomalies or outliers by singling out data points that are relatively far from the others or from the areas in which most data points lie. Although autoencoders are also well known for their anomaly detection capabilities, they work quite differently and are less common when it comes to problems of this sort.
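For contrast, here is a minimal sketch of that more common route, assuming the strings have already been turned into a numeric feature matrix (the random `X` below is only a stand-in, and the `contamination` value is an arbitrary assumption):

```python
# A minimal sketch of the more common approach: scikit-learn's
# IsolationForest flags points that are easy to "isolate" from the rest.
import numpy as np
from sklearn.ensemble import IsolationForest

X = np.random.rand(1000, 12)                 # placeholder numeric features
clf = IsolationForest(contamination=0.01, random_state=42)
labels = clf.fit_predict(X)                  # -1 marks suspected outliers
outliers = np.where(labels == -1)[0]
print(f"{len(outliers)} suspected outliers")
```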


Photo by Mika Baumeister on Unsplash

I will leave the explanation of what exactly an autoencoder is to the many insightful and well-written posts and articles that are freely available online. Very briefly (and please just read on if this doesn't make sense to you), just like other kinds of ML algorithms, autoencoders learn by creating different representations of data and by measuring how well these representations do in generating an expected outcome; and just like other kinds of neural networks, autoencoders learn by creating different layers of such representations that allow them to learn more complex and sophisticated representations of data (which in my view is exactly what makes them superior for a task like ours).

Autoencoders are a special form of neural network, however, because the output that they attempt to generate is a reconstruction of the input they receive. An autoencoder starts with input data (i.e., a set of numbers) and then transforms it in different ways using a set of mathematical operations until it learns the parameters that it ought to use in order to reconstruct the same data (or get very close to it). In this learning process, an autoencoder essentially learns the format rules of the input data. And that's exactly what makes it perform well as an anomaly detection mechanism in settings like ours.
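To make that concrete, here is a minimal sketch (not this article's exact code) of a dense autoencoder in Keras, assuming each string has already been encoded as a fixed-length numeric vector scaled to [0, 1]; `input_dim` and the layer sizes are arbitrary assumptions:

```python
# A minimal, hypothetical dense autoencoder: compress the input through
# a small bottleneck, then try to reconstruct the original vector.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

input_dim = 12  # assumed length of an encoded sequence vector

autoencoder = keras.Sequential([
    keras.Input(shape=(input_dim,)),
    layers.Dense(8, activation="relu"),             # encoder
    layers.Dense(4, activation="relu"),             # bottleneck
    layers.Dense(8, activation="relu"),             # decoder
    layers.Dense(input_dim, activation="sigmoid"),  # reconstruction
])
```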

Using autoencoders to detect anomalies usually involves two main steps:

First, we feed our data to an autoencoder and tune it until it is well trained to reconstruct the expected output with minimum error. An autoencoder that receives an input like 10,5,100 and returns 11,5,99, for example, is well-trained if we consider the reconstructed output as sufficiently close to the input and if the autoencoder is able to successfully reconstruct most of the data in this way.
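Continuing the sketch above, this first step might look like the following, with `X` as placeholder data standing in for your encoded sequences:

```python
# Step one (sketch): train the autoencoder to reproduce its own input.
# Values are scaled to [0, 1] to match the sigmoid output layer.
X = np.random.rand(1000, input_dim).astype("float32")  # placeholder data

autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(X, X, epochs=30, batch_size=64, validation_split=0.1)
```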

Second, we feed all our data again to our trained autoencoder and measure the error term of each reconstructed data point. In other words, we measure how “far” the reconstructed data point is from the actual data point. A well-trained autoencoder essentially learns how to reconstruct an input that follows a certain format, so if we give a badly formatted data point to a well-trained autoencoder then we are likely to get something that is quite different from our input, and a large error term.
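And the second step, again as a sketch rather than this article's exact code (the 99th-percentile cutoff is an arbitrary choice you would tune for your data):

```python
# Step two (sketch): score every point by its reconstruction error
# (per-sample mean squared error) and flag the unusually large ones.
reconstructed = autoencoder.predict(X)
errors = np.mean(np.square(X - reconstructed), axis=1)

threshold = np.quantile(errors, 0.99)   # assumed cutoff, tune to taste
anomalies = np.where(errors > threshold)[0]
```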

