A Keras-Based Autoencoder for Anomaly Detection in Sequences

栏目: IT技术 · 发布时间: 5年前

内容简介:Use Keras to develop a robust NN architecture that can be used to efficiently recognize anomalies in sequencesSuppose that you have a very long list of string sequences, such as a list of amino acid structures (‘PHE-SER-CYS’, ‘GLN-ARG-SER’,…), product seri

Use Keras to develop a robust NN architecture that can be used to efficiently recognize anomalies in sequences

Jan 16 ·6min read

A Keras-Based Autoencoder for Anomaly Detection in Sequences

Photo by Markus Spiske on Unsplash

Suppose that you have a very long list of string sequences, such as a list of amino acid structures (‘PHE-SER-CYS’, ‘GLN-ARG-SER’,…), product serial numbers (‘AB121E’, ‘AB323’, ‘DN176’…), or users UIDs, and you are required to create a validation process of some kind that will detect anomalies in this sequence. An anomaly might be a string that follows a slightly different or unusual format than the others (whether it was created by mistake or on purpose) or just one that is extremely rare. To make things even more interesting, suppose that you don't know what is the correct format or structure that sequences suppose to follow.

This is a relatively common problem (though with an uncommon twist) that many data scientists usually approach using one of the popular unsupervised ML algorithms, such as DBScan, Isolation Forest, etc. Many of these algorithms typically do a good job in finding anomalies or outliers by singling out data points that are relatively far from the others or from areas in which most data points lie. Although autoencoders are also well-known for their anomaly detection capabilities, they work quite differently and are less common when it comes to problems of this sort.

A Keras-Based Autoencoder for Anomaly Detection in Sequences

Photo by Mika Baumeister on Unsplash

I will leave the explanations of what is exactly an autoencoder to the many insightful and well-written posts, and articles that are freely available online. Very very briefly (and please just read on if this doesn't make sense to you), just like other kinds of ML algorithms, autoencoders learn by creating different representations of data and by measuring how well these representations do in generating an expected outcome; and just like other kinds of neural network, autoencoders learn by creating different layers of such representations that allow them to learn more complex and sophisticated representations of data (which on my view is exactly what makes them superior for a task like ours). Autoencoders are a special form of a neural network, however, because the output that they attempt to generate is a reconstruction of the input they receive . An autoencoder starts with input data (i.e., a set of numbers) and then transforms it in different ways using a set of mathematical operations until it learns the parameters that it ought to use in order to reconstruct the same data (or get very close to it). In this learning process, an autoencoder essentially learns the format rules of the input data. And, that's exactly what makes it perform well as an anomaly detection mechanism in settings like ours.

Using autoencoders to detect anomalies usually involves two main steps:

First, we feed our data to an autoencoder and tune it until it is well trained to reconstruct the expected output with minimum error. An autoencoder that receives an input like 10,5,100 and returns 11,5,99, for example, is well-trained if we consider the reconstructed output as sufficiently close to the input and if the autoencoder is able to successfully reconstruct most of the data in this way.

Second, we feed all our data again to our trained autoencoder and measure the error term of each reconstructed data point. In other words, we measure how “far” is the reconstructed data point from the actual datapoint. A well-trained autoencoder essentially learns how to reconstruct an input that follows a certain format, so if we give a badly formatted data point to a well-trained autoencoder then we are likely to get something that is quite different from our input, and a large error term.


以上所述就是小编给大家介绍的《A Keras-Based Autoencoder for Anomaly Detection in Sequences》,希望对大家有所帮助,如果大家有任何疑问请给我留言,小编会及时回复大家的。在此也非常感谢大家对 码农网 的支持!

查看所有标签

猜你喜欢:

本站部分资源来源于网络,本站转载出于传递更多信息之目的,版权归原作者或者来源机构所有,如转载稿涉及版权问题,请联系我们

App研发录:架构设计、Crash分析和竞品技术分析

App研发录:架构设计、Crash分析和竞品技术分析

包建强 / 机械工业出版社 / 2015-10-21 / CNY 59.00

本书是作者多年App开发的经验总结,从App架构的角度,重点总结了Android应用开发中常见的实用技巧和疑难问题解决方法,为打造高质量App提供有价值的实践指导,迅速提升应用开发能力和解决疑难问题的能力。本书涉及的问题有:Android基础建设、网络底层框架设计、缓存、网络流量优化、制定编程规范、模块化拆分、Crash异常的捕获与分析、持续集成、代码混淆、App竞品技术分析、项目管理和团队建设等......一起来看看 《App研发录:架构设计、Crash分析和竞品技术分析》 这本书的介绍吧!

JS 压缩/解压工具
JS 压缩/解压工具

在线压缩/解压 JS 代码

SHA 加密
SHA 加密

SHA 加密工具