The Surprisingly Effective Genetic Approach to Feature Selection

Category: IT Technology · Published: 5 years ago

In genetic algorithms, a population of candidate solutions, also known as individuals, creatures, or phenotypes, is evolved toward better solutions to an optimization problem. Each candidate has a set of properties that can be mutated and altered.

These properties can be represented as a binary string (a sequence of zeroes and ones), although other encodings exist. In the case of feature selection, each individual represents one selection of features, and each 'property' represents one feature, which can be turned on or off (1 or 0).
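As a small illustration of this encoding, the sketch below maps one binary string onto the features it switches on. The column names and the helper selected_features are hypothetical and only serve the example.

```python
# Hypothetical column names; a real dataset would supply its own.
feature_names = ["age", "income", "height", "weight", "zip_code"]

# One individual: a binary string with one bit per feature (1 = on, 0 = off).
individual = [1, 0, 1, 1, 0]


def selected_features(individual, names):
    """Return the names of the features whose bit is switched on."""
    return [name for bit, name in zip(individual, names) if bit == 1]


print(selected_features(individual, feature_names))
# prints ['age', 'height', 'weight']
```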

The evolution of individuals begins with a randomly generated population, meaning each individual's properties are randomly initialized. Evolution is an iterative process, and the population in each iteration is referred to as a generation. For genetic feature selection on a dataset with 900 columns, an initial population might consist of 300 individuals, each a randomly generated combination of 900 on/off switches.
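A minimal sketch of this initialization step, using the 300-individual, 900-column figures from the example above. The function names and the 50/50 chance of switching each feature on are assumptions made for illustration.

```python
import random


def random_individual(n_features, rng):
    """One candidate: an independent on/off switch per feature."""
    return [rng.randint(0, 1) for _ in range(n_features)]


def initial_population(pop_size, n_features, seed=None):
    """Generation zero: pop_size randomly initialized individuals."""
    rng = random.Random(seed)  # seeded so a run can be reproduced
    return [random_individual(n_features, rng) for _ in range(pop_size)]


population = initial_population(pop_size=300, n_features=900, seed=0)
print(len(population), len(population[0]))
# prints 300 900
```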

In each generation, the fitness of every individual is evaluated. Fitness is the value of the objective function of the problem being solved.

One direct fitness function is simply the accuracy of a model trained on that subset of features, or another of the many possible model metrics. This can be costly, because every individual in every generation requires training a model, so it is best suited to small datasets or small populations.
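The sketch below shows what such an accuracy-based fitness function could look like. The nearest-centroid classifier is only a dependency-free stand-in for whatever model is actually being tuned, and the toy data, function names, and train/test split are assumptions made for illustration.

```python
import random


def nearest_centroid_accuracy(X_train, y_train, X_test, y_test):
    """Accuracy of a nearest-centroid classifier (a stand-in for any model)."""
    # Mean vector of the training rows belonging to each class.
    centroids = {}
    for label in set(y_train):
        rows = [x for x, y in zip(X_train, y_train) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(x):
        # Pick the class whose centroid is closest in squared distance.
        return min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
        )

    correct = sum(predict(x) == y for x, y in zip(X_test, y_test))
    return correct / len(y_test)


def accuracy_fitness(individual, X_train, y_train, X_test, y_test):
    """Fitness = held-out accuracy using only the switched-on columns."""
    cols = [i for i, bit in enumerate(individual) if bit == 1]
    if not cols:
        return 0.0  # an empty feature set cannot predict anything

    def project(X):
        return [[row[i] for i in cols] for row in X]

    return nearest_centroid_accuracy(
        project(X_train), y_train, project(X_test), y_test
    )


# Toy data: column 0 separates the classes, column 1 is pure noise.
rng = random.Random(0)
labels = [i % 2 for i in range(40)]
rows = [[10 * y + rng.uniform(-1, 1), rng.uniform(-100, 100)] for y in labels]
X_train, y_train = rows[:30], labels[:30]
X_test, y_test = rows[30:], labels[30:]

print(accuracy_fitness([1, 0], X_train, y_train, X_test, y_test))  # informative column only
print(accuracy_fitness([0, 1], X_train, y_train, X_test, y_test))  # noise column only
```

Every call trains and scores a model once, which is where the cost mentioned above comes from: a population of 300 means 300 model fits per generation.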

An alternative is to use a variety of cheaper-to-compute metrics that help evaluate the fitness of each solution. Some include:

  • Collinearity. Make sure that features in a subset do not contain similar information by evaluating the overall correlation of each subset.
  • Entropy / separability. With the selected features, how well separated are the classes? The more separable the data, the fitter the subset.
  • Hybrid. Combine these metrics with others like variance or how normally distributed the data is to yield a combination that satisfies the needs of the model.
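As one possible reading of the collinearity metric, the sketch below scores a subset as one minus the mean absolute Pearson correlation over all pairs of selected columns, so less redundant subsets score higher. This exact formula, the handling of constant columns, and the scores given to empty and single-feature subsets are assumptions, not something the article specifies.

```python
from itertools import combinations
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length numeric columns."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # assumption: treat a constant column as uncorrelated
    return cov / sqrt(var_a * var_b)


def collinearity_fitness(individual, columns):
    """1 - mean |correlation| over all pairs of selected columns."""
    chosen = [col for bit, col in zip(individual, columns) if bit == 1]
    if not chosen:
        return 0.0  # assumption: an empty subset is worthless
    if len(chosen) < 2:
        return 1.0  # a single feature has nothing to be redundant with
    pairs = list(combinations(chosen, 2))
    mean_abs = sum(abs(pearson(a, b)) for a, b in pairs) / len(pairs)
    return 1.0 - mean_abs


# Toy columns: b is an exact multiple of a, c is uncorrelated with a.
a = [1, 2, 3, 4, 5]
b = [2, 4, 6, 8, 10]
c = [1, -1, 0, -1, 1]
columns = [a, b, c]

print(collinearity_fitness([1, 1, 0], columns))  # redundant pair, low score
print(collinearity_fitness([1, 0, 1], columns))  # uncorrelated pair, high score
```

Used alone, a score like this favors very small subsets, which is one reason to combine it with other metrics as the hybrid option suggests.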

With some controllable randomness injected to encourage exploration, individuals on the fitter side (those scoring better on the fitness function) are randomly selected. Selection is not based purely on the highest score, because that would leave little room for exploration and is not how evolution works in the biological world.
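One common way to get this kind of fitness-biased but still random selection is tournament selection, sketched below. The article does not name a specific scheme, so the choice of tournament selection, the function name, and the parameters are assumptions.

```python
import random


def tournament_select(population, fitnesses, n_parents, tournament_size=3, rng=None):
    """Pick parents by repeatedly taking the fittest of a small random sample."""
    rng = rng or random.Random()
    parents = []
    for _ in range(n_parents):
        # Draw a few distinct contenders at random, then keep the fittest one.
        contenders = rng.sample(range(len(population)), tournament_size)
        winner = max(contenders, key=lambda i: fitnesses[i])
        parents.append(population[winner])
    return parents


population = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0]]
fitnesses = [0.2, 0.9, 0.5, 0.7]

parents = tournament_select(
    population, fitnesses, n_parents=4, tournament_size=2, rng=random.Random(0)
)
print(parents)
```

Here tournament_size is the controllable part: a size of 1 makes selection purely random, while a size equal to the population always returns the single best individual and leaves no room for exploration.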
