Common Practices — Part 3

栏目: IT技术 · 发布时间: 4年前

内容简介：These are the lecture notes for FAU’s YouTube Lecture “Welcome back to deep learning! Today, we want to continue talking about our common practices. The methods that we are interested in today are about class imbalance. So, a very typical problem is that o

FAU LECTURE NOTES ON DEEP LEARNING

Common Practices — Part 3

Class Imbalance

Andreas Maier

Jul 13 ·5min read

Common Practices — Part 3 — Deep Learning at FAU. Image under CC BY 4.0 from the Deep Learning Lecture

These are the lecture notes for FAU’s YouTube Lecture “ Deep Learning ”. This is a full transcript of the lecture video & matching slides. We hope, you enjoy this as much as the videos. Of course, this transcript was created with deep learning techniques largely automatically and only minor manual modifications were performed. If you spot mistakes, please let us know!

Navigation

Previous Lecture / Watch this Video / Top Level / Next Lecture

Welcome back to deep learning! Today, we want to continue talking about our common practices. The methods that we are interested in today are about class imbalance. So, a very typical problem is that one class — in particular the very interesting one — is not very frequent. So, this is a challenge for all the machine learning algorithms.

Let’s take the example of fraud detection. Out of 10,000 transactions, 9,999 are genuine and only one is fraudulent. So, if you classify everything as genuine, you get 99.99% accuracy. Obviously, even in less severe situations, we are if you had a model and that would misclassify one out of a hundred transactions, then you would end up only in a model with 99% accuracy. This is of course a very hard problem. In particular, in screening applications, you have to be very careful because just classifying everything to the most common class would still get you very very good accuracy.

It doesn’t have to be credit cards, for example, here detecting mitotic cells is a very similar problem. A mitotic cell is a cell undergoing cell division. These cells are very important as we already heard in the introduction. If you count the cells under mitosis, you know how aggressively the associated cancer is growing. So this is a very important feature but you have to detect them correctly. They make up only a very small portion of the cells in tissues. So, the data of this class has been seen much less during the training, and measures like the accuracy, L2 norm, and cross-entropy don’t show this imbalance. So, they are not very responsive to this.

One thing that you can do is for example resampling. The idea is that you balance the class frequencies by sampling classes differently, So, you can understand this means that you have to throw away a lot of the training data of the most frequent classes. This way you get to train a classifier that will be balanced towards both of these classes. Now they are seen approximately as frequently as the other class. The disadvantage of this approach is that you’re not using all the data that is being seen and of course, you don’t want to throw away data.

So another technique is oversampling. You can just sample more often from the underrepresented classes. In this case, you can use all of the data. The disadvantage is of course that it can lead to heavy overfitting towards the less frequently seen examples. Also possible are combinations of under and oversampling.

This then leads to advanced resampling techniques that try to avoid the shortcomings of undersampling by a synthetic minority oversampling. It’s rather uncommon in deep learning. Underfitting caused by undersampling can be reduced by taking a different subset after each epoch. This is quite common and also you can use data augmentation to help reducing overfitting for underrepresented classes. So, you essentially augment more of the samples that you have seen less frequently.

Instead of fixing the data, of course, you can also try to adapt the loss function to be stable with respect to class imbalance. Here, you then choose a loss with the inverse class frequency. You can then create the weighted cross entropy where you introduce an additional weight w which is simply determined as the inverse class frequency. More common in segmentation problems are then things like the Dice loss based on the Dice coefficient . Here, you evaluate the loss on the dice coefficient that measures area overlap. It is a very typical measure for evaluating segmentations instead of class frequency. Weights can also be adapted with regards to other considerations but we are not discussing them here in this current lecture.

This already brings us to the end of this part and in the final section of common practices, we will now discuss measures of evaluation and how to evaluate our models appropriately. So, thank you very much for listening and goodbye!

If you liked this post, you can find more essays here , more educational material on Machine Learning here , or have a look at our Deep Learning Lecture . I would also appreciate a follow on YouTube , Twitter , Facebook , or LinkedIn in case you want to be informed about more essays, videos, and research in the future. This article is released under the Creative Commons 4.0 Attribution License and can be reprinted and modified if referenced.

References

[1] M. Aubreville, M. Krappmann, C. Bertram, et al. “A Guided Spatial Transformer Network for Histology Cell Differentiation”. In: ArXiv e-prints (July 2017). arXiv: 1707.08525 [cs.CV].

[2] James Bergstra and Yoshua Bengio. “Random Search for Hyper-parameter Optimization”. In: J. Mach. Learn. Res. 13 (Feb. 2012), pp. 281–305.

[3] Jean Dickinson Gibbons and Subhabrata Chakraborti. “Nonparametric statistical inference”. In: International encyclopedia of statistical science. Springer, 2011, pp. 977–979.

[4] Yoshua Bengio. “Practical recommendations for gradient-based training of deep architectures”. In: Neural networks: Tricks of the trade. Springer, 2012, pp. 437–478.

[5] Chiyuan Zhang, Samy Bengio, Moritz Hardt, et al. “Understanding deep learning requires rethinking generalization”. In: arXiv preprint arXiv:1611.03530 (2016).

[6] Boris T Polyak and Anatoli B Juditsky. “Acceleration of stochastic approximation by averaging”. In: SIAM Journal on Control and Optimization 30.4 (1992), pp. 838–855.

[7] Prajit Ramachandran, Barret Zoph, and Quoc V. Le. “Searching for Activation Functions”. In: CoRR abs/1710.05941 (2017). arXiv: 1710.05941.

[8] Stefan Steidl, Michael Levit, Anton Batliner, et al. “Of All Things the Measure is Man: Automatic Classification of Emotions and Inter-labeler Consistency”. In: Proc. of ICASSP. IEEE — Institute of Electrical and Electronics Engineers, Mar. 2005.

以上就是本文的全部内容，希望本文的内容对大家的学习或者工作能带来一定的帮助，也希望大家多多支持码农网

查看所有标签

猜你喜欢:

Common Practices — Part 3

本站部分资源来源于网络，本站转载出于传递更多信息之目的，版权归原作者或者来源机构所有，如转载稿涉及版权问题，请联系我们。

码农书籍

大数据时代小数据分析

屈泽中 / 电子工业出版社 / 2015-7-1 / 69.00元

《大数据时代小数据分析》是一本大数据时代下进行小数据分析的入门级教材，通过数据分析的知识点，将各类分析工具进行串联和对比，例如：在进行线性规划的时候可以选择使用Excel或LINGO或Crystal Ball。工具的应用难易结合，让读者循序渐进地学习相关工具。JMP和Mintab用来分析数据，分析的结果使用Excel、LINGO、Crystal Ball来建立数据模型，最后使用Xcelsius来动......一起来看看《大数据时代小数据分析》这本书的介绍吧!

码农工具