Attention Mechanism in Deep Learning : Simplified
Feb 29 · 5 min read
Why is attention in deep learning getting so much… umm, attention?
What exactly is the attention mechanism?
Look at the image below and answer me: what is the color of the soccer ball? Also, which Atletico player (the guys in red and white) is wearing the captain's armband?
When you were trying to figure out answers to the questions above, did your mind do this weird thing where it focused on only part of the image?
Also when you were reading the sentence above, did your mind start associating different words together, ignoring certain phrases at times to simplify the meaning?
What happened? Well, it's easy enough to explain. You were 'focusing' on a smaller part of the whole thing because you knew the rest of the image/sentence was not useful to you at that particular moment. So when you were trying to figure out the color of the soccer ball, your mind was showing you the ball in HD while the rest of the image was almost blurred. Similarly, when you were reading the question, once you understood that the guys in red and white were Atletico players (probably some of you already knew that :P), you could blur out that part of the sentence to simplify its meaning.
In an attempt to borrow inspiration from how the human mind works, researchers in Deep Learning have tried to replicate this behavior using what is known as the 'attention mechanism'. Very simply put, the attention mechanism is just a way of focusing on a smaller part of the complete input while ignoring the rest.
How does it work?
Attention can be described as a simple three-step mechanism. Since we are talking about attention in general, I will not go into the details of how this adapts to CV or NLP, which is actually quite straightforward.
- Create a probability distribution that rates the importance of the various input elements. These input representations can be words, pixels, vectors, etc. Producing these probability distributions is itself a learnable task.
- Scale the original input using this probability distribution so that values that deserve more attention get enhanced while the others get diluted. Kinda like blurring everything that doesn't need attention.
- Now use these newly scaled inputs and do further processing to get focused outputs/results (see the sketch right after this list).
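To make these three steps concrete, here is a minimal NumPy sketch. The fixed random query vector below is only a stand-in for parameters that a real model would learn by backpropagation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy input: 5 elements (words, pixels, patches...), each a 4-dim vector.
inputs = rng.normal(size=(5, 4))

# Step 1: score each element, then turn the scores into a probability
# distribution with a softmax. A real model learns how to produce these
# scores; here a fixed random query vector stands in for that.
query = rng.normal(size=(4,))
scores = inputs @ query                 # shape (5,)
weights = np.exp(scores - scores.max())
weights /= weights.sum()                # probabilities, sum to 1

# Step 2: scale the inputs by their weights, enhancing the important
# elements and diluting ("blurring") the rest.
scaled = weights[:, None] * inputs      # shape (5, 4)

# Step 3: do further processing on the focused representation, most
# commonly a weighted sum that yields a single context vector.
context = scaled.sum(axis=0)            # shape (4,)
print(weights)
print(context)
```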
Attention has completely changed the NLP game
The attention mechanism has been used in NLP for a relatively long time now, in combination with recurrent models like RNNs, LSTMs, etc. As we noticed earlier, by focusing on only a small subset of words at a time, attention helps these models better understand language. But even after all that, attention was only an addition to the main model, and RNNs still ruled the world of NLP.
However, things changed around 3 years ago with the release of a paper named 'Attention Is All You Need' [3]. As the name suggests, its model architecture, commonly known as the Transformer, was able to replace recurrent processing units with attention networks alone. Not only did it easily outperform RNNs; Transformer-based models are still making amazing progress and are the current leaders of various NLP competitions and tasks.
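For reference, the core building block of the Transformer is the scaled dot-product attention defined in [3]: Attention(Q, K, V) = softmax(QKᵀ/√d_k)V. Here is a minimal NumPy sketch of that formula, omitting the learned projection matrices and the multi-head machinery of the full model.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V, as in [3]."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # pairwise similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                              # weighted sum of values

# Self-attention: queries, keys and values all come from the same sequence.
x = np.random.default_rng(1).normal(size=(6, 8))    # 6 tokens, dim 8
out = scaled_dot_product_attention(x, x, x)         # shape (6, 8)
```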
Does attention mean explanation?
In the last few years, there has been tremendous hype around what is known as explainable AI, or XAI for short. With AI breaking into fields like medical diagnosis and autonomous driving, people are now starting to fear that a black box is making life-and-death decisions. For us to trust the decisions made by AI, new research has been done in the direction of creating models that can also explain those decisions.
For several years it was believed that the attention mechanism provides some sort of explanation for a model's predictions. It does make sense to think that the part of the input the model is focusing on should tell us something about the reasoning behind its predictions. However, a deeper probe recently claimed that attention really has no link with explainability, and that very different attention distributions can produce similar results [4]. To add to the fun of this discovery, another paper recently went against this claim, arguing that 'explainability' is actually subjective and thus that saying attention provides no explanation is incorrect [5].
In my opinion, though, at least on some intuitive level, probing the outputs of a network's attention branches should provide insight into how the model works, and thus attention should have some connection with explainability.
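As a toy illustration of the kind of probing I mean (the sentence and the attention weights below are made up for demonstration; in practice you would read the weights out of a trained model's attention layer after a forward pass):

```python
import numpy as np

# Hypothetical attention weights for one sentence; not from a real model.
tokens = ["the", "movie", "was", "surprisingly", "good"]
weights = np.array([0.05, 0.15, 0.05, 0.25, 0.50])

# The (contested) "explanation": rank tokens by the attention they received.
for tok, w in sorted(zip(tokens, weights), key=lambda p: -p[1]):
    print(f"{tok:>12}  {w:.2f}")
```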
What’s next?
While attention has long been used as a side mechanism for improving the performance of deep learning architectures, the recent success of Transformers in NLP suggests that attention alone is powerful enough to do amazing things that other networks cannot. It will also be interesting to see how the field of explainable AI adopts the attention mechanism.
This blog is a part of an effort to create simplified introductions to the field of Machine Learning. Follow the complete series here
Or simply read the next blog in the series
References
[1] Ramachandran, Prajit, et al. “Stand-alone self-attention in vision models.” arXiv preprint arXiv:1906.05909 (2019).
[2] Guan, Qingji, et al. “Diagnose like a radiologist: Attention guided convolutional neural network for thorax disease classification.” arXiv preprint arXiv:1801.09927 (2018).
[3] Vaswani, Ashish, et al. “Attention is all you need.” Advances in neural information processing systems. 2017.
[4] Jain, Sarthak, and Byron C. Wallace. “Attention is not Explanation.” Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 2019.
[5] Wiegreffe, Sarah, and Yuval Pinter. “Attention is not not Explanation.” Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 2019.