Attention Mechanism in Deep Learning: Simplified


Why is attention in deep learning getting so much… umm, attention?

What exactly is the attention mechanism?

Look at the image below and answer me: what is the color of the soccer ball? Also, which Atletico player (the guys in red and white) is wearing the captain's armband?

[ Image ]

[ Source ]

When you were trying to figure out answers to the questions above, did your mind do this weird thing where it focused on only part of the image?

[ Image ]

Also, when you were reading the sentence above, did your mind start associating different words together, ignoring certain phrases at times to simplify the meaning?

What happened? Well, it's easy enough to explain. You were 'focusing' on a smaller part of the whole thing because you knew the rest of the image/sentence was not useful to you at that particular moment. So when you were trying to figure out the color of the soccer ball, your mind was showing you the soccer ball in HD while the rest of the image was almost blurred. Similarly, when you were reading the question, once you understood that the guys in red and white were Atletico players (probably some of you already knew that :P), you could blur out that part of the sentence to simplify its meaning.

In an attempt to borrow inspiration from how the human mind works, researchers in deep learning have tried to replicate this behavior using what is known as the 'attention mechanism'. Very simply put, the attention mechanism is just a way of focusing on a smaller part of the complete input while ignoring the rest.

How does it work?

Attention can be represented as a simple three-step mechanism. Since we are talking about attention in general, I will not go into the details of how this adapts to computer vision or NLP, which is actually quite straightforward.

[ Image ]

  1. Create a probability distribution that rates the importance of the various input elements. These input representations can be words, pixels, vectors, etc. Producing this probability distribution is itself a learnable task.
  2. Scale the original input using this probability distribution so that values that deserve more attention get enhanced while the others get diluted. Kinda like blurring everything else that doesn't need attention.
  3. Use these newly scaled inputs for further processing to get focused outputs/results (see the sketch after this list).
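
To make the three steps concrete, here is a minimal NumPy sketch. The function and variable names (simple_attention, query) are mine, purely for illustration, and the dot-product scoring stands in for whatever learnable scoring function a real model would use:

```python
import numpy as np

def softmax(x):
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def simple_attention(query, inputs):
    # Step 1: score every input element against the query and turn the
    # scores into a probability distribution. In a real model the scoring
    # function is learned; here it is a plain dot product.
    scores = inputs @ query             # shape: (n,)
    weights = softmax(scores)           # shape: (n,), sums to 1

    # Step 2: scale each input by its weight, enhancing the important
    # elements and diluting ("blurring") the rest.
    scaled = weights[:, None] * inputs  # shape: (n, d)

    # Step 3: process the scaled inputs further; here we simply sum
    # them into one focused output vector.
    return scaled.sum(axis=0), weights

# Three 4-dimensional input elements (words, pixels, ...) and a query.
rng = np.random.default_rng(0)
inputs = rng.standard_normal((3, 4))
query = rng.standard_normal(4)
output, weights = simple_attention(query, inputs)
print(weights)       # probability distribution over the 3 inputs
print(output.shape)  # (4,)
```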

Attention has completely changed the NLP game

[ Image ]

[ Source ]

The attention mechanism has been used in NLP for a relatively long time now, usually alongside recurrent models like RNNs and LSTMs. As we noticed earlier, by focusing on only a small subset of words at a time, attention can help these models understand language better. But even then, attention was only an add-on to the main model, and RNNs still ruled the world of NLP.

However, things changed when, around three years ago, a paper named 'Attention Is All You Need' [3] was released. As the name suggests, this model architecture, commonly known as the Transformer, was able to replace the recurrent processing units with purely attention-based networks. Not only did it easily outperform RNNs, but Transformer-based models are still making amazing progress and are the current leaders on various NLP benchmarks and tasks.

[ Image ]

A small attention-based Transformer network [ Source ]
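
At the heart of the Transformer is scaled dot-product attention, Attention(Q, K, V) = softmax(QKᵀ / √d_k) V. Here is a minimal NumPy sketch of that single operation; the full model adds learned projection matrices and multiple heads, which are omitted here:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V, as in [3].
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # how well each query matches each key
    # Row-wise softmax turns the scores into probability distributions.
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = e / e.sum(axis=-1, keepdims=True)
    return weights @ V               # weighted sum of the value vectors

# Self-attention: 5 tokens with 8-dimensional embeddings attend to each other.
X = np.random.default_rng(0).standard_normal((5, 8))
out = scaled_dot_product_attention(X, X, X)
print(out.shape)  # (5, 8)
```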

Does attention mean explanation?

In the last few years, there has been tremendous hype around what is known as explainable AI, or XAI for short. With AI breaking into fields like medical diagnosis and autonomous driving, people are starting to fear that a black box is making life-and-death decisions. For us to trust the decisions made by AI, new research has gone into creating models that can also explain those decisions.

For several years it was believed that the attention mechanism could provide some sort of explanation for the predictions made by the model. I mean, it does make sense to think that the part of the input the model is focusing on should tell us something about the reasoning behind its predictions. However, a deeper probe recently claimed that attention really has no link with explainability and that various different attention distributions can produce similar results [4]. To add to the fun of this discovery, another paper went against this claim, stating that 'explainability' is actually subjective and that declaring attention unable to provide any explanation is therefore incorrect [5].
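
To get a toy sense of the first claim (this is my own simplified illustration, not the actual experiment from [4]): when the value vectors being averaged are similar, two very different attention distributions can yield essentially the same output, so the weights alone do not pin down an explanation.

```python
import numpy as np

# Two clearly different attention distributions over 3 inputs...
w1 = np.array([0.7, 0.2, 0.1])
w2 = np.array([0.1, 0.2, 0.7])

# ...applied to nearly identical value vectors...
V = np.array([[1.0, 2.0],
              [1.0, 2.1],
              [1.0, 1.9]])

# ...produce almost the same weighted sum, i.e. the same "result".
print(w1 @ V)  # [1.   2.01]
print(w2 @ V)  # [1.   1.95]
```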

In my opinion though, at least on some intuitive level, probing the outputs of the attention branches of a network should provide insights into how the model works, and thus attention should have some connection with explainability.

What’s next?

While attention has mostly been utilized as a side mechanism for improving the performance of deep learning architectures, the recent success of Transformers in NLP suggests that attention alone is powerful enough to do amazing things that other networks cannot. It will also be interesting to see how the field of explainable AI adopts the attention mechanism.

This blog is a part of an effort to create simplified introductions to the field of Machine Learning. Follow the complete series here

Or simply read the next blog in the series

References

[1] Ramachandran, Prajit, et al. “Stand-alone self-attention in vision models.” arXiv preprint arXiv:1906.05909 (2019).
[2] Guan, Qingji, et al. “Diagnose like a radiologist: Attention guided convolutional neural network for thorax disease classification.” arXiv preprint arXiv:1801.09927 (2018).
[3] Vaswani, Ashish, et al. “Attention is all you need.” Advances in neural information processing systems. 2017.
[4] Jain, Sarthak, and Byron C. Wallace. “Attention is not Explanation.” Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 2019.
[5] Wiegreffe, Sarah, and Yuval Pinter. “Attention is not not Explanation.” Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 2019.

