Deep Learning for NLP: Word Embeddings
Simple and intuitive Word Embeddings
Jun 13 · 8 min read
Introduction
Word embeddings have become one of the most widely used tools in Artificial Intelligence, and one of the main drivers of its remarkable achievements on tasks that require processing natural language, such as speech or text.
In this post, we will unveil the magic behind them: we will see what they are, why they have become a standard in the Natural Language Processing (NLP hereinafter) world and how they are built, and we will explore some of the most widely used word embedding algorithms.
Everything will be explained in a simple and intuitive manner, avoiding complex maths and trying to make the content of the post as accessible as possible.
It will be broken down into the following subsections:
- What are word embeddings?
- Why should we use word embeddings?
- How are word embeddings built?
- What are the most popular word embeddings?
Once you are ready, let's start by seeing what word embeddings are.
1) What are word embeddings?
Computers break everything down to numbers: bits (zeros and ones), more specifically. So what happens when a piece of software inside a computer (a Machine Learning algorithm, for example) has to operate on or process a word? Simple: the word needs to be given to the computer as the only thing it can understand, that is, as numbers.
In NLP, the simplest way to do this is to create a vocabulary with a huge number of words (100,000 words, let’s say) and assign a number to each word in the vocabulary.
The first word in our vocabulary (‘apple’, maybe) will be number 0. The second word (‘banana’) will be number 1, and so on up to number 99,998 for the second-to-last word (‘king’) and 99,999 for the last word (‘queen’).
Then we represent every word as a vector of length 100,000 in which every single item is zero except one: the item at the index given by the number the word is associated with.
This is called one-hot encoding for words.
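To make this concrete, here is a minimal sketch in plain Python. It assumes a tiny four-word vocabulary built from the words mentioned above instead of 100,000 words; the names `word_to_index` and `one_hot` are illustrative helpers, not part of any library.

```python
# Toy vocabulary (assumption): a real one would hold ~100,000 words.
vocabulary = ["apple", "banana", "king", "queen"]

# Assign each word an integer: apple -> 0, banana -> 1, and so on.
word_to_index = {word: i for i, word in enumerate(vocabulary)}


def one_hot(word: str, word_to_index: dict[str, int]) -> list[int]:
    """Return a vector of zeros with a single 1 at the word's index."""
    vector = [0] * len(word_to_index)
    vector[word_to_index[word]] = 1
    return vector


for word in vocabulary:
    print(word, word_to_index[word], one_hot(word, word_to_index))
# apple 0 [1, 0, 0, 0]
# banana 1 [0, 1, 0, 0]
# king 2 [0, 0, 1, 0]
# queen 3 [0, 0, 0, 1]
```

With a 100,000-word vocabulary the logic is identical; only the vectors get longer.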
One-hot encoding has several issues related to efficiency and context, which we will see in just a moment.
Word embeddings are just another way of representing words as vectors, one that successfully solves many of the issues of one-hot encoding by somehow capturing the context or high-level meaning of each word.
The main takeaway here is that word embeddings are vectors that represent words in such a way that words with similar meanings have similar vectors.
2) Why should we use word embeddings?
Consider the previous example, but with only three words in our vocabulary: ‘apple’, ‘banana’ and ‘king’. The one-hot encoding vector representations of these words would be the following: apple = [1, 0, 0], banana = [0, 1, 0] and king = [0, 0, 1].
If we then plotted these word vectors in a 3-dimensional space, we would get a representation like the one shown in the following figure, where each axis represents one of our dimensions and the icons mark where the end of each word vector would be.
As we can see, the distance from any vector (the position of an icon) to each of the others is the same: two size-1 steps in different directions, or a straight-line (Euclidean) distance of √2. This would still hold if we expanded the problem to 100,000 dimensions: any two one-hot vectors differ in exactly two positions, so every pair of word vectors would remain the same distance apart.
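We can check this numerically. The following minimal sketch computes the Euclidean distance between every pair of the three one-hot vectors above, using only the Python standard library; the helper name `euclidean` is our own.

```python
import itertools
import math

# One-hot vectors for the three-word vocabulary 'apple', 'banana', 'king'.
one_hot_vectors = {
    "apple": [1, 0, 0],
    "banana": [0, 1, 0],
    "king": [0, 0, 1],
}


def euclidean(u: list[float], v: list[float]) -> float:
    """Straight-line distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Every pair of distinct words ends up at the same distance: sqrt(2).
for w1, w2 in itertools.combinations(one_hot_vectors, 2):
    d = euclidean(one_hot_vectors[w1], one_hot_vectors[w2])
    print(f"{w1} - {w2}: {d:.4f}")
# apple - banana: 1.4142
# apple - king: 1.4142
# banana - king: 1.4142
```

The “two size-1 steps” in the text correspond to the Manhattan distance (2); the Euclidean distance is √2 ≈ 1.4142. Either way the number is identical for every pair, so it tells us nothing about which words are related.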
Ideally, we would want the vectors of words that have similar meanings or represent similar items to be close together, and far away from those with completely different meanings: we want apple to be close to banana but far away from king.
Also, one-hot encodings are very inefficient. If you think about it, they are huge, empty vectors with only one item having a value different from zero. They are very sparse and can greatly slow down our calculations.
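A quick sketch of how wasteful this is, assuming the 100,000-word vocabulary from earlier and the word at index 99,998 (‘king’ in that example). It stores the one-hot vector as a plain Python list and counts how many entries actually carry information.

```python
VOCAB_SIZE = 100_000  # vocabulary size from the earlier example

# Dense one-hot vector for the word with index 99_998 ('king' above).
dense = [0] * VOCAB_SIZE
dense[99_998] = 1

non_zero = sum(1 for x in dense if x != 0)
print(f"entries stored: {len(dense)}, non-zero entries: {non_zero}")
print(f"fraction of useful entries: {non_zero / len(dense):.5%}")
# entries stored: 100000, non-zero entries: 1
# fraction of useful entries: 0.00100%

# All the information lives in the position of the single 1,
# so one integer is enough to recover the whole vector.
index = dense.index(1)
print(index)  # 99998
```

Any computation that treats these as dense vectors touches 100,000 numbers per word, even though 99,999 of them are zero.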
In conclusion: one-hot encodings don’t take into account the context or meaning of the words, all the word vectors are the same distance from each other, and they are highly inefficient.
Word embeddings solve these problems by representing each word in the vocabulary with a fairly small, fixed-size vector (150, 300 or 500 dimensions, for instance), called an embedding, which is learned during training.
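In practice, the embeddings for a whole vocabulary are usually stored as one matrix with a row per word, and looking up a word’s embedding just means reading its row. The minimal sketch below shows this and checks that it gives the same result as multiplying the word’s one-hot vector by the matrix. It assumes a toy 4-word vocabulary and 3 dimensions, and it fills the matrix with random numbers as a stand-in for the values that training would learn; `embed` and `one_hot_times_matrix` are illustrative names.

```python
import random

vocabulary = ["apple", "banana", "king", "queen"]
word_to_index = {word: i for i, word in enumerate(vocabulary)}
EMBEDDING_DIM = 3  # toy size; real embeddings use e.g. 150, 300 or 500

# Embedding matrix: one row of EMBEDDING_DIM numbers per word.
# Random values stand in for the weights that training would learn.
rng = random.Random(0)
embedding_matrix = [
    [rng.uniform(-1.0, 1.0) for _ in range(EMBEDDING_DIM)]
    for _ in vocabulary
]


def embed(word: str) -> list[float]:
    """Look up a word's embedding: simply the matrix row at its index."""
    return embedding_matrix[word_to_index[word]]


def one_hot_times_matrix(word: str) -> list[float]:
    """Same result computed the slow way: one-hot vector times the matrix."""
    one_hot = [0.0] * len(vocabulary)
    one_hot[word_to_index[word]] = 1.0
    return [
        sum(one_hot[i] * embedding_matrix[i][j] for i in range(len(vocabulary)))
        for j in range(EMBEDDING_DIM)
    ]


print(embed("king"))  # a dense vector of 3 floats
print(one_hot_times_matrix("king") == embed("king"))  # True
```

The direct row lookup avoids ever building the huge sparse one-hot vector, which is one reason embeddings are so much cheaper to work with.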
These vectors are created so that words that appear in similar contexts or have similar meanings are close together, and they are not sparse like the vectors derived from one-hot encoding.
If we had a 2-dimensional word embedding representation of our 4 previous words (‘apple’, ‘banana’, ‘king’ and ‘queen’) and plotted it on a 2D grid, it would look something like the following figure.
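To illustrate the idea without the figure, here is a minimal sketch that uses hypothetical 2-D coordinates, hand-picked (not learned) so that the fruits sit in one region of the plane and the royalty in another. It finds each word’s closest neighbour using cosine similarity, a common way of comparing embeddings; the coordinates and the helper names are assumptions made for this example.

```python
import math

# Hypothetical 2-D embeddings, hand-picked for illustration (not learned).
embeddings = {
    "apple": (1.0, 0.9),
    "banana": (0.9, 1.0),
    "king": (-0.9, 0.2),
    "queen": (-1.0, 0.1),
}


def cosine_similarity(u, v) -> float:
    """Cosine of the angle between u and v: 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def nearest(word: str) -> str:
    """Return the other word whose vector is most similar to `word`'s."""
    return max(
        (other for other in embeddings if other != word),
        key=lambda other: cosine_similarity(embeddings[word], embeddings[other]),
    )


for word in embeddings:
    print(word, "->", nearest(word))
# apple -> banana
# banana -> apple
# king -> queen
# queen -> king
```

Unlike the one-hot vectors, where every pair was equally far apart, these dense vectors place apple next to banana and far from king, which is exactly the property we wanted.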