Simple and intuitive Word Embeddings
Jun 13 · 8 min read
Introduction
Word embeddings have become one of the most widely used tools in Artificial Intelligence, and a main driver of its achievements on tasks that require processing natural language, such as speech or text.
In this post, we will unveil the magic behind them: what they are, why they have become a standard in the Natural Language Processing (NLP hereinafter) world, how they are built, and which word embedding algorithms are the most used.
Everything will be explained in a simple and intuitive manner, avoiding complex maths and trying to make the content of the post as accessible as possible.
It will be broken down into the following subsections:
- What are word embeddings?
- Why should we use word embeddings?
- How are word embeddings built?
- What are the most popular word embeddings?
Once you are ready, let's start by seeing what word embeddings are.
1) What are word embeddings?
Computers break everything down to numbers. Bits (zeros and ones), more specifically. What happens when software running on a computer (a Machine Learning algorithm, for example) has to operate on or process a word? Simple: the word needs to be given to the computer in the only form it can understand, as numbers.
In NLP, the simplest way to do this is by creating a vocabulary with a huge number of words (100,000 words, let’s say), and assigning a number to each word in the vocabulary.
The first word in our vocabulary (‘apple’, maybe) will be number 0. The second word (‘banana’) will be number 1, and so on up to number 99,998 for the second-to-last word (‘king’) and 99,999 for the last word (‘queen’).
Then we represent every word as a vector of length 100,000, where every single item is a zero except one of them: the item at the index of the number that the word is associated with, which is set to one.
This is called one-hot encoding for words.
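The procedure just described can be sketched in a few lines of Python. This is only an illustration: the four-word vocabulary stands in for the 100,000-word one, and the helper name `one_hot` is made up for this example.

```python
def one_hot(word, vocabulary):
    """Return the one-hot vector for `word`, given an ordered vocabulary list."""
    vector = [0] * len(vocabulary)          # all zeros...
    vector[vocabulary.index(word)] = 1      # ...except at the word's index
    return vector

# Toy vocabulary (an assumption for illustration; the post imagines 100,000 words).
vocabulary = ["apple", "banana", "king", "queen"]

# The number assigned to each word is simply its position in the vocabulary.
word_to_index = {word: index for index, word in enumerate(vocabulary)}

banana_vector = one_hot("banana", vocabulary)
print(word_to_index)
print(banana_vector)
```

With a real vocabulary the vectors would have one position per word, so each one would be 100,000 items long.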
One-hot encoding has several issues related to efficiency and context, which we will see in just a moment.
Word embeddings are just another way of representing words through vectors, one that successfully solves many of the issues derived from using a one-hot encoding by somehow abstracting the context or high-level meaning of each word.
The main takeaway here is that word embeddings are vectors that represent words in such a way that words with similar meanings have similar vectors.
2) Why should we use word embeddings?
Consider the previous example but with only three words in our vocabulary: ‘apple’, ‘banana’ and ‘king’. The one-hot vector representations of these words would be [1, 0, 0], [0, 1, 0] and [0, 0, 1], respectively.
If we then plotted these word vectors in a 3-dimensional space, where each axis represents one of the dimensions that we have, each word vector would end one unit along its own axis.
As we can see, the distance from any vector to all the other ones is the same: two size-1 steps in different directions, or a straight-line distance of √2. This would be the same if we expanded the problem to 100,000 dimensions: any two word vectors would still differ in exactly two positions, so the distance between all of them would stay the same.
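A quick way to check the equal-distance claim is to compute every pairwise distance. The sketch below assumes Euclidean (straight-line) distance and uses helper names invented for this example.

```python
import math
from itertools import combinations

def one_hot(index, size):
    """One-hot vector of length `size` with a 1.0 at position `index`."""
    vector = [0.0] * size
    vector[index] = 1.0
    return vector

def euclidean_distance(a, b):
    """Straight-line distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

words = ["apple", "banana", "king"]
vectors = {word: one_hot(i, len(words)) for i, word in enumerate(words)}

# Distance between every pair of distinct words.
distances = {
    (a, b): euclidean_distance(vectors[a], vectors[b])
    for a, b in combinations(words, 2)
}
for pair, value in distances.items():
    print(pair, round(value, 4))

# The same holds in a much larger space: two one-hot vectors only ever
# differ in two positions, no matter how long they are.
far_apart = euclidean_distance(one_hot(0, 1000), one_hot(999, 1000))
print(round(far_apart, 4))
```

Every pair comes out at the same distance, which is exactly why one-hot vectors cannot tell us that apple is more like banana than like king.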
Ideally, we would want vectors for words that have similar meanings or represent similar items to be close together, and far away from those that have completely different meanings: we want apple to be close to banana but far away from king.
Also, one-hot encodings are very inefficient. If you think about it, they are huge, almost empty vectors with only one item having a value different from zero. They are very sparse, and can greatly slow down our calculations.
In conclusion: one-hot encodings don’t take into account the context or meaning of the words, all the word vectors are at the same distance from each other, and they are highly inefficient.
Word embeddings solve these problems by representing each word in the vocabulary with a fairly small fixed-size vector (150, 300 or 500 dimensions, for example), called an embedding, which is learned during training.
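To put rough numbers on the size difference, the sketch below compares how many values each representation stores per word. It uses the 100,000-word vocabulary from earlier and picks 300 as the embedding size; both are just the illustrative figures from this post.

```python
vocabulary_size = 100_000   # vocabulary size used in the post's example
embedding_size = 300        # one of the embedding sizes mentioned above

# A one-hot vector has one position per vocabulary word.
one_hot_values_per_word = vocabulary_size
# An embedding has a fixed, much smaller number of positions.
embedding_values_per_word = embedding_size

# Only a single position of a one-hot vector is non-zero.
nonzero_fraction = 1 / vocabulary_size
size_ratio = one_hot_values_per_word / embedding_values_per_word

print(f"one-hot: {one_hot_values_per_word} values per word, "
      f"{nonzero_fraction:.5%} of them non-zero")
print(f"embedding: {embedding_values_per_word} values per word, "
      f"roughly {size_ratio:.0f} times fewer")
```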
These vectors are created in such a way that words that appear in similar contexts or have similar meanings are close together, and they are not sparse vectors like the ones derived from one-hot encodings.
If we had a 2-dimensional word embedding representation of our four example words (apple, banana, king and queen) and plotted it on a 2D grid, we would expect apple and banana to sit close together, and far away from king and queen.
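As a small illustration of what "close together" means for such a 2D representation, the sketch below uses hand-picked coordinates. These numbers are invented for the example and are not learned embeddings; the helper names are also made up.

```python
import math

# Hypothetical 2D embeddings, hand-picked for illustration (not learned).
embeddings = {
    "apple":  (1.0, 4.0),
    "banana": (1.5, 4.5),
    "king":   (5.0, 1.0),
    "queen":  (5.5, 1.5),
}

def distance(word_a, word_b):
    """Euclidean distance between the embeddings of two words."""
    return math.dist(embeddings[word_a], embeddings[word_b])

def nearest_neighbour(word):
    """Return the other word whose embedding lies closest to `word`."""
    others = [w for w in embeddings if w != word]
    return min(others, key=lambda other: distance(word, other))

for word in embeddings:
    print(word, "->", nearest_neighbour(word))
```

Unlike the one-hot case, the distances are no longer all equal, so the geometry itself now carries information about which words are related.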