A Case For Embeddings In Recommendation Problems

Category: IT Technology · Published: 5 years ago

Once you have worked on a few different machine learning problems, most things in the field start to feel very similar. You take your raw input data, map it to a latent space with fewer dimensions, and then perform your classification, regression, or clustering. Recommender systems, new and old, are no different. In the classic collaborative filtering problem, you factorize your partially filled usage matrix to learn user-factors and item-factors, and then predict user ratings with a dot product of the factors.

This has worked well for many people at different companies, and I have also had success with it firsthand at Flipboard. And of course, people try to incorporate more signals into this model to get better performance on cold-start and other domain-specific problems.

However, I didn’t really care about using fancy deep learning techniques for my recommendation problems until a friend asked me a very simple question at a conference a few years ago. If I recall correctly, he questioned my use of a certain regularizer, and I soon realized that his clever suggestion required me to go back to the whiteboard, recompute all the gradients and optimization steps, and essentially reimplement the core algorithm from scratch just to test a relatively straightforward modification. I wasn’t writing PyTorch code; I only wished I were.

Enter AutoGrad and Embeddings

So, as it turns out, the classic matrix-factorization problem can be formulated as a deep learning problem if you just think of the user-factors and item-factors as embeddings. An embedding is simply a mapping from a discrete value (such as a user or item ID) to a real-valued, lower-dimensional vector (cough). Looking at the problem from this perspective gives you a lot more modelling flexibility thanks to the many great autograd libraries out there. If you randomly initialize these embeddings and define mean-squared error as your loss, backpropagation will get you embeddings very similar to what you would get with matrix factorization.
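To make this concrete, here is a minimal PyTorch sketch of matrix factorization written as two embedding tables trained with a mean-squared-error loss. The class name, the toy ratings, and the hyper-parameters are all made up for illustration; this is not the code I ran in production, just one way the idea might look.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # make the toy run reproducible


class MatrixFactorization(nn.Module):
    """User-factors and item-factors stored as embedding tables (hypothetical name)."""

    def __init__(self, n_users, n_items, dim=8):
        super().__init__()
        self.user_factors = nn.Embedding(n_users, dim)  # randomly initialized
        self.item_factors = nn.Embedding(n_items, dim)

    def forward(self, users, items):
        # The predicted rating is the dot product of the two embeddings.
        return (self.user_factors(users) * self.item_factors(items)).sum(dim=1)


# Toy observed (user, item, rating) triples, invented for this example.
users = torch.tensor([0, 0, 1, 1, 2, 2])
items = torch.tensor([0, 1, 1, 2, 0, 2])
ratings = torch.tensor([5.0, 3.0, 4.0, 1.0, 2.0, 5.0])

model = MatrixFactorization(n_users=3, n_items=3)
# weight_decay adds an L2 penalty; trying a different regularizer
# means changing the loss, not re-deriving gradients by hand.
optimizer = torch.optim.SGD(model.parameters(), lr=0.05, weight_decay=1e-4)
loss_fn = nn.MSELoss()

losses = []
for _ in range(200):
    optimizer.zero_grad()
    loss = loss_fn(model(users, items), ratings)
    loss.backward()  # autograd computes all the gradients
    optimizer.step()
    losses.append(loss.item())
```

The point of the sketch is the last loop: adding a new signal or a new penalty term only touches the forward pass or the loss, and `loss.backward()` takes care of the rest.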

But as Justin Basilico showed in his informative ICML workshop talk, modelling the problem as a deep feed-forward network makes the learning task a lot trickier. With more parameters and hyper-parameters, it requires more compute while providing only questionable improvements on the actual task. So why should we bother thinking of the problem in this way?

I would argue that modelling flexibility and ease of experimentation are nothing to be scoffed at. This perspective allows you to incorporate all sorts of data into the framework fairly easily. Recommendation is also more than just predicting user ratings, and you can solve many other recommendation problems, such as sequence-aware recommendation, a lot more easily. Not to mention, because of autograd software, you end up with much shorter code that allows you to tweak things a lot more quickly. I like optimizing my matrix factorization with conjugate gradient as much as everyone else, but please don’t ask me to recompute my CG steps after you add some new data and change your regularizer in the year 2020.

Other Ways To Learn Embeddings

The other great thing about embeddings is that there are several different ways of learning this mapping. If you don’t want to learn embeddings through random initialization and backpropagation from an input matrix, one very common approach is skip-gram with negative sampling. This method has been extremely popular in natural language processing, and has also been successful in creating embeddings from non-textual sequences such as graph nodes, video games, and Pinterest pins.

The core idea in skip-gram with negative sampling is to build a dataset of positive examples by sliding a context window through a sequence and pairing each central item with the items that co-occur with it in the window. Negative examples are generated by randomly sampling items from the entire corpus to create pairs of items that do not usually co-occur in the same window.
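The windowing and sampling described above can be sketched in a few lines of plain Python. The function name, the toy sequence, and the uniform negative sampler are assumptions made for illustration; word2vec, for instance, draws negatives from a smoothed unigram distribution rather than uniformly.

```python
import random


def skipgram_pairs(sequence, vocab, window=1, num_negatives=2, seed=0):
    """Return (center, other, label) triples; label 1 = co-occurs, 0 = sampled."""
    rng = random.Random(seed)
    examples = []
    for i, center in enumerate(sequence):
        lo, hi = max(0, i - window), min(len(sequence), i + window + 1)
        context = [sequence[j] for j in range(lo, hi) if j != i]
        # Negatives must not be the center item or anything in its window.
        candidates = [v for v in vocab if v != center and v not in context]
        for item in context:
            examples.append((center, item, 1))
            if not candidates:
                continue
            for _ in range(num_negatives):
                # Uniform sampling is a simplification (see the note above).
                examples.append((center, rng.choice(candidates), 0))
    return examples


# Toy session of item IDs plus a slightly larger catalogue (both made up).
session = ["a", "b", "c", "d", "e"]
catalogue = ["a", "b", "c", "d", "e", "f", "g", "h"]
examples = skipgram_pairs(session, catalogue)
positives = [ex for ex in examples if ex[2] == 1]
negatives = [ex for ex in examples if ex[2] == 0]
```

With a window of 1, each interior item is paired with its two neighbours, and every positive pair comes with `num_negatives` sampled negatives.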

Once you have the dataset with both positive and negative examples, you simply train a classifier with a deep neural network and learn your embeddings. In this formulation, things that co-occur close to each other end up with similar embeddings, which is usually what we need for most search and recommendation tasks.
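As a rough illustration of this training step, the sketch below fits a logistic classifier on labelled pairs, scoring each pair with the dot product of a center embedding and a context embedding. To keep it dependency-free it uses a shallow model with hand-written SGD updates instead of a deep network and autograd, so treat it as a stand-in for the idea rather than the setup described above. The function name, the toy pairs, and the hyper-parameters are all hypothetical.

```python
import math
import random


def train_sgns(examples, n_items, dim=8, lr=0.1, epochs=200, seed=0):
    """Learn center/context embeddings from (center, other, label) triples."""
    rng = random.Random(seed)
    center = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(n_items)]
    context = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(n_items)]
    losses = []
    for _ in range(epochs):
        total = 0.0
        for c, o, label in examples:
            u, v = center[c], context[o]
            score = sum(a * b for a, b in zip(u, v))
            prob = 1.0 / (1.0 + math.exp(-score))
            prob = min(max(prob, 1e-12), 1.0 - 1e-12)  # guard the log
            total -= math.log(prob if label == 1 else 1.0 - prob)
            grad = prob - label  # derivative of the log loss w.r.t. the score
            for k in range(dim):
                u_k = u[k]
                u[k] -= lr * grad * v[k]
                v[k] -= lr * grad * u_k
        losses.append(total / len(examples))
    return center, context, losses


# Toy labelled pairs over four item IDs (invented for this example).
pairs = [
    (0, 1, 1), (1, 0, 1), (2, 3, 1), (3, 2, 1),  # co-occurring pairs
    (0, 3, 0), (1, 2, 0), (2, 0, 0), (3, 1, 0),  # sampled negatives
]
center, context, losses = train_sgns(pairs, n_items=4)
```

The rows of `center` are the item embeddings you would keep; in an autograd framework the inner update loop collapses into a loss definition and a call to the optimizer.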

For recommendation, there are many different ways to create these sequences. Airbnb has a great paper on how they collect sequences of listings based on a user’s sequential clicks during a search/booking session to learn item embeddings. Alibaba has another interesting approach: they maintain an item-item interaction graph, where an edge from item A to item B records how often users clicked on item B after item A, and then use random walks on the graph to generate sequences.
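The graph-based variant can be sketched as weighted random walks over a click graph. This is only a toy version under my own assumptions: the graph, the function name, and the choice to sample the next item in proportion to click counts are illustrative, and the actual procedure in the Alibaba work may differ.

```python
import random


def random_walks(graph, walk_length=4, walks_per_node=2, seed=0):
    """graph maps item -> {next_item: click_count}; returns item sequences."""
    rng = random.Random(seed)
    walks = []
    for start in graph:
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_length:
                neighbors = graph.get(walk[-1], {})
                if not neighbors:
                    break  # dead end: no outgoing clicks recorded
                items, weights = zip(*neighbors.items())
                # Assumption: step to a neighbour in proportion to click count.
                walk.append(rng.choices(items, weights=weights, k=1)[0])
            walks.append(walk)
    return walks


# Toy click graph: clicks["A"]["B"] = 5 means B was clicked after A five times.
clicks = {
    "A": {"B": 5, "C": 1},
    "B": {"C": 3},
    "C": {"A": 2, "D": 1},
    "D": {},
}
walks = random_walks(clicks)
```

Each walk is then treated exactly like a click session: a sequence you can slide a context window over to produce skip-gram training pairs.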

So What’s So Cool About These Embeddings?

In addition to the task at hand that each of those representations helps solve (such as finding similar items), they are modular and amenable to transfer learning. One great thing about deep learning has been that you almost never have to start solving a problem from scratch, and all these different embeddings act as great starting points for new problems. If you wanted to build a new classifier (say, a spam detector), you could use your item embeddings as a starting point and train a model much more quickly with some basic fine-tuning.
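For the similar-items use case, a nearest-neighbour lookup by cosine similarity is often all you need once the embeddings exist. The sketch below uses hand-written three-dimensional vectors and made-up item names purely for illustration; real embeddings would come from one of the training procedures above, and at scale you would likely use an approximate nearest-neighbour index instead of a full scan.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(item, embeddings, top_k=2):
    """Rank every other item by cosine similarity to the query item."""
    query = embeddings[item]
    scored = [
        (other, cosine(query, vec))
        for other, vec in embeddings.items()
        if other != item
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


# Hypothetical item embeddings, written by hand for the example.
embeddings = {
    "sneakers": [0.9, 0.1, 0.0],
    "running_shoes": [0.8, 0.2, 0.1],
    "blender": [0.0, 0.9, 0.4],
    "toaster": [0.1, 0.8, 0.5],
}
similar = most_similar("sneakers", embeddings)
```

Here `similar[0][0]` is `"running_shoes"`, the only other item pointing in roughly the same direction as `"sneakers"`.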

These modular mappings to latent spaces have been extremely useful for me: in addition to solving some recommendation problems, I have been able to reuse and fine-tune these embeddings to solve many different end tasks. Storing these embeddings in a centralized model store further helps teams reduce redundancy and gives them a good foundation to build on for many problems.

While I hadn’t initially bought into the whole deep-learning-for-recommender-systems craze, I am starting to see beyond the minimal performance gains on the original task, and I highly recommend that everyone play around with this (still relatively new) paradigm in recommender systems!

