Rank your things… An AES story


Automated essay scoring (AES) is the use of specialized computer programs to assign grades to essays written in an educational setting. Its objective is to classify a large set of textual entities into a small number of discrete categories, corresponding to the possible grades. But here, let’s look at it in an industrial problem setting.


You are a basketball coach. You have 40 kids under you who want to play ball. You are asked to select a team of the 5 tallest kids. Pretty easy! You ask them all to line up in decreasing order of height and you pick the first 5.

Now say you are a comic book writer. Not really a writer, but you are the guy who gets to decide the fancy villain names. You have plenty of interns under you who create villain descriptions and also name the villains for you. They create hundreds of such potential villain characters for your comic, and now you have to choose which of them you should even consider for your comics. The tricky bit is that you might want to base the selection on what your readers like, too.

Technically, you want to score each of your potential villains and rank them in decreasing order of reader affinity score (or rating).

(Ignore the detail of how you got the reader affinity scores. Assume the comic gods gave you those.)

So, you have all your villains (that eventually got into the comics) and their respective reader affinity scores. Now your task is to use that information (somehow, duh) and rank or score the future villains that your interns create.

In the literature, this is called the Automated Essay Scoring or Automatic Text Scoring problem.

The Approach

This is a domain that is continuously progressing in the research world, so you'll be able to find a lot of solutions. Let's focus on one that gets the job done.

One way to think of it is as a prediction problem: try to predict the reader affinity scores directly. But there is a small issue with that, and it may not actually solve our problem. The reader affinity score is for the whole comic, not just for the villain. A person can still give a good score if she likes the plot but hates the villain. If we were trying to predict this score faithfully, we'd need to use a lot more information (like the comic's category, month of release, targeted age group, etc.) and not just the villain information (like the name, description, etc.).

Let's also note that the predicted score of a single villain is not really of use to us, because our job is to find the best villains from a pool of villains. Individually, the scores don't make as much sense as they do when considered relative to one another. If we have 100 scores, we can easily tell which villain is likely to perform better than the others.

Therefore, we can still proceed with our prediction logic, but instead of taking the predicted scores at face value, we just need to make sure they correlate with the actual scores. That is, if villain X is scored higher than villain Y, then irrespective of how good or bad our predictions are in absolute terms, it's a win as long as the actual scores follow the same order or rank.
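To make this concrete, here's a tiny sketch (the scores are made up for illustration): the predicted scores can live on a completely different scale from the actual ones and still rank-correlate perfectly, as long as the ordering agrees.

from scipy.stats import spearmanr

# Made-up actual affinity scores and model predictions for four villains.
actual = [4.8, 3.1, 4.2, 2.0]         # true reader affinity
predicted = [0.92, 0.40, 0.71, 0.13]  # model output, entirely different scale

# The magnitudes disagree wildly, but the ordering is identical,
# so the rank correlation is perfect.
rho, _ = spearmanr(actual, predicted)
print(rho)  # 1.0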

The Solution


To get straight to one solution (out of many; like I said, the literature is pretty lit :fire:), we use two specific types of models. Since scoring text is the task, we need some sort of text-to-embedding technique to represent our text as vectors. Any text-to-embedding technique could be picked, but I've chosen the Universal Sentence Encoder.

import math

import numpy as np
import tensorflow as tf
import tensorflow_hub as hub

BATCH_SIZE = 32

# Load the Universal Sentence Encoder module (TF1-style hub API).
use_model = hub.Module('models/sentence_encoder')

def get_use_vectors(list_text):
    '''
    Compute the USE vector representation of a list of sentences.
    @param list_text : list of sentences
    '''
    messages = list_text
    num_batches = math.ceil(len(messages) / BATCH_SIZE)

    # Build the embedding op once; feed each batch through a placeholder.
    text_input = tf.placeholder(dtype=tf.string, shape=[None])
    embed_op = use_model(text_input)

    message_embeddings = []
    with tf.Session() as session:
        session.run([tf.global_variables_initializer(),
                     tf.tables_initializer()])
        for batch in range(num_batches):
            batch_msgs = messages[batch * BATCH_SIZE:(batch + 1) * BATCH_SIZE]
            message_embeddings.append(
                session.run(embed_op, feed_dict={text_input: batch_msgs}))

    # Stack the per-batch arrays into one (num_sentences, embedding_dim) matrix.
    return np.concatenate(message_embeddings)
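For example, feeding in a couple of (made-up) villain descriptions should yield one vector per description, 512-dimensional for the standard USE module:

villain_descriptions = ['A rogue AI that hoards the world coffee supply.',
                        'An immortal accountant who audits souls.']

embeddings = get_use_vectors(villain_descriptions)
print(embeddings.shape)  # (2, 512) for the standard USE module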

This model is used to convert the villain names and their descriptions into vectors, and we use these as features (along with other features) in a prediction model. The other features could be categorical, such as the comic's category or the author's name, or numerical, such as the number of purchases and the price.

The categorical ones can be one-hot encoded and appended, along with the numerical ones, to our feature list.
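Here's a minimal sketch of that feature assembly, using the get_use_vectors helper from above; the metadata columns and values are made up for illustration.

import numpy as np
import pandas as pd

# Hypothetical villain descriptions and comic metadata.
descriptions = ['A rogue AI that hoards coffee.',
                'An immortal accountant who audits souls.',
                'A mime who steals voices.']
meta = pd.DataFrame({'category': ['sci-fi', 'horror', 'horror'],
                     'author': ['moore', 'lee', 'lee'],
                     'price': [3.99, 4.99, 2.99],
                     'num_purchases': [1200, 800, 450]})

# One-hot encode the categorical columns; keep the numerical ones as-is.
categorical = pd.get_dummies(meta[['category', 'author']]).to_numpy()
numerical = meta[['price', 'num_purchases']].to_numpy()

# USE embeddings of the descriptions, via get_use_vectors above.
embeddings = get_use_vectors(descriptions)

# Final feature matrix: text embeddings + one-hot categoricals + numericals.
X = np.hstack([embeddings, categorical, numerical])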

The prediction model is a simple Random Forest Regressor, taken straight out of the sklearn tutorial section.

import pickle

from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Hyperparameter grid for the random forest.
params = {'n_estimators': [20, 50, 100],
          'max_depth': [2, 4, 6, 8, None],
          'min_samples_split': [2, 4, 6, 8]}

rf = RandomForestRegressor(random_state=42, n_jobs=10)

grid = GridSearchCV(rf, params)
grid.fit(X_train, y_train)

predictions = grid.predict(X_test)

# Absolute prediction errors on the held-out set.
errors = abs(predictions - y_test)

print(grid.best_score_)
print(grid.best_estimator_)

# Persist the best model so new villains can be scored later
# (the filename is arbitrary).
with open('villain_scorer.pkl', 'wb') as f:
    pickle.dump(grid.best_estimator_, f)

This gives us a model, trained on our historical data, that predicts how well a villain name/description will perform. Technically, it's a reader affinity score predictor. But, as we discussed, since we aren't using all possible and available features to predict this score, and since we aren't treating it as a true affinity predictor, the final predictions we get will be inaccurate. As long as the scores give us a relative indication of how two or more villains compare, though, they'll help us pick the top villains.
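Putting the model to use is then just a sorting exercise. A minimal sketch, where X_new and candidate_names are hypothetical stand-ins for the interns' new villains:

import numpy as np

# X_new: feature matrix for the new villain candidates, built exactly like
# the training features above; candidate_names is the matching list of
# villain names (both hypothetical here).
scores = grid.predict(X_new)

# Sort by predicted affinity, highest first, and keep the top 5.
top_k = 5
for i in np.argsort(scores)[::-1][:top_k]:
    print(candidate_names[i], scores[i])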

The Metrics

Cohen's Kappa score is usually used as a metric to measure how close our ranking or ordering of the predictions is to the actual ordering. But this metric assumes the predictions are categories, such as marks from 0 to 5. Our predictions are continuous, so it wouldn't work well for us.

Instead, we can use simple Spearman and Pearson correlations.

Plotting the actual vs. predicted scores also gives a good idea of whether our predictions are following the right trend.
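Both checks are quick with scipy and matplotlib; here's a minimal sketch on the held-out test set, reusing the variable names from the training snippet above:

import matplotlib.pyplot as plt
from scipy.stats import pearsonr, spearmanr

pearson_r, pearson_p = pearsonr(y_test, predictions)
spearman_r, spearman_p = spearmanr(y_test, predictions)
print(f'Pearson: {pearson_r:.2f} (p-value = {pearson_p:.2e}) | '
      f'Spearman: {spearman_r:.2f} (p-value = {spearman_p:.2e})')

# Actual vs. predicted scatter: a roughly monotonic cloud means the
# ranking can be trusted even when individual scores are off.
plt.scatter(y_test, predictions, alpha=0.5)
plt.xlabel('Actual reader affinity')
plt.ylabel('Predicted score')
plt.show()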

The correlation coefficients for these predictions were:

Pearson: 0.65 (p-value = 2.14e-92) | Spearman: 0.60 (p-value = 8.13e-123)

