Rank your things… An AES story


Automated essay scoring (AES) is the use of specialized computer programs to assign grades to essays written in an educational setting. Its objective is to classify a large set of textual entities into a small number of discrete categories, corresponding to the possible grades. But here, let’s look at it in an industrial problem setting.


You are a basketball coach. You have 40 kids under you who want to play ball. You are asked to select a team of the 5 tallest kids. Pretty easy! You ask them all to line up in decreasing order of height and you pick the first 5.

Now say you are a comic book writer. Not really a writer, but you are the guy who gets to decide the fancy villain names. You have plenty of interns under you who create villain descriptions and also name the villains for you. They create hundreds of such potential villain characters for your comic, and now you have to choose which of them you should even consider for your comics. The tricky bit is that you might want to base the selection on what your readers like, too.

Technically, you want to score each of your potential villains and rank them in decreasing order of reader affinity score (or rating).

(Ignore the detail of how you got the reader affinity scores. Assume the comic gods gave you those.)

So, you have all your villains (that eventually got into the comics) and their respective reader affinity scores. Now your task is to use that information (somehow, duh) and rank or score the future villains that your interns create.

In the literature, this is called the Automated Essay Scoring or Automatic Text Scoring problem.

The Approach

This is a domain that is continuously progressing in the research world, so you'll be able to find a lot of solutions. Let's focus on one that gets the job done.

One way to think of it is as a prediction problem: try to predict the reader affinity scores directly. But there is a small issue with that, and it may not actually solve our problem. The reader affinity score is for the whole comic, not just for the villain. A person can still give a good score if she likes the plot but hates the villain. If we were trying to predict this score faithfully, we'd need to use a lot more information (like the comic's category, month of release, targeted age group, etc.) and not just the villain information (like the name, description, etc.).

Let's also note that the predicted score of a single villain is not really of use to us, because our job is to find the best villains from a pool of villains. Individually, the scores don't make as much sense as they do when considered relative to one another. If we have 100 scores, we can easily tell which villain is likely to perform better than the others.

Therefore, we can still proceed with our prediction logic, but instead of taking the predicted scores at face value, we just need to make sure they correlate with the actual scores. That is, if villain X is scored higher than villain Y, then irrespective of how good or bad our predictions are in absolute terms, it's a win as long as the actual scores follow the same order or rank.
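To make this concrete, here's a tiny sketch (the scores are made up for illustration): the predicted scores can live on a completely different scale from the actual ones and still rank-correlate perfectly, as long as the ordering agrees.

from scipy.stats import spearmanr

# Made-up actual affinity scores and model predictions for four villains.
actual = [4.8, 3.1, 4.2, 2.0]         # true reader affinity
predicted = [0.92, 0.40, 0.71, 0.13]  # model output, entirely different scale

# The magnitudes disagree wildly, but the ordering is identical,
# so the rank correlation is perfect.
rho, _ = spearmanr(actual, predicted)
print(rho)  # 1.0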

The Solution


To get straight to one solution (out of many; like I said, the literature is pretty lit :fire:), we use two specific types of models. Since scoring text is the task, we need some sort of text-to-embedding technique to represent our text as vectors. Any text-to-embedding technique could be picked, but I've chosen the Universal Sentence Encoder.

import math

import numpy as np
import tensorflow as tf
import tensorflow_hub as hub

BATCH_SIZE = 32

# Load the Universal Sentence Encoder module (TF1-style hub API).
use_model = hub.Module('models/sentence_encoder')

def get_use_vectors(list_text):
    '''
    Compute the USE vector representation of a list of sentences.
    @param list_text : list of sentences
    '''
    messages = list_text
    num_batches = math.ceil(len(messages) / BATCH_SIZE)

    # Build the embedding op once; feed each batch through a placeholder.
    text_input = tf.placeholder(dtype=tf.string, shape=[None])
    embed_op = use_model(text_input)

    message_embeddings = []
    with tf.Session() as session:
        session.run([tf.global_variables_initializer(),
                     tf.tables_initializer()])
        for batch in range(num_batches):
            batch_msgs = messages[batch * BATCH_SIZE:(batch + 1) * BATCH_SIZE]
            message_embeddings.append(
                session.run(embed_op, feed_dict={text_input: batch_msgs}))

    # Stack the per-batch arrays into one (num_sentences, embedding_dim) matrix.
    return np.concatenate(message_embeddings)
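For example, feeding in a couple of (made-up) villain descriptions should yield one vector per description, 512-dimensional for the standard USE module:

villain_descriptions = ['A rogue AI that hoards the world coffee supply.',
                        'An immortal accountant who audits souls.']

embeddings = get_use_vectors(villain_descriptions)
print(embeddings.shape)  # (2, 512) for the standard USE module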

This model is used to convert the villain names and their descriptions into vectors, and we use these as features (along with other features) in a prediction model. The other features could be categorical, such as the comic's category or the author's name, or numerical, such as the number of purchases and the price.

The categorical ones can be one-hot encoded and appended, along with the numerical ones, to our feature list.
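Here's a minimal sketch of that feature assembly, using the get_use_vectors helper from above; the metadata columns and values are made up for illustration.

import numpy as np
import pandas as pd

# Hypothetical villain descriptions and comic metadata.
descriptions = ['A rogue AI that hoards coffee.',
                'An immortal accountant who audits souls.',
                'A mime who steals voices.']
meta = pd.DataFrame({'category': ['sci-fi', 'horror', 'horror'],
                     'author': ['moore', 'lee', 'lee'],
                     'price': [3.99, 4.99, 2.99],
                     'num_purchases': [1200, 800, 450]})

# One-hot encode the categorical columns; keep the numerical ones as-is.
categorical = pd.get_dummies(meta[['category', 'author']]).to_numpy()
numerical = meta[['price', 'num_purchases']].to_numpy()

# USE embeddings of the descriptions, via get_use_vectors above.
embeddings = get_use_vectors(descriptions)

# Final feature matrix: text embeddings + one-hot categoricals + numericals.
X = np.hstack([embeddings, categorical, numerical])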

The prediction model is a simple Random Forest Regressor, taken straight out of the sklearn tutorial section.

import pickle

from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Hyperparameter grid for the random forest.
params = {'n_estimators': [20, 50, 100],
          'max_depth': [2, 4, 6, 8, None],
          'min_samples_split': [2, 4, 6, 8]}

rf = RandomForestRegressor(random_state=42, n_jobs=10)

grid = GridSearchCV(rf, params)
grid.fit(X_train, y_train)

predictions = grid.predict(X_test)

# Absolute prediction errors on the held-out set.
errors = abs(predictions - y_test)

print(grid.best_score_)
print(grid.best_estimator_)

# Persist the best model so new villains can be scored later
# (the filename is arbitrary).
with open('villain_scorer.pkl', 'wb') as f:
    pickle.dump(grid.best_estimator_, f)

This gives us a model, trained on our historical data, that predicts how well a villain name/description will perform. Technically, it's a reader affinity score predictor. But, as we discussed, since we aren't using all possible and available features to predict this score, and since we aren't treating it as a true affinity predictor, the final predictions we get will be inaccurate. As long as the scores give us a relative indication of how two or more villains compare, though, they'll help us pick the top villains.
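Putting the model to use is then just a sorting exercise. A minimal sketch, where X_new and candidate_names are hypothetical stand-ins for the interns' new villains:

import numpy as np

# X_new: feature matrix for the new villain candidates, built exactly like
# the training features above; candidate_names is the matching list of
# villain names (both hypothetical here).
scores = grid.predict(X_new)

# Sort by predicted affinity, highest first, and keep the top 5.
top_k = 5
for i in np.argsort(scores)[::-1][:top_k]:
    print(candidate_names[i], scores[i])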

The Metrics

Cohen's Kappa score is usually used as a metric to measure how close our ranking or ordering of the predictions is to the actual ordering. But this metric assumes the predictions are categories, such as marks from 0 to 5. Our predictions are continuous, so it wouldn't work well for us.

Instead, we can use simple Spearman and Pearson correlations.

Plotting the actual vs. predicted scores also gives a good idea of whether our predictions are following the right trend.
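Both checks are quick with scipy and matplotlib; here's a minimal sketch on the held-out test set, reusing the variable names from the training snippet above:

import matplotlib.pyplot as plt
from scipy.stats import pearsonr, spearmanr

pearson_r, pearson_p = pearsonr(y_test, predictions)
spearman_r, spearman_p = spearmanr(y_test, predictions)
print(f'Pearson: {pearson_r:.2f} (p-value = {pearson_p:.2e}) | '
      f'Spearman: {spearman_r:.2f} (p-value = {spearman_p:.2e})')

# Actual vs. predicted scatter: a roughly monotonic cloud means the
# ranking can be trusted even when individual scores are off.
plt.scatter(y_test, predictions, alpha=0.5)
plt.xlabel('Actual reader affinity')
plt.ylabel('Predicted score')
plt.show()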

The correlation coefficients for these predictions were:

Pearson: 0.65 (p-value = 2.14e-92) | Spearman: 0.60 (p-value = 8.13e-123)

