Reflecting on and Comparing Different Sentiment Classification Models for Restaurant Reviews

Comparison of Models

The models used for sentiment classification ranged from simple machine learning techniques to deep learning methods. The best result came from a Feed Forward neural network using BOW as the input feature. CNN and Logistic Regression were also quite close to it in terms of average accuracy. This was a useful test: a complex method like CNN can extract complex features, but for sentiment classification, the presence of certain words in a sentence was already indicative of the review's sentiment. Hence, CNN did not add much value to the model. If the dataset had contained many complex phrases expressing sentiment, CNN would have been useful for extracting them. Spending more resources on a complex method like CNN might not be the best solution in this case. It is good to test it on a smaller dataset first to see how the results look, and then also test it on the complete dataset. If it performs better than the simpler approaches, I choose it for production; otherwise I usually prefer a less resource-heavy method that gives good-enough results.
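
The selection rule described above (prefer the less resource-heavy model unless the complex one is clearly better) can be sketched as a small helper. This is a minimal illustration, not the code used in this series: the `ModelResult` type, its `relative_cost` field and the `tolerance` parameter are hypothetical, and the accuracies in the usage lines are placeholder numbers, not results from the previous posts.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ModelResult:
    name: str
    accuracy: float       # average accuracy measured on held-out data
    relative_cost: float  # hypothetical resource cost (e.g. training time); lower is cheaper


def choose_model(results: list[ModelResult], tolerance: float = 0.01) -> ModelResult:
    """Pick the cheapest model whose accuracy is within `tolerance` of the best one."""
    if not results:
        raise ValueError("results must not be empty")
    best_accuracy = max(r.accuracy for r in results)
    # Keep every model that is "good enough" compared with the best accuracy.
    good_enough = [r for r in results if best_accuracy - r.accuracy <= tolerance]
    # Among those, prefer the least resource-heavy one.
    return min(good_enough, key=lambda r: r.relative_cost)


# Placeholder numbers, for illustration only.
candidates = [
    ModelResult("complex_model", accuracy=0.905, relative_cost=10.0),
    ModelResult("simple_model", accuracy=0.900, relative_cost=1.0),
]
print(choose_model(candidates, tolerance=0.01).name)  # simple_model
```

Tightening `tolerance` toward zero makes the rule pick the most accurate model regardless of cost, so the parameter encodes how much accuracy you are willing to trade for a lighter model.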

In the following chart, I have listed the average accuracies obtained by the different methods implemented in the previous posts for sentiment classification.

Results from previous experiments for Sentiment Classification

Word2Vec embeddings, despite capturing semantics, were not that useful for improving accuracy compared with simple BOW or TF-IDF. However, as you can see, they can tremendously reduce the number of features fed to the machine learning models: in the example used, BOW/TF-IDF had a feature size of 30056, as opposed to 1000 for Word2Vec. Simple averaged Word2Vec vectors were used to classify the reviews, and this turned out to give better results than Doc2Vec. That might not always be the case, since Doc2Vec is algorithmically designed to represent a document as a whole. Doc2Vec generates efficient, high-quality distributed vectors that capture precise syntactic and semantic word relationships (ref). Hence, it is always a good idea to test both methods and compare the results.
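
Averaging word vectors turns a review of any length into a single fixed-size vector, which is why the feature size drops to the embedding dimension instead of the vocabulary size. The sketch below assumes the embeddings are already available as a plain dictionary from word to vector (for example, exported from a trained Word2Vec model); `average_word_vectors` and the toy 3-dimensional vectors are illustrative only, not the 1000-dimensional vectors used in this series.

```python
from __future__ import annotations


def average_word_vectors(tokens: list[str], embeddings: dict[str, list[float]], dim: int) -> list[float]:
    """Return the mean of the vectors for the tokens found in `embeddings`.

    Out-of-vocabulary tokens are skipped; if no token is known, a zero vector is returned.
    """
    total = [0.0] * dim
    count = 0
    for token in tokens:
        vector = embeddings.get(token)
        if vector is None:
            continue  # skip words the embedding model has never seen
        for i in range(dim):
            total[i] += vector[i]
        count += 1
    if count == 0:
        return total  # all-zero vector for reviews with no known words
    return [value / count for value in total]


# Toy 3-dimensional embeddings (hypothetical values, for illustration only).
toy_embeddings = {
    "food": [1.0, 0.0, 0.0],
    "great": [0.0, 1.0, 0.0],
    "service": [0.0, 0.0, 1.0],
}
review = "great food great service".split()
print(average_word_vectors(review, toy_embeddings, dim=3))  # [0.25, 0.5, 0.25]
```

Whatever the size of the vocabulary, every review maps to exactly `dim` features, and word order is lost, which is one reason a document-level model like Doc2Vec can behave differently.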

Between BOW and TF-IDF, BOW turned out to perform better. The difference was not significant, but TF-IDF clearly did not improve the classification model. TF-IDF involves more computation than BOW, so it is worth testing both before settling on one method. TF-IDF reduces the weights of words that occur frequently across the whole corpus and are not specific to individual documents. This does not seem to work well for sentiment classification of reviews, where the same positive and negative words appear in most of the documents. If you first test and compare input features such as BOW, TF-IDF, Word2Vec or Doc2Vec with simple classification models, you can save time when training the Feed Forward Neural Network or Logistic Regression on them. If you want to experiment further, try using these as inputs to the models in PyTorch.
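
The down-weighting effect described above is easy to see on a tiny corpus. The sketch below computes raw BOW counts and TF-IDF weights with the textbook, unsmoothed idf = ln(N / df); the three reviews are made up for illustration. Because "good" appears in every review, its TF-IDF weight drops to zero even though it is the word carrying the sentiment, while the rarer "bad" keeps a positive weight.

```python
from __future__ import annotations

import math
from collections import Counter


def bag_of_words(docs: list[list[str]]) -> list[Counter]:
    """Raw term counts per document (the BOW representation)."""
    return [Counter(doc) for doc in docs]


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """TF-IDF using raw counts as TF and the unsmoothed idf = ln(N / df)."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each word at least once.
    df = Counter(word for doc in docs for word in set(doc))
    weights = []
    for counts in bag_of_words(docs):
        weights.append({word: tf * math.log(n_docs / df[word]) for word, tf in counts.items()})
    return weights


# Made-up reviews, for illustration only.
reviews = [
    "good food good service".split(),
    "good food bad service".split(),
    "good place".split(),
]
print(bag_of_words(reviews)[0]["good"])  # 2
print(tf_idf(reviews)[0]["good"])        # 0.0, because "good" appears in every review
```

Library implementations usually differ in the details: scikit-learn's `TfidfVectorizer`, for instance, uses a smoothed idf of ln((1 + N) / (1 + df)) + 1 by default, so very common words are down-weighted rather than zeroed.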

The Feed Forward Neural Network performed well on the given data. For neural networks, the different activation, optimization and loss functions mentioned in the previous posts can be tested to see whether the result improves. You could try different learning rates and numbers of epochs, other non-linear activation functions such as tanh or sigmoid, and other optimization algorithms such as Adam or RMSProp. There is a lot of room for experimentation and for obtaining better results than those achieved in the previous posts.
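
One way to organise these experiments is a simple grid over learning rates, epochs, activations and optimizers. The sketch below only enumerates the configurations and keeps the best one; `train_and_evaluate` is a hypothetical callback you would implement with your own PyTorch training loop (for example, mapping the activation name to `torch.nn.Tanh` or `torch.nn.Sigmoid` and the optimizer name to `torch.optim.Adam` or `torch.optim.RMSprop`). The search space values are placeholders, and `fake_train_and_evaluate` exists only to exercise the loop.

```python
from __future__ import annotations

import itertools
from typing import Callable

# Hypothetical search space; adjust the values to your own budget.
SEARCH_SPACE = {
    "learning_rate": [0.1, 0.01, 0.001],
    "epochs": [5, 10],
    "activation": ["relu", "tanh", "sigmoid"],
    "optimizer": ["sgd", "adam", "rmsprop"],
}


def grid_search(
    train_and_evaluate: Callable[[dict], float],
    space: dict[str, list],
) -> tuple[dict, float]:
    """Try every combination in `space` and return the best config and its score."""
    names = list(space)
    best_config, best_accuracy = {}, float("-inf")
    for values in itertools.product(*(space[name] for name in names)):
        config = dict(zip(names, values))
        accuracy = train_and_evaluate(config)  # e.g. validation accuracy of one training run
        if accuracy > best_accuracy:  # ties keep the earlier configuration
            best_config, best_accuracy = config, accuracy
    return best_config, best_accuracy


def fake_train_and_evaluate(config: dict) -> float:
    # Stand-in for a real training run, used only to exercise grid_search.
    return 1.0 if config["optimizer"] == "adam" and config["activation"] == "tanh" else 0.5


config, accuracy = grid_search(fake_train_and_evaluate, SEARCH_SPACE)
print(config["optimizer"], config["activation"], accuracy)  # adam tanh 1.0
```

The number of runs grows multiplicatively (3 × 2 × 3 × 3 = 54 here), so it pays to trim the lists first, for example after a quick pass on the smaller dataset.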

In the previous experiments, I showed how to quickly prototype models for sentiment classification problems. In this post, I have shown how I compared the various methods and planned the order in which to try them to get the best possible result.

This concludes the Sentiment Classification series. I aim to group related topics into series, and this was my first one. I will continue publishing as I explore new topics. Watch this space!

As always, happy experimenting and exploring!

