The common steps of any NLP project in 20 lines of code
Mar 8 · 2 min read
This short read walks through the common steps of any text mining project. If you want to follow along in a notebook, you can get the notebook over here.
The goal is not to give an exhaustive overview of text mining, but to kick-start your thinking and give you ideas for further enhancements.
For teaching purposes, we start with a very small data set of 6 reviews.
In practice, data often comes from scraping review websites, which are good sources because each review pairs raw text with a numeric rating.
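The original data set is not reproduced here, so the following is only a minimal sketch of what such a data set could look like. The six reviews and their ratings are invented placeholders, and the names `reviews`, `texts`, and `ratings` are assumptions made for illustration.

```python
# Hypothetical reviews, invented for illustration only (not the author's data).
# Each entry pairs a raw text with a numeric rating from 1 to 5.
reviews = [
    ("The food was great and the staff were friendly", 5),
    ("Lovely place, we enjoyed every dish", 5),
    ("Good value and quick service", 4),
    ("The soup was cold and the waiter was rude", 1),
    ("Terrible experience, we waited an hour", 1),
    ("Bland food and a noisy room", 2),
]

# Split the pairs into the text input and the numeric target.
texts = [text for text, _ in reviews]
ratings = [rating for _, rating in reviews]

print(len(texts), "reviews loaded")
```

Keeping the text and the rating in separate lists makes it easy to pass them to the later steps as features and target.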
Step 2: Data preparation
The data will often need more cleaning than in this example, e.g. with regular expressions or Python string operations.
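As a rough sketch of the kind of cleaning meant here, the function below lowercases the text, drops HTML tags, and removes everything that is not a letter. The function name `clean_text` and the choice of rules are assumptions; real projects will need rules that match their own data.

```python
import re


def clean_text(text: str) -> str:
    """Lowercase the text, then strip HTML tags, digits and punctuation."""
    text = text.lower()
    text = re.sub(r"<[^>]+>", " ", text)   # drop HTML tags left over from scraping
    text = re.sub(r"[^a-z\s]", " ", text)  # keep only letters and whitespace
    return re.sub(r"\s+", " ", text).strip()  # collapse repeated whitespace


print(clean_text("<p>GREAT food!!! 10/10, would   visit again.</p>"))
# prints: great food would visit again
```

Note that this simple rule set also discards digits, which may not be what you want if the numbers in a review carry meaning.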
The real challenge of text mining is converting text to numerical data. This is often done in two steps:
- Stemming / Lemmatizing: bringing all words back to their ‘base form’ to make word counting easier
- Vectorizing: applying an algorithm that is based on word counts (more advanced)
In this example, I use a LancasterStemmer and a CountVectorizer, which are well-known and easy-to-use methods.
Step 2a: LancasterStemmer to bring words back to their base form
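A minimal sketch of this step, assuming NLTK is installed. It uses `LancasterStemmer` from `nltk.stem` and splits on whitespace instead of using a proper tokenizer; the helper `stem_text` and the sample sentences are made up for illustration.

```python
from nltk.stem import LancasterStemmer

stemmer = LancasterStemmer()


def stem_text(text: str) -> str:
    """Stem every whitespace-separated word and join the stems back together."""
    return " ".join(stemmer.stem(word) for word in text.split())


# Hypothetical sample sentences, invented for illustration.
texts = [
    "the waiters were running around",
    "we enjoyed the cooking",
]
stemmed = [stem_text(text) for text in texts]

for original, result in zip(texts, stemmed):
    print(original, "->", result)
```

The Lancaster stemmer is fairly aggressive, so some stems will not be real words. That is acceptable here because the stems only serve as keys for counting.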
Step 2b: CountVectorizer to apply Bag of Words (basically a word count) for vectorizing, that is, converting text data into numerical data
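A minimal sketch of this step, assuming scikit-learn is installed. `CountVectorizer` builds a vocabulary from the texts and returns one row of word counts per document; the three short texts are invented for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical, already cleaned texts, invented for illustration.
texts = [
    "good food good price",
    "bad food",
    "good service",
]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)  # sparse matrix: one row per text, one column per word

print(sorted(vectorizer.vocabulary_))
# prints: ['bad', 'food', 'good', 'price', 'service']
print(X.toarray())
# prints:
# [[0 1 2 1 0]
#  [1 1 0 0 0]
#  [0 0 1 0 1]]
```

Each column corresponds to one vocabulary word in alphabetical order, so the 2 in the first row means that "good" occurs twice in the first text.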
Step 3: Machine Learning
Since the text has been converted to numeric data, just use any method that you could use on regular data!
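As one possible illustration, the sketch below fits a logistic regression on the word counts, assuming scikit-learn is installed. The choice of `LogisticRegression`, the invented texts, and the binary labels (1 for positive, 0 for negative) are assumptions, not necessarily the method used in the original notebook.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical texts and labels, invented for illustration (1 = positive, 0 = negative).
texts = [
    "great food friendly staff",
    "lovely place tasty dishes",
    "good value quick service",
    "cold soup rude waiter",
    "terrible experience long wait",
    "bland food noisy room",
]
labels = [1, 1, 1, 0, 0, 0]

# Convert the texts to word counts, then fit a standard classifier on them.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

model = LogisticRegression()
model.fit(X, labels)

# Predicting on the training data only shows that the pipeline runs end to end.
predictions = model.predict(X)
print(predictions)
```

With only six examples, the predictions say nothing about real-world accuracy. On a real data set you would hold out a test set before evaluating the model.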
I hope this short example helps you on your journey. Don’t hesitate to ask any questions in the comments. Thanks for reading!
Link to the complete notebook: over here.