Text Mining for Dummies: Text Classification with Python

Category: IT Technology · Published: 5 years ago


The common steps of any NLP project in 20 lines of code


This short read shows the common steps of any text mining project. If you want to follow along in a notebook, you can get the notebook over here.

The goal is not to give an exhaustive overview of text mining, but to kick-start your thinking and give you ideas for further enhancements.

Step 1: Data

For teaching purposes, we start with a very small data set of 6 reviews.

Such data often comes from scraping review websites, which are good sources because each review pairs raw text with a numeric rating.
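
The original data set is not reproduced here, so the following is only a minimal sketch of what six labeled reviews could look like. The review texts, the ratings, and the rule "4 stars or more counts as positive" are all hypothetical and chosen purely for illustration.

```python
# Hypothetical toy data: six made-up reviews, each with a 1-5 star rating.
# None of these texts or ratings come from the original article.
reviews = [
    ("Great product, works perfectly and arrived quickly.", 5),
    ("I love it, excellent quality for the price.", 5),
    ("Good value, I would recommend it to a friend.", 4),
    ("Terrible quality, it broke after two days.", 1),
    ("Very disappointed, the product never worked.", 1),
    ("Bad service and the delivery was late.", 2),
]

# Separate the raw text from the numeric evaluation.
texts = [text for text, _ in reviews]
ratings = [rating for _, rating in reviews]

# Assumed labeling rule: 4 stars or more is positive (1), otherwise negative (0).
labels = [1 if rating >= 4 else 0 for rating in ratings]

for text, label in zip(texts, labels):
    print(label, text)
```

Turning the rating into a binary label is one possible design choice; the rating could also be kept as a numeric target for a regression model.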

Step 2: Data preparation

Real data will often need more cleaning than in this example, e.g. with regular expressions or Python string operations.
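
As a rough illustration of that kind of cleaning, here is a small sketch using only the standard library. The function name `clean_text` and the specific rules (lowercasing, dropping HTML tags, keeping letters only) are assumptions for this example, not steps taken from the original notebook.

```python
import re


def clean_text(text: str) -> str:
    """Apply a few basic cleaning steps to one raw review (hypothetical helper)."""
    text = text.lower()                      # Python string operation: normalize case
    text = re.sub(r"<[^>]+>", " ", text)     # regex: drop HTML tags left over from scraping
    text = re.sub(r"[^a-z\s]", " ", text)    # regex: keep only letters and whitespace
    return " ".join(text.split())            # collapse repeated whitespace


raw = "<p>GREAT product!!! 10/10, would buy again :)</p>"
print(clean_text(raw))  # great product would buy again
```

Which rules are appropriate depends on the data; removing digits, for example, would be a bad idea if the numbers carry meaning.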

The real challenge of text mining is converting text to numerical data. This is often done in two steps:

  • Stemming / Lemmatizing: bringing all words back to their ‘base form’ to make word counting easier
  • Vectorizing: applying an algorithm that is based on word counts (more advanced)

In this example, I use a LancasterStemmer and a CountVectorizer, which are well-known and easy-to-use methods.

Step 2a: LancasterStemmer to bring words back to their base form
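
A minimal sketch of this step, assuming the `LancasterStemmer` from NLTK. The helper `stem_text`, the sample sentence, and the simple whitespace tokenization are assumptions made for this example; the original notebook may tokenize differently.

```python
from nltk.stem import LancasterStemmer

stemmer = LancasterStemmer()


def stem_text(text: str) -> str:
    """Stem every whitespace-separated word in a text (hypothetical helper)."""
    words = text.lower().split()
    return " ".join(stemmer.stem(word) for word in words)


# Hypothetical sample sentence, not taken from the original data.
sample = "the delivery was delayed and the packaging was damaged"
print(stem_text(sample))
```

The Lancaster stemmer is fairly aggressive, so some stems will not be real words. That is acceptable here, because the stems are only used as keys for counting.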


Step 2b: CountVectorizer to apply Bag of Words (basically a word count) for vectorizing (that is, converting text data into numerical data)
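
The following sketch shows what a Bag of Words looks like with scikit-learn's `CountVectorizer`. The three short documents are hypothetical and kept tiny so that the resulting matrix is easy to read.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical mini corpus, chosen so the counts are easy to check by hand.
docs = [
    "good product good price",
    "bad product",
    "good service",
]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)  # sparse matrix: one row per document

# vocabulary_ maps each word to its column index; sort the words by that index.
vocab = sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get)

print(vocab)        # ['bad', 'good', 'price', 'product', 'service']
print(X.toarray())  # [[0 2 1 1 0]
                    #  [1 0 0 1 0]
                    #  [0 1 0 0 1]]
```

Each column is one word of the vocabulary and each cell is the number of times that word occurs in the document, so word order is lost and only the counts remain.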


Step 3: Machine Learning

Since the text has been converted to numeric data, you can use any method that you could use on regular data!
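
As one possible example, the sketch below fits a logistic regression on word counts. The choice of `LogisticRegression`, the six training texts, and their labels are assumptions for illustration; the original notebook may use a different model and different data.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical training data: 1 = positive review, 0 = negative review.
train_texts = [
    "great product works perfectly",
    "love it excellent quality",
    "good value would recommend",
    "terrible quality broke quickly",
    "very disappointed never worked",
    "bad service late delivery",
]
train_labels = [1, 1, 1, 0, 0, 0]

# Convert the text to word counts, then fit an ordinary classifier on them.
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_texts)

model = LogisticRegression()
model.fit(X_train, train_labels)

# New text must go through the same fitted vectorizer (transform, not fit_transform).
X_new = vectorizer.transform(["excellent product love it"])
prediction = model.predict(X_new)
print(prediction)
```

With only six reviews the model is a toy, so its predictions should not be read as meaningful. On real data you would hold out a test set and compare several models.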


I hope this short example helps you on your journey. Don’t hesitate to ask any questions in the comments. Thanks for reading!

Link to the complete notebook: over here.

