Open-sourcing a text analysis project


GitHub

https://github.com/sea-boat/TextAnalyzer

TextAnalyzer

A text analyzer. So far, it can extract hot words from a text segment using the tf-idf algorithm, with an additional score factor applied to adjust the final score.

It also provides machine learning for text classification.

Features

Extracting hot words from a text.

gathering statistics by frequency.
gathering statistics by the tf-idf algorithm.
gathering statistics with an additional score factor.

Synonyms can be recognized.
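To make the tf-idf scoring above concrete, here is a minimal sketch of the textbook formula (term frequency multiplied by the log of inverse document frequency). It is not taken from TextAnalyzer: the class name `TfIdfSketch`, its method names, and the pre-tokenized input are assumptions for illustration, and the project's extra score factor is not modeled.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; not part of TextAnalyzer.
final class TfIdfSketch {

    /** Term frequency: occurrences of each term divided by the document length. */
    static Map<String, Double> termFrequency(List<String> doc) {
        Map<String, Double> tf = new HashMap<>();
        for (String term : doc) {
            tf.merge(term, 1.0, Double::sum);
        }
        for (Map.Entry<String, Double> e : tf.entrySet()) {
            e.setValue(e.getValue() / doc.size());
        }
        return tf;
    }

    /** Inverse document frequency: ln(N / number of documents containing the term). */
    static double inverseDocumentFrequency(String term, List<List<String>> corpus) {
        int containing = 0;
        for (List<String> doc : corpus) {
            if (doc.contains(term)) {
                containing++;
            }
        }
        // A term that appears nowhere gets weight 0 instead of dividing by zero.
        if (containing == 0) {
            return 0.0;
        }
        return Math.log((double) corpus.size() / containing);
    }

    /** tf-idf score of every term in one document of the corpus. */
    static Map<String, Double> tfIdf(List<String> doc, List<List<String>> corpus) {
        Map<String, Double> scores = termFrequency(doc);
        for (Map.Entry<String, Double> e : scores.entrySet()) {
            e.setValue(e.getValue() * inverseDocumentFrequency(e.getKey(), corpus));
        }
        return scores;
    }

    public static void main(String[] args) {
        List<List<String>> corpus = Arrays.asList(
                Arrays.asList("a", "b", "a"),
                Arrays.asList("b", "c"));
        // Terms that occur in every document score 0; rarer terms score higher.
        System.out.println(tfIdf(corpus.get(0), corpus));
    }
}
```

A term that occurs in every document gets an idf of ln(1) = 0, which is why very common words drop out of the hot-word ranking under this formula.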

SVM Classifier

This analyzer supports classifying text with SVM, which involves vectorizing the text. We train on the samples and then classify with the resulting model.

For convenience, the model, the tf-idf data, and the vectors are stored.
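As a rough picture of what vectorizing a text means, the sketch below maps tokenized documents to fixed-length term-count vectors over a shared vocabulary. This is an assumption-laden illustration, not the project's code: `TermVectorSketch` and its methods are hypothetical names, and the project presumably weights entries with tf-idf rather than raw counts.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; not part of TextAnalyzer.
final class TermVectorSketch {

    /** Assigns each distinct term an index, in first-seen order. */
    static Map<String, Integer> buildVocabulary(List<List<String>> docs) {
        Map<String, Integer> vocab = new LinkedHashMap<>();
        for (List<String> doc : docs) {
            for (String term : doc) {
                vocab.putIfAbsent(term, vocab.size());
            }
        }
        return vocab;
    }

    /** Counts each known term of the document; terms outside the vocabulary are ignored. */
    static double[] toVector(List<String> doc, Map<String, Integer> vocab) {
        double[] vector = new double[vocab.size()];
        for (String term : doc) {
            Integer index = vocab.get(term);
            if (index != null) {
                vector[index] += 1.0;
            }
        }
        return vector;
    }

    public static void main(String[] args) {
        List<List<String>> samples = Arrays.asList(
                Arrays.asList("tax", "refund"),
                Arrays.asList("tax", "law"));
        Map<String, Integer> vocab = buildVocabulary(samples);
        double[] vector = toVector(Arrays.asList("law", "tax", "tax", "unknown"), vocab);
        System.out.println(Arrays.toString(vector));
    }
}
```

Every text ends up with the same vector length, which is what lets a classifier such as an SVM compare training samples with new input.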

k-means and x-means clustering

This analyzer supports clustering text with k-means and x-means.
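For readers unfamiliar with k-means, the following is a minimal sketch of the standard assign-then-update loop on numeric vectors. It is not the project's implementation: `KMeansSketch` is a hypothetical name, seeding the centroids with the first k points is a simplification chosen to keep the example deterministic, the input must contain at least k points, and x-means (which also chooses k) is not covered.

```java
import java.util.Arrays;

// Hypothetical illustration class; not part of TextAnalyzer.
final class KMeansSketch {

    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int j = 0; j < a.length; j++) {
            double diff = a[j] - b[j];
            sum += diff * diff;
        }
        return sum;
    }

    /** Returns the cluster index of each point. Requires points.length >= k. */
    static int[] cluster(double[][] points, int k, int maxIterations) {
        int dim = points[0].length;
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) {
            centroids[c] = points[c].clone(); // simplistic seeding: the first k points
        }
        int[] labels = new int[points.length];
        for (int iter = 0; iter < maxIterations; iter++) {
            // Assignment step: attach every point to its nearest centroid.
            boolean changed = false;
            for (int i = 0; i < points.length; i++) {
                int best = 0;
                double bestDistance = Double.MAX_VALUE;
                for (int c = 0; c < k; c++) {
                    double d = squaredDistance(points[i], centroids[c]);
                    if (d < bestDistance) {
                        bestDistance = d;
                        best = c;
                    }
                }
                if (labels[i] != best) {
                    labels[i] = best;
                    changed = true;
                }
            }
            if (!changed && iter > 0) {
                break; // assignments are stable, so the centroids would not move
            }
            // Update step: move each centroid to the mean of its assigned points.
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (int i = 0; i < points.length; i++) {
                counts[labels[i]]++;
                for (int j = 0; j < dim; j++) {
                    sums[labels[i]][j] += points[i][j];
                }
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] > 0) {
                    for (int j = 0; j < dim; j++) {
                        centroids[c][j] = sums[c][j] / counts[c];
                    }
                }
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        double[][] points = {{0, 0}, {10, 10}, {0, 1}, {10, 11}};
        System.out.println(Arrays.toString(cluster(points, 2, 10)));
    }
}
```

For text, the input points would be document vectors such as term-count or tf-idf vectors.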

VSM clustering

This analyzer supports clustering text with VSM (vector space model).
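In a vector space model, documents are compared as vectors, and cosine similarity is a common choice of measure. The sketch below shows that measure only; whether TextAnalyzer's VSM clustering uses cosine similarity is an assumption, and `CosineSketch` is a hypothetical name.

```java
// Hypothetical illustration class; not part of TextAnalyzer.
final class CosineSketch {

    /** Cosine of the angle between two equal-length vectors; 0 if either is all zeros. */
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        double[] doc1 = {1, 2, 0};
        double[] doc2 = {2, 4, 0};
        double[] doc3 = {0, 0, 3};
        System.out.println(cosine(doc1, doc2)); // same direction
        System.out.println(cosine(doc1, doc3)); // no shared terms
    }
}
```

Documents whose similarity exceeds a chosen threshold can then be grouped under the same label.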

Dependencies

https://github.com/sea-boat/IKAnalyzer-Mirror.git

TODO

other ML algorithms.
sentiment analysis.

How to use

It is as simple as this:

Extracting hot words

Index a document to get a docId.

long docId = TextIndexer.index(text);

Extract by docId.

HotWordExtractor extractor = new HotWordExtractor();
List<Result> list = extractor.extract(0, 20, false);
if (list != null) {
    for (Result s : list) {
        System.out.println(s.getTerm() + " : " + s.getFrequency() + " : " + s.getScore());
    }
}

Each result contains a term, its frequency, and its score.

失业证 : 1 : 0.31436604
户口 : 1 : 0.30099702
单位 : 1 : 0.29152703
提取 : 1 : 0.27927202
领取 : 1 : 0.27581802
职工 : 1 : 0.27381304
劳动 : 1 : 0.27370203
关系 : 1 : 0.27080503
本市 : 1 : 0.27080503
终止 : 1 : 0.27080503

SVM classifier

Train on the samples.

SVMTrainer trainer = new SVMTrainer();
trainer.train();

Predict the class of a text.

double[] data = trainer.getWordVector(text);
trainer.predict(data);

k-means and x-means clustering

List<String> list = DataReader.readContent(KMeansCluster.DATA_FILE);
int[] labels = new KMeansCluster().learn(list);

VSM clustering

List<String> list = DataReader.readContent(VSMCluster.DATA_FILE);
List<String> labels = new VSMCluster().learn(list);
