Installing the IK Analyzer on Elasticsearch v5.1.1


Elasticsearch's default (standard) analyzer splits Chinese text into individual characters, not into the meaningful words we want. For example:

[vagrant@testLnmp2 tmp]$ curl -XGET 'http://localhost:9200/_analyze?pretty' -d '大家好，本站的网站是www.codercto.com'

We get the following result:

{
  "tokens" : [ {
    "token" : "大",
    "start_offset" : 0,
    "end_offset" : 1,
    "type" : "<IDEOGRAPHIC>",
    "position" : 0
  }, {
    "token" : "家",
    "start_offset" : 1,
    "end_offset" : 2,
    "type" : "<IDEOGRAPHIC>",
    "position" : 1
  }, {
    "token" : "好",
    "start_offset" : 2,
    "end_offset" : 3,
    "type" : "<IDEOGRAPHIC>",
    "position" : 2
  }, {
    "token" : "本",
    "start_offset" : 4,
    "end_offset" : 5,
    "type" : "<IDEOGRAPHIC>",
    "position" : 3
  }, {
    "token" : "站",
    "start_offset" : 5,
    "end_offset" : 6,
    "type" : "<IDEOGRAPHIC>",
    "position" : 4
  }, {
    "token" : "的",
    "start_offset" : 6,
    "end_offset" : 7,
    "type" : "<IDEOGRAPHIC>",
    "position" : 5
  }, {
    "token" : "网",
    "start_offset" : 7,
    "end_offset" : 8,
    "type" : "<IDEOGRAPHIC>",
    "position" : 6
  }, {
    "token" : "站",
    "start_offset" : 8,
    "end_offset" : 9,
    "type" : "<IDEOGRAPHIC>",
    "position" : 7
  }, {
    "token" : "是",
    "start_offset" : 9,
    "end_offset" : 10,
    "type" : "<IDEOGRAPHIC>",
    "position" : 8
  }, {
    "token" : "www.codercto.com",
    "start_offset" : 10,
    "end_offset" : 26,
    "type" : "<ALPHANUM>",
    "position" : 9
  } ]
}
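In this response, start_offset and end_offset are character offsets into the input text. Slicing the input with them gives back each token. Lucene counts offsets in UTF-16 code units, which for these characters equals Python's code-point indexing.

The minimal sketch below uses only the standard library and checks this against an abbreviated copy of the response above. SAMPLE keeps three of its ten tokens. check_offsets is a hypothetical helper name, not part of any Elasticsearch client.

```python
import json

# The text sent to _analyze in the curl command above.
TEXT = "大家好，本站的网站是www.codercto.com"

# Abbreviated copy of the standard analyzer's response (3 of its 10 tokens).
SAMPLE = """
{"tokens": [
  {"token": "大", "start_offset": 0, "end_offset": 1, "type": "<IDEOGRAPHIC>", "position": 0},
  {"token": "本", "start_offset": 4, "end_offset": 5, "type": "<IDEOGRAPHIC>", "position": 3},
  {"token": "www.codercto.com", "start_offset": 10, "end_offset": 26, "type": "<ALPHANUM>", "position": 9}
]}
"""


def check_offsets(text, response_json):
    """Return (token, text[start:end]) pairs that do not match; an empty list means all agree."""
    mismatches = []
    for tok in json.loads(response_json)["tokens"]:
        span = text[tok["start_offset"]:tok["end_offset"]]
        if span != tok["token"]:
            mismatches.append((tok["token"], span))
    return mismatches


if __name__ == "__main__":
    # Offset 3 (the full-width comma) produced no token, hence the jump from 2 to 4.
    print(check_offsets(TEXT, SAMPLE))  # → []
```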

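In most cases this is not what we want. We would rather get tokens such as "大家好" (hello, everyone) and "网站" (website). For that we need a Chinese analysis plugin, and IK provides exactly this.

elasticsearch-analysis-ik is a Chinese analysis plugin that supports custom dictionaries.

This article walks through installing the IK plugin.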

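1. Download the IK plugin

The IK version must match the ES version. The link below is for the 6.1.1 release. On Elasticsearch 5.1.1, download the release whose version number matches your node instead.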

wget https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.1.1/elasticsearch-analysis-ik-6.1.1.zip
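The release has to match the node's version, so the download URL can be derived from the version the node reports.

This is a minimal Python sketch. It assumes every IK release follows the same URL pattern as the 6.1.1 link above, which is an assumption and not verified here. running_es_version reads version.number from the root endpoint (GET /) of a running node. Both function names are hypothetical.

```python
import json
import urllib.request

# Pattern copied from the 6.1.1 link above; ASSUMED to hold for other releases.
URL_TEMPLATE = ("https://github.com/medcl/elasticsearch-analysis-ik/releases/"
                "download/v{v}/elasticsearch-analysis-ik-{v}.zip")


def ik_release_url(es_version):
    """Build the IK release URL for an Elasticsearch version string such as "5.1.1"."""
    return URL_TEMPLATE.format(v=es_version)


def running_es_version(base_url="http://localhost:9200"):
    """Read version.number from the root endpoint of a running node (needs a live node)."""
    with urllib.request.urlopen(base_url + "/") as resp:
        return json.load(resp)["version"]["number"]


if __name__ == "__main__":
    # Offline demo; with a node running, pass running_es_version() instead of "5.1.1".
    print(ik_release_url("5.1.1"))
    # → https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v5.1.1/elasticsearch-analysis-ik-5.1.1.zip
```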

2. Unzip the archive

unzip elasticsearch-analysis-ik-6.1.1.zip

3. Create an analysis-ik directory in the ES plugins directory

mkdir analysis-ik

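4. Move the extracted files into analysis-ik

mv * analysis-ik

Run this from a directory that contains only the extracted plugin files. mv * moves everything in the current directory, including the downloaded zip and any other plugin directories.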
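Steps 2 to 4 can also be done in one pass by extracting the archive straight into the new directory. This avoids sweeping unrelated files along with mv *.

The minimal sketch below uses Python's standard zipfile module. install_ik and the paths in the usage line are hypothetical. It reports whether plugin-descriptor.properties ended up at the top of the plugin directory, which is where Elasticsearch expects it.

```python
import zipfile
from pathlib import Path


def install_ik(zip_path, plugins_dir, name="analysis-ik"):
    """Extract the IK release zip into <plugins_dir>/<name>.

    Returns True if plugin-descriptor.properties sits at the top of the
    new plugin directory, False otherwise.
    """
    target = Path(plugins_dir) / name
    target.mkdir(parents=True, exist_ok=True)  # step 3: mkdir analysis-ik
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(target)                  # steps 2 and 4: unzip directly into it
    return (target / "plugin-descriptor.properties").is_file()
```

For example, call install_ik("elasticsearch-analysis-ik-5.1.1.zip", "/path/to/elasticsearch/plugins"), then restart the node. A False result suggests the archive wraps its files in a subdirectory, whose contents then need to be moved up one level.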

5. Restart ES so that it loads the plugin

6. Test the ik_max_word analyzer

[vagrant@testLnmp2 plugins]$ curl -XGET 'http://localhost:9200/_analyze?pretty&analyzer=ik_max_word' -d '大家好，本站的网站是www.codercto.com'
{
  "tokens" : [
    {
      "token" : "大家好",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "大家",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "CN_WORD",
      "position" : 1
    },
    {
      "token" : "好",
      "start_offset" : 2,
      "end_offset" : 3,
      "type" : "CN_CHAR",
      "position" : 2
    },
    {
      "token" : "本站",
      "start_offset" : 4,
      "end_offset" : 6,
      "type" : "CN_WORD",
      "position" : 3
    },
    {
      "token" : "的",
      "start_offset" : 6,
      "end_offset" : 7,
      "type" : "CN_CHAR",
      "position" : 4
    },
    {
      "token" : "网站",
      "start_offset" : 7,
      "end_offset" : 9,
      "type" : "CN_WORD",
      "position" : 5
    },
    {
      "token" : "是",
      "start_offset" : 9,
      "end_offset" : 10,
      "type" : "CN_CHAR",
      "position" : 6
    },
    {
      "token" : "www.codercto.com",
      "start_offset" : 10,
      "end_offset" : 26,
      "type" : "LETTER",
      "position" : 7
    },
    {
      "token" : "www",
      "start_offset" : 10,
      "end_offset" : 13,
      "type" : "ENGLISH",
      "position" : 8
    },
    {
      "token" : "codercto",
      "start_offset" : 14,
      "end_offset" : 22,
      "type" : "ENGLISH",
      "position" : 9
    },
    {
      "token" : "com",
      "start_offset" : 23,
      "end_offset" : 26,
      "type" : "ENGLISH",
      "position" : 10
    }
  ]
}
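Unlike the standard analyzer, ik_max_word returns overlapping tokens. "大家好" spans the same characters as "大家" and "好", and "www.codercto.com" contains "www", "codercto" and "com".

The Python sketch below finds such nested pairs in an _analyze response. analyze mirrors the curl command above, passing the analyzer as a query parameter and the raw text as the request body. It is only assumed to work against a node that accepts that request form, as the one in this article did. overlapping and the abbreviated SAMPLE are illustrative.

```python
import json
import urllib.parse
import urllib.request

# Abbreviated copy of the ik_max_word response above (4 of its 11 tokens).
SAMPLE = """
{"tokens": [
  {"token": "大家好", "start_offset": 0, "end_offset": 3, "type": "CN_WORD", "position": 0},
  {"token": "大家", "start_offset": 0, "end_offset": 2, "type": "CN_WORD", "position": 1},
  {"token": "好", "start_offset": 2, "end_offset": 3, "type": "CN_CHAR", "position": 2},
  {"token": "本站", "start_offset": 4, "end_offset": 6, "type": "CN_WORD", "position": 3}
]}
"""


def analyze(text, analyzer, base="http://localhost:9200"):
    """Send text to _analyze like the curl command above; returns the raw JSON string."""
    url = base + "/_analyze?analyzer=" + urllib.parse.quote(analyzer)
    req = urllib.request.Request(url, data=text.encode("utf-8"), method="GET")
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")


def overlapping(response_json):
    """Return (outer, inner) token pairs where inner's offsets fall inside outer's."""
    tokens = json.loads(response_json)["tokens"]
    pairs = []
    for outer in tokens:
        for inner in tokens:
            if (inner is not outer
                    and outer["start_offset"] <= inner["start_offset"]
                    and inner["end_offset"] <= outer["end_offset"]):
                pairs.append((outer["token"], inner["token"]))
    return pairs


if __name__ == "__main__":
    # Offline demo on SAMPLE; with a node running, use overlapping(analyze(text, "ik_max_word")).
    print(overlapping(SAMPLE))  # → [('大家好', '大家'), ('大家好', '好')]
```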

