Installing the IK Analyzer for Elasticsearch v5.1.1

Category: Programming Tools · Backend · Coder Notes · Published: 8 years ago


Elasticsearch's default analyzer (the standard analyzer) splits Chinese text into individual characters instead of into meaningful words. For example:

[vagrant@testLnmp2 tmp]$ curl -XGET 'http://localhost:9200/_analyze?pretty' -d '大家好,本站的网站是www.codercto.com'

This returns:

{
  "tokens" : [ {
    "token" : "大",
    "start_offset" : 0,
    "end_offset" : 1,
    "type" : "<IDEOGRAPHIC>",
    "position" : 0
  }, {
    "token" : "家",
    "start_offset" : 1,
    "end_offset" : 2,
    "type" : "<IDEOGRAPHIC>",
    "position" : 1
  }, {
    "token" : "好",
    "start_offset" : 2,
    "end_offset" : 3,
    "type" : "<IDEOGRAPHIC>",
    "position" : 2
  }, {
    "token" : "本",
    "start_offset" : 4,
    "end_offset" : 5,
    "type" : "<IDEOGRAPHIC>",
    "position" : 3
  }, {
    "token" : "站",
    "start_offset" : 5,
    "end_offset" : 6,
    "type" : "<IDEOGRAPHIC>",
    "position" : 4
  }, {
    "token" : "的",
    "start_offset" : 6,
    "end_offset" : 7,
    "type" : "<IDEOGRAPHIC>",
    "position" : 5
  }, {
    "token" : "网",
    "start_offset" : 7,
    "end_offset" : 8,
    "type" : "<IDEOGRAPHIC>",
    "position" : 6
  }, {
    "token" : "站",
    "start_offset" : 8,
    "end_offset" : 9,
    "type" : "<IDEOGRAPHIC>",
    "position" : 7
  }, {
    "token" : "是",
    "start_offset" : 9,
    "end_offset" : 10,
    "type" : "<IDEOGRAPHIC>",
    "position" : 8
  }, {
    "token" : "www.codercto.com",
    "start_offset" : 10,
    "end_offset" : 26,
    "type" : "<ALPHANUM>",
    "position" : 9
  } ]
}
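The full JSON is hard to compare at a glance. The sketch below reduces `?pretty` output to one token per line, so the default output and the IK output later in this article can be compared directly. It assumes the pretty-printed layout shown above, where each `"token" : "..."` pair sits on its own line. `tokens_of` is a hypothetical helper name.

```shell
# Print one token per line from pretty-printed _analyze output on stdin.
# Assumption: the "?pretty" layout above, i.e. lines like:   "token" : "大",
tokens_of() {
  sed -n 's/^ *"token" : "\(.*\)",$/\1/p'
}
```

Usage: append `| tokens_of` to the curl command above (and add `-s` to hide the progress meter). With the default analyzer, every Chinese character comes back on its own line.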

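This is usually not what we want. We would rather get words such as “大家好” (hello everyone) and “网站” (website). That requires a Chinese analysis plugin, and IK is one that does exactly this.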

elasticsearch-analysis-ik is a Chinese analysis plugin that supports custom dictionaries.

The installation steps are as follows.

1. Download the IK plugin

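The IK version must match your ES version exactly. Note that the command below downloads IK 6.1.1, which does not match the ES 5.1.1 this article covers. Pick the release whose version number matches your ES installation from the project's GitHub releases page.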

wget https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.1.1/elasticsearch-analysis-ik-6.1.1.zip
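One way to avoid a version mismatch is to read the version from the running node and build the download URL from it. The sketch below does that.

- It assumes ES listens on http://localhost:9200, as in the examples in this article.
- It assumes the release you need follows the same naming pattern as the 6.1.1 URL above. Check the releases page if the download fails.
- `ik_url` and `parse_es_version` are hypothetical helper names.

```shell
# Print the IK release URL for a given ES version, e.g. "5.1.1".
# Assumption: same naming pattern as the 6.1.1 URL in step 1.
ik_url() {
  v="$1"
  echo "https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v${v}/elasticsearch-analysis-ik-${v}.zip"
}

# Read the JSON returned by "GET /" on stdin and print version.number.
# Works for both compact and pretty-printed output, as long as the
# "number" key and its value sit on the same line.
parse_es_version() {
  sed -n 's/.*"number" *: *"\([^"]*\)".*/\1/p' | head -n 1
}
```

Usage, with a node running:

ES_VERSION=$(curl -s http://localhost:9200 | parse_es_version)
wget "$(ik_url "$ES_VERSION")"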

2. Unzip the archive

unzip elasticsearch-analysis-ik-6.1.1.zip

3. Create a directory for the plugin inside the ES plugins directory

mkdir analysis-ik

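4. Move the extracted plugin files into the new directory

Run mv from the directory that contains only the extracted files, and point it at the directory created in step 3. Running mv * inside the plugins directory itself would also move the other plugins. Below, $ES_HOME stands for your Elasticsearch installation directory.

mv * $ES_HOME/plugins/analysis-ik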
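Steps 2 to 4 can be combined into one function that unzips into a temporary directory and copies the files into `plugins/analysis-ik`. This is a sketch with the following assumptions:

- Plugin archives for ES 5.x/6.x typically wrap their files in a top-level `elasticsearch/` folder. The helper copies that folder's contents if it exists, and the archive root otherwise.
- `install_ik` and `place_ik_files` are hypothetical names.
- `$ES_HOME` is your installation directory, as above.

```shell
# Copy extracted IK files from SRC into DEST. If SRC has a top-level
# "elasticsearch/" folder, copy its contents instead of SRC itself.
place_ik_files() {
  src="$1"; dest="$2"
  if [ -d "$src/elasticsearch" ]; then
    src="$src/elasticsearch"
  fi
  mkdir -p "$dest"
  cp -R "$src"/. "$dest"/   # "/." copies the contents, including dotfiles
}

# Unzip ZIP into a temp dir, then place the files under PLUGINS_DIR/analysis-ik.
install_ik() {
  zip="$1"; plugins_dir="$2"
  tmp=$(mktemp -d) || return 1
  if ! unzip -q "$zip" -d "$tmp"; then
    rm -rf "$tmp"
    return 1
  fi
  place_ik_files "$tmp" "$plugins_dir/analysis-ik"
  rm -rf "$tmp"
}
```

Usage: install_ik elasticsearch-analysis-ik-6.1.1.zip "$ES_HOME/plugins". Afterwards, plugin-descriptor.properties should sit directly inside plugins/analysis-ik rather than in a subfolder.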

5. Restart ES
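How to restart depends on how ES was started, for example as a service or via bin/elasticsearch. Either way, the node needs a few seconds before it answers requests again. The sketch below polls the HTTP endpoint until it responds.

- It assumes the default address http://localhost:9200.
- It assumes one attempt per second, up to 30 attempts.
- `wait_for_es` is a hypothetical helper name.

```shell
# Poll ES until it answers over HTTP. Returns 0 on success, 1 after TRIES failures.
wait_for_es() {
  url="${1:-http://localhost:9200}"; tries="${2:-30}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    if curl -sf "$url" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}
```

If the node does not come back, check the ES log. A plugin built for a different ES version typically makes startup fail.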

6. Verify with the ik_max_word analyzer

[vagrant@testLnmp2 plugins]$ curl -XGET 'http://localhost:9200/_analyze?pretty&analyzer=ik_max_word' -d '大家好,本站的网站是www.codercto.com'
{
  "tokens" : [
    {
      "token" : "大家好",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "大家",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "CN_WORD",
      "position" : 1
    },
    {
      "token" : "好",
      "start_offset" : 2,
      "end_offset" : 3,
      "type" : "CN_CHAR",
      "position" : 2
    },
    {
      "token" : "本站",
      "start_offset" : 4,
      "end_offset" : 6,
      "type" : "CN_WORD",
      "position" : 3
    },
    {
      "token" : "的",
      "start_offset" : 6,
      "end_offset" : 7,
      "type" : "CN_CHAR",
      "position" : 4
    },
    {
      "token" : "网站",
      "start_offset" : 7,
      "end_offset" : 9,
      "type" : "CN_WORD",
      "position" : 5
    },
    {
      "token" : "是",
      "start_offset" : 9,
      "end_offset" : 10,
      "type" : "CN_CHAR",
      "position" : 6
    },
    {
      "token" : "www.codercto.com",
      "start_offset" : 10,
      "end_offset" : 26,
      "type" : "LETTER",
      "position" : 7
    },
    {
      "token" : "www",
      "start_offset" : 10,
      "end_offset" : 13,
      "type" : "ENGLISH",
      "position" : 8
    },
    {
      "token" : "codercto",
      "start_offset" : 14,
      "end_offset" : 22,
      "type" : "ENGLISH",
      "position" : 9
    },
    {
      "token" : "com",
      "start_offset" : 23,
      "end_offset" : 26,
      "type" : "ENGLISH",
      "position" : 10
    }
  ]
}
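The test command above sends the text as a raw body, which ES 5.x accepts. ES 6.0 and later reject the application/x-www-form-urlencoded content type that curl -d sends by default. That matters if you installed the 6.1.1 build from step 1. The sketch below sends the same analyzer and text as a JSON body with an explicit Content-Type header, which should work on both 5.x and 6.x.

- The text must not contain double quotes or backslashes, because the sketch does no JSON escaping.
- `analyze_body` and `ik_analyze` are hypothetical helper names.

```shell
# Build the _analyze JSON body: analyze_body TEXT ANALYZER
# Assumption: TEXT contains no characters that need JSON escaping.
analyze_body() {
  printf '{"analyzer": "%s", "text": "%s"}' "$2" "$1"
}

# ik_analyze TEXT [ANALYZER] [URL]
ik_analyze() {
  curl -s -XGET -H 'Content-Type: application/json' \
    "${3:-http://localhost:9200}/_analyze?pretty" \
    -d "$(analyze_body "$1" "${2:-ik_max_word}")"
}
```

Usage: ik_analyze '大家好,本站的网站是www.codercto.com'. To confirm the plugin was loaded at all, curl -s 'http://localhost:9200/_cat/plugins?v' lists the installed plugins per node.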
