Elasticsearch Study Notes (25): Elasticsearch Mapping in Detail and Index Internals

Category: Backend · Published: 7 years ago


Let's start with a brief description of what a mapping is.

Suppose we insert a few documents and let ES create the index for us automatically:

PUT /website/_doc/1
{
  "post_date": "2017-01-01",
  "title": "my first article",
  "content": "this is my first article in this website",
  "author_id": 11400
}
PUT /website/_doc/2
{
  "post_date": "2017-01-02",
  "title": "my second article",
  "content": "this is my second article in this website",
  "author_id": 11400
}
PUT /website/_doc/3
{
  "post_date": "2017-01-03",
  "title": "my third article",
  "content": "this is my third article in this website",
  "author_id": 11400
}

View the mapping:

GET /website/_mapping
{
  "website" : {
    "mappings" : {
      "properties" : {
        "author_id" : {
          "type" : "long"
        },
        "content" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "post_date" : {
          "type" : "date"
        },
        "title" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    }
  }
}

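The mapping above was generated automatically when the data was inserted; a mapping can also be created manually. This data structure and its related configuration, built automatically or manually for a type in an index, is called a mapping.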

Below is a manually created mapping.

PUT /test_mapping
{
  "mappings" : {
    "properties" : {
      "author_id" : {
        "type" : "long"
      },
      "content" : {
        "type" : "text",
        "fields" : {
          "keyword" : {
            "type" : "keyword",
            "ignore_above" : 256
          }
        }
      },
      "post_date" : {
        "type" : "date"
      },
      "title" : {
        "type" : "text",
        "fields" : {
          "keyword" : {
            "type" : "keyword",
            "ignore_above" : 256
          }
        }
      }
    }
  }
}

1. Exact value matching vs. full-text search

(1) exact value

The entire value of a field must match for the corresponding document to be returned.

Example:

GET /website/_search?q=post_date:2017
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  }
}

GET /website/_search?q=post_date:2017-01-01
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "website",
        "_type" : "doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "post_date" : "2017-01-01",
          "title" : "my first article",
          "content" : "this is my first article in this website",
          "author_id" : 11400
        }
      }
    ]
  }
}

(2) full text

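Full text differs from exact value: rather than matching only a complete value, the value is split into terms (tokenization) and matched term by term. Matches can also be made across abbreviations, tenses, letter case, synonyms, and so on.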

Example:

GET /website/_search?q=title:article
{
  "took" : 7,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 0.087011375,
    "hits" : [
      {
        "_index" : "website",
        "_type" : "doc",
        "_id" : "1",
        "_score" : 0.087011375,
        "_source" : {
          "post_date" : "2017-01-01",
          "title" : "my first article",
          "content" : "this is my first article in this website",
          "author_id" : 11400
        }
      },
      {
        "_index" : "website",
        "_type" : "doc",
        "_id" : "2",
        "_score" : 0.087011375,
        "_source" : {
          "post_date" : "2017-01-02",
          "title" : "my second article",
          "content" : "this is my second in this website",
          "author_id" : 11400
        }
      },
      {
        "_index" : "website",
        "_type" : "doc",
        "_id" : "3",
        "_score" : 0.087011375,
        "_source" : {
          "post_date" : "2017-01-03",
          "title" : "my third article",
          "content" : "this is my third in this website",
          "author_id" : 11400
        }
      }
    ]
  }
}

2. Core principles of the inverted index

The following walks through a simplified construction of an inverted index; in practice the process is considerably more complex.

doc1: I really liked my small dogs, and I think my mom also liked them.

doc2: He never liked any dogs, so I hope that my mom will not expect me to liked him.

Tokenize the text and build a preliminary inverted index:

word    doc1    doc2
I        *        *
really   *
liked    *        *
my       *        *
small    *
dogs     *        *
and      *
think    *
mom      *        *
also     *        
them     *
He                *
never             *
any               *
so                *
hope              *
that              *
will              *
not               *
expect            *
me                *
to                *
him               *
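
The table above can be reproduced with a short sketch. This is a toy model, not how ES or Lucene builds its index: the function names `tokenize`, `build_index`, and `search` are made up for illustration, tokens are kept exactly as written (no lowercasing, no stemming), and a query matches a document if any of its terms is present.

```python
import re
from collections import defaultdict

DOCS = {
    "doc1": "I really liked my small dogs, and I think my mom also liked them.",
    "doc2": "He never liked any dogs, so I hope that my mom will not expect me to liked him.",
}

def tokenize(text):
    """Split on anything that is not a letter; tokens are kept as written."""
    return re.findall(r"[A-Za-z]+", text)

def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing at least one query term."""
    hits = set()
    for term in tokenize(query):
        hits |= index.get(term, set())
    return hits

index = build_index(DOCS)
print(sorted(index["liked"]))                    # ['doc1', 'doc2']
print(sorted(index["small"]))                    # ['doc1']
print(search(index, "mother like little dog"))   # set()
```

None of the four query terms appears verbatim in the index, which is why the last lookup comes back empty.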

Searching for mother like little dog returns no results, because none of the query terms appears in the index:

mother

like

little

dog

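This is clearly not the result we want. For example, mother and mom mean the same thing, yet the search finds nothing. A test, however, shows that ES can find the documents. When ES builds the inverted index it performs an additional step: each token is processed so that later searches are more likely to find related documents. This includes converting tense, singular and plural forms, synonyms, and letter case (which conversions are applied depends on the analyzer configured for the field). This process is called normalization.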

mother -> mom

liked -> like

small -> little

dogs -> dog

The inverted index is then rebuilt as follows:

word    doc1    doc2
I        *        *
really   *
like     *        *
my       *        *
little   *
dog      *        *
and      *
think    *
mom      *        *
also     *        
them     *
He                *
never             *
any               *
so                *
hope              *
that              *
will              *
not               *
expect            *
me                *
to                *
him               *

Query: mother like little dog, after tokenization and normalization:

mother -> mom

like -> like

little -> little

dog -> dog

Both doc1 and doc2 are returned:

doc1: I really liked my small dogs, and I think my mom also liked them.

doc2: He never liked any dogs, so I hope that my mom will not expect me to liked him.
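
The effect of normalization can be sketched the same way. The `NORMALIZE` table below is a hypothetical stand-in that covers only the four conversions listed in this section; a real analyzer would use stemmers and synonym filters rather than a hand-written dictionary. The key point is that the same `analyze` function is applied both when indexing and when searching.

```python
import re
from collections import defaultdict

DOCS = {
    "doc1": "I really liked my small dogs, and I think my mom also liked them.",
    "doc2": "He never liked any dogs, so I hope that my mom will not expect me to liked him.",
}

# Hypothetical normalization table: only the conversions used in this example.
NORMALIZE = {
    "mother": "mom",
    "liked": "like",
    "small": "little",
    "dogs": "dog",
}

def analyze(text):
    """Lowercase, tokenize, then map each token through the toy table."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [NORMALIZE.get(token, token) for token in tokens]

def build_index(docs):
    """Map each normalized term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index

def search(index, query):
    """Analyze the query exactly like the documents, then look up each term."""
    hits = set()
    for term in analyze(query):
        hits |= index.get(term, set())
    return hits

index = build_index(DOCS)
print(analyze("mother like little dog"))                 # ['mom', 'like', 'little', 'dog']
print(sorted(search(index, "mother like little dog")))   # ['doc1', 'doc2']
```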

3. Further summary of mapping

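(1) When data is inserted directly into ES, ES automatically creates the index, along with the type and its mapping.

(2) The mapping automatically defines the data type of each field.

(3) Depending on the data type (for example text versus date), a field may be treated as an exact value or as full text.

(4) For an exact value, the entire value is added to the inverted index as a single term. Full text goes through several processing steps first, namely tokenization and normalization (tense conversion, synonym conversion, case conversion), before it is added to the inverted index.

(5) At search time, whether a field is exact value or full text also determines how the search behaves, and that behavior is consistent with how the inverted index was built. An exact value search matches against the whole value; a full text search tokenizes and normalizes the query before looking it up in the inverted index.

(6) You can rely on ES dynamic mapping to create the mapping automatically, including the data types. Alternatively, you can create the mapping for the index and type manually in advance and configure each field yourself, including its data type, indexing behavior, analyzer, and so on.

A mapping is essentially the metadata of a type in an index. It determines the data types, how the inverted index is built, and how searches behave.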
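
Points (4) and (5) can be illustrated with a toy model in which the mapping decides, per field, how a value is turned into index terms. Everything here is an assumption made for illustration: `FIELD_ANALYZERS` stands in for the mapping, and `analyze_exact` / `analyze_full_text` stand in for the two behaviors. It is also a simplification: real ES does not keep date fields as strings in the inverted index, so the sketch only mirrors the matching behavior shown in section 1.

```python
import re
from collections import defaultdict

def analyze_exact(value):
    """Exact value: the whole value becomes a single term."""
    return [str(value)]

def analyze_full_text(value):
    """Full text: lowercase and split into word terms."""
    return re.findall(r"[a-z0-9]+", str(value).lower())

# Hypothetical stand-in for a mapping: one analysis function per field.
FIELD_ANALYZERS = {"post_date": analyze_exact, "title": analyze_full_text}

DOCS = {
    "1": {"post_date": "2017-01-01", "title": "my first article"},
    "2": {"post_date": "2017-01-02", "title": "my second article"},
    "3": {"post_date": "2017-01-03", "title": "my third article"},
}

def build_index(docs):
    """Map (field, term) pairs to the ids of documents that contain them."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for field, value in doc.items():
            for term in FIELD_ANALYZERS[field](value):
                index[(field, term)].add(doc_id)
    return index

def search(index, field, query):
    """Analyze the query with the same function used for the field at index time."""
    hits = set()
    for term in FIELD_ANALYZERS[field](query):
        hits |= index.get((field, term), set())
    return hits

index = build_index(DOCS)
print(sorted(search(index, "post_date", "2017")))         # []
print(sorted(search(index, "post_date", "2017-01-01")))   # ['1']
print(sorted(search(index, "title", "article")))          # ['1', '2', '3']
```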

4. Core mapping data types and dynamic mapping

(1) Core data types

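text (string in older versions): string type

byte: byte type

short: short integer type

integer: integer type

long: long integer type

float: floating-point type

boolean: boolean type

date: date type

There are also more complex structures such as arrays and object. ES has no dedicated array type: any field can hold multiple values of the same type. An object is mapped through its inner fields (for example address.city), each of which has its own data type.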

(2) dynamic mapping

true or false -> boolean

123 -> long

123.45 -> float

"2017-01-01" -> date

"hello world" -> text (string in older versions), with a keyword sub-field

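The rules above can be written down as a small function. This is only an approximation of dynamic mapping under stated assumptions: `infer_type` and `infer_mapping` are invented names, date detection is reduced to "the string parses as an ISO date", and the keyword sub-field that ES adds to text fields is left out.

```python
from datetime import date

def infer_type(value):
    """Guess a field type from a JSON value, following the rules listed above."""
    # bool must be tested before int, because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        try:
            date.fromisoformat(value)  # accepts strings such as "2017-01-01"
            return "date"
        except ValueError:
            return "text"
    raise TypeError(f"unsupported value: {value!r}")

def infer_mapping(doc):
    """Build a properties dict for a flat document."""
    return {field: {"type": infer_type(value)} for field, value in doc.items()}

doc = {
    "post_date": "2017-01-01",
    "title": "my first article",
    "author_id": 11400,
}
print(infer_mapping(doc))
```

For the sample document this yields date for post_date, text for title, and long for author_id, matching the types in the automatically generated `website` mapping at the top of the article.
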
(3) Viewing the mapping

GET /{index}/_mapping


GET /test/_mapping
{
  "test" : {
    "mappings" : {
      "properties" : {
        "field1" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "field2" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    }
  }
}

5. Creating and modifying a mapping manually, and controlling whether string fields are analyzed

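Note: a mapping can be defined manually only when the index is created, or extended later by adding mappings for new fields. The mapping of an existing field cannot be updated.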

# Create the index
PUT /website
{
  "mappings": {
    "properties": {
      "author_id": {
        "type": "long"
      },
      "title": {
        "type": "text",
        "analyzer": "standard"
      },
      "content": {
        "type": "text"
      },
      "post_date": {
        "type": "date"
      },
      "publisher_id": {
        "type": "text",
        "index": false
      }
    }
  }
}
# Try to modify a field's mapping by sending PUT /website again (rejected: the index already exists)
PUT /website
{
  "mappings": {
    "properties": {
      "author_id": {
        "type": "text"
      }
    }
  }
}
{
  "error": {
    "root_cause": [
      {
        "type": "resource_already_exists_exception",
        "reason": "index [website/5xLohnJITHqCwRYInmBFmA] already exists",
        "index_uuid": "5xLohnJITHqCwRYInmBFmA",
        "index": "website"
      }
    ],
    "type": "resource_already_exists_exception",
    "reason": "index [website/5xLohnJITHqCwRYInmBFmA] already exists",
    "index_uuid": "5xLohnJITHqCwRYInmBFmA",
    "index": "website"
  },
  "status": 400
}
# Add a new field to the mapping
PUT /website/_mapping
{
  "properties": {
    "new_field": {
      "type": "text"
    }
  }
}
{
  "acknowledged" : true
}
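
The rule "new fields may be added, existing fields may not be changed" can be modelled in a few lines. This is a sketch of the rule only, not of how ES implements mapping updates: `put_mapping` and `MappingConflict` are hypothetical names, only the `type` of leaf fields is compared, and the error text is invented.

```python
class MappingConflict(Exception):
    """Raised when an update tries to change the type of an existing field."""

def put_mapping(existing, update):
    """Return a new properties dict with `update` merged into `existing`.

    New fields are accepted; changing the type of an existing field is rejected.
    """
    merged = dict(existing)
    for field, definition in update.items():
        current = merged.get(field)
        if current is not None and current["type"] != definition["type"]:
            raise MappingConflict(
                f"cannot change field [{field}] "
                f"from [{current['type']}] to [{definition['type']}]"
            )
        merged[field] = definition
    return merged

properties = {"author_id": {"type": "long"}, "title": {"type": "text"}}

# Adding a new field succeeds.
properties = put_mapping(properties, {"new_field": {"type": "text"}})
print(sorted(properties))  # ['author_id', 'new_field', 'title']

# Changing the type of an existing field is rejected.
try:
    put_mapping(properties, {"author_id": {"type": "text"}})
except MappingConflict as err:
    print("rejected:", err)
```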

6. Complex mapping types and the underlying structure of object data

(1) multivalue field

{
    "tags": ["tag1", "tag2"]
}

(2) empty field

null, []

(3) object field

PUT /test/_create/1
{
  "address": {
    "country": "china",
    "province": "guangdong",
    "city": "guangzhou"
  },
  "name": "jack",
  "age": 27,
  "join_date": "2017-01-01"
}
GET /test/_mapping
{
  "test" : {
    "mappings" : {
      "properties" : {
        "address" : {
          "properties" : {
            "city" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "country" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "province" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            }
          }
        },
        "age" : {
          "type" : "long"
        },
        "join_date" : {
          "type" : "date"
        },
        "name" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    }
  }
}

GET /test/_doc/1

{
  "_index" : "test",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "_seq_no" : 0,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "address" : {
      "country" : "china",
      "province" : "guangdong",
      "city" : "guangzhou"
    },
    "name" : "jack",
    "age" : 27,
    "join_date" : "2017-01-01"
  }
}
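
Underneath, an object such as address is not stored as a nested structure: ES flattens the hierarchy into plain field names with dotted paths, each holding a list of values. The sketch below shows that shape for the document above; `flatten` is an illustrative helper, not an ES API, and it ignores analysis of the values themselves.

```python
def flatten(doc, prefix="", out=None):
    """Flatten nested objects into dotted field paths; every field holds a list."""
    out = {} if out is None else out
    for key, value in doc.items():
        path = prefix + key
        # A single value is treated like a one-element array.
        values = value if isinstance(value, list) else [value]
        for item in values:
            if isinstance(item, dict):
                flatten(item, prefix=path + ".", out=out)
            else:
                out.setdefault(path, []).append(item)
    return out

doc = {
    "address": {
        "country": "china",
        "province": "guangdong",
        "city": "guangzhou",
    },
    "name": "jack",
    "age": 27,
    "join_date": "2017-01-01",
}

for field, values in flatten(doc).items():
    print(field, values)
# address.country ['china']
# address.province ['guangdong']
# address.city ['guangzhou']
# name ['jack']
# age [27]
# join_date ['2017-01-01']
```

The same helper also shows why a multivalue field needs no special type: `flatten({"tags": ["tag1", "tag2"]})` simply yields one field, tags, with two values.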

Note: indexing works the same way as for a string field; the data types within a field must not be mixed.
