Basic Syntax of Elasticsearch Queries and Aggregations

Category: Backend · Published: 5 years ago

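1. Overview

Elasticsearch offers two main ways to express a search: URI search and request-body search. URI search is lightweight and quick to type, while request-body search is a structured JSON query that can carry many more conditions. This article covers the structured side: how to use query, filter, and aggregations. The ES version used is 6.5.4 and the Chinese analyzer is ik. For installation and usage, see:

Elasticsearch installation and usage

Using the ik analyzer in Elasticsearch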
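To make the contrast between the two search styles concrete, here is a minimal Python sketch that builds the same search both ways. The helper names uri_search and body_search are made up for illustration, and treating a q=field:text URI search as roughly comparable to a match query is a simplification (the q parameter uses query-string syntax).

```python
import json
from typing import Any, Dict
from urllib.parse import urlencode


def uri_search(index: str, field: str, text: str, size: int = 10) -> str:
    """Build the path and query string for a URI search (hypothetical helper)."""
    params = urlencode({"q": f"{field}:{text}", "size": size})
    return f"/{index}/_search?{params}"


def body_search(field: str, text: str, size: int = 10) -> Dict[str, Any]:
    """Build the JSON body for a comparable request-body search."""
    return {"query": {"match": {field: text}}, "size": size}


print(uri_search("news", "title", "sports", size=2))
print(json.dumps(body_search("title", "sports", size=2)))
```

The URI form fits on one line, while the body form has room to grow into the nested bool and aggs structures described in the rest of the article.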

Create the following index in ES and import the data:

PUT /news
{
        "aliases": {
            "test.chixiao.news": {}
        },
        "mappings":{
            "news": {
                "dynamic": "false",
                "properties": {
                    "id": {
                        "type": "integer"
                    },
                    "title": {
                        "analyzer": "ik_max_word",
                        "type": "text"
                    },
                    "summary": {
                        "analyzer": "ik_max_word",
                        "type": "text"
                    },
                    "author": {
                        "type": "keyword"
                    },
                    "source": {
                        "type": "keyword"
                    },
                    "publishTime": {
                        "type": "date"
                    },
                    "modifiedTime": {
                        "type": "date"
                    },
                    "createTime": {
                        "type": "date"
                    },
                    "docId": {
                        "type": "keyword"
                    },
                    "voteCount": {
                        "type": "integer"
                    },
                    "replyCount": {
                        "type": "integer"
                    }
                }
            }
        },
        "settings":{
            "index": {
                "refresh_interval": "1s",
                "number_of_shards": 3,
                "max_result_window": "10000000",
                "mapper": {
                    "dynamic": "false"
                },
                "number_of_replicas": 1
            },
            "analysis": {
                "normalizer": {
                    "lowercase": {
                        "type": "custom",
                        "char_filter": [],
                        "filter": [
                            "lowercase",
                            "asciifolding"
                        ]
                    }
                },
                "analyzer": {
                    "1gram": {
                        "type": "custom",
                        "tokenizer": "ngram_tokenizer"
                    }
                },
                "tokenizer": {
                    "ngram_tokenizer": {
                        "type": "nGram",
                        "min_gram": "1",
                        "max_gram": "1",
                        "token_chars": [
                            "letter",
                            "digit"
                        ]
                    }
                }
            }
        }
    }

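2. Queries

2.1 A query example

A simple query looks like this. Searches split into query clauses and filter clauses, and both kinds live under the top-level query key. Of the remaining keys, sort sets the ordering, size and from handle paging, and _source selects which fields of each matching document are returned.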

GET /news/_search
{
  "query": {"match_all": {}}, 
  "sort": [
    {
      "publishTime": {
        "order": "desc"
      }
    }
  ],
  "size": 2,
  "from": 0,
  "_source": ["title", "id", "summary"]
}

Response:

{
  "took" : 7,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 204,
    "max_score" : null,
    "hits" : [
      {
        "_index" : "news",
        "_type" : "news",
        "_id" : "228",
        "_score" : null,
        "_source" : {
          "summary" : "据陕西高院消息，6月11日上午，西安市中级人民法院二审公开开庭宣判了陕西省首例“套路贷”涉黑案件——韩某某等人非法放贷一案，法院驳回上诉，维持原判。西安市中级人",
          "id" : 228,
          "title" : "陕西首例套路贷涉黑案宣判:团伙对借款人喷辣椒水"
        },
        "sort" : [
          1560245097000
        ]
      },
      {
        "_index" : "news",
        "_type" : "news",
        "_id" : "214",
        "_score" : null,
        "_source" : {
          "summary" : "网易娱乐6月11日报道6月11日，有八卦媒体曝光曹云金与妻子唐菀现身天津民政局办理了离婚手续。对此，网易娱乐向曹云金经纪人求证，得到了对方独家回应：“确实是离婚",
          "id" : 214,
          "title" : "曹云金承认已离婚:和平离婚 有人恶意中伤心思歹毒"
        },
        "sort" : [
          1560244657000
        ]
      }
    ]
  }
}

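In the response, took is the elapsed time in milliseconds and _shards reports shard status: the index has 3 shards and all 3 responded successfully. The outer hits object holds the results: total is the number of matching documents, max_score is the highest relevance score (null here because the results are sorted by publishTime rather than by score), and the inner hits array contains the matching documents themselves.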
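As a rough illustration of paging and of reading such a response, here is a small Python sketch. The function names page_body and summarize are invented for this example, the 1-based page convention is an assumption, and sample is a trimmed copy of the response above. It treats hits.total as a plain integer, which is how ES 6.x returns it.

```python
from typing import Any, Dict


def page_body(page: int, page_size: int) -> Dict[str, Any]:
    """Build a request body for a 1-based page number, newest first."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    return {
        "query": {"match_all": {}},
        "sort": [{"publishTime": {"order": "desc"}}],
        "from": (page - 1) * page_size,  # number of hits to skip
        "size": page_size,
        "_source": ["title", "id", "summary"],
    }


def summarize(resp: Dict[str, Any]) -> Dict[str, Any]:
    """Pull the commonly used parts out of a 6.x-style search response."""
    hits = resp["hits"]
    return {
        "took_ms": resp["took"],
        "all_shards_ok": resp["_shards"]["failed"] == 0,
        "total": hits["total"],  # a plain integer in ES 6.x
        "ids": [hit["_id"] for hit in hits["hits"]],
    }


# Trimmed copy of the response shown above.
sample = {
    "took": 7,
    "_shards": {"total": 3, "successful": 3, "skipped": 0, "failed": 0},
    "hits": {
        "total": 204,
        "max_score": None,
        "hits": [{"_id": "228"}, {"_id": "214"}],
    },
}

print(page_body(page=3, page_size=2)["from"])
print(summarize(sample))
```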

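Searches come in two kinds: exact filtering (filter) and full-text search (query). Filters involve no relevance scoring and their results are easy to cache, so they usually execute very fast.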

2.2 Filter queries

term

A term query finds records that match a value exactly. FIELD is a field in the index and VALUE is the value to look for.

{"term": {
    "FIELD": {
      "value": "VALUE"
    }
  }
}

For example, to find news whose source is 中新经纬:

GET /news/_search
{
  "query": {"term": {
    "source": {
      "value": "中新经纬"
    }
  }}
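}

bool

When several conditions need to be combined logically, use bool. A bool query can contain:

{
  "bool": {
    "must": [],
    "should": [],
    "must_not": []
  }
}

must: results must match the clause, like AND in SQL

must_not: results must not match the clause, like NOT in SQL

should: results should match at least one of the clauses, like OR in SQL

To find news whose source is 中新经纬 and whose id is 4 or 75, write the query below. minimum_should_match sets how many of the should clauses must match. When the bool query also has a must or filter clause, as here, it defaults to 0, in which case the should clauses only contribute to scoring and do not filter anything out.

GET /news/_search
{
  "query": {
    "bool": {
      "must": [
        {"term": {"source": {"value": "中新经纬"}}}
      ],
      "should": [
        {"term": {"id": {"value": "4"}}},
        {"term": {"id": {"value": "75"}}}
      ],
      "minimum_should_match": 1
    }
  }
}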
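The matching logic of a bool query can be sketched in plain Python. This is only a simulation for illustration, not how ES evaluates queries internally: term and bool_matches are made-up helpers, the documents are invented, and scoring and the filter clause are left out.

```python
from typing import Any, Callable, Dict, Optional, Sequence

Doc = Dict[str, Any]
Clause = Callable[[Doc], bool]


def term(field: str, value: Any) -> Clause:
    """Exact-equality predicate standing in for a term query."""
    return lambda doc: doc.get(field) == value


def bool_matches(
    doc: Doc,
    must: Sequence[Clause] = (),
    should: Sequence[Clause] = (),
    must_not: Sequence[Clause] = (),
    minimum_should_match: Optional[int] = None,
) -> bool:
    """Decide whether doc passes the combined must / should / must_not clauses."""
    if minimum_should_match is None:
        # Default: 0 when a must clause is present; otherwise at least one
        # should clause has to match (if there are any).
        minimum_should_match = 0 if must or not should else 1
    if not all(clause(doc) for clause in must):
        return False
    if any(clause(doc) for clause in must_not):
        return False
    matched = sum(1 for clause in should if clause(doc))
    return matched >= minimum_should_match


# Invented documents for illustration.
docs = [
    {"id": 4, "source": "中新经纬"},
    {"id": 75, "source": "中新经纬"},
    {"id": 9, "source": "中新经纬"},
    {"id": 4, "source": "other"},
]
must = [term("source", "中新经纬")]
should = [term("id", 4), term("id", 75)]

strict = [d["id"] for d in docs if bool_matches(d, must, should, minimum_should_match=1)]
loose = [d["id"] for d in docs if bool_matches(d, must, should)]
print(strict)
print(loose)
```

With minimum_should_match set to 1 only ids 4 and 75 from 中新经纬 pass. Without it, id 9 also passes, because the should clauses no longer filter.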

terms

For the case above of matching several exact values, terms is more convenient. For example, to find articles whose id is 4 or 75:

GET /news/_search
{
  "query": {
    "terms": {
      "id": ["4", "75"]
    }
  }
}

range

For range conditions, use range. It goes in the same position as term. For example, to find articles with id from 1 to 10, using these operators:

gt : > greater than
lt : < less than
gte : >= greater than or equal to
lte : <= less than or equal to

GET /news/_search
{
  "query": {
    "range": {
      "id": {
        "gte": 1,
        "lte": 10
      }
    }
  }
}
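The difference between the inclusive and exclusive bounds is easy to get wrong, so here is a tiny Python stand-in for the range semantics. in_range is a hypothetical helper written for this example and only mimics the four operators on plain numbers.

```python
from typing import Optional


def in_range(
    value: float,
    gt: Optional[float] = None,
    gte: Optional[float] = None,
    lt: Optional[float] = None,
    lte: Optional[float] = None,
) -> bool:
    """Return True when value satisfies every bound that was supplied."""
    if gt is not None and not value > gt:
        return False
    if gte is not None and not value >= gte:
        return False
    if lt is not None and not value < lt:
        return False
    if lte is not None and not value <= lte:
        return False
    return True


ids = [0, 1, 5, 10, 11]
inclusive = [i for i in ids if in_range(i, gte=1, lte=10)]
exclusive = [i for i in ids if in_range(i, gt=1, lt=10)]
print(inclusive)
print(exclusive)
```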

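exists

Use exists to find documents in which a field is present, for example documents that have an author field. Inside a bool query it can be placed under must_not to find documents where the field is absent, or under should to express that the field may be present.

GET /news/_search
{
  "query": {
    "exists": {"field": "author"}
  }
}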
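The "field is absent" case mentioned above is not shown in the original, so here is a minimal sketch of one way to build it, wrapping exists in bool.must_not. The helper names field_missing and search_body are assumptions made for this example.

```python
import json
from typing import Any, Dict


def field_missing(field: str) -> Dict[str, Any]:
    """Query matching documents that have no value for the given field."""
    return {"bool": {"must_not": [{"exists": {"field": field}}]}}


def search_body(query: Dict[str, Any]) -> Dict[str, Any]:
    """Wrap a query clause in a search request body."""
    return {"query": query}


body = search_body(field_missing("author"))
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted as the body of GET /news/_search.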

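2.3 Query (full-text) queries

Unlike the exact matching of filters, query clauses perform full-text search on fields and score the results. In ES only fields of type text are analyzed into tokens. A keyword field is also a string, but it is indexed as a single value, like an enumeration, and is not tokenized. The analyzer for a text field can be specified when the index is created.

match

To search within a single field, use match. For example, to find news whose summary mentions 体育 (sports):

GET /news/_search
{
  "query": {
    "match": {
      "summary": "体育"
    }
  }
}

match also lets you specify the analyzer applied to the query text. With ik_smart the input is split into tokens that are as coarse as possible, so the query below recalls documents about 进口红酒 (imported red wine). With ik_max_word the tokens are finer-grained, and documents containing 口红 (lipstick) or 红酒 (red wine) are recalled too.

{
  "match": {
    "name": {
      "query": "进口红酒",
      "analyzer": "ik_smart"
    }
  }
}

The query text may be analyzed into several tokens. Set operator to and to require that every token matches. With or, minimum_should_match controls how many tokens must match, much like should in a bool query. For example, when searching for 体育新闻 (sports news), the query below recalls any document that contains either 体育 or 新闻:

GET /news/_search
{
  "query": {
    "match": {
      "summary": {
        "query": "体育新闻",
        "operator": "or",
        "minimum_should_match": 1
      }
    }
  }
}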
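The effect of operator and minimum_should_match can be sketched in Python once the text has been split into tokens. This is a simplified simulation: match_tokens is a made-up helper, the token lists are assumed analyzer output rather than real ik results, and scoring is ignored.

```python
from typing import Iterable, Set


def match_tokens(
    doc_tokens: Set[str],
    query_tokens: Iterable[str],
    operator: str = "or",
    minimum_should_match: int = 1,
) -> bool:
    """Decide whether a document is recalled for the given query tokens."""
    wanted = set(query_tokens)
    hits = len(wanted & doc_tokens)
    if operator == "and":
        return hits == len(wanted)  # every query token must be present
    if operator == "or":
        return hits >= minimum_should_match
    raise ValueError("operator must be 'and' or 'or'")


# Assumed tokenization of the query text 体育新闻.
query_tokens = ["体育", "新闻"]

# Invented documents, already tokenized.
sports_only = {"体育", "比赛"}
sports_news = {"体育", "新闻", "报道"}

print(match_tokens(sports_only, query_tokens, operator="or"))
print(match_tokens(sports_only, query_tokens, operator="and"))
print(match_tokens(sports_news, query_tokens, operator="and"))
```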

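multi_match

To search several fields at once, use multi_match. For example, to find documents whose title or summary contains the keyword 新闻 (news):

GET /news/_search
{
  "query": {
    "multi_match": {
      "query": "新闻",
      "fields": ["title", "summary"]
    }
  }
}

2.4 Combined queries

With full-text clauses and filter clauses available, bool can combine them into complex queries:

GET /news/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "summary": {
              "boost": 1,
              "query": "长安"
            }
          }
        },
        {
          "term": {
            "source": {
              "value": "中新经纬",
              "boost": 2
            }
          }
        }
      ],
      "filter": {
        "bool": {
          "must": [
            {"term": {"id": 75}}
          ]
        }
      }
    }
  }
}

In the request above, must, must_not, and should can each hold term, range, or match clauses. Clauses under must and should contribute to the score by default, and boost adjusts their weight. must_not only excludes documents and does not affect the score. For conditions that should not affect scoring, add them under filter in the bool query: clauses there are not scored, and their results can be cached.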
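When such requests are assembled in application code, it can help to keep scored clauses and non-scored filters apart. Below is a minimal Python sketch under that assumption. bool_query is a hypothetical helper, and it passes the filter as a flat list of clauses instead of the nested bool used above, which expresses the same condition.

```python
import json
from typing import Any, Dict, List

Clause = Dict[str, Any]


def bool_query(scored: List[Clause], filters: List[Clause]) -> Dict[str, Any]:
    """Put scoring clauses under must and non-scoring ones under filter."""
    return {"bool": {"must": list(scored), "filter": list(filters)}}


query = bool_query(
    scored=[
        {"match": {"summary": {"query": "长安", "boost": 1}}},
        {"term": {"source": {"value": "中新经纬", "boost": 2}}},
    ],
    filters=[{"term": {"id": 75}}],
)
body = {"query": query}
print(json.dumps(body, ensure_ascii=False, indent=2))
```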

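3. Aggregations

The basic form of an aggregation request is:

GET /news/_search
{
  "size": 0,
  "aggs": {
    "NAME": {
      "AGG_TYPE": {}
    }
  }
}

NAME is the name of the aggregation and can be any valid string. AGG_TYPE is the aggregation type. The common types divide into multi-value and single-value aggregations.

3.1 An aggregation example

GET /news/_search
{
  "size": 0,
  "aggs": {
    "sum_all": {
      "sum": {
        "field": "replyCount"
      }
    }
  }
}

This example computes the sum of replyCount across the index. Response:

{
  "took" : 8,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 204,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "sum_all" : {
      "value" : 390011.0
    }
  }
}

By default the response also includes the matching documents, so size is set to 0 to suppress them. The sum_all key in the result is the name given in the request.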

Aggregation types in Elasticsearch fall mainly into Metrics and Bucket.

3.2 Metrics

Metrics aggregations compute numbers over a set of documents. Most of them, such as avg, max, min, and sum, return a single value, while stats returns several at once.

For example, to find the highest reply count in the index:

GET /news/_search
{
  "size": 0,
  "aggs": {
    "max_replay": {
      "max": {
        "field": "replyCount"
      }
    }
  }
}

stats

For a standard set of statistics on a field (count, minimum, maximum, average, and sum), use stats. For example, to see the basic distribution of reply counts across the news documents:

GET /news/_search
{
  "size": 0,
  "aggs": {
    "cate": {
      "stats": {
        "field": "replyCount"
      }
    }
  }
}

Response:

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 204,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "cate" : {
      "count" : 202,
      "min" : 0.0,
      "max" : 32534.0,
      "avg" : 1930.7475247524753,
      "sum" : 390011.0
    }
  }
}
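To show what the five numbers mean, here is a small Python sketch that computes the same statistics over a list of values. stats is a stand-in written for this example and the input list is invented. Values of None represent documents without the field, which are skipped. That would be consistent with count (202) being lower than hits.total (204) in the response above, presumably because two documents have no replyCount.

```python
from typing import Dict, Iterable, Optional


def stats(values: Iterable[Optional[float]]) -> Dict[str, Optional[float]]:
    """Compute count, min, max, avg and sum, skipping missing values."""
    present = [float(v) for v in values if v is not None]
    count = len(present)
    if count == 0:
        return {"count": 0, "min": None, "max": None, "avg": None, "sum": 0.0}
    total = sum(present)
    return {
        "count": count,
        "min": min(present),
        "max": max(present),
        "avg": total / count,
        "sum": total,
    }


# Invented reply counts; None marks a document without the field.
reply_counts = [0, 10, None, 20]
result = stats(reply_counts)
print(result)
```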

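3.3 Bucket

Buckets are similar to GROUP BY in SQL: a bucket aggregation groups documents into buckets.

terms

Bucketing with terms shows how the data is distributed. For example, it can show which source values exist in the index and how many articles each one has. size sets the maximum number of buckets returned, largest first.

GET /news/_search
{
  "size": 0,
  "aggs": {
    "myterms": {
      "terms": {
        "field": "source",
        "size": 100
      }
    }
  }
}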
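The grouping that a terms aggregation performs can be imitated in a few lines of Python. This is a sketch of the idea only: terms_buckets is a made-up helper, the documents are invented, and the ordering (largest doc_count first, ties broken by key) follows the default described above.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List


def terms_buckets(docs: Iterable[Dict[str, Any]], field: str, size: int = 10) -> List[Dict[str, Any]]:
    """Group documents by a field and return the largest buckets first."""
    counts = Counter(doc[field] for doc in docs if field in doc)
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [{"key": key, "doc_count": n} for key, n in ordered[:size]]


# Invented documents for illustration.
docs = [
    {"id": 1, "source": "site-a"},
    {"id": 2, "source": "site-b"},
    {"id": 3, "source": "site-a"},
    {"id": 4},  # no source field, so it lands in no bucket
]
buckets = terms_buckets(docs, "source", size=100)
print(buckets)
```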

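3.4 Combined aggregations

GET /news/_search
{
  "size": 0,
  "aggs": {
    "myterms": {
      "terms": {
        "field": "source",
        "size": 100
      },
      "aggs": {
        "replay": {
          "terms": {
            "field": "replyCount",
            "size": 10
          }
        },
        "avg_price": {
          "avg": {
            "field": "voteCount"
          }
        }
      }
    }
  }
}

The request above first buckets by source. Within each source bucket it then buckets again by replyCount and also computes the average voteCount for that source.

Each bucket in the response carries its key and doc_count, along with the nested replay buckets and the avg_price value.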
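A sketch of how such a nested response could be consumed in Python follows. The sample dict is made-up data that only mirrors the shape ES returns for this request (buckets with key, doc_count, and the sub-aggregation results). It is not real output from the news index, and flatten is a hypothetical helper.

```python
from typing import Any, Dict, List


def flatten(resp: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Turn the nested myterms buckets into one flat row per source."""
    rows = []
    for bucket in resp["aggregations"]["myterms"]["buckets"]:
        rows.append({
            "source": bucket["key"],
            "articles": bucket["doc_count"],
            "avg_vote": bucket["avg_price"]["value"],
            # replyCount value -> number of articles with that value
            "reply_counts": {
                sub["key"]: sub["doc_count"] for sub in bucket["replay"]["buckets"]
            },
        })
    return rows


# Made-up response fragment, shaped like the aggregation result.
sample = {
    "aggregations": {
        "myterms": {
            "buckets": [
                {
                    "key": "site-a",
                    "doc_count": 3,
                    "replay": {
                        "buckets": [
                            {"key": 0, "doc_count": 2},
                            {"key": 12, "doc_count": 1},
                        ]
                    },
                    "avg_price": {"value": 4.0},
                }
            ]
        }
    }
}
rows = flatten(sample)
print(rows)
```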

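4. Combining queries and aggregations

Queries and aggregations can be combined so that the aggregation runs over the query results. For example, to see which source sites the news mentioning 体育 (sports) in its summary comes from:

GET /news/_search
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {"match": {"summary": "体育"}}
      ]
    }
  },
  "aggs": {
    "cate": {
      "terms": {
        "field": "source"
      }
    }
  }
}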
