elasticsearch中数据聚合问题

栏目: 后端 · 发布时间: 6年前

内容简介:由于项目中最近用到了elasticsearch,并且用到elasticsearch的聚合(Aggregation)功能,就深入研究了一下,elasticsearch中的聚合主要有四种:Bucketing Aggregation、Metric Aggregation、Matrix Aggregation和Pipeline Aggregation。请求示例:返回结果:

由于项目中最近用到了elasticsearch,并且用到elasticsearch的聚合(Aggregation)功能,就深入研究了一下,elasticsearch中的聚合主要有四种:Bucketing Aggregation、Metric Aggregation、Matrix Aggregation和Pipeline Aggregation。

聚合的基本结构

"aggregations" : {    
    "<aggregation_name>" : {    --用户自己起的名字
        "<aggregation_type>" : {      --聚合类型,如avg, sum
            <aggregation_body>      -- 针对的字段
        }
        [,"meta" : {  [<meta_data_body>] } ]?
        [,"aggregations" : { [<sub_aggregation>]+ } ]?    --聚合里面可以嵌套聚合
    }
    [,"<aggregation_name_2>" : { ... } ]*
}

Metric Aggregation

Avg Aggregation--计算平均值

请求示例:

GET /endpoint_avg/_search
{
  "size": 0,
  "aggs": {
    "avg_value": {
      "avg": {"field": "value"}
    }
  }
}

返回结果:

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 315,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "avg_value" : {
      "value" : 342.84761904761905
    }
  }
}

之前看到其他博客上有说search_type=count可以只返回aggregation部分的结果,但我在7.x版本中试了下,好像不行,这边只能通过将size设为0来隐藏掉除了统计数据以外的数据。

Cardinality Aggregation--去重(相当于 mysql 中的distinct)

请求示例:

GET /endpoint_avg/_search
{
  "size": 0,
  "aggs": {
    "avg_value": {
      "cardinality": {"field": "service_id"}
    }
  }
}

返回结果:

{
  "took" : 11,
  "timed_out" : false,
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 317,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "avg_value" : {
      "value" : 2
    }
  }
}

Extended Status Aggragation--获取某个字段的所有统计信息(包括平均值,最大/小值....)

请求示例:

GET /endpoint_avg/_search
{
  "size": 0,
  "aggs": {
    "avg_status": {
      "extended_stats": {
        "field": "value"
      }
    }
  }
}

返回结果:

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 326,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "avg_status" : {
      "count" : 326,       // 数量
      "min" : 2.0,         // 最小值  
      "max" : 2481.0,      // 最大值 
      "avg" : 347.63803680981596,    // 均值
      "sum" : 113330.0,              // 和
      "sum_of_squares" : 1.02303634E8,
      "variance" : 192962.62358387595,
      "std_deviation" : 439.275111500613,
      "std_deviation_bounds" : {
        "upper" : 1226.188259811042,
        "lower" : -530.91218619141
      }
    }
  }
}

Max Aggregation--求最大值

请求示例:

GET /endpoint_avg/_search
{
  "size": 0,
  "aggs": {
    "max_value": {
      "max": {
        "field": "value"
      }
    }
  }
}

返回结果:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 352,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "max_value" : {
      "value" : 2481.0
    }
  }
}

Min Aggreegation--计算最小值

请求示例:

GET /endpoint_avg/_search
{
  "size": 0,
  "aggs": {
    "min_value": {
      "min": {
        "field": "value"
      }
    }
  }
}

返回结果:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 352,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "min_value" : {
      "value" : 2.0
    }
  }
}

Percentiles Aggregation -- 百分比统计,按照[ 1, 5, 25, 50, 75, 95, 99 ]来统计

请求示例:

GET /endpoint_avg/_search
{
  "size": 0,
  "aggs": {
    "value_outlier": {
      "percentiles": {
        "field": "value"
      }
    }
  }
}

返回结果:

{
  "took" : 44,
  "timed_out" : false,
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 334,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "value_outlier" : {
      "values" : {
        "1.0" : 4.0,
        "5.0" : 67.2,
        "25.0" : 91.33333333333333,
        "50.0" : 151.0,
        "75.0" : 420.0,
        "95.0" : 1412.4000000000005,
        "99.0" : 1906.32
      }
    }
  }
}

从返回结果可以看出来,75%的数据在420ms加载完毕。

当然我们也可以指定自己需要统计的百分比:

GET /endpoint_avg/_search
{
  "size": 0,
  "aggs": {
    "value_outlier": {
      "percentiles": {
        "field": "value",
        "percents": [95, 96, 99, 99.5]
      }
    }
  }
}

返回结果:

{
  "took" : 20,
  "timed_out" : false,
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 330,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "value_outlier" : {
      "values" : {
        "95.0" : 1366.0,
        "96.0" : 1449.8000000000002,
        "99.0" : 1906.3999999999999,
        "99.5" : 2064.400000000004
      }
    }
  }
}

Percentile Ranks Aggregation -- 统计返回内数据的百分比

GET /endpoint_avg/_search
{
  "size": 0,
  "aggs": {
    "value_range": {
      "percentile_ranks": {
        "field": "value",
        "values": [100, 200]
      }
    }
  }
}

返回结果:

{
  "took" : 10,
  "timed_out" : false,
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 346,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "value_range" : {
      "values" : {
        "100.0" : 32.51445086705203,
        "200.0" : 65.19405450041288
      }
    }
  }
}

从返回结果可以看出,在100ms左右加载完毕的占了32%, 200ms左右加载完毕的占了65%

Status Aggregation -- 状态统计

请求示例:

GET /endpoint_avg/_search
{
  "size": 0,
  "aggs": {
    "value_status": {
      "stats": {
        "field": "value"
      }
    }
  }
}

返回结果:

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 355,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "value_status" : {
      "count" : 355,
      "min" : 2.0,
      "max" : 2753.0,
      "avg" : 339.8112676056338,
      "sum" : 120633.0
    }
  }
}

可以发现跟之前的extended stats aggregation返回数据类似,只是少了一些较复杂的标准差之类的数据。

Sum Aggregation -- 求和函数

请求示例:

GET /endpoint_avg/_search
{
  "size": 0,
  "query": {"term": {
    "service_id": {
      "value": 5
    }
  }}, 
  "aggs": {
    "sum_value": {
      "sum": {
        "field": "value"
      }
    }
  }
}

返回结果:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 194,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "sum_value" : {
      "value" : 91322.0
    }
  }
}

Top Hits Aggregation -- 获取前n条数据, 可以嵌套使用

请求示例:

GET /endpoint_avg/_search
{
  "size": 0,
  "aggs": {
    "top_tags": {
      "terms": {
        "field": "service_id",
        "size": 2
      },
      "aggs": {
        "top_value": {
          "top_hits": {
            "size": 3,
            "sort": [{
              "time_bucket": {"order": "desc"}
            }]
          }
        }
      }
    }
  }
}

返回结果:

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 372,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "top_tags" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : 5,
          "doc_count" : 198,
          "top_value" : {
            "hits" : {
              "total" : 198,
              "max_score" : null,
              "hits" : [
                {
                  "_index" : "endpoint_avg",
                  "_type" : "type",
                  "_id" : "201906191621_25",
                  "_score" : null,
                  "_source" : {
                    "service_id" : 5,
                    "count" : 2,
                    "time_bucket" : 201906191621,
                    "service_instance_id" : 250,
                    "entity_id" : "25",
                    "value" : 149,
                    "summation" : 299
                  },
                  "sort" : [
                    201906191621
                  ]
                },
                {
                  "_index" : "endpoint_avg",
                  "_type" : "type",
                  "_id" : "201906191620_24",
                  "_score" : null,
                  "_source" : {
                    "service_id" : 5,
                    "count" : 1,
                    "time_bucket" : 201906191620,
                    "service_instance_id" : 250,
                    "entity_id" : "24",
                    "value" : 93,
                    "summation" : 93
                  },
                  "sort" : [
                    201906191620
                  ]
                },
                {
                  "_index" : "endpoint_avg",
                  "_type" : "type",
                  "_id" : "201906191620_37",
                  "_score" : null,
                  "_source" : {
                    "service_id" : 5,
                    "count" : 1,
                    "time_bucket" : 201906191620,
                    "service_instance_id" : 250,
                    "entity_id" : "37",
                    "value" : 122,
                    "summation" : 122
                  },
                  "sort" : [
                    201906191620
                  ]
                }
              ]
            }
          }
        },
        {
          "key" : 3,
          "doc_count" : 174,
          "top_value" : {
            "hits" : {
              "total" : 174,
              "max_score" : null,
              "hits" : [
                {
                  "_index" : "endpoint_avg",
                  "_type" : "type",
                  "_id" : "201906191621_144",
                  "_score" : null,
                  "_source" : {
                    "service_id" : 3,
                    "count" : 1,
                    "time_bucket" : 201906191621,
                    "service_instance_id" : 238,
                    "entity_id" : "144",
                    "value" : 93,
                    "summation" : 93
                  },
                  "sort" : [
                    201906191621
                  ]
                },
                {
                  "_index" : "endpoint_avg",
                  "_type" : "type",
                  "_id" : "201906191620_70",
                  "_score" : null,
                  "_source" : {
                    "service_id" : 3,
                    "count" : 1,
                    "time_bucket" : 201906191620,
                    "service_instance_id" : 238,
                    "entity_id" : "70",
                    "value" : 192,
                    "summation" : 192
                  },
                  "sort" : [
                    201906191620
                  ]
                },
                {
                  "_index" : "endpoint_avg",
                  "_type" : "type",
                  "_id" : "201906191620_18",
                  "_score" : null,
                  "_source" : {
                    "service_id" : 3,
                    "count" : 2,
                    "time_bucket" : 201906191620,
                    "service_instance_id" : 238,
                    "entity_id" : "18",
                    "value" : 81,
                    "summation" : 162
                  },
                  "sort" : [
                    201906191620
                  ]
                }
              ]
            }
          }
        }
      ]
    }
  }
}

Value Count Aggregation--统计不同值的数量

请求示例:

GET /endpoint_avg/_search
{
  "size": 2, 
  "aggs": {
    "value_count": {
      "value_count": {
        "field": "value"
      }
    }
  }
}

返回结果:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 357,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "endpoint_avg",
        "_type" : "type",
        "_id" : "201906191457_16",
        "_score" : 1.0,
        "_source" : {
          "service_id" : 3,
          "count" : 1,
          "time_bucket" : 201906191457,
          "service_instance_id" : 238,
          "entity_id" : "16",
          "value" : 129,
          "summation" : 129
        }
      },
      {
        "_index" : "endpoint_avg",
        "_type" : "type",
        "_id" : "201906191503_691",
        "_score" : 1.0,
        "_source" : {
          "service_id" : 5,
          "count" : 2,
          "time_bucket" : 201906191503,
          "service_instance_id" : 250,
          "entity_id" : "691",
          "value" : 178,
          "summation" : 357
        }
      }
    ]
  },
  "aggregations" : {
    "value_count" : {
      "value" : 357
    }
  }
}

基本metrics中常用的聚合函数就这几种,今天太累了,其他三类的聚合后续再做研究吧!


以上就是本文的全部内容,希望本文的内容对大家的学习或者工作能带来一定的帮助,也希望大家多多支持 码农网

查看所有标签

猜你喜欢:

本站部分资源来源于网络,本站转载出于传递更多信息之目的,版权归原作者或者来源机构所有,如转载稿涉及版权问题,请联系我们

Java Message Service API Tutorial and Reference

Java Message Service API Tutorial and Reference

Hapner, Mark; Burridge, Rich; Sharma, Rahul / 2002-2 / $ 56.49

Java Message Service (JMS) represents a powerful solution for communicating between Java enterprise applications, software components, and legacy systems. In this authoritative tutorial and comprehens......一起来看看 《Java Message Service API Tutorial and Reference》 这本书的介绍吧!

RGB转16进制工具
RGB转16进制工具

RGB HEX 互转工具

Markdown 在线编辑器
Markdown 在线编辑器

Markdown 在线编辑器

HEX CMYK 转换工具
HEX CMYK 转换工具

HEX CMYK 互转工具