Elasticsearch: A Worked Example

Category: Servers · Published: 6 years ago

This post picks up where "Elasticsearch: Setup" left off and shows how to build a search feature similar to Nuomi's.

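There are two ways to store merchants and their products: nested documents (Nested) and parent-child documents (Parent-Child). After some exploration and a few dead ends, I settled on Nested.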

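I have also kept my notes from exploring parent-child documents; if you are interested, see the struck-through paragraphs at the end of this post.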

Nested

Creating the type

Products are nested inside the merchant document. The type is created as follows:

<?php

require_once __DIR__ . "/vendor/autoload.php";

// Client
$client = Elasticsearch\ClientBuilder::fromConfig([
    'hosts' => ['localhost:9200', 'localhost:9201', 'localhost:9203'], // ideally, put an HAProxy reverse proxy in front of the ES cluster
    'retries' => 2
]);

// Create the merchant type
$indices = $client->indices();
// Delete the old basic index first (skipped when it does not exist yet, so the first run does not fail)
if ($indices->exists(['index' => 'basic'])) {
    $indices->delete(['index' => 'basic']);
}
// Create the basic index and define the merchant type mapping at the same time
$indices->create([
    'index' => 'basic',
    'body' => [
        // Index settings
        'settings' => [
            "number_of_shards" => 3,    // 3 shards
            "number_of_replicas" => 2,  // each shard has 1 primary copy and 2 replica copies
        ],
        // Type mappings
        'mappings' => [
            // Merchant
            'merchant' => [
                // Properties
                'properties' => [
                    // Merchant name
                    'merchant_name' => [
                        'type' => 'string', // string
                        'index' => 'analyzed', // full-text indexed
                        'analyzer' => 'ik_max_word', // Chinese word segmentation
                    ],
                    // Merchant image
                    'merchant_img' => [
                        'type' => 'string', // string
                        'index' => 'no', // not indexed
                    ],
                    // Merchant category
                    'merchant_type' => [
                        'type' => 'string', // string
                        'index' => 'not_analyzed', // indexed as-is, without analysis
                    ],
                    // User rating
                    'merchant_score' => [
                        'type' => 'integer', // integer
                        'index' => 'not_analyzed', // indexed as-is, used for filtering/sorting
                    ],
                    // Average price per person
                    'merchant_avg_price' => [
                        'type' => 'integer', // integer
                        'index' => 'not_analyzed', // indexed as-is, used for filtering/sorting
                    ],
                    // Geographic coordinates
                    'merchant_location' => [
                        'type' => 'geo_point', // geo point
                    ],
                    // Nested product list
                    'merchant_product' => [
                        'type' => 'nested', // nested documents
                        'properties' => [
                            // Product ID
                            'product_id' => [
                                'type' => 'long', // long integer
                                'index' => 'not_analyzed', // indexed as-is, without analysis
                            ],
                            // Product name
                            'product_name' => [
                                'type' => 'string', // string
                                'index' => 'analyzed', // full-text indexed
                                'analyzer' => 'ik_max_word', // Chinese word segmentation
                            ],
                            // Product image
                            'product_img' => [
                                'type' => 'string', // string
                                'index' => 'no', // not indexed
                            ],
                            // Product category
                            'product_type' => [
                                'type' => 'string', // string
                                'index' => 'not_analyzed', // indexed as-is, without analysis
                            ],
                            // Product price
                            'product_price' => [
                                'type' => 'integer', // integer
                                'index' => 'not_analyzed', // indexed as-is, used for filtering/sorting
                            ],
                            // Units sold
                            'product_sold' => [
                                'type' => 'integer', // integer
                                'index' => 'not_analyzed', // indexed as-is, used for sorting/filtering
                            ]
                        ]
                    ]
                ]
            ],
        ]
    ],
]);

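As you can see, the product list is stored as a property of the merchant (type=nested), and one merchant has many products. (See the reference documentation.)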
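For background on why the mapping uses type=nested rather than a plain array of objects: ES flattens an array of plain objects into one multi-valued field per property, so the link between one product's name and its price is lost. The sketch below only simulates that difference in plain PHP and never talks to ES. The helper names flattenProducts, matchesFlattened and matchesNested are hypothetical, and exact comparison stands in for real full-text matching.

```php
<?php

/** Simulate how a non-nested object array is indexed: one value list per field. */
function flattenProducts(array $products): array
{
    $flat = [];
    foreach ($products as $product) {
        foreach ($product as $field => $value) {
            $flat[$field][] = $value;
        }
    }
    return $flat;
}

/** Flattened view: the name and the price may come from different products. */
function matchesFlattened(array $products, string $name, int $price): bool
{
    $flat = flattenProducts($products);
    return in_array($name, $flat['product_name'] ?? [], true)
        && in_array($price, $flat['product_price'] ?? [], true);
}

/** Nested view: both conditions must hold within the same product. */
function matchesNested(array $products, string $name, int $price): bool
{
    foreach ($products as $product) {
        if ($product['product_name'] === $name && $product['product_price'] === $price) {
            return true;
        }
    }
    return false;
}

$products = [
    ['product_name' => '羊肉烩面', 'product_price' => 2200],
    ['product_name' => '烤羊肉串', 'product_price' => 2300],
];

// No single product is "羊肉烩面" priced at 2300.
var_dump(matchesFlattened($products, '羊肉烩面', 2300)); // false positive in the flattened view
var_dump(matchesNested($products, '羊肉烩面', 2300));
```

The flattened check succeeds even though no such product exists, while the nested check correctly rejects it; that per-product isolation is what the nested type provides.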

You can call the API to inspect the type that was just created:

[work@df6c675da97e nuomi-search]$ curl localhost:9200/basic?pretty
{
  "basic" : {
    "aliases" : { },
    "mappings" : {
      "merchant" : {
        "properties" : {
          "merchant_avg_price" : {
            "type" : "integer"
          },
          "merchant_img" : {
            "type" : "keyword",
            "index" : false
          },
          "merchant_location" : {
            "type" : "geo_point"
          },
          "merchant_name" : {
            "type" : "text",
            "analyzer" : "ik_max_word"
          },
          "merchant_product" : {
            "type" : "nested",
            "properties" : {
              "product_id" : {
                "type" : "long"
              },
              "product_img" : {
                "type" : "keyword",
                "index" : false
              },
              "product_name" : {
                "type" : "text",
                "analyzer" : "ik_max_word"
              },
              "product_price" : {
                "type" : "integer"
              },
              "product_sold" : {
                "type" : "integer"
              },
              "product_type" : {
                "type" : "keyword"
              }
            }
          },
          "merchant_score" : {
            "type" : "integer"
          },
          "merchant_type" : {
            "type" : "keyword"
          }
        }
      }
    },
    "settings" : {
      "index" : {
        "creation_date" : "1489890529746",
        "number_of_shards" : "3",
        "number_of_replicas" : "2",
        "uuid" : "sY5hH9kqQLq2mmZyW9HQmA",
        "version" : {
          "created" : "5020299"
        },
        "provided_name" : "basic"
      }
    }
  }
}

As expected, the merchant "table" has been created, with the list of products that belong to it nested inside.

Loading data

We insert the test data in one batch through the bulk API:

<?php

require_once __DIR__ . "/vendor/autoload.php";

// Client
$client = Elasticsearch\ClientBuilder::fromConfig([
    'hosts' => ['localhost:9200', 'localhost:9201', 'localhost:9203'], // ideally, put an HAProxy reverse proxy in front of the ES cluster
    'retries' => 2
]);

// Bulk-insert the test data
$client->bulk([
    'index' => 'basic',
    'type' => 'merchant',
    'body' => [
        // "index" action; its metadata is ['_id' => 1]
        ['index' => ['_id' => 1]],  // _id is the merchant ID (usually taken from MySQL)
        // Request body
        [  // Parent document
            'merchant_name' => '鑫明明拉面',
            'merchant_score' => 4,
            'merchant_type' => '美食',
            'merchant_img' => 'http://merchant.com/1.jpg',
            'merchant_avg_price' => 2100,
            'merchant_location' => [120.3945890000, 36.0705170000], // geo_point in array form is [lon, lat]
            'merchant_product' => [ // List of nested documents
                [
                    'product_id' => 1,
                    'product_name' => '羊肉烩面',
                    'product_type' => '面食',
                    'product_img' => 'http://product.com/2.jpg',
                    'product_sold' => 11,
                    'product_price' => 2200,
                ],
                [
                    'product_id' => 2,
                    'product_name' => '烤羊肉串',
                    'product_type' => '烤串',
                    'product_img' => 'http://product.com/3.jpg',
                    'product_sold' => 12,
                    'product_price' => 2300,
                ],
            ]
        ],
 
        ['index' => ['_id' => 2]],
        [
            'merchant_name' => '东方宫兰州拉面',
            'merchant_score' => 3,
            'merchant_type' => '美食',
            'merchant_img' => 'http://merchant.com/2.jpg',
            'merchant_avg_price' => 1800,
            'merchant_location' => [120.3928290000, 36.0693500000], // [lon, lat], same order as the other merchants
            'merchant_product' => [
                [
                    'product_id' => 3,
                    'product_name' => '牛肉炒面',
                    'product_type' => '面食',
                    'product_img' => 'http://product.com/4.jpg',
                    'product_sold' => 10,
                    'product_price' => 2400,
                ],
                [
                    'product_id' => 4,
                    'product_name' => '蛋炒饭',
                    'product_type' => '主食',
                    'product_img' => 'http://product.com/5.jpg',
                    'product_sold' => 14,
                    'product_price' => 2300,
                ],
                [
                    'product_id' => 5,
                    'product_name' => '羊肉汤',
                    'product_type' => '汤粉',
                    'product_img' => 'http://product.com/6.jpg',
                    'product_sold' => 10,
                    'product_price' => 2200,
                ],
            ]
        ],
 
        ['index' => ['_id' => 3]],
        [
            'merchant_name' => '开海饭店',
            'merchant_score' => 3,
            'merchant_type' => '美食',
            'merchant_img' => 'http://merchant.com/3.jpg',
            'merchant_avg_price' => 3500,
            'merchant_location' => [120.4051170000, 36.0683000000],
            'merchant_product' => [
                [
                    'product_id' => 6,
                    'product_name' => '海鲜炒饭',
                    'product_type' => '主食',
                    'product_img' => 'http://product.com/7.jpg',
                    'product_sold' => 10,
                    'product_price' => 2400,
                ],
                [
                    'product_id' => 7,
                    'product_name' => '西红柿鸡蛋面',
                    'product_type' => '面食',
                    'product_img' => 'http://product.com/8.jpg',
                    'product_sold' => 10,
                    'product_price' => 2300,
                ],
                [
                    'product_id' => 8,
                    'product_name' => '鸭血粉丝汤',
                    'product_type' => '汤粉',
                    'product_img' => 'http://product.com/9.jpg',
                    'product_sold' => 10,
                    'product_price' => 2200,
                ],
                [
                    'product_id' => 9,
                    'product_name' => '兰州炒饭',
                    'product_type' => '主食',
                    'product_img' => 'http://product.com/10.jpg',
                    'product_sold' => 15,
                    'product_price' => 2500,
                ],
            ]
        ],
    ]
]);

Here the ES document ID directly stores the merchant's unique ID, while each nested product's ID is stored in a dedicated product_id field.
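The bulk body above alternates an action line with a document line. When the merchants come from a database, that pairing can be generated instead of written by hand. Below is a minimal sketch, assuming each merchant row carries its ID under an 'id' key; both that key and the helper name buildBulkBody are hypothetical and not part of the original code.

```php
<?php

/**
 * Build the body of a bulk "index" request from a list of merchants.
 * The (hypothetical) 'id' key becomes the ES _id in the action line
 * and is left out of the document itself.
 */
function buildBulkBody(array $merchants): array
{
    $body = [];
    foreach ($merchants as $merchant) {
        $id = $merchant['id'];
        unset($merchant['id']);
        $body[] = ['index' => ['_id' => $id]]; // action line
        $body[] = $merchant;                    // document, nested products included
    }
    return $body;
}

$body = buildBulkBody([
    ['id' => 1, 'merchant_name' => '鑫明明拉面', 'merchant_product' => [
        ['product_id' => 1, 'product_name' => '羊肉烩面'],
    ]],
    ['id' => 2, 'merchant_name' => '东方宫兰州拉面', 'merchant_product' => []],
]);

// The result can then be passed to the client as before:
// $client->bulk(['index' => 'basic', 'type' => 'merchant', 'body' => $body]);
```

Keeping the merchant ID out of the document mirrors the choice described above: the ES _id identifies the merchant, and only products carry an explicit ID field.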

Nested query

Using a nested query, find the merchants whose name, or the name of a product they sell, matches the search keyword:

<?php

require_once __DIR__ . "/vendor/autoload.php";

// Client
$client = Elasticsearch\ClientBuilder::fromConfig([
    'hosts' => ['localhost:9200', 'localhost:9201', 'localhost:9203'], // ideally, put an HAProxy reverse proxy in front of the ES cluster
    'retries' => 2
]);

// Search keyword
$keyword = '东方宫拉面';

// Nested query
$result = $client->search([
    'index' => 'basic', // index (think: database)
    'type' => 'merchant',  // type (think: table)
    'body' => [ // query body
        'query' => [
            // Query context: affects relevance scoring
            'bool' => [ // boolean combination
                'should' => [ // the clauses are OR-ed together
                    // Clause 1
                    [
                        // Full-text match
                        'match' => ['merchant_name' => $keyword], // merchant name
                    ],
                    // Clause 2
                    [
                        // Nested query
                        'nested' => [
                            'path' => 'merchant_product', // path of the nested documents
                            'score_mode' => 'max', // how child scores are combined (max takes the score of the best-matching child)
                            'query' => [ // query on the nested documents, affects relevance scoring
                                'match' => [ // full-text match
                                    'merchant_product.product_name' => $keyword, // product name (the full path is required)
                                ]
                            ]
                        ]
                    ]
                ]
            ]
        ],
    ]
]);

print_r($result);

Let's break this query down:

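  • query: required.
  • bool: a compound query.
  • should: means OR. It holds two clauses; the scores of the clauses that match are added up to give the merchant's overall relevance (the classic TF/IDF scoring described in the ES guide additionally scales that sum by the fraction of clauses that matched).
  • match: matches merchant_name against the keyword and yields the relevance of the first clause.
  • nested: a sub-query over the list of nested documents; this is the second clause.
  • path: the path of the nested documents.
  • score_mode: max takes the highest relevance among the nested documents (the product list) as the relevance of the second clause.
  • query: required.
  • match: matches product_name against the keyword and yields the relevance of each product.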
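The arithmetic in the breakdown above can be sketched in plain PHP. This is only an illustration under stated assumptions: the should clauses are combined by plain summation (no coordination factor), the scores are made-up numbers, and the helper names nestedClauseScore and shouldScore are hypothetical. The score_mode values handled are the ones the nested query accepts (avg, sum, min, max, none).

```php
<?php

/**
 * Combine the scores of the matching nested children according to score_mode.
 * Returns null when no child matched, i.e. the nested clause itself does not match.
 */
function nestedClauseScore(array $childScores, string $scoreMode): ?float
{
    if ($childScores === []) {
        return null;
    }
    switch ($scoreMode) {
        case 'max':
            return (float) max($childScores);
        case 'min':
            return (float) min($childScores);
        case 'sum':
            return (float) array_sum($childScores);
        case 'avg':
            return array_sum($childScores) / count($childScores);
        case 'none':
            return 0.0;
        default:
            throw new InvalidArgumentException("Unknown score_mode: {$scoreMode}");
    }
}

/** Sum the scores of the should clauses that matched (null means "did not match"). */
function shouldScore(array $clauseScores): float
{
    return (float) array_sum(array_filter($clauseScores, function ($score) {
        return $score !== null;
    }));
}

// Made-up scores, for illustration only.
$nameScore = 1.2;            // match on merchant_name
$productScores = [0.4, 0.9]; // match on each nested product_name

$total = shouldScore([$nameScore, nestedClauseScore($productScores, 'max')]);
printf("%.2f\n", $total);
```

With score_mode set to max, only the best-matching product contributes to the merchant's score, so a merchant with many weakly matching products does not outrank one with a single strong match.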

Run the query with the keyword "东方宫拉面":

[work@df6c675da97e nuomi-search]$ php main.php
Array
(
    [took] => 21
    [timed_out] =>
    [_shards] => Array
        (
            [total] => 3
            [successful] => 3
            [failed] => 0
        )
 
    [hits] => Array
        (
            [total] => 3
            [max_score] => 2.5505729
            [hits] => Array
                (
                    [0] => Array
                        (
                            [_index] => basic
                            [_type] => merchant
                            [_id] => 1
                            [_score] => 2.5505729
                            [_source] => Array
                                (
                                    [merchant_name] => 鑫明明拉面
                                    [merchant_score] => 4
                                    [merchant_type] => 美食
                                    [merchant_img] => http://merchant.com/1.jpg
                                    [merchant_avg_price] => 2100
                                    [merchant_location] => Array
                                        (
                                            [0] => 127
                                            [1] => 128
                                        )
 
                                    [merchant_product] => Array
                                        (
                                            [0] => Array
                                                (
                                                    [product_id] => 1
                                                    [product_name] => 羊肉烩面
                                                    [product_type] => 面食
                                                    [product_img] => http://product.com/2.jpg
                                                    [product_sold] => 11
                                                    [product_price] => 2200
                                                )
 
                                            [1] => Array
                                                (
                                                    [product_id] => 2
                                                    [product_name] => 烤羊肉串
                                                    [product_type] => 烤串
                                                    [product_img] => http://product.com/3.jpg
                                                    [product_sold] => 12
                                                    [product_price] => 2300
                                                )
 
                                        )
 
                                )
 
                        )
 
                    [1] => Array
                        (
                            [_index] => basic
                            [_type] => merchant
                            [_id] => 2
                            [_score] => 2.0315127
                            [_source] => Array
                                (
                                    [merchant_name] => 东方宫兰州拉面
                                    [merchant_score] => 3
                                    [merchant_type] => 美食
                                    [merchant_img] => http://merchant.com/2.jpg
                                    [merchant_avg_price] => 1800
                                    [merchant_location] => Array
                                        (
                                            [0] => 120
                                            [1] => 120
                                        )
 
                                    [merchant_product] => Array
                                        (
                                            [0] => Array
                                                (
                                                    [product_id] => 3
                                                    [product_name] => 牛肉炒面
                                                    [product_type] => 面食
                                                    [product_img] => http://product.com/4.jpg
                                                    [product_sold] => 10
                                                    [product_price] => 2400
                                                )
 
                                            [1] => Array
                                                (
                                                    [product_id] => 4
                                                    [product_name] => 蛋炒饭
                                                    [product_type] => 主食
                                                    [product_img] => http://product.com/5.jpg
                                                    [product_sold] => 14
                                                    [product_price] => 2300
                                                )
 
                                            [2] => Array
                                                (
                                                    [product_id] => 5
                                                    [product_name] => 羊肉汤
                                                    [product_type] => 汤粉
                                                    [product_img] => http://product.com/6.jpg
                                                    [product_sold] => 10
                                                    [product_price] => 2200
                                                )
 
                                        )
 
                                )
 
                        )
 
                    [2] => Array
                        (
                            [_index] => basic
                            [_type] => merchant
                            [_id] => 3
                            [_score] => 1.0982643
                            [_source] => Array
                                (
                                    [merchant_name] => 开海饭店
                                    [merchant_score] => 3
                                    [merchant_type] => 美食
                                    [merchant_img] => http://merchant.com/3.jpg
                                    [merchant_avg_price] => 3500
                                    [merchant_location] => Array
                                        (
                                            [0] => 50
                                            [1] => 50
                                        )
 
                                    [merchant_product] => Array
                                        (
                                            [0] => Array
                                                (
                                                    [product_id] => 6
                                                    [product_name] => 海鲜炒饭
                                                    [product_type] => 主食
                                                    [product_img] => http://product.com/7.jpg
                                                    [product_sold] => 10
                                                    [product_price] => 2400
                                                )
 
                                            [1] => Array
                                                (
                                                    [product_id] => 7
                                                    [product_name] => 西红柿鸡蛋面
                                                    [product_type] => 面食
                                                    [product_img] => http://product.com/8.jpg
                                                    [product_sold] => 10
                                                    [product_price] => 2300
                                                )
 
                                            [2] => Array
                                                (
                                                    [product_id] => 8
                                                    [product_name] => 鸭血粉丝汤
                                                    [product_type] => 汤粉
                                                    [product_img] => http://product.com/9.jpg
                                                    [product_sold] => 10
                                                    [product_price] => 2200
                                                )
 
                                            [3] => Array
                                                (
                                                    [product_id] => 9
                                                    [product_name] => 兰州炒饭
                                                    [product_type] => 主食
                                                    [product_img] => http://product.com/10.jpg
                                                    [product_sold] => 15
                                                    [product_price] => 2500
                                                )
 
                                        )
 
                                )
 
                        )
 
                )
 
        )
 
)
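The raw print_r dump is long, and only the order and the scores matter here. A small helper can reduce a response to one row per hit. This is a sketch: summarizeHits is a hypothetical name, and the sample array is trimmed down from the output above to the three fields the helper reads.

```php
<?php

/** Reduce a search response to one id / name / score row per hit. */
function summarizeHits(array $result): array
{
    $rows = [];
    foreach ($result['hits']['hits'] ?? [] as $hit) {
        $rows[] = [
            'id' => $hit['_id'],
            'name' => $hit['_source']['merchant_name'] ?? null,
            'score' => $hit['_score'],
        ];
    }
    return $rows;
}

// Trimmed-down copy of the response printed above.
$result = ['hits' => ['hits' => [
    ['_id' => '1', '_score' => 2.5505729, '_source' => ['merchant_name' => '鑫明明拉面']],
    ['_id' => '2', '_score' => 2.0315127, '_source' => ['merchant_name' => '东方宫兰州拉面']],
    ['_id' => '3', '_score' => 1.0982643, '_source' => ['merchant_name' => '开海饭店']],
]]];

foreach (summarizeHits($result) as $row) {
    printf("%s\t%s\t%.7f\n", $row['id'], $row['name'], $row['score']);
}
```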

As a user, I expected "东方宫兰州拉面" to be the best match, yet "鑫明明拉面" scored higher than "东方宫兰州拉面". What is going on?

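The reason is that relevance is computed with term frequency / inverse document frequency (TF/IDF); the BM25 similarity that ES 5.x uses by default is a refinement of the same idea. TF measures how often a term occurs within a single document. Document frequency measures how many of the documents contain the term, and IDF is its inverse. A high TF means the term matters more within that document (for example, "ES" appears many times in this post), while a term found in almost every document (for example 的, 了, 吧) is ordinary and unimportant, so its IDF is low. The higher TF multiplied by IDF, the more closely the term is tied to the document, that is, the higher the relevance.

A search engine's job is to return the most relevant documents first, which is why ES uses TF/IDF for retrieval. Back to the problem: "东方宫拉面" really should be more relevant to the shop "东方宫兰州拉面". However, I inserted only 3 merchant documents and the merchant index is split into 3 shards, so "东方宫兰州拉面" ended up alone on one shard, while "开海饭店" and "鑫明明拉面" share another shard.

By default, ES computes IDF from the term statistics of each shard separately. Strictly speaking, IDF should be based on the totals across all shards, but with a large dataset the statistics are roughly the same on every shard, so the default rarely needs changing.

The shard holding "东方宫兰州拉面" contains only that one document, so every term in it appears in 100% of the shard's documents and gets the lowest possible IDF. "开海饭店" contains none of the terms in the keyword "东方宫拉面", whereas "鑫明明拉面" contains "拉面"; on that shard "拉面" appears in only one of two documents, so its IDF is higher. As a result, TF × IDF of "拉面" for "鑫明明拉面" is larger than for "东方宫兰州拉面", which produced the unexpected ranking.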
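To put numbers on this, the sketch below evaluates the IDF of "拉面" in merchant_name per shard and again over the pooled statistics of all shards. It assumes the shard layout described above and the BM25 form of IDF used by Lucene, ln(1 + (N - n + 0.5) / (n + 0.5)); whether this index actually scored with BM25 rather than classic TF/IDF is an assumption based on the ES 5.x default. The helper name bm25Idf is hypothetical.

```php
<?php

/**
 * BM25-style IDF: ln(1 + (N - n + 0.5) / (n + 0.5)),
 * where N is the number of documents in scope and n the number containing the term.
 */
function bm25Idf(int $docCount, int $docFreq): float
{
    return log(1 + ($docCount - $docFreq + 0.5) / ($docFreq + 0.5));
}

// Statistics for the term "拉面" in merchant_name, per the shard layout described above.
$shards = [
    'shard with 东方宫兰州拉面 only'       => ['docs' => 1, 'docFreq' => 1],
    'shard with 鑫明明拉面 and 开海饭店' => ['docs' => 2, 'docFreq' => 1],
];

foreach ($shards as $label => $stats) {
    printf("%s: idf = %.4f\n", $label, bm25Idf($stats['docs'], $stats['docFreq']));
}

// Pooled statistics: what every shard would use if term statistics were shared.
$totalDocs = array_sum(array_column($shards, 'docs'));
$totalFreq = array_sum(array_column($shards, 'docFreq'));
printf("all shards pooled: idf = %.4f\n", bm25Idf($totalDocs, $totalFreq));
```

On the two-document shard the term gets a noticeably higher IDF than on the single-document shard, which is the imbalance that pushed "鑫明明拉面" to the top; with pooled statistics both merchants are scored against the same IDF.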

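During development the dataset is small, so we change the default IDF computation: each shard takes the other shards' term statistics into account when scoring, which gives a more accurate result. Adding the search_type parameter does this: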

<?php

require_once __DIR__ . "/vendor/autoload.php";

// Client
$client = Elasticsearch\ClientBuilder::fromConfig([
    'hosts' => ['localhost:9200', 'localhost:9201', 'localhost:9203'], // ideally, put an HAProxy reverse proxy in front of the ES cluster
    'retries' => 2
]);

// Search keyword
$keyword = '东方宫拉面';

// Nested query
$result = $client->search([
    'index' => 'basic', // index (think: database)
    'type' => 'merchant',  // type (think: table)
    'search_type' => 'dfs_query_then_fetch',  // pool term statistics across shards before scoring
    'body' => [ // query body
        'query' => [
            // Query context: affects relevance scoring
            'bool' => [ // boolean combination
                'should' => [ // the clauses are OR-ed together
                    // Clause 1
                    [
                        // Full-text match
                        'match' => ['merchant_name' => $keyword], // merchant name
                    ],
                    // Clause 2
                    [
                        // Nested query
                        'nested' => [
                            'path' => 'merchant_product', // path of the nested documents
                            'score_mode' => 'max', // how child scores are combined (max takes the score of the best-matching child)
                            'query' => [ // query on the nested documents, affects relevance scoring
                                'match' => [ // full-text match
                                    'merchant_product.product_name' => $keyword, // product name (the full path is required)
                                ]
                            ]
                        ]
                    ]
                ]
            ]
        ],
    ]
]);

print_r($result);

The result now matches expectations (see the further-reading documentation for more on TF/IDF):

[work@df6c675da97e nuomi-search]$ php main.php
Array
(
    [took] => 30
    [timed_out] =>
    [_shards] => Array
        (
            [total] => 3
            [successful] => 3
            [failed] => 0
        )
 
    [hits] => Array
        (
            [total] => 3
            [max_score] => 3.5293567
            [hits] => Array
                (
                    [0] => Array
                        (
                            [_index] => basic
                            [_type] => merchant
                            [_id] => 2
                            [_score] => 3.5293567
                            [_source] => Array
                                (
                                    [merchant_name] => 东方宫兰州拉面
                                    [merchant_score] => 3
                                    [merchant_type] => 美食
                                    [merchant_img] => http://merchant.com/2.jpg
                                    [merchant_avg_price] => 1800
                                    [merchant_location] => Array
                                        (
                                            [0] => 120
                                            [1] => 120
                                        )
 
                                    [merchant_product] => Array
                                        (
                                            [0] => Array
                                                (
                                                    [product_id] => 3
                                                    [product_name] => 牛肉炒面
                                                    [product_type] => 面食
                                                    [product_img] => http://product.com/4.jpg
                                                    [product_sold] => 10
                                                    [product_price] => 2400
                                                )
 
                                            [1] => Array
                                                (
                                                    [product_id] => 4
                                                    [product_name] => 蛋炒饭
                                                    [product_type] => 主食
                                                    [product_img] => http://product.com/5.jpg
                                                    [product_sold] => 14
                                                    [product_price] => 2300
                                                )
 
                                            [2] => Array
                                                (
                                                    [product_id] => 5
                                                    [product_name] => 羊肉汤
                                                    [product_type] => 汤粉
                                                    [product_img] => http://product.com/6.jpg
                                                    [product_sold] => 10
                                                    [product_price] => 2200
                                                )
 
                                        )
 
                                )
 
                        )
 
                    [1] => Array
                        (
                            [_index] => basic
                            [_type] => merchant
                            [_id] => 1
                            [_score] => 2.155528
                            [_source] => Array
                                (
                                    [merchant_name] => 鑫明明拉面
                                    [merchant_score] => 4
                                    [merchant_type] => 美食
                                    [merchant_img] => http://merchant.com/1.jpg
                                    [merchant_avg_price] => 2100
                                    [merchant_location] => Array
                                        (
                                            [0] => 127
                                            [1] => 128
                                        )
 
                                    [merchant_product] => Array
                                        (
                                            [0] => Array
                                                (
                                                    [product_id] => 1
                                                    [product_name] => 羊肉烩面
                                                    [product_type] => 面食
                                                    [product_img] => http://product.com/2.jpg
                                                    [product_sold] => 11
                                                    [product_price] => 2200
                                                )
 
                                            [1] => Array
                                                (
                                                    [product_id] => 2
                                                    [product_name] => 烤羊肉串
                                                    [product_type] => 烤串
                                                    [product_img] => http://product.com/3.jpg
                                                    [product_sold] => 12
                                                    [product_price] => 2300
                                                )
 
                                        )
 
                                )
 
                        )
 
                    [2] => Array
                        (
                            [_index] => basic
                            [_type] => merchant
                            [_id] => 3
                            [_score] => 1.1084312
                            [_source] => Array
                                (
                                    [merchant_name] => 开海饭店
                                    [merchant_score] => 3
                                    [merchant_type] => 美食
                                    [merchant_img] => http://merchant.com/3.jpg
                                    [merchant_avg_price] => 3500
                                    [merchant_location] => Array
                                        (
                                            [0] => 50
                                            [1] => 50
                                        )
 
                                    [merchant_product] => Array
                                        (
                                            [0] => Array
                                                (
                                                    [product_id] => 6
                                                    [product_name] => 海鲜炒饭
                                                    [product_type] => 主食
                                                    [product_img] => http://product.com/7.jpg
                                                    [product_sold] => 10
                                                    [product_price] => 2400
                                                )
 
                                            [1] => Array
                                                (
                                                    [product_id] => 7
                                                    [product_name] => 西红柿鸡蛋面
                                                    [product_type] => 面食
                                                    [product_img] => http://product.com/8.jpg
                                                    [product_sold] => 10
                                                    [product_price] => 2300
                                                )
 
                                            [2] => Array
                                                (
                                                    [product_id] => 8
                                                    [product_name] => 鸭血粉丝汤
                                                    [product_type] => 汤粉
                                                    [product_img] => http://product.com/9.jpg
                                                    [product_sold] => 10
                                                    [product_price] => 2200
                                                )
 
                                            [3] => Array
                                                (
                                                    [product_id] => 9
                                                    [product_name] => 兰州炒饭
                                                    [product_type] => 主食
                                                    [product_img] => http://product.com/10.jpg
                                                    [product_sold] => 15
                                                    [product_price] => 2500
                                                )
 
                                        )
 
                                )
 
                        )
 
                )
 
        )
 
)

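Best clause

The example above works well, but next I want to search for "兰州炒饭" (Lanzhou fried rice). To save effort I type only the three characters "兰炒饭". Guess what happens: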

[work@df6c675da97e nuomi-search]$ php main.php
Array
(
    [took] => 17
    [timed_out] =>
    [_shards] => Array
        (
            [total] => 3
            [successful] => 3
            [failed] => 0
        )
 
    [hits] => Array
        (
            [total] => 3
            [max_score] => 2.666227
            [hits] => Array
                (
                    [0] => Array
                        (
                            [_index] => basic
                            [_type] => merchant
                            [_id] => 2
                            [_score] => 2.666227
                            [_source] => Array
                                (
                                    [merchant_name] => 东方宫兰州拉面
                                    [merchant_score] => 3
                                    [merchant_type] => 美食
                                    [merchant_img] => http://merchant.com/2.jpg
                                    [merchant_avg_price] => 1800
                                    [merchant_location] => Array
                                        (
                                            [0] => 120
                                            [1] => 120
                                        )
 
                                    [merchant_product] => Array
                                        (
                                            [0] => Array
                                                (
                                                    [product_id] => 3
                                                    [product_name] => 牛肉炒面
                                                    [product_type] => 面食
                                                    [product_img] => http://product.com/4.jpg
                                                    [product_sold] => 10
                                                    [product_price] => 2400
                                                )
 
                                            [1] => Array
                                                (
                                                    [product_id] => 4
                                                    [product_name] => 蛋炒饭
                                                    [product_type] => 主食
                                                    [product_img] => http://product.com/5.jpg
                                                    [product_sold] => 14
                                                    [product_price] => 2300
                                                )
 
                                            [2] => Array
                                                (
                                                    [product_id] => 5
                                                    [product_name] => 羊肉汤
                                                    [product_type] => 汤粉
                                                    [product_img] => http://product.com/6.jpg
                                                    [product_sold] => 10
                                                    [product_price] => 2200
                                                )
 
                                        )
 
                                )
 
                        )
 
                    [1] => Array
                        (
                            [_index] => basic
                            [_type] => merchant
                            [_id] => 1
                            [_score] => 2.155528
                            [_source] => Array
                                (
                                    [merchant_name] => 鑫明明拉面
                                    [merchant_score] => 4
                                    [merchant_type] => 美食
                                    [merchant_img] => http://merchant.com/1.jpg
                                    [merchant_avg_price] => 2100
                                    [merchant_location] => Array
                                        (
                                            [0] => 127
                                            [1] => 128
                                        )
 
                                    [merchant_product] => Array
                                        (
                                            [0] => Array
                                                (
                                                    [product_id] => 1
                                                    [product_name] => 羊肉烩面
                                                    [product_type] => 面食
                                                    [product_img] => http://product.com/2.jpg
                                                    [product_sold] => 11
                                                    [product_price] => 2200
                                                )
 
                                            [1] => Array
                                                (
                                                    [product_id] => 2
                                                    [product_name] => 烤羊肉串
                                                    [product_type] => 烤串
                                                    [product_img] => http://product.com/3.jpg
                                                    [product_sold] => 12
                                                    [product_price] => 2300
                                                )
 
                                        )
 
                                )
 
                        )
 
                    [2] => Array
                        (
                            [_index] => basic
                            [_type] => merchant
                            [_id] => 3
                            [_score] => 1.76352
                            [_source] => Array
                                (
                                    [merchant_name] => 开海饭店
                                    [merchant_score] => 3
                                    [merchant_type] => 美食
                                    [merchant_img] => http://merchant.com/3.jpg
                                    [merchant_avg_price] => 3500
                                    [merchant_location] => Array
                                        (
                                            [0] => 50
                                            [1] => 50
                                        )
 
                                    [merchant_product] => Array
                                        (
                                            [0] => Array
                                                (
                                                    [product_id] => 6
                                                    [product_name] => 海鲜炒饭
                                                    [product_type] => 主食
                                                    [product_img] => http://product.com/7.jpg
                                                    [product_sold] => 10
                                                    [product_price] => 2400
                                                )
 
                                            [1] => Array
                                                (
                                                    [product_id] => 7
                                                    [product_name] => 西红柿鸡蛋面
                                                    [product_type] => 面食
                                                    [product_img] => http://product.com/8.jpg
                                                    [product_sold] => 10
                                                    [product_price] => 2300
                                                )
 
                                            [2] => Array
                                                (
                                                    [product_id] => 8
                                                    [product_name] => 鸭血粉丝汤
                                                    [product_type] => 汤粉
                                                    [product_img] => http://product.com/9.jpg
                                                    [product_sold] => 10
                                                    [product_price] => 2200
                                                )
 
                                            [3] => Array
                                                (
                                                    [product_id] => 9
                                                    [product_name] => 兰州炒饭
                                                    [product_type] => 主食
                                                    [product_img] => http://product.com/10.jpg
                                                    [product_sold] => 15
                                                    [product_price] => 2500
                                                )
 
                                        )
 
                                )
 
                        )
 
                )
 
        )
 
)

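To the naked eye, "开海饭店" is clearly the best match for what I wanted, because it explicitly sells "兰州炒饭". So why does "东方宫兰州拉面" get the highest relevance this time?

By default, bool's should combines the relevance of its 2 clauses (one matching the shop name, one matching the product name) additively: the scores of the clauses that match are summed to give the overall relevance of the shop document. The classic scoring model additionally scales that sum by the fraction of clauses that matched.

As a shop name, "开海饭店" has no relevance at all to "兰炒饭" (a score of 0), so only its product "兰州炒饭" contributes, however well that product matches. "东方宫兰州拉面", by contrast, has "兰州" in its name and "炒饭" in one of its products (蛋炒饭), so both clauses contribute and the combined relevance ends up higher than that of "开海饭店".

What we actually want is the field that best matches the search keyword, whether that is the shop name or a product name. "Best field" (dis_max) solves exactly this problem: it keeps only the highest relevance among the queried fields as the overall relevance, so "开海饭店" ranks first on the strength of "兰州炒饭". It is done as follows:

<?php

require_once __DIR__ . "/vendor/autoload.php";

// Client
$client = Elasticsearch\ClientBuilder::fromConfig([
    'hosts' => ['localhost:9200', 'localhost:9201', 'localhost:9203'], // ideally, put an HAProxy reverse proxy in front of the ES cluster
    'retries' => 2
]);

// Search keyword
$keyword = '兰拉面';

// Nested query
$result = $client->search([
    'index' => 'basic', // index (analogous to a database)
    'type' => 'merchant',  // type (analogous to a table)
    'search_type' => 'dfs_query_then_fetch',  // collect term statistics (IDF) from all shards before scoring
    'body' => [ // request body
        'query' => [
            // Query context: affects relevance scoring
            'dis_max' => [ // best field
                'queries' => [ // the highest clause score wins
                    // Clause 1
                    [
                        // Full-text match
                        'match' => ['merchant_name' => $keyword], // shop name
                    ],
                    // Clause 2
                    [
                        // Nested query
                        'nested' => [
                            'path' => 'merchant_product', // path of the nested documents
                            'score_mode' => 'max', // how nested scores are combined (max: use the best-matching nested document)
                            'query' => [ // query on the nested documents, affects relevance scoring
                                'match' => [ // full-text match
                                    'merchant_product.product_name' => $keyword, // product name (the full path is required)
                                ]
                            ]
                        ]
                    ]
                ]
            ]
        ],
    ]
]);

print_r($result);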

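The main changes are:

  • The bool compound query is replaced by the dis_max best-field query.
  • should is replaced by queries, which holds the individual query clauses.

The result is now what we wanted (for more on "best fields", see the Elasticsearch documentation on the dis_max query):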

[work@df6c675da97e nuomi-search]$ php main.php
Array
(
    [took] => 41
    [timed_out] =>
    [_shards] => Array
        (
            [total] => 3
            [successful] => 3
            [failed] => 0
        )
 
    [hits] => Array
        (
            [total] => 3
            [max_score] => 1.76352
            [hits] => Array
                (
                    [0] => Array
                        (
                            [_index] => basic
                            [_type] => merchant
                            [_id] => 3
                            [_score] => 1.76352
                            [_source] => Array
                                (
                                    [merchant_name] => 开海饭店
                                    [merchant_score] => 3
                                    [merchant_type] => 美食
                                    [merchant_img] => http://merchant.com/3.jpg
                                    [merchant_avg_price] => 3500
                                    [merchant_location] => Array
                                        (
                                            [0] => 50
                                            [1] => 50
                                        )
 
                                    [merchant_product] => Array
                                        (
                                            [0] => Array
                                                (
                                                    [product_id] => 6
                                                    [product_name] => 海鲜炒饭
                                                    [product_type] => 主食
                                                    [product_img] => http://product.com/7.jpg
                                                    [product_sold] => 10
                                                    [product_price] => 2400
                                                )
 
                                            [1] => Array
                                                (
                                                    [product_id] => 7
                                                    [product_name] => 西红柿鸡蛋面
                                                    [product_type] => 面食
                                                    [product_img] => http://product.com/8.jpg
                                                    [product_sold] => 10
                                                    [product_price] => 2300
                                                )
 
                                            [2] => Array
                                                (
                                                    [product_id] => 8
                                                    [product_name] => 鸭血粉丝汤
                                                    [product_type] => 汤粉
                                                    [product_img] => http://product.com/9.jpg
                                                    [product_sold] => 10
                                                    [product_price] => 2200
                                                )
 
                                            [3] => Array
                                                (
                                                    [product_id] => 9
                                                    [product_name] => 兰州炒饭
                                                    [product_type] => 主食
                                                    [product_img] => http://product.com/10.jpg
                                                    [product_sold] => 15
                                                    [product_price] => 2500
                                                )
 
                                        )
 
                                )
 
                        )
 
                    [1] => Array
                        (
                            [_index] => basic
                            [_type] => merchant
                            [_id] => 2
                            [_score] => 1.6903362
                            [_source] => Array
                                (
                                    [merchant_name] => 东方宫兰州拉面
                                    [merchant_score] => 3
                                    [merchant_type] => 美食
                                    [merchant_img] => http://merchant.com/2.jpg
                                    [merchant_avg_price] => 1800
                                    [merchant_location] => Array
                                        (
                                            [0] => 120
                                            [1] => 120
                                        )
 
                                    [merchant_product] => Array
                                        (
                                            [0] => Array
                                                (
                                                    [product_id] => 3
                                                    [product_name] => 牛肉炒面
                                                    [product_type] => 面食
                                                    [product_img] => http://product.com/4.jpg
                                                    [product_sold] => 10
                                                    [product_price] => 2400
                                                )
 
                                            [1] => Array
                                                (
                                                    [product_id] => 4
                                                    [product_name] => 蛋炒饭
                                                    [product_type] => 主食
                                                    [product_img] => http://product.com/5.jpg
                                                    [product_sold] => 14
                                                    [product_price] => 2300
                                                )
 
                                            [2] => Array
                                                (
                                                    [product_id] => 5
                                                    [product_name] => 羊肉汤
                                                    [product_type] => 汤粉
                                                    [product_img] => http://product.com/6.jpg
                                                    [product_sold] => 10
                                                    [product_price] => 2200
                                                )
 
                                        )
 
                                )
 
                        )
 
                    [2] => Array
                        (
                            [_index] => basic
                            [_type] => merchant
                            [_id] => 1
                            [_score] => 1.1084312
                            [_source] => Array
                                (
                                    [merchant_name] => 鑫明明拉面
                                    [merchant_score] => 4
                                    [merchant_type] => 美食
                                    [merchant_img] => http://merchant.com/1.jpg
                                    [merchant_avg_price] => 2100
                                    [merchant_location] => Array
                                        (
                                            [0] => 127
                                            [1] => 128
                                        )
 
                                    [merchant_product] => Array
                                        (
                                            [0] => Array
                                                (
                                                    [product_id] => 1
                                                    [product_name] => 羊肉烩面
                                                    [product_type] => 面食
                                                    [product_img] => http://product.com/2.jpg
                                                    [product_sold] => 11
                                                    [product_price] => 2200
                                                )
 
                                            [1] => Array
                                                (
                                                    [product_id] => 2
                                                    [product_name] => 烤羊肉串
                                                    [product_type] => 烤串
                                                    [product_img] => http://product.com/3.jpg
                                                    [product_sold] => 12
                                                    [product_price] => 2300
                                                )
 
                                        )
 
                                )
 
                        )
 
                )
 
        )
 
)
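
To make the difference between the two scoring modes concrete, here is a minimal sketch in plain PHP that needs no Elasticsearch client. The per-clause scores are illustrative assumptions, not real Elasticsearch output, and the function names (scoreBySum, scoreByDisMax, rank) are hypothetical. The sum variant ignores any further scaling, since dividing every total by the same clause count would not change the order.

```php
<?php
// Per-clause relevance for each shop: [shop-name clause, product-name clause].
// These numbers are made up for illustration only.
$clauseScores = [
    '东方宫兰州拉面' => [1.7, 1.0], // partial match in both the name and a product
    '开海饭店'       => [0.0, 1.8], // no match in the name, strong match in one product
];

// bool/should style: add up the scores of all clauses.
function scoreBySum(array $scores): float
{
    return array_sum($scores);
}

// dis_max style: the best clause, plus tie_breaker times the remaining clauses.
// With the default tie_breaker of 0.0 only the best clause counts.
function scoreByDisMax(array $scores, float $tieBreaker = 0.0): float
{
    $best = max($scores);
    return $best + $tieBreaker * (array_sum($scores) - $best);
}

// Return the shop names ordered from most to least relevant.
function rank(array $clauseScores, callable $combine): array
{
    $totals = array_map($combine, $clauseScores); // keys (shop names) are preserved
    arsort($totals);                              // sort by score, descending
    return array_keys($totals);
}

echo implode(' > ', rank($clauseScores, 'scoreBySum')), PHP_EOL;
echo implode(' > ', rank($clauseScores, 'scoreByDisMax')), PHP_EOL;
```

With the sum, the shop that matches weakly in two places wins. With dis_max, the shop with the single strongest match wins, which is the behaviour we were after.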

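Filtering by distance

We usually want to find shops within N kilometers, so the results have to be filtered by coordinates.

Suppose my coordinates are (127.1, 128.1) and the search area is a circle centered on me with a radius of 1 km. How does ES handle this?

It has already indexed every shop's merchant_location. One coordinate yields 2 index entries: one by longitude and one by latitude.

When executing the query, ES first draws a rectangle centered on my coordinates that exactly encloses the circle. Then:

  1. The rectangle's x-axis range is looked up in the longitude index to select the documents inside the x range.
  2. The rectangle's y-axis range is looked up in the latitude index to select the documents inside the y range.
  3. The two document sets are intersected, which gives all documents inside the rectangle.
  4. Those documents are iterated, and for each one the distance to my coordinates is computed to check whether it falls inside the circle.

This approach is called the "geo bounding box" model, and it is a fairly precise way of filtering.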
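
The four steps above can be sketched in plain PHP. This is only an illustration of the idea, not how Elasticsearch is implemented internally: the function names are hypothetical, the Earth is assumed to be a sphere of radius 6371 km, and the bounding-box formula assumes the circle is nowhere near a pole. Points 1 and 2 use coordinates that appear in the results later in this article, while the point named corner is an invented one that lies inside the rectangle but outside the circle.

```php
<?php
const EARTH_RADIUS_KM = 6371.0; // assumed spherical Earth

// Great-circle distance between two [lon, lat] points, in km (haversine formula).
function haversineKm(float $lon1, float $lat1, float $lon2, float $lat2): float
{
    $dLat = deg2rad($lat2 - $lat1);
    $dLon = deg2rad($lon2 - $lon1);
    $a = sin($dLat / 2) ** 2
       + cos(deg2rad($lat1)) * cos(deg2rad($lat2)) * sin($dLon / 2) ** 2;
    return 2 * EARTH_RADIUS_KM * asin(min(1.0, sqrt($a)));
}

// Return the ids of all points within $radiusKm of $center ([lon, lat]).
function withinRadius(array $center, float $radiusKm, array $points): array
{
    [$cLon, $cLat] = $center;
    // Half-size of the rectangle that encloses the circle, in degrees.
    $angular = $radiusKm / EARTH_RADIUS_KM;
    $dLat = rad2deg($angular);
    $dLon = rad2deg(asin(sin($angular) / cos(deg2rad($cLat))));

    $hits = [];
    foreach ($points as $id => [$lon, $lat]) {
        // Steps 1-3: cheap rectangle test on longitude and latitude.
        if (abs($lon - $cLon) > $dLon || abs($lat - $cLat) > $dLat) {
            continue;
        }
        // Step 4: exact distance test for the survivors.
        if (haversineKm($cLon, $cLat, $lon, $lat) <= $radiusKm) {
            $hits[] = $id;
        }
    }
    return $hits;
}

$me = [120.388732, 36.068329];
$points = [
    1        => [120.394589, 36.070517],
    2        => [120.383579, 36.071833],
    'corner' => [120.399232, 36.076829], // hypothetical: in the rectangle, outside the circle
];

$hits = withinRadius($me, 1.0, $points);
echo implode(', ', $hits), PHP_EOL;
```

The corner point passes the rectangle test but fails the distance test, which is why step 4 is needed after the two index lookups.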

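Below I run a "filtered query" that keeps only the shops related to "拉面" (hand-pulled noodles) within 1 km:

<?php

require_once __DIR__ . "/vendor/autoload.php";

// Client
$client = Elasticsearch\ClientBuilder::fromConfig([
    'hosts' => ['localhost:9200', 'localhost:9201', 'localhost:9203'], // ideally, put an HAProxy reverse proxy in front of the ES cluster
    'retries' => 2
]);

// Search keyword
$keyword = '拉面';

// Nested query
$result = $client->search([
    'index' => 'basic', // index (analogous to a database)
    'type' => 'merchant',  // type (analogous to a table)
    'search_type' => 'dfs_query_then_fetch',  // collect term statistics (IDF) from all shards before scoring
    'body' => [ // request body
        'query' => [
            // Compound query
            'bool' => [
                'must' => [
                    // Query context: affects relevance scoring
                    'dis_max' => [ // best field
                        'queries' => [ // the clauses are effectively OR-ed; the highest score wins
                            // Clause 1
                            [
                                // Full-text match
                                'match' => ['merchant_name' => $keyword], // shop name
                            ],
                            // Clause 2
                            [
                                // Nested query
                                'nested' => [
                                    'path' => 'merchant_product', // path of the nested documents
                                    'score_mode' => 'max', // how nested scores are combined (max: use the best-matching nested document)
                                    'query' => [ // query on the nested documents, affects relevance scoring
                                        'match' => [ // full-text match
                                            'merchant_product.product_name' => $keyword, // product name (the full path is required)
                                        ]
                                    ]
                                ]
                            ]
                        ]
                    ]
                ],
                // Filter context: does not affect scoring
                'filter' => [
                    // Geo distance filter
                    'geo_distance' => [
                        'distance' => '1km',
                        'merchant_location' => [ // array form is [longitude, latitude]
                            120.3887320000, 36.0683290000
                        ]
                    ]
                ]
            ]
        ],
    ]
]);

print_r($result);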

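This uses a filter. In ES 5.0 the filter must be placed in the same bool compound query as the query itself. Before ES 5.0 the syntax was completely different (the older filtered query); see the Elasticsearch documentation for that form.

ES first applies the filter to narrow down the result set, then runs the query on what remains to compute relevance. Only 2 shops are found, sorted by relevance: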

[work@df6c675da97e nuomi-search]$ php main.php
Array
(
    [took] => 89
    [timed_out] =>
    [_shards] => Array
        (
            [total] => 3
            [successful] => 3
            [failed] => 0
        )
 
    [hits] => Array
        (
            [total] => 2
            [max_score] => 1.1084312
            [hits] => Array
                (
                    [0] => Array
                        (
                            [_index] => basic
                            [_type] => merchant
                            [_id] => 1
                            [_score] => 1.1084312
                            [_source] => Array
                                (
                                    [merchant_name] => 鑫明明拉面
                                    [merchant_score] => 4
                                    [merchant_type] => 美食
                                    [merchant_img] => http://merchant.com/1.jpg
                                    [merchant_avg_price] => 2100
                                    [merchant_location] => Array
                                        (
                                            [0] => 120.394589
                                            [1] => 36.070517
                                        )
 
                                    [merchant_product] => Array
                                        (
                                            [0] => Array
                                                (
                                                    [product_id] => 1
                                                    [product_name] => 羊肉烩面
                                                    [product_type] => 面食
                                                    [product_img] => http://product.com/2.jpg
                                                    [product_sold] => 11
                                                    [product_price] => 2200
                                                )
 
                                            [1] => Array
                                                (
                                                    [product_id] => 2
                                                    [product_name] => 烤羊肉串
                                                    [product_type] => 烤串
                                                    [product_img] => http://product.com/3.jpg
                                                    [product_sold] => 12
                                                    [product_price] => 2300
                                                )
 
                                        )
 
                                )
 
                        )
 
                    [1] => Array
                        (
                            [_index] => basic
                            [_type] => merchant
                            [_id] => 2
                            [_score] => 0.97589093
                            [_source] => Array
                                (
                                    [merchant_name] => 东方宫兰州拉面
                                    [merchant_score] => 3
                                    [merchant_type] => 美食
                                    [merchant_img] => http://merchant.com/2.jpg
                                    [merchant_avg_price] => 1800
                                    [merchant_location] => Array
                                        (
                                            [0] => 120.383579
                                            [1] => 36.071833
                                        )
 
                                    [merchant_product] => Array
                                        (
                                            [0] => Array
                                                (
                                                    [product_id] => 3
                                                    [product_name] => 牛肉炒面
                                                    [product_type] => 面食
                                                    [product_img] => http://product.com/4.jpg
                                                    [product_sold] => 10
                                                    [product_price] => 2400
                                                )
 
                                            [1] => Array
                                                (
                                                    [product_id] => 4
                                                    [product_name] => 蛋炒饭
                                                    [product_type] => 主食
                                                    [product_img] => http://product.com/5.jpg
                                                    [product_sold] => 14
                                                    [product_price] => 2300
                                                )
 
                                            [2] => Array
                                                (
                                                    [product_id] => 5
                                                    [product_name] => 羊肉汤
                                                    [product_type] => 汤粉
                                                    [product_img] => http://product.com/6.jpg
                                                    [product_sold] => 10
                                                    [product_price] => 2200
                                                )
 
                                        )
 
                                )
 
                        )
 
                )
 
        )
 
)

If distance is changed to 2km, all three shops are returned; that run is not shown here.
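
Since only the distance changes between such runs, the request body can be produced by a small helper. The following is a minimal sketch: buildSearchBody is a hypothetical name, not part of the Elasticsearch PHP client, and it simply rebuilds the same array used in the query above with the keyword, the center point and the distance as parameters.

```php
<?php
// Hypothetical helper: builds the dis_max + geo_distance request body shown above.
function buildSearchBody(string $keyword, float $lon, float $lat, string $distance): array
{
    return [
        'query' => [
            'bool' => [
                // Scoring part: best of shop name and nested product name
                'must' => [
                    'dis_max' => [
                        'queries' => [
                            ['match' => ['merchant_name' => $keyword]],
                            ['nested' => [
                                'path' => 'merchant_product',
                                'score_mode' => 'max',
                                'query' => [
                                    'match' => ['merchant_product.product_name' => $keyword],
                                ],
                            ]],
                        ],
                    ],
                ],
                // Non-scoring part: keep only shops within $distance of the point
                'filter' => [
                    'geo_distance' => [
                        'distance' => $distance,
                        'merchant_location' => [$lon, $lat], // [longitude, latitude]
                    ],
                ],
            ],
        ],
    ];
}

$body = buildSearchBody('拉面', 120.388732, 36.068329, '2km');
echo json_encode($body, JSON_UNESCAPED_UNICODE | JSON_PRETTY_PRINT), PHP_EOL;
```

The resulting array can be passed as the 'body' entry of the $client->search() call, alongside the same 'index', 'type' and 'search_type' values as before.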

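Sorting

The search feature of many apps offers an option called "comprehensive ranking" (综合排序), which really just means sorting by relevance. So far we have been using relevance as the default sort criterion.

Nuomi search also supports several other sort criteria, for example sorting by distance.

My query requirement is now: shops related to "拉面" within 2 km, sorted by their distance from me.

<?php

require_once __DIR__ . "/vendor/autoload.php";

// Client
$client = Elasticsearch\ClientBuilder::fromConfig([
    'hosts' => ['localhost:9200', 'localhost:9201', 'localhost:9203'], // ideally, put an HAProxy reverse proxy in front of the ES cluster
    'retries' => 2
]);

// Search keyword
$keyword = '拉面';

// Nested query
$result = $client->search([
    'index' => 'basic', // index (analogous to a database)
    'type' => 'merchant',  // type (analogous to a table)
    'search_type' => 'dfs_query_then_fetch',  // collect term statistics (IDF) from all shards before scoring
    'body' => [ // request body
        'query' => [
            // Compound query
            'bool' => [
                'must' => [
                    // Query context: affects relevance scoring
                    'dis_max' => [ // best field
                        'queries' => [ // the clauses are effectively OR-ed; the highest score wins
                            // Clause 1
                            [
                                // Full-text match
                                'match' => ['merchant_name' => $keyword], // shop name
                            ],
                            // Clause 2
                            [
                                // Nested query
                                'nested' => [
                                    'path' => 'merchant_product', // path of the nested documents
                                    'score_mode' => 'max', // how nested scores are combined (max: use the best-matching nested document)
                                    'query' => [ // query on the nested documents, affects relevance scoring
                                        'match' => [ // full-text match
                                            'merchant_product.product_name' => $keyword, // product name (the full path is required)
                                        ]
                                    ]
                                ]
                            ]
                        ]
                    ]
                ],
                // Filter context: does not affect scoring
                'filter' => [
                    // Geo distance filter
                    'geo_distance' => [
                        'distance' => '2km',
                        'merchant_location' => [ // array form is [longitude, latitude]
                            120.3887320000, 36.0683290000
                        ]
                    ]
                ]
            ]
        ],
        // Sorting
        'sort' => [
            [
                '_geo_distance' => [
                    // Compute the distance to this point
                    'merchant_location' => [
                        120.3887320000, 36.0683290000
                    ],
                    // Nearest first
                    'order' => 'asc',
                    // Report distances in km
                    'unit' => 'km',
                ]
            ]
        ]
    ]
]);

print_r($result);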

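Here sort specifies a sorting rule. Each hit in the response carries a sort field holding the distance to that "merchant" (in km), and the hits come back ordered by distance: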

[work@df6c675da97e nuomi-search]$ php main.php
Array
(
    [took] => 68
    [timed_out] =>
    [_shards] => Array
        (
            [total] => 3
            [successful] => 3
            [failed] => 0
        )
 
    [hits] => Array
        (
            [total] => 3
            [max_score] =>
            [hits] => Array
                (
                    [0] => Array
                        (
                            [_index] => basic
                            [_type] => merchant
                            [_id] => 1
                            [_score] =>
                            [_source] => Array
                                (
                                    [merchant_name] => 鑫明明拉面
                                    [merchant_score] => 4
                                    [merchant_type] => 美食
                                    [merchant_img] => http://merchant.com/1.jpg
                                    [merchant_avg_price] => 2100
                                    [merchant_location] => Array
                                        (
                                            [0] => 120.394589
                                            [1] => 36.070517
                                        )
 
                                    [merchant_product] => Array
                                        (
                                            [0] => Array
                                                (
                                                    [product_id] => 1
                                                    [product_name] => 羊肉烩面
                                                    [product_type] => 面食
                                                    [product_img] => http://product.com/2.jpg
                                                    [product_sold] => 11
                                                    [product_price] => 2200
                                                )
 
                                            [1] => Array
                                                (
                                                    [product_id] => 2
                                                    [product_name] => 烤羊肉串
                                                    [product_type] => 烤串
                                                    [product_img] => http://product.com/3.jpg
                                                    [product_sold] => 12
                                                    [product_price] => 2300
                                                )
 
                                        )
 
                                )
 
                            [sort] => Array
                                (
                                    [0] => 0.57992238133363
                                )
 
                        )
 
                    [1] => Array
                        (
                            [_index] => basic
                            [_type] => merchant
                            [_id] => 2
                            [_score] =>
                            [_source] => Array
                                (
                                    [merchant_name] => 东方宫兰州拉面
                                    [merchant_score] => 3
                                    [merchant_type] => 美食
                                    [merchant_img] => http://merchant.com/2.jpg
                                    [merchant_avg_price] => 1800
                                    [merchant_location] => Array
                                        (
                                            [0] => 120.383579
                                            [1] => 36.071833
                                        )
 
                                    [merchant_product] => Array
                                        (
                                            [0] => Array
                                                (
                                                    [product_id] => 3
                                                    [product_name] => 牛肉炒面
                                                    [product_type] => 面食
                                                    [product_img] => http://product.com/4.jpg
                                                    [product_sold] => 10
                                                    [product_price] => 2400
                                                )
 
                                            [1] => Array
                                                (
                                                    [product_id] => 4
                                                    [product_name] => 蛋炒饭
                                                    [product_type] => 主食
                                                    [product_img] => http://product.com/5.jpg
                                                    [product_sold] => 14
                                                    [product_price] => 2300
                                                )
 
                                            [2] => Array
                                                (
                                                    [product_id] => 5
                                                    [product_name] => 羊肉汤
                                                    [product_type] => 汤粉
                                                    [product_img] => http://product.com/6.jpg
                                                    [product_sold] => 10
                                                    [product_price] => 2200
                                                )
 
                                        )
 
                                )
 
                            [sort] => Array
                                (
                                    [0] => 0.60523716061392
                                )
 
                        )
 
                    [2] => Array
                        (
                            [_index] => basic
                            [_type] => merchant
                            [_id] => 3
                            [_score] =>
                            [_source] => Array
                                (
                                    [merchant_name] => 开海饭店
                                    [merchant_score] => 3
                                    [merchant_type] => 美食
                                    [merchant_img] => http://merchant.com/3.jpg
                                    [merchant_avg_price] => 3500
                                    [merchant_location] => Array
                                        (
                                            [0] => 120.405117
                                            [1] => 36.0683
                                        )
 
                                    [merchant_product] => Array
                                        (
                                            [0] => Array
                                                (
                                                    [product_id] => 6
                                                    [product_name] => 海鲜炒饭
                                                    [product_type] => 主食
                                                    [product_img] => http://product.com/7.jpg
                                                    [product_sold] => 10
                                                    [product_price] => 2400
                                                )
 
                                            [1] => Array
                                                (
                                                    [product_id] => 7
                                                    [product_name] => 西红柿鸡蛋面
                                                    [product_type] => 面食
                                                    [product_img] => http://product.com/8.jpg
                                                    [product_sold] => 10
                                                    [product_price] => 2300
                                                )
 
                                            [2] => Array
                                                (
                                                    [product_id] => 8
                                                    [product_name] => 鸭血粉丝汤
                                                    [product_type] => 汤粉
                                                    [product_img] => http://product.com/9.jpg
                                                    [product_sold] => 10
                                                    [product_price] => 2200
                                                )
 
                                            [3] => Array
                                                (
                                                    [product_id] => 9
                                                    [product_name] => 兰州炒饭
                                                    [product_type] => 主食
                                                    [product_img] => http://product.com/10.jpg
                                                    [product_sold] => 15
                                                    [product_price] => 2500
                                                )
 
                                        )
 
                                )
 
                            [sort] => Array
                                (
                                    [0] => 1.4726960298128
                                )
 
                        )
 
                )
 
        )
 
)
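
To make the sort field concrete, here is a minimal sketch that pulls the distance out of each hit. The helper name distancesFromHits is my own invention, and the sample array is a trimmed-down copy of the response above that keeps only the keys the helper reads.

```php
<?php

// Hypothetical helper: map each merchant name to its distance in km,
// taken from the first sort value of every hit.
function distancesFromHits(array $result): array
{
    $distances = [];
    foreach ($result['hits']['hits'] as $hit) {
        // sort[0] is the _geo_distance value because it is the first (and only) sort rule
        $distances[$hit['_source']['merchant_name']] = round($hit['sort'][0], 2);
    }
    return $distances;
}

// Trimmed-down copy of the response shown above (assumption: other keys are not needed)
$sample = [
    'hits' => [
        'hits' => [
            ['_source' => ['merchant_name' => '鑫明明拉面'], 'sort' => [0.57992238133363]],
            ['_source' => ['merchant_name' => '东方宫兰州拉面'], 'sort' => [0.60523716061392]],
            ['_source' => ['merchant_name' => '开海饭店'], 'sort' => [1.4726960298128]],
        ],
    ],
];

print_r(distancesFromHits($sample));
```

Because the hits already arrive sorted, the resulting array keeps the nearest merchant first.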

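More filtering and sorting

So far we have demonstrated a combination of three features: sorting by relevance, sorting by distance, and filtering by distance. To show more of what Nuomi search can do, let's extend the search request:

Within 2 km, related to "拉面" (ramen), merchant score >= 4, merchant average price <= 21 yuan (the code uses 2100, since prices appear to be stored in fen), sorted by merchant score and then by total product sales, with the distance included in the results.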

<?php
 
require_once __DIR__ . "/vendor/autoload.php";

// Client
$client = Elasticsearch\ClientBuilder::fromConfig([
    'hosts' => ['localhost:9200', 'localhost:9201', 'localhost:9203'], // ideally put an HAProxy reverse proxy in front of the ES cluster
    'retries' => 2
]);

// Search keyword ("ramen")
$keyword = '拉面';

// Nested query
$result = $client->search([
    'index' => 'basic', // the "database"
    'type' => 'merchant',  // the "table"
    'search_type' => 'dfs_query_then_fetch',  // collect IDF across all shards before scoring
    'body' => [ // request body
        'query' => [
            // Combination
            'bool' => [
                'must' => [
                    // Query clause; affects relevance scoring
                    'dis_max' => [ // the best-matching clause wins
                        'queries' => [ // the clauses are effectively ORed together
                            // Clause 1
                            [
                                // Full-text match
                                'match' => ['merchant_name' => $keyword], // merchant name
                            ],
                            // Clause 2
                            [
                                // Nested query
                                'nested' => [
                                    'path' => 'merchant_product', // path of the nested documents
                                    'score_mode' => 'max', // how nested scores are combined (max: use the score of the best-matching nested document)
                                    'query' => [ // query on the nested documents; affects relevance scoring
                                        'match' => [ // full-text match
                                            'merchant_product.product_name' => $keyword, // product name (full path required)
                                        ]
                                    ]
                                ]
                            ]
                        ]
                    ]
                ],
                // Filter (does not take part in relevance scoring)
                'filter' => [
                    'bool' => [ // combined filter
                        'must' => [ // AND
                            // Within 2 km
                            [
                                // Geo-distance filter
                                'geo_distance' => [
                                    'distance' => '2km',
                                    'merchant_location' => [
                                        120.3887320000, 36.0683290000
                                    ]
                                ]
                            ],
                            // Merchant score >= 4
                            [
                                'range' => [
                                    'merchant_score' => [
                                        'gte' => 4,
                                    ]
                                ]
                            ],
                            // Merchant average price <= 21 yuan (2100, assuming prices are stored in fen)
                            [
                                'range' => [
                                    'merchant_avg_price' => [
                                        'lte' => 2100,
                                    ]
                                ]
                            ],
                        ]
                    ]
                ]
            ]
        ],
        // Sorting
        'sort' => [
            [
                // First by merchant score, high to low
                'merchant_score' => [
                    'order' => 'desc',
                ],
                // Then by total sales of the nested products, high to low
                'merchant_product.product_sold' => [
                    'mode' => 'sum', // sum the sales of all products
                    'order' => 'desc',
                    'nested_path' => 'merchant_product', // path of the nested documents
                ]
            ]
        ],
        // Still return the full _source document
        '_source' => [],
        // Script-computed fields
        'script_fields' => [
            // Custom field name
            'distance' => [
                'script' => [
                    "lang" => "painless",
                    // Custom input parameters for the script
                    'params' => [
                        'lon' => 120.3887320000,
                        'lat' => 36.0683290000,
                    ],
                    // Script body
                    'inline' => "doc['merchant_location'].arcDistance(params['lat'],params['lon'])"
                ]
            ]
        ]
    ]
]);
 
print_r($result);

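Let's break this query down:

  • query: a bool combination implements a "relevance query with a filter":
    • First, filter narrows the documents down to those that satisfy the conditions.
    • Then relevance is computed for those documents by must (clauses under must are ANDed).
  • sort: there are 2 sorting rules:
    • Merchant score, descending.
    • When merchant scores are equal, total product sales, descending.
  • _source: controls which fields are returned; empty means all fields.
  • script_fields: extra fields computed by a script; here it defines a distance field:
    • lang identifies the script language; ES supports several script languages.
    • params holds the script's input parameters.
    • inline is the inline script; it computes the distance between each document and my coordinates.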
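
The distance that the inline script returns can be checked by hand with the haversine formula. The sketch below is only an illustration: haversineMeters is a hypothetical helper, the Earth radius is a rounded mean value, and ES may use a slightly different formula internally, so small differences from arcDistance are to be expected. The coordinates are my search point and the location of merchant 1 from the earlier output.

```php
<?php

// Hypothetical helper: great-circle distance in metres between two lat/lon points
function haversineMeters(float $lat1, float $lon1, float $lat2, float $lon2): float
{
    $earthRadius = 6371000.0; // rounded mean Earth radius in metres (an approximation)
    $dLat = deg2rad($lat2 - $lat1);
    $dLon = deg2rad($lon2 - $lon1);
    $a = sin($dLat / 2) ** 2
       + cos(deg2rad($lat1)) * cos(deg2rad($lat2)) * sin($dLon / 2) ** 2;
    return 2 * $earthRadius * asin(sqrt($a));
}

// Search point (lat, lon) and merchant 1's location (lat, lon)
$meters = haversineMeters(36.0683290000, 120.3887320000, 36.070517, 120.394589);
printf("%.1f m\n", $meters);
```

Note that the documents store the location as [lon, lat], while the script parameters and this helper take latitude first.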

The result is as follows:

[work@df6c675da97e nuomi-search]$ php main.php
Array
(
    [took] => 74
    [timed_out] =>
    [_shards] => Array
        (
            [total] => 3
            [successful] => 3
            [failed] => 0
        )
 
    [hits] => Array
        (
            [total] => 1
            [max_score] =>
            [hits] => Array
                (
                    [0] => Array
                        (
                            [_index] => basic
                            [_type] => merchant
                            [_id] => 1
                            [_score] =>
                            [_source] => Array
                                (
                                    [merchant_name] => 鑫明明拉面
                                )
 
                            [fields] => Array
                                (
                                    [distance] => Array
                                        (
                                            [0] => 579.92238133363
                                        )
 
                                )
 
                            [sort] => Array
                                (
                                    [0] => 4
                                    [1] => 23
                                )
 
                        )
 
                )
 
        )
 
)

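As you can see, _source still holds the content of the matched document (only merchant_name appears in the output above), fields holds the extra field added by our script (about 580 meters from me), and sort holds the values used for sorting.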
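
Reading the script field back in PHP could look like the following sketch. The formatter formatDistance is a hypothetical helper of mine, and the $hit array is a cut-down copy of the hit shown above.

```php
<?php

// Hypothetical formatter: metres -> "580 m" or "1.5 km"
function formatDistance(float $meters): string
{
    $rounded = (int) round($meters);
    if ($rounded < 1000) {
        return sprintf('%d m', $rounded);
    }
    return sprintf('%.1f km', $meters / 1000);
}

// Cut-down copy of the hit from the response above
$hit = [
    '_source' => ['merchant_name' => '鑫明明拉面'],
    'fields'  => ['distance' => [579.92238133363]],
    'sort'    => [4, 23],
];

// Script field values arrive wrapped in an array, so take element 0
$label = formatDistance($hit['fields']['distance'][0]);
echo $hit['_source']['merchant_name'], ': ', $label, PHP_EOL;
```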

For more on running scripts in ES, see: document 1 and document 2.

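Query or filter

In the example above we used a "query with a filter" but did not end up sorting by relevance, so the relevance computation is entirely unnecessary. All we need is to use the inverted index to find the documents that match the keyword; there is no need to score how relevant each document is to it.

So how do we turn relevance scoring off?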

<?php
 
require_once __DIR__ . "/vendor/autoload.php";

// Client
$client = Elasticsearch\ClientBuilder::fromConfig([
    'hosts' => ['localhost:9200', 'localhost:9201', 'localhost:9203'], // ideally put an HAProxy reverse proxy in front of the ES cluster
    'retries' => 2
]);

// Search keyword ("ramen")
$keyword = '拉面';

// Nested query
$result = $client->search([
    'index' => 'basic', // the "database"
    'type' => 'merchant',  // the "table"
    'search_type' => 'dfs_query_then_fetch',  // collect IDF across all shards before scoring
    'body' => [ // request body
        'query' => [
            // Combination
            'bool' => [
                'must' => [
                    'constant_score' => [
                        'query' => [
                            // Query clause; inside constant_score it only decides whether a document matches
                            'dis_max' => [ // the best-matching clause wins
                                'queries' => [ // the clauses are effectively ORed together
                                    // Clause 1
                                    [
                                        // Full-text match
                                        'match' => ['merchant_name' => $keyword], // merchant name
                                    ],
                                    // Clause 2
                                    [
                                        // Nested query
                                        'nested' => [
                                            'path' => 'merchant_product', // path of the nested documents
                                            'score_mode' => 'max', // how nested scores are combined (max: use the score of the best-matching nested document)
                                            'query' => [ // query on the nested documents
                                                'match' => [ // full-text match
                                                    'merchant_product.product_name' => $keyword, // product name (full path required)
                                                ]
                                            ]
                                        ]
                                    ]
                                ]
                            ]
                        ]
                    ]
                ],
                // Filter (does not take part in relevance scoring)
                'filter' => [
                    'bool' => [ // combined filter
                        'must' => [ // AND
                            // Within 2 km
                            [
                                // Geo-distance filter
                                'geo_distance' => [
                                    'distance' => '2km',
                                    'merchant_location' => [
                                        120.3887320000, 36.0683290000
                                    ]
                                ]
                            ],
                            // Merchant score >= 4
                            [
                                'range' => [
                                    'merchant_score' => [
                                        'gte' => 4,
                                    ]
                                ]
                            ],
                            // Merchant average price <= 21 yuan (2100, assuming prices are stored in fen)
                            [
                                'range' => [
                                    'merchant_avg_price' => [
                                        'lte' => 2100,
                                    ]
                                ]
                            ],
                        ]
                    ]
                ]
            ]
        ],
        // Sorting
        'sort' => [
            [
                // First by merchant score, high to low
                'merchant_score' => [
                    'order' => 'desc',
                ],
                // Then by total sales of the nested products, high to low
                'merchant_product.product_sold' => [
                    'mode' => 'sum', // sum the sales of all products
                    'order' => 'desc',
                    'nested_path' => 'merchant_product', // path of the nested documents
                ]
            ]
        ],
        // Still return the full _source document
        '_source' => [],
        // Script-computed fields
        'script_fields' => [
            // Custom field name
            'distance' => [
                'script' => [
                    "lang" => "painless",
                    // Custom input parameters for the script
                    'params' => [
                        'lon' => 120.3887320000,
                        'lat' => 36.0683290000,
                    ],
                    // Script body
                    'inline' => "doc['merchant_location'].arcDistance(params['lat'],params['lon'])"
                ]
            ]
        ]
    ]
]);
 
print_r($result);

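Compared with the previous code, I moved the single query clause under must into constant_score and wrapped it in query. The clause as a whole now contributes a constant relevance score of 1: ES only checks whether the keyword occurs, and the search becomes a black-or-white, 0-or-1 decision.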
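
The wrapping step itself is mechanical, so it can be factored into a small function. This is a sketch only: withoutScoring is a hypothetical helper name, and it uses the same query key as the listing above rather than anything I have verified against other ES versions.

```php
<?php

// Hypothetical helper: wrap a scoring query so that it only matches or does not match
function withoutScoring(array $query): array
{
    // Same key layout as the listing above: constant_score -> query -> <original clause>
    return ['constant_score' => ['query' => $query]];
}

$keyword = '拉面';
$scored = ['match' => ['merchant_name' => $keyword]];
$must = withoutScoring($scored);

echo json_encode($must, JSON_UNESCAPED_UNICODE | JSON_PRETTY_PRINT), PHP_EOL;
```

The result can be dropped under must in place of the original clause; the filter, sort and script_fields parts of the request stay as they are.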

See the reference documentation for details.

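Aggregation and statistical analysis

ES can further aggregate the result set of a query. It supports aggregate operations like MySQL's max, min, count and sum, bucketing similar to group by, and further aggregation within each bucket.

Aggregations are a large topic; I suggest studying the official documentation on your own.

Here is a brief demonstration. Keeping the previous search unchanged, I add an aggs clause that computes the average sales for each product_type: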

<?php
 
require_once __DIR__ . "/vendor/autoload.php";

// Client
$client = Elasticsearch\ClientBuilder::fromConfig([
    'hosts' => ['localhost:9200', 'localhost:9201', 'localhost:9203'], // ideally put an HAProxy reverse proxy in front of the ES cluster
    'retries' => 2
]);

// Search keyword ("ramen")
$keyword = '拉面';

// Nested query
$result = $client->search([
    'index' => 'basic', // the "database"
    'type' => 'merchant',  // the "table"
    'search_type' => 'dfs_query_then_fetch',  // collect IDF across all shards before scoring
    'body' => [ // request body
        'query' => [
            // Combination
            'bool' => [
                'must' => [
                    'constant_score' => [
                        'query' => [
                            // Query clause; inside constant_score it only decides whether a document matches
                            'dis_max' => [ // the best-matching clause wins
                                'queries' => [ // the clauses are effectively ORed together
                                    // Clause 1
                                    [
                                        // Full-text match
                                        'match' => ['merchant_name' => $keyword], // merchant name
                                    ],
                                    // Clause 2
                                    [
                                        // Nested query
                                        'nested' => [
                                            'path' => 'merchant_product', // path of the nested documents
                                            'score_mode' => 'max', // how nested scores are combined (max: use the score of the best-matching nested document)
                                            'query' => [ // query on the nested documents
                                                'match' => [ // full-text match
                                                    'merchant_product.product_name' => $keyword, // product name (full path required)
                                                ]
                                            ]
                                        ]
                                    ]
                                ]
                            ]
                        ]
                    ]
                ],
                // Filter (does not take part in relevance scoring)
                'filter' => [
                    'bool' => [ // combined filter
                        'must' => [ // AND
                            // Within 2 km
                            [
                                // Geo-distance filter
                                'geo_distance' => [
                                    'distance' => '2km',
                                    'merchant_location' => [
                                        120.3887320000, 36.0683290000
                                    ]
                                ]
                            ],
                            // Merchant score >= 4
                            [
                                'range' => [
                                    'merchant_score' => [
                                        'gte' => 4,
                                    ]
                                ]
                            ],
                            // Merchant average price <= 21 yuan (2100, assuming prices are stored in fen)
                            [
                                'range' => [
                                    'merchant_avg_price' => [
                                        'lte' => 2100,
                                    ]
                                ]
                            ],
                        ]
                    ]
                ]
            ]
        ],
        // Sorting
        'sort' => [
            [
                // First by merchant score, high to low
                'merchant_score' => [
                    'order' => 'desc',
                ],
                // Then by total sales of the nested products, high to low
                'merchant_product.product_sold' => [
                    'mode' => 'sum', // sum the sales of all products
                    'order' => 'desc',
                    'nested_path' => 'merchant_product', // path of the nested documents
                ]
            ]
        ],
        // Still return the full _source document
        '_source' => [],
        // Script-computed fields
        'script_fields' => [
            // Custom field name
            'distance' => [
                'script' => [
                    "lang" => "painless",
                    // Custom input parameters for the script
                    'params' => [
                        'lon' => 120.3887320000,
                        'lat' => 36.0683290000,
                    ],
                    // Script body
                    'inline' => "doc['merchant_location'].arcDistance(params['lat'],params['lon'])"
                ]
            ]
        ],
        // Aggregations (aggs is written at the same level as query)
        'aggs' => [ // one aggs can hold several keys; each key is one aggregation item
            'merchant_product' => [
                'nested' => [ // step into the merchant_product nested documents
                    'path' => 'merchant_product'
                ],
                // merchant_product itself does no bucketing; apply the aggregations below directly
                'aggs' => [
                    // One aggregation item
                    'product_type' => [
                        // First bucket the data by product_type
                        'terms' => [
                            'field' => 'merchant_product.product_type',
                        ],
                        // Aggregate further within each bucket
                        'aggs' => [
                            // One aggregation item
                            'product_avg_sold' => [
                                // No bucketing here:
                                // just compute the average of product_sold
                                'avg' => [
                                    'field' => 'merchant_product.product_sold'
                                ]
                            ]
                        ]
                    ],
                ]
            ]
        ]
    ]
]);
 
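print_r($result);

  • aggs holds any number of aggregation items, each computed separately.
  • Each aggregation item can use one of 2 kinds of clause:
    • sum, avg, min, max and so on, which aggregate a field; these are called "metrics".
    • terms, which splits the data into buckets; a nested aggs then aggregates each bucket further.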

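The aggs run after the query completes and take the query's output as their input. Without a query clause, aggs run over all documents.

Aggregations only analyze data; their results cannot be used to filter the result set. Keep this firmly in mind!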
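
Reading the aggregation back follows the same nesting as the request. The sketch below is hedged in two ways: avgSoldByType is a hypothetical helper, and the sample response is not captured from a real run. I worked its values out by hand from the single merchant that matched earlier (one 面食 product with 11 sold, one 烤串 product with 12 sold), and the key layout is the usual one for nested, terms and avg aggregations.

```php
<?php

// Hypothetical helper: flatten the nested -> terms -> avg aggregation into [type => average sold]
function avgSoldByType(array $result): array
{
    $out = [];
    // Path mirrors the request: aggs name -> sub-aggs name -> buckets
    $buckets = $result['aggregations']['merchant_product']['product_type']['buckets'];
    foreach ($buckets as $bucket) {
        $out[$bucket['key']] = $bucket['product_avg_sold']['value'];
    }
    return $out;
}

// Hand-built sample response (assumption: not real output)
$sample = [
    'aggregations' => [
        'merchant_product' => [
            'doc_count' => 2,
            'product_type' => [
                'buckets' => [
                    ['key' => '烤串', 'doc_count' => 1, 'product_avg_sold' => ['value' => 12.0]],
                    ['key' => '面食', 'doc_count' => 1, 'product_avg_sold' => ['value' => 11.0]],
                ],
            ],
        ],
    ],
];

print_r(avgSoldByType($sample));
```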

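If this is hard to follow and the nested brackets are a blur, study the example and explanation above carefully; better still, read through the official documentation systematically.

After this series of examples you should have a reasonable grasp of common ES usage.

In my view, 2 things are key to putting this into practice:

  • Understand the basic principles of full-text search (TF/IDF, relevance).
  • Practice the ES query syntax hands-on and think through the logic of the queries you write.

Parent-child documents (Parent-Child)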

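The "parent-child documents" feature links 2 types within the same index into a parent-child relationship. The two types can be updated independently. At query time either one can serve as the primary table being searched, with the other as the auxiliary table, which makes it possible to filter parents by their children or children by their parents.

Note that "parent-child documents" are not a JOIN as in a database, so matching parent and child records are not returned together. This is also why "Nuomi Food" searches both "shop" and "product" information yet can only return "shop" data in the end.

"Parent-child documents" place certain storage requirements on parent and child:
  • If parent and child are not stored in the same index, each index is distributed independently and the two sets of data cannot be co-located.
  • If a parent and its children are not stored in the same shard, each shard is distributed independently and again the data cannot be co-located.
Because of these technical constraints:
  • Parent and child must be stored as different types within the same index.
  • A parent A and the children belonging to A should be "routed" to the same shard.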
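
To illustrate why routing by the parent keeps a family on one shard, here is a toy model of shard selection as hash(routing) % number_of_shards. It is an assumption-laden sketch: shardFor is a hypothetical function, crc32 is only a stand-in for the hash ES really uses, and the product ids are made up.

```php
<?php

// Toy model of shard selection; crc32 is a stand-in, not the hash ES uses
function shardFor(string $routing, int $numberOfShards): int
{
    return crc32($routing) % $numberOfShards;
}

$shards = 3;
$merchantId = '1'; // the parent's id doubles as the routing value for its children

foreach (['product-1', 'product-2', 'product-3'] as $productId) {
    printf(
        "%s: routed by own id -> shard %d, routed by parent id -> shard %d\n",
        $productId,
        shardFor($productId, $shards),
        shardFor($merchantId, $shards)
    );
}
```

Routed by their own ids, the children may scatter across shards; routed by the parent's id, they always land where the parent lives.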
In any case, let's first create an index in this section. It has 3 shards, each with 2 replicas; I planned it this way because I have 3 ES nodes:
<del><span style="color: #808080;"><?php

require_once __DIR__ . "/vendor/autoload.php";

// Client
$client = Elasticsearch\ClientBuilder::fromConfig([
    'hosts' => ['localhost:9200', 'localhost:9201', 'localhost:9203'], // ideally put an HAProxy reverse proxy in front of the ES cluster
    'retries' => 2
]);

// Create the index
$indices = $client->indices();
$indices->create([
    'index' => 'basic', // base data
    'body' => [
        'settings' => [
            "number_of_shards" => 3,    // 3 shards
            "number_of_replicas" => 2,  // each shard has 1 primary and 2 replicas
        ]
    ]
]);

</span></del>
First create the "merchant" table:
<del><span style="color: #808080;"><?php

require_once __DIR__ . "/vendor/autoload.php";

// Client
$client = Elasticsearch\ClientBuilder::fromConfig([
    'hosts' => ['localhost:9200', 'localhost:9201', 'localhost:9203'], // ideally put an HAProxy reverse proxy in front of the ES cluster
    'retries' => 2
]);

// Create the merchant type
$indices = $client->indices();
$indices->putMapping([
    'index' => 'basic',
    'type' => 'merchant', // merchant table
    'body' => [
        // Properties
        'properties' => [
            // Merchant name
            'merchant_name' => [
                'type' => 'string', // string
                'index' => 'analyzed', // full-text indexed
                'analyzer' => 'ik_max_word', // Chinese word segmentation
            ],
            // Merchant image
            'merchant_img' => [
                'type' => 'string', // string
                'index' => 'no', // not indexed
            ],
            // Merchant type
            'merchant_type' => [
                'type' => 'string', // string
                'index' => 'not_analyzed', // indexed as-is, without analysis
            ],
            // User rating
            'merchant_score' => [
                'type' => 'integer', // integer
                'index' => 'not_analyzed', // indexed as-is, for filtering/sorting
            ],
            // Average price per person
            'merchant_avg_price' => [
                'type' => 'integer', // integer
                'index' => 'not_analyzed', // indexed as-is, for filtering/sorting
            ],
            // Geographic location
            'merchant_location' => [
                'type' => 'geo_point', // geographic coordinates
            ]
        ]
    ],
]);

</span></del>
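Fields that need to be filtered, sorted or searched must have their index option set according to their purpose:
  • not_analyzed: use this when the value needs no tokenizing; it is indexed as a whole.
  • analyzed: for values that need tokenizing; the analyzer first splits them into words, and each word is then indexed.
  • no: not indexed at all; the value is only stored along with the document as data.
In the same way, we now create the "product type":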
<del><span style="color: #808080;"><?php

require_once __DIR__ . "/vendor/autoload.php";

// Client
$client = Elasticsearch\ClientBuilder::fromConfig([
    'hosts' => ['localhost:9200', 'localhost:9201', 'localhost:9203'], // ideally put an HAProxy reverse proxy in front of the ES cluster
    'retries' => 2
]);

// Create the product type
$indices = $client->indices();
$indices->putMapping([
    'index' => 'basic',
    'type' => 'product', // product table
    'body' => [
        '_parent' => [
            'type' => 'merchant',
        ],
        // Properties
        'properties' => [
            // Product name
            'product_name' => [
                'type' => 'string', // string
                'index' => 'analyzed', // full-text indexed
                'analyzer' => 'ik_max_word', // Chinese word segmentation
            ],
            // Product image
            'product_img' => [
                'type' => 'string', // string
                'index' => 'no', // not indexed
            ],
            // Product type
            'product_type' => [
                'type' => 'string', // string
                'index' => 'not_analyzed', // indexed as-is, without analysis
            ],
            // Product price
            'product_price' => [
                'type' => 'integer', // integer
                'index' => 'not_analyzed', // indexed as-is, for filtering/sorting
            ],
            // Product sales
            'product_sold' => [
                'type' => 'integer', // integer
                'index' => 'not_analyzed', // indexed as-is, for sorting/filtering
            ]
        ]
    ],
]);</span></del>
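I used _parent to declare merchant as its parent, but this command fails with: can't add a _parent field that points to an already existing type, that isn't already a parent.

ES enforces that the 2 types in a parent-child relationship must have their mappings submitted together when the index is created. In other words, if the parent type already exists, a new child type cannot be attached to it (see the explanation for details)!

In any case, I was determined to use "parent-child documents", so I ran the code again in a new form: it deletes the current index and passes the mapping definitions of both types while recreating the index: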
<del><span style="color: #808080;"><?php

require_once __DIR__ . "/vendor/autoload.php";

// Client
$client = Elasticsearch\ClientBuilder::fromConfig([
    'hosts' => ['localhost:9200', 'localhost:9201', 'localhost:9203'], // ideally, put an HAProxy reverse proxy in front of the ES cluster
    'retries' => 2
]);

// Recreate the index with both the merchant and product types
$indices = $client->indices();
$indices->delete(['index' => 'basic']);
$indices->create([
    'index' => 'basic',
    'body' => [
        // Index settings
        'settings' => [
            "number_of_shards" => 3,    // 3 shards
            "number_of_replicas" => 2,  // each shard has 1 primary copy and 2 replica copies
        ],
        // Type mappings
        'mappings' => [
            // Merchant
            'merchant' => [
                // Properties
                'properties' => [
                    // Merchant name
                    'merchant_name' => [
                        'type' => 'string', // string
                        'index' => 'analyzed', // full-text index
                        'analyzer' => 'ik_max_word', // Chinese word segmentation
                    ],
                    // Merchant image
                    'merchant_img' => [
                        'type' => 'string', // string
                        'index' => 'no', // not indexed
                    ],
                    // Merchant category
                    'merchant_type' => [
                        'type' => 'string', // string
                        'index' => 'not_analyzed', // not tokenized, indexed as-is
                    ],
                    // User rating
                    'merchant_score' => [
                        'type' => 'integer', // integer
                        'index' => 'not_analyzed', // indexed as-is, used for filtering/sorting
                    ],
                    // Average price per person
                    'merchant_avg_price' => [
                        'type' => 'integer', // integer
                        'index' => 'not_analyzed', // indexed as-is, used for filtering/sorting
                    ],
                    // Geographic coordinates
                    'merchant_location' => [
                        'type' => 'geo_point', // geo point
                    ]
                ]
            ],
            // Product
            'product' => [
                '_parent' => [
                    'type' => 'merchant',
                ],
                // Properties
                'properties' => [
                    // Product name
                    'product_name' => [
                        'type' => 'string', // string
                        'index' => 'analyzed', // full-text index
                        'analyzer' => 'ik_max_word', // Chinese word segmentation
                    ],
                    // Product image
                    'product_img' => [
                        'type' => 'string', // string
                        'index' => 'no', // not indexed
                    ],
                    // Product category
                    'product_type' => [
                        'type' => 'string', // string
                        'index' => 'not_analyzed', // not tokenized, indexed as-is
                    ],
                    // Product price
                    'product_price' => [
                        'type' => 'integer', // integer
                        'index' => 'not_analyzed', // indexed as-is, used for filtering/sorting
                    ],
                    // Units sold
                    'product_sold' => [
                        'type' => 'integer', // integer
                        'index' => 'not_analyzed', // indexed as-is, used for sorting/filtering
                    ]
                ]
            ],
        ]
    ],
]);</span></del>
Use curl to check the definition of the basic index:
<del><span style="color: #808080;">[work@df6c675da97e nuomi-search]$ curl localhost:9200/basic?pretty
{
  "basic" : {
    "aliases" : { },
    "mappings" : {
      "product" : {
        "_parent" : {
          "type" : "merchant"
        },
        "_routing" : {
          "required" : true
        },
        "properties" : {
          "product_img" : {
            "type" : "keyword",
            "index" : false
          },
          "product_name" : {
            "type" : "text",
            "analyzer" : "ik_max_word"
          },
          "product_price" : {
            "type" : "integer"
          },
          "product_sold" : {
            "type" : "integer"
          },
          "product_type" : {
            "type" : "keyword"
          }
        }
      },
      "merchant" : {
        "properties" : {
          "merchant_avg_price" : {
            "type" : "integer"
          },
          "merchant_img" : {
            "type" : "keyword",
            "index" : false
          },
          "merchant_location" : {
            "type" : "geo_point"
          },
          "merchant_name" : {
            "type" : "text",
            "analyzer" : "ik_max_word"
          },
          "merchant_score" : {
            "type" : "integer"
          },
          "merchant_type" : {
            "type" : "keyword"
          }
        }
      }
    },
    "settings" : {
      "index" : {
        "creation_date" : "1489749586006",
        "number_of_shards" : "3",
        "number_of_replicas" : "2",
        "uuid" : "iOCEqZHiQ1inEK9SvePetw",
        "version" : {
          "created" : "5020299"
        },
        "provided_name" : "basic"
      }
    }
  }
}</span></del>
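You can see that _parent has taken effect on the product type, and that _routing.required is forced to true for the reason mentioned earlier: a child document must be routed to the same shard as its parent for parent-child joint queries to work. So what is _routing? The shard a document goes to is determined by hash(_routing) % number of primary shards, and _routing defaults to the document's _id (see the linked page for details).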
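The routing rule above can be sketched in a few lines of PHP. This is only an illustration under stated assumptions: pickShard() is a hypothetical helper, and crc32 is a stand-in hash chosen for the sketch, not the hash function Elasticsearch uses internally. It shows why a product routed by its parent's ID always lands on the same shard as that merchant, while routing by the product's own ID gives no such guarantee.

```php
<?php

/**
 * Illustrative shard selection: hash(_routing) % number_of_primary_shards.
 * ASSUMPTION: crc32 is a stand-in; Elasticsearch uses a different hash.
 */
function pickShard(string $routing, int $numShards): int
{
    // Mask to a non-negative value so the modulo result is a valid shard number.
    return (crc32($routing) & 0x7fffffff) % $numShards;
}

$numShards = 3; // same value as number_of_shards in the index settings

// A merchant is routed by its own _id (the default _routing).
$merchantShard = pickShard('3', $numShards);

// A product with _id 9 and _parent 3 is routed by the parent ID.
$productShard = pickShard('3', $numShards);

// Routing the same product by its own _id may pick a different shard.
$naiveShard = pickShard('9', $numShards);

printf("merchant 3         -> shard %d\n", $merchantShard);
printf("product 9 (parent) -> shard %d\n", $productShard);
printf("product 9 (own id) -> shard %d\n", $naiveShard);
```

Because the parent and the child feed the same routing value into the same function, the first two results are equal by construction; that is all the _routing requirement asks for.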

Inserting data

Next, use the bulk API to add three merchants first:
<del><span style="color: #808080;"><?php

require_once __DIR__ . "/vendor/autoload.php";

// Client
$client = Elasticsearch\ClientBuilder::fromConfig([
    'hosts' => ['localhost:9200', 'localhost:9201', 'localhost:9203'], // ideally, put an HAProxy reverse proxy in front of the ES cluster
    'retries' => 2
]);

$client->bulk([
    'index' => 'basic',
    'type' => 'merchant',
    'body' => [
        /* Record 1 */
        // index action + metadata
        ['index' => ['_id' => 1]],  // _id: merchant ID; _routing defaults to _id
        // request body
        [
            'merchant_name' => '鑫明明拉面',
            'merchant_score' => 4,
            'merchant_type' => '美食',
            'merchant_img' => 'http://merchant.com/1.jpg',
            'merchant_avg_price' => 2100,
            'merchant_location' => ['127', '128']
        ],


        /* Record 2 */
        // index action + metadata
        ['index' => ['_id' => 2]],
        // request body
        [
            'merchant_name' => '东方宫兰州拉面',
            'merchant_score' => 3,
            'merchant_type' => '美食',
            'merchant_img' => 'http://merchant.com/2.jpg',
            'merchant_avg_price' => 1800,
            'merchant_location' => ['120', '120']
        ],

        /* Record 3 */
        // index action + metadata
        ['index' => ['_id' => 3]],
        // request body
        [
            'merchant_name' => '开海饭店',
            'merchant_score' => 3,
            'merchant_type' => '美食',
            'merchant_img' => 'http://merchant.com/3.jpg',
            'merchant_avg_price' => 3500,
            'merchant_location' => ['50', '50']
        ],
    ]
]);</span></del>
Then add some products for each merchant:
<del><span style="color: #808080;">$client->bulk([
    'index' => 'basic',
    'type' => 'product',
    'body' => [
        // Products of merchant 1: set _parent=1 so they land in the same shard as that merchant
        // index action + metadata
        ['index' => ['_id' => 1, '_parent' => 1]],  // _id: product ID
        // request body
        [
            'product_name' => '牛肉拉面',
            'product_type' => '面食',
            'product_img' => 'http://product.com/1.jpg',
            'product_sold' => 10,
            'product_price' => 2100,
        ],
        // index action + metadata
        ['index' => ['_id' => 2, '_parent' => 1]],  // _id: product ID
        // request body
        [
            'product_name' => '羊肉烩面',
            'product_type' => '面食',
            'product_img' => 'http://product.com/2.jpg',
            'product_sold' => 11,
            'product_price' => 2200,
        ],
        // index action + metadata
        ['index' => ['_id' => 3, '_parent' => 1]],  // _id: product ID
        // request body
        [
            'product_name' => '烤羊肉串',
            'product_type' => '烤串',
            'product_img' => 'http://product.com/3.jpg',
            'product_sold' => 12,
            'product_price' => 2300,
        ],
        ///////// end of products for merchant 1

        // Products of merchant 2: set _parent=2 so they land in the same shard as that merchant
        // index action + metadata
        ['index' => ['_id' => 4, '_parent' => 2]],  // _id: product ID
        // request body
        [
            'product_name' => '牛肉炒面',
            'product_type' => '面食',
            'product_img' => 'http://product.com/4.jpg',
            'product_sold' => 10,
            'product_price' => 2400,
        ],
        // index action + metadata
        ['index' => ['_id' => 5, '_parent' => 2]],  // _id: product ID
        // request body
        [
            'product_name' => '蛋炒饭',
            'product_type' => '主食',
            'product_img' => 'http://product.com/5.jpg',
            'product_sold' => 10,
            'product_price' => 2300,
        ],
        // index action + metadata
        ['index' => ['_id' => 6, '_parent' => 2]],  // _id: product ID
        // request body
        [
            'product_name' => '羊肉汤',
            'product_type' => '汤粉',
            'product_img' => 'http://product.com/6.jpg',
            'product_sold' => 10,
            'product_price' => 2200,
        ],
        ///////// end of products for merchant 2

        // Products of merchant 3: set _parent=3 so they land in the same shard as that merchant
        // index action + metadata
        ['index' => ['_id' => 7, '_parent' => 3]],  // _id: product ID
        // request body
        [
            'product_name' => '海鲜炒饭',
            'product_type' => '主食',
            'product_img' => 'http://product.com/7.jpg',
            'product_sold' => 10,
            'product_price' => 2400,
        ],
        // index action + metadata
        ['index' => ['_id' => 8, '_parent' => 3]],  // _id: product ID
        // request body
        [
            'product_name' => '西红柿鸡蛋面',
            'product_type' => '面食',
            'product_img' => 'http://product.com/8.jpg',
            'product_sold' => 10,
            'product_price' => 2300,
        ],
        // index action + metadata
        ['index' => ['_id' => 9, '_parent' => 3]],  // _id: product ID
        // request body
        [
            'product_name' => '鸭血粉丝汤',
            'product_type' => '汤粉',
            'product_img' => 'http://product.com/9.jpg',
            'product_sold' => 10,
            'product_price' => 2200,
        ],
        ///////// end of products for merchant 3
    ]
]);</span></del>
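Things to note:
  • Typically, each merchant's ID comes from an auto-increment ID in MySQL, and so do product IDs; as the data above shows, the two sequences increment independently.
  • To satisfy the parent-child requirement, merchant documents are simply routed by merchant ID, whereas a product must not be routed by its own ID: its _routing should be the ID of the merchant it belongs to, so that hash(product._routing) == hash(merchant._id). We also do not have to pass _routing ourselves; specifying _parent is enough, and the related parent and child documents are automatically stored in the same shard.
  • The bulk API submits a batch of requests, each made of two lines: "metadata" + "request body". The metadata specifies the operation type, the type name and so on, while the request body carries the actual request parameters.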
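The "metadata + request body" pairing described in the notes above can be generated from plain rows instead of being written by hand. The following is a minimal sketch, assuming a hypothetical helper named buildBulkBody() (it is not part of elasticsearch-php) and a made-up row shape with 'id', 'parent' and 'source' keys.

```php
<?php

/**
 * Build the 'body' array of a bulk request.
 * Each row yields two entries: an index action with metadata, then the source.
 * ASSUMPTION: buildBulkBody() and the row shape are illustrative only.
 *
 * @param array $rows list of ['id' => int, 'parent' => int|null, 'source' => array]
 */
function buildBulkBody(array $rows): array
{
    $body = [];
    foreach ($rows as $row) {
        $meta = ['_id' => $row['id']];
        if (isset($row['parent'])) {
            // With _parent set, the child follows the parent's routing.
            $meta['_parent'] = $row['parent'];
        }
        $body[] = ['index' => $meta]; // entry 1: action + metadata
        $body[] = $row['source'];     // entry 2: request body
    }
    return $body;
}

$body = buildBulkBody([
    ['id' => 1, 'parent' => 1, 'source' => ['product_name' => '牛肉拉面', 'product_price' => 2100]],
    ['id' => 4, 'parent' => 2, 'source' => ['product_name' => '牛肉炒面', 'product_price' => 2400]],
]);

echo count($body), "\n"; // 2 rows produce 4 entries
```

The resulting array can be passed as the 'body' value of the same $client->bulk([...]) call used in this article, alongside 'index' => 'basic' and 'type' => 'product'.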

Filtering merchants with a parent-child query

Here is a simple filter: find which merchants sell "鸭血粉丝汤" (duck blood and vermicelli soup):
<del><span style="color: #808080;"><?php

require_once __DIR__ . "/vendor/autoload.php";

// Client
$client = Elasticsearch\ClientBuilder::fromConfig([
    'hosts' => ['localhost:9200', 'localhost:9201', 'localhost:9203'], // ideally, put an HAProxy reverse proxy in front of the ES cluster
    'retries' => 2
]);

// Text typed into the search box
$keyword = '鸭血粉丝汤';

$ret = $client->search([
    'index' => 'basic',
    'type' => 'merchant',
    'body' => [
        'query' => [
            'has_child' => [
                'type' => 'product',
                'score_mode' => 'max',
                'query' => [
                    'match' => [
                        'product_name' => $keyword,
                    ]
                ]
            ]
        ]
    ]
]);

print_r($ret);</span></del>
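Here score_mode means: when several products under the same merchant match the keyword, the score of the best-matching product is used as the merchant's score. The result is accurate: 开海饭店 sells 鸭血粉丝汤 and gets a _score above 10, while 东方宫兰州拉面 also shows up in the result set because it sells 羊肉汤, which matches the character 汤, but its _score is only about 0.93.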
<del><span style="color: #808080;">[work@df6c675da97e nuomi-search]$ php main.php
Array
(
    [took] => 18
    [timed_out] =>
    [_shards] => Array
        (
            [total] => 3
            [successful] => 3
            [failed] => 0
        )
 
    [hits] => Array
        (
            [total] => 2
            [max_score] => 10.727891
            [hits] => Array
                (
                    [0] => Array
                        (
                            [_index] => basic
                            [_type] => merchant
                            [_id] => 3
                            [_score] => 10.727891
                            [_source] => Array
                                (
                                    [merchant_name] => 开海饭店
                                    [merchant_score] => 3
                                    [merchant_type] => 美食
                                    [merchant_img] => http://merchant.com/3.jpg
                                    [merchant_avg_price] => 3500
                                    [merchant_location] => Array
                                        (
                                            [0] => 50
                                            [1] => 50
                                        )
 
                                )
 
                        )
 
                    [1] => Array
                        (
                            [_index] => basic
                            [_type] => merchant
                            [_id] => 2
                            [_score] => 0.93239146
                            [_source] => Array
                                (
                                    [merchant_name] => 东方宫兰州拉面
                                    [merchant_score] => 3
                                    [merchant_type] => 美食
                                    [merchant_img] => http://merchant.com/2.jpg
                                    [merchant_avg_price] => 1800
                                    [merchant_location] => Array
                                        (
                                            [0] => 120
                                            [1] => 120
                                        )
 
                                )
 
                        )
 
                )
 
        )
 
)</span></del>
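To make the role of score_mode in the query above more concrete, here is a simplified sketch of how several matching child scores could be folded into one parent score. combineChildScores() is a hypothetical function and the child scores are made-up numbers; the real scoring happens inside Elasticsearch, so treat this only as an illustration of the idea.

```php
<?php

/**
 * Fold the scores of a merchant's matching products into one merchant score.
 * ASSUMPTION: a simplified model of score_mode, not Elasticsearch's code.
 */
function combineChildScores(array $childScores, string $scoreMode): float
{
    if ($childScores === []) {
        throw new InvalidArgumentException('at least one matching child is required');
    }
    switch ($scoreMode) {
        case 'max':
            return (float) max($childScores);
        case 'min':
            return (float) min($childScores);
        case 'sum':
            return (float) array_sum($childScores);
        case 'avg':
            return array_sum($childScores) / count($childScores);
        default:
            throw new InvalidArgumentException("unsupported score_mode: $scoreMode");
    }
}

// Hypothetical scores of three matching products under one merchant.
$scores = [0.5, 2.0, 1.5];

foreach (['max', 'min', 'sum', 'avg'] as $mode) {
    printf("%s => %.4f\n", $mode, combineChildScores($scores, $mode));
}
```

With 'max', as used in this article, a merchant ranks by its single best-matching product, which is why one strong match on 鸭血粉丝汤 is enough to put 开海饭店 first.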
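But the biggest problem I hit with parent-child queries is the one visible in the example above: I could only use has_child to filter on the child documents and could not filter the parent documents at the same time. That limitation made the feature of little use to me, so I had to give up on parent-child documents!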
