Elasticsearch-PHP Chinese Documentation: Running Searches

waite · 2021-05-20 12:46:43 · Popularity: 27

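Search Operations

Well, it isn't called Elasticsearch for nothing! Let's look at search operations in the client.

The client follows the REST API's naming scheme as closely as possible, and it gives you access to every query and every parameter the REST API exposes. The examples below should help you get familiar with the syntax.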

Match Query

Here is a match query in standard curl format:

curl -XGET 'localhost:9200/my_index/my_type/_search' -d '{
    "query" : {
        "match" : {
            "testField" : "abc"
        }
    }
}'

Here is the same query built with the client:

$params = [
    'index' => 'my_index',
    'type' => 'my_type',
    'body' => [
        'query' => [
            'match' => [
                'testField' => 'abc'
            ]
        ]
    ]
];

$results = $client->search($params);

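Notice how the structure and layout of the PHP array mirror the JSON request body. This makes it straightforward to convert JSON examples into PHP. A quick way to check that a PHP array produces what you expect is to encode it as JSON and inspect the output: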

$params = [
    'index' => 'my_index',
    'type' => 'my_type',
    'body' => [
        'query' => [
            'match' => [
                'testField' => 'abc'
            ]
        ]
    ]
];

print_r(json_encode($params['body']));

{"query":{"match":{"testField":"abc"}}}
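The same check can be automated instead of compared by eye. The sketch below relies only on PHP's built-in JSON functions: it decodes the body from the curl example and compares it with the PHP array. The variable names are illustrative and not part of the client.

```php
<?php
// JSON body copied from the curl example.
$curlBody = '{
    "query" : {
        "match" : {
            "testField" : "abc"
        }
    }
}';

// The same query written as a PHP array.
$body = [
    'query' => [
        'match' => [
            'testField' => 'abc'
        ]
    ]
];

// Decode into associative arrays so both sides have the same shape.
$decoded = json_decode($curlBody, true);

// == compares key/value pairs recursively and ignores key order
// (values are compared loosely).
$matches = ($decoded == $body);

echo $matches ? "match\n" : "mismatch\n";

// Pretty-printed output is easier to read next to the curl body.
echo json_encode($body, JSON_PRETTY_PRINT), "\n";
```

Comparing decoded arrays rather than raw strings avoids false mismatches caused by whitespace or key order.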

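Using Raw JSON

Raw JSON is sometimes convenient for testing, or when migrating from another system. You can pass a raw JSON string as the body, and the client will detect this automatically: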

$json = '{
    "query" : {
        "match" : {
            "testField" : "abc"
        }
    }
}';

$params = [
    'index' => 'my_index',
    'type' => 'my_type',
    'body' => $json
];

$results = $client->search($params);
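A hand-written JSON string is easy to break with a stray comma or a missing brace. As a minimal sketch, the string can be validated locally with PHP's json_decode and json_last_error before it is handed to the client. The helper name isValidJson is hypothetical and not part of the client.

```php
<?php
// Hypothetical helper: returns true when $json is syntactically valid JSON.
function isValidJson(string $json): bool
{
    json_decode($json);
    return json_last_error() === JSON_ERROR_NONE;
}

$good = '{"query": {"match": {"testField": "abc"}}}';
$bad  = '{"query": {"match": {"testField": "abc"}}';  // missing closing brace

var_dump(isValidJson($good));
var_dump(isValidJson($bad));
```

This only checks syntax. Whether the query itself is meaningful is still decided by Elasticsearch.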

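Search results have the same structure as the Elasticsearch response. The only difference is that the JSON is deserialized into a PHP array, so working with the results is as simple as reading array values: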

$params = [
    'index' => 'my_index',
    'type' => 'my_type',
    'body' => [
        'query' => [
            'match' => [
                'testField' => 'abc'
            ]
        ]
    ]
];

$results = $client->search($params);

$milliseconds = $results['took'];
$maxScore     = $results['hits']['max_score'];

$score = $results['hits']['hits'][0]['_score'];
$doc   = $results['hits']['hits'][0]['_source'];
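Reading index 0 directly assumes that at least one document matched. A slightly more defensive sketch iterates over the hits instead. The $results array here is hand-written sample data shaped like a search response, so the snippet runs without a cluster; the IDs and scores are made up for illustration.

```php
<?php
// Hand-written sample shaped like a search response (illustrative data only).
$results = [
    'took' => 3,
    'hits' => [
        'max_score' => 1.2,
        'hits' => [
            ['_id' => 'doc1', '_score' => 1.2, '_source' => ['testField' => 'abc']],
            ['_id' => 'doc2', '_score' => 0.7, '_source' => ['testField' => 'abc def']],
        ],
    ],
];

// Fall back to an empty list so the loop is safe when nothing matched.
$hits = $results['hits']['hits'] ?? [];

$fields = [];
foreach ($hits as $hit) {
    // Collect the field value of every hit, keyed by document ID.
    $fields[$hit['_id']] = $hit['_source']['testField'];
    printf("%s (score %.1f): %s\n", $hit['_id'], $hit['_score'], $hit['_source']['testField']);
}
```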

Bool Queries

Bool queries are easy to build with the client. For example, this query:

curl -XGET 'localhost:9200/my_index/my_type/_search' -d '{
    "query" : {
        "bool" : {
            "must": [
                {
                    "match" : { "testField" : "abc" }
                },
                {
                    "match" : { "testField2" : "xyz" }
                }
            ]
        }
    }
}'

would be structured like this (note the position of the square brackets):

$params = [
    'index' => 'my_index',
    'type' => 'my_type',
    'body' => [
        'query' => [
            'bool' => [
                'must' => [
                    [ 'match' => [ 'testField' => 'abc' ] ],
                    [ 'match' => [ 'testField2' => 'xyz' ] ],
                ]
            ]
        ]
    ]
];

$results = $client->search($params);

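Notice that the must clause takes an array of arrays. It is serialized into a JSON array of objects, so the request sent to Elasticsearch is identical to the one in the curl example. For more about how PHP arrays and objects map to JSON, see Dealing with JSON Arrays and Objects in PHP.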
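The array-versus-object distinction can be seen with json_encode alone. The following sketch shows how a list, an associative array, and an empty array are each encoded, and why an empty JSON object needs stdClass.

```php
<?php
// A list (sequential integer keys starting at 0) becomes a JSON array.
$must = [
    ['match' => ['testField' => 'abc']],
    ['match' => ['testField2' => 'xyz']],
];
$mustJson = json_encode(['must' => $must]);
echo $mustJson, "\n";

// An associative array becomes a JSON object.
$objectJson = json_encode(['match' => ['testField' => 'abc']]);
echo $objectJson, "\n";

// An empty PHP array encodes as [], not {}. Use stdClass when an
// empty JSON object is required, for example with match_all.
$emptyArrayJson  = json_encode(['match_all' => []]);
$emptyObjectJson = json_encode(['match_all' => new \stdClass()]);
echo $emptyArrayJson, "\n";
echo $emptyObjectJson, "\n";
```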

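A More Complicated Example

Let's build a slightly more complicated example: a bool query that contains both a filter and a regular query. This pattern is very common in Elasticsearch queries, so it makes a good demonstration.

The query in curl format: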

curl -XGET 'localhost:9200/my_index/my_type/_search' -d '{
    "query" : {
        "bool" : {
            "filter" : {
                "term" : { "my_field" : "abc" }
            },
            "should" : {
                "match" : { "my_other_field" : "xyz" }
            }
        }
    }
}'

And in PHP:

$params = [
    'index' => 'my_index',
    'type' => 'my_type',
    'body' => [
        'query' => [
            'bool' => [
                'filter' => [
                    'term' => [ 'my_field' => 'abc' ]
                ],
                'should' => [
                    'match' => [ 'my_other_field' => 'xyz' ]
                ]
            ]
        ]
    ]
];

$results = $client->search($params);
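Each bool clause accepts either a single query, as above, or a list of queries. As a sketch of the list form, the body below uses two filter clauses. The field my_third_field is hypothetical and only there to illustrate a second filter. Encoding the body shows the JSON that would be sent.

```php
<?php
$body = [
    'query' => [
        'bool' => [
            // Two filter clauses: a document must satisfy both,
            // and neither contributes to the relevance score.
            'filter' => [
                ['term' => ['my_field' => 'abc']],
                ['term' => ['my_third_field' => 'def']],  // hypothetical field
            ],
            // should clauses influence scoring.
            'should' => [
                ['match' => ['my_other_field' => 'xyz']],
            ],
        ],
    ],
];

$json = json_encode($body);
echo $json, "\n";
```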

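Scrolling

When reading data in bulk, the scrolling feature is commonly used to page through many documents, for example to export all documents belonging to one user. Scrolling is more efficient than a regular search because it does not have to perform the expensive work of ordering the documents.

Scrolling keeps a point-in-time snapshot of the index and pages over that snapshot. Paging stays consistent even while documents are being indexed, updated, or deleted in the background. To use it, first send a search request with the scroll parameter set. The response contains a "page" of documents together with a scroll_id, which is used to fetch the following pages of hits.

See the official documentation for more details.

The following is a more in-depth example: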

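require 'vendor/autoload.php';   // assumes the client was installed with Composer

use Elasticsearch\ClientBuilder; // assumes a client version up to 7.x

$client = ClientBuilder::create()->build();
$params = [
    "scroll" => "30s",          // how long to keep the scroll context alive between requests; keep it short
    "size" => 50,               // number of documents returned per batch
    "index" => "my_index",
    "body" => [
        "query" => [
            "match_all" => new \stdClass()
        ]
    ]
];

// Execute the search.
// The response contains the first batch of documents and a scroll_id.
$response = $client->search($params);

// Keep looping until a batch comes back empty
while (isset($response['hits']['hits']) && count($response['hits']['hits']) > 0) {

    // **
    // Do your work here, on the $response['hits']['hits'] array
    // **

    // When done, get the new scroll_id.
    // It can change between requests, so always use the latest one.
    $scroll_id = $response['_scroll_id'];

    // Execute the next scroll request
    $response = $client->scroll([
        "scroll_id" => $scroll_id,  // the _scroll_id from the previous response
        "scroll" => "30s"           // same keep-alive window as before
    ]);
}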
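The paging loop can be hidden behind a generator so that calling code only sees a stream of hits. The sketch below is one possible way to do that, not part of the client: scrollHits is a hypothetical helper, and FakeScrollClient is a hypothetical in-memory stand-in that mimics only the search() and scroll() calls used above, so the example runs without a cluster.

```php
<?php
// Hypothetical stand-in for the real client. It serves pre-built pages
// of hits through search() and scroll(), ignoring the request parameters.
class FakeScrollClient
{
    private $pages;
    private $cursor = 0;

    public function __construct(array $pages)
    {
        $this->pages = $pages;
    }

    public function search(array $params): array
    {
        $this->cursor = 0;
        return $this->page();
    }

    public function scroll(array $params): array
    {
        $this->cursor++;
        return $this->page();
    }

    private function page(): array
    {
        // Past the last page, return an empty batch like an exhausted scroll.
        return [
            '_scroll_id' => 'fake-scroll-id',
            'hits' => ['hits' => $this->pages[$this->cursor] ?? []],
        ];
    }
}

// Hypothetical helper: yields every hit, one at a time, across all batches.
function scrollHits($client, array $params, string $keepAlive = '30s'): \Generator
{
    // Add the scroll parameter unless the caller already set one.
    $response = $client->search($params + ['scroll' => $keepAlive]);

    while (!empty($response['hits']['hits'])) {
        foreach ($response['hits']['hits'] as $hit) {
            yield $hit;
        }
        // Always continue with the scroll_id from the latest response.
        $response = $client->scroll([
            'scroll_id' => $response['_scroll_id'],
            'scroll'    => $keepAlive,
        ]);
    }
}

$client = new FakeScrollClient([
    [['_id' => 'a'], ['_id' => 'b']],
    [['_id' => 'c']],
]);

$ids = [];
foreach (scrollHits($client, ['index' => 'my_index', 'size' => 2]) as $hit) {
    $ids[] = $hit['_id'];
}
echo implode(',', $ids), "\n";
```

With a real client, the same helper would be called with the client built by ClientBuilder in place of the fake one.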
