elasticsearch-scroll

Category: Backend · Published: 6 years ago


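I ran into a nasty problem: some MySQL data had been lost.

There were two ways to get it back. One was to ask the DBA to dig up the data from half a year ago. The other was to pull it out of Elasticsearch (ES), which held a copy of the MySQL data.

There were more than 10,000 records, and ES caps a single query at 10,000 results (the `index.max_result_window` setting), so fetching the data with the scroll API seemed the better choice.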

1 Elasticsearch 2.x

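1.1 Check how many documents the index holds

```shell
curl -XGET 'http://127.0.0.1:9400/dev_index1_20190118/_count?pretty'
```

Note that `localhost:9200/_nodes/stats/indices/search?pretty` does not return a document count: it reports per-node search statistics, including `scroll_current`, the number of scroll contexts currently open.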

1.2 Look at the index data

```shell
curl -XGET 'http://127.0.0.1:9400/dev_index1_20190118/docs/_search?pretty'
```

A plain search returns `hits.total` and the first 10 documents.

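1.3 Open a scroll (cursor)

```shell
curl -XGET 'http://127.0.0.1:9400/dev_index1_20190118/docs/_search?scroll=10m' -d '
{
    "query": { "match_all": {}},
    "sort" : ["_doc"],
    "size":  10000
}' >> es_scroll_data_20190118_1w.txt
```

The response includes a `_scroll_id`, which the next request must send back.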

1.4 Keep fetching the next page

Follow-up requests go to the `_search/scroll` endpoint, not to `_search`. Their body holds `scroll`, which renews the keep-alive for another 10 minutes, and the `_scroll_id` from the most recent response. Repeat until a response comes back with an empty `hits.hits` array.

```shell
curl -XGET 'http://127.0.0.1:9400/_search/scroll' -d '
{
    "scroll": "10m",
    "scroll_id" : "DnF1ZXJ5VGhlbkZldGNoBQAAAAAANKLTFjdfVno3Y2hGU1BpOUYtMW54OTV3TEEAAAAAADSi1BY3X1Z6N2NoRlNQaTlGLTFueDk1d0xBAAAAAAA0otYWN19WejdjaEZTUGk5Ri0xbng5NXdMQQAAAAAANKLVFjdfVno3Y2hGU1BpOUYtMW54OTV3TEEAAAAAADvJpxZzcU9YSExnLVRTNk5RY3JfMlNuWU9n"
}' >> es_scroll_data_20190118_2w.txt
# Repeat with the _scroll_id returned by the latest response
curl -XGET 'http://127.0.0.1:9400/_search/scroll' -d '
{
    "scroll": "10m",
    "scroll_id" : "DnF1ZXJ5VGhlbkZldGNoBQAAAAAANKLTFjdfVno3Y2hGU1BpOUYtMW54OTV3TEEAAAAAADSi1BY3X1Z6N2NoRlNQaTlGLTFueDk1d0xBAAAAAAA0otYWN19WejdjaEZTUGk5Ri0xbng5NXdMQQAAAAAANKLVFjdfVno3Y2hGU1BpOUYtMW54OTV3TEEAAAAAADvJpxZzcU9YSExnLVRTNk5RY3JfMlNuWU9n"
}' >> es_scroll_data_20190118_3w.txt
```

2 Elasticsearch 5.6.x

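2.1 Check the index

```shell
curl -XGET 'http://127.0.0.1:9400/dev_index1_20190118/_count?pretty'
curl -XGET 'http://127.0.0.1:9400/dev_index1_20190118/docs/_search?pretty'
```

The first command returns the document count (`localhost:9200/_nodes/stats/indices/search?pretty` only reports per-node search statistics); the second returns `hits.total` and the first 10 documents.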

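2.2 Open a scroll (cursor)

```shell
curl -XGET 'http://127.0.0.1:9400/dev_index1_20190118/docs/_search?scroll=10m' -H 'Content-Type: application/json' -d '
{
    "query": { "match_all": {}},
    "sort" : ["_doc"],
    "size":  10000
}' >> es_scroll_data_20190118_1w.txt
```

ES 5.x expects an explicit `Content-Type` header on requests with a body and logs a deprecation warning without it.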

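2.3 Keep fetching the next page

As in 2.x, follow-up requests go to `_search/scroll` with `scroll` and the latest `_scroll_id` in the body. Sending the same body to `_search` produces the errors in 3.1 and 3.2.

```shell
curl -XGET 'http://127.0.0.1:9400/_search/scroll' -H 'Content-Type: application/json' -d '
{
    "scroll": "10m",
    "scroll_id" : "DnF1ZXJ5VGhlbkZldGNoBQAAAAAANKLTFjdfVno3Y2hGU1BpOUYtMW54OTV3TEEAAAAAADSi1BY3X1Z6N2NoRlNQaTlGLTFueDk1d0xBAAAAAAA0otYWN19WejdjaEZTUGk5Ri0xbng5NXdMQQAAAAAANKLVFjdfVno3Y2hGU1BpOUYtMW54OTV3TEEAAAAAADvJpxZzcU9YSExnLVRTNk5RY3JfMlNuWU9n"
}' >> es_scroll_data_20190118_2w.txt
# Repeat with the _scroll_id returned by the latest response
curl -XGET 'http://127.0.0.1:9400/_search/scroll' -H 'Content-Type: application/json' -d '
{
    "scroll": "10m",
    "scroll_id" : "DnF1ZXJ5VGhlbkZldGNoBQAAAAAANKLTFjdfVno3Y2hGU1BpOUYtMW54OTV3TEEAAAAAADSi1BY3X1Z6N2NoRlNQaTlGLTFueDk1d0xBAAAAAAA0otYWN19WejdjaEZTUGk5Ri0xbng5NXdMQQAAAAAANKLVFjdfVno3Y2hGU1BpOUYtMW54OTV3TEEAAAAAADvJpxZzcU9YSExnLVRTNk5RY3JfMlNuWU9n"
}' >> es_scroll_data_20190118_3w.txt
```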
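The manual next-page requests above can be automated. The following is a minimal sketch, not the author's script, written against the 5.6 requests above. It assumes a POSIX shell with curl, the host and index used in this post, and compact (non-`pretty`) JSON responses, which lets `grep` and `cut` pick out the fields without jq. The function names `es_request`, `extract_scroll_id`, `has_hits` and `scroll_dump` are hypothetical. The sketch opens the scroll, keeps calling `_search/scroll` with the latest `_scroll_id` until a page comes back without hits, then clears the scroll context instead of waiting for it to expire.

```shell
# Minimal scroll-export sketch (hypothetical helper names, not an official tool).
# Assumes curl, a reachable cluster at ES_URL, and compact one-line JSON
# responses (no ?pretty), which the grep/cut parsing below relies on.
ES_URL=${ES_URL:-http://127.0.0.1:9400}

# Send one request: es_request METHOD PATH JSON_BODY
es_request() {
  curl -sS -X "$1" -H 'Content-Type: application/json' "${ES_URL}$2" -d "$3"
}

# Print the first "_scroll_id" value in a response (the top-level one,
# which ES writes before the hits).
extract_scroll_id() {
  printf '%s\n' "$1" | grep -o '"_scroll_id":"[^"]*"' | head -n 1 | cut -d '"' -f 4
}

# Succeed if the response's hits array contains at least one document.
has_hits() {
  printf '%s\n' "$1" | grep -q '"hits":\[{'
}

# scroll_dump INDEX OUTFILE [BATCH_SIZE]: append one raw JSON page per line to OUTFILE.
scroll_dump() {
  local index=$1 out=$2 size=${3:-10000} resp= scroll_id=
  # Open the scroll context with a 10-minute keep-alive.
  resp=$(es_request GET "/${index}/_search?scroll=10m" \
    "{\"query\":{\"match_all\":{}},\"sort\":[\"_doc\"],\"size\":${size}}") || return 1
  while :; do
    case $resp in
      ''|'{"error"'*) printf 'scroll failed: %s\n' "$resp" >&2; return 1 ;;
    esac
    scroll_id=$(extract_scroll_id "$resp")
    has_hits "$resp" || break            # empty page: every document has been read
    printf '%s\n' "$resp" >> "$out"
    # Next page: always send the latest scroll id; "scroll" renews the keep-alive.
    resp=$(es_request GET /_search/scroll \
      "{\"scroll\":\"10m\",\"scroll_id\":\"${scroll_id}\"}") || return 1
  done
  # Release the search context now rather than letting it time out.
  if [ -n "$scroll_id" ]; then
    es_request DELETE /_search/scroll "{\"scroll_id\":[\"${scroll_id}\"]}" > /dev/null
  fi
  return 0
}
```

Usage: `scroll_dump dev_index1_20190118 es_scroll_data_20190118.txt`. Like the manual `>>` commands, each output line is a raw page of results, so the documents still have to be extracted from `hits.hits` afterwards.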




3 Problems encountered

3.1 Unknown key for a VALUE_STRING in [scroll_id].

```json
{
    "error": {
        "root_cause": [
            {
                "type": "parsing_exception",
                "reason": "Unknown key for a VALUE_STRING in [scroll_id].",
                "line": 3,
                "col": 19
            }
        ],
        "type": "parsing_exception",
        "reason": "Unknown key for a VALUE_STRING in [scroll_id].",
        "line": 3,
        "col": 19
    },
    "status": 400
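}
```

Cause: the continuation request was sent to `_search` instead of `_search/scroll`. The `_search` body parser does not know a `scroll_id` key, hence the parsing exception. Send continuation requests to `_search/scroll`, using the `_scroll_id` returned by the most recent response.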

3.2 Unknown key for a VALUE_STRING in [scroll]

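```json
{
    "error": {
        "root_cause": [
            {
                "type": "parsing_exception",
                "reason": "Unknown key for a VALUE_STRING in [scroll].",
                "line": 3,
                "col": 15
            }
        ],
        "type": "parsing_exception",
        "reason": "Unknown key for a VALUE_STRING in [scroll].",
        "line": 3,
        "col": 15
    },
    "status": 400
}
```

Cause: the same mistake as 3.1. The second request put a `scroll` key in the body of a `_search` request, where it is not allowed; `scroll` is only valid in the body of a `_search/scroll` request (or as a URL parameter). Removing `scroll` from the body only moves the failure to the next unknown key, `scroll_id` (3.1); the real fix is to use the `_search/scroll` endpoint.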

3.3 Batch size is too large, size must be less than or equal to: [10000] but was [1000000]. Scroll batch sizes cost as much memory as result windows so they are controlled by the [index.max_result_window] index level setting.

```json
{
    "error": {
        "root_cause": [
            {
                "type": "query_phase_execution_exception",
                "reason": "Batch size is too large, size must be less than or equal to: [10000] but was [1000000]. Scroll batch sizes cost as much memory as result windows so they are controlled by the [index.max_result_window] index level setting."
            }
        ],
        "type": "search_phase_execution_exception",
        "reason": "all shards failed",
        "phase": "query",
        "grouped": true,
        "failed_shards": [
            {
                "shard": 0,
                "index": "dev_index1_20190118",
                "node": "8XqKY198S823M78QA43F8g",
                "reason": {
                    "type": "query_phase_execution_exception",
                    "reason": "Batch size is too large, size must be less than or equal to: [10000] but was [1000000]. Scroll batch sizes cost as much memory as result windows so they are controlled by the [index.max_result_window] index level setting."
                }
            }
        ]
    },
    "status": 500
}
```

Cause: the requested `size` (1,000,000) exceeds 10,000, the value of the index-level setting `index.max_result_window` (10,000 is its default). Keep `size` at 10,000 or below and let the scroll page through the rest, or raise the setting on the index.
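If larger batches really are needed, `index.max_result_window` is a dynamic index setting, so it can be raised per index through the index settings API without a restart. The following is a minimal sketch, assuming curl and the host and index used in this post; `set_max_result_window` is a hypothetical helper name. As the error message says, bigger batches cost correspondingly more memory, so keeping `size` at 10,000 or below and scrolling is usually the safer fix.

```shell
# Hypothetical helper: raise index.max_result_window on one index.
# Assumes curl and a cluster at ES_URL; the setting is dynamic, so no restart.
ES_URL=${ES_URL:-http://127.0.0.1:9400}

# set_max_result_window INDEX VALUE
set_max_result_window() {
  local index=$1 value=$2
  # Reject anything that is not a plain whole number before calling the cluster.
  case $value in
    ''|*[!0-9]*) echo "max_result_window must be a whole number" >&2; return 2 ;;
  esac
  curl -sS -X PUT "${ES_URL}/${index}/_settings" \
    -H 'Content-Type: application/json' \
    -d "{\"index\":{\"max_result_window\":${value}}}"
}
```

Usage: `set_max_result_window dev_index1_20190118 20000`.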

3.4 search_context_missing_exception

```json
{
    "error": {
        "root_cause": [
            {
                "type": "search_context_missing_exception",
                "reason": "No search context found for id [3540965]"
            },
            {
                "type": "search_context_missing_exception",
                "reason": "No search context found for id [3922089]"
            },
            {
                "type": "search_context_missing_exception",
                "reason": "No search context found for id [3454995]"
            },
            {
                "type": "search_context_missing_exception",
                "reason": "No search context found for id [3454996]"
            },
            {
                "type": "search_context_missing_exception",
                "reason": "No search context found for id [3454994]"
            }
        ],
        "type": "search_phase_execution_exception",
        "reason": "all shards failed",
        "phase": "query",
        "grouped": true,
        "failed_shards": [
            {
                "shard": -1,
                "index": null,
                "reason": {
                    "type": "search_context_missing_exception",
                    "reason": "No search context found for id [3540965]"
                }
            },
            {
                "shard": -1,
                "index": null,
                "reason": {
                    "type": "search_context_missing_exception",
                    "reason": "No search context found for id [3922089]"
                }
            },
            {
                "shard": -1,
                "index": null,
                "reason": {
                    "type": "search_context_missing_exception",
                    "reason": "No search context found for id [3454995]"
                }
            },
            {
                "shard": -1,
                "index": null,
                "reason": {
                    "type": "search_context_missing_exception",
                    "reason": "No search context found for id [3454996]"
                }
            },
            {
                "shard": -1,
                "index": null,
                "reason": {
                    "type": "search_context_missing_exception",
                    "reason": "No search context found for id [3454994]"
                }
            }
        ],
        "caused_by": {
            "type": "search_context_missing_exception",
            "reason": "No search context found for id [3454994]"
        }
    },
    "status": 404
}
```

Cause: the scroll had timed out. Once its keep-alive (`scroll=10m`) passed without a follow-up request, ES deleted the search context automatically. Each scroll request renews the keep-alive, so it only needs to cover the time spent processing one batch; an expired scroll cannot be resumed and has to be reopened from the first request.
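Because an expired scroll cannot be resumed, an export script needs to tell this error apart from the normal end of the scroll and then start over with a fresh scroll. The following is a minimal sketch of that check, assuming compact one-line JSON responses; `scroll_status` is a hypothetical helper name, and matching on the exception type string is a simplification rather than a full JSON parse.

```shell
# Hypothetical helper: classify one compact (single-line) scroll response.
# Prints: page (hits returned), done (empty page), expired (context gone), error.
scroll_status() {
  case $1 in
    '{"error"'*search_context_missing_exception*) echo expired ;;  # keep-alive ran out: reopen the scroll
    ''|'{"error"'*) echo error ;;                                  # no response or any other failure
    *'"hits":[{'*) echo page ;;                                    # at least one document in this batch
    *) echo done ;;                                                # empty hits array: nothing left
  esac
}
```

A loop can restart from the initial `_search?scroll=10m` request when `scroll_status "$resp"` prints `expired`, and stop cleanly on `done`.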
