elasticsearch-scroll

Category: Backend · Published: 6 years ago

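I ran into a rather painful problem: data was lost from MySQL.

There were two ways out. One was to ask the DBA to dig up the data from six months ago. The other was to look for it in ES, which held a copy of the MySQL data.

There were more than 10,000 records and ES caps a single query at 10,000 results, so fetching the data with scroll looked like the better option.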

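1 Elasticsearch 2.x

1.1 Check the index and how much data it holds

Node-level search statistics:

curl -XGET 'http://localhost:9200/_nodes/stats/indices/search?pretty'

A plain search against the index; hits.total in the response is the number of matching documents:

curl -XGET 'http://127.0.0.1:9400/dev_index1_20190118/docs/_search?pretty'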
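The document count decides how many scroll batches will be needed. Below is a minimal sketch of that arithmetic. It assumes the count is read from the _count endpoint (part of the Elasticsearch API, though not used in the commands above) and a batch size of 10000. extract_count and pages_needed are hypothetical helper names.

```shell
# Hypothetical helper: print the "count" value from a _count response on stdin.
extract_count() {
    tr -d '\n' | sed -n 's/.*"count"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Hypothetical helper: batches needed for $1 documents at $2 documents per batch
# (integer ceiling division).
pages_needed() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# Usage (not run here; it needs a reachable cluster):
#   total=$(curl -s 'http://127.0.0.1:9400/dev_index1_20190118/docs/_count' | extract_count)
#   pages_needed "$total" 10000
```

For 25000 documents at 10000 per batch, pages_needed prints 3, which matches the three output files written in the steps that follow.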

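1.2 Open a scroll cursor

The scroll=10m parameter keeps the search context alive for 10 minutes. Sorting by _doc is the cheapest order when the goal is to export every document.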

curl -XGET 'http://127.0.0.1:9400/dev_index1_20190118/docs/_search?scroll=10m' -d ' 
{ 
    "query": { "match_all": {}},
    "sort" : ["_doc"], 
    "size":  10000
}'  >> es_scroll_data_20190118_1w.txt

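1.3 Keep fetching the next page

Each response carries a _scroll_id. Pass the one from the most recent response to the scroll endpoint, and stop when hits.hits comes back empty.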

curl -XGET 'http://127.0.0.1:9400/_search/scroll?scroll=10m' -d '
{ 
    "scroll": "10m",
    "scroll_id" : "DnF1ZXJ5VGhlbkZldGNoBQAAAAAANKLTFjdfVno3Y2hGU1BpOUYtMW54OTV3TEEAAAAAADSi1BY3X1Z6N2NoRlNQaTlGLTFueDk1d0xBAAAAAAA0otYWN19WejdjaEZTUGk5Ri0xbng5NXdMQQAAAAAANKLVFjdfVno3Y2hGU1BpOUYtMW54OTV3TEEAAAAAADvJpxZzcU9YSExnLVRTNk5RY3JfMlNuWU9n"
}' >> es_scroll_data_20190118_2w.txt
curl -XGET 'http://127.0.0.1:9400/_search/scroll?scroll=10m' -d '
{ 
    "scroll": "10m",
    "scroll_id" : "DnF1ZXJ5VGhlbkZldGNoBQAAAAAANKLTFjdfVno3Y2hGU1BpOUYtMW54OTV3TEEAAAAAADSi1BY3X1Z6N2NoRlNQaTlGLTFueDk1d0xBAAAAAAA0otYWN19WejdjaEZTUGk5Ri0xbng5NXdMQQAAAAAANKLVFjdfVno3Y2hGU1BpOUYtMW54OTV3TEEAAAAAADvJpxZzcU9YSExnLVRTNk5RY3JfMlNuWU9n"
}' >> es_scroll_data_20190118_3w.txt
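Copying the scroll_id by hand for every page gets tedious, so the same steps can be wrapped in a loop. This is only a sketch under stated assumptions: follow-up requests go to the scroll endpoint /_search/scroll, the _scroll_id is pulled out with sed rather than a JSON parser, and ES_URL, KEEP_ALIVE, extract_scroll_id, page_is_empty and scroll_all are hypothetical names.

```shell
# Hypothetical settings; adjust to the cluster in question.
ES_URL="${ES_URL:-http://127.0.0.1:9400}"
KEEP_ALIVE="${KEEP_ALIVE:-10m}"

# Print the _scroll_id found in a search response read from stdin.
extract_scroll_id() {
    tr -d '\n' | sed -n 's/.*"_scroll_id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Succeed when the response on stdin contains an empty hits array.
page_is_empty() {
    tr -d '\n' | grep -q '"hits"[[:space:]]*:[[:space:]]*\[[[:space:]]*]'
}

# Fetch pages from index path $1 until one comes back empty,
# appending every non-empty response to file $2.
scroll_all() {
    index_path="$1"
    out_file="$2"
    page=$(curl -s -XGET "$ES_URL/$index_path/_search?scroll=$KEEP_ALIVE" \
        -d '{"query":{"match_all":{}},"sort":["_doc"],"size":10000}')
    while ! printf '%s' "$page" | page_is_empty; do
        scroll_id=$(printf '%s' "$page" | extract_scroll_id)
        # No _scroll_id means an error response or a failed request: stop.
        [ -n "$scroll_id" ] || return 1
        printf '%s\n' "$page" >> "$out_file"
        page=$(curl -s -XGET "$ES_URL/_search/scroll" \
            -d "{\"scroll\":\"$KEEP_ALIVE\",\"scroll_id\":\"$scroll_id\"}")
    done
}
```

A call such as scroll_all dev_index1_20190118/docs es_scroll_data_20190118.txt would then write one response per line. The text-based checks are deliberately crude; a real JSON parser would be more robust.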

2 Elasticsearch 5.6.x

2.1 Query index information

curl -XGET 'http://localhost:9200/_nodes/stats/indices/search?pretty'

curl -XGET 'http://127.0.0.1:9400/dev_index1_20190118/docs/_search?pretty'

2.2 Open a scroll cursor

curl -XGET 'http://127.0.0.1:9400/dev_index1_20190118/docs/_search?scroll=10m' -d ' 
{ 
    "query": { "match_all": {}},
    "sort" : ["_doc"], 
    "size":  10000
}'  >> es_scroll_data_20190118_1w.txt

2.3 Keep fetching the next page

curl -XGET 'http://127.0.0.1:9400/_search/scroll?scroll=10m' -d '
{ 
    "scroll_id" : "DnF1ZXJ5VGhlbkZldGNoBQAAAAAANKLTFjdfVno3Y2hGU1BpOUYtMW54OTV3TEEAAAAAADSi1BY3X1Z6N2NoRlNQaTlGLTFueDk1d0xBAAAAAAA0otYWN19WejdjaEZTUGk5Ri0xbng5NXdMQQAAAAAANKLVFjdfVno3Y2hGU1BpOUYtMW54OTV3TEEAAAAAADvJpxZzcU9YSExnLVRTNk5RY3JfMlNuWU9n"
}' >> es_scroll_data_20190118_2w.txt
curl -XGET 'http://127.0.0.1:9400/_search/scroll?scroll=10m' -d '
{ 
    "scroll_id" : "DnF1ZXJ5VGhlbkZldGNoBQAAAAAANKLTFjdfVno3Y2hGU1BpOUYtMW54OTV3TEEAAAAAADSi1BY3X1Z6N2NoRlNQaTlGLTFueDk1d0xBAAAAAAA0otYWN19WejdjaEZTUGk5Ri0xbng5NXdMQQAAAAAANKLVFjdfVno3Y2hGU1BpOUYtMW54OTV3TEEAAAAAADvJpxZzcU9YSExnLVRTNk5RY3JfMlNuWU9n"
}' >> es_scroll_data_20190118_3w.txt
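When the export is finished, the search context can be released explicitly instead of waiting for the keep-alive to run out. The sketch below assumes the clear-scroll endpoint DELETE /_search/scroll with the scroll_id passed as an array in the body. ES_URL, clear_scroll_body and clear_scroll are hypothetical names.

```shell
# Hypothetical helper: build the clear-scroll request body for scroll id $1.
clear_scroll_body() {
    printf '{"scroll_id":["%s"]}' "$1"
}

# Hypothetical helper: release the search context behind scroll id $1.
# ES_URL is an assumed variable; it falls back to the host used in this post.
clear_scroll() {
    curl -s -XDELETE "${ES_URL:-http://127.0.0.1:9400}/_search/scroll" \
        -d "$(clear_scroll_body "$1")"
}

# Usage (not run here; it needs a reachable cluster):
#   clear_scroll "$last_scroll_id"
```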

3 Problems encountered

3.1 Unknown key for a VALUE_STRING in [scroll_id]

{
    "error": {
        "root_cause": [
            {
                "type": "parsing_exception",
                "reason": "Unknown key for a VALUE_STRING in [scroll_id].",
                "line": 3,
                "col": 19
            }
        ],
        "type": "parsing_exception",
        "reason": "Unknown key for a VALUE_STRING in [scroll_id].",
        "line": 3,
        "col": 19
    },
    "status": 400
}

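This was first put down to the second request using a scroll_id that differed from the one the first request returned. The message itself, though, comes from the search body parser: ES 5.x answers this way when a body containing scroll_id is sent to /_search instead of the scroll endpoint /_search/scroll, so check the request URL first.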

3.2 Unknown key for a VALUE_STRING in [scroll]

{
    "error": {
        "root_cause": [
            {
                "type": "parsing_exception",
                "reason": "Unknown key for a VALUE_STRING in [scroll].",
                "line": 3,
                "col": 15
            }
        ],
        "type": "parsing_exception",
        "reason": "Unknown key for a VALUE_STRING in [scroll].",
        "line": 3,
        "col": 15
    },
    "status": 400
}

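At the time, the fix was to drop the extra scroll field from the body of the second request. The message comes from the search body parser at /_search, which does not accept scroll as a body key. The scroll endpoint /_search/scroll accepts both scroll and scroll_id in the body.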

3.3 Batch size is too large

{
    "error": {
        "root_cause": [
            {
                "type": "query_phase_execution_exception",
                "reason": "Batch size is too large, size must be less than or equal to: [10000] but was [1000000]. Scroll batch sizes cost as much memory as result windows so they are controlled by the [index.max_result_window] index level setting."
            }
        ],
        "type": "search_phase_execution_exception",
        "reason": "all shards failed",
        "phase": "query",
        "grouped": true,
        "failed_shards": [
            {
                "shard": 0,
                "index": "dev_index1_20190118",
                "node": "8XqKY198S823M78QA43F8g",
                "reason": {
                    "type": "query_phase_execution_exception",
                    "reason": "Batch size is too large, size must be less than or equal to: [10000] but was [1000000]. Scroll batch sizes cost as much memory as result windows so they are controlled by the [index.max_result_window] index level setting."
                }
            }
        ]
    },
    "status": 500
}

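The requested size (1,000,000) is above the limit. index.max_result_window is an index-level setting that defaults to 10,000, and a scroll batch may not exceed it. Keep size at or below 10,000 and let scroll page through the rest.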
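A script can guard against this error by capping the batch size before the first request is sent. The sketch below assumes the limit is 10000, the value reported in the error above. MAX_RESULT_WINDOW, cap_batch_size and first_page_body are hypothetical names.

```shell
# Assumed limit; 10000 is the value reported in the error message above.
MAX_RESULT_WINDOW="${MAX_RESULT_WINDOW:-10000}"

# Hypothetical helper: print $1, or the limit when $1 is larger than the limit.
cap_batch_size() {
    requested="$1"
    limit="${2:-$MAX_RESULT_WINDOW}"
    if [ "$requested" -gt "$limit" ]; then
        echo "$limit"
    else
        echo "$requested"
    fi
}

# Hypothetical helper: body of the first scroll request with a capped size.
first_page_body() {
    printf '{"query":{"match_all":{}},"sort":["_doc"],"size":%s}' "$(cap_batch_size "$1")"
}
```

With the default limit, first_page_body 1000000 produces a body whose size is 10000, so the request no longer trips the batch size check.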

3.4 search_context_missing_exception

{
    "error": {
        "root_cause": [
            {
                "type": "search_context_missing_exception",
                "reason": "No search context found for id [3540965]"
            },
            {
                "type": "search_context_missing_exception",
                "reason": "No search context found for id [3922089]"
            },
            {
                "type": "search_context_missing_exception",
                "reason": "No search context found for id [3454995]"
            },
            {
                "type": "search_context_missing_exception",
                "reason": "No search context found for id [3454996]"
            },
            {
                "type": "search_context_missing_exception",
                "reason": "No search context found for id [3454994]"
            }
        ],
        "type": "search_phase_execution_exception",
        "reason": "all shards failed",
        "phase": "query",
        "grouped": true,
        "failed_shards": [
            {
                "shard": -1,
                "index": null,
                "reason": {
                    "type": "search_context_missing_exception",
                    "reason": "No search context found for id [3540965]"
                }
            },
            {
                "shard": -1,
                "index": null,
                "reason": {
                    "type": "search_context_missing_exception",
                    "reason": "No search context found for id [3922089]"
                }
            },
            {
                "shard": -1,
                "index": null,
                "reason": {
                    "type": "search_context_missing_exception",
                    "reason": "No search context found for id [3454995]"
                }
            },
            {
                "shard": -1,
                "index": null,
                "reason": {
                    "type": "search_context_missing_exception",
                    "reason": "No search context found for id [3454996]"
                }
            },
            {
                "shard": -1,
                "index": null,
                "reason": {
                    "type": "search_context_missing_exception",
                    "reason": "No search context found for id [3454994]"
                }
            }
        ],
        "caused_by": {
            "type": "search_context_missing_exception",
            "reason": "No search context found for id [3454994]"
        }
    },
    "status": 404
}

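The scroll had in fact timed out: the keep-alive (10m here) elapsed between two requests, so ES removed the search context automatically. An expired scroll cannot be resumed. Start again from the first request and keep the gap between pages shorter than the keep-alive.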
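A script that pages through a scroll can check each response for this error and stop early instead of writing error bodies into the output file. This is a minimal sketch that matches on the error type shown above. scroll_expired is a hypothetical helper name.

```shell
# Hypothetical helper: succeed when the response on stdin reports
# an expired or missing search context.
scroll_expired() {
    grep -q '"type"[[:space:]]*:[[:space:]]*"search_context_missing_exception"'
}

# Usage, with $page holding the latest response body:
#   if printf '%s' "$page" | scroll_expired; then
#       echo "scroll expired, restart from the first request" >&2
#   fi
```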

