cover_image

ClickHouse之DBA运维宝典

凯朱 ClickHouse的秘密基地
2020年06月29日 05:01
最近有位网友与我聊天,他是一名 DBA,问我在 ClickHouse 中有没有一些能够 “安家立命” 的运维 SQL 语句。我想对于这个问题很多朋友都会有兴趣,所以就在这里做一个简单的分享。

在 ClickHouse 默认的 system 数据库下(databse),拥有众多的系统表。我们对 ClickHouse 运行状态的各种信息,就主要来自于这些系统表。

接下来就列举一些常用的运维 SQL 语句。

  • 当前连接数


众所周知,CH 对外暴露的原生接口分为 TCP 和 HTTP 两类,通过 system.metrics 即可查询当前的 TCP、HTTP 与内部副本的连接数。

ch7.nauu.com :) SELECT * FROM system.metrics WHERE metric LIKE '%Connection';
SELECT *FROM system.metricsWHERE metric LIKE '%Connection'
┌─metric────────────────┬─value─┬─description─────────────────────────────────────────────────────────┐│ TCPConnection │ 2Number of connections to TCP server (clients with native interface) ││ HTTPConnection        │     1 │ Number of connections to HTTP server                                ││ InterserverConnection │ 0Number of connections from other replicas to fetch parts │└───────────────────────┴───────┴─────────────────────────────────────────────────────────────────────┘

  • 当前正在执行的查询


通过 system.processes 可以查询目前正在执行的查询,例如:

ch7.nauu.com :) SELECT query_id, user, address, query  FROM system.processes ORDER BY query_id;
SELECT query_id, user, address, queryFROM system.processesORDER BY query_id ASC
┌─query_id─────────────────────────────┬─user────┬─address────────────┬─query─────────────────────────────────────────────────────────────────────────────┐203f1d0e-944e-472d-8d8f-bae548ff9899 │ default │ ::ffff:10.37.129.4SELECT query_id, user, address, query FROM system.processes ORDER BY query_id ASC│ fb7fba85-b2a0-4271-87ff-22da97ae511b │ default │ ::ffff:10.37.129.4INSERT INTO hits_v1 FORMAT TSV │└──────────────────────────────────────┴─────────┴────────────────────┴───────────────────────────────────────────────────────────────────────────────────┘
可以看到,CH 目前正在执行两条语句,其中第 2 条是 INSERT 查询正在写入数据。 

  • 终止查询


通过 KILL QUERY 语句,可以终止正在执行的查询:

KILL QUERY WHERE query_id = 'query_id'
例如,终止刚才的 INSERT 查询 :
ch7.nauu.com :) KILL QUERY WHERE query_id='ff695827-dbf5-45ad-9858-a853946ea140';
KILL QUERY WHERE query_id = 'ff695827-dbf5-45ad-9858-a853946ea140' ASYNC
Ok.
0 rows in set. Elapsed: 0.024 sec.

众所周知,除了常规的 SELECT 和 INSERT 之外,在 ClickHouse 中还存在一类被称作 Mutation 的操作,也就是 ALTER DELETE 和 ALTER UPDATE

对于 Mutation 操作, ClickHouse 专门提供了 system.mutations 用于查询,例如:
ch7.nauu.com :) SELECT database, table, mutation_id, command, create_time, is_done FROM system.mutations;
SELECT database, table, mutation_id, command, create_time, is_doneFROM system.mutations
┌─database─┬─table──────┬─mutation_id────┬─command──────────────────┬─────────create_time─┬─is_done─┐│ default  │ testcol_v9 │ mutation_2.txt │ DELETE WHERE ID = 'A003' │ 2020-06-29 01:15:04 │       1 │└──────────┴────────────┴────────────────┴──────────────────────────┴─────────────────────┴─────────┘
1 rows in set. Elapsed: 0.002 sec.

同样的,可以使用 KILL MUTATION 终止正在执行的 Mutation 操作:
KILL MUTATION WHERE mutation_id = 'mutation_id';



  • 存储空间统计

查询 CH 各个存储路径的空间:

ch5.nauu.com :) SELECT name,path,formatReadableSize(free_space) AS free,formatReadableSize(total_space) AS total,formatReadableSize(keep_free_space) AS reserved FROM system.disks
SELECT name, path, formatReadableSize(free_space) AS free, formatReadableSize(total_space) AS total, formatReadableSize(keep_free_space) AS reservedFROM system.disks
┌─name──────┬─path──────────────┬─free──────┬─total─────┬─reserved─┐default │ /chbase/data/ │ 36.35 GiB │ 49.09 GiB │ 0.00 B ││ disk_cold │ /chbase/cloddata/ │ 35.35 GiB │ 48.09 GiB │ 1.00 GiB ││ disk_hot1 │ /chbase/data/ │ 36.35 GiB │ 49.09 GiB │ 0.00 B ││ disk_hot2 │ /chbase/hotdata1/ │ 36.35 GiB │ 49.09 GiB │ 0.00 B │└───────────┴───────────────────┴───────────┴───────────┴──────────┘
4 rows in set. Elapsed: 0.001 sec.


  • 各数据库占用空间统计

ch7.nauu.com :) SELECT database, formatReadableSize(sum(bytes_on_disk)) on_disk FROM system.parts GROUP BY database;
SELECT database, formatReadableSize(sum(bytes_on_disk)) AS on_diskFROM system.partsGROUP BY database
┌─database─┬─on_disk──┐system1.59 MiB │default3.60 GiB │└──────────┴──────────┘


  • 个列字段占用空间统计

每个列字段的压缩大小、压缩比率以及该列的每行数据大小的占比

SELECT     database,     table,     column,     any(type),     sum(column_data_compressed_bytes) AS compressed,     sum(column_data_uncompressed_bytes) AS uncompressed,     round(uncompressed / compressed, 2) AS ratio,     compressed / sum(rows) AS bpr,     sum(rows)FROM system.parts_columnsWHERE active AND database != 'system'GROUP BY     database,     table,     columnORDER BY     database ASC,     table ASC,     column ASC
┌─database─┬─table────────┬─column─────────────────────┬─any(type)──────────────────────────────┬─compressed─┬─uncompressed─┬──ratio─┬───────────────────bpr─┬─sum(rows)─┐default │ hits_v1 │ AdvEngineID │ UInt8 │ 3515342662170675.730.01320478860370556326621706default │ hits_v1 │ Age │ UInt8 │ 7543552266217063.530.283360953651880926621706default │ hits_v1 │ BrowserCountry │ FixedString(2) │ 6549379532434128.130.2460165024735830326621706default │ hits_v1 │ BrowserLanguage │ FixedString(2) │ 28190855324341218.890.1058942278154525526621706default │ hits_v1 │ CLID │ UInt32 │ 231100610648682446.080.0868090872914004826621706default │ hits_v1 │ ClientEventTime │ DateTime │ 985187041064868241.083.700690857302683826621706default │ hits_v1 │ ClientIP │ UInt32 │ 251207661064868244.240.943619691390176126621706default │ hits_v1 │ ClientIP6 │ FixedString(16) │ 2508855842594729616.980.942409851569993426621706default │ hits_v1 │ ClientTimeZone │ Int16 │ 8487148532434126.270.318805564151298226621706default │ hits_v1 │ CodeVersion │ UInt32 │ 119769521064868248.890.449894232924065826621706default │ hits_v1 │ ConnectTiming │ Int32 │ 279373731064868243.811.049420837267153426621706default │ hits_v1 │ CookieEnable │ UInt8 │ 20271826621706131.320.00761476368193683826621706default │ hits_v1 │ CounterClass │ Int84254922662170662.570.01598289756486680526621706...


  • 慢查询

SELECT     user,     client_hostname AS host,     client_name AS client,     formatDateTime(query_start_time, '%T') AS started,     query_duration_ms / 1000 AS sec,     round(memory_usage / 1048576) AS MEM_MB,     result_rows AS RES_CNT,     result_bytes / 1048576 AS RES_MB,     read_rows AS R_CNT,     round(read_bytes / 1048576) AS R_MB,     written_rows AS W_CNT,     round(written_bytes / 1048576) AS W_MB,     queryFROM system.query_logWHERE type = 2ORDER BY query_duration_ms DESCLIMIT 10
┌─user────┬─host─────────┬─client────────────┬─started──┬────sec─┬─MEM_MB─┬──RES_CNT─┬────────────────RES_MB─┬────R_CNT─┬─R_MB─┬───W_CNT─┬─W_MB─┬─query───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐│ default │ ch7.nauu.com │ ClickHouse client │ 01:05:03 │ 51.434 │   1031 │  8873898 │      8706.51146697998 │        0 │    0 │ 8873898 │ 8707 │ INSERT INTO hits_v1 FORMAT TSV                                                                                                                                          ││ default │ ch7.nauu.com │ ClickHouse client │ 01:01:48 │ 43.511 │   1031 │  8873898 │      8706.51146697998 │        0 │    0 │ 8873898 │ 8707 │ INSERT INTO hits_v1 FORMAT TSV                                                                                                                                          │default │ ch7.nauu.com │ ClickHouse client17:12:0411.12180118874398446.8216323852539629146635100SELECT id, arrayJoin(arrayConcat(groupArray(a), groupArray(b), groupArray(c))) AS v FROM test_y GROUP BY id ORDER BY v ASCdefault │ ch7.nauu.com │ ClickHouse client17:13:283.992154918874398446.8216323852539629146635100SELECT id, arrayJoin(arrayConcat(groupArray(a), groupArray(b), groupArray(c))) AS v FROM test_y GROUP BY iddefault │ ch7.nauu.com │ ClickHouse client17:13:123.976154918874398446.8216323852539629146635100SELECT id, arrayJoin(arrayConcat(groupArray(a), groupArray(b), groupArray(c))) AS v FROM test_y GROUP BY iddefault │ ch7.nauu.com │ ClickHouse client01:25:393.962154918874398446.8216323852539629146635100SELECT id, arrayJoin(arrayConcat(groupArray(a), groupArray(b), groupArray(c))) AS v FROM test_y GROUP BY iddefault │ ch7.nauu.com │ ClickHouse client04:32:293.114154210000000219.821929931640621050000023100SELECT user_id, argMax(score, create_time) AS score, argMax(deleted, create_time) AS deleted, max(create_time) AS ctime FROM test_a GROUP BY user_id HAVING deleted = 0default │ ch7.nauu.com │ ClickHouse client02:59:563.03154410000000219.753809928894041050000023100SELECT user_id, argMax(score, create_time) AS score, argMax(is_update, create_time) AS is_update, max(create_time) AS ctime FROM test_a GROUP BY user_id │default │ ch7.nauu.com │ ClickHouse client02:54:013.019154310000000219.34509277343751050000023000SELECT user_id, argMax(score, create_time) AS score, argMax(delete, create_time) AS delete, max(create_time) AS ctime FROM test_a GROUP BY user_id ││ default │              │                   │ 03:03:12 │  2.857 │   1543 │       10 │ 0.0002269744873046875 │ 10500000 │  231 │       0 │    0 │ SELECT * FROM view_test_a limit 10                                                                                                 │└─────────┴──────────────┴───────────────────┴──────────┴────────┴────────┴──────────┴───────────────────────┴──────────┴──────┴─────────┴──────┴─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
10 rows in set. Elapsed: 0.017 sec. Processed 1.44 thousand rows, 200.81 KB (83.78 thousand rows/s., 11.68 MB/s.)


  • 副本预警监控


通过下面的 SQL 语句对副本进行预警监控,其中各个预警的变量可以根据自身情况调整。

SELECT database, table, is_leader, total_replicas, active_replicas   FROM system.replicas  WHERE is_readonly     OR is_session_expired     OR future_parts > 30     OR parts_to_check > 20     OR queue_size > 30     OR inserts_in_queue > 20     OR log_max_index - log_pointer > 20     OR total_replicas < 2     OR active_replicas < total_replicas
┌─database─┬─table───────────────────────┬─is_leader─┬─total_replicas─┬─active_replicas─┐default │ replicated_sales_12 │ 000default │ test_fetch │ 000default │ test_sharding_simple2_local │ 000└──────────┴─────────────────────────────┴───────────┴────────────────┴─────────────────┘


好了,今天的分享就到这里。对于 CH 日常的运维 SQL 远不止这些,这里也只是抱砖引玉啦。


如果这篇文章对你有帮助,欢迎 订阅、转发、在看 三连击 :)








欢迎大家扫码关注我的公众号和视频号:

ClickHouse的秘密基地
图片

nauu的奇思妙想
图片

往期精彩推荐:

【专辑】ClickHouse的资讯手札

【专辑】ClickHouse的原理巩固

【专辑】ClickHouse的经验分享
继续滑动看下一个
ClickHouse的秘密基地
向上滑动看下一个