ClickHouse 18.12.13-2018-09-10版本新特性解析

栏目: 数据库 · 发布时间: 6年前

ClickHouse 18.12.13-2018-09-10版本新特性解析

ClickHouse的发版速度是众所周知的快

在最近,他们正式发出了18.12.13-2018-09-10版本

相关的CHANGELOG更是多的吓人

为了能够更好的使用新版特性,特做了详细的介绍

其中新特性部分,为人工翻译、校对,毕竟这部分内容是重点,后面为Google翻译

新特性列表(按照GitHub中CHANGELOG顺序)

支持Decimal

  • Added the DECIMAL(digits, scale) data type (Decimal32(scale), Decimal64(scale), Decimal128(scale)). To enable it, use the setting allow_experimental_decimal_type. #2846 #2970 #3008 #3047
SELECT *
FROM data_type_families
WHERE name LIKE '%De%'

┌─name───────┬─case_insensitive─┬─alias_to─┐
│ Decimal32  │                1 │          │
│ Decimal64  │                1 │          │
│ Decimal128 │                1 │          │
│ Decimal    │                1 │          │
└────────────┴──────────────────┴──────────┘

新的WITH ROLLUP修饰符GROUP BY,替代语法 GROUP BY ROLLUP(…)

  • New WITH ROLLUP modifier for GROUP BY (alternative syntax: GROUP BY ROLLUP(…)). #2948

JOIN查询会把*解析为字段

  • In requests with JOIN, the star character expands to a list of columns in all tables, in compliance with the SQL standard. You can restore the old behavior by setting - asterisk_left_columns_only to 1 on the user configuration level. Winter Zhang

JOIN支持table functions(remote/merge/numbers/url)

  • Added support for JOIN with table functions. Winter Zhang

终端支持tab自动补全

  • Autocomplete by pressing Tab in clickhouse-client. Sergey Shcherbin

终端支持ctrl c取消输入

  • Ctrl+C in clickhouse-client clears a query that was entered. #2877

可指定默认的JOIN行为

  • Added the join_default_strictness setting (values: “, ‘any’, ‘all’). This allows you to not specify ANY or ALL for JOIN. #2982

server log关联查询ID

  • Each line of the server log related to query processing shows the query ID. #2482
2018.09.15 23:01:12.934700 [ 275 ] {b7a52046-e852-41c9-9e13-54533917bc56} <Debug> executeQuery: (from xx.xx.80.34:37066, user: user) select * from numbers(100)
2018.09.15 23:01:12.934984 [ 275 ] {b7a52046-e852-41c9-9e13-54533917bc56} <Trace> InterpreterSelectQuery: FetchColumns -> Complete
2018.09.15 23:01:12.935029 [ 86 ] <Trace> SystemLog (system.query_log): Flushing system log
2018.09.15 23:01:12.935036 [ 275 ] {b7a52046-e852-41c9-9e13-54533917bc56} <Debug> executeQuery: Query pipeline:
Expression
 Expression
  Limit
   Numbers

2018.09.15 23:01:12.935391 [ 277 ] {b7a52046-e852-41c9-9e13-54533917bc56} <Trace> ThreadStatus: Thread 277 exited
2018.09.15 23:01:12.935449 [ 87 ] <Trace> SystemLog (system.query_thread_log): Flushing system log
2018.09.15 23:01:12.935517 [ 275 ] {b7a52046-e852-41c9-9e13-54533917bc56} <Information> executeQuery: Read 100 rows, 800.00 B in 0.001 sec., 146631 rows/sec., 1.12 MiB/sec.
2018.09.15 23:01:12.935557 [ 275 ] <Debug> MemoryTracker: Peak memory usage (total): 1.00 MiB.
2018.09.15 23:01:12.935572 [ 275 ] <Information> TCPHandler: Processed in 0.001 sec.

终端可以直接print日志

  • Now you can get query execution logs in clickhouse-client (use the send_logs_level setting). With distributed query processing, logs are cascaded from all the servers. #2482
SELECT *
FROM system.settings
WHERE name = 'send_logs_level'

┌─name────────────┬─value─┬─changed─┬─description──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ send_logs_level │ none  │       0 │ Send server text logs with specified minumum level to client. Valid values: 'trace', 'debug', 'info', 'warning', 'error', 'none' │
└─────────────────┴───────┴─────────┴──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

1 rows in set. Elapsed: 0.002 sec.

xx.xx.xx.xx. :) set send_logs_level = 'trace';

SET send_logs_level = 'trace'

Ok.

0 rows in set. Elapsed: 0.001 sec.

xx.xx.xx.xx. :) select * from system.settings where name = 'send_logs_level' ;

SELECT *
FROM system.settings
WHERE name = 'send_logs_level'

[xx.xx.xx.xx.] 2018.09.15 23:30:51.158294 {dbecff24-3edc-4584-8379-02cb23aedcb2} [ 275 ] <Debug> executeQuery: (from 127.0.0.1:59056, user: user) select * from system.settings where name = 'send_logs_level'
[xx.xx.xx.xx.] 2018.09.15 23:30:51.159056 {dbecff24-3edc-4584-8379-02cb23aedcb2} [ 275 ] <Trace> InterpreterSelectQuery: FetchColumns -> Complete
[xx.xx.xx.xx.] 2018.09.15 23:30:51.159152 {dbecff24-3edc-4584-8379-02cb23aedcb2} [ 275 ] <Debug> executeQuery: Query pipeline:
Expression
 Expression
  Filter
   One

┌─name────────────┬─value─┬─changed─┬─description──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ send_logs_level │ trace │       1 │ Send server text logs with specified minumum level to client. Valid values: 'trace', 'debug', 'info', 'warning', 'error', 'none' │
└─────────────────┴───────┴─────────┴──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
[xx.xx.xx.xx.] 2018.09.15 23:30:51.159654 {dbecff24-3edc-4584-8379-02cb23aedcb2} [ 398 ] <Trace> ThreadStatus: Thread 398 exited
[xx.xx.xx.xx.] 2018.09.15 23:30:51.159803 {dbecff24-3edc-4584-8379-02cb23aedcb2} [ 275 ] <Information> executeQuery: Read 178 rows, 25.15 KiB in 0.001 sec., 127014 rows/sec., 17.53 MiB/sec.


# 分布式查询的日志也会打到当前终端

记录setting行为到query_log

  • The system.query_log and system.processes (SHOW PROCESSLIST) tables now have information about all changed settings when you run a query (the nested structure of the Settings data). - Added the log_query_settings setting. #2482
SELECT *
FROM system.query_log
ORDER BY event_time DESC
LIMIT 1

Row 1:
──────
type:                 1
event_date:           2018-09-15
event_time:           2018-09-15 23:42:17
query_start_time:     2018-09-15 23:42:17
query_duration_ms:    0
read_rows:            0
read_bytes:           0
written_rows:         0
written_bytes:        0
result_rows:          0
result_bytes:         0
memory_usage:         0
query:                select * from system.metrics
exception:
stack_trace:
is_initial_query:     1
user:                 user
query_id:             0881d528-a79c-4bd7-8c0a-38ce270b95f1
address:              ��
M�
port:                 33384
initial_user:         user
initial_query_id:     0881d528-a79c-4bd7-8c0a-38ce270b95f1
initial_address:      ��
M�
initial_port:         33384
interface:            2
os_user:
client_hostname:
client_name:
client_revision:      0
client_version_major: 0
client_version_minor: 0
client_version_patch: 0
http_method:          1
http_user_agent:      Go-http-client/1.1
quota_key:
revision:             54407
thread_numbers:       []
ProfileEvents.Names:  []
ProfileEvents.Values: []
Settings.Names:       ['max_threads','use_uncompressed_cache','background_pool_size','load_balancing','log_queries','readonly','max_memory_usage']
Settings.Values:      ['48','0','64','random','1','1','32212254720']

记录线程数

  • The system.query_log and system.processes tables now show information about the number of threads that are participating in query execution (see the thread_numbers column). #2482
SELECT thread_numbers
FROM system.query_log
ORDER BY event_time DESC
LIMIT 10

┌─thread_numbers─┐
│ []             │
│ [88]           │
│ []             │
│ [77,173]       │
│ []             │
│ []             │
│ [77,174]       │
│ []             │
│ [77,171]       │
│ []             │
└────────────────┘


SELECT thread_numbers
FROM system.processes
LIMIT 10

┌─thread_numbers─┐
│ [77]           │
└────────────────┘

增加进程统计信息

  • Added ProfileEvents counters that measure the time spent on reading and writing over the network and reading and writing to disk, the number of network errors, and the time spent - waiting when network bandwidth is limited. #2482

  • Added ProfileEventscounters that contain the system metrics from rusage (you can use them to get information about CPU usage in userspace and the kernel, page faults, and context -

    switches), as well as taskstats metrics (use these to obtain information about I/O wait time, CPU wait time, and the amount of data read and recorded, both with and without page c ache). -#2482

  • The ProfileEvents counters are applied globally and for each query, as well as for each query execution thread, which allows you to profile resource consumption by query in detail. #2482

SELECT *
FROM system.processes
LIMIT 10

Row 1:
──────
is_initial_query:     1
user:                 user
query_id:             7e65449f-8899-4c5f-8859-20eeae32d1b1
address:              127.0.0.1
port:                 38610
initial_user:         user
initial_query_id:     7e65449f-8899-4c5f-8859-20eeae32d1b1
initial_address:      127.0.0.1
initial_port:         38610
interface:            1
os_user:              root
client_hostname:      xx.xx.xx.xx.
client_name:          ClickHouse client
client_version_major: 18
client_version_minor: 12
client_version_patch: 13
client_revision:      54407
http_method:          0
http_user_agent:
quota_key:
elapsed:              0.000800632
is_cancelled:         0
read_rows:            0
read_bytes:           0
total_rows_approx:    0
written_rows:         0
written_bytes:        0
memory_usage:         880
peak_memory_usage:    880
query:                select * from system.processes  limit 10
thread_numbers:       [181]
ProfileEvents.Names:  ['Query','SelectQuery','ReadCompressedBytes','CompressedReadBufferBlocks','CompressedReadBufferBytes','IOBufferAllocs','IOBufferAllocBytes','ContextLock','RWLockAcquiredReadLocks']
ProfileEvents.Values: [1,1,36,1,10,1,57,3,1]
Settings.Names:       ['max_threads','use_uncompressed_cache','background_pool_size','load_balancing','log_queries','max_memory_usage']
Settings.Values:      ['48','0','64','random','1','64424509440']

新增每个查询执行线程的信息

  • Added the system.query_thread_log table, which contains information about each query execution thread. Added the log_query_threads setting. #2482
SELECT *
FROM system.query_thread_log
ORDER BY event_time DESC
LIMIT 3

Row 1:
──────
event_date:           2018-09-15
event_time:           2018-09-15 22:29:17
query_start_time:     2018-09-15 22:29:17
query_duration_ms:    4
read_rows:            2178
read_bytes:           1446750
written_rows:         0
written_bytes:        0
memory_usage:         27136
peak_memory_usage:    44996168
thread_name:          ParalInputsProc
thread_number:        252
os_thread_id:         19255
master_thread_number: 74
master_os_thread_id:  9227
query:                select * from system.query_thread_log order by event_time desc  limit 10
is_initial_query:     1
user:                 user
query_id:             55740e27-e796-4b0f-a9ff-530363f91d76
address:              ��
port:                 52456
initial_user:         user
initial_query_id:     55740e27-e796-4b0f-a9ff-530363f91d76
initial_address:      ��
initial_port:         52456
interface:            1
os_user:              root
client_hostname:      xx.xx.xx.xx.
client_name:          ClickHouse client
client_revision:      54407
client_version_major: 18
client_version_minor: 12
client_version_patch: 13
http_method:          0
http_user_agent:
quota_key:
revision:             54407
ProfileEvents.Names:  ['FileOpen','ReadBufferFromFileDescriptorRead','ReadBufferFromFileDescriptorReadBytes','ReadCompressedBytes','CompressedReadBufferBlocks','CompressedReadBufferBytes','IOBufferAllocs','IOBufferAllocBytes','MarkCacheMisses','CreatedReadBufferOrdinary','DiskReadElapsedMicroseconds','ContextLock','RealTimeMicroseconds','UserTimeMicroseconds','SystemTimeMicroseconds','SoftPageFaults']
ProfileEvents.Values: [80,118,231033,229113,46,1101082,80,42862276,40,40,1135,2,4328,890,2671,678]

新增 system.metrics and system.events

  • The system.metrics and system.events tables now have built-in documentation. #3016
SELECT *
FROM system.metrics

┌─metric───────────────────────────────────┬──────value─┬─description────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ Query                                    │          1 │ Number of executing queries                                                                                                                                                                                                                                │
│ Merge                                    │          0 │ Number of executing background merges                                                                                                                                                                                                                      │
│ PartMutation                             │          0 │ Number of mutations (ALTER DELETE/UPDATE)                                                                                                                                                                                                                  │
│ ReplicatedFetch                          │          0 │ Number of data parts fetching from replica                                                                                                                                                                                                                 │
│ ReplicatedSend                           │          0 │ Number of data parts sending to replicas                                                                                                                                                                                                                   │
│ ReplicatedChecks                         │          0 │ Number of data parts checking for consistency


SELECT *
FROM system.events
LIMIT 10

┌─event───────────────────────────────────┬────value─┬─description────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ Query                                   │      127 │ Number of queries started to be interpreted and maybe executed. Does not include queries that are failed to parse, that are rejected due to AST size limits; rejected due to quota limits or limits on number of simultaneously running queries. May include internal queries initiated by ClickHouse itself. Does not count subqueries. │
│ SelectQuery                             │      124 │ Same as Query, but only for SELECT queries.                                                                                                                                                                                                                │
│ FileOpen                                │    54504 │ Number of files opened.                                                                                                                                                                                                                                    │
│ Seek                                    │       47 │ Number of times the 'lseek' function was called.

新增arrayEnumerateDense函数

  • Added the arrayEnumerateDense function. Amos Bird
SELECT arrayEnumerateDense([(1, 2), (3, 4), (1, 2), (1, 2), (2, 3), (2, 3)])

┌─arrayEnumerateDense(array(tuple(1, 2), tuple(3, 4), tuple(1, 2), tuple(1, 2), tuple(2, 3), tuple(2, 3)))─┐
│ [1,2,1,1,3,3]                                                                                            │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────┘

新增arrayCumSumNonNegativ/arrayDifference函数

  • Added the arrayCumSumNonNegative and arrayDifference functions. Aleksey Studnev

  • 似乎还没上,无法使用。

新增retention函数

  • Added the retention aggregate function. Sundy Li
SELECT
    uid,
    retention(date = '2018-08-06', date = '2018-08-07', date = '2018-08-08') AS r
FROM retention_test
WHERE date IN ('2018-08-06', '2018-08-07', '2018-08-08')
GROUP BY uid
ORDER BY uid ASC
LIMIT 3

┌─uid─┬─r───────┐
│   0 │ [1,1,1] │
│   1 │ [1,1,1] │
│   2 │ [1,1,1] │
└─────┴─────────┘

states函数可以使用加号云算法

  • Now you can add (merge) states of aggregate functions by using the plus operator, and multiply the states of aggregate functions by a nonnegative constant. #3062 #3034
CREATE TABLE add_aggregate
(
    a UInt32,
    b UInt32
)
ENGINE = Memory

INSERT INTO add_aggregate VALUES(1, 2);

INSERT INTO add_aggregate VALUES(3, 1);


SELECT minMerge(x)
FROM
(
    SELECT minState(a) + minState(b) AS x
    FROM add_aggregate
)

┌─minMerge(x)─┐
│           1 │
└─────────────┘

虚拟列

  • Tables in the MergeTree family now have the virtual column _partition_id. #3089
SELECT _partition_id
FROM test.partition_id
ORDER BY _partition_id ASC

┌─_partition_id─┐
│ 197004        │
│ 197004        │
│ 197007        │
│ 197007        │
│ 197010        │
│ 197010        │
│ 201809        │
│ 201809        │
│ 201809        │
└───────────────┘



SELECT *
FROM test.partition_id
ORDER BY d ASC

┌──────────d─┬───x─┐
│ 1970-04-11 │   1 │
│ 1970-04-11 │   1 │
│ 1970-07-20 │   2 │
│ 1970-07-20 │   2 │
│ 1970-10-28 │   3 │
│ 1970-10-28 │   3 │
│ 2018-09-13 │ 100 │
│ 2018-09-14 │ 100 │
│ 2018-09-15 │ 100 │
└────────────┴─────┘

Bug修复:

  • 修复了 Dictionary 表的问题(抛出 Size of offsets doesn't match size of columnUnknown compression method 异常)。此错误出现在版本18.10.3中。 #2913
  • 修复了合并 CollapsingMergeTree 表时如果其中一个数据部分为空(这些部分是在合并期间形成或者 ALTER DELETE 所有数据都已删除)的错误,并且该 vertical 算法用于合并。 #3049
  • 在修正了比赛条件 DROPTRUNCATE 用于 Memory 与同时表 SELECT ,这可能导致服务器崩溃。此错误出现在1.1.54388版本中。 #3038
  • 修复了在返回错误时插入 Replicated 表时数据丢失的可能性 Session is expired (可以通过 ReplicatedDataLoss 度量检测到数据丢失)。版本1.1.54378中发生此错误。 #2939 #2949 #2964
  • 修复了一段时间内的段错误 JOIN ... ON#3000
  • 修复了 WHERE 表达式完全由限定列名称组成时的错误搜索列名称,例如 WHERE table.column#2994
  • 修复了在执行分布式查询时发生的“未找到列”错误,如果从远程服务器请求包含带有子查询的IN表达式的单个列。 #3087
  • 修复了 Block structure mismatch in UNION stream: different number of columns 分布式查询发生的错误,如果其中一个分片是本地分区而另一个分片不是,并且 PREWHERE 触发了移动优化。 #2226 #3037 #3055 #3065 #3073 #3090 #3093
  • 修复了 pointInPolygon 非凸多边形的某些情况的函数。 #2910
  • 修正了 nan 与整数比较时的错误结果。 #3024
  • 修复了 zlib-ng 库中可能导致段错误的错误。 #2854
  • 修复了插入带有 AggregateFunction 列的表时的内存泄漏,如果聚合函数的状态不简单(单独分配内存),以及单个插入请求是否导致多个小块。 #3084
  • 修复了同时创建和删除相同 BufferMergeTree 表格时的竞争条件。
  • 修复了比较由某些非平凡类型组成的元组(例如元组)时出现段错误的可能性。 #2989
  • 修复了运行某些 ON CLUSTER 查询时出现段错误的可能性。 张冬
  • 修复了数组元素 arrayDistinct 函数中的错误 Nullable#2845 #2937
  • enable_optimize_predicate_expression 选项现在可以正确支持案例 SELECT *张冬
  • 修复了重新初始化ZooKeeper会话时的段错误。 #2917
  • 修复了使用ZooKeeper时潜在的阻塞问题。
  • 修复了在a中添加嵌套数据结构的错误代码 SummingMergeTree
  • 在为聚合函数的状态分配内存时,正确地考虑了对齐,这使得在实现聚合函数的状态时可以使用需要对齐的操作。 晨星-XC

安全修复:

  • 安全使用ODBC数据源。与ODBC驱动程序的交互使用单独的 clickhouse-odbc-bridge 进程。第三方ODBC驱动程序中的错误不再导致服务器稳定性或漏洞问题。 #2828 #2879 #2886 #2893 #2921
  • 修复了 catBoostPool 表函数中文件路径的错误验证问题。 #2894
  • 系统表(的内容 tablesdatabasespartscolumnsparts_columnsmergesmutationsreplicas ,和 replication_queue )根据用户的配置访问数据库过滤( allow_databases )。 张冬

向后不兼容的变化:

  • 在使用JOIN的请求中,星形字符扩展为所有表中的列列表,符合 SQL 标准。您可以通过 asterisk_left_columns_only 在用户配置级别设置为1 来恢复旧行为。

构建更改:

  • 现在大多数集成测试都可以通过提交来运行。
  • 代码样式检查也可以通过提交运行。
  • memcpy 上CentOS7 / Fedora的建设时,实施正确选择。 Etienne Champetier
  • 使用clang进行构建时, -Weverything 除了常规警告外,还添加了一些警告 -Wall-Wextra -Werror#2957
  • 调试构建使用 jemalloc 调试选项。
  • 用于与ZooKeeper交互的库的接口被声明为abstract。 #2950

以上就是本文的全部内容,希望本文的内容对大家的学习或者工作能带来一定的帮助,也希望大家多多支持 码农网

查看所有标签

猜你喜欢:

本站部分资源来源于网络,本站转载出于传递更多信息之目的,版权归原作者或者来源机构所有,如转载稿涉及版权问题,请联系我们

数文明

数文明

涂子沛 / 中信出版集团 / 2018-9 / 88.00元

从量数、据数、普适记录、人脸识别、以图搜车,到雾计算、城市大脑、单粒度治理、无匿名社会、量子思维……作为中国研究大数据的权威专家,作者在《数文明》一书中,以大数据为核心元素,抽丝剥茧,深入地阐述了这个大数据时代的文明社会——一个全新的数文明时代。 将大数据与人类文明融合在一起,这本书提供给我们的不仅是一种全新的叙事结构,它还将突破你的认知边界和思维极限,给你提供一个应对这个世界的全新的认知方......一起来看看 《数文明》 这本书的介绍吧!

SHA 加密
SHA 加密

SHA 加密工具

HEX CMYK 转换工具
HEX CMYK 转换工具

HEX CMYK 互转工具

HEX HSV 转换工具
HEX HSV 转换工具

HEX HSV 互换工具