ClickHouse 18.12.13-2018-09-10版本新特性解析

栏目: 数据库 · 发布时间: 6年前

ClickHouse的发版速度是众所周知的快

在最近，他们正式发出了18.12.13-2018-09-10版本

新特性列表（按照GitHub中CHANGELOG顺序）

支持Decimal

Added the DECIMAL(digits, scale) data type (Decimal32(scale), Decimal64(scale), Decimal128(scale)). To enable it, use the setting allow_experimental_decimal_type. #2846 #2970 #3008 #3047

SELECT *
FROM data_type_families
WHERE name LIKE '%De%'

┌─name───────┬─case_insensitive─┬─alias_to─┐
│ Decimal32  │                1 │          │
│ Decimal64  │                1 │          │
│ Decimal128 │                1 │          │
│ Decimal    │                1 │          │
└────────────┴──────────────────┴──────────┘

新的WITH ROLLUP修饰符GROUP BY，替代语法 GROUP BY ROLLUP(…)

New WITH ROLLUP modifier for GROUP BY (alternative syntax: GROUP BY ROLLUP(…)). #2948

JOIN查询会把*解析为字段

In requests with JOIN, the star character expands to a list of columns in all tables, in compliance with the SQL standard. You can restore the old behavior by setting - asterisk_left_columns_only to 1 on the user configuration level. Winter Zhang

JOIN支持table functions(remote/merge/numbers/url)

Added support for JOIN with table functions. Winter Zhang

终端支持tab自动补全

Autocomplete by pressing Tab in clickhouse-client. Sergey Shcherbin

终端支持ctrl c取消输入

Ctrl+C in clickhouse-client clears a query that was entered. #2877

可指定默认的JOIN行为

Added the join_default_strictness setting (values: “, ‘any’, ‘all’). This allows you to not specify ANY or ALL for JOIN. #2982

server log关联查询ID

Each line of the server log related to query processing shows the query ID. #2482

2018.09.15 23:01:12.934700 [ 275 ] {b7a52046-e852-41c9-9e13-54533917bc56} <Debug> executeQuery: (from xx.xx.80.34:37066, user: user) select * from numbers(100)
2018.09.15 23:01:12.934984 [ 275 ] {b7a52046-e852-41c9-9e13-54533917bc56} <Trace> InterpreterSelectQuery: FetchColumns -> Complete
2018.09.15 23:01:12.935029 [ 86 ] <Trace> SystemLog (system.query_log): Flushing system log
2018.09.15 23:01:12.935036 [ 275 ] {b7a52046-e852-41c9-9e13-54533917bc56} <Debug> executeQuery: Query pipeline:
Expression
 Expression
  Limit
   Numbers

2018.09.15 23:01:12.935391 [ 277 ] {b7a52046-e852-41c9-9e13-54533917bc56} <Trace> ThreadStatus: Thread 277 exited
2018.09.15 23:01:12.935449 [ 87 ] <Trace> SystemLog (system.query_thread_log): Flushing system log
2018.09.15 23:01:12.935517 [ 275 ] {b7a52046-e852-41c9-9e13-54533917bc56} <Information> executeQuery: Read 100 rows, 800.00 B in 0.001 sec., 146631 rows/sec., 1.12 MiB/sec.
2018.09.15 23:01:12.935557 [ 275 ] <Debug> MemoryTracker: Peak memory usage (total): 1.00 MiB.
2018.09.15 23:01:12.935572 [ 275 ] <Information> TCPHandler: Processed in 0.001 sec.

终端可以直接print日志

Now you can get query execution logs in clickhouse-client (use the send_logs_level setting). With distributed query processing, logs are cascaded from all the servers. #2482

SELECT *
FROM system.settings
WHERE name = 'send_logs_level'

┌─name────────────┬─value─┬─changed─┬─description──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ send_logs_level │ none  │       0 │ Send server text logs with specified minumum level to client. Valid values: 'trace', 'debug', 'info', 'warning', 'error', 'none' │
└─────────────────┴───────┴─────────┴──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

1 rows in set. Elapsed: 0.002 sec.

xx.xx.xx.xx. :) set send_logs_level = 'trace';

SET send_logs_level = 'trace'

Ok.

0 rows in set. Elapsed: 0.001 sec.

xx.xx.xx.xx. :) select * from system.settings where name = 'send_logs_level' ;

SELECT *
FROM system.settings
WHERE name = 'send_logs_level'

[xx.xx.xx.xx.] 2018.09.15 23:30:51.158294 {dbecff24-3edc-4584-8379-02cb23aedcb2} [ 275 ] <Debug> executeQuery: (from 127.0.0.1:59056, user: user) select * from system.settings where name = 'send_logs_level'
[xx.xx.xx.xx.] 2018.09.15 23:30:51.159056 {dbecff24-3edc-4584-8379-02cb23aedcb2} [ 275 ] <Trace> InterpreterSelectQuery: FetchColumns -> Complete
[xx.xx.xx.xx.] 2018.09.15 23:30:51.159152 {dbecff24-3edc-4584-8379-02cb23aedcb2} [ 275 ] <Debug> executeQuery: Query pipeline:
Expression
 Expression
  Filter
   One

┌─name────────────┬─value─┬─changed─┬─description──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ send_logs_level │ trace │       1 │ Send server text logs with specified minumum level to client. Valid values: 'trace', 'debug', 'info', 'warning', 'error', 'none' │
└─────────────────┴───────┴─────────┴──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
[xx.xx.xx.xx.] 2018.09.15 23:30:51.159654 {dbecff24-3edc-4584-8379-02cb23aedcb2} [ 398 ] <Trace> ThreadStatus: Thread 398 exited
[xx.xx.xx.xx.] 2018.09.15 23:30:51.159803 {dbecff24-3edc-4584-8379-02cb23aedcb2} [ 275 ] <Information> executeQuery: Read 178 rows, 25.15 KiB in 0.001 sec., 127014 rows/sec., 17.53 MiB/sec.


# 分布式查询的日志也会打到当前终端

记录setting行为到query_log

The system.query_log and system.processes (SHOW PROCESSLIST) tables now have information about all changed settings when you run a query (the nested structure of the Settings data). - Added the log_query_settings setting. #2482

SELECT *
FROM system.query_log
ORDER BY event_time DESC
LIMIT 1

Row 1:
──────
type:                 1
event_date:           2018-09-15
event_time:           2018-09-15 23:42:17
query_start_time:     2018-09-15 23:42:17
query_duration_ms:    0
read_rows:            0
read_bytes:           0
written_rows:         0
written_bytes:        0
result_rows:          0
result_bytes:         0
memory_usage:         0
query:                select * from system.metrics
exception:
stack_trace:
is_initial_query:     1
user:                 user
query_id:             0881d528-a79c-4bd7-8c0a-38ce270b95f1
address:              ��
M�
port:                 33384
initial_user:         user
initial_query_id:     0881d528-a79c-4bd7-8c0a-38ce270b95f1
initial_address:      ��
M�
initial_port:         33384
interface:            2
os_user:
client_hostname:
client_name:
client_revision:      0
client_version_major: 0
client_version_minor: 0
client_version_patch: 0
http_method:          1
http_user_agent:      Go-http-client/1.1
quota_key:
revision:             54407
thread_numbers:       []
ProfileEvents.Names:  []
ProfileEvents.Values: []
Settings.Names:       ['max_threads','use_uncompressed_cache','background_pool_size','load_balancing','log_queries','readonly','max_memory_usage']
Settings.Values:      ['48','0','64','random','1','1','32212254720']

记录线程数

The system.query_log and system.processes tables now show information about the number of threads that are participating in query execution (see the thread_numbers column). #2482

SELECT thread_numbers
FROM system.query_log
ORDER BY event_time DESC
LIMIT 10

┌─thread_numbers─┐
│ []             │
│ [88]           │
│ []             │
│ [77,173]       │
│ []             │
│ []             │
│ [77,174]       │
│ []             │
│ [77,171]       │
│ []             │
└────────────────┘


SELECT thread_numbers
FROM system.processes
LIMIT 10

┌─thread_numbers─┐
│ [77]           │
└────────────────┘

增加进程统计信息

Added ProfileEvents counters that measure the time spent on reading and writing over the network and reading and writing to disk, the number of network errors, and the time spent - waiting when network bandwidth is limited. #2482
Added ProfileEventscounters that contain the system metrics from rusage (you can use them to get information about CPU usage in userspace and the kernel, page faults, and context -

switches), as well as taskstats metrics (use these to obtain information about I/O wait time, CPU wait time, and the amount of data read and recorded, both with and without page c ache). -#2482
The ProfileEvents counters are applied globally and for each query, as well as for each query execution thread, which allows you to profile resource consumption by query in detail. #2482

SELECT *
FROM system.processes
LIMIT 10

Row 1:
──────
is_initial_query:     1
user:                 user
query_id:             7e65449f-8899-4c5f-8859-20eeae32d1b1
address:              127.0.0.1
port:                 38610
initial_user:         user
initial_query_id:     7e65449f-8899-4c5f-8859-20eeae32d1b1
initial_address:      127.0.0.1
initial_port:         38610
interface:            1
os_user:              root
client_hostname:      xx.xx.xx.xx.
client_name:          ClickHouse client
client_version_major: 18
client_version_minor: 12
client_version_patch: 13
client_revision:      54407
http_method:          0
http_user_agent:
quota_key:
elapsed:              0.000800632
is_cancelled:         0
read_rows:            0
read_bytes:           0
total_rows_approx:    0
written_rows:         0
written_bytes:        0
memory_usage:         880
peak_memory_usage:    880
query:                select * from system.processes  limit 10
thread_numbers:       [181]
ProfileEvents.Names:  ['Query','SelectQuery','ReadCompressedBytes','CompressedReadBufferBlocks','CompressedReadBufferBytes','IOBufferAllocs','IOBufferAllocBytes','ContextLock','RWLockAcquiredReadLocks']
ProfileEvents.Values: [1,1,36,1,10,1,57,3,1]
Settings.Names:       ['max_threads','use_uncompressed_cache','background_pool_size','load_balancing','log_queries','max_memory_usage']
Settings.Values:      ['48','0','64','random','1','64424509440']

新增每个查询执行线程的信息

Added the system.query_thread_log table, which contains information about each query execution thread. Added the log_query_threads setting. #2482

SELECT *
FROM system.query_thread_log
ORDER BY event_time DESC
LIMIT 3

Row 1:
──────
event_date:           2018-09-15
event_time:           2018-09-15 22:29:17
query_start_time:     2018-09-15 22:29:17
query_duration_ms:    4
read_rows:            2178
read_bytes:           1446750
written_rows:         0
written_bytes:        0
memory_usage:         27136
peak_memory_usage:    44996168
thread_name:          ParalInputsProc
thread_number:        252
os_thread_id:         19255
master_thread_number: 74
master_os_thread_id:  9227
query:                select * from system.query_thread_log order by event_time desc  limit 10
is_initial_query:     1
user:                 user
query_id:             55740e27-e796-4b0f-a9ff-530363f91d76
address:              ��
port:                 52456
initial_user:         user
initial_query_id:     55740e27-e796-4b0f-a9ff-530363f91d76
initial_address:      ��
initial_port:         52456
interface:            1
os_user:              root
client_hostname:      xx.xx.xx.xx.
client_name:          ClickHouse client
client_revision:      54407
client_version_major: 18
client_version_minor: 12
client_version_patch: 13
http_method:          0
http_user_agent:
quota_key:
revision:             54407
ProfileEvents.Names:  ['FileOpen','ReadBufferFromFileDescriptorRead','ReadBufferFromFileDescriptorReadBytes','ReadCompressedBytes','CompressedReadBufferBlocks','CompressedReadBufferBytes','IOBufferAllocs','IOBufferAllocBytes','MarkCacheMisses','CreatedReadBufferOrdinary','DiskReadElapsedMicroseconds','ContextLock','RealTimeMicroseconds','UserTimeMicroseconds','SystemTimeMicroseconds','SoftPageFaults']
ProfileEvents.Values: [80,118,231033,229113,46,1101082,80,42862276,40,40,1135,2,4328,890,2671,678]

新增 system.metrics and system.events

The system.metrics and system.events tables now have built-in documentation. #3016

SELECT *
FROM system.metrics

┌─metric───────────────────────────────────┬──────value─┬─description────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ Query                                    │          1 │ Number of executing queries                                                                                                                                                                                                                                │
│ Merge                                    │          0 │ Number of executing background merges                                                                                                                                                                                                                      │
│ PartMutation                             │          0 │ Number of mutations (ALTER DELETE/UPDATE)                                                                                                                                                                                                                  │
│ ReplicatedFetch                          │          0 │ Number of data parts fetching from replica                                                                                                                                                                                                                 │
│ ReplicatedSend                           │          0 │ Number of data parts sending to replicas                                                                                                                                                                                                                   │
│ ReplicatedChecks                         │          0 │ Number of data parts checking for consistency


SELECT *
FROM system.events
LIMIT 10

┌─event───────────────────────────────────┬────value─┬─description────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ Query                                   │      127 │ Number of queries started to be interpreted and maybe executed. Does not include queries that are failed to parse, that are rejected due to AST size limits; rejected due to quota limits or limits on number of simultaneously running queries. May include internal queries initiated by ClickHouse itself. Does not count subqueries. │
│ SelectQuery                             │      124 │ Same as Query, but only for SELECT queries.                                                                                                                                                                                                                │
│ FileOpen                                │    54504 │ Number of files opened.                                                                                                                                                                                                                                    │
│ Seek                                    │       47 │ Number of times the 'lseek' function was called.

新增arrayEnumerateDense函数

Added the arrayEnumerateDense function. Amos Bird

SELECT arrayEnumerateDense([(1, 2), (3, 4), (1, 2), (1, 2), (2, 3), (2, 3)])

┌─arrayEnumerateDense(array(tuple(1, 2), tuple(3, 4), tuple(1, 2), tuple(1, 2), tuple(2, 3), tuple(2, 3)))─┐
│ [1,2,1,1,3,3]                                                                                            │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────┘

新增arrayCumSumNonNegativ/arrayDifference函数

Added the arrayCumSumNonNegative and arrayDifference functions. Aleksey Studnev
似乎还没上，无法使用。

新增retention函数

Added the retention aggregate function. Sundy Li

SELECT
    uid,
    retention(date = '2018-08-06', date = '2018-08-07', date = '2018-08-08') AS r
FROM retention_test
WHERE date IN ('2018-08-06', '2018-08-07', '2018-08-08')
GROUP BY uid
ORDER BY uid ASC
LIMIT 3

┌─uid─┬─r───────┐
│   0 │ [1,1,1] │
│   1 │ [1,1,1] │
│   2 │ [1,1,1] │
└─────┴─────────┘

states函数可以使用加号云算法

Now you can add (merge) states of aggregate functions by using the plus operator, and multiply the states of aggregate functions by a nonnegative constant. #3062 #3034

CREATE TABLE add_aggregate
(
    a UInt32,
    b UInt32
)
ENGINE = Memory

INSERT INTO add_aggregate VALUES(1, 2);

INSERT INTO add_aggregate VALUES(3, 1);


SELECT minMerge(x)
FROM
(
    SELECT minState(a) + minState(b) AS x
    FROM add_aggregate
)

┌─minMerge(x)─┐
│           1 │
└─────────────┘

虚拟列

Tables in the MergeTree family now have the virtual column _partition_id. #3089

SELECT _partition_id
FROM test.partition_id
ORDER BY _partition_id ASC

┌─_partition_id─┐
│ 197004        │
│ 197004        │
│ 197007        │
│ 197007        │
│ 197010        │
│ 197010        │
│ 201809        │
│ 201809        │
│ 201809        │
└───────────────┘



SELECT *
FROM test.partition_id
ORDER BY d ASC

┌──────────d─┬───x─┐
│ 1970-04-11 │   1 │
│ 1970-04-11 │   1 │
│ 1970-07-20 │   2 │
│ 1970-07-20 │   2 │
│ 1970-10-28 │   3 │
│ 1970-10-28 │   3 │
│ 2018-09-13 │ 100 │
│ 2018-09-14 │ 100 │
│ 2018-09-15 │ 100 │
└────────────┴─────┘

Bug修复：

修复了 Dictionary 表的问题（抛出 Size of offsets doesn't match size of column 或 Unknown compression method 异常）。此错误出现在版本18.10.3中。＃2913
修复了合并 CollapsingMergeTree 表时如果其中一个数据部分为空（这些部分是在合并期间形成或者 ALTER DELETE 所有数据都已删除）的错误，并且该 vertical 算法用于合并。＃3049
在修正了比赛条件 DROP 或 TRUNCATE 用于 Memory 与同时表 SELECT ，这可能导致服务器崩溃。此错误出现在1.1.54388版本中。＃3038
修复了在返回错误时插入 Replicated 表时数据丢失的可能性 Session is expired （可以通过 ReplicatedDataLoss 度量检测到数据丢失）。版本1.1.54378中发生此错误。＃2939 ＃2949 ＃2964
修复了一段时间内的段错误 JOIN ... ON 。＃3000
修复了 WHERE 表达式完全由限定列名称组成时的错误搜索列名称，例如 WHERE table.column 。＃2994
修复了在执行分布式查询时发生的“未找到列”错误，如果从远程服务器请求包含带有子查询的IN表达式的单个列。＃3087
修复了 Block structure mismatch in UNION stream: different number of columns 分布式查询发生的错误，如果其中一个分片是本地分区而另一个分片不是，并且 PREWHERE 触发了移动优化。＃2226 ＃3037 ＃3055 ＃3065 ＃3073 ＃3090 ＃3093
修复了 pointInPolygon 非凸多边形的某些情况的函数。＃2910
修正了 nan 与整数比较时的错误结果。＃3024
修复了 zlib-ng 库中可能导致段错误的错误。＃2854
修复了插入带有 AggregateFunction 列的表时的内存泄漏，如果聚合函数的状态不简单（单独分配内存），以及单个插入请求是否导致多个小块。＃3084
修复了同时创建和删除相同 Buffer 或 MergeTree 表格时的竞争条件。
修复了比较由某些非平凡类型组成的元组（例如元组）时出现段错误的可能性。＃2989
修复了运行某些 ON CLUSTER 查询时出现段错误的可能性。张冬
修复了数组元素 arrayDistinct 函数中的错误 Nullable 。＃2845 ＃2937
该 enable_optimize_predicate_expression 选项现在可以正确支持案例 SELECT * 。张冬
修复了重新初始化ZooKeeper会话时的段错误。＃2917
修复了使用ZooKeeper时潜在的阻塞问题。
修复了在a中添加嵌套数据结构的错误代码 SummingMergeTree 。
在为聚合函数的状态分配内存时，正确地考虑了对齐，这使得在实现聚合函数的状态时可以使用需要对齐的操作。晨星-XC

安全修复：

安全使用ODBC数据源。与ODBC驱动程序的交互使用单独的 clickhouse-odbc-bridge 进程。第三方ODBC驱动程序中的错误不再导致服务器稳定性或漏洞问题。＃2828 ＃2879 ＃2886 ＃2893 ＃2921
修复了 catBoostPool 表函数中文件路径的错误验证问题。＃2894
系统表（的内容 tables ， databases ， parts ， columns ， parts_columns ， merges ， mutations ， replicas ，和 replication_queue ）根据用户的配置访问数据库过滤（ allow_databases ）。张冬

向后不兼容的变化：

在使用JOIN的请求中，星形字符扩展为所有表中的列列表，符合 SQL 标准。您可以通过 asterisk_left_columns_only 在用户配置级别设置为1 来恢复旧行为。

构建更改：

现在大多数集成测试都可以通过提交来运行。
代码样式检查也可以通过提交运行。
在 memcpy 上CentOS7 / Fedora的建设时，实施正确选择。 Etienne Champetier
使用clang进行构建时， -Weverything 除了常规警告外，还添加了一些警告 -Wall-Wextra -Werror 。＃2957
调试构建使用 jemalloc 调试选项。
用于与ZooKeeper交互的库的接口被声明为abstract。＃2950

以上就是本文的全部内容，希望本文的内容对大家的学习或者工作能带来一定的帮助，也希望大家多多支持码农网

查看所有标签

猜你喜欢:

本站部分资源来源于网络，本站转载出于传递更多信息之目的，版权归原作者或者来源机构所有，如转载稿涉及版权问题，请联系我们。

码农书籍

Hadoop: The Definitive Guide

Tom White / O'Reilly Media, Inc. / 2009 / 44.99

Apache Hadoop is ideal for organizations with a growing need to store and process massive application datasets. Hadoop: The Definitive Guide is a comprehensive resource for using Hadoop to build relia......一起来看看《Hadoop: The Definitive Guide》这本书的介绍吧!

码农工具

JS 压缩/解压工具

在线压缩/解压 JS 代码

ClickHouse 18.12.13-2018-09-10版本新特性解析

新特性列表（按照GitHub中CHANGELOG顺序）

支持Decimal

新的WITH ROLLUP修饰符GROUP BY，替代语法 GROUP BY ROLLUP(…)

JOIN查询会把*解析为字段

JOIN支持table functions(remote/merge/numbers/url)

终端支持tab自动补全

终端支持ctrl c取消输入

可指定默认的JOIN行为

server log关联查询ID

终端可以直接print日志

记录setting行为到query_log

记录线程数

增加进程统计信息

新增每个查询执行线程的信息

新增 system.metrics and system.events

新增arrayEnumerateDense函数

新增arrayCumSumNonNegativ/arrayDifference函数

新增retention函数

states函数可以使用加号云算法

虚拟列

Bug修复：

安全修复：

向后不兼容的变化：

构建更改：

Hadoop: The Definitive Guide

JS 压缩/解压工具

URL 编码/解码

html转js在线工具