ClickHouse的发版速度是众所周知的快
在最近,他们正式发出了18.12.13-2018-09-10版本
相关的CHANGELOG更是多的吓人
为了能够更好的使用新版特性,特做了详细的介绍
其中新特性部分,为人工翻译、校对,毕竟这部分内容是重点,后面为Google翻译
新特性列表(按照GitHub中CHANGELOG顺序)
支持Decimal
- Added the DECIMAL(digits, scale) data type (Decimal32(scale), Decimal64(scale), Decimal128(scale)). To enable it, use the setting allow_experimental_decimal_type. #2846 #2970 #3008 #3047
SELECT * FROM data_type_families WHERE name LIKE '%De%' ┌─name───────┬─case_insensitive─┬─alias_to─┐ │ Decimal32 │ 1 │ │ │ Decimal64 │ 1 │ │ │ Decimal128 │ 1 │ │ │ Decimal │ 1 │ │ └────────────┴──────────────────┴──────────┘
新的WITH ROLLUP修饰符GROUP BY,替代语法 GROUP BY ROLLUP(…)
- New WITH ROLLUP modifier for GROUP BY (alternative syntax: GROUP BY ROLLUP(…)). #2948
JOIN查询会把*解析为字段
- In requests with JOIN, the star character expands to a list of columns in all tables, in compliance with the SQL standard. You can restore the old behavior by setting - asterisk_left_columns_only to 1 on the user configuration level. Winter Zhang
JOIN支持table functions(remote/merge/numbers/url)
- Added support for JOIN with table functions. Winter Zhang
终端支持tab自动补全
- Autocomplete by pressing Tab in clickhouse-client. Sergey Shcherbin
终端支持ctrl c取消输入
- Ctrl+C in clickhouse-client clears a query that was entered. #2877
可指定默认的JOIN行为
- Added the join_default_strictness setting (values: “, ‘any’, ‘all’). This allows you to not specify ANY or ALL for JOIN. #2982
server log关联查询ID
- Each line of the server log related to query processing shows the query ID. #2482
2018.09.15 23:01:12.934700 [ 275 ] {b7a52046-e852-41c9-9e13-54533917bc56} <Debug> executeQuery: (from xx.xx.80.34:37066, user: user) select * from numbers(100) 2018.09.15 23:01:12.934984 [ 275 ] {b7a52046-e852-41c9-9e13-54533917bc56} <Trace> InterpreterSelectQuery: FetchColumns -> Complete 2018.09.15 23:01:12.935029 [ 86 ] <Trace> SystemLog (system.query_log): Flushing system log 2018.09.15 23:01:12.935036 [ 275 ] {b7a52046-e852-41c9-9e13-54533917bc56} <Debug> executeQuery: Query pipeline: Expression Expression Limit Numbers 2018.09.15 23:01:12.935391 [ 277 ] {b7a52046-e852-41c9-9e13-54533917bc56} <Trace> ThreadStatus: Thread 277 exited 2018.09.15 23:01:12.935449 [ 87 ] <Trace> SystemLog (system.query_thread_log): Flushing system log 2018.09.15 23:01:12.935517 [ 275 ] {b7a52046-e852-41c9-9e13-54533917bc56} <Information> executeQuery: Read 100 rows, 800.00 B in 0.001 sec., 146631 rows/sec., 1.12 MiB/sec. 2018.09.15 23:01:12.935557 [ 275 ] <Debug> MemoryTracker: Peak memory usage (total): 1.00 MiB. 2018.09.15 23:01:12.935572 [ 275 ] <Information> TCPHandler: Processed in 0.001 sec.
终端可以直接print日志
- Now you can get query execution logs in clickhouse-client (use the send_logs_level setting). With distributed query processing, logs are cascaded from all the servers. #2482
SELECT * FROM system.settings WHERE name = 'send_logs_level' ┌─name────────────┬─value─┬─changed─┬─description──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ send_logs_level │ none │ 0 │ Send server text logs with specified minumum level to client. Valid values: 'trace', 'debug', 'info', 'warning', 'error', 'none' │ └─────────────────┴───────┴─────────┴──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ 1 rows in set. Elapsed: 0.002 sec. xx.xx.xx.xx. :) set send_logs_level = 'trace'; SET send_logs_level = 'trace' Ok. 0 rows in set. Elapsed: 0.001 sec. xx.xx.xx.xx. :) select * from system.settings where name = 'send_logs_level' ; SELECT * FROM system.settings WHERE name = 'send_logs_level' [xx.xx.xx.xx.] 2018.09.15 23:30:51.158294 {dbecff24-3edc-4584-8379-02cb23aedcb2} [ 275 ] <Debug> executeQuery: (from 127.0.0.1:59056, user: user) select * from system.settings where name = 'send_logs_level' [xx.xx.xx.xx.] 2018.09.15 23:30:51.159056 {dbecff24-3edc-4584-8379-02cb23aedcb2} [ 275 ] <Trace> InterpreterSelectQuery: FetchColumns -> Complete [xx.xx.xx.xx.] 2018.09.15 23:30:51.159152 {dbecff24-3edc-4584-8379-02cb23aedcb2} [ 275 ] <Debug> executeQuery: Query pipeline: Expression Expression Filter One ┌─name────────────┬─value─┬─changed─┬─description──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ send_logs_level │ trace │ 1 │ Send server text logs with specified minumum level to client. Valid values: 'trace', 'debug', 'info', 'warning', 'error', 'none' │ └─────────────────┴───────┴─────────┴──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ [xx.xx.xx.xx.] 2018.09.15 23:30:51.159654 {dbecff24-3edc-4584-8379-02cb23aedcb2} [ 398 ] <Trace> ThreadStatus: Thread 398 exited [xx.xx.xx.xx.] 2018.09.15 23:30:51.159803 {dbecff24-3edc-4584-8379-02cb23aedcb2} [ 275 ] <Information> executeQuery: Read 178 rows, 25.15 KiB in 0.001 sec., 127014 rows/sec., 17.53 MiB/sec. # 分布式查询的日志也会打到当前终端
记录setting行为到query_log
- The system.query_log and system.processes (SHOW PROCESSLIST) tables now have information about all changed settings when you run a query (the nested structure of the Settings data). - Added the log_query_settings setting. #2482
SELECT * FROM system.query_log ORDER BY event_time DESC LIMIT 1 Row 1: ────── type: 1 event_date: 2018-09-15 event_time: 2018-09-15 23:42:17 query_start_time: 2018-09-15 23:42:17 query_duration_ms: 0 read_rows: 0 read_bytes: 0 written_rows: 0 written_bytes: 0 result_rows: 0 result_bytes: 0 memory_usage: 0 query: select * from system.metrics exception: stack_trace: is_initial_query: 1 user: user query_id: 0881d528-a79c-4bd7-8c0a-38ce270b95f1 address: �� M� port: 33384 initial_user: user initial_query_id: 0881d528-a79c-4bd7-8c0a-38ce270b95f1 initial_address: �� M� initial_port: 33384 interface: 2 os_user: client_hostname: client_name: client_revision: 0 client_version_major: 0 client_version_minor: 0 client_version_patch: 0 http_method: 1 http_user_agent: Go-http-client/1.1 quota_key: revision: 54407 thread_numbers: [] ProfileEvents.Names: [] ProfileEvents.Values: [] Settings.Names: ['max_threads','use_uncompressed_cache','background_pool_size','load_balancing','log_queries','readonly','max_memory_usage'] Settings.Values: ['48','0','64','random','1','1','32212254720']
记录线程数
- The system.query_log and system.processes tables now show information about the number of threads that are participating in query execution (see the thread_numbers column). #2482
SELECT thread_numbers FROM system.query_log ORDER BY event_time DESC LIMIT 10 ┌─thread_numbers─┐ │ [] │ │ [88] │ │ [] │ │ [77,173] │ │ [] │ │ [] │ │ [77,174] │ │ [] │ │ [77,171] │ │ [] │ └────────────────┘ SELECT thread_numbers FROM system.processes LIMIT 10 ┌─thread_numbers─┐ │ [77] │ └────────────────┘
增加进程统计信息
-
Added ProfileEvents counters that measure the time spent on reading and writing over the network and reading and writing to disk, the number of network errors, and the time spent - waiting when network bandwidth is limited. #2482
-
Added ProfileEventscounters that contain the system metrics from rusage (you can use them to get information about CPU usage in userspace and the kernel, page faults, and context -
switches), as well as taskstats metrics (use these to obtain information about I/O wait time, CPU wait time, and the amount of data read and recorded, both with and without page c ache). -#2482
-
The ProfileEvents counters are applied globally and for each query, as well as for each query execution thread, which allows you to profile resource consumption by query in detail. #2482
SELECT * FROM system.processes LIMIT 10 Row 1: ────── is_initial_query: 1 user: user query_id: 7e65449f-8899-4c5f-8859-20eeae32d1b1 address: 127.0.0.1 port: 38610 initial_user: user initial_query_id: 7e65449f-8899-4c5f-8859-20eeae32d1b1 initial_address: 127.0.0.1 initial_port: 38610 interface: 1 os_user: root client_hostname: xx.xx.xx.xx. client_name: ClickHouse client client_version_major: 18 client_version_minor: 12 client_version_patch: 13 client_revision: 54407 http_method: 0 http_user_agent: quota_key: elapsed: 0.000800632 is_cancelled: 0 read_rows: 0 read_bytes: 0 total_rows_approx: 0 written_rows: 0 written_bytes: 0 memory_usage: 880 peak_memory_usage: 880 query: select * from system.processes limit 10 thread_numbers: [181] ProfileEvents.Names: ['Query','SelectQuery','ReadCompressedBytes','CompressedReadBufferBlocks','CompressedReadBufferBytes','IOBufferAllocs','IOBufferAllocBytes','ContextLock','RWLockAcquiredReadLocks'] ProfileEvents.Values: [1,1,36,1,10,1,57,3,1] Settings.Names: ['max_threads','use_uncompressed_cache','background_pool_size','load_balancing','log_queries','max_memory_usage'] Settings.Values: ['48','0','64','random','1','64424509440']
新增每个查询执行线程的信息
- Added the system.query_thread_log table, which contains information about each query execution thread. Added the log_query_threads setting. #2482
SELECT * FROM system.query_thread_log ORDER BY event_time DESC LIMIT 3 Row 1: ────── event_date: 2018-09-15 event_time: 2018-09-15 22:29:17 query_start_time: 2018-09-15 22:29:17 query_duration_ms: 4 read_rows: 2178 read_bytes: 1446750 written_rows: 0 written_bytes: 0 memory_usage: 27136 peak_memory_usage: 44996168 thread_name: ParalInputsProc thread_number: 252 os_thread_id: 19255 master_thread_number: 74 master_os_thread_id: 9227 query: select * from system.query_thread_log order by event_time desc limit 10 is_initial_query: 1 user: user query_id: 55740e27-e796-4b0f-a9ff-530363f91d76 address: �� port: 52456 initial_user: user initial_query_id: 55740e27-e796-4b0f-a9ff-530363f91d76 initial_address: �� initial_port: 52456 interface: 1 os_user: root client_hostname: xx.xx.xx.xx. client_name: ClickHouse client client_revision: 54407 client_version_major: 18 client_version_minor: 12 client_version_patch: 13 http_method: 0 http_user_agent: quota_key: revision: 54407 ProfileEvents.Names: ['FileOpen','ReadBufferFromFileDescriptorRead','ReadBufferFromFileDescriptorReadBytes','ReadCompressedBytes','CompressedReadBufferBlocks','CompressedReadBufferBytes','IOBufferAllocs','IOBufferAllocBytes','MarkCacheMisses','CreatedReadBufferOrdinary','DiskReadElapsedMicroseconds','ContextLock','RealTimeMicroseconds','UserTimeMicroseconds','SystemTimeMicroseconds','SoftPageFaults'] ProfileEvents.Values: [80,118,231033,229113,46,1101082,80,42862276,40,40,1135,2,4328,890,2671,678]
新增 system.metrics and system.events
- The system.metrics and system.events tables now have built-in documentation. #3016
SELECT * FROM system.metrics ┌─metric───────────────────────────────────┬──────value─┬─description────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ Query │ 1 │ Number of executing queries │ │ Merge │ 0 │ Number of executing background merges │ │ PartMutation │ 0 │ Number of mutations (ALTER DELETE/UPDATE) │ │ ReplicatedFetch │ 0 │ Number of data parts fetching from replica │ │ ReplicatedSend │ 0 │ Number of data parts sending to replicas │ │ ReplicatedChecks │ 0 │ Number of data parts checking for consistency SELECT * FROM system.events LIMIT 10 ┌─event───────────────────────────────────┬────value─┬─description────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ Query │ 127 │ Number of queries started to be interpreted and maybe executed. Does not include queries that are failed to parse, that are rejected due to AST size limits; rejected due to quota limits or limits on number of simultaneously running queries. May include internal queries initiated by ClickHouse itself. Does not count subqueries. │ │ SelectQuery │ 124 │ Same as Query, but only for SELECT queries. │ │ FileOpen │ 54504 │ Number of files opened. │ │ Seek │ 47 │ Number of times the 'lseek' function was called.
新增arrayEnumerateDense函数
- Added the arrayEnumerateDense function. Amos Bird
SELECT arrayEnumerateDense([(1, 2), (3, 4), (1, 2), (1, 2), (2, 3), (2, 3)]) ┌─arrayEnumerateDense(array(tuple(1, 2), tuple(3, 4), tuple(1, 2), tuple(1, 2), tuple(2, 3), tuple(2, 3)))─┐ │ [1,2,1,1,3,3] │ └──────────────────────────────────────────────────────────────────────────────────────────────────────────┘
新增arrayCumSumNonNegativ/arrayDifference函数
-
Added the arrayCumSumNonNegative and arrayDifference functions. Aleksey Studnev
-
似乎还没上,无法使用。
新增retention函数
- Added the retention aggregate function. Sundy Li
SELECT uid, retention(date = '2018-08-06', date = '2018-08-07', date = '2018-08-08') AS r FROM retention_test WHERE date IN ('2018-08-06', '2018-08-07', '2018-08-08') GROUP BY uid ORDER BY uid ASC LIMIT 3 ┌─uid─┬─r───────┐ │ 0 │ [1,1,1] │ │ 1 │ [1,1,1] │ │ 2 │ [1,1,1] │ └─────┴─────────┘
states函数可以使用加号云算法
- Now you can add (merge) states of aggregate functions by using the plus operator, and multiply the states of aggregate functions by a nonnegative constant. #3062 #3034
CREATE TABLE add_aggregate ( a UInt32, b UInt32 ) ENGINE = Memory INSERT INTO add_aggregate VALUES(1, 2); INSERT INTO add_aggregate VALUES(3, 1); SELECT minMerge(x) FROM ( SELECT minState(a) + minState(b) AS x FROM add_aggregate ) ┌─minMerge(x)─┐ │ 1 │ └─────────────┘
虚拟列
- Tables in the MergeTree family now have the virtual column _partition_id. #3089
SELECT _partition_id FROM test.partition_id ORDER BY _partition_id ASC ┌─_partition_id─┐ │ 197004 │ │ 197004 │ │ 197007 │ │ 197007 │ │ 197010 │ │ 197010 │ │ 201809 │ │ 201809 │ │ 201809 │ └───────────────┘ SELECT * FROM test.partition_id ORDER BY d ASC ┌──────────d─┬───x─┐ │ 1970-04-11 │ 1 │ │ 1970-04-11 │ 1 │ │ 1970-07-20 │ 2 │ │ 1970-07-20 │ 2 │ │ 1970-10-28 │ 3 │ │ 1970-10-28 │ 3 │ │ 2018-09-13 │ 100 │ │ 2018-09-14 │ 100 │ │ 2018-09-15 │ 100 │ └────────────┴─────┘
Bug修复:
- 修复了
Dictionary
表的问题(抛出Size of offsets doesn't match size of column
或Unknown compression method
异常)。此错误出现在版本18.10.3中。 #2913 - 修复了合并
CollapsingMergeTree
表时如果其中一个数据部分为空(这些部分是在合并期间形成或者ALTER DELETE
所有数据都已删除)的错误,并且该vertical
算法用于合并。 #3049 - 在修正了比赛条件
DROP
或TRUNCATE
用于Memory
与同时表SELECT
,这可能导致服务器崩溃。此错误出现在1.1.54388版本中。 #3038 - 修复了在返回错误时插入
Replicated
表时数据丢失的可能性Session is expired
(可以通过ReplicatedDataLoss
度量检测到数据丢失)。版本1.1.54378中发生此错误。 #2939 #2949 #2964 - 修复了一段时间内的段错误
JOIN ... ON
。 #3000 - 修复了
WHERE
表达式完全由限定列名称组成时的错误搜索列名称,例如WHERE table.column
。 #2994 - 修复了在执行分布式查询时发生的“未找到列”错误,如果从远程服务器请求包含带有子查询的IN表达式的单个列。 #3087
- 修复了
Block structure mismatch in UNION stream: different number of columns
分布式查询发生的错误,如果其中一个分片是本地分区而另一个分片不是,并且PREWHERE
触发了移动优化。 #2226 #3037 #3055 #3065 #3073 #3090 #3093 - 修复了
pointInPolygon
非凸多边形的某些情况的函数。 #2910 - 修正了
nan
与整数比较时的错误结果。 #3024 - 修复了
zlib-ng
库中可能导致段错误的错误。 #2854 - 修复了插入带有
AggregateFunction
列的表时的内存泄漏,如果聚合函数的状态不简单(单独分配内存),以及单个插入请求是否导致多个小块。 #3084 - 修复了同时创建和删除相同
Buffer
或MergeTree
表格时的竞争条件。 - 修复了比较由某些非平凡类型组成的元组(例如元组)时出现段错误的可能性。 #2989
- 修复了运行某些
ON CLUSTER
查询时出现段错误的可能性。 张冬 - 修复了数组元素
arrayDistinct
函数中的错误Nullable
。 #2845 #2937 - 该
enable_optimize_predicate_expression
选项现在可以正确支持案例SELECT *
。 张冬 - 修复了重新初始化ZooKeeper会话时的段错误。 #2917
- 修复了使用ZooKeeper时潜在的阻塞问题。
- 修复了在a中添加嵌套数据结构的错误代码
SummingMergeTree
。 - 在为聚合函数的状态分配内存时,正确地考虑了对齐,这使得在实现聚合函数的状态时可以使用需要对齐的操作。 晨星-XC
安全修复:
- 安全使用ODBC数据源。与ODBC驱动程序的交互使用单独的
clickhouse-odbc-bridge
进程。第三方ODBC驱动程序中的错误不再导致服务器稳定性或漏洞问题。 #2828 #2879 #2886 #2893 #2921 - 修复了
catBoostPool
表函数中文件路径的错误验证问题。 #2894 - 系统表(的内容
tables
,databases
,parts
,columns
,parts_columns
,merges
,mutations
,replicas
,和replication_queue
)根据用户的配置访问数据库过滤(allow_databases
)。 张冬
向后不兼容的变化:
- 在使用JOIN的请求中,星形字符扩展为所有表中的列列表,符合 SQL 标准。您可以通过
asterisk_left_columns_only
在用户配置级别设置为1 来恢复旧行为。
构建更改:
- 现在大多数集成测试都可以通过提交来运行。
- 代码样式检查也可以通过提交运行。
- 在
memcpy
上CentOS7 / Fedora的建设时,实施正确选择。 Etienne Champetier - 使用clang进行构建时,
-Weverything
除了常规警告外,还添加了一些警告-Wall-Wextra -Werror
。 #2957 - 调试构建使用
jemalloc
调试选项。 - 用于与ZooKeeper交互的库的接口被声明为abstract。 #2950
以上就是本文的全部内容,希望本文的内容对大家的学习或者工作能带来一定的帮助,也希望大家多多支持 码农网
猜你喜欢:- 解析 Angular 7 的十大特性
- 最全的 Swift 4 新特性解析
- java8新特性Optional深度解析
- java8新特性Optional深度解析
- java8新特性function和lambda深度解析
- npm5 新版功能特性解析及与 yarn 评测对比
本站部分资源来源于网络,本站转载出于传递更多信息之目的,版权归原作者或者来源机构所有,如转载稿涉及版权问题,请联系我们。
Hadoop: The Definitive Guide
Tom White / O'Reilly Media, Inc. / 2009 / 44.99
Apache Hadoop is ideal for organizations with a growing need to store and process massive application datasets. Hadoop: The Definitive Guide is a comprehensive resource for using Hadoop to build relia......一起来看看 《Hadoop: The Definitive Guide》 这本书的介绍吧!