Using SIMD to aggregate billions of values per second

栏目: IT技术 · 发布时间: 4年前

内容简介:QuestDB 4.2 introduces SIMD instructions, which made our aggregations faster by 100x! QuestDB is available

Using SIMD to aggregate billions of values per second

SIMD instructions are specific CPU instruction sets for arithmetic calculations that use synthetic parallelisation. The parallelisation is synthetic because instead of spreading the work across CPU cores, SIMD performs vector operations on multiple items using a single CPU instruction. In practice, if you were to add 8 numbers together, SIMD does that in 1 operation instead of 8. We get compounded performance improvements by combining SIMD with actual parallelisation and spanning the work across CPUs.

QuestDB 4.2 introduces SIMD instructions, which made our aggregations faster by 100x! QuestDB is available open-source under Apache 2.0 .

As of now, SIMD operations are available for non-keyed aggregation queries, such as select sum(value) from table . In future releases, we will extend these to keyed aggregations, for example select key, sum(value) from table (note the intentional omission of GROUP BY ). This will also result in ultrafast aggregation for time-bucketed queries using SAMPLE BY .

If you like what we do, please consider Using SIMD to aggregate billions of values per second

How fast is it?

To get an idea of how fast aggregations have become, we ran a benchmark against kdb+, which is one of the fastest databases out there. Coincidentally, their new version 4.0 (released a few days ago) introduces performance improvements through implicit parallelism.

We have benchmarked QuestDB against kdb's latest version using 2 different CPUs: the Intel 8850H and the AMD Ryzen 3900X . Both databases were running on 4 threads.

Queries

Test Query (kdb+ 4.0) Query (QuestDB 4.2)
sum of 1Bn doubles
no nulls
zz:1000000000?1000.0
\t sum zz
create table zz as (select rnd_double() d from long_sequence(1000000000));
select sum(d) from zz;
sum of 1Bn ints zz:1000000000?1000i
\t sum zz
create table zz as (select rnd_int() i from long_sequence(1000000000));
select sum(i) from zz;
sum of 1Bn longs zz:1000000000?1000j
\t sum zz
create table zz as (select rnd_long() l from long_sequence(1000000000));
select sum(l) from zz;
max of 1Bn doubles zz:1000000000?1000.0
\t max zz
create table zz as (select rnd_double() d from long_sequence(1000000000));
select max(d) from zz;
max of 1Bn longs zz:1000000000?1000
\t max zz
create table zz as (select rnd_long() l from long_sequence(1000000000));
select max(l) from zz;

Results

Using SIMD to aggregate billions of values per second

Using SIMD to aggregate billions of values per second

The dataset producing the results shown above does not contain NULL values. Interestingly, as soon as the data contains NULL values, kdb+ sum() performance drops while QuestDB sum() query time is unchanged as seen on the chart below.

Test Query (kdb+ 4.0) Query (QuestDB 4.2)
sum of 1Bn doubles
(nulls)
zz:1000000000?1000.0
zz:?[zz<100;0Nf;zz]
\t sum zz
create table zz as (select rnd_double(5) d from long_sequence(1000000000));
select sum(d) from zz;

Using SIMD to aggregate billions of values per second

We can improve this performance further

QuestDB's sum(int) result is 64-bit long, whereas kdb+ sum(int) returns a 32-bit integer (even if the sum overflows). Our approach is currently slightly more complicated as we convert each 32-bit integer to a 64-bit long to avoid overflow. By removing this overhead and more, there is scope left to make our implementation faster in the future.

Perspectives on performance

The execution times outlined above become more interesting once put into context. This is how QuestDB compares to Postgres when doing a sum of 1 billion numbers from a given table select sum(d) from 1G_double_nonNull .

Using SIMD to aggregate billions of values per second

We found that our performance figures are constrained by the available memory channels. Both the 8850H and the 3900X have 2 memory channels, and throwing more than 4 cores at the query above does not improve the performance. On the other hand, if the CPU has more memory channels, then performance scales almost linearly for both kdb+ and QuestDB.

To get an idea of the impact of memory channels, we spun off a m5.metal instance on AWS. This instance has two 24-core Intel 8275CL with 6 memory channels each. Here are the results compared to the 2-channel 3900X:

cpu cores 1 2 3 4 5 6 7 8 9 10 11 12
8275CL 910 605 380 240 193 176 156 148 140 136 133 141
3900X 621 502 381 260 260 260 260 260 260 260 260 260

We plot those results below on the left. On the right-hand side, we normalise the results for each CPU and plot the performance improvement of going from 1 to more cores.

Using SIMD to aggregate billions of values per second

Interestingly, the 2-channel 3900X, is much faster on 1 core than the 8275CL. But it does not scale well and hits a performance ceiling at 4 cores. This is because it only has 2 memory channels that are already saturated. The 6-channel 8275CL allows QuestDB to scale almost linearly as we add more CPU cores and hits a performance ceiling at around 12 cores.

Unfortunately AWS CPUs are hyperthreaded. We could unpack even more performance if CPU were fully isolated to run the computations.

We did not get our hands on CPUs with more memory channels for this test, but if you have easy access to 8 or 12-channel servers and would like to benchmark QuestDB, we'd love to hear the results. You candownload QuestDB and leave a comment on github

What is next?

In further releases, we will roll out this functionality to other parts of our SQL implementation. QuestDB implements SIMD in a generic fashion, which will allow us to continue adding SIMD to about everything our SQL engine does, such as keyed aggregations, indexing etc. We will also keep improving QuestDB's performance. Through some further work on assembly, we estimate that we can gain another 15% speed on these operations. In the meantime, if you want to know exactly how we have achieved this, all of our code is open-source !

About the release: QuestDB 4.2

Summary

We have implemented SIMD-based vector execution of queries, such as select sum(value) from table . This is ~100x faster than non-vector based execution. This is just the beginning as we will introduce vectors to more operations going forward. Try our first implementation in this release - stay tuned for more features in the upcoming releases!

Important

Metadata file format has been changed to include a new flag for columns of type symbol. It is necessary to convert existing tables to new format. Running the following sql: repair table myTable will update the table metadata.

What is new?

  • Java: vectorized sum(), avg(), min(), max() for DOUBLE, LONG, INT
  • Java: select distinct symbol optimisation
  • FreeBSD support
  • Automatically restore data consistency and recover from partial data loss.

What we fixed

  • SQL: NPE when parsing SQL text with malformed table name expression , for example ')', or ', blah'
  • SQL: parsing 'fill' clause in sub-query context was causing unexpected syntax error (#115)
  • SQL: possible internal error when ordering result of group-by or sample-by
  • Data Import: Ignore byte order marks (BOM) in table names created from an imported CSV (#114)
  • SQL: 'timestamp' propagation thru group-by code had issues. sum() was tripping over null values. Added last() aggregate function. (#113)
  • LOG: make service log names consistent on windows (#106)
  • SQL: deal with the following syntax 'select * from select ( select a from ....)'
  • SQL: allow the following syntax 'case cast(x as int) when 1 then ...'
  • fix(griffin): syntax check for "case"-')' overlap, e.g. "a + (case when .. ) end"

以上就是本文的全部内容,希望本文的内容对大家的学习或者工作能带来一定的帮助,也希望大家多多支持 码农网

查看所有标签

猜你喜欢:

本站部分资源来源于网络,本站转载出于传递更多信息之目的,版权归原作者或者来源机构所有,如转载稿涉及版权问题,请联系我们

技术管理之巅

技术管理之巅

黄哲铿 / 电子工业出版社 / 2015-6 / 49.00元

《技术管理之巅——如何从零打造高质效互联网技术团队?》为您解密国内顶级互联网公司技术团队管理的精髓。作者结合自己十余年在国内知名互联网公司MySteel、1号店等担任PMO总监、技术总监的丰富经验,进行归纳和总结。书中围绕着技术管理中的热点“如何搭建扁平化、去中心化的技术团队”、“大数据下的技术管理创新”、“目标管理方法OKR”、“阿米巴生产模式”、“Scrum和Kanban的实践”逐渐展开,从技......一起来看看 《技术管理之巅》 这本书的介绍吧!

随机密码生成器
随机密码生成器

多种字符组合密码

SHA 加密
SHA 加密

SHA 加密工具

HSV CMYK 转换工具
HSV CMYK 转换工具

HSV CMYK互换工具