Launch HN: QuestDB (YC S20) – Fast open source time series database

栏目: IT技术 · 发布时间: 4年前

内容简介:).It started in 2012 when an energy trading company hired me to rebuild their real-time vessel tracking system. Management wanted me to use a well-known XML database that they had just bought a license for. This option would have required to take down prod
Hey everyone, I’m Vlad and I co-founded QuestDB ( https://questdb.io ) with Nic and Tanc. QuestDB is an open source database for time series, events, and analytical workloads with a primary focus on performance ( https://github.com/questdb/questdb

).

It started in 2012 when an energy trading company hired me to rebuild their real-time vessel tracking system. Management wanted me to use a well-known XML database that they had just bought a license for. This option would have required to take down production for about a week just to ingest the data. And a week downtime was not an option. With no more money to spend on software, I turned to alternatives such as OpenTSDB but they were not a fit for our data model. There was no solution in sight to deliver the project.

Then, I stumbled upon Peter Lawrey’s Java Chronicle library [1]. It loaded the same data in 2 minutes instead of a week using memory-mapped files. Besides the performance aspect, I found it fascinating that such a simple method was solving multiple issues simultaneously: fast write, read can happen even before data is committed to disk, code interacts with memory rather than IO functions, no buffers to copy. Incidentally, this was my first exposure to zero-GC Java.

But there were several issues. First, at the time It didn’t look like the library was going to be maintained. Second, it used Java NIO instead of using the OS API directly. This adds overhead since it creates individual objects with sole purpose to hold a memory address for each memory page. Third, although the NIO allocation API was well documented, the release API was not. It was really easy to run out of memory and hard to manage memory page release. I decided to ditch the XML DB and then started to write a custom storage engine in Java, similar to what Java Chronicle did. This engine used memory mapped files, off-heap memory and a custom query system for geospatial time series. Implementing this was a refreshing experience. I learned more in a few weeks than in years on the job.

Throughout my career, I mostly worked at large companies where developers are “managed” via itemized tasks sent as tickets. There was no room for creativity or initiative. In fact, it was in one’s best interest to follow the ticket's exact instructions, even if it was complete nonsense. I had just been promoted to a managerial role and regretted it after a week. After so much time hoping for a promotion, I immediately wanted to go back to the technical side. I became obsessed with learning new stuff again, particularly in the high performance space.

With some money aside, I left my job and started to work on QuestDB solo. I used Java and a small C layer to interact directly with the OS API without passing through a selector API. Although existing OS API wrappers would have been easier to get started with, the overhead increases complexity and hurts performance. I also wanted the system to be completely GC-free. To do this, I had to build off-heap memory management myself and I could not use off-the-shelf libraries. I had to rewrite many of the standard ones over the years to avoid producing any garbage.

As I had my first kid, I had to take contracting gigs to make ends meet over the following 6 years. All the stuff I had been learning boosted my confidence and I started performing well at interviews. This allowed me to get better paying contracts, I could take fewer jobs and free up more time to work on QuestDB while looking after my family. I would do research during the day and implement this into QuestDB at night. I was constantly looking for the next thing, which would take performance closer to the limits of the hardware.

A year in, I realised that my initial design was actually flawed and that it had to be thrown away. It had no concept of separation between readers and writers and would thus allow dirty reads. Storage was not guaranteed to be contiguous, and pages could be of various non-64-bit-divisible sizes. It was also very much cache-unfriendly, forcing the use of slow row-based reads instead of fast columnar and vectorized ones.Commits were slow, and as individual column files could be committed independently, they left the data open to corruption.

Although this was a setback, I got back to work. I wrote the new engine to allow atomic and durable multi-column commits, provide repeatable read isolation, and for commits to be instantaneous. To do this, I separated transaction files from the data files. This made it possible to commit multiple columns simultaneously as a simple update of the last committed row id. I also made storage dense by removing overlapping memory pages and writing data byte by byte over page edges.

This new approach improved query performance. It made it easy to split data across worker threads and to optimise the CPU pipeline with prefetch. It unlocked column-based execution and additional virtual parallelism with SIMD instruction sets [2] thanks to Agner Fog’s Vector Class Library [3]. It made it possible to implement more recent innovations like our own version of Google SwissTable [4]. I published more details when we released a demo server a few weeks ago on ShowHN [5]. This demo is still available to try online with a pre-loaded dataset of 1.6 billion rows [6]. Although it was hard and discouraging at first, this rewrite turned out to be the second best thing that happened to QuestDB.

The best thing was that people started to contribute to the project. I am really humbled that Tanc and Nic left our previous employer to build QuestDB. A few months later, former colleagues of mine left their stable low-latency jobs at banks to join us. I take this as a huge responsibility and I don’t want to let these guys down. The amount of work ahead gives me headaches and goosebumps at the same time.

QuestDB is deployed in production, including into a large fintech company. We’ve been focusing on building a community to get our first users and gather as much feedback as possible.

Thank you for reading this story - I hope it was interesting. I would love to read your feedback on QuestDB and to answer questions.

[1] https://github.com/peter-lawrey/Java-Chronicle

[2] https://news.ycombinator.com/item?id=22803504

[3] https://www.agner.org/optimize/vectorclass.pdf

[4] https://github.com/questdb/questdb/blob/master/core/src/main...

[5] https://news.ycombinator.com/item?id=23616878

[6] https://try.questdb.io:9000/


以上就是本文的全部内容,希望对大家的学习有所帮助,也希望大家多多支持 码农网

查看所有标签

猜你喜欢:

本站部分资源来源于网络,本站转载出于传递更多信息之目的,版权归原作者或者来源机构所有,如转载稿涉及版权问题,请联系我们

JavaScript DOM编程艺术

JavaScript DOM编程艺术

Jeremy Keith / 杨涛、王建桥、杨晓云 / 人民邮电出版社 / 2006年12月 / 39.00元

本书讲述了JavaScript和DOM的基础知识,但重点放在DOM编程技术背后的思路和原则:预留退路、循序渐进和以用户为中心等,这些概念对于任何前端Web开发工作都非常重要。本书将这些概念贯穿在书中的所有代码示例中,使你看到用来创建图片库页面的脚本、用来创建动画效果的脚本和用来丰富页面元素呈现效果的脚本,最后结合所讲述的内容创建了一个实际的网站。 本书适合Web设计师和开发人员阅读。一起来看看 《JavaScript DOM编程艺术》 这本书的介绍吧!

HTML 压缩/解压工具
HTML 压缩/解压工具

在线压缩/解压 HTML 代码

html转js在线工具
html转js在线工具

html转js在线工具

正则表达式在线测试
正则表达式在线测试

正则表达式在线测试