Sharing SQLite databases across containers is surprisingly brilliant

栏目: IT技术 · 发布时间: 4年前

内容简介:This is a graph of latency in milliseconds. It’s the latency ofThe story starts two and a half years ago, when it seemed like major incidents were happening all the time at Segment. We knew that very soon customers would begin to lose trust. To stop the bl

Sharing an SQLite database across containers is surprisingly brilliant

Jan 7 ·5min read

Sharing SQLite databases across containers is surprisingly brilliant

This is a graph of latency in milliseconds. It’s the latency of Segment ’s streaming pipeline fetching a critical piece of customer-specific configuration. This pipeline often handles north of 500,000 messages per second. Normally when I see a graph like this, it makes me very anxious . How can the exact same work suddenly become over 20X faster?

The story starts two and a half years ago, when it seemed like major incidents were happening all the time at Segment. We knew that very soon customers would begin to lose trust. To stop the bleeding, some extra process¹ was introduced and developers were imbued with a sense of fear of breaking production². We already knew this wasn’t an ideal state, but it was in response to a true existential threat to the business.

The engineering teams were then given a directive. Until there is a reasonable level of certainty — backed by data — that critical service deployments won’t cause a severe incident, engineers would focus on step-by-step architectural and tooling improvements to make it so.

A major reason to take such a bold stance was a pervasive lack of safety. Developers were making a vast number of choices when piecing together their system architecture. Most of their options were mediocre, many would land them in a world of pain, and often only one or two led to Nirvana. Such is the sad state of affairs in modern application development.

Configuration and Consternation

One of these major choices starts with “how do I store the configuration that customers specify in our web app?” This is the stuff that tells the high-throughput streaming pipeline how to treat data on a per-customer basis. We use a specific term for this configuration: control data. Incident after incident pointed to the bespoke architectures used for control data. Engineers really needed a default choice that would always work reliably at scale.

Our control plane is a necessarily complicated beast with layers upon layers of business logic. The data plane has a completely different nature — lean and mean — and it turns out that the data plane only needs a tiny slice of the control plane’s data under management. The mantra became loosen the coupling of the control plane and data plane.

So an admittedly bonkers idea came to me. An idea which didn’t really have a precedent, at least any we could find. What if the control data was actually local to the host? What if it was stored in a file? What if the file was shared by dozens of containers? What if a query meant reading a file? That would mean no service to break or run out of resources. As long as the file is readable, the data is available. No gods, no masters.

Initial reactions were incredulous at best. Just exactly how is a file under constant modification safely shared across dozens of containers, all needing concurrent access? And you’re really going to read this file from completely different pieces of software?

For all practical purposes, there is precisely one solution: SQLite. It turns out that multi-process concurrency is one of the things that separates SQLite from the other best-of-breed embedded databases. It is a practically unique feature — its raison d’être, at least from this perspective.

But containers make us almost instinctively nervous. They have achieved a kind of mythical status now that they are so thoroughly abstracted. Thankfully they mostly boil down to some constraints placed on processes by the kernel and an isolated filesystem with some holes poked in it. Containers are just processes , and SQLite is really good at sharing a database across processes.

Now to answer some questions — does a database on a Docker shared volume in multiple containers even work? Why yes it does. Check. Does this work at all under load? Wait, let me turn on WAL mode . Now it does. ️ Check. Do processes block each other under load? No! Check.

It’s not really a database, man

So we built a special-purpose distributed database around this idea called ctlstore , which was open sourced last year. The graph at the beginning of this post shows the team cutting over reads from a networked database to ctlstore. Now that you know about the SQLite database, the massive drop in latency is probably not all that surprising.

Some practicality trade-offs were made, but even I was a bit astonished at how well it works in practice. Aside from the quirk of sharing SQLite across containers, it delivers extremely well on one core idea: the thematic shift of complexity from the data plane read path to the control plane write path.

The team at Segment has definitely made a dent, but I think this space is ripe for innovation. Brilliant minds have built distributed systems that are amazingly resilient to routine failure. However, if you’re privy to the closely-held details, you probably already know that the major outages which take down the titans of the Internet these days very often involve control data errors or unavailability.

If you’re looking to make strides in the Internet reliability space, this seems to me to be the area for focus. Verifying complex, human-originated configuration and then distributing it to the edges of deeply layered compute infrastructures at scale is very far from a solved problem. There are no stacks or bundles of best practices. And it’s as much a human interface challenge as it is an infrastructure one. Purgamentum init, exit purgamentum.


以上就是本文的全部内容,希望本文的内容对大家的学习或者工作能带来一定的帮助,也希望大家多多支持 码农网

查看所有标签

猜你喜欢:

本站部分资源来源于网络,本站转载出于传递更多信息之目的,版权归原作者或者来源机构所有,如转载稿涉及版权问题,请联系我们

暗时间

暗时间

刘未鹏 / 电子工业出版社 / 2011-7 / 35.00元

2003年,刘未鹏在杂志上发表了自己的第一篇文章,并开始写博客。最初的博客较短,也较琐碎,并夹杂着一些翻译的文章。后来渐渐开始有了一些自己的心得和看法。总体上在这8年里,作者平均每个月写1篇博客或更少,但从未停止。 刘未鹏说—— 写博客这件事情给我最大的体会就是,一件事情如果你能够坚持做8年,那么不管效率和频率多低,最终总能取得一些很可观的收益。而另一个体会就是,一件事情只要你坚持得足......一起来看看 《暗时间》 这本书的介绍吧!

SHA 加密
SHA 加密

SHA 加密工具

Markdown 在线编辑器
Markdown 在线编辑器

Markdown 在线编辑器

RGB CMYK 转换工具
RGB CMYK 转换工具

RGB CMYK 互转工具