The self-killing web site requested by a customer

Category: IT Technology · Published: 4 years ago

I've seen a lot of web sites implemented in less-than-ideal ways. One of them belonged to a customer that had a relentless torrent of incoming click data from all of their installations. They had six web servers sitting behind a load balancer, all writing to a database. The trouble was that while five or six machines could keep up with the load, four or fewer could not. They needed a fix.

When things went bad, they got pretty ugly. Let's say everything starts out okay. All six machines are humming along, handling the load. Then something bad happens, and one drops out. Now there are five machines handling the load, and each one has picked up some of the slack. They're now running right at the breaking point. Now another one of them dies. The load, now rebalanced across the four remaining machines, is just far too big, and they all die as a result.
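The arithmetic behind that cascade can be sketched in a few lines. This is only an illustration: the numbers (100 units of traffic, 20 units of capacity per machine) and the names `per_machine_load` and `keeps_up` are made up, chosen so that five machines sit exactly at the limit and four go over it, as in the story. It also assumes the load balancer spreads traffic evenly.

```python
TOTAL_LOAD = 100.0  # hypothetical units of incoming click traffic
CAPACITY = 20.0     # hypothetical load a single web server can sustain


def per_machine_load(total_load: float, alive: int) -> float:
    """Load each machine sees if the balancer spreads traffic evenly."""
    if alive <= 0:
        raise ValueError("no machines left behind the load balancer")
    return total_load / alive


def keeps_up(total_load: float, capacity: float, alive: int) -> bool:
    """True if the surviving machines can still absorb the whole load."""
    return alive > 0 and per_machine_load(total_load, alive) <= capacity


if __name__ == "__main__":
    # Walk down from six healthy machines to one.
    for alive in range(6, 0, -1):
        load = per_machine_load(TOTAL_LOAD, alive)
        status = "ok" if keeps_up(TOTAL_LOAD, CAPACITY, alive) else "overloaded"
        print(f"{alive} alive: {load:.1f} per machine ({status})")
```

The same function shows why bringing machines back one at a time cannot work under these assumptions: with a single machine alive, it is handed the full load by itself.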

The first couple of times this happened, they called in and asked for their machines to be rebooted. The data center people did exactly that, rolling out a crash cart and rebooting them in sequence. So the machines would come back up, and Apache would restart, one at a time. As soon as the first one was up, their load balancer would go "aha!" and start sending traffic to that *one* web server. Unsurprisingly, it would then die. Then the second machine rebooted would come up and get slammed, and would die, and so on.

It finally got to where this whole ridiculous process had to be followed. First, the load balancer needed to be disabled, necessitating a call to the folks who actually managed those devices. Next, all of the web servers had to be rebooted, and any problem in the config had to be fixed. Only then, with all of them healthy, could the load balancer's VIP be re-enabled. This was time-consuming, error-prone, and annoyingly manual.

I heard about all of this and said, gee, most people running web sites want high availability and "heartbeat". What you guys need is to minimize the downtime when it *does* break, and you really need a suicide pact. I described it as a system where each host would monitor its own web server, and would send out a beacon saying it was healthy. Every other system would keep tabs on those beacons and keep track of who was still alive.

If for some reason the entire system dropped below some threshold (call it a minimum, call it quorum, whatever), then every remaining host would purposely kill Apache. Sure, this took the site down, but it meant things would come back up far more quickly. This was possible since only the minimum number of machines had to be rebooted, and the whole "toggle the load balancer" thing didn't have to happen (twice).
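A minimal sketch of that bookkeeping, under stated assumptions, might look like the following. It is not the original "sp" code, whose internals the story does not describe. The class name `PeerTracker`, the host names, the three-second timeout, and the quorum of five are all hypothetical, and the sketch leaves out the network transport for beacons and the actual step of stopping Apache. Timestamps are passed in explicitly so the logic is easy to test.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class PeerTracker:
    """Tracks which hosts have sent a recent 'I am healthy' beacon."""

    quorum: int     # minimum number of healthy hosts (assumed value)
    timeout: float  # seconds before a silent host counts as dead (assumed)
    last_seen: Dict[str, float] = field(default_factory=dict)

    def record_beacon(self, host: str, now: float) -> None:
        """Note that `host` reported itself healthy at time `now`."""
        self.last_seen[host] = now

    def alive_hosts(self, now: float) -> Set[str]:
        """Hosts whose most recent beacon is still within the timeout."""
        return {h for h, t in self.last_seen.items() if now - t <= self.timeout}

    def should_kill_web_server(self, now: float) -> bool:
        """True once the number of healthy hosts drops below quorum."""
        return len(self.alive_hosts(now)) < self.quorum


if __name__ == "__main__":
    tracker = PeerTracker(quorum=5, timeout=3.0)
    hosts = [f"web{i}" for i in range(1, 7)]  # hypothetical host names

    # Everyone, including the local host, beacons at t=0.
    for host in hosts:
        tracker.record_beacon(host, now=0.0)
    print(tracker.should_kill_web_server(now=1.0))

    # Later only four hosts are still beaconing; the other two have gone quiet.
    for host in hosts[:4]:
        tracker.record_beacon(host, now=10.0)
    print(tracker.should_kill_web_server(now=11.0))
```

In this sketch each host counts its own beacon along with everyone else's, so every surviving host sees the same total and reaches the same decision at roughly the same time. A real deployment would also have to decide what happens when beacons are lost to a network problem rather than a dead web server.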

We wound up selling it to them as "surge protector" to give a name to its binary, "sp", but everyone really knew what it meant. The customer loved it. They expanded their config enormously over the weeks that followed, and grew to handle much more traffic since they were no longer afraid of what would happen when too much came in.

Sure, they could have rewritten their web site code so it didn't send the machines into a many-GB-deep swap fest. They could have done that without getting any hosting people involved. They didn't, and so now I have a story about the time I purposely designed something to kill a web site with an itchy trigger finger and gladly had a customer pay for it.

