Ruby on Rails Mistakes that could Kill Your Production Servers

In this tutorial, I’ll describe a couple of non-obvious Ruby on Rails mistakes that could bring down your production system. They are so sneaky that they could get past the review process of even more experienced developers. Please don’t ask me how I know them.

Database transactions in a multithreaded environment

ActiveRecord makes it just too easy to wrap any arbitrary chunk of code in a database transaction. Have a look at the following example:

post = Post.find(params.fetch(:id))

Post.transaction do
  post.update!(params.fetch(:post_params))
  post.user.update!(params.fetch(:user_params))
end

At first glance, it might look correct. We want to update the post and its user, and roll back on error to keep the state consistent.

The problem is that by wrapping the code execution in a transaction, we are holding a RowExclusiveLock on all the updated objects until the whole block of code finishes running. Any thread trying to update the same post or user will have to sit idle, waiting for the previous transaction to finish.

Imagine that the post uses Carrierwave with S3 for media attachments. Updating an image_url attribute will open an HTTP connection, download the asset, and upload it back to AWS S3. All of this happens inside a database transaction, locking access to those objects for way too long.

If you are working in a multithreaded Ruby environment, e.g., with Puma or Sidekiq, a broad database transaction scope is a recipe for deadlocks. A deadlock is two or more database locks blocking each other, with neither able to continue execution. They usually start popping up only under a high enough load.

I’ve learned the hard way that code which had seemingly worked correctly for the last couple of months can suddenly start locking. It was not possible to trace the problem back to a recent change in the codebase.

Avoid overusing ActiveRecord transactions. Prefer validations and integrity checks before the actual save instead of relying on Rails to magically roll back an invalid state.
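For the example above, here is a minimal sketch of that approach (it reuses the params keys from the earlier snippet and is just one way to structure it): assign and validate both records first, then wrap only the two fast writes in a transaction.

post = Post.find(params.fetch(:id))
user = post.user

post.assign_attributes(params.fetch(:post_params))
user.assign_attributes(params.fetch(:user_params))

# Validate before opening the transaction, so a rollback is the exception
# rather than the normal control flow.
raise ActiveRecord::RecordInvalid.new(post) unless post.valid?
raise ActiveRecord::RecordInvalid.new(user) unless user.valid?

# Keep the transaction scope as narrow as possible: just the writes.
Post.transaction do
  post.save!
  user.save!
end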

Detecting deadlocks in Rails apps

If your production is already down, or the database is sluggish for no apparent reason, you might have deadlocks. Unless you perform long-running background queries, a healthy Rails app database should never hold a lock for more than a couple of seconds.

To verify that, you can use the rails-pg-extras gem. It offers a simple API for displaying the current exclusive locks:

RailsPGExtras.locks

RailsPGExtras gem displaying exclusive locks in hash format

You can write a simple monitoring script that checks whether any locks are taking too long and should be looked into:

THRESHOLD_SECONDS = 3

# The "age" column is an interval string (e.g., "00:00:12.345"), so parsing it
# as a time and taking the seconds since midnight gives the lock duration.
long_locks = RailsPGExtras.locks(in_format: :hash).select do |lock|
  Time.parse(lock.fetch("age")).seconds_since_midnight > THRESHOLD_SECONDS
end

raise "Long running locks: #{long_locks}" if long_locks.present?

Under the hood, RailsPGExtras.locks executes lightweight SQL queries. You can safely run this check even a dozen times per minute (e.g., in a recurring Sidekiq job, as sketched below) without negatively affecting your database.
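A minimal sketch of such a job follows; the class name is illustrative, and I’m assuming you already have a recurring scheduler such as sidekiq-cron wired up to enqueue it:

class LongLocksCheckJob
  include Sidekiq::Worker

  THRESHOLD_SECONDS = 3

  def perform
    long_locks = RailsPGExtras.locks(in_format: :hash).select do |lock|
      Time.parse(lock.fetch("age")).seconds_since_midnight > THRESHOLD_SECONDS
    end

    # Raising makes the problem visible in Sidekiq retries and your error tracker.
    raise "Long running locks: #{long_locks}" if long_locks.present?
  end
end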

Killing a deadlocked database connection

You can cherry-pick a stuck transaction to cancel using its PID by running the following SQL:

ActiveRecord::Base.connection.execute("SELECT pg_cancel_backend(#{pid})")

In case it does not work, you can be a bit more aggressive by killing the connection:

ActiveRecord::Base.connection.execute("SELECT pg_terminate_backend(#{pid})")

Alternatively, you can go for the “turn it off and on again” approach by killing all the active database connections and restarting the Ruby processes.

RailsPGExtras.kill_all

If your production system is in a bad enough state that even initializing the Rails console process is not possible, then executing this snippet of raw SQL might do the trick:

SELECT pg_terminate_backend(pid) FROM pg_stat_activity
  WHERE pid <> pg_backend_pid()
  AND query <> '<insufficient privilege>'
  AND datname = current_database();

Kill all the active database connections using raw SQL

Console sandbox mode

Rails console offers a handy --sandbox flag that executes the whole process inside a database transaction and rolls back all the changes when it terminates. It might seem like a perfect idea to test things out on production with this flag on, e.g., before running a complex migration or when trying a new feature against a production dataset. Imagine you want to test your activity reporting system after marking all the users as recently active:

User.update_all(last_active_at: Time.current)

User::ActivityReport.call(from: 1.day.ago, to: 1.day.from_now)

It might look harmless at first glance. Users are updated using a single SQL query that does not trigger any callbacks and should be fast to execute even on larger collections. “It will just roll back when I close the console.”

While you are playing around with the report, it’s unfortunately quite probable that your production is already down. Updating any ActiveRecord object in sandbox mode effectively grants the console’s database connection a long-lasting RowExclusiveLock on the corresponding database rows.


RailsPGExtras displaying sandbox mode locks

Any background worker or web server trying to update objects that you “harmlessly” modified in sandbox mode is suspended, waiting for the Rails console to release its locks. Things can get nasty quite fast, with idle processes queueing up. At that point, even exiting the console process might be difficult, because the transaction could struggle to roll back correctly in a database overloaded with idle connections.

In that scenario, you sometimes must resort to manually killing the hanging connections and restarting the processes.

Unsafe database migrations

Database schema and data migrations are a frequent cause of Ruby on Rails production systems going down. They are especially tricky to test properly because issues usually arise only with a big enough dataset and real traffic on the production database. It’s challenging to simulate a realistic load in the test or staging environments.

I was planning to describe a couple of risky database migration scenarios and how to prevent them. While doing the research, I realized that the readme of the strong_migrations gem (https://github.com/ankane/strong_migrations) does an excellent job of explaining those problems and how to mitigate them. Instead of duplicating the info, I’ll leave the link here for reference.
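To give a flavor of the problems it covers: adding an index to a large, busy table without algorithm: :concurrently holds a lock that blocks writes for the whole index build. A sketch of the risky version and the safer one (table and column names are just for illustration):

class AddIndexToUsersEmail < ActiveRecord::Migration[6.0]
  def change
    # Risky: blocks writes to users until the index is built.
    add_index :users, :email
  end
end

class AddIndexToUsersEmailConcurrently < ActiveRecord::Migration[6.0]
  # Concurrent index builds cannot run inside the migration's DDL transaction.
  disable_ddl_transaction!

  def change
    add_index :users, :email, algorithm: :concurrently
  end
end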

Exhausting the database connections pool

Balancing the number of web server and background worker threads against the database pool size and max connections is not straightforward. Getting the config wrong can result in downtime because threads will not be able to obtain a database connection and will raise ActiveRecord::ConnectionTimeoutError.

Let’s start by checking out where those different values can be configured:

config/puma.rb

threads_count = ENV.fetch('RAILS_MAX_THREADS') { 3 }.to_i
threads threads_count, threads_count
port        ENV.fetch("PORT") { 3000 }
environment ENV.fetch("RAILS_ENV") { "production" }
workers ENV.fetch('WEB_CONCURRENCY') { 2 }.to_i

preload_app!

on_worker_boot do
  ActiveRecord::Base.establish_connection if defined?(ActiveRecord)
end

plugin :tmp_restart

Puma configuration docs

config/sidekiq.yml

---
:concurrency: 4
:queues:
  - urgent
  - default

Sidekiq advanced configuration docs

config/database.yml

default: &default
  adapter: postgresql
  encoding: unicode
  pool: <%= ENV.fetch("RAILS_MAX_THREADS") { 5 } %>

The right max_connections setting depends on your PostgreSQL deployment. To check the current value, run this SQL:

show max_connections;
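If you would rather check it from a Rails console, this one-line sketch should return the same value:

ActiveRecord::Base.connection.select_value("SHOW max_connections")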

Now for the magic formula:

To avoid pool exhaustion, the number of threads in a single Ruby process must always be less than or equal to the database pool size configured in the config/database.yml file. Additionally, the total number of threads accessing your database across all the processes must be smaller than the max_connections value configured in PostgreSQL.
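To make that concrete with the configs above (a worked example, assuming a single server running one Puma master and one Sidekiq process against the same database): Puma runs 2 workers × 3 threads, so each worker process needs a pool of at least 3, and Puma can open up to 2 × 3 = 6 connections in total. Sidekiq runs 4 worker threads, so its process needs a pool of at least 4; if RAILS_MAX_THREADS stays at 3, its pool is too small and can raise ActiveRecord::ConnectionTimeoutError under load. Altogether that is 6 + 4 = 10 connections, which has to stay comfortably below max_connections, leaving headroom for consoles, migrations, and one-off tasks.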

Remember that you cannot set max_connections to an arbitrary value. More concurrent connections increase the load on the database, so its specs must be adjusted accordingly. PGTune can help you with that.

Summary

I hope that reading this blog post will reduce the number of “Production is down” alerts you’ll see. Let me know in the comments below if you know some more interesting ways to kill your production servers.

