Dead rows in a materialized view

栏目: IT技术 · 发布时间: 4年前

内容简介:In the application that I work on, we have a materialized view. It was created from several joined tables. It’s used to speed up searching for data without joining seven or eight tables in every query. At least it should make searching faster but in practi

In the application that I work on, we have a materialized view. It was created from several joined tables. It’s used to speed up searching for data without joining seven or eight tables in every query. At least it should make searching faster but in practice, it didn’t and I will describe to you why.

Queries to that view were really slow, but only on the production server, on staging everything was fine. That view was exactly the same as on staging server, but queries were ~50 times slower on production.

The first step was to check execution plans for any query on both servers.

EXPLAIN ANALYZE SELECT COUNT(*) FROM 'table_name';

On staging execution plan involved parallel scan, on production index scan.

When I discovered that, the first thing that came to my mind was the possibility of different configurations that disabled parallel queries on production. I checked that and it turned out to be false. My next attempts were to check various Postgres views that kept some data about tables. After looking through many numbers that didn’t help I finally discovered that our table (materialized view actually, but in this context, it makes no difference) on the production server it took 7GB of disc space and on staging only 300MB.

SELECT pg_size_pretty( pg_total_relation_size('table_name') );

I expected that row count had to be totally different but unexpectedly it was ~320k on production and ~310k on staging.

After a really long investigation, I found the reason why those numbers were so different, the table on production contained an enormous number of dead rows.

SELECT n_dead_tup 
FROM pg_stat_all_tables
WHERE relname = 'table_name'

At this time I was almost sure that’s the reason for our whole problem (or at least I hoped so). I started researching those dead rows, what they are, and why they appear. The major defect of that approach is that after every single update on one of the records from joined tables, the whole materialized view is refreshed. Refreshed i.e. whole view is dropped and then it’s building query is run. We have two types of a refresh in PostgreSQL:

  • non concurrently

This refresh type does not produce any dead rows, but for the time of refreshing view is locked, and no data can be read from it. This is an unacceptable solution for my project.

  • concurrently

It allows reading data during the process. It’s possible thanks to duplicating all data before deleting it. During refresh, all SELECT queries see that duplicated data, and after the process, all queries have access to newly created view, and duplicates remain as dead rows. It’s the way how the view is bloated with tons of unnecessary data.

And here comes VACUUM mechanism that is used to remove all dead rows from the table or materialized view.

VACUUM table_name

This command removes all dead queries from the given table, but it has to be run ‘manually’, or by some application code. There is also a mechanism called autovacuum . PostgreSQL has some workers (quantity set in configuration) that all the time search our database and run VACUUM on tables that are in need. What do we know now? We have many dead rows in the materialized view, we also have a mechanism that should clean them. Why isn’t it? In order to check that I added query that logs last auto and vacuum time with dead rows count.

SELECT n_dead_tup, last_autovacuum, last_vacuum 
FROM pg_stat_all_tables
WHERE relname = 'table_name'

After a few days, I checked that logs. The first thing to notice was that the Autovacuum process is working but it’s triggered only in evenings. All dead rows are cleaned and the next day from something like 7 a.m. they appear and it’s count is growing. I checked our search in the morning and indeed it was as fast as it should be, but within a day it was getting much slower. Following really long research why is that happening I found that Autovacuum cannot be run on the table when it’s locked by SHARE UPDATE EXCLUSIVE lock. What locks a table that way? Obviously it’s REFRESH MATERIALIZED VIEW CONCURRENTLY. When that view is refreshed in our application? Other logs added and the answer is: the view is refreshed almost whole time (during a workday from morning to evening). Refresh is as I mentioned triggered by every data update on each of the tables that problematic view is made of. And here comes our final answer. Dead rows aren’t cleaned because Autovacuum cannot be run during refresh and refresh is running continuously from morning to evening.

Possible solutions:

  1. Architectural changes that would prevent an application from refreshing whole materialized view during every data update. This is definitely possible to just update those rows that really changed without dropping and building from the scratch whole table.

    This is definitely the best solution and I would choose that if I had much more time to spend on fixing that problem, but it would be too long to refactor every place in the application that can update one of the included tables.

  2. The second possible fix is to append some breaks between refreshes that AUTOVACUUM process could be fired on our view. Theoretically, it is a fine idea, it would reduce database overload, it should be quite fast to implement, but I think there is too much room for unforeseen consequences and finally, I decided to give up that idea. Even though adding breaks would not take much time to implement, testing it carefully would and as I noticed above there were no more time after that really long investigation

  3. The third and final solution I anticipated was the simplest one. In the job used to refresh the materialized view, I added code that ‘manually’ vacuums view from dead rows. During that whole research on I run vacuum manually many times and I knew it worked. Going this way assures us that there is no possibility that dead rows stay in view because they are vacuumed as soon as refresh (that creates them) ends.

Weighing up the pros and cons I finally chose the third solution. Time was a problem and it was the only one that could be tested immediately. It worked, but I think refactoring the application to not refresh the whole view would be the best option.

We create dedicated software for companies, bring ideas to life and enjoy what we do.

Looking for professional team?


以上就是本文的全部内容,希望对大家的学习有所帮助,也希望大家多多支持 码农网

查看所有标签

猜你喜欢:

本站部分资源来源于网络,本站转载出于传递更多信息之目的,版权归原作者或者来源机构所有,如转载稿涉及版权问题,请联系我们

大转换

大转换

尼古拉斯·卡尔 / 闫鲜宁、张付国 / 中信 / 2016-2 / 49

1、我们这个时代最清醒的思考者之一尼古拉斯·卡尔继《浅薄》《玻璃笼子》之后又一重磅力作。 2、在这部跨越历史、经济和技术领域的著作中,作者从廉价的电力运营方式对社会变革的深刻影响延伸到互联网对我们生活的这个世界的重构性影响。 3、《快公司》《金融时报》《华尔街日报》联袂推荐 简介 早在2003年,尼古拉斯·卡尔先生发表在《哈佛商业评论》上的一篇文章——IT Doesn't ......一起来看看 《大转换》 这本书的介绍吧!

SHA 加密
SHA 加密

SHA 加密工具

正则表达式在线测试
正则表达式在线测试

正则表达式在线测试