内容简介:© Laurenz Albe 2020Most people know that autovacuum is necessary to get rid ofThis article explains the reasons behind that and gives some advice on how to best use the new feature. It will also explain how to achieve similar benefits in older PostgreSQL r
© Laurenz Albe 2020
Most people know that autovacuum is necessary to get rid of dead tuples . These dead tuples are a side effect of PostgreSQL’s MVCC implementation. So many people will be confused when they read that from PostgreSQL v13 on, commit b07642dbc adds support for autovacuuming insert-only tables (also known as “append-only tables”).
This article explains the reasons behind that and gives some advice on how to best use the new feature. It will also explain how to achieve similar benefits in older PostgreSQL releases.
Note that all that I say here about insert-only tables also applies to insert-mostly tables, which are tables that receive only few updates and deletes.
How insert-triggered autovacuum works
From v13 on, PostgreSQL will gather statistics on how many rows were inserted since a table last received a VACUUM
. You can see this new value in the new “ n_ins_since_vacuum
” column of the pg_stat_all_tables
catalog view (and in pg_stat_user_tables
and pg_stat_sys_tables
).
Autovacuum runs on a table whenever that count exceeds a certain value. This value is calculated from the two new parameters “ autovacuum_vacuum_insert_threshold
” (default 1000) and “ autovacuum_vacuum_insert_scale_factor
” (default 0.2) as follows:
insert_threshold + insert_scale_factor * reltuples
where reltuples
is the estimate for the number of rows in the table, taken from the pg_class
catalog.
Like other autovacuum parameters, you can override autovacuum_vacuum_insert_threshold
and autovacuum_vacuum_insert_scale_factor
with storage parameters of the same name for individual tables. You can disable the new feature by setting autovacuum_vacuum_insert_threshold
to -1.
You can use “ toast.autovacuum_vacuum_insert_threshold
” and “ toast.autovacuum_vacuum_insert_scale_factor
” to change the parameters for the associated TOAST
table.
Use case 1: “anti-wraparound” vacuum on insert-only tables
Why do insert-only tables need VACUUM
?
PostgreSQL stores transaction IDs in the xmin
and
xmax
system columns to determine which row version is visible to which query. These transaction IDs are unsigned 4-byte integer values, so after slightly more than 4 billion transactions the counter hits the upper limit. Then it “wraps around” and starts again at 3.
As described inthis blog post, that method would cause data loss after about 2 billion transactions. So old table rows must be “frozen” (marked as unconditionally visible) before that happens. This is one of the many jobs of the autovacuum daemon.
Why anti-wraparound vacuum on insert-only tables can be a problem
The problem is that PostgreSQL only triggers such “anti-wraparound” runs once the oldest unfrozen table row is more than 200 million transactions old. For an insert-only table, this is normally the first time ever that autovacuum runs on a table. There are two potential problems with that:
-
The anti-wraparound vacuum doesn’t get done in time. Then, one million transactions before it would suffer data corruption, PostgreSQL will not accept any new transactions. You have to start it in single-user mode and run
VACUUM
manually. You can find a description of such cases in this and this blog . -
Different from other autovacuum runs, anti-wraparound autovacuum will not give up when it blocks a concurrent transaction. This will block even short operations that require an
ACCESS EXCLUSIVE
lock (like DDL statements on the table). Such a blocked operation will block all other access to the table, and processing comes to a standstill. You can find such a case described in this blog .
How to protect yourself from disruptive anti-wraparound vacuums
From PostgreSQL v13 on, the default settings should already protect you from this problem. This was indeed the motivation behind the new feature.
If it is truly an insert-only table, you should also set the vacuum_freeze_min_age
storage option to 0 for that table, so that autovacuum freezes as many tuples as it can:
ALTER TABLE mytable SET ( vacuum_freeze_min_age = 0 );
With a table that also receives some updates and deletes, you should use a higher value of vacuum_freeze_min_age
to avoid unnecessary I/O.
For PostgreSQL versions older than v13, you can achieve a similar effect by triggering anti-wraparound vacuum earlier, so that it becomes less disruptive. For example, if you want to vacuum a table every 100000 transactions, you can set this additional storage parameter:
ALTER TABLE mytable SET ( autovacuum_freeze_max_age = 100000, vacuum_freeze_min_age = 0 );
Use case 2: index-only scans on insert-only tables
How index-only scans work in PostgreSQL
As mentioned above, each row contains the information for which transactions it is visible. However, the index does not contain this information. Now if you consider an SQL query like this:
SELECT count(*) FROM mytables WHERE id < 100;
where you have an index on id
, all the information you need is available in the index. So you should not need to fetch the actual table row (“heap fetch”), which is the expensive part of an index scan. But unfortunately you have to visit the table row anyway, just to check if the index entry is visible or not.
To work around that, PostgreSQL has a shortcut that makes index-only scans possible: the visibility map . This data structure stores two bits per 8kB table block, one of which indicates if all rows in the block are visible to all transactions. If a query scans an index entry and finds that the block containing the referenced table row is all-visible , it can skip checking visibility for that entry.
So you can have index-only scans in PostgreSQL if most blocks of a table are marked all-visible in the visibility map.
The problem with index-only scans on insert-only tables
Since VACUUM
removes dead tuples, which is required to make a table block all-visible, it is also VACUUM
that updates the visibility map. So to have most blocks all-visible in order to get an index-only scan, VACUUM
needs to run on the table often enough.
Now if a table receives enough UPDATE
s or DELETE
s, you can set autovacuum_vacuum_scale_factor
to a low value like 0.005. Then autovacuum will keep the visibility map in good shape.
But with an insert-only table, it is not as simple to get index-only scans before PostgreSQL v13. One report of a problem related to that is here .
How to get index-only scans on insert-only tables
From PostgreSQL v13 on, all you have to do is to lower autovacuum_vacuum_insert_scale_factor
on the table:
ALTER TABLE mytable SET ( autovacuum_vacuum_insert_scale_factor = 0.005 );
In older PostgreSQL versions, this is more difficult. You have two options:
-
schedule regular
VACUUM
runs withcron
or a different scheduler -
set
autovacuum_freeze_max_age
low for that table, so that autovacuum processes it often enough
Use case 3: hint bits on insert-only tables
In PostgreSQL, the first query that reads a newly created row has to consult the commit log to figure out if the transaction that created the row was committed or not. It then sets a hint bit on the row that persists that information. That way, the first reader saves future readers the effort of checking the commit log.
As a consequence, the first reader of a new row “dirties” (modifies in memory) the block that contains it. If a lot of rows were recently inserted in a table, that can cause a performance hit for the first reader. Therefore, it is considered good practice in PostgreSQL to VACUUM
a table after you insert (or COPY
) a lot of rows into it.
But people don’t always follow that recommendation. Also, if you want to write software that supports several database systems, it is annoying to have to add special cases for individual systems. With the new feature, PostgreSQL automatically vacuums insert-only tables after large inserts, so you have one less thing to worry about.
Future work
During the discussion for the new feature we saw that there is still a lot of room for improvement. Autovacuum is already quite complicated (just look at the many configuration parameters) and still does not do everything right. For example, truly insert-only tables would benefit from freezing rows right away. On the other hand, for tables that receive some updates or deletes as well as for table partitions that don’t live long enough to reach wraparound age, such aggressive freezing can lead to unnecessary I/O activity.
One promising idea Andres Freund propagated was to freeze all tuples in a block whenever the block becomes dirty, that is, has to be written anyway.
The fundamental problem is that autovacuum serves so many different purposes. Basically, it is the silver bullet that should solve all of the problems of PostgreSQL’s MVCC architecture. That is why it is so complicated. However, it would take a major redesign to improve that situation.
Conclusion
While it seems to be an oxymoron at first glance, autovacuum for insert-only tables mitigates several problems that large databases used to suffer from.
In a world where people collect “big data”, it becomes even more important to keep such databases running smoothly. With careful tuning, that was possible even before PostgreSQL v13. But autovacuum is not simple to tune, and many people lack the required knowledge. So it is good to have new autovacuum functionality that takes care of more potential problems automatically.
以上就是本文的全部内容,希望本文的内容对大家的学习或者工作能带来一定的帮助,也希望大家多多支持 码农网
猜你喜欢:本站部分资源来源于网络,本站转载出于传递更多信息之目的,版权归原作者或者来源机构所有,如转载稿涉及版权问题,请联系我们。
MATLAB在数学建模中的应用
卓金武 编 / 北京航空航天大学 / 2011-4 / 34.80元
《MATLAB在数学建模中的应用》从数学建模的角度介绍了MATLAB的应用。《MATLAB在数学建模中的应用》的4位作者均具有实际的数学建模参赛经历和竞赛指导经验。书中内容完全是根据数学建模竞赛的需要而编排的,涵盖了绝大部分数学建模问题的MATLAB求解方法。 《MATLAB在数学建模中的应用》内容分上下两篇。上篇介绍数学建模中常规方法MATLAB的实现,包括MATLAB交互、数据建模、程序......一起来看看 《MATLAB在数学建模中的应用》 这本书的介绍吧!