PostgreSQL: Vacuum / Autovacuum
Eliminate the bloat in your database with vacuum and autovacuum.
Hello and welcome. For this video, we're focusing in on understanding a narrow but very critical aspect of performance tuning with PostgreSQL. Specifically, we're going to be discussing Vacuum and Autovacuum. The commands may be short but they have a major impact on your system and understanding them along with what all is going on under the hood will be mission critical. That being said, let’s get started.
Before we dive into
autovacuum, we need to first grasp some key
data principles that Postgres operates on. Let's start with MVCC. Without getting
too deep in the computer science, MVCC is a means for solving a concurrency issue
within databases. More specifically, there is an old problem about what happens
when data must be modified while others are reading or trying to read that same
bit of data. One old means would have been to just lock that bit of data down
while it is being read or updated to maintain consistency. However, in a modern
application, that would just be unacceptable when you have thousands of
transactions happening at any given second. This is where MVCC comes in. Again,
MVCC remedies the issue by keeping extra
tuples of information with every row anytime there is a transaction so that the
data stays consistent where it needs to be and Postgres knows which row should be
accessible while not locking things up for someone who is needing to carry out an
update on a particular piece of data.
This brings numerous performance advantages, however, this also requires a fair
bit of garbage collection over time as this extra data tends to pile up. If it
isn't cleaned up periodically, or even often, your system could become overloaded
really fast and then you have some real problems on your hands. Now we can see
vacuum comes in. The
vacuum command allows you to clean this data up
on an as needed basis. Running vacuum is very simple, let's go over to our
terminal take a look at an example.
That was rather easy wasn't it? We just issue the
vacuum command then select
the table. Not bad at all. However, there are more options for us to work with.
Let's try something else.
VACUUM your_table a_column
You can also specifically vacuum a column within a table. You can see here where we have chosen our table to vacuum, then we choose the specific column. Not bad at all and really quite easy.
Now let's take a look at another key option in vacuum.
What this does is vacuums absolutely everything. It also rebuilds many of the
underlying components to your tables. As a result, this operation will take both
more disk space and it will lock your table from having any operations performed
on it for the duration of the vacuuming. This is why unless there is a seriously
dire circumstance, it is advised that you avoid using this command. Let's take a
look at another option that
VACUUM ANALYZE your_table
This time we're introducing the
analyze option. While we won't cover the full
analyze, it is worth noting that
analyze is actually
a command unto itself that will update the Postgres query planner on the most
efficient way to execute different tasks. Knowing this, when we include
as a flag for
vacuum we are essentially saying "after this vacuuming is done,
analyze the table too and update the query planner". Let's move on to probably the
most simple take on
vacuum of all.
No, this isn't a joke. You can also just run
vacuum by itself. What this will do
is vacuum all of the tables and columns. So what makes this different from
VACUUM FULL? When we just use
VACUUM we only clear out unused data but we do
not go rewriting the contents into a new disk file. This means no massive disk
overhead and no database locking to contend with.
Now that we have come this far with
vacuum it's now time to take a look at its
To do this, we need to first open
postgresql.conf because unlike
is a configuration option instead of a SQL command. Let's get this done.
video showing opening of postgresql.conf then uncommenting the autovacuum section
Good, now you can see that by uncommenting this section, we have also just flipped
autovacuum on. Now we just need to tune a few things. Let's start here...
autovacuum_vacuum_scale_factor = 0.1
What this does is it determines the percentage of the table size to see changed
before carrying out an autovacuum. The default setting may have been
0.2 but we
0.1. The idea being that it far better to have frequent small
vacuums than less frequent but larger vacuums. The name of the game is to prevent
bloat from happening at all. We don’t want to keep purging bloated data because then
we would have to use
vacuum full and we really don't want to do that. Think of it
like this: if you have a very large database with large tables, you may want to
make the number smaller, possibly even as low as 0.01.
It's worth knowing that if you have an instance of Postgres already running and
it's bloated, running autovacuum will not help you. You will unfortunately need
Let's look at our next parameter.
autovacuum_vacuum_threshold = 50
Here, we are determining what is the minimum number of row updates in a table before autovacuum is allowed to kick in. This is particularly useful if you have multiple tables and one is significantly larger than the other. This can help keep from triggering unnecessary vacuums which would hinder performance.
In general, you will be spending some time tuning between
autovacuum_vacuum_scale_factor trying to find
the right balance specific to your system.
So while we have covered a lot regarding
autovacuum, this still
is only scratching the surface. Not to fear though, if you wish to learn more,
links will be provided below.
Alex is an independent developer who is obsessed with both performance and information security. When not writing code, Alex is either playing guitar or working in his garden.