Hello and welcome. For this video, we're focusing in on understanding a narrow but very critical aspect of performance tuning with PostgreSQL. Specifically, we're going to be discussing Vacuum and Autovacuum. The commands may be short but they have a major impact on your system and understanding them along with what all is going on under the hood will be mission critical. That being said, let’s get started.


Before we dive into vacuum and autovacuum, we need to first grasp some key data principles that Postgres operates on. Let's start with MVCC. Without getting too deep in the computer science, MVCC is a means for solving a concurrency issue within databases. More specifically, there is an old problem about what happens when data must be modified while others are reading or trying to read that same bit of data. One old means would have been to just lock that bit of data down while it is being read or updated to maintain consistency. However, in a modern application, that would just be unacceptable when you have thousands of transactions happening at any given second. This is where MVCC comes in. Again, MVCC remedies the issue by keeping extra tuples of information with every row anytime there is a transaction so that the data stays consistent where it needs to be and Postgres knows which row should be accessible while not locking things up for someone who is needing to carry out an update on a particular piece of data.


This brings numerous performance advantages, however, this also requires a fair bit of garbage collection over time as this extra data tends to pile up. If it isn't cleaned up periodically, or even often, your system could become overloaded really fast and then you have some real problems on your hands. Now we can see where vacuum comes in. The vacuum command allows you to clean this data up on an as needed basis. Running vacuum is very simple, let's go over to our terminal take a look at an example.

VACUUM your_table

That was rather easy wasn't it? We just issue the vacuum command then select the table. Not bad at all. However, there are more options for us to work with. Let's try something else.

VACUUM your_table a_column

You can also specifically vacuum a column within a table. You can see here where we have chosen our table to vacuum, then we choose the specific column. Not bad at all and really quite easy.

Now let's take a look at another key option in vacuum.


What this does is vacuums absolutely everything. It also rebuilds many of the underlying components to your tables. As a result, this operation will take both more disk space and it will lock your table from having any operations performed on it for the duration of the vacuuming. This is why unless there is a seriously dire circumstance, it is advised that you avoid using this command. Let's take a look at another option that vacuum provides.


This time we're introducing the analyze option. While we won't cover the full details of analyze, it is worth noting that analyze is actually a command unto itself that will update the Postgres query planner on the most efficient way to execute different tasks. Knowing this, when we include analyze as a flag for vacuum we are essentially saying "after this vacuuming is done, analyze the table too and update the query planner". Let's move on to probably the most simple take on vacuum of all.


No, this isn't a joke. You can also just run vacuum by itself. What this will do is vacuum all of the tables and columns. So what makes this different from VACUUM FULL? When we just use VACUUM we only clear out unused data but we do not go rewriting the contents into a new disk file. This means no massive disk overhead and no database locking to contend with.

Now that we have come this far with vacuum it's now time to take a look at its counterpart autovacuum.


To do this, we need to first open postgresql.conf because unlike vacuum, this is a configuration option instead of a SQL command. Let's get this done.

video showing opening of postgresql.conf then uncommenting the autovacuum section

Good, now you can see that by uncommenting this section, we have also just flipped autovacuum on. Now we just need to tune a few things. Let's start here...

autovacuum_vacuum_scale_factor = 0.1

What this does is it determines the percentage of the table size to see changed before carrying out an autovacuum. The default setting may have been 0.2 but we really want 0.1. The idea being that it far better to have frequent small vacuums than less frequent but larger vacuums. The name of the game is to prevent bloat from happening at all. We don’t want to keep purging bloated data because then we would have to use vacuum full and we really don't want to do that. Think of it like this: if you have a very large database with large tables, you may want to make the number smaller, possibly even as low as 0.01.

It's worth knowing that if you have an instance of Postgres already running and it's bloated, running autovacuum will not help you. You will unfortunately need vacuum or vacuum full.

Let's look at our next parameter.

autovacuum_vacuum_threshold = 50

Here, we are determining what is the minimum number of row updates in a table before autovacuum is allowed to kick in. This is particularly useful if you have multiple tables and one is significantly larger than the other. This can help keep from triggering unnecessary vacuums which would hinder performance.

In general, you will be spending some time tuning between autovacuum_vacuum_threshold and autovacuum_vacuum_scale_factor trying to find the right balance specific to your system.


So while we have covered a lot regarding vacuum and autovacuum, this still is only scratching the surface. Not to fear though, if you wish to learn more, links will be provided below.