Hello and welcome. For this video, we are focusing on some in depth tuning of our PostgreSQL systems. Specifically, we will be dealing with configuration in our postgresql.conf and pg_hba.conf files.

First, we need to take an in depth look at pg_hba.conf. We already know the basic idea behind this file. It is the location where client authentication is handled in PostgreSQL. We use it to define very specifically who can access our Postgres instance and which databases within that instance they are allowed to work with specifically. To better understand this, let's take a more in depth look at the pg_hba.conf file and try writing some example connections.

video showing the opening of pg_hba.conf

Ok, now let's write some example connections.

# my fake app
host       my_awesome_app  samy   555.0.543.12    md5
local       all             jane       trust
hostssl    another_app_db  irwin    password
hostnossl  all             all    all             reject

Good, so what all is going on here? Let's take a look at the first line. Here, we have our connection type which is host. This essentially means we will accept a connection from any TCP/IP type whether it is secured with ssl or not. We then have the specific database that this connection is allowed to connect to followed by the username of the user that is allowed to access it. This user would be created of course within Postgres along with the password which will become relevant in just a moment. Next we have the IP address of the specific connection we're dealing with. This is especially relevant because it means our user "samy" can only access the my_awesome_app database from the one specific machine with that ip address and "samy" must use a TCP/IP connection to reach that database. Lastly we have the authentication method which in this case is md5. What this means is that "samy" will have to use a password to access the database and that password will be encrypted and sent to Postgres in a secured manner.

local all jane trust On the next line, we are using a connection type local which means any connection from a Unix type socket. In plain English, it's someone accessing the Postgres instance from a local network. In this case, the local connection user will be able to access all databases. The user that is allowed to do all this is "jane" and she will be coming from this address and will not have to use a password or any other authentication method to access the database. It is worth noting however that in production, it is recommended to not use trust unless absolutely necessary.

hostssl another_app_db irwin password Moving to the next line, we are accepting a connection from a TCP/IP connection that is explicitly secured via SSL. They can only access the another_app_db database, the user is "irwin", we have the ip address defined, then we have the password authentication method. In production, this authentication method is not recommended because unlike md5, password sends everything in cleartext. This is only recommended for use in debugging situations and that is it. Otherwise, you will want to stick with md5 when doing a password authentication in production.

hostnossl all all all reject Lastly we have have a line that looks for TCP/IP connections that are explicitly not SSL secured. We then apply this to all databases, all users, and all ip addresses, then we set reject as the authentication method which does exactly what you think. This line essentially rejects all non SSL connections from accessing your Postgres instance.

What we have just seen is not exhaustive of what all we can do in this file. However, it is a good start. Now it's time to shift our attention to a more important and detailed file when it comes to performance tuning and that is postgresql.conf.

So let's open this file and see where to begin.

open postgresql.conf

Excellent, so with all that is going on in this file, where should we begin? Let's start with listen_addresses. When you first install Postgres, it will only accept connections from localhost. However, since we have been going through pg_hba.conf and defined all of the allowed and disallowed connections, we can now just set this to '*'.

Next, let's turn our attention to max_connections. There is no rocket science to this one. It defines the maximum number of client connections Postgres can have at anytime. It is important to know however that many other parameters can be allocated on a per-client basis so it is good to be mindful of the number of connections you will be able to have. If you have good hardware, Postgres can support hundreds of connections but otherwise, you want to use connection pooling software to reduce the load.

Now let's take a look at shared_buffers. This setting determines how much memory will be used for caching data. If the starting number looks low, it's because there are many older systems such as older Solaris versions that will require heavy amounts of kernel work before it will work. Even many modern versions of Linux will not allow you to set this parameter to over 32MB without adjusting the kernel settings. However, if you're using a more modern variant of Postgres, you should be okay. The general rule of thumb is if you have a system with more than 1GB of RAM, then set shared_buffers to 1/4 of the memory. If you have less than 1GB, then you'll want to stay around 15% of your memory.

The next setting we want take a look at is effective_cache_size. To better understand this setting, we need to understand that Postgres has a built in query planner that determines the most efficient ways to carry out its operations based on what resources it has available. Knowing this, effective_cache_size is not about setting how much memory will be allocated for cache duties. Instead, it is about estimating how much memory Postgres may have to work with so the query planner can better anticipate how to best handle its operations. To do this properly, we're going to want to take a look at our system resources using a tool like htop with Postgres and any other software running that will be going during our production run. From there we will want to take a look at how much of our memory is free. Take this image for example on my own laptop.

display image of htop being run on my laptop

We can see I have about 4.5GB of RAM free. So if in some bizarre universe I were to run this Postgres instance in production on my laptop, I would base my effective_cache_size number on this amount of free RAM. In general, you would want to set effective_cache_size to 1/2 of this number for a conservative estimate, or if you need to be more aggressive in this department, set it to 3/4 of the free RAM on your system.

Now that we have handled that, it's time to take a look at checkpoint_segments. To grasp what is going on here, we need to first understand that when Postgres writes transactions, it does so in small segment files. However, when a certain number of these segment files are written, a checkpoint is established which is essentially a recovery point for your system. It may seem like a good thing, and for the most part it really is. However, these checkpoints are very system intensive. As you can see, the default number is 3. This means a checkpoint is happening with every 3 full segment files being created. This is overboard for most systems and so we should expand this number. The recommended minimum is at least 10 but it isn't unusual for many systems to use between 32 and 256. Just remember that a large setting will use more disk space and take longer to recover so balance accordingly.


So despite all that we have just covered, we are still only scratching the surface of what all we can tune for better performance in our Postgres systems. If you would like more information, a link will be provided below with more resources for you to dig up whatever you need.