PostgreSQL: General Configuration 2
Understanding pg_hba.conf, postgresql.conf, and how to use them to tune our systems.
Hello and welcome. For this video, we are focusing on some in depth tuning of our
PostgreSQL systems. Specifically, we will be dealing with configuration in our
First, we need to take an in depth look at
pg_hba.conf. We already know the
basic idea behind this file. It is the location where client authentication is
handled in PostgreSQL. We use it to define very specifically who can access our
Postgres instance and which databases within that instance they are allowed to
work with specifically. To better understand this, let's take a more in depth look
pg_hba.conf file and try writing some example connections.
video showing the opening of pg_hba.conf
Ok, now let's write some example connections.
# my fake app host my_awesome_app samy 555.0.543.12 md5 local all jane 127.0.0.1 trust hostssl another_app_db irwin 126.96.36.199/80 password hostnossl all all all reject
Good, so what all is going on here? Let's take a look at the first line. Here, we
have our connection type which is
host. This essentially means we will accept
a connection from any TCP/IP type whether it is secured with ssl or not. We then
have the specific database that this connection is allowed to connect to followed
by the username of the user that is allowed to access it. This user would be
created of course within Postgres along with the password which will become
relevant in just a moment. Next we have the IP address of the specific connection
we're dealing with. This is especially relevant because it means our user "samy"
can only access the
my_awesome_app database from the one specific machine with
that ip address and "samy" must use a TCP/IP connection to reach that database.
Lastly we have the authentication method which in this case is
md5. What this
means is that "samy" will have to use a password to access the database and that
password will be encrypted and sent to Postgres in a secured manner.
local all jane 127.0.0.1 trust
On the next line, we are using a connection type local which means any connection
from a Unix type socket. In plain English, it's someone accessing the Postgres
instance from a local network. In this case, the local connection user will be
able to access all databases. The user that is allowed to do all this is "jane"
and she will be coming from this address and will not have to use a password or
any other authentication method to access the database. It is worth noting however
that in production, it is recommended to not use
trust unless absolutely
hostssl another_app_db irwin 188.8.131.52/80 password
Moving to the next line, we are accepting a connection from a TCP/IP connection
that is explicitly secured via SSL. They can only access the
database, the user is "irwin", we have the ip address defined, then we have the
password authentication method. In production, this authentication method is
not recommended because unlike
md5, password sends everything in cleartext.
This is only recommended for use in debugging situations and that is it.
Otherwise, you will want to stick with
md5 when doing a password authentication
hostnossl all all all reject
Lastly we have have a line that looks for TCP/IP connections that are explicitly
not SSL secured. We then apply this to all databases, all users, and all
ip addresses, then we set
reject as the authentication method which does
exactly what you think. This line essentially rejects all non SSL connections from
accessing your Postgres instance.
What we have just seen is not exhaustive of what all we can do in this file.
However, it is a good start. Now it's time to shift our attention to a more
important and detailed file when it comes to performance tuning and that is
So let's open this file and see where to begin.
Excellent, so with all that is going on in this file, where should we begin? Let's start with
listen_addresses. When you first install Postgres, it will only accept
connections from localhost. However, since we have been going through
pg_hba.conf and defined all of the allowed and disallowed connections, we can
now just set this to
Next, let's turn our attention to
max_connections. There is no rocket science to
this one. It defines the maximum number of client connections Postgres can have
at anytime. It is important to know however that many other parameters can be
allocated on a per-client basis so it is good to be mindful of the number of
connections you will be able to have. If you have good hardware, Postgres can
support hundreds of connections but otherwise, you want to use connection pooling
software to reduce the load.
Now let's take a look at
shared_buffers. This setting determines how much
memory will be used for caching data. If the starting number looks low, it's
because there are many older systems such as older Solaris versions that will
require heavy amounts of kernel work before it will work. Even many modern
versions of Linux will not allow you to set this parameter to over 32MB without
adjusting the kernel settings. However, if you're using a more modern variant
of Postgres, you should be okay. The general rule of thumb is if you have a
system with more than 1GB of RAM, then set
shared_buffers to 1/4 of the memory.
If you have less than 1GB, then you'll want to stay around 15% of your memory.
The next setting we want take a look at is
effective_cache_size. To better
understand this setting, we need to understand that Postgres has a built in query
planner that determines the most efficient ways to carry out its operations based
on what resources it has available. Knowing this,
effective_cache_size is not
about setting how much memory will be allocated for cache duties. Instead, it is
about estimating how much memory Postgres may have to work with so the query
planner can better anticipate how to best handle its operations. To do this
properly, we're going to want to take a look at our system resources using a tool
htop with Postgres and any other software running that will be going during
our production run. From there we will want to take a look at how much of our
memory is free. Take this image for example on my own laptop.
display image of htop being run on my laptop
We can see I have about 4.5GB of RAM free. So if in some bizarre universe I were
to run this Postgres instance in production on my laptop, I would base my
effective_cache_size number on this amount of free RAM. In general, you would
want to set
effective_cache_size to 1/2 of this number for a conservative
estimate, or if you need to be more aggressive in this department, set it to 3/4
of the free RAM on your system.
Now that we have handled that, it's time to take a look at
To grasp what is going on here, we need to first understand that when Postgres
writes transactions, it does so in small segment files. However, when a certain
number of these segment files are written, a checkpoint is established which is
essentially a recovery point for your system. It may seem like a good thing, and
for the most part it really is. However, these checkpoints are very system
intensive. As you can see, the default number is 3. This means a checkpoint is
happening with every 3 full segment files being created. This is overboard for
most systems and so we should expand this number. The recommended minimum is at
least 10 but it isn't unusual for many systems to use between 32 and 256. Just
remember that a large setting will use more disk space and take longer to recover
so balance accordingly.
So despite all that we have just covered, we are still only scratching the surface of what all we can tune for better performance in our Postgres systems. If you would like more information, a link will be provided below with more resources for you to dig up whatever you need.
Alex is an independent developer who is obsessed with both performance and information security. When not writing code, Alex is either playing guitar or working in his garden.