Hello and welcome! Today we're learning about performance tuning for our PostgreSQL databases and why it's so important. Whether you're a brand-new startup or a large corporation with millions of rows being modified all the time, this information is critical to keeping your systems up and running no matter the circumstances.

So what exactly are we covering? Today, we're going to start with a quick refresher on PostgreSQL's fundamentals before diving into some basic configuration. As we move along, we will learn how to use benchmarking software to get solid metrics on how our systems are doing. Next, we will get into some key settings, commands, and other options we can use to drastically improve the performance of our systems. Once we get through with that, we'll transition to working with VACUUM, AUTOVACUUM, ANALYZE, and EXPLAIN, which are maintenance and diagnostic commands PostgreSQL provides on top of standard SQL. We won't just dig into the commands themselves, though. We will use them to better understand how PostgreSQL works under the hood, so we can have a firm grasp of not just the how, but the why behind our tuning choices.

So knowing this, let's dive into the basics of our general configuration.

Setup and Installation of PostgreSQL

For those who have yet to set up or install PostgreSQL, let's just do a quick install on our systems. From here, you can see we will be working with macOS. However, you can adapt the commands very quickly to whatever operating system you are on. Links will be provided below for the installation procedures needed for different common operating systems. If you can't find yours in the list, don't worry--a quick search will turn up the installation process needed for your system.
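As a quick sketch of what the macOS install looks like, here is one common approach using Homebrew. This assumes you already have Homebrew installed; the package name and service commands may differ slightly depending on your Homebrew and PostgreSQL versions:

```
# Install PostgreSQL with Homebrew (assumes Homebrew is already installed)
brew install postgresql

# Start the server as a background service
brew services start postgresql

# Verify the client is on your PATH
psql --version
```

On Linux, the equivalent would typically go through your distribution's package manager (apt, dnf, and so on).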

video of installation process on macOS

cd /usr/local/pgsql/data
subl postgresql.conf pg_hba.conf

Excellent, let's first take a look at pg_hba.conf. The basic role of this file is client authentication. This may look like just a security measure, but it has performance ramifications too: by telling PostgreSQL exactly who can access which databases, and with what connection type, IP address, and username, you simplify its job of screening incoming connections.

We can see here the basic connection format with which you would establish your database clients. They are also kind enough to give you a thorough breakdown on each aspect of the database client so you can know what your options are and how to best format everything.
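For reference, each non-comment line in pg_hba.conf follows the same column layout. The sketch below mirrors the commented header shipped in the file itself; the exact default entries vary by platform and PostgreSQL version:

```
# TYPE  DATABASE        USER            ADDRESS                 METHOD
local   all             all                                     trust
host    all             all             127.0.0.1/32            md5
```

The first line allows local Unix-socket connections without a password, while the second requires md5 password authentication for TCP connections from the local machine.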

Finally, we can see here where our actual configuration entries go. For a quick example, let's add an entry for a made-up app that would connect to our database.

# my fake app
host    my_awesome_app  samy            192.0.2.12/32           md5

We won't go into the hardcore details because, in the end, this series is about performance tuning, and this is intended as a refresher for those who haven't touched these files in a while.

Now that we have had our refresher on the pg_hba.conf file, let's switch over to the postgresql.conf file and take a quick look at what is going on in here.

video showing switch to postgresql.conf

Good. If there is one key takeaway when it comes to this file, it is that postgresql.conf is the central configuration point for your whole PostgreSQL system, and most of our performance tuning will happen here. In part two of the general configuration videos, we will go into more detail on what it has to offer. For now, just know that we can handle a whole plethora of options in here. We can configure file locations, connections, authentication, resource usage, logging, data replication, error reporting, and plenty more.
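As a taste of what's ahead, here is a minimal sketch of the kind of resource-related settings we will be tuning in this file. The values are illustrative assumptions for a machine with roughly 8 GB of RAM, not recommendations for your hardware:

```
# postgresql.conf -- illustrative resource settings (assumes ~8 GB RAM)
shared_buffers = 2GB            # memory PostgreSQL reserves for caching data pages
work_mem = 16MB                 # per-operation memory for sorts and hash tables
maintenance_work_mem = 512MB    # memory for VACUUM, CREATE INDEX, and similar tasks
effective_cache_size = 6GB      # planner's estimate of total OS + shared cache
```

We will cover what each of these actually does, and how to pick sensible values, in later videos.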

Now that we have learned a bit about the two main files of importance in performance tuning, we need to take a quick look at some tools available for benchmarking our database performance. Specifically, we're going to look at pgbench. This tool comes with PostgreSQL and acts as a simple means of performance testing any of your databases. You don't need a database full of data, because it will create and populate its own tables for you. Let's see it in action.

pgbench -i my_database_name

Great, now your output will likely look something like this.

○ → pgbench -i my_database_name
NOTICE:  table "pgbench_history" does not exist, skipping
NOTICE:  table "pgbench_tellers" does not exist, skipping
NOTICE:  table "pgbench_accounts" does not exist, skipping
NOTICE:  table "pgbench_branches" does not exist, skipping
creating tables...
100000 of 100000 tuples (100%) done (elapsed 0.19 s, remaining 0.00 s)
set primary keys...

Good. Upon quick inspection, you'll see that pgbench created four tables, populated them with dummy data, vacuumed the data, and gave us an elapsed time. On its own, though, this run doesn't tell us much. To get meaningful measurements, we can use a host of different flags to better simulate our specific situations. Some of these are...

-s = multiplies the number of rows generated in each table by the number you
provide to this flag (the scale factor, used during initialization)

-c = sets the number of concurrent clients in the test

-d = prints debugging output

-C = establishes a new connection for each transaction, to measure connection
overhead

-j = sets the number of worker threads, which is helpful if you are on a
multi-CPU machine. It is key to understand, though, that the number of clients
*must* be a multiple of the number of threads, since each thread is given
the same number of clients to manage.
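Putting a few of those flags together, a benchmark run might look like the following. The database name, scale factor, client count, and duration here are illustrative; the -T flag (run for a fixed number of seconds rather than a fixed number of transactions) is added beyond the flags above:

```
# Initialize with a scale factor of 50 (~5 million rows in pgbench_accounts)
pgbench -i -s 50 my_database_name

# Run for 60 seconds with 10 concurrent clients across 2 worker threads
# (note: 10 clients is a multiple of 2 threads, as required)
pgbench -c 10 -j 2 -T 60 my_database_name
```

At the end of the run, pgbench reports the transactions-per-second figure we will use as our baseline when comparing tuning changes.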

There is much more that could be covered regarding pgbench, but for the sake of time, we must stop here. Links will be provided below with more information on pgbench and how it can benefit your performance tuning.


So, we've done a basic introduction to what performance tuning in PostgreSQL involves. In future videos, we will dive deeper into the details of how to tune our systems for optimal performance.