Hello and welcome! Today we're learning about performance tuning of our PostgreSQL databases and why it's so important. Whether you're a brand-new startup or a large corporation with millions of rows being modified all the time, this information is critical to keeping your systems up and running no matter the circumstances.
So what exactly are we covering? Today, we're going to start with a quick refresher on PostgreSQL's fundamentals before diving into some basic configuration. As we move along, we will learn how to use benchmarking software to give us solid metrics on how our systems are doing. Next, we will get into some key methods, commands, and other options that we can use to drastically improve the performance of our systems. Once we get through that, we'll transition to working with VACUUM, ANALYZE, and EXPLAIN, which are commands PostgreSQL provides as extensions to standard SQL, along with the autovacuum background process. We won't just dig into the commands, though. We will use them to better understand how PostgreSQL works under the hood, so we can have a firm grasp of not just the how, but the why behind our tuning choices.
So knowing this, let's dive into the basics of our general configuration.
Setup and Installation of PostgreSQL
For those who have yet to set up or install PostgreSQL, let's do a quick install on our systems. As you can see, we will be working with Mac OS X; however, you can adapt the commands very quickly to whatever operating system you are on. Links will be provided below for the installation procedures needed for different common operating systems. If you can't find yours in the list, don't worry; a quick search will turn up the installation process for your system.
video of installation process in Mac
cd /usr/local/pgsql/data
sublime postgresql.conf pg_hba.conf
Excellent, let's first start by taking a look at pg_hba.conf. The basic role of this file is client authentication. This may look like just a security measure, but it also has performance ramifications: by spelling out exactly who can access which databases, including the connection type, client IP address, and username, along with the authentication method, you simplify PostgreSQL's job of vetting each incoming connection.
We can see here the basic record format with which you would configure your database clients. The file's comments are also kind enough to give you a thorough breakdown of each field, so you know what your options are and how best to format everything.
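As a quick sketch, a record in pg_hba.conf follows this column layout (the host line below is an illustrative local-connections entry, not a recommendation for your setup):

```
# TYPE  DATABASE  USER  ADDRESS       METHOD
host    all       all   127.0.0.1/32  md5
```

Each field narrows which connections the rule matches, and the first rule that matches a connection decides the authentication method used.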
Finally, we can see here where we would set up our own configuration entries. For a quick example, let's add an entry for a made-up app that would connect to our database.
# my fake app
host    my_awesome_app    samy    192.0.2.12/32    md5
We won't go into the hardcore details because, in the end, this is about performance tuning and is intended as a refresher for those who haven't touched these files in a while.
Now that we have had our refresher on the pg_hba.conf file, let's switch over to the postgresql.conf file and take a quick look at what is going on in there.
video showing switch to postgresql.conf
Good. If there is one key takeaway when it comes to this file, it is that postgresql.conf is the central configuration point for your whole PostgreSQL system. Most of our performance tuning will happen here. In part two of the general configuration videos, we will go into more detail on what this file has to offer. For now, just know that we can handle a whole plethora of options in here: we can configure file locations, connections, authentication, resource usage, logging, data replication, error reporting, and plenty more.
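To make that concrete, here is a small, hypothetical slice of the kinds of settings postgresql.conf holds. These are real parameter names, but the values are purely illustrative, not recommendations:

```
# postgresql.conf -- illustrative entries only
max_connections = 100        # connections
shared_buffers = 128MB       # resource usage (shared memory cache)
work_mem = 4MB               # per-sort/per-hash working memory
wal_level = replica          # write-ahead logging / replication
log_min_duration_statement = 250   # log statements slower than 250 ms
```

We'll revisit several of these when we get into tuning proper.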
Now that we have learned a bit about the two main files of importance in performance tuning, we need to take a quick look at some tools that are available for benchmarking our database performance. Specifically, we're going to take a quick look at pgbench. This tool ships with PostgreSQL and acts as a simple means of performance testing any of your databases. You don't need a database full of data, because pgbench will create and populate its own tables for you. Let's see it in action:
pgbench -i my_database_name
Great, now your output will likely look something like this.
○ → pgbench -i my_database_name
NOTICE:  table "pgbench_history" does not exist, skipping
NOTICE:  table "pgbench_tellers" does not exist, skipping
NOTICE:  table "pgbench_accounts" does not exist, skipping
NOTICE:  table "pgbench_branches" does not exist, skipping
creating tables...
100000 of 100000 tuples (100%) done (elapsed 0.19 s, remaining 0.00 s)
vacuum...
set primary keys...
done.
Good. Upon quick inspection, you'll see that pgbench created four tables, populated them with dummy data, vacuumed the tables, and gave us an elapsed time.
However, this initialization run by itself isn't a very useful benchmark. To better simulate our specific workloads, we could use a host of different flags. Some of these include:

-s  multiplies the number of rows generated in each table by the scale factor you provide
-c  establishes the number of concurrent clients in the test
-d  prints debugging output
-C  establishes a new connection for each transaction, so you can measure connection overhead
-j  establishes the number of worker threads, which helps on a multi-CPU machine; it is key to understand that the number of clients *must* be a multiple of the number of threads, since each thread is given the same number of clients to manage
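Putting a few of these flags together, here is a minimal sketch of a benchmark run. The database name and numbers are illustrative, and the small arithmetic check simply mirrors pgbench's own rule that -c must be a multiple of -j:

```shell
#!/bin/sh
# Illustrative values -- tune for your own hardware and workload.
clients=8   # -c: concurrent clients
threads=4   # -j: worker threads

# pgbench rejects the run if clients is not a multiple of threads,
# because clients are divided evenly among the threads.
if [ $((clients % threads)) -eq 0 ]; then
  echo "OK: $((clients / threads)) clients per thread"
else
  echo "ERROR: -c must be a multiple of -j" >&2
  exit 1
fi

# Example invocations (assumes my_database_name exists; not run here):
#   pgbench -i -s 10 my_database_name          # initialize at 10x scale (1,000,000 account rows)
#   pgbench -c 8 -j 4 -C -d my_database_name   # 8 clients, 4 threads, reconnect per transaction, debug output
```

With 8 clients and 4 threads, each thread manages exactly 2 clients, which is why the multiple rule exists.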
There is much more that could be covered regarding pgbench, but for the sake of time we must stop here. Links will be provided below with more information on pgbench and how it can benefit your performance tuning.
So we've done a basic introduction to what performance tuning in PostgreSQL involves, and in future videos we will dive deeper into how to tune our systems for optimal performance.