Intro

Hello and welcome to our continuing series on performance tuning our PostgreSQL databases. Today we're focusing on a very important method in the PostgreSQL command line up and that is explain. To understand the idea behind explain we first need to gain an understanding about the PostgreSQL query planner. For those who have watched our video about analyze, some of this will be a review. For others, this will be critical information.

The PostgreSQL Query Planner

The query planner's job is to assess the landscape of your database to determine the most efficient ways to complete the tasks that will be demanded of it. It will evaluate the size of your tables, the content and size of the rows, etc. then use that to determine the pathway to carry out the operations. If this seems abstract, that's okay. We will use a concrete example to get this going.

First, let's open PSequel and handle some mundane tasks for getting our database and table up and running.

video showing PSequel open then connecting to Postgres

CREATE TABLE item (
    id INTEGER,
    description VARCHAR (255)
);

INSERT INTO item VALUES (generate_series(1,10000), md5(random()::text));

Excellent. We now have a table called items and it has 10,000 rows filled with ID integers and random string text. Nothing fancy or special. So let's say we want to retrieve all the description data with ids ranging from 35 to 4537. How would PostgreSQL do that under the hood? That is where explain is going to come in. Let's use it now to do just that.

EXPLAIN SELECT *
FROM item
WHERE
  id > 35 AND id < 4537;

Good, now let's take a look at the response we get.

Seq Scan on item  (cost=0.00..2334.00 rows=4534 width=37)
  Filter: ((id > 35) AND (id < 4537))

As you can see, this looks a bit like jibberish at first so we will take this line by line. First, we can see here that PostgreSQL is expecting to perform a Seq Scan on the item table. No need to get over detailed here, the Seq Scan is just a simple way of scanning the items of a table in a sequence. Next we can see this cost variable. The numbers here are just arbitrary values based on the estimated amount of work expected. We can see this range from 0 to 2334, but what are they? The zero is the work cost number expected for the first row returned. In actuality, the number isn't actually zero. It's just so small that it rounds to zero. Then we see 2334 which is the total number expected by the time we have returned the final row. Next you will see we have the rows variable which is just giving an estimate of how many rows the query planner thinks will be returned. Lastly we have this width variable which is just a general estimate of the average size of our rows based on the current data. Lastly we can see this filter being applied to our scan which is using the exact criteria from our WHERE filtering mentioned before.

Now it is important to know that when we run explain, we aren't actually running the following command. If you run explain on an update operation, for example, you won't actually be running the update. This is just an estimate. However, if you would like to run the command along with having explain breakdown what just happened, we would use explain analyze to see just how close our estimate comes to reality. Let's run our same example from earlier but with explain analyze.

EXPLAIN ANALYZE SELECT *
FROM item
WHERE
  id > 35 AND < 4537;
Seq Scan on item  (cost=0.00..234.00 rows=4503 width=37) (actual time=0.026..3.850 rows=4501 loops=1)
  Filter: ((id > 35) AND (id < 4537))
  Rows Removed by Filter: 5499
Planning time: 0.090 ms
Execution time: 4.176 ms

Good. Now you will see that while we do have some of our more familiar information we have some more data added on this time. Let's start with looking at what is familiar. We can see that the query planner did indeed use a Seq Scan however the cost ended up being significantly less than expected. That said, the rows and width were very close to their expected output. Now we have this actual time which is no mystery. It gives us the actual time required to return each of the rows along with the actual rows returned and the number of loops performed during the operation. Next we have our filter line, nothing new here. Next we have these numbers which are fairly obvious and need no explaining.

This being said, it goes without saying that if we start to get into more complex queries, this information can become really tedious to read through really fast. No need to worry though. There are great web tools out there which help with this exact thing. Let's check out a personal favorite of mine called explain.depesz.com. We will just copy and paste our query information into their service and see what gets rendered.

video showing explain.depesz.com in action

Excellent. As you can see, we already have our information laid out in a more efficient manner. The time required for our operation is located here and we can see the number of rows returned along with the number of loops performed. This may not look like much now, but when we need to use more complex queries, this tool will be indispensable. It's also worth noting that we have this text tab which allows us to see just the query text itself, then we have this stats tab which shows us some more individualized data about what is going on in the query.

So knowing this, it's time to see a somewhat more complex query in action. We won't do anything too crazy. We just want to see a bit more of what explain and explain analyze can offer us.

CREATE TABLE garbage (
    id INTEGER,
    garbage VARCHAR (255)
);

INSERT INTO garbage VALUES (generate_series(1,10000), md5(random()::text));

EXPLAIN ANALYZE SELECT description
FROM
  item
WHERE
  id > 30 AND id < 200
UNION
SELECT
  garbage
FROM
  garbage
WHERE
  id < 500
ORDER BY
  description DESC;
Sort  (cost=489.60..491.28 rows=671 width=33) (actual time=15.654..15.798 rows=668 loops=1)
  Sort Key: item.description DESC
  Sort Method: quicksort  Memory: 77kB
  ->  HashAggregate  (cost=451.39..458.10 rows=671 width=33) (actual time=6.689..6.890 rows=668 loops=1)
        Group Key: item.description
        ->  Append  (cost=0.00..449.71 rows=671 width=33) (actual time=0.027..6.214 rows=668 loops=1)
              ->  Seq Scan on item  (cost=0.00..234.00 rows=171 width=33) (actual time=0.026..3.703 rows=169 loops=1)
                    Filter: ((id > 30) AND (id < 200))
                    Rows Removed by Filter: 9831
              ->  Seq Scan on garbage  (cost=0.00..209.00 rows=500 width=33) (actual time=0.011..2.418 rows=499 loops=1)
                    Filter: (id < 500)
                    Rows Removed by Filter: 9501
Planning time: 0.161 ms
Execution time: 15.905 ms

As you can see, our query itself was still mere childsplay compared to what I'm sure many of us have to do on a daily basis with our systems. Yet when we look at this query result... it's not exactly too pleasant to look at. We get this wretched looking candlestick operation and digging into this for performance evaluation is just not going to be too fun. This is where explain.depesz.com should shine. Let's go back over there and see what it gives us this time.

video showing the results

Oh yes, this is drastically better. We can see the times for each chunk of our query, which operations being carried out, and more importantly... now we know which operations are creating the bottleneck so we can start thinking about how to make our query more efficient.

Conclusion

So this wraps up our initial breakdown into explain and what all it offers you in terms of optimizing our PostgreSQL systems. If there is one takeaway you should glean from this video, it is that explain and explain analyze are the heart and soul of your query optimizations. You can run general configurations, and they are extremely important, but this digs into the actual mechanics of the operations your system is actually carrying out.

Resources

Alex Allen

Alex is an independent developer who is obsessed with both performance and information security. When not writing code, Alex is either playing guitar or working in his garden.

  1. Comments for PostgreSQL: Explain

You must login to comment

You May Also Like