Intro

Hello and welcome. Today we're focusing on what it takes to tune our queries in PostgreSQL. If you have been following this series, you will know that we have already looked at some in-depth ways to analyze and understand what is going on inside our queries. Now we're going to focus on a few things we can do to make them a little more efficient. Let's get started.

Setup

First we need to open PSequel, create our tables, then populate them with some dummy data. Let's get this handled.

CREATE TABLE item (
    id INTEGER,
    description VARCHAR (255)
);

CREATE TABLE garbage (
    id INTEGER,
    garbage VARCHAR (255)
);

INSERT INTO item VALUES (generate_series(1,10000), md5(random()::text));
INSERT INTO garbage VALUES (generate_series(1,10000), md5(random()::text));

Excellent, now let's run a basic query then analyze what is going on.

EXPLAIN ANALYZE SELECT *
FROM item
WHERE
  id > 35 AND id < 4537;

Seq Scan on item  (cost=0.00..234.00 rows=4503 width=37) (actual time=0.026..3.850 rows=4501 loops=1)
  Filter: ((id > 35) AND (id < 4537))
  Rows Removed by Filter: 5499
Planning time: 0.090 ms
Execution time: 4.176 ms

Excellent, now let's take this output and send it over to our favorite tool at explain.depesz.com to visualize it.

[Video: pasting the EXPLAIN ANALYZE output into explain.depesz.com]

Good, now that we have this handled, let's take a look at what this tool is showing us. We can see that the plan has only one node, and it's running a little slow. Granted, we are returning 4,501 rows, so some of that is to be expected. Still, can we make this seemingly simple request more efficient? Looking at the plan, we can see that the PostgreSQL query planner uses a Sequential Scan to carry out this task and then applies a filter to keep only the rows we want. What this means is that PostgreSQL is quite literally reading every single row in our item table and throwing away the ones that fail the filter. Considering that this table has 10,000 rows and we only want about 4,500 of them, that is a rather blunt instrument.
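As an aside, if you ever want to check how many rows the planner believes a table holds, PostgreSQL keeps an estimate in its pg_class catalog. Here is a quick sketch; keep in mind that reltuples is only an estimate, refreshed by VACUUM and ANALYZE:

SELECT relname, reltuples::bigint AS estimated_rows
FROM pg_class
WHERE relname = 'item';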

Index

So how can we persuade PostgreSQL to do something more efficient? One mechanism we could use is an index. An index is a separate structure (a B-tree by default) that the query planner can use to jump straight to the rows matching a condition on a particular column, rather than reading the whole table. Let's create one now.

CREATE INDEX id_index ON item (id);

Good, that wasn't bad at all. Now let's run our previous EXPLAIN ANALYZE and see what happens.

Index Scan using id_index on item  (cost=0.29..187.34 rows=4503 width=37) (actual time=0.076..6.323 rows=4501 loops=1)
  Index Cond: ((id > 35) AND (id < 4537))
Planning time: 0.474 ms
Execution time: 7.233 ms

So hang on one minute... we can see that we have indeed performed an Index Scan instead of a Sequential Scan, but our operation just became slower. What gives? Let's run the command a few more times and see if this changes anything.

Index Scan using id_index on item  (cost=0.29..187.34 rows=4503 width=37) (actual time=0.043..1.707 rows=4501 loops=1)
  Index Cond: ((id > 35) AND (id < 4537))
Planning time: 0.213 ms
Execution time: 2.008 ms

Now we can see that the query is running drastically faster than our previous Sequential Scan. So what changed between the two runs? The first run after creating the index was slow because neither the index nor the table pages it points to were in memory yet. Once PostgreSQL has pulled them into its shared buffers (and the operating system has cached them too), subsequent runs can use the index at full speed. With id_index in place, PostgreSQL walks the B-tree straight to the rows in our id range instead of reading the entire table, grabbing only what is needed.
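If you want to see this caching effect for yourself, EXPLAIN can also report buffer usage. In the output, shared hit counts pages found in PostgreSQL's cache, while read counts pages that had to be fetched from disk:

EXPLAIN (ANALYZE, BUFFERS) SELECT *
FROM item
WHERE
  id > 35 AND id < 4537;

On a warm cache, the Index Scan node should show mostly shared hit pages; right after creating the index you will see read pages instead.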

It is worth mentioning however that indexes don't only benefit the SELECT command. They also help UPDATE and DELETE operations find the rows their WHERE clauses target, as the example below shows.
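For instance, this update can use id_index to jump straight to the row it needs instead of scanning the whole table (a small example against our item table; the id value is arbitrary):

UPDATE item
SET description = md5(random()::text)
WHERE id = 42;

Also, it is important to note that when creating your index, timing is important if you are doing this on a live production system. So why is this?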

When is an index appropriate?

Let's say you are looking to create an index on a massive table filled with business data that is under heavy use. Creating the index may not seem like a big deal, but on a table that size it can take a while. Not only that, but while users can continue to run read statements, a plain CREATE INDEX locks the table against INSERT, UPDATE, and DELETE statements until it finishes. So if you are looking to enhance your database performance on a massive table, you may need to schedule maintenance downtime during a slow period before creating the index. Another option is CREATE INDEX CONCURRENTLY, which builds the index without blocking writes, at the cost of a slower build.
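Here is what that would look like for our index. One caveat: CREATE INDEX CONCURRENTLY cannot run inside a transaction block, and if the build fails it leaves behind an invalid index that you have to drop before retrying:

CREATE INDEX CONCURRENTLY id_index ON item (id);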

Another thing to note is that once an index exists, PostgreSQL must keep it in sync with the table at all times. Every row you add, change, or remove also has to be reflected in the index, which adds overhead to data manipulation operations. In plain English, INSERT, UPDATE, and DELETE will be more taxing on your system now, at least for operations on this specific table.
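Indexes take disk space as well. If you are curious how much room an index occupies, you can ask PostgreSQL directly:

SELECT pg_size_pretty(pg_relation_size('id_index'));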

So what does this all mean? When should we use an index? When evaluating whether or not to add one to a table, here are some basic questions to ask yourself.

  1. Are the vast majority of this table's queries reading existing data rather than modifying it?
  2. Is the table fairly large?
  3. Are the frequently-run SELECT statements against it registering as heavy on your system?
  4. Are those SELECT statements consistently falling back to a Sequential Scan? (We'll see how to check this after the second list below.)

If you answered yes to those questions, then it is very plausible that creating an index is the right call for you. However, there are some more questions to ask yourself.

  1. Does your table see a lot of data created and deleted on a frequent basis?
  2. Is the table primarily a working area for creating, manipulating, and destroying data, and is it actually used that way?
  3. Is your table relatively small?
  4. Even if your table is receiving a lot of SELECT queries, do those queries touch most of the table anyway, so that a Sequential Scan is genuinely the planner's cheapest option?

If instead you answered yes to these questions, then creating an index may not be appropriate for that specific table. This is for you to decide based on your unique situation, though; no rule can be applied in a black and white manner when weighing these decisions.
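To answer the Sequential Scan questions in both lists with real numbers, PostgreSQL's statistics views track how often each table has been read by a sequential scan versus an index scan. A quick sketch:

SELECT relname, seq_scan, idx_scan
FROM pg_stat_user_tables
WHERE relname = 'item';

A large seq_scan count on a big, read-heavy table is a good hint that an index could help; a seq_scan count near zero, or a small table, suggests the index may not be worth its overhead.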

Conclusion

As we can see, an index can be a powerful way to speed up your queries. Used in the right scenario on the right table, it can be a huge boost to your system. Thank you, and I hope this video has been helpful to you.

Alex Allen

Alex is an independent developer who is obsessed with both performance and information security. When not writing code, Alex is either playing guitar or working in his garden.
