PostgreSQL: Efficient Queries - Indexing
How to index your tables to make your queries perform far more efficiently.
Hello and welcome. Today we're focusing on what it takes to better tune our queries in PostgreSQL. For those who have been following this series, you will know that we have already dug into some in-depth ways to analyze and understand what is going on within our queries. Now we're going to focus on a few things we can do to make them a little more efficient. Let's get started.
First we need to open PSequel, create our tables, then populate them with some dummy data. Let's get this handled.
```sql
CREATE TABLE item (
  id INTEGER,
  description VARCHAR(255)
);

CREATE TABLE garbage (
  id INTEGER,
  garbage VARCHAR(255)
);

INSERT INTO item VALUES (
  generate_series(1, 10000),
  md5(random()::text)
);

INSERT INTO garbage VALUES (
  generate_series(1, 10000),
  md5(random()::text)
);
```
Excellent, now let's run a basic query then analyze what is going on.
```sql
EXPLAIN ANALYZE SELECT * FROM item WHERE id > 35 AND id < 4537;
```
```
Seq Scan on item  (cost=0.00..234.00 rows=4503 width=37) (actual time=0.026..3.850 rows=4501 loops=1)
  Filter: ((id > 35) AND (id < 4537))
  Rows Removed by Filter: 5499
Planning time: 0.090 ms
Execution time: 4.176 ms
```
Excellent, now let's take this information then send it over to our favorite tool at explain.depesz.com to visualize this output.
[Video: pasting the EXPLAIN ANALYZE output into explain.depesz.com]
Good, now that we have this handled let's take a look at what this tool is showing us. We can see that we only have the one node and it's running a little slow. Granted, we are returning 4,501 rows, so some work is expected regardless. However, can we make this seemingly simple request a little more efficient? If we look over here, we can see that the PostgreSQL query planner uses a Sequential Scan to carry out this task, then applies a filter to get what we want. What this means is that PostgreSQL is quite literally looping through every single row in our item table. Considering that this table has 10,000 rows, this does seem to be a ridiculously blunt-force maneuver.
So how can we persuade PostgreSQL to do something more efficient? One mechanism we could use is an index. An index builds a separate, sorted data structure (a B-tree by default) over a column you know your queries will filter on, giving the PostgreSQL query planner a fast path straight to the matching rows instead of scanning the whole table. Let's create one now.
CREATE INDEX id_index ON item (id);
Good, that wasn't bad at all. Now let's run our previous EXPLAIN ANALYZE and see what happens.
```
Index Scan using id_index on item  (cost=0.29..187.34 rows=4503 width=37) (actual time=0.076..6.323 rows=4501 loops=1)
  Index Cond: ((id > 35) AND (id < 4537))
Planning time: 0.474 ms
Execution time: 7.233 ms
```
So hang on one minute... we can see that we have indeed performed an Index Scan instead of a Sequential Scan, but our operation just became slower. What gives? Let's run the command a few more times and see if this changes anything.
```
Index Scan using id_index on item  (cost=0.29..187.34 rows=4503 width=37) (actual time=0.043..1.707 rows=4501 loops=1)
  Index Cond: ((id > 35) AND (id < 4537))
Planning time: 0.213 ms
Execution time: 2.008 ms
```
Now we can see that the command is moving drastically faster than our previous Sequential Scan. Two things are going on here. First, the index itself: by placing an index on the id column of our item table, we gave PostgreSQL a B-tree it can use to jump straight to the rows in our range, rather than looping through all 10,000 rows and filtering most of them out. Second, caching: our very first indexed run was slower because the index and table pages had not yet been loaded into memory. On the repeated runs, PostgreSQL could serve those pages from its shared buffers, which is why the timings settled down so dramatically.
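If you want to see that caching effect for yourself, PostgreSQL can report buffer activity alongside the timings via the BUFFERS option of EXPLAIN. A quick sketch (the exact hit and read counts will vary on your machine):

```sql
-- BUFFERS shows how many pages were served from PostgreSQL's
-- cache ("shared hit") versus fetched from disk ("read").
-- On a repeated run, expect mostly hits and few or no reads.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM item WHERE id > 35 AND id < 4537;
```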
It is worth mentioning, however, that indexes don't only benefit the select command. They also help with update and delete operations, since the rows to be changed have to be located first. Also, it is
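For example, a delete that filters on our indexed column can locate its target rows through the same index. A sketch against the item table we created above:

```sql
-- The WHERE clause can be satisfied by id_index, so PostgreSQL
-- can find the rows to remove without scanning the whole table.
EXPLAIN ANALYZE
DELETE FROM item WHERE id > 9000;
```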
important to note that when creating your index, timing is important if you are
doing this on a live production system. So why is this?
Let's say you are looking to create an index on a massive table filled with business data that is being acted upon a lot. Creating the index may not seem like a big deal, but on a large table such as this, it can take a little while. Not only that, but while users can continue to run read statements, a plain CREATE INDEX takes a lock that blocks writes: insert, update, and delete statements will all have to wait until the build finishes. So if you are looking to enhance your database performance on a massive table, you may need to schedule maintenance downtime during a slow period before creating the index.
Another option would be to utilize CREATE INDEX CONCURRENTLY, which will allow you to create your index while other operations, including writes, are being carried out, at the cost of a slower build.
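Sticking with our example table, the concurrent form is just one extra keyword:

```sql
-- Builds the index without blocking inserts, updates, or deletes.
-- Tradeoffs: the build is slower than a plain CREATE INDEX, and
-- this statement cannot run inside a transaction block.
CREATE INDEX CONCURRENTLY id_index ON item (id);
```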
Another thing to note is that an index is not free to maintain. Every time the table changes, PostgreSQL must update the index as well to keep it in sync with the data. In plain English, it means that insert, update, and delete will all be a little more taxing on your system now, at least for the operations carried out on this specific table.
So what does this all mean? When should we use an index on our tables? When evaluating whether or not to use an index on your table, here are some basic questions to ask yourself.
- Are your frequently run select statements registering as heavy on your system?
- Are those select statements consistently relying on a Sequential Scan under the hood?
If you answered yes to those questions, then it is very plausible that creating an index is the right call for you. However, there are some more questions to ask yourself.
- Is the table handling far more insert, update, and delete traffic than select traffic?
- Compared to those writes, are your select queries infrequent enough that the occasional Sequential Scan under the hood is not actually hurting you?
If you say yes to those things, then creating an index may not be appropriate for your use case scenario on that specific table. This is for you to decide though based on your unique situation and no rule can be applied in a black and white manner when evaluating these decisions.
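If you are unsure which way your table leans, PostgreSQL's statistics views can help you decide. A sketch for checking whether an existing index is actually being used; idx_scan counts how many times the index has been scanned since statistics were last reset:

```sql
-- An idx_scan that stays at 0 over a long period suggests the
-- index is pure write overhead and could be dropped.
SELECT relname, indexrelname, idx_scan
FROM pg_stat_user_indexes
WHERE relname = 'item';
```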
As we can see, using an index on your database can be a powerful way to speed up your queries. When used with the right scenario for the right table, it can be a huge boost to your system. Thank you and I hope this video has been helpful to you.
Alex is an independent developer who is obsessed with both performance and information security. When not writing code, Alex is either playing guitar or working in his garden.