An introduction to dbplanview.com

I've taken over running dbplanview.com.

If you spend any time with a database, you'll eventually find some queries which are inexplicably slow.

The first step to making those queries faster is to know what the database was thinking: this is as easy as running the query with explain in front of it. Or adding explain analyze before the query, to show both the plan and the how long each stage actualy took.

But the output is complicated — a lot happens in parallel — which is why some sort of visualization is needed. dbplanview.com, creates this visualization.

The basics

In this query, we get source_id from route, and use this in a query on source:

travelinedata=> select * from source where source_id = (select source_id from route where description = 'Fen Street - Agora');

If we run the query with explain or explain analyze at the start:

travelinedata=> explain select * from source where source_id = (select source_id from route where description = 'Fen Street - Agora');

The database will output:

                                   QUERY PLAN                                    
---------------------------------------------------------------------------------
 Index Scan using source_pkey on source  (cost=3089.34..3097.35 rows=1 width=36)
   Index Cond: (source_id = $0)
   InitPlan 1 (returns $0)
     ->  Seq Scan on route  (cost=0.00..3089.05 rows=6 width=4)
           Filter: (description = 'Fen Street - Agora'::text)
(5 rows)

Copy-paste into dbplanview.com and click "Timeline", and this timeline will be displayed, with time running from left-to-right:

Visualization of a query plan

Querying route uses a sequential scan, which takes lots of time (it checks each row). Querying source uses an index and takes less time.

The joining line means the output of "seq scan on route" is used in the "index scan on source". It's red because the index scan doesn't start until the first stage has finished.

Mutiple things happen at once

The example above is equaivalent to this join:

travelinedata=> explain select * from route join source using (source_id) where route.description = 'Fen Street - Agora';

Which the database plans like this:

                                   QUERY PLAN                                    
---------------------------------------------------------------------------------
 Nested Loop  (cost=0.29..3138.93 rows=6 width=36)
   ->  Seq Scan on route  (cost=0.00..3089.05 rows=6 width=4)
         Filter: (description = 'Fen Street - Agora'::text)
   ->  Index Scan using source_pkey on source  (cost=0.29..8.30 rows=1 width=36)
         Index Cond: (source_id = route.source_id)
(5 rows)

A query plan where several steps happen in parallel

The output of "seq scan on route" is still used in the "index scan on source", but the index scan doesn't wait. It runs in parallel: as soon as route gets a result, an index scan will be performed.

The index scan probably won't happen immediately (as the diagram sort-of suggests), it happens whenever the sequential scan gives a result. The query planner expects there to be 6 results from "seq scan on route" (so the index scan will probably happen 6 times).

The "nested loop" takes the output of both index and sequential scan and sticks them together. It appears to take a long time, but it's not really the slow bit: it spends most of its time waiting for the other parts of the query to finish.

Making it fast

If you do have slow queries:

Try and rephrase the query to make better use of indexes. \d+ tablename shows the indexes availble. You might wish to add an index, but this will make inserts and updates to that table more costly: think carefully about the ratio of update vs select before adding any index.

The planner treats with as a barrier: the query inside the with will fully complete before the next query starts. For example, select * from route limit 10 will stop reading route once it has 10 rows. But with foo as (select * from route) select * from foo limit 10 will read all of route into memory (as "foo"), then take the first 10.

Cache expensive queries using a materialized view.

Adjust the query planner costs to favour a different approach. One common way is to set local random_page_cost to 0; to make indexes infinitely more favourable than an sequential scan (for the transaction).

That's it

That's all there is to it! Give dbplanview.com a go, to work out where your queries are slow.

An intro­duction to dbplanview.com

The basics

Mutiple things happen at once

Making it fast

That's it

An introduction to dbplanview.com