Tuesday 31 July 2018

Parallel index-only scan in PostgreSQL




In the previous blog, we saw that parallel index scans leads to significantly improves the performance of quite a few TPC-H queries. It is customary  to analyse if its sister operator, namely index-only scan will benefit similarly when parallelised.

Before getting into that, we will briefly discuss the utility of an index-only scan. In PostgreSQL, indexes are stored at a different location than the tables. For a regular index scan on a table, first the index created on that table is scanned to find the relevant leaf nodes and then the table is scanned for those locations only. Now, if there is some query in which we need only the values of columns which have an index, then we can scan the index tree only and return the required data, since there is nothing extra that we need to retrieve from that table, that type of scan is called index-only scans. To be precise, index-only scans are a special type of access method which uses index alone and does not require to fetch data from the heap.

For example, an index-only scan is likely to show a performance improvement over a regular index scan for the query such as, SELECT count(*) FROM countries WHERE country_area <= <some value>. Assuming we have an index on the column country_area. Here, we can get the tuple information lesser than the required country area by index alone, hence, saving the I/O time to read the tables.

The design and implementation of parallel index-only scan is heavily dependent on the machinery developed for scanning B-tree in parallel. There are no new GUC parameters or configuration settings required for this scan.

Performance of parallel index-only scan

For the industry strength benchmark TPC-H on 300 scale factor, the performance of Q13 is improved by almost 5x with the usage of parallel index-only scan. For this experiment we used TPC-H inspired benchmark for PostgreSQL.  We used 300 scale factor, which gives 300+ GB of database, depending on the available indexes, etc. Additional indexes we created were on columns (l_shipmode, l_shipdate, o_orderdate, o_comment). We tuned following parameters,

  • random_page_cost = 0.1
  • seq_page_cost = 0.1
  • effective_cache_size = 10GB
  • shared_buffers = 10GB
  • work_mem = 1GB

Q13 Query plan on v9.6
————————————
Limit  
 -> Sort  
      Sort Key: (count(*)) DESC, (count(orders.o_orderkey)) DESC
               ->  HashAggregate  
                    Group Key: count(orders.o_orderkey)
                 ->  GroupAggregate  
                      Group Key: customer.c_custkey
                           ->  Merge Left Join  
                                Merge Cond: (customer.c_custkey = orders.o_custkey)
                                 ->  Index Only Scan using customer_pkey on customer
                                ->  Index Scan using idx_orders_custkey on orders
Execution time: 4146177.735 ms

Q13 Query plan on v10
————————————
Limit  
 -> Sort  
      -> HashAggregate  
         Group Key: count(orders.o_orderkey)
          -> Finalize GroupAggregate
              Group Key: customer.c_custkey
                  -> Gather Merge  
                       -> Partial GroupAggregate  
                             Group Key: customer.c_custkey
                               -> Sort  
                                       ->  Parallel Hash Left Join  
                                            Hash Cond: (customer.c_custkey = orders.o_custkey)
                                     -> Parallel Index Only Scan using customer_pkey on customer
                                     ->  Parallel Hash    
                                        -> Parallel Seq Scan on orders
Execution Time: 739088.785 ms

We can see that the query involved aggregations and join over the primary key of customer table. When parallel index-only scan is used, both the aggregations as well as the join could be performed in parallel, hence improving the query performance.

I would like to close this discussion with a thank you note to my colleague Amit Kapila who helped and supported me in the due course of this project. Additionally, I would to thank my employer EnterpriseDB for bestowing me with such an opportunity.

Friday 13 July 2018

Performance of parallel index scans in PostgreSQL




In this blog will continue the discussion of parallel query in PostgreSQL. In the previous blog of this series by my colleague we learned about parallel index scans, its design in PostgreSQL and the performance improvement achieved for a few queries on the industrial benchmark of TPC-H. Therein we analysed the performance improvement only for a small factor of 20 (database size was approximately 20GB). But the performance benefits realised by parallel operators aren’t that significant until we leverage them for higher scale factors. Hence, in this blog we will analyse the performance of parallel index scans on 300 scale factor.

For this experiment we used TPC-H inspired benchmark for PostgreSQL.  We used 300 scale factor, which gives 300+ GB of database, depending on the available indexes, etc. Additional indexes we created were on columns (l_shipmode, l_shipdate, o_orderdate, o_comment). We tuned following parameters,
  • random_page_cost = 0.1
  • seq_page_cost = 0.1
  • effective_cache_size = 10GB
  • shared_buffers = 10GB
  • work_mem = 1GB
In the table below, we compared the performance of the queries using parallel-index scans on v10 to their performance in v9.6. Note that in v9.6 parallel seq scans and parallel nested loop joins were available. Additionally, we tabulated the total contribution of parallel index-scan for each of the queries. All the values of timing are in seconds.

TPC-H query
On v9.6
On v10
Contribution of PIS
6
883
155
137
12
740
717
122
14
422
111
53
15
2295
1138
157
17
17724
3376
16

From the above table, it is clear that the benefits of parallel index scans are significant on large amounts of data as well. To get a better understanding of how this scan method benefits the queries in question, we studied their query plans on both versions. On analysing those query plans we found two primary explanations for the benefits attained:

  • Contribution of parallel index scan in total execution time of the query

If the most time-consuming operation in a query is index scan then parallelising it is likely to improve query performance significantly, e.g. Q6 and Q12. To be more precise have a look at the query plan of Q6,

Q6 Query plan on v9.6
———————————
Limit
 ->  Aggregate  
       ->  Bitmap Heap Scan on lineitem
            ->  Bitmap Index Scan on idx_lineitem_shipdate
Execution time: 882812.376 ms

Q6 Query plan on v10
———————————
Limit
->  Finalize Aggregate
            ->  Gather  
                 ->  Partial Aggregate
                    -> Parallel Index Scan using idx_lineitem_shipdate on lineitem                          
      Execution Time: 155619.579 ms

Here, scan is the only time consuming operator, hence, parallelising it improves the performance of the query. Similarly for Q12, the index scan is on lineitem which is the largest table of database. Once it is scanned in parallel, the join above it could also be done in parallel and hence the improvement in performance.

On the other hand, the benefits are not so pronounced when the plan is more complex and involves a number of operators with index scan as one of the less significant operator in terms of total execution time of the query.

  • Operators above it in plan tree could also leverage parallelism

In some of the queries, it has been observed that even when the contribution of parallel index scan is not much in total execution time of the query, the benefits are significant, e.g. Q15 and Q17. In such cases the benefits are not coming from parallel index scans alone but from the fact that now more operators could be pushed to workers and hence extend parallelism upto higher levels. To explain this observation, query plan for Q15 is used:

Q15 Query plan on v9.6
—————————————
Limit
  InitPlan 1 (returns $0)
   
  -> Merge Join  
       Merge Cond: (lineitem.l_suppkey = supplier.s_suppkey)
      -> GroupAggregate  
        Group Key: lineitem.l_suppkey
        -> Sort
            -> Bitmap Heap Scan on lineitem
                  -> Bitmap Index Scan on idx_l_shipdate
            -> Index Scan using supplier_pkey on supplier  
Execution time: 2297222.717 ms

Q15 Query plan on v10
—————————————
Limit
 InitPlan 1 (returns $0)
   
  -> Nested Loop  
       ->  Finalize GroupAggregate
             Group Key: lineitem.l_suppkey
              ->  Gather Merge  
                   ->  Sort  
                       ->  Partial HashAggregate
                         Group Key: lineitem.l_suppkey
                            ->  Parallel Index Scan using idx_lineitem_shipdate on Lineitem
->  Index Scan using supplier_pkey on supplier
Execution Time: 1133723.985 ms

In this case, with the introduction of parallel index scan the plan structure changed altogether. With parallel index scan, the aggregates are pushed to workers and the join method is also changed, eventually improving query performance. This shows that it may not be necessary for parallel index scan to be the most time consuming operator for it to improve query performance, rather having it may help other operators to use parallelism which could not be done otherwise.

Overall, we saw there are significant benefits of parallel index scans on large volume of data as well. Thus, aiming towards a more OLAP compatible and scalable version of PostgreSQL. Stay tuned to know about more groundbreaking features in PostgreSQL.

Learnings from benchmarking PostgreSQL on TPC-H

After making its mark in the OLTP world,  PostgreSQL is moving towards catering the needs of OLAP environment. Hence, the recent advancem...