Friday 13 July 2018

Performance of parallel index scans in PostgreSQL




In this blog will continue the discussion of parallel query in PostgreSQL. In the previous blog of this series by my colleague we learned about parallel index scans, its design in PostgreSQL and the performance improvement achieved for a few queries on the industrial benchmark of TPC-H. Therein we analysed the performance improvement only for a small factor of 20 (database size was approximately 20GB). But the performance benefits realised by parallel operators aren’t that significant until we leverage them for higher scale factors. Hence, in this blog we will analyse the performance of parallel index scans on 300 scale factor.

For this experiment we used TPC-H inspired benchmark for PostgreSQL.  We used 300 scale factor, which gives 300+ GB of database, depending on the available indexes, etc. Additional indexes we created were on columns (l_shipmode, l_shipdate, o_orderdate, o_comment). We tuned following parameters,
  • random_page_cost = 0.1
  • seq_page_cost = 0.1
  • effective_cache_size = 10GB
  • shared_buffers = 10GB
  • work_mem = 1GB
In the table below, we compared the performance of the queries using parallel-index scans on v10 to their performance in v9.6. Note that in v9.6 parallel seq scans and parallel nested loop joins were available. Additionally, we tabulated the total contribution of parallel index-scan for each of the queries. All the values of timing are in seconds.

TPC-H query
On v9.6
On v10
Contribution of PIS
6
883
155
137
12
740
717
122
14
422
111
53
15
2295
1138
157
17
17724
3376
16

From the above table, it is clear that the benefits of parallel index scans are significant on large amounts of data as well. To get a better understanding of how this scan method benefits the queries in question, we studied their query plans on both versions. On analysing those query plans we found two primary explanations for the benefits attained:

  • Contribution of parallel index scan in total execution time of the query

If the most time-consuming operation in a query is index scan then parallelising it is likely to improve query performance significantly, e.g. Q6 and Q12. To be more precise have a look at the query plan of Q6,

Q6 Query plan on v9.6
———————————
Limit
 ->  Aggregate  
       ->  Bitmap Heap Scan on lineitem
            ->  Bitmap Index Scan on idx_lineitem_shipdate
Execution time: 882812.376 ms

Q6 Query plan on v10
———————————
Limit
->  Finalize Aggregate
            ->  Gather  
                 ->  Partial Aggregate
                    -> Parallel Index Scan using idx_lineitem_shipdate on lineitem                          
      Execution Time: 155619.579 ms

Here, scan is the only time consuming operator, hence, parallelising it improves the performance of the query. Similarly for Q12, the index scan is on lineitem which is the largest table of database. Once it is scanned in parallel, the join above it could also be done in parallel and hence the improvement in performance.

On the other hand, the benefits are not so pronounced when the plan is more complex and involves a number of operators with index scan as one of the less significant operator in terms of total execution time of the query.

  • Operators above it in plan tree could also leverage parallelism

In some of the queries, it has been observed that even when the contribution of parallel index scan is not much in total execution time of the query, the benefits are significant, e.g. Q15 and Q17. In such cases the benefits are not coming from parallel index scans alone but from the fact that now more operators could be pushed to workers and hence extend parallelism upto higher levels. To explain this observation, query plan for Q15 is used:

Q15 Query plan on v9.6
—————————————
Limit
  InitPlan 1 (returns $0)
   
  -> Merge Join  
       Merge Cond: (lineitem.l_suppkey = supplier.s_suppkey)
      -> GroupAggregate  
        Group Key: lineitem.l_suppkey
        -> Sort
            -> Bitmap Heap Scan on lineitem
                  -> Bitmap Index Scan on idx_l_shipdate
            -> Index Scan using supplier_pkey on supplier  
Execution time: 2297222.717 ms

Q15 Query plan on v10
—————————————
Limit
 InitPlan 1 (returns $0)
   
  -> Nested Loop  
       ->  Finalize GroupAggregate
             Group Key: lineitem.l_suppkey
              ->  Gather Merge  
                   ->  Sort  
                       ->  Partial HashAggregate
                         Group Key: lineitem.l_suppkey
                            ->  Parallel Index Scan using idx_lineitem_shipdate on Lineitem
->  Index Scan using supplier_pkey on supplier
Execution Time: 1133723.985 ms

In this case, with the introduction of parallel index scan the plan structure changed altogether. With parallel index scan, the aggregates are pushed to workers and the join method is also changed, eventually improving query performance. This shows that it may not be necessary for parallel index scan to be the most time consuming operator for it to improve query performance, rather having it may help other operators to use parallelism which could not be done otherwise.

Overall, we saw there are significant benefits of parallel index scans on large volume of data as well. Thus, aiming towards a more OLAP compatible and scalable version of PostgreSQL. Stay tuned to know about more groundbreaking features in PostgreSQL.

No comments:

Post a Comment

Learnings from benchmarking PostgreSQL on TPC-H

After making its mark in the OLTP world,  PostgreSQL is moving towards catering the needs of OLAP environment. Hence, the recent advancem...