Thursday, 1 November 2018

My experience at PGConf Europe 2018

It was my first time at PGConf Europe this year, like many other firsts it was special, hence the blog.

Let's start with some of the basics, PostgreSQL conferences are held in a somewhat regional basis. There are many of them like,  PGConf India, PGConf USA, PGConf Europe, PGConf Asia, and then there are other one day events called PgDays. Coming back to PGConf Europe 2018,  it was organised from 23-26 October in Lisbon Marriott, Lisbon.

My talk 'Parallel Query in PG: how not to (mis)use it?' was scheduled on the first slot of last day. So, I had enough time to analyse and study the audience and prepare accordingly. But, first things first...

The conference started with a one day training session on 22 Oct, one has to buy different tickets for training and conference. You get a free registration for the conference only if you're the speaker. I wasn't part of the training session, hence will not be discussing anything about it. This was my day to rest and try the Portugal cuisine.

The next day was the start of the conference. It was opened by Magnus Hagander covering the logistics and introducing us to the conference halls, etc., must say it was one entertaining start. The next was the keynote by Paul Ramsey. The keynote was my first comprehensive introduction to PostGIS. Further, there was a nice snack buffet arranged in the lobby, and this was my time to know more people, the most exciting part of any conference. I happened to catch Tom Lane!

Henceforth, I was forced to take some difficult decisions like which talk to attend, since there were three parallel sessions going on. There was such a variety of areas covered in the conference and most of them have amazing presentations, that it made me greedy and hate the idea of parallel sessions.

To keep the discussion short, I enjoyed being exposed to some of the new areas and uses of postgres like, challenges of using postgres on cloud, multi-column indexes, pluggable storage, benchmarking,  efficient query planning in latest PG, new and old features of postgres. Cannot go ahead without mentioning a couple of sentences about zheap -- the current project I am part of it. People were excited about it and there were all sorts of curiosities around it.

The conference ended with a set of sponsor talks, I liked the one by Marc Linster. No, not because it was the sponsored talk of EDB but it was truly an awesome talk covering the journey of Postgres in the last two decades. The final closing talk was yet another entertainment by Magnus and Dave Page, and Tomas Vondra in special appearance as Slony, but I could not help seeing him just as one GIANT BLUE elephant. It was really scary! Finally speakers were given a gift and there was one big group selfie.

To end the blog, I must say it was a great experience to find the faces behind those names you see every now and then on hackers-list. The amazing food and nice weather of Lisbon added to the overall fun.

P.S. Here are the slides of my talk.

Wednesday, 31 October 2018

Using parallel sequential scan in PostgreSQL

Parallel sequential scan is the first parallel access method in PostgreSQL and is introduced in version 9.6.  The committer of this feature and my colleague at EnterpriseDB Robert Haas wrote an awesome blog on it, there is another great blog by another PostgreSQL committer and my colleague Amit Kapila. Both of these blogs explain this access method, its design, usage, and related parameters. 

Still, I could not help but notice that there are curiosities around the usage of this access method. Every now and then I could see a complaint saying parallel sequential scan is not getting selected or it is degrading the performance of a query.  So, I decided to write this blog to cater more practical scenarios and specifically focus on its less talked about aspect  -- where parallel sequential scan would (should) not improve the performance.

Before diving into the details of parallel SeqScan, let's first understand the basic infrastructure and terminology related to it in PostgreSQL. The processes that run in parallel and scan the tuples of a relation are called parallel workers or workers in short. There is one special worker namely leader which co-ordinates and collects the output of the scan from each  of the worker. This worker may or may not participate in scanning the relation depending on it's load in dividing and combining processes. End users can also control the involvement of leader in relation scan by GUC parameter parallel_leader_participation, it is a boolean parameter. 

Now, let's understand the concept of parallel scan in PostgreSQL by a simple example.
  • Let there be a table T (a int, b int) containing 100 tuples
  • Let's say we have two workers and one leader,
  • Cost of scanning one tuple is 10
  • Cost of communicating a tuple from worker to leader is 20
  • Cost of dividing the tuples among workers is 30
  • For simplicity, let's assume that leader gives 50 tuples to each of the worker
Now, let's analyse if parallel scan will be faster than non parallel scan,

Cost of SeqScan = 10*100 = 1000
Cost of Parallel SeqScan = 30 + (50 * 10)  + (50 * 20) * 2 = 2530

Here, we can see that though the cost of scanning the tuples is halved yet the cost of combining the total result is enough to make the overall cost of parallel SeqScan higher than non parallel SeqScan.

Now, let's say we want to list only the tuples which have a > 80, and there are only 20 (say) such tuples, then cost of SeqScan will remain same, but cost of parallel SeqScan can be given as,

Cost of Parallel SeqScan = 30 + (50 * 10) + (10 * 20) * 2 =  730

Hence, parallel SeqScan is likely to improve the performance of queries that require scanning a large amount of data but only few of them satisfy the selection criteria. To generalise this,

Cost of SeqScan = Cost of scanning one tuple * number of tuples
Cost of parallel SeqScan = Cost of dividing the work among workers + cost of combining the work
                                            from workers + cost of work done by a worker * number of workers

Let's dive into it a bit more,

Cost of dividing the work among workers is fairly constant depending on the relation size
Cost of combining the the work from workers = cost of communicating the selected tuples from each
                                                                              worker to the leader
Cost of work done by a worker = cost of scanning a tuple * number of tuples the respective worker
                                                      has received

Now, we can see that the cost of combining the work is dependent on the number of tuples received by each worker. Now, for the queries where all or almost all of the tuples are in the final result, we will paying more cost than its non parallel flavour, first in scanning the tuple and second in combining it to the final result.

In PostgreSQL, the cost determining the cost of dividing the work among workers is given as parallel_setup_cost, the cost of communicating the tuple from worker to leader is given by parallel_tuple_cost, and the number of workers is upper bounded by the GUC max_parallel_workers_per_gather.

So, if you are using a system a high frequency multiple processor then lowering the parallel_setup_cost and parallel_tuple_cost will help in selection of parallel scans. If there are not many processes running in parallel, then increasing max_parallel_workers_per_gather can leverage more parallel processes to improve the query performance. Another point to note is that the number of workers is further capped by max_worker_processes.

Tuesday, 31 July 2018

Parallel index-only scan in PostgreSQL

In the previous blog, we saw that parallel index scans leads to significantly improves the performance of quite a few TPC-H queries. It is customary  to analyse if its sister operator, namely index-only scan will benefit similarly when parallelised.

Before getting into that, we will briefly discuss the utility of an index-only scan. In PostgreSQL, indexes are stored at a different location than the tables. For a regular index scan on a table, first the index created on that table is scanned to find the relevant leaf nodes and then the table is scanned for those locations only. Now, if there is some query in which we need only the values of columns which have an index, then we can scan the index tree only and return the required data, since there is nothing extra that we need to retrieve from that table, that type of scan is called index-only scans. To be precise, index-only scans are a special type of access method which uses index alone and does not require to fetch data from the heap.

For example, an index-only scan is likely to show a performance improvement over a regular index scan for the query such as, SELECT count(*) FROM countries WHERE country_area <= <some value>. Assuming we have an index on the column country_area. Here, we can get the tuple information lesser than the required country area by index alone, hence, saving the I/O time to read the tables.

The design and implementation of parallel index-only scan is heavily dependent on the machinery developed for scanning B-tree in parallel. There are no new GUC parameters or configuration settings required for this scan.

Performance of parallel index-only scan

For the industry strength benchmark TPC-H on 300 scale factor, the performance of Q13 is improved by almost 5x with the usage of parallel index-only scan. For this experiment we used TPC-H inspired benchmark for PostgreSQL.  We used 300 scale factor, which gives 300+ GB of database, depending on the available indexes, etc. Additional indexes we created were on columns (l_shipmode, l_shipdate, o_orderdate, o_comment). We tuned following parameters,

  • random_page_cost = 0.1
  • seq_page_cost = 0.1
  • effective_cache_size = 10GB
  • shared_buffers = 10GB
  • work_mem = 1GB

Q13 Query plan on v9.6
 -> Sort  
      Sort Key: (count(*)) DESC, (count(orders.o_orderkey)) DESC
               ->  HashAggregate  
                    Group Key: count(orders.o_orderkey)
                 ->  GroupAggregate  
                      Group Key: customer.c_custkey
                           ->  Merge Left Join  
                                Merge Cond: (customer.c_custkey = orders.o_custkey)
                                 ->  Index Only Scan using customer_pkey on customer
                                ->  Index Scan using idx_orders_custkey on orders
Execution time: 4146177.735 ms

Q13 Query plan on v10
 -> Sort  
      -> HashAggregate  
         Group Key: count(orders.o_orderkey)
          -> Finalize GroupAggregate
              Group Key: customer.c_custkey
                  -> Gather Merge  
                       -> Partial GroupAggregate  
                             Group Key: customer.c_custkey
                               -> Sort  
                                       ->  Parallel Hash Left Join  
                                            Hash Cond: (customer.c_custkey = orders.o_custkey)
                                     -> Parallel Index Only Scan using customer_pkey on customer
                                     ->  Parallel Hash    
                                        -> Parallel Seq Scan on orders
Execution Time: 739088.785 ms

We can see that the query involved aggregations and join over the primary key of customer table. When parallel index-only scan is used, both the aggregations as well as the join could be performed in parallel, hence improving the query performance.

I would like to close this discussion with a thank you note to my colleague Amit Kapila who helped and supported me in the due course of this project. Additionally, I would to thank my employer EnterpriseDB for bestowing me with such an opportunity.

Friday, 13 July 2018

Performance of parallel index scans in PostgreSQL

In this blog will continue the discussion of parallel query in PostgreSQL. In the previous blog of this series by my colleague we learned about parallel index scans, its design in PostgreSQL and the performance improvement achieved for a few queries on the industrial benchmark of TPC-H. Therein we analysed the performance improvement only for a small factor of 20 (database size was approximately 20GB). But the performance benefits realised by parallel operators aren’t that significant until we leverage them for higher scale factors. Hence, in this blog we will analyse the performance of parallel index scans on 300 scale factor.

For this experiment we used TPC-H inspired benchmark for PostgreSQL.  We used 300 scale factor, which gives 300+ GB of database, depending on the available indexes, etc. Additional indexes we created were on columns (l_shipmode, l_shipdate, o_orderdate, o_comment). We tuned following parameters,
  • random_page_cost = 0.1
  • seq_page_cost = 0.1
  • effective_cache_size = 10GB
  • shared_buffers = 10GB
  • work_mem = 1GB
In the table below, we compared the performance of the queries using parallel-index scans on v10 to their performance in v9.6. Note that in v9.6 parallel seq scans and parallel nested loop joins were available. Additionally, we tabulated the total contribution of parallel index-scan for each of the queries. All the values of timing are in seconds.

TPC-H query
On v9.6
On v10
Contribution of PIS

From the above table, it is clear that the benefits of parallel index scans are significant on large amounts of data as well. To get a better understanding of how this scan method benefits the queries in question, we studied their query plans on both versions. On analysing those query plans we found two primary explanations for the benefits attained:

  • Contribution of parallel index scan in total execution time of the query

If the most time-consuming operation in a query is index scan then parallelising it is likely to improve query performance significantly, e.g. Q6 and Q12. To be more precise have a look at the query plan of Q6,

Q6 Query plan on v9.6
 ->  Aggregate  
       ->  Bitmap Heap Scan on lineitem
            ->  Bitmap Index Scan on idx_lineitem_shipdate
Execution time: 882812.376 ms

Q6 Query plan on v10
->  Finalize Aggregate
            ->  Gather  
                 ->  Partial Aggregate
                    -> Parallel Index Scan using idx_lineitem_shipdate on lineitem                          
      Execution Time: 155619.579 ms

Here, scan is the only time consuming operator, hence, parallelising it improves the performance of the query. Similarly for Q12, the index scan is on lineitem which is the largest table of database. Once it is scanned in parallel, the join above it could also be done in parallel and hence the improvement in performance.

On the other hand, the benefits are not so pronounced when the plan is more complex and involves a number of operators with index scan as one of the less significant operator in terms of total execution time of the query.

  • Operators above it in plan tree could also leverage parallelism

In some of the queries, it has been observed that even when the contribution of parallel index scan is not much in total execution time of the query, the benefits are significant, e.g. Q15 and Q17. In such cases the benefits are not coming from parallel index scans alone but from the fact that now more operators could be pushed to workers and hence extend parallelism upto higher levels. To explain this observation, query plan for Q15 is used:

Q15 Query plan on v9.6
  InitPlan 1 (returns $0)
  -> Merge Join  
       Merge Cond: (lineitem.l_suppkey = supplier.s_suppkey)
      -> GroupAggregate  
        Group Key: lineitem.l_suppkey
        -> Sort
            -> Bitmap Heap Scan on lineitem
                  -> Bitmap Index Scan on idx_l_shipdate
            -> Index Scan using supplier_pkey on supplier  
Execution time: 2297222.717 ms

Q15 Query plan on v10
 InitPlan 1 (returns $0)
  -> Nested Loop  
       ->  Finalize GroupAggregate
             Group Key: lineitem.l_suppkey
              ->  Gather Merge  
                   ->  Sort  
                       ->  Partial HashAggregate
                         Group Key: lineitem.l_suppkey
                            ->  Parallel Index Scan using idx_lineitem_shipdate on Lineitem
->  Index Scan using supplier_pkey on supplier
Execution Time: 1133723.985 ms

In this case, with the introduction of parallel index scan the plan structure changed altogether. With parallel index scan, the aggregates are pushed to workers and the join method is also changed, eventually improving query performance. This shows that it may not be necessary for parallel index scan to be the most time consuming operator for it to improve query performance, rather having it may help other operators to use parallelism which could not be done otherwise.

Overall, we saw there are significant benefits of parallel index scans on large volume of data as well. Thus, aiming towards a more OLAP compatible and scalable version of PostgreSQL. Stay tuned to know about more groundbreaking features in PostgreSQL.

My experience at PGConf Europe 2018

It was my first time at PGConf Europe this year, like many other firsts it was special, hence the blog. Let's start with some ...