Saturday, 7 December 2019

Interesting aspects of logical replication at a glance

In PostgreSQL, there is this amazing feature called logical replication, which enables you to replicate a set of table(s) from one server to other. Being a new feature it is less mature and needs many improvements. Nevertheless, it should not limit us enjoying its advantage and so here is a post to highlight some interesting aspects of the feature.

No matter how similar it might sound, it is quite a different thing. It solves a different problem altogether and in some sense it even overcomes the issues of streaming replication. Some of the important aspects of logical replication are as follows,

  • Writes at the secondary
Unlike in streaming replication, here the secondary can serve as a normal server. To be particular, you can perform  inserts and updates at the secondary also. Also, unlike streaming replication we can set up logical replication for one or more tables only. This comes in handy when we want to setup additional servers for load sharing.
  • Schema
The schema is not automatically copied at the secondary once you start the replication. You have to create the tables at the secondary before you start the subscription.
Now in case if there are schema changes in the primary then they will not be replicated via logical replication. To go about such changes, one could pause the replication, make the necessary changes at the secondary and resume the replication then.
  • Attribute comparison
Attributes of a table are matched by name. A different order of columns in the target table is allowed, also the data type can also be different as long as the text representation of the type is same as that the secondary. The target table can have additional columns not provided by the published table. Those will be filled with their default values. This makes it easy to also optimize the schema at the time of migration of your database from one to another environment.
  • Sequences
As of the PostgreSQL version 12, the sequences used in the tables are not replicated to the secondary server. What it means is that the values of the column containing sequence will be copied to the secondary but the next value, max value, etc. of the sequence will remain unaffected at the secondary.

To get around this situation one can just manually increment the next value of the sequence at the secondary to some high value. An interesting thing could be to decide what high value to choose for the purpose. A good guess would be based upon the normal rate of insertions on the required table and the max value at the primary. Important thing is to choose the value such that sequenced column does not conflict with the ones coming from the primary.
  •  Privileges
Currently, the permissions for the tables are checked only at the time of creation of the subscription and never afterwards. So, be careful if some permission changes and there is subscription on that table.
  • Partition tables
As you might be aware that current version of logical replication allows only normal tables to be replicated, so if you have a partition table then the root table can not be replicated. However, you may create a publication for the child tables individually and it will be replicated accordingly.

Saturday, 23 November 2019

Which size fits me?

If you have ever used postgreSQL and wondered about the size of the tables, this post might help. A postgreSQL table is no simple table rather it has many things associated with it like index tables, toast tables, then there are other important stuff like free space map and visibility map. Now postgreSQL provides multiple options for measuring the size of table and there are interesting distinctions among them.

\dt+ table-name

This is one of the most common way to find the size of a table. On your psql client you may simple type it and it will return the information about the mentioned table including its size in kB. Internally, this size is same as
select * from pg_size_pretty(pg_relation_size('table-name')) -- for versions older than 9
select * from pg_size_pretty(pg_table_size('table-name')) -- for later versions
This brings us to our next function.



This is the size of the table which includes the size of its heap, any associated toast table, free space map, and visibility map.  So, this is the actual size of the relation on the disk. Note that this does not include the size of any indexes associated with this table.


This is the measure to provide the disk space used by one particular fork -- main, init, fsm, or vm. Now, when you write pg_relation_size('table-name') it will be same as pg_relation_size('table-name', 'main'). So, if one wants to measure the space by one fork, this is the function to call.

Now, what about if we want to know the table size with all of its indexes, for that we move to this next function.


This is the function which gives you the total size of the relation (as in pg_table_size) plus the size of all the indexes associated with it.

Here is the time to introduce an interesting extension in this regards. 


If you are little more into the table size and how postgreSQL stores it, you might have come across this extension. Basically, after installing this extension when you run
select * from pgstattuple('table-name')
it gives you several metrics relating to the size-metrics, and first one among them is table_len. Now, this table_len is same as the output of \dt+. 

Which one is for you?

Now when you know about different size functions, let's find out which suits your purpose best.

Are you trying to find the bloat size in your table?

pgstattuple would be a good place to get that. It gives you the number of dead tuples, and the size of total dead tuples, free space, etc. so it gives a clearer picture of your table actual and bloat size. There is a lighter version of it also called pgstattuple_approx, with lesser locks and accuracy, it might serve good when you aren't bothered about the last MB.

Are you interested in only finding the size of an index?

An interesting thing to remember here is that postgreSQL indexes are also saved as relations, so you may very well use the pg_relation/table_size with the index name as its argument to get the size of required index only.

Additionally, pgstattuple has pgstatindex function for this. So, you may also use select * from pgstatindex('index-name'). The difference between them would be same as explained above for the corresponding relation functions.

Are you trying to find out the size of only the toast table(s) associated with your table?

If you know the name of the toast table you are interested in, then you may use  any of the above mentioned functions as per the requirement. In case you are struggling about finding the associated toast table, try this,
select relname from pg_class where oid = (select  reltoastrelid from pg_class where relname='table-name');
As a shortcut, the name of your toast table is in the format pg_toast.pg_toast_oid_of_parent_table

Have fun with all your size endeavors!

Friday, 1 March 2019

Using parallelism for queries from PL functions in PostgreSQL 10

Intra-query parallelism was introduced in PostgreSQL in version 9.6. The benefit from the parallel  scans and joins were talked about and significant improvement in the benchmark queries on higher  scale factors were highlighted. However, one area remained devoid of the benefits - queries from
 procedural language functions. Precisely, if you fire a query from a PL/pgSQL function then it can  not use parallel scans or joins for that query, even though the query is capable of using them  otherwise. Have a look at an example yourself,

-- creating and populating the table
create table foo (i int, j int)
insert into foo values (generate_series(1,500), generate_series(1,500));
-- for experimental purposes we are forcing parallelism by setting relevant parameters
set parallel_tuple_cost = 0;
set parallel_setup_cost = 0;
alter table foo set (parallel_workers = 4);
set max_parallel_workers_per_gather = 4;
-- executing the query as an SQL statement
 explain analyse select * from foo where i <= 150; 
 Gather  (cost=0.00..4.56 ...) (actual time=0.217..5.614 ...)
   Workers Planned: 4
   Workers Launched: 4
   ->  Parallel Seq Scan on foo  (cost=0.00..4.56 ...) (actual time=0.004..0.018 ...)
         Filter: (i <= 150)
         Rows Removed by Filter: 70
-- create a PLpgSQL function to fire the same query from
create or replace function total ()
returns integer as $total$
declare total integer;
     select count(*) from foo into total from foo where i <= 150;
     return total;
language plpgsql;

-- executing the query from a PLpgSQL function in v 9.6
explain analyse select total(); 
Query Text: SELECT count(*) FROM foo where i <=150
Aggregate  (cost=9.25..9.26 ...)
  ->  Seq Scan on foo  (cost=0.00..8.00 ...)
Query Text: explain analyse select total();
Result  (cost=0.00..0.26 ...)
To your relief the feature was then added in  version 10. Have a look,

-- executing the query from a PLpgSQL function in v10
explain analyse select total(); 
Query Text: SELECT count(*) FROM foo where i <=150
Finalize Aggregate  (cost=4.68..4.69 ...)
  ->  Gather  (cost=4.66..4.67 ...)
        Workers Planned: 4
        ->  Partial Aggregate  (cost=4.66..4.67 ...)
              ->  Parallel Seq Scan on foo  (cost=0.00..4.56 ...)
                    Filter: (i <= 150)
This extends the utility of parallel query to more realistic environments wherein queries are fired
through functions and are not simple SQL statements.

Thursday, 1 November 2018

My experience at PGConf Europe 2018

It was my first time at PGConf Europe this year, like many other firsts it was special, hence the blog.

Let's start with some of the basics, PostgreSQL conferences are held in a somewhat regional basis. There are many of them like,  PGConf India, PGConf USA, PGConf Europe, PGConf Asia, and then there are other one day events called PgDays. Coming back to PGConf Europe 2018,  it was organised from 23-26 October in Lisbon Marriott, Lisbon.

My talk 'Parallel Query in PG: how not to (mis)use it?' was scheduled on the first slot of last day. So, I had enough time to analyse and study the audience and prepare accordingly. But, first things first...

The conference started with a one day training session on 22 Oct, one has to buy different tickets for training and conference. You get a free registration for the conference only if you're the speaker. I wasn't part of the training session, hence will not be discussing anything about it. This was my day to rest and try the Portugal cuisine.

The next day was the start of the conference. It was opened by Magnus Hagander covering the logistics and introducing us to the conference halls, etc., must say it was one entertaining start. The next was the keynote by Paul Ramsey. The keynote was my first comprehensive introduction to PostGIS. Further, there was a nice snack buffet arranged in the lobby, and this was my time to know more people, the most exciting part of any conference. I happened to catch Tom Lane!

Henceforth, I was forced to take some difficult decisions like which talk to attend, since there were three parallel sessions going on. There was such a variety of areas covered in the conference and most of them have amazing presentations, that it made me greedy and hate the idea of parallel sessions.

To keep the discussion short, I enjoyed being exposed to some of the new areas and uses of postgres like, challenges of using postgres on cloud, multi-column indexes, pluggable storage, benchmarking,  efficient query planning in latest PG, new and old features of postgres. Cannot go ahead without mentioning a couple of sentences about zheap -- the current project I am part of it. People were excited about it and there were all sorts of curiosities around it.

The conference ended with a set of sponsor talks, I liked the one by Marc Linster. No, not because it was the sponsored talk of EDB but it was truly an awesome talk covering the journey of Postgres in the last two decades. The final closing talk was yet another entertainment by Magnus and Dave Page, and Tomas Vondra in special appearance as Slony, but I could not help seeing him just as one GIANT BLUE elephant. It was really scary! Finally speakers were given a gift and there was one big group selfie.

To end the blog, I must say it was a great experience to find the faces behind those names you see every now and then on hackers-list. The amazing food and nice weather of Lisbon added to the overall fun.

P.S. Here are the slides of my talk.

Wednesday, 31 October 2018

Using parallel sequential scan in PostgreSQL

Parallel sequential scan is the first parallel access method in PostgreSQL and is introduced in version 9.6.  The committer of this feature and my colleague at EnterpriseDB Robert Haas wrote an awesome blog on it, there is another great blog by another PostgreSQL committer and my colleague Amit Kapila. Both of these blogs explain this access method, its design, usage, and related parameters. 

Still, I could not help but notice that there are curiosities around the usage of this access method. Every now and then I could see a complaint saying parallel sequential scan is not getting selected or it is degrading the performance of a query.  So, I decided to write this blog to cater more practical scenarios and specifically focus on its less talked about aspect  -- where parallel sequential scan would (should) not improve the performance.

Before diving into the details of parallel SeqScan, let's first understand the basic infrastructure and terminology related to it in PostgreSQL. The processes that run in parallel and scan the tuples of a relation are called parallel workers or workers in short. There is one special worker namely leader which co-ordinates and collects the output of the scan from each  of the worker. This worker may or may not participate in scanning the relation depending on it's load in dividing and combining processes. End users can also control the involvement of leader in relation scan by GUC parameter parallel_leader_participation, it is a boolean parameter. 

Now, let's understand the concept of parallel scan in PostgreSQL by a simple example.
  • Let there be a table T (a int, b int) containing 100 tuples
  • Let's say we have two workers and one leader,
  • Cost of scanning one tuple is 10
  • Cost of communicating a tuple from worker to leader is 20
  • Cost of dividing the tuples among workers is 30
  • For simplicity, let's assume that leader gives 50 tuples to each of the worker
Now, let's analyse if parallel scan will be faster than non parallel scan,

Cost of SeqScan = 10*100 = 1000
Cost of Parallel SeqScan = 30 + (50 * 10)  + (50 * 20) * 2 = 2530

Here, we can see that though the cost of scanning the tuples is halved yet the cost of combining the total result is enough to make the overall cost of parallel SeqScan higher than non parallel SeqScan.

Now, let's say we want to list only the tuples which have a > 80, and there are only 20 (say) such tuples, then cost of SeqScan will remain same, but cost of parallel SeqScan can be given as,

Cost of Parallel SeqScan = 30 + (50 * 10) + (10 * 20) * 2 =  730

Hence, parallel SeqScan is likely to improve the performance of queries that require scanning a large amount of data but only few of them satisfy the selection criteria. To generalise this,

Cost of SeqScan = Cost of scanning one tuple * number of tuples
Cost of parallel SeqScan = Cost of dividing the work among workers + cost of combining the work
                                            from workers + cost of work done by a worker * number of workers

Let's dive into it a bit more,

Cost of dividing the work among workers is fairly constant depending on the relation size
Cost of combining the the work from workers = cost of communicating the selected tuples from each
                                                                              worker to the leader
Cost of work done by a worker = cost of scanning a tuple * number of tuples the respective worker
                                                      has received

Now, we can see that the cost of combining the work is dependent on the number of tuples received by each worker. Now, for the queries where all or almost all of the tuples are in the final result, we will paying more cost than its non parallel flavour, first in scanning the tuple and second in combining it to the final result.

In PostgreSQL, the cost determining the cost of dividing the work among workers is given as parallel_setup_cost, the cost of communicating the tuple from worker to leader is given by parallel_tuple_cost, and the number of workers is upper bounded by the GUC max_parallel_workers_per_gather.

So, if you are using a system a high frequency multiple processor then lowering the parallel_setup_cost and parallel_tuple_cost will help in selection of parallel scans. If there are not many processes running in parallel, then increasing max_parallel_workers_per_gather can leverage more parallel processes to improve the query performance. Another point to note is that the number of workers is further capped by max_worker_processes.

Tuesday, 31 July 2018

Parallel index-only scan in PostgreSQL

In the previous blog, we saw that parallel index scans leads to significantly improves the performance of quite a few TPC-H queries. It is customary  to analyse if its sister operator, namely index-only scan will benefit similarly when parallelised.

Before getting into that, we will briefly discuss the utility of an index-only scan. In PostgreSQL, indexes are stored at a different location than the tables. For a regular index scan on a table, first the index created on that table is scanned to find the relevant leaf nodes and then the table is scanned for those locations only. Now, if there is some query in which we need only the values of columns which have an index, then we can scan the index tree only and return the required data, since there is nothing extra that we need to retrieve from that table, that type of scan is called index-only scans. To be precise, index-only scans are a special type of access method which uses index alone and does not require to fetch data from the heap.

For example, an index-only scan is likely to show a performance improvement over a regular index scan for the query such as, SELECT count(*) FROM countries WHERE country_area <= <some value>. Assuming we have an index on the column country_area. Here, we can get the tuple information lesser than the required country area by index alone, hence, saving the I/O time to read the tables.

The design and implementation of parallel index-only scan is heavily dependent on the machinery developed for scanning B-tree in parallel. There are no new GUC parameters or configuration settings required for this scan.

Performance of parallel index-only scan

For the industry strength benchmark TPC-H on 300 scale factor, the performance of Q13 is improved by almost 5x with the usage of parallel index-only scan. For this experiment we used TPC-H inspired benchmark for PostgreSQL.  We used 300 scale factor, which gives 300+ GB of database, depending on the available indexes, etc. Additional indexes we created were on columns (l_shipmode, l_shipdate, o_orderdate, o_comment). We tuned following parameters,

  • random_page_cost = 0.1
  • seq_page_cost = 0.1
  • effective_cache_size = 10GB
  • shared_buffers = 10GB
  • work_mem = 1GB

Q13 Query plan on v9.6
 -> Sort  
      Sort Key: (count(*)) DESC, (count(orders.o_orderkey)) DESC
               ->  HashAggregate  
                    Group Key: count(orders.o_orderkey)
                 ->  GroupAggregate  
                      Group Key: customer.c_custkey
                           ->  Merge Left Join  
                                Merge Cond: (customer.c_custkey = orders.o_custkey)
                                 ->  Index Only Scan using customer_pkey on customer
                                ->  Index Scan using idx_orders_custkey on orders
Execution time: 4146177.735 ms

Q13 Query plan on v10
 -> Sort  
      -> HashAggregate  
         Group Key: count(orders.o_orderkey)
          -> Finalize GroupAggregate
              Group Key: customer.c_custkey
                  -> Gather Merge  
                       -> Partial GroupAggregate  
                             Group Key: customer.c_custkey
                               -> Sort  
                                       ->  Parallel Hash Left Join  
                                            Hash Cond: (customer.c_custkey = orders.o_custkey)
                                     -> Parallel Index Only Scan using customer_pkey on customer
                                     ->  Parallel Hash    
                                        -> Parallel Seq Scan on orders
Execution Time: 739088.785 ms

We can see that the query involved aggregations and join over the primary key of customer table. When parallel index-only scan is used, both the aggregations as well as the join could be performed in parallel, hence improving the query performance.

I would like to close this discussion with a thank you note to my colleague Amit Kapila who helped and supported me in the due course of this project. Additionally, I would to thank my employer EnterpriseDB for bestowing me with such an opportunity.

Friday, 13 July 2018

Performance of parallel index scans in PostgreSQL

In this blog will continue the discussion of parallel query in PostgreSQL. In the previous blog of this series by my colleague we learned about parallel index scans, its design in PostgreSQL and the performance improvement achieved for a few queries on the industrial benchmark of TPC-H. Therein we analysed the performance improvement only for a small factor of 20 (database size was approximately 20GB). But the performance benefits realised by parallel operators aren’t that significant until we leverage them for higher scale factors. Hence, in this blog we will analyse the performance of parallel index scans on 300 scale factor.

For this experiment we used TPC-H inspired benchmark for PostgreSQL.  We used 300 scale factor, which gives 300+ GB of database, depending on the available indexes, etc. Additional indexes we created were on columns (l_shipmode, l_shipdate, o_orderdate, o_comment). We tuned following parameters,
  • random_page_cost = 0.1
  • seq_page_cost = 0.1
  • effective_cache_size = 10GB
  • shared_buffers = 10GB
  • work_mem = 1GB
In the table below, we compared the performance of the queries using parallel-index scans on v10 to their performance in v9.6. Note that in v9.6 parallel seq scans and parallel nested loop joins were available. Additionally, we tabulated the total contribution of parallel index-scan for each of the queries. All the values of timing are in seconds.

TPC-H query
On v9.6
On v10
Contribution of PIS

From the above table, it is clear that the benefits of parallel index scans are significant on large amounts of data as well. To get a better understanding of how this scan method benefits the queries in question, we studied their query plans on both versions. On analysing those query plans we found two primary explanations for the benefits attained:

  • Contribution of parallel index scan in total execution time of the query

If the most time-consuming operation in a query is index scan then parallelising it is likely to improve query performance significantly, e.g. Q6 and Q12. To be more precise have a look at the query plan of Q6,

Q6 Query plan on v9.6
 ->  Aggregate  
       ->  Bitmap Heap Scan on lineitem
            ->  Bitmap Index Scan on idx_lineitem_shipdate
Execution time: 882812.376 ms

Q6 Query plan on v10
->  Finalize Aggregate
            ->  Gather  
                 ->  Partial Aggregate
                    -> Parallel Index Scan using idx_lineitem_shipdate on lineitem                          
      Execution Time: 155619.579 ms

Here, scan is the only time consuming operator, hence, parallelising it improves the performance of the query. Similarly for Q12, the index scan is on lineitem which is the largest table of database. Once it is scanned in parallel, the join above it could also be done in parallel and hence the improvement in performance.

On the other hand, the benefits are not so pronounced when the plan is more complex and involves a number of operators with index scan as one of the less significant operator in terms of total execution time of the query.

  • Operators above it in plan tree could also leverage parallelism

In some of the queries, it has been observed that even when the contribution of parallel index scan is not much in total execution time of the query, the benefits are significant, e.g. Q15 and Q17. In such cases the benefits are not coming from parallel index scans alone but from the fact that now more operators could be pushed to workers and hence extend parallelism upto higher levels. To explain this observation, query plan for Q15 is used:

Q15 Query plan on v9.6
  InitPlan 1 (returns $0)
  -> Merge Join  
       Merge Cond: (lineitem.l_suppkey = supplier.s_suppkey)
      -> GroupAggregate  
        Group Key: lineitem.l_suppkey
        -> Sort
            -> Bitmap Heap Scan on lineitem
                  -> Bitmap Index Scan on idx_l_shipdate
            -> Index Scan using supplier_pkey on supplier  
Execution time: 2297222.717 ms

Q15 Query plan on v10
 InitPlan 1 (returns $0)
  -> Nested Loop  
       ->  Finalize GroupAggregate
             Group Key: lineitem.l_suppkey
              ->  Gather Merge  
                   ->  Sort  
                       ->  Partial HashAggregate
                         Group Key: lineitem.l_suppkey
                            ->  Parallel Index Scan using idx_lineitem_shipdate on Lineitem
->  Index Scan using supplier_pkey on supplier
Execution Time: 1133723.985 ms

In this case, with the introduction of parallel index scan the plan structure changed altogether. With parallel index scan, the aggregates are pushed to workers and the join method is also changed, eventually improving query performance. This shows that it may not be necessary for parallel index scan to be the most time consuming operator for it to improve query performance, rather having it may help other operators to use parallelism which could not be done otherwise.

Overall, we saw there are significant benefits of parallel index scans on large volume of data as well. Thus, aiming towards a more OLAP compatible and scalable version of PostgreSQL. Stay tuned to know about more groundbreaking features in PostgreSQL.

Interesting aspects of logical replication at a glance

In PostgreSQL, there is this amazing feature called logical replication , which enables you to replicate a set of table(s) from one server ...