Friday, 1 March 2019

Using parallelism for queries from PL functions in PostgreSQL 10

Intra-query parallelism was introduced in PostgreSQL in version 9.6. The benefit from the parallel  scans and joins were talked about and significant improvement in the benchmark queries on higher  scale factors were highlighted. However, one area remained devoid of the benefits - queries from
 procedural language functions. Precisely, if you fire a query from a PL/pgSQL function then it can  not use parallel scans or joins for that query, even though the query is capable of using them  otherwise. Have a look at an example yourself,

-- creating and populating the table
create table foo (i int, j int)
insert into foo values (generate_series(1,500), generate_series(1,500));
-- for experimental purposes we are forcing parallelism by setting relevant parameters
set parallel_tuple_cost = 0;
set parallel_setup_cost = 0;
alter table foo set (parallel_workers = 4);
set max_parallel_workers_per_gather = 4;
-- executing the query as an SQL statement
 explain analyse select * from foo where i <= 150; 
 Gather  (cost=0.00..4.56 ...) (actual time=0.217..5.614 ...)
   Workers Planned: 4
   Workers Launched: 4
   ->  Parallel Seq Scan on foo  (cost=0.00..4.56 ...) (actual time=0.004..0.018 ...)
         Filter: (i <= 150)
         Rows Removed by Filter: 70
-- create a PLpgSQL function to fire the same query from
create or replace function total ()
returns integer as $total$
declare total integer;
     select count(*) from foo into total from foo where i <= 150;
     return total;
language plpgsql;

-- executing the query from a PLpgSQL function in v 9.6
explain analyse select total(); 
Query Text: SELECT count(*) FROM foo where i <=150
Aggregate  (cost=9.25..9.26 ...)
  ->  Seq Scan on foo  (cost=0.00..8.00 ...)
Query Text: explain analyse select total();
Result  (cost=0.00..0.26 ...)
To your relief the feature was then added in  version 10. Have a look,

-- executing the query from a PLpgSQL function in v10
explain analyse select total(); 
Query Text: SELECT count(*) FROM foo where i <=150
Finalize Aggregate  (cost=4.68..4.69 ...)
  ->  Gather  (cost=4.66..4.67 ...)
        Workers Planned: 4
        ->  Partial Aggregate  (cost=4.66..4.67 ...)
              ->  Parallel Seq Scan on foo  (cost=0.00..4.56 ...)
                    Filter: (i <= 150)
This extends the utility of parallel query to more realistic environments wherein queries are fired
through functions and are not simple SQL statements.

Learnings from benchmarking PostgreSQL on TPC-H

After making its mark in the OLTP world,  PostgreSQL is moving towards catering the needs of OLAP environment. Hence, the recent advancem...