It's all in database: March 2019

Intra-query parallelism was introduced in PostgreSQL in version 9.6. The benefit from the parallel scans and joins were talked about and significant improvement in the benchmark queries on higher scale factors were highlighted. However, one area remained devoid of the benefits - queries from
procedural language functions. Precisely, if you fire a query from a PL/pgSQL function then it can not use parallel scans or joins for that query, even though the query is capable of using them otherwise. Have a look at an example yourself,

-- creating and populating the table

create table foo (i int, j int)
insert into foo values (generate_series(1,500), generate_series(1,500));

-- for experimental purposes we are forcing parallelism by setting relevant parameters

set parallel_tuple_cost = 0;
set parallel_setup_cost = 0;
alter table foo set (parallel_workers = 4);
set max_parallel_workers_per_gather = 4;

-- executing the query as an SQL statement

explain analyse select * from foo where i <= 150;

Gather (cost=0.00..4.56 ...) (actual time=0.217..5.614 ...)
Workers Planned: 4
Workers Launched: 4
-> Parallel Seq Scan on foo (cost=0.00..4.56 ...) (actual time=0.004..0.018 ...)
Filter: (i <= 150)
Rows Removed by Filter: 70

-- create a PLpgSQL function to fire the same query from

create or replace function total ()
returns integer as $total$
declare total integer;
begin
select count(*) from foo into total from foo where i <= 150;
return total;
end;
$total$
language plpgsql;

-- executing the query from a PLpgSQL function in v 9.6

explain analyse select total();

Query Text: SELECT count(*) FROM foo where i <=150
Aggregate (cost=9.25..9.26 ...)
-> Seq Scan on foo (cost=0.00..8.00 ...)
Query Text: explain analyse select total();
Result (cost=0.00..0.26 ...)

To your relief the feature was then added in version 10. Have a look,

-- executing the query from a PLpgSQL function in v10

explain analyse select total();

Query Text: SELECT count(*) FROM foo where i <=150
Finalize Aggregate (cost=4.68..4.69 ...)
-> Gather (cost=4.66..4.67 ...)
Workers Planned: 4
-> Partial Aggregate (cost=4.66..4.67 ...)
-> Parallel Seq Scan on foo (cost=0.00..4.56 ...)
Filter: (i <= 150)

This extends the utility of parallel query to more realistic environments wherein queries are fired
through functions and are not simple SQL statements.

It's all in database

Friday 1 March 2019

Using parallelism for queries from PL functions in PostgreSQL 10

Learnings from benchmarking PostgreSQL on TPC-H