Greenplum explain plans
View explain plans
An explain plan explains the method the optimizer has chosen to produce a result set. Depending on the query, there can be a variety of methods to produce a result set. The optimizer calculates the cost for each method and chooses the one with the lowest cost. In large queries, cost is generally measured by the amount of I/O to be performed.
An explain plan does not do any actual query processing work. Explain plans use statistics generated by the ANALYZE command, so plans generated before and after running ANALYZE can be quite different. This is especially true for queries with multiple joins, because the order of the joins can have a tremendous impact on performance.
In the following exercise, you will generate some small tables that you can query and view some explain plans.
- Enable timing so that you can see the effects of different performance tuning measures.
tutorial=# \timing on
- View the create_sample_table.sql script, and then run it.
tutorial=# \i create_sample_table.sql DROP TABLE Time: 15.901 ms SET Time: 3.174 ms psql:create_sample_table.sql:3: NOTICE: CREATE TABLE will create implicit sequence "sample_id_seq" for serial column "sample.id" CREATE TABLE Time: 24.421 ms INSERT 0 1000000 Time: 14624.516 ms UPDATE 1000000 Time: 1241.156 ms UPDATE 50000 Time: 190.210 ms UPDATE 1000000 Time: 1111.454 ms
- Request the explain plan for the COUNT() aggregate.
tutorial=# EXPLAIN SELECT COUNT(*) FROM sample WHERE id > 100; QUERY PLAN --------------------------------------------------------------------------- Aggregate (cost=0.00..462.77 rows=1 width=8) -> Gather Motion 2:1 (slice1; segments: 2) (cost=0.00..462.77 rows=1 width=8) -> Aggregate (cost=0.00..462.76 rows=1 width=8) -> Table Scan on sample (cost=0.00..462.76 rows=500687 width=1) Filter: id > 100 Settings: optimizer=on Optimizer status: PQO version 1.597 (7 rows)
Query plans are read from bottom to top. In this example, there are four steps. First there is a sequential scan on each segment server to access the rows. Then there is an aggregation on each segment server to produce a count of the number of rows from that segment. Then there is a gathering of the count value to a single location. Finally, the counts from each segment are aggregated to produce the final result.
The cost number on each step has a start and stop value. For the sequential scan, this begins at time zero and goes until 13863.80. This is a fictional number created by the optimizer—it is not a number of seconds or I/O operations.
The cost numbers are cumulative, so the cost for the second operation includes the cost for the first operation. Notice that nearly all the time to process this query is in the sequential scan.
- The EXPLAIN ANALYZE command actually runs the query (without returning te result set). The cost numbers reflect the actual timings. It also produces some memory and I/O statistics.
tutorial=# EXPLAIN ANALYZE SELECT COUNT(*) FROM sample WHERE id > 100; QUERY PLAN ----------------------------------------------------------------------------- Aggregate (cost=0.00..462.77 rows=1 width=8) Rows out: 1 rows with 446 ms to end, start offset by 7.846 ms. -> Gather Motion 2:1 (slice1; segments: 2) (cost=0.00..462.77 rows=1 width=8) Rows out: 2 rows at destination with 443 ms to first row, 446 ms to end, start offset by 7.860 ms. -> Aggregate (cost=0.00..462.76 rows=1 width=8) Rows out: Avg 1.0 rows x 2 workers. Max 1 rows (seg0) with 442 ms to end, start offset by 9.000 ms. -> Table Scan on sample (cost=0.00..462.76 rows=500687 width=1) Filter: id > 100 Rows out: Avg 499950.0 rows x 2 workers. Max 499951 rows (seg0) with 88 ms to first row, 169 ms to end, start offset by 9.007 ms. Slice statistics: (slice0) Executor memory: 159K bytes. (slice1) Executor memory: 177K bytes avg x 2 workers, 177K bytes max (seg0). Statement statistics: Memory used: 128000K bytes Settings: optimizer=on Optimizer status: PQO version 1.597 Total runtime: 453.855 ms (17 rows)
Changing optimizers
By default, the sandbox instance disables the Pivotal Query Optimizer and you may see “legacy query optimizer” listed in the EXPLAIN output under “Optimizer status.”
- Check whether the Pivotal Query Optimizer is enabled.
$ gpconfig -s optimizer Values on all segments are consistent GUC : optimizer Master value: off Segment value: off
- Enable the Pivotal Query Optimizer
$ gpconfig -c optimizer -v on --masteronly 20151201:09:08:31:172854 gpconfig:gpdb-sandbox:gpadmin-[INFO]:-completed successfully
- Reload the configuration on master and segment instances.
$ gpstop -u 20151201:09:08:49:172949 gpstop:gpdb-sandbox:gpadmin-[INFO]:-Starting gpstop with args: -u 20151201:09:08:49:172949 gpstop:gpdb-sandbox:gpadmin-[INFO]:-Gathering information and validating the environment... 20151201:09:08:49:172949 gpstop:gpdb-sandbox:gpadmin-[INFO]:-Obtaining Greenplum Master catalog information 20151201:09:08:49:172949 gpstop:gpdb-sandbox:gpadmin-[INFO]:-Obtaining Segment details from master... 20151201:09:08:49:172949 gpstop:gpdb-sandbox:gpadmin-[INFO]:-Greenplum Version: 'postgres (Greenplum Database) 4.3.12.0' 20151201:09:08:49:172949 gpstop:gpdb-sandbox:gpadmin-[INFO]:-Signalling all postmaster processes to reload .