Greenplum Indexes and performance
Greenplum Database does not depend upon indexes to the same degree as conventional data warehouse systems. Because the segments execute table scans in parallel, each segment scanning a small segment of the table, the traditional performance advantage from indexes is diminished. Indexes consume large amounts of space and require considerable CPU time to compute during data loads. There are, however, times when indexes are useful, especially for highly selective queries. When a query looks up a single row, an index can dramatically improve performance.
In this exercise, you first run a single row lookup on the sample table without an index, then rerun the query after creating an index.
- Select a single row and note the time to execute the query.
tutorial=# SELECT * from sample WHERE big = 12345; id | big | wee | stuff -------+-------+-----+------- 12345 | 12345 | 0 | (1 row) Time: 197.640 ms
- View the explain plan for the previous query:
tutorial=# EXPLAIN SELECT * from sample WHERE big = 12345; QUERY PLAN ----------------------------------------------------------------------- Gather Motion 2:1 (slice1; segments: 2) (cost=0.00..459.04 rows=2 width=12) -> Table Scan on sample (cost=0.00..459.04 rows=1 width=12) Filter: big = 12345 Settings: optimizer=on Optimizer status: PQO version 1.597 (5 rows) Time: 19.719 ms
- Create an index on the sample table.
tutorial=# CREATE INDEX sample_big_index ON sample(big); CREATE INDEX Time: 1106.467 ms
- View the explain plan for the single-row select query with the new index in place:
tutorial=# EXPLAIN SELECT * FROM sample WHERE big = 12345; QUERY PLAN -------------------------------------------------------------------------- Gather Motion 2:1 (slice1; segments: 2) (cost=0.00..3.00 rows=2 width=12) -> Index Scan using sample_big_index on sample (cost=0.00..3.00 rows=1 width=12) Index Cond: big = 12345 Settings: optimizer=on Optimizer status: PQO version 1.597 (5 rows) Time: 23.674 ms
- Run the single-row SELECT query with the index in place and note the time.
tutorial=# SELECT * FROM sample WHERE big = 12345; id | big | wee | stuff -------+-------+-----+------- 12345 | 12345 | 0 | (1 row) Time: 29.421 ms
Notice the difference in timing between the single-row SELECT with and without the index. The difference would have been much greater for a larger table. Not that even when there is a index, the optimizer can choose not to use it if it calculates a more efficient plan.
- View the following explain plans to compare plans for some other common types of queries.
tutorial=# EXPLAIN SELECT * FROM sample WHERE big = 12345; tutorial=# EXPLAIN SELECT * FROM sample WHERE big > 12345; tutorial=# EXPLAIN SELECT * FROM sample WHERE big = 12345 OR big = 12355; tutorial=# DROP INDEX sample_big_index; tutorial