Greenplum Row vs. column orientation
Greenplum Database offers the ability to store a table in either row or column orientation. Both storage options have advantages, depending upon data compression characteristics, the kinds of queries executed, the row length, and the complexity and number of join columns.
As a general rule, very wide tables are better stored in row orientation, especially if there are joins on many columns. Column orientation works well to save space with compression and to reduce I/O when there is much duplicated data in columns.
In this exercise, you will create a column-oriented version of the fact table and compare it with the row-oriented version.
- Create a column-oriented version of the FAA On Time Performance fact table and insert the data from the row-oriented version.
tutorial=# CREATE TABLE FAA.OTP_C (LIKE faa.otp_r) WITH (appendonly=true, orientation=column) DISTRIBUTED BY (UniqueCarrier, FlightNum) PARTITION BY RANGE(FlightDate) ( PARTITION mth START('2009-06-01'::date) END ('2010-10-31'::date) EVERY ('1 mon'::interval)); tutorial=# INSERT INTO faa.otp_c SELECT * FROM faa.otp_r;
- Compare the definitions of the row and the column versions of the table.
tutorial=# \d faa.otp_r Table "faa.otp_r" Column | Type | Modifiers ----------------------+------------------+----------- flt_year | smallint | flt_quarter | smallint | flt_month | smallint | flt_dayofmonth | smallint | flt_dayofweek | smallint | flightdate | date | uniquecarrier | text | airlineid | integer | carrier | text | flightnum | text | origin | text | origincityname | text | originstate | text | originstatename | text | dest | text | destcityname | text | deststate | text | deststatename | text | crsdeptime | text | deptime | integer | depdelay | double precision | depdelayminutes | double precision | departuredelaygroups | smallint | taxiout | smallint | wheelsoff | text | wheelson | text | taxiin | smallint | crsarrtime | text | arrtime | text | arrdelay | double precision | arrdelayminutes | double precision | arrivaldelaygroups | smallint | cancelled | smallint | cancellationcode | text | diverted | smallint | crselapsedtime | integer | actualelapsedtime | double precision | airtime | double precision | flights | smallint | distance | double precision | distancegroup | smallint | carrierdelay | smallint | weatherdelay | smallint | nasdelay | smallint | securitydelay | smallint | lateaircraftdelay | smallint | Distributed by: (uniquecarrier, flightnum)
Notice that the column-oriented version is append-only and partitioned. It has seventeen child files for the partitions, one for each month from June 2009 through October 2010.
-
tutorial=# \d faa.otp_c Append-Only Columnar Table "faa.otp_c" Column | Type | Modifiers ----------------------+------------------+----------- flt_year | smallint | flt_quarter | smallint | flt_month | smallint | flt_dayofmonth | smallint | flt_dayofweek | smallint | flightdate | date | uniquecarrier | text | airlineid | integer | carrier | text | flightnum | text | origin | text | origincityname | text | originstate | text | originstatename | text | dest | text | destcityname | text | deststate | text | deststatename | text | crsdeptime | text | deptime | integer | depdelay | double precision | depdelayminutes | double precision | departuredelaygroups | smallint | taxiout | smallint | wheelsoff | text | wheelson | text | taxiin | smallint | crsarrtime | text | arrtime | text | arrdelay | double precision | arrdelayminutes | double precision | arrivaldelaygroups | smallint | cancelled | smallint | cancellationcode | text | diverted | smallint | crselapsedtime | integer | actualelapsedtime | double precision | airtime | double precision | flights | smallint | distance | double precision | distancegroup | smallint | carrierdelay | smallint | weatherdelay | smallint | nasdelay | smallint | securitydelay | smallint | lateaircraftdelay | smallint | Checksum: t Number of child tables: 17 (Use \d+ to list them.) Distributed by: (uniquecarrier, flightnum)
Compare the sizes of the tables using the pg_relation_size() and pg_total_relation_size() functions. The pg_size_pretty() function converts the size in bytes to human-readable units.
tutorial=# SELECT pg_size_pretty(pg_relation_size('faa.otp_r')); pg_size_pretty ---------------- 256 MB (1 row) tutorial=# SELECT pg_size_pretty(pg_total_relation_size('faa.otp_r')); pg_size_pretty ---------------- 256 MB (1 row) tutorial=# SELECT pg_size_pretty(pg_relation_size('faa.otp_c')); pg_size_pretty ---------------- 0 bytes (1 row) tutorial=# SELECT pg_size_pretty(pg_total_relation_size('faa.otp_c')); pg_size_pretty ---------------- 288 kB (1 row)