Greenplum psql Command Line Client
The previous tutorial showed how you can download and run a virtual machine with Greenplum Database on it. In continuation, this tutorial will show how to use the command line client “psql”.
What is psql?
First start the virtual machine, and login with the known credentials. By default, Greenplum Database comes with “psql” as command line client, and that is a very powerful tool to query the database or use it in scripts.
How to start psql
Start a terminal, and type in:
psql
psql tries to connect to the database using a number of default settings:
- TCP Port: 5432
- Host: will use the Unix Domain Socket
- Username: your Unix username
- Database: same as your username
Each setting can be changed on the commandline, or overridden using the configuration file or environment variables. The most common settings are:
- Hostname: -h or –host
- Username: -U or –username (uppercase U)
- TCP-Port: -p or –port
- Database name: -d or –dbname
Therefore in order to connect from your local workstation to a Greenplum Database server, you need the following command line:
psql -U gpadmin -h gpdbserver -p 5432 postgres
Which databases are available?
In a newly created Greenplum Database system, a total of 3 databases are created by default:
- “template0”: empty, used as fallback if “template1” needs to be recreated
- “template1”: used as template for every newly created database – everything which is installed or created in “template1” will also be available in the new database
- “postgres”: can be used as default database to establish a connection, until a database for your project is created
It comes handy to create a “gpadmin” database as well, in order to have a default database for the gpadmin user. It saves a few keystrokes every time you use psql:
CREATE DATABASE gpadmin;
What does psql offer?
Once connected to the database, psql offers a broad range of commands to explore the database. Here are a few common commands:
- \l: lists all databases
- \l+: lists all databases and shows extended information
- \c “database”: connects to “database”
- \timing: switches client-side timing on or off
- \dt: lists all tables
- \dt+: lists all tables and shows extended information
- \dt public.*: lists all tables in the public schema
- \dn: lists all schemas
- \df: lists all functions
- \x: switches between row and column based output (handy for very wide tables)
How to load data using psql
The COPY command can be used to load data from files into tables. psql is often used in scripts to execute the COPY command, however that can also happen interactively. Connect to your database and execute the following command:
COPY tablename from ‘/path/to/filename.csv’;
Note: the file needs to be on the database server, and needs to be accessible for the user which runs the Greenplum Database (usually gpadmin). The server will directly open the file and read the data.
That is not very handy if psql runs on another host. However psql itself offers a nice way to work around this problem:
\copy tablename from ‘/path/to/filename.csv’
Using the backslash form, psql will transfer the file over the network to the database, and execute the COPY command.
How to export data using psql
The COPY command is not only used to transfer data to the database, but can also be used to export data:
COPY tablename to ‘/path/to/export.csv’;
This will export the content of the table “tablename” to the file ‘export.csv’ on the master server. Again, psql can do the heavy lifting and transfer the data over the network:
\copy tablename to ‘/path/to/export.csv’
More options how to load and unload data are described in the Data Loading tutorial.
Execute scripts
One of the advanced use cases for psql is the usage in scripts and batch jobs. psql offers the option to execute single commands, using the -c option. And it allows to read an arbitrary number of commands from a text file and send them to the database, using the -f option.
Single command:
psql -c “SELECT COUNT(*) FROM tablename”
count ------- 390
(1 row)
It’s even more handy to (un)format the output to be used in scripts:
psql -q -A -t -c “SELECT COUNT(*) FROM tablename”
390
Read commands from text file:
psql -f textfile.txt
This option can also be used when already connected to a database:
\i textfile.txt
Row and column based output
The normal output for any result is column-based. That is, psql will calculate the width for every cell based on the data returned from the query. If the output is larger than the terminal, or if a table has many columns, this will wrap around and produce non-readable output.
The command line option -x/–expanded and the inline option \x will switch to row-based output. In this format, every tuple will be printed on a separate line. This makes it easy to read very long or very wide datasets.
Example showing the difference between row and column based output:
SELECT * FROM pg_class LIMIT 1;
relname | relnamespace | reltype | relowner | relam | relfilenode | reltablespace | relpages | reltuples | reltoastrelid | reltoastidxid | relhasindex | relisshared | relkind | relstorage | relnatts | relchecks | reltriggers | relukeys | relfkeys | relrefs | relhasoids | relhaspkey | relhasrules | relhassubclass | relfrozenxid | relacl | reloptions
------------------+--------------+---------+----------+-------+-------------+---------------+----------+-----------+---------------+---------------+-------------+-------------+---------+------------+----------+-----------+-------------+----------+----------+---------+------------+------------+-------------+----------------+--------------+--------------------------+------------
gp_size_of_index | 11913 | 12017 | 10 | 0 | 12016 | 0 | 0 | 0 | 0 | 0 | f | f | v | v | 7 | 0 | 0 | 0 | 0 | 0 | f | f | t | f | 0 | {gpadmin=arwdDxt/gpadmin,=r/gpadmin} |
(1 row)
Measure runtime
The psql client has a built-in stopwatch:
\timing
SELECT * FROM pg_class;
count ------- 390
(1 row)
Time: 3,025 ms
T Add Form his will measure the time required to execute the query. Keep in mind that this also measures the time which is required to transfer the data to the client. If you have a small result set, this will be pretty accurate. However if you just do a SELECT * from a large table, you might spend more time in transferring the data than in selecting the data from the table.
Stop scripts on error
If psql is used in scripts, or if a large script is executed in the database using the -f option, it might happen that something goes wrong. A syntax error, a missing table, anything which might cause an error, and later on produces more follow-up errors.
By default, psql will not stop when it encounters an error, but will read all commands from the file and tries to execute them. Most of the time that is not what you want, either because the other commands depend on the failing command and therefore the result is just one big mess. Or because everything was running in one transaction and the transaction fails on the first error.
For this kind of use cases psql offers a stop command when it encounters an error:
\set ON_ERROR_STOP
This will stop the execution of a script when an error is encountered.