10 Tips to increase PostgreSQL performance
1. Avoid joins where possible; use a subquery instead of a join
Note: subqueries can be slower than LEFT [OUTER] JOINs, but in my opinion their strength is slightly higher readability.
select GPA from Student where sID in (select sID from Apply where major = 'CS');
Don't write it like the following:
select GPA from Student, Apply where Student.sID = Apply.sID and Apply.major = 'CS';
Another example:
SELECT id, alias
FROM
    mos_categories
WHERE
    id IN (
        SELECT DISTINCT catid
        FROM mos_content
    );
~ 5 seconds
SELECT DISTINCT
    mos_content.catid,
    mos_categories.alias
FROM
    mos_content, mos_categories
WHERE
    mos_content.catid = mos_categories.id;
~ 15 seconds
2. If you need to join tables, join the small table to the big table. Before that, check the table sizes using the following query.
SELECT pg_size_pretty(pg_relation_size('schema_name.tablename'));
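For example, a sketch reusing the Student and Apply tables from tip 1 (the public schema is an assumption here):
SELECT pg_size_pretty(pg_relation_size('public.student'));
SELECT pg_size_pretty(pg_relation_size('public.apply'));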
3. PostgreSQL multicolumn index column order should be as follows:
Let's review the EXPLAIN ANALYZE plan of the following simple query, without any indexes:
EXPLAIN ANALYZE SELECT line_items.product_id, SUM(line_items.price)
FROM line_items
WHERE product_id > 80
GROUP BY 1;
EXPLAIN PLAN:
HashAggregate  (cost=13.81..14.52 rows=71 width=12) (actual time=0.137..0.141 rows=20 loops=1)
  Group Key: product_id
  ->  Seq Scan on line_items  (cost=0.00..13.25 rows=112 width=8) (actual time=0.017..0.082 rows=112 loops=1)
        Filter: (product_id > 80)
        Rows Removed by Filter: 388
Planning time: 0.082 ms
Execution time: 0.187 ms
AFTER CREATING AN INDEX:
CREATE INDEX items_product_id_price ON line_items(product_id, price);
EXPLAIN PLAN:
GroupAggregate  (cost=0.27..7.50 rows=71 width=12) (actual time=0.034..0.090 rows=20 loops=1)
  Group Key: product_id
  ->  Index Only Scan using items_product_id_price on line_items  (cost=0.27..6.23 rows=112 width=8) (actual time=0.024..0.049 rows=112 loops=1)
        Index Cond: (product_id > 80)
        Heap Fetches: 0
Planning time: 0.271 ms
Execution time: 0.136 ms
CREATING THE INDEX IN REVERSED ORDER:
CREATE INDEX items_product_id_price_reversed ON line_items(price, product_id);
With this column order the query is equal to a full table sequence scan (product_id, the filtered column, is no longer the leading column of the index), so don't create an index like this.
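You can confirm this yourself (a sketch; drop the good index first so the planner cannot simply fall back to it):
DROP INDEX items_product_id_price;
EXPLAIN ANALYZE SELECT product_id, SUM(price)
FROM line_items
WHERE product_id > 80
GROUP BY 1;
--the plan no longer uses the index to narrow the rows: you get a Seq Scan
--(or at best a full scan of the index) because product_id is not the
--leading column of items_product_id_price_reversed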
ANOTHER EXAMPLE:
SELECT orders.product_id, SUM(line_items.price)
FROM line_items
LEFT JOIN orders ON line_items.order_id = orders.id
WHERE line_items.created_at BETWEEN '2018-01-01' and '2018-01-02'
GROUP BY 1;
Here we have a join on order_id and a filter on created_at. So we can create a multicolumn index that contains created_at in first place, order_id in second, and price in third:
CREATE INDEX line_items_created_at_order_id_price ON line_items(created_at, order_id, price);
We’ll get the following explain plan:
GroupAggregate  (cost=12.62..12.64 rows=1 width=12) (actual time=0.029..0.029 rows=1 loops=1)
  Group Key: orders.product_id
  ->  Sort  (cost=12.62..12.62 rows=1 width=8) (actual time=0.025..0.026 rows=1 loops=1)
        Sort Key: orders.product_id
        Sort Method: quicksort  Memory: 25kB
        ->  Nested Loop Left Join  (cost=0.56..12.61 rows=1 width=8) (actual time=0.015..0.017 rows=1 loops=1)
              ->  Index Only Scan using line_items_created_at_order_id_price on line_items  (cost=0.27..4.29 rows=1 width=8) (actual time=0.009..0.010 rows=1 loops=1)
                    Index Cond: ((created_at >= '2018-01-01 00:00:00'::timestamp without time zone) AND (created_at <= '2018-01-02 00:00:00'::timestamp without time zone))
                    Heap Fetches: 0
              ->  Index Scan using orders_pkey on orders  (cost=0.29..8.30 rows=1 width=8) (actual time=0.004..0.005 rows=1 loops=1)
                    Index Cond: (line_items.order_id = id)
Planning time: 0.303 ms
Execution time: 0.072 ms
As you can see, line_items_created_at_order_id_price is used to narrow the scan by the date condition. After that, the result is joined with orders using an index scan on orders_pkey.
Note: GIN indexes are generally faster for SELECTs, GiST indexes are faster for UPDATEs, and B-tree is the default index type.
First create the default index and note the execution time; then try GIN and GiST indexes and note their execution times. Keep only whichever index gives the best performance.
For how to read an execution plan, please refer to https://www.tutorialdba.com/p/postgresql-explain-plan.html
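A minimal sketch of that comparison, reusing the tutorialdba table from tip 5 (the index names are just examples; GIN and GiST need a trigram operator class from pg_trgm for LIKE searches, and a plain B-tree only helps LIKE with a C collation or text_pattern_ops):
CREATE EXTENSION IF NOT EXISTS pg_trgm;
--default B-tree index
CREATE INDEX tutorialdba_a_val_btree ON tutorialdba (a_val text_pattern_ops);
EXPLAIN ANALYZE SELECT id FROM tutorialdba WHERE a_val LIKE 'something%';
DROP INDEX tutorialdba_a_val_btree;
--GIN trigram index
CREATE INDEX tutorialdba_a_val_gin ON tutorialdba USING gin (a_val gin_trgm_ops);
EXPLAIN ANALYZE SELECT id FROM tutorialdba WHERE a_val LIKE 'something%';
DROP INDEX tutorialdba_a_val_gin;
--GiST trigram index
CREATE INDEX tutorialdba_a_val_gist ON tutorialdba USING gist (a_val gist_trgm_ops);
EXPLAIN ANALYZE SELECT id FROM tutorialdba WHERE a_val LIKE 'something%';
--note each Execution time and recreate only the winning index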
4. In the WHERE clause, use an IN list instead of chained OR conditions
WITH OR:
SELECT id FROM a
WHERE id = 42
OR id = 4711;
WITHOUT OR:
SELECT id FROM a
WHERE id IN (42, 4711);
5. INSTEAD OF USING MULTIPLE LIKEs, USE THE “LIKE ANY” OPTION
SELECT id FROM tutorialdba WHERE a_val LIKE 'something%' OR a_val LIKE 'other%';
--rewritten as follows:
SELECT id FROM tutorialdba WHERE a_val LIKE ANY (ARRAY['something%', 'other%']);
6. FIND AND DROP UNUSED INDEXES:
Index all the things, right? But did you know that having unused indexes can negatively affect write performance? The reason is that when you create an index, Postgres is burdened with keeping it updated after every write (INSERT / UPDATE / DELETE) operation. So adding an index is a balancing act: indexes can speed up reads (if created properly) but will slow down writes. To find unused indexes, run the following query.
SELECT
    indexrelid::regclass as index,
    relid::regclass as "table",
    'DROP INDEX ' || indexrelid::regclass || ';' as drop_statement
FROM
    pg_stat_user_indexes
JOIN
    pg_index USING (indexrelid)
WHERE
    idx_scan = 0
    AND indisunique is false;
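In psql you can even run the generated statements directly with \gexec (a psql feature; review the list first, since freshly reset statistics or an index used only on a standby can make a needed index look unused):
SELECT 'DROP INDEX ' || indexrelid::regclass
FROM pg_stat_user_indexes
JOIN pg_index USING (indexrelid)
WHERE idx_scan = 0
AND indisunique is false
\gexec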
7. CHECKING INDEX STATUS:
--to check a table's index:
postgres=# select relname,indexrelname,indexrelid from pg_stat_all_indexes where indexrelid=24700;
 relname | indexrelname | indexrelid
---------+--------------+------------
 k       | production   |      24700
(1 row)
--check whether the index is valid and whether it is alive:
postgres=# select indexrelid,indislive,indisvalid from pg_index where indexrelid=24700;
indexrelid | indislive | indisvalid
------------+-----------+------------
24700 | t | t
(1 row)
MANUALLY REINDEX IF EITHER VALUE IS FALSE:
DROP INDEX registration.rg_patient_ix10;
CREATE INDEX rg_patient_ix10 ON registration.patient USING btree (lower(firstname), lower(middlename), lower(lastname) varchar_pattern_ops);
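Alternatively, plain REINDEX rebuilds the index in place without the drop-and-recreate step (note that it takes a lock that blocks writes on the table while it runs):
REINDEX INDEX registration.rg_patient_ix10;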
8. Before and after changing a query, take the explain plan and note down the query run time it reports.
Example:
Before changing the query:
EXPLAIN ANALYZE query; --note the Execution time
After changing the query:
EXPLAIN ANALYZE query; --note the Execution time
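A concrete sketch, reusing the OR-to-IN rewrite from tip 4 (table a comes from that example):
--before the change
EXPLAIN ANALYZE SELECT id FROM a WHERE id = 42 OR id = 4711;
--note the Execution time line at the bottom of the plan
--after the change
EXPLAIN ANALYZE SELECT id FROM a WHERE id IN (42, 4711);
--compare the two Execution time values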
9. CHECKING FOR DEAD TUPLES:
If you get a ticket regarding query tuning, check whether the table has any dead tuples; if dead tuples exist, you have to run VACUUM.
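A quick check, using the same pg_stat_all_tables counters that the monitoring script in tip 10 relies on (the patient table is just an example):
--check dead tuples for a given table
select schemaname, relname, n_dead_tup, last_vacuum, last_autovacuum
from pg_stat_all_tables
where relname = 'patient';
--if dead tuples exist, vacuum and analyze the table
vacuum analyze registration.patient;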
10. CHECKING LONG-RUNNING QUERIES:
THIS SCRIPT IS HELPFUL FOR CHECKING LONG-RUNNING QUERIES AS WELL AS DISK USAGE, RAM USAGE, DEAD TUPLES, ETC.
[root@tutorialdba.com]# cat TUTORIALDBAmonitor.sh
echo " "
echo " "
HOSTNAME="101.0.1.123"
PSQL="/opt/edb/as9.6/bin/psql"
PORT=5444
DB="edb"
USER="enterprisedb"
export PGPASSWORD="2NDQUADRANT.IN"
echo "ENTER THE TIME IN MINUTES ?"
echo "For example if you give 10 means it will display "
echo "queries which is running more than 10 minutes"
echo "0 means will display all queries:"
echo ""
read time
echo " "
echo " "
echo "WHAT ARE THE QUERY IS RUNING MORE THAN $time MINUTES"
$PSQL -d $DB -U $USER -p $PORT -h $HOSTNAME <<EOF
\pset format wrapped
SELECT pid, now() - query_start as "runtime", usename, datname, state, query
FROM pg_stat_activity
WHERE now() - query_start > '$time minutes'::interval
ORDER BY runtime DESC;
EOF
echo "*********************COMPLETED 1****************************** "
echo " "
echo " "
echo " "
echo "CHECKING dISK SPACE:"
df -h
echo "*********************COMPLETED 2****************************** "
echo " "
echo " "
echo " "
echo "CHECKING RAM USAGE:"
free -h
echo "*********************COMPLETED 3****************************** "
echo " "
echo " "
echo " "
echo "ENTER THE BLOATED COUNT ? "
echo "For example if you give 1000 means it will display the table count"
echo "how many table having more than 1000 dead tubles that database:"
echo ""
read count
echo " "
echo " "
echo "HOW MANY TABLES HAVING MORE THAN $count BLOATED TABLE :"
$PSQL -d $DB -U $USER -p $PORT -h $HOSTNAME <<EOF
\c LC
select count(*) from pg_stat_all_tables where n_dead_tup > $count;
EOF
echo " "
$PSQL -d $DB -U $USER -p $PORT -h $HOSTNAME <<EOF
\c DC
select count(*) from pg_stat_all_tables where n_dead_tup > $count;
EOF
echo "*********************COMPLETED 4****************************** "
echo " "
echo " "
echo " "
echo "ENTER THE BLOATED COUNT AGAING ? "
echo "it will give schema,table name and dead tuble count including "
echo "with vacuum & analyze script you can copy and run it to the "
echo "SQL prompt what are the table having more Bloated table "
echo ""
read count1
echo ""
echo ""
echo "WHAT ARE THE TABLE HAVING MORE THAN $count1 BLOATED TABLE : "
$PSQL -d $DB -U $USER -p $PORT -h $HOSTNAME <<EOF
\c LC
select 'vacuum ANALYZE '||schemaname||'.'||relname||';' from pg_stat_all_tables where n_dead_tup >$count1;
select schemaname,relname,n_dead_tup from pg_stat_all_tables where n_dead_tup >$count1;
EOF
echo " "
$PSQL -d $DB -U $USER -p $PORT -h $HOSTNAME <<EOF
\c DC
select 'vacuum ANALYZE '||schemaname||'.'||relname||';' from pg_stat_all_tables where n_dead_tup >$count1;
select schemaname,relname,n_dead_tup from pg_stat_all_tables where n_dead_tup >$count1;
EOF
echo "*********************COMPLETED 5****************************** "
echo " "
SAMPLE OUTPUT:
[root@tutorialdba.com]# sh TUTORIALDBAmonitor.sh
ENTER THE TIME IN MINUTES:
For example, entering 10 will display
queries that have been running for more than 10 minutes;
0 will display all queries:
4
QUERIES RUNNING FOR MORE THAN 4 MINUTES:
Output format is wrapped.
pid | runtime | usename | datname | state | query
-------+-----------------+----------+----------------+-------+------------
15832 | 00:05:57.796746 | hr | LC | idle | unlisten *
15465 | 00:05:35.23556 | ehis | LC | idle | unlisten *
15113 | 00:05:34.880537 | billing | LC | idle | unlisten *
15918 | 00:04:50.816381 | security | LC | idle | unlisten *
15930 | 00:04:48.148488 | billing | DC | idle | unlisten *
...
..
(10 rows)
*********************COMPLETED 1******************************
CHECKING DISK SPACE:
Filesystem Size Used Avail Use% Mounted on
/dev/xvda2 250G 2.0G 249G 1% /
devtmpfs 16G 0 16G 0% /dev
tmpfs 16G 8.0K 16G 1% /dev/shm
tmpfs 16G 177M 16G 2% /run
tmpfs 16G 0 16G 0% /sys/fs/cgroup
/dev/xvdb 200G 33G 168G 17% /opt
tmpfs 3.2G 0 3.2G 0% /run/user/1000
*********************COMPLETED 2******************************
CHECKING RAM USAGE:
total used free shared buff/cache available
Mem: 31G 717M 27G 1.3G 2.7G 28G
Swap: 0B 0B 0B
*********************COMPLETED 3******************************
ENTER THE DEAD TUPLE THRESHOLD:
For example, entering 1000 will display how many tables
in the database have more than 1000 dead tuples:
1500
NUMBER OF TABLES WITH MORE THAN 1500 DEAD TUPLES:
You are now connected to database "LC" as user "enterprisedb".
count
-------
3
(1 row)
You are now connected to database "DC" as user "enterprisedb".
count
-------
1
(1 row)
*********************COMPLETED 4******************************
ENTER THE DEAD TUPLE THRESHOLD AGAIN:
This will list schema, table name, and dead tuple count, along
with a VACUUM ANALYZE script that you can copy and run at the
SQL prompt for the tables with the most dead tuples.
2000
TABLES WITH MORE THAN 2000 DEAD TUPLES:
You are now connected to database "LC" as user "enterprisedb".
?column?
-----------------------------------------
vacuum ANALYZE pg_catalog.pg_attribute;
vacuum ANALYZE pg_catalog.pg_depend;
vacuum ANALYZE pg_catalog.pg_shdepend;
(3 rows)
schemaname | relname | n_dead_tup
------------+--------------+------------
pg_catalog | pg_attribute | 21218
pg_catalog | pg_depend | 9869
pg_catalog | pg_shdepend | 2730
(3 rows)
You are now connected to database "DC" as user "enterprisedb".
?column?
----------------------------------------
vacuum ANALYZE pg_catalog.pg_shdepend;
(1 row)
schemaname | relname | n_dead_tup
------------+-------------+------------
pg_catalog | pg_shdepend | 2730
(1 row)
*********************COMPLETED 5******************************
Vacuum and analyze the catalog tables that have the most dead tuples. Here pg_attribute has the most dead tuples, so that is the one I chose to vacuum and analyze.
LC=# vacuum ANALYZE pg_catalog.pg_attribute;
VACUUM