10 Tips to increase PostgreSQL performance
1. Avoid joins where possible; use a subquery instead of a join
Note: subqueries can be slower than LEFT [OUTER] JOINs, but in my opinion their strength is slightly higher readability.
select GPA from Student where sID in (select sID from Apply where major = 'CS');
Don't write it like the following:
select GPA from Student, Apply where Student.sID = Apply.sID and Apply.major = 'CS';
Another example:
SELECT id, alias
FROM
    mos_categories
WHERE
    id IN (
        SELECT DISTINCT catid
        FROM mos_content
    );
~ 5 seconds
SELECT DISTINCT
    mos_content.catid,
    mos_categories.alias
FROM
    mos_content, mos_categories
WHERE
    mos_content.catid = mos_categories.id;
~ 15 seconds
2. If you need to join tables, join the small table to the big table. Before that, check the table sizes using the following query.
SELECT pg_size_pretty(pg_relation_size('schema_name.tablename'));
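For example, a sketch reusing the Student and Apply tables from tip 1 (the public schema is an assumption here):
SELECT pg_size_pretty(pg_relation_size('public.student'));
SELECT pg_size_pretty(pg_relation_size('public.apply'));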
3. PostgreSQL multicolumn index column order should be as follows:
Let's review the EXPLAIN ANALYZE plan of the following simple query, without any indexes:
EXPLAIN ANALYZE SELECT line_items.product_id, SUM(line_items.price)
FROM line_items
WHERE product_id > 80
GROUP BY 1;
EXPLAIN PLAN:
HashAggregate  (cost=13.81..14.52 rows=71 width=12) (actual time=0.137..0.141 rows=20 loops=1)
  Group Key: product_id
  ->  Seq Scan on line_items  (cost=0.00..13.25 rows=112 width=8) (actual time=0.017..0.082 rows=112 loops=1)
        Filter: (product_id > 80)
        Rows Removed by Filter: 388
Planning time: 0.082 ms
Execution time: 0.187 ms
AFTER CREATING AN INDEX:
CREATE INDEX items_product_id_price ON line_items(product_id, price);
EXPLAIN PLAN:
GroupAggregate  (cost=0.27..7.50 rows=71 width=12) (actual time=0.034..0.090 rows=20 loops=1)
  Group Key: product_id
  ->  Index Only Scan using items_product_id_price on line_items  (cost=0.27..6.23 rows=112 width=8) (actual time=0.024..0.049 rows=112 loops=1)
        Index Cond: (product_id > 80)
        Heap Fetches: 0
Planning time: 0.271 ms
Execution time: 0.136 ms
CREATING THE INDEX IN REVERSED ORDER:
CREATE INDEX items_product_id_price_reversed ON line_items(price, product_id);
With this column order the query is equal to a full table sequence scan (product_id, the filtered column, is no longer the leading column of the index), so don't create an index like this.
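You can confirm this yourself (a sketch; drop the good index first so the planner cannot simply fall back to it):
DROP INDEX items_product_id_price;
EXPLAIN ANALYZE SELECT product_id, SUM(price)
FROM line_items
WHERE product_id > 80
GROUP BY 1;
--the plan no longer uses the index to narrow the rows: you get a Seq Scan
--(or at best a full scan of the index) because product_id is not the
--leading column of items_product_id_price_reversed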
ANOTHER EXAMPLE:
SELECT orders.product_id, SUM(line_items.price)
FROM line_items
LEFT JOIN orders ON line_items.order_id = orders.id
WHERE line_items.created_at BETWEEN '2018-01-01' and '2018-01-02'
GROUP BY 1;
Here we have a join on order_id and a filter on created_at. So we can create a multicolumn index that contains created_at in first place, order_id in second, and price in third:
CREATE INDEX line_items_created_at_order_id_price ON line_items(created_at, order_id, price);
We’ll get the following explain plan:
GroupAggregate  (cost=12.62..12.64 rows=1 width=12) (actual time=0.029..0.029 rows=1 loops=1)
  Group Key: orders.product_id
  ->  Sort  (cost=12.62..12.62 rows=1 width=8) (actual time=0.025..0.026 rows=1 loops=1)
        Sort Key: orders.product_id
        Sort Method: quicksort  Memory: 25kB
        ->  Nested Loop Left Join  (cost=0.56..12.61 rows=1 width=8) (actual time=0.015..0.017 rows=1 loops=1)
              ->  Index Only Scan using line_items_created_at_order_id_price on line_items  (cost=0.27..4.29 rows=1 width=8) (actual time=0.009..0.010 rows=1 loops=1)
                    Index Cond: ((created_at >= '2018-01-01 00:00:00'::timestamp without time zone) AND (created_at <= '2018-01-02 00:00:00'::timestamp without time zone))
                    Heap Fetches: 0
              ->  Index Scan using orders_pkey on orders  (cost=0.29..8.30 rows=1 width=8) (actual time=0.004..0.005 rows=1 loops=1)
                    Index Cond: (line_items.order_id = id)
Planning time: 0.303 ms
Execution time: 0.072 ms
As you can see, line_items_created_at_order_id_price is used to narrow the scan by the date condition. After that, the result is joined with orders using an index scan on orders_pkey.
Note: GIN indexes are generally faster for SELECTs, GiST indexes are faster for UPDATEs, and B-tree is the default index type.
First create the default index and note the execution time; then try GIN and GiST indexes and note their execution times. Keep only whichever index gives the best performance.
For how to read an execution plan, please refer to https://www.tutorialdba.com/p/postgresql-explain-plan.html
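A minimal sketch of that comparison, reusing the tutorialdba table from tip 5 (the index names are just examples; GIN and GiST need a trigram operator class from pg_trgm for LIKE searches, and a plain B-tree only helps LIKE with a C collation or text_pattern_ops):
CREATE EXTENSION IF NOT EXISTS pg_trgm;
--default B-tree index
CREATE INDEX tutorialdba_a_val_btree ON tutorialdba (a_val text_pattern_ops);
EXPLAIN ANALYZE SELECT id FROM tutorialdba WHERE a_val LIKE 'something%';
DROP INDEX tutorialdba_a_val_btree;
--GIN trigram index
CREATE INDEX tutorialdba_a_val_gin ON tutorialdba USING gin (a_val gin_trgm_ops);
EXPLAIN ANALYZE SELECT id FROM tutorialdba WHERE a_val LIKE 'something%';
DROP INDEX tutorialdba_a_val_gin;
--GiST trigram index
CREATE INDEX tutorialdba_a_val_gist ON tutorialdba USING gist (a_val gist_trgm_ops);
EXPLAIN ANALYZE SELECT id FROM tutorialdba WHERE a_val LIKE 'something%';
--note each Execution time and recreate only the winning index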
4. In the WHERE clause, use an IN list instead of chained OR conditions
WITH OR:
SELECT id FROM a
WHERE id = 42
OR id = 4711;
WITHOUT OR:
SELECT id FROM a
WHERE id IN (42, 4711);
5. INSTEAD OF USING MULTIPLE LIKEs, USE THE “LIKE ANY” OPTION
SELECT id FROM tutorialdba WHERE a_val LIKE 'something%' OR a_val LIKE 'other%';
--rewritten as follows:
SELECT id FROM tutorialdba WHERE a_val LIKE ANY (ARRAY['something%', 'other%']);
6. FIND AND DROP UNUSED INDEXES:
Index all the things, right? But did you know that having unused indexes can negatively affect write performance? The reason is that when you create an index, Postgres is burdened with keeping it updated after every write (INSERT / UPDATE / DELETE) operation. So adding an index is a balancing act: indexes can speed up reads (if created properly) but will slow down writes. To find unused indexes, run the following query.
SELECT
    indexrelid::regclass as index,
    relid::regclass as "table",
    'DROP INDEX ' || indexrelid::regclass || ';' as drop_statement
FROM
    pg_stat_user_indexes
JOIN
    pg_index USING (indexrelid)
WHERE
    idx_scan = 0
    AND indisunique is false;
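In psql you can even run the generated statements directly with \gexec (a psql feature; review the list first, since freshly reset statistics or an index used only on a standby can make a needed index look unused):
SELECT 'DROP INDEX ' || indexrelid::regclass
FROM pg_stat_user_indexes
JOIN pg_index USING (indexrelid)
WHERE idx_scan = 0
AND indisunique is false
\gexec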
7. CHECKING INDEX STATUS:
--to check a table's index:
postgres=# select relname,indexrelname,indexrelid from pg_stat_all_indexes where indexrelid=24700;
 relname | indexrelname | indexrelid
---------+--------------+------------
 k       | production   |      24700
(1 row)
--check whether the index is valid and whether it is alive:
postgres=# select indexrelid,indislive,indisvalid from pg_index where indexrelid=24700;
indexrelid | indislive | indisvalid
------------+-----------+------------
24700 | t | t
(1 row)
MANUALLY REINDEX IF EITHER VALUE IS FALSE:
DROP INDEX registration.rg_patient_ix10;
CREATE INDEX rg_patient_ix10 ON registration.patient USING btree (lower(firstname), lower(middlename), lower(lastname) varchar_pattern_ops);
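Alternatively, plain REINDEX rebuilds the index in place without the drop-and-recreate step (note that it takes a lock that blocks writes on the table while it runs):
REINDEX INDEX registration.rg_patient_ix10;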
8. Before and after changing a query, take the explain plan and note down the query run time it reports.
Example:
Before changing the query:
EXPLAIN ANALYZE query; --note the Execution time
After changing the query:
EXPLAIN ANALYZE query; --note the Execution time
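A concrete sketch, reusing the OR-to-IN rewrite from tip 4 (table a comes from that example):
--before the change
EXPLAIN ANALYZE SELECT id FROM a WHERE id = 42 OR id = 4711;
--note the Execution time line at the bottom of the plan
--after the change
EXPLAIN ANALYZE SELECT id FROM a WHERE id IN (42, 4711);
--compare the two Execution time values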
9. CHECKING FOR DEAD TUPLES:
If you get a ticket regarding query tuning, check whether the table has any dead tuples; if dead tuples exist, you have to run VACUUM.
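A quick check, using the same pg_stat_all_tables counters that the monitoring script in tip 10 relies on (the patient table is just an example):
--check dead tuples for a given table
select schemaname, relname, n_dead_tup, last_vacuum, last_autovacuum
from pg_stat_all_tables
where relname = 'patient';
--if dead tuples exist, vacuum and analyze the table
vacuum analyze registration.patient;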
10. CHECKING LONG-RUNNING QUERIES:
THIS SCRIPT IS HELPFUL FOR CHECKING LONG-RUNNING QUERIES AS WELL AS DISK USAGE, RAM USAGE, DEAD TUPLES, ETC.
[root@tutorialdba.com]# cat TUTORIALDBAmonitor.sh
echo " "
echo " "
HOSTNAME="101.0.1.123"
PSQL="/opt/edb/as9.6/bin/psql"
PORT=5444
DB="edb"
USER="enterprisedb"
export PGPASSWORD="2NDQUADRANT.IN"
echo "ENTER THE TIME IN MINUTES ?"
echo "For example if you give 10 means it will display "
echo "queries which is running more than 10 minutes"
echo "0 means will display all queries:"
echo ""
read time
echo " "
echo " "
echo "WHAT ARE THE QUERY IS RUNING MORE THAN $time MINUTES"
$PSQL -d $DB -U $USER -p $PORT -h $HOSTNAME <<EOF
\pset format wrapped
SELECT pid, now() - query_start as "runtime", usename, datname, state, query
FROM pg_stat_activity
WHERE now() - query_start > '$time minutes'::interval
ORDER BY runtime DESC;
EOF
echo "*********************COMPLETED 1****************************** "
echo " "
echo " "
echo " "
echo "CHECKING dISK SPACE:"
df -h
echo "*********************COMPLETED 2****************************** "
echo " "
echo " "
echo " "
echo "CHECKING RAM USAGE:"
free -h
echo "*********************COMPLETED 3****************************** "
echo " "
echo " "
echo " "
echo "ENTER THE BLOATED COUNT ? "
echo "For example if you give 1000 means it will display the table count"
echo "how many table having more than 1000 dead tubles that database:"
echo ""
read count
echo " "
echo " "
echo "HOW MANY TABLES HAVING MORE THAN $count BLOATED TABLE :"
$PSQL -d $DB -U $USER -p $PORT -h $HOSTNAME <<EOF
\c LC
select count(*) from pg_stat_all_tables where n_dead_tup > $count;
EOF
echo " "
$PSQL -d $DB -U $USER -p $PORT -h $HOSTNAME <<EOF
\c DC
select count(*) from pg_stat_all_tables where n_dead_tup > $count;
EOF
echo "*********************COMPLETED 4****************************** "
echo " "
echo " "
echo " "
echo "ENTER THE BLOATED COUNT AGAING ? "
echo "it will give schema,table name and dead tuble count including "
echo "with vacuum & analyze script you can copy and run it to the "
echo "SQL prompt what are the table having more Bloated table "
echo ""
read count1
echo ""
echo ""
echo "WHAT ARE THE TABLE HAVING MORE THAN $count1 BLOATED TABLE : "
$PSQL -d $DB -U $USER -p $PORT -h $HOSTNAME <<EOF
\c LC
select 'vacuum ANALYZE '||schemaname||'.'||relname||';' from pg_stat_all_tables where n_dead_tup >$count1;
select schemaname,relname,n_dead_tup from pg_stat_all_tables where n_dead_tup >$count1;
EOF
echo " "
$PSQL -d $DB -U $USER -p $PORT -h $HOSTNAME <<EOF
\c DC
select 'vacuum ANALYZE '||schemaname||'.'||relname||';' from pg_stat_all_tables where n_dead_tup >$count1;
select schemaname,relname,n_dead_tup from pg_stat_all_tables where n_dead_tup >$count1;
EOF
echo "*********************COMPLETED 5****************************** "
echo " "
SAMPLE OUTPUT:
[root@tutorialdba.com]# sh TUTORIALDBAmonitor.sh
ENTER THE TIME IN MINUTES:
For example, entering 10 will display
queries that have been running for more than 10 minutes;
0 will display all queries:
4
QUERIES RUNNING FOR MORE THAN 4 MINUTES:
Output format is wrapped.
pid | runtime | usename | datname | state | query
-------+-----------------+----------+----------------+-------+------------
15832 | 00:05:57.796746 | hr | LC | idle | unlisten *
15465 | 00:05:35.23556 | ehis | LC | idle | unlisten *
15113 | 00:05:34.880537 | billing | LC | idle | unlisten *
15918 | 00:04:50.816381 | security | LC | idle | unlisten *
15930 | 00:04:48.148488 | billing | DC | idle | unlisten *
...
..
(10 rows)
*********************COMPLETED 1******************************
CHECKING DISK SPACE:
Filesystem Size Used Avail Use% Mounted on
/dev/xvda2 250G 2.0G 249G 1% /
devtmpfs 16G 0 16G 0% /dev
tmpfs 16G 8.0K 16G 1% /dev/shm
tmpfs 16G 177M 16G 2% /run
tmpfs 16G 0 16G 0% /sys/fs/cgroup
/dev/xvdb 200G 33G 168G 17% /opt
tmpfs 3.2G 0 3.2G 0% /run/user/1000
*********************COMPLETED 2******************************
CHECKING RAM USAGE:
total used free shared buff/cache available
Mem: 31G 717M 27G 1.3G 2.7G 28G
Swap: 0B 0B 0B
*********************COMPLETED 3******************************
ENTER THE DEAD TUPLE THRESHOLD:
For example, entering 1000 will display how many tables
in the database have more than 1000 dead tuples:
1500
NUMBER OF TABLES WITH MORE THAN 1500 DEAD TUPLES:
You are now connected to database "LC" as user "enterprisedb".
count
-------
3
(1 row)
You are now connected to database "DC" as user "enterprisedb".
count
-------
1
(1 row)
*********************COMPLETED 4******************************
ENTER THE DEAD TUPLE THRESHOLD AGAIN:
This will list schema, table name, and dead tuple count, along
with a VACUUM ANALYZE script that you can copy and run at the
SQL prompt for the tables with the most dead tuples.
2000
TABLES WITH MORE THAN 2000 DEAD TUPLES:
You are now connected to database "LC" as user "enterprisedb".
?column?
-----------------------------------------
vacuum ANALYZE pg_catalog.pg_attribute;
vacuum ANALYZE pg_catalog.pg_depend;
vacuum ANALYZE pg_catalog.pg_shdepend;
(3 rows)
schemaname | relname | n_dead_tup
------------+--------------+------------
pg_catalog | pg_attribute | 21218
pg_catalog | pg_depend | 9869
pg_catalog | pg_shdepend | 2730
(3 rows)
You are now connected to database "DC" as user "enterprisedb".
?column?
----------------------------------------
vacuum ANALYZE pg_catalog.pg_shdepend;
(1 row)
schemaname | relname | n_dead_tup
------------+-------------+------------
pg_catalog | pg_shdepend | 2730
(1 row)
*********************COMPLETED 5******************************
Vacuum and analyze the catalog tables that have the most dead tuples. Here pg_attribute has the most dead tuples, so that is the one I chose to vacuum and analyze.
LC=# vacuum ANALYZE pg_catalog.pg_attribute;
VACUUM