PostgreSQL Replication Setup / configuration on Linux
what is the postgresql Streaming Replication (SR) ?
Streaming Replication (SR) provides the capability to continuously ship and apply the WAL XLOG records to some number of standby servers in order to keep them current. This feature was added to PostgreSQL 9.0.
Advantages of postgresql Streaming Replication (SR):
1.Switch over/Fail over after the primary fails.
2.Data loss so less and data integrity.
3.No need more down time after primary fail.
4.synchronous mode Zero data loss.
5. Load balancing using load balancer ex.pgpool-II,pgbouncer.
6.Auto failover using repmgr.
Now We can go postgresql streaming replication setup using postgresql-10.3 version master IP: 192.168.2.2 slave IP: 192.168.2.3
NetWork Configuration :
On Master:
[postgres@master ~]$ cat /etc/redhat-release Red Hat Enterprise Linux Server release 7.3 Beta (Maipo) --Netcard Entry cd /etc/sysconfig/network-scripts/ vi ifcfg-ens33 BOOTPROTO=static IPADDR=192.168.2.2 NETMASK=255.255.255.0 DEVICE="ens33" ONBOOT=yes vi /etc/sysconfig/network NETWORKING=yes HOSTNAME=master GATEWAY=192.168.2.1 ## Configure DNS Server # vi /etc/resolv.conf nameserver 8.8.8.8 # Replace with your nameserver ip nameserver 192.168.2.1 # Replace with your nameserver ip --Host Entry vi /etc/hosts 192.168.2.2 master --Restart the network services #systemctl restart network (OR) #service network restart
FOR SLAVE SERVER:
--Netcard Entry cd /etc/sysconfig/network-scripts/ vi ifcfg-ens33 BOOTPROTO=static IPADDR=192.168.2.3 NETMASK=255.255.255.0 DEVICE="ens33" ONBOOT=yes vi /etc/sysconfig/network NETWORKING=yes HOSTNAME=slave GATEWAY=192.168.2.1 ## Configure DNS Server # vi /etc/resolv.conf nameserver 8.8.8.8 # Replace with your nameserver ip nameserver 192.168.2.1 # Replace with your nameserver ip --Host Entry vi /etc/hosts 192.168.2.3 slave --Restart the network services #systemctl restart network (Or) #service network restart
On Master Server 192.168.2.2:-
ping 192.168.2.2 ping 192.168.2.3 ## Some time ping will not work for security firewall at this time you have to use Telnet : telnet 192.168.2.2 5432 telnet 192.168.2.3 5432
On Slave Server 192.168.2.3:-
ping 192.168.2.2 ping 192.168.2.3 ## Some time ping will not work for security firewall at this time you have to use Telnet : telnet 192.168.2.2 5432 telnet 192.168.2.3 5432
Note : 5432 is database port for EDB 5444 you can give what you given at the time of postgres installation, if not reaching the destination host issue with Firewall you have add firewall rule else if you are using csf firewall open the Database port of 5444 or 5432 on CSF
ON SLAVE:
stop the slave server
/usr/pgsql-10/bin/pg_ctl -D /var/lib/pgsql/10/data/ stop
--move the data directory to backup location: mv /var/lib/pgsql/10/data/* /home/postgres/backup/
ON MASTER: postgresql.conf:
listen_addresses = 'localhost,192.168.2.2' wal_level = replica # minimal, replica, or logical archive_mode = on archive_command = 'rsync -av %p /home/postgres/archive/%f && rsync -av %p postgres@192.168.2.3:/home/postgres/archive/%f' max_wal_senders = 2 wal_keep_segments = 10
on pg_hba.conf:
host replication postgres 192.168.2.3/24 md5
Restart the master server:
/usr/pgsql-10/bin/pg_ctl -D /var/lib/pgsql/10/data/ restart
taking consistancy backup:
$ psql -c "select pg_start_backup('initial_backup');" $ rsync -cva --inplace /var/lib/pgsql/10/data/* postgres@192.168.2.3:/var/lib/pgsql/10/data/ $ psql -c "select pg_stop_backup();"
ON SLAVE: postgresql.conf:
listen_addresses = 'localhost,192.168.2.3' wal_level = replica # minimal, replica, or logical archive_mode = on archive_command = '/bin/cp -av %p /home/postgres/archive/%f' max_wal_senders = 2 wal_keep_segments = 10 hot_standby = on
on pg_hba.conf:
host replication postgres 192.168.2.2/24 md5
On recovery.conf
cat /var/lib/pgsql/10/data/recovery.conf standby_mode = 'on' # to enable the standby (read-only) mode. primary_conninfo = 'host=192.168.2.2 port=5432 user=postgres password=postgres' # to specify a connection info to the master node. trigger_file = '/tmp/pg_failover_trigger' # to specify a trigger file to recognize a fail over. restore_command = 'cp /home/postgres/archive/%f "%p"' archive_cleanup_command = '/usr/pgsql-10/bin/pg_archivecleanup /home/postgres/archive/ %r'
Restart the slave server:
/usr/pgsql-10/bin/pg_ctl -D /var/lib/pgsql/10/data/ restart
if you get any error error like arch missing wal log …etc while restarting slave server again take the incremental physical backup
$ psql -c "select pg_start_backup('initial_backup');" $ rsync -cva --inplace --exclude=pg_hba.conf --exclude=postgresql.conf --exclude=recovery.conf /var/lib/pgsql/10/data/* postgres@192.168.2.3:/var/lib/pgsql/10/data/ $ psql -c "select pg_stop_backup();"
I got Following error while Restarting the slave Server:
[postgres@slave root]$ /usr/pgsql-10/bin/pg_ctl -D /var/lib/pgsql/10/data/ restart could not change directory to "/root": Permission denied pg_ctl: PID file "/var/lib/pgsql/10/data/postmaster.pid" does not exist Is server running? starting server anyway waiting for server to start....2018-05-26 20:36:34.834 IST [45453] LOG: listening on IPv6 address "::1", port 5432 2018-05-26 20:36:34.834 IST [45453] LOG: listening on IPv4 address "127.0.0.1", port 5432 2018-05-26 20:36:34.836 IST [45453] LOG: listening on IPv4 address "192.168.2.3", port 5432 2018-05-26 20:36:34.840 IST [45453] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432" 2018-05-26 20:36:34.849 IST [45453] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432" .2018-05-26 20:36:35.685 IST [45453] LOG: redirecting log output to logging collector process 2018-05-26 20:36:35.685 IST [45453] HINT: Future log output will appear in directory "log".
ON LOG FILE:
tail -f postgresql-2018-05-26_203635.log 2018-05-26 20:36:35.957 IST [45461] FATAL: could not connect to the primary server: FATAL: password authentication failed for user "postgres" cp: cannot stat ‘/home/postgres/archive/000000010000000000000006’: No such file or directory 2018-05-26 20:36:40.964 IST [45465] FATAL: could not connect to the primary server: FATAL: password authentication failed for user "postgres" cp: cannot stat ‘/home/postgres/archive/000000010000000000000006’: No such file or directory 2018-05-26 20:36:45.963 IST [45467] FATAL: could not connect to the primary server: FATAL: password authentication failed for user "postgres" cp: cannot stat ‘/home/postgres/archive/000000010000000000000006’: No such file or directory 2018-05-26 20:36:50.977 IST [45471] FATAL: could not connect to the primary server: FATAL: password authentication failed for user "postgres" 2018-05-26 20:36:55.459 IST [45453] LOG: received fast shutdown request 2018-05-26 20:36:55.465 IST [45457] LOG: shutting down 2018-05-26 20:36:55.472 IST [45453] LOG: database system is shut down
on master:
–change the postgres password because as default postgres is trust user you given md5 authentication on master so you have to give right postgres password on master then give that password in recovery.conf file.
alter user postgres with password 'postgres';
Then Restart the slave server:
/usr/pgsql-10/bin/pg_ctl -D /var/lib/pgsql/10/data/ restart
MONITORING STREAMING REPLICATION:
At MASTER:
1.create some dummy table check the slave server whther is replicated or not.
postgres=# create table t(id int); CREATE TABLE
postgres=# select pg_is_in_recovery(); pg_is_in_recovery ------------------- f (1 row) Note : check master server whether is in recovery mode or not, master should not be in recovery mode , slave only should be in recovery mode. 2.Using pg_stat_replication views postgres=# select client_addr,client_hostname,client_port,state,sent_lsn,write_lsn,flush_lsn,replay_lsn,write_lag,replay_lag,flush_lag,sync_state from pg_stat_replication; client_addr | client_hostname | client_port | state | sent_lsn | write_lsn | flush_lsn | replay_lsn | write_lag | replay_lag | flush_lag | sync_state -------------+-----------------+-------------+-----------+-----------+-----------+-----------+------------+-----------+------------+-----------+------------ 192.168.2.3 | | 60000 | streaming | 0/8017A28 | 0/8017A28 | 0/8017A28 | 0/8017A28 | | | | async (1 row)
3.using linux command check the wal sender process whether started or not:
[postgres@mster ~]$ ps -ef|grep postgres root 82472 82437 0 08:34 pts/1 00:00:00 su postgres postgres 82473 82472 0 08:34 pts/1 00:00:00 bash root 94549 94494 0 20:36 pts/3 00:00:00 su postgres postgres 94550 94549 0 20:36 pts/3 00:00:00 bash postgres 94582 1 0 20:36 pts/3 00:00:00 /usr/pgsql-10/bin/postgres -D /var/lib/pgsql/10/data postgres 94584 94582 0 20:36 ? 00:00:00 postgres: logger process postgres 94586 94582 0 20:36 ? 00:00:00 postgres: checkpointer process postgres 94587 94582 0 20:36 ? 00:00:00 postgres: writer process postgres 94588 94582 0 20:36 ? 00:00:00 postgres: wal writer process postgres 94589 94582 0 20:36 ? 00:00:00 postgres: autovacuum launcher process postgres 94591 94582 0 20:36 ? 00:00:00 postgres: stats collector process postgres 94592 94582 0 20:36 ? 00:00:00 postgres: bgworker: logical replication launcher postgres 94741 94582 0 20:43 ? 00:00:00 postgres: wal sender process postgres 192.168.2.3(60000) streaming 0/8017B08 postgres 95178 94550 0 21:08 pts/3 00:00:00 ps -ef postgres 95179 94550 0 21:08 pts/3 00:00:00 grep --color=auto postgres
–you can calculate using this linux command how many wal sender is replicated to slave.
$ ps -ef|grep sender postgres 7585 3383 0 15:59 ? 00:00:00 postgres: wal sender process postgres 192.168.2.2(42586) streaming 0/18017CD8 postgres 7598 6564 0 15:59 pts/2 00:00:00 grep --color=auto sender
AT SLAVE SERVER:
1.check the previously created table whether is replicated or not
postgres=# \dt List of relations Schema | Name | Type | Owner --------+---------------+-------+---------- public | qrtransaction | table | postgres public | t | table | postgres (2 rows)
yes! it is successfully replicated the table.
–check the slave mode using following function whether is recovery mode or not
postgres=# select pg_is_in_recovery(); pg_is_in_recovery ------------------- t (1 row)
–and try to insert the valuse on slave server, slave is a read only mode it will not permitted write transaction.
postgres=# insert into t values(1); ERROR: cannot execute INSERT in a read-only transaction
2. using pg_stat_replication view:
postgres=# select status,receive_start_lsn,received_lsn,last_msg_send_time,latest_end_lsn,latest_end_time,conninfo from pg_stat_wal_receiver ; status | receive_start_lsn | received_lsn | last_msg_send_time | latest_end_lsn | latest_end_time | conninfo -----------+-------------------+--------------+----------------------------------+----------------+----------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- streaming | 0/6000000 | 0/8017B08 | 2018-05-26 21:26:34.577733+05:30 | 0/8017B08 | 2018-05-26 21:06:32.309548+05:30 | user=postgres password=******** dbname=replication host=192.168.2.2 port=5432 fallback_application_name=walreceiver sslmode=prefer sslcompression=1 krbsrvname=postgres target_session_attrs=any (1 row)
3.using linux command you can monitor wether wal receiver is started or not.this linux command will be helpful for finding out the postgres data directory as well as postgres utility path means bin path
[postgres@slave data]$ ps -ef|grep postgres root 3971 3573 0 10:27 pts/0 00:00:00 su postgres postgres 3972 3971 0 10:27 pts/0 00:00:00 bash root 45410 45321 0 20:36 pts/1 00:00:00 su postgres postgres 45411 45410 0 20:36 pts/1 00:00:00 bash postgres 45606 1 0 20:43 pts/1 00:00:00 /usr/pgsql-10/bin/postgres -D /var/lib/pgsql/10/data postgres 45607 45606 0 20:43 ? 00:00:00 postgres: logger process postgres 45608 45606 0 20:43 ? 00:00:00 postgres: startup process recovering 000000010000000000000008 postgres 45610 45606 0 20:43 ? 00:00:00 postgres: checkpointer process postgres 45611 45606 0 20:43 ? 00:00:00 postgres: writer process postgres 45612 45606 0 20:43 ? 00:00:06 postgres: wal receiver process streaming 0/8017B08 postgres 45613 45606 0 20:43 ? 00:00:00 postgres: stats collector process postgres 45995 45411 0 21:16 pts/1 00:00:00 ps -ef postgres 45996 45411 0 21:16 pts/1 00:00:00 grep --color=auto postgres
4.this command will be helpful for how many wal segment get postgres receiver
[postgres@slave data]$ ps -ef|grep receiver; postgres 45612 45606 0 20:43 ? 00:00:06 postgres: wal receiver process streaming 0/8017B08 postgres 46018 45411 0 21:18 pts/1 00:00:00 grep --color=auto receiver
5.If the slave is up in hot standby mode, you can tell the time in seconds the delay of transactions applied on the slave with this query:
postgres=# select now() - pg_last_xact_replay_timestamp() AS replication_delay; replication_delay ------------------- 00:18:35.207663 (1 row)
Note: above timing is noted for last 18 minutes slave not get any transaction
Know More About Replication: https://www.tutorialdba.com/search/label/Replication