HOW TO SETUP/CONFIGURE THE POSTGRESQL STREAMING REPLICATION ON POSTGRESQL 10?
what is the postgresql Streaming Replication (SR) ?
Streaming Replication (SR) provides the capability to continuously ship and apply the WAL XLOG records to some number of standby servers in order to keep them current. This feature was added to PostgreSQL 9.0.
Advantages of postgresql Streaming Replication (SR):
1.Switch over/Fail over after the primary fails.
2.Data loss so less and data integrity.
3.No need more down time after primary fail.
4.synchronous mode Zero data loss.
5. Load balancing using load balancer ex.pgpool-II,pgbouncer.
6.Auto failover using repmgr.
Now We can go postgresql streaming replication setup using postgresql-10.3 version
master IP:Â 192.168.2.2 Â slave IP: 192.168.2.3
ON SLAVE:
stop the slave server
/usr/pgsql-10/bin/pg_ctl -D /var/lib/pgsql/10/data/ stop
move the data directory to backup location:
mv /var/lib/pgsql/10/data/* /home/postgres/backup/
ON MASTER:
on postgresql.conf:
listen_addresses = 'localhost,192.168.2.2'  wal_level = replica           # minimal, replica, or logical archive_mode = on archive_command =  'rsync -av %p /home/postgres/archive/%f && rsync -av %p postgres@192.168.2.3:/home/postgres/archive/%f' max_wal_senders = 2 wal_keep_segments = 10
on pg_hba.conf:
host   replication   postgres     192.168.2.3/24      md5
Restart the master server:
/usr/pgsql-10/bin/pg_ctl -D /var/lib/pgsql/10/data/ restart
taking consistancy backup:
$ psql -c "select pg_start_backup('initial_backup');" $ rsync -cva --inplace /var/lib/pgsql/10/data/* postgres@192.168.2.3:/var/lib/pgsql/10/data/ $ psql -c "select pg_stop_backup();"
ON SLAVE:
on postgresql.conf:
listen_addresses = 'localhost,192.168.2.2'  wal_level = replica           # minimal, replica, or logical archive_mode = on archive_command = '/bin/cp -av %p /home/postgres/archive/%f' max_wal_senders = 2 wal_keep_segments = 10 hot_standby = on
on pg_hba.conf:
host   replication   postgres     192.168.2.2/24      md5
On recovery.conf
cat /var/lib/pgsql/10/data/recovery.conf standby_mode = 'on' Â Â Â # to enable the standby (read-only) mode. primary_conninfo = 'host=192.168.2.2 port=5432 user=postgres password=postgres' Â Â Â Â Â Â Â Â Â Â Â Â Â # to specify a connection info to the master node. trigger_file = '/tmp/pg_failover_trigger' Â Â Â Â Â Â Â Â Â Â Â Â Â # to specify a trigger file to recognize a fail over. restore_command = 'cp /home/postgres/archive/%f "%p"' archive_cleanup_command = '/usr/pgsql-10/bin/pg_archivecleanup /home/postgres/archive/ %r'
Restart the slave server:
/usr/pgsql-10/bin/pg_ctl -D /var/lib/pgsql/10/data/ restart
if you get any error error like arch missing wal log …etc while restarting slave server again take the incremental physical backup
$ psql -c "select pg_start_backup('initial_backup');" $ rsync -cva --inplace --exclude=pg_hba.conf --exclude=postgresql.conf --exclude=recovery.conf /var/lib/pgsql/10/data/* postgres@192.168.2.3:/var/lib/pgsql/10/data/ $ psql -c "select pg_stop_backup();"
WHILE RESTARTING SLAVE SERVER GOT FOLLOWING ERROR:
[postgres@slave root]$ /usr/pgsql-10/bin/pg_ctl -D /var/lib/pgsql/10/data/ restart could not change directory to "/root": Permission denied pg_ctl: PID file "/var/lib/pgsql/10/data/postmaster.pid" does not exist Is server running? starting server anyway waiting for server to start....2018-05-26 20:36:34.834 IST [45453] LOG: Â listening on IPv6 address "::1", port 5432 2018-05-26 20:36:34.834 IST [45453] LOG: Â listening on IPv4 address "127.0.0.1", port 5432 2018-05-26 20:36:34.836 IST [45453] LOG: Â listening on IPv4 address "192.168.2.3", port 5432 2018-05-26 20:36:34.840 IST [45453] LOG: Â listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432" 2018-05-26 20:36:34.849 IST [45453] LOG: Â listening on Unix socket "/tmp/.s.PGSQL.5432" .2018-05-26 20:36:35.685 IST [45453] LOG: Â redirecting log output to logging collector process 2018-05-26 20:36:35.685 IST [45453] HINT: Â Future log output will appear in directory "log".
ON LOG FILE:
tail -f postgresql-2018-05-26_203635.log 2018-05-26 20:36:35.957 IST [45461] FATAL:  could not connect to the primary server: FATAL:  password authentication failed for user "postgres" cp: cannot stat ‘/home/postgres/archive/000000010000000000000006’: No such file or directory 2018-05-26 20:36:40.964 IST [45465] FATAL:  could not connect to the primary server: FATAL:  password authentication failed for user "postgres" cp: cannot stat ‘/home/postgres/archive/000000010000000000000006’: No such file or directory 2018-05-26 20:36:45.963 IST [45467] FATAL:  could not connect to the primary server: FATAL:  password authentication failed for user "postgres" cp: cannot stat ‘/home/postgres/archive/000000010000000000000006’: No such file or directory 2018-05-26 20:36:50.977 IST [45471] FATAL:  could not connect to the primary server: FATAL:  password authentication failed for user "postgres" 2018-05-26 20:36:55.459 IST [45453] LOG:  received fast shutdown request 2018-05-26 20:36:55.465 IST [45457] LOG:  shutting down 2018-05-26 20:36:55.472 IST [45453] LOG:  database system is shut down
on master:
–change the postgres password becouse you given wrong password on recovery.conf file for postgres user so change the password as per recovery.conf
alter user postgres with password 'postgres';
Then Restart the slave server:
 /usr/pgsql-10/bin/pg_ctl -D /var/lib/pgsql/10/data/ restart
MONITORING STREAMING REPLICATION:
At MASTER:
1.create some dummy table check the slave server whther is replicated or not.
postgres=# create table t(id int); CREATE TABLE --check master server whether is recovery mode or not,master will not be recovery mode slave only will be recovery mode. postgres=# select pg_is_in_recovery();  pg_is_in_recovery -------------------  f (1 row)
2.Using pg_stat_replication views
postgres=# select client_addr,client_hostname,client_port,state,sent_lsn,write_lsn,flush_lsn,replay_lsn,write_lag,replay_lag,flush_lag,sync_state from pg_stat_replication;  client_addr | client_hostname | client_port |  state  | sent_lsn  | write_lsn | flush_lsn | replay_lsn | write_lag | replay_lag | flush_lag | sync_state -------------+-----------------+-------------+-----------+-----------+-----------+-----------+------------+-----------+------------+-----------+------------  192.168.2.3 |         |    60000 | streaming | 0/8017A28 | 0/8017A28 | 0/8017A28 | 0/8017A28  |      |       |      | async (1 row)
3.using linux command check the wal sender process whether started or not:
[postgres@mster ~]$ ps -ef|grep postgres root    82472  82437  0 08:34 pts/1   00:00:00 su postgres postgres  82473  82472  0 08:34 pts/1   00:00:00 bash root    94549  94494  0 20:36 pts/3   00:00:00 su postgres postgres  94550  94549  0 20:36 pts/3   00:00:00 bash postgres  94582    1  0 20:36 pts/3   00:00:00 /usr/pgsql-10/bin/postgres -D /var/lib/pgsql/10/data postgres  94584  94582  0 20:36 ?     00:00:00 postgres: logger process postgres  94586  94582  0 20:36 ?     00:00:00 postgres: checkpointer process postgres  94587  94582  0 20:36 ?     00:00:00 postgres: writer process postgres  94588  94582  0 20:36 ?     00:00:00 postgres: wal writer process postgres  94589  94582  0 20:36 ?     00:00:00 postgres: autovacuum launcher process postgres  94591  94582  0 20:36 ?     00:00:00 postgres: stats collector process postgres  94592  94582  0 20:36 ?     00:00:00 postgres: bgworker: logical replication launcher postgres  94741  94582  0 20:43 ?     00:00:00 postgres: wal sender process postgres 192.168.2.3(60000) streaming 0/8017B08 postgres  95178  94550  0 21:08 pts/3   00:00:00 ps -ef postgres  95179  94550  0 21:08 pts/3   00:00:00 grep --color=auto postgres
–you can calculate using this linux command how many wal sender is replicated to slave.
$ ps -ef|grep sender postgres  7585  3383  0 15:59 ?     00:00:00 postgres: wal sender process postgres 192.168.2.2(42586) streaming 0/18017CD8 postgres  7598  6564  0 15:59 pts/2   00:00:00 grep --color=auto sender
AT SLAVE SERVER:
1.check the previously created table whether is replicated or not
postgres=# \dt        List of relations  Schema |   Name    | Type  |  Owner --------+---------------+-------+----------  public | qrtransaction | table | postgres  public | t       | table | postgres (2 rows)
yes! it is successfully replicated the table.
–check the slave mode using following function whether is recovery mode or not
postgres=# select pg_is_in_recovery();  pg_is_in_recovery -------------------  t (1 row)
–and try to insert the valuse on slave server, slave is a read only mode it will not permitted write transaction.
postgres=# insert into t values(1); ERROR: Â cannot execute INSERT in a read-only transaction
2. using pg_stat_replication view:
postgres=# select status,receive_start_lsn,received_lsn,last_msg_send_time,latest_end_lsn,latest_end_time,conninfo from pg_stat_wal_receiver ;  status  | receive_start_lsn | received_lsn |     last_msg_send_time     | latest_end_lsn |     latest_end_time      |                              conninfo -----------+-------------------+--------------+----------------------------------+----------------+----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  streaming | 0/6000000     | 0/8017B08   | 2018-05-26 21:26:34.577733+05:30 | 0/8017B08    | 2018-05-26 21:06:32.309548+05:30 | user=postgres password=******** dbname=replication host=192.168.2.2 port=5432 fallback_application_name=walreceiver sslmode=prefer sslcompression=1 krbsrvname=postgres target_session_attrs=any (1 row)
3.using linux command you can monitor wether wal receiver is started or not.this linux command will be helpful for finding out the postgres data directory as well as postgres utility path means bin path
[postgres@slave data]$ ps -ef|grep postgres root    3971  3573  0 10:27 pts/0   00:00:00 su postgres postgres  3972  3971  0 10:27 pts/0   00:00:00 bash root    45410  45321  0 20:36 pts/1   00:00:00 su postgres postgres  45411  45410  0 20:36 pts/1   00:00:00 bash postgres  45606    1  0 20:43 pts/1   00:00:00 /usr/pgsql-10/bin/postgres -D /var/lib/pgsql/10/data postgres  45607  45606  0 20:43 ?     00:00:00 postgres: logger process postgres  45608  45606  0 20:43 ?     00:00:00 postgres: startup process  recovering 000000010000000000000008 postgres  45610  45606  0 20:43 ?     00:00:00 postgres: checkpointer process postgres  45611  45606  0 20:43 ?     00:00:00 postgres: writer process postgres  45612  45606  0 20:43 ?     00:00:06 postgres: wal receiver process  streaming 0/8017B08 postgres  45613  45606  0 20:43 ?     00:00:00 postgres: stats collector process postgres  45995  45411  0 21:16 pts/1   00:00:00 ps -ef postgres  45996  45411  0 21:16 pts/1   00:00:00 grep --color=auto postgres
4.this command will be helpful for how many wal segment get postgres receiver
[postgres@slave data]$ ps -ef|grep receiver; postgres  45612  45606  0 20:43 ?     00:00:06 postgres: wal receiver process  streaming 0/8017B08 postgres  46018  45411  0 21:18 pts/1   00:00:00 grep --color=auto receiver
5.If the slave is up in hot standby mode, you can tell the time in seconds the delay of transactions applied on the slave with this query:
postgres=# select now() - pg_last_xact_replay_timestamp() AS replication_delay; replication_delay ------------------- 00:18:35.207663 (1 row)
Note:Â above timing is noted for last 18 minutes slave not get any transaction