How to rebuild your postgres Server ?
Currently two postgres server is running which are :
- 10.11.128.81 :
- 10.11.128.82 :
- postgres version : 9.6
By using following query you can find which server is master and slave , mster should be false (f) value and slave should be true (t) value means it is in recovery mode
/opt/edb/as9.6/bin/psql -U enterprisedb -h 10.11.128.81 -d edb -p 5444 -c "select pg_is_in_recovery();"
[nijam@2ndq-1]$ /opt/edb/as9.6/bin/psql -U enterprisedb -h 10.11.128.81 -d edb -p 5444 -c "select pg_is_in_recovery();"
Password for user enterprisedb:
pg_is_in_recovery
-------------------
f
(1 row)
[nijam@2ndq-2]$ /opt/edb/as9.6/bin/psql -U enterprisedb -h 10.11.128.82 -d edb -p 5444 -c "select pg_is_in_recovery();"
Password for user enterprisedb:
pg_is_in_recovery
-------------------
t
(1 row)
Note : so 10.11.128.81 is a master 10.11.128.82 is slave server.
BY following query you can find the Streaming replication is sync or not , if not sync you have to rebuild your slave by taking incremental backup from master :
Checking the streaming replication status :
[nijam@2ndq-1]$ /opt/edb/as9.6/bin/psql -U enterprisedb -h 10.11.128.81 -d edb -p 5444 -c "select * from pg_stat_replication;"
Password for user enterprisedb:
pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | backend_xmin | state | sent_location | write_location | flu
sh_location | replay_location | sync_priority | sync_state
-----+----------+---------+------------------+-------------+-----------------+-------------+---------------+--------------+-------+---------------+----------------+----
------------+-----------------+---------------+------------
(0 rows)
[nijam@2ndq-1]$ ps -ef|grep postgres
enterpr+ 11054 1 0 2018 ? 00:01:05 /opt/edb/as9.6/bin/edb-postgres -D /opt/edb/as9.6/data
enterpr+ 11055 11054 0 2018 ? 00:00:00 postgres: logger process
enterpr+ 11059 11054 0 2018 ? 00:02:23 postgres: checkpointer process
enterpr+ 11060 11054 0 2018 ? 00:01:03 postgres: writer process
enterpr+ 11061 11054 0 2018 ? 00:01:41 postgres: stats collector process
465632 16900 12814 0 12:58 pts/0 00:00:00 grep --color=auto postgres
enterpr+ 23147 11054 0 Jan01 ? 00:01:33 postgres: wal writer process
enterpr+ 23148 11054 0 Jan01 ? 00:00:59 postgres: autovacuum launcher process
enterpr+ 23149 11054 0 Jan01 ? 00:00:07 postgres: archiver process last was 00000005.history
enterpr+ 23150 11054 0 Jan01 ? 00:00:26 postgres: bgworker: dbms_aq launcher
enterpr+ 23246 11054 0 Jan01 ? 00:00:00 postgres: enterprisedb edb [local] DROP DATABASE waiting for 0/27001220
enterpr+ 23843 11054 0 Jan01 ? 00:00:00 postgres: enterprisedb edb [local] DROP DATABASE waiting for 0/27002360
Rebuild the slave Server :
su enterprisedb
Stop the Slave server :
/opt/edb/as9.6/bin/pg_ctl -D /opt/edb/as9.6/data stop
Send the data to salve again before sending data don’t move exists data it will append everything
/opt/edb/as9.6/bin/psql -U enterprisedb -h 10.11.128.81 -d edb -p 5444 -c "select pg_start_backup('initial_backup');"
rsync -cva --inplace /opt/edb/as9.6/data/* enterprisedb@10.11.128.82:/opt/edb/as9.6/data/
/opt/edb/as9.6/bin/psql -U enterprisedb -h 10.11.128.81 -d edb -p 5444 -c "select pg_stop_backup();"
Start the postgreSQL Slave server again :
/opt/edb/as9.6/bin/pg_ctl -D /opt/edb/as9.6/data start
Switchover steps:
Step 1. Do clean shutdown of Primary[5432] (-m fast or smart)
/opt/edb/as9.6/bin/pg_ctl -D /opt/edb/as9.6/data stop -mf
waiting for server to shut down.... done
server stopped
Step 2. Check for sync status and recovery status of Standby[5433] before promoting it:
/opt/edb/as9.6/bin/psql -p 5444 -c 'select pg_last_xlog_receive_location() "receive_location",
pg_last_xlog_replay_location() "replay_location",
pg_is_in_recovery() "recovery_status";'
receive_location | replay_location | recovery_status
------------------+-----------------+-----------------
2/9F000A20 | 2/9F000A20 | t
(1 row)
Standby in complete sync. At this stage we are safe to promote it as Primary.
Standby in complete sync. At this stage we are safe to promote it as Primary.
Step 3. Open the Standby as new Primary by pg_ctl promote or creating a trigger file.
[postgres@localhost:/opt/edb/as9.6/data~]$ cat recovery.conf
trigger_file = '/tmp/primary_down.txt'
[postgres@localhost:/opt/edb/as9.6/data~]$ touch /tmp/primary_down.txt
[postgres@localhost:/opt/edb/as9.6/data~]$ psql -p 5444 -c "select pg_is_in_recovery();"
pg_is_in_recovery
-------------------
f
(1 row)
In Logs:
2014-12-29 00:16:04 PST-26344-- [host=] LOG: trigger file found: /tmp/primary_down.txt
2014-12-29 00:16:04 PST-26344-- [host=] LOG: redo done at 2/A0000028
2014-12-29 00:16:04 PST-26344-- [host=] LOG: selected new timeline ID: 14
2014-12-29 00:16:04 PST-26344-- [host=] LOG: restored log file "0000000D.history" from archive
2014-12-29 00:16:04 PST-26344-- [host=] LOG: archive recovery complete
2014-12-29 00:16:04 PST-26342-- [host=] LOG: database system is ready to accept connections
2014-12-29 00:16:04 PST-31874-- [host=] LOG: autovacuum launcher started
Standby has been promoted as master and a new timeline followed which you can notice in logs.
Standby has been promoted as master and a new timeline followed which you can notice in logs.
Step 4. Restart old Primary as standby and allow to follow the new timeline by passing “recovery_target_timline=’latest'” in $PGDATA/recovery.conf file.
[postgres@localhost:/opt/edb/as9.6/data~]$ cat data/recovery.conf
recovery_target_timeline = 'latest'
standby_mode = on
primary_conninfo = 'host=localhost port=5444 user=enterprisedb'
restore_command = 'cp /opt/edb/as9.6/data/archives96/%f %p'
trigger_file = '/tmp/primary_131_down.txt'
[postgres@localhost:/opt/edb/as9.6/data~]$ /opt/edb/as9.6/bin/pg_ctl -D /opt/edb/as9.6/data start
server starting
If you go through recovery.conf its very clear that old Primary trying to connect to 5433 port as new Standby pointing to common WAL Archives location and started.
In Logs:
2014-12-29 00:21:17 PST-32315-- [host=] LOG: database system was shut down at 2014-12-29 00:12:23 PST
2014-12-29 00:21:17 PST-32315-- [host=] LOG: restored log file "0000000E.history" from archive
2014-12-29 00:21:17 PST-32315-- [host=] LOG: entering standby mode
2014-12-29 00:21:17 PST-32315-- [host=] LOG: restored log file "0000000D00000002000000A0" from archive
2014-12-29 00:21:17 PST-32315-- [host=] LOG: restored log file "0000000D.history" from archive
2014-12-29 00:21:17 PST-32315-- [host=] LOG: consistent recovery state reached at 2/A0000090
2014-12-29 00:21:17 PST-32315-- [host=] LOG: record with zero length at 2/A0000090
2014-12-29 00:21:17 PST-32310-- [host=] LOG: database system is ready to accept read only connections
2014-12-29 00:21:17 PST-32325-- [host=] LOG: started streaming WAL from primary at 2/A0000000 on timeline 14
Step 5. Verify the new Standby status.
[postgres@localhost:/opt/edb/as9.6/bin/~]$ psql -p 5444 -c "select pg_is_in_recovery();"
pg_is_in_recovery
-------------------
t
(1 row)
Cool, without any re-setup we have brought back old Primary as new Standby.