        Greenplum Database Installation

        Overview

• The world's first open-source, massively parallel data platform.
• Greenplum is based on the PostgreSQL database.

        Minimum recommended specifications for servers intended to support Greenplum Database in a production environment.

        • Operating System
        • SUSE Linux SLES 10.2 or higher
        • CentOS 5.0 or higher
        • RedHat Enterprise Linux 5.0 or higher
        • Oracle Unbreakable Linux 5.5
        • Solaris x86 v10 update 7

        User
        gpadmin on all nodes

        File Systems

        • xfs required for data storage on SUSE Linux and Red Hat (ext3 supported for root file system)
        • zfs required for data storage on Solaris (ufs supported for root file system)

        Minimum CPU

        • Pentium Pro compatible (P3/Athlon and above)

        Minimum Memory

        • 16 GB RAM per server

        Disk Requirements

        • 150MB per host for Greenplum installation
• Approximately 300MB per segment instance for metadata
        • Appropriate free space for data with disks at no more than 70% capacity
        • High-speed, local storage

        Network Requirements

        • Gigabit Ethernet within the array
        • Dedicated, non-blocking switch

        Software and Utilities

        • bash shell
        • GNU tar
        • GNU zip
        • GNU readline (Solaris only)

OS: Red Hat Enterprise Linux (the example hosts in this tutorial run RHEL 7.3)

Host Name          IP-Address     Host Type
2ndquadrant.in-1   192.168.2.2    Master
2ndquadrant.in-2   192.168.2.3    Standby master, segment host
2ndquadrant.in-3   192.168.2.4    Segment host
2ndquadrant.in-4   192.168.2.5    Segment host

Here's a checklist of the steps, to keep on your desk:

1. Configure your operating system for Greenplum.
2. Install Greenplum on the master host.
3. Run gpseginstall to install and configure Greenplum Database on the other hosts.
4. Create the master data directory on the master.
5. Create the master data directory location on your standby master (gpssh can help here).
6. Create data directories on all segments (gpssh can help here).
7. Use Network Time Protocol (NTP) to synchronize the system clocks on all hosts.
8. Validate your systems.
9. Initialize the Greenplum Database system.
10. Configure environment variables and client authentication.

Step 1. Configure your operating system for Greenplum Database

Greenplum requires certain operating system (OS) parameters to be set on all hosts in your Greenplum Database system (master and segments).

• Change the hostname on each node, then edit the /etc/hosts file and make sure it includes the host names and all interface address names of every machine participating in your Greenplum Database system.

        At Master:

        #cat /etc/redhat-release
        Red Hat Enterprise Linux Server release 7.3 Beta (Maipo)

-- Network configuration

        # cd /etc/sysconfig/network-scripts/
        # vi ifcfg-ens33
        BOOTPROTO=static
        IPADDR=192.168.2.2
        NETMASK=255.255.255.0
        DEVICE="ens33"
        ONBOOT=yes
        #vi /etc/sysconfig/network
        NETWORKING=yes
        HOSTNAME=2ndquadrant.in-1
        GATEWAY=192.168.2.1
        ## Configure DNS Server
        # vi /etc/resolv.conf
nameserver 8.8.8.8      # replace with your nameserver IP
nameserver 192.168.2.1  # replace with your nameserver IP

-- Hosts entry

        # vi /etc/hosts
        192.168.2.2  2ndquadrant.in-1  2ndquadrant.in
        192.168.2.3  2ndquadrant.in-2  2ndquadrant.in
        192.168.2.4  2ndquadrant.in-3  2ndquadrant.in
        192.168.2.5  2ndquadrant.in-4  2ndquadrant.in

-- Restart the network services

        # systemctl restart network
                (OR)
        # service network restart

Note: Change the hostname on each segment and standby node using the same method. For more details, see:

        https://www.tutorialdba.com/2018/03/how-to-make-partition-and-configure.html
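A minimal sketch for setting the hostname on RHEL 7 uses hostnamectl (run it on each node with that node's own name; the standby master is shown as an example):

# hostnamectl set-hostname 2ndquadrant.in-2
# hostnamectl status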

• Add the following kernel parameters to the end of the /etc/sysctl.conf file on all nodes, then run sysctl -p. Note: XFS is the preferred file system on Linux platforms for data storage; Greenplum recommends mounting XFS data file systems with the options rw,noatime,inode64,allocsize=16m.
        kernel.shmmax = 500000000
        kernel.shmmni = 4096
        kernel.shmall = 4000000000
        kernel.sem = 250 512000 100 2048
        kernel.sysrq = 1
        kernel.core_uses_pid = 1
        kernel.msgmnb = 65536
        kernel.msgmax = 65536
        kernel.msgmni = 2048
        net.ipv4.tcp_syncookies = 1
        net.ipv4.ip_forward = 0
        net.ipv4.conf.default.accept_source_route = 0
        net.ipv4.tcp_tw_recycle = 1
        net.ipv4.tcp_max_syn_backlog = 4096
        net.ipv4.conf.all.arp_filter = 1
        net.ipv4.ip_local_port_range = 1025 65535
        net.core.netdev_max_backlog = 10000
        vm.overcommit_memory = 2
        
-- Then run:
# sysctl -p
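To spot-check that the new values are active (sysctl echoes each key back):

# sysctl kernel.shmmax
kernel.shmmax = 500000000
# sysctl vm.overcommit_memory
vm.overcommit_memory = 2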
• Set the following parameters in the /etc/security/limits.conf file:
* soft nofile 65536
* hard nofile 65536
* soft nproc 131072
* hard nproc 131072

Note:

On RHEL/CentOS 6, the nproc values in limits.conf are overridden by /etc/security/limits.d/90-nproc.conf, so set them there as well:

* soft nproc 131072
* hard nproc 131072
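Limits apply only to new login sessions, so log in again and verify:

$ ulimit -n    # open files; expect 65536
$ ulimit -u    # max user processes; expect 131072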
• The Linux disk I/O scheduler supports several policies, such as CFQ, AS, and deadline. Greenplum recommends the deadline scheduler.

        To specify a scheduler, run the following:

echo 'deadline' > /sys/block/devname/queue/scheduler   # replace devname with the device, e.g. sda
• Each disk device file should have a read-ahead (blockdev) value of 16384.
To verify the read-ahead value of a disk device:
# /sbin/blockdev --getra /dev/sdb
Set the value for each data disk (for example sda, sdb, sdc).
To set the read-ahead value on a device:
# /sbin/blockdev --setra 16384 /dev/sd?
To make the settings persist across reboots, add them to /etc/rc.local, and disable transparent huge pages at the same time:
blockdev --setra 16384 /dev/sd?
echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled
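To confirm transparent huge pages are disabled (the active setting appears in brackets):

# cat /sys/kernel/mm/redhat_transparent_hugepage/enabled
always madvise [never]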
• Turn off the firewall and SELinux.
        These commands disable firewalld when run as root:
        # systemctl stop firewalld.service
        # systemctl disable firewalld.service
        # chkconfig iptables off
        # service iptables stop
        
Disable SELinux by editing its configuration file as root (a reboot is required for this to take effect):
        # vi /etc/selinux/config
          SELINUX=disabled
• Create a user for Greenplum named gpadmin, in group gpadmin, with a bash shell and home directory /home/gpadmin.
Add the Greenplum Database admin account (create the group first, then the user):
# groupadd gpadmin
# useradd gpadmin -g gpadmin -m -d /home/gpadmin -s /bin/bash
# passwd gpadmin
New password: gpadmin
Retype new password: gpadmin
• Create a directory for the host files and configuration files that will be used in the installation:
          # cd /home/gpadmin
          # mkdir gpconfigs

        Step 2. Install Greenplum on master host.

There are two ways to install Greenplum Database:

        1. Installing the RPM Distribution
        2. Installing the Binary Distribution

        If you do not have root access on the master host machine, run the binary installer as the gpadmin user and install the software into a directory in which you have write permission.

1. Installing the RPM Distribution

        1. Log in as root on the machine that will become the Greenplum Database master host.
        2. Download or copy the RPM distribution file to the master host machine. The RPM distribution filename has the format greenplum-db-<version>-<platform>.rpm where <platform> is similar to RHEL7-x86_64 (Red Hat 64-bit) or SuSE12-x86_64 (SuSe Linux 64 bit).
        3. Install the local RPM file:
        # rpm -Uvh ./greenplum-db-<version>-<platform>.rpm
        Preparing...                ########################################### [100%]
        1:greenplum-db           ########################################### [100%]

        The RPM installation copies the Greenplum Database software into a version-specific directory, /usr/local/greenplum-db-<version>.

4. Change the ownership and group of the installed files to gpadmin:
        # chown -R gpadmin /usr/local/greenplum*
        # chgrp -R gpadmin /usr/local/greenplum*

2. Installing the Binary Distribution

1. Log in as root on the machine that will become the Greenplum Database master host. If you do not have root access, run the binary installer as the gpadmin user and install the software into a directory in which you have write permission.
        2. Download or copy the Binary Installation distribution file to the master host machine. The Binary Installer distribution filename has the format greenplum-db-<version>-<platform>.zip where <platform> is similar to RHEL7-x86_64 (Red Hat 64-bit) or SuSE12-x86_64 (SuSe Linux 64 bit).
        3. Unzip the installer file:
        # unzip greenplum-db-<version>-<platform>.zip
4. Launch the installer using bash:
        # /bin/bash greenplum-db-<version>-<platform>.bin
5. The installer prompts you to accept the Greenplum Database license agreement. Type yes to accept it.
6. The installer prompts you to provide an installation path. Press ENTER to accept the default install path (/usr/local/greenplum-db-<version>), or enter an absolute path to a custom install location. You must have write permission to the location you specify.
7. The installer installs the Greenplum Database software and creates a greenplum-db symbolic link one directory level above the version-specific installation directory. The symbolic link is used to facilitate patch maintenance and upgrades between versions. The installed location is referred to as $GPHOME.
8. If you installed as root, change the ownership and group of the installed files to gpadmin:
        # chown -R gpadmin /usr/local/greenplum*
        # chgrp -R gpadmin /usr/local/greenplum*

        Ref: important Greenplum database files and directories

Step 3. Run gpseginstall to install and configure Greenplum Database on the other hosts

        1. Log in to the master host as root:
        $ su -
2. Source the path file from your master host's Greenplum Database installation directory:
        # source /usr/local/greenplum-db/greenplum_path.sh
3. Create a file called hostfile_exkeys that lists the machine-configured host names and host addresses (interface names) of each host in your Greenplum system (master, standby master, and segments):
        # vi hostfile_exkeys
              2ndquadrant.in-1
              2ndquadrant.in-2 
              2ndquadrant.in-3
              2ndquadrant.in-4
4. Run the following command to exchange ssh keys among all hosts:
# gpssh-exkeys -f hostfile_exkeys

5. Run the gpseginstall utility referencing the hostfile_exkeys file you just created. This example runs the utility as root. The utility creates the Greenplum system user gpadmin on all hosts and sets that user's password to "gpadmin" on all hosts.

# gpseginstall -f hostfile_exkeys -u gpadmin -p gpadmin

        Note:

        • Use the -u and -p options to specify a different system user and password.
        • Do not use the default password option for production environments.
        • Change the password immediately after installation.

        Validating the Installation:

To make sure the Greenplum software was installed and configured correctly, run the following confirmation steps from your Greenplum master host as the gpadmin user.

        • Log in to the master host as gpadmin:
        $ su - gpadmin
        • Source the path file from Greenplum Database installation directory:
        $ source /usr/local/greenplum-db/greenplum_path.sh
• Use the gpssh utility to see if you can log in to all hosts without a password prompt, and to confirm that the Greenplum software was installed on all hosts. Use the hostfile_exkeys file you used for installation. For example:
$ gpssh -f hostfile_exkeys -e ls -l $GPHOME

        If the installation was successful, you should be able to log in to all hosts without a password prompt. All hosts should show that they have the same contents in their installation directories, and that the directories are owned by the gpadmin user.

        If you are prompted for a password, run the following command to redo the ssh key exchange:

$ gpssh-exkeys -f hostfile_exkeys

Optional installations:

1. Installing Oracle Compatibility Functions

Many Oracle-compatibility SQL functions are available in Greenplum Database.

You need to run the installation script $GPHOME/share/postgresql/contrib/orafunc.sql once for each database where you want the functions available.

        To install:

        $ psql -d testdb -f $GPHOME/share/postgresql/contrib/orafunc.sql

        To uninstall Oracle Compatibility Functions, use the script:

        $GPHOME/share/postgresql/contrib/uninstall_orafunc.sql

        Note: The following functions are available by default and can be accessed without running the Oracle Compatibility installer: sinh, tanh, cosh and decode.
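As a quick sanity check, those default functions can be called from psql (testdb is the example database name used above):

$ psql -d testdb -c 'SELECT sinh(1), tanh(1), cosh(1);'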

        2. Greenplum Database Extensions

Use gppkg (the Greenplum package manager) to install Greenplum Database extensions such as pgcrypto, PL/R, PL/Java, PL/Perl, and PostGIS, along with their dependencies, across the entire cluster.

        For More info: https://gpdb.docs.pivotal.io/43230/install_guide/refs/gppkg.html
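For example, installing a downloaded extension package across the cluster (the filename here is a placeholder; substitute the actual .gppkg file you downloaded):

$ gppkg -i pgcrypto-<version>-<platform>.gppkg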

        Step 4. Create master directory on the master

The master does not store any user data; only the system catalog tables and system metadata are stored on the master instance. You therefore do not need to designate as much storage space as on the segments.

Note: For how to create a data partition on RHEL 7, see https://www.tutorialdba.com/2018/03/how-to-make-partition-and-configure.html. That guide creates the /data partition with postgres user permissions; change the ownership to gpadmin instead.

On the master, as root:

        # su -
        # mkdir /data/master

        Change ownership of this directory to the gpadmin user:

        # chown gpadmin /data/master

        Step 5. Create the master data directory location on your standby master

        As root User:

        Using gpssh, create the master data directory location on your standby master as well. For example:

        # source /usr/local/greenplum-db-4.3.x.x/greenplum_path.sh 
        # gpssh -h 2ndquadrant.in-2  -e 'mkdir /data/master'
        # gpssh -h 2ndquadrant.in-2  -e 'chown gpadmin /data/master'

        Step 6. Create data directories on all segments:

        1. On the master host, log in as root:
        su -
2. Create a file called hostfile_gpssh_segonly. This file should list only one machine-configured host name per segment host. For example, if you have three segment hosts:
        2ndquadrant.in-2 
        2ndquadrant.in-3
        2ndquadrant.in-4
3. Using gpssh, create the primary and mirror data directory locations on all segment hosts at once, using the hostfile_gpssh_segonly file you just created. For example:
        # source /usr/local/greenplum-db-4.3.x.x/greenplum_path.sh 
        # gpssh -f hostfile_gpssh_segonly -e 'mkdir /data/primary'
        # gpssh -f hostfile_gpssh_segonly -e 'mkdir /data/mirror'
        # gpssh -f hostfile_gpssh_segonly -e 'chown gpadmin /data/primary'
        # gpssh -f hostfile_gpssh_segonly -e 'chown gpadmin /data/mirror'
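An optional quick check that the directories now exist with gpadmin ownership on every segment host:

# gpssh -f hostfile_gpssh_segonly -e 'ls -ld /data/primary /data/mirror'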

Step 7. Use Network Time Protocol (NTP) to synchronize the system clocks on all hosts.

• All hosts require the same time zone setting for time consistency.
• All hosts should synchronize time over the network. Internal clocks tend to drift, which can lead to inconsistent timestamps in server and client log files and to problems when replicating server resources or databases.
        • Greenplum recommends using NTP (Network Time Protocol) to synchronize the system clocks on all hosts that comprise your Greenplum Database system.
        • NTP on the segment hosts should be configured to use the master host as the primary time source, and the standby master as the secondary time source.
        • On the master and standby master hosts, configure NTP to point to your preferred time server.

1. Installing and configuring NTP on the master:

The NTP service is installed from a yum repository using the following command:

        # yum install ntp

        Turn on the NTP service and make sure it starts automatically on reboot.

        # service ntpdate start
        # service ntpd start
        # chkconfig ntpd on
        # chkconfig ntpdate on

        The NTP service is configured using the “/etc/ntp.conf” file. Configuration changes have to be followed by a restart of the NTP service.

        # service ntpd restart

        Checking the synchronization status and your system time

        # ps -ef | grep ntp
        # ntpq -p
        # date -R

2. On each segment host, log in as root and edit the /etc/ntp.conf file so that segments use the master as the primary time source and the standby master as the secondary:

        server 2ndquadrant.in-1 prefer
        server 2ndquadrant.in-2

3. On the master host, use the NTP daemon to synchronize the system clocks on all Greenplum hosts. For example, using gpssh:

        # gpssh -f /home/gpadmin/gpconfigs/hostfile_gpssh_segonly -v -e 'yum install ntp; service ntpdate start; service ntpd start; chkconfig ntpd on ; chkconfig ntpdate on'

Re-enabling the firewall

You should re-enable the firewall after the installation is complete.

        as root:

        # systemctl start firewalld.service
        # systemctl enable firewalld.service
        # chkconfig iptables on
        # service iptables start

-- Checking iptables rules

        # cat /etc/sysconfig/iptables

Step 8. Validating your systems

Use the gpcheck and gpcheckperf utilities to validate the configuration and performance of your systems.

        The following tests should be run prior to initializing your Greenplum Database system.

        1. Validating OS Settings (gpcheck can help here)
        2. Validating Network Performance (gpcheckperf can help here)
        3. Validating Disk I/O and Memory Bandwidth (gpcheckperf can help here)
• gpcheck verifies that all hosts in your array have the recommended OS settings for running a production Greenplum Database system.
• gpcheckperf calls gpssh and gpscp, so these Greenplum utilities must be in your $PATH.
1. Validating OS settings

1. Log in on the master host as gpadmin.
2. Source the greenplum_path.sh file from your Greenplum installation. For example:
$ source /usr/local/greenplum-db/greenplum_path.sh
3. Create a file called hostfile_gpcheck that has the machine-configured host names of each Greenplum host (master, standby master, and segments), one host name per line. Make sure there are no blank lines or extra spaces. For example:
        2ndquadrant.in-1
        2ndquadrant.in-2
        2ndquadrant.in-3
        2ndquadrant.in-4
4. Run the gpcheck utility using the host file you just created. For example:
        $ gpcheck -f hostfile_gpcheck -m 2ndquadrant.in-1 -s 2ndquadrant.in-2
5. After gpcheck finishes verifying OS parameters on all hosts (master and segments), you might be prompted to modify certain OS parameters before initializing your Greenplum Database system.

2. Validating network performance (optional)

        To test network performance, run gpcheckperf with one of the network test run options: parallel pair test (-r N), serial pair test (-r n), or full matrix test (-r M). The utility runs a network benchmark program that transfers a 5 second stream of data from the current host to each remote host included in the test. By default, the data is transferred in parallel to each remote host and the minimum, maximum, average and median network transfer rates are reported in megabytes (MB) per second. If the summary transfer rate is slower than expected (less than 100 MB/s), you can run the network test serially using the -r n option to obtain per-host results. To run a full-matrix bandwidth test, you can specify -r M which will cause every host to send and receive data from every other host specified. This test is best used to validate if the switch fabric can tolerate a full-matrix workload.

Most systems in a Greenplum Database array are configured with multiple network interface cards (NICs), each NIC on its own subnet. When testing network performance, it is important to test each subnet individually. For example, consider the following network configuration of two NICs per host:

Network Interface Configuration:

Greenplum Host     Subnet1 NICs   Subnet2 NICs
2ndquadrant.in-2   2nd2-1         2nd2-2
2ndquadrant.in-3   2nd3-1         2nd3-2
2ndquadrant.in-4   2nd4-1         2nd4-2

You would create two distinct host files for use with the gpcheckperf network test:

Network Test Host File Contents:
hostfile_gpchecknet_ic1   hostfile_gpchecknet_ic2
2nd2-1                    2nd2-2
2nd3-1                    2nd3-2
2nd4-1                    2nd4-2

        You would then run gpcheckperf once per subnet. For example (run in parallel pairs test mode):

        $ gpcheckperf -f hostfile_gpchecknet_ic1 -r N -d /tmp > subnet1.out
        $ gpcheckperf -f hostfile_gpchecknet_ic2 -r N -d /tmp > subnet2.out

        Note:

• If you have an odd number of hosts to test, run in serial test mode (-r n).
• If you are testing an even number of hosts, run in parallel pairs test mode.
• If hosts have multiple unbonded NICs, the convention is to append a dash (-) and a number to the host name. For example, sdw1-1 and sdw1-2 are the two interface names for host sdw1. However, NIC bonding is recommended to create a load-balanced, fault-tolerant network; for more about bonding types, see https://www.interserver.net/tips/kb/network-bonding-types-network-bonding/

3. Validating disk I/O and memory bandwidth (optional)

Disk I/O performance (dd test) -- the disk test uses the dd command (a standard UNIX utility) to test the sequential throughput performance of a logical disk or file system.

Memory bandwidth (stream test) -- the memory test uses the STREAM benchmark program to measure sustainable memory bandwidth, reported in MB per second (MB/s). To run both, invoke gpcheckperf with the disk and stream test options (-r ds).

        To run the disk and stream tests

1. Log in on the master host as gpadmin.
2. Source the greenplum_path.sh file from your Greenplum installation. For example:
        $ source /usr/local/greenplum-db/greenplum_path.sh
3. Create a host file named hostfile_gpcheckperf that has one host name per segment host. Do not include the master host. For example:
        2ndquadrant.in-2
        2ndquadrant.in-3
        2ndquadrant.in-4
4. Run the gpcheckperf utility using the hostfile_gpcheckperf file you just created. Use the -d option to specify the file systems you want to test on each host (you must have write access to these directories). You will want to test all primary and mirror segment data directory locations. For example:
        $ gpcheckperf -f hostfile_gpcheckperf -r ds -D \
        -d /data/primary \
        -d /data/mirror
5. The utility may take a while to perform the tests, as it copies very large files between the hosts. When it is finished, you will see the summary results for the Disk Write, Disk Read, and Stream tests.

        Step 9. Initialize Greenplum Database System

Steps to initialize the Greenplum database:

        1. Create a host file that contains the host addresses of your segments.
2. Create the Greenplum Database configuration file (gpinitsystem_config, similar to an Oracle pfile). By default, Greenplum Database is initialized using the locale of the master host system. Make sure this is the locale you want, as some locale options cannot be changed after initialization.
        3. Run the Greenplum Database initialization utility on the master host.

         

        1. Create a host file that contains the host addresses of your segments.
        • Log in as gpadmin.
        $ su - gpadmin
• Create a file named hostfile_gpinitsystem. In this file, add the host address names of your segment host interfaces, one per line, with no extra lines or spaces. For example, if you have three segment hosts with two network interfaces each:
        $ vi hostfile_gpinitsystem
        2nd2-1
        2nd2-2
        2nd3-1
        2nd3-2
        2nd4-1
        2nd4-2

        Save and close the file.

2. Creating the Greenplum Database configuration file (gpinitsystem_config, similar to an Oracle pfile)

        Your Greenplum Database configuration file tells the gpinitsystem utility how you want to configure your Greenplum Database system. An example configuration file can be found in $GPHOME/docs/cli_help/gpconfigs/gpinitsystem_config.

        • Log in as gpadmin.
        $ su - gpadmin
• Make a copy of the gpinitsystem_config file into the gpadmin user's configuration directory:
        $ cp $GPHOME/docs/cli_help/gpconfigs/gpinitsystem_config /home/gpadmin/gpconfigs/
• Open the file you just copied in a text editor. Set all of the required parameters according to your environment. See gpinitsystem for more information. A Greenplum Database system must contain a master instance and at least two segment instances (even if setting up a single-node system). The DATA_DIRECTORY parameter determines how many segments per host will be created. If your segment hosts have multiple network interfaces, and you used their interface address names in your host file, the number of segments will be evenly spread over the number of available interfaces.

        Here is an example of the required parameters in the gpinitsystem_config file:

$ cat /home/gpadmin/gpconfigs/gpinitsystem_config
	ARRAY_NAME="2ndquadrant.in"
	SEG_PREFIX=2ndseg
	PORT_BASE=40000
	declare -a DATA_DIRECTORY=(/data/primary /data/primary /data/primary)
	MASTER_HOSTNAME=2ndquadrant.in-1
	MASTER_DIRECTORY=/data/master
	MASTER_PORT=5432
	TRUSTED_SHELL=ssh
	CHECK_POINT_SEGMENTS=8
	ENCODING=UNICODE
• (Optional) If you want to deploy mirror segments, uncomment and set the mirroring parameters according to your environment. Here is an example of the optional mirror parameters in the gpinitsystem_config file:
        MIRROR_PORT_BASE=50000
        REPLICATION_PORT_BASE=41000
        MIRROR_REPLICATION_PORT_BASE=51000
        declare -a MIRROR_DATA_DIRECTORY=(/data/mirror /data/mirror /data/mirror)
        • Save and close the file.

        Note: You can initialize your Greenplum system with primary segments only and deploy mirrors later using the gpaddmirrors utility.

3. Run the Greenplum Database initialization utility on the master host

        The gpinitsystem utility will create a Greenplum Database system using the values defined in the configuration file.

        Syntax:

        gpinitsystem -c cluster_configuration_file 
                    [-h hostfile_gpinitsystem]
                    [-B parallel_processes] 
                    [-p postgresql_conf_param_file]
                    [-s standby_master_host [-P standby_master_port]
                      [-F standby_master_filespaces]]         
                    [-m number | --max_connections=number]
                    [-b size | --shared_buffers=size]
                    [-n locale | --locale=locale] [--lc-collate=locale] 
                    [--lc-ctype=locale] [--lc-messages=locale] 
                    [--lc-monetary=locale] [--lc-numeric=locale] 
                    [--lc-time=locale]
                    [-e password | --su_password=password] 
                    [-S] [-i] [-a] [-q] [-l logfile_directory] [-D]
                    [-I input_configuration_file]
                    [-O output_configuration_file]
        
        gpinitsystem -v
        
        gpinitsystem -h

        To run the initialization utility

1. Run the following command, referencing the path and file name of your initialization configuration file (gpinitsystem_config) and host file (hostfile_gpinitsystem). For a fully redundant system (with a standby master and a spread mirror configuration), include the -s and -S options. For example:
        $ gpinitsystem -c gpconfigs/gpinitsystem_config -h gpconfigs/hostfile_gpinitsystem \
         -s 2ndquadrant.in-2 -S -b 500MB
2. The utility verifies your setup information and makes sure it can connect to each host and access the data directories specified in your configuration. If all of the pre-checks are successful, the utility prompts you to confirm your configuration. For example:
        => Continue with Greenplum creation? Yy/Nn
3. Press y to start the initialization.
4. The utility then begins setup and initialization of the master instance and each segment instance in the system. Each segment instance is set up in parallel. Depending on the number of segments, this process can take a while.
5. At the end of a successful setup, the utility starts your Greenplum Database system. You should see:
        => Greenplum Database instance successfully created.
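After a successful initialization, you can review the state of the new system with the gpstate utility (as gpadmin, with greenplum_path.sh sourced):

$ gpstate -s    # detailed status of the master, standby, and all segments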

Step 10. Configuring environment variables and client authentication

1. Set up your user environment for Greenplum (.bash_profile)

        Set these in your user’s startup shell profile (such as ~/.bashrc or ~/.bash_profile), or in /etc/profile if you want to set them for all users.

        # su - gpadmin
        
        $ vi .bash_profile
        source /usr/local/greenplum-db/greenplum_path.sh
        export MASTER_DATA_DIRECTORY=/data/master/2ndseg-1
        export PGDATABASE=postgres
        export PGPORT=5432
        export PGUSER=gpadmin 
        export PGPASSWORD=gpadmin

Note: with the PGPASSWORD variable or a .pgpass file, you can access the Greenplum database without a password prompt.
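Of the two, a .pgpass file is generally preferable because the password does not appear in the process environment. Each line has the format hostname:port:database:username:password, and the file must be readable only by its owner. For example:

$ vi ~/.pgpass
*:5432:*:gpadmin:gpadmin
$ chmod 600 ~/.pgpass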

2. pg_hba.conf -- host-based authentication

• Client access and authentication is controlled by the standard PostgreSQL host-based authentication file, pg_hba.conf.
        • The authentication method for local database connections is initially trust, which means that Greenplum Database trusts any user logged in to the master server to access the database system using whatever database role they specify. Trust authentication is insufficiently secure for a production database system. You should replace all instances of trust authentication in the pg_hba.conf file with a stronger authentication method, such as ident for local connections or md5 passwords for remote connections.
• The pg_hba.conf file controls only who can initiate a database session and how those connections are authenticated. Database access itself is controlled by setting object privileges.
• Every Greenplum user is authenticated and authorized using five fields: TYPE, DATABASE, USER, ADDRESS, and METHOD.

Each remote client access record should have the following format:

        #TYPE     DATABASE-NAME  USER-NAME            ADDRESS              AUTHENTICATION METHOD
        host       database  	   user     	   address/CIDR      	auth-method  [auth-options]
        hostssl    database  	   user     	   address/CIDR      	auth-method  [auth-options]
        hostnossl  database        user     	   address/CIDR      	auth-method  [auth-options]
        host       database        user  	IP-address  IP-mask  	auth-method  [auth-options]
        hostssl    database        user  	IP-address  IP-mask  	auth-method  [auth-options]
        hostnossl  database        user  	IP-address  IP-mask  	auth-method  [auth-options]

Each UNIX-domain socket access record (any user on the local system) should have the following format:

        #TYPE     DATABASE-NAME   USER-NAME         AUTHENTICATION METHOD
        local       database         user         auth-method  [auth-options]

Note: an IP-mask of 255.0.0.0 is equivalent to a CIDR mask length of /8, and 255.255.255.255 to /32.

        Editing pg_hba.conf

        1. Open the file $MASTER_DATA_DIRECTORY/pg_hba.conf in a text editor.
        2. Add a line to the file for each type of connection you want to allow. Records are read sequentially, so the order of the records is significant. Typically, earlier records will have tight connection match parameters and weaker authentication methods, while later records will have looser match parameters and stronger authentication methods. For example:
        # allow the gpadmin user local access to all databases
        # using ident authentication
        local   all   gpadmin   ident         sameuser
        host    all   gpadmin   127.0.0.1/32  ident
        host    all   gpadmin   ::1/128       ident
        # allow the 'dba' role access to any database from any 
        # host with IP address 192.168.x.x and use md5 encrypted 
        # passwords to authenticate the user
        # Note that to use SHA-256 encryption, replace md5 with 
        # password in the line below
        host    all   dba   192.168.0.0/32  md5
        # allow all roles access to any database from any 
        # host and use ldap to authenticate the user. Greenplum role 
        # names must match the LDAP common name.
        host    all   all   192.168.0.0/32  ldap ldapserver=usldap1 
        ldapport=1389 ldapprefix="cn=" 
        ldapsuffix=",ou=People,dc=company,dc=com"

3. Save and close the file.

        4. Reload the pg_hba.conf configuration file for your changes to take effect:

        $ gpstop -u

3. The postgresql.conf configuration file

1. To limit the number of active concurrent sessions to your Greenplum Database system, configure the max_connections server configuration parameter. This is a local parameter, meaning you must set it in the postgresql.conf file of the master, the standby master, and each segment instance (primary and mirror). The recommended value of max_connections on segments is 5-10 times the value on the master.
        2. When you set max_connections, you must also set the dependent parameter max_prepared_transactions. This value must be at least as large as the value of max_connections on the master, and segment instances should be set to the same value as the master.
• In $MASTER_DATA_DIRECTORY/postgresql.conf (including the standby master):
        max_connections=100
        max_prepared_transactions=100
• In SEGMENT_DATA_DIRECTORY/postgresql.conf for all segment instances:
        max_connections=500
        max_prepared_transactions=100

Alternatively, set both parameters cluster-wide with the gpconfig utility:

        $ gpconfig -c max_connections -v 1000 -m 200
        $ gpconfig -c max_prepared_transactions -v 200

        4. Stop and restart your Greenplum Database:

        $ gpstop -r

Check the max_connections setting:

        $ gpconfig -s max_connections

Key points about max_connections:

        • The value on the segments must be greater than the value on the master. The recommended value of max_connections on segments is 5-10 times the value on the master.
        • The value of max_prepared_transactions must be greater than or equal to max_connections on the master.
• When raising the connection limit, keep your system RAM in mind: roughly (work_mem x master connections) + effective_cache_size + maintenance_work_mem + shared_buffers + WAL buffers + commit log buffers must fit in memory.
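For illustration only, with assumed values rather than recommendations: work_mem = 32MB across 200 master connections contributes 32MB x 200 = 6.4GB; adding shared_buffers = 8GB, maintenance_work_mem = 1GB, effective_cache_size = 16GB, and roughly 64MB of WAL and commit-log buffers gives about 31.5GB, so such a master host should have more than 32GB of RAM.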
5. Finally, try connecting to the postgres database using psql.
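With the environment variables from .bash_profile in place, a minimal connection test looks like this:

$ psql -d postgres -c 'SELECT version();'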
