• Home
  • Services
    • DBA Support
    • DBA Consultancy Services
    • PostgreSQL Support
    • Website Maintenance
  • Courses

    About Courses

    • List Of Courses
    • Become an Instructor
    Greenplum Database

    Greenplum Database

    $2,000.00 $1,500.00
    Read More
  • Company
    • FAQs
    • About Us
    • Contact
  • Events
  • Portfolio
  • Blogs
    • Blog – RayaFeeL
    • Blog – PostgreSQL Support
    • Blog – PostgreSQL Migration
    • Blog – All DB’s
    • Blog – Linux
    • Blog – Medical Coding
      • Cart

        0

    Have any question?
    (+91)8838953252
    ITsupport@rayafeel.com
    RegisterLogin
    RayaFeeL
    • Home
    • Services
      • DBA Support
      • DBA Consultancy Services
      • PostgreSQL Support
      • Website Maintenance
    • Courses

      About Courses

      • List Of Courses
      • Become an Instructor
      Greenplum Database

      Greenplum Database

      $2,000.00 $1,500.00
      Read More
    • Company
      • FAQs
      • About Us
      • Contact
    • Events
    • Portfolio
    • Blogs
      • Blog – RayaFeeL
      • Blog – PostgreSQL Support
      • Blog – PostgreSQL Migration
      • Blog – All DB’s
      • Blog – Linux
      • Blog – Medical Coding
        • Cart

          0

      Databases

      • Home
      • All courses
      • Databases
      • Greenplum Database
      CoursesIT & SoftwareDatabasesGreenplum Database
      • Greenplum Tutorial
        9
        • Lecture1.1
          Download the Greenplum Sandbox VM 30 min
        • Lecture1.2
          Greenplum Database Architecture 30 min
        • Lecture1.3
          Greenplum Configuration Files ( pg_hba.conf , postgresql.conf ) 30 min
        • Lecture1.4
          Greenplum Database Installation 30 min
        • Lecture1.5
          Greenplum psql Command Line Client 30 min
        • Lecture1.6
          Greenplum Create Users and Roles 30 min
        • Lecture1.7
          Greenplum Create and Prepare Database 30 min
        • Lecture1.8
          Greenplum Create Tables 30 min
        • Lecture1.9
          Greenplum Data Loading 30 min
      • Greenplum Performance Tuning
        7
        • Lecture2.1
          Greenplum Performance Tuning Overview 30 min
        • Lecture2.2
          Greenplum Analyze the tables 30 min
        • Lecture2.3
          Greenplum explain plans 30 min
        • Lecture2.4
          Greenplum Indexes and performance 30 min
        • Lecture2.5
          Greenplum Row vs. column orientation 30 min
        • Lecture2.6
          Greenplum – Check for even data distribution on segments 30 min
        • Lecture2.7
          Greenplum Partitioning 30 min
      • Greenplum Database Analytics
        1
        • Lecture3.1
          Greenplum Database analytics 30 min
      • Greenplum Backup and Recovery Operations
        2
        • Lecture4.1
          Greenplum gpcrondump 30 min
        • Lecture4.2
          Greenplum gpdbrestore 30 min
      • Importing GPDB Sandbox
        1
        • Lecture5.1
          Importing GPDB Sandbox into VMware Fusion 30 min
      • Greenplum Database Books
        1
        • Lecture6.1
          Data Warehousing with Greenplum PDF 30 min
      • Basic Questions and Answer of Greenplum Database
        1
        • Lecture7.1
          Some basic Questions and answer about Greenplum database 30 min

        Greenplum Database Architecture

        Greenplum Database is a massively parallel processing (MPP) database server with an architecture specially designed to manage large-scale analytic data warehouses and business intelligence workloads.

        MPP, Greenplum and Postgresql :

        MPP (also known as a shared nothing architecture) refers to systems with two or more processors that cooperate to carry out an operation, each processor with its own memory, operating system and disks. Greenplum uses this high-performance system architecture to distribute the load of multi-terabyte data warehouses and can use all of a system’s resources in parallel to process a query.

        Greenplum Database is based on PostgreSQL open-source technology. It is essentially several PostgreSQL database instances acting together as one cohesive database management system (DBMS). It is based on PostgreSQL 8.2.15, and in most cases is very similar to PostgreSQL with regard to SQL support, features, configuration options, and end-user functionality. Database users interact with Greenplum Database as they would a regular PostgreSQL DBMS.

        The internals of PostgreSQL have been modified or supplemented to support the parallel structure of Greenplum Database. For example, the system catalog, optimizer, query executor, and transaction manager components have been modified and enhanced to be able to execute queries simultaneously across all of the parallel PostgreSQL database instances. The Greenplum interconnect (the networking layer) enables communication between the distinct PostgreSQL instances and allows the system to behave as one logical database.

        Greenplum Database also includes features designed to optimize PostgreSQL for business intelligence (BI) workloads. For example, Greenplum has added parallel data loading (external tables), resource management, query optimizations, and storage enhancements, which are not found in standard PostgreSQL.

        Greenplum Architecture

        Greenplum Database stores and processes large amounts of data by distributing the data and processing workload across several servers or hosts. Greenplum Database is an array of individual databases based upon PostgreSQL 8.2 working together to present a single database image. The master is the entry point to the Greenplum Database system. It is the database instance to which clients connect and submit SQL statements. The master coordinates its work with the other database instances in the system, called segments, which store and process the data.

        Figure 1. High-Level view of the Greenplum Database Architecture

        High-Level Greenplum Database Architecture

        Greenplum Master :

        The Greenplum Database master is the entry to the Greenplum Database system, accepting client connections and SQL queries, and distributing work to the segment instances.

        Greenplum Database end-users interact with Greenplum Database (through the master) as they would with a typical PostgreSQL database. They connect to the database using client programs such as psql or application programming interfaces (APIs) such as JDBC or ODBC.

        The master is where the global system catalog resides. The global system catalog is the set of system tables that contain metadata about the Greenplum Database system itself. The master does not contain any user data; data resides only on the segments. The master authenticates client connections, processes incoming SQL commands, distributes workloads among segments, coordinates the results returned by each segment, and presents the final results to the client program.

        Greenplum Segments :

        Greenplum Database segment instances are independent PostgreSQL databases that each store a portion of the data and perform the majority of query processing.

        When a user connects to the database via the Greenplum master and issues a query, processes are created in each segment database to handle the work of that query. For more information about query processes, see About Greenplum Query Processing.

        User-defined tables and their indexes are distributed across the available segments in a Greenplum Database system; each segment contains a distinct portion of data. The database server processes that serve segment data run under the corresponding segment instances. Users interact with segments in a Greenplum Database system through the master.

        Segments run on a servers called segment hosts. A segment host typically executes from two to eight Greenplum segments, depending on the CPU cores, RAM, storage, network interfaces, and workloads. Segment hosts are expected to be identically configured. The key to obtaining the best performance from Greenplum Database is to distribute data and workloads evenly across a large number of equally capable segments so that all segments begin working on a task simultaneously and complete their work at the same time.

        Greenplum Interconnect :

        The interconnect is the networking layer of the Greenplum Database architecture.

        The interconnect refers to the inter-process communication between segments and the network infrastructure on which this communication relies. The Greenplum interconnect uses a standard 10-Gigabit Ethernet switching fabric.

        By default, the interconnect uses User Datagram Protocol (UDP) to send messages over the network. The Greenplum software performs packet verification beyond what is provided by UDP. This means the reliability is equivalent to Transmission Control Protocol (TCP), and the performance and scalability exceeds TCP. If the interconnect used TCP, Greenplum Database would have a scalability limit of 1000 segment instances. With UDP as the current default protocol for the interconnect, this limit is not applicable.

        Pivotal Query Optimizer :

        The Pivotal Query Optimizer brings a state of the art query optimization framework to Greenplum Database that is distinguished from other optimizers in several ways:

        • Modularity. Pivotal Query Optimizer is not confined inside a single RDBMS. It is currently leveraged in both Greenplum Database and Pivotal HAWQ, but it can also be run as a standalone component to allow greater flexibility in adopting new backend systems and using the optimizer as a service. This also enables elaborate testing of the optimizer without going through the other components of the database stack.
        • Extensibility. The Pivotal Query Optimizer has been designed as a collection of independent components that can be replaced, configured, or extended separately. This significantly reduces the development costs of adding new features, and also allows rapid adoption of emerging technologies. Within the Query Optimizer, the representation of the elements of a query has been separated from how the query is optimized. This lets the optimizer treat all elements equally and avoids the issues with the imposed order of optimizations steps of multi-phase optimizers.
        • Performance. The Pivotal Query Optimizer leverages a multi-core scheduler that can distribute individual optimization tasks across multiple cores to speed up the optimization process. This allows the Query Optimizer to apply all possible optimizations as the same time, which results in many more plan alternatives and a wider range of queries that can be optimized. For instance, when the Pivotal Query Optimizer was used with TPC-H Query 21 it generated 1.2 Billion possible plans in 250 ms. This is especially important in Big Data Analytics where performance challenges are magnified by the volume of data that needs to be processed. A suboptimal optimization choice could very well lead to a query that just runs forever.

         

        Prev Download the Greenplum Sandbox VM
        Next Greenplum Configuration Files ( pg_hba.conf , postgresql.conf )

        ADVERTISEMENT

        Latest Courses

        PostgreSQL Database

        PostgreSQL Database

        $600.00 $500.00
        Greenplum Database

        Greenplum Database

        $2,000.00 $1,500.00

        Preview Course

        Free

        Latest Posts

        PostgreSQL Patching version 9, 10,11
        10Oct2019
        Tools for PostgreSQL
        16Sep2019
        Postgres user creation and restrict DDL & database access
        13Sep2019

        Recent Forum Topics

        • thought behind whiteboard activity
        • Are you going to take your first ste
        • How to start working on an application?
        • please let me know pre requirements to increase work_mem
        • how to copy some data in one table to another table in postgres

        2ndquadrant.in

        (+91) 8838953252

        ITsupport@rayafeel.com

        Company

        • About Us
        • Contact
        • Our Team
        • Blog

        COURSES

        • List Of Course
        • Become An Instructor
        • Events
        • Postgres Support Blog

        Support

        • DBA Support
        • Consultancy Services
        • Postgres Migration Blogs
        • Forum

        Recommend

        • Groups
        • Login
        • FAQs
        • SignUp

        IT Services by rayafeel.com. Powered by Rayafeel Technologies Pvt Ltd.

        • Privacy
        • Terms

        Become An Instructor?

        Join thousand of instructors and earn money hassle free!

        Get Started Now

        Login with:

        Login with Google Login with Twitter Login with LinkedIn Login with Microsoft

        Login with your site account

        Lost your password?

        Not a member yet? Register now

        Register a new account

        Are you a member? Login now

        Modal title

        Message modal