Pivotal GemFire XD DISTRIBUTED IN-MEMORY AND HADOOP-INTEGRATED SQL DATABASE
DATA SHEET

Pivotal GemFire XD
DISTRIBUTED IN-MEMORY AND HADOOP-INTEGRATED SQL DATABASE FOR MISSION CRITICAL APPLICATIONS

OVERVIEW

AT-A-GLANCE
For developers who need to meet the highest service-level requirements for structured big data applications, Pivotal™ GemFire® XD is a distributed in-memory database designed to provide:
• Scale-out performance
• Consistent database operations across globally distributed applications
• High availability, resilience, and global scale
• Standards-based developer features and interfaces
• Easy administration of distributed nodes

KEY FEATURES & BENEFITS

Scale-out performance
• In-memory storage: all operational data is available in memory to avoid the disk I/O penalty
• High-memory nodes: supports systems with memory capacity larger than JVM heap size limits
• Elastic, linear scalability: easily scale capacity up or down to meet changes in demand
• Optimized data distribution & processing: configure data distribution across the grid to optimize the speed of data access & processing

Consistent database operations for Hadoop clusters and across globally distributed applications
• Flexible persistence: store data in performance-optimized disk persistence, or within Pivotal HD
• Configurable consistency: choose a consistency model supporting distributed OLTP applications to balance performance and data availability
• SQL query support: supports SQL queries of data over distributed nodes that can be optimized with indexes on key values
• Advanced analytics access: analyze archived on-disk and in-memory data with Pivotal HAWQ via PXF

pivotal.io

SCALING OUT STRATEGIC DATA-DRIVEN SQL APPLICATIONS
Many applications are built with a relational data model to meet requirements for reporting and analytics on current and historical data. Other times, it’s simply the default choice to start with an RDBMS as the data management system.
When companies scale out such applications to high-concurrency deployments with thousands to hundreds of thousands of concurrent operations, traditional relational databases develop unacceptable performance problems. Such high-usage applications also typically generate significant historical information. Keeping large, detailed data sets only makes sense with inexpensive, flexible storage solutions such as Hadoop. This includes not only transactional data, but also history, application logs, and data from external sources used to analyze user behavior and application performance.

Pivotal GemFire XD is a distributed in-memory SQL database for high-scale custom applications. GemFire XD provides low-latency data access to applications at massive scale, with many concurrent transactions involving terabytes of operational data. Designed to maintain consistency of concurrent operations across its distributed data nodes, Pivotal GemFire XD can support ACID transactions for massively scaled applications such as data stream analysis and processing, financial payments, and ticket sales, in proven customer deployments of more than 10 million user transactions a day. With optional persistence and archival in HDFS, GemFire XD can store an extremely large, consistent database in Hadoop nodes, where it can be accessed for analysis by Pivotal HAWQ. Through support of standards such as JDBC, GemFire XD works with common development frameworks and reporting tools for relational data.

KEY FEATURES & BENEFITS (CONTINUED)

High availability, resilience, and global scale
• Node failover: application and data access ensured in the event of a network split or node failure
• Resilient self-healing: fast node startup on reconnect; self-healing of clusters automates restoration after node failure
• Cluster-to-cluster WAN connectivity: enables global scale of data access and multi-site capability

Standards-based developer features and interfaces
• APIs and standards support: develop in any programming language that supports JDBC, Spring Data JDBC, ADO.NET, ODBC, or MapReduce
• Data type support: ANSI SQL-92 data types, table definitions, and foreign key relationships; JSON documents
• Powerful application functions: data-aware stored procedures, SQL-compliant queries and DML statements, and a publish & subscribe event framework with reliable asynchronous queues for delivering events
• Use familiar tools: Hibernate, NHibernate, Roo, SQuirreL, IntelliJ, and other JDBC-compliant tools

SCALE-OUT PERFORMANCE

IN-MEMORY STORAGE
GemFire XD stores all required data in RAM across distributed nodes to provide the fastest access to data while eliminating the performance penalty of reading from disk.

HIGH-MEMORY NODES
GemFire XD allocates in-memory storage off heap to take advantage of hardware systems with memory capacity larger than JVM heap size limits, and to provide faster performance by avoiding the Java garbage collection cycles that govern memory deallocation.

ELASTIC, LINEAR SCALABILITY
GemFire XD provides linear scalability that allows you to predictably increase capacity, in both operations per second and data storage, simply by adding nodes to a cluster. Data distribution and system resource usage are automatically adjusted as nodes are added or removed, making it easy to scale up or down quickly to meet expected or unexpected spikes in demand.

OPTIMIZED DATA DISTRIBUTION ACROSS NODES
GemFire XD automatically optimizes how data is distributed across nodes to minimize latency and optimize usage of system resources. You can also configure partitioning and replication of data to further optimize application response time.
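The partitioning and replication described above can be pictured with a minimal Java sketch. It assumes a simple scheme in which each partitioning key hashes to one of a fixed number of buckets, and each bucket lives on a primary node plus a configurable number of redundant copies on other nodes. The class `PartitionMap`, its methods, the bucket count, and the round-robin placement rule are all hypothetical illustrations; this is not GemFire XD's API or its actual placement algorithm.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of key-based partitioning with redundant copies.
 * Not GemFire XD's API: bucket count, node names, and the round-robin
 * placement rule are illustrative assumptions.
 */
public class PartitionMap {
    private final int numBuckets;
    private final int redundantCopies;
    private final List<String> nodes;

    public PartitionMap(int numBuckets, int redundantCopies, List<String> nodes) {
        if (numBuckets <= 0) {
            throw new IllegalArgumentException("numBuckets must be positive");
        }
        if (redundantCopies + 1 > nodes.size()) {
            throw new IllegalArgumentException("need at least redundantCopies + 1 nodes");
        }
        this.numBuckets = numBuckets;
        this.redundantCopies = redundantCopies;
        this.nodes = List.copyOf(nodes);
    }

    /** Maps a partitioning key (for example, a primary key value) to a bucket. */
    public int bucketOf(Object key) {
        // floorMod keeps the bucket non-negative even for negative hash codes.
        return Math.floorMod(key.hashCode(), numBuckets);
    }

    /** Returns the primary owner first, followed by the nodes holding redundant copies. */
    public List<String> ownersOf(Object key) {
        int bucket = bucketOf(key);
        List<String> owners = new ArrayList<>();
        for (int copy = 0; copy <= redundantCopies; copy++) {
            // Stepping round-robin puts each copy on a different node.
            owners.add(nodes.get((bucket + copy) % nodes.size()));
        }
        return owners;
    }

    public static void main(String[] args) {
        PartitionMap map = new PartitionMap(8, 1, List.of("node-a", "node-b", "node-c"));
        // Work on key 42 only needs to be routed to these owner nodes.
        System.out.println("key 42 -> bucket " + map.bucketOf(42) + ", owners " + map.ownersOf(42));
    }
}
```

In this sketch, because the bucket for a key is computed deterministically, any participant can work out which nodes hold a given row and send a query or function only to those nodes, and losing the primary still leaves a redundant copy on another node.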
GemFire XD directs processing operations to the specific nodes where the data resides, reducing latency and network traffic, according to the data distribution and replication configuration you set up for the cluster.

Easy administration of distributed clusters
• Auto tuning: automatic distribution of data to optimize usage of system resources on nodes for best cluster performance
• Simplified cluster configuration: configure all nodes in a cluster from a single fault-tolerant service
• Cluster monitoring & data query: dashboard showing cluster & node status; view and query data in nodes
• Performance statistics analysis: offline tool for viewing historical logs and statistics to diagnose bottlenecks
• Command line tools: easy automation and scripting of administrative tasks via a command line interface

CONSISTENT DATABASE OPERATIONS FOR HADOOP CLUSTERS AND ACROSS GLOBALLY DISTRIBUTED APPLICATIONS

FLEXIBLE PERSISTENCE
To ensure durability of data in the event of node failure, GemFire XD writes to disk a log of all creates, updates, and deletes of data managed by a node. When the node comes back online, this log can be read to reconstruct the last consistent state of the in-memory database on that node. When persisted or archived in Hadoop, this data can be used in analytics processing with tools such as Pivotal HAWQ, and even larger data volumes can be supported. Using the event framework, you can modify persistence behavior for purposes such as archiving historical data.

CONFIGURABLE CONSISTENCY
GemFire XD can provide ACID consistency across distributed nodes to support high-capacity transactional applications. You can also configure consistency models for higher performance, such as allowing the entire grid to cache and operate on data, or turn consistency off when your requirements call for speed rather than consistency.

Figure 1.
Example topologies of Pivotal GemFire XD deployments supporting different service-level requirements of data-driven applications.

SQL QUERY SUPPORT
Pivotal GemFire XD supports ANSI SQL-92 for authoring queries. Queries are sent to the nodes that serve the relevant partitions of data; the query results are then merged and sent back to the client application. Developers can define indexes on key values to improve performance. You can also define the key values that control the distribution of data across nodes. When functions that operate on partitions of data are invoked, processing is routed to the nodes responsible for serving the partitions of targeted data.

ADVANCED ANALYTICS ACCESS
Data persisted in Pivotal HD by GemFire XD can be accessed for advanced analytic processing by Pivotal HAWQ by way of the Pivotal Extension Framework (PXF). This includes archived data as well as the latest state of active in-memory data.

HIGH AVAILABILITY, RESILIENCE, AND GLOBAL SCALE

NODE FAILOVER
GemFire XD provides continuous uptime with built-in high availability and disaster recovery. Multiple failure detection models detect and react to failures quickly, ensuring that the cluster is always available and that the data set is always complete.

RESILIENT SELF-HEALING
GemFire XD has self-healing automation that allows a node to quickly rejoin a cluster once it becomes operational again, with fast startup, reconnect, and incremental updates of changed data, all handled without administrator intervention.

CLUSTER-TO-CLUSTER WAN CONNECTIVITY
GemFire XD allows multiple clusters to be connected via WAN gateways. This allows application data access to span the globe, and allows companies to meet local data requirements, such as country-specific privacy regulations. WAN-connected clusters also enable multi-site failover, ensuring ongoing availability and built-in disaster recovery in the case of catastrophic failure.

STANDARDS-BASED DEVELOPER FEATURES AND INTERFACES

APIs AND STANDARDS SUPPORT
Pivotal GemFire XD manages data for applications in any programming language that supports JDBC, ADO.NET, or ODBC. For Java developers, GemFire XD provides support for Spring Data JDBC. GemFire XD also extends the Hadoop MapReduce API, allowing MapReduce jobs to access GemFire XD data without needing to start or access a GemFire XD distributed system.

DATA TYPE SUPPORT
GemFire XD supports structured data in relational data models with declared tables and foreign key relationships. Supported data types include those defined in the ANSI SQL-92 standard. GemFire XD also supports JSON documents and custom Java types.

POWERFUL APPLICATION FEATURES
GemFire XD provides powerful, advanced application features to developers who want to leverage its distributed database capabilities. As with many database platforms, developers can embed and generate queries using SQL. GemFire XD also provides a sophisticated event-handling mechanism with durable asynchronous queues suitable for mission-critical application requirements.

USE FAMILIAR TOOLS
Through its support of JDBC and ANSI SQL, GemFire XD allows the use of familiar integrated development environments, application development frameworks, and business intelligence and visualization tools.

EASY ADMINISTRATION OF DISTRIBUTED NODES

AUTOMATED TUNING
GemFire XD is built to automate administrative tasks as much as possible. This includes automatically tuning system resources between nodes in a cluster by intelligently managing the placement of data while reducing network round trips. Data is distributed and replicated according to the cluster configuration, and requests for access are routed using the most direct path available. This data placement and resource allocation is adjusted automatically when nodes are added to, or removed from, the cluster.
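The automatic adjustment of data placement as nodes join can be sketched in Java under a simplifying assumption: data is divided into a fixed set of buckets, and when a node joins, buckets are moved to it from the most heavily loaded nodes until it holds roughly its share. `Rebalancer` and its methods are hypothetical names, not GemFire XD's API, and the sketch balances bucket counts only; a production system would also need to weigh factors such as data volume per bucket and where redundant copies live.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: rebalance a fixed set of buckets when a node joins.
 * Balances bucket counts only; names and policy are illustrative assumptions.
 */
public class Rebalancer {
    // node name -> buckets it currently hosts (insertion order kept for determinism)
    private final Map<String, List<Integer>> assignment = new LinkedHashMap<>();

    public Rebalancer(int numBuckets, List<String> initialNodes) {
        for (String node : initialNodes) {
            assignment.put(node, new ArrayList<>());
        }
        // Initial round-robin placement of buckets across the starting nodes.
        for (int b = 0; b < numBuckets; b++) {
            assignment.get(initialNodes.get(b % initialNodes.size())).add(b);
        }
    }

    /** Adds a node and moves buckets onto it; returns how many buckets moved. */
    public int addNode(String newNode) {
        if (assignment.containsKey(newNode)) {
            throw new IllegalArgumentException("node already present: " + newNode);
        }
        assignment.put(newNode, new ArrayList<>());
        List<Integer> to = assignment.get(newNode);
        int moved = 0;
        while (true) {
            List<Integer> from = assignment.get(busiestNode());
            // Stop once the newcomer is within one bucket of the busiest node.
            if (from.size() - to.size() <= 1) {
                break;
            }
            to.add(from.remove(from.size() - 1));
            moved++;
        }
        return moved;
    }

    /** Returns the first node (in insertion order) holding the most buckets. */
    private String busiestNode() {
        String busiest = null;
        for (Map.Entry<String, List<Integer>> e : assignment.entrySet()) {
            if (busiest == null || e.getValue().size() > assignment.get(busiest).size()) {
                busiest = e.getKey();
            }
        }
        return busiest;
    }

    /** Number of buckets per node, in insertion order. */
    public Map<String, Integer> bucketCounts() {
        Map<String, Integer> counts = new LinkedHashMap<>();
        assignment.forEach((node, buckets) -> counts.put(node, buckets.size()));
        return counts;
    }
}
```

For example, with 12 buckets spread over nodes `a`, `b`, and `c`, calling `addNode("d")` moves 3 buckets and leaves each node with 3. Moving only the surplus buckets, rather than recomputing `hash % nodeCount` for every key, limits how much data must cross the network when the cluster grows.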
COMPREHENSIVE MONITORING & ADMINISTRATION TOOLS
GemFire XD provides a comprehensive set of online and offline tools for monitoring and administering clusters. The online dashboard allows drill-down into cluster and node status, and querying of stored data. The offline analytics tool allows diagnosis of system bottlenecks through analysis of historical statistics logs. A command line tool allows administrators to take actions on clusters and nodes, such as starting, stopping, and configuring settings.

SIMPLIFIED CLUSTER CONFIGURATION
Node configuration is handled centrally, with automatic redundancy for high availability. New nodes can get their configuration from the centralized configuration manager upon startup to quickly join a cluster with no additional system administration tasks.

FLEXIBLE DEPLOYMENT OPTIONS
GemFire XD runs in Java Virtual Machines in 32-bit and 64-bit mode on Linux and Windows operating systems. GemFire XD grids can be set up with active/active, multi-site, bi-directional WAN replication to enable disaster recovery, business continuity, and geographical proximity for the lowest possible latency worldwide.

LEARN MORE
To learn more about Pivotal’s products and services, please visit us at pivotal.io. For more information about Pivotal Big Data Suite for application developers, please visit pivotal.io/big-data.

Pivotal offers a modern approach to technology that organizations need to thrive in a new era of business innovation. Our solutions intersect cloud, big data and agile development, creating a framework that increases data leverage, accelerates application delivery, and decreases costs, while providing enterprises the speed and scale they need to compete.

Pivotal 3495 Deer Creek Road Palo Alto, CA 94304 pivotal.io
Pivotal, Pivotal CF, and Cloud Foundry are trademarks and/or registered trademarks of Pivotal Software, Inc. in the United States and/or other countries. All other trademarks used herein are the property of their respective owners.
© Copyright 2014 Pivotal Software, Inc. All rights reserved. Published in the USA. PVTL-DS-10/14