PHEMI Central Datasheet
DATASHEET

PHEMI Central Big Data Warehouse

Take Advantage of Enterprise-Grade Big Data to Unlock the Value in Your Data

- Collect: Unlock data silos, consolidate your data
- Curate: Curate data for sub-second lookups with automatic indexing and cataloging
- Consume: Automatically enforce governance policies
- Privacy, Security, and Governance: Automatically enforce data sharing, consent, and privacy rules
- Data Management: Gain full control with field-level enterprise-grade data management

PHEMI Central™ is a big data warehouse that takes advantage of the power, scalability, and flexibility of Hadoop while providing fully integrated privacy, security, governance, and data management, all built right in. For the first time, organizations can take advantage of big data while retaining the governance and data management of a traditional enterprise data warehouse to unlock the value in their data, driving discovery and fuelling innovation with big data economics, while meeting compliance and governance objectives:

- The ability to collect, curate, and consume any volume and variety of data
- High-speed data ingestion and processing that supports real-time operations and business intelligence applications
- Full data control and lifecycle management
- Built-in Privacy by Design

The PHEMI Central Big Data Warehouse includes the following innovations:

- DPF framework: Custom and standard Data Processing Functions (code libraries) can parse, recognize, extract, cleanse, standardize, encrypt, mask, or redact selected fields.
- Metadata framework: Extensible, descriptive end-to-end metadata enables access control and data management at the field level.
- Privacy by Design: PHEMI Central is designed from the ground up to implement PbD principles.
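To make the idea of a field-level processing function more concrete, here is a minimal sketch in plain Python. It standardizes one field and masks another. The function names, the record layout, and the masking rule are all assumptions made for illustration; the datasheet does not describe the actual DPF interface, so this should not be read as the PHEMI API.

```python
# Illustrative only: a plain-Python stand-in for a Data Processing Function.
# Function names and the record layout are hypothetical, not the PHEMI DPF API.

SEX_CODES = {"m": "Male", "male": "Male", "f": "Female", "female": "Female"}


def standardize_sex(value):
    """Map variants such as "M" or "female" to one common spelling."""
    return SEX_CODES.get(value.strip().lower(), "Unknown")


def mask_identifier(value, visible=4):
    """Replace all but the last `visible` characters with asterisks."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


def process_record(record):
    """Return a cleansed copy; the original record is left untouched."""
    cleaned = dict(record)
    cleaned["sex"] = standardize_sex(record["sex"])
    cleaned["health_number"] = mask_identifier(record["health_number"])
    return cleaned


raw = {"sex": " F ", "health_number": "9876543210", "weight_kg": 61.5}
print(process_record(raw))
```

Returning a new dictionary rather than editing the input mirrors the datasheet's point that original data stays available in its original form while derived assets are produced from it.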
[Figure: PHEMI Central Big Data Warehouse architecture. Data sources (database, text, spreadsheet, images, sensors, genomics, systems) pass through three stages to applications and users (reports and analytics, business intelligence, custom applications, third-party applications). Collect: ingest any raw data type and tag with metadata. Curate: use powerful data processing functions to transform and catalog data into analytics-ready assets. Consume: generate datasets on demand; use any third-party apps. Three layers span the stages. Privacy, Security, Governance: protect information at the field level and ensure rightful access at scale. Data Management: manage data down to the field level. System Management: enterprise-grade reliability, availability, and scalability with cluster economics.]

Collect
Ingest and describe all types and any size of data

PHEMI Central ingests data from multiple and disparate sources. Data can range from small kilobyte files to large terabyte files. Schemaless ingestion is fast. You can:

- Stream data from machine-to-machine data sources through the PHEMI REST API
- Push data directly from data sources and ETL tools using either JDBC or the PHEMI REST API
- Deploy a custom connector based on the PHEMI REST API to allow PHEMI Central to fetch data from data sources
- Upload data manually using a standard web browser window

Data is tagged on ingest with descriptive metadata that immediately enforces privacy policies and controls the data lifecycle. Data is indexed and cataloged as it is stored, making it immediately findable and retrievable.

Curate
Extract the greatest possible value from your data with processing, indexing, cataloging, linking, and metadata

PHEMI Central uses a flexible, distributed key-value store, automatic indexing and cataloging, and sophisticated metadata tagging to manage, describe, and govern the data that it stores.

Data Linking

After cataloging and indexing, data can be linked based on keywords, graph relationships, and geospatial attributes.
Data linking expands the kinds of connections you can make between data items, promotes discovery, and gives you a more complete picture of your data.

Data Processing Function Framework

PHEMI Central lets you develop customized pieces of executable code, called Data Processing Functions (DPFs), that provide unprecedented power and flexibility. DPFs can:

- Parse ingested data, extract or cleanse data, and encrypt, redact, or anonymize selected information
- Provide enhanced or deeper indexing and cataloging
- Restructure data
- Transform data into standardized ontologies
- Analyze streams of machine data to find patterns and exceptions, calculate aggregates, or convert streaming data into an analytics-ready state for trending and predictive analysis

As the organization’s needs evolve and knowledge advances, you can simply develop new DPFs and re-execute them on your data. DPFs can be developed in modern programming languages such as Java, Python, and C++. No specialized expertise in MapReduce or YARN is required. Your DPF can be developed by PHEMI, by your in-house programmers, or by a third party.

Data Dictionary

Conventional big data systems store big data, but struggle to catalog or track diverse data types. With PHEMI Central, you can use DPFs that act as data dictionaries, identifying and saving a common interpretation for fields that occur frequently but are named differently or use different format conventions (such as “M/F” vs. “Male/Female”, or converting between Imperial and metric measurement schemes). Data dictionaries greatly simplify queries and analysis.

Consume
Access your datasets on demand at sub-second speeds, even with petabytes of data

Describing information with metadata means that users and applications can query data based on the data’s properties, instead of navigating complex directories or schemas.
Multiple users can interact with the system, accessing datasets via SQL, data exports, and custom applications. Above all, information in PHEMI Central is findable and searchable, for users and applications.

- Break down costly data silos by constructing datasets across multiple and disparate data sources
- Reduce data sprawl by creating virtual datasets that are not instantiated until export
- Improve consumption speeds with digital assets that are cataloged and indexed in advance
- Ensure rightful access at all times, with every data request automatically mediated by a policy enforcement engine

Privacy, Security, and Governance
Automatically de-identify, encrypt, or mask personal information

PHEMI Central provides an industry-pioneering set of capabilities to manage the governance of sensitive data, enforced from end to end and throughout the lifecycle of data. PHEMI Central uses one coordinated framework based on Privacy by Design principles to define, manage, and enforce data sharing agreements and privacy policies across an entire organization or set of organizations.

Data is tagged with attributes that describe its level of sensitivity. Users are tagged with attributes that describe their level of authorization. Simple, powerful access rules describe the relationships between data visibility and user authorization. Datasets can be associated with access policies that are independent of the policies attached to the source data collections, but rightful access to data is always enforced.
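The relationship between data sensitivity, user authorization, and access rules can be sketched in a few lines of Python. This is only an illustration of the general attribute-based approach: the sensitivity labels, authorization levels, and rule table below are invented for the example and are not taken from PHEMI Central's policy model.

```python
# Illustrative only: attribute-based visibility check in plain Python.
# Sensitivity labels, authorization levels, and the rule table are
# assumptions made for this sketch, not PHEMI Central's policy model.

# Access rules: for each user authorization, the data sensitivities it may see.
ACCESS_RULES = {
    "public": {"non_sensitive"},
    "researcher": {"non_sensitive", "de_identified"},
    "clinician": {"non_sensitive", "de_identified", "identified"},
}


def visible_fields(record, field_sensitivity, user_authorization):
    """Return only the fields whose sensitivity the user is allowed to see."""
    allowed = ACCESS_RULES.get(user_authorization, set())
    return {
        name: value
        for name, value in record.items()
        if field_sensitivity.get(name) in allowed
    }


record = {"name": "Jane Doe", "age_band": "40-49", "region": "BC"}
sensitivity = {
    "name": "identified",
    "age_band": "de_identified",
    "region": "non_sensitive",
}

print(visible_fields(record, sensitivity, "researcher"))
print(visible_fields(record, sensitivity, "unknown_role"))
```

The sketch denies by default: a field with no sensitivity tag, or a user with an unrecognized authorization, yields no visible data. That choice is an assumption of the example, made so that every request passes through the rule table.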
PHEMI Central keeps your data secure:

- User roles determine what operations a user can perform
- The system maintains a complete, tamperproof audit log of operations and data access
- Communication links from data sources or to consuming systems can be encrypted using Secure Sockets Layer (SSL) or Transport Layer Security (TLS)
- Data fields can be individually selected for encryption at rest
- Because privacy and security are performed at the data level, it’s easier and faster to prototype, test, and deploy new applications

Privacy by Design

A Privacy by Design (PbD) approach requires you to take into account seven foundational principles throughout your system. But how do you know whether your system implements PbD principles? Here’s a checklist:

1. Metadata. All data should be tagged on ingest with enough descriptive information to allow adequate privacy, sharing, consent, and lifecycle management, plus compliance with any other governance requirements.
2. Role-based access control. User and application access to functionality and operations is adequately restricted by system roles.
3. Policy-based data access. Access to and visibility of data is restricted by permissions and authorizations, and controlled by access policies.
4. Automatic policy enforcement. The system automatically enforces policies and governance; manual intervention is not required.
5. Transparency. Data stewards and privacy officers can directly view and verify the system implementation of governance policies.
6. Auditability. The system automatically tracks system activity, and maintains a detailed, tamperproof audit log of data access and system operations.
7. Data immutability. Data in the repository remains available in its original form, regardless of what digital assets are derived from the original through transformation.
8. Ability to anonymize.
   The system should be able to automatically de-identify, encrypt, mask, obfuscate, or redact personal information, and allow the data steward or privacy officer to choose which version of data appears to which users.

Data Management
Use a powerful metadata framework to manage digital assets at the field level

Element-level metadata embeds the rules and policies governing the data at the field level. Data retention policies and data sharing agreements are automatically enforced. Data in the system is immutable: the original data cannot be modified and data is only purged from the system based on the configured retention policy. Robust version control and rollback capabilities mean that data is never lost, corrupted, or overwritten.

Specifications

On-Premise Deployment

4 Cluster Nodes. Each:
- 8 x Core (2.2 GHz)
- 64 GB RAM
- 12 TB Direct Attached Storage

2 Management Nodes. Each:
- 4 x Core (2.2 GHz)
- 64 GB RAM
- 2 TB RAID1 Storage

10 Gigabit Ethernet Network

Cloud Deployment

- Subscribe to PHEMI Central as a managed service using Amazon Web Services.
- Cloud service grows from 1 TB storage capacity.

System Management
Get cluster reliability and economics at scale

PHEMI Central can be deployed at the customer premise, as a managed service, or as a cloud-based service. The system uses low-cost commodity hardware components and Direct Attached disk drives to lower the cost of ownership compared to traditional enterprise data warehouse systems. Storage and compute resources scale linearly from terabytes to petabytes.

All data in the system is replicated three times to ensure availability and resiliency. Direct-attached drives can be hot-swapped without impacting performance or data availability. Larger or faster Direct Attached drives and nodes are absorbed into the system and load-balanced automatically.
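As a rough worked example of what three-way replication means for capacity, the short Python sketch below divides raw cluster storage by the replication factor, using the on-premise figures from the specifications. It is a back-of-the-envelope estimate only: the helper name is made up for this example, and the calculation ignores filesystem, index, and operating-system overhead, which the datasheet does not quantify.

```python
# Back-of-the-envelope estimate only; ignores filesystem, index, and
# operating-system overhead, which the datasheet does not quantify.
# The function name is hypothetical and not part of any PHEMI tooling.

def usable_capacity_tb(nodes, storage_per_node_tb, replication_factor):
    """Raw cluster storage divided by the number of copies kept of each item."""
    raw_tb = nodes * storage_per_node_tb
    return raw_tb / replication_factor


# Figures from the on-premise specification: 4 cluster nodes with 12 TB each,
# and every piece of data stored three times.
print(usable_capacity_tb(nodes=4, storage_per_node_tb=12, replication_factor=3))  # 16.0
```

Under these assumptions, 48 TB of raw storage holds roughly 16 TB of unique data, which is worth keeping in mind when sizing a cluster.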
The system provides clear visibility into system health, diagnostics, troubleshooting, capacity, and digital assets under management. Using Apache Ambari, system management capabilities can be integrated with existing tools.

Data Ingest Protocols
- SFTP File Transfer
- HTTP/HTTPS Manual Upload
- REST Web Services API
- ODBC/JDBC SQL Interface
- CCDA HL7 Interface

Data Export Protocols
- Excel/CSV/TSV Download
- REST Web Services API
- ODBC/JDBC SQL Interface

Analytics Tools
- R
- SAP
- SAS
- SPSS
- Stata
- Tableau

Data Processing Functions
- Excel Reader
- Variant Call Format (VCF) Reader
- JSON Reader
- XML Reader

Ease Your Entry into Big Data

PHEMI Central makes it easy to break into big data. The software is fully integrated and enterprise-ready, so you don’t need to hire a team of Hadoop engineers to build and maintain your system. And, you can start small and expand incrementally. Use PHEMI Central to offload your existing data warehouse, or to capture new data types or sources. Keep your existing systems and tools and let PHEMI Central feed data into them. You can move into big data as you become ready, at your own speed.

Visit www.phemi.com for more information.

www.phemi.com
[email protected]
twitter.com/PHEMIsystems
linkedin.com/company/phemi

Copyright © 2015, PHEMI and/or its affiliates. All rights reserved. Affiliate names may be trademarks of their respective owners. April 2015