KNIME Big Data Extension
Leveraging your Hadoop Infrastructure

With the KNIME Big Data Extension, access to your data is now even easier. As part of the commercial KNIME.com AG offerings, the extension provides a set of nodes for accessing Hadoop/HDFS via Hive from inside KNIME Analytics Platform. It brings with it all required libraries; after installing the extension from KNIME’s update website, no additional steps are needed. The Big Data Extension is certified by Cloudera for CDH 5.0.x and has also been tested against Hadoop 2.4.0 and Hive 0.13. The Big Data Extension can be easily installed via the KNIME Update Manager.

[Figure: KNIME Big Data Extension as part of a complete KNIME workflow. Panels: Data Access, Big Data, Analyze & Data Mining, Deployment.]

The Power of Hadoop

KNIME allows you to take advantage of the power of Hadoop from within your KNIME environment. All Hive infrastructure features may be utilized from the well-documented and maintained KNIME Analytics Platform. Database nodes may be combined with all features of KNIME, allowing advanced features like flow variables and flow control to automate which elements are executed at a given stage within your Hadoop environment. When combined with KNIME Server, you can go even further to provide all types of access control to your users, not only for the data but also for the types of queries and manipulation that can be performed.

Mix and Match within KNIME

The real power of Hadoop comes when you can mix and match as needed across all steps of an analytic application. With KNIME, you can mix data sources as well as pick and choose exactly where a particular operation or task should complete, giving you total control over the workflows that you build. Starting with a subset of Hadoop data, KNIME may be used to optimize exactly which portions of the workflow application should be in Hadoop, in another database, or in KNIME itself. You have total control of your Big Data. If you have KNIME Server, you can also surface all or portions of the workflows to your casual users via KNIME WebPortal, giving those users access to sensible Big Data as well.

Cloudera Certified

KNIME Big Data Extensions are Cloudera certified. Building a Hadoop cluster from the ground up can be challenging. There are numerous choices to be made at all levels of the stack and making those choices can be complex. The Cloudera Certified Technology program is designed to make selecting the right technology easier. When you see the Cloudera Certified Technology logo, you can trust that the product has been tested and validated to work with CDH, Cloudera’s open source and enterprise-ready distribution of Apache Hadoop and related projects.

KNIME Big Data Extensions are easy to use

Hive Connector

The Hive Connector node establishes a connection to a Hadoop/HDFS database. Connection details may be specified and stored using the node dialog box. Once executed, the node returns a database connection that can be used with almost any of KNIME’s standard database nodes.

Hive Loader

The Big Data Extension provides a special Hive Loader node to get your data into Hadoop. The node makes use of the File Handling extensions and first copies the data to the HiveServer (using SSH, FTP, or any other supported protocol; note that remote access to the HiveServer is currently required). Next, a Hive command is executed to import the data into Hadoop/HDFS. The node’s output is a database connection operating on the imported table.

Impala and more

Impala support will be added late in 2014. As with all KNIME commercial products, enhancements to KNIME Big Data Extension are automatically included as they become available. You can easily upgrade to all new features via the standard upgrade process built into your KNIME Analytics Platform.

KNIME Analytics Platform
...is an open source platform for integrated data access, data mining, statistics, visualization, and reporting.

KNIME Collaborative Extensions
...activate the full potential of KNIME for teams. Features include user rights & authentication, remote & scheduled execution, shared workflow repository, data space, and metanodes, plus access to workflows, reports, and web services.

KNIME Productivity Extensions
...increase the speed at which KNIME workflows can be created, reused, and maintained for both individuals and partners who use KNIME for their clients.

KNIME Performance Extensions
...enable distributed storage and scalable execution of KNIME workflows, enabling scale for large data sets and complex computation requirements.

KNIME Community & Partner Extensions
...are nodes created by and given back to the KNIME community. These range from broadly useful application nodes simplifying the work of anyone using KNIME, to domain-specific solutions to pressing problems for specialists.
About KNIME

KNIME is the leading open platform for data-driven innovation helping organizations to stay ahead of change. Innovative organizations use our open-source, enterprise-grade analytics platform to discover the potential hidden in their data, mine for fresh insights or predict new futures. Quick to deploy, easy to scale, and intuitive, KNIME is used in over 60 countries on data of every kind: from numbers to images, molecules to humans, signals to complex networks, from kilo- to petabytes or simple reports to complex analyses. KNIME is developed and supported by KNIME.com AG. Learn more at www.knime.com

KNIME.com AG
Technoparkstrasse 1
8005 Zurich
Switzerland
[email protected]
http://www.knime.com
Tel.: +41 44 445 2660
Fax: +41 44 445 2662

Copyright © 2014 KNIME.com AG. All rights reserved. KNIME TM is a registered trademark of KNIME GmbH, Germany. All other brands or product names mentioned are trademarks owned by their respective organizations. V09/14