KNIME Big Data Extension

Leveraging your Hadoop Infrastructure
With the KNIME Big Data Extension, access to your data is now even easier. As part of the commercial KNIME.com AG offerings, the extension provides a set of nodes for accessing Hadoop/HDFS via Hive from inside KNIME Analytics Platform. It brings with it all required libraries: after installing the extension from KNIME’s update website via the KNIME Update Manager, no additional steps are needed. The Big Data Extension is certified by Cloudera for CDH 5.0.x and has also been tested against Hadoop 2.4.0 and Hive 0.13.
[Figure: example workflow with four stages: Data Access, Big Data, Analyze & Data Mining, and Deployment]
KNIME Big Data Extension as part of a complete KNIME workflow
The Power of Hadoop
KNIME allows you to take advantage of the power of Hadoop from within your KNIME environment. All Hive infrastructure features may be utilized from the well documented and maintained KNIME Analytics Platform.
Database nodes may be combined with all features of KNIME, allowing advanced features like flow variables and flow control to automate which elements are executed at a given stage within your Hadoop environment.
When combined with KNIME Server, you can go even further to provide all types of access control to your users, not only for the data but also for the types of queries and manipulation that can be performed.

Mix and Match within KNIME
The real power of Hadoop comes when you can mix and match as needed across all steps of an analytic application.
With KNIME, you can mix data sources as well as pick and choose exactly where a particular operation or task should complete, giving you total control over the workflows that you build.
Starting with a subset of Hadoop data, KNIME may be used to optimize exactly which portions of the workflow application should be in Hadoop, in another database, or in KNIME itself. You have total control of your Big Data.
If you have KNIME Server, you can also surface all or portions of the workflows to your casual users via KNIME WebPortal, giving those users access to sensible Big Data as well.

Cloudera Certified
KNIME Big Data Extensions are Cloudera certified. Building a Hadoop cluster from the ground up can be challenging. There are numerous choices to be made at all levels of the stack and making those choices can be complex.
The Cloudera Certified Technology program is designed to make selecting the right technology easier. When you see the Cloudera Certified Technology logo, you can trust that the product has been tested and validated to work with CDH, Cloudera’s open source and enterprise-ready distribution of Apache Hadoop and related projects.
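
The idea of choosing where an operation completes can be illustrated with a small sketch. It assumes that chained database steps are combined into one nested SQL query that the database evaluates, so only the aggregated result is pulled back. The helper names (table_selector, row_filter, group_by), the tweets table, and its columns are hypothetical, and an in-memory SQLite database stands in for Hive; this is not how KNIME's nodes are implemented.

```python
import sqlite3


def table_selector(table: str) -> str:
    """Start a query on one table (hypothetical stand-in for a table selection step)."""
    return f"SELECT * FROM {table}"


def row_filter(query: str, condition: str) -> str:
    """Wrap the incoming query and filter its rows inside the database."""
    return f"SELECT * FROM ({query}) AS t WHERE {condition}"


def group_by(query: str, key: str, aggregate: str) -> str:
    """Wrap the incoming query and aggregate inside the database."""
    return f"SELECT {key}, {aggregate} FROM ({query}) AS t GROUP BY {key}"


# SQLite stands in for the remote database; the data is made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tweets (author TEXT, retweets INTEGER)")
conn.executemany(
    "INSERT INTO tweets VALUES (?, ?)",
    [("a", 3), ("a", 5), ("b", 0), ("b", 7)],
)

# Each step only rewrites the query text; nothing runs until execute() is called.
query = group_by(
    row_filter(table_selector("tweets"), "retweets > 0"),
    "author",
    "SUM(retweets) AS total",
)

# The database does the filtering and aggregation; only two rows come back.
rows = conn.execute(query + " ORDER BY author").fetchall()
print(rows)
```

Under these assumptions the raw rows never leave the database. Pulling the table into local memory first and aggregating there would be the opposite choice, which can be reasonable for a small subset of the data.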
KNIME Big Data Extensions are easy to use
Hive Connector
The Hive Connector node establishes a connection to a Hadoop/HDFS database. Connection details may be specified and stored using the node dialog box.
Once executed, the node returns a database connection that can be used with almost any of KNIME’s standard database nodes.

Hive Loader
The Big Data Extension provides a special Hive Loader node to get your data into Hadoop. The node makes use of the File Handling extensions and first copies the data to the HiveServer (using SSH, FTP, or any other supported protocol; note that remote access to the HiveServer is currently required). Next, a Hive command is executed to import the data into Hadoop/HDFS. The node’s output is a database connection operating on the imported table.

Impala and more
Impala support will be added late in 2014. As with all KNIME commercial products, enhancements to KNIME Big Data Extension are automatically included as they become available.
You can easily upgrade to all new features via the standard upgrade process built into your KNIME Analytics Platform.
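
For readers who want to picture the connect-then-load sequence described for the Hive Connector and Hive Loader, here is a minimal sketch. It makes several assumptions: a HiveServer2-style JDBC URL on the default port 10000, a comma-delimited file already copied to the server, and HiveQL's LOAD DATA LOCAL INPATH as the import command. The brochure does not say which Hive command the node issues, and the HiveTarget class, the load_statements helper, the host name, and the table are hypothetical. The code only builds the strings; it does not connect to anything.

```python
import re
from dataclasses import dataclass

# Conservative identifier check used for table names, column names, and types.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


@dataclass(frozen=True)
class HiveTarget:
    """Connection details of the kind a connector dialog might store (hypothetical)."""
    host: str
    port: int = 10000          # HiveServer2's default port
    database: str = "default"

    def jdbc_url(self) -> str:
        return f"jdbc:hive2://{self.host}:{self.port}/{self.database}"


def _check_identifier(name: str) -> str:
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"unsupported identifier: {name!r}")
    return name


def load_statements(table, columns, staged_path):
    """Return the two HiveQL statements assumed for the import step.

    columns is a list of (name, hive_type) pairs; staged_path is where the
    file was copied on the server in the preceding transfer step.
    """
    if "'" in staged_path:
        raise ValueError("single quotes in the path are not supported in this sketch")
    table = _check_identifier(table)
    column_spec = ", ".join(
        f"{_check_identifier(name)} {_check_identifier(hive_type)}"
        for name, hive_type in columns
    )
    create = (
        f"CREATE TABLE IF NOT EXISTS {table} ({column_spec}) "
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','"
    )
    load = f"LOAD DATA LOCAL INPATH '{staged_path}' INTO TABLE {table}"
    return [create, load]


target = HiveTarget("hive.example.com")  # placeholder host
statements = load_statements(
    "sales", [("id", "INT"), ("amount", "DOUBLE")], "/tmp/sales.csv"
)
print(target.jdbc_url())
for statement in statements:
    print(statement)
```

After statements like these run, a query such as SELECT * FROM sales would operate on the imported table, which corresponds to the database connection the Hive Loader returns as its output.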
KNIME Analytics Platform
...is an open source platform for
integrated data access, data mining,
statistics, visualization, and reporting.
KNIME Collaborative Extensions
...activate the full potential of KNIME for
teams. Features include user rights &
authentication, remote & scheduled
execution, shared workflow repository,
data space, and metanodes, plus access
to workflows, reports, and web services.
KNIME Productivity Extensions
...increase the speed at which KNIME
workflows can be created, reused,
and maintained for both individuals
and partners who use KNIME for
their clients.
KNIME Performance Extensions
...enable distributed storage and scalable execution of KNIME workflows for large data sets and complex computation requirements.
KNIME Community & Partner Extensions
...are nodes created by and given back to the KNIME community. These range from broadly useful application nodes simplifying the work of anyone using KNIME, to domain-specific solutions to pressing problems for specialists.
About KNIME
KNIME is the leading open platform for data-driven innovation helping organizations to stay ahead of change. Innovative organizations use our open-source,
enterprise-grade analytics platform to discover the potential hidden in their data, mine for fresh insights or predict new futures.
Quick to deploy, easy to scale, and intuitive, KNIME is used in over 60 countries on data of every kind: from numbers to images, molecules to humans, signals
to complex networks, from kilo- to petabytes or simple reports to complex analyses.
KNIME is developed and supported by KNIME.com AG. Learn more at www.knime.com
KNIME.com AG
Technoparkstrasse 1
8005 Zurich
Switzerland
http://www.knime.com
Tel.: +41 44 445 2660
Fax: +41 44 445 2662
Copyright © 2014 KNIME.com AG. All rights reserved. KNIME TM is a registered trademark of KNIME GmbH, Germany.
All other brands or product names mentioned are trademarks owned by their respective organizations.
V09/14