Big Data

Big Data & Analytics
Welcome to the IBM Roadshow
SPSS Predictive Analytics meets Big Data
Think Big, Start Small, Act Fast
October/November 2014
Hamburg, Berlin, Stuttgart, Munich, Düsseldorf, Vienna, Zurich
© 2014 IBM Corporation
Agenda
08:30 - 09:00: Registration
09:00 - 09:15: Big Data Analytics - Hype or Reality?
09:15 - 09:45: Big Data - Overview
• Introduction to the IBM Big Data platform
• Architecture and components
• Big Data application areas and references
09:45 - 10:30: Predictive Analytics & Big Data: the scalable IBM SPSS solution, now also on Hadoop
• Typical application areas and current challenges for mid-sized businesses
• References and examples from mid-sized businesses and various industries
• Overview of the SPSS Predictive Analytics product portfolio and architecture
10:30 - 11:00: Break
11:00 - 12:00: InfoSphere BigInsights - Making Hadoop fit for enterprise use (live demonstration)
• Exploratory preparation, analysis and visualization of data with BigSheets
• Automating processes with applications and workflows
• Integrating Hadoop into analytical environments with Big SQL
12:00 - 13:15: IBM SPSS Predictive Analytics live demonstration, using examples from customer data analysis and the analysis of production/manufacturing data in industry
• IBM SPSS Modeler & Analytic Server - data mining on big data for analysts and data mining experts
• Watson Analytics - introduction and outlook on new IBM technologies and trends
13:15: Summary, followed by a lunch buffet
Big Data Analytics
Hype or Reality?
Big Data offers many opportunities in all areas
Big Data, Big Value
The four Vs of big data:
• Volume: terabytes and zettabytes of data
• Velocity: batch and streaming
• Variety: data in many different formats
• Veracity: credibility versus doubt
Facts and figures:
• 49% of customers use two or more technologies to shop, and 53% of active adult social networkers follow a brand
• Billions of customer preferences and ratings exist in call centers, websites, transaction data, human-machine interactions and social media
• Only 6.8% of marketers believe that social media is integrated into their strategy
• 7.6% of the marketing budget is allocated to social media
• 1 in 5 online minutes is spent on social networks
• 400 million tweets are sent every day
• More than 5 billion mobile phones worldwide
• 55% use their mobile phone for price comparisons
• 34% scan QR codes
Gartner: Hype Cycle for Big Data 2014
Application examples:
Companies that use recommendation systems
Objective: address customers individually in order to increase revenue and usage.
Application example in detail:
“We know what people watch on Netflix and we’re able with a high degree of confidence to understand how big a likely audience is for a given show based on people’s viewing habits.”
(Jonathan Friedland, Netflix Communications Director)
• More than 50 million users
• More than 60 million film views per day (Netflix records pauses, fast-forwarding and rewinding)
• More than 6 million ratings per day
• More than 4 million searches per day
• Geolocation data
• Device information (mobile phone, tablet, TV, ...)
• Time and date / day of the week (it has been shown that users watch more TV shows during the week and more films at the weekend)
• Metadata from third-party providers such as Nielsen
• Social media data from Facebook and Twitter
• More than USD 1 billion in revenue in the first quarter of 2014
Big Data Overview
The Big Data Paradox: More Data, Less Confidence
• 1 in 3 business leaders frequently make decisions based on information they don’t trust, or don’t have
• 1 in 2 business leaders say they don’t have access to the information they need to do their jobs
• 60% have more data than they can use
• 40% of the time spent on each big data project goes to understanding information
• 2.5 billion gigabytes of new data every day
• 1 trillion connected things by 2015
• 3 times increase in transistors per human by 2017
IBM reference architecture for information management and analytics
• Data: transaction and application data
• Zones: information ingestion and integration zone; enterprise warehouse and data mart zone; information governance zone
• Analytics: reporting, analysis, content analytics ("Why did it happen?")
IBM reference architecture for information management and analytics
• All data: transaction and application data; machine and sensor data; enterprise content; image and video; social data; third-party data
• Zones: real-time analytics zone; information ingestion and integration zone; exploration, landing and archive zone; enterprise warehouse and data mart plus analytics databases zone; information governance zone
• Analytics:
  • What is happening? Discovery and exploration
  • Why did it happen? Reporting, analysis, content analytics
  • What could happen? Predictive analytics and modeling
  • What action should I take? Decision management
  • What did I learn, what’s best? Cognitive
• New/enhanced applications: customer experience; new business models; financial performance; risk; operations and fraud; IT economics
• IBM Big Data & Analytics Infrastructure: systems, security, storage; on premise, cloud, as a service
Battelle – Boosts power grid reliability and helps consumers
and businesses cut their energy costs
Processes up to 10PB
of data in real time to deliver game
changing insights
Boosts grid efficiency
through the introduction of innovative
transactive control mechanisms
Provides price incentives
helping consumers make informed
choices about energy usage
Solution components
• IBM® InfoSphere® Streams
• IBM PureData™ System for Analytics
(powered by IBM Netezza® technology)
The transformation: Capturing and analyzing huge volumes of
data from smart energy meters, and applying appropriate
algorithms helps set electricity prices based on changing demand
factors.
“This project will help optimize the system and better
integrate renewable resources.”
— Ronald Melton, PhD, project director for the Pacific Northwest
Smart Grid Demonstration Project, led by Battelle.
Architecture zones: real-time analytics zone; exploration, landing and archive zone; enterprise warehouse and data mart plus analytics databases zone
Dublin City Centre – Monitors citywide traffic intelligently to
optimize public transit systems
Monitors 600 buses
across 150 routes and 5,000 bus
stops daily
Offers 50 updates a second
for real-time visualization of bus
locations and arrival times
Estimates arrival times
and transit times, as well as flagging
likely delays
Solution components
• IBM® InfoSphere® Streams
The transformation: Measuring speed and traffic flow across
public transit routes enables new insight for intelligent decisions
that enhance system performance and reliability.
“IBM solutions have enabled our traffic managers to
make more accurate decisions.”
— Brendan O'Brien, Head of Technical Services, Dublin City
Council Traffic Division
Architecture zones: real-time analytics zone; exploration, landing and archive zone; enterprise warehouse and data mart plus analytics databases zone
Vestas – Turns climate into capital with Big data using
IBM InfoSphere BigInsights
97% decrease
in response times for wind
forecasting information
Cuts cost per kilowatt hour
increasing customer’s return on
investment
40% reduction
in energy consumption, reducing IT
footprint while increasing power
Solution components
• IBM® InfoSphere® BigInsights™
Enterprise Edition
The transformation: Analyzing petabytes of wind data to
pinpoint optimal turbine placement maximizes power generation
and reduces energy costs.
“We can now show our customers how the wind behaves
and provide a solid business case that is on par with any
other investment that they may have.”
— Lars Christian Christensen, vice president, Vestas Wind
Systems
Architecture zones: real-time analytics zone; exploration, landing and archive zone; enterprise warehouse and data mart plus analytics databases zone
InfoSphere BigInsights is based on Apache™ Hadoop
Apache™ Hadoop® is an open source software project that
enables the distributed processing of large data sets across
clusters of commodity servers.
MapReduce - The framework that understands and assigns work to the nodes in a cluster.
HDFS - A file system that spans all the nodes in a Hadoop cluster for data storage.
Scalable – New nodes can be added as needed, and added without needing to change
data formats, how data is loaded, how jobs are written, or the applications on top.
Cost effective – Hadoop brings massively parallel computing to commodity servers.
Flexible – Hadoop is schema-less, and can absorb any type of data, structured or not,
from any number of sources.
Fault tolerant – When you lose a node, the system redirects work to another location of
the data and continues processing without missing a beat.
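The MapReduce idea described above can be sketched in a few lines of plain Python. This is a single-process illustration only, not Hadoop code: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample input are assumptions made for the example, and each inner list stands in for one input split that a cluster node would process.

```python
from collections import defaultdict
from itertools import chain


def map_phase(split):
    """Emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for line in split for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def word_count(splits):
    """Run map over every split, shuffle the pairs, then reduce per key."""
    mapped = chain.from_iterable(map_phase(split) for split in splits)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


if __name__ == "__main__":
    # Two hypothetical splits, as if stored on two different nodes.
    splits = [["big data big value"], ["big sql on hadoop"]]
    print(word_count(splits))
```

In a real cluster the map calls run in parallel on the nodes that hold the data, which is why adding nodes scales the work without changing how the job is written.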
InfoSphere BigInsights for Hadoop includes the latest Open Source
components, enhanced by enterprise components
[Diagram: IBM InfoSphere BigInsights for Hadoop component stack, distinguishing open source from IBM components. Recoverable labels:]
• Layers: File System; Data Store; Runtime; Resource Management & Administration; Data Access; ETL; Search; Advanced Analytics; Stream Computing; Visualization & Ad Hoc Analytics; Applications & Development; Governance; Security
• Components: HDFS, GPFS FPO, HBase, MapReduce, Adaptive MapReduce, Flexible Scheduler, YARN*, ZooKeeper, Oozie, Pig, Hive, HCatalog, Jaql, Big SQL, Sqoop, Flume, Solr/Lucene, Enterprise Search, Text Analytics (with extractors), Big R, R, Streams, BigSheets (reader and macro, charting), Dashboard, Console, Monitoring, Eclipse tooling (MapReduce, Hive, Jaql, Pig, Big SQL, AQL)
• Governance and security: Data Privacy for Hadoop, Data Matching, Data Masking, Audit & History, Data Security for Hadoop, LDAP, Kerberos
* In Beta
IBM InfoSphere BigInsights – Enterprise Ready Hadoop solution
Performance gain on average over open source Hadoop
[Diagram: InfoSphere BigInsights adds enterprise capabilities on top of IBM tested & supported open source components. Recoverable labels:]
• Visualization & Exploration
• Development Tools: analytics development environment
• Accelerators, extractors and APIs
• Analytics Extraction Engine
• Connectors
• Big SQL
• Workload Optimization (MapReduce/SQL), Workload Management
• Administration & Security
• Included in BigInsights Enterprise Edition (limited use license): Streams, Data Explorer, Cognos BI
Collaborative Big Data for many roles
• Business users can get their hands on big data and use big data applications and BigSheets to get insights into their data
• Data scientists can perform deeper analysis and get richer insights
• Administrators are empowered to be more agile through better controls and views into key performance indicators
• Developers can leverage unified tooling in a Big Data Application Development Lifecycle and are able to create and deploy new types of applications, with enhancements that simplify even complex workflows
BigSheets to analyze and visualize
• Model “big data” collected from various sources in spreadsheet-like structures
• Filter and enrich content with built-in functions
• Combine data in different workbooks
• Visualize results through spreadsheets, charts
• Export data into common formats (if desired)
No programming knowledge needed!
Big SQL – Architected for Performance
• Leverage IBM's rich SQL heritage, expertise, and technology
  • Modern SQL:2011 capabilities
  • DB2 compatible SQL PL support
  • SQL bodied functions and stored procedures
  • Application logic/security encapsulation
• Architected from the ground up for performance
  • Low latency and high throughput
• MapReduce replaced with a modern MPP architecture
  • Compiler and runtime are native code (not Java)
  • Big SQL worker daemons live directly on cluster
  • Continuously running (no startup latency)
  • Processing happens locally at the data
• Operations occur in memory with the ability to spill to disk
  • Supports aggregations and sorts larger than available RAM
• Integration with BigSheets (source & target)
[Diagram: SQL-based application, IBM Data Server Client, Big SQL with SQL MPP runtime inside InfoSphere BigInsights, data sources: Parquet, CSV, Seq, RC, Avro, ORC, JSON, custom]
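The "in memory with the ability to spill to disk" point can be illustrated with a generic external merge sort. The sketch below is not Big SQL's implementation; it only shows the general technique of sorting more values than fit in a memory budget. The names `external_sort` and `max_in_memory` are assumptions for this example, and integers stand in for table rows.

```python
import heapq
import os
import tempfile


def _spill(sorted_chunk):
    """Write one sorted chunk to a temporary run file and return its path."""
    f = tempfile.NamedTemporaryFile("w", delete=False, suffix=".run")
    with f:
        f.writelines(f"{value}\n" for value in sorted_chunk)
    return f.name


def _read_run(path):
    """Stream the integers of one run file back in sorted order."""
    with open(path) as f:
        for line in f:
            yield int(line)


def external_sort(values, max_in_memory=1000):
    """Yield ints in sorted order while buffering at most max_in_memory of them."""
    runs, chunk = [], []
    for value in values:
        chunk.append(value)
        if len(chunk) >= max_in_memory:
            runs.append(_spill(sorted(chunk)))  # memory budget reached: spill to disk
            chunk = []
    chunk.sort()
    try:
        # Merge the in-memory remainder with all spilled runs, one value at a time.
        yield from heapq.merge(chunk, *(_read_run(path) for path in runs))
    finally:
        for path in runs:
            os.remove(path)


if __name__ == "__main__":
    data = [5, 3, 9, 1, 7, 2, 8]
    print(list(external_sort(data, max_in_memory=3)))
```

With `max_in_memory=3`, two sorted runs are written to disk and the last value stays in memory; the merge step then reads all three sources lazily, so the full data set is never held in RAM at once.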
IBM reference architecture for information management and analytics (same diagram as shown earlier in the deck)
Predictive Analytics & Big Data
The scalable IBM SPSS solution, now also on Hadoop
IBM reference architecture for information management and analytics (same diagram as shown earlier in the deck)
What is Predictive Analytics?
Predictive analytics turns data into operational actions by drawing reliable conclusions about the current situation and forecasting future events.
The three most common application areas are customer relationships, operations and risk.
Applying Predictive Analytics at the Point of Interaction
• Helping people carry out the best possible action: "What should I do now?" / "Do this!"
• Helping systems carry out the best possible action: "What should I do now?" / "Do this!"
Several goals must be considered at the same time
• Acquiring "good" customers: "How likely is a customer to respond?"
• Retaining profitable customers: "Which customers are at risk of churning?"
• Detecting and preventing fraud: "Which activities are suspected fraud?"
• Increasing customer value: "Which is the most interesting next product offer?"
• Minimizing risk: "Which customers are likely to run into payment difficulties?"
Predictive Analytics - Areas of Application
Controlling & production
• Analysis of error sources to minimize scrap
• Intelligent maintenance
• Revenue forecasts
• Site analysis and planning
• Inventory analysis and forecasts
• Risk analysis
• Variance analysis: cause and effect
• Purchasing optimization
• Resource deployment planning
Marketing
• Analytical customer relationship management
• Campaign optimization / target group segmentation
• Customer retention management
• Customer value analysis
• Market basket analysis for cross-selling and up-selling
• Behavioral analysis / clickstream analysis
• Sentiment analysis
• Churn analysis
Sales and service
• RFM analysis
• Next best action
• Spare parts forecasting and supply
• Preventive replacement of faulty parts in customer service
• Optimization of recall campaigns
• Warranty analysis
Public sector and healthcare
• Detection of unusual cases (tax fraud, social benefits fraud, money laundering, EU subsidies)
• Forecasting urban/rural population migration
• Traffic jam prediction, intelligent traffic flow control
• Public transport planning, capacity utilization
• Choosing the appropriate form of therapy through efficacy predictions
• Police: staff deployment planning, prediction of hotspots
• Import/export controls
• Forensic psychology: predicting the efficacy of treatment concepts, predicting the probability of reoffending
Example: targeted marketing for inbound phone contacts
Customer: “I'm calling because I wanted to ask about my download limit. How much of it have I already used?”
Agent: “Of course, Ms. Burghardt. Let me take a quick look...”
Next best action: recommend Broadband Unlimited
Agent: “Ms. Burghardt, you are just about to reach your 10 GB limit. As a valued long-standing customer, however, we can offer you a switch to our attractive Broadband Unlimited plan.”
Predictive Analytics at the "moment of truth"
[Screenshot: customer record (Johnson, Churchilllaan 22, 1022 AM Amsterdam, 0622147763, 53463788) annotated with a churn risk score, a customer value indicator and a cross-selling suggestion: "Het is voor U voordeliger om een sms voorraad 50 aan te sluiten." (roughly: "It would be cheaper for you to sign up for an SMS bundle of 50.")]
E-commerce retailer optimizes the customer experience and increases success through optimized marketing campaigns
Challenges
• Understanding website behavior, deriving targeted measures and optimizing campaigns.
• Merging data from different systems in near real time.
Solution
• An analytical solution combines data from 17 business units and more than 70 brand websites into a complete, comprehensive customer view.
• Statistical and predictive analytics analyze a near-real-time stream of customer data, transactions, clickstreams, mobile app usage, online survey results and store transactions.
• Pattern recognition in customer behavior and support for the next best action.
Benefits
• The project paid for itself in seven months, with 122% ROI.
• Cost reduction of more than EUR 500,000 and increased revenue.
• 90% reduction in the effort and time spent on campaign management and data processing.
Richmond Police Department
Public sector: police, crime & defense
Curbing crime with predictive analytics
Background & challenge
Faced with rising crime rates, the Richmond Police Department needed an efficient and cost-effective way to analyze crime data, identify threats to public safety and make intelligent staffing decisions.
Solution
The police therefore chose IBM SPSS Predictive Analytics software to implement a tool that integrates data from various sources into a data warehouse and detects hidden relationships in the data. This makes it possible, for example, to generate automatic crime forecasts.
Solution components
• IBM SPSS Statistics
• IBM SPSS Modeler
Benefits
• Analysis of very large data collections and prediction of crime patterns
• The intelligence needed to curb crime
• Ability to allocate resources efficiently
• Violent crime reduced by 32% from 2006 to 2007 and by a further 40% from 2007 to 2008
Use of predictive analytics across the entire product lifecycle
Lifecycle phases: development, production, logistics, marketing, sales, after sales, warranty
Example applications:
• Diagnosis of test data
• Forecasting service intervals
• Identification of related problems, causes and measures
• Analysis of dealer invoices and measures derived from them
• Early model series
• Avoiding repeat repairs
• Automated analysis of text
• Product recommendations at the point of sale
• Automated surveys and direct analysis of the responses
• Production optimization and monitoring
• Determining customer needs through segmentation
• Use of telemetry data for early fault identification
• Lean Six Sigma
• Predictive maintenance and inventory management
Daimler AG: automaker increases productivity in cylinder head production
Results
• 25 percent increase in productivity in Daimler's cylinder head production thanks to the insights gained with IBM SPSS.
• 50 percent shorter ramp-up phase of the manufacturing process until the target values are reached.
• When thresholds are exceeded, the analyses enable fast localization of error sources and targeted process interventions, and thus the avoidance of scrap before it is even produced.
Link to the reference (English):
http://www-01.ibm.com/common/ssi/cgi-bin/ssialias?subtype=AB&infotype=PM&appname=SWGE_YT_YV_WWEN&htmlfid=YTC03659WWEN&attachment=YTC03659WWEN.PDF
Link to the reference (German):
http://www-01.ibm.com/common/ssi/cgi-bin/ssialias?subtype=AB&infotype=PM&appname=SWGE_YT_YV_DEDE&htmlfid=YTC03659DEDE&attachment=YTC03659DEDE.PDF
Israel Electric Corporation
Israel Electric Corporation increased the efficiency of maintenance schedules, costs and resources, resulting in
fewer outages and higher customer satisfaction.
The Company
Israel Electric Corporation (IEC) is the primary electricity provider in Israel, responsible for building, maintaining and operating the country’s power infrastructure.
Business Need
IEC generates 95 percent of Israel’s electricity. To meet peak demand, its turbines need to run at full capacity, so it is vital to keep them online and running efficiently.
“Using IBM’s analytical tools has
brought us significant savings,
both by reducing the time taken
to understand faults and by
cutting the dollars spent on
turbine failures and downtime. ”
Dr. Moshe Shavit,
CTO for Gas Turbines at IEC
Solution
Sophisticated analysis of machine behavior
The IEC team used IBM SPSS Modeler to perform cluster analyses of the data
from each of the turbines and create a model of their “normal” behavior during
start-up, steady-state and shut-down. With the baselines for each individual unit
established, the team was able to compare their performance and begin
identifying common problems
Moving towards preventive maintenance
Better root-cause analysis of past component failures enables IEC to move from a
break-fix maintenance model to a more preventive approach.
Improving safety
The turbines have an alarm built-in by the manufacturer which is triggered 30
minutes before a major failure. With IBM SPSS Modeler IEC can predict such an
event 30 hours before it happens
Enhancing performance and fuel efficiency
Neural network techniques calculate expected values for each turbine (workload,
fuel consumption and other conditions) and compare them with the actual values
on a daily basis. If a large variance is detected, the control engineers are alerted
immediately
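The expected-versus-actual comparison described above can be sketched as follows. This is a simplified illustration under stated assumptions: a least-squares line replaces the neural network named on the slide, the function names (`fit_line`, `flag_deviations`), the 10 percent tolerance and all readings are hypothetical, and only one input (workload) predicts one output (fuel consumption).

```python
from statistics import mean


def fit_line(xs, ys):
    """Least-squares fit of y = a + b * x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b


def flag_deviations(model, readings, tolerance):
    """Return (label, expected, actual) for readings that deviate by more than tolerance.

    The deviation is measured relative to the expected value.
    """
    a, b = model
    alerts = []
    for label, load, actual in readings:
        expected = a + b * load
        if abs(actual - expected) / expected > tolerance:
            alerts.append((label, round(expected, 1), actual))
    return alerts


if __name__ == "__main__":
    # Hypothetical history of one turbine: workload versus fuel consumption.
    history_load = [100, 150, 200, 250]
    history_fuel = [30.0, 45.0, 60.0, 75.0]
    model = fit_line(history_load, history_fuel)

    # Hypothetical daily readings: (label, workload, actual fuel consumption).
    today = [("day 1", 180, 54.5), ("day 2", 220, 80.0)]
    print(flag_deviations(model, today, tolerance=0.10))
```

Fitting one baseline per turbine mirrors the per-unit "normal behavior" models described on the slide; any reading outside the tolerance band would be the trigger for alerting the control engineers.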
Key Benefits
• Reduce costs by up to 20 percent by avoiding the need to restart turbines after an outage.
• Saved approximately USD 75,000 in fuel costs per turbine by identifying inefficient fuel usage.
• Provides early warning of certain types of failure up to 30 hours before they occur, instead of 30 minutes.
IBM SPSS Predictive Analytics
Product portfolio and architecture
IBM SPSS Predictive Analytics
Capture, Predict, Act
• Data captured: transactions, demographics, interactions, opinions
• Outcomes: forecasts; optimization and implementation in processes
• Capabilities: real-time analytics, predictive modeling, data mining, text analytics, social network analysis, statistical analysis
• Products: Data Collection, Social Media Analytics, Statistics, Modeler, Analytic Server, Decision Management, Collaboration and Deployment Services
Solutions
• Predictive Customer Analytics: acquire, grow, retain
• Predictive Operational Analytics: manage, maintain, maximize
• Predictive Threat & Fraud Analytics: monitor, detect, control
IBM SPSS Data Collection
• Survey technology for collecting the opinions, attitudes and satisfaction of customers, employees and suppliers
• Complements internally collected data to give a more complete view of the customer
Provides a more accurate view of opinions and attitudes
IBM SPSS Social Media Analytics
• Comprehensive monitoring, analysis and reporting platform for social media insights
• Built-in big data capabilities
• Leading sentiment analysis and segmentation (geographics, demographics, influencers)
• Relationships (affinities, associations, causal chains)
• Impact analysis (share of voice, reach, sentiment)
• Exploratory/discovery capabilities (topics, actors, sentiment)
IBM SPSS Statistics
• Data management and advanced statistics for analysts
• Collection, exploration, analysis, interpretation and presentation of data
• Provides deeper insight into samples and a wide range of procedures for forecasting and analysis
• Huge user base from universities
Increases confidence in results and decisions
IBM SPSS Modeler
• Complete workbench for data and text mining
• High degree of interactivity and ease of use
• Wide range of algorithms for exploration and prediction
• Enables the discovery of new patterns and trends for further use in business processes
Brings repeatability to decision-making processes
IBM SPSS Analytic Server
• Delivers fast time to solution for predictive analytics of big data
• Visual, easy to use interface abstracts analysts & line of business users from complexities of big data systems
Big data predictive analytics on Hadoop
IBM SPSS Decision Management
• Business applications based on SPSS Modeler
• Fast productivity and automation
• GUI geared to the decision-making process
Tailor-made applications for business users
IBM SPSS Predictive Analytics Enhances other IBM Technology
[Diagram: the SPSS portfolio (Data Collection, Social Media Analytics, Statistics, Modeler, Analytic Server, Decision Management, Collaboration and Deployment Services) alongside Predictive Customer Analytics (acquire, grow, retain), Predictive Operational Analytics (manage, maintain, maximize), Predictive Threat & Fraud Analytics (monitor, detect, control), IBM Research, etc.]
IBM SPSS Modeler
• Enables discovery of key insights, patterns & trends in data to optimize decisions
Data mining workbench
• Easy to use / visual
• Comprehensive set of algorithms
• Structured & unstructured data
• Supports data mining process (CRISP-DM)
• Outstanding performance & scalability via SQL pushback, in-database processing & Hadoop MapReduce processing via Analytic Server
• Reproducible process delivering high productivity, quick time-to-solution & high ROI
Brings repeatability to ongoing decision making
In-Database Support with SPSS Modeler Server
• All the features of IBM SPSS Modeler
• Large volumes of data
• High performance
• Administration and security options
In-database via:
• SQL pushback
• In-database algorithms
• Scoring adapters
• SQL scoring
In-Database Scoring with SPSS Modeler
• Extension to current in-database capabilities allowing more SPSS models to be scored in-database
• Improves the efficiency of scoring models by minimizing data movement and leveraging database capabilities
• Supported for the following platforms:
  • Teradata (13 and above)
  • PureData for Analytics (6.0 and above)
  • DB2 for z/OS (DB2 Accessories Suite)
  • DB2 (Linux, Unix, Windows)
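The idea of minimizing data movement can be illustrated with a small sketch: a model is translated into a SQL expression so that the database computes the scores and only the results leave it. This is not how SPSS Modeler generates SQL. The linear model, its weights, the `customers` table and the function names are hypothetical, and Python's built-in `sqlite3` module stands in for the database platforms listed above.

```python
import sqlite3


def scoring_sql(table, intercept, weights):
    """Build a SELECT that evaluates a linear model score inside the database.

    Table and column names are interpolated directly, so they must come from
    trusted model metadata, never from user input.
    """
    terms = " + ".join(f"{weight!r} * {column}" for column, weight in weights.items())
    return (
        f"SELECT customer_id, {intercept!r} + {terms} AS score "
        f"FROM {table} ORDER BY customer_id"
    )


def demo():
    """Score two hypothetical customers without fetching their raw rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (customer_id INTEGER, calls REAL, complaints REAL)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)",
        [(1, 10.0, 0.0), (2, 2.0, 3.0)],
    )
    sql = scoring_sql("customers", 0.5, {"calls": -0.25, "complaints": 1.0})
    scores = conn.execute(sql).fetchall()
    conn.close()
    return scores


if __name__ == "__main__":
    print(demo())
```

The alternative, fetching every row into the client and scoring it there, moves the whole table across the network; pushing the expression into the query moves only one score per customer.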
Helper Applications in SPSS Modeler
Modeler Server supports integration with data mining and modeling tools that are available from database vendors, including:
• IBM PureData for Analytics (Netezza)
• IBM DB2 InfoSphere Warehouse
• Oracle Data Miner
• Microsoft Analysis Services
SPSS Modeler and PureData for Analytics (Netezza)
• Modeler supports integration with IBM PureData for Analytics, providing the ability to run data mining algorithms directly in the IBM PureData for Analytics environment from the Modeler user interface.
• The following algorithms from PureData for Analytics are supported within Modeler:
  • Bayes Net
  • Decision Trees
  • Divisive Clustering
  • Generalized Linear
  • K-Means
  • KNN
  • Linear Regression
  • Naive Bayes
  • PCA
  • Regression Tree
  • Time Series
  • 2 Step cluster
Real-Time Analytics on Streaming Data
• Real-time decisions with powerful analytics
• Streaming enhancement: support for forecasting (time series)
• Millions of events per second, microsecond latency
• Traditional and non-traditional data sources
• Example use cases: environment monitoring, ICU monitoring, algo trading, telco churn prediction, smart grid, cyber security, government / law enforcement
IBM SPSS Analytic Server Architecture
[Diagram: IBM SPSS Modeler (desktop) connects to IBM SPSS Modeler Server, which sends SQL & UDF to relational databases and big data requests to IBM SPSS Analytic Server; Analytic Server runs the analytics on IBM InfoSphere BigInsights.]
InfoSphere BigInsights – Making Hadoop fit for enterprise use
Live demonstration
Live Demo – BigInsights, step 1
• Select file OR directory in the file browser
• Choose the suitable reader (in this example JSON Object Reader)
• Review the data in a table and create a „Master Workbook“
57
© 2014 IBM Corporation
Big Data & Analytics
Live Demo – BigInsights, step 2
• Add and delete columns (in this example add „Datum“)
• Implement analytics using function (fx) and add sheets
• Create result sheet and add chart (if desired)
58
© 2014 IBM Corporation
Big Data & Analytics
Live Demo – BigInsights, step 3
• Choose a suitable chart type and add chart
• Fill parameters in chart wizard to customize chart design
• Review chart within BigSheets
59
© 2014 IBM Corporation
Big Data & Analytics
Live Demo – BigInsights, step 4
• Create or choose a dashboard
• Add chart or content (Add Widget button)
• Customize dashboard layout (size and position of the content)
60
© 2014 IBM Corporation
Big Data & Analytics
Live Demo – BigInsights, step 5
• Select the proper application (in the example, Distributed File Copy_MR)
• Fill in the required parameters
• Select the BigSheets workbook to update (Advanced Settings)
• Run the application
Live Demo – BigInsights, step 6
• View the status of the running application
• View the automated start and status of all dependent BigSheets workbooks
• View the yellow warning triangle in the dashboard
Live Demo – BigInsights, step 7
• View options of dashboard charts (links, settings, etc.)
• Select a dashboard chart and click "link to the corresponding workbook"
• The workbook will be automatically opened
Live Demo – BigInsights, step 8
• Select workbook and workflow icons
• Explore content and navigation of the workbook diagram
• Explore content and navigation of the workflow diagram
Live Demo – BigInsights, step 9
• Push the "Create Table" button within the BigSheets workbook
• Enter target schema and table name
• Confirm table creation
Live Demo – BigInsights, step 10
• Open the Big SQL client
(select the "run Big SQL queries" option in the BigInsights welcome tab)
• Enter SQL query
• Run the SQL query and review results
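As a rough illustration of step 10, the sketch below runs an aggregate query over a table like the one created in step 9. It uses Python's built-in sqlite3 module purely as a local stand-in for Big SQL; the table name demo_workbook, its columns, and the sample rows are hypothetical, and a real session would run the query in the Big SQL client against the BigInsights cluster instead.

```python
import sqlite3

# Local in-memory stand-in for the table created from the workbook in step 9.
# Table name, columns, and rows are hypothetical illustration data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo_workbook (datum TEXT, source TEXT, hits INTEGER)")
conn.executemany(
    "INSERT INTO demo_workbook VALUES (?, ?, ?)",
    [
        ("2014-10-01", "web", 3),
        ("2014-10-01", "mobile", 5),
        ("2014-10-02", "web", 7),
    ],
)

# "Enter SQL query": total hits per day, ordered by date.
query = """
    SELECT datum, SUM(hits) AS total_hits
    FROM demo_workbook
    GROUP BY datum
    ORDER BY datum
"""

# "Run the SQL query and review results".
rows = conn.execute(query).fetchall()
for datum, total_hits in rows:
    print(datum, total_hits)
conn.close()
```

The query itself is ordinary SQL, so the same SELECT statement could in principle be pasted into the Big SQL client once the schema and table names match those entered in step 9.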
IBM BigInsights - Information
• IBM BigInsights Quick Start Edition: information and software download
• Big Data University: information and courses about Big Data, BigInsights and more
• IBM Big Data YouTube channel, including BigInsights Quick Start Tutorials
• developerWorks: technical resource and professional network for IT practitioners
IBM SPSS Predictive Analytics
Live Demonstration
using examples from customer data analysis and the analysis of production/manufacturing data in industry
High-End Analytics with IBM SPSS Modeler
• Visual programming of analytical streams
• High degree of interactivity and ease of use
• Scalability through client/server architecture
• Seamless interoperation with all common database systems
• Aligned with the CRISP-DM model for data mining
IBM SPSS Modeler
[Screenshot of the user interface with labels: menu bar; toolbar; Streams, Outputs and Model Manager; stream canvas; project pane; palette; status; nodes]
Data Access, Preparation & Reporting (Overview)
• Data access
  - ODBC databases, flat files, …
• Data manipulation and preparation, including:
  - Data selection & transformation
  - Handling of missing or extreme values
  - Pre-processing, cleansing, queries
  - Setting types/roles
  - RFM analysis
  - Transformations
  - Feature selection (pre-selection for modeling)
  - Model outputs are processed further like transformations
• Export of case-level information and scores as well as aggregated information
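RFM analysis scores each customer on recency (how recently they purchased), frequency (how often) and monetary value (how much they spent). The following is a minimal Python sketch of that idea, not a reproduction of Modeler's RFM nodes; the transactions, the reference date and the simple rank-based scoring are all assumptions made for illustration.

```python
from datetime import date

# Hypothetical transactions: (customer_id, purchase_date, amount).
transactions = [
    ("A", date(2014, 9, 28), 120.0),
    ("A", date(2014, 8, 15), 80.0),
    ("B", date(2014, 3, 2), 40.0),
    ("C", date(2014, 9, 1), 300.0),
    ("C", date(2014, 7, 9), 150.0),
    ("C", date(2014, 5, 20), 50.0),
]
today = date(2014, 10, 1)  # assumed reference date

# Aggregate per customer: days since last purchase, purchase count, total spend.
raw = {}
for cust, day, amount in transactions:
    rec, freq, mon = raw.get(cust, (None, 0, 0.0))
    days = (today - day).days
    rec = days if rec is None else min(rec, days)
    raw[cust] = (rec, freq + 1, mon + amount)

def rank_scores(values, higher_is_better=True):
    """Map each customer to a score 1..n by rank (n = best)."""
    ordered = sorted(values, key=values.get, reverse=not higher_is_better)
    return {cust: i + 1 for i, cust in enumerate(ordered)}

# Fewer days since the last purchase is better, so recency is ranked inversely.
r = rank_scores({c: v[0] for c, v in raw.items()}, higher_is_better=False)
f = rank_scores({c: v[1] for c, v in raw.items()})
m = rank_scores({c: v[2] for c, v in raw.items()})
rfm = {c: (r[c], f[c], m[c]) for c in raw}
print(rfm)
```

With real data the ranks would usually be binned (for example into five groups per dimension) rather than used directly; plain ranks keep the sketch short.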
Interactive graphical ad hoc analyses for the targeted exploration of discovered relationships
Exploratory graphics
• First insight into the data structure
• Also serve as interactive data preparation tools
Discovery of relationships
• Gaining insight
• Support for further data preparation
Visualization of model results
[Figure panels: Histogram, Web, Plot. Labels in the figures: Banking, Home Insurance, Car Insurance, Exit Page, Current Account, Bonds, Homepage, Personal Loan, Savings, Credit Card]
Powerful Modeling Algorithms
• Classification and prediction: neural networks, C5.0, C&RT, CHAID, QUEST, regression (logistic, OLS, Cox), GZLM, time series, Decision List, discriminant analysis, SLRM, SVM, Bayesian networks; bagging and boosting of models possible
• Clustering: Kohonen networks, K-Means, TwoStep, anomaly detection
• Association rules: Apriori, CARMA, sequence analysis
• Text mining
• Data reduction: factor analysis, feature selection
• Meta-modeling: automatic model selection (binary and numeric targets, clusters, time series models), comparison/combination of the results of multiple models
• In-database modeling
Live Demonstration IBM SPSS Modeler Premium
IBM SPSS Modeler Premium
Decision Tree
Scoring Rules and Predictor Importance
Model evaluation shows that the model built on data combined with text mining concepts gives the best results (green line)
IBM SPSS Modeler: Decision tree methods identify characteristic defect patterns in production
• Overall data: 32% of parts not OK
• If Öffnungszeit (opening time) > 1292, then 75.9% scrap
• If Öffnungszeit > 1292 and Kühlkreis22Durchfl.Max (maximum flow, cooling circuit 22) is between 0.9 and 1.1 and TempMax <= 386, then 90.3% not OK
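The rules read off the tree can be written as a small lookup function. This is only a sketch: the function name and the record layout are hypothetical, the thresholds and percentages are the ones shown on the slide, and treating the bounds 0.9 and 1.1 as inclusive is an assumption. The function returns the defect share of the most specific of the three slide segments that a part falls into; the remaining branches of the tree are not shown on the slide and are therefore not modeled.

```python
def defect_rate(part):
    """Return the defect share (percent) of the most specific matching segment.

    `part` is a dict keyed by the field names used on the slide.
    """
    if part["Öffnungszeit"] > 1292:
        flow = part["Kühlkreis22Durchfl.Max"]
        # Assumption: the interval bounds 0.9 and 1.1 are inclusive.
        if 0.9 <= flow <= 1.1 and part["TempMax"] <= 386:
            return 90.3
        return 75.9
    # No rule from the slide applies: fall back to the overall rate.
    return 32.0


# Hypothetical part record for illustration.
part = {"Öffnungszeit": 1300, "Kühlkreis22Durchfl.Max": 1.0, "TempMax": 380}
print(defect_rate(part))
```

Writing the rules out this way shows why such trees are easy to hand over to production staff: each path is a plain conjunction of threshold checks on measured process values.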
IBM SPSS Analytic Server
Analytic Server delivers fast time to solution for big data analytics
• Delivers integrated support for predictive analytics on unstructured/semi-structured data
• Data-centric architecture ensures scalability & performance (analytics occurs near the data)
• Leverages visual, easy-to-use interfaces that shield analysts from the complexities of distributed big data systems; no coding is required
IBM SPSS Modeler with IBM SPSS Analytic Server
• Visual, easy-to-use interface shields analysts & line-of-business users from the complexities of big data systems
  - IBM SPSS Modeler
    • Data mining and text analytics workbench to build predictive models without programming or coding
IBM SPSS Analytic Server
Access, import and export data directly
from/to Hadoop
Use Case for SPSS Modeler + SPSS Analytic Server:
Leveraging Big Data to Ensure Quality Electric Service
Background
• A large electric utility has deployed a system of 10 million smart meters
Business Need
• Ensure quality of service by balancing traditionally generated power with customer-generated power
User
• IBM SPSS Modeler clients working in a Hadoop environment
• "Downstream" consumers who use the information to help them better manage their energy consumption
Without IBM SPSS Analytic Server
• No realistic way to analyze the massive data set derived from 10 million meters generating 9 billion records and 100 fields across all data sources
With IBM SPSS Analytic Server
• "Looking ahead" over the next 48 hours to predict likely system imbalances and address them proactively
• Taking into account weather conditions, on which customer-generated power is highly dependent
Use Case for SPSS Modeler + SPSS Analytic Server (cont'd):
Leveraging Big Data to Ensure Quality Electric Service
• Background
  - Approximately 9 billion records and 100 fields across all data sources
• Without IBM SPSS Analytic Server
  - Meter readings had to be aggregated and then sampled, which led to less accurate forecasts
• With IBM SPSS Analytic Server
  - Models built on individual meter readings
This Modeler stream (i.e. this predictive model) combines customer data (e.g. rate plan), meter data (actual usage), and weather data to produce predicted individual usage
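The combination of the three sources can be pictured as a join keyed on customer and time. The sketch below is plain Python rather than a Modeler stream; every field name and sample value is a hypothetical placeholder, and it stops at the joined input table because the slide does not say which model the stream fits.

```python
# Hypothetical sample data for the three sources named on the slide.
customers = {"c1": {"rate_plan": "standard"}, "c2": {"rate_plan": "solar"}}
meter_readings = [  # (customer_id, hour, kWh used)
    ("c1", "2014-10-01T10", 1.2),
    ("c2", "2014-10-01T10", 0.4),
    ("c1", "2014-10-01T11", 1.5),
]
weather = {  # hour -> conditions
    "2014-10-01T10": {"temp_c": 14.0, "cloud_cover": 0.8},
    "2014-10-01T11": {"temp_c": 15.5, "cloud_cover": 0.3},
}


def build_training_rows(customers, meter_readings, weather):
    """Join each individual meter reading with customer and weather data."""
    rows = []
    for customer_id, hour, kwh in meter_readings:
        row = {"customer_id": customer_id, "hour": hour, "usage_kwh": kwh}
        row.update(customers[customer_id])  # e.g. rate plan
        row.update(weather[hour])           # conditions at reading time
        rows.append(row)
    return rows


rows = build_training_rows(customers, meter_readings, weather)
for row in rows:
    print(row)
```

Each output row keeps one individual reading, which mirrors the slide's point: no aggregation or sampling step is needed before modeling.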
Architecture – SPSS Modeler, Analytic Server & BigInsights
[Diagram: Modeler Client, stream file, Modeler Server; SQL / UDF to the relational database; big data request to IBM SPSS Analytic Server; Hadoop job and analytics on IBM InfoSphere BigInsights]
• Modeler Server utilizes Analytic Server for big data
• Analysts define analyses in a familiar & accessible workbench to conduct analysis, modeling & scoring over high volumes of varied data
• Federation of heterogeneous data sources to use legacy & external data in model building & scoring
• Transformations, sampling & write-back of output to big data systems
Introduction and Outlook: New IBM Technologies and Trends
IBM Watson™ Analytics
Expectations from technology have never been higher
• Our work and personal lives have blurred
• It’s an “always-on” world
• A do-it-yourself mentality now prevails
Leveraging analytics still faces many obstacles
• 38% have a limited understanding of how to use analytics
• 34% cannot find time to analyze data
• 24% find it difficult to get data
• The desire to make data-driven decisions is prevalent
• Making decisions rapidly is no longer a goal; it’s an imperative
• Access to required data sources is critical while maintaining governed standards
Source: Analytics: The New Path to Value, a joint MIT Sloan Management Review and IBM Institute for Business Value study. Copyright © Massachusetts Institute of Technology
Even a simple analytics project has multiple steps and people
Steps: Data Access, Data Preparation, Reporting, Analysis, Collaboration, Validation
People: Business Analysts, IT, Business Users, Data Scientists and Statisticians
And it’s rarely a straightforward process
[Diagram: the same steps (Data Access, Data Preparation, Reporting, Collaboration, Validation, Analysis) and people (Business Analysts, IT, Business Users, Data Scientists and Statisticians)]
IBM Watson Analytics
Put analytics in the hands of a broad range of users
Make data access and refinement easier
Deliver through the cloud for agility and speed
• Understand Your Business: automated intelligence accelerates your ability to answer questions
• Tell a Story: visualizations support your decisions and communicate results
• Get Better Data: embedded information services provide data access and refinement
• Think Ahead: predictive analytics reveals insights and opportunities
• Mobile Ready
• Secure
IBM Watson Analytics
Self-service analytics for business users and experts alike
• Business Users
• Business Analysts
• Data Scientists
• IT
IBM Watson Analytics
Empowering the business for success
Examples
• Marketing: Campaign Planning and ROI
• Sales: Customer Retention
• Finance: Prioritizing Accounts Receivable
• IT: Helpdesk Case Analysis
• Operations: Warranty Analysis
• HR: Employee Retention
Video Watson Analytics
http://www.youtube.com/watch?v=IV7mVOI5Gug
IBM Watson Analytics
• Quick start
• Intuitive interface
• Natural language dialogue
• Data discovery
• Mobile-ready
• Cloud-based agility
IBM Watson Analytics
• Data access and refinement
• Intelligent automation
• Integrated social business
• Report and dashboard creation
• Visual storytelling
• Guided analytic discovery
• Unified analytics experience
IBM Watson Analytics
Single Analytics Experience
• Fully Automated Intelligence
• Natural Language Dialogue
• Guided Analytic Discovery
Visit WatsonAnalytics.com and get started for free
http://www.ibm.com/analytics/watsonanalytics/
Legal Disclaimer
o © IBM Corporation 2014. All Rights Reserved.
o The information contained in this publication is provided for informational purposes only. While efforts were made to
verify the completeness and accuracy of the information contained in this publication, it is provided AS IS without
warranty of any kind, express or implied. In addition, this information is based on IBM’s current product plans
and strategy, which are subject to change by IBM without notice. IBM shall not be responsible for any damages
arising out of the use of, or otherwise related to, this publication or any other materials. Nothing contained in
this publication is intended to, nor shall have the effect of, creating any warranties or representations from IBM
or its suppliers or licensors, or altering the terms and conditions of the applicable license agreement governing
the use of IBM software.
o References in this presentation to IBM products, programs, or services do not imply that they will be available
in all countries in which IBM operates. Product release dates and/or capabilities referenced in this presentation
may change at any time at IBM’s sole discretion based on market opportunities or other factors, and are not
intended to be a commitment to future product or feature availability in any way. Nothing contained in these
materials is intended to, nor shall have the effect of, stating or implying that any activities undertaken by you
will result in any specific sales, revenue growth or other results.
Further Information and Contact Details
The presentations and further information on "Predictive Analytics meets Big Data" are available at
www.ibm.com/events/spssbd
General information on IBM SPSS solutions and Big Data is available at
www.ibm.com/de/spss
and www.ibm.com/software/products/de/category/bigdata
The IBM SPSS and Big Data team can be reached at 0049-89-4504 2022 (SPSS) or 0049-7032-1549 116 (Big Data, Ms. Marta Musial).