Analytics on z Systems

Martin Schneider
Manager, Analytics Center of Excellence
IBM Research and Development, Boeblingen
Agenda
Why analytics?
Hybrid approach: transactions + analytics
Support on z Systems
What's new in the DB2 Analytics Accelerator
Mathematical optimization
Gartner 2014 CIO Agenda:
Analytics ranked no. 1 among technology topics
Source: Gartner “Taming the Digital Dragon: The 2014 CIO Agenda”, Dave Aron | Poh-Ling Lee – January 2014
New relationship with the customer
Before: “I have a product, and I am looking for a customer for it”
Today: “I have a customer. What does he need most?”
Interaction, Branding, Accounts, Commerce
• “By eliminating analytic latency and data synchronization issues, hybrid transaction/analytical processing will enable IT leaders to simplify their information management infrastructure”
• “This architecture will drive the most innovation in real-time analytics over the next 10 years via greater situation awareness and improved business agility”
Gartner Research Note G00259033: Gartner 01-2014 Hybrid Transaction Analytical Processing Will Foster Opportunities
The Analytics Landscape
Figure: analytics levels plotted by complexity against competitive advantage
• Descriptive (Reporting)
• Predictive (Analytics)
• Prescriptive (Optimization)
Label in figure: Advanced Analytics
Analytics means profit or loss
• 80% of marketers send the same material to all customers
• 6% of companies are “highly satisfied” with the information provided for their work
• +7.6% annual increase in customer lifetime value for companies with engagement analytics
• $226 million estimated loss due to fraud in healthcare
• 16% of tax revenue lost due to noncompliance
• $100 million: size of the typical fine when a bank violates regulations
Remedy: using IT as a business strategy
What makes implementation difficult?
Complexity
Latency
Data duplication
Cost
The future: hybrid transaction and analytics processing
• completed purchase
• consumed resource
• paid invoice
• submitted reimbursement claim
• updated information
• finished call in the call center
Analytics as part of the business process
• What happened?
• How many, how often, where?
• Which actions are needed?
• What happens if?
• What leads to the best result?
For bringing together analytical and transactional data…
…IBM z Systems is the ideal platform
secure, scalable, highly available, high-performing
Figure (each stage on a z Systems server):
• Collect: transactional processing, transaction data
• Analyze: predictive analytics, historical data, predictive data
• Report: historical view
Operations and analytics coexistence: benchmark configuration
Workloads: OLTP transactions, operational analytics (high concurrency), advanced analytics, standard reports, OLAP, complex queries, historical queries
Configuration: z/OS LPAR group with DB2 data sharing through a coupling facility (DB2 member 1 and DB2 member 2, DB2 native processing), plus the IBM DB2 Analytics Accelerator with real-time data ingestion
Two main use cases:
• Operational Priority: while keeping operational throughput constant, add analytics load to the system. Data used for analysis can be slightly out of sync with operations.
• Data Priority: data used for operations and analytics must be in complete synchronization. Slight degradation of operational throughput is acceptable.
Operations and analytics coexistence: results
Thousands of complex, analytical queries now integrated with operational workload (LPAR 1, LPAR 2, Accelerator)
First use case: periodic data synchronization, end-of-business-day data access
• Baseline vs. with analytics: operational throughput maintained with no additional mainframe capacity
Second use case: (near-)real-time data access
• Baseline vs. with analytics: data kept in sync in real time with minimal degradation in transaction ITR (3%)
Data platforms for analytics on z Systems
DB2 with Analytics Accelerator
– Announced yesterday: version 5.1
Apache Hadoop
– IBM Open Platform, IBM BigInsights for Apache Hadoop
– Hadoop runs on the mainframe
– Hadoop runs elsewhere and fetches data from the mainframe
Apache Spark
DB2 Analytics Accelerator
Version 5.1
The turbocharger for z Systems analytics
A blending of PureData System for Analytics
(powered by Netezza) and z Systems technology
that dramatically speeds up complex business
analysis – transparently to users
What is Hadoop?
Hadoop is a new data management framework born out of the need to manage Internet-scale data
Milestones shown:
• Wins terabyte sort benchmark
• Publishes MapReduce, GFS paper
• Apache open source MapReduce & HDFS projects created
• Runs 4,000-node Hadoop cluster
• Launches SQL support for Hadoop
• Releases CDH3
• InfoSphere BigInsights launched
Phases: early research, open source dev momentum, initial success stories, commercialization
IBM InfoSphere BigInsights for Linux on System z
Secure perimeter
z Systems server
IBM InfoSphere System z Connector for Hadoop
System z mainframe
• z/OS on CP(s): DB2, VSAM, IMS, SMF, logs; System z Connector for Hadoop
• Linux for System z on IFLs under z/VM: InfoSphere BigInsights (MapReduce, HBase, Hive, HDFS); System z Connector for Hadoop
• Hadoop on your platform of choice
– IBM System z for security
– Power Systems
– Intel servers
• Point-and-click or batch self-service data access
• Lower-cost processing & storage
Now there are two z Systems options for analytics using Hadoop
Option 1: IBM InfoSphere BigInsights for Linux on z Systems + z Systems Connector for Hadoop
– On-platform analysis of non-relational data
– Linux (IFLs) alongside z/OS
Option 2: IBM InfoSphere BigInsights on Intel-based servers or IBM Power Systems
– Integrate insights from big data sources to augment mainframe analysis on z/OS
Figure labels: Request, Integrate
What is Apache Spark?
• Addresses limitations of the Hadoop MapReduce programming model
– No iterative programming, latency issues, ...
• Uses a fault-tolerant abstraction for in-memory cluster computing
– Resilient Distributed Datasets (RDDs)
• Can be deployed on different cluster managers
– YARN, Mesos, standalone
• Supports a number of languages
– Java, Scala, Python, SQL, R
• Comes with a variety of specialized libraries
– SQL, ML, Streaming, Graph
• Enables additional use cases, user roles, and tasks
– E.g. data scientist
What is Apache Spark?
Figure: the Spark stack
• Languages: Java / Python / Scala / R
• Spark libraries: Spark SQL (relational operators), Spark MLlib (machine learning), Spark GraphX (graph processing), Spark Streaming (real-time streaming)
• Spark Core: general execution engine, data abstraction
• Cluster manager: YARN, Mesos, standalone
• Data sources: HDFS / Cassandra / HBase / Parquet / ...
The Analytics Landscape
Figure: analytics levels plotted by complexity against competitive advantage
• Descriptive (Reporting)
• Predictive (Analytics)
• Prescriptive (Optimization)
Labels in figure: Advanced Analytics, QMF
DB2 Analytics Accelerator – Four Usage Scenarios
Understand your workload and data:
On average, 70% of the data that feeds data warehousing and business
analytics solutions originates on the System z platform (financial
information, customer lists, personal records, manufacturing…)
Four scenarios, by where transaction source data is being analyzed today:
1. If the data is analyzed on the mainframe
– Use case: rapid acceleration of business-critical queries
– Benefits: performance improvements and cost reduction while retaining System z security and reliability
2. If the data is offloaded to a distributed data warehouse or data mart
– Use case: reduce IT sprawl for analytics
– Benefits: simplify and consolidate complex infrastructures, low latency, reliability, security and TCO
3. If the data is not being analyzed yet
– Use case: derive business insight from z/OS transaction systems
– Benefits: one integrated, hybrid platform, optimized to run mixed workload; simplicity and time to value
4. If the analysis is based on a lot of historical data
– Use case: improve access to historical data and lower storage costs
– Benefits: performance improvements and cost reduction
Introducing Accelerator-only Tables in DB2 for z/OS
Creation (DDL) and access remain through DB2 for z/OS in all cases
Non-accelerated DB2 table
• Data in DB2 only
Accelerator table
• Data in DB2 and the Accelerator
Archive table / partition
• Empty read-only partition in DB2
• Partition data is in accelerator only
Accelerator-Only table (AOT)
• “Proxy table” in DB2
• Data is in accelerator only
Multi-Step Reporting Applications with DB2 for z/OS
Before Accelerator-only tables: source data might reside on the accelerator already
Figure labels: Reporting Application; Multi-Step Report (steps 1, 2, ..., n); Credit Card Transaction History; Customer Summary Mart; Temporary results (one set per step); Reports and Dashboards
Multi-Step Reporting Applications with DB2 for z/OS
With Accelerator-only tables: temporary objects and processing on the Accelerator
Figure labels: Reporting Application; Multi-Step Report (steps 1, 2, ..., n); Credit Card Transaction History; Customer Summary Mart; Temporary results (one set per step); Reports and Dashboards
In-Database Transformation
• Improve existing transformation logic in DB2 for z/OS and the Accelerator
– Automate ETL-to-ELT transformation with DataStage Balanced Optimization
– Efficient and fast ELT processing with Accelerator-only tables
• Data Mart Consolidation
– Deploy existing or new data marts with DB2 for z/OS
– Consolidation benefits: simplification, lower latency, security, …
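To make the ELT idea above concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in database. It is not DB2 for z/OS and does not involve the Accelerator; the table names (customer_transactions, customer_summary_mart) are hypothetical, loosely borrowed from the slide labels. The only point is that the transformation runs as SQL inside the database, so no rows are extracted, transformed elsewhere, and loaded back.

```python
import sqlite3

def build_summary_mart(conn):
    """ELT step: aggregate inside the database with one INSERT ... SELECT."""
    conn.execute("DROP TABLE IF EXISTS customer_summary_mart")
    conn.execute(
        "CREATE TABLE customer_summary_mart ("
        "customer_id INTEGER PRIMARY KEY, tx_count INTEGER, total_amount REAL)"
    )
    # The transformation is expressed as SQL; no row leaves the DBMS.
    conn.execute(
        "INSERT INTO customer_summary_mart "
        "SELECT customer_id, COUNT(*), SUM(amount) "
        "FROM customer_transactions GROUP BY customer_id"
    )
    conn.commit()

# Hypothetical source data standing in for the transactional table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer_transactions (customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO customer_transactions VALUES (?, ?)",
    [(1, 10.0), (1, 15.5), (2, 7.25), (3, 100.0), (3, 1.0), (3, 2.0)],
)

build_summary_mart(conn)
summary = conn.execute(
    "SELECT customer_id, tx_count, total_amount "
    "FROM customer_summary_mart ORDER BY customer_id"
).fetchall()
print(summary)
```

In an ETL setup the same aggregation would run in a separate tool after copying the rows out, which is where the extra latency and data movement come from.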
Traditional Approach: ETL on a different Platform
• Mainframe: transaction processing systems (OLTP) with Customer Transactions and Customer data in the DBMS
• Copy table data (FTP) to a distributed Unix server
• Distributed: ETL logic builds the Customer Transaction Summary and History and the Customer Summary Mart for analytics
Disadvantages:
• process-driven movement of large amounts of data
• aged data for analytics/reporting, depending on the performance of the data movement and transformation process
Using Accelerator-only Tables and ELT logic in the Accelerator
DB2 z/OS with Accelerator: data for transactional and analytical processing
• Transaction processing systems (OLTP): Customer Transactions, Customer Data
• ELT logic builds Customer Transaction Summary and History AOTs and Customer Summary Mart AOTs for analytics
Advantages:
• Simpler to manage
• Better performance and reduced latency
To get a backup copy of an AOT, its data could be loaded into a regular or accelerated DB2 z/OS table again
Ad-Hoc Analysis
• Data Scientist Work Area
– Data scientists create temporary database objects for ad-hoc analysis
– Access control through a personal database in DB2 for z/OS
– Accelerator-only tables process and store filtered and transformed results of source transactional data
Data Scientist Work Area
Data for transactional and analytical processing
• Transaction processing systems (OLTP): Customer Transactions, Customer Data
• Work database John: Work Area 1, Work Area 2, used by data scientist John
• Work database Jane: Work Area 1, Work Area 2, used by data scientist Jane
Integrate more data sources for analytics
• Integrate with data not yet stored on DB2 for z/OS
– DB2 Analytics Accelerator Loader for z/OS adds data from various sources (IMS, other DBMS, files) directly into accelerator-only tables
– Join accelerator-only tables with other accelerated table data in DB2 for z/OS
Integrate more data sources for analytics
Data for transactional and analytical processing
• Transaction processing systems (OLTP): Customer Transactions, Customer Data
• DB2 Analytics Accelerator Loader: external data (File A, related data from other sources)
• Combined result used for analytics
Extreme ROI
• €20 mil: amount a major transportation company reduced operating costs annually through better allocation of rolling stock.
• $160 mil: amount a central securities depository saved financial institutions in one year by faster clearing of securities transactions.
• €22 mil: amount a power system operator reduced annual costs to consumers through better dispatch of generators.
• $226 mil: amount a major hotel chain increased annual revenue by offering the right product to the right customer at the right price.
The Science of Better Decisions
“Optimization: the process of making something as good or effective as possible” (Oxford Advanced Learner's Dictionary)
Typical decisions:
• Aircraft and crew allocation
• What to build, where and when?
• Risk vs. potential reward
• Inventory cost vs. customer satisfaction
• Cost vs. carbon emission
Optimization helps businesses to create measurable results:
• create the best possible plans
• explore alternatives and understand trade-offs
• respond to changes in business operations
Operations Research Optimization (OR Optimization)
• An abstract model with variables and constraints on those variables
• Setting all variables to some values gives a solution (plan)
• The solution is evaluated against a goal
• A search algorithm (sometimes called engine or solver) looks for the best solution
Steps of optimization
• Create the abstract model, e.g. with the Java/C++/Python API or OPL (Optimization Programming Language)
• Solve that model with a solver/engine
– CPLEX: math programming for linear/quadratic models with float/integer variables
– CPO: constraint programming and constraint-based scheduling with integer variables
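The pieces named above (variables, constraints, a goal, and a search algorithm) can be sketched in a few lines of plain Python. This is a toy illustration only: it does not use CPLEX, CPO, or OPL, the "solver" is a naive exhaustive search, and the production figures (two products, machine hours, raw material, profit per unit) are invented for the example.

```python
from itertools import product

def solve(variables, constraints, objective):
    """Exhaustive search: try every assignment, keep the best feasible plan."""
    names = list(variables)
    best_plan, best_value = None, None
    for values in product(*(variables[n] for n in names)):
        plan = dict(zip(names, values))
        # A plan is feasible only if every constraint holds.
        if all(check(plan) for check in constraints):
            value = objective(plan)
            if best_value is None or value > best_value:
                best_plan, best_value = plan, value
    return best_plan, best_value

# Hypothetical model: how many units of products a and b to make.
variables = {"a": range(0, 11), "b": range(0, 11)}       # integer decision variables
constraints = [
    lambda p: 2 * p["a"] + p["b"] <= 10,                  # machine hours available
    lambda p: p["a"] + 3 * p["b"] <= 15,                  # raw material available
]
objective = lambda p: 3 * p["a"] + 4 * p["b"]             # profit to maximize

best_plan, best_value = solve(variables, constraints, objective)
print(best_plan, best_value)
```

Enumerating all assignments only works for tiny models; real engines avoid enumerating the full search space, which is why the model is kept separate from the solver.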
Examples
Manufacturing
• Machines producing goods (capacity constraint)
• Prices and capacity for raw material
• Decide how many goods to make on which machines
• Optimize on cost?
• Optimize on time?
• Minimize the missed deadlines of the finished products?
Goods delivery
• A fleet of trucks needs to deliver goods to given destinations every morning
• Find the shortest route (km) for the trucks visiting the customers during the day
Everything has to be measurable, otherwise there is no math-based optimization
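As a small illustration of the goods delivery example, the sketch below finds the shortest round trip for a single truck by trying every visiting order. The assumptions are loud ones: one truck instead of a fleet, straight-line distances on a made-up coordinate grid, and brute force instead of a real solver, which is only practical for a handful of stops because the number of orders grows factorially.

```python
from itertools import permutations
from math import dist

def shortest_round_trip(depot, customers):
    """Try every visiting order and return the shortest depot-to-depot tour."""
    def tour_length(order):
        stops = [depot] + [customers[name] for name in order] + [depot]
        # Sum the straight-line distance of each consecutive leg.
        return sum(dist(a, b) for a, b in zip(stops, stops[1:]))

    best_order = min(permutations(customers), key=tour_length)
    return best_order, tour_length(best_order)

# Hypothetical customer locations in km relative to the depot at (0, 0).
customers = {"A": (0, 3), "B": (4, 3), "C": (4, 0)}
best_order, best_km = shortest_round_trip((0, 0), customers)
print(best_order, best_km)
```

The distance in km is the measurable quantity here; without such a number there would be nothing for the search to compare.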
A feel for the problem size and performance
• CPLEX can solve problems of any size (measured by the number of variables and constraints), provided the hardware has enough memory
• Problems with 2 billion variables exist (energy)
• CPO can solve scheduling problems with over 1M jobs (aircraft manufacturing)
• Performance varies depending on the problem type, market, needs
• Anything between real time and several weeks
Payments & Settlements – CPLEX on z/OS
Challenge
• Achieve a higher volume of trade settlements at a lower cost to increase liquidity and capital flow in the Eurozone.
• With high trading volumes anticipated, the bank needed to find the optimal set of nightly settlement trades within its short time horizon.
Solution
• The bank turned to IBM for a solution combining core optimization technology and institutional business expertise.
Benefits/ROI
• Settling more trades at lower cost to increase liquidity and capital flow.
• Using CPLEX will allow the bank to respond more quickly to new constraints as legislation and customer behavior change.
• The optimized settlement system should free up hundreds of millions of euros' worth of collateral used to back up trades.
Customer Profile
• Major central bank in the EU, charged with implementing the trade settlement modules for T2S, working with three other national central banks on behalf of the European Central Bank.
Batch job scheduling
Challenge
• Shorten the nightly batch window to leave more time for online transactions
• Many mainframe jobs, too many to schedule manually
• Identify bottlenecks in order to invest efficiently
Solution
• Use optimization on top of TWS or other mainframe schedulers
• Consider mainframe power as a key scarce resource and improve the utilization rate
• Do more with what you have
Benefits/ROI
• Reduce cost
• Increase throughput
• Increase robustness
Customer Profile
• z Systems client who has many batch jobs that need to run during a fixed time window
Resource Optimization on z Systems
Goals:
• Minimize peak usage
• Obtain proof if a new machine is needed
• Minimize the reallocation of partitions
• Avoid peak periods before 17:00 (the foreign fund transfer has to finish by 17:00 each day, otherwise a fine is applied)
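Two of these goals, minimizing peak usage and meeting a fixed finish time, can be sketched as a tiny scheduling search. Everything concrete here is hypothetical: the job names, durations, loads, and the hour-based time grid are invented, the deadline of hour 3 merely stands in for the 17:00 cut-off, and the exhaustive search is a placeholder for a real engine. Partition reallocation is not modeled.

```python
from itertools import product

def minimize_peak(jobs, horizon):
    """Try all start times that respect each deadline; return the lowest-peak schedule."""
    names = list(jobs)
    # A job may start at any hour that still lets it finish by its deadline.
    start_options = [
        range(0, jobs[n]["deadline"] - jobs[n]["duration"] + 1) for n in names
    ]
    best_starts, best_peak = None, None
    for starts in product(*start_options):
        usage = [0] * horizon                      # total load per hour
        for name, start in zip(names, starts):
            for hour in range(start, start + jobs[name]["duration"]):
                usage[hour] += jobs[name]["load"]
        peak = max(usage)
        if best_peak is None or peak < best_peak:
            best_starts, best_peak = dict(zip(names, starts)), peak
    return best_starts, best_peak

# Hypothetical workload on a 6-hour grid; "transfer" must finish by hour 3.
jobs = {
    "transfer": {"duration": 2, "load": 4, "deadline": 3},
    "report":   {"duration": 2, "load": 3, "deadline": 6},
    "backup":   {"duration": 3, "load": 2, "deadline": 6},
}
best_starts, best_peak = minimize_peak(jobs, horizon=6)
print(best_starts, best_peak)
```

If the lowest achievable peak still exceeds the installed capacity, that result is the kind of evidence the slide calls proof that a new machine is needed.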
Thank you!
Martin Schneider
Manager, Analytics Center of Excellence
IBM Research & Development
Schönaicher Str. 220
71032 Boeblingen
Tel. 0170 / 22 100 14