HyPer: one DBMS for all
Tobias Mühlbauer, Florian Funke, Viktor Leis, Henrik Mühe, Wolf Rödiger, Alfons Kemper, Thomas Neumann
Technische Universität München
New England Database Summit 2014
http://www.hyper-db.com/
1
One DBMS for all? OLTP and OLAP
Traditionally, DBMSs are optimized either for OLTP or for OLAP.
OLTP
• high rate of mostly tiny transactions
• high data access locality
OLAP
• few, but long-running transactions
• scans large parts of the database
• must see a consistent database state during execution
Conflict of interest: traditional solutions like 2PL would block OLTP.
However, main memory DBMSs enable new options: isolation via snapshotting.
2
One DBMS for all? Wimpy and brawny
[Embedded: first page of the demo paper "One DBMS for all: the Brawny Few and the Wimpy Crowd" by Mühlbauer, Rödiger, Seilbeck, Reiser, Kemper, Neumann (Technische Universität München), which demonstrates the same HyPer codebase on a wimpy ARM-based smartphone and a brawny x86-64-based server.]
High-performance DBMSs are optimized for brawny servers.
Brawny servers
• predominantly x86-64 arch
• multiple sockets, many cores
Wimpy devices
• predominantly ARM arch
• wide-spread use of SQLite
• energy efficiency very important
[Figure: Shipments of wimpy processors are outpacing shipments of brawny processors. Processor shipments in units, Q2 2012 vs. Q2 2013, for smartphones and tablets (wimpy, predominantly ARMv7) and PCs (brawny, predominantly x86-64). Source: IHS, "Processor Market Set for Strong Growth in 2013, Courtesy of Smartphones and Tablets."]
Question: How to enable high-performance OLTP/OLAP on different architectures (ARM, x86-64, ...)?
3
HyPer: one DBMS for all
Simultaneous (ACID) OLTP and (SQL-92+) OLAP:
• Efficient snapshotting (ICDE 2011)
Platform-independent high performance on brawny and wimpy nodes:
• Data-centric code generation (VLDB 2011)
Recent research:
• ARTful indexing: the adaptive radix tree (ICDE 2013)
• HTM for concurrent transaction processing (ICDE 2014)
• Compaction: hot/cold clustering of transactional data (VLDB 2012)
• Instant Loading: bulk loading at wire speed (VLDB 2014)
• ScyPer: replication and scalable OLAP throughput (DanaC 2013)
• Locality-aware parallel DBMS operators (ICDE 2014)
4
Efficient Snapshotting
HyPer isolates long-running transactions (e.g., OLAP) using virtual memory (VM) snapshots.
• OLTP "owns" the database
• for OLAP, only the VM page table is initially copied
• MMU/OS keep track of changes
• snapshots remain consistent (copy-on-write)
• very little overhead
Extremely fast execution model.
Supports OLTP and OLAP simultaneously.
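A minimal sketch of the fork()-based copy-on-write snapshot idea described on this slide; the table contents and sizes are illustrative, not HyPer's actual code.

// OLTP process owns the data; fork() gives the OLAP query a consistent snapshot.
#include <sys/wait.h>
#include <unistd.h>
#include <cstdio>
#include <vector>

int main() {
    std::vector<long> table(1000000, 0);     // "database" owned by the OLTP process

    pid_t pid = fork();                      // OS copies only the VM page table;
                                             // data pages are shared copy-on-write
    if (pid == 0) {                          // child = OLAP snapshot
        long sum = 0;
        for (long v : table) sum += v;       // scans a consistent snapshot
        std::printf("OLAP sum on snapshot: %ld\n", sum);
        _exit(0);
    }
    table[42] = 99;                          // OLTP keeps writing; the touched page is
                                             // duplicated by the MMU/OS, so the
                                             // snapshot stays unchanged
    waitpid(pid, nullptr, 0);
    return 0;
}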
5
Data-centric Code Generation
Main memory is so fast that the CPU becomes the bottleneck.
• the classical iterator model is fine for disks, but not for main memory
• iterator model: many branches, bad code and data locality
HyPer's data-centric code generation
• touches data as rarely as possible and prefers tight work loops
• processes each pipeline fragment in a tight loop:
  1. load data from the source pipeline breaker into CPU registers
  2. perform all pipeline operations
  3. materialize the result into the next pipeline breaker
• produces compact code with little overhead, close to the performance of hand-written code
• efficient platform-independent machine code generation using LLVM
[Figure: example plan with pipelines over relations R1, R2, R3 and predicates a = b, x = 7, y = 3, z = c, aggregating z; count(*)]
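To illustrate the shape of the generated code, here is a hedged C++ sketch of the fused loop a data-centric compiler would emit for a pipeline such as SELECT count(*) FROM R1 WHERE a = b AND x = 7 (column names are taken from the figure; HyPer emits this as LLVM IR rather than hand-written C++).

#include <cstdint>
#include <vector>

struct R1 { std::vector<int32_t> a, b, x; };   // columnar base relation (illustrative)

int64_t pipeline(const R1& r) {
    int64_t count = 0;                                   // aggregate lives in a register
    for (size_t i = 0; i < r.a.size(); ++i) {
        int32_t a = r.a[i], b = r.b[i], x = r.x[i];      // 1. load data into CPU registers
        if (a == b && x == 7)                            // 2. perform all pipeline operations
            ++count;                                     // 3. materialize into the pipeline breaker
    }
    return count;                                        // no virtual next() calls, no
                                                         // per-tuple function-call overhead
}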
6
Recent Research
Tentative Execution
Mühe, H.; Kemper, A.; Neumann, T., "Executing Long-Running Transactions in Synchronization-Free Main Memory Database Systems", CIDR, 2013
• HyPer offers outstanding performance for tiny transactions: > 100,000 TPC-C transactions per second per hardware thread
• Long-running transactions that write to the database cripple performance
• Tentative execution: process long-running transactions on a snapshot and "merge" their changes back into the main process
[Figure: the main database process handles short transactions; long transactions and queries run on a snapshot]
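A very rough sketch of the merge step under illustrative assumptions (per-key versions and a simple validation rule); the paper's actual apply mechanism differs in detail.

#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

struct Write { uint64_t key; int64_t new_value; uint64_t snapshot_version; };

// key -> (value, version); a stand-in for the main database state
using Database = std::unordered_map<uint64_t, std::pair<int64_t, uint64_t>>;

// Runs in the main (OLTP) process as a tiny transaction.
bool apply_tentative(Database& db, const std::vector<Write>& write_set) {
    for (const Write& w : write_set)                    // validate: nothing the long
        if (db[w.key].second != w.snapshot_version)     // transaction touched has changed
            return false;                               // abort -> re-run tentatively
    for (const Write& w : write_set) {
        db[w.key].first  = w.new_value;                 // install the tentative changes
        db[w.key].second += 1;                          // bump per-key versions
    }
    return true;
}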
8
ARTful Indexing
Leis, V.; Kemper, A.; Neumann, T., "The adaptive radix tree: ARTful indexing for main-memory databases", ICDE, 2013
Adaptive Radix Tree (Trie) indexing
• efficient general-purpose read/write/update-able index structure
• faster than highly-tuned, read-only search trees
• optimized for modern hardware
• highly space-efficient by adaptive choice of compact data structures
• performance comparable to a hash table (HT)
• BUT: data is sorted; enables, e.g., range scans and prefix lookups
[Figures: the four adaptive inner node types Node4, Node16, Node48, and Node256, each mapping key bytes to child pointers or TIDs; lookup performance comparison of ART (dense/sparse keys) against GPT (dense/sparse), RB, CSB, k-ary search tree, FAST, and HT]
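A hedged sketch of the smallest adaptive node type (Node4) from the figure: up to four sorted key bytes with parallel child pointers; the larger node types widen these arrays. This simplifies the paper's design.

#include <cstdint>

struct ARTNode { uint8_t type; };            // common node header (simplified)

struct Node4 : ARTNode {
    uint8_t  count = 0;
    uint8_t  keys[4];                        // one key byte per child
    ARTNode* children[4];

    ARTNode* find_child(uint8_t key_byte) const {
        for (uint8_t i = 0; i < count; ++i)  // at most four comparisons
            if (keys[i] == key_byte)
                return children[i];
        return nullptr;                      // no child -> key not present
    }
};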
9
HTM for Concurrency Control
Leis, V.; Kemper, A.; Neumann, T., “Exploiting Hardware Transactional Memory in Main-Memory
Databases”, ICDE, 2014
[Figure: a database transaction requests a new timestamp and records the safe timestamp, then executes as a sequence of HTM transactions (single-tuple accesses; conflict detection via hardware read/write sets; elided lock: latch; tuple timestamps verified/updated), and finally releases its timestamp and updates the safe timestamp; timestamp conflicts fall back to serial execution behind an elided lock]
[Chart: TPC-C transactions per second vs. multiprogramming level (threads) for partitioned, HTM, serial, and 2PL concurrency control]
• Concurrent transaction processing in main memory DBMSs often relies on user-provided data partitioning (1 thread per partition)
• BUT: what if no partitioning is provided or can be found?
• Hardware transactional memory (HTM) allows for efficient, light-weight optimistic concurrency control without explicit partitions
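A minimal sketch of the HTM-elided latch mechanism the slide refers to, using Intel RTM intrinsics; the timestamp scheme and the serial fallback path of the paper are simplified away, all names are illustrative, and it requires a TSX-capable CPU (compile with -mrtm).

#include <immintrin.h>
#include <atomic>

std::atomic<bool> fallback_latch{false};

template <typename Fn>
void htm_transaction(Fn&& tuple_accesses) {
    unsigned status = _xbegin();             // start hardware transaction
    if (status == _XBEGIN_STARTED) {
        if (fallback_latch.load())           // lock elision: the latch joins the
            _xabort(0xff);                   // read set; abort if someone holds it
        tuple_accesses();                    // read/write sets tracked in hardware
        _xend();                             // commit; conflicts abort automatically
        return;
    }
    // Fallback: acquire the latch and run non-speculatively.
    while (fallback_latch.exchange(true)) { /* spin */ }
    tuple_accesses();
    fallback_latch.store(false);
}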
10
Compaction: Hot/Cold Clustering
Funke, F.; Kemper, A.; Neumann, T., "Compacting Transactional Data in Hybrid OLTP&OLAP Databases", VLDB, 2012
Main memory is a scarce resource!
[Figure: ideal vs. reality — hot/cold clustering separates frequently updated (hot) items from cold items; cold data is purged from the hot working set and kept immutable, compressed, and optimized for OLAP]
"Temperature" detection:
• hardware-assisted (MMU)
• almost no overhead!
11
Instant Loading
Mühlbauer, T.; Rödiger, W.; Seilbeck, R.; Reiser, A.; Kemper, A.; Neumann, T., "Instant Loading for Main Memory Databases", PVLDB, 2013, presented at VLDB 2014
[Figure: Instant Loading for data staging — load-work-unload cycles across CSV data: (1) load CSV from high-speed NAS/DFS or SSD/RAID into main memory at wire speed (window of interest), (2) work: OLTP and OLAP in a full-featured database (queries, updates), (3) unload as CSV or binary]
• Bulk loading of structured text data (CSV) is slow in current DBMSs
• current loading does not saturate the wire speed of the data source and data sink
• Instant Loading allows scalable bulk loading at wire speed by efficient task- and data-parallelization of all phases of loading (parsing, deserialization, and input validation use SSE 4.2 SIMD instructions)
• 10x faster than Vectorwise/MonetDB, orders of magnitude faster than traditional DBMSs (with the same conversions/consistency checks)
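To illustrate the SIMD-accelerated parsing idea, a hedged sketch of a delimiter search that compares 16 input bytes at once; it uses SSE2 compare/movemask rather than the SSE 4.2 string instructions the paper relies on, and it ignores quoting and escaping.

#include <emmintrin.h>   // SSE2
#include <cstddef>

// Returns the offset of the first 'delim' in buf[0..len), or len if none.
size_t find_delimiter(const char* buf, size_t len, char delim) {
    const __m128i needle = _mm_set1_epi8(delim);
    size_t i = 0;
    for (; i + 16 <= len; i += 16) {
        __m128i chunk = _mm_loadu_si128(reinterpret_cast<const __m128i*>(buf + i));
        int mask = _mm_movemask_epi8(_mm_cmpeq_epi8(chunk, needle));
        if (mask != 0)
            return i + __builtin_ctz(mask);  // position of the first matching byte
    }
    for (; i < len; ++i)                     // scalar tail
        if (buf[i] == delim) return i;
    return len;
}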
12
ScyPer: Replication and Scalable OLAP Throughput
Mühlbauer, T.; Rödiger, W.; Reiser, A.; Kemper, A.; Neumann, T., "ScyPer: Elastic OLAP Throughput on Transactional Data", DanaC, 2013
[Figure: elastic provisioning of secondary HyPer nodes — the primary HyPer node runs OLTP and snapshotting and multicasts the redo log to on-demand secondary HyPer nodes, which handle OLAP load balancing]
[Chart: execution time (ms) for replaying 100k TPC-C transactions — logical log replaying is about 23% faster and physical log replaying about 56% faster than normal execution]
• Secondary nodes are needed for high availability anyway — why not use them for scalable OLAP throughput?
• The primary HyPer node processes transactions and multicasts the redo log to the secondaries (via Ethernet or InfiniBand)
• We advocate physical redo log multicasting after commit: there is no need to re-execute expensive transaction logic, and problems caused by non-determinism in transaction logic and by unforeseeable faults are avoided
13
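A minimal sketch of the multicasting idea using plain UDP multicast; the group address, port, and record contents are illustrative, and ScyPer itself builds on OpenPGM to get the reliable, ordered delivery that raw UDP does not provide.

#include <arpa/inet.h>
#include <sys/socket.h>
#include <unistd.h>
#include <string>

int main() {
    int sock = socket(AF_INET, SOCK_DGRAM, 0);

    sockaddr_in group{};
    group.sin_family = AF_INET;
    group.sin_port = htons(7500);                       // illustrative port
    inet_pton(AF_INET, "239.0.0.1", &group.sin_addr);   // illustrative multicast group

    std::string redo_record = "physical redo entry ..."; // one committed transaction's log
    sendto(sock, redo_record.data(), redo_record.size(), 0,
           reinterpret_cast<sockaddr*>(&group), sizeof(group));  // one send reaches
                                                                 // all secondaries
    close(sock);
    return 0;
}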
Distributed Query Processing with HyPer
Rödiger, W.; Mühlbauer, T.; Unterbrunner, P.; Reiser, A.; Kemper, A.; Neumann, T., "Locality-Sensitive Operators for Parallel Main-Memory Database Clusters", ICDE, 2014
• Distributed operators like the join operator need data shuffling
[Figures: "Data Shuffling" — tuples of join relations R and S spread across Nodes 0–2; "Optimal Partition Assignment" — partitions P0–P7 assigned to Nodes 0–2 according to per-node tuple counts; resulting network phase durations]
Locality-aware data shuffling:
• can exploit already small degrees of data locality
• does not degrade when data exhibits no co-location
• optimally assigns partitions (sketched below)
• avoids cross traffic through communication scheduling
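A hedged sketch of the objective behind optimal partition assignment: place each join partition on the node that already stores most of its tuples, so those tuples need not cross the network. The paper computes an optimal assignment; this greedy loop only illustrates the goal under an assumed per-node tuple histogram.

#include <cstddef>
#include <vector>

// histogram[p][n] = number of tuples of partition p already located on node n
std::vector<size_t> assign_partitions(const std::vector<std::vector<size_t>>& histogram) {
    std::vector<size_t> owner(histogram.size());
    for (size_t p = 0; p < histogram.size(); ++p) {
        size_t best_node = 0;
        for (size_t n = 1; n < histogram[p].size(); ++n)
            if (histogram[p][n] > histogram[p][best_node])
                best_node = n;               // node holding most of p's tuples
        owner[p] = best_node;                // local tuples stay; the rest is shuffled
    }
    return owner;
}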
14
http://www.hyper-db.com/
15
TPC-H query 8
http://www.hyper-db.com/interface.html
16
TPC-H query 8
http://www.hyper-db.com/interface.html
17
TPC-H query 8
WebInterface on TPC-H scale factor 1, non-parallel query execution
Note: compilation time independent of scale factor
Soon: WebInterface on brawny machine, TPC-H scale factor 100, parallel query
execution and instant loading interface for your own files!
http://www.hyper-db.com/interface.html
18
Conclusion
http://www.hyper-db.com/
Inspired by "OLTP through the looking glass" and "The End of an Architectural Era", the HyPer project was started.
HyPer is one DBMS that …
• offers highest performance on both brawny x86-64 platforms and wimpy ARM platforms
• offers performance comparable to or better than VoltDB in TPC-C
• offers performance comparable to or better than Vectorwise in TPC-H
• enables OLAP on the most recent OLTP state
19
References
1) Kemper, A.; Neumann, T., “HyPer: A hybrid OLTP&OLAP main memory database system based
on virtual memory snapshots”, ICDE, 2011
2) Neumann, T., “Efficiently compiling efficient query plans for modern hardware”, VLDB, 2011
3) Mühe, H.; Kemper, A.; Neumann, T., "Executing Long-Running Transactions in Synchronization-Free Main Memory Database Systems", CIDR, 2013
4) Leis, V.; Kemper, A.; Neumann, T., “The adaptive radix tree: ARTful indexing for main-memory
databases”, ICDE, 2013
5) Leis, V.; Kemper, A.; Neumann, T., “Exploiting Hardware Transactional Memory in Main-Memory
Databases”, ICDE, 2014
6) Funke, F.; Kemper, A.; Neumann, T., "Compacting Transactional Data in Hybrid OLTP&OLAP Databases", VLDB, 2012
7) Mühlbauer, T.; Rödiger, W.; Seilbeck, R.; Reiser, A.; Kemper, A.; Neumann, T., "Instant Loading for Main Memory Databases", PVLDB, 2013, presented at VLDB 2014
8) Mühlbauer, T.; Rödiger, W.; Reiser, A.; Kemper, A.; Neumann, T., “ScyPer: Elastic OLAP Throughput
on Transactional Data”, DanaC, 2013
9) Rödiger, W.; Mühlbauer, T.; Unterbrunner, P.; Reiser, A.; Kemper, A.; Neumann, T., “Locality-Sensitive
Operators for Parallel Main-Memory Database Clusters”, ICDE, 2014
20