Data warehousing with MySQL Flat Files MySQL MS-SQL

Transcription

Data warehousing with MySQL Flat Files MySQL MS-SQL
Data warehousing with MySQL
MySQL
MS-SQL
Oracle
MySQL
DB2
Flat Files
Free and Open Source Software
MySQL is licensed under GPL.
The GPL is a Free and Open Source Software (FOSS)
license that grants licensees many rights to the
software under the condition that, if they choose
to share the software, or software built with GPLlicensed software, they share it under the same
liberal terms.
Free and Open Source Software
Advantages of Open Source
•MySQL has 5 million plus active installation base.
•New releases immediately downloaded by users
providing early feedback on bugs and features.
•Access to source code
•Write your own features/proprietary Storage
Engine
•Freedom !
Data Warehousing application
Data Warehouse is a relational database.
It is designed for query and analysis rather than for
transaction processing.
It enables an organization to consolidate data from
several resources.
Extraction ,Transformation and Loading
Data
Source
Staging
Tables
MERGE
& BULK
INSERT
MERGE
Tables
Indexes,
Memory
Views,
Summary
SWH
AWH
HEAP
Extract
Load
Transform Storage
Performance
OLTP/ BI
Users
Extraction ,Transformation and Loading
•
•
•
•
•
•
•
Staging database
“LOAD DATA INFILE ….” Command.
Merging of SQLs
Segregating Informations
View enhancements
Index Enhancement
Memory Manipulation
Extraction, Transformation and Loading
Staging Area and its benefits
Relational Table structures are flattened to
support extract processes in Staging Area.
• First data is loaded into the temporary table and
then to the main DB tables.
• Reduces the required space during ETL.
• Data can be distributed to any number of data
marts
Partitioning and Storage Engine
The MERGE Table
MERGE SALES
Table
• A collection of identical MyISAM
tables used as one
• You can use SELECT, DELETE,
UPDATE, and INSERT on the
collection of tables.
• Use it when having large tables
• DROP the MERGE table, you drop
only the MERGE spec.
• Advantage : manageability and
performance
Sales
for
Yr’04
Aug’04
Oct’04
Sep’04
Partitioning and Storage Engine
MERGING based on month as Range
JUN2004
JUL2004
JUN2004
OCT2004
AUG2004
SEP2004
OCT2004
Partitioning and Storage Engine
MERGE Table Example
mysql> CREATE TABLE jan04 ( -> a INT NOT
NULL AUTO_INCREMENT PRIMARY KEY, ->
message CHAR(20));
mysql> CREATE TABLE feb04 ( -> a INT NOT
NULL AUTO_INCREMENT PRIMARY KEY, ->
message CHAR(20));
mysql> CREATE TABLE year04 ( -> a INT NOT
NULL AUTO_INCREMENT, -> message
CHAR(20), INDEX(a)) -> TYPE=MERGE
UNION=(jan04,feb04) INSERT_METHOD=LAST;
Partitioning and Storage Engine
MyISAM Storage Engine
• Supports MERGE table.
• Support fulltext indexing
• “INSERT DELAYED ...” option very useful when
clients can't wait for the INSERT to complete.
Many client bundled together and written in
one block
• Compress MyISAM tables with “myisampack” to
take up much less space.
• Benefit from higher performance on SELECT
statements
Partitioning and Storage Engine
Restrictions on MERGE tables
• You can use only identical MyISAM tables for a
MERGE table.
• MERGE tables use more file descriptors. If 10 clients
are using a MERGE table that maps to 10 tables, the
server uses (10*10) + 10 file descriptors.
• Key reads are slower. When you read a key, the
MERGE storage engine needs to issue a read on all
underlying tables to check which one most closely
matches the given key.
Partitioning and Storage Engine
•
my.cnf parameters for DWH (example)
key_buffer
=
1G
• myisam_sort_buffer_size
=
256M
• sort_buffer
=
5M
• query_cache_type
=
1
• query_cache_size
=
100M
key_buffer is the important one, this tells mysql how
much memory to cap itself
Business Intelligence
Using MySQL database server
• Drastically reduce information retrieval by
distributing data into replicated clusters. This enables
parallel processing.
• Tighter storage format (3 TB squeezed to 1TB)
• Aggregate huge amount of data and deliver reports
for OLAP
• Relieve overloaded OLTP databases
• Availability, scalability and throughput for the most
demanding applications, and of course affordability
Summary
•
•
•
•
•
•
Free and Open Source under GPL
MyISAM Storage Engine
No Transactional Overhead
MERGE Table
Tighter storage format
Highly efficient