Dirk deRoos Presentation - Dama-NY

Big Data and the Cloud
Dirk deRoos
@Dirk_deRoos
IBM World-Wide Technical Sales Leader, Big Data
© 2013 IBM Corporation

The Economics of Growth Have Changed
• Land • Labor • Capital
• Cloud • Analytics • Data

Need to Agree on Definitions: Cloud
On-demand – Users can sign up for the service and use it immediately
Self-service – Users can use the service at any time
Scalable – Users can scale up the service at any time, without waiting for the provider to add more capacity
Measurable – Users can access measurable data to determine the status of the service
Coined by Dave Nielsen, CloudCamp Founder
Source: Dave Nielsen, CloudCamp

Cloud Computing Service Models
Software as a Service (SaaS)
  – Computing capacity
  – Middleware
  – Applications
Platform as a Service (PaaS)
  – Middleware
  – Raw computing capacity
Infrastructure as a Service (IaaS)
  – Raw computing capacity
[Diagram: SaaS, PaaS, IaaS]
Source: NIST Definition of Cloud Computing v15

“Consumerization” of IT
IT departments not seen as source of innovation
Home and web-based experiences driving IT expectations in enterprise
  – Self-service provisioning
  – Time-to-value measured in minutes
Enterprise LOB consuming services, bypassing IT dept
  – IT departments respond by adopting newer technologies, evolving traditional capabilities

Deployment Models
Private – IT capabilities are provided “as a service,” over an intranet, within the enterprise and behind the firewall
Hybrid – Internal and external service delivery methods are integrated
Public – IT activities / functions are provided “as a service,” over the Internet
[Diagram labels: Private Cloud, Managed Private Cloud, Hosted Private Cloud, Member Cloud Services, Public Cloud Services; On-Premise (Enterprise data center), Third-party operated]

Movement from Traditional Environments to Cloud
Many clients are already on the way to cloud with consolidation and virtualization efforts
CLOUD – Dynamic provisioning for workloads
SHARED RESOURCES – Common workload profiles
AUTOMATE – Flexible delivery & Self Service
STANDARDIZE – Operational Efficiency
VIRTUALIZE – Increase Utilization
CONSOLIDATE – Physical Infrastructure
Traditional IT
Leon Katsnelson

Some Workloads Better than Others for Cloud
[Figure: workloads plotted by gain from external clouds (Higher Gain / Lower Gain) against pain to cloud delivery (Higher Pain / Lower Pain)]
Workloads shown: Idealized Workloads, Collaboration, Discovery, Application Development, On-Line Storage, SMB ERP, Web Scale Analytics, [Enterprise Data], DB Migration, Situational Apps, Projects, Transactional Content, Large Enterprise ERP, Dep’t. BI, Application Test, Web2.0, Data Archive
Architectures shown: “Loosely Coupled” Architecture, “Content-Centric” Architecture, “DB-Centric” Architecture, Storage and Data Integration Arch.

Dev/Test Environments: Challenges/Observations
30% to 50% of all servers within a typical IT environment are dedicated to test
Most test servers run at less than 10% utilization, if at all!
IT staff report that a top challenge is finding available resources to perform tests in order to move new applications into production
30% of all defects are caused by badly configured test environments
Testing backlog is often very long and is the single largest factor in the delay of new application deployments
Test environments are seen as expensive and providing little real business value
* “Industry Developments and Models – Global Testing Services: Coming of Age,” IDC, 2008 and IBM Internal Reports

Development/Test Environment - Perfect for Cloud
Quick ROI
  – 30% to 50% of all servers within a typical IT environment are dedicated to test
  – Most test servers run at less than 10% utilization, if they are running at all!
Low risk
  – Low risk in terms of business and overall IT operations
  – Security/compliance concerns easily mitigated
Excellent return on automation
  – Agility
  – Consistent dev/test environments mean fewer errors
  – Self-service

Need to Agree on Definitions: Big Data
Information management challenges that can’t be dealt with using traditional tools and approaches
Cost efficiently processing the growing Volume – 50x, 2010 to 2020, 35 ZB
Collectively analyzing the broadening Variety – 80% of the world’s data is unstructured
Responding to the increasing Velocity
[Slide callout: 30 Billion RFID sensors and counting]
[Other Vs shown: Viability, Viscosity, Veracity, Variability, Value, Valence]

The Big Data Conundrum
The percentage of available data an enterprise can analyze is decreasing proportionately to the data available to that enterprise
Quite simply, this means as enterprises, we are getting “more naive” over time
[Chart: Data AVAILABLE to an organization vs. Data an organization can PROCESS]

Traditional Enterprise Data and Analytics
Put Staging Area in the EDW
  + In-database transformations (ELT faster than ETL)
  + Provides some structure, enabling queries
  - Adds significant cost and overhead to EDW
[Diagram labels: Data Sources (Traditional, Structured, Operational); Information Movement & Transformation; Staging Area, Expanded EDW, Marts, Archive; Actionable Insights (Predictive Analytics & Modeling, BI & Performance Management, Data Mining and Exploratory Analysis)]

Warehouse Modernization Has Two Themes
Traditional Analytics – Structured & Repeatable – Structure built to store data
  – Business Users Determine Questions
  – IT Team Builds System To Answer Known Questions
  – Capacity constrained down sampling of available information
  – Carefully cleanse a small information before any analysis
Big Data Analytics – Iterative & Exploratory – Data is the structure
  – IT Team Delivers Data On Flexible Platform
  – Business Users Explore and Ask Any Question
  – Analyze ALL Available Information
  – Whole population analytics connects the dots
  – Analyze information as is & cleanse as needed & existing repeatable
[Diagram labels: Available Information, Analyzed Information]

Warehouse Modernization Has Two Themes
Traditional Analytics – Structured & Repeatable – Structure built to store data
  – Start with hypothesis
  – Test against selected data
  – Analyze after landing…
Big Data Analytics – Iterative & Exploratory – Data is the structure
  – Data leads the way
  – Explore all data, identify correlations
  – Analyze in motion…
[Diagram labels: Hypothesis, Question, Data, Answer; All Information, Exploration, Analyzed Information, Correlation, Actionable Insight]

Next Generation Information Management Architecture
[Diagram labels:
  Data Sources: Streaming, Sensor, Geospatial, Time Series, Structured, Operational, Unstructured, External, Social
  Big Data Platform: Real-Time Analytics; Landing, Exploration & Archive; Analytic Appliances; Enterprise Warehouse; Data Marts; Information Movement, Matching & Transformation; Security, Governance and Business Continuity
  Actionable Insights: Predictive Analytics & Modeling; BI & Performance Management; Exploration & Discovery]

Hadoop and the Cloud: Considerations
Hadoop was designed for bare metal
  – Hadoop runs best with locally attached storage and dedicated networking
  – Rack awareness breaks in many cloud deployments
  – Hadoop will still run in virtualized environments, but data processing will not perform as well as on bare metal
    • Large amount of network traffic
Hadoop has sweet spots
  – Large scale batch analysis
  – Data flexibility
Data governance requirements
  – Privacy
  – Security
  – Regulatory requirements
  – Metadata management
  – Data access interfaces
  – …

Conclusions
Cloud infrastructure has many benefits for Big Data analytics
  – Inexpensive storage
  – Inexpensive processing (short term)
  – Flexible (scale in/out) architecture
Ideal workloads: Ad-hoc analysis
  – Performance is of secondary concern
  – Ability to flexibly pull in many different data sets
Longer term applications are more costly on public clouds
  – Private clouds are an interesting option for internal Hadoop deployments
  – Ideal for short-term ad-hoc projects
    • Flexible, inexpensive
Consider governance issues!!!
  – Private clouds may be necessary
  – Governance tools are available for Hadoop and the cloud
    • Hint, hint… IBM

THINK
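
Worked example (not part of the original slides): the dev/test figures quoted in the deck, 30% to 50% of servers dedicated to test and under 10% utilization on those servers, can be turned into a rough estimate of idle capacity. The sketch below is only an illustration under stated assumptions: the 1,000-server estate and the function name idle_test_capacity are hypothetical, and the 10% figure is treated as an upper bound on utilization, so the idle number is a lower bound.

```python
def idle_test_capacity(total_servers, test_share, max_utilization):
    """Return (test_servers, idle_server_equivalents) for a server estate.

    test_share and max_utilization are fractions between 0 and 1.
    Because max_utilization is an upper bound, the idle figure is a floor.
    """
    if not 0.0 <= test_share <= 1.0 or not 0.0 <= max_utilization <= 1.0:
        raise ValueError("test_share and max_utilization must be between 0 and 1")
    test_servers = total_servers * test_share
    idle_equivalents = test_servers * (1.0 - max_utilization)
    return test_servers, idle_equivalents


if __name__ == "__main__":
    TOTAL_SERVERS = 1000  # hypothetical estate size, not from the slides
    # 30% and 50% are the low and high ends of the range quoted in the deck.
    for share in (0.30, 0.50):
        test_servers, idle = idle_test_capacity(TOTAL_SERVERS, share, 0.10)
        print(f"{share:.0%} test share: {test_servers:.0f} test servers, "
              f"at least {idle:.0f} server-equivalents idle")
```

Under these assumptions most of the test estate sits idle, which is the argument the deck makes for moving dev/test to on-demand cloud capacity first.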