Introduction to Cask Data Application Platform
Transcription
Introduction to Cask Data Application Platform
Introduction to Cask Data Application Platform The Introduction to Cask Data Application Platform (CDAP) course delivers the key concepts and expertise needed to build real-time and batch applications on CDAP. This course is comprised of a combination of lectures and labs to reinforce core concepts. At the end of the course participants will be able to write CDAP applications, integrate applications in their CI environments and possess the skills to test and debug applications. Course Duration 8 Hours Audience & Prerequisites This course is designed for developers and architects who are looking to use CDAP. Participants must be familiar with the basics of Java programming, debugging skills and should have basic knowledge about HDFS and MapReduce. Materials Required Laptop with Mac, Windows or Linux installed. For Windows Virtual Box installed (https://www.virtualbox.org/wiki/Downloads) CDAP Standalone VM (will be provided by the instructor) For Mac / Linux JDK 1.6 or 1.7 installed Maven 3.1.1 or higher installed CDAP Standalone ZIP (will be provided by the instructor) ● ● ● ● ● Course Overview ● ● ● ● ● ● ● ● CDAP Concepts & Capabilities Data Ingestion & Exploration Understanding Datasets Data Serving Batch Processing Scheduling & Sequencing Jobs using Workflow Real-time processing Testing and Debugging Strategies Course Outline 1. The Motivation for CDAP Context The Hadoop Ecosystem Today Challenges with Hadoop The need for CDAP CDAP benefits for developers ● ● ● ● 8. Datasets on HBase Introduction to Ta ble Using Table Datasets Comparison to relational databases ● ● ● ● 2. CDAP Overview Functional view of CDAP CDAP architecture Building Blocks ● ● 9. Datasets on HDFS Creating and using file based Datasets Partitioned file Datasets ● ● ● 3. Lab1-Application Lifecycle Lifecycle management for CDAP applications ● 4. Data Ingestion Ingesting data in real-time Ingesting data in batch Managing the data lifecycle ● 10. Data Serving Using services in applications Writing handlers to serve data ● ● 11. Lab3 - Datasets and Services Reading and Writing Datasets using Services 5. Data Exploration Exploring ingested data Attaching schemas ● ● 6. Lab2-Ingestion & Exploration Ingesting data in real-time & batch Attaching and refining schemas ● ● 7. Introduction to Datasets The need for Datasets Data abstraction using Datasets Using Datasets in applications ● ● 15. Real-time Processing Introduction to Tigon Reading from Streams & Datasets Writing to Datasets from Tigon Flows ● ● ● 16. Lab 5 -Simple real-time Flows Writing a simple flow that reads from streams and writes to Datasets ● ● ● ● 14. Lab 4 - MapReduce & Workflow Writing a simple MapReduce job and scheduling using Workflow 17. Testing and Debugging Unit testing CDAP applications Integrating testing in CI Debugging CDAP applications in clusters ● 12. Batch Processing Brief intro to MapReduce Writing batch jobs in CDAP Reading from Streams & Datasets Writing data to Datasets ● ● ● ● ● ● 18. Lab 6 -Writing Unit tests Writing unit tests for CDAP applications ● 13. Scheduling using Workflow Scheduling jobs using workflows Creating time-based and size-based triggers ● ● ● ● Cask Data, Inc., 150 Grant Ave, Palo Alto, CA, 94306