Advanced Cask Data Application Platform

The Advanced Cask Data Application Platform course gives participants deeper
insight into CDAP building blocks and advanced features so they can build
CDAP-optimized applications. The course combines lectures and labs to
reinforce the core concepts.
Course Duration
8 Hours
Audience & Prerequisites
This course is designed for developers and architects who are looking to use CDAP.
Participants should have completed the Introduction to CDAP course as a
prerequisite.
Materials Required
Laptop with Mac, Windows, or Linux installed.
For Windows
● VirtualBox installed (https://www.virtualbox.org/wiki/Downloads)
● CDAP Standalone VM (will be provided by the instructor)
For Mac / Linux
● JDK 1.6 or 1.7 installed
● Maven 3.1.1 or higher installed
● CDAP Standalone ZIP (will be provided by the instructor)
Course Overview
● CDAP Architecture and internals
● Building Custom Datasets
● Building and extending ETL pipelines
● Building Application templates and adapters
● Optimize CDAP applications
● Deeper understanding of transactions
Course Outline
1. CDAP architecture deep dive
● Understanding CDAP system components
● Understanding data flow
2. Stream internals
● Stream service design
● A deeper look at Streamwriter and scaling
3. Advanced Datasets
● Creating reusable data patterns with Custom Datasets
● Dataset-Hive integration
● Readless increments
4. Lab 1 - Creating Custom Datasets
● Create a custom dataset using composition
5. Transactions deep-dive
● The need for transactions
● Optimistic concurrency control
● Transaction manager
● Tephra
6. Advanced topics in Tigon
● Enabling exactly-once processing using transactions
● Tigon partitioning strategies
● Batching to optimize throughput
● Generator flowlets
7. Lab 2 - Flow optimizations
● Partitioning strategies
● Batching
8. Metrics and logging
● Metrics and logging programming APIs
● Accessing metrics and logs using REST
9. Preferences API
● Introduction to preferences
● Setting multi-level preferences
● Scoped runtime arguments
10. Scaling programs
● Scaling instances in services
● Scaling flowlet instances
11. Namespaces
● Data and application isolation using Namespaces
● Guaranteeing application resources in Namespaces
12. Application Templates & Adapters
● Introduction to Application Templates & Adapters
● Introduction to ETL App Template
13. Extending ETL App Templates
● Writing custom source, sink, and transformation Plugins
● Deployment and Management of Plugins
14. Lab 3 - Extending ETL pipeline
● Writing custom sinks and transformations
15. Creating custom Application Templates
● Creating reusable application patterns with templates
16. Workers
● The need for service workers
● Introduction to workers
● Workers in applications
17. Lab 4 - Workers
● Implementing a simple worker
18. Application design
● Decomposing a problem into CDAP building blocks
● A sample application as a case study
19. Schema design
● Do's and Don'ts in designing schemas in Datasets
● Complex Dataset Design with Table & Filesets
● Salting and Sharding
Cask Data, Inc., 150 Grant Ave, Palo Alto, CA, 94306