Advanced Cask Data Application Platform
The Advanced Cask Data Application Platform course gives participants deeper insight into CDAP building blocks and advanced features so they can build CDAP-optimized applications. The course combines lectures and labs to reinforce the core concepts.

Course Duration
8 Hours

Audience & Prerequisites
This course is designed for developers and architects who are looking to use CDAP. Participants should have completed the Introduction to CDAP course as a prerequisite.

Materials Required
● Laptop with Mac, Windows or Linux installed
For Windows
● VirtualBox installed (https://www.virtualbox.org/wiki/Downloads)
● CDAP Standalone VM (will be provided by the instructor)
For Mac / Linux
● JDK 1.6 or 1.7 installed
● Maven 3.1.1 or higher installed
● CDAP Standalone ZIP (will be provided by the instructor)

Course Overview
● CDAP architecture and internals
● Building custom Datasets
● Building and extending ETL pipelines
● Building Application Templates and Adapters
● Optimizing CDAP applications
● Deeper understanding of transactions

Course Outline

1. CDAP architecture deep dive
● Understanding CDAP system components
● Understanding data flow

2. Stream internals
● Stream service design
● A deeper look at Streamwriter and scaling

3. Advanced Datasets
● Creating reusable data patterns with Custom Datasets
● Dataset-Hive integration
● Readless increments

4. Lab 1 - Creating Custom Datasets
● Create a custom dataset using composition

5. Transactions deep-dive
● The need for transactions
● Optimistic concurrency control
● Transaction manager
● Tephra

6. Advanced topics in Tigon
● Enabling exactly-once processing using transactions
● Tigon partitioning strategies
● Batching to optimize throughput
● Generator flowlets

7. Lab 2 - Flow optimizations
● Partitioning strategies
● Batching

8. Metrics and logging
● Metrics and logging programming APIs
● Accessing metrics and logs using REST

9. Preferences API
● Introduction to preferences
● Setting multi-level preferences
● Scoped runtime arguments

10. Scaling programs
● Scaling instances in services
● Scaling flowlet instances

11. Namespaces
● Data and application isolation using Namespaces
● Guaranteeing application resources in Namespaces

12. Application Templates & Adapters
● Introduction to Application Templates & Adapters
● Introduction to ETL App Template

13. Extending ETL App Templates
● Writing custom sources, sinks and transformations
● Plugins
● Deployment and management of Plugins

14. Lab 3 - Extending ETL pipeline
● Writing custom sinks and transformations

15. Creating custom Application Templates
● Creating reusable application patterns with templates

16. Workers
● The need for service workers
● Introduction to workers
● Workers in applications

17. Lab 4 - Workers
● Implementing a simple worker

18. Application design
● Decomposing a problem into CDAP building blocks
● A sample application as a case study

19. Schema design
● Do's and Don'ts in designing schemas in Datasets
● Complex Dataset design with Table & Filesets
● Salting and sharding

Cask Data, Inc., 150 Grant Ave, Palo Alto, CA, 94306