Data Integration Console User Manual
Document publish date: 06/04/15

Copyright © GoodData Corporation 2007 - 2015 All Rights Reserved.

This document may not be reproduced, modified or distributed without the prior written permission of GoodData Corporation. GOODDATA CORPORATION PROVIDES THIS DOCUMENTATION AS-IS AND WITHOUT WARRANTY, AND TO THE MAXIMUM EXTENT PERMITTED, GOODDATA CORPORATION DISCLAIMS ALL IMPLIED WARRANTIES, INCLUDING WITHOUT LIMITATION THE IMPLIED WARRANTIES OF MERCHANTABILITY, NON-INFRINGEMENT AND FITNESS FOR A PARTICULAR PURPOSE.

Table of Contents

Introduction to Data Integration Console
Users of Data Integration Console
Before You Begin
Recommended Practices on Managing Data Loads
Interactions between Data Integration Console and CloudConnect Designer
Accessing Data Integration Console
CloudConnect Resources
CloudConnect Training
CloudConnect Documentation
Managing Data Loading Processes
Data Integration Console Overview Screen
Data Integration Console Projects Screen
Data Integration Console Project Details Screen
Deploying a Process
Scheduling a Process
Schedule a Process on the Data Integration Console
Configuring Schedule Parameters
Defining Project Parameters
Testing Parameter Execution
Parameter Usage Tips
Referencing the Project ID
Configuring Automatic Retry of Failed Processes
Troubleshooting Failed Schedules
Configuring Schedule Sequences
Timing the Schedule
Custom Schedules
Schedule Details
Schedule Execution History
Running Schedules On-Demand
Batch Loading of Data through Data Integration Console
Notification Rules
Configuring Notification Rules
Example Notification Message
Modifying Legacy Recipients through the Gray Pages
Data Integration Console Process Logging
Deleting Graphs and Processes

Introduction to Data Integration Console

The Data Integration Console enables project administrators to manage and track the data loading processes that supply data to their GoodData projects. Through the Data Integration Console, you can perform the following tasks:

• Monitor successful and failed data loads through a single dashboard.
• Manage many data loads across multiple projects at the same time.
• Automate and monitor the data loading processes applied to your GoodData projects.
• Postpone or disable a scheduled execution.
• Manually trigger ad-hoc executions of any data loading process.
• Set up and receive alerts and notifications about data loading process executions.
• Review the historical performance of data loads.
• Review logs generated during execution.

NOTE: In some situations, you may find it easier to monitor and schedule data loads from within your own application. For more information on the GoodData APIs, see GoodData API Documentation.

Users of Data Integration Console

The Data Integration Console can be used to manage many processes and schedules by the following types of users:

• ETL Developer:
  During the implementation of ETL projects, developers may find the console useful for troubleshooting process execution issues and monitoring the performance of execution runs. Developers can also set up notifications for other members of the implementation team, so that they are informed of the status of data in the projects under development.
• API Developer: Through the GoodData APIs, developers can access Data Integration Console functionality to manage data population and maintenance of one or more projects.
• System Administrator: After the project has moved to maintenance and support, the console provides ongoing information about scheduled ETL executions, either in the console itself or through configured notifications. As the volume and number of ETL processes change, administrators can adjust schedules so that ETL runs smoothly. As needed, scheduled processes can be stopped, disabled, deleted, or executed on an ad-hoc basis.

Before You Begin

Before you begin using the Data Integration Console, verify the following:

1. You are an Administrator of at least one GoodData project.
   NOTE: All projects to which you have access are displayed in the console, so all project users can monitor the project's data loading processes. However, you can modify a project only if 1) it contains at least one process deployed to the platform and 2) you are an Administrator in the project.
2. You are familiar with the project's ETL graphs, as defined in the CloudConnect project that supplies data to your GoodData project. See CloudConnect Designer User Manual (PDF).
Recommended Practices on Managing Data Loads

When you use the Data Integration Console, CloudConnect, or another method to execute ETL processes, keep in mind the following important considerations:

• When deploying a new schedule, set up a notification to inform you if the data load process fails. See New Process Schedule.
• After an execution begins, data is loaded into the system. If the execution is stopped, any data that has already been loaded remains in the system.
• The GoodData Platform does not prevent you from uploading duplicate data. Unless your ETL process has been designed to prevent loading duplicates, be careful with ad-hoc executions, which may create duplicate rows of data.
• Scheduling ETL processes to execute during business hours may affect the performance of the projects into which data is being loaded. Where possible, schedule regular data loads during off-peak hours.
• All schedule timing is based on UTC. Manual scheduling entries use the cron format.

Interactions between Data Integration Console and CloudConnect Designer

The CloudConnect Designer desktop application is used to develop logical data models and ETL graphs. All graphs that you have deployed are available in the Data Integration Console.

• A graph is the graphical representation of the set of transformations required to extract, transform, and load source data into your GoodData project. A graph is the minimum unit of processing that can be executed at one time, and it is specified in a single file. Graphs are defined in CloudConnect Designer, from which they are published to one or more GoodData projects in the platform.
• A process is one or more graphs of a CloudConnect project deployed into the GoodData Platform.
• A schedule is the automated execution of the graphs in a deployed process.
  Schedules can be created only after the process has been deployed into the platform; they cannot be created in CloudConnect Designer.

The diagram below shows the relationships between graphs, processes, and the schedules defined in the Data Integration Console to manage them.

Figure: ETL Graphs, Processes, and Schedules

In the figure above, the Data Integration Console manages the definition of the following items:

• Schedule timing. Each schedule has an associated interval between executions of the ETL graph. Graphs can be scheduled to execute as frequently as every 15 minutes. See Scheduling a Process.
• Schedule parameters. As part of any schedule, you can define specific parameters to apply to the graph when it executes. For example, you can define separate schedules that pull from specific user accounts in a source system by defining schedule parameters for each graph execution. See Configuring Schedule Parameters.
  NOTE: These parameter settings override any settings defined for the project parameters, which are defined within CloudConnect Designer and are included as part of the CloudConnect project definition. See https://developer.gooddata.com/article/cloudconnect-using-parameters.
• Schedule graph. Each schedule is associated with an individual graph. Depending on how the ETL is designed, however, this graph can be an orchestrator graph, which calls a number of other ETL graphs as part of its normal operation.
• Notifications (not pictured). For each process, you can define notifications to alert stakeholders to the status of process executions. These notifications apply to the selected process only. See Notification Rules.
  NOTE: A notification applies to all schedules of the entire process.
  If the process contains multiple schedules, design the notification to support all schedules of the process.

Process Deployments: When a process is deployed to the platform, it becomes an unscheduled process in the Data Integration Console.

NOTE: An unscheduled process must have a schedule associated with it before it can be executed in the console.

• A scheduled process is a process in the Data Integration Console for which a recurring schedule has been created.
• When the process is scheduled to execute, it is queued for processing by the GoodData Platform. Scheduled processes may also be executed on-demand, although there are some considerations to review before executing processes outside their normal schedule. See Running Schedules On-Demand.

Accessing Data Integration Console

NOTE: You must be an Administrator of any project that you wish to manage through the Data Integration Console. See Before You Begin.

You can access the Data Integration Console using one of the following methods:

• If you are a project administrator of the project currently loaded in the GoodData Portal, click the menu that displays your name. Then, select Data Integration Console.
• Project administrators can also click the Go to Administration link on the Manage page.
• You can open the console directly at the following URL: https://secure.gooddata.com/admin/disc/
  NOTE: Be sure to include the final forward slash (/).

Exiting the Console: To return to the GoodData Portal displaying the project, click the menu that displays your name. Then, select Dashboards.

• If you are working with a specific project in the Data Integration Console, you can click Go to Dashboards at the top of the Project Details screen to open the selected project in the GoodData Portal.

CloudConnect Resources

The following resources are available to help you get up and running with CloudConnect.
CloudConnect Training

GoodData University offers a range of instructor-led online training classes, including multiple offerings on the CloudConnect Designer.

• Please visit GoodData University.

CloudConnect Documentation

The following documentation resources are available for CloudConnect Designer, ETL, and the CloudConnect projects fed by the data.

Resource | Description | Link
CloudConnect User Manual | General documentation on the CloudConnect Designer and related platform components | CloudConnect Designer User Manual (PDF)
CloudConnect LDM Modeler Guide | Documentation on the LDM Modeler component of the CloudConnect Designer | CloudConnect LDM Modeler Guide (PDF)
Data Integration Console User Manual | Documentation on Data Integration Console and process management | Data Integration Console User Manual (PDF)
Developer articles on the MAQL DDL | Documentation on the Data Definition Language version of MAQL, which is used for defining logical data models | http://developer.gooddata.com/reference/maql
Developer articles on the CL Tool | Documentation on the command-line tool | http://developer.gooddata.com/reference/cltool
Developer Portal | Documentation on implementing schedules and processes | http://developer.gooddata.com/article/scheduling-and-notifications

Managing Data Loading Processes

Through the Data Integration Console, you can manage the loading of data into your projects, as defined by your processes, including tracking their execution, reviewing logs, and scheduling them for regular execution.
When you log in, the Overview screen opens, where you can review the current counts of failed, running, scheduled, and successful executions of your processes.

• See Data Integration Console Overview Screen.
• See Data Integration Console Projects Screen.

NOTE: A process corresponds to one or more graphs and the schedules associated with them. These graphs are created and tested in CloudConnect Designer before they are deployed to your GoodData projects, after which they appear in the Data Integration Console. See Interactions between Data Integration Console and CloudConnect Designer.

• To log out of GoodData, click the menu displaying your name. Then, select Logout.

Data Integration Console Overview Screen

In the Overview screen of the Data Integration Console, you can review the counts of execution outcomes for the processes in your projects.

• The listed projects are the ones to which you have access.
• You can modify only the listed projects for which you are an Administrator and that have deployed processes.

Figure: Overview Screen

Click any of the counts to review the individual projects that contribute to it:

• Failed: Count of projects where one or more processes started but failed to complete. Failed executions may have applied no data updates or incomplete data updates to the target project. Investigate and resolve failed executions as soon as possible to prevent project users from working with inaccurate data. Try to keep the count of failed executions at 0.
• Running: Count of projects where one or more processes are currently being executed by the GoodData Platform.
• Scheduled: Count of projects where one or more processes have been scheduled for execution. These processes have been placed in the queue and run as soon as possible.
• Successful: Count of projects where one or more processes have executed successfully.
  Tip: All of your processes should be listed in this category. Fix or disable those that are not.

For more information on statuses, see Schedule Execution History.

For each project for which you are an Administrator, you can review the processes and their related schedules in a hierarchy. For each schedule, you can review execution details.

• To review a schedule's details, click its name. See Schedule Details.
• To execute a schedule immediately, select the checkbox next to its name. Then, click Run or Restart. Depending on the state of the data in the project, restarting a partially completed schedule may introduce duplicate data into the project. See Running Schedules On-Demand.
• To disable a schedule, select its checkbox. Then, click Disable.
• To review the logging information for an executed schedule, click the Log icon.

To explore additional information on the project, process, or schedule, click the corresponding name in the detail table.

• See Project Details Screen.
• See Scheduling a Process.

General Bulk Operations: In the Overview screen, you can apply the same operation to multiple projects and processes at the same time.

• For more information on applying bulk operations to projects, see Data Integration Console Projects Screen.
• To select all processes of all listed projects for which you are an administrator, select the checkbox next to the Run button.
• You can also select one or more processes or projects. Selecting a project applies the operation to all processes in that project.
• After you make your selections, click one of the buttons above the list of projects.

Restarting and Redeploying Operations:

• Before stopping multiple projects, verify that no processes are currently running for those projects.
  Any partially loaded data remains in the project, and a restart of the process may create duplicate data.
• Before you restart multiple data loads, verify that you have addressed any issues with the process. You may need to review the log, then download and fix the process before redeploying it.
• If you disable multiple projects, you can use the status filter on the Projects page to review these projects. See Data Integration Console Projects Screen.
• Before running a schedule, verify that you are not loading duplicate data. Check the status of the last few scheduled executions.

Data Integration Console Projects Screen

In the Projects screen of the Data Integration Console, you can review all of the projects to which you have access.

• Click Projects in the menu bar.

NOTE: You can modify only the projects for which you are a project Administrator.

Figure: Projects Screen

• To search your available projects, enter a project identifier or a search string for the project name in the text box.
• To filter the list of projects based on the result of the last process execution, make a selection from the drop-down list above the table.
• To the left of each project name, you can review the result of the most recently scheduled process. A green checkmark indicates a successful execution.
• To review the details of a project's processes, click the project name. See Project Details Screen.

Bulk Operations: In the Projects screen, you can apply the same operation to multiple projects at the same time.

• For more information on applying bulk operations to processes, see Data Integration Console Overview Screen.

NOTE: The order in which bulk operations execute is not guaranteed to follow your selections in this screen. If there are dependencies on execution order, avoid using bulk operations.
• To select all projects for which you are an administrator, select the checkbox next to the Deploy Process button.
• You can also select one or more specific projects.
• After you make your selections, click the button above the list of projects to apply the operation.

Data Integration Console Project Details Screen

In the Project Details screen, you can review the individual processes and schedules associated with the selected project. By default, the current schedules are displayed, along with the schedule execution history for the preceding seven days.

Figure: Project Details Screen

• To create a new schedule for a process in the displayed project, click New schedule.
• You can also create a schedule for an individual graph. Click the graphs tab in the Project Details screen. Then, click the Schedule link. See Scheduling a Process.
• To review details of the process schedules, click the schedule link. See Schedule Details.
• To deploy a process from your local desktop to the selected project, click Deploy Process. See Deploying a Process.
• To open the project in the GoodData Portal, click Go to dashboards.

Tip: Under the project's metadata, you can review and copy the internal project identifier, which may be useful for locating and accessing the project through other interfaces.

For each process, you can review the execution history for the preceding seven days or access the schedule, the graphs of the process, and the metadata associated with the process. The current schedule is listed, followed by indicators of the executions attempted over the previous seven days.

• A green vertical bar indicates a successful execution. A red bar indicates a failed execution.
• To download the process for use in CloudConnect Designer, click Download.
  Tip: Download the entire process for review, testing, and debugging in CloudConnect Designer.
• To delete the process from the project in the platform, click Delete. A new version of the process can be deployed from CloudConnect Designer or the Data Integration Console. For more information on uploading through the console, see Deploying a Process.
• To configure notification rules for the scheduled execution, click the link indicating the count of notification rules. See Configuring Notification Rules.

Deploying a Process

Through the Project Details screen, you can deploy a process to the currently selected project. Click Deploy Process.

Figure: Deploying a process

Steps:

1. To select a package, click Browse. Navigate your local environment to select the ZIP file to upload. The file must contain all resources required for the process to execute.
   NOTE: Deployed process packages must be smaller than 1MB.
   • CloudConnect processes can be extracted through the application. The local CloudConnect project must be saved into a ZIP file, which can be uploaded through this interface. See CloudConnect Designer User Manual (PDF).
   • Ruby scripts need to include all components of the process, including scripts for the ETL, the logical data model, and any parameter files. These packages must be bundled in a single ZIP file. The Ruby option is for internal use only. It will be enabled for external users in a later release.
   Tip: You can also deploy Ruby packages using scripts that reference commands in the GoodData Ruby SDK. See http://sdk.gooddata.com/gooddata-ruby/.
2. For Process name, enter a descriptive value, which appears in the Data Integration Console interface.
3. To upload the process to the target GoodData project, click Deploy.
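The packaging rule in step 1 can be checked locally before you click Deploy. The following Python sketch is not a GoodData tool: build_process_package and MAX_PACKAGE_BYTES are hypothetical names, and the sketch assumes the 1MB limit means 1,000,000 bytes. It only zips a project folder and checks the archive size; it does not verify that the package contains every resource the process needs.

```python
import zipfile
from pathlib import Path

# Assumption: the 1MB limit is treated conservatively as 1,000,000 bytes.
MAX_PACKAGE_BYTES = 1_000_000


def build_process_package(project_dir: str, zip_path: str) -> int:
    """Zip every file under project_dir into zip_path and return the archive size.

    Hypothetical helper: raises ValueError if the archive is not under the limit.
    """
    root = Path(project_dir).resolve()
    target = Path(zip_path).resolve()
    if not root.is_dir():
        raise NotADirectoryError(project_dir)
    with zipfile.ZipFile(target, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(root.rglob("*")):
            # Skip directories and the archive itself if it is written inside the project.
            if path.is_file() and path.resolve() != target:
                # Store paths relative to the project root to keep the folder layout.
                zf.write(path, arcname=path.relative_to(root).as_posix())
    size = target.stat().st_size
    if size >= MAX_PACKAGE_BYTES:
        raise ValueError(f"package is {size} bytes; it must be smaller than 1MB")
    return size
```

For example, build_process_package("my_cloudconnect_project", "process.zip") returns the archive size in bytes, or raises ValueError when the package would be rejected for size.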
Scheduling a Process

After you have deployed your processes to your GoodData project, you can schedule their execution through the Data Integration Console.

NOTE: You do not need to specify login credentials or be logged into the GoodData Platform at the time of execution for a scheduled process to be initiated.

• To schedule execution of a process, click New Schedule in the Project Details screen. See Project Details Screen.
• You can also execute schedules on-demand. Depending on the process and the current state of the data in the target project, you may insert duplicate data into the project. See Running Schedules On-Demand.

Schedule a Process on the Data Integration Console

Administrators can create a new data loading schedule to automatically execute an existing data loading process at a specified time. Only one data loading process can be executed at a time. You can schedule only data loading processes that already exist. See Preparing a Data Loading Process.

NOTE: Data loading during business hours may negatively affect system performance. Frequent updates may also affect the performance of your projects. See Timing the Schedule.

Steps:

1. Click your user name > Data Integration Console.
2. Click Projects to open the Projects page. Click the name of the project where you want to create the schedule, and click New schedule.
3. Select the process to execute and the frequency of execution. The following types of processes exist:
   • Ruby scripts
   • CloudConnect project: Select a .grf file. For more information about .grf files, see Preparing a Data Loading Process.
   • Data loading process: Loads data from the ADS to a data mart. For data loading processes, you must also select the datasets to load data into under Upload Data To.
   Tip: Use the "after" selection to configure schedules to execute sequentially.
   See Configuring Schedule Sequences.
4. For .grf files and Ruby scripts, you can optionally add parameters to your schedule. A project parameter is a name-value pair that can be passed to the graph before execution begins. If the graph is designed to consume it, the project parameter can define variables specific to the execution. For example, you can define project parameters for customer-specific login credentials for an external data source. See Configuring Schedule Parameters.
5. Optionally, specify a new schedule name.
6. Click Schedule.
7. The schedule is saved, and the GoodData Platform executes the process as scheduled.

Tip: You can add a retry delay to a schedule after you create it. See Configuring Automatic Retry of Failed Processes.

Configuring Schedule Parameters

In a schedule for a process, you can reference parameters from your CloudConnect project. Schedule parameters are inputs applied to the execution of the scheduled process. Using parameter values, you can configure the process to behave differently depending on the circumstances. For example, you can specify parameters to load multiple projects using the same process.

In CloudConnect Designer, a parameter is a name-value pair that is stored internally in a graph or externally at the project level. Parameters that other CloudConnect users may modify must be stored as external parameters.

• In CloudConnect Designer, parameters and their values are stored in *.prm files.
• NOTE: Schedule parameters override any parameter settings defined within CloudConnect Designer and are applied only when the data loading process executes. As a result, you can use schedule parameters to manage the specific configuration of multiple schedules, such as running the process for each customer on a different schedule. For more information, see Testing Parameter Execution.
• For more information on using parameters in CloudConnect Designer, see http://developer.gooddata.com/advanced-guides/cloud-connect-best-practices/cloudconnect-using-parameters.

Figure: Configuring schedule parameters

You can include both secure and non-secure parameters in your schedule.

• Secure parameters are useful for passing in sensitive data, such as usernames and passwords, as part of the transformation. These parameter values are encrypted and do not appear in clear text in any GUI or log entries.
• Before saving the schedule, you can use the Show Value checkbox to display the value of a secure parameter for review. When the schedule is saved, secure parameter values are hidden.

Defining Project Parameters

Through CloudConnect Designer, you can define parameters for your projects, which makes them available for use in the Data Integration Console.

NOTE: Parameters must be defined within one of the CloudConnect Designer graphs used in your process to be available for inclusion in your process schedule. The values specified in the schedule take precedence over any values specified in the graph definition and are applied to all graphs in the process.

A CloudConnect project's parameters are defined in its workspace.prm file. See CloudConnect Designer User Manual (PDF).

Testing Parameter Execution

In CloudConnect Designer, you can create and test your parameters before you apply them to your production data loading processes in the GoodData Platform. Parameters can be added from a file or by manual entry.

• For more information, see http://developer.gooddata.com/advanced-guides/cloud-connect-best-practices/cloudconnect-using-parameters.
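The precedence rule described above (schedule parameters override the defaults in workspace.prm, and secure values are hidden once the schedule is saved) can be modeled in a few lines. This Python sketch is illustrative only: parse_prm, effective_parameters, and masked_view are hypothetical helpers, the parser assumes a simplified layout of one NAME=value pair per line with # comment lines, and the sample values are invented.

```python
from typing import Dict, Iterable


def parse_prm(text: str) -> Dict[str, str]:
    """Parse NAME=value lines; blank lines and '#' comment lines are skipped.

    Assumption: a simplified reading of the *.prm format, not a full parser.
    """
    params: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected NAME=value, got {raw!r}")
        params[name.strip()] = value.strip()
    return params


def effective_parameters(defaults: Dict[str, str], schedule: Dict[str, str]) -> Dict[str, str]:
    """Schedule parameters take precedence over the workspace.prm defaults."""
    merged = dict(defaults)
    merged.update(schedule)
    return merged


def masked_view(params: Dict[str, str], secure_names: Iterable[str]) -> Dict[str, str]:
    """Copy of params that is safe to display: secure values are replaced by a mask."""
    secure = set(secure_names)
    return {name: ("********" if name in secure else value) for name, value in params.items()}


workspace_prm = """
# defaults shipped with the CloudConnect project (hypothetical values)
PROJECT_ID=abc123
SOURCE_USER=sandbox_user
SOURCE_PASSWORD=sandbox_pw
"""
schedule = {"SOURCE_USER": "customer_a", "SOURCE_PASSWORD": "s3cret"}
params = effective_parameters(parse_prm(workspace_prm), schedule)
print(masked_view(params, secure_names={"SOURCE_PASSWORD"}))
# {'PROJECT_ID': 'abc123', 'SOURCE_USER': 'customer_a', 'SOURCE_PASSWORD': '********'}
```

Keeping the defaults in workspace.prm and passing only the per-customer values as schedule parameters is the pattern that lets one deployed process serve several schedules.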
Parameter Usage Tips

The following tips can help you use parameters effectively when defining your schedules.

• When defining a process, you can use parameters to switch between your development, testing, and production environments.
• You can also use parameters to deploy a process across multiple customer projects.
• Define and use parameters for credentials and other data that can easily change.
• Use secure parameters for sensitive data such as passwords.
• Define default parameters in the workspace.prm file. As needed, you can override them during execution using schedule parameters.
• For more information, see http://developer.gooddata.com/advanced-guides/cloud-connect-best-practices/cloudconnect-using-parameters.

Referencing the Project ID

This section provides an example of how to use project parameters in your schedules.

For reference purposes, you may wish to create a parameter that corresponds to the GoodData project identifier. For example, you can define the PROJECT_ID parameter in the external parameters as follows:

    PROJECT_ID=(project_identifier)

where

• (project_identifier) is the ID of your GoodData project. This identifier never changes.

You can retrieve the GoodData project identifier through CloudConnect Designer. See CloudConnect Designer User Manual (PDF).

Configuring Automatic Retry of Failed Processes

Occasionally, scheduled executions of ETL processes in the platform may fail. These failures may be caused by configuration issues, network interruptions, scheduled maintenance, or similar issues.

NOTE: By default, a process that fails is not restarted automatically. To enable automatic restart, you must add a retry delay.

When a retry delay is specified, the platform automatically re-runs a failed ETL process after the delay has elapsed.
If it fails again, execution is attempted again after the same delay.

• The minimum permitted delay is 15 minutes.
• When a schedule fails 5 times in a row, a notification email is sent to you.

NOTE: If a process fails 30 times in a row, it is automatically disabled and cannot be re-run until it is manually re-enabled. See Troubleshooting Failed Schedules.

You can define the retry delay in the schedule definition area of the Data Integration Console.

• To change the retry delay, click Add retry delay. Enter the number of minutes that the platform should wait before retrying the ETL process.

Figure: Add retry delay

If your ETL processes need to run in a specific sequence, or if your data loads may push the maximum limits permitted for your organization, specify the retry delay for each process with care.

• If the retry delay overlaps the next execution of the scheduled process, the failed scheduled execution is dropped, and the next scheduled execution is processed.
• ETL processes that are retried are inserted into a processing queue, so they may not be processed at the exact interval.

Troubleshooting Failed Schedules

If an ETL schedule fails 5 consecutive times, a notification email is sent. Unless the underlying issues are corrected, the process is automatically disabled after 30 consecutive failed executions. You must re-enable it manually after all issues are resolved.

If you have been notified that your schedule has failed repeatedly, the root cause may vary. Check the following:

• Your credentials are valid.
• Your connections are set up properly.
• The changes you’ve made since the last successful execution haven’t broken the graph that supplies data.
• All data sources are accessible.
• All GD Writer components in your graphs are configured with the appropriate property settings.
To help identify issues, check the most recent text log of the schedule. Failing schedules should remain disabled until the issues are addressed.

Configuring Schedule Sequences
As needed, you can configure schedules to execute in sequence, creating a chain of updates to your data. For example, suppose your project receives order updates through an ETL process. Before the order ETL runs, you may want to load updates from your enterprise master data ETL. That way, any new customers or products referenced in the order stream are already available in the project.
When configuring a schedule, you can specify that it should execute after another schedule has executed successfully:

Figure: Configuring a schedule to occur after another

The schedule runs only if its triggering schedule executes successfully.
• Schedules cannot be sequenced in a loop.
• A schedule can be used only once in a scheduling sequence.
If a schedule is deleted, all schedules configured to run after it must be reconfigured. The error message reported for these schedules is "Trigger schedule missing!"
You can navigate a sequence of schedules. Links to connected schedules are displayed next to the schedule name:

Figure: Click links to navigate the sequence of schedules

Timing the Schedule
Processes can be scheduled to execute at calendar intervals or according to cron timings that you specify.
NOTE: If a run takes longer than the interval between scheduled runs, the next scheduled run is dropped, and the run after it executes as scheduled.
For example, if a daily run takes 25 hours to execute, the process effectively runs every two days. You may need to tune your timings based on the average duration of your runs.
• See Custom Schedules.
Steps:
1. Select the interval from the drop-down. The minimum supported frequency is every 15 minutes. If you select a short interval for a large dataset, system performance may suffer. GoodData recommends that you schedule your processes to execute and complete during off-peak hours, at intervals that do not impact system performance.
2. As needed, use the drop-downs provided for the selected interval to set the specific time within that interval at which the process runs.
NOTE: All times are in UTC (Coordinated Universal Time), which corresponds to Greenwich Mean Time. Adjust your timing accordingly.
3. To save your schedule changes, click Save Changes.

Custom Schedules
If none of the available scheduling options suits your process, you can configure a custom schedule. The Data Integration Console supports schedules in cron format. cron is a Unix job scheduler that triggers scripts or other processes at predefined times. Its timing format offers more options for configuring when a process executes.
NOTE: GoodData does not support seconds in cron expressions. Enter standard five-field cron expressions (minute, hour, day of month, month, day of week).
NOTE: Make sure you understand the cron format before you specify custom schedules. Its syntax is publicly documented.
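The Python sketch below is a minimal illustration of three timing rules from this section: the five-field cron format, the UTC convention, and the rule that drops a run when the previous one is still executing. It is not GoodData code. The helper names are hypothetical, and the UTC conversion assumes a fixed offset, so it does not handle daylight saving time. The last function only models the documented drop rule.

```python
from datetime import datetime, timedelta


def validate_cron(expr: str) -> list:
    """Return the fields of a cron expression, rejecting anything but five fields.

    Only the field count is checked, so a six-field expression with a seconds
    field is rejected. The contents of each field are not validated here.
    """
    fields = expr.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 cron fields, got {len(fields)}: {expr!r}")
    return fields


def daily_cron_utc(hour: int, minute: int, utc_offset_minutes: int) -> str:
    """Build a daily cron expression for a local wall-clock time, shifted to UTC.

    utc_offset_minutes is the local offset from UTC, e.g. -300 for UTC-5.
    A fixed offset is assumed; daylight-saving changes are not handled.
    """
    local_minutes = hour * 60 + minute
    utc_minutes = (local_minutes - utc_offset_minutes) % (24 * 60)
    return f"{utc_minutes % 60} {utc_minutes // 60} * * *"


def started_runs(slots: list, run_duration: timedelta) -> list:
    """Model the drop rule: a slot that arrives while the previous run
    is still executing is skipped, and the next free slot starts a run."""
    started = []
    busy_until = None
    for slot in slots:
        if busy_until is not None and slot < busy_until:
            continue  # previous run still executing, so this slot is dropped
        started.append(slot)
        busy_until = slot + run_duration
    return started


# 02:00 at UTC-5 becomes 07:00 UTC.
print(daily_cron_utc(2, 0, -300))  # 0 7 * * *

# A daily run that takes 25 hours only starts every other day.
days = [datetime(2015, 6, 1) + timedelta(days=i) for i in range(6)]
print([d.day for d in started_runs(days, timedelta(hours=25))])  # [1, 3, 5]
```

The model reproduces the 25-hour example above: with daily slots, only every second slot starts a run.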
Schedule Details
In the Schedule Details pane, you can review the history of runs of the selected schedule and modify the schedule as necessary. You can modify the graph to execute, the parameters to apply, and the retry delay.
• The listed username identifies the user under which the schedule executes. This user currently owns the schedule.
• The owner of the schedule may differ from the owner of the process, because processes can be downloaded and redeployed at any time. For example, if User B redeploys a process created by User A, User B becomes the owner of all schedules associated with the process. From then on, all schedules for the process are executed under User B.

Figure: Process History

Commands:
• By default, a schedule's name is set to the graph's name. To change the name of the schedule, click the graph name, enter the new name, and click Save.
• To switch the graph used in this schedule, select a different graph from the drop-down at the top of the pane.
• To queue the process for execution, click Run. The process is executed as soon as possible.
NOTE: You can queue a schedule for execution at any time. However, running the process outside its schedule may cause issues with the data loaded into the project. Be aware of the risks before running schedules on demand. See Running Schedules On-Demand.
• To stop a process that is queued for execution or currently executing, click Stop.
• To delete a schedule, click Delete. The schedule is deleted, while the process and any associated notifications remain in the platform.
• As needed, you can disable a schedule, which prevents scheduled executions until you re-enable it. Click Disable.
• You can click Run to execute both enabled and disabled schedules.
NOTE: If a schedule fails repeatedly, it may be automatically disabled. Your project data is then not refreshed until you fix the cause of the failure and re-enable the schedule. For more information on debugging failing schedules, see Troubleshooting Failed Schedules.
• You can review the execution history of the schedule at the bottom of the screen. See Schedule Execution History.
• To change the timing of the schedule, select a new interval from the drop-down, update other properties as needed, and click Save Changes. See Timing the Schedule.
• You can add parameters to the process to customize individual executions. See Configuring Schedule Parameters.
Changing the graph: To change the graph used in the schedule, select it from the drop-down at the top of the window. The next time the process executes, the new graph is used.
NOTE: When the new graph runs, it uses any previously defined parameters. You may want to modify these parameters before the new graph executes. See Configuring Schedule Parameters.

Schedule Execution History
At the bottom of the Schedule Details screen, you can review the history of the schedule's executions.

Figure: Process History Details

The figure above displays the last seven days of executions of the graph. For each execution, you can review the following:
• The dated history bar shows each time the process was executed over the past seven days.
• The icon on the left side of the screen indicates whether the process executed successfully.
• A hand icon indicates an execution that was triggered manually.
• Red text indicates that the process encountered an error and failed.
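Consecutive runs shown in red also drive the failure handling described in Configuring Automatic Retry of Failed Processes and Troubleshooting Failed Schedules. That handling includes a minimum retry delay of 15 minutes, a notification email after 5 failures in a row, and automatic disabling after 30. The Python below is a minimal client-side model of those thresholds, which you could use to reason about an execution history you have recorded. The class and method names are hypothetical. The model also assumes that any successful run resets the failure streak. "In a row" implies this, but the manual does not spell out the platform's internal logic.

```python
from dataclasses import dataclass

MIN_RETRY_DELAY_MIN = 15  # documented minimum retry delay, in minutes
NOTIFY_AFTER = 5          # documented: email after 5 consecutive failures
DISABLE_AFTER = 30        # documented: schedule disabled after 30 consecutive failures


@dataclass
class ScheduleModel:
    """Hypothetical client-side model of a schedule's failure handling."""
    retry_delay_min: int
    consecutive_failures: int = 0
    enabled: bool = True

    def __post_init__(self) -> None:
        if self.retry_delay_min < MIN_RETRY_DELAY_MIN:
            raise ValueError(f"retry delay must be at least {MIN_RETRY_DELAY_MIN} minutes")

    def record(self, succeeded: bool) -> list[str]:
        """Record one execution outcome and return the events it would trigger."""
        if succeeded:
            self.consecutive_failures = 0  # assumption: any success resets the streak
            return []
        self.consecutive_failures += 1
        events = []
        if self.consecutive_failures == NOTIFY_AFTER:
            events.append("notify")
        if self.consecutive_failures >= DISABLE_AFTER and self.enabled:
            self.enabled = False  # must be re-enabled manually
            events.append("disable")
        return events


schedule = ScheduleModel(retry_delay_min=30)
events = [schedule.record(succeeded=False) for _ in range(30)]
print(events[4], events[29], schedule.enabled)  # ['notify'] ['disable'] False
```

Rejecting delays under 15 minutes in the constructor mirrors the console's documented minimum. The model does not attempt to reproduce queueing or the rule that drops overlapping retries.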
Process Execution Result States
The table below identifies the possible states of schedules in the list.
• You can access the log generated for each process run. To open the log, click the log icon. See Process Logging.
• You can also review the runtime duration of each process execution, as well as its start and end timestamps.

State: Description
Successful: Tip: All of your processes should be listed in this category. Fix or disable those that are not.
Failed: Executions that failed to complete or that were manually stopped are marked in red. The displayed ERROR message indicates what caused the process to fail. To review the log for further details, click the log icon. Failed executions may have applied no data updates, or only incomplete data updates, to the target project. Investigate and resolve failed executions as soon as possible to prevent project users from working with inaccurate data. Try to keep the count of failed executions at 0.
NOTE: Stopped processes are categorized as errors, because their data load is incomplete. All incomplete loads are treated as errors.
Running: Scheduled graph execution has begun in the platform. A timestamp indicates when the execution began, along with the current duration of the execution.
Scheduled: The graph has been scheduled for execution at the appropriate time.
Disabled: The schedule has been disabled. It will not run automatically until it has been re-enabled.
• Disabled schedules can be manually re-run, although doing so carries some risks. See Schedule Details.
Broken Schedule: Schedules whose graph no longer exists are marked as broken. Typically, a schedule breaks when its process is redeployed under a new name. To fix the schedule, select the appropriate graph to run in the schedule definition.
Unscheduled: These processes do not have a schedule associated with them.
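Two rules in the table above are easy to check mechanically. First, stopped runs count as Failed. Second, a schedule whose graph no longer exists is broken. The Python sketch below illustrates both under stated assumptions. The outcome labels ("ok", "error", "stopped") and the dictionary layout are hypothetical stand-ins for whatever execution data you collect. They are not values returned by the console or the API.

```python
from collections import Counter

# Hypothetical outcome labels for finished runs; these are not official API values.
STATE_FOR_OUTCOME = {
    "ok": "Successful",
    "error": "Failed",
    "stopped": "Failed",  # stopped runs are incomplete loads, so they count as Failed
}


def tally_states(outcomes):
    """Count finished executions per result state; unknown labels raise KeyError."""
    return Counter(STATE_FOR_OUTCOME[o] for o in outcomes)


def broken_schedules(schedule_graphs, deployed_graphs):
    """Return names of schedules whose graph is no longer deployed (Broken Schedule)."""
    return sorted(name for name, graph in schedule_graphs.items()
                  if graph not in deployed_graphs)


history = ["ok", "ok", "stopped", "error"]
print(tally_states(history)["Failed"])  # 2

schedules = {"Daily load": "graph/load.grf", "Orders": "graph/orders_v1.grf"}
deployed = {"graph/load.grf", "graph/orders_v2.grf"}
print(broken_schedules(schedules, deployed))  # ['Orders']
```

The second example mirrors the typical cause named in the table: a process redeployed with a renamed graph leaves its old schedule pointing at a graph that no longer exists.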
Running Schedules On-Demand
As needed, you can run a schedule at any time. The processes of the schedule are queued in the platform and executed as soon as resources become available.
• In the Data Integration Console, you must create a schedule for a process before you can execute the process.
GoodData does not prevent a process from loading duplicate data. In particular, if you run a process at an ad-hoc time, it may load duplicate versions of data. Use this feature carefully.
Depending on the volume and complexity of the process, executing it during peak hours can degrade the performance of the GoodData project that it updates. Where possible, execute processes during off-peak hours.
Tip: As a best practice, create one or more data validation reports to check how your processes are working. To see the effects of your processes in your GoodData projects, open a different browser tab, navigate to https://secure.gooddata.com, and open a report that is populated by the process.
Steps:
1. In the Project Details screen, select the scheduled graph you want to run.
2. Click Run.
3. The schedule is queued for execution and runs as soon as platform resources are available.
To stop a schedule in the middle of execution, click the Stop button.

Figure: Stop button

NOTE: When an upload is stopped in the middle of execution, the data that has already been uploaded remains in the project. If you are unsure whether your ETL process can safely resume loading data, you can manually delete the uploaded data from the Manage page of your project.
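The Run button also has a programmatic counterpart through the process resource named in the next paragraph. The Python sketch below only builds the request and does not send it. Several details are assumptions, not taken from this manual: the "/executions" sub-resource, the JSON body with "executable" and "params" keys, and the use of https://secure.gooddata.com as the API host. Verify them against the GoodData API documentation before relying on them. Authentication is omitted.

```python
import json
import urllib.request
from typing import Optional

API_HOST = "https://secure.gooddata.com"  # assumption: API served from the host named in this manual
# Process resource documented in this manual.
PROCESS_PATH = "/gdc/projects/{project_id}/dataload/processes/{process_id}"
# ASSUMPTION: executions are created on an "/executions" sub-resource; verify in the API docs.
EXECUTIONS_SUFFIX = "/executions"


def build_execution_request(project_id: str, process_id: str, executable: str,
                            params: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) a POST request asking the platform to run a process now.

    ASSUMPTION: the JSON body shape below is illustrative; check the GoodData
    API reference for the exact field names. No authentication is attached.
    """
    url = (API_HOST
           + PROCESS_PATH.format(project_id=project_id, process_id=process_id)
           + EXECUTIONS_SUFFIX)
    body = {"execution": {"executable": executable, "params": params or {}}}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


# Hypothetical IDs and graph path, for illustration only.
req = build_execution_request("abc123", "p42", "graph/load.grf", {"PROJECT_ID": "abc123"})
print(req.get_method(), req.full_url)
```

Sending the request would require authenticating against the platform first and then passing the request to urllib.request.urlopen. That step is left out because the authentication flow is outside the scope of this manual. The PROJECT_ID entry reuses the parameter suggested in Referencing the Project ID. The same caution about duplicate loads applies to API-triggered runs.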
Through the following API endpoint, you can trigger on-demand executions of processes without scheduling them:
/gdc/projects/[project_id]/dataload/processes/[process_id]

Batch Loading of Data through Data Integration Console
Through the Data Integration Console, you can load multiple files in a single CloudConnect process. To enable batch loading of multiple files that you have posted to your project-specific storage, create the following parameter in your schedule:
GDC_USE_BATCH_SLI_UPLOAD=TRUE
When this parameter is enabled in your schedule, the process attempts to load all files in project-specific storage, based on the JSON manifest file that you have created.
• Batch loading of files requires additional configuration in your ETL graphs in CloudConnect Designer. For more information, see https://developer.gooddata.com/article/multiload-of-csv-data.

Notification Rules
If desired, you can configure email notifications for process execution events. You can use these notification rules to inform stakeholders when data is refreshed, or to alert project administrators when a process execution run encounters problems.
• To configure notification rules, click the notification rules link at the top of the Project Details screen. See Project Details Screen.
NOTE: A notification rule is associated with a process, not a schedule. If the schedule is removed, the notification remains. If the process is removed, any associated notifications are removed from the Console. However, these notifications still exist in the project and can be accessed through the APIs. See GoodData API Documentation.
Tip: A notification applies to all schedules of the process, so write it to suit the corresponding event in every schedule of the process.
You can use the variable that identifies the executable to determine which scheduled graph or script was run.
Tip: REST-based notifications can also be configured using the GoodData APIs. See http://developer.gooddata.com/article/setting-up-the-notifications-using-api.
The list of notification rules for the process is displayed.

Figure: List of notification rules for this process

NOTE: Any administrator of the process can modify notification rules that were created for it in the Data Integration Console. Notification rules created through the legacy method in the gray pages can be changed only by the original owner of the notification. In the gray pages, you can locate the original resource, which identifies the owner of the notification.
• To edit a notification rule, select it and make your edits as needed. See Configuring Notification Rules.
• To create a new notification rule for the process, click Add notification rule. The Notification Rules window is displayed, where you can specify new rules and review any previously configured ones. See Configuring Notification Rules.
• To delete a notification rule, click the Trash icon.
• To close the Notification Rules window, click Close dialog.
If notification rules have been added or deleted, the number of notification rules shown at the top of the Project Details screen is updated.

Configuring Notification Rules

Figure: Notification Rules window

To configure a new notification rule, complete the following steps.
Tip: When you begin using a new process, you may want to create notification rules for all possible events.
As the process stabilizes over a number of successful executions, you may choose to remove some of the notifications. For stable processes, keep at least the failure notification.
Steps:
1. Enter a valid email address or alias in the Send Email To text box.
NOTE: You can configure only one recipient address per notification rule. Some legacy notifications may include references to multiple recipients. These notifications must be modified through the gray pages. For more information, see Modifying Legacy Recipients through the Gray Pages.
2. From the drop-down, select the event that triggers the notification:
   1. success: The notification is sent when the process completes successfully.
   2. failure: The notification is sent if the process fails to complete.
   3. process scheduled: The notification is sent when the process has been added to the queue for execution. Typically, the time between this event and the process started event is very short.
   4. process started: The notification is sent when the process begins execution.
   5. custom event: You can specify events that are custom to the project. For more information on defining custom events, see https://developer.gooddata.com/article/creating-custom-notification-events.
3. Enter a meaningful text message in the Subject. This message should indicate that the event occurred.
4. You can insert variables into the Subject or the body of the message.
   1. For example, you can insert the following variable: ${params.USER_EMAIL}
   NOTE: The list of available variables depends on the selected event.
   2. When the email is generated, these strings are replaced with the corresponding values from the process.
   3. The image above is one example of a success message.
   For an example of using these variables in an error notification, see Example Notification Message.
5. In the message body, provide enough descriptive information for the recipient to know the name and type of the event that occurred, as well as the project in which it occurred. Including the start and end times is also helpful.
6. To create the notification rule, click Save Changes.
7. To cancel the rule, click Close dialog.

Example Notification Message
Below is an example notification message that you could use when configuring a notification for a process that failed to execute:

   Please note that the load of the GoodData project (id = ${params.PROJECT}) using process "${params.PROCESS_NAME}", graph ${params.GRAPH} that started at ${params.START_TIME} failed at ${params.FINISH_TIME} with the following ERROR: ${params.ERROR_MESSAGE}
   Please inspect the ${params.LOG} for more details.

Modifying Legacy Recipients through the Gray Pages
For more information, see http://developer.gooddata.com/article/modifying-multiple-recipients-of-notifications-through-gray-pages.

Data Integration Console Process Logging
For each execution of a process, a log is generated that contains the status messages of the steps in the process run. All logs are accessible through the Data Integration Console.
Tip: If you use the Chrome browser, GoodData provides an extension that helps with monitoring and debugging process execution. For more information on the GoodData Extension Tool for Chrome, see https://developer.gooddata.com/tools.
• Click the log icon to open the log for a specific execution of the schedule.
• When selected, the log is displayed as a text file in your browser.
• To locate errors, search the text file for ERROR. All error messages must be addressed in your CloudConnect project before the process can execute successfully.
  You can identify the source of an error by examining the filename where it occurred. Problems in process execution generally fall into the following areas:
  • Connectivity issues
  • Transformation processing errors
  • Problems with the data source
  Review these problems and attempt to fix them in CloudConnect Designer. The error message also shows the name of the component where the error occurred. This information helps you debug and fix your transformation issues.
• See CloudConnect Designer User Manual (PDF).

Deleting Graphs and Processes
Processes may exist in the GoodData Platform, in CloudConnect Designer projects in your local environment, or as references in any scripts that you use to deploy them.
• To remove an entire process from your GoodData project in the platform, select the project. In the Project Details screen, click Delete next to the name of the process.
NOTE: The above step removes the process from the selected project only. Processes may be deployed to multiple projects.
• If you use CloudConnect to deploy processes, verify that the CloudConnect project containing the process in your local environment is not configured to use the GoodData project as its working project. See CloudConnect Designer User Manual (PDF).
• If you deploy the process via API, verify that your deployment scripts no longer reference the GoodData project.
Schedules and notifications are defined within the GoodData Platform only; if you remove them from the CloudConnect Designer, they are removed from the system.
• You can delete schedules through the Data Integration Console. See Schedule Details.
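The Delete button removes a process from one project at a time. A process deployed to several projects therefore has to be removed from each of them. If you deploy via API, a cleanup script might issue one request per project. The Python sketch below only builds those requests. It relies on several assumptions that this manual does not confirm:
• An HTTP DELETE on the process resource /gdc/projects/[project_id]/dataload/processes/[process_id] removes the process.
• The API is served from https://secure.gooddata.com.
• The project and process IDs in the example are hypothetical.
Authentication is omitted.

```python
import urllib.request

API_HOST = "https://secure.gooddata.com"  # assumption: API served from the host named in this manual
# Process resource documented in this manual.
PROCESS_PATH = "/gdc/projects/{project_id}/dataload/processes/{process_id}"


def build_delete_requests(process_ids_by_project: dict) -> list:
    """Build one DELETE request per project in which the process is deployed.

    ASSUMPTION: an HTTP DELETE on the process resource removes the process;
    confirm against the GoodData API documentation before use.
    Requests are returned in project-ID order and are not sent.
    """
    return [
        urllib.request.Request(
            API_HOST + PROCESS_PATH.format(project_id=project_id, process_id=process_id),
            method="DELETE",
        )
        for project_id, process_id in sorted(process_ids_by_project.items())
    ]


# Hypothetical IDs: the same process deployed to two projects.
for req in build_delete_requests({"projB": "p2", "projA": "p1"}):
    print(req.get_method(), req.full_url)
```

Running such requests does not update your local CloudConnect projects or deployment scripts. Those still need the checks listed above, or the process may simply be redeployed.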