How to automate Enterprise Miner model training to be more efficient and to keep business predictions up to date
Marcel Eberle / Dr. Ingo Hary

Agenda
• Motivation and project idea
• How to automate an Enterprise Miner workflow
• Extending automation to train many models

Motivation
Initial situation
• Models are not retrained on a regular basis because of resource constraints
• Training new models is also time-consuming and cumbersome when done with the GUI
• Existing models are vulnerable to changes in underlying databases
-> Is it possible to improve the current predictive modelling process?

Project idea
The key improvement is the full automation of this process. This offers the following advantages:
• Models are always up to date
• Development of new models and maintenance is quite easy
• Independence from changes in underlying databases

How to automate an Enterprise Miner workflow
Difference between existing and new process
[Diagram: both processes start from the training table]
• Existing process: automated EG process, manual EM model-building process, automated EG process
• New process: automated EG process, EM process parameterised and integrated in EG with SAS code (focus of the following slides), automated EG process

Steps to automate model generation and training using SAS code
The first step is to automate the process for a single model:
1. Create an Enterprise Miner project with a base model
2. Export this model/workflow as SAS code
3. Parameterise this exported SAS code
4. Run the SAS code to generate a miner project and train the model

1. Create an Enterprise Miner project with a base model
This workflow is exported and parameterised.

2. Export this model/workflow as SAS code
Important metatables
The exported SAS code contains the definition of this entire workflow.
A lot of information is saved in SAS metatable definitions:

Metatables (SAS datasets)    Content
actions                      Execution details
nodes                        Node IDs and description
nodeprops                    All node properties
connect                      Connections between nodes
workspace                    Miner project details (path, name, etc.)
Dec_&Target_Name._DM *       Properties of decision node
Dec_&Target_Name._DD *       Properties of decision node
*: only if a decision node is used

Example: metadata table “connect”

3. Parameterise exported SAS code
The exported SAS code contains several constants (names, quantities, etc.). These parts must be parameterised in order to use this workflow for different models.
Code excerpts:
COUNT=&_N_Target_1_.;
data work.Dec_&_target_._DD;
DATAPRIOR=%sysevalf(&_N_Target_1_. / (&_N_Target_0_. + &_N_Target_1_.));

4. Run SAS code to generate a miner project and train model
After the entire process is defined it can be run using the “em5batch” macro. This macro generates and runs the miner project based on the previously defined metatables.

Extend automation to train many models
After setting up a prototype to automate the training of a single model, the next step is to extend it to train many models.
[Diagram: both processes start from the training table]
• New process for one model: automated EG process, EM process parameterised and integrated in EG using SAS code, automated EG process
• New process for many models: automated EG process, followed by
SAS Code (EG) creates EM project for model 1
SAS Code (EG) creates EM project for model 2
...
SAS Code (EG) creates EM project for model n
Automated EG process

All process steps are managed and executed out of Enterprise Guide
Process overview
• Data preparation: targets / candidate table, predictor table
• Training automation of several models, with a loop over all models (focus of the following slides):
  1. Define process steering metatables
  2. Create model training table
  3. Generate and run miner project
  4. Export model/score code
• Model reporting and quality check
• Scoring on current customer data (deployment)

1. Define process steering metatables
The process is controlled with the following two metatables.
Annotations on the two tables:
• Name of the macro which creates the training table
• Green: Input for Enterprise Miner (%em5batch)
• Calculated and updated during training process
• Defines which attributes are eligible to build models depending on the customer segment

2. Create training tables
There are two tables which contain all needed attributes. One table contains all predictors and one contains all targets and candidates.
[Diagram: the predictors table and the targets/candidates table feed the macro to create the training table (join + variable selection, sampling); the resulting training table is used to create the EM data source definition (%emds macro) and to update the metadata]

Code extract
&Project_ID_1 = cableonpchurn
&Project_ID_2 = dslforseg2
&Project_ID_3 = dslforseg3
etc.
[Code screenshot: macro to create training table]

3. Generate and run miner projects
An Enterprise Miner project is generated for each model:
• Create miner-specific metadata tables (nodes, nodeprops, connect, etc.)
• Generate miner project with the %em5batch macro
All miner projects are run using the %em5batch macro.
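The loop over all models might look roughly like the following in SAS. This is a minimal sketch under stated assumptions, not the authors' code: the steering table work.model_steering, its column names, the target names and the training-macro names are hypothetical. Only the Project_ID_n macro-variable naming and the two example project IDs are taken from the slides. The calls that build the training table and run %em5batch are left as commented placeholders, because their arguments are not shown in the slides.

```sas
/* Hypothetical steering table: one row per model to train.            */
/* Column names, target names and macro names are illustrative only.   */
data work.model_steering;
   length project_id $32 target_name $32 train_macro $32;
   infile datalines dlm=',';
   input project_id $ target_name $ train_macro $;
   datalines;
cableonpchurn,churn_flag,make_train_cable
dslforseg2,buy_flag,make_train_dsl
;
run;

%macro train_all_models;
   %local i n_models;

   /* Copy each steering row into indexed macro variables              */
   /* (Project_ID_1, Project_ID_2, ...) and count the rows.            */
   data _null_;
      set work.model_steering end=last;
      call symputx(cats('Project_ID_', _n_), project_id,  'G');
      call symputx(cats('Target_',     _n_), target_name, 'G');
      call symputx(cats('TrainMacro_', _n_), train_macro, 'G');
      if last then call symputx('n_models', _n_, 'L');
   run;

   /* One iteration per model. */
   %do i = 1 %to &n_models;
      %put NOTE: Training model &i: &&Project_ID_&i (target &&Target_&i);

      /* Placeholder: call the macro named in TrainMacro_&i to build   */
      /* the training table for this model.                            */

      /* Placeholder: create the nodes, nodeprops and connect          */
      /* metatables for this model and run the miner project with      */
      /* em5batch, using the arguments from the exported code.         */
   %end;
%mend train_all_models;

%train_all_models
```

As written, the sketch only writes one log line per model; the placeholders mark where the per-model training table and miner project would be created. Driving the loop from a table rather than from hard-coded %let statements means a new model is added by inserting a row.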
Annotation on the code screenshot: Saves and renames model

Result
[Screenshots: EM projects; model quality (lift); models]

Conclusion: An automated model training process is an alternative to manual training.
Benefits
• No need to check if retraining is necessary - just do it
• Control over entire process is improved dramatically
• Frees up significant amounts of resources for other modelling & analysis tasks
• Enables you to train and deploy large numbers of models in one process run
• Allows you to parameterise node settings to find the best solution

Questions?
Contact: [email protected]