How to automate Enterprise Miner model training to be more efficient and to keep business predictions up to date
Marcel Eberle / Dr. Ingo Hary
Agenda
• Motivation and project idea
• How to automate an Enterprise Miner workflow
• Extending automation to train many models
Motivation
Initial situation
• Models are not retrained on a regular basis because of resource constraints
• But training new models is also time-consuming and
cumbersome when done with the GUI
• Existing models are vulnerable to changes in underlying databases
-> Is it possible to improve the current predictive
modelling process?
Project idea
The key improvement is the full automation of this process.
This offers the following advantages:
• Models are always up to date
• Developing new models and maintaining existing ones becomes easy
• Independence from changes in the underlying databases
How to automate an Enterprise Miner workflow
Difference between existing and new process
[Diagram: both processes, built around the training table]
• Existing process: Automated EG process -> Manual EM model-building process -> Automated EG process
• New process (focus of the following slides): Automated EG process -> EM process is parameterised and integrated in EG with SAS code -> Automated EG process
Steps to automate model generation and training using SAS code
The first step is to automate the process for a single model:
1. Create an Enterprise Miner project with a base model
2. Export this model/workflow as SAS code
3. Parameterise this exported SAS code
4. Run the SAS code to generate a miner project and train the model
1. Create an Enterprise Miner project with a base model
This workflow is exported and parameterised.
2. Export this model/workflow as SAS code
Important metatables
The exported SAS code contains the definition of this entire workflow. A lot of information is saved in SAS metatable definitions:

Metatables (SAS datasets)    Content
actions                      Execution details
nodes                        Node IDs and description
nodeprops                    All node properties
connect                      Connections between nodes
workspace                    Miner project details (path, name, etc.)
Dec_&Target_Name._DM         Properties of decision node *
Dec_&Target_Name._DD         Properties of decision node *
*: only if a decision node is used
Example: metadata table “connect”
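To illustrate what such a metatable can look like, the following sketch builds a small connections dataset by hand. The column names (from, to) and the node IDs (Ids, Part, Reg) are assumptions chosen for illustration; the code exported from a real project defines the actual layout.

```sas
/* Sketch of a "connect" metatable: one row per arrow in the diagram.   */
/* Column names and node IDs are assumptions, not taken from the slide. */
data work.connect;
    length from to $12;
    input from $ to $;
    datalines;
Ids Part
Part Reg
;
run;

/* Show the connections that would define the workflow. */
proc print data=work.connect noobs;
run;
```

Each row reads as "the output of node from feeds node to", so adding or removing a node in the workflow comes down to adding or removing rows.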
3. Parameterise exported SAS code
The exported SAS code contains several constants (names,
quantities, etc.). These parts must be parameterised in order to use
this workflow for different models.
COUNT=&_N_Target_1_.;
data work.Dec_&_target_._DD;
DATAPRIOR=%sysevalf(&_N_Target_1_. / (&_N_Target_0_. + &_N_Target_1_.));
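The macro variables used in these extracts have to be filled before the exported code runs. A minimal sketch of one way to do that, assuming a binary target column and a toy training table (work.train_demo and the column name target are hypothetical, not part of the exported code):

```sas
/* Hypothetical toy training table; the real one comes from the EG process. */
data work.train_demo;
    input customer_id target;
    datalines;
1 0
2 1
3 0
4 0
5 1
;
run;

/* Name of the target column (assumed for this sketch). */
%let _target_ = target;

/* Count events and non-events and store the counts in the macro */
/* variables that the parameterised code refers to.              */
proc sql noprint;
    select sum(&_target_. = 1), sum(&_target_. = 0)
        into :_N_Target_1_ trimmed, :_N_Target_0_ trimmed
        from work.train_demo;
quit;

/* Prior probability of the event, using the same formula as the */
/* DATAPRIOR line in the extract.                                */
%let dataprior = %sysevalf(&_N_Target_1_. / (&_N_Target_0_. + &_N_Target_1_.));

%put NOTE: events=&_N_Target_1_. non-events=&_N_Target_0_. prior=&dataprior.;
```

Because the counts are recomputed from the current training table on every run, the decision priors follow the data instead of being hard-coded in the exported workflow.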
4. Run SAS code to generate a miner project and train model
After the entire process is defined, it can be run using the %em5batch macro. This macro generates and runs the miner project based on the previously defined metatables.
Extend automation to train many models
After setting up a prototype that automates the training of a single model, the next step is to extend it to train many models.
[Diagram: both processes, built around the training table]
• New process for one model: Automated EG process -> EM process is parameterised and integrated in EG using SAS code -> Automated EG process
• New process for many models: Automated EG process -> SAS code (EG) creates EM project for model 1, model 2, ..., model n -> Automated EG process
All process steps are managed and executed from Enterprise Guide.
Process overview
• Data preparation: targets / candidate table, predictor table
• Training automation of several models (focus of the following slides), with a loop over all models:
  1. Define process steering metatables
  2. Create model training table
  3. Generate and run miner project
  4. Export model/score code
• Model reporting and quality check
• Scoring on current customer data (deployment)
1. Define process steering metatables
The process is controlled with the following two metatables.
Annotations on the metatable screenshots:
• Name of the macro which creates the training table
• Green: input for Enterprise Miner (%em5batch)
• Calculated and updated during the training process
• Defines which attributes are eligible to build models depending on the customer segment
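As a sketch of how an attribute steering table could drive variable selection, the following code reads the eligible attributes for one segment into a macro variable. The table work.attr_steering, its columns (segment, attribute, eligible) and all values are hypothetical; the slides do not show the real layout.

```sas
/* Hypothetical attribute steering table: one row per segment and attribute. */
data work.attr_steering;
    length segment $8 attribute $32;
    input segment $ attribute $ eligible;
    datalines;
seg2 age 1
seg2 tenure_months 1
seg2 dsl_speed 0
seg3 age 1
seg3 dsl_speed 1
;
run;

%let segment   = seg2;
%let keep_vars = ;   /* stays empty if no attribute is eligible */

/* Collect the eligible attribute names as a blank-separated list. */
proc sql noprint;
    select attribute into :keep_vars separated by ' '
        from work.attr_steering
        where segment = "&segment." and eligible = 1;
quit;

%put NOTE: eligible attributes for &segment.: &keep_vars.;
```

The resulting list can be passed to whatever macro builds the training table, so changing the eligible predictors for a segment only requires editing the metatable.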
2. Create training tables
There are two tables which contain all needed attributes. One table
contains all predictors and one contains all targets and candidates.
[Diagram: Predictors + Targets/candidates -> Macro to create training table (join + variable selection, sampling) -> Training table -> Create EM data source definition (%emds macro) -> Update metadata]
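The join, variable selection and sampling steps can be sketched as follows. The macro name create_training_table, its parameters, and the tables and columns are all hypothetical; the sampling step uses PROC SURVEYSELECT, which requires SAS/STAT. The %emds call is left out because its arguments are not shown on the slides.

```sas
/* Hypothetical inputs: one predictor table and one target/candidate table. */
data work.predictors;
    input customer_id age tenure_months;
    datalines;
1 34 12
2 51 48
3 27 6
4 45 30
;
run;

data work.targets;
    input customer_id target;
    datalines;
1 0
2 1
3 0
4 1
;
run;

/* Hypothetical macro: joins both tables, keeps the selected variables */
/* and draws a simple random sample as the training table.             */
%macro create_training_table(out=, keep_vars=, samprate=0.5, seed=12345);
    /* Join + variable selection */
    proc sql;
        create table work._joined(keep=customer_id target &keep_vars.) as
        select p.*, t.target
        from work.predictors as p
             inner join work.targets as t
             on p.customer_id = t.customer_id;
    quit;

    /* Sampling: simple random sample with a fixed seed for repeatability */
    proc surveyselect data=work._joined out=&out.
                      method=srs samprate=&samprate. seed=&seed. noprint;
    run;
%mend create_training_table;

%create_training_table(out=work.train_demo, keep_vars=age tenure_months);
```

Keeping the table creation in a macro of its own matches the slide's design, where the steering metatable stores the name of the macro to call for each model.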
Code extract
&Project_ID_1 = cableonpchurn
&Project_ID_2 = dslforseg2
&Project_ID_3 = dslforseg3 etc.
Macro to create training table
3. Generate and run miner projects
An Enterprise Miner project is generated for each model:
• Create miner-specific metadata tables (nodes, nodeprops, connect, etc.)
• Generate miner project with the %em5batch macro
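The loop over all models can be sketched as below. The steering table work.model_steering and the macro train_one_model are hypothetical placeholders; the project IDs are the ones from the code extract. The actual %em5batch call is not reproduced because its arguments are not shown on the slides.

```sas
/* Hypothetical steering table; the real one holds more columns. */
data work.model_steering;
    length project_id $32;
    input project_id $;
    datalines;
cableonpchurn
dslforseg2
dslforseg3
;
run;

/* Read the project IDs into Project_ID_1 ... Project_ID_n. */
proc sql noprint;
    select project_id into :Project_ID_1 -
        from work.model_steering;
quit;
%let n_models = &sqlobs.;

/* Placeholder for the per-model work (hypothetical macro): this is where */
/* the metatables would be created and the miner project generated.       */
%macro train_one_model(project_id=);
    %put NOTE: would create metatables and run the miner project for &project_id.;
%mend train_one_model;

/* Loop over all models listed in the steering table. */
%macro train_all_models;
    %local i;
    %do i = 1 %to &n_models.;
        %train_one_model(project_id=&&Project_ID_&i.)
    %end;
%mend train_all_models;

%train_all_models
```

Adding a model then means adding a row to the steering table; the loop itself does not change.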
All miner projects are run using the %em5batch macro.
Saves and renames model
Result
[Screenshots: EM projects, models, model quality (lift)]
Conclusion: An automated model training process is an alternative to manual training.
Benefits
• No need to check if retraining is necessary - just do it
• Control over the entire process is improved dramatically
• Frees up significant amounts of resources for other modelling & analysis tasks
• Enables you to train and deploy large numbers of models in one process run
• Allows you to parameterise node settings to find the best solution
Questions?
Contact: [email protected]