
Transcription

Fuzzy Mining - Adaptive Process Simplification
Christian W. Günther and Wil M.P. van der Aalst
Outline
• Introduction
• Less-Structured Process - the Infamous Spaghetti Affair
• An Adaptive Approach for Process Simplification
• Log-based Process Metrics
• Adaptive Graph Simplification
• Implementation and Application
• Related Work
• Discussion and Future Work
Introduction
• What is fuzzy mining?
- Adaptive process simplification based on multi-perspective metrics.
• Why do we need this approach?
- The traditional approach shows all detail; the model has to be simplified before the essential behavior becomes visible.
Less-Structured Process:
the Infamous Spaghetti Affair
• The fundamental idea of process mining is both simple and persuasive.
• Real life is unpredictable and very flexible, unlike a computer system.
• The more flexible, i.e. less structured, the process, the more massive the mined model becomes.
• This makes the process model look like spaghetti.
• However, many process mining algorithms assume that the log stems from a properly structured system.
Displaying this mass of directions on the process model seems to be an infeasible task. With the following two assumptions, traditional process mining cannot capture this less-structured reality:
Assumption 1: All logs are reliable and trustworthy.
Assumption 2: There exists an exact process which is reflected in the logs.
To make process mining a useful tool in practical, less-structured settings, these assumptions need to be discarded.
An Adaptive Approach
for Process Simplification
Process mining needs to be able to provide a high-level view on the process so that it is suitable for less-structured environments.
Like a road map, a simple adaptive approach to process models can be derived by applying four concepts: aggregation, abstraction, emphasis, and customization.
To develop a simple and visual process model, there are two fundamental metrics which can support such decisions:
Significance measures the relative importance of behavior.
Correlation measures how closely related two events following one another are.
Our approach to process simplification can thus be summarized: a set of metrics measures significance and correlation from different perspectives, and the user can customize the produced results to a large degree.
Log-Based Process Metrics
To make the framework configurable and extensible, three types of metrics are distinguished.
Metrics Framework
- Because the log contains a large number of undesired events, actual causal dependencies may go unrecorded.
- To compensate, relationships of arbitrary length are measured: e.g., from A -> B and B -> C, the indirect relation A -> C is also evaluated, as in the sketch below.
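A minimal sketch of this idea, assuming a simple per-step attenuation; the function name and the max_distance and attenuation parameters are illustrative assumptions, not the paper's implementation:

```python
from collections import defaultdict

def long_distance_relations(trace, max_distance=3, attenuation=0.5):
    """Count follows-relations between events up to max_distance apart.

    A pair (a, b) occurring k steps apart contributes attenuation**(k - 1),
    so direct successors weigh the most.
    """
    relations = defaultdict(float)
    for i, a in enumerate(trace):
        for k in range(1, max_distance + 1):
            if i + k < len(trace):
                relations[(a, trace[i + k])] += attenuation ** (k - 1)
    return relations

# In the trace A, B, C the indirect relation A -> C is recorded too,
# with the attenuated weight 0.5.
print(dict(long_distance_relations(["A", "B", "C"])))
```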
1. Unary Significance Metrics
: describe the relative importance of an event class.
- Frequency significance
: the more frequently an event class is observed in the log, the more significant it is.
- Routing significance
: points where the flow splits or joins are interesting for defining the structure of a process (see the sketch below).
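A hedged sketch of both unary metrics; normalizing frequency by the maximum and the particular routing formulation are assumptions made for illustration:

```python
from collections import Counter

def frequency_significance(log):
    """Relative frequency of each event class, normalized by the maximum."""
    counts = Counter(event for trace in log for event in trace)
    top = max(counts.values())
    return {cls: n / top for cls, n in counts.items()}

def routing_significance(in_weights, out_weights):
    """Nodes whose incoming and outgoing edge weights differ strongly are
    interesting routing points (illustrative formulation)."""
    total = sum(in_weights) + sum(out_weights)
    return abs(sum(in_weights) - sum(out_weights)) / total if total else 0.0

log = [["A", "B", "C"], ["A", "C"]]
print(frequency_significance(log))  # {'A': 1.0, 'B': 0.5, 'C': 1.0}
```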
2. Binary Significance Metrics
: describe the relative importance of a precedence relation between two event classes.
- Distance significance
: the more the significance of a relation differs from its source and target nodes' significances, the less its distance significance value (see the sketch below).
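A sketch of distance significance following the description above; the exact formula is an assumption consistent with that description, not quoted from the paper:

```python
def distance_significance(sig_ab, sig_a, sig_b):
    """The closer sig(A, B) is to sig(A) and sig(B), the higher the value."""
    if sig_a + sig_b == 0:
        return 0.0
    gap = (sig_a - sig_ab) + (sig_b - sig_ab)
    return 1.0 - gap / (sig_a + sig_b)

# A relation as significant as its endpoints scores 1.0 ...
print(distance_significance(1.0, 1.0, 1.0))  # 1.0
# ... a weak relation between two significant nodes scores low.
print(distance_significance(0.1, 1.0, 1.0))  # ~0.1
```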
3. Binary Correlation Metrics
: measure how closely related two events in a precedence relation are.
Binary correlation is the main driver of the decision between aggregation and abstraction of less-significant behavior.
- Proximity correlation evaluates how closely in time two events have occurred.
- Originator correlation is determined from the names of the persons who triggered two subsequent events.
- Endpoint correlation compares the activity names of subsequent events (see the sketch below).
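A hedged sketch of endpoint correlation via string similarity of activity names; difflib is standard-library Python, but the paper's actual string metric is an assumption:

```python
from difflib import SequenceMatcher

def endpoint_correlation(name_a, name_b):
    """Similar activity names hint at closely related behavior."""
    return SequenceMatcher(None, name_a, name_b).ratio()

print(endpoint_correlation("check invoice", "check order"))  # relatively high
print(endpoint_correlation("check invoice", "ship goods"))   # low
```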
Adaptive Graph Simplification
Most process mining techniques focus on mapping behavior found in the log onto typical process design patterns. This paper focuses instead on a high-level mapping of the behavior found in the log.
Three transformation methods are applied to the process model: conflict resolution, edge filtering, and node aggregation and abstraction.
1. Conflict Resolution in Binary Relations
: applied whenever two nodes in the initial process model are connected by edges in both directions. Such a conflict is interpreted as either a length-2 loop, an exception, or concurrency, based on the relative significance of the two conflicting edges (see the sketch below).
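A sketch of resolving one conflict between A -> B and B -> A using relative significance; the preserve and ratio thresholds and the exact decision rule are assumptions modeled on the three cases named above:

```python
def resolve_conflict(rel_ab, rel_ba, preserve=0.5, ratio=0.3):
    """Return which of the two conflicting edges to keep."""
    if rel_ab >= preserve and rel_ba >= preserve:
        return {"A->B", "B->A"}            # length-2 loop: keep both
    if abs(rel_ab - rel_ba) >= ratio:      # exception: keep the stronger edge
        return {"A->B"} if rel_ab > rel_ba else {"B->A"}
    return set()                           # concurrency: remove both

print(resolve_conflict(0.7, 0.6))  # {'A->B', 'B->A'}
print(resolve_conflict(0.6, 0.1))  # {'A->B'}
print(resolve_conflict(0.2, 0.1))  # set()
```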
2. Edge Filtering
: removes the remaining edges to isolate the most important behavior. Edge filtering evaluates each edge by its utility util(A, B), a weighted sum of its significance and correlation:

util(A, B) = ur * sig(A, B) + (1 - ur) * cor(A, B)

The edge cutoff parameter determines the aggressiveness of the algorithm (see the sketch below).
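A sketch of utility-based filtering under the formula above; normalizing globally against the best edge is an assumption (the actual algorithm may normalize per node), as is the cutoff semantics:

```python
def edge_utility(sig, cor, ur=0.5):
    """Weighted sum of significance and correlation (util from above)."""
    return ur * sig + (1 - ur) * cor

def filter_edges(edges, cutoff=0.2, ur=0.5):
    """edges: {(a, b): (sig, cor)} -> set of edges that survive filtering."""
    utils = {e: edge_utility(s, c, ur) for e, (s, c) in edges.items()}
    best = max(utils.values(), default=0.0)
    if best == 0.0:
        return set()
    # Keep edges whose utility, normalized by the best edge, clears the cutoff.
    return {e for e, u in utils.items() if u / best >= cutoff}

edges = {("A", "B"): (0.9, 0.8), ("A", "C"): (0.1, 0.05)}
print(filter_edges(edges))  # {('A', 'B')}
```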
3. Node Aggregation and Abstraction
: the most effective tool for simplification is removing nodes.
How? By preserving highly correlated groups of less-significant nodes as aggregated clusters.
Victims: every node whose unary significance is below a given threshold becomes a victim.
The first step is to build initial clusters of less-significant behavior (see the sketch after this list):
- For each victim, find the most highly correlated neighbor.
- If this neighbor is a cluster node, add the victim to this cluster.
- Otherwise, create a new cluster node and add the victim as its first element.
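A sketch of this initial clustering step; the graph representation (correlations as nested dicts, integer cluster ids) is an illustrative assumption:

```python
def build_clusters(victims, correlations):
    """correlations: {victim: {neighbor: cor}}; returns {node: cluster id}."""
    clusters = {}
    next_id = 0
    for v in victims:
        # Find the most highly correlated neighbor of this victim.
        neighbor = max(correlations[v], key=correlations[v].get)
        if neighbor in clusters:            # neighbor already in a cluster
            clusters[v] = clusters[neighbor]
        else:                               # start a new cluster
            clusters[v] = next_id
            next_id += 1
    return clusters

cor = {"x": {"y": 0.9, "A": 0.1}, "y": {"x": 0.9, "B": 0.2}}
print(build_clusters(["x", "y"], cor))  # {'x': 0, 'y': 0}
```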
The second step merges clusters to decrease their number (see the sketch after this list):
- For each cluster, check whether all predecessors or all successors are also clusters.
- If all predecessor nodes are clusters as well, merge with the most highly correlated one and move on to the next cluster.
- If all successor nodes are clusters as well, merge with the most highly correlated one.
- Otherwise, if both the cluster's pre- and postset contain regular nodes, the cluster is left untouched.
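A sketch of that merging rule for a single cluster; representing neighbors as (node, is_cluster) pairs and the correlation lookup are illustrative assumptions:

```python
def try_merge(predecessors, successors, correlation):
    """Return the neighbor cluster to merge with, or None to keep as-is.

    predecessors/successors: lists of (node, is_cluster) pairs.
    correlation: {node: correlation with the current cluster}.
    """
    for side in (predecessors, successors):
        if side and all(is_cluster for _, is_cluster in side):
            # All neighbors on this side are clusters: merge with the
            # most highly correlated one.
            return max((n for n, _ in side), key=correlation.get)
    return None  # both sides contain regular nodes: leave untouched

preds = [("C1", True), ("C2", True)]
succs = [("task_X", False)]
print(try_merge(preds, succs, {"C1": 0.4, "C2": 0.8, "task_X": 0.3}))  # 'C2'
```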
Abstraction: removes isolated and singleton clusters.