Effect of Object Identification Algorithms on Feature-Based Verification Scores
Michael Weniger
Petra Friederichs
Meteorological Institute, University of Bonn
EGU Vienna
18 April 2015
Motivation
Feature-based methods for the evaluation of spatial fields employ object identification algorithms (OIA). Verification scores are defined based on the results of these OIA.
⇒ How does the choice of OIA and its parameters influence the resulting scores?
⇒ What are the implications for spatial fields with non-negligible observational uncertainties?
Introduction
Goal: Evaluation of Probabilistic Spatial Fields
Figure: Example of two 2D spatial fields
Definitions
- spatial field: a set of data with spatial dimension greater than one.
- probabilistic spatial field: the value at each point is not a real number but a random variable. These random variables are usually correlated in space and time, e.g.
  - non-negligible observational uncertainties,
  - the output of an ensemble model.
First Step: Evaluation of Deterministic Spatial Fields
Figure: Example of two 2D spatial fields
Why do traditional methods not work?
- Double penalty of misplaced events: a forecast with a misplaced event is scored worse by point-to-point measures than a forecast with either a complete miss or a false alarm, since it is penalized as both at once.
- Domination of small-scale errors and noise ⇒ interesting information is lost.
⇒ We need different techniques.
First Step: Evaluation of Deterministic Spatial Fields
During the last decade, many new spatial verification methods have been developed. They can be classified into four categories [1, 2]:
- fuzzy verification / neighborhood methods
- scale separation techniques
- field deformation
- feature-based methods ← we will focus on these methods
Feature-Based Method: Overview

Spatial Field → Object Identification → Objects → Verification Method → Scores

Object identification: external object identification algorithms (OIA) are employed to define objects in the original spatial data. There are many different OIA, with one or more parameters (e.g. threshold level or smoothing) that have to be specified by the user.

Verification method: verification scores are calculated based on the binary object masks given by the OIA and the original spatial fields.
Feature-Based Method: Central Question
How important is the choice of OIA and parameters for the resulting verification scores?
Why is this important?
- The OIA and its parameters usually have to be chosen by the user.
- It is not uncommon to find multiple valid choices.
- If a score is very sensitive to these choices, one might get very different results that have equal justification.
⇒ The explanatory power of the verification method is then very weak.
Why is it particularly important for probabilistic fields?
- The effect of uncertainties is closely connected to the sensitivity towards certain OIA parameters.
- A good example is the threshold value, which is used in most algorithms to identify objects as cohesive areas where the field exceeds the value of this parameter.
- Changing the value of the threshold parameter in the OIA is therefore closely related to observational uncertainties, which change the value of the field itself.
⇒ The sensitivity of a score to varying OIA and parameter values is an indication of its sensitivity towards uncertainties.
SAL: Introduction [3]
SAL is a feature-based method that was developed to:
- measure the quality of a forecast using three distinct scores, which have direct physical interpretations to allow conclusions on potential sources of model errors;
- compare the statistical characteristics of the observation and forecast fields, without requiring a matching of individual objects;
- yield scores close to a subjective visual assessment of the accuracy of the forecast for precipitation data.
SAL stands for its three score components:
- (S)tructure: describes the shape of objects
- (A)mplitude: describes a global intensity error
- (L)ocation: consists of two parts:
  - L1: describes a global displacement error
  - L2: describes the spread of objects
For this study we are interested in the object-dependent scores S ∈ [−2, 2] and L2 ∈ [0, 1].
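To make the two object-dependent components concrete, here is a minimal Python sketch of S and L2 following the definitions in Wernli et al. [3]. It is an illustrative reimplementation, not the original code: `field` is a 2D numpy array, `labels` an integer object mask as produced by an OIA (0 = background), and `d` the largest distance between two boundary points of the domain.

```python
import numpy as np

def center_of_mass(weights):
    """Weighted center of mass of a 2D non-negative field."""
    yy, xx = np.indices(weights.shape)
    return np.array([(yy * weights).sum(), (xx * weights).sum()]) / weights.sum()

def scaled_volume(field, labels):
    """Weighted mean of the scaled object volumes V_n = R_n / max(field in object n)."""
    ids = np.unique(labels[labels > 0])
    R = np.array([field[labels == i].sum() for i in ids])        # object mass
    V = np.array([field[labels == i].sum() / field[labels == i].max()
                  for i in ids])                                 # scaled volume
    return (R * V).sum() / R.sum()

def structure_score(field_mod, labels_mod, field_obs, labels_obs):
    """S in [-2, 2]: normalized difference of the weighted scaled volumes."""
    v_mod = scaled_volume(field_mod, labels_mod)
    v_obs = scaled_volume(field_obs, labels_obs)
    return (v_mod - v_obs) / (0.5 * (v_mod + v_obs))

def spread(field, labels):
    """Mass-weighted mean distance of object centers to the total center of mass."""
    x_total = center_of_mass(field)
    ids = np.unique(labels[labels > 0])
    R = np.array([field[labels == i].sum() for i in ids])
    dist = np.array([np.linalg.norm(center_of_mass(np.where(labels == i, field, 0.0)) - x_total)
                     for i in ids])
    return (R * dist).sum() / R.sum()

def location_l2(field_mod, labels_mod, field_obs, labels_obs, d):
    """L2 in [0, 1]: normalized difference of the two object spreads."""
    return 2.0 * abs(spread(field_mod, labels_mod)
                     - spread(field_obs, labels_obs)) / d
```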
SAL: OIA Setting
We study the effect of different parameter values for three different OIA:
- threshfac: defines objects as coherent areas of threshold exceedances. It depends on the value of the threshold level, which is set by the parameter fac.
- threshsizer: relies on the results of threshfac and removes small objects. The parameter NContig defines the minimal object size.
- convthresh: applies a smoothing operator to the field before using threshfac. The radius of the smoothing disc is given by the parameter smoothpar.
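The three algorithms can be sketched in a few lines of Python. The function and parameter names mirror the slide, but this is only an illustrative reconstruction of the idea, not the implementations used in the study; in particular, the convention that the threshold level is the fraction fac of the field maximum is an assumption borrowed from SAL [3].

```python
import numpy as np
from scipy import ndimage

def threshfac(field, fac):
    """Objects = connected areas where the field exceeds fac * max(field)."""
    mask = field > fac * field.max()       # threshold exceedances
    return ndimage.label(mask)             # -> (label array, number of objects)

def threshsizer(field, fac, ncontig):
    """Like threshfac, but objects smaller than ncontig grid points are removed."""
    labels, n = threshfac(field, fac)
    sizes = np.bincount(labels.ravel())    # sizes[i] = grid points in object i
    keep = np.isin(labels, np.nonzero(sizes >= ncontig)[0]) & (labels > 0)
    return ndimage.label(keep)

def convthresh(field, fac, smoothpar):
    """Smooth with a disc of radius smoothpar grid points, then apply threshfac."""
    y, x = np.ogrid[-smoothpar:smoothpar + 1, -smoothpar:smoothpar + 1]
    disc = (x**2 + y**2 <= smoothpar**2).astype(float)
    smoothed = ndimage.convolve(field, disc / disc.sum(), mode="nearest")
    return threshfac(smoothed, fac)
```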
SAL: Data
Figure: Map of the area shared by observational data and model output. For the calculation of SAL, the model output is interpolated onto the coarser observational grid.
SAL scores are calculated for various parameter settings using 400 cases of spectral radiance (6.2 µm IR channel) fields over Germany:
- model output: COSMO-DE forward operator
- observations: SEVIRI satellite
SAL: Overview
Flowchart: Spatial Field → Object Identification Algorithm (with parameter) → Objects → Verification Method (SAL) → Scores

Algorithm / parameter pairs:
- convthresh / smoothpar (smoothness)
- threshsizer / Ncontig (minimal object-size)
- threshfac / fac (threshold)

SAL score components: (S)tructure with S ∈ [−2, 2], (A)mplitude, and (L)ocation with L2 ∈ [0, 1].
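Tying the two sketches together, a hypothetical end-to-end evaluation of one case could look as follows. All names refer to the sketches above; fac = 1/15 is the classical SAL threshold choice [3], and field_mod, field_obs and d_max are assumed to be given.

```python
# One OIA / parameter choice applied to both fields, then the two
# object-dependent scores.
labels_mod, _ = convthresh(field_mod, fac=1/15, smoothpar=3)
labels_obs, _ = convthresh(field_obs, fac=1/15, smoothpar=3)
s  = structure_score(field_mod, labels_mod, field_obs, labels_obs)
l2 = location_l2(field_mod, labels_mod, field_obs, labels_obs, d=d_max)
```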
Parameter Sensitivity of SAL
Parameter Sensitivity: Statistical Procedure
1. Compare changes in parameter values to the response in SAL scores.
   - We evaluate the mean and maximal response over the set of 400 spatial fields for each algorithm and both the L2 and S scores.
   - This yields results regarding parameter sensitivity on an absolute scale, i.e. "How strongly does the choice of OIA and parameters influence my score?"
2. Study differences in the distributions of the resulting sets of SAL scores.
   - In order to assess various characteristics of the score distributions, we employ five different hypothesis tests to detect significant differences due to changes in parameter values.
   - This yields results regarding parameter sensitivity on a relative scale, i.e. "How important is the choice of OIA and parameters for the interpretation of my SAL results?"
3. Take a closer look at the underlying processes and case studies.
   - We approach this with a brief theoretical showcase example and then look at some of the worst-case scenarios.
   - This gives us the information we need when thinking about new spatial methods in a probabilistic environment, i.e. "What can go wrong with the present approach, and how can we avoid it?"
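Step 1 can be sketched in a few lines, assuming scores_p1 and scores_p2 hold the 400 scores obtained with two different parameter values (names are illustrative):

```python
import numpy as np

def absolute_response(scores_p1, scores_p2):
    """Mean and maximal absolute score response over all cases
    to a parameter change p1 -> p2."""
    diff = np.abs(np.asarray(scores_p2) - np.asarray(scores_p1))
    return diff.mean(), diff.max()

# e.g. response of L2 to raising the threshold ratio from 0.7 to 0.8:
# mean_r, max_r = absolute_response(l2_scores[0.7], l2_scores[0.8])
```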
Absolute Parameter Sensitivity
Absolute Par. Sensitivity: Min. Object-Size (L2)
Minimal object size is measured in number of grid points. Recall that L2 ∈ [0, 1].
We observe only very small mean responses and still-controllable worst-case scenarios.
It is important to note the linear decay in response strength for small changes in parameter values.
Absolute Par. Sensitivity: Min. Object-Size (S)
Minimal object size is measured in number of grid points. Recall that S ∈ [−2, 2].
The results for the S score confirm the above points.
Due to the linear decay of response strength for small changes in parameter values and the small absolute response strength, we denote the minimal object-size a stable parameter.
Absolute Par. Sensitivity: Threshold Ratio (L2)
The threshold ratio takes values in (0, 1]. Recall that L2 ∈ [0, 1].
We observe very strong mean responses and completely uncontrollable worst-case scenarios.
A response equal to one means that we effectively cannot distinguish between the best score (L2 = 0) and the worst score (L2 = 1).
Absolute Par. Sensitivity: Threshold Ratio (S)
The threshold ratio takes values in (0, 1]. Recall that S ∈ [−2, 2].
This case is completely analogous to L2.
Since there is no decay of response strength for small changes in parameter values and the absolute response is very strong, the threshold ratio is an unstable parameter.
Absolute Par. Sensitivity: Smoothing Radius (L2)
Smoothing radii are measured in number of grid points. Recall that L2 ∈ [0, 1].
We observe weak mean but strong maximal responses.
This indicates an underlying process with high impact that needs very specific conditions to occur. It therefore occurs less often for small changes in parameter values, leading to a linear decay in the mean response.
Absolute Par. Sensitivity: Smoothing Radius (S)
Smoothing radii are measured in number of grid points. Recall that S ∈ [−2, 2].
This case is again completely analogous to L2.
Due to weak mean but strong maximal responses, the smoothing radius is a metastable parameter. In this case it is particularly important to understand the causes of the infrequent occurrence of strong score responses.
Absolute Parameter Sensitivity: Summary
The results for L2 and S are consistent:
- the minimal object-size is stable
- the smoothing radius is metastable
- the threshold ratio is unstable
The behavior of the threshold parameter is particularly important for the sensitivity towards observational uncertainties.
How important are these results for the (statistical) interpretation of SAL?
Relative Parameter Sensitivity
Relative Parameter Sensitivity: Procedure
We compare the distributions of S and L2 scores for each possible parameter pairing.
To detect differences in the distributions, we apply five hypothesis tests:
- Kolmogorov-Smirnov
- Student-t
- Wilcoxon-Mann-Whitney
- Median
- Quantile
The null hypothesis H0 is always defined as: "Both parameter values yield identical distributions."
When H0 is rejected at a significance level of 5%, we know that the two distributions differ significantly in the statistical characteristic probed by the specific hypothesis test.
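A minimal sketch of this comparison step, assuming two 1D numpy arrays of scores (one per parameter value): the first four tests are available in scipy; a two-sample quantile test is not, so a simple permutation version is sketched here as an assumption about what such a test could look like.

```python
import numpy as np
from scipy import stats

def quantile_test(x, y, q=0.9, n_perm=2000, seed=0):
    """Permutation test for a difference in the q-quantile of two samples."""
    rng = np.random.default_rng(seed)
    obs = np.quantile(x, q) - np.quantile(y, q)
    pooled = np.concatenate([x, y])
    diffs = []
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        diffs.append(np.quantile(perm[:x.size], q) - np.quantile(perm[x.size:], q))
    return np.mean(np.abs(diffs) >= abs(obs))    # two-sided p-value

def compare_distributions(x, y, alpha=0.05):
    """H0 for every test: both parameter values yield identical distributions."""
    pvalues = {
        "Kolmogorov-Smirnov":    stats.ks_2samp(x, y).pvalue,
        "Student-t":             stats.ttest_ind(x, y).pvalue,
        "Wilcoxon-Mann-Whitney": stats.mannwhitneyu(x, y).pvalue,
        "Median":                stats.median_test(x, y)[1],
        "Quantile":              quantile_test(x, y),
    }
    return {name: p < alpha for name, p in pvalues.items()}  # True = H0 rejected
```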
Relative Parameter Sensitivity: Results
Due to the number of possible combinations of parameter values, we have to evaluate a large number of statistical test results. These can be summarized as follows.
Contrary to the absolute parameter sensitivity, L2 and S exhibit different behaviors:
- For L2, distributional differences are closely connected to changes in the mean value.
- For S, these differences occur most likely in the spread of the distributions.
The reason for this can be found in the definition of the scores:
- L2 ∈ [0, 1] is a positively defined one-sided score.
- S ∈ [−2, 2] is defined as a two-sided score.
For the S score, responses to changing parameters can cancel each other out in the mean value, but lead to a larger spread.
Varying smoothing radii and minimal object-sizes (stable and metastable):
- Only large changes in parameter values lead to significant distributional differences.
- For L2 the differences are only visible in the mean value.
- For S the majority of significant differences are only visible in the spread, i.e. with the quantile test.
Varying threshold levels (unstable):
- Most parameter pairings exhibit distributional differences.
- In the majority of cases, all five hypothesis tests detected these differences.
Underlying Processes
In the following section we aim to understand the processes that lead to unstable or very sensitive parameters.
1. We consider two theoretical showcase scenarios, which will give us an idea of what we should be looking for in the data.
2. We examine whether the theoretical considerations are consistent with the statistics of the data.
3. We look at some case studies to observe the processes in concrete sets of data.
Theoretical Considerations (a): Large and Flat Object
Setting: there is a large and flat object with an intensity value just above the threshold level.
Slightly raising the threshold level causes the object to vanish.
This process can drastically change the S (structure) and L2 (scattering) scores.
The changes in S and L2 can occur independently from each other.
⇒ The correlation between |∆S| and ∆L2 is expected to be small.
Theoretical Considerations (b): Small Interconnecting Bridge
Setting: there is a large object with a small interconnecting bridge.
Slightly raising the threshold level or increasing the smoothing radius causes the object to decompose.
This process can drastically change the S (structure) and L2 (scattering) scores.
The changes in S and L2 are coupled: the decomposition yields smaller structures with larger spread.
⇒ The correlation between |∆S| and ∆L2 is expected to be very high.
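Process (a) is easy to reproduce synthetically. Using the threshfac sketch from above, a large flat object sitting just above the threshold vanishes under a slight increase of the threshold ratio (values are made up for illustration):

```python
import numpy as np

field = np.zeros((50, 50))
field[10:40, 10:40] = 0.21      # large, flat object just above the threshold
field[45, 45] = 1.0             # field maximum elsewhere

_, n_low  = threshfac(field, fac=0.20)   # threshold 0.20: flat object found
_, n_high = threshfac(field, fac=0.22)   # threshold 0.22: flat object gone
print(n_low, n_high)                     # -> 2 1
```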
Correlation Between |∆S| and ∆L2
Varying minimal object-sizes (threshsizer):
- Neither process can occur by omitting only small objects.
- Omitting small objects reduces the spread and increases the structure score simultaneously.
⇒ The correlation of S and L2 score changes is expected to be high.
Varying smoothing radii (convthresh):
- Only process (b) can occur, namely if an increase in smoothing causes an interconnecting bridge to vanish.
⇒ The correlation of S and L2 score changes is expected to be very high.
Varying threshold levels (threshfac):
- Both processes can occur.
- We expect cases with low correlation (process (a)) and cases with high correlation (process (b)).
⇒ The correlation should behave more irregularly and should be lower overall.
The theoretical considerations are consistent with the statistics of our data.
Case Studies
Both processes should lead to significant changes in S and/or L2 scores.
We have taken a look at the spatial fields which exhibit the largest score differences between parameter settings, i.e. the worst-case scenarios.
For small changes in parameter values, the vast majority of score differences are rooted in one of the described processes.
⇒ We have identified the processes which lead to unstable parameter behavior.
An example for each process is given on the following slides.
Case Study (a): Large and Flat Object
∆S = −2.5, ∆L2 = 0.1
Case Study (b): Small Interconnecting Bridge
∆S = 0.6, ∆L2 = 0.4
Summary
Summary: Parameter Sensitivity of SAL (I)
The vanishing of large flat objects and the decomposition of large objects are present in all studied sets of data:
- total cloud cover
- spectral radiance (8 different channels)
- precipitation (COSMO reanalysis at 2 km and 6 km resolution)
These processes can cause high parameter sensitivity of OIA.
The maximal score response was similar across all sets of data.
⇒ The frequency of these "bad" cases is the deciding factor for the stability of a parameter.
Summary: Parameter Sensitivity of SAL (II)
Varying threshold levels are very problematic:
- The threshold level is a parameter that has to be chosen in each of the three studied OIA.
- Varying threshold levels are closely related to observational uncertainties.
⇒ All studied OIA are potentially very sensitive to uncertainties.
OIA which rely on threshold levels are not viable in a probabilistic environment with non-negligible observational uncertainties. The key to finding a solution is to circumvent the non-continuous thresholding operator. Promising approaches are:
- probabilistic level sets [4]
- image warping with splines [5]
- wavelet decomposition [6]
Literature
[1] Eric Gilleland, David Ahijevych, Barbara G. Brown, Barbara Casati, and Elizabeth E. Ebert. Intercomparison of spatial forecast verification methods. Weather and Forecasting, 24(5):1416–1430, October 2009.
[2] E. Ebert, L. Wilson, A. Weigel, M. Mittermaier, P. Nurmi, P. Gill, M. Göber, S. Joslyn, B. Brown, T. Fowler, et al. Progress and challenges in forecast verification. Meteorological Applications, 20(2):130–139, 2013.
[3] Heini Wernli, Marcus Paulat, Martin Hagen, and Christoph Frei. SAL - a novel quality measure for the verification of quantitative precipitation forecasts. Monthly Weather Review, 136(11), 2008.
[4] Kai Pöthkow, Britta Weber, and Hans-Christian Hege. Probabilistic marching cubes. In Computer Graphics Forum, volume 30, pages 931–940. Wiley Online Library, 2011.
[5] Eric Gilleland, Johan Lindström, and Finn Lindgren. Analyzing the image warp forecast verification method on precipitation fields from the ICP. Weather and Forecasting, 25(4):1249–1262, 2010.
[6] B. Casati, G. Ross, and D. B. Stephenson. A new intensity-scale approach for the verification of spatial precipitation forecasts. Meteorological Applications, 11(2):141–154, 2004.