Coarse categorical data under ontologic and epistemic uncertainty

Transcription

Coarse categorical data under ontologic and epistemic uncertainty
Coarse categorical data
under ontologic and epistemic uncertainty
Julia Plaß
Ludwigs-Maximilians-University (LMU)
08th of September 2014
J.Plaß (LMU)
Coarse categorical data
08th of September 2014
1 / 17
Outline
1
Introduction to the problem
2
Data under ontologic uncertainty
Motivation
Basic idea of analysis
3
Data under epistemic uncertainty
Motivation
Analysis of two different situations
4
Summary
J.Plaß (LMU)
Coarse categorical data
08th of September 2014
2 / 17
Introduction to the problem
Epistemic vs. ontic/ontologic uncertainty
Coarse
variable
of interest
J.Plaß (LMU)
• coarse nature induced
by indecision
• truth is represented by
coarse variable
Coarse categorical data
(I. Couso, D. Dubois, 2014)
Precise
observation
of sth. coarse
08th of September 2014
3 / 17
Introduction to the problem
Epistemic vs. ontic/ontologic uncertainty
Coarse
variable
of interest
Precise
variable of
interest
J.Plaß (LMU)
• coarse nature induced
by indecision
• truth is represented by
coarse variable
Coarsening
mechanism
Coarse categorical data
(I. Couso, D. Dubois, 2014)
Precise
observation
of sth. coarse
Coarse
observation
of sth. precise
08th of September 2014
3 / 17
Data under ontologic uncertainty
Motivation
Why should data under ontologic uncertainty be collected?
Example:
Which party will you give your vote?
A
J.Plaß (LMU)
B
Coarse categorical data
C
Don‘t know
08th of September 2014
4 / 17
Data under ontologic uncertainty
Motivation
Why should data under ontologic uncertainty be collected?
Example:
Which party will you give your vote?
A
J.Plaß (LMU)
B
Coarse categorical data
C
Don‘t know
08th of September 2014
4 / 17
Data under ontologic uncertainty
Motivation
Why should data under ontologic uncertainty be collected?
Example:
Which party will you give your vote?
A
B
C
Don‘t know
Maybe B
Maybe A
J.Plaß (LMU)
Not C!
Coarse categorical data
08th of September 2014
4 / 17
Data under ontologic uncertainty
Motivation
Why should data under ontologic uncertainty be collected?
Example:
Which party will you give your vote?
A
B
C
Don‘t know
Maybe B
Maybe A
J.Plaß (LMU)
Not C!
Coarse categorical data
08th of September 2014
4 / 17
Data under ontologic uncertainty
Motivation
Why should data under ontologic uncertainty be collected?
Example:
Which party will you give your vote?
A
B
C
Don‘t know
Maybe B
Maybe A
J.Plaß (LMU)
Not C!
Coarse categorical data
A or B
08th of September 2014
4 / 17
Data under ontologic uncertainty
Motivation
Why should data under ontologic uncertainty be collected?
Example:
Which party will you give your vote?
A
B
C
Don‘t know
Maybe B
Maybe A
Not C!
A or B
Dealing with ontologic
uncertainty:
Allow multiple anwers
J.Plaß (LMU)
Coarse categorical data
08th of September 2014
4 / 17
Data under ontologic uncertainty
Basic idea of analysis
Basic idea (illustrated by GLES 2013)
certainty
very certain
certain
not that certain
not certain at all
vote
SPD
CD
GREEN
CD
J.Plaß (LMU)
assesCD
-1
4
3
-3
assesSPD
5
3
4
2
assesGREEN
2
3
4
2
Coarse categorical data
assesLEFT
1
1
-1
2
...
...
...
...
...
Exemplary
extraction of
the dataset
08th of September 2014
5 / 17
Data under ontologic uncertainty
Basic idea of analysis
Basic idea (illustrated by GLES 2013)
certainty
very certain
certain
not that certain
not certain at all
vote
SPD
CD
GREEN
CD
Analysis:
assesCD
-1
4
3
-3
assesSPD
5
3
4
2
assesGREEN
2
3
4
2
assesLEFT
1
1
-1
2
...
...
...
...
...
Exemplary
extraction of
the dataset
Prediction
Classical analysis - Vote of respondents who are certain:
who will vote for ”CD” with certainty
Prediction for ”CD”: #respondents
= 0.46
#respondents who are certain
J.Plaß (LMU)
Coarse categorical data
08th of September 2014
5 / 17
Data under ontologic uncertainty
Basic idea of analysis
Basic idea (illustrated by GLES 2013)
certainty
very certain
certain
not that certain
not certain at all
vote
SPD
CD
GREEN
CD
Analysis:
assesCD
-1
4
3
-3
assesSPD
5
3
4
2
assesGREEN
2
3
4
2
assesLEFT
1
1
-1
2
...
...
...
...
...
Exemplary
extraction of
the dataset
Prediction
Classical analysis - Vote of respondents who are certain:
who will vote for ”CD” with certainty
Prediction for ”CD”: #respondents
= 0.46
#respondents who are certain
Dealing with ontologic uncertainty:
CD
519
SPD-CD
34
GREEN-SPD-CD
13
SPD
287
GREEN-SPD
34
SPD-CD-OTHER
13
ˆ
Bel(CD)
=
P̂l(CD)
=
J.Plaß (LMU)
GREEN
105
CD-OTHER
24
LEFT-GREEN-SPD
13
519
= 0.39
1321
519 + 34 + 24 + 13 + 13
1321
LEFT
101
LEFT-SPD
14
SPD-OTHER
12
⇒
= 0.45
Coarse categorical data
OTHER
62
GREEN-LEFT
13
rare comb.
77
Prediction
[0.39, 0.45]
for
08th of September 2014
”CD”:
5 / 17
Data under ontologic uncertainty
Basic idea of analysis
Basic idea (illustrated by GLES 2013)
certainty
very certain
certain
not that certain
not certain at all
vote
SPD
CD
GREEN
CD
Analysis:
assesCD
-1
4
3
-3
assesSPD
5
3
4
2
assesGREEN
2
3
4
2
assesLEFT
1
1
-1
2
...
...
...
...
...
Exemplary
extraction of
the dataset
Regression
For reasons of explanation interpretation of selected β estimators only
Reference category: ”SPD”
Classical analysis - Vote of respondents who are certain:
CD
Intercept
0.2914
J.Plaß (LMU)
sexFEM
0.2456
InfoNEWSPAPER
-0.0037
InfoRADIO
-0.1487
Coarse categorical data
InfoWEB
-0.6774
08th of September 2014
6 / 17
Data under ontologic uncertainty
Basic idea of analysis
Model under ontologic uncertainty
Data under ontologic uncertainty:
Yi : categorical random variable of nominal scale of measurement with Yi ⊆
{a, b, ...}
| {z }
precise categories
m = |P(Ω) \ ∅)|: number of categories of Yi
Model under ontologic uncertainty:
⇒ classical multinomial logit model with different number of categories:
The probability of occurence for category r = 1, 2, 3, ..., m − 1 can be calculated by
P(Yi = r |xi ) =
exp(xiT βr )
Pm−1
1 + s=1 exp(xiT βs )
and for category m by
P(Yi = m|xi ) =
J.Plaß (LMU)
1+
1
Pm−1
T
s=1 exp(xi βs )
Coarse categorical data
08th of September 2014
7 / 17
Data under ontologic uncertainty
Basic idea of analysis
Basic idea (illustrated by GLES 2013)
certainty
very certain
certain
not that certain
not certain at all
vote
SPD
CD
GREEN
CD
Analysis:
assesCD
-1
4
3
-3
assesSPD
5
3
4
2
assesGREEN
2
3
4
2
assesLEFT
1
1
-1
2
...
...
...
...
...
Exemplary
extraction of
the dataset
Regression
For reasons of explanation interpretation of selected β estimators only
Reference category: ”SPD”
Classical analysis - Vote of respondents who are certain:
CD
Intercept
0.2914
sexFEM
0.2456
InfoNEWSPAPER
-0.0037
InfoRADIO
-0.1487
InfoWEB
-0.6774
Dealing with ontologic uncertainty:
CD
SPD-CD
GREEN-SPD
J.Plaß (LMU)
Intercept
0.4706
-2.2007
-2.4432
sexFEM
0.3135
0.4034
0.9393
InfoNEWSPAPER
-0.0784
-1.2556
-1.3053
Coarse categorical data
InfoRADIO
0.1856
0.9476
0.2300
InfoWEB
-0.5025
0.1886
0.1844
08th of September 2014
8 / 17
Data under ontologic uncertainty
Basic idea of analysis
Basic idea (illustrated by GLES 2013)
certainty
very certain
certain
not that certain
not certain at all
vote
SPD
CD
GREEN
CD
Analysis:
assesCD
-1
4
3
-3
assesSPD
5
3
4
2
assesGREEN
2
3
4
2
assesLEFT
1
1
-1
2
...
...
...
...
...
Exemplary
extraction of
the dataset
Regression
For reasons of explanation interpretation of selected β estimators only
Reference category: ”SPD”
Classical analysis - Vote of respondents who are certain:
CD
Intercept
0.2914
sexFEM
0.2456
InfoNEWSPAPER
-0.0037
InfoRADIO
-0.1487
InfoWEB
-0.6774
Dealing with ontologic uncertainty:
CD
SPD-CD
GREEN-SPD
J.Plaß (LMU)
Intercept
0.4706
-2.2007
-2.4432
sexFEM
0.3135
0.4034
0.9393
InfoNEWSPAPER
-0.0784
-1.2556
-1.3053
Coarse categorical data
InfoRADIO
0.1856
0.9476
0.2300
InfoWEB
-0.5025
0.1886
0.1844
08th of September 2014
8 / 17
Data under epistemic uncertainty
Motivation
Epistemic vs. ontologic uncertainty
Coarse
variable
of interest
Precise
variable of
interest
J.Plaß (LMU)
• coarse nature induced
by indecision
• truth is represented by
coarse variable
Coarsening
mechanism
Coarse categorical data
Precise
observation
of sth. coarse
Coarse
observation
of sth. precise
08th of September 2014
9 / 17
Data under epistemic uncertainty
Motivation
When do data under epistemic uncertainty occur?
Reasons for coarse categorical data:
Guarantee of anonymization, prevention of refusals
Example:
“Which kind of party did you elect?”
rather left center rather right
J.Plaß (LMU)
Coarse categorical data
08th of September 2014
10 / 17
Data under epistemic uncertainty
Motivation
When do data under epistemic uncertainty occur?
Reasons for coarse categorical data:
Guarantee of anonymization, prevention of refusals
Example:
“Which kind of party did you elect?”
rather left center rather right
Different levels of reporting accuracy
(lack of knowledge, vague question formulation)
Examples:
“Which car do you drive?”
J.Plaß (LMU)
Coarse categorical data
08th of September 2014
10 / 17
Data under epistemic uncertainty
Analysis of two different situations
Addressed data situations
Y
A
B
A
Coarsening
..
..
B
J.Plaß (LMU)
Coarse categorical data
AB
B
A
..
..
B
08th of September 2014
11 / 17
Data under epistemic uncertainty
Analysis of two different situations
Addressed data situations
Y
A
B
A
Coarsening
..
..
B
AB
B
A
..
..
B
Two different situations will be regarded:
Case 1 - No covariates:
• IID-assumption
• Constant coarsening mechanisms
q1 and q2
J.Plaß (LMU)
Coarse categorical data
08th of September 2014
11 / 17
Data under epistemic uncertainty
Analysis of two different situations
Addressed data situations
X
Y
1
1
0
A
B
A
..
..
1
AB
B
A
Coarsening
..
..
B
..
..
B
Two different situations will be regarded:
Case 2 - Binary covariates,
no intercept
Case 1 - No covariates:
• IID-assumption
• ddff
• Constant coarsening mechanisms
q1 and q2
J.Plaß (LMU)
• Constant coarsening mechanisms
q1 and q2
Coarse categorical data
08th of September 2014
11 / 17
Data under epistemic uncertainty
Analysis of two different situations
Case 1 - No covariates + IID-assumption
log-Likelihood under the iid assumption :
l(πA , q1 , q2 )
=
ln
Y
Y
P(Y = A|Y = A) πiA
P(Y = B|Y = B)(1 − πiA )
|
{z
}
|
{z
}
i:Y =A
i:Y =B
i
i
(1−q1 )
(1−q2 )
Y
P(Y = AB|Y = A) πiA + P(Y = AB|Y = B)(1 − πiA )
|
{z
}
|
{z
}
i:Y =AB
i
iid
=
q1
q2
nA · [ln(1 − q1 ) + ln(πA )] + nB · [ln(1 − q2 ) + ln(1 − πA )]
nAB · [q1 πA + q2 (1 − πA ))]
FOC:
I.)
II.)
III.)
∂
∂πA
∂
∂q1
∂
∂q2
=
=
=
nAB
q1 πA + q2 (1 − πA )
nAB
q1 πA + q2 (1 − πA )
nAB
q1 πA + q2 (1 − πA )
J.Plaß (LMU)
(q1 − q2 ) +
πA −
nA
πA
nA
nB
1 − πA
!
=0
!
1 − q1
(1 − πA ) −
−
=0
nB
1 − q2
!
=0
Coarse categorical data
08th of September 2014
12 / 17
Data under epistemic uncertainty
Analysis of two different situations
Case 1 - No covariates + IID-assumption
log-Likelihood under the iid assumption :
l(πA , q1 , q2 )
=
ln
Y
Y
P(Y = A|Y = A) πiA
P(Y = B|Y = B)(1 − πiA )
|
{z
}
|
{z
}
i:Y =A
i:Y =B
i
i
(1−q1 )
(1−q2 )
Y
P(Y = AB|Y = A) πiA + P(Y = AB|Y = B)(1 − πiA )
|
{z
}
|
{z
}
i:Y =AB
i
iid
=
q1
q2
nA · [ln(1 − q1 ) + ln(πA )] + nB · [ln(1 − q2 ) + ln(1 − πA )]
nAB · [q1 πA + q2 (1 − πA ))]
FOC:
I.)
II.)
III.)
∂
∂πA
∂
∂q1
∂
∂q2
=
=
=
nAB
q1 πA + q2 (1 − πA )
nAB
q1 πA + q2 (1 − πA )
nAB
q1 πA + q2 (1 − πA )
J.Plaß (LMU)
(q1 − q2 ) +
πA −
nA
πA
nA
nB
1 − q2
=0
⇒
=0
nB
!
1 − πA
!
1 − q1
(1 − πA ) −
−
!
=0
Coarse categorical data
Neccessary and sufficient condition for
estimators (π̂A , q̂1 , q̂2 )
nAB
n
= q̂1 · π̂A + q̂2 · (1 − π̂A )
08th of September 2014
12 / 17
Data under epistemic uncertainty
Analysis of two different situations
Case 2 - Binary covariates, no intercept
Involved assumptions:
No intercept β0
⇒ πiA|Xi =1 =
J.Plaß (LMU)
exp(βA )
1+exp(βA )
and πiA|Xi =0 =
1
2
Coarse categorical data
08th of September 2014
13 / 17
Data under epistemic uncertainty
Analysis of two different situations
Case 2 - Binary covariates, no intercept
Involved assumptions:
No intercept β0
⇒ πiA|Xi =1 =
exp(βA )
1+exp(βA )
and πiA|Xi =0 =
1
2
Constant coarsening mechanisms q1 and q2 :
Example: ”Do you regularly steal candy (./) out of your mother’s candy box?”
Asked: girls (g ) and boys (b)
X
g
g
g
b
b
b
b
Y
./
./
./
./
./
./
./
Y
./ or ./
./
./
./ or ./
./
./
./
J.Plaß (LMU)
P(Y = ./ or ./|Y =./, X = g ) = P(Y = ./ or ./|Y =./, X = b)
P(Y = ./ or ./|Y = ./, X = g ) = P(Y = ./ or ./|Y = ./, X = b)
⇓
the coarsening mechanisms do not
depend on subgroup x
Coarse categorical data
08th of September 2014
13 / 17
Data under epistemic uncertainty
Analysis of two different situations
Case 2 - Binary covariates, no intercept
log-Likelihood involving binary covariate X :
l(π0A , π1A , q1 , q2 )
=
n1A · [ln(1 − q1 ) + ln(π1A )] + n0A · [ln(1 − q1 ) + ln(π0A )]
+
n1B · [ln(1 − q2 ) + ln(1 − π1A )] + n0B · [ln(1 − q2 ) + ln(1 − π0A )]
+
n1AB · [q1 π1A + q2 (1 − π1A ))] + n0AB · [q1 π0A + q2 (1 − π0A ))]
Relative bias of estimators (pi1A=0.65, q1=0.2 and q2=0.4)
data
estimation
with π01 = 12
relative bias
0.05
Estimation by ...
... using information
of unknown Y
... likelihood
maximization
0.00
−0.05
−0.10
^
pi1A
^
q
1
^
q
2
^
pi1A
^
q
1
^
q
2
estimators
Evaluation
by means of
J.Plaß (LMU)
Coarse categorical data
08th of September 2014
14 / 17
Data under epistemic uncertainty
Analysis of two different situations
Case 2 - Binary covariates, no intercept
Questions / Discussion suggestions:
Is this result reasonable? (Remember: The coarsening mechanism does
not depend on the values of X )
J.Plaß (LMU)
Coarse categorical data
08th of September 2014
15 / 17
Data under epistemic uncertainty
Analysis of two different situations
Case 2 - Binary covariates, no intercept
Questions / Discussion suggestions:
Is this result reasonable? (Remember: The coarsening mechanism does
not depend on the values of X )
Is it possible to derive those estimators by means of equations like in
the iid case, but now for each subgroup of X ?
n1AB
n1
n0AB
n0
n1A
n1
n0A
n0
J.Plaß (LMU)
=
q̂1 · π̂1A + q̂2 · (1 − π̂1A )
=
q̂1 · π̂0A + q̂2 · (1 − π̂0A )
=
(1 − q̂1 ) · π̂1A
=
(1 − q̂1 ) · π̂0A
Coarse categorical data
08th of September 2014
15 / 17
Data under epistemic uncertainty
Analysis of two different situations
Case 2 - Binary covariates, no intercept
Questions / Discussion suggestions:
Is this result reasonable? (Remember: The coarsening mechanism does
not depend on the values of X )
Is it possible to derive those estimators by means of equations like in
the iid case, but now for each subgroup of X ?
n1AB
n1
n0AB
n0
n1A
n1
n0A
n0
J.Plaß (LMU)
=
q̂1 · π̂1A + q̂2 · (1 − π̂1A )
=
q̂1 · π̂0A + q̂2 · (1 − π̂0A )
=
(1 − q̂1 ) · π̂1A
=
(1 − q̂1 ) · π̂0A
Resulting estimators
⇒
π̂1A
=
q̂1
=
(1)
and q̂2
Coarse categorical data
n1A · n0
2 · n1 · n0A
n0A
1−2·
n0
(2)
and q̂2
08th of September 2014
15 / 17
Data under epistemic uncertainty
Analysis of two different situations
Case 2 - Binary covariates, no intercept
Relative bias of estimators (pi1A=0.65, q1=0.2 and q2=0.4)
data
estimation
relative bias
0.1
0.0
Estimation by ...
... using information
of the unknown Y
... derived analytic
results
−0.1
(1)
=
2 · n0AB − n0 + 2 · n0A
n0
(2)
=
n (n −2n )
n1AB
− 1A 2n0 n 0A
n1
1 0A
2n1 n0A −n1A n0
2n1 n0A
q̂2
q̂2
−0.2
^
pi1A
^
q
1
^
q
2
^
pi1A
^
q
1
^ (1)
q
2
^ (2)
q
2
estimators
J.Plaß (LMU)
Coarse categorical data
08th of September 2014
16 / 17
Summary
Summary
Important to distinguish between epistemic and ontologic uncertainty
One can deal with ontologic uncertainty by redefining the sample
space
In case of ...
... iid variables under epistemic uncertainty, a set of estimators results
characterized by a special condition
... being a binary covariate available, precise real valued point
estimators seem to result
J.Plaß (LMU)
Coarse categorical data
08th of September 2014
17 / 17
Summary
References
Couso, I. and Dubois, D.
Statistical reasoning with set-valued information: Ontic vs. epistemic
views, IJAR, 2014.
Heitjan, D. and Rubin, D.
Ignorability and Coarse Data, Annals of Statistics, 1991.
Matheron, G.
Random Sets and Integral Geometry, Wiley New York, 1975.
Plaß, J.
Coarse categorical data under epistemic and ontologic uncertainty:
Comparison and extension of some approaches, master’s thesis, 2013
J.Plaß (LMU)
Coarse categorical data
08th of September 2014
17 / 17

Similar documents