Coarse categorical data under ontologic and epistemic uncertainty
Julia Plaß, Ludwig-Maximilians-University (LMU)
8th of September 2014

Outline
1 Introduction to the problem
2 Data under ontologic uncertainty
   Motivation
   Basic idea of analysis
3 Data under epistemic uncertainty
   Motivation
   Analysis of two different situations
4 Summary

Introduction to the problem
Epistemic vs. ontic/ontologic uncertainty (I. Couso, D. Dubois, 2014)

Ontologic uncertainty: precise observation of something coarse
• Coarse variable of interest
• Coarse nature induced by indecision
• Truth is represented by the coarse variable

Epistemic uncertainty: coarse observation of something precise
• Precise variable of interest
• Coarsening mechanism

Data under ontologic uncertainty: Motivation
Why should data under ontologic uncertainty be collected?
Example: To which party will you give your vote?
Answer options: A, B, C, Don't know
Respondent's considerations: "Maybe A", "Maybe B", "Not C!" ⇒ "A or B"
Dealing with ontologic uncertainty: allow multiple answers

Data under ontologic uncertainty: Basic idea of analysis
Basic idea (illustrated by GLES 2013)

certainty            vote    assesCD  assesSPD  assesGREEN  assesLEFT  ...
very certain         SPD     -1       5         2           1          ...
certain              CD       4       3         3           1          ...
not that certain     GREEN    3       4         4          -1          ...
not certain at all   CD      -3       2         2           2          ...
Exemplary extract of the dataset

Analysis: Prediction
Classical analysis (votes of respondents who are certain):
Prediction for "CD": $\frac{\#\,\text{respondents who will vote for "CD" with certainty}}{\#\,\text{respondents who are certain}} = 0.46$
Dealing with ontologic uncertainty (multiple answers allowed), observed answers and counts:

CD      519    SPD-CD       34    GREEN-SPD-CD     13
SPD     287    GREEN-SPD    34    SPD-CD-OTHER     13
GREEN   105    CD-OTHER     24    LEFT-GREEN-SPD   13
LEFT    101    LEFT-SPD     14    SPD-OTHER        12
OTHER    62    GREEN-LEFT   13    rare comb.       77

$\widehat{Bel}(CD) = \frac{519}{1321} = 0.39$
$\widehat{Pl}(CD) = \frac{519 + 34 + 24 + 13 + 13}{1321} = 0.45$
⇒ Prediction for "CD": [0.39, 0.45]

Analysis: Regression
For ease of explanation, only selected β estimates are interpreted. Reference category: "SPD".

Classical analysis (votes of respondents who are certain):
            Intercept  sexFEM   InfoNEWSPAPER  InfoRADIO  InfoWEB
CD           0.2914    0.2456   -0.0037        -0.1487    -0.6774

Model under ontologic uncertainty
Data under ontologic uncertainty:
• $Y_i$: categorical random variable of nominal scale of measurement with $Y_i \subseteq \underbrace{\{a, b, \ldots\}}_{\text{precise categories}}$
• $m = |\mathcal{P}(\Omega) \setminus \{\emptyset\}|$: number of categories of $Y_i$

Model under ontologic uncertainty:
⇒ classical multinomial logit model with a different number of categories. The probability of occurrence of category $r = 1, 2, 3, \ldots, m - 1$ can be calculated by
$P(Y_i = r \mid x_i) = \frac{\exp(x_i^T \beta_r)}{1 + \sum_{s=1}^{m-1} \exp(x_i^T \beta_s)}$
and for category $m$ by
$P(Y_i = m \mid x_i) = \frac{1}{1 + \sum_{s=1}^{m-1} \exp(x_i^T \beta_s)}$
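The belief and plausibility values for "CD" from the prediction step can be reproduced directly from the tabulated answer counts. The following is a minimal sketch, not the author's code: the names `answer_counts`, `belief` and `plausibility` are hypothetical, and the 77 "rare comb." answers are assumed to enter only the total, as the slide's numerator suggests. The exact ratio for the plausibility is 603/1321 ≈ 0.456, which the slide reports as 0.45.

```python
from fractions import Fraction

# Answer sets and counts as tabulated on the slide (GLES 2013 illustration).
answer_counts = {
    ("CD",): 519,
    ("SPD",): 287,
    ("GREEN",): 105,
    ("LEFT",): 101,
    ("OTHER",): 62,
    ("SPD", "CD"): 34,
    ("GREEN", "SPD"): 34,
    ("CD", "OTHER"): 24,
    ("LEFT", "SPD"): 14,
    ("GREEN", "LEFT"): 13,
    ("GREEN", "SPD", "CD"): 13,
    ("SPD", "CD", "OTHER"): 13,
    ("LEFT", "GREEN", "SPD"): 13,
    ("SPD", "OTHER"): 12,
}
# Assumption: the composition of the rare combinations is unknown,
# so they only contribute to the total number of respondents.
rare_combinations = 77

total = sum(answer_counts.values()) + rare_combinations


def belief(party):
    """Share of respondents who named exactly this party and no other."""
    return Fraction(answer_counts.get((party,), 0), total)


def plausibility(party):
    """Share of respondents whose listed answer set contains the party."""
    hits = sum(n for answer, n in answer_counts.items() if party in answer)
    return Fraction(hits, total)


bel_cd = belief("CD")
pl_cd = plausibility("CD")
print(f"Bel(CD) = {float(bel_cd):.3f}, Pl(CD) = {float(pl_cd):.3f}")
```

Using `Fraction` keeps the ratios exact, so the rounding or truncation to two decimals is left as an explicit reporting decision.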
Basic idea (illustrated by GLES 2013): regression results
Only selected β estimates are shown. Reference category: "SPD".

Classical analysis (votes of respondents who are certain):
            Intercept  sexFEM   InfoNEWSPAPER  InfoRADIO  InfoWEB
CD           0.2914    0.2456   -0.0037        -0.1487    -0.6774

Dealing with ontologic uncertainty:
            Intercept  sexFEM   InfoNEWSPAPER  InfoRADIO  InfoWEB
CD           0.4706    0.3135   -0.0784         0.1856    -0.5025
SPD-CD      -2.2007    0.4034   -1.2556         0.9476     0.1886
GREEN-SPD   -2.4432    0.9393   -1.3053         0.2300     0.1844
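To see how such coefficients translate into category probabilities, the multinomial logit formula with a reference category can be sketched as below. This is an illustration under loud assumptions: the function `multinomial_logit_probs` is hypothetical, the covariates are assumed to be 0/1 dummies with a leading 1 for the intercept, and the three displayed rows are treated as if they were the only non-reference categories, which is not the case in the actual analysis (the slides show only a selection).

```python
import math


def multinomial_logit_probs(x, betas, reference="SPD"):
    """Category probabilities of a multinomial logit model.

    x:         covariate values, starting with 1.0 for the intercept
    betas:     dict mapping each non-reference category to its coefficients
    reference: label of the reference category (linear predictor fixed at 0)
    """
    linear = {cat: sum(b * v for b, v in zip(beta, x))
              for cat, beta in betas.items()}
    denom = 1.0 + sum(math.exp(eta) for eta in linear.values())
    probs = {cat: math.exp(eta) / denom for cat, eta in linear.items()}
    probs[reference] = 1.0 / denom
    return probs


# Coefficients displayed for the analysis under ontologic uncertainty
# (order: Intercept, sexFEM, InfoNEWSPAPER, InfoRADIO, InfoWEB).
betas = {
    "CD":        [0.4706, 0.3135, -0.0784, 0.1856, -0.5025],
    "SPD-CD":    [-2.2007, 0.4034, -1.2556, 0.9476, 0.1886],
    "GREEN-SPD": [-2.4432, 0.9393, -1.3053, 0.2300, 0.1844],
}

# Hypothetical respondent: female, informs herself via the web (assumed coding).
x = [1.0, 1.0, 0.0, 0.0, 1.0]
probs = multinomial_logit_probs(x, betas)
for category, p in probs.items():
    print(f"{category}: {p:.3f}")
```

Because the omitted categories would enlarge the denominator, the printed values overstate the true fitted probabilities; the sketch only shows the mechanics of the formula.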
Data under epistemic uncertainty: Motivation
Epistemic vs. ontologic uncertainty (recap): under epistemic uncertainty the variable of interest is precise, and a coarsening mechanism leads to a coarse observation of something precise.

When do data under epistemic uncertainty occur?
Reasons for coarse categorical data:
• Guarantee of anonymization, prevention of refusals
  Example: "Which kind of party did you vote for?" Answer options: rather left, center, rather right
• Different levels of reporting accuracy (lack of knowledge, vague question formulation)
  Example: "Which car do you drive?"

Data under epistemic uncertainty: Analysis of two different situations
Addressed data situations (true value $Y$, coarse observation $\mathcal{Y}$):

X (Case 2 only):            1    1    0    ...  1
$Y$ (true):                 A    B    A    ...  B
      ↓ coarsening
$\mathcal{Y}$ (observed):   AB   B    A    ...  B

Two different situations will be regarded:
Case 1 - No covariates:
• IID assumption
• Constant coarsening mechanisms q1 and q2
Case 2 - Binary covariates, no intercept:
• Constant coarsening mechanisms q1 and q2
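The coarsening step in the Case 1 situation can be made concrete with a small simulation. This is a sketch under assumptions, not part of the talk: the function name `simulate_coarse_sample`, the sample size, the seed and the parameter values (π_A = 0.65, q1 = 0.2, q2 = 0.4) are illustrative choices. Here q1 is the probability that a true A is reported as "AB" and q2 the probability that a true B is reported as "AB", so by the law of total probability the expected share of "AB" observations is q1·π_A + q2·(1 − π_A).

```python
import random


def simulate_coarse_sample(n, pi_a, q1, q2, seed=0):
    """Draw n iid true values (A or B) and coarsen them to A, B or AB.

    pi_a: probability that the true value is A
    q1:   probability that a true A is observed only as "AB"
    q2:   probability that a true B is observed only as "AB"
    """
    rng = random.Random(seed)
    observed = []
    for _ in range(n):
        true_y = "A" if rng.random() < pi_a else "B"
        q = q1 if true_y == "A" else q2
        # A precise value is never misreported, only coarsened to "AB".
        observed.append("AB" if rng.random() < q else true_y)
    return observed


# Illustrative parameter values (assumptions for this sketch).
pi_a, q1, q2 = 0.65, 0.2, 0.4
sample = simulate_coarse_sample(20000, pi_a, q1, q2)

counts = {category: sample.count(category) for category in ("A", "B", "AB")}
share_ab = counts["AB"] / len(sample)
expected_ab = q1 * pi_a + q2 * (1 - pi_a)
print(counts, f"share AB = {share_ab:.3f}, expected = {expected_ab:.3f}")
```

Only the three counts n_A, n_B and n_AB are visible to the analyst; the true values behind the "AB" observations are not.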
Case 1 - No covariates + IID assumption
Log-likelihood under the iid assumption ($\mathcal{Y}$: coarse observation, $Y$: true value):

$l(\pi_A, q_1, q_2) = \ln\Big( \prod_{i:\,\mathcal{Y}_i = A} \underbrace{P(\mathcal{Y} = A \mid Y = A)}_{(1 - q_1)}\, \pi_{iA} \prod_{i:\,\mathcal{Y}_i = B} \underbrace{P(\mathcal{Y} = B \mid Y = B)}_{(1 - q_2)}\, (1 - \pi_{iA}) \prod_{i:\,\mathcal{Y}_i = AB} \big[ \underbrace{P(\mathcal{Y} = AB \mid Y = A)}_{q_1}\, \pi_{iA} + \underbrace{P(\mathcal{Y} = AB \mid Y = B)}_{q_2}\, (1 - \pi_{iA}) \big] \Big)$

$\overset{iid}{=} n_A \cdot [\ln(1 - q_1) + \ln(\pi_A)] + n_B \cdot [\ln(1 - q_2) + \ln(1 - \pi_A)] + n_{AB} \cdot \ln[q_1 \pi_A + q_2 (1 - \pi_A)]$

FOC:
I.) $\frac{\partial l}{\partial \pi_A} = \frac{n_{AB}}{q_1 \pi_A + q_2 (1 - \pi_A)} (q_1 - q_2) + \frac{n_A}{\pi_A} - \frac{n_B}{1 - \pi_A} \overset{!}{=} 0$
II.) $\frac{\partial l}{\partial q_1} = \frac{n_{AB}}{q_1 \pi_A + q_2 (1 - \pi_A)}\, \pi_A - \frac{n_A}{1 - q_1} \overset{!}{=} 0$
III.) $\frac{\partial l}{\partial q_2} = \frac{n_{AB}}{q_1 \pi_A + q_2 (1 - \pi_A)}\, (1 - \pi_A) - \frac{n_B}{1 - q_2} \overset{!}{=} 0$

⇒ Necessary and sufficient condition for estimators $(\hat\pi_A, \hat q_1, \hat q_2)$:
$\frac{n_{AB}}{n} = \hat q_1 \cdot \hat\pi_A + \hat q_2 \cdot (1 - \hat\pi_A)$

Case 2 - Binary covariates, no intercept
Involved assumptions:
• No intercept $\beta_0$ ⇒ $\pi_{iA \mid X_i = 1} = \frac{\exp(\beta_A)}{1 + \exp(\beta_A)}$ and $\pi_{iA \mid X_i = 0} = \frac{1}{2}$
• Constant coarsening mechanisms q1 and q2
Example: "Do you regularly steal candy out of your mother's candy box?" Asked: girls (g) and boys (b). The answer pictograms are written here as "yes" and "no".
Illustration: seven children with X = g, g, g, b, b, b, b; the answers of the first girl and the first boy are observed only coarsely as "yes or no", all other answers are observed precisely.

$P(\mathcal{Y} = \text{yes or no} \mid Y = \text{yes}, X = g) = P(\mathcal{Y} = \text{yes or no} \mid Y = \text{yes}, X = b)$
$P(\mathcal{Y} = \text{yes or no} \mid Y = \text{no}, X = g) = P(\mathcal{Y} = \text{yes or no} \mid Y = \text{no}, X = b)$
⇓
The coarsening mechanisms do not depend on subgroup x.

Log-likelihood involving binary covariate X:
$l(\pi_{0A}, \pi_{1A}, q_1, q_2) = n_{1A} \cdot [\ln(1 - q_1) + \ln(\pi_{1A})] + n_{0A} \cdot [\ln(1 - q_1) + \ln(\pi_{0A})] + n_{1B} \cdot [\ln(1 - q_2) + \ln(1 - \pi_{1A})] + n_{0B} \cdot [\ln(1 - q_2) + \ln(1 - \pi_{0A})] + n_{1AB} \cdot \ln[q_1 \pi_{1A} + q_2 (1 - \pi_{1A})] + n_{0AB} \cdot \ln[q_1 \pi_{0A} + q_2 (1 - \pi_{0A})]$

[Figure: Relative bias of estimators (π1A = 0.65, q1 = 0.2 and q2 = 0.4; data estimation with π0A = 1/2). Vertical axis: relative bias. Horizontal axis: estimators π̂1A, q̂1, q̂2. Compared: estimation using information of the unknown Y vs. estimation by likelihood maximization.]

Questions / discussion suggestions:
• Is this result reasonable? (Remember: the coarsening mechanism does not depend on the values of X.)
• Is it possible to derive those estimators by means of equations like in the iid case, but now for each subgroup of X?

$\frac{n_{1AB}}{n_1} = \hat q_1 \cdot \hat\pi_{1A} + \hat q_2 \cdot (1 - \hat\pi_{1A})$
$\frac{n_{0AB}}{n_0} = \hat q_1 \cdot \hat\pi_{0A} + \hat q_2 \cdot (1 - \hat\pi_{0A})$
$\frac{n_{1A}}{n_1} = (1 - \hat q_1) \cdot \hat\pi_{1A}$
$\frac{n_{0A}}{n_0} = (1 - \hat q_1) \cdot \hat\pi_{0A}$

⇒ Resulting estimators (using $\hat\pi_{0A} = \frac{1}{2}$):
$\hat\pi_{1A} = \frac{n_{1A} \cdot n_0}{2 \cdot n_1 \cdot n_{0A}}$
$\hat q_1 = 1 - 2 \cdot \frac{n_{0A}}{n_0}$
$\hat q_2^{(1)} = \frac{2 \cdot n_{0AB} - n_0 + 2 \cdot n_{0A}}{n_0}$
$\hat q_2^{(2)} = \frac{\frac{n_{1AB}}{n_1} - \frac{n_{1A} (n_0 - 2 n_{0A})}{2 n_1 n_{0A}}}{\frac{2 n_1 n_{0A} - n_{1A} n_0}{2 n_1 n_{0A}}}$

[Figure: Relative bias of estimators (π1A = 0.65, q1 = 0.2 and q2 = 0.4). Vertical axis: relative bias. Horizontal axis: estimators π̂1A, q̂1, q̂2 (first method) and π̂1A, q̂1, q̂2^(1), q̂2^(2) (second method). Compared: estimation using information of the unknown Y vs. the derived analytic results.]

Summary
• It is important to distinguish between epistemic and ontologic uncertainty
• One can deal with ontologic uncertainty by redefining the sample space
• In case of ...
  ... iid variables under epistemic uncertainty, a set of estimators results, characterized by a special condition
  ... a binary covariate being available, precise real-valued point estimators seem to result

References
Couso, I. and Dubois, D. Statistical reasoning with set-valued information: Ontic vs. epistemic views. IJAR, 2014.
Heitjan, D. and Rubin, D. Ignorability and Coarse Data. Annals of Statistics, 1991.
Matheron, G. Random Sets and Integral Geometry. Wiley, New York, 1975.
Plaß, J. Coarse categorical data under epistemic and ontologic uncertainty: Comparison and extension of some approaches. Master's thesis, 2013.
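As a supplement to the Case 2 results, the analytic estimators can be evaluated from the six observed counts. The sketch below is an illustration rather than the author's implementation: the function name `case2_estimators` and the example counts are hypothetical. The counts are the expected cell counts for π1A = 0.65, q1 = 0.2, q2 = 0.4 and π0A = 1/2 with assumed group sizes n1 = n0 = 1000, so the estimators should recover these parameter values.

```python
def case2_estimators(n1a, n1b, n1ab, n0a, n0b, n0ab):
    """Analytic Case 2 estimators, assuming pi_0A = 1/2 (no intercept).

    n1a, n1b, n1ab: counts of observed A, B, AB in the subgroup X = 1
    n0a, n0b, n0ab: counts of observed A, B, AB in the subgroup X = 0
    """
    n1 = n1a + n1b + n1ab
    n0 = n0a + n0b + n0ab

    # From n0A / n0 = (1 - q1) * 1/2.
    q1 = 1 - 2 * n0a / n0
    # From n1A / n1 = (1 - q1) * pi_1A.
    pi1a = n1a * n0 / (2 * n1 * n0a)
    # Version (1): solve the AB equation of subgroup X = 0 for q2.
    q2_v1 = (2 * n0ab - n0 + 2 * n0a) / n0
    # Version (2): solve the AB equation of subgroup X = 1 for q2.
    numerator = n1ab / n1 - n1a * (n0 - 2 * n0a) / (2 * n1 * n0a)
    denominator = (2 * n1 * n0a - n1a * n0) / (2 * n1 * n0a)
    q2_v2 = numerator / denominator

    return {"pi1A": pi1a, "q1": q1, "q2_v1": q2_v1, "q2_v2": q2_v2}


# Hypothetical expected counts for pi_1A = 0.65, q1 = 0.2, q2 = 0.4, n1 = n0 = 1000.
estimates = case2_estimators(n1a=520, n1b=210, n1ab=270,
                             n0a=400, n0b=300, n0ab=300)
for name, value in estimates.items():
    print(f"{name} = {value:.4f}")
```

With real data the two versions of the q2 estimator will generally differ, since each uses the AB share of a different subgroup; comparing them is one informal check of the constant-coarsening assumption.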