STATA, pdf 6 pp
PubHlth640 - Spring 2015 Intermediate Biostatistics
Unit 5 – Logistic Regression
WEEKS 10 - Practice Problems
SOLUTIONS – Stata version 13
(Sol_logistic_STATA.docx)

Source: Afifi A., Clark VA and May S. Computer Aided Multivariate Analysis, Fourth Edition. Boca Raton: Chapman and Hall, 2004.

Exercises #1-#3 use a data set provided by Afifi, Clark and May (2004). The data come from a longitudinal study of depression. The purpose of the study was to obtain estimates of the prevalence and incidence of depression and to explore its risk factors. The study variables were of several types: demographics, life events, stressors, physical health, health services utilization, medication use, lifestyle, and social support.

Tip - To access this data set in STATA, type the following into the command window:

    use "http://people.umass.edu/biep640w/datasets/depress.dta"

Consider the following three variables.

    Variable   Codings                               Label in STATA
    drink      1 = yes, 2 = no                       Regular Drinker
    sex        1 = male, 2 = female
    cases      0 = Normal, 1 = Case of Depression    Depressed is cesd > 16

1. Source: Afifi A., Clark VA and May S. Computer Aided Multivariate Analysis, Fourth Edition. Boca Raton: Chapman and Hall, 2004, Problem 12.9, page 330.

Using Stata, load the depression data set and execute the commands needed to fill in the following table:

                  Regular Drinker
    Sex         Yes     No    Total
    Female      139     44      183
    Male         95     16      111
    Total       234     60      294

    . tabulate drink sex

       regular |          sex
      drinker? |      male     female |     Total
    -----------+----------------------+----------
           yes |        95        139 |       234
            no |        16         44 |        60
    -----------+----------------------+----------
         Total |       111        183 |       294

What are the odds that a woman is a regular drinker?
139 / 44 = 3.2

What are the odds that a man is a regular drinker?
95 / 16 = 5.9

What is the odds ratio? That is, compared to a man, what is the relative odds (odds ratio) that a woman is a regular drinker?
OR = [odds for woman] / [odds for man] = 3.2/5.9 = 0.54
(This uses the rounded odds. The unrounded counts give [(139)(16)] / [(44)(95)] = 0.53.)

2. Repeat the tabulation that you produced for problem #1 two times, one for persons who are depressed and the other for persons who are not depressed.

    . * Use command SORT to sort the data by case status (depressed or not depressed)
    . sort case

    . * Use the command BY in front of the command TABULATE
    . by case: tabulate drink sex

    ---------------------------------------------------------------------------
    -> cases = normal

       regular |          sex
      drinker? |      male     female |     Total
    -----------+----------------------+----------
           yes |        87        106 |       193
            no |        14         37 |        51
    -----------+----------------------+----------
         Total |       101        143 |       244

    ---------------------------------------------------------------------------
    -> cases = depressed

       regular |          sex
      drinker? |      male     female |     Total
    -----------+----------------------+----------
           yes |         8         33 |        41
            no |         2          7 |         9
    -----------+----------------------+----------
         Total |        10         40 |        50

Among Persons Who are Depressed

                  Regular Drinker
    Sex         Yes     No    Total
    Female       33      7       40
    Male          8      2       10
    Total        41      9       50

OR (Relative odds, compared to a man, that a woman is a regular drinker):
OR = [(33)(2)] / [(7)(8)] = 1.18

Among Persons Who are NOT Depressed

                  Regular Drinker
    Sex         Yes     No    Total
    Female      106     37      143
    Male         87     14      101
    Total       193     51      244

OR (Relative odds, compared to a man, that a woman is a regular drinker):
OR = [(106)(14)] / [(37)(87)] = 0.46

3. Fit a logistic regression model using these variables. Use DRINK as the dependent variable and CASES and SEX as independent variables. Also include as an independent variable the appropriate interaction term.

    . * Some variable creation commands
    . * Create 0/1 indicators of drinker and female gender
    . generate drink01=.
    (294 missing values generated)
    . replace drink01=1 if drink==1
    (234 real changes made)
    . replace drink01=0 if drink==2
    (60 real changes made)
    . label define drinkf 0 "0=nondrinker" 1 "1=drinker"
    . label values drink01 drinkf

    . generate female=.
    (294 missing values generated)
    . replace female=0 if sex==1
    (111 real changes made)
    . replace female=1 if sex==2
    (183 real changes made)
    . label define sexf 0 "0=male" 1 "1=female"
    . label values female sexf

    . * Create a new variable called FEM_CASE that is the interaction of FEMALE and CASES
    . generate fem_case=female*cases

    . * Use the command LOGISTIC if you want output to include ODDS RATIOS
    . * Use the command LOGIT if you want the output to include BETAs and SEs
    . * Syntax for either command: LOGISTIC OUTCOME PREDICTOR PREDICTOR etc..
    . logit drink01 cases female fem_case

    Logistic regression                               Number of obs   =        294
                                                      LR chi2(3)      =       5.62
                                                      Prob > chi2     =     0.1318
    Log likelihood = -145.95772                       Pseudo R2       =     0.0189

    ------------------------------------------------------------------------------
         drink01 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           cases |  -.4405564   .8413815    -0.52   0.601    -2.089634    1.208521
          female |  -.7743296   .3455196    -2.24   0.025    -1.451536   -.0971237
        fem_case |   .9386327   .9578851     0.98   0.327    -.9387877    2.816053
           _cons |   1.826851   .2879632     6.34   0.000     1.262453    2.391248
    ------------------------------------------------------------------------------

Fitted Model:

logit [ pr (drinker=yes) ] = 1.8269 - 0.4406 [CASES] - 0.7743 [FEMALE] + 0.9386 [FEM_CASE]

where
    CASES = 1 if depressed; 0 otherwise
    FEMALE = 1 if female; 0 otherwise
    FEM_CASE = (CASES) * (FEMALE)

Is the interaction term in your model significant?
No. β̂3 = 0.9386, SE(β̂3) = 0.96, and p-value = .33

How does your answer to problem #3 compare to your answer to problem #2? Comment.
The answers match.
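The match can also be checked inside Stata rather than by hand. The following is a minimal sketch, not part of the original solution. It assumes the logit command above (with the created variables drink01, cases, female and fem_case) was the most recent estimation command, and it uses lincom with the or option as one convenient route to the stratum-specific odds ratios.

```stata
* Sketch only: run immediately after
*   logit drink01 cases female fem_case

* Female-vs-male odds ratio among the NON-depressed (cases = 0):
* only the FEMALE coefficient differs between women and men.
lincom female, or

* Female-vs-male odds ratio among the depressed (cases = 1):
* the FEMALE and FEM_CASE coefficients both contribute.
lincom female + fem_case, or

* The same two odds ratios computed from the stratified 2x2 counts.
display (106*14)/(37*87)
display (33*2)/(7*8)
```

If the model was fitted as shown, the two lincom results should be close to 0.46 and 1.18, the odds ratios obtained in problem #2.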
Among Depressed: OR = 1.18
Among NON-depressed: OR = 0.46

logit [ pr (drinker=yes) ] = 1.8269 - 0.4406 [CASES] - 0.7743 [FEMALE] + 0.9386 [FEM_CASE]

Among Depressed

                “1” = Female    “0” = Male
    CASES            1               1
    FEMALE           1               0
    FEM_CASE         1               0

logit [ female ] = 1.8269 - 0.4406 - 0.7743 + 0.9386 = 1.5506
logit [ male ] = 1.8269 - 0.4406 = 1.3863
logit [ female ] - logit [ male ] = 1.5506 - 1.3863 = +0.1643
OR [ women compared to men ] = exp { logit [ p1 ] - logit [ p0 ] } = exp { +0.1643 } = 1.1786

Among NON Depressed

                “1” = Female    “0” = Male
    CASES            0               0
    FEMALE           1               0
    FEM_CASE         0               0

logit [ female ] = 1.8269 - 0.7743 = 1.0526
logit [ male ] = 1.8269
logit [ female ] - logit [ male ] = 1.0526 - 1.8269 = -0.7743
OR [ women compared to men ] = exp { logit [ p1 ] - logit [ p0 ] } = exp { -0.7743 } = 0.4610

4. Source: Kleinbaum, Kupper, Miller, and Nizam. Applied Regression Analysis and Other Multivariable Methods, Third Edition. Pacific Grove: Duxbury Press, 1998. p 683 (problem 2).

A five year follow-up study on 600 disease free subjects was carried out to assess the effect of 0/1 exposure E on the development (or not) of a certain disease. The variables AGE (continuous) and obesity status (OBS), the latter a 0/1 variable, were determined at the start of the follow-up and were to be considered as control variables in analyzing the data.

(A) State the logit form of a logistic regression model that assesses the effect of the 0/1 exposure variable E controlling for the confounding effects of AGE and OBS and the interaction effects of AGE with E and OBS with E.

Solution:

logit[π] = β0 + β1*E + β2*AGE + β3*OBS + β4*AGEE + β5*OBSE

I used the following notation:
    π = Probability [ disease ]
    AGEE = AGE * E. This is a created variable that is the interaction of AGE with E.
    OBSE = OBS * E. Similarly, this is the interaction of OBS with E.
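For readers who want to see how the part (A) model would be set up in Stata, here is a minimal sketch in the style of problem #3. The exercise supplies no data set, so the variable names disease, exposure, age and obs are hypothetical, as are the created names agee and obse.

```stata
* Sketch only: disease, exposure, age and obs are hypothetical variable
* names. The exercise does not provide a data set.

* AGEE = AGE * E
generate agee = age*exposure

* OBSE = OBS * E
generate obse = obs*exposure

* LOGIT reports the betas and their standard errors:
* beta1 (exposure), beta2 (age), beta3 (obs), beta4 (agee), beta5 (obse)
logit disease exposure age obs agee obse
```

Creating the product terms by hand mirrors the fem_case variable used in problem #3.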
(B) Given the model you have for part “A”, give a formula for the odds ratio for the exposure-disease relationship that controls for the confounding and interactive effects of AGE and OBS.

Solution: The solution here follows the ideas on pp 9-11 in Lecture Notes 5, Logistic Regression.

                 Value of Predictor for Person who is
    Predictor      Exposed        Not Exposed
    E                 1                0
    AGE             AGE1             AGE0
    OBS             OBS1             OBS0
    AGEE            AGE1               0
    OBSE            OBS1               0

Then

OR = exp { logit[π for exposed person] - logit[π for NON exposed person] }
   = exp { [ β0 + β1 + β2*AGE1 + β3*OBS1 + β4*AGE1 + β5*OBS1 ] - [ β0 + β2*AGE0 + β3*OBS0 ] }
   = exp { β1 + β2*(AGE1 - AGE0) + β3*(OBS1 - OBS0) + β4*AGE1 + β5*OBS1 }

(C) Now use the formula that you have for part “B” to write an expression for the estimated odds ratio for the exposure-disease relationship that considers both confounding and interaction when AGE=40 and OBS=1.

Solution: Estimated OR = exp { β1 + (40)β4 + β5 }, with each β replaced by its estimate.

                 Value of Predictor for Person who is
    Predictor      Exposed        Not Exposed
    E                 1                0
    AGE              40               40
    OBS               1                1
    AGEE             40                0
    OBSE              1                0

OR = exp { β1 + β2*(40 - 40) + β3*(1 - 1) + β4*40 + β5*1 }
   = exp { β1 + β4*40 + β5*1 }
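The part (C) expression could be evaluated in Stata once a model of the part (A) form has been fitted. This is a minimal sketch under stated assumptions: the variable names disease, exposure, age, obs, agee and obse are hypothetical (the exercise provides no data), and lincom is used as one convenient way to obtain the odds ratio together with a confidence interval.

```stata
* Sketch only: assumes the most recent estimation command was
*   logit disease exposure age obs agee obse
* where all variable names are hypothetical.

* Estimated OR for exposure at AGE = 40 and OBS = 1:
*   exp( b1 + 40*b4 + b5 )
lincom exposure + 40*agee + obse, or

* The same point estimate computed by hand from the stored coefficients.
display exp(_b[exposure] + 40*_b[agee] + _b[obse])
```

The β2 and β3 terms do not appear because AGE and OBS take the same values for the exposed and the unexposed person, so they cancel in the difference of logits.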