Homework 2: Bayesian Network Written Part
CS6957: Probabilistic Modeling
Sumedha Singla, u0877456

In this part you will be analyzing risk factors for certain health problems (heart disease, stroke, heart attack, diabetes). The data is from the 2011 Behavioral Risk Factor Surveillance System (BRFSS) survey, which is run by the Centers for Disease Control (CDC). The distilled data is in the spreadsheet RiskFactorData.csv.

Ques1: Create the following Bayesian network to analyze the survey results. You will want to use the provided function createCPT.fromData.
a. What is the size (in terms of the number of probabilities needed) of this network? Alternatively, what is the total number of probabilities needed to store the full joint distribution?

Solution:
Size of the network: 502
Total number of probabilities needed for full joint distribution: 32768

Ques2: For each of the four health outcomes (diabetes, stroke, heart attack, angina), answer the following by querying your network (using your infer function):
a. What is the probability of the outcome if I have bad habits (smoke and don't exercise)? How about if I have good habits (don't smoke and do exercise)?
b. What is the probability of the outcome if I have poor health (high blood pressure, high cholesterol, and overweight)? What if I have good health (low blood pressure, low cholesterol, and normal weight)?
Solution (a):
Bad Habit: Exercise = 2 (No), Smoke = 1 (Yes)
Good Habit: Exercise = 1 (Yes), Smoke = 2 (No)

Diabetes: P(Diabetes | Exercise, Smoke)
Diabetes                     Bad Habit      Good Habit
1 (yes)                      0.150515956    0.127119324
2 (only during pregnancy)    0.008964854    0.008864952
3 (no)                       0.822423377    0.847693106
4 (pre-diabetic)             0.018095813    0.016322618

Stroke: P(Stroke | Exercise, Smoke)
Stroke                       Bad Habit      Good Habit
1 (yes)                      0.04926405     0.03611044
2 (no)                       0.95073595     0.96388956

Heart Attack: P(Attack | Exercise, Smoke)
Attack                       Bad Habit      Good Habit
1 (yes)                      0.07433041     0.05279765
2 (no)                       0.92566959     0.94720235

Angina: P(Angina | Exercise, Smoke)
Angina                       Bad Habit      Good Habit
1 (yes)                      0.08044778     0.05475537
2 (no)                       0.91955222     0.94524463

Solution (b):
Bad Health: Bp = 1 (Yes), Cholesterol = 1 (Yes), Bmi = 3 (overweight)
Good Health: Bp = 3 (No), Cholesterol = 2 (No), Bmi = 2 (Normal)

Diabetes: P(Diabetes | Bp, Bmi, Cholesterol)
Diabetes                     Bad Health     Good Health
1 (yes)                      0.115422719    0.057709954
2 (only during pregnancy)    0.007661825    0.009543386
3 (no)                       0.860872761    0.922193878
4 (pre-diabetic)             0.016042695    0.010552782

Stroke: P(Stroke | Bp, Bmi, Cholesterol)
Stroke                       Bad Health     Good Health
1 (yes)                      0.08268577     0.01446014
2 (no)                       0.91731423     0.98553986

Heart Attack: P(Attack | Bp, Bmi, Cholesterol)
Attack                       Bad Health     Good Health
1 (yes)                      0.1407844      0.01616133
2 (no)                       0.8592156      0.98383867

Angina: P(Angina | Bp, Bmi, Cholesterol)
Angina                       Bad Health     Good Health
1 (yes)                      0.1616076      0.01332601
2 (no)                       0.8383924      0.98667399

Ques3: Evaluate the effect a person's income has on their
probability of having one of the four health outcomes (diabetes, stroke, heart attack, angina). For each of these four outcomes, plot their probability given income status (your horizontal axis should be i = 1, 2, ..., 8; and your vertical axis should be P(y = 1 | income = i), where y is the outcome). What can you conclude?

Solution:

Conclusion: From the above graph we can conclude that as income increases, the probability of having any of the four health outcomes (diabetes, stroke, heart attack, angina) decreases. The absolute decrease in probability is small, but the relative decrease is substantial.

For diabetes, as income increases from level 1 to level 8 the probability of having diabetes decreases from 14.64462% to 12.33497%, an absolute decrease of about 2.3 percentage points. Relative to the level-1 probability this is a decrease of about 15.77% (2.30965 / 14.64462). That is, a person at income level 8 is about 15.8% less likely to have diabetes than a person at income level 1.

The same trend holds for the other health outcomes.

For angina, as income increases from level 1 to level 8 the probability of having angina decreases from 7.936% to 5.09%, an absolute decrease of about 2.84 percentage points and a relative decrease of 35.86%.

For heart attack, the probability decreases from 7.36% to 4.94%, an absolute decrease of 2.42 percentage points and a relative decrease of 32.88%.

For stroke, the probability decreases from 4.96% to 3.36%, an absolute decrease of 1.6 percentage points and a relative decrease of 32.25%.
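The absolute and relative decreases quoted in the conclusion can be recomputed with a few lines of Python. This is only a sketch of the arithmetic, not part of the original R workflow: the dictionary `quoted` and the helper `decreases` are hypothetical names introduced here, and the inputs are the rounded percentages from the text, so the last digit may differ slightly from figures computed on unrounded network output.

```python
# Probabilities in percent at income level 1 and income level 8,
# copied from the conclusion above (rounded values, not raw network output).
quoted = {
    "diabetes": (14.64462, 12.33497),
    "angina": (7.936, 5.09),
    "heart attack": (7.36, 4.94),
    "stroke": (4.96, 3.36),
}


def decreases(p_level1, p_level8):
    """Return (absolute decrease in percentage points, relative decrease in percent)."""
    absolute = p_level1 - p_level8
    # Relative decrease is measured against the income-level-1 probability.
    relative = 100.0 * absolute / p_level1
    return absolute, relative


results = {name: decreases(*pair) for name, pair in quoted.items()}

for name, (absolute, relative) in results.items():
    print(f"{name:12s} absolute = {absolute:.2f} points, relative = {relative:.2f}%")
```

Measuring the relative decrease against the level-1 probability is the convention that matches the angina, heart attack, and stroke figures in the text.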
Ques4: Create a second Bayesian network as above, but add edges from smoking to each of the four outcomes and edges from exercise to each of the four outcomes. Now redo the queries in Question 2. What was the effect, and do you think the assumptions of the first graph were valid or not?

Solution (a):
Bad Habit: Exercise = 2 (No), Smoke = 1 (Yes)
Good Habit: Exercise = 1 (Yes), Smoke = 2 (No)

Diabetes: P(Diabetes | Exercise, Smoke)
Diabetes                     Bad Habit      Good Habit
1 (yes)                      0.210944859    0.098552162
2 (only during pregnancy)    0.006915095    0.009884084
3 (no)                       0.760692694    0.877575578
4 (pre-diabetic)             0.021447352    0.013988176

After adding edges from smoking and exercise to diabetes, the resulting probabilities differ from those in Question 2. The probability of diabetes given bad habits increases (from 0.1505 to 0.2109), and the probability of diabetes given good habits decreases (from 0.1271 to 0.0986). Hence the first graph's assumption, that habits affect diabetes only through the intermediate health variables, is not valid.

Stroke: P(Stroke | Exercise, Smoke)
Stroke                       Bad Habit      Good Habit
1 (yes)                      0.07803498     0.02431088
2 (no)                       0.92196502     0.97568912

After adding edges from smoking and exercise to stroke, the probability of stroke increases given bad habits (from 0.0493 to 0.0780) and decreases given good habits (from 0.0361 to 0.0243). Hence our assumption was not valid.

Heart Attack: P(Attack | Exercise, Smoke)
Attack                       Bad Habit      Good Habit
1 (yes)                      0.1211659      0.03101531
2 (no)                       0.8788341      0.96898469

After adding edges from smoking and exercise to heart attack, the probability of heart attack increases given bad habits (from 0.0743 to 0.1212) and decreases given good habits (from 0.0528 to 0.0310). Hence our assumption was not valid.

Angina: P(Angina | Exercise, Smoke)
Angina                       Bad Habit      Good Habit
1 (yes)                      0.1190069      0.03680005
2 (no)                       0.8809931      0.96319995

After adding edges from smoking and exercise to angina, the probability of angina increases given bad habits (from 0.0804 to 0.1190) and decreases given good habits (from 0.0548 to 0.0368). Hence our assumption was not valid.

Solution (b):
Bad Health: Bp = 1 (Yes), Cholesterol = 1 (Yes), Bmi = 3 (overweight)
Good Health: Bp = 3 (No), Cholesterol = 2 (No), Bmi = 2 (Normal)

Diabetes: P(Diabetes | Bp, Bmi, Cholesterol)
Diabetes                     Bad Health     Good Health
1 (yes)                      0.123480634    0.054172949
2 (only during pregnancy)    0.007460298    0.009731215
3 (no)                       0.852415963    0.925952333
4 (pre-diabetic)             0.016643105    0.010143502

After adding edges from smoking and exercise to diabetes, the resulting probabilities differ. The probability of diabetes given bad health increases (from 0.1154 to 0.1235), and the probability of diabetes given good health decreases (from 0.0577 to 0.0542). Hence our assumption is not valid.

Stroke: P(Stroke | Bp, Bmi, Cholesterol)
Stroke                       Bad Health     Good Health
1 (yes)                      0.08425692     0.01399739
2 (no)                       0.91574308     0.98600261

After adding edges from smoking and exercise to stroke, the resulting probabilities are almost the same (0.0827 versus 0.0843 given bad health, 0.0145 versus 0.0140 given good health). So our assumptions are valid in this case.

Heart Attack: P(Attack | Bp, Bmi, Cholesterol)
Attack                       Bad Health     Good Health
1 (yes)                      0.1421993      0.01546893
2 (no)                       0.8578007      0.98453107

After adding edges from smoking and exercise to heart attack, the resulting probabilities are almost the same.
With the smoking and exercise edges into heart attack, there is a slight increase in the probability of heart attack given bad health (from 0.1408 to 0.1422) and a slight decrease given good health (from 0.0162 to 0.0155). Hence our assumption is almost valid.

Angina: P(Angina | Bp, Bmi, Cholesterol)
Angina                       Bad Health     Good Health
1 (yes)                      0.1629716      0.01294426
2 (no)                       0.8370284      0.98705574

In this case our assumption was valid, as there is not much difference in the probabilities before and after adding the edges from smoking and exercise to angina (0.1616 versus 0.1630 given bad health, 0.0133 versus 0.0129 given good health).

Ques5: Make a third network, starting from the network in Question 4, but adding an edge from diabetes to stroke. For both networks, evaluate the following probabilities: P(stroke = 1 (yes) | diabetes = 1 (yes)) and P(stroke = 1 (yes) | diabetes = 3 (no)). Again, what was the effect, and was the assumption about the interaction between diabetes and stroke valid?

Solution:

Probabilities before adding the edge between diabetes and stroke: P(Stroke | Diabetes)
Stroke       Diabetes = 1 (yes)    Diabetes = 3 (no)
1 (yes)      0.04416376            0.04047831
2 (no)       0.95583624            0.959521698

Probabilities after adding the edge between diabetes and stroke: P(Stroke | Diabetes)
Stroke       Diabetes = 1 (yes)    Diabetes = 3 (no)
1 (yes)      0.07619782            0.03501533
2 (no)       0.92380218            0.96498467

From the above tables we can conclude that when diabetes is treated as a direct cause of stroke in the Bayesian network (by adding the edge from diabetes to stroke), the probability of stroke given diabetes increases, from 0.0442 to 0.0762. That is, if a person has diabetes, the network now assigns a noticeably higher probability that the person also has a stroke. Measured against the original value this is a relative increase of about 72.5%; equivalently, the network without the edge gives a value about 42% below the network with the edge. The absolute difference in probabilities is small, but the relative difference is large. Hence our assumption here was invalid.
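The comparison between the two networks can be checked numerically. The following Python sketch only redoes the arithmetic on the four P(stroke = 1 | diabetes) values from the tables above; the names `before`, `after`, and `relative_change` are hypothetical and are not part of the homework's provided code.

```python
# P(stroke = 1 (yes) | diabetes), copied from the two tables above.
# "before" is the Question 4 network, "after" adds the diabetes -> stroke edge.
before = {"yes": 0.04416376, "no": 0.04047831}
after = {"yes": 0.07619782, "no": 0.03501533}


def relative_change(old, new):
    """Change from old to new, expressed as a fraction of old."""
    return (new - old) / old


# How much each conditional probability moved once the edge was added.
change_yes = relative_change(before["yes"], after["yes"])
change_no = relative_change(before["no"], after["no"])

# How much more likely stroke is for diabetics than non-diabetics in each network.
ratio_before = before["yes"] / before["no"]
ratio_after = after["yes"] / after["no"]

print(f"diabetes = yes: relative change {100 * change_yes:.1f}%")
print(f"diabetes = no:  relative change {100 * change_no:.1f}%")
print(f"risk ratio before: {ratio_before:.2f}, after: {ratio_after:.2f}")
```

The ratio of the two conditional probabilities is a convenient summary here: a value near 1 means the network treats stroke as nearly unrelated to diabetes, while a clearly larger value indicates that the added edge captures a dependence the earlier structure missed.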