IT5 Medical Diagnosis for Cardiac Arrests using Data
Medical Diagnosis for Cardiac Arrests using Data Mining and Bayes’ Theorem

Dr. Tariq Mohammed Saied Al Tayee
Department of Information Technology, College of Applied Sciences, Sohar, Sultanate of Oman
[email protected]

Mr. Viswan Vimbi
Department of Information Technology, College of Applied Sciences, Sohar, Sultanate of Oman
[email protected]

Abstract—Data mining is the process of employing one or more techniques to automatically analyze and extract knowledge from data contained within a database. It can be used in all branches of industry, including health care management, where mining a voluminous medical data set enables prediction or early detection of a disease such as cardiac arrest. The goal of this paper is to show that the data mining technique using Bayes’ Theorem helps in predicting cardiac disorders. Bayes’ Theorem is a technique to estimate the likelihood of a property given a set of data as input. Medical data such as age, sex, blood pressure and blood sugar help in the early detection or prediction of cardiac problems in patients. Medical practitioners usually take decisions based on their intuition and experience rather than on the knowledge-rich data hidden in the database, which affects the quality of service provided to patients. The technique used here tries to establish that data mining yields significant knowledge from medical data sets related to cardiac disorders. The results obtained show that, with a system designed using Bayes’ Theorem and supported by databases of patient records, a knowledge-rich environment can be created that helps to significantly improve the quality of clinical diagnosis through early detection or prediction of cardiac disorders.

Keywords—data mining, Bayes’ Theorem, K-fold, supervised learning, classification, clustering

I. INTRODUCTION

Data mining refers to the ability to analyze data sets to gain significant and in-depth knowledge about a field of interest. It is an influential technological method that helps the medical industry keep the relevant and important information within its data warehouses more effectively. It can be applied to the defense, medical, science and other industries [1, 5, 7, 8, 9].

Statistical data mining methods such as Bayes’ Theorem are used extensively in data mining processes and provide the chance to revise the probabilities of incidents as more information is obtained. This method is used as a tool for the detection of cardiac arrests in modern health care centers. Cardiac arrest, also known as circulatory arrest, can be explained as a sudden halt in blood circulation due to the failure of the heart to contract [11]. The impact of a cardiac arrest is, in some cases, reversible if treated early. However, in most cases, cardiac arrest leads to death within minutes.

Numerous studies have been performed in the past describing the significance of data mining in the detection of heart-related diseases using Bayes’ Theorem. The results gathered in these studies differ from each other in one way or another. Some of these works are discussed in this paper. In the study of Palaniappan and Awang (2008), a model of an Intelligent Heart Disease Prediction System (IHDPS) was designed using the data mining technique of Bayes’ Theorem, and the results obtained were specific to the objective of the tool, facilitating the determination of important information related to heart disease.

II. BACKGROUND

A. Cardiac Disorders

Cardiac disorders are diseases that affect the cardiovascular system, that is, the heart or blood vessels [1]. These disorders remain the biggest cause of death around the world and usually affect older adults. To prevent them, it is essential to give the necessary consideration to their detection and treatment [2]. Therefore, different data mining tools, particularly Bayes’ theorem, have been used to detect cardiac disorders [3].

B. Bayes’ Theorem

Bayes’ Theorem, from probability theory, is used to calculate the chance that an instance belongs to a target class given a summary (or conjunction) of attribute values. According to this theorem, the predicted class for a tested instance, defined by a set of attribute values V = v1 ∧ v2 ∧ … ∧ vn, is the one having the highest conditional probability. The theorem is stated as follows:

P(T|E) = P(E|T) P(T) / [P(E|T) P(T) + P(E|¬T) P(¬T)]

In the theorem, T is the hypothesis to be tested and E is the evidence that confirms or disconfirms the hypothesis [2]. For any proposition S, P(S) is the degree of subjective probability that S is true, and P(T) is the estimate of the probability of the hypothesis before the consideration of a new piece of evidence, known as the prior probability of T [3]. For example, the equation can be decoded as follows:

P(T|E) = chance of having cardiac disorder (T) given a positive symptom (E).
P(E|T) = chance of a positive symptom (E) given that a patient has cardiac disorder.
P(T) = chance of having cardiac disorder.
P(¬T) = chance of not having cardiac disorder.
P(E|¬T) = chance of a positive symptom (E) given that a patient does not have cardiac disorder.

The applications of the theorem are widespread and are not restricted to the financial and mathematical fields [2]. In particular, Bayes’ theorem can be used to determine the accuracy of medical test results by taking into account how likely a person is to have the disease together with the general accuracy of the test [6].

C. Early Detection of Important Patterns related to Cardiac Disease from Heart Disease Data Warehouse using Bayes’ Theorem

From its beginning, data mining has been guided by the requirement to solve practical issues related to cardiac disease [1]. Large amounts of data concerning cardiovascular factors and diseases are stored in very large databases, and hence it is essential to obtain accurate results from their analysis. Bayes’ Theorem is widely used as an effective data mining analysis for medical decision-making concerning classification and diagnosis. There are several situations where decisions are required to be reliable and effective [2].

One research study [3] observes that Bayes’ networks are used significantly in detecting the varying patterns of cardiac disease from large volumes of disease data, as they provide diagnostic reasoning by making probabilistic inferences about the disease under conditions of uncertainty [3]. The intelligent heart disease prediction system uses the Naive Bayes data mining technique on the .NET platform in order to obtain accurate predictions. According to Soni et al. [4], Bayes’ theorem provides 86.53% accuracy concerning the predictable patterns of cardiac disease.

D. Identification of the Significance of Medical Data Set Problems

The health care industry collects large amounts of health care data that, unfortunately, are not mined to discover hidden information for efficient decision-making. The discovery of hidden patterns and the relationships among them, for accurate identification of data patterns in medical data sets, is significant for the analysis of cardiac diseases [3, 6]. Hence, the advanced data mining technique in the form of Bayes’ theorem helps to prevent these situations, as in the Intelligent Heart Disease Prediction System, which uses Naive Bayes, a technique based on Bayes’ theorem. It provides analysis of complex medical data such as age, sex, blood pressure and blood sugar that can effectively predict the occurrence of cardiac disorders in patients [2]. Medical data sets with significant variables such as age, sex, blood pressure and blood sugar are helpful in the early prediction of cardiac problems in patients. Moreover, it enables significant knowledge, for example patterns and relationships between medical factors related to heart disease, to be established.

E. Data Processing

Cleaning and filtering data from a huge amount of data is quite complex, while accuracy is of great importance during the processing of disease information [1]. To make the data suitable for the data mining process, it must be transformed effectively; in the Bayes’ data mining system, the data is changed into reliable data sets with suitable characteristics [2]. The heart disease data warehouse is refined by removing duplicate records so that it provides the required accurate outcomes and is suitable for clustering of the cardiac disease data [3].

III. EXPERIMENT METHODOLOGY

We have taken 14 attributes from the medical data set [10], as shown in Table 1, with chronic disorder being the diagnosis attribute.
Sl. No. | Attribute | Description
1 | Age | In years
2 | Sex | Male = 1, Female = 0
3 | Chest Pain | Types of chest pain
4 | Blood Pressure | Blood pressure taken during rest
5 | Cholesterol | mg/dl
6 | Fasting Blood Sugar | true = 1 when > 120 mg/dl, false = 0
7 | ECG | Electrocardiographic results during rest
8 | Heart Rate | Maximum heart rate
9 | Induced Angina | 1 if experiencing angina, 0 if not
10 | Old Peak | 0 – no depression, 1 – yes depression
11 | Slope | of peak exercise ST segment
12 | Thal | Value 3: normal, value 6: fixed defect, value 7: reversible defect
13 | CA | Number of major vessels colored by fluoroscopy (value 0-3)
14 | Chronic disorder | Angiographic disease status

Table 1: Attributes of Heart Disease Data Sets

To conduct the experiment on early prediction of the occurrence of cardiac arrests, the attributes age, sex, blood pressure, blood sugar, chest pain and heart rate are used in the Naive Bayes classifier, which in turn applies Bayes’ theorem for analysis and decision making. The significance of the Naive Bayes classifier is that it provides accurate and reliable results from a small amount of training data. Only categorical attributes were used and, for simplicity, the number of attributes (Table 1) was reduced to 7 (Table 2). The reduced data set is fed to the classification model using K-fold cross-validation.

Sl. No. | Attribute | Description
1 | Age | In years
2 | Sex | Male = 1, Female = 0
3 | Chest Pain | Types of chest pain
4 | Blood Pressure | Blood pressure taken during rest
5 | Fasting Blood Sugar | true = 1 when > 120 mg/dl, false = 0
6 | Heart Rate | Maximum heart rate
7 | Chronic disorder | Angiographic disease status

Table 2: Reduced Attributes List

A data set of 200 records, 150 males and 50 females, with the 6 attributes is used for the experiment. The diagnosis attribute, chronic disorder, is the class identifier, with the value 0 indicating no cardiac ailment and the value 1 indicating the presence of cardiac ailments.

The classification used is a supervised learning method, which extracts a model describing the significant data classes or predicts future trends; this method is widely used in pattern recognition and artificial intelligence and is extensively recognized in medical diagnosis. Hence, the study uses Naive Bayes classification through clustering to diagnose the presence of heart disease in patients, as Naive Bayes has been observed to perform with a good prediction probability of 95% using different attributes [2]. The classifier assumes that the attributes are independent, and its learning speed and classification speed are its significant advantages.

Figure 3: Pre-processed using Weka tool

IV. RESULTS AND ANALYSIS

Experiments were conducted with the Weka tool. The selected data set was pre-processed and filtered with supervised classification to obtain the diagnostic classifier (Fig 3), and Figures 4(a) to 4(g) visualize the 6 attributes influenced by the diagnostic classifier, chronic disorder. From the figures, the variable ‘age’ with the range of 57 years, the variable ‘sex’ of the participants of the study, the variable blood sugar with the range of 118, the variable ‘chest pain’ of individuals with the range of 71, the variable blood pressure with 49 and heart rate of 58 are suffering from chronic disorder [4].

Fig 4(a)   Fig 4(b)

Figures 4(a) to 4(g) depict the influence of the diagnostic classifier, chronic disorder, on the attributes sex, age, blood pressure, blood sugar, chest pain and heart rate.
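The experimental workflow described in the text (a Naive Bayes classifier over categorical attributes, evaluated with K-fold cross-validation) can be illustrated with a minimal Python sketch. This is not the authors' Weka setup. The function names, the toy records and the rule that generates their labels are all hypothetical, add-one (Laplace) smoothing is an assumption the paper does not mention, and the folds are simple interleaved slices rather than whatever splitting Weka performs.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute index, class) -> value counts
    values_seen = defaultdict(set)       # attribute index -> distinct values
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            values_seen[i].add(value)
    return class_counts, value_counts, values_seen


def predict(model, row):
    """Return the class with the highest posterior score for one record."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # log of the prior P(T)
        for i, value in enumerate(row):
            # P(E|T) per attribute, with add-one smoothing (an assumption)
            numerator = value_counts[(i, label)][value] + 1
            denominator = count + len(values_seen[i])
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def k_fold_accuracy(rows, labels, k):
    """Average accuracy over k interleaved folds (record i goes to fold i % k)."""
    correct = 0
    for fold in range(k):
        test_idx = set(range(fold, len(rows), k))
        train_rows = [r for i, r in enumerate(rows) if i not in test_idx]
        train_labels = [y for i, y in enumerate(labels) if i not in test_idx]
        model = train_naive_bayes(train_rows, train_labels)
        correct += sum(predict(model, rows[i]) == labels[i] for i in test_idx)
    return correct / len(rows)


# Hypothetical toy records: (sex, chest pain type, fasting blood sugar flag).
# The label rule below is invented purely so the example is reproducible.
rows = [(sex, pain, sugar)
        for sex in (0, 1)
        for pain in ("typical", "atypical", "none")
        for sugar in (0, 1)] * 2
labels = [1 if pain == "typical" else 0 for (_, pain, _) in rows]

accuracy = k_fold_accuracy(rows, labels, k=4)
print(f"4-fold accuracy on the toy data: {accuracy:.2f}")
```

Because the toy labels depend on a single attribute, the sketch only shows the mechanics of training, prediction and fold-wise evaluation; it says nothing about the accuracy figures reported for the real patient data.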
Fig 4(c)   Fig 4(d)   Fig 4(e)   Fig 4(f)   Fig 4(g)

It can be seen from the above results that the attention given to the various patient attributes can help physicians in determining the impact of cardiac disorders. The above analysis is made from a sample of 200 patients, from whom information on age, sex, blood sugar, blood pressure, chest pain and heart rate was collected. The data was entered into the data mining software. The findings show that if the patient’s age ranges up to 57 years, the patient is male, the blood pressure level is identified as 49, the blood sugar level as 118, chest pain is equal to 71 and heart rate is equal to 58, then there is a strong possibility that the patient has a cardiac disorder. Furthermore, these findings show that the efficiency of data mining in detecting cardiac and other chronic disorders, and in prompting precautions for dealing with them, cannot be underestimated. These graphs and tables additionally helped in predicting the influence of one characteristic of the individual on another and, comparatively, the aggregate impact of all the individual characteristics on the disorder [1, 2, 6].

The above results for the patients’ data concerning age, sex, blood pressure, blood sugar, chest pain, heart rate and chronic disorder patterns yield a correlation coefficient of 0.8532, which demonstrates the accuracy of the results obtained from the patients’ data using Bayes’ theorem. It demonstrates the significant impact of the age, sex, blood pressure, blood sugar, chest pain, heart rate and chronic disorder factors on the decision of cardiac arrest.

V. CONCLUSION

It is concluded from the findings of this study that data mining is an effective process of applying different techniques to automatically evaluate and extract the desired reliable data within a database. It helped in the early detection of the disease, cardiac arrest, from a large amount of data. The paper demonstrated that using the Naive Bayes data mining technique in the analysis of heart disease patients assists in predicting cardiac disorders.

The analysis of the patients’ data concerning age, sex, blood pressure, blood sugar, chest pain and chronic disorder patterns records the correlation results obtained from the patients’ data using Bayes’ theorem, where the theorem demonstrates the accuracy of the predicted decision. However, the results do not show the intensity of cardiac arrest or pain in a prediction. We intend to continue our future work with the study of fuzziness in mining data for the range of intensity of cardiac pain or arrest.

REFERENCES

[1] S. B. Patil & Y. S. Kumaraswamy (2009), Intelligent and Effective Heart Attack Prediction System Using Data Mining and Artificial Neural Network, European Journal of Scientific Research, 31(4), pp. 642-656.
[2] S. Palaniappan & R. Awang (2008), Intelligent Heart Disease Prediction System Using Data Mining Techniques, IJCSNS International Journal of Computer Science and Network Security, 8(8).
[3] D.-A. Sitar-Taut & A.-V. Sitar-Taut (2010), Overview on How Data Mining Tools May Support Cardiovascular Disease, Journal of Applied Computer Science & Mathematics, 8(4), pp. 1-24.
[4] J. Soni, U. Ansari & D. Sharma (2011), Predictive Data Mining for Medical Diagnosis: An Overview of Heart Disease Prediction, International Journal of Computer Applications, 17(8), pp. 43-48.
[5] F. Lemke & J.-A. Mueller (2003), Medical data analysis using self-organizing data mining technologies, Systems Analysis Modelling Simulation, 43(10), pp. 1399-1408.
[6] L. Parthiban & R. Subramanian (2008), Intelligent Heart Disease Prediction System using CANFIS and Genetic Algorithm, International Journal of Biological, Biomedical and Medical Sciences, 3(3).
[7] I. H. Witten & E. Frank (2005), Data Mining: Practical Machine Learning Tools and Techniques, Morgan Kaufmann Publishers, San Francisco.
[8] J. Han & M. Kamber (2006), Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers, San Francisco.
[9] M. H. Dunham (2003), Data Mining: Introductory and Advanced Topics, Pearson Education, United States.
[10] A. Asuncion & D. J. Newman (2007), UCI Machine Learning Repository [http://www.ics.uci.edu/-mlearn/MLRepository.html]. Irvine, CA: University of California, School of Information and Computer Science.
[11] Jameson J.N. St. C, Dennis L Kasper, Harrison Tinsley Randolph, Braunwald Eugene, Fauci Anthony S, Hauser Stephen L, Longo Dan L (2005), Harrison’s Principles of Internal Medicine, New York, McGraw-Hill Medical Publication Division, ISBN 0-07-140235-7.