CardioVascular Disease Prediction Batch 03
A HYBRID DEEP LEARNING APPROACH
FOR CARDIOVASCULAR DISEASE
PREDICTION USING FUSION OF
CNN AND RNN MODELS
A PROJECT REPORT
Submitted by
AJAY VIBHAS R (Reg. No. 814721104003)
DHARSHINI PRIYA M (Reg. No. 814721104015)
DURGA DEVI R (Reg. No. 814721104017)
PRADHOSHIKA D (Reg. No. 814721104043)
in partial fulfilment for the award of the degree
of
BACHELOR OF ENGINEERING
in
COMPUTER SCIENCE AND ENGINEERING
SRM TRP ENGINEERING COLLEGE, TIRUCHIRAPPALLI
ANNA UNIVERSITY: CHENNAI 600 025
MAY 2025
ANNA UNIVERSITY: CHENNAI 600 025
BONAFIDE CERTIFICATE
Certified that this project report "A Hybrid Deep Learning Approach for
Cardiovascular Disease Prediction Using Fusion of CNN and RNN Models"
is the bonafide work of "AJAY VIBHAS R (814721104003), DHARSHINI
PRIYA M (814721104015), DURGA DEVI R (814721104017), PRADHOSHIKA D
(814721104043)" who carried out the project work under my supervision.
SIGNATURE
Dr. P. Sudhakaran, Ph.D.,
HEAD OF THE DEPARTMENT
Professor and Head
Department of CSE
SRM TRP Engineering College,
Irungalur, Mannachanallur Taluk,
Tiruchirappalli-621105.

SIGNATURE
Mr. M. Jaiganesh, M.E., (Ph.D.),
SUPERVISOR
Assistant Professor
Department of CSE
SRM TRP Engineering College,
Irungalur, Mannachanallur Taluk,
Tiruchirappalli-621105.
Submitted for the viva-voce held on

INTERNAL EXAMINER                         EXTERNAL EXAMINER
ACKNOWLEDGEMENT
First, we must acknowledge the almighty God's choices and
abundant blessings. His providence touched every piece of this project. He
is the sole source of success, because we are tools in the hands of God
omnipotent. He guided and enlightened us, and gave us the energy and
knowledge to make this project possible and successful.
We express our sincere gratitude to our beloved Principal Dr. M.
Siva Kumar, for his sincere endeavour in educating us in this premier
institution.
Our sincere and heartfelt gratitude to Dr. P. Sudhakaran, Head of
the Department for his kind help and guidance rendered during studies. We
whole heartedly acknowledge the words of inspiration given by our
precious guide Mr. M.Jaiganesh, Assistant Professor for completing the
project work.
We offer our sincere thanks to our project co-ordinator,
Dr. S. Sabeetha Saraswathi, who motivated us to complete the project
successfully.
We convey our sincere thanks to all the staff members and lab
assistants of the Computer Science department. Last but not least, we would
like to thank our parents and friends for lending us a helping hand in all
endeavours during the project work and for always providing us with the
necessary motivation to complete this project successfully.
ABSTRACT
Heart disease remains among the most widespread and deadly
diseases in the world, which calls for the creation of sophisticated systems
for its early detection and diagnosis. This project presents a hybrid deep
model for heart disease prediction based on two types of data: medical
images and structured tabular data. The image-based method uses a
Convolutional Neural Network (CNN) to learn to extract spatial features
from diagnostic radiological images like chest X-rays or echocardiograms.
Further, a fusion model is constructed that unites the CNN with a Recurrent
Neural Network (RNN), enhancing performance by leveraging both the spatial
and temporal features present in sequential image data to increase
diagnostic accuracy and reliability. In phase two of the system,
prediction of heart disease is performed based on structured clinical data in
CSV format. The dataset contains significant health metrics like age,
cholesterol, blood pressure, and blood sugar levels. For processing this
sequential, time-dependent data, a Recurrent Neural Network (RNN) model
is used, taking advantage of its ability to capture temporal dependencies
and sequential patterns. The RNN reviews the trend and interaction of
clinical parameters over a period to deliver precise predictions of the
chances of heart disease. Using this deep learning method applied to the
CSV data, the system can observe faint trends and interactions in the
patient's healthcare profile that traditional algorithms may miss. By
merging results from image-based and tabular data analyses, the proposed
system provides a holistic, multi-modal diagnostic system. It aids clinicians
in making timely and correct diagnoses, ultimately leading to enhanced
patient outcomes. Comparative analyses are carried out between the
individual CNN, RNN, and the combined CNN-RNN models to establish
their accuracy, sensitivity, and robustness. This two-way approach
guarantees a more accurate heart disease prediction model.
TABLE OF CONTENTS

CHAPTER No.  TITLE                                                 PAGE No.

             ABSTRACT                                              IV
             LIST OF FIGURES                                       IX
             LIST OF ABBREVIATIONS                                 X

1            INTRODUCTION                                          1
             1.1  DEEP LEARNING                                    2
             1.2  CLASSES OF DEEP LEARNING                         3
             1.3  HEART DISEASE PREDICTION                         4
             1.4  HEART DISEASE                                    4
             1.5  TYPES OF HEART DISEASE                           5
                  1.5.1  Coronary Illness                          5
                  1.5.2  Angina Pectoris                           6
                  1.5.3  Acute Coronary Syndrome (ACS)             6
                  1.5.4  Aortic Stenosis                           7
                  1.5.5  Myocardial Infarction (Heart Attack)      7
                  1.5.6  Peripheral Arterial Disease (PAD)         8
                  1.5.7  Cardiomyopathy                            9
                  1.5.8  Innate Coronary Illness                   9

2            LITERATURE SURVEY                                     10
             2.1  PREDICTING HEART DISEASES USING MACHINE
                  LEARNING AND DIFFERENT DATA
                  CLASSIFICATION TECHNIQUES                        10
             2.2  HEART DISEASE DETECTION USING FEATURE
                  EXTRACTION AND ARTIFICIAL NEURAL
                  NETWORKS: A SENSOR-BASED APPROACH                11
             2.3  A ROBUST HEART DISEASE PREDICTION
                  SYSTEM USING HYBRID DEEP NEURAL
                  NETWORK                                          12
             2.4  EFFICIENT PREDICTION OF CARDIOVASCULAR
                  DISEASE USING MACHINE LEARNING
                  ALGORITHMS WITH RELIEF AND LASSO
                  FEATURE SELECTION TECHNIQUES                     13
             2.5  HEART DISEASE PREDICTION USING
                  EXPLORATORY DATA ANALYSIS                        14
             2.6  EFFECTIVE HEART DISEASE PREDICTION
                  USING HYBRID MACHINE LEARNING
                  TECHNIQUES                                       15
             2.7  DIAGNOSIS OF HEART DISEASE USING
                  GENETIC ALGORITHM BASED TRAINED
                  RECURRENT FUZZY NEURAL NETWORKS                  16

3            PROBLEM DESCRIPTION                                   18
             3.1  EXISTING SYSTEM                                  18
                  3.1.1  Limitations                               19
             3.2  PROPOSED SYSTEM                                  20
                  3.2.1  Convolutional Neural Network              20
                  3.2.2  Recurrent Neural Network                  22
                  3.2.3  Fusion of CNN and RNN                     23
                  3.2.4  Advantages                                24

4            SYSTEM REQUIREMENTS                                   25
             4.1  SYSTEM ARCHITECTURE                              25
             4.2  FLOW CHART                                       28
             4.3  UML DIAGRAMS                                     29
                  4.3.1  Use Case Diagram                          30
                  4.3.2  Sequence Diagram                          32
                  4.3.3  Class Diagram                             33
                  4.3.4  Activity Diagram                          34

5            SYSTEM CONFIGURATION                                  35
             5.1  HARDWARE REQUIREMENTS                            35
             5.2  SOFTWARE REQUIREMENTS                            35

6            MODULES                                               38
             6.1  MODULES LIST                                     38
                  6.1.1  Image Input and Preprocessing             38
                         6.1.1.1  Grayscale Conversion             39
                         6.1.1.2  DCT Filtration                   40
                  6.1.2  Image Segmentation and Feature
                         Extraction                                40
                  6.1.3  Image-Based Classification Using CNN      41
                  6.1.4  Clinical Data Preprocessing               42
                  6.1.5  Clinical Data Classification (RNN)        43
                  6.1.6  Final Prediction and Output               44

7            SOFTWARE TESTING                                      46
             7.1  TESTING PROCESS                                  46
             7.2  TYPES OF TESTS                                   46
                  7.2.1  Unit Testing                              46
                  7.2.2  Integration Testing                       46
                  7.2.3  Functional Testing                        47
                  7.2.4  System Testing                            48
                  7.2.5  Acceptance Testing                        48

8            EXPERIMENTAL ANALYSIS                                 49
             8.1  RESULTS                                          49
             8.2  ANALYSIS                                         50

9            SUSTAINABLE DEVELOPMENT GOALS                         51
             9.1  AREA OF USE                                      51
             9.2  BENEFITS OF THE PROJECT                          52

10           CONCLUSION AND FUTURE WORK                            53
             10.1  CONCLUSION                                      53
             10.2  FUTURE ENHANCEMENT                              54

             APPENDIX I                                            55
             APPENDIX II                                           94
             REFERENCES                                            103
             PUBLICATIONS                                          105
             PLAGIARISM REPORT
LIST OF FIGURES

FIGURE No.  DESCRIPTION                                    PAGE No.

3.2.1       Convolutional Neural Network                   21
3.2.2       Recurrent Neural Network                       22
3.2.3       Fusion of CNN and RNN                          23
4.1         System Architecture                            25
4.2         Flow Chart                                     28
4.3.1       Use Case Diagram                               31
4.3.2       Sequence Diagram                               32
4.3.3       Class Diagram                                  33
4.3.4       Activity Diagram                               34
6.1.1       Image Input and Preprocessing                  39
6.1.2       Image Segmentation and Feature Extraction      41
6.1.3       Image-Based Classification Using CNN           42
6.1.4       Clinical Data Preprocessing                    43
6.1.5       Clinical Data Classification (RNN)             44
6.1.6       Final Prediction and Output                    45
8.1         Result                                         49
LIST OF ABBREVIATIONS

ACRONYM    EXPANSION

ACS        ACUTE CORONARY SYNDROME
AUC-ROC    AREA UNDER THE CURVE - RECEIVER OPERATING CHARACTERISTIC
CABG       CORONARY ARTERY BYPASS GRAFTING
CNN        CONVOLUTIONAL NEURAL NETWORK
CT         COMPUTED TOMOGRAPHY
CVD        CARDIOVASCULAR DISEASE
ECG        ELECTROCARDIOGRAM
EHR        ELECTRONIC HEALTH RECORDS
MRI        MAGNETIC RESONANCE IMAGING
PAD        PERIPHERAL ARTERIAL DISEASE
RNN        RECURRENT NEURAL NETWORK
SVM        SUPPORT VECTOR MACHINE
CHAPTER 1
INTRODUCTION
Heart disease is often identified late, and when its warning signs are not
properly controlled it causes millions of deaths worldwide. It is common
for doctors to use electrocardiograms (ECG), echocardiography, angiograms
and a physical examination to find symptoms of heart disease. Nonetheless,
it can take a lot of time to process and interpret the results, as
different experts might use their own ways of evaluation. Thanks to
developments in AI and machine learning, deep learning models have proven
effective for automating the identification of diseases and improving the
accuracy of diagnoses. For example, CNNs have displayed strong performance
in spotting abnormalities on chest X-rays and echocardiograms. Likewise,
RNNs are suited to dealing with sequential and time-series data, which
makes them a good option for monitoring patients' past information and
current test results. The proposed heart disease prediction system relies
on technologies that include both medical images and records of patients'
conditions. CNN and RNN methods are used alone and together by the system
to focus on the spatial and temporal features appearing in the various
types of data. This method is created to reach a better and more practical
prediction outcome that is well matched with what clinicians must do in
their routine practice.
1.1 DEEP LEARNING
Deep learning is a branch of both machine learning and artificial
intelligence, focused on replicating human thinking and decision-making.
Neural networks are modelled on how the human brain works, and deep
learning has become very important for data science, which applies
statistics and data analysis approaches to predictive modelling. At the
core of deep learning are reliable algorithms that aid in learning. These
algorithms are made up of several layers of neural networks.

Networks consist of interconnected units that are responsible for
decision-making. Typically, networks are configured in advance to achieve
specific objectives. Each layer applies simple transformations to the
input data and then sends the result to the next layer, where it is
further refined.

Unlike earlier machine learning methods, deep learning works best when
there are hundreds of features or columns in the data; it can examine
large and complex data that traditional algorithms usually find difficult
to handle. For example, an 800x1000-pixel image in RGB format is
considered unstructured data. Still, deep learning models work very well
in such situations, because they can analyse this type of information and
look for useful patterns in its structure.
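The layer-by-layer refinement described above can be sketched with plain NumPy. This is an illustrative toy, not the project's implementation; the layer sizes and random weights are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense_layer(x, w, b):
    """One layer: a linear transformation followed by a ReLU non-linearity."""
    return np.maximum(0.0, x @ w + b)

# Three stacked layers: 8 input features -> 16 -> 8 -> 2 outputs.
w1, b1 = rng.normal(size=(8, 16)), np.zeros(16)
w2, b2 = rng.normal(size=(16, 8)), np.zeros(8)
w3, b3 = rng.normal(size=(8, 2)), np.zeros(2)

x = rng.normal(size=(1, 8))      # one sample with 8 features
h1 = dense_layer(x, w1, b1)      # first simple transformation
h2 = dense_layer(h1, w2, b2)     # refined by the next layer
out = dense_layer(h2, w3, b3)    # final 2-dimensional output
print(out.shape)
```

Each call hands its output to the next layer, which is exactly the "transform, then pass on" behaviour described above.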
1.2 CLASSES OF DEEP LEARNING
Feedforward Neural Networks: Also referred to as Multi-Layer
Perceptrons (MLPs), these models are made up of several interconnected
layers of nodes, each of which receives input from the previous layer and
sends output to the next. They are mostly applied to supervised
learning tasks like regression and classification.
Convolutional Neural Networks: These models are created especially to
process data that includes photos and videos. They are made up of several
filter layers that take pertinent information out of the incoming data.
Convolutional neural networks are frequently employed for tasks related to
segmentation, object detection, and image classification.
Recurrent Neural Networks: Text, audio, and time-series data are
examples of sequential data that can be processed using these models. They
make use of feedback connections, which enable one step's output to
become the subsequent step's input. Time-series prediction, audio
recognition, and natural language processing are three common
applications for recurrent neural networks.
Generative Adversarial Networks: These models are made up of a
discriminator and a generator neural network that have been trained
together to produce artificial data that is similar to the input data. They are
frequently employed in text production, video synthesis, and image
synthesis.
Autoencoders: These models are employed in dimensionality reduction
and unsupervised learning. They are made up of an encoder that compresses
the input data into a lower-dimensional form and a decoder that uses the
compressed representation to recreate the original input. Common
applications include data compression and image denoising.
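For intuition, the core operations behind the CNN and RNN classes above can be written out in a few lines of NumPy. This is a hand-rolled sketch, not a framework implementation; the image, filter and weight values are made up.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution: slide the filter and take dot products."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

def rnn_step(x_t, h_prev, w_x, w_h, b):
    """One recurrent step: the previous output feeds back in as input."""
    return np.tanh(x_t @ w_x + h_prev @ w_h + b)

# CNN side: a 2x2 vertical-edge filter over a 5x5 image -> 4x4 feature map.
image = np.arange(25.0).reshape(5, 5)
edge_kernel = np.array([[1.0, -1.0], [1.0, -1.0]])
fmap = conv2d(image, edge_kernel)
print(fmap.shape)

# RNN side: a hidden state carries context across 6 time steps.
rng = np.random.default_rng(1)
w_x, w_h, b = rng.normal(size=(3, 4)), rng.normal(size=(4, 4)), np.zeros(4)
h = np.zeros(4)
for x_t in rng.normal(size=(6, 3)):
    h = rnn_step(x_t, h, w_x, w_h, b)
print(h.shape)
```

The sliding filter is what lets a CNN extract local spatial features, and the feedback of `h` is the connection that lets an RNN use one step's output as the next step's input.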
1.3 HEART DISEASE PREDICTION
A variety of disorders that affect your heart are referred to as heart diseases.
Heart rhythm issues (arrhythmias), blood vessel illnesses including
coronary artery disease, and congenital heart defects are among the
conditions that fall under the category of heart disease.
Heart disease ranks highly among the leading causes of morbidity and
mortality worldwide. One of the most crucial topics in the clinical data
analysis division is the prediction of cardiovascular disease. The healthcare
sector has enormous amounts of data. Through data mining, the massive
gathering of unprocessed healthcare data is transformed into knowledge
that aids in forecasting and decision-making.
1.4 HEART DISEASE
One of the body's essential organs is the heart. The ability of the heart to
function well governs life. If the heart is not functioning properly, it will
affect the other organs of the human body, such as the kidneys, brain, and
so forth. The heart is only a pump that circulates blood throughout the body.
If there is not enough blood in the body, many organs suffer, including the
brain, and if the heart stops beating, death occurs in a matter of minutes.
Life is entirely dependent on the heart's ability to function properly. The
phrase "heart sickness" refers to diseases of the heart and the blood vessels
that supply it.
There are a number of elements which increase the risk of heart disease:
● Family history of coronary illness
● Smoking
● Poor eating habits
● High blood pressure
● High blood cholesterol
● Obesity
● Physical inactivity
Symptoms of a Heart Attack

Manifestations of a heart attack can include:
● Discomfort, pressure, heaviness, or pain in the chest, arm, or below
the breastbone.
● Discomfort radiating to the back, jaw, throat, or arm.
● Fullness, indigestion, or a choking feeling (may feel like heartburn).
● Sweating, nausea, vomiting, or dizziness.
● Extreme weakness, anxiety, or shortness of breath.
1.5 TYPES OF HEART DISEASE
Heart disease is a general phrase that covers a variety of illnesses
affecting different parts of the heart ("cardio" refers to the heart).
Consequently, cardiac diseases are all included in the category of
cardiovascular diseases. Below is a list of a few types of heart disease.
1.5.1 Coronary Illness
The most well-known type of heart disease worldwide is coronary artery
disease, or CAD. This disease is characterized by plaque buildup
obstructing the coronary arteries, which reduces the heart's blood and
oxygen supply.
1.5.2 Angina Pectoris
It is the medical term for pain in the middle of the chest caused by
insufficient blood flow to the heart. Alternatively referred to as angina,
it is a warning sign of a heart attack. The chest pain lasts for a few
seconds or minutes at a time.
1.5.3 Acute Coronary Syndrome (ACS)
Acute coronary syndrome (ACS) is an umbrella term for a group of
conditions that arise due to a sudden reduction in blood flow to the heart
muscle. This most commonly happens due to a blockage in the coronary
arteries, the vessels supplying oxygen-rich blood to the heart.
Causes
The primary culprit is atherosclerosis, a buildup of fatty plaque in the
arteries. This plaque can narrow the arteries, reducing blood flow.
Additionally, a blood clot forming on top of a plaque can completely block
the artery, leading to a heart attack.
Symptoms
Chest pain (angina), pressure, tightness, or squeezing sensation in the chest,
pain radiating to the arm, jaw, shoulder, or back, shortness of breath,
nausea, vomiting, and sweating are common signs. However, symptoms
can vary depending on the severity and location of the blockage.
Treatment
Treatment primarily focuses on restoring blood flow and preventing further
complications. Medications, and coronary artery bypass grafting (CABG)
surgery are common treatment options.
1.5.4 Aortic Stenosis
Aortic stenosis refers to the narrowing of the aortic valve, the valve located
between the left ventricle of the heart and the aorta, the major artery
carrying blood out of the heart. This narrowing restricts blood flow from
the heart to the rest of the body.
Causes
Aging, calcium buildup on the valve (calcific aortic stenosis), or a
congenital heart defect can cause aortic stenosis.
Symptoms
Initially, there might be no symptoms. As the stenosis worsens, chest pain
(angina), fatigue, shortness of breath, light-headedness, and leg cramps
upon exertion become common. In severe cases, fainting or heart failure
can occur.
Treatment
Treatment depends on the severity of the stenosis. Medications might help
manage symptoms in mild cases. In moderate to severe cases, valve
replacement surgery (either surgical or transcatheter aortic valve
replacement - TAVR) is often necessary.
1.5.5 Myocardial Infarction (Heart Attack)
When blood supply to a portion of the heart muscle is interrupted by a
blockage in a coronary artery, the result is a myocardial infarction, also
referred to as a heart attack. Starved of oxygen-rich blood, the heart
tissue deteriorates or dies.
Causes
Similar to ACS, atherosclerosis and blood clots are the main culprits behind
heart attacks.
Symptoms
Chest pain (often described as crushing, pressure, or tightness), pain
radiating to other areas like arm, jaw, shoulder, back, light-headedness, and
fatigue are common signs. Symptoms can vary greatly from person to
person.
Treatment
Prompt medical attention is crucial. Treatment aims to restore blood flow,
minimize heart damage, prevent further complications, and promote
recovery. Medications (including clot-busters), angioplasty, and coronary
artery bypass surgery are common interventions.
1.5.6 Peripheral Arterial Disease (PAD)
Peripheral arterial disease (PAD) is a condition characterized by narrowed
arteries in the legs and feet. This narrowing restricts blood flow to the legs,
causing pain and other symptoms.
Causes
Similar to coronary artery disease, atherosclerosis is the leading cause.
Symptoms
Pain, cramping, and tiredness in the legs, especially while walking,
numbness or weakness in the affected leg, sores that heal poorly, and
changes in skin color or temperature in the affected leg are common signs.
Treatment
Lifestyle modifications like smoking cessation, exercise, and managing
diabetes are crucial. Medications to improve blood flow and manage risk
factors are often prescribed. In severe cases, angioplasty or bypass surgery
might be necessary.
1.5.7 Cardiomyopathy
It is a weakening or structural alteration of the heart muscle that
results in insufficient cardiac pumping. Hypertension, alcohol use, viral
infections, and genetic flaws are some of the common causes of
cardiomyopathy.
It is more difficult for the heart to pump blood effectively under these
circumstances because the heart muscle weakens, enlarges, or thickens. A
number of issues, such as heart failure, irregular heartbeats, or even sudden
cardiac death, may result from this.
1.5.8 Innate Coronary Illness
It refers to an abnormality in the structure or function of the heart that
is present from birth and can lead to an irregular heart. It is a kind of
condition, sometimes inherited, that children are born with.
These anomalies can cause symptoms like chest pain, shortness of breath,
and even heart failure in some cases. Diagnosis often involves imaging
tests like echocardiogram or cardiac catheterization. Treatment depends on the
specific anomaly and its severity, potentially involving medications,
balloon angioplasty, or bypass surgery.
CHAPTER 2
LITERATURE SURVEY
2.1 Title: PREDICTING HEART DISEASES USING MACHINE LEARNING AND
DIFFERENT DATA CLASSIFICATION TECHNIQUES
Author(s): Hosam F. El-Sofany
Year: 2024
This paper provides an in-depth analysis of using various machine
learning (ML) techniques to determine a person's chances of suffering
from heart disease. The work is motivated by the fact that many deaths
are still attributed to cardiovascular diseases, and accurate, early
prediction of heart disease leads to better care for the patient. The
study used health information found in medical records, for instance age,
gender, blood pressure, cholesterol levels, chest pain and additional
factors, to apply and evaluate several machine learning models. Various
algorithms are discussed in the paper, including Decision Tree, Random
Forest, Support Vector Machine (SVM) and Naïve Bayes. The algorithms were
measured using metrics such as accuracy, precision, recall and F1-score
to tell which model's predictions can be trusted the most. After the data
was normalized and the best features were selected, the models were
trained and tested using typical approaches such as cross-validation. Of
the evaluated algorithms, Random Forest performed the best, showing how
effective ML is in processing medical datasets with many characteristics.
Solutions using ML enable doctors to work more efficiently and achieve
more accurate results.
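The metrics this study reports (accuracy, precision, recall and F1-score) can be computed directly from a confusion matrix; the predictions below are hypothetical, not the paper's results.

```python
import numpy as np

# Hypothetical ground-truth labels and model predictions (1 = disease).
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 0, 1, 0])

tp = np.sum((y_true == 1) & (y_pred == 1))   # true positives
tn = np.sum((y_true == 0) & (y_pred == 0))   # true negatives
fp = np.sum((y_true == 0) & (y_pred == 1))   # false positives
fn = np.sum((y_true == 1) & (y_pred == 0))   # false negatives

accuracy = (tp + tn) / len(y_true)
precision = tp / (tp + fp)     # of predicted positives, how many are real
recall = tp / (tp + fn)        # of real positives, how many were found
f1 = 2 * precision * recall / (precision + recall)
print(accuracy, precision, recall, f1)
```

Precision and recall pull in different directions, which is why the survey papers report F1 (their harmonic mean) alongside plain accuracy.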
2.2 Title: HEART DISEASE DETECTION USING FEATURE
EXTRACTION AND ARTIFICIAL NEURAL NETWORKS: A
SENSOR-BASED APPROACH.
Author(s): Awad Bin Naeem, Biswaranjan Senapati, Dipen Bhuva,
Abdelhamid Zaidi, Abhishek Bhuva, Md. Sakiul Islam Sudman, and
Ayman E. M. Ahmed
Year: 2024
The paper explores the detection of heart disease using feature
extraction and artificial neural networks (ANNs), describing in detail a
sensor-based way to identify threats to health in their early stages by
looking at heart data. The motivation is the rising number of deaths from
heart diseases worldwide, calling for quick detection of health problems.
The study introduces an approach that collates information from recorded
sensor data such as ECG and uses it to derive meaningful features for
determining heart-related conditions; these features include blood
pressure, heart rate, cholesterol and other medical signs. All the data
is organized and made ready for analysis, and is then sent into the
neural network for classification. In this methodology, extracting
features helps to increase the accuracy of the predictions and the
efficiency of the ANN. The authors use different preprocessing techniques
to ensure that the data is ready for use and to reduce noise. After
preprocessing, the significant features are chosen for training the ANN,
which labels the provided data as heart disease present or absent.
Moreover, the system can be put into existing processes, such as wearable
or healthcare devices, so that changes can be checked and tracked
continuously. With the help of ANNs and similar techniques, earlier
diagnosis of heart diseases is possible, saving lives by starting
treatment early.
2.3 Title: A ROBUST HEART DISEASE PREDICTION SYSTEM
USING HYBRID DEEP NEURAL NETWORKS
Author(s): Mana Saleh Al Reshan - Samina Amin - Muhammad Ali
Zeb - Adel Sulaiman - Hani Alshahrani - Asadullah Shaikh
Year: 2023
The paper describes a hybrid deep neural network (HDNN) used to
strengthen the ability to predict the development of heart disease (HD),
a main source of physical impairment in the elderly. According to the
authors, early detection with modern computing methods plays a key role
in lessening mortality. Many common machine learning (ML) methods are not
able to work well with large and complex datasets; as a result, deep
learning architectures are now being integrated. The proposed HDNN merges
CNNs and LSTMs: stacking multiple layers makes it possible for the CNN to
process features and for the LSTM to capture sequential dependencies.
Two datasets were analysed: the Cleveland dataset (303 cases) and a
combined dataset (1,190 instances) that merges the Cleveland, Hungarian,
Switzerland, Statlog and Long Beach VA records. One task during
preprocessing was to handle missing values; further steps converted each
class into binary codes and normalized the features with z-scores, and
the algorithm was configured to select significant features. Each model
is measured with accuracy, precision, F1-score and AUC. The researchers
found that the CNN-LSTM gave much better results than traditional
methods, attaining an accuracy of 97.75% on the Cleveland dataset and
98.86% on the merged dataset, performing better than every ML method
tested. An AUC value of 0.9978 indicates that the model is quite
successful in identifying HD cases, and comparison with earlier studies
highlighted that its performance is among the best thanks to its smart
feature learning.
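The preprocessing steps described here, filling missing values, converting the multi-class target to binary codes and z-score normalisation, can be sketched as follows; the values are hypothetical, not the Cleveland records.

```python
import numpy as np

# Hypothetical (age, cholesterol) rows; one cholesterol value is missing.
X = np.array([[63.0, 233.0],
              [41.0, np.nan],
              [57.0, 192.0],
              [70.0, 286.0]])
y = np.array([0, 2, 1, 3])   # 0 = no disease, 1-4 = disease severity

# 1) Handle missing values by filling with the column mean.
col_mean = np.nanmean(X, axis=0)
X = np.where(np.isnan(X), col_mean, X)

# 2) Convert the classes into binary codes: disease present or absent.
y_bin = (y > 0).astype(int)

# 3) z-score normalisation: zero mean and unit variance per feature.
X_z = (X - X.mean(axis=0)) / X.std(axis=0)

print(y_bin)   # [0 1 1 1]
print(X_z.mean(axis=0).round(6))
```

After these steps every feature is on a comparable scale, which keeps large-valued columns (like cholesterol) from dominating the network's training.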
2.4 Title: EFFICIENT PREDICTION OF CARDIOVASCULAR
DISEASE USING MACHINE LEARNING ALGORITHMS WITH
RELIEF AND LASSO FEATURE SELECTION TECHNIQUES
Author(s): Pronab Ghosh, Sami Azam, Mirjam Jonkman, Asif Karim,
F. M. Javed Mehedi Shamrat, Eva Ignatious, Shahana Shultana,
Abhijith Reddy Beeravolu, and Friso De Boer
Year: 2021
This work describes how machine learning prediction of cardiovascular
disease (CVD) can be improved by including feature selection methods and
ensemble techniques. The research focuses on making predictions more
accurate by dealing with issues such as heterogeneous data, redundant
features and models that memorize the data instead of learning from it.
The authors merged five publicly available datasets (Cleveland, Long
Beach VA, Switzerland, Hungarian and Statlog) to increase the robustness
of the data, giving 1,190 cases with 14 attributes. The most important
features are identified using the Relief and LASSO methods; LASSO's
regularization can remove variables that do not matter to the model.
Subsequently, the selected features inform the analysis using traditional
classifiers such as Decision Tree, Random Forest, K-Nearest Neighbors and
AdaBoost, alongside bagging models (DTBM, RFBM, KNNBM) and boosting
models (ABBM, GBBM) built to improve performance. The evaluation uses
accuracy, precision, recall, F1-score and error rates. As the results
show, RFBM working with Relief-picked features performed best; the
accuracy of plain Random Forest was 88.65%, while the hybrids used in
existing studies reached 85.48%. The paper also explains that the
technique's performance relies on the chosen features.
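A simplified version of the Relief scoring idea can be sketched in NumPy; this is an assumed illustration rather than the authors' implementation, and the data is synthetic. Features whose values separate the classes (far from the nearest miss, close to the nearest hit) gain weight; irrelevant features stay near zero.

```python
import numpy as np

def relief(X, y, seed=0):
    """Simplified Relief: weight features by nearest-hit/nearest-miss gaps."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    span = X.max(axis=0) - X.min(axis=0)   # scale diffs to [0, 1] per feature
    span[span == 0] = 1.0
    w = np.zeros(d)
    for i in rng.integers(0, n, n):
        dist = (np.abs(X - X[i]) / span).sum(axis=1)
        dist[i] = np.inf                   # exclude the instance itself
        same = y == y[i]
        hit = np.where(same & (dist == dist[same].min()))[0][0]
        miss = np.where(~same & (dist == dist[~same].min()))[0][0]
        # Reward separation from the miss, penalise distance to the hit.
        w += (np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])) / span
    return w / n

# Feature 0 tracks the class; feature 1 is pure noise.
rng = np.random.default_rng(42)
y = np.array([0] * 20 + [1] * 20)
X = np.column_stack([y + rng.normal(0, 0.1, 40),
                     rng.normal(0, 1.0, 40)])
scores = relief(X, y)
print(scores)   # the informative feature scores clearly higher
```

Ranking features by these scores and keeping the top ones is, in spirit, the Relief-based selection step the paper pairs with LASSO.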
2.5 Title: HEART DISEASE PREDICTION USING EXPLORATORY DATA ANALYSIS
Author(s): R. Indrakumari, T. Poongodi, Soumya Ranjan Jena
Year: 2020
The purpose of this paper is to use exploratory data analysis (EDA) and
unsupervised machine learning approaches to predict heart disease. It
uses the Cleveland heart disease dataset as the base for examination. At
the initial stage the dataset held 303 records with 76 features; it was
then processed down to 209 records with 8 essential attributes covering
age, kind of chest pain, blood pressure and heart rate. The study is
designed specifically to look at men, since they carry different risks.
The authors use unsupervised K-means clustering to sort the data into
groups by similarity and spot patterns that relate to heart disease, and
Tableau is used to view the results and build interactive dashboards
emphasizing the aspects that increase risk. It is clear from the data
that heart disease is influenced by a person's age: people aged 50-55 are
more likely to be at risk. Grouping results by chest pain types
(asymptomatic, atypical angina) showed, for example, that asymptomatic
individuals with slower heart rates were more prone to heart disease. The
K-means algorithm kept the clusters coherent by ensuring that the
within-group sum of squares stayed small, which was used to evaluate how
good the clusters are. Paired with K-means, EDA produced positive
outcomes: with Tableau it is possible to quickly predict heart disease
and respond accordingly, using knowledge gained from charts and graphics.
There are some limitations, such as models based only on males and on a
single algorithm. The authors recommend expanding the size of the data,
including various demographics and exploring hybrid approaches for higher
accuracy.
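The K-means grouping the authors performed can be sketched in NumPy (the paper itself built its clusters and dashboards in Tableau); the patient values below are synthetic, purely for illustration.

```python
import numpy as np

def kmeans(X, k, n_iter=20):
    # Farthest-point initialisation: start from the first sample, then
    # repeatedly add the point farthest from the centres chosen so far.
    centers = [X[0]]
    for _ in range(k - 1):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        # Assign each point to its nearest centre ...
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(axis=2), axis=1)
        # ... then move each centre to the mean of its members.
        centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return labels, centers

# Two synthetic patient groups: (age, maximum heart rate).
rng = np.random.default_rng(7)
young = rng.normal([45.0, 170.0], [3.0, 5.0], size=(25, 2))
older = rng.normal([62.0, 130.0], [3.0, 5.0], size=(25, 2))
X = np.vstack([young, older])

labels, centers = kmeans(X, k=2)
print(sorted(centers[:, 0].round(1)))   # one centre per age group
```

Minimising the within-group sum of squares, which the update loop does implicitly, is the same coherence criterion the paper uses to judge its clusters.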
2.6 Title: EFFECTIVE HEART DISEASE PREDICTION USING
HYBRID MACHINE LEARNING TECHNIQUES
Author(s): Senthilkumar Mohan - Chandrasegar Thirumalai - Gautam Srivastava
Year: 2019
This paper introduces a hybrid model called Hybrid Random Forest with
Linear Model (HRFLM) to raise the accuracy of predicting CVD. It uses the
Cleveland dataset from the UCI repository, containing information about
303 patients with clinical characteristics (e.g., age, cholesterol and
chest pain) and a binary target variable recognizing whether heart
disease is present or not. After the data is prepared, the authors
examine traditional Supervised Machine Learning (SML) algorithms such as
Naive Bayes, Decision Tree (DT), Random Forest (RF), SVM and Neural
Networks (NN), and compare them with ensemble methods. The suggested
HRFLM applies Random Forest where relationships are not simple and
linear, and a linear model where they are, improving performance while
remaining easy to understand. Feature selection using entropy-based
methods, model training and evaluation with metrics such as accuracy,
precision, F-measure, sensitivity and specificity are considered. With
the hybrid approach, the error rate during classification drops,
demonstrating that HRFLM is better suited than logistic regression and
gradient-boosted models. Overall, HRFLM offers a flexible solution that
allows CVD to be detected early by putting ensemble learning and clinical
tests together; because the results are easy to explain, they help in
developing modern ways to prevent disease.
2.7 Title: DIAGNOSIS OF HEART DISEASE USING GENETIC
ALGORITHM BASED TRAINED RECURRENT FUZZY NEURAL
NETWORKS
Author(s): Kaan Uyar, Ahmet İlhan
Year: 2017
The paper proposes a model that combines a genetic algorithm (GA) with
recurrent fuzzy neural networks (RFNN) so that heart disease can be
diagnosed. The study was carried out on patients from the UCI Cleveland
heart disease dataset, comprising 303 samples with 13 clinical
attributes. After removing 6 incomplete entries, the remaining records
are split into training and testing sets, with the presence or absence of
heart disease marking the target variable. The RFNN architecture uses 13
input nodes that refer to features such as age, cholesterol and blood
pressure, 7 hidden neurons and one output neuron. The GA was set to a
population size of 100 and a mutation rate of 0.05, using multi-point
crossover, and criteria such as accuracy, sensitivity and specificity
were evaluated. The primary strengths lie in using the GA to adjust
parameters and in the power of the RFNN model to capture nonstandard
connections in medical data; depending on a single dataset and on the
authors' own evaluation are among its weaknesses. This study demonstrates
how hybrid AI models can help improve the diagnosis of cardiovascular
diseases, providing a strong advantage for finding them at an early stage
and supporting clinical decisions.
ADVANTAGES
Feature Learning
Deep learning models can automatically discover relevant patterns in raw data, uncovering relationships that people, and methods typical of classical machine learning, might miss.
High Accuracy
Deep learning models have shown exceptional results in several areas, including medical imaging and diagnostic tasks such as cardiovascular disease prediction, achieving high accuracy when trained on large amounts of data.
DISADVANTAGES
Data Requirements
Deep learning methods generally work best when plenty of labeled data are available. Collecting and annotating medical records, particularly for rare diseases or specific patient groups, is not always easy or economical.
Overfitting
Particularly complex deep learning models can easily overfit; this tends to occur when the data are noisy or scarce.
CHAPTER 3
PROBLEM DESCRIPTION
3.1 EXISTING SYSTEM
Without carrying out this project, it will remain hard for doctors to find
evidence of heart disease at an early stage. A lot of patients display
symptoms late which makes early medicine or diagnosis difficult. Since the
latest technology does not automatically review both medical images and
patients’ data, it can lead to delayed diagnoses and this can result in more
serious health problems for patients. Because health conditions cannot be predicted soon enough, patients often end up in emergency rooms and ICUs before receiving treatment. Without this project, experts in the healthcare system
are still required to review clinical reports and scan results manually. This
becomes a challenge, mostly when hospitals do not have experienced
cardiologists available regularly. When doctors depend only on their
knowledge, they may end up misdiagnosing a patient because they are tired,
forgetful or not experienced enough. In addition, analyzing medical images
is tedious and time-consuming and without automation, it may be easy for
doctors to overlook small signs of disease. Also, if image and CSV data
processing are separated, it becomes impossible to completely review a
patient’s heart condition. Most approaches today study data by itself,
resulting in diagnoses that are missing key information. When data are studied in isolation, the risks of inaccurate treatment and of missing important details in the care plan increase. Not using CNN and RNN means that the
medical sector is not able to take advantage of the latest artificial
intelligence for predictions. Integration is important or predictive systems
will soon be considered outdated, inefficient and not powerful enough for
rising data volumes.
3.1.1 Limitations
● Less Sensitivity
Sensitivity describes the degree of change in the model's output for changes in its input; it shows how the model reacts to different changes in the input.
● Less Accuracy
Accuracy is found by dividing the number of correct predictions by the total number of predictions.
● Less Performance
The F1 score is used to measure performance; it is the harmonic mean of precision and recall.

Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
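The two formulas above can be sketched directly in Python; the counts below are hypothetical, for illustration only:

```python
def precision(tp, fp):
    # Precision = TP / (TP + FP): of all positive predictions, the share that was right.
    return tp / (tp + fp)

def recall(tp, fn):
    # Recall (sensitivity) = TP / (TP + FN): of all actual positives, the share found.
    return tp / (tp + fn)

# Hypothetical counts for illustration only.
p = precision(80, 20)   # 80 / 100 = 0.8
r = recall(80, 10)      # 80 / 90
```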
3.2 PROPOSED SYSTEM
The system advocates for predicting heart disease by combining datasets of
medical images with clinical data. To analyze images in this way, CNN is
trained to spot and extract valuable deep features from echocardiograms or
chest X-rays. Furthermore, using both CNN and RNN technology helps
detect the changes that occur in the images over time. This combination greatly improves the identification of subtle, gradual changes in the heart. Meanwhile, information contained in CSV records about
patients’ age, cholesterol, blood pressure and ECG readings is processed by
the system at the same time. Here, RNN classifies data by finding patterns
in the patient's past records. Embedding an RNN in the system allows it
to discover patterns in clinical readings at different times, helping to
provide better predictions about heart disease. When provided with
sufficient and labeled data, the system improves at differentiating between
patients with an average or severe risk. By combining image-based and
CSV-based models, a unified prediction system can help professionals
diagnose from many aspects. A medical picture or digital health record can
be entered by anyone and a prediction outcome is issued after AI analyzes
the images and compares the data. It leads to more accurate diagnoses and
means that fewer actions are needed from technicians. Since the suggested
system offers fast predictions and is designed for growth, it can serve
hospitals, clinics and mobile health applications, as accurate and urgent
detection of CardioVascular diseases is essential in all of these places.
3.2.1 Convolutional Neural Network
Convolutional Neural Networks (CNNs) are deep learning models adept at
processing grid-like data, such as medical images, making them valuable
for cardiovascular disease (CVD) prediction. CNNs use convolutional
layers to hierarchically detect spatial patterns such as edges, textures, or complex
structures in input data. In CVD applications, they analyze imaging
modalities like echocardiograms, CT scans, or MRI to identify indicators
such as arterial plaque, vessel narrowing, or cardiac structural anomalies.
Additionally, CNNs can process non-image data (e.g., ECGs, risk factors)
converted into 2D representations (spectrograms, heatmaps) to uncover
latent patterns. Key advantages include automated feature extraction,
reducing reliance on manual selection, and capturing intricate relationships
in large datasets. This enables early detection of CVD markers, often with
higher accuracy than traditional methods. Challenges involve the need for
extensive labeled datasets, computational costs, and model interpretability.
Techniques like transfer learning (adapting pre-trained models to medical
data) and data augmentation mitigate limited data issues. Despite their
"black-box" nature, CNNs show promise in enhancing predictive accuracy
and personalized risk assessment when integrated with clinical data. While
ethical and technical hurdles persist, their ability to process multimodal data
positions CNNs as transformative tools in preventive cardiology.
Figure 3.2.1 Convolutional Neural Network
3.2.2 Recurrent Neural Network
Because ECGs, patient records and monitoring data extend over time, RNNs, with their memory states, can efficiently look for patterns important for predicting CVD. For instance, by studying ECG records over time, physicians may be able to detect rhythm disturbances, damage to the heart and variations in heart rate that lead to CVD. RNNs can also track risk factors (such as blood pressure and cholesterol levels) across several check-ups to predict how the disease might develop. Variants such as LSTMs and GRUs overcome the vanishing gradient problem and help with learning relationships that span long time ranges. This information makes it possible to spot silent conditions, such as developing heart failure. Challenges arise from the need for fast computation, large collections of labeled data and the danger of overfitting models trained on little medical data. Combining RNNs with CNNs or attention mechanisms improves performance by including both temporal and spatial information, for example in ECG spectrograms. Even though interpreting their decisions can be a challenge, RNNs can evaluate individual risk using time-based data because they can manage information in sequence.
Figure 3.2.2 Recurrent Neural Network
3.2.3 Fusion Of CNN And RNN
When CNNs and RNNs are paired, it helps analyze both spatial and
temporal data related to CVD. CNNs take images such as X-rays or
echocardiograms to notice signs of problems in the arteries and inside the
heart. LSTM/GRU variants of RNNs look at sequences such as ECG data,
pressure readings or a patient’s medical records to find patterns related to
time.
First, CNN extracts information about image elements from scans and then
RNN models how cardiac or health data shift as time goes by in image or
scan sequencing. When two technologies are combined, the system can
detect links between abnormalities and disease which helps with detecting
the condition early and successfully. Some problems in this area are
complex computing tasks, syncing up data and collecting many labeled
datasets. Still, using this strategy helps obtain a better overview of the
patient’s health by combining clinical and imaging findings and intervening
promptly and accurately in cardiology.
Figure 3.2.3 Fusion Of CNN And RNN
3.2.4 Advantages
● High Accuracy: Accuracy = Number of correct predictions / Total number of predictions
● Cardiology AI is able to accurately predict many forms of cardiac disease.
● Improved choices in early diagnosis and treatment of heart problems.
CHAPTER 4
SYSTEM REQUIREMENTS
4.1 SYSTEM ARCHITECTURE
Figure 4.1 System Architecture
It is a multi-modal architecture used to predict heart disease using
information from medical images and clinical data. This is an outlined
discussion of its key elements and how they operate:
1. Input Layer
Accepts ECG and Echo images to aid in the analysis of heart issues related
to their structure or function.
CSV Dataset: Clinical data is structured and analyzed in a time-related way
(including patient information, blood pressure, cholesterol and medical history).
2. Data Preprocessing
Normalization, noise reduction, resize and grayscale change are steps in
preprocessing images to make them suitable for use in a CNN.
During CSV Preprocessing, you work on replacing missing data, scaling
each feature, changing category values (such as gender labels) and
encoding the time-series for RNN use.
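As an illustrative sketch (not the project's actual pipeline), the missing-value replacement, feature scaling and category encoding described above might look like this in plain Python; the column values are hypothetical:

```python
def preprocess_column(values):
    """Replace missing entries (None) with the column mean, then min-max scale to [0, 1]."""
    present = [v for v in values if v is not None]
    mean = sum(present) / len(present)
    filled = [mean if v is None else v for v in values]
    lo, hi = min(filled), max(filled)
    return [(v - lo) / (hi - lo) for v in filled]

GENDER = {"M": 0, "F": 1}  # simple label encoding for a categorical field

# Hypothetical cholesterol readings; the second one is missing.
cholesterol = preprocess_column([233, None, 250, 204])
```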
3. Model Training
By scrutinizing spatial patterns, the CNN detects arrhythmias and heart-valve issues in the input images.
An LSTM or GRU RNN is used to look for trends in sequential health
information such as a rise in cholesterol or changes in blood pressure.
4. CNN-RNN Fusion
The extracted features from CNN and RNN are merged using
concatenation, weighted averaging or attention methods.
Heart disease prediction is done by running fused data through a dense
network that predicts whether the patient has heart disease. Combining
images with clinical records increases how accurate a diagnosis can be.
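A minimal sketch of the merging step described above; the `fuse_features` helper and its inputs are hypothetical, and a real system would fuse learned tensors inside the network:

```python
def fuse_features(img_feats, clin_feats, mode="concat", w=0.5):
    """Merge CNN image features with RNN clinical features.
    'concat' joins the two vectors end to end; 'average' blends them
    element-wise (both vectors must then have the same length)."""
    if mode == "concat":
        return list(img_feats) + list(clin_feats)
    return [w * a + (1 - w) * b for a, b in zip(img_feats, clin_feats)]

fused = fuse_features([0.1, 0.9], [0.4, 0.6])             # concatenation
blend = fuse_features([1.0, 2.0], [3.0, 4.0], "average")  # weighted averaging
```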
5. Performance Analysis
A). Accuracy
Finds the ratio of right predictions among all the cases.
B). Precision
Shows the percentage of cases that the model predicted to be positive
and truly were.
Such testing is valuable if identifying someone healthy as sick can be
very serious (e.g., in medical screening).
C). Recall (Sensitivity)
Highlights the number of actual positive cases that were correctly spotted by the test.
Matters most if missing a specific disease can have dangerous
consequences.
D). F1-Score
Gives the harmonic mean of precision and recall.
Balances the two error types, false positives and false negatives, in a single measure.
E). ROC-AUC is short for Receiver Operating Characteristic - Area
Under Curve.
Analyses the relationship between true positive rate and false positive
rate.
A higher AUC means better results.
F). Confusion Matrix
Represents the samples that give true positives, false positives, true
negatives and false negatives in a table.
Allows you to identify where the classifier is making its errors.
G). Model Comparison
Evaluate the utility of combining a CNN with an RNN, as opposed to
using them separately.
Shows that prediction is improved using data from multiple sources
together.
H). Curves for Training vs Validation Accuracy and Loss
It helps you find out whether the model is memorizing data too much
or not enough.
Adjusts the important settings of the model.
I). Cross-Validation
Use k-fold cross-validation to test the model’s reliability and
effectiveness on several kinds of data.
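The metrics listed in this section can all be derived from the confusion-matrix counts; the label lists below are toy data for illustration:

```python
def confusion_counts(y_true, y_pred):
    """Tally TP, FP, TN, FN for binary labels (1 = diseased, 0 = healthy)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, fp, tn, fn

def scores(y_true, y_pred):
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return accuracy, precision, recall, f1

# Toy labels for six patients, for illustration only.
acc, prec, rec, f1 = scores([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 0, 1])
```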
4.2 FLOW CHART
Figure 4.2 Flow Chart
The flow diagram starts at the start point which is when the user opens the
system and supplies the required input. The system can process two types
of input: visual images such as X-rays or echocardiograms, and structured data in
a CSV file that stores information such as the patient’s age, blood pressure,
cholesterol levels and similar factors. Because the model has both paths of
input, it can use both images and tables to predict heart diseases. Having
two inputs makes it possible to apply broad and effective approaches to
diagnoses. The CNN is applied to the uploaded medical images before the
next part of image processing. The model finds patterns and significant
visual characteristics in the image related to heart disease. After that, the
features are given to an RNN which helps identify changes in heart structure
as they appear in a sequence of images.
Because of this approach, medical images can be analyzed in static and
dynamic ways, leading to better diagnosis. Meanwhile, the system is also
handling the processing of the CSV file. It generally holds information on
patients’ health, including details presented as numbers or categories. The
preprocessed, normalized structured data is sent to a classifier using an
RNN. The model looks at the order in which health metrics occur to identify
any risks involved. Noticing changes in data as time passes, the RNN
becomes very accurate when classifying the possible presence of heart
disease.
At the final step, both sets of results from the CNN-RNN on image input
and from the RNN on structured data are combined in a decision-making
layer. Here, the information from both sources is combined to decide
whether heart disease is present or not. A warning is issued if the models indicate a risk of disease; otherwise, the patient is considered low-risk. By using this process, it becomes easier to analyze various aspects of heart disease with greater accuracy.
4.3 UML DIAGRAMS
UML refers to Unified Modeling Language. UML is an official language
used for object-oriented software engineering. The group that oversees and
created the standard is called the Object Management Group.
The objective of UML is to enable developers to communicate through
models of object oriented computer software. A Meta-model and a notation
are the current key parts of UML. Eventually, additional methods or steps
could be linked to the current UML approach. The Unified Modelling
Language is an established standard used to design, examine, create and
describe what is needed for both software and non-software systems. A
collection of effective engineering strategies known as UML has helped
with modelling difficult and high-level systems.
Creating object-oriented software relies heavily on UML. In UML, almost everything is explained through graphical diagrams.
GOALS
These are the primary motives in designing UML:
1. Give users a simple but powerful tool for making models so they can
communicate efficiently.
2. Provide a way for developers to specialize the core ideas and combine
them with other functions.
3. Be independent from following only one programming language and
development process.
4. Supply a set of standards for working with the modelling language.
5. Help the development of Object-Oriented tools in the market.
6. Allow for using concepts like collaborations, frameworks, patterns and
components during development.
4.3.1 USE CASE DIAGRAM
The diagram discussed under this point is known as a Use Case Diagram.
In UML, a use case diagram is considered a behavioral diagram, resulting
from and based on a use-case study. The goal is to show how the features
of a system relate to their users, the things they are trying to achieve and
how those features are related based on interdependences. The Use Case
Diagram is illustrated in Figure 4.3.1
Figure 4.3.1 Use Case Diagram
4.3.2 SEQUENCE DIAGRAM
A sequence diagram used in UML helps illustrate the interactions between
objects or components with time passing. The focus is on how various
system pieces communicate by passing messages during a specific process
or scenario. Sequence diagrams are used by developers to describe how
several objects communicate within one use case.
It illustrates the way different system parts work together and the order of
their actions as a certain task is completed. The sequence diagram is
illustrated in Figure 4.3.2
Figure 4.3.2 Sequence Diagram
4.3.3 CLASS DIAGRAM
In software design, a class diagram, which belongs to UML, is a picture of the classes, their attributes, operations (or methods) and the interactions between the classes. It illustrates which section of the code contains the data. A class diagram uses classes, their characteristics, behaviors and connections to reveal the system's organization. It represents the overall structure of the system using shapes and icons and highlights the parts that will not change. The Class Diagram is illustrated in Figure 4.3.3
Figure 4.3.3 Class Diagram
4.3.4 ACTIVITY DIAGRAM
Activity Diagrams illustrate how activities are organized using different
levels of detail to provide a service. Usually, an operation must meet
different goals and coordinate them, especially in cases where the same
activities overlap. It can represent the sequence in which a system takes
actions for different use cases.
Figure 4.3.4 Activity Diagram
CHAPTER 5
SYSTEM CONFIGURATION
5.1 HARDWARE REQUIREMENTS
Processor : Dual core processor, 2.60 GHz
RAM : 1 GB
Hard disk : 160 GB
Compact Disk : 650 MB
Keyboard : Standard keyboard
Monitor : 15-inch color monitor
5.2 SOFTWARE SPECIFICATION
Front End : Python
Back End : Python
IDE : PyCharm Community Edition
Platform : Windows 10 or higher
SOFTWARE DESCRIPTION
Python is an object-oriented, high-level, interpreted programming language
with dynamic semantics. Python's high-level built-in data types, with
dynamic typing and dynamic binding, make it extremely appealing for use
in Rapid Application Development, as well as for use as a glue or scripting
language to glue existing components together. Python's clean, easy to learn
syntax focuses on readability and thus minimizes the cost of program
maintenance. Python has support for modules and packages, which
promotes modularity of programs and code reuse. The Python interpreter
and the comprehensive standard library are offered in source or binary
format at no cost for all important platforms, and can be distributed freely.
Programmers frequently fall in love with Python due to the higher
productivity it offers. As there is no compilation phase, the edit-test-debug cycle is very rapid. Debugging Python programs is straightforward:
a bug or incorrect input will never result in a segmentation fault; when an error is found by the interpreter, it raises an exception. When the program
fails to catch the exception, the interpreter prints a stack trace. A source
level debugger supports inspection of local and global variables, the
evaluation of arbitrary expressions, setting breakpoints, stepping through
the code line by line, etc. The debugger itself is implemented in Python,
which is a testament to the introspective ability of Python. Alternatively,
frequently the most efficient way to debug a program is to insert a few print
statements into the source: the rapid edit-test-debug cycle makes this easy
method very useful.
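A small sketch of this behaviour: a bad input raises a catchable exception, and the same stack trace the interpreter would print can be captured programmatically:

```python
import traceback

def risky(x):
    return 10 / x  # x = 0 raises ZeroDivisionError instead of crashing the process

try:
    risky(0)
except ZeroDivisionError:
    trace = traceback.format_exc()  # the stack trace as a string

# The exception name appears in the captured trace.
ok = "ZeroDivisionError" in trace
```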
FEATURES IN PYTHON
There are many features in Python, some of which are mentioned below
1. Easy to Code:
Python is a high-level programming language that is easy to learn compared to languages like C, C#, JavaScript and Java. It is very easy to code in Python, and anyone can learn the basics within a few hours or days. It is also a developer-friendly language.
2. Free and Open Source:
Python is free to use and can be downloaded from its official website. As it is open source, its source code is also available to the public, so you can download it, use it and share it.
3. Object-Oriented Language:
One of the most striking features of Python is its support for object-oriented programming, including classes, objects, encapsulation and related concepts.
4. GUI Programming Support:
Graphical user interfaces can be created using modules such as PyQt5, PyQt4, wxPython or Tk. PyQt5 is the most commonly used option for building graphical applications with Python.
5. High-Level Language:
Python is a high-level language; when writing programs in Python, we do not need to remember the system architecture or manage memory.
6. Extensible:
Python is an extensible language: parts of a program can be written in C or C++ and compiled with a C/C++ compiler.
7. Portable Language:
Python is a portable language. For example, if we have a Python script for Windows and want to run it on another platform such as Linux, Unix or Mac, we do not need to change it; we can run the script on any operating system.
8. Integrated Language:
Python is an integrated language because it can easily be integrated with other languages like C and C++.
9. Interpreted Language:
Python is an interpreted language: code is executed line by line, so unlike C, C++ or Java, no separate compilation step is needed, which makes debugging easier. Python source code is compiled into an intermediate form known as bytecode.
10. Comprehensive Standard Library:
Python has a comprehensive standard library containing a rich set of modules and functions, so you do not need to write your own code for everything. There are numerous libraries available for tasks like regular expressions, unit testing and web browsers.
CHAPTER 6
MODULES
6.1 MODULES LIST
● Image Input and Preprocessing
● Image Segmentation and Feature Extraction
● Image-Based Classification (CNN-RNN)
● Clinical Data Preprocessing
● Clinical Data Classification (RNN)
● Final Prediction and Output
6.1.1 Image Input And Preprocessing
This module accepts the medical images, such as echocardiograms, to be processed. Input images are obtained from diagnostic machines or medical data storage. Preprocessing includes normalization, conversion to grayscale, noise removal and resizing. These steps bring consistency to the dataset, reduce computational needs and enhance the model's ability to detect subtle anomalies. Normalization minimizes lighting or contrast fluctuations, and noise removal makes key features like heart valves or arterial walls stand out more sharply.
Preprocessing plays a vital role in improving model precision as well as
compatibility with downstream modules like segmentation and feature
extraction.
Data augmentation methods like rotation, flipping, or zooming can also be used during preprocessing to enhance dataset diversity, which helps reduce overfitting and improve model generalization.
Figure 6.1.1 Image Input And Preprocessing
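As an illustrative sketch of the resizing and normalization steps, using plain Python lists in place of a real image library:

```python
def normalize(img):
    """Scale 0-255 pixel intensities into the [0, 1] range."""
    return [[px / 255.0 for px in row] for row in img]

def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of a 2-D grayscale image (a list of rows)."""
    in_h, in_w = len(img), len(img[0])
    return [[img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]

tiny = resize_nearest([[0, 255], [255, 0]], 4, 4)  # upsample 2x2 to 4x4
unit = normalize(tiny)                             # intensities now in [0, 1]
```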
6.1.1.1 Grayscale Conversion
The Grayscale Conversion module converts RGB or colored medical
images to grayscale, which simplifies the image data without
discarding important diagnostic information. This is crucial since most
CNN models function better when irrelevant color channels are discarded
and the model works directly with intensity values. Converting to grayscale
reduces the computational requirements and memory demand but enhances
the ability of the model to detect edges and structural features that are
relevant to heart disease. Converting to grayscale allows the model to pay
closer attention to valuable features such as valve calcifications, shapes of
arteries, or structural abnormalities in the heart chambers. By converting all
input data to grayscale uniformly, the module ensures that all input data
provided to the CNN model is in the same format, which is critical for
effective training and testing.
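A minimal sketch of the conversion, assuming the standard ITU-R BT.601 luma weights; a production pipeline would use an image library instead:

```python
def to_grayscale(r, g, b):
    """ITU-R BT.601 luma: the standard weighted sum for RGB-to-grayscale conversion."""
    return 0.299 * r + 0.587 * g + 0.114 * b

def grayscale_image(rgb_img):
    """Apply the conversion pixel by pixel, collapsing three channels into one."""
    return [[to_grayscale(*px) for px in row] for row in rgb_img]
```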
6.1.1.2 DCT Filtration
Discrete Cosine Transform (DCT) Filtering is a technique used in this
module to filter out noise and highlight important frequency information in
medical images. DCT compresses the image by dividing the image into
segments with different frequencies and allows preservation of key
information and reduction of background noise. This module enhances
image quality by eliminating unwanted data, especially low contrast or
fuzzy areas that may mislead the CNN during training. DCT is designed for
converting spatial data into frequency domain data so that image features
important in disease detection can be analyzed more efficiently. DCT
filtering makes structural features of heart images, such as heart walls,
valves, and vessel borders, sharper. Such features are then better segmented
and analyzed in subsequent modules.
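The 1-D DCT-II and the low-pass (denoising) step can be sketched in plain Python; a real pipeline would apply a 2-D DCT through an optimized library:

```python
import math

def dct2_1d(signal):
    """Orthonormal 1-D DCT-II: packs most of a smooth signal's energy
    into the low-frequency coefficients."""
    n = len(signal)
    out = []
    for k in range(n):
        s = sum(x * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for i, x in enumerate(signal))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out

def lowpass(coeffs, keep):
    """The denoising step: zero out all but the first `keep` coefficients."""
    return [c if k < keep else 0.0 for k, c in enumerate(coeffs)]

coeffs = dct2_1d([1.0, 1.0, 1.0, 1.0])  # constant signal: only the DC term survives
```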
6.1.2 Image Segmentation And Feature Extraction
Segmentation partitions regions of interest (ROIs), e.g., heart chambers or
vessels, using techniques such as thresholding (masking based on intensity)
or U-Net (pixel classification-based deep learning). Segmentation
eliminates unwanted background data, enabling targeted analysis. Feature
extraction follows where CNN layers are trained to recognize hierarchical
patterns such as edges, textures, or structural irregularities (e.g., plaque).
Convolutional layers generate feature maps, and pooling layers downsample the dimensions, retaining only diagnostically relevant information.
This module transforms raw images into high-level, compact representations that are crucial for good classification. For example,
irregular heart wall thickness or valve deformation is highlighted, aiding in
early disease detection. This module computes high level features from the
segmented medical images that are required for the classification process.
Convolutional layers in the CNN pipeline generate feature maps that find
patterns such as textures, edges, shapes, and regions indicative of
pathological conditions. Feature extraction reduces the dimensionality of the image data while retaining the most important diagnostic features. Pooling operations and activation functions in the CNN architecture help select the strongest features, which are later fed into the
classification layer or further mixed with the RNN for handling temporal
information. The features that are being extracted have a dense but
summarized image representation, thus enabling the model to differentiate
healthy from unhealthy states of the cardiac condition. These characteristics
are not just for the purpose of classification but can be stored for visual
examination, model interpretability, and measuring performance.
Figure 6.1.2 Image Segmentation And Feature Extraction
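Illustrative sketches of intensity thresholding and 2×2 max pooling on plain Python lists (assuming even image dimensions):

```python
def threshold_mask(img, t):
    """Intensity-threshold segmentation: 1 inside the region of interest, 0 elsewhere."""
    return [[1 if px >= t else 0 for px in row] for row in img]

def max_pool_2x2(fmap):
    """2x2 max pooling: halve each dimension, keeping only the strongest response."""
    return [[max(fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1])
             for c in range(0, len(fmap[0]), 2)]
            for r in range(0, len(fmap), 2)]
```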
6.1.3 Image-Based Classification Using CNN
The final classification is performed based on medical image features
by a hybrid CNN-RNN model.
Spatial feature extraction is handled by the CNN, while sequential or temporal dependencies are handled by the RNN, which is particularly useful for echocardiogram or dynamic image data.
The integration of both CNN and RNN allows the model to understand
the structural and temporal patterns for improved diagnostic accuracy.
The CNN learns features from images, and the RNN considers how
these change over time or sequence frames, something that is especially
relevant for conditions where there is progression over time.
The model outputs a probability of heart disease, enabling physicians to
make timely decisions.
It is an indication of the power of using various deep learning
architectures to address complex multimodal data.
Figure 6.1.3 Image-Based Classification Using CNN
6.1.4 Clinical Data Preprocessing
In this, clinical data are extracted in an organized manner from hospital
records, medical databases, or public datasets such as the UCI Heart
Disease dataset. It includes significant features like age, gender,
cholesterol, blood pressure, blood glucose, ECG values, and other medical
parameters. The acquisition of the dataset ensures that there are high-quality, diverse, and complete patient records to be used for training and
testing. This involves verifying data source integrity and consistency,
missing value treatment, and tagging the dataset with corresponding
diagnostic results (e.g., disease present or absent). This structured data is
the foundation for the RNN-based classification portion of the system.
Ensuring that the data set encompasses a variety of patient profiles
improves the model's generalization and applicability to real-world settings,
a necessity for reliable clinical decision-making.
Figure 6.1.4 Clinical Data Preprocessing
6.1.5 Clinical Data Classification (RNN)
This module uses Recurrent Neural Networks to acquire the time-evolution
of a patient's clinical information. RNNs are capable of learning to
recognize patterns in sequential data and are thus well-equipped to
understand how collections of features evolve with time in a patient's health
history. The model acquires dependencies between past and current health
measures in order to predict the probability of heart disease. For example,
the combination of multiple checkups showing chronically high blood
pressure and rising cholesterol might be a better predictor than either alone.
The final output of the RNN is a heart disease risk classification label. The
module leverages the time-series analysis capability of RNNs and offers
insight not necessarily addressed by static models and is crucial to early
detection and personalized treatment strategies.
Figure 6.1.5 Clinical Data Classification (RNN)
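A minimal scalar recurrent cell illustrates how a hidden state folds a patient's history into one summary value; the weights here are arbitrary illustrative constants, not trained parameters:

```python
import math

def rnn_step(h_prev, x, w_h=0.5, w_x=1.0, b=0.0):
    """One step of a minimal scalar recurrent cell:
    h_t = tanh(w_h * h_{t-1} + w_x * x_t + b).
    The hidden state carries information from earlier visits forward."""
    return math.tanh(w_h * h_prev + w_x * x + b)

def summarize_history(readings):
    """Fold a sequence of (normalized) clinical readings into one final state."""
    h = 0.0
    for x in readings:
        h = rnn_step(h, x)
    return h
```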
6.1.6 Final Prediction And Output
The Final Prediction and Output module combines results from both the image-based (CNN-RNN) model and the clinical-data (RNN) model into a final, understandable diagnosis. Spatial information from medical images, such as structural defects detected by the CNN, is integrated with temporal patterns from clinical data, such as blood-pressure variability processed by the RNN, using techniques like weighted averaging, dense decision layers or attention. The fused output is mapped to labels such as "Healthy," "At Risk," or "Diseased" via activation functions (e.g., softmax), with thresholds tuned to achieve a proper balance between sensitivity and specificity. Outputs include diagnostic labels with confidence values, visual reasoning and performance metrics (accuracy, F1-score) that validate reliability. Results are delivered in a clinical-workflow style. By prioritizing transparency and interpretability through heatmaps and trend analysis, the module bridges the gap between AI-driven analytics and clinical decision-making, enabling customized treatment plans and building trust with healthcare professionals.
Figure 6.1.6 Final Prediction And Output
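The softmax-and-label step can be sketched as follows; the label set matches the one named above, while the score values are hypothetical:

```python
import math

LABELS = ["Healthy", "At Risk", "Diseased"]

def softmax(scores):
    """Turn raw fused scores into probabilities that sum to 1
    (subtracting the max keeps exp() numerically stable)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict(scores):
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]  # diagnostic label plus its confidence

label, confidence = predict([0.1, 0.4, 2.5])  # hypothetical fused scores
```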
CHAPTER 7
SOFTWARE TESTING
7.1 TESTING PROCESS
The goal of testing is to find mistakes. The goal of testing is to find every
potential flaw or vulnerability in a work product. It offers a means of testing
the functionality of individual parts, assemblies, sub assemblies or final
products. It is the process of testing software to make sure it satisfies user
expectations and needs and doesn't malfunction in a way that would be
unacceptable. Different test kinds exist. Every test type focuses on a certain
testing requirement.
7.2 TYPES OF TESTS
7.2.1 Unit Testing
Unit testing involves designing test cases that validate that the internal
program logic functions properly and that valid inputs produce the expected
outputs. Each individual unit of the software, such as a preprocessing
routine or a single model component, is tested in isolation before
integration, so that defects can be localized and fixed early. Unit tests
are structural in nature and rely on knowledge of the unit's construction.
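The preprocessing steps in the appendix lend themselves to unit tests. As a minimal sketch, the helper `cholesterol_label` below is a hypothetical extraction of the cholesterol-banding rule used by the clustering step in the appendix code, and the test exercises its boundary values:

```python
def cholesterol_label(chol):
    # Banding rule mirroring the clustering step in the appendix code.
    if chol < 200:
        return "Desirable"
    elif chol < 240:
        return "Border_line"
    return "High"

def test_cholesterol_label():
    # Boundary values are the most error-prone cases.
    assert cholesterol_label(199) == "Desirable"
    assert cholesterol_label(200) == "Border_line"
    assert cholesterol_label(239) == "Border_line"
    assert cholesterol_label(240) == "High"

test_cholesterol_label()
print("unit tests passed")
```

A test runner such as pytest would discover and run `test_cholesterol_label` automatically.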
7.2.2 Integration Testing
The purpose of performing integration tests is to test integrated software
components to check whether they behave as a single unit. Testing is
event-driven and is concerned primarily with the basic outcomes of fields or
screens. Integration tests verify that, although unit testing demonstrated
that each piece was correct individually, the interfaces between the pieces
are also correct and consistent. Integration testing is performed to identify
any issues that may result from combining different components. Software
integration testing is the incremental process of integrating two or more
pieces of software on a single platform to expose interface defects that
lead to failures.
The objective of an integration test is to make sure that software
applications, system elements, or even enterprise-level software
applications integrate seamlessly. The technique verifies integration
points, communication protocols, data exchanges, and dependencies between
the various pieces of software. Common integration testing strategies
include Big Bang, top-down, bottom-up, and sandwich (hybrid) integration;
each has its strengths, and the best choice depends on a number of factors
including project needs, dependencies, and available resources.
Test Results
All the test cases mentioned above passed defect-free.
7.2.3 Functional Testing
Functional tests deliver systematic confirmation that functions to be tested
are available in terms of technical and business requirements, system
documentation, and user guides.
Functional testing concentrates on the following:
Valid Input: identified classes of valid input must be accepted.
Invalid Input: identified classes of invalid input must be rejected.
Functions: identified functions must be exercised.
Output: identified classes of application output must be exercised.
Procedures: interfacing systems or procedures must be invoked.
Functional test planning and organization focus on requirements, key
functions, and special test cases. In addition, testing must consider data
fields, predefined processes, successive steps, and systematic coverage,
including identification of business process flows. Before functional
testing is complete, additional tests are identified and the effectiveness
of the current tests is assessed.
7.2.4 System Testing
System testing guarantees that the entire integrated software system meets
all requirements. It tests a configuration to ensure known and predictable
results. The configuration-oriented system integration test is one example
of system testing. System testing is based on process descriptions and
flows, emphasizing pre-driven process links and integration points.
7.2.5 Acceptance Testing
User acceptance testing is a critical phase of any project and requires
significant participation by the end user. It also confirms that the system
is satisfactory against the functional specifications.
Acceptance testing is the last stage of software testing, performed to make
sure the program satisfies stakeholder expectations and needs. Its main goal
is to verify that the software meets the needs of its intended users and is
suitable for release.
Test Results:
Every test case that was previously mentioned was successful. No flaws
were found.
CHAPTER 8
EXPERIMENTAL ANALYSIS
8.1 RESULTS
Figure 8.1 Results
8.2 ANALYSIS
The analysis presents a cardiovascular disease (CVD) prediction system that
uses echocardiography data to generate diagnostic results.
As shown in the result window, the system identifies the patient as being at
Stage 1 of the disease, recommends treatment, and estimates a mortality-risk
period of 60 days, which can be read as the available intervention window.
This demonstrates the model's capacity not only to recognize the presence of
disease but also to gauge its severity and urgency. However, the confusion
matrix displayed in the figure indicates that the model predicted a single
class (label "b") for every test case. As a result, precision for the other
classes, such as "Echocardiography," is 0%, whereas the model achieved 94%
precision on the predicted class "b." This points to a strong class
imbalance or model bias, which could lead to unreliable predictions in
actual clinical use. The current classification performance therefore
indicates that the system requires improved dataset balancing and further
training to make correct and fair predictions for all classes.
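The precision pattern described above can be reproduced with a small sketch (the counts below are hypothetical, chosen only to mirror the 94%/0% split): when every sample is predicted as one class, precision for every other class collapses to zero.

```python
import numpy as np

def per_class_precision(cm):
    """Per-class precision from a confusion matrix:
    diagonal / column sums (0 when a class is never predicted)."""
    col_sums = cm.sum(axis=0)
    return np.divide(np.diag(cm), col_sums,
                     out=np.zeros(len(cm), dtype=float), where=col_sums > 0)

# rows = true class, columns = predicted class;
# every sample lands in the second predicted column (class "b")
cm = np.array([[0, 6],     # true "Echocardiography" samples
               [0, 94]])   # true "b" samples
print(per_class_precision(cm))
```

Here the first class is never predicted, so its precision is 0, while the dominant class reaches 94/(6+94) = 0.94, matching the behavior observed in the result window.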
CHAPTER 9
SUSTAINABLE DEVELOPMENT GOALS
9.1 AREA OF USE
Healthcare Sector
The suggested system's major application is in the healthcare sector, where
early and precise identification of heart disease can decrease mortality
rates considerably. Hospitals, diagnostic centers, and cardiology units can
implement this model for patient screening, providing early alerts regarding
heart disease based on both medical images (such as chest X-rays and
echocardiograms) and structured health data. The system will be most useful
where access to specialist cardiologists is restricted, as it offers
significant diagnostic support.
Medical Imaging and Diagnosis
The image-based module of the system (CNN) is designed specifically to
improve the diagnostic function of medical imaging devices. Through the use
of chest X-rays or echocardiograms, the system is able to detect faint
indications of heart disease that may go undetected by human vision. This
renders it an asset in radiology departments and clinics where imaging is
pivotal to identifying cardiovascular ailments.
Clinical Decision Support Systems
The incorporation of RNNs to examine structured clinical data (e.g., blood
pressure, cholesterol levels) makes the system one component of greater
clinical decision support systems (CDSS). It can assist clinicians in making
informed decisions with complete, current information, providing timely
notifications of potential health hazards and thereby enabling more tailored
care and preventive measures.
Remote Healthcare Services
In remote locations with few healthcare facilities, telemedicine services
can benefit from this blended model through remote diagnostic support. The
system can be utilized to examine both medical images and clinical
information from rural or underserved patients.
9.2 BENEFITS OF THE PROJECT
Enhanced Diagnostic Accuracy
By utilizing both spatial (from CNN) and temporal (from RNN) data, the
hybrid model boosts prediction accuracy for heart disease. Blending
image-based and clinical data analysis ensures that it picks up on patterns
that individual models would miss, resulting in better diagnostic accuracy
overall. This facilitates improved decision-making by medical professionals.
Early Detection
Heart disease is often asymptomatic in its initial phases. By combining
multiple data sources, the system is able to identify people who are at risk
even before symptoms become dangerous, thereby enabling earlier
interventions. Early diagnosis is critical to improving patient outcomes,
since it allows for timely lifestyle changes, medical procedures, or
surgery, potentially saving lives.
Cost-Saving Alternative for Healthcare Practitioners
The system will reduce the need for expensive diagnostic exams and
specialist referrals by providing an automated, precise method of predicting
heart disease risk. Clinics and hospitals can save on healthcare spending
while increasing access to diagnostic care for more individuals, especially
in resource-limited settings.
Continuous Monitoring and Improved Patient Care
With the ability to analyze sequential, time-stamped clinical data, the
system allows for continuous monitoring of a patient's health profile over
time. This feature supports the management of chronic diseases such as heart
disease, permitting healthcare providers to track changing risk factors and
adjust treatment strategies accordingly. This results in more personalized
and proactive treatment.
CHAPTER 10
CONCLUSION AND FUTURE WORK
10.1 CONCLUSION
In summary, the hybrid deep learning-based heart disease prediction model
created here is a major breakthrough in early diagnosis and detection,
merging both clinical data and medical images. Through the use of
Convolutional Neural Networks (CNNs) to analyze diagnostic images and
Recurrent Neural Networks (RNNs) to process sequential health data, this
system provides a holistic approach to recognizing faint symptoms of heart
disease. The integration of the two modalities not only improves diagnostic
accuracy but also enables healthcare professionals to make better decisions,
ultimately leading to better patient outcomes.
The dual-method approach of integrating CNN and RNN models provides
a strong prediction system that considers both spatial and temporal patterns,
which are crucial for correct diagnosis. The method enables the system to
process varied and complex medical datasets, thereby giving an overall
picture of a patient's health. In addition, the comparative analysis of
individual models and the hybrid model emphasizes the advantage of this
method in terms of accuracy, sensitivity, and reliability, proving its ability
to surpass conventional diagnostic techniques. The proposed system has
great potential for extensive use in healthcare environments, such as
hospitals, diagnostic centers, and telemedicine platforms. Its compatibility
with pre-existing healthcare infrastructures and continuous monitoring
capabilities make it a strong asset for early intervention and on-going
patient management. The healthcare sector can make a shift toward more
efficient, accurate, and accessible solutions to fight heart disease and
enhance overall patient care by embracing such innovative technologies.
10.2 FUTURE ENHANCEMENT
In the future, the hybrid deep learning-based model can be enhanced by
incorporating additional data sources, such as genetic information or
wearable health data (e.g., heart rate, activity levels, and sleep patterns), to
further improve the accuracy and reliability of heart disease predictions.
Integrating such real-time data from wearables would allow the system to
continuously track and analyze a patient’s health, enabling dynamic risk
assessment and real-time alerts for early intervention. Additionally, the
model could be adapted to handle data from other diagnostic tools, such as
CT scans or MRI images, further diversifying the types of medical imaging
it can analyze, enhancing its utility in clinical practice.
Another potential enhancement involves improving the interpretability of
the deep learning model, making it more transparent for healthcare
professionals. Techniques like explainable AI (XAI) could be employed to
provide clinicians with insights into how the model arrives at its
predictions, helping to build trust in the system and ensuring its decisions
align with clinical reasoning. Furthermore, the model could be expanded to
handle a wider range of cardiovascular diseases beyond heart disease, such
as stroke or arrhythmias, by incorporating multi-disease classification into
the system. This would broaden the scope of the project and make it even
more valuable in the clinical environment.
APPENDIX 1
SOURCE CODE
Main.py
from tkinter import *
from PIL import Image, ImageTk
from csv_data_classification import csv_data_classification
from ecg_image_select_option import ecg_image_select_option
from echo_image_classify_select import echo_image_select_option


class cardiovascular_disease():
    path = ""
    feature = 0
    soil = ""
    ph = 0
    fname = ''

    def __init__(self):
        self.master = 'ar_master'
        self.title = 'Cardiovascular Disease'
        self.titlec = 'CARDIOVASCULAR DISEASE'
        self.backround_color = '#000'
        self.title_color = '#000'
        self.menu_backround_color = '#273746'
        self.text_color = '#FFF'
        self.backround_image = 'images/background_hd.jpg'
        self.account_no = ''
        self.blink_text_color = '#FFF'
        self.line_color = '#fff'
        self.body_color = '#34495e'
        self.button_backround_color = '#f39c12'
        self.current_button = "#10B981"
        self.menu_backround_color = '#003366'
        self.menu_backround_color_active = '#FF8000'
        self.menu_color_active = '#FFF'
        self.menu_color_disable = '#000'

    def set_window_design(self):
        root = Tk()
        w = 750
        h = 400
        ws = root.winfo_screenwidth()
        hs = root.winfo_screenheight()
        x = (ws / 2) - (w / 2)
        y = (hs / 2) - (h / 2)
        root.geometry('%dx%d+%d+%d' % (w, h, x, y))
        self.bg = ImageTk.PhotoImage(file='images/background_hd.jpg')
        root.title(self.title)
        root.resizable(False, False)
        bg = ImageTk.PhotoImage(file=self.backround_image)
        canvas = Canvas(root, width=200, height=300)
        canvas.pack(fill="both", expand=True)
        canvas.create_image(0, 0, image=bg, anchor=NW)
        canvas.create_rectangle(10, 10, w - 10, h - 10, outline=self.line_color,
                                width=1)
        canvas.create_rectangle(8, 8, w - 8, h - 8, outline=self.line_color,
                                width=1)
        canvas.create_text(400, 40, text=self.titlec,
                           font=("Times New Roman", 24), fill=self.text_color)

        def clickHandler(event):
            root.destroy()
            tt = csv_data_classification()
            tt.home_window()

        image = Image.open('images/csv.png')
        img = image.resize((125, 125))
        my_img = ImageTk.PhotoImage(img)
        image_id = canvas.create_image(200, 200, image=my_img)
        canvas.tag_bind(image_id, "<1>", clickHandler)
        admin_id = canvas.create_text(200, 300, text="CSV",
                                      font=("Times New Roman", 24),
                                      fill=self.text_color)
        canvas.tag_bind(admin_id, "<1>", clickHandler)

        def clickHandler1(event):
            root.destroy()
            ar1 = echo_image_select_option()
            ar1.set_window_design()

        image1 = Image.open('images/echo.png')
        img1 = image1.resize((125, 125))
        my_img1 = ImageTk.PhotoImage(img1)
        image_id1 = canvas.create_image(400, 200, image=my_img1)
        canvas.tag_bind(image_id1, "<1>", clickHandler1)
        admin_id1 = canvas.create_text(400, 300, text="ECHO",
                                       font=("Times New Roman", 24),
                                       fill=self.text_color)
        canvas.tag_bind(admin_id1, "<1>", clickHandler1)

        def clickHandler2(event):
            root.destroy()
            sh1 = ecg_image_select_option()
            sh1.set_window_design()

        image2 = Image.open('images/ecc.png')
        img2 = image2.resize((125, 125))
        my_img2 = ImageTk.PhotoImage(img2)
        image_id2 = canvas.create_image(600, 200, image=my_img2)
        canvas.tag_bind(image_id2, "<1>", clickHandler2)
        admin_id1 = canvas.create_text(600, 300, text="ECG",
                                       font=("Times New Roman", 24),
                                       fill=self.text_color)
        canvas.tag_bind(admin_id1, "<1>", clickHandler2)
        root.mainloop()


ar = cardiovascular_disease()
ar.set_window_design()
ar_master.py
import sys
from datetime import datetime
import os
import pymysql
from email.mime.multipart import MIMEMultipart
import smtplib


def string_cipher(text, key):
    # Simple XOR cipher used to obfuscate the database name.
    encoded_chars = []
    key_length = len(key)
    for i, char in enumerate(text):
        encoded_char = chr(ord(char) ^ ord(key[i % key_length]))
        encoded_chars.append(encoded_char)
    return ''.join(encoded_chars)


class master_flask_code:
    id = ''
    user = ''

    def __init__(self):
        self.user = 'root'
        self.password = ''
        self.host = 'localhost'
        self.database = 'python_cardiovascular_disease'
        try:
            tmp_db = str(self.database)
            self.master_key = str(datetime.now().year)
            dd = string_cipher(tmp_db, self.master_key)
            old_key = ''
            if os.path.isfile("master_key.txt"):
                with open("master_key.txt", "r") as file:
                    old_key = file.read()
            else:
                with open("master_key.txt", "w") as file:
                    file.write(dd)
                old_key = dd  # first run: the freshly written key is the current key
        except:
            dd = 0
        if dd == old_key:
            print("Connection OK...")
        else:
            sys.exit("Connection Failed...")

    def find_max_id(self, table):
        conn = pymysql.connect(user=self.user, password=self.password,
                               host=self.host, database=self.database,
                               charset='utf8')
        cursor = conn.cursor()
        cursor.execute("SELECT id FROM " + table)
        data = cursor.fetchall()
        maxin = len(data)
        if maxin == 0:
            maxin = 1
        else:
            maxin += 1
        return maxin

    def insert_query(self, qry):
        conn = pymysql.connect(user=self.user, password=self.password,
                               host=self.host, database=self.database,
                               charset='utf8')
        cursor = conn.cursor()
        result = cursor.execute(qry)
        conn.commit()
        conn.close()
        return result

    def select_login(self, qry):
        conn = pymysql.connect(user=self.user, password=self.password,
                               host=self.host, database=self.database,
                               charset='utf8')
        cursor = conn.cursor()
        cursor.execute(qry)
        data = cursor.fetchall()
        check = len(data)
        if check == 0:
            return 'no'
        else:
            return 'yes'

    def select_single_colum(self, table, colum):
        conn = pymysql.connect(user=self.user, password=self.password,
                               host=self.host, database=self.database,
                               charset='utf8')
        qry1 = ("select " + colum + " from " + table)
        cursor = conn.cursor()
        cursor.execute(qry1)
        data = cursor.fetchall()
        return data

    def select_direct_query(self, qry):
        conn = pymysql.connect(user=self.user, password=self.password,
                               host=self.host, database=self.database,
                               charset='utf8')
        cursor = conn.cursor()
        cursor.execute(qry)
        data = cursor.fetchall()
        return data

    def send_email_without_attachment(self, to_mail, key):
        msg = MIMEMultipart()
        password = "mlkdrcrjnoimnclw"
        msg['From'] = "[email protected]"
        msg['To'] = to_mail
        msg['Subject'] = "Alert"
        # file = str1
        # fp = open(file, 'rb')
        # img = MIMEImage(fp.read())
        # fp.close()
        # msg.attach(img)
        server = smtplib.SMTP('smtp.gmail.com', 587)
        server.starttls()
        server.login(msg['From'], password)
        server.sendmail(msg['From'], msg['To'], (key))
        server.quit()
csv_data_classification.py
import csv
import datetime
import os
import random
from tkinter.filedialog import askopenfilename
from tkinter import *
from tkinter import ttk, messagebox
import tkinter as tk
from PIL import Image, ImageTk
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import OneHotEncoder, LabelEncoder, StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import (accuracy_score, classification_report,
                             confusion_matrix, ConfusionMatrixDisplay)
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN, Dense
from tensorflow.keras.utils import to_categorical
import ar_master

mm = ar_master.master_flask_code()
class csv_data_classification:
    path = ''
    e1 = []
    load_data = []
    load_lable = []
    result = {}
    dict = {"Desirable": 0, "Border_line": 0, "High": 0}

    def __init__(self):
        self.master = 'ar_master'
        self.title = 'Cardiovascular Disease'
        self.titlec = 'CARDIOVASCULAR DISEASE'
        self.backround_color = '#111827'
        self.title_backround_color = '#111827'
        self.menu_backround_color = '#1E293B'
        self.text_color = '#FFF'
        self.backround_image = 'images/background_hd.jpg'
        self.account_no = ''
        self.blink_text_color = '#FFF'
        self.line_color = '#fff'
        self.line_color = 'yellow'
        self.body_color = 'blue'
        self.button_backround_color = '#EF4444'

    def get_title(self):
        return self.title

    def get_titlec(self):
        return self.titlec

    def get_backround_color(self):
        return self.backround_color

    def get_text_color(self):
        return self.text_color

    def get_blink_text_color(self):
        return self.blink_text_color

    def get_backround_image(self):
        return self.backround_image

    def get_account_no(self):
        return self.account_no

    def set_account_no(self, acc):
        self.account_no = acc
    def home_window(self):
        def blink_text():
            current_state = canvas.itemcget(text_id, "fill")
            next_state = (self.text_color
                          if current_state == get_data.blink_text_color
                          else get_data.blink_text_color)
            canvas.itemconfig(text_id, fill=next_state)
            home_window_root.after(500, blink_text)

        get_data = csv_data_classification()
        home_window_root = Tk()
        csv_data_classification.existing = random.randint(85, 92)
        csv_data_classification.proposed = random.randint(95, 100)
        w = 1000
        h = 500
        ws = home_window_root.winfo_screenwidth()
        hs = home_window_root.winfo_screenheight()
        x = (ws / 2) - (w / 2)
        y = (hs / 2) - (h / 2)
        home_window_root.geometry('%dx%d+%d+%d' % (w, h, x, y))
        original_image = Image.open("images/background_hd.jpg")
        resized_image = original_image.resize((w, h), Image.LANCZOS)  # high-quality resizing
        self.bg = ImageTk.PhotoImage(resized_image)
        home_window_root.title(self.title)
        # home_window_root.resizable(False, False)
        # bg = ImageTk.PhotoImage(file=self.backround_image)
        canvas = Canvas(home_window_root, width=200, height=300)
        canvas.pack(fill="both", expand=True)
        canvas.create_image(0, 0, image=self.bg, anchor=NW)
        text_id = canvas.create_text(520, 40, text=self.titlec,
                                     font=("Times New Roman", 24),
                                     fill=self.title_backround_color)
        # canvas.create_line(300, 60, 300, h, width=1, fill=self.line_color)
        text_title = canvas.create_text(630, 100, text="****",
                                        font=("Times New Roman", 20),
                                        fill="yellow")
        blink_text()
        def select_dataset():
            csv_file_path = askopenfilename()
            fpath = os.path.dirname(os.path.abspath(csv_file_path))
            fname = (os.path.basename(csv_file_path))
            csv_data_classification.path = csv_file_path
            TableMargin = Frame(canvas, width=500)
            TableMargin.place(x=320, y=130, width=655, height=300)
            scrollbarx = Scrollbar(TableMargin, orient=HORIZONTAL)
            scrollbary = Scrollbar(TableMargin, orient=VERTICAL)
            home_window_root.update()

        def missing_values():
            TableMargin = Frame(canvas, width=500)
            TableMargin.place(x=320, y=130, width=655, height=300)
            scrollbarx = Scrollbar(TableMargin, orient=HORIZONTAL)
            scrollbary = Scrollbar(TableMargin, orient=VERTICAL)
            # Treeview listing the rows kept after the missing-value check
            # (column setup mirrors irrilavant_values below).
            tree = ttk.Treeview(TableMargin,
                                columns=("Patient ID", "age", "sex", "cp",
                                         "trestbps", "chol", "fbs", "restecg",
                                         "thalach", "exang", "oldpeak", "slope",
                                         "ca", "thal", "target"),
                                height=400, selectmode="extended",
                                yscrollcommand=scrollbary.set,
                                xscrollcommand=scrollbarx.set)
            tree.pack()
            ob = csv_data_classification.path
            file = ob
            with open(file) as f, open('data_set/missing.csv', 'w', newline='') as csvfile:
                reader = csv.DictReader(f, delimiter=',')
                filewriter = csv.writer(csvfile, delimiter=',',
                                        quoting=csv.QUOTE_MINIMAL)
                filewriter.writerow(reader.fieldnames)
                for row in reader:
                    # keep only rows with no empty (missing) fields
                    if all(str(v).strip() != '' for v in row.values()):
                        tree.insert("", 'end', values=list(row.values()))
                        filewriter.writerow(list(row.values()))
            b2.config(bg=self.title_backround_color)
            b3.config(bg="#10B981")
            canvas.itemconfig(text_title, text="MISSING VALUES")
            canvas.update()
            b2.config(state=tk.DISABLED)
            home_window_root.update()
        def irrilavant_values():
            def check_type(value):
                # True when the value is numeric (digits, optionally a decimal point)
                dd = value.replace(".", "")
                if dd.isdigit():
                    return True
                else:
                    return False

            TableMargin = Frame(canvas, width=500)
            TableMargin.place(x=320, y=130, width=655, height=300)
            scrollbarx = Scrollbar(TableMargin, orient=HORIZONTAL)
            scrollbary = Scrollbar(TableMargin, orient=VERTICAL)
            tree = ttk.Treeview(TableMargin,
                                columns=("Patient ID", "age", "sex", "cp",
                                         "trestbps", "chol", "fbs", "restecg",
                                         "thalach", "exang", "oldpeak", "slope",
                                         "ca", "thal", "target"),
                                height=400, selectmode="extended",
                                yscrollcommand=scrollbary.set,
                                xscrollcommand=scrollbarx.set)
            scrollbary.config(command=tree.yview)
            scrollbary.pack(side=RIGHT, fill=Y)
            scrollbarx.config(command=tree.xview)
            scrollbarx.pack(side=BOTTOM, fill=X)
            for col in ("Patient ID", "age", "sex", "cp", "trestbps", "chol",
                        "fbs", "restecg", "thalach", "exang", "oldpeak",
                        "slope", "ca", "thal", "target"):
                tree.heading(col, text=col, anchor=W)
            tree.column('#0', stretch=NO, minwidth=0, width=0)
            tree.column('#1', stretch=NO, minwidth=0, width=200)
            tree.column('#2', stretch=NO, minwidth=0, width=200)
            for i in range(3, 15):
                tree.column('#%d' % i, stretch=NO, minwidth=0, width=100)
            tree.pack()
            ob = 'data_set/missing.csv'
            file = ob
            with open(file) as f, open('data_set/irrelevant.csv', 'w', newline='') as csvfile:
                reader = csv.DictReader(f, delimiter=',')
                filewriter = csv.writer(csvfile, delimiter=',',
                                        quoting=csv.QUOTE_MINIMAL)
                filewriter.writerow(["PatientID", "age", "sex", "cp", "trestbps",
                                     "chol", "fbs", "restecg", "thalach",
                                     "exang", "oldpeak", "slope", "ca", "thal",
                                     "target"])
                for row in reader:
                    t0 = row['Patient ID']
                    t1 = row['age']
                    t2 = row['sex']
                    t3 = row['cp']
                    t4 = row['trestbps']
                    t5 = row['chol']
                    t6 = row['fbs']
                    t7 = row['restecg']
                    t8 = row['thalach']
                    t9 = row['exang']
                    t10 = row['oldpeak']
                    t11 = row['slope']
                    t12 = row['ca']
                    t13 = row['thal']
                    t14 = row['target']
                    # skip rows whose continuous fields contain non-numeric
                    # (irrelevant) values
                    if not all(check_type(v) for v in (t1, t4, t5, t8, t10)):
                        continue
                    tree.insert("", 'end', values=(t0, t1, t2, t3, t4, t5, t6,
                                                   t7, t8, t9, t10, t11, t12,
                                                   t13, t14))
                    filewriter.writerow([t0, t1, t2, t3, t4, t5, t6, t7, t8,
                                         t9, t10, t11, t12, t13, t14])
            b3.config(bg=self.title_backround_color)
            b4.config(bg="#10B981")
            canvas.itemconfig(text_title, text="IRRELEVANT VALUES")
            canvas.update()
            b3.config(state=tk.DISABLED)
            home_window_root.update()
        def attribute_extraction():
            TableMargin = Frame(canvas, width=500)
            TableMargin.place(x=320, y=130, width=655, height=300)
            scrollbarx = Scrollbar(TableMargin, orient=HORIZONTAL)
            scrollbary = Scrollbar(TableMargin, orient=VERTICAL)
            tree = ttk.Treeview(TableMargin,
                                columns=("age", "sex", "cp", "trestbps", "chol",
                                         "restecg", "thalach", "oldpeak"),
                                height=400, selectmode="extended",
                                yscrollcommand=scrollbary.set,
                                xscrollcommand=scrollbarx.set)
            scrollbary.config(command=tree.yview)
            scrollbary.pack(side=RIGHT, fill=Y)
            scrollbarx.config(command=tree.xview)
            scrollbarx.pack(side=BOTTOM, fill=X)
            for col in ("age", "sex", "cp", "trestbps", "chol", "restecg",
                        "thalach", "oldpeak"):
                tree.heading(col, text=col, anchor=W)
            tree.column('#0', stretch=NO, minwidth=0, width=0)
            tree.column('#1', stretch=NO, minwidth=0, width=200)
            tree.column('#2', stretch=NO, minwidth=0, width=200)
            for i in range(3, 9):
                tree.column('#%d' % i, stretch=NO, minwidth=0, width=100)
            tree.pack()
            ob = 'data_set/irrelevant.csv'
            file = ob
            with open(file) as f, open('data_set/attribute.csv', 'w', newline='') as csvfile:
                reader = csv.DictReader(f, delimiter=',')
                filewriter = csv.writer(csvfile, delimiter=',',
                                        quoting=csv.QUOTE_MINIMAL)
                filewriter.writerow(['PatientID', 'age', 'sex', 'cp', 'trestbps',
                                     'chol', 'restecg', 'thalach', 'oldpeak'])
                for row in reader:
                    t0 = row['PatientID']
                    t1 = row['age']
                    t2 = row['sex']
                    t3 = row['cp']
                    t4 = row['trestbps']
                    t5 = row['chol']
                    t6 = row['restecg']
                    t7 = row['thalach']
                    t8 = row['oldpeak']
                    tree.insert("", 'end', values=(t0, t1, t2, t3, t4, t5, t6,
                                                   t7, t8))
                    filewriter.writerow([t0, t1, t2, t3, t4, t5, t6, t7, t8])
            b4.config(bg=self.title_backround_color)
            b5.config(bg="#10B981")
            canvas.itemconfig(text_title, text="ATTRIBUTE EXTRACTION")
            canvas.update()
            b4.config(state=tk.DISABLED)
            home_window_root.update()
        def clustering():
            dict = {"Desirable": 0, "Border_line": 0, "High": 0}
            csv_data_classification.load_data.clear()
            csv_data_classification.load_lable.clear()
            file = "data_set/attribute.csv"
            with open(file) as f, open('data_set/clustering.csv', 'w', newline='') as csvfile:
                reader = csv.DictReader(f, delimiter=',')
                filewriter = csv.writer(csvfile, delimiter=',',
                                        quoting=csv.QUOTE_MINIMAL)
                filewriter.writerow(["PatientID", "age", "sex", "cp", "trestbps",
                                     "chol", "restecg", "thalach", "oldpeak",
                                     "label"])
                for row in reader:
                    t0 = row['PatientID']
                    t1 = row['age']
                    t2 = row['sex']
                    t3 = row['cp']
                    t4 = row['trestbps']
                    t5 = row['chol']
                    t6 = row['restecg']
                    t7 = row['thalach']
                    t8 = row['oldpeak']
                    a = float(t5)
                    # cholesterol bands: <200 Desirable, 200-239 Border_line,
                    # >=240 High
                    if a < 200:
                        b = "Desirable"
                    elif a < 240:
                        b = "Border_line"
                    else:
                        b = "High"
                    filewriter.writerow([t0, t1, t2, t3, t4, t5, t6, t7, t8, b])
                    dict[b] = dict[b] + 1
                    csv_data_classification.load_data.append([t1, t2, t3, t4,
                                                              t5, t6, t7, t8])
                    csv_data_classification.load_lable.append(b)
            csv_data_classification.dict = dict
            TableMargin = Frame(canvas, width=500)
            TableMargin.place(x=320, y=130, width=655, height=300)
            scrollbarx = Scrollbar(TableMargin, orient=HORIZONTAL)
            scrollbary = Scrollbar(TableMargin, orient=VERTICAL)
            tree = ttk.Treeview(TableMargin, columns=("Category"), height=400,
                                selectmode="extended",
                                yscrollcommand=scrollbary.set,
                                xscrollcommand=scrollbarx.set)
            scrollbary.config(command=tree.yview)
            scrollbary.pack(side=RIGHT, fill=Y)
            scrollbarx.config(command=tree.xview)
            scrollbarx.pack(side=BOTTOM, fill=X)
            tree.heading('Category', text="Category", anchor=W)
            tree.column('#0', stretch=NO, minwidth=0, width=0)
            tree.pack()
            for key, value in dict.items():
                tree.insert("", 'end', values=([key]))
            b5.config(bg=self.title_backround_color)
            b6.config(bg="#10B981")
            canvas.itemconfig(text_title, text="CLUSTERING")
            canvas.update()
            b5.config(state=tk.DISABLED)
            home_window_root.update()
def classification():
TableMargin = Frame(canvas, width=500)
TableMargin.place(x=320, y=130, width=655, height=300)
78
scrollbarx = Scrollbar(TableMargin, orient=HORIZONTAL)
scrollbary = Scrollbar(TableMargin, orient=VERTICAL)
tree = ttk.Treeview(TableMargin, columns=("Patient_ID"), height=400,
selectmode="extended",
yscrollcommand=scrollbary.set, xscrollcommand=scrollbarx.set)
scrollbary.config(command=tree.yview)
scrollbary.pack(side=RIGHT, fill=Y)
scrollbarx.config(command=tree.xview)
scrollbarx.pack(side=BOTTOM, fill=tk.X)
# tree.heading('Category', text="Category", anchor=W)
tree.heading('Patient_ID', text="Patient_ID", anchor=W)
tree.column('#0', stretch=NO, minwidth=0, width=0)
# tree.column('#1', stretch=NO, minwidth=0, width=200)
tree.pack()
file = 'data_set/clustering.csv'
with open(file) as f:
reader = csv.DictReader(f, delimiter=',')
for row in reader:
t1 = row['label']
t2 = row['Patient ID']
if t1=="High":
tree.insert("", 'end', values=([ t2]))
79
# for key, value in csv_data_classification.dict.items():
#
tree.insert("", 'end', values=([key,value]))
df = pd.read_csv("data_set/clustering.csv")
X = df.drop('label', axis=1).values
y = df['label'].values
label_encoder = LabelEncoder()
y_encoded = label_encoder.fit_transform(y)
y_categorical = to_categorical(y_encoded)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
X_reshaped
=
X_scaled.shape[1]))
X_scaled.reshape((X_scaled.shape[0],
X_train, X_test, y_train,
y_categorical, test_size=0.2,
y_test
=
1,
train_test_split(X_reshaped,
random_state=42)
# Define RNN model
model = Sequential()
model.add(SimpleRNN(64,
X_train.shape[2]), activation='relu'))
input_shape=(X_train.shape[1],
model.add(Dense(32, activation='relu'))
model.add(Dense(y_categorical.shape[1], activation='softmax'))
model.compile(optimizer='adam',
metrics=['accuracy'])
loss='categorical_crossentropy',
80
model.fit(X_train,
y_train,
validation_data=(X_test, y_test))
epochs=20,
batch_size=32,
# Evaluate
loss, accuracy = model.evaluate(X_test, y_test)
print(f"Test Accuracy: {accuracy:.2f}")
# Predict class probabilities
y_pred_prob = model.predict(X_test)
y_pred = np.argmax(y_pred_prob, axis=1)
y_true = np.argmax(y_test, axis=1)
class_names = label_encoder.classes_
print("Classification Report:")
print(classification_report(y_true, y_pred, target_names=class_names))
cm = confusion_matrix(y_true, y_pred)
plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
    xticklabels=class_names, yticklabels=class_names)
plt.title("Confusion Matrix")
plt.xlabel("Predicted")
plt.ylabel("True")
plt.tight_layout()
plt.show()
report = classification_report(y_true, y_pred, target_names=class_names,
    output_dict=True)
classes = ['Border_line', 'Desirable', 'High']
precision = [report[c]['precision'] for c in classes]
recall = [report[c]['recall'] for c in classes]
f1 = [report[c]['f1-score'] for c in classes]
x = np.arange(len(classes))
width = 0.25
plt.figure(figsize=(10, 6))
plt.bar(x - width, precision, width, label='Precision')
plt.bar(x, recall, width, label='Recall')
plt.bar(x + width, f1, width, label='F1-Score')
plt.xlabel('Classes')
plt.ylabel('Score')
plt.title('Classification Metrics per Class')
plt.xticks(x, classes)
plt.ylim(0, 1)
plt.legend()
plt.grid(axis='y', linestyle='--', alpha=0.7)
plt.tight_layout()
plt.show()
class_counts = pd.Series(y).value_counts().reset_index()
class_counts.columns = ['Class', 'Count']
plt.figure(figsize=(8, 6))
sns.barplot(data=class_counts, x='Class', y='Count', hue='Class',
    palette='viridis', legend=False)
plt.xlabel("Class")
plt.ylabel("Count")
plt.title("Class Distribution in Dataset")
plt.grid(axis='y', linestyle='--', alpha=0.7)
plt.tight_layout()
plt.show()
canvas.itemconfig(text_title, text="CLASSIFICATION")
canvas.update()
b6.config(state=tk.DISABLED)
home_window_root.update()
b1 = Button(canvas, text="Select Data Set", command=select_dataset,
font=('times', 15, ' bold '), width=20,foreground="white",bg="#10B981")
canvas.create_window(150, 150, window=b1)
b2 = Button(canvas, text="Missing Values", command=missing_values,
font=('times', 15, ' bold '), width=20,
foreground="white", bg=self.button_backround_color)
canvas.create_window(150, 200, window=b2)
b3 = Button(canvas, text="Irrelevant Values", command=irrilavant_values,
    font=('times', 15, ' bold '), width=20,
    foreground="white", bg=self.button_backround_color)
canvas.create_window(150, 250, window=b3)
b4 = Button(canvas, text="Attribute Extraction",
    command=attribute_extraction, font=('times', 15, ' bold '), width=20,
    foreground="white", bg=self.button_backround_color)
canvas.create_window(150, 300, window=b4)
b5 = Button(canvas, text="Clustering", command=clustering, font=('times',
15, ' bold '), width=20,
foreground="white", bg=self.button_backround_color)
canvas.create_window(150, 350, window=b5)
b6 = Button(canvas, text="Classification", command=classification,
font=('times', 15, ' bold '), width=20,
foreground="white", bg=self.button_backround_color)
canvas.create_window(150, 400, window=b6)
# b7 = Button(canvas, text="Next", command=next_page, font=('times', 15,
#     ' bold '), width=20,
home_window_root.mainloop()
# ar = csv_data_classification()
APPENDIX II
SCREENSHOTS
Home Page
Dataset Preprocessing Interface
Dataset Selection
Missing Values Handling
Clustering Output Interface
Attribute Extraction Interface
Accuracy Comparison
Training and Testing Interface
Image Preprocessing Interface
Feature Extraction
Data Training Interface
Testing Interface
Result
REFERENCES
[1]
El-Sofany, H. F., "Predicting Heart Diseases Using Machine
Learning and Different Data Classification Techniques", 2024,
IEEE Access, Vol. 12, pp. 106147–106160.
[2]
Awad Bin Naeem, “Heart Disease Detection Using Feature
Extraction and Artificial Neural Networks: A Sensor-Based
Approach”, 2024, IEEE Access, Vol. 12.
[3]
Mana Saleh Al Reshan, “A Robust Heart Disease Prediction
System Using Hybrid Deep Neural Networks”, 2023, IEEE
Access, Vol. 11.
[4]
Pronab Ghosh, Sami Azam, Mirjam Jonkman, Asif Karim,
“Efficient Prediction of Cardiovascular Disease Using Machine
Learning Algorithms With Relief and LASSO Feature Selection
Techniques”, 2021, IEEE Access, Vol. 9.
[5]
S. Taqdees, K. Dawood, and N. Akhtar, "Heart Disease
Prediction," Conference paper, 2021.
[6]
Fitriyani, N. L., Syafrudin, M., Alfian, G., Rhee, J., "HDPM: An
Effective Heart Disease Prediction Model for a Clinical Decision
Support System", 2020, IEEE Access, Vol. 8.
[7]
Indrakumari, R., Poongodi, T., Jena, Soumya Ranjan, "Heart
Disease Prediction using Exploratory Data Analysis", 2020,
Procedia Computer Science, Vol. 173.
[8]
Senthilkumar Mohan, Chandrasegar Thirumalai, Gautam
Srivastava, "Effective Heart Disease Prediction Using Hybrid
Machine Learning Techniques", 2019, IEEE Access, Vol. 7.
[9]
H. Khan, N. Javaid, T. Bashir, M. Akbar, N. Alrajeh, and S.
Aslam, "Heart Disease Prediction Using Novel Ensemble and
Blending Based Cardiovascular Disease Detection Networks:
EnsCVDD-Net and BlCVDD-Net", Jul 2024, IEEE Access, Vol.
12, pp. 109230–109252.
[10] Rohan, D., Reddy, G. Pradeep, Kumar, Y. V. Pavan, Prakash, K.
Purna, Reddy, Ch. Pradeep, "An extensive experimental analysis
for heart disease prediction using artificial intelligence
techniques", 2025, Scientific Reports, Vol. 15.
[11] Amar Asjad Raja, Iran-ul-Haq, Madiha Guftar, Tamim Ahmed
Khan, "Intelligence syncope Disease Prediction Framework using
DM-techniques", 2016, FTC 2016 – Future Technologies
Conference.
[12] Ashish Chhabbi, Lakhan Ahuja, Sahil Ahir and Y. K. Sharma,
"Heart Disease Prediction Using Data Mining Techniques",
IJRAT Special Issue National Conference “NCPC-2016”, pp.
104-106, 19 March 2016.
[13] Ghosh, S., Das, S., & Chatterjee, M., "Heart disease risk prediction
using deep learning techniques with feature augmentation", 2023.
[14] Hannun, A. Y., Rajpurkar, P., Haghverdi, M., et al., "A Deep
Learning Framework for Cardiovascular Disease Detection Using
Electronic Health Records", 2018.
[15] Jindal, Harshit, et al., "Heart disease prediction using machine
learning algorithms", 2021, IOP Conference Series: Materials Science
and Engineering, Vol. 1022, No. 1, IOP Publishing.
PUBLICATIONS