ARTIFICIAL IMMUNE SYSTEMS FOR ILLNESSES DIAGNOSTIC Ubiquitous Computing and Communication Journal

Hiba Khelil, Abdelkader Benyettou
SIMPA Laboratory – University of Sciences and Technology of Oran,
PB 1505 M’naouer, 31000 Oran, Algeria
[email protected], [email protected]
ABSTRACT
Lately, many new illnesses are frequently observed in our societies that could be
avoided by frequent visits to the doctor. Cancer is one of these illnesses: patients
often discover it only when it is too late. In this work we propose an artificial Cancer
diagnostic tool which classifies patients as affected by Cancer or not; to this end
we have developed an artificial immune system for Cancer diagnosis. The
artificial immune system is one of the newest approaches, used in several domains
such as pattern recognition, robotics, intrusion detection and illness diagnosis. Several
methods have been proposed, such as negative selection, clonal selection and the
artificial immune network (AINet). In this paper we present the natural immune system,
develop four versions of the Artificial Immune Recognition System (AIRS) and
then present results for Cancer diagnosis, with some criticisms and remarks on
these methods.
Keywords: Antigen, Antibody, B memory cells, Artificial Recognition Ball
(ARB), Artificial Immune Recognition System (AIRS), Cancer diagnostic.
1 INTRODUCTION
Pattern recognition is a very broad domain of artificial intelligence, covering
face, fingerprint, speech and handwriting recognition as well as other patterns
that are no less important than those mentioned. To this end several approaches
have been developed, such as neural networks, evolutionary algorithms, genetic
algorithms and others still under investigation. The artificial immune system is a
new approach used in different domains such as pattern recognition [1] [2] [3] [4] [5],
intrusion detection in Internet networks [6], robotics [7], machine learning [8]
and various other applications. Artificial immune functions are inspired by the
natural immune system: the cells responsible for the immune response are
simulated to give an artificial approach adapted to the application domain and
to the problem at hand.
The present work is an application of the artificial immune recognition system
(AIRS) to Cancer diagnosis. AIRS is a pattern recognition (classification) method
inspired by the biological immune system, proposed by A. Watkins in 2001 [9]
in his Master's thesis at Mississippi State University. It was improved in 2004 by
A. Watkins, J. Timmis and L. Boggess [10], who optimized the number of B cells
generated. The method also lends itself to distributed training, as shown by
A. Watkins in his PhD thesis at the University of Kent in 2005 [11]. In this paper
we begin with a short definition
Volume 3 Number 4
Page 88
of the natural immune system and the types of immune response. The second
part presents an artificial simulation of the immune system and describes its
training algorithms. As a test case, we give an overview of the Cancer databases
and the results of applying the artificial immune system to Cancer diagnosis.
Finally, some criticisms are given to show the limits of the methods and the
differences between them, along with some perspectives.
2 NATURAL IMMUNE SYSTEM
The biological immune system is a weapon against intruders in a given body.
Several cells contribute to eliminating such an intruder, called an antigen;
together these cells take part in a 'biological immune response'.
We distinguish two types of natural immune response, innate and acquired,
explained in the following points:
2.1 Innate Immunity
It is an elementary immunity that deals with only a very small number of
antigens; this type of immunity is found in newborns who have not yet been
vaccinated. An immunity that stays non-adaptive for a long time can lead to
infections and death, because the body is poorly armed against the antigens
of the environment [12].
2.2 Acquired Immunity
It is an immunity endowed with a memory, also called the
secondary response, triggered after the
www.ubicc.org
appearance of the same antigen in the same immune system for the second
time or more, where it triggers the development of B memory cells for this
type of antigen, already met (memorized) by the system. This response is
faster than the innate one [12] and causes a rise in body temperature, which
can be explained by the fight of the B cells against the antigens.
The primary immune response is slower, but it keeps information about the
passage of antigens through the system. This memorization phenomenon is
what we want to exploit for artificial pattern recognition; the artificial immune
recognition system is built on this principle, and it is the main subject of
this paper.
3 THE ARTIFICIAL IMMUNE SYSTEM
The natural immune system is too complicated to be simulated artificially in
full, but A. B. Watkins succeeded in simulating its most important functions
for pattern recognition. The main elements of the artificial immune system are
antigens, antibodies and B memory cells. The next section presents the training
algorithms that put these elements (antibodies, B memory cells and antigens)
to work.
4 THE AIRS ALGORITHM
The present algorithm is taken from the work of A. B. Watkins [9] [13] [14],
which presents the artificial immune recognition system for pattern recognition,
named AIRS. Antigens represent the training data, used by the training program
to generate the antibodies (B cells) that are then used in the test step
(classification). The artificial immune training algorithm has four steps, as
follows:
4.1 Initialization Step
In this step, all characteristic vectors of antigens are normalized, and the
affinity threshold is calculated by (1):

$$\text{affinity\_threshold} = \frac{\sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \text{affinity}(ag_i, ag_j)}{n(n-1)/2} \qquad (1)$$

Here affinity is the Euclidean distance between two antigens, and n is the
number of antigens (cardinality of the training data).
To begin, we must initialize the B memory cells set (MC) and the ARB population
by choosing arbitrary examples from the training data.
4.2 B Cells Identification and ARBs Generation
Once initialization is finished, this step is executed for each antigen of the
training data. First, the memory cell mc_match is chosen from MC as the one with
the lowest affinity value (that is, the highest stimulation) with this antigen,
noting that:

$$\text{stimulation}(ag, mc) = 1 - \text{affinity}(ag, mc) \qquad (2)$$

This cell is preserved for a long time by cloning, which generates new ARBs;
these ARBs are added to the existing ARB set. The number of clones is
calculated by formula (3):

$$\text{clone\_number} = \text{hyper\_clonal\_rate} \cdot \text{clonal\_rate} \cdot \text{stimulation}(mc_{match}, ag) \qquad (3)$$

Every clone is mutated according to a small algorithm described in [9], which
alters the characteristic vector of the cell.
4.3 Competition for Resources and Development of a Candidate Memory Cell
In this step, the information of each ARB is completed by calculating its
resource allocation as a function of its stimulation, as follows:

$$\text{resource} = \text{stimulation}(ag, ARB(antibody)) \cdot \text{clonal\_rate} \qquad (4)$$

and by calculating the average stimulation s_i of each ARB class AB_i:

$$s_i = \frac{\sum_{j=1}^{|AB_i|} ab_j.stim}{|AB_i|}, \quad ab_j \in AB_i \qquad (5)$$

This step can cause the death of some ARBs that are weakly stimulated.
We then clone and mutate a subset of the ARBs according to their stimulation
level.
While the average stimulation value s_i of an ARB class is less than the given
stimulation threshold, that is, while the stopping criterion (6) is not met, we
repeat the third step.

$$s_i \geq \text{affinity\_threshold} \qquad (6)$$

4.4 Memory Cell Introduction
Select the ARB of the same class as the antigen with the highest affinity; it
is the candidate cell mc_candidate. If the affinity of this ARB with the
antigenic pattern is better than that of mc_match, then add the candidate cell
mc_candidate to the memory cells set. Additionally, if the affinity between
mc_match and mc_candidate is below the affinity threshold, then remove mc_match
from the memory set.
We then repeat from the second step until all antigens have been processed.
After the end of the training phase, the test is executed using the memory
cells generated by the training step, in order to classify new antigenic
patterns. The classification criterion is to assign the new antigen to the most
appropriate class using KMeans or KNN (K Nearest Neighbor); in this paper we
present classification results using the KMeans algorithm.
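As a rough illustration of equations (1) to (5), the following Python sketch computes the affinity threshold, the stimulation, the clone number, the resources and the average stimulation. All function names and the toy data are hypothetical. The affinity is assumed to be the Euclidean distance divided by the square root of the number of features, so that it stays in [0, 1] for features normalized to [0, 1]; the original C++ implementation may normalize differently.

```python
import math
from itertools import combinations


def affinity(a, b):
    """Euclidean distance rescaled to [0, 1] (assumes features in [0, 1])."""
    dist = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return dist / math.sqrt(len(a))


def affinity_threshold(antigens):
    """Equation (1): mean affinity over all n(n-1)/2 antigen pairs."""
    pairs = list(combinations(antigens, 2))
    return sum(affinity(a, b) for a, b in pairs) / len(pairs)


def stimulation(ag, cell):
    """Equation (2): stimulation = 1 - affinity."""
    return 1.0 - affinity(ag, cell)


def clone_number(hyper_clonal_rate, clonal_rate, stim):
    """Equation (3), truncated to a whole number of clones (assumption)."""
    return int(hyper_clonal_rate * clonal_rate * stim)


def resources(stim, clonal_rate):
    """Equation (4): resources granted to one ARB."""
    return stim * clonal_rate


def average_stimulation(stims):
    """Equation (5): mean stimulation of the ARBs of one class."""
    return sum(stims) / len(stims)


# Toy, already normalized antigens (hypothetical data).
antigens = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
threshold = affinity_threshold(antigens)

# mc_match is the memory cell with the lowest affinity to the antigen.
ag, memory_cells = antigens[0], antigens[1:]
mc_match = min(memory_cells, key=lambda mc: affinity(ag, mc))
clones = clone_number(30, 20, stimulation(ag, mc_match))
print(round(threshold, 4), clones)
```

The values 30 and 20 are the hyper_clonal_rate and clonal_rate of Table 1; truncating the clone number to an integer is a choice made for this sketch only.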
5 THE AIRS2 ALGORITHM
The changes made to the AIRS algorithm are small, but they offer a simpler
implementation, data reduction and a shorter processing time.
The AIRS2 training steps are the same as those of AIRS, with the following
changes:
1- It is not necessary to initialize the ARB set.
2- It is not necessary to mutate the class feature of the ARBs, because in
AIRS2 we are only interested in cells of the same class as the antigen.
3- Resources are only allocated to ARBs of the same class as the antigen, in
proportion to the ARB's stimulation level in reaction to the antigen.
4- The training stopping criterion no longer takes
into account the stimulation value of ARBs in all
classes, but only accounts for the stimulation value
of the ARBs of the same class as the antigen.
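Changes 3 and 4 can be sketched in Python as follows. The ARB record layout, the function names and the sample values are hypothetical, and resources are assumed to follow equation (4) (stimulation times clonal_rate); this only illustrates restricting the computation to the class of the antigen and is not the authors' C++ code.

```python
from dataclasses import dataclass


@dataclass
class ARB:
    label: str              # class of the cell
    stim: float             # stimulation by the current antigen, in [0, 1]
    resources: float = 0.0  # resources granted in the current round


def allocate_resources_airs2(arbs, antigen_label, clonal_rate):
    """Change 3: only ARBs of the antigen's class receive resources."""
    for arb in arbs:
        if arb.label == antigen_label:
            arb.resources = arb.stim * clonal_rate
        else:
            arb.resources = 0.0


def should_stop_airs2(arbs, antigen_label, stimulation_threshold):
    """Change 4: test the mean stimulation of the antigen's class only."""
    same = [arb.stim for arb in arbs if arb.label == antigen_label]
    if not same:
        return False
    return sum(same) / len(same) >= stimulation_threshold


arbs = [ARB("Malignant", 0.9), ARB("Malignant", 0.7), ARB("Benign", 0.2)]
allocate_resources_airs2(arbs, "Malignant", clonal_rate=20)
print([arb.resources for arb in arbs])
print(should_stop_airs2(arbs, "Malignant", stimulation_threshold=0.75))
```

In this toy run the weakly stimulated 'Benign' ARB neither receives resources nor influences the stopping test, which is the source of the reduced processing described above.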
6 AIRS AND AIRS2 ALGORITHMS USING MERGING FACTOR
In this section we present another modification of AIRS and AIRS2. It concerns
the last training step (memory cell introduction), and more precisely the
criterion for introducing a cell. The original condition is the following:

$$\begin{aligned}
& CandStim \leftarrow \text{Stimulation}(ag, mc_{candidate}) \\
& MatchStim \leftarrow \text{Stimulation}(ag, mc_{match}) \\
& CellAff \leftarrow \text{affinity}(mc_{candidate}, mc_{match}) \\
& \textbf{if } (CandStim > MatchStim) \\
& \qquad \textbf{if } (CellAff < AT \cdot ATS) \\
& \qquad\qquad MC \leftarrow MC - mc_{match} \\
& \qquad MC \leftarrow MC \cup mc_{candidate}
\end{aligned} \qquad (7)$$

This pseudo-code gives the conditions under which mc_candidate is added to the
memory cells set and mc_match is deleted from it, AT being the affinity
threshold. The modification is made in the following condition:

$$\textbf{if } (CellAff < AT \cdot ATS + factor) \qquad (8)$$

where factor is calculated by:

$$factor = AT \cdot ATS \cdot dampener \cdot \log(np) \qquad (9)$$

Here ATS and dampener are two parameters between 0 and 1, and np is the number
of training programs executed in parallel (the number of classes). This change
to the merging scheme relaxes the criterion for memory cell removal in the
affinity-based merging scheme by a small fraction that grows logarithmically.
This modification is applied to both algorithms (AIRS and AIRS2). All the
algorithms are applied to Cancer diagnosis, and all results are presented in
the next sections.
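The memory cell introduction rule with and without the merging factor can be sketched in Python as follows. The function names, the list-based memory set and the numeric values are hypothetical, and the logarithm in equation (9) is assumed to be the natural logarithm, since the text does not state its base.

```python
import math


def merging_factor(at, ats, dampener, n_programs):
    """Equation (9): factor = AT * ATS * dampener * log(np)."""
    return at * ats * dampener * math.log(n_programs)


def introduce_memory_cell(memory, mc_match, mc_candidate,
                          cand_stim, match_stim, cell_aff,
                          at, ats, factor=0.0):
    """Conditions (7) and (8); factor=0.0 gives the original rule (7)."""
    if cand_stim > match_stim:
        # Remove the old cell when it is too close to the candidate.
        if cell_aff < at * ats + factor:
            memory.remove(mc_match)
        memory.append(mc_candidate)
    return memory


factor = merging_factor(at=0.3, ats=0.2, dampener=0.8, n_programs=2)

# Same situation under both rules: the candidate is more stimulated and
# its affinity with mc_match (0.08) lies between AT*ATS and AT*ATS+factor.
plain = introduce_memory_cell(["m1", "m2"], "m1", "cand",
                              0.9, 0.8, 0.08, at=0.3, ats=0.2)
merged = introduce_memory_cell(["m1", "m2"], "m1", "cand",
                               0.9, 0.8, 0.08, at=0.3, ats=0.2,
                               factor=factor)
print(plain, merged)
```

With these toy values the original rule keeps mc_match, while the relaxed rule removes it, so the memory set stays smaller when the factor is used.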
7 RESULTS
To determine the relative performance of the AIRS algorithms, it was necessary
to test them on databases; we have chosen three Cancer databases from the
university hospital center of Wisconsin: Breast Cancer Wisconsin (BCW),
Wisconsin Prognostic Breast Cancer (WPBC) and Wisconsin Diagnostic Breast
Cancer (WDBC). These databases are described as follows:
- Breast Cancer Wisconsin (BCW): This database was obtained from the university
hospital center of Wisconsin in 1991. It describes cancerous symptoms and
classifies them into two classes: 'Malignant' or 'Benign'. The distribution of
patients is as follows: (Malignant, 214) (Benign, 458).
- Wisconsin Prognostic Breast Cancer (WPBC): This database was built by the
same university hospital center in 1995, but it gives more details than BCW,
based on observations of cell nuclei. From these characteristics, patients are
classified into two classes: 'Recur' and 'NonRecur', distributed as follows:
(Recur, 47) (NonRecur, 151).
- Wisconsin Diagnostic Breast Cancer (WDBC): This database was also built by
the same university hospital center in 1995. It has the same attributes as
WPBC, but it classifies its patients into two classes: 'Malignant' and
'Benign', distributed as follows: (Malignant, 212) (Benign, 357).
All training data are antigens, represented by characteristic vectors;
antibodies have the same characteristic vector size as antigens. An ARB is
represented as a structure holding the antibody characteristic vector, its
stimulation by the antigen and the resources allocated to it.
7.1 Software and Hardware Resources
To implement the algorithms we used the C++ language in a Linux Mandriva 2006
environment; every machine has 512 MB of memory and a 3.0 GHz processor.
All training programs have the same number of antigens and the same number of
initial memory cells and ARBs. The training program is an iterative process,
for which we fixed 50 iterations for each training program of every class.
7.2 Results and Classification Accuracy
To run the programs, we must fix the most important training parameters:
hyper_clonal_rate, clonal_rate and mutation_rate. These parameters are used in
the training steps to limit the number of clones, to calculate the ARB
resources and in the mutation procedure. The parameter values are given in
Table 1:
Table 1: Training parameters.
Parameters           Type                   Values
Hyper_clonal_rate    Integer value          30
Clonal_rate          Integer value          20
Mutation_rate        Real value in [0,1]    0.1
After 50 iterations, the B memory cells generated by the training program of
each class are used in the classification step (test). For classification we
take the shortest distance between the new antigen and the gravity centers of
all memory cell sets, and we assign the antigen to the class of the nearest
center (KMeans). Using this principle, the classification accuracies are given
in Table 2:
Table 2: Classification accuracies
Table 3: Average classification accuracies
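The nearest-center rule used in the test step can be sketched in Python as follows. The function names and the two-feature memory cells are hypothetical; only the rule itself (Euclidean distance to the gravity center of each class's memory cells) comes from the text.

```python
import math


def centroid(cells):
    """Gravity center of a set of memory cells (feature vectors)."""
    n = len(cells)
    return [sum(column) / n for column in zip(*cells)]


def classify(antigen, memory_cells_by_class):
    """Assign the antigen to the class whose center is nearest."""
    centers = {label: centroid(cells)
               for label, cells in memory_cells_by_class.items()}
    return min(centers, key=lambda label: math.dist(antigen, centers[label]))


# Toy memory cells for two classes (hypothetical values).
memory = {
    "Malignant": [[0.9, 0.8], [0.7, 0.9]],
    "Benign": [[0.1, 0.2], [0.2, 0.1]],
}
print(classify([0.8, 0.7], memory))
print(classify([0.0, 0.3], memory))
```

Because each class is summarized by a single center, the cost of classifying one antigen depends on the number of classes rather than on the number of memory cells.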
We can observe that AIRS gives the best results in general: it generates more
B cells, which increases the chance of recognition. AIRS2 and AIRS2 using
factor are refinements of AIRS, but they generate fewer B cells than the
original algorithm (AIRS); that is why these methods (AIRS2, AIRS2 with factor)
do not give better results in Cancer diagnosis, except in some cases. The
evolution of B cells is given in the next section.
AIRS2 and AIRS2 using factor execute faster than AIRS and AIRS using factor.
This can be explained by the use of only the B cells of the same class as the
antigen, which reduces the amount of processing and hence the processing time.
7.3 B Cells Evolution
Although we gave every training program the same chance (quantity of antigens
introduced, initial memory cells and initial ARBs), the B cells generated are
not necessarily the same in each method. The previous tables give the B cells
generated by each one; we can observe that AIRS2 and AIRS2 using factor
generate fewer cells than AIRS and AIRS using factor, as mentioned before,
although we initialized all B cell sets to the same size. The next figures show
the evolution of the B cells as a function of the iterations, for each database,
with the method that gave the best rate among the four:
Figure 1: Evolution of B cells in BCW (AIRS)
Figure 2: Evolution of B cells in WPBC (AIRS2 using factor)
Figure 3: Evolution of B cells in WDBC (AIRS2)
From the figures we observe that the evolution of B cells in AIRS is faster
than in AIRS2 and AIRS2 using factor, because more cells are deleted when the
condition is changed to the one given in equation (8) (using factor).
8 DISCUSSION OF RESULTS
The results of the experiments can be found in Tables 2 and 3. Comparing the
methods used, we can observe that AIRS gives the best results in general, and
also that this method generates more B cells than the others.
In all experiments we used the Euclidean distance; it is possible to use the
Hamming distance or another one.
AIRS and AIRS using factor converge slowly towards the B cells best adapted to
Cancer diagnosis, unlike AIRS2 and AIRS2 using factor, which execute more
quickly than the others but do not give the best memory cells.
9 CONCLUSIONS
In this paper we have presented Cancer diagnosis results for the AIRS
immuno-computing algorithms and provided directions for interpreting these
results. We are interested in immuno-computing because it is one of the newest
directions in bio-inspired machine learning; we focused on AIRS, AIRS2, AIRS
using factor and AIRS2 using factor (2001-2005), which can also be used for
classification (illness diagnosis).
We suggest that AIRS is a mature classifier that delivers reasonable results
and that it can safely be used for real-world classification tasks. The
presented results are good, but they should be improved using optimization
algorithms. In our future work we want to give more importance to the parameter
values and propose a new method to search for their best values, in order to
increase the performance of these algorithms.
10 REFERENCES
[1] Secker A., Freitas A., Timmis J.: AISEC: An artificial
immune system for e-mail classification. In proceedings
of the Congress on Evolutionary Computation, pp. 131--139, Canberra, Australia (2003)
[2] Meng L., van der Putten P., Wang H.: A
comprehensive benchmark of the artificial immune
recognition system (AIRS). In proceedings of Advanced
Data Mining and Applications ADMA,
vol. 3584, pp. 575--582, China (2005)
[3] Deneche A., Meshoul S., Batouche M. : Une approche
hybride pour la reconnaissance des formes en utilisant
un systeme immunitaire artificiel. In proceedings of
graphic computer science, Biskra, Algeria (2005)
[4] Deneche A.: Approches bios inspirees pour la
reconnaissance de formes, Master thesis in Mentouri
University, Constantine, Algeria (2006)
[5] Goodman D., Boggess L., Watkins A.: Artificial
immune system classification of multiple class
problems, Intelligent Engineering Systems Through
Artificial Neural press (2002)
[6] Kim J., Bentley P.: Towards an artificial immune system
for network intrusion detection: an investigation of
clonal selection with a negative selection operator. In
proceedings of the Congress on Evolutionary Computation,
vol. 2, pp. 1244--1252, South Korea (2001)
[7] Jun J. H., Lee D. W., Sim K. B.: Realization of
cooperative and swarm behavior in distributed
autonomous robotic systems using artificial immune
system. In proceedings of the IEEE International Conference on
Systems, Man and Cybernetics, vol. 6, pp. 614--619. IEEE Press,
New York (1999)
[8] Timmis J.: Artificial immune systems: a novel data
analysis technique inspired by the immune network
theory, PhD thesis Wales UK University (2000)
[9] Watkins A.: AIRS: A resource limited artificial
immune classifier, Master's thesis, Mississippi State University
(2001)
[10] Watkins A., Timmis J., Boggess L.: Artificial immune
recognition system (airs): an immune inspired
supervised learning algorithm, vol. 5, pp. 291--317,
Genetic Programming and Evolvable Machines press
(2004)
[11] Watkins A.: Exploiting immunological metaphors in
the development of serial, parallel, and distributed
learning algorithms, PhD thesis, Kent University (2005)
[12] Emilie P.: Organisation du system immunitaire felin,
PhD thesis, National school Lyon, France (2006)
[13] Watkins A., Timmis J.: Artificial immune recognition
system (airs): revisions and refinements. In proceedings
of first international conference on artificial immune
system ICARIS, pp. 173--181, Kent University (2005)
[14] Watkins A., Boggess L.: A new classifier based
on resource limited artificial immune systems. In
proceedings of the Congress on Evolutionary
Computation, IEEE World Congress on
Computational Intelligence held in Honolulu, HI,
USA, pp. 1546--1551, Kent University (2005)