ARTIFICIAL IMMUNE SYSTEMS FOR ILLNESSES DIAGNOSTIC Ubiquitous Computing and Communication Journal
Hiba Khelil, Abdelkader Benyettou
SIMPA Laboratory, University of Sciences and Technology of Oran, PB 1505 M'naouer, 31000 Oran, Algeria

ABSTRACT
Lately, many new illnesses have been frequently observed in our societies, some of which could be avoided by regular visits to the doctor. Cancer is one of these illnesses: patients often discover it only when it is too late. In this work we propose an artificial cancer diagnostic that classifies patients as affected by cancer or not; for this purpose we have developed an artificial immune system for cancer diagnosis. The artificial immune system is one of the newest approaches, used in several domains such as pattern recognition, robotics, intrusion detection and illness diagnosis. Several methods have been proposed, such as negative selection, clonal selection and the artificial immune network (AINet). In this paper we present the natural immune system, develop four versions of the Artificial Immune Recognition System (AIRS), and then present results for cancer diagnosis together with some criticisms of and remarks on these methods.

Keywords: Antigen, Antibody, B memory cells, Artificial Recognition Ball (ARB), Artificial Immune Recognition System (AIRS), Cancer diagnostic.

1 INTRODUCTION
Pattern recognition is a vast domain of artificial intelligence. It includes face, fingerprint, speech and handwriting recognition, along with other kinds of patterns that are no less important. Several approaches have been developed for this purpose, such as neural networks, evolutionary algorithms and genetic algorithms, and others are still being explored. The artificial immune system is a new approach used in different domains such as pattern recognition [1] [2] [3] [4] [5], intrusion detection in Internet networks [6], robotics [7], machine learning [8] and various other applications.
The functions of the artificial immune system are inspired by the natural immune system: the cells responsible for the immune response are simulated to give an artificial approach adapted to the application domain and to the problem at hand. The present work applies the artificial immune recognition system (AIRS) to cancer diagnosis. AIRS is a pattern recognition (classification) method inspired by the biological immune system. It was proposed by A. Watkins in 2001 [9] in his Master's thesis at Mississippi State University and improved in 2004 by A. Watkins, J. Timmis and L. Boggess [10], who optimized the number of B cells generated. The method also supports distributed training, as shown by A. Watkins in his PhD thesis at the University of Kent in 2005 [11].

In this paper we begin with a short description of the natural immune system and of the types of immune response. The second part presents an artificial simulation of the immune system and describes the training algorithms. As an application, we then give an overview of the cancer databases and the results of applying the artificial immune system to cancer diagnosis. Finally, some criticisms are given to show the limits of the methods and the differences between them, along with some perspectives.

Volume 3 Number 4, Page 88

2 NATURAL IMMUNE SYSTEM
The biological immune system is the body's weapon against intruders. Several kinds of cells cooperate to eliminate an intruder, called an antigen; together they carry out the 'biological immune response'. We distinguish two types of natural immune response, innate and acquired, explained in the following points.

2.1 Innate Immunity
Innate immunity is an elementary form of immunity that handles only a very small number of antigens; it is the type of immunity found in newborns who have not yet been vaccinated. An immune system that remains non-adaptive for a long time can lead to infections and death, because the body is poorly armed against the antigens of its environment [12].
2.2 Acquired Immunity
Acquired immunity is endowed with a memory. It is also called the secondary response, and it is triggered when the same antigen appears in the same immune system for the second time or more; it relies on the memory B cells developed for this type of antigen, which has already been met (memorized) by the system. This response is faster than the innate one [12] and causes an increase in body temperature, which can be explained by the fight of the B cells against the antigens. The primary immune response is slower, but it keeps information about the passage of antigens through the system. This memorization phenomenon is what interests us for artificial pattern recognition: the artificial immune recognition system, which is the main subject of this paper, is built on this principle.

www.ubicc.org

3 THE ARTIFICIAL IMMUNE SYSTEM
The natural immune system is too complicated to be simulated in full, but A. B. Watkins succeeded in simulating its most important functions for pattern recognition. The main elements of the artificial immune system are antigens, antibodies and B memory cells. The next section presents the training algorithms that put these elements to work.

4 THE AIRS ALGORITHM
The algorithm presented here follows the work of A. B. Watkins [9] [13] [14], which introduces the artificial immune recognition system (AIRS) for pattern recognition. Antigens represent the training data; they are used during training to generate the antibodies (B cells) that are later used in the test step (classification). The training algorithm has four steps, as follows.

4.1 Initialization Step
In this step, the feature vectors of all antigens are normalized and the affinity threshold is calculated by (1):

\[
\mathrm{affinity\_threshold} = \frac{\sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \mathrm{affinity}(ag_i, ag_j)}{n(n-1)/2} \tag{1}
\]

Here affinity is the Euclidean distance between two antigens, and n is the number of antigens (the cardinality of the training data). To begin, the B memory cell set (MC) and the ARB population are initialized by choosing arbitrary examples from the training data.

4.2 B Cells Identification and ARBs Generation
Once initialization is finished, this step is executed for each antigen of the training data. First, the memory cell mc_match is chosen from MC as the cell that has the lowest affinity value with this antigen (and therefore maximizes the stimulation), noting that:

\[
\mathrm{stimulation}(ag, mc) = 1 - \mathrm{affinity}(ag, mc) \tag{2}
\]

This cell is preserved by cloning: it generates new ARBs, which are added to the existing ARB set. The number of clones is calculated by formula (3):

\[
\mathrm{clone\_number} = \mathrm{hyper\_clonal\_rate} \times \mathrm{clonal\_rate} \times \mathrm{stimulation}(mc_{match}, ag) \tag{3}
\]

Every clone is mutated by a small algorithm described in [9], which alters the feature vector of the cell.

4.3 Competition for Resources and Development of a Candidate Memory Cell
In this step, the information held by each ARB is completed by calculating its resource allocation as a function of its stimulation:

\[
\mathrm{resource} = \mathrm{stimulation}(ag, ARB(antibody)) \times \mathrm{clonal\_rate} \tag{4}
\]

The average stimulation of each ARB class is calculated as well. This step can kill ARBs that are weakly stimulated. Afterwards, a subset of the ARBs is cloned and mutated according to their stimulation level. While the average stimulation value s_i of an ARB class is less than a given stimulation threshold, the third step is repeated, where

\[
s_i = \frac{\sum_{j=1}^{|AB_i|} ab_j.stim}{|AB_i|}, \quad ab_j \in AB_i \tag{5}
\]

so that the step ends once the following holds for every class i:

\[
s_i \geq \mathrm{stimulation\_threshold} \tag{6}
\]

4.4 Memory Cell Introduction
Among the ARBs of the same class as the antigen, select the one that matches the antigen best; this is the candidate memory cell mc_candidate. If this ARB matches the antigenic pattern better than mc_match does, the candidate cell mc_candidate is added to the memory cell set. Additionally, if the affinity between mc_match and mc_candidate is below the affinity threshold, mc_match is removed from the memory set. The procedure then returns to the second step until all antigens have been processed.

After the end of the training phase, the test is run using the memory cells generated during training, in order to classify new antigenic patterns. The classification criterion is to assign the new antigen to the most appropriate class using KMeans or KNN (K Nearest Neighbor); in this paper we present classification results obtained with the KMeans algorithm.

5 THE AIRS2 ALGORITHM
The changes made to the AIRS algorithm are small, but they simplify implementation, reduce the data and shorten the processing time. The AIRS2 training steps are the same as those of AIRS, with the following changes:
1- It is not necessary to initialize the ARB set.
2- It is not necessary to mutate the class feature of the ARBs, because AIRS2 is only interested in cells of the same class as the antigen.
3- Resources are allocated only to ARBs of the same class as the antigen, in proportion to the ARB's stimulation level in reaction to the antigen.
4- The training stopping criterion no longer takes into account the stimulation value of the ARBs in all classes, but only that of the ARBs of the same class as the antigen.

6 AIRS AND AIRS2 ALGORITHMS USING MERGING FACTOR
This section presents a further modification of AIRS and AIRS2.
This modification concerns the last training step (memory cell introduction), specifically the criterion for introducing a cell. The original condition was as follows:

\[
\begin{aligned}
&\mathit{CandStim} \leftarrow \mathrm{stimulation}(ag, mc_{candidate}) \\
&\mathit{MatchStim} \leftarrow \mathrm{stimulation}(ag, mc_{match}) \\
&\textbf{if } (\mathit{CandStim} > \mathit{MatchStim}) \\
&\quad \mathit{CellAff} \leftarrow \mathrm{affinity}(mc_{candidate}, mc_{match}) \\
&\quad \textbf{if } (\mathit{CellAff} < AT \times ATS) \\
&\quad\quad MC \leftarrow MC - mc_{match} \\
&\quad MC \leftarrow MC \cup mc_{candidate}
\end{aligned}
\tag{7}
\]

This pseudocode gives the conditions for adding mc_candidate to the memory cell set and deleting mc_match, where AT is the affinity threshold of equation (1). The modification changes the inner condition to:

\[
\textbf{if } (\mathit{CellAff} < AT \times ATS + \mathit{factor}) \tag{8}
\]

where factor is calculated by:

\[
\mathit{factor} = AT \times ATS \times \mathit{dampener} \times \log(np) \tag{9}
\]

ATS and dampener are two parameters between 0 and 1, and np is the number of training programs executed in parallel (the number of classes). This change to the merging scheme relaxes the criterion for memory cell removal in the affinity-based merging scheme by a small fraction that grows logarithmically with np. The modification is applied to both algorithms (AIRS and AIRS2); all four algorithms are applied to cancer diagnosis, and the results are presented in the next sections.

7 RESULTS
To determine the relative performance of the AIRS algorithms, it was necessary to test them on databases. We chose three cancer databases from the university hospital of Wisconsin: Breast Cancer Wisconsin (BCW), Wisconsin Prognostic Breast Cancer (WPBC) and Wisconsin Diagnostic Breast Cancer (WDBC). These databases are described as follows:
- Breast Cancer Wisconsin (BCW): This database was obtained from the university hospital of Wisconsin in 1991. It describes cancer symptoms and classifies them into two classes, 'Malignant' or 'Benign'. The distribution of patients is as follows: (Malignant, 214) (Benign, 458).
- Wisconsin Prognostic Breast Cancer (WPBC): This database was built by the same university hospital in 1995, but it gives more detail than BCW by recording observations of cell nuclei. Based on these characteristics, patients are classified into two classes, 'Recur' and 'NonRecur', distributed as follows: (Recur, 47) (NonRecur, 151).
- Wisconsin Diagnostic Breast Cancer (WDBC): This database was also built by the same university hospital in 1995. It has the same attributes as WPBC, but it classifies patients into two classes, 'Malignant' and 'Benign', distributed as follows: (Malignant, 212) (Benign, 357).

All training data are antigens, represented by feature vectors; antibodies have feature vectors of the same size as antigens. An ARB is represented as a structure holding the antibody's feature vector, its stimulation by the antigen and the resources allocated to it.

7.1 Software and Hardware Resources
The algorithms were implemented in C++ under Linux Mandriva 2006; every machine has 512 MB of memory and a 3.0 GHz processor. All training programs use the same number of antigens, the same number of initial memory cells and the same number of ARBs. Training is an iterative process, and we fixed the number of iterations at 50 for the training program of each class.

7.2 Results and Classification Accuracy
To run the programs, we must fix the most important training parameters: hyper_clonal_rate, clonal_rate and mutation_rate. These parameters are used in the training steps to limit the number of clones, to calculate the ARB resources and in the mutation procedure. The parameter values are given in Table 1.

Table 1: Training parameters.
Parameters          Type                Values
Hyper_clonal_rate   Integer value       30
Clonal_rate         Integer value       20
Mutation_rate       Real value [0,1]    0.1

After 50 iterations, the B memory cells generated by the training program of each class are used in the classification step (test). For classification, we compute the distance between the new antigen and the gravity centre of each set of memory cells, and we assign the antigen to the class of the nearest centre (KMeans). Using this principle, the classification accuracies are given in Table 2.

Table 2: Classification accuracies

Table 3: Average classification accuracies

We can observe that AIRS gives the best results in general; it generates more B cells, which increases the chance of recognition. AIRS2 and AIRS2 using factor are improvements of AIRS, but they generate fewer B cells than the original algorithm (AIRS). That is why these methods (AIRS2, AIRS2 using factor) do not give better results for cancer diagnosis, except in some cases; the evolution of the B cells is given in the next section. AIRS2 and AIRS2 using factor execute faster than AIRS and AIRS using factor. This can be explained by their use of only the B cells of the same class as the antigen, which reduces the amount of processing and therefore the processing time.

Figure 3: Evolution of B cells in WDBC (AIRS2)

From the figures we observe that the B cell population evolves faster in AIRS than in AIRS2 and AIRS2 using factor, because more cells are deleted once the condition is changed to that of equation (8) (using factor).
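The classification rule of Section 7.2 (assign a new antigen to the class whose memory cells have the nearest gravity centre) can be sketched as follows. This is a minimal illustration in Python rather than the authors' C++ implementation; the function names `centroid`, `euclidean` and `classify`, and the toy two-feature memory cells, are assumptions made for the example and are not data from the paper.

```python
import math


def centroid(cells):
    """Gravity centre (component-wise mean) of a non-empty list of feature vectors."""
    count = len(cells)
    return [sum(column) / count for column in zip(*cells)]


def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(antigen, memory_cells_by_class):
    """Return the class label whose memory-cell centroid is nearest to the antigen."""
    centres = {
        label: centroid(cells)
        for label, cells in memory_cells_by_class.items()
    }
    return min(centres, key=lambda label: euclidean(antigen, centres[label]))


# Hypothetical memory cells (normalized features), one set per class.
memory = {
    "Malignant": [[0.9, 0.8], [0.7, 0.9]],
    "Benign": [[0.1, 0.2], [0.2, 0.1]],
}

predicted = classify([0.8, 0.7], memory)
print(predicted)
```

With one centre per class, the decision depends only on the mean of each memory cell set, so the number of memory cells affects the result only through where the centre falls.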
7.3 B Cells Evolution
Although every training program was given the same conditions (number of antigens introduced, initial memory cells and initial ARBs), the B cells generated are not necessarily the same for each method. The previous tables give the B cells generated by each one: AIRS2 and AIRS2 using factor generate fewer cells than AIRS and AIRS using factor, as mentioned before, although all B cell sets were initialized to the same size. The next figures show the evolution of the B cells as a function of the iterations, for each database, with the method that gave the best rate among the four:

Figure 1: Evolution of B cells in BCW (AIRS)

Figure 2: Evolution of B cells in WPBC (AIRS2 using factor)

8 DISCUSSION OF RESULTS
The results of the experiments can be found in Tables 2 and 3. Comparing the methods, we observe that AIRS gives the best results in general and that it also generates more B cells than the others. All experiments used the Euclidean distance; the Hamming distance or another distance could be used instead. AIRS and AIRS using factor converge slowly towards the B cells best adapted to cancer diagnosis, unlike AIRS2 and AIRS2 using factor, which execute more quickly but do not yield the best memory cells.

9 CONCLUSIONS
In this paper we have presented cancer diagnosis results for the AIRS immunocomputing algorithm and provided directions for interpreting these results. We are interested in immunocomputing because it is one of the newest directions in bio-inspired machine learning. We focused on AIRS, AIRS2, AIRS using factor and AIRS2 using factor (2001-2005), which can also be used for classification (illness diagnosis). We suggest that AIRS is a mature classifier that delivers reasonable results and can safely be used for real-world classification tasks. The presented results are good, but they should be improved using optimization algorithms.
In our future work we want to give more importance to the parameter values and to propose a new method for searching for their best values, in order to increase the performance of these algorithms.

10 REFERENCES
[1] Secker A., Freitas A., Timmis J.: AISEC: An artificial immune system for e-mail classification. In proceedings of the Congress on Evolutionary Computation, pp. 131-139, Canberra, Australia (2003)
[2] Lingjun M., Peter V. D. P., Haiyang W.: A comprehensive benchmark of the artificial immune recognition system (AIRS). In proceedings of advanced data mining and applications ADMA, vol. 3584, pp. 575-582, China (2005)
[3] Deneche A., Meshoul S., Batouche M.: Une approche hybride pour la reconnaissance des formes en utilisant un systeme immunitaire artificiel. In proceedings of graphic computer science, Biskra, Algeria (2005)
[4] Deneche A.: Approches bios inspirees pour la reconnaissance de formes, Master thesis, Mentouri University, Constantine, Algeria (2006)
[5] Goodman D., Boggess L., Watkins A.: Artificial immune system classification of multiple class problems, Intelligent Engineering Systems Through Artificial Neural press (2002)
[6] Kim J., Bentley P.: Towards an artificial immune system for network intrusion detection: an investigation of clonal selection with a negative selection operator. In proceedings of the Congress on Evolutionary Computation, vol. 2, pp. 1244-1252, South Korea (2001)
[7] Jun J. H., Lee D. W., Sim K. B.: Realization of cooperative and swarm behavior in distributed autonomous robotic systems using artificial immune system. In proceedings of the IEEE international conference of Man and Cybernetics, vol. 6, pp. 614-619. IEEE Press, New York (1999)
[8] Timmis J.: Artificial immune systems: a novel data analysis technique inspired by the immune network theory, PhD thesis, University of Wales, UK (2000)
[9] Watkins A.: AIRS: A resource limited artificial immune classifier, Master thesis, Mississippi State University (2001)
[10] Watkins A., Timmis J., Boggess L.: Artificial immune recognition system (AIRS): an immune inspired supervised learning algorithm, Genetic Programming and Evolvable Machines, vol. 5, pp. 291-317 (2004)
[11] Watkins A.: Exploiting immunological metaphors in the development of serial, parallel, and distributed learning algorithms, PhD thesis, Kent University (2005)
[12] Emilie P.: Organisation du system immunitaire felin, PhD thesis, National school Lyon, France (2006)
[13] Watkins A., Timmis J.: Artificial immune recognition system (AIRS): revisions and refinements. In proceedings of the first international conference on artificial immune systems ICARIS, pp. 173-181, Kent University (2005)
[14] Watkins A., Boggess L.: A new classifier based on resources limited artificial immune systems. In proceedings of the Congress on Evolutionary Computation, IEEE World Congress on Computational Intelligence held in Honolulu, HI, USA, pp. 1546-1551, Kent University (2005)