Multiscale Recurrence Quantification Analysis of Spatial Cardiac Vectorcardiogram Signals , Member, IEEE
Transcription
Multiscale Recurrence Quantification Analysis of Spatial Cardiac Vectorcardiogram Signals , Member, IEEE
IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING, VOL. 58, NO. 2, FEBRUARY 2011 339 Multiscale Recurrence Quantification Analysis of Spatial Cardiac Vectorcardiogram Signals Hui Yang, Member, IEEE Abstract—Myocardial infarction (MI), also known as a heart attack, is a leading cause of mortality in the world. Spatial vectorcardiogram (VCG) signals are recorded on the body surface to monitor the underlying cardiac electrical activities in three orthogonal directions of the body, namely, frontal, transverse, and sagittal planes. The 3-D VCG vector loops provide a new way to study the cardiac dynamical behaviors, as opposed to the conventional time-delay reconstructed phase space from a single ECG trace. However, few, if any, previous approaches studied the relationships between cardiac disorders and recurrence patterns in VCG signals. This paper presents the recurrence quantification analysis (RQA) of VCG signals in multiple wavelet scales for the identification of cardiac disorders. The linear classification models using multiscale RQA features were shown to detect MI with an average sensitivity of 96.5% and an average specificity of 75% in the randomized classification experiments of PhysioNet PhysikalischTechnische Bundesanstalt database, which is comparable to the performance of human experts. This study is strongly indicative of potential automated MI classification algorithms for diagnostic and therapeutic purposes. Index Terms—Myocardial infarction (MI), recurrence quantification analysis (RQA), vectorcardiogram (VCG), wavelet. I. INTRODUCTION YOCARDIAL infarction (MI), also known as a heart attack, is the single leading cause of mortality in America. Typically, MIs are caused by the insufficient blood supply to the myocardium commonly due to the coronary artery occlusion and can take place in different portions of the heart, i.e., anterior, inferior, posterior, inferior–lateral, anterior–septal, posterior– lateral. The triad of MI is ischemia, injury, and necrosis [1]. Ischemia is due to reduced blood supply; injury indicates acuteness of infarct; and infarction is the symptom of myocardium necrosis [1]. During MI, the damaged heart muscle tissues either conduct electrical impulses more slowly or produce an electrical void, which consequently causes the directional variations in the heart electrical function (i.e., depolarization and repolarization). The ECG signals have been widely used as a powerful tool to assist the detection of abnormal cardiac electrical activities for more than 100 years [2]. The sequential P, QRS, and T waves in the ECG signals were denoted by Einthoven to represent the M Manuscript received October 25, 2009; revised January 20, 2010, and June 3, 2010; accepted July 27, 2010. Date of publication August 5, 2010; date of current version January 21, 2011. The author is with the Department of Industrial and Management Systems Engineering, University of South Florida, Tampa, FL 33620 USA (e-mail: [email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TBME.2010.2063704 Fig. 1. Different views of human torso showing various coordinate axes and the positions of the electrodes for acquiring VCG signals [2]. sequences of depolarization and repolarization activities in atria and ventricles. With technological advancement, researchers have been using sophisticated methods in time, frequency, or phase space domains to explore the hidden information from ECG signals for improving the diagnostics of cardiovascular diseases [3]. The standard clinical method of MI diagnostic is an analysis of aberrated patterns in 12-lead ECG signals, for example, the waveform variations of P, QRS, and T waves. However, the waveform patterns often vary significantly due to the locations of ECG electrodes. It may also be noted that one lead ECG signals can only capture one perspective view of the 3-D heart electrical activities. Thus, multiple lead ECG systems, e.g., 12-lead ECG and 3-lead vectorcardiogram (VCG), are designed for a multidirectional view of the cardiac electrical activity. VCG signals monitor the cardiac electrical activity along three orthogonal X, Y , Z planes of the body, namely, frontal, transverse, and sagittal. As shown in Fig. 1, the VCG vector loops contain 3-D recurring, near-periodic patterns of heart dynamics [4]. Each heart cycle consists of three loops corresponding to P, QRS, and T wave activities. Dower et al. [5]–[7] showed that a 3-lead VCG can be linearly transformed to a 12lead ECG without a significant loss of clinically useful information pertaining to the heart dynamics. Therefore, the 3-lead VCG surmounts not only the information loss in one lead ECG but also the redundant information in 12-lead ECG [8]. Due to the infarction locations and myocardium damage levels, MI shows a variety of changes in cardiac electrical activities. As aforementioned, MI can occur in various portions of heart such as anterior, inferior, posterior, etc. There are also three stages of myocardium damage, namely, ischemia, injury, and 0018-9294/$26.00 © 2010 IEEE 340 IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING, VOL. 58, NO. 2, FEBRUARY 2011 TABLE I DIAGNOSTIC PERFORMANCES OF COMPUTER PROGRAMS FOR THE IDENTIFICATION OF MI [15] Fig. 2. necrosis. The necrotic area can be surrounded by injured tissues, which in turn can be surrounded by ischemic regions. Such diverse conditions generate various complex ECG signal patterns in the presence of noise, which pose significant challenges to the identification of MI. Most of previous computer programs computed time-domain features from 12-lead ECG signals, e.g., variations in the ST wave segments, depth of Q-wave, ST elevation, inversion of T-wave, and other event intervals for the identification of MI. It may also be noted that some of the previous MI diagnostic approaches considered statistical analysis of QRS, T waves’ morphology, angles and magnitudes [9]–[14] in the ECG or VCG signals. But few of these traditional automated MI diagnostic algorithms were shown to achieve performances exceeding 90% in both specificity and sensitivity. As shown in Table I, these algorithms have considerably low overall classification accuracy. Cardiologists were shown to be the best performance group in identifying MI with a sensitivity of 80.3% and a specificity of 97.1%. Few, if any, of the previous approaches considered recurrence dynamics underlying the three-lead VCG signals and used recurrence quantification analysis (RQA) for the identification of cardiac disorders. Additionally, most of the previous RQA methods only considered the time-delay reconstructed phase space from one-lead ECG signals for the investigation of cardiovascular dynamics [16], [17]. The multiscale RQA analysis of more intuitive three-lead VCG signals has not been reported in the literature. In this paper, we report the variations of the multiscale recurrence dynamics in the measured VCG loops between healthy control (HC) versus MI subjects. RQA was found to be a useful means to detect cardiac disorders, such as atrial fibrillation [16]. The objective of this study is to explore the underlying hidden recurrence patterns buried in the massive physiological time series and develop a novel classification methodology to identify cardiovascular disorders. The remainder of this paper is organized as follows: Section II introduces the research methodology of multiscale RQA analysis, feature selection, and classification; the ECG data description and experimental design are presented in Section III, Section IV contains the implementation results for the PhysioNet Physikalisch-Technische Bundesanstalt (PTB) database, Section V includes discussion and conclusions arising out of this investigation. Procedure of the research methodology used. II. RESEARCH METHODOLOGY It is commonly known that human heart is essentially an autonomous electromechanical blood pump that provides indispensable supplies for maintaining vital living organisms and disposes waste materials. Such autonomous control mechanisms generate noteworthy dynamical behaviors and structure patterns in the ECG signals with the presence of high-level nonlinearity, nonstationarity, and noise. The inherent complex characteristics from living organisms pose significant challenges to the conventional cardiac disorder identification approaches based on temporal-domain inspections or classical statistical methods with linear and stationary assumptions [18]. Recent advancements in nonlinear dynamics for exploring the hidden patterns and relationships in complex physical systems provides a great opportunity to better understand cardiovascular functions from the new perspectives, e.g., wavelet domain and phase-space domain. Wavelet representation can provide an effective time-frequency analysis for nonstationary signals with both steady and transient parts. RQA was shown to be a powerful tool for the dynamical system analysis, which can provide the quantitative characterization of complexity and randomness for the nonlinear and nonstationary signals such as physiological time series [17]. The combined multiscale RQA analysis can lead to effective features, which are more sensitive to the underlying pathological variations and less sensitive to noise or other extraneous variations. As shown in Fig. 2, the acquired VCG signals are first decomposed into multiple wavelet filter banks, using the discrete wavelet transformation (DWT). Then cardiac recurrence dynamics are quantified for the VCG signal components within each wavelet scale. Thus, RQA statistics are not only computed from the original global scale but also multiple wavelet scales, which can thus more effectively capture the local scaledependent transient and intermittent behaviors. Finally, the extracted large group of multiscale RQA features is further reduced into a small subset to avoid the problem of “curse of dimensionality” in classification models. A. Wavelet Signal Representation Wavelet is an effective time-frequency decomposition tool, which elucidates the steady and transient components of the YANG: MULTISCALE RECURRENCE ANALYSIS OF SPATIAL CARDIAC VCG SIGNALS nonstationary signals into various frequency bands while preserving the time information. The time-varying, transients, and nonstationary characteristics in the nature of physiological signals have fueled increasing interests in applying wavelet transformation for the analysis of ECG signals. Such analysis includes characteristic points (QRST) detection [19]–[21], local abnormality analysis [9], noise cancellation [22], cardiac arrhythmias [23], compact representation [24], and ECG data compression [25], [26]. But few, if any, of previous approaches employed wavelet transformation toward the nonlinear dynamics quantification of ECG signals, especially the more intuitive VCG signals. In this investigation, DWTs are used to decompose the VCG signal characteristics in both time and frequency domains. The full spectrum of VCG signals is divided into multiple frequency bands in the form of separated components fa (t) and da (t). Such a multiresolution analysis is formulated based upon the compactly supported orthonormal wavelet functions ψa,b (t) and scaling functions ϕa,b (t). The DWT generally employs the dyadic grid form to achieve the zero redundancy. The dyadic grid wavelet and scaling functions can be written as follows: ψa,b (t) = 2 ϕa,b (t) = 2 −a 2 ψ(2−a t − b) −a 2 ϕ(2−α t − b). The wavelet (or detail) coefficients and approximation coefficients can be computed by taking the inner products of the signal f (t) with the wavelet functions and scaling functions respectively as +∞ Wa,b = f (t), ψa,b (t) = f (t)ψa,b (t)dt −∞ Aa,b = f (t), ϕa,b (t) = +∞ −∞ f (t)ϕa,b (t)dt where Wa,b is the wavelet coefficients at scale and location indices (a, b) and Aa,b is the approximation coefficients. Therefore, the signal f (t) can be expressed as the summation of approximation and detail series as f (t) = fan (t) + an da (t) a=−∞ = ∞ b=−∞ Aan ,b ϕan ,b (t) + ∞ an Wa,b ψa,b (t) a=−∞ b=−∞ where fan (t) is the signal approximation at scale an and da (t) is the signal details at scale a. But there are several existing families of standard wavelets, such as Daubechies, Coiflet, Symlet, and so on. The choice of wavelet basis functions can play a significant role in determining the compactness of the resulting wavelet representation [24]. It is generally understood that the closer the basis functions match the signal patterns, the more compact the representation will be [24]. Therefore, the Daubechies wavelets db4 is selected in this present investigation since its wavelet function has a very similar shape to the ECG signal pattern. 341 Fig. 3. Graphical illustration of relationship among (a) VCG X, Y, Z time series; (b) VCG trajectory; (c) UTRP; (d) TRP. B. Recurrence Quantification of Cardiovascular Dynamics Dynamical system phase space exhibits recurrence characteristics, even in different wavelet scales. Poincare recurrence theorem [27] shows that if one has a measure preserving transformation, the trajectories eventually reappear at the neighborhood of former points. As shown in Fig. 3(c), the unthreshold recurrence plot (UTRP) delineates the distances of every point x(i), the state vector realized at time i, to all the others in the phase space, i.e., Dist(i, j) = ϑ(x(i) − x(j)), where · is a distance measurement (e.g., the Euclidean norm) and ϑ(·) is the color code that maps the distance to a color scale. Fig. 3(c) shows that the color code of the distance between the cardiac vectors x(i) and x(j) is located in coordinate (i, j) of the recurrence plot. If the color code at the point is blue, then the points are located close to each other in the VCG, and if the color code is red, the points are located farther apart. While the thresholded recurrence plot (TRP) [see Fig. 3(d)] only draws points when the distance x(i) − x(j) between two cardiac vectors is below a cutoff distance r: T (i, j) = Θ(r − x(i) − x(j)), where Θ is the Heaviside function [28]. The recurrence plot provides a convenient means to capture the topological relationships existing in the 3-D VCG vector space in the form of 2-D images. It shows the times at which a state of the dynamical system exhibits recurrence, i.e., the time pairs at which the trajectories of a system evolution come within a specified neighborhood. The structures of a recurrence plot have distinct topology and texture patterns [see Figs. 3(c) and 3(d)]. The ridges locate the nonstationarity and/or the switching between local behaviors. The recurring dark (blue) diagonal (45◦ ) lines indicate the near periodicity of the system behaviors over given time segments with a period, heart rate, equal to the separations between successive diagonal lines [28]. C. Multiscale RQA Features extracted based on conventional statistical approaches and linear system perspectives alone tend to have limitations for capturing signal variations resulting from changes in the cardiovascular system dynamics, especially considering the 342 IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING, VOL. 58, NO. 2, FEBRUARY 2011 Fig. 5. Fig. 4. Flow chart of the forward feature selection algorithm. Flow diagram of multiscale RQA. D. Feature Selection aforementioned different infarction locations and myocardium damage levels. The use of rigorous nonlinear dynamics principles based on multiscale RQA can lead to extraction of effective features, which are more sensitive to pathological variations and less sensitive to noises or other extraneous variations. As shown in Fig. 3(a) and (b), three lead VCG time series record the spatial heart electrical activities in the form of a 3-D attractor. The recurrence dynamics in the original global scale VCG attractor [see Fig. 3(c)] were explored in our previous investigation [4]. It was found that MI pathological patterns are closely related to the variations in the recurrence dynamics. However, some pathological characteristics can be buried in such a single-scale view. Hence, this present investi−−→ gation decomposes the 3-D VCG signals f (t)|V x,V y .V z −−−→ into multiple scales, namely, An : (fan (t)|)V x,V y ,V z , D1 : −−−→ −−→ (d1 (t)|)V x,V y ,V z , . . ., (Dn : dan (t)|)V x,V y ,V z . Nonstationary characteristics can be approximately separated from the steady behaviors for studying the pathological properties of underlying complex cardiovascular dynamics. As shown in Fig. 4, the 3-D VCG signal components in the same scale are presented in the form of a 2-D UTRP so that mathematical description of the salient patterns contained can be compactly captured, and quantitative recurrence quantifiers that describe the specific recurring patterns in the VCG can be efficiently extracted. The cutoff distance r was selected to be 0.005 × Dist(i, j)m ax for the TRPs in all the MI and HC cases. The six typical RQA features, namely, recurrence rate (RR), determinism (DET), linemax (LMAX), entropy (ENT), laminarity (LAM), and trapping time (TT) are extracted to characterize the VCG signal [29]. The unanimously selected threshold does not necessarily calculate proper dynamical invariants for some of the subjects, but the resulted RQA statistics are utilized to reveal the underlying pathological behaviors due to MI. In total, 6 × (an + 2) recurrence quantifiers are extracted in the an level decomposition for the identifi−−−→ cation of MI pathological clues, including (fan (t)|)V x,V y ,V z , −−→ −−−→ (d1 (t)|)V x,V y ,V z , . . ., (dan (t)|)V x,V y ,V z , and original global −−→ scale (f (t)|)V x,V y ,V z . In the previous section of feature extraction, we used transformation functions, i.e., nonlinear recurrence and wavelet methods, to obtain a group of quantifiers that can track the underlying pathological variations from the massive VCG dataset. But such a high-dimensional feature space cannot only bring model complexity and overfitting problems to the classification stage but also hinder the deeper understanding of the underlying processes. Therefore, feature selection is further employed to reduce the dimensionality of feature space by selecting a smaller subset from existing multiscale RQA quantifiers without a transformation. Feature selection techniques have been successfully applied in many areas including the classification of microarray expression data [30], protein structures [31], and physiological signals, [32] etc. Fig. 5 shows the sequential forward feature search algorithm used for the evaluation and selection of an optimal feature subset from the complete multiscale RQA feature set. The classification model is wrapped into the objective function to search the space of all feature subsets for the best predictive accuracy. Starting from the empty set S0 , the additional feature s+ that results in the highest objective classification function J(Sk + s+ ) is sequentially added into the feature subset Sk that has already been selected. These steps are repeated until the desired feature subset size is reached. The step of feature selection cannot only surmount the aforementioned model complexity and overfitting problems, but also provide faster and more cost-effective models with the optimal feature subset. E. Classification Three classification models, namely, linear, quadratic, and k nearest neighbor (KNN), are considered to separate MI from HC cases in the feature space with linear hyperplane, parabola hyperplane, and arbitrary boundary, respectively. Linear discriminant analysis (LDA) assumes that a linear hyperplane serves as a boundary separating HC and MI points in the feature space, and it uses a least square estimation to determine the coefficients, which give the best fit. The hyperplane that captures the relationships between the input features and YANG: MULTISCALE RECURRENCE ANALYSIS OF SPATIAL CARDIAC VCG SIGNALS the output response is of the form y = αS + ε = α0 + n αi Si + ε i=1 where αi are the model coefficients and si represents the input features. The residual error ε of the difference between the actual value and the model output determines whether the presented VCG is taken from HC or MI subject. Quadratic discriminant analysis (QDA) assumes that a parabola hyperplane serves as a boundary separating HC and MI points in the feature space. The only difference from LDA is that it adds the quadratic term to increase the degree of freedom for the classification boundary. The mathematic form of QDA model is shown as follows: n n n αi Si + βij Si Sj + ε y = αS + S T βS + ε = α0 + i=1 i=1 j =1 where αi , βij are the coefficients, si represents the input features, and ε is the residual errors. The KNN rule is a very intuitive method that classifies unlabeled examples based on their similarity to examples in the training set. For a given unlabeled feature point su , find the k “closest” labeled feature examples in the training dataset and assign su to the class that appears most frequently within the k-subset. The present paper uses the Euclidean metric to measure “closeness” in the KNN classification model. III. MATERIALS AND EXPERIMENTAL DESIGN A. Database Some 448 VCG recordings (368 MIs and 80 HCs) available in the PTB Database [33], [34] from the PhysioNet 2006 QT Challenge [35], [36] were analyzed in this investigation. Each recording contains 15 simultaneous heart-monitoring signals, namely, the conventional 12-lead ECG and the 3-lead Frank (XYZ) VCG. The signals were digitized at 1 kHz sampling rate with a 16-bit resolution over a range of ±16.384 mV. The 80 HC recordings are acquired from 54 healthy subjects and the 368 MI recordings from 148 patients. Within the MI recordings, there are 89 inferior, 56 inferior–lateral, 19 inferior–posterior–lateral, 1 inferior– posterior, 47 anterior, 43 anterior–lateral, 77 anterior–septal, 2 anterior–septal–lateral, 3 lateral, 4 posterior, 5 posterior– lateral, and 22 unknown site cases. The VCG recordings were typically of ∼2 min duration, and all the signals were recorded for at least 30 s. The present investigation first normalized the VCG signals into the unit range and collected a segment of 4000 data points starting from the first R-peak. Then, the data are further resample to 2000 data points to reduce the RQA computational efforts. These two preprocessing steps can also ensure that the resulted recurrence plots are aligned uniformly from the R-peak and reduce the biases from misalignments. B. Cross-Validation and Performance Evaluation To reduce the bias and overfitting of classification performance evaluation, both K-fold cross-validation and random subsampling methods were employed in this investigation. 343 K-fold cross-validation totally conducts K experiments, in each of which K−1 folds are used for the training purpose and the rest onefold for testing. The true performance estimate is obtained as the average of those randomly separate error rates on testing samples. The random subsampling method will further replicate such K-fold cross-validation experiments for 1000 times by randomly creating the K-fold partitions to obtain the probability distribution of performance statistics. This procedure is repeated from twofold to tenfold for the identification of MI subjects in the PTB database. This integration of K-fold crossvalidation and random subsampling methods not only prevent the biases from the inequitable selection of training datasets, but also provide the general performances of feature sets and classification models. The classification results are evaluated based on two performance statistics, namely, sensitivity and specificity from the testing dataset. Sensitivity measures the proportion of actual positives, which are correctly identified as such, and the specificity measures the proportion of negatives, which are correctly identified [37]. In other words, sensitivity is the indictor, which gives the percentage of MI subjects that are identified as having the condition, and specificity is the percentage of healthy control cases that are identified as not having the condition. IV. RESULTS The multiscale recurrence analysis decomposes and characterizes the system recurrence dynamics into various levels, which effectively capture the system transient, intermittent, and steady behaviors. As shown in Fig. 6, the VCG attractor recurrence dynamics are analyzed in multiple wavelet scales A7, D1, D2, D3, D4, D5, D6, D7 using the Daubechies wavelets db4. For the scale A7, the recurrence plot Dist(i, j)|A 7 = −→ −→ ϑ(fa7 (i)|V x,V y ,V z − fa7 (j)|V x,V y ,V z ) is computed from −−−→ the approximation signals A7 : fa7 (t)|V x,V y ,V z . For the scales from D1 to D7, the recurrence plots are computed −−→ from the wavelet detail series D1 : d1 (t)|V x,V y ,V z , . . . , Dn : −−→ d7 (t)|V x,V y ,V z as follows: − → − → Dist(i, j)|D n = ϑ(dn (i)|V x,V y ,V z − dn (j)|V x,V y ,V z ), n = 1, . . . , 7. The recurrence patterns in the original global scale are shown in Fig. 3(c), which are decomposed into different frequency bands (see Fig. 6) using the DWTs. Based on the Nyquist theorem, the highest frequency represented in the VCG signals is 500 Hz that is one-half of the sampling rate 1000 Hz. Therefore, the frequency ranges for this multiscale recurrence analysis are 0–3.9 Hz (A7), 3.9–7.81 Hz (D7), 7.81–15.62 Hz (D6), 15.62–31.25 Hz (D5), 31.25–62.5 Hz (D4), 62.5–125 Hz (D3), 125–250 Hz (D2), and 250–500 Hz (D1) in Fig. 6. The lowest frequency trend is captured in the scale of A7, and the highest frequency is basically in the scale of D1. The various frequency components in the VCG signals are hence partitioned for the analysis of recurrence dynamics, which greatly facilitate the study of system transient, intermittent, and steady behaviors. 344 IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING, VOL. 58, NO. 2, FEBRUARY 2011 Fig. 7. Fig. 6. Multiscale recurrence plots of the VCG signals in A7, D1, . . ., D7 wavelet scales. In this present investigation, the decomposition level is selected to be 8 for the VCG signals, i.e., A7 is further decomposed into A8 (0–1.45 Hz) and D8 (1.45–3.9 Hz). Six typical RQA features, including RR, DET, LMAX, ENT, LAM, and TT, are extracted from the recurrence plot in each scale. In total, 60 = 6 × (8 + 2) recurrence quantifiers are extracted in this multiscale recurrence analysis for the identification of MI pathological clues. As shown in Fig. 7, the error rate of classification is varied with respect to feature subset size in the step of feature selection. The optimal feature subset size is identified as 6, because it is not only the number of typical RQA features extracted in each wavelet scale, but also the first local minima of error rate variations. It may be noted that if the number of feature chosen is greater than 6, the error rate oscillates rather than decreases, indicating the possible existence of overfitting. Fig. 7 also shows that LDA yields the least classification error rate compared to the QDA and KNN models. The simplicity of the LDA classification model further signifies the effectiveness of the multiscale RQA features. Fig. 8 shows the distribution of the first three selected features, namely, feature 25, 22, and 40 in the 1-D histogram plot, 2-D, and 3-D scatter plots, respectively. The first selected feature is the 25th feature, which is the RR in the scale of D3 (62.5–125 Hz). As shown in Fig. 8(a), the distribution of feature 25 varies significantly between HC (26.9 ± 19.1) and MI (4.2 ± 6.2) groups. The p-value of feature 25 is 5.21 × 10−26 in the two Variations of classification error rate versus feature subset size. Fig. 8. Multiscale recurrence feature distribution plots; (a) histogram of feature 25; (2) 2-D scatter plot between feature 22 and feature 25; (3) 3-D scatter plot between feature 22, feature 25, and feature 40. sample Kolmogorov–Smirnov test, which further proved the existence of distinct distributions between HC and MI. Fig. 8(b) presents the 2-D scatter plot of feature 25 and feature 22. Feature 22 is the ENT in the scale of D2 (125 ∼ 250 Hz). MI subjects are visualized in the red circles and HC in the blue triangles. The majority of MI subjects are found to be in the upper portion and HC in the lower portion of Fig. 8(b). Additionally, the third feature 40, ENT in the scale of D5 (15.62 ∼ 31.25 Hz), is added to characterize the 3-D distribution of HC (368) and MI (80) recordings in the PTB database. It can be seen that these three features can effectively discriminate HC from MI subjects, and MI patients are found to cluster, while the healthy controls spread mostly outside. It may also be noted that HCs are shown to have a more diverse distribution (i.e., larger mean and standard deviation) than MIs, implying that HC subjects are more dynamically adaptive. The more variability of heart dynamics shown in the HC cases was also observed in Goldberger’s study on fractal dynamics in physiology [18]. The classification results using linear discriminant models are shown in Fig. 9(a) and (b). The distribution of sensitivity and specificity are computed from 1000 replicates of the K-fold cross-validation. The box plot is used to visualize the distribution of classification performance. The red line in the middle of boxplot represents the median, the blue box shows the lower YANG: MULTISCALE RECURRENCE ANALYSIS OF SPATIAL CARDIAC VCG SIGNALS 345 experiments of three aforementioned models, the features extracted from multiscale recurrence analysis are, therefore, found to be adequately discriminatory to capture the dynamic variations between HC and MI and ensure a high-classification accuracy for the MI subjects. The classification results are shown to be comparable to the average of the combined scores of eight experienced cardiologists (sensitivity-80.3%, specificity-97.1% in Table I, especially considering the sparse HC data in this investigation. The average overall classification rate is 92.7% for the LDA in the present investigation, calculated from (368 × 0.965 + 80 × 0.75)/(368 + 80). The experienced cardiologists were shown to achieve an average of 87.2% = (547 × 0.803 + 382 × 0.971)/(547 + 382). Few, if any, previous approaches yield >90% in the overall classification rate for the diagnostic of MI including both various nidus locations and different myocardium damage levels [9]–[15]. It may be noted that multiscale recurrence method yields a better overall accurate classification rate (>5.5%) compared to human expertise or other standard methods. V. DISCUSSION AND CONCLUSION Fig. 9. Performance results of various classification models using K -fold cross-validation and random subsampling methods; (a) linear model sensitivity, (b) linear model specificity, (c) quadratic model sensitivity, (d) quadratic model specificity, (e) KNN model sensitivity, (f) KNN model specificity. quartile and upper quartile of data distributions, and the black dash lines represent the most extreme values within 1.5 times the interquartile range. Outliers are shown as the red cross in the box plot. As shown in Fig. 9(a), the median sensitivity is shown to be higher than 96.5% with small deviations (<0.1%). The K-fold number is varied from twofold to tenfold, and the random subsampling method is also used to randomly creating the K-fold partitions for 1000 times. Despite the randomness of training data selection, the sensitivity performance is shown to be fairly high (>96.5%) and yield a stable increase trend. Fig. 9(b) also demonstrates an average specificity performance of 75% with small deviations (<0.5%). This indicates that the features extracted from the multiscale recurrence analysis can significantly discriminate between MI and HC in the LDA. The feature selection and classification using linear quadratic models are shown in Fig. 9(c) and (d). The sensitivity is shown to increase from 94% to 95% with small deviations (<0.2%) while raising the K-fold number from twofold to tenfold [see Fig. 9(c)]. But the specificity performance is found to be around 77% and 77.5%, which is increased 2% from the linear discriminant models (75%). The parabola hyperplane is shown to achieve a better capability for the identification of healthy control subjects. Fig. 9(e) and (f) presents the results of feature selection and classification using KNN models. The KNN models utilize the nonlinear classification boundaries and are found to further improve sensitivity to 98%. But it is at the expense of the specificity, which is shown to be around 65% and 69%. Based upon the It is challenging to characterize and identify the ECG signal pathological patterns of MI, using automated computer methods or human expertise due to the various nidus locations and different myocardium damage levels. As commonly known, MI can take place in different portions of the heart, i.e., anterior, inferior, posterior, inferior–lateral, anterior–septal, posterior– lateral. Any of the three myocardium damage stages, namely, ischemia, injury, and necrosis, can occur in combination [1]. The natural electrical activity exhibitions, like ECG or VCG signals, from MI are shown to have significantly complex patterns under a different combination of MI conditions. Additionally, the autonomous control mechanisms in the complex biological system generate the high level of nonlinearity, nonstationarity, and noise in the signals. Not only the different MI conditions but also the inherent complexity from the living organisms pose remarkable difficulties for the MI identification, which also defy the conventional temporal-domain analysis and statistical methods. In this investigation, an attempt was made to analyze the variations of multiscale recurrence dynamics in the spatial VCG signals for the identification of MI subjects. The multiscale recurrence plots characterize the system recurrence dynamics in various frequency bands, which can effectively capture the system transient, intermittent, and steady behaviors. The multiscale recurrence quantifiers are found to be adequately discriminatory to capture the dynamic variations between HC and MI. The LDA is demonstrated to yield a sensitivity around 96.5% with small deviations (<0.1%) and an average specificity of 75% with 0.5% deviations. The QDA achieves a sensitivity around 94.5% with small deviations (<0.2%) and a specificity of 77.5%. It is shown to have a better specificity but at the expense of sensitivity. The KNN models identify the nonlinear mapping from the extracted features to the patient’s condition. It provides a relatively high sensitivity (∼98%) with a compromise of specificity (≈69%). The classification results from three models demonstrate a comparable performance to the average of the combined 346 IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING, VOL. 58, NO. 2, FEBRUARY 2011 scores of eight experienced cardiologists (sensitivity-80.3%, specificity-97.1%), considering the relative small MI datasets and much sparser HC data in this investigation. The overall classification rate is 92.7% in the present multiscale recurrence investigation, which is better (>5.5%) than human performance and other standard methods. Cardiac monitoring algorithms have been striving to improve the overall classification rate, sensitivity, and specificity. However, limited by the overall classification rate in a particular approach, researchers can either improve the sensitivity at the expense of specificity or sacrifice the sensitivity to improve the specificity. This investigation shows that the VCG features extracted using multiscale RQA analysis are good indicators of the cardiovascular conditions. This research has great potentials for the development of automated MI classification and warning system, which can dramatically reduce the healthcare cost and promote the preventive medicine. In addition, the results from this study can also be applied in the understanding of the recurrence dynamics and nonstationary patterns of other physiological and engineering systems. ACKNOWLEDGMENT The author would like to thank anonymous reviewers for their constructive comments, and also grateful to Dr. S. T. S. Bukkapatnam (Industrial Engineering), Dr. R. Komanduri (Mechanical Engineering), and Dr. B. Benjamin (Physiology) from Oklahoma State University for their valuable discussions and suggestions, and Dr. E. Bennett from the Department of Molecular Pharmacology and Physiology, University of South Florida, for the comments during the revisions. The author is entirely responsible for the data analysis and conclusions and the material presented in this paper. REFERENCES [1] D. Dale, Rapid Interpretation of EKG’s: An Interactive Course. Tampa, FL: Cover Publishing Company, 2000. [2] J. Malmivuo and R. Plonsey, Bioelectromagnetism: Principles and Applications of Bioelectric and Biomagnetic Fields. New York: Oxford University Press, Jul. 18 1995, p. 506. [3] G. D. Clifford, F. Azuaje, and P. McSharry, Advanced Methods and Tools for ECG Data Analysis. Norwood, MA: Artech House, 2006. [4] H. Yang, M. Malshe, S. T. S. Bukkapatnam, and R. Komanduri, “Recurrence quantification analysis and principal components in the detection of myocardial infarction from vectorcardiogram signals,” presented at the 3rd INFORMS Workshop Data Mining Health Informatics, session A2.2, Washington, DC, Oct. 11. 2008. [5] G. E. Dower, A. Yakush, S. B. Nazzal, R. V. Jutzy, and C. E. Ruiz, “Deriving the 12-lead electrocardiogram from four (EASI) electrodes,” J. Electrocardiol., vol. 21, pp. S182–S187, 1988. [6] G. E. Dower, H. B. Machado, and J. A. Osborne, “On deriving the electrocardiogram from vectorcardiographic leads,” Clin. Cardiol., vol. 3, pp. 87–95, 1980. [7] G. E. Dower and H. B. Machado, “XYZ data interpreted by a 12-lead computer program using the derived electrocardiogram,” J. Electrocardiol., vol. 12, pp. 249–261, 1979. [8] D. Dawson, H. Yang, M. Malshe, S. T. S. Bukkapatnam, B. Benjamin, and R. Komanduri, “Linear affine transformations between 3-lead (Frank XYZ leads) vectorcardiogram (VCG) and 12-lead electrocardiogram (ECG) signals,” J. Electrocardiol., vol. 42, pp. 622–630, 2009. [9] D. Lemire, C. Pharand, J. Rajaonah, B. A. Dube, and A. R. LeBlanc, “Wavelet time entropy, T wave morphology and myocardial ischemia,” IEEE Trans. Biomed. Eng., vol. 47, no. 7, pp. 967–970, Jul. 2000. [10] G. Bortolan and I. Christov, “Myocardial infarction and ischemia characterization from T-loop morphology in VCG,” in Proc. Comput. Cardiol., 2001, pp. 633–636. [11] A. Rubulis, J. Jensen, G. Lundahl, J. Tapanainen, L. Wecke, and L. Bergfeldt, “T vector and loop characteristics in coronary artery disease and during acute ischemia,” Heart Rhythm, vol. 1, pp. 317–325, Sep. 2004. [12] A. de Torbal, J. A. Kors, G. van Herpen, S. Meij, S. Nelwan, M. L. Simoons, and E. Boersma, “The electrical T-axis and the spatial QRS-T angle are independent predictors of long-term mortality in patients admitted with acute ischemic chest pain,” Cardiology, vol. 101, pp. 199–207, 2004. [13] J. W. Starr, G. S. Wagner, V. S. Behar, A. Walston II, and J. C. Greenfield Jr., “Vectorcardiographic criteria for the diagnosis of inferior myocardial infarction,” Circulation, vol. 49, pp. 829–836, May 1 1974. [14] W. Carson and Y. Z. Tseng, “Maximal spatial ST-vector patterns in patients with acute anteroseptal myocardial infarction,” Int. J. Cardiol., vol. 43, pp. 165–173, Feb. 1994. [15] J. Willems, C. Abreu-Lima, P. Arnaud, J. van Bemmel, C. Brohet, R. Degani, B. Denis, J. Gehring, I. Graham, and G. van Herpen, “The diagnostic performance of computer programs for the interpretation of electrocardiograms,” N. Engl. J. Med., vol. 325, pp. 1767–1773, Dec. 19. 1991. [16] R. Sun and Y. Wang, “Predicting termination of atrial fibrillation based on the structure and quantification of the recurrence plot,” Med. Eng. Phys., vol. 30, no. 11, pp. 1105–1111, 2008. [17] J. P. Zbilut, N. Thomasson, and C. L. Webber, “Recurrence quantification analysis as a tool for nonlinear exploration of nonstationary cardiac signals,” Med. Eng. Phys., vol. 24, no. 1, pp. 53–60, 2002. [18] A. L. Goldberger, L. A. N. Amaral, J. M. Hausdorff, P. C. Ivanov, C.K. Peng, and H. E. Stanley, “Fractal dynamics in physiology: Alterations with disease and aging,” Proc. Natl Acad. Sci. USA, vol. 99, pp. 2466– 2472, Feb. 19. 2002. [19] C. Li, C. Zheng, and C. Tai, “Detection of ECG characteristic points using wavelet transforms,” IEEE Trans. Biomed. Eng., vol. 42, no. 1, pp. 21–28, Jan. 1995. [20] S. C. Saxena and V. Kumar, “QRS detection using new wavelets,” J. Med. Eng. Technol., vol. 26, pp. 7–15, 2002. [21] S. Kadambe and R. Murray, “Wavelet transform-based QRS complex detector,” IEEE Trans. Biomed. Eng., vol. 46, no. 7, pp. 838–848, Jul. 1999. [22] S. T. S. Bukkapatnam, S. Kumara, A. Lakhtakia, and P. Srinivasan, “Neighborhood method and its coupling with the wavelet method for nonlinear signal separation of contaminated chaotic time-series data,” Signal Process., vol. 82, pp. 1351–1374, 2002. [23] C. Lin, Y. Du, and T. Chen, “Adaptive wavelet network for multiple cardiac arrhythmias recognition,” Expert Syst. Appl., vol. 34, pp. 2601– 2611, 2008. [24] H. Yang, S. T. S. Bukkapatnam, and R. Komanduri, “Nonlinear adaptive wavelet analysis of electrocardiogram signals,” Phys. Rev. E, vol. 76, pp. 026214-1–026214-8, 2007. [25] L. Brechet, M. -F.. Lucas, C. Doncarli, and D. Farina, “Compression of biomedical signals with mother wavelet optimization and best-basis wavelet packet selection,” IEEE Trans. Biomed. Eng., vol. 54, no. 12, pp. 2186–2192, Dec. 2007. [26] Cheng-Tung Ku, Huan-Sheng Wang, King-Chu Hung, and YaoShan Hung, “A novel ECG data compression method based on nonrecursive discrete periodized wavelet transform,” IEEE Trans. Biomed. Eng., vol. 53, no. 12, pp. 2577–2583, Dec. 2006. [27] A. Katok and B. Hasselblatt, Introduction to the Modern Theory of Dynamical Systems, 1st ed. Cambridge, U.K.: Cambridge Univ. Press, 1995, p. 822. [28] N. Marwan, M. Carmen Romano , M. Thiel, and J. Kurths, “Recurrence plots for the analysis of complex systems,” Phys. Rep., vol. 438, pp. 237– 329, 2007. [29] N. Marwan, N. Wessel, U. Meyerfeldt, A. Schirdewan, and J. Kurths, “Recurrence plot based measures of complexity and their application to heart-rate-variability data,” Phys. Rev. E, vol. 66, p. 026702, Aug. 2002. [30] R. Ruiz, J. C. Riquelme, and J. S. Aguilar-Ruiz, “Incremental wrapperbased gene selection from microarray data for cancer classification,” Pattern Recognit., vol. 39, pp. 2383–2392, 2006. [31] S. Nemati, M. E. Basiri, N. Ghasem-Aghaee, and M. H. Aghdam, “A novel ACO–GA hybrid algorithm for feature selection in protein function prediction,” Expert Syst. Appl., vol. 36, pp. 12086–12094, 2009. [32] Kai-Quan Shen, Chong-Jin Ong, Xiao-Ping Li, Zheng Hui, and E. P. V. Wilder-Smith, “A feature selection method for multilevel mental fatigue EEG classification,” IEEE Trans. Biomed. Eng., vol. 54, no. 7, pp. 1231– 1237, Jul. 2007. [33] R. Bousseljot, D. Kreiseler, and A. Schnabel, “Nutzung der EKGSignaldatenbank CARDIODAT der PTB über das Internet,” Biomed. Tech., Band 40, Ergänzungsband, vol. 1, p. S 317, 1995. YANG: MULTISCALE RECURRENCE ANALYSIS OF SPATIAL CARDIAC VCG SIGNALS [34] D. Kreiseler, R. Bousseljot, “Automatisierte EKG-Auswertung mit Hilfe der EKG-Signaldatenbank CARDIODAT der PTB”. Biomedizinische Technik, Band 40, Ergänzungsband 1 S 319, 1995. [35] A. Goldberger, L. Amaral, L. Glass, J. Hausdorff, P. Ivanov, R. Mark, J. Mietus, G. Moody, C. Peng, and H. Stanley, “PhysioBank, physiotoolkit, and physionet: Components of a new research resource for complex physiologic signals,” Circulation, vol. 23, pp. e215–e220, Jun. 13 2000. [36] G. Moody, H. Koch, and U. Steinhoff, “The physionet/computers in cardiology challenge 2006: QT interval measurement,” Comput. Cardiol., vol. 33, pp. 313–316, Sep. 17–20, 2006. [37] D. G. Altman and J. M. Bland, “Statistics notes: Diagnostic tests 1: Sensitivity and specificity,” BMJ, vol. 308, p. 1552, Jun. 11 1994. 347 Hui Yang (M’08) was born in Nanjing, P.R. China. He received the Ph.D. degree from the School of Industrial Engineering and management, Oklahoma State University, Stillwater. He is currently an Assistant Professor at the Department of Industrial and management Systems Engineering, University of South Florida, Tampa. His research interests include sensor-based computational modeling and analysis of complex systems with special focus on nonlinear stochastic dynamics, and the resulting multifractal behaviors.