Doctoral Thesis

Nuevas Técnicas de Clasificación de Imágenes
Hiperespectrales
New Techniques for Hyperspectral Image
Classification
Author: Inmaculada García Dópido
DEPARTMENT OF TECHNOLOGY OF COMPUTERS AND COMMUNICATIONS
Supervisor's approval: Antonio Plaza Miguel
Signed:
2013
Resumen
The main contribution of this doctoral thesis is the design and implementation of new techniques for the classification of hyperspectral images of the Earth's surface, collected remotely by airborne or satellite sensors. In particular, this thesis integrates, for the first time in the literature, spectral unmixing and classification techniques in combined fashion in order to improve the interpretation of such images. Classification and unmixing are two very active fields in hyperspectral image analysis. On the one hand, classification techniques face several problems derived from the high dimensionality of the images and the scarcity of labeled samples, which hinders the supervised and semi-supervised classification processes that are the most widely used in this field (in particular, those based on active learning on the part of the user). On the other hand, the mixture problem in hyperspectral images is highly relevant, since the resolution of the sensor is not high enough for a single material to be present in each pixel. In this regard, unmixing techniques attempt to characterize the different materials present in each pixel. In this doctoral thesis, the additional information provided by unmixing techniques has been integrated into the classification process in order to obtain more effective methods adapted to the use of hyperspectral images, which are also efficient in computational terms. To validate the new classification methods proposed here, we use images provided by Earth observation sensors such as NASA's AVIRIS (Airborne Visible Infra-Red Imaging Spectrometer) or ROSIS (Reflective Optics Spectrographic Imaging System) of the German Aerospace Center (DLR).
Abstract
The main contribution of the present thesis work is the design and implementation of new techniques
for classification of remotely sensed hyperspectral images, collected by airborne or spaceborne Earth
observation instruments. Specifically, in this thesis work we explore, for the first time in the literature, the integration of spectral unmixing and hyperspectral image classification techniques in synergistic fashion, with the ultimate goal of improving the analysis and interpretation of hyperspectral images by taking advantage of the complementary properties of both techniques in combined fashion. It should be
noted that spectral unmixing and classification have been two very active areas in hyperspectral imaging,
but these techniques have been rarely exploited in synergistic fashion. On the one hand, classification techniques face problems related to the extremely high dimensionality of the hyperspectral data and the limited number of training samples available a priori, which makes it difficult to perform supervised or semi-supervised classification (particularly for approaches based on active learning techniques). On the other
hand, the mixture problem is very relevant in hyperspectral imaging, mainly because the spatial resolution of the sensor often cannot separate the different materials participating in a pixel. As a result, hyperspectral images are dominated by mixed pixels, and unmixing techniques are crucial for a correct interpretation and exploitation of the data. In this thesis, we have explored the integration of unmixing and classification and, in particular, the possibility of using unmixing approaches as an additional source of information in the classification process, with the ultimate goal of obtaining more accurate methods for the analysis of hyperspectral scenes without significantly increasing the computational complexity of the process.
In order to validate the new classification methods developed in the present thesis work, we resort to
hyperspectral images provided by standard and widely used instruments such as NASA’s Airborne Visible
Infra-Red Imaging Spectrometer (AVIRIS) or the Reflective Optics Spectrographic Imaging System
(ROSIS) operated by the German Aerospace Center (DLR).
Acknowledgement
I would like to thank the supervisor of this thesis, Antonio Plaza, for his great patience, encouragement
and support over the years, and for spending a lot of his valuable time helping me. It has been a pleasure
for me to work with him.
I would like to thank Professor Paolo Gamba for his collaboration in some of the developments presented in this thesis and also for providing the ROSIS data over Pavia University, Italy, along with the
training and test sets. I also gratefully acknowledge his great help and support during a research visit
to the University of Pavia, Italy, funded by Fondazione Cariplo. This research stay was instrumental in concluding some of the developments presented in this thesis. I would also like to thank Devis
Tuia for his collaboration. I would like to thank Professor D. Landgrebe for making the AVIRIS Indian Pines hyperspectral data set available to the community. Prof. Melba Crawford at Purdue University is also gratefully acknowledged for making the AVIRIS Kennedy Space Center data set available to the community.
I would also like to thank Alberto Villa, Jun Li, Prashanth Marpu, Maciel Zortea, José Manuel Bioucas Dias and Jon Atli Benediktsson for their collaboration; they contributed to some of the developments
presented in this thesis work. I would like to thank my colleagues and friends in HyperComp: Javier,
Gabriel, Sergio Sánchez, Daniel, Sergio Bernabé, Abel, Jorge, Nacho, Mahdi and Ben, who definitely deserve my sincere acknowledgments. I would also like to thank my friends from my village and from
Cáceres.
Last but not least, I would like to thank my parents (Antonia and Juan Antonio), who have trusted
me and given me emotional support all my life; my sisters (Mari Loli and Esther) and my little brother
(Juan Antonio) who have supported and encouraged me to continue. And finally, I would like to express special appreciation to Jesús, who has been a fundamental part of this thesis, for his help, understanding and for making me laugh every day.
This thesis work has been developed under the European Community's Marie Curie Research Training Networks Programme, under reference MRTN-CT-2006-035927, Hyperspectral Imaging Network (HYPER-I-NET). Funding from the Portuguese Science and Technology Foundation, project PEst-OE/EEI/LA0008/2011, and from the Spanish Ministry of Science and Innovation (CEOS-SPAIN project,
reference AYA2011-29334-C02-02) is also gratefully acknowledged. It was also supported in part by the
Icelandic Research Fund and the University of Iceland Research Fund. The development of the thesis
has also received support from the Spanish Ministry of Science and Innovation (HYPERCOMP/EODIX
project, reference AYA2008-05965-C04-02). Funding from Junta de Extremadura (local government)
under project PRI09A110 is also gratefully acknowledged.
Contents

1 Introduction
   1.1 Context and motivations
   1.2 Objectives
   1.3 Main contributions of the thesis

2 Unmixing Prior to Supervised Classification of Remotely Sensed Hyperspectral Images
   2.1 Summary
   2.2 Introduction
   2.3 Unmixing-based feature extraction
       2.3.1 Unmixing chain #1
       2.3.2 Unmixing chain #2
       2.3.3 Unmixing chain #3
       2.3.4 Unmixing chain #4
   2.4 Experimental results
       2.4.1 Hyperspectral data sets
       2.4.2 Experiments
   2.5 Final observations and future directions

3 A Comparative Assessment of Unmixing-Based Feature Extraction Techniques
   3.1 Summary
   3.2 Introduction
   3.3 A new unmixing-based feature extraction technique
       3.3.1 Linear spectral unmixing
       3.3.2 Unsupervised unmixing-based feature extraction
       3.3.3 Supervised unmixing-based feature extraction
   3.4 Hyperspectral data sets
       3.4.1 AVIRIS Salinas Valley
       3.4.2 ROSIS Pavia University
   3.5 Experimental results
       3.5.1 Feature extraction techniques used in the comparison
       3.5.2 Supervised classification system and experimental setup
       3.5.3 Analysis and discussion of results
   3.6 Final observations and future directions

4 Semi-Supervised Self-Learning for Hyperspectral Image Classification
   4.1 Summary
   4.2 Introduction
   4.3 Proposed approach
       4.3.1 Semi-supervised learning
       4.3.2 Self-learning
   4.4 Experimental results
       4.4.1 Experiments with AVIRIS Indian Pines data set
       4.4.2 Experiments with ROSIS Pavia University data set
   4.5 Summary and future directions

5 A New Hybrid Strategy Combining Semi-Supervised Classification and Spectral Unmixing
   5.1 Summary
   5.2 Introduction
   5.3 Proposed approach
       5.3.1 Considered spectral unmixing chains
       5.3.2 Proposed hybrid strategy
       5.3.3 Active learning
   5.4 Experimental results
       5.4.1 Balance between classification and unmixing
       5.4.2 Results for AVIRIS Indian Pines
       5.4.3 Results for ROSIS Pavia University
   5.5 Summary and future directions

6 Conclusions and Future Research Lines

A Publications
   A.1 International journal papers
   A.2 Peer-reviewed international conference papers

Bibliography
List of Figures

1.1 Increase in spectral resolution of remotely sensed data.
1.2 Concept of mixed pixels in a hyperspectral image.
1.3 Flowchart illustrating the organization of this thesis.
1.4 Summary of contributions in Chapter 2.
1.5 Summary of contributions in Chapter 3.
1.6 Summary of contributions in Chapter 4.
1.7 Summary of contributions in Chapter 5.
2.1 Unmixing-based feature extraction chains #1 (spectral endmembers) and #2 (spatial-spectral endmembers).
2.2 Unmixing-based feature extraction chain #3 (chain #4 replaces endmember extraction with averaging of the signatures associated to each labeled class in the training set).
2.3 (a) False color composition of the AVIRIS Indian Pines scene. (b) Ground-truth map containing 16 mutually exclusive land-cover classes.
2.4 Best classification results for AVIRIS Indian Pines (using an SVM classifier with Gaussian kernel, trained with 10% of the available samples per class).
2.5 Comparative classification results per class for AVIRIS Indian Pines (using an SVM classifier with Gaussian kernel, trained with 10% of the available samples per class) with MNF and unmixing chain #4.
3.1 Block diagram illustrating the unsupervised clustering followed by MTMF (CMTMFunsup) technique for unmixing-based feature extraction.
3.2 Block diagram illustrating the supervised clustering followed by MTMF (CMTMFsup) technique for unmixing-based feature extraction.
3.3 (a) False color composition of an AVIRIS hyperspectral image comprising several agricultural fields in Salinas Valley, California. (b) Ground-truth map containing 15 mutually exclusive land-cover classes.
3.4 Photographs taken at the site during data collection.
3.5 (a) False color composition of the ROSIS Pavia scene. (b) Ground-truth map containing 9 mutually exclusive land-cover classes. (c) Training set commonly used for the ROSIS Pavia University scene.
3.6 Classification results for the AVIRIS Indian Pines scene (obtained using an SVM classifier with Gaussian kernel, trained with 5% of the available samples).
3.7 Classification results for the AVIRIS Salinas Valley scene (obtained using an SVM classifier with Gaussian kernel, trained with 2% of the available samples).
3.8 Classification results for the ROSIS Pavia University scene (obtained using an SVM classifier with Gaussian kernel, trained with 50 pixels of each available ground-truth class).
3.9 Components extracted by MNF from the ROSIS Pavia University scene (ordered from left to right in terms of amount of information).
3.10 Components extracted by the CMTMFunsup feature extraction technique from the ROSIS Pavia University scene (in no specific order).
4.1 A graphical example illustrating how spatial information can be used as a criterion for semi-supervised self-learning in hyperspectral image classification.
4.2 OA (as a function of the number of unlabeled samples) obtained for the AVIRIS Indian Pines data set using the MLR (right) and probabilistic SVM (left) classifiers, respectively. Estimated labels were used in all the experiments, i.e., lr = 0.
4.3 Classification maps and OA (in parentheses) obtained after applying the MLR classifier to the AVIRIS Indian Pines data set by using 10 labeled training samples per class and 750 unlabeled samples, i.e., ln = 160, un = 750 and lr = 0.
4.4 Classification maps and OA (in parentheses) obtained after applying the probabilistic SVM classifier to the AVIRIS Indian Pines data set by using 10 labeled training samples per class and 750 unlabeled samples, i.e., ln = 160, un = 750 and lr = 0.
4.5 OA (as a function of the number of unlabeled samples) obtained for the AVIRIS Indian Pines data set using the MLR classifier with BT sampling by using 5 labeled samples per class (in total 80 samples). Two cases are displayed: the one in which all unlabeled samples are estimated by the proposed approach (i.e., lr = 0) and the optimal case, in which true labels are used whenever possible (i.e., lr = ur).
4.6 OA (as a function of the number of unlabeled samples) obtained for the ROSIS Pavia University data set using the MLR (right) and probabilistic SVM (left) classifiers, respectively. Estimated labels were used in all the experiments, i.e., lr = 0.
4.7 Classification maps and OA (in parentheses) obtained after applying the MLR classifier to the ROSIS Pavia University data set (in all cases, ln = 90 and lr = 0).
4.8 Classification maps and OA (in parentheses) obtained after applying the probabilistic SVM classifier to the ROSIS Pavia University data set (in all cases, ln = 90 and lr = 0).
4.9 OA (as a function of the number of unlabeled samples) obtained for the ROSIS Pavia University data set using the MLR classifier with BT sampling by using 100 labeled samples per class (in total 900 samples). Two cases are displayed: the one in which all unlabeled samples are estimated by the proposed approach (i.e., lr = 0) and the optimal case, in which true labels are used whenever possible (i.e., lr = ur).
5.1 Flowchart of the unmixing-based chain designated as strategy 1.
5.2 Flowchart of the unmixing-based chain designated as strategy 2.
5.3 Flowchart of the unmixing-based chain designated as strategy 3.
5.4 OA (as a function of the number of unlabeled samples) obtained for the AVIRIS Indian Pines data set by different classifiers. BT is the semi-supervised classifier where unlabeled samples are selected using breaking ties. RS is the semi-supervised classifier where unlabeled samples are selected using random sampling. Finally, Strategy 1 to Strategy 4 denote the semi-supervised hybrid classifier integrating classification and spectral unmixing (with α = 0.75), where unlabeled samples are selected using BT.
5.5 Classification maps and OAs (in parentheses) obtained after applying different classifiers to the AVIRIS Indian Pines data set. In all cases the number of labeled samples was 10, and the number of unlabeled samples (used in the semi-supervised strategies: BT, RS, Strategy 1, Strategy 2, Strategy 3 and Strategy 4) was set to 300.
5.6 OA (as a function of the number of unlabeled samples) obtained for the ROSIS Pavia University data set by different classifiers. BT is the semi-supervised classifier where unlabeled samples are selected using breaking ties. RS is the semi-supervised classifier where unlabeled samples are selected using random sampling. Finally, Strategy 1 to Strategy 4 denote the semi-supervised hybrid classifier integrating classification and spectral unmixing (with α = 0.75), where unlabeled samples are selected using BT.
5.7 Classification maps and OAs (in parentheses) obtained after applying different classifiers to the ROSIS Pavia University data set. In all cases the number of labeled samples was 10, and the number of unlabeled samples (used in the semi-supervised strategies: BT, RS, Strategy 1, Strategy 2, Strategy 3 and Strategy 4) was set to 300.
Table Index

1.1 List of acronyms used in this thesis.
2.1 Classification accuracies (percentage) and standard deviation obtained after applying the considered SVM classification system (with Gaussian and polynomial kernels) to three different types of features (original, reduced and unmixing-based) extracted from the AVIRIS Indian Pines and Kennedy Space Center scenes (ten randomly chosen training sets).
2.2 Statistical differences evaluated using McNemar's test (polynomial kernel).
3.1 Number of pixels in each ground-truth class in the four considered hyperspectral images. The number of training and test pixels used in our experiments can be derived from this table.
3.2 OA and AA (in percentage) obtained by the considered classification system for different hyperspectral image scenes (AVIRIS Indian Pines and AVIRIS Kennedy Space Center) using the original spectral information, unsupervised feature extraction techniques, and supervised feature extraction techniques. Only the best case is reported for each considered feature extraction technique (with the optimal number of features in parentheses) and the best classification result across all methods in each experiment is highlighted in bold typeface.
3.3 OA and AA (in percentage) obtained by the considered classification system for different hyperspectral image scenes (AVIRIS Salinas Valley and ROSIS Pavia University) using the original spectral information, unsupervised feature extraction techniques, and supervised feature extraction techniques. Only the best case is reported for each considered feature extraction technique (with the optimal number of features in parentheses) and the best classification result across all methods in each experiment is highlighted in bold typeface.
4.1 OA, AA, individual classification accuracies [%], and kappa statistic obtained using the MLR probabilistic classifier when applied to the AVIRIS Indian Pines hyperspectral data set, with 10 labeled samples per class (160 samples in total) and un = 750 unlabeled training samples. Two active learning techniques (MS and BT) are applied, and the supervised case is also reported. lr denotes the number of true labels available in Du (used to implement an optimal version of each sampling algorithm). The standard deviations are also reported for each test.
4.2 OA, AA, individual classification accuracies [%], and kappa statistic obtained using the MLR probabilistic classifier when applied to the AVIRIS Indian Pines hyperspectral data set, with 10 labeled samples per class (160 samples in total) and un = 750 unlabeled training samples. Two active learning techniques (MBT and nEQB) are applied, and the random sampling case (RS) is also reported. lr denotes the number of true labels available in Du (used to implement an optimal version of each sampling algorithm). The standard deviations are also reported for each test.
4.3 OA, AA, individual classification accuracies [%], and kappa statistic obtained using the probabilistic SVM classifier when applied to the AVIRIS Indian Pines hyperspectral data set, with 10 labeled samples per class (160 samples in total) and un = 750 unlabeled training samples. Two active learning techniques (MS and BT) are applied, and the supervised case is also reported. lr denotes the number of true labels available in Du (used to implement an optimal version of each sampling algorithm). The standard deviations are also reported for each test.
4.4 OA, AA, individual classification accuracies [%], and kappa statistic obtained using the probabilistic SVM classifier when applied to the AVIRIS Indian Pines hyperspectral data set, with 10 labeled samples per class (160 samples in total) and un = 750 unlabeled training samples. Two active learning techniques (MBT and nEQB) are applied, and the random sampling case (RS) is also reported. lr denotes the number of true labels available in Du (used to implement an optimal version of each sampling algorithm). The standard deviations are also reported for each test.
4.5 OA, AA, individual classification accuracies [%], and kappa statistic obtained using the MLR probabilistic classifier when applied to the ROSIS University of Pavia hyperspectral data set by using 10 labeled samples per class (in total 90 samples) and un = 700 unlabeled training samples. Two active learning techniques (MS and BT) are applied, and the supervised case is also reported. lr denotes the number of true labels available in Du (used to implement an optimal version of each sampling algorithm). The standard deviations are also reported for each test.
4.6 OA, AA, individual classification accuracies [%], and kappa statistic obtained using the MLR probabilistic classifier when applied to the ROSIS University of Pavia hyperspectral data set by using 10 labeled samples per class (in total 90 samples) and un = 700 unlabeled training samples. Two active learning techniques (MBT and nEQB) are applied, and the random sampling case (RS) is also reported. lr denotes the number of true labels available in Du (used to implement an optimal version of each sampling algorithm). The standard deviations are also reported for each test.
4.7 OA, AA, individual classification accuracies [%], and kappa statistic obtained using the probabilistic SVM classifier when applied to the ROSIS University of Pavia hyperspectral data set by using 10 labeled samples per class (in total 90 samples) and un = 700 unlabeled training samples. Two active learning techniques (MS and BT) are applied, and the supervised case is also reported. lr denotes the number of true labels available in Du (used to implement an optimal version of each sampling algorithm). The standard deviations are also reported for each test.
4.8 OA, AA, individual classification accuracies [%], and kappa statistic obtained using the probabilistic SVM classifier when applied to the ROSIS University of Pavia hyperspectral data set by using 10 labeled samples per class (in total 90 samples) and un = 700 unlabeled training samples. Two active learning techniques (MBT and nEQB) are applied, and the random sampling case (RS) is also reported. lr denotes the number of true labels available in Du (used to implement an optimal version of each sampling algorithm). The standard deviations are also reported for each test.
5.1 OA [%] obtained for different values of parameter α in the analysis of the AVIRIS Indian Pines hyperspectral data set with 5 labeled samples per class. The four considered spectral unmixing strategies are compared. The total number of iterations is given in parentheses.
5.2 OA [%] obtained for different values of parameter α in the analysis of the ROSIS Pavia University scene with 5 labeled samples per class. The four considered spectral unmixing strategies are compared. The total number of iterations is given in parentheses.
5.3 OA, AA [%], and kappa statistic obtained using different classifiers when applied to the AVIRIS Indian Pines hyperspectral data set. The standard deviation is also reported in each case.
5.4 OA, AA [%], and kappa statistic (in parentheses) obtained using the MLR classifier when applied to the ROSIS Pavia University hyperspectral data set. The total number of labeled samples in each ground-truth class is given in parentheses.
Chapter 1
Introduction
1.1 Context and motivations
The work developed in this thesis is part of the current research lines of the Hyperspectral Computing Laboratory (HyperComp) research group at the Department of Technology of Computers and Communications, University of Extremadura. In this work, we develop new efficient algorithms for the analysis of remotely sensed hyperspectral data by integrating concepts of (supervised and semi-supervised) classification and (unsupervised) spectral unmixing [1].
Figure 1.1: Increase in spectral resolution of remotely sensed data.
Hyperspectral images are an extension of the concept of digital image. Figure 1.1 graphically
illustrates the significant increase in the spectral resolution available for remote sensing data, from panchromatic (a single broad band), to multispectral, hyperspectral and even ultraspectral data, where each step increases not only the number of spectral bands collected by the imaging instrument, but also the spectral resolution (modern instruments not only have more bands, but the bands are also narrower and more closely spaced). Particularly, hyperspectral images comprise
hundreds of narrow spectral bands. As a result, each pixel in a hyperspectral image is not only formed
by a single discrete value, but by a wide range of values for different spectral measurements recorded by
a sensor or measuring instrument. The collection of all the wavelength values (one per spectral band)
which are associated to a given pixel is called a spectral signature (see Fig. 1.2) [2]. As a result, we
can understand a hyperspectral image as a collection of spectroscopic measurements that provide very detailed information about the properties of the materials appearing in the scene.
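To make this data layout concrete, the following minimal numpy sketch (not taken from the thesis; the array shape loosely mimics the AVIRIS Indian Pines scene described in Chapter 2) shows how a hyperspectral cube stores one spectral signature per pixel:

```python
import numpy as np

# Hypothetical hyperspectral cube stored as (rows, cols, bands);
# real data would be loaded from an AVIRIS/ROSIS file instead.
cube = np.random.rand(145, 145, 202).astype(np.float32)

# The spectral signature of one pixel is the vector of measurements
# recorded across all spectral bands at that spatial location.
signature = cube[10, 20, :]            # shape: (202,)

# Most algorithms discussed in this thesis first flatten the cube
# into a (pixels x bands) matrix.
X = cube.reshape(-1, cube.shape[-1])   # shape: (145*145, 202)
```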
The number and variety of processing tasks in hyperspectral remote sensing is enormous [3]. However,
the majority of algorithms can be organized according to the following specific tasks:
• Dimensionality reduction consists of reducing the dimensionality of the input hyperspectral scene
to facilitate subsequent processing tasks [4].
• Target and anomaly detection consist of searching the pixels of a hyperspectral data cube for rare
(either known or unknown) spectral signatures [5].
• Change detection consists of finding the significant (i.e., important to the user) changes between
two hyperspectral scenes of the same geographic region [6].
• Classification consists of assigning a label (class) to each pixel of a hyperspectral data cube [3].
• Spectral unmixing consists of estimating the fraction of the pixel area covered by each material
present in the scene [7].
This thesis is mainly focused on the integration of two of the aforementioned techniques: classification
and spectral unmixing. These are two active areas of research in hyperspectral data interpretation.
Here, we explore the possibility of using spectral unmixing concepts to complement supervised and semi-supervised classification techniques. This study represents a novel contribution in the hyperspectral
imaging community, in which these techniques have been traditionally applied in a separate fashion.
On the one hand, spectral unmixing is a fast-growing area in which many algorithms have been recently developed to retrieve pure spectral components (endmembers) and to determine their abundance fractions in mixed pixels [7]. Specifically, in hyperspectral imaging there has been a lot of interest in addressing the problem of mixed pixels, which arise when distinct materials are combined into a homogeneous or intimate mixture. This occurs regardless of the spatial resolution of the sensor [8]. In order
to mitigate the impact of mixed pixels, several endmember extraction [7, 9, 10] and spectral unmixing
approaches [11, 12, 13] have been developed in the literature under the assumption that a single pixel
vector may comprise the response of multiple underlying materials.
Figure 1.2: Concept of mixed pixels in a hyperspectral image.
On the other hand, hyperspectral image classification has also been a very active area of research in recent years [14]. Given a set of observations (i.e., possibly mixed pixel vectors), the goal of classification is to assign a unique label to each pixel so that it is well represented by a given class [4]. Several techniques have been successfully used to perform hyperspectral data classification, particularly supervised techniques such as kernel methods, which can deal effectively with the Hughes phenomenon [15, 16]. However, supervised classification is generally a difficult task due to the imbalance between the high dimensionality of the data and the limited availability of labeled training samples in real analysis scenarios [14]. These labeled samples are generally difficult and expensive to obtain. This has fostered the development of semi-supervised techniques able to exploit unlabeled training samples that can be obtained from a (limited) set of labeled samples without significant effort/cost.
At this point, it is important to emphasize that the analysis of hyperspectral images is not an easy
task; this is due to the great variability of hyperspectral signatures and the high dimensionality of the
data. Another problem that may arise in the analysis of such scenes (as mentioned before) is the intrinsic
nature of the pixels, which may be highly mixed. The most traditional approach in the literature to
describe the phenomenon of the mixture at sub-pixel levels is the linear mixture model [8]. As opposed to nonlinear unmixing, which generally requires detailed information about physical properties that may not always be available, linear spectral unmixing consists of identifying the pure spectral components or endmembers. When the pure spectral signatures are identified, the proportion of each material in
each pixel can be estimated. Abundances provide additional information about the composition of each
pixel; if this information is used in a correct way, it may complement the results provided by traditional
“hard” classification techniques.
For example, Fig. 1.2 illustrates the concept of spectral mixing using a toy example. As Fig. 1.2
shows, it is very likely that the pixel labeled as “vegetation” actually corresponds to several types of
vegetation, or even to a mixture of soil and vegetation. Also, the pixel labeled as “atmosphere” may be
affected by atmospheric interferers that also participate in the mixture, such as clouds, which absorb only
part of the radiation. This example clearly illustrates that the consideration of classification techniques alone may introduce errors in the pixel characterization from a macroscopic point of view, in particular
if the pixels are assumed to be homogeneously formed by a predominant substance. To address these
issues, the present thesis work focuses on the development of synergistic approaches for joint classification
and spectral unmixing. Even if we adhere to a classification convention in which each pixel is assigned a
single class, it is our feeling that spectral unmixing can assist in such process by providing an additional
source of information in order to estimate the most accurate classification label to each pixel in the scene.
The specific topics that we will discuss in this thesis work can be summarized as follows:
• The second chapter of the thesis is related to the high spectral dimensionality associated with
hyperspectral scenes, which calls for the development of effective feature extraction approaches that
can extract the most relevant features used for classification purposes. The innovative approach
explored in this chapter is related to the possibility of using unmixing prior to both supervised and
semi-supervised classification. In this context, we explore different ways to obtain the abundances of
each pure material in the hyperspectral image and to use this information to assist the classification
process.
• The third chapter of the thesis expands on these concepts and evaluates several spectral unmixing
chains that can be used to extract features based on abundance fractions for subsequent
classification using different strategies. The evaluation is conducted using a set of highly
representative hyperspectral scenes, and drawing comparisons to other state-of-the-art approaches.
• The fourth chapter of the thesis addresses the problem of the limited number of labeled training samples that is typically found in practice, which affects the design of supervised classification strategies. The procedure used to collect labeled samples is very expensive and difficult. For this purpose, new semi-supervised classification techniques (some of them based
on active learning) have been developed. This area has undergone a significant evolution in terms of
the models adopted in recent years. An extensive review of techniques for semi-supervised learning
is available in [17]. Our proposed strategy does not require a large number of labeled samples because the classifier is trained with both labeled and unlabeled samples, which are generated without extra cost: the unlabeled samples are obtained by the classifier automatically (a minimal sketch of this idea is given after this list).
• Finally, in the fifth chapter of this thesis we develop new strategies that synergistically combine
hyperspectral unmixing and classification in order to exploit both sources of information in
a complementary fashion, thus overcoming the limitations of using these techniques separately. The result is a new framework that integrates hyperspectral unmixing into the classification process, with the possibility to control the relative weight of unmixing with regard to classification and vice-versa. We show that spectral unmixing concepts can help in the semi-supervised classification process. The chapter provides ample experimental evidence supporting our claims.
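As a rough illustration of the ideas outlined above for Chapters 4 and 5, the sketch below performs one iteration of a self-learning loop with breaking-ties (BT) sampling and an α-weighted fusion of classifier and unmixing-derived class probabilities. It is only a simplified approximation under assumed inputs: the thesis uses an MLR classifier (implemented via LORSAL) and spatial information to generate candidate samples, whereas here scikit-learn's logistic regression stands in, and unmix_probs is a hypothetical matrix of class probabilities derived from abundance fractions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def self_learning_step(X_lab, y_lab, X_unlab, n_new=50, alpha=0.75,
                       unmix_probs=None):
    """One self-learning iteration with breaking-ties (BT) sampling."""
    # Stand-in for the MLR/LORSAL classifier used in the thesis.
    clf = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)
    probs = clf.predict_proba(X_unlab)
    # Chapter 5's hybrid idea: weight classifier probabilities against
    # probabilities derived from spectral unmixing (hypothetical input here).
    if unmix_probs is not None:
        probs = alpha * probs + (1.0 - alpha) * unmix_probs
    # Breaking ties: the smallest gap between the two largest class
    # probabilities marks the most ambiguous, most informative samples.
    sorted_p = np.sort(probs, axis=1)
    picked = np.argsort(sorted_p[:, -1] - sorted_p[:, -2])[:n_new]
    # Self-learning: the model's own predictions act as pseudo-labels,
    # so the training set grows without extra labeling cost.
    pseudo = clf.classes_[np.argmax(probs[picked], axis=1)]
    return (np.vstack([X_lab, X_unlab[picked]]),
            np.concatenate([y_lab, pseudo]))
```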
1.2 Objectives
The main objective of this thesis is to develop new and efficient techniques that integrate concepts of
classification and spectral unmixing, combining their advantages in synergistic fashion while minimizing
the disadvantages associated with the separate application of each technique. In order to achieve this
general objective, several specific objectives have also been accomplished:
1. To study existing techniques for classification (supervised and semi-supervised) of remotely sensed
hyperspectral data sets, with focus on semi-supervised and active learning techniques, evaluating
their advantages and disadvantages.
2. To study existing techniques for spectral unmixing in order to evaluate their advantages and
disadvantages in hyperspectral image analysis and interpretation.
3. To develop new techniques for hyperspectral image classification, based on the integration of
supervised and semi-supervised classification techniques and linear spectral unmixing techniques,
and with the ultimate goal of analyzing the advantages that can be obtained from the joint
exploitation of these approaches.
4. To evaluate the new classification techniques developed in this work, which result from the
combination of traditional classification concepts and also spectral unmixing concepts. In the
context of semi-supervised classification, we also analyze existing active learning techniques in
order to intelligently select the most informative training samples in the classification process.
5. To design, implement and validate new processing chains based on the integration of techniques
for classification and spectral unmixing, thus allowing a thorough comparative study of the
new techniques using real hyperspectral data sets obtained by different sensors, such as the
Airborne Visible Infra-Red Imaging Spectrometer (AVIRIS), operated by NASA’s Jet Propulsion
Laboratory, or the Reflective Optics Spectrographic Imaging System (ROSIS), operated by the
German Aerospace Center (DLR).
6. To provide a set of recommendations of use (best practice) for the new classification techniques
developed, after carefully evaluating and assessing their performance in terms of classification
accuracy using standard metrics such as the overall accuracy (OA), average accuracy (AA) and kappa index [4]; a minimal sketch of these metrics is given below.
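For reference, the sketch below computes these three metrics from a confusion matrix; the two-class matrix at the end is a made-up toy example, not a result from the thesis.

```python
import numpy as np

def accuracy_metrics(conf):
    """OA, AA and kappa from a confusion matrix (rows: true, cols: predicted)."""
    conf = np.asarray(conf, dtype=float)
    total = conf.sum()
    oa = np.trace(conf) / total                      # fraction of correct labels
    aa = np.mean(np.diag(conf) / conf.sum(axis=1))   # mean of per-class accuracies
    # Agreement expected by chance, from the row and column marginals.
    pe = (conf.sum(axis=0) * conf.sum(axis=1)).sum() / total ** 2
    kappa = (oa - pe) / (1.0 - pe)
    return oa, aa, kappa

print(accuracy_metrics([[40, 10], [5, 45]]))  # toy example: OA=0.85, AA=0.85, kappa=0.7
```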
1.3 Main contributions of the thesis
Figure 1.3: Flowchart illustrating the organization of this thesis.
The main contributions of the thesis are summarized in Fig. 1.3. The thesis is structured in a set
of chapters which are inter-related as described in Fig. 1.3. As shown in the figure, all the newly developed techniques lie between spectral unmixing and classification and are designed in a way that exploits features from either or from both techniques simultaneously. In the following, we provide a
description of the different chapters in which we have structured the present thesis work:
Figure 1.4: Summary of contributions in Chapter 2.
* In Chapter 2 (see Fig. 1.4), we explore the possibility of using spectral unmixing as a way to perform feature extraction prior to classification of hyperspectral data, thus addressing the imbalance between the (limited) number of training samples and the (high) spectral dimensionality of the data.
* Chapter 3 is a follow-up to the previous chapter in which different spectral unmixing chains are explored
in order to determine the specific unmixing chain (and the number of features) that should be
retained by spectral unmixing prior to different classification scenarios, providing recommendations
about the best possible use of different chains in different application contexts. The contributions
of this chapter are summarized in Fig. 1.5.
* Chapter 4 describes a new semi-supervised self-learning approach for the classification of hyperspectral
images using unlabeled training samples. The new unlabeled samples are generated using spatial
information, and active learning approaches are used to select the most informative samples for
the classification process. Fig. 1.6 provides a summary of the contributions in this chapter.
Figure 1.5: Summary of contributions in Chapter 3.
* Chapter 5 discusses the integration of spectral unmixing and classification in order to design a new
semi-supervised framework using active learning concepts. Several unmixing chains are used for
this purpose, extracting information about mixed pixels and incorporating it into
the classification process. Specifically, the idea of integrating classification and spectral unmixing
in simultaneous fashion is explored in this chapter. This strategy is summarized in Fig. 1.7.
To conclude this chapter, Table 1.1 provides a list of all the acronyms that have been used throughout
the thesis document. Hereinafter, these acronyms will be used instead of the full terms for simplicity.
Figure 1.6: Summary of contributions in Chapter 4.
Figure 1.7: Summary of contributions in Chapter 5.
Table 1.1: List of acronyms used in this thesis.

AA: Average Accuracy [18]
ANC: Abundance Non-negativity Constraint [19]
ASC: Abundance Sum-to-one Constraint [19]
AVIRIS: Airborne Visible Infra-Red Imaging Spectrometer [20]
BT: Breaking Ties [21]
CEM: Constrained Energy Minimization [22]
CMTMFsup: Supervised Clustering followed by Mixture-Tuned Matched Filtering [23]
CMTMFunsup: Unsupervised Clustering followed by Mixture-Tuned Matched Filtering [23]
DAFE: Discriminant Analysis for Feature Extraction [14]
DBFE: Decision Boundary Feature Extraction [14]
DLR: German Aerospace Center [Online: www.dlr.de/en/]
FCunsup: Unsupervised Fuzzy Clustering [24]
FCLSU: Fully Constrained Linear Spectral Unmixing [19]
HySime: Hyperspectral Subspace Identification by Minimum Error [25]
ICA: Independent Component Analysis [26]
JADE: Joint Diagonalization of Eigenmatrices [27]
LIBSVM: Library of SVM [Online: http://www.csie.ntu.edu.tw/~cjlin/libsvm/]
LORSAL: Logistic Regression via variable Splitting and Augmented Lagrangian [28]
MAP: Maximum A Posteriori [29]
MBT: Modified Breaking Ties [29]
MLR: Multinomial Logistic Regression [30]
MNF: Minimum Noise Fraction [31]
MS: Margin Sampling [32]
MTMF: Mixture-Tuned Matched Filtering [33]
MTMFavg: Averaged Mixture-Tuned Matched Filtering [34]
MTMFsup: Supervised Mixture-Tuned Matched Filtering [34]
MTMFunsup: Unsupervised Mixture-Tuned Matched Filtering [34]
nEQB: Normalized Entropy Querying by Bagging [35]
NWFE: Non-parametric Weighted Feature Extraction [14]
OA: Overall Accuracy [18]
OSP: Orthogonal Subspace Projection [36]
PCA: Principal Component Analysis [37]
RBF: Gaussian Radial Basis Function [16]
ROSIS: Reflective Optics Spectrographic Imaging System [38]
SVM: Support Vector Machine [15]
SA: Spectral Angle [8]
SMLR: Sparse Multinomial Logistic Regression [39]
SNR: Signal-to-Noise Ratio [2]
TSVM: Transductive Support Vector Machine [40, 41]
VCA: Vertex Component Analysis [42]
VD: Virtual Dimensionality [43]
Chapter 2
Unmixing Prior to Supervised Classification of Remotely Sensed Hyperspectral Images
2.1 Summary
Supervised classification of hyperspectral images is a very challenging task due to the generally
unfavorable ratio between the number of spectral bands and the number of training samples available a
priori, which results in the Hughes phenomenon. To address this issue, several feature extraction methods
have been investigated in order to reduce the dimensionality of the data to the right subspace without
significant loss of the original information that allows for the separation of classes. In this chapter,
we explore the use of spectral unmixing for feature extraction prior to supervised classification of
hyperspectral data using SVM. The proposed feature extraction strategy has been implemented in the
form of four different unmixing chains, and evaluated using two different scenes collected by NASA
Jet Propulsion Laboratory’s AVIRIS. Experiments suggest competitive results, but also show that the
definition of the unmixing chains plays an important role in the final classification accuracy. Moreover,
differently from most feature extraction techniques available in literature, the features obtained using
linear spectral unmixing are potentially easier to interpret due to their physical meaning.1
2.2 Introduction
In many studies, hyperspectral analysis techniques are divided into full-pixel and mixed-pixel
classification techniques [8, 37, 44], where each pixel vector defines a spectral signature or fingerprint
that uniquely characterizes the underlying materials at each site in a scene. Full-pixel classification
techniques assume that each spectral signature comprises the response of one single underlying material.
Often, this is not a realistic assumption. If the spatial resolution of the sensor is not fine enough to
separate different pure signature classes at a macroscopic level, these can jointly occupy a single pixel,
and the resulting spectral signature will be a composite of the individual pure spectra, often called
endmembers in hyperspectral imaging terminology [45]. Let us denote a remotely sensed hyperspectral
scene with n bands by I, in which each pixel is represented by a vector X = [x1, x2, ..., xn] ∈ ℜ^n, where ℜ denotes the set of real numbers and xk is the pixel's spectral response at sensor channel k = 1, ..., n. Under the linear mixture model assumption, each pixel vector can be modeled as:

    X = ∑_{z=1}^{p} Φz · Ez + n,        (2.1)

where Ez denotes the spectral response of endmember z, Φz is a scalar value designating the fractional abundance of endmember z at the pixel X, p is the total number of endmembers, and n is a noise vector. Two physical constraints can be imposed on the model described in (2.1): the ANC, i.e., Φz ≥ 0 for all z, and the ASC, i.e., ∑_{z=1}^{p} Φz = 1 [19].
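For illustration, the sketch below estimates the abundance vector for a single pixel under the model in (2.1): plain least squares for the unconstrained case, scipy's NNLS for the ANC, and a crude renormalization as a stand-in for the ASC. The thesis relies on proper FCLSU for the fully constrained case, so the last step is only an assumed simplification.

```python
import numpy as np
from scipy.optimize import nnls

def unmix_pixel(E, x):
    """Abundances of endmembers E (bands x p) for one pixel x (bands,)."""
    phi_ucls, *_ = np.linalg.lstsq(E, x, rcond=None)   # unconstrained solution
    phi_anc, _ = nnls(E, x)                            # enforces Phi_z >= 0 (ANC)
    s = phi_anc.sum()
    # Crude rescaling toward sum-to-one (ASC); FCLSU solves this properly.
    phi_fc = phi_anc / s if s > 0 else phi_anc
    return phi_ucls, phi_anc, phi_fc
```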
Several machine learning techniques have been applied, under the full-pixel assumption, to extract
relevant information from hyperspectral data. The good classification performance exhibited by SVMs [15, 44, 46] using spectral signatures as input features can be improved by applying suitable feature extraction strategies able to reduce the dimensionality of the data to a subspace without losing the
original information [47, 48]. We consider three traditional feature extraction techniques addressing the
aforementioned issues:
• PCA is an orthogonal linear transformation which projects the data into a new coordinate system,
such that the greatest amount of variance of the original data is contained in the first principal
components [37]. The resulting components are uncorrelated.
• MNF differs from PCA in the fact that MNF ranks the obtained components according to their
SNR [31].
• ICA tries to find components that are as statistically independent as possible, minimizing statistical dependencies up to fourth order [26].
There are several strategies that can be adopted to define independence (e.g., minimization of mutual information, maximization of non-Gaussianity, etc.). In this chapter, among several possible implementations, we have chosen JADE [27], which provides a good tradeoff between performance and computational complexity when used for dimensionality reduction of hyperspectral images. However, all these methods maximize the information contained in the first transformed components, relegating less significant variations to low-order components. If such low-order components are not preserved, small classes may be affected. The inclusion of spatial features such as morphological profiles can be used to address this issue [38, 47, 49].
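A minimal sketch of two of these transform-based baselines follows, on a synthetic data matrix; scikit-learn's FastICA is used as a stand-in for JADE (both seek statistically independent components), and MNF is omitted for lack of a standard implementation in that library.

```python
import numpy as np
from sklearn.decomposition import PCA, FastICA

X = np.random.rand(5000, 202)  # synthetic (pixels x bands) matrix

# PCA: the first components concentrate most of the variance of the data.
pca_features = PCA(n_components=15).fit_transform(X)

# ICA (FastICA here, JADE in the thesis): components that are as
# statistically independent as possible, in no particular order.
ica_features = FastICA(n_components=15, max_iter=500).fit_transform(X)
```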
In this chapter, we explore an alternative strategy focused on the use of spectral unmixing for feature
extraction prior to classification. Previous efforts in this direction were presented in [50, 51], but the
analysis of whether spectral unmixing can replace standard feature extraction transformations remains
an unexplored topic. Although classification techniques often neglect the impact of mixed pixels in the
provision of a set of final class labels, widely used benchmark data sets in the literature (e.g., the AVIRIS Indian Pines scene) are known to be dominated by mixed pixels, even if the associated ground-truth
information is only available in full-pixel form. Hence, the use of spectral unmixing presents distinctive
features with regards to other approaches such as PCA, MNF or ICA. First, it provides additional
information for classification in hyperspectral analysis scenarios with moderate spatial resolution, since
the sub-pixel composition of training samples can be used as part of the learning process of the classifier.
Second, the components estimated by spectral unmixing can be physically explained as the abundances of
spectral endmembers. Third, spectral unmixing does not penalize classes which are not relevant in terms
of variance or SNR. Here, we design different unmixing processing chains with the goal of addressing
three specific research questions:
1. Is spectral unmixing a feasible strategy for feature extraction prior to classification?
2. Does the inclusion of spatial information at the endmember extraction stage lead to better
classification results?
3. Is it really necessary to estimate pure spectral endmembers for classification purposes?
We have structured the remainder of this chapter as follows. Section 2.3 describes the considered
spectral unmixing chains. Section 2.4 presents different experiments specifically designed to address
the research questions above and provide a comparison between the proposed unmixing-based strategy
and other feature extraction approaches in the literature. Section 2.5 concludes with some remarks and
future research avenues.
2.3 Unmixing-based feature extraction
2.3.1 Unmixing chain #1
In this subsection we describe our first approach to design an unmixing-based feature extraction chain
which can be summarized by the flowchart in Fig. 2.1. First, we estimate the number of endmembers, p,
directly from the original n-dimensional hyperspectral image I. For this purpose, we use in this chapter
two standard techniques widely used in the literature such as the HySime method [25] and the VD concept
[43]. Once the number of endmembers p has been estimated, we apply an automatic algorithm to extract
a set of endmembers from the original hyperspectral image [9]. Here, we use the OSP technique [36], which
has been shown in previous work to provide a very good trade-off between the signature purity of the
extracted endmembers and the computational time to obtain them. Preliminary experiments conducted
with other endmember extraction techniques, such as VCA [42] and N-FINDR [52], have shown very
similar results in terms of classification accuracy. Finally, linear spectral unmixing (either unconstrained
or constrained) can be used to estimate the abundance of each endmember in each pixel of the scene,
providing a set of p abundance maps. Then, standard SVM classification is performed on the stack of
abundance fractions using randomly selected training samples.
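The sketch below mimics chain #1 end to end under simplifying assumptions: an ATGP-style implementation of the OSP endmember search (the exact OSP variant used in the thesis may differ), ANC-constrained abundance estimation as one of the options mentioned above, and the number of endmembers p assumed known instead of being estimated with HySime or the VD.

```python
import numpy as np
from scipy.optimize import nnls

def osp_endmembers(X, p):
    """ATGP-style OSP endmember extraction (simplified sketch).
    X: (pixels x bands) matrix; returns E: (bands x p)."""
    idx = [int(np.argmax(np.sum(X ** 2, axis=1)))]  # brightest pixel first
    for _ in range(p - 1):
        U = X[idx].T  # bands x endmembers selected so far
        # Projector onto the orthogonal complement of the current endmembers.
        P = np.eye(U.shape[0]) - U @ np.linalg.pinv(U.T @ U) @ U.T
        resid = X @ P  # P is symmetric, so no transpose is needed
        idx.append(int(np.argmax(np.sum(resid ** 2, axis=1))))
    return X[idx].T

def chain1_features(X, p):
    """Chain #1: endmembers, then per-pixel ANC abundances as features."""
    E = osp_endmembers(X, p)
    return np.array([nnls(E, x)[0] for x in X])  # (pixels x p) abundance maps

# The stacked abundance fractions would then be classified with an SVM
# (e.g., sklearn.svm.SVC with an RBF kernel) trained on random samples.
```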
Figure 2.1: Unmixing-based feature extraction chains #1 (spectral endmembers) and #2 (spatial-spectral
endmembers).
2.3.2 Unmixing chain #2
In this subsection we introduce a variation of the unmixing-based feature extraction chain which includes
spatial preprocessing prior to endmember extraction in order to guide the endmember searching process
to those areas which are more spatially homogeneous. This approach is represented in Fig. 2.1. The
spatial preprocessing strategy adopted in this work is described in detail in [53]. As in the previous
chain, the features resulting from the proposed (spatially enhanced) unmixing process are used to train
an SVM classifier with a few randomly selected labeled samples. The classifier is then tested using the
remaining labeled samples.
2.3.3 Unmixing chain #3
Our main motivation for introducing a third unmixing-based feature extraction chain is the fact that
the estimation of the number of endmembers p in the original image is a very challenging issue. Fig. 2.2
describes a new chain in which the endmembers are extracted from the set of available (labeled) training
samples instead of from the original image. This chain introduces two important variations: 1) first,
as a simplification to the challenging estimation problem, the number of endmembers to be extracted
is set as the total number of different classes, c, in the training set; and 2) the endmember searching process is conducted only on the training set, which reduces computational complexity.

Figure 2.2: Unmixing-based feature extraction chain #3 (chain #4 replaces endmember extraction with averaging of the signatures associated with each labeled class in the training set).

However, the number of endmembers in the original image, p, is probably different from c, the number of labeled
classes. Therefore, in order to unmix the original image we need to address a partial unmixing problem
(in which not all endmembers may be available a priori). A successful technique for this purpose is
MTMF [33], also known as CEM [22], which combines linear spectral unmixing and statistical matched
filtering. From matched filtering, it inherits the ability to map a single known target without knowing
the other background endmember signatures. From spectral mixture modeling, it inherits the leverage
arising from the mixed pixel model and the constraints on feasibility.
2.3.4 Unmixing chain #4
The fourth unmixing chain tested in our experiments [54] represents a slight variation of the unmixing
chain #3 in which the spectral signatures used for unmixing purposes are not obtained via endmember
extraction but through averaging of the spectral signatures associated to each labeled class in the training
set. To keep the number of estimated components low, only one component is allowed for each class.
This averaging strategy produces c signatures, each representative of a labeled class, which are then used
to partially unmix the original hyperspectral scene using MTMF.
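The signature construction step of chain #4 is simple enough to state directly; a minimal sketch (assuming X_train holds the training spectra and y_train the class labels; both names are hypothetical) follows:

```python
import numpy as np

def class_mean_signatures(X_train, y_train):
    """One averaged signature per labeled class, as used by chain #4.
    X_train: (n_samples, n_bands); y_train: integer class labels.
    Returns a (c, n_bands) array of class-representative signatures."""
    classes = np.unique(y_train)
    return np.stack([X_train[y_train == k].mean(axis=0) for k in classes])
```

The c resulting signatures are then used to partially unmix the full scene with MTMF, one signature at a time.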
Figure 2.3: (a) False color composition of the AVIRIS Indian Pines scene. (b) Ground-truth map containing 16 mutually exclusive land-cover classes.
2.4 Experimental results

2.4.1 Hyperspectral data sets

2.4.1.1 AVIRIS Indian Pines
The data set used in our experiments was collected by the AVIRIS sensor over the Indian Pines region
in Northwestern Indiana in 1992. This scene, with a size of 145 lines by 145 samples, was acquired
over a mixed agricultural/forest area, early in the growing season. The scene comprises 202 spectral
channels in the wavelength range from 0.4 to 2.5 µm, nominal spectral resolution of 10 nm, moderate
spatial resolution of 20 meters per pixel, and 16-bit radiometric resolution. After an initial screening,
several spectral bands were removed from the data set due to noise and water absorption phenomena,
leaving a total of 164 radiance channels to be used in the experiments. For illustrative purposes, Fig.
2.3 (a) shows a false color composition of the AVIRIS Indian Pines scene, while Fig. 2.3 (b) shows the
ground-truth map available for the scene, displayed in the form of a class assignment for each labeled
pixel, with 16 mutually exclusive ground-truth classes. These data, including ground-truth information,
are available online2, a fact which has made this scene a widely used benchmark for testing the accuracy
of hyperspectral data classification algorithms.
2.4.1.2 AVIRIS Kennedy Space Center
The data set was collected by the AVIRIS sensor over the Kennedy Space Center3, Florida, in March
1996. The portion of this scene used in our experiments has dimensions of 292 × 383 pixels. After
removing water absorption and low SNR bands, 176 bands were used for the analysis. The spatial
2 http://dynamo.ecn.purdue.edu/biehl/MultiSpec
3 Available online: http://www.csr.utexas.edu/hyperspectral/data/KSC/
resolution is 20 meters per pixel. Twelve ground-truth classes were available; the smallest class contains 134 pixels and the largest contains 761.
2.4.2 Experiments

2.4.2.1 Experiment 1. Use of unmixing as a feature extraction strategy
In this experiment, we use the AVIRIS Indian Pines and Kennedy Space Center data sets to analyze
the impact of imposing ANC and ASC in abundance estimation prior to classification. For the AVIRIS
Indian Pines image, we construct ten small training sets by randomly selecting 5%, 10% and 15% of the
ground-truth pixels.
For the AVIRIS Kennedy Space Center scene, whose smallest classes contain more pixels than those of the Indian Pines scene, we reduced the training sets even further and selected 1%, 3% and 5% of the available ground-truth pixels. Then, the three considered types of input features (original, reduced and unmixing-based) are built for the selected training samples and used to train an SVM classifier with two types of kernels: polynomial and Gaussian. The SVM was trained with each of these training subsets and then evaluated
with the remaining test set. Each experiment was repeated ten times, and the mean and standard
deviation accuracy values were reported. Kernel parameters were optimized by a grid search procedure,
and the optimal parameters were selected using 10-fold cross-validation. The LIBSVM library4 was used
for experiments.
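The following sketch illustrates the experimental protocol just described (stand-in data; scikit-learn's SVC wraps the same LIBSVM library used in our experiments, and the parameter grid shown is only an arbitrary example, not the actual search range):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Stand-in arrays: in practice these are the extracted features and the
# ground-truth labels of one of the considered scenes.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 18))
y = rng.integers(0, 4, size=500)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.3, random_state=0)

# Grid search over kernel parameters with 10-fold cross-validation.
grid = {"C": [1, 10, 100, 1000], "gamma": [1e-3, 1e-2, 1e-1, 1]}
svm = GridSearchCV(SVC(kernel="rbf"), grid, cv=10).fit(X_tr, y_tr)
print("test OA:", svm.score(X_te, y_te))
```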
Table 2.1 summarizes the OAs obtained after applying the considered SVM classification system
(with polynomial and Gaussian kernels) to the features extracted after applying the unmixing chain #1
(see Fig. 2.1) to the AVIRIS scenes. The dimensionality of the input data, as estimated by a consensus
between the HySime and the VD methods, was p = 18 for the Indian Pines scene, and p = 15 for the
Kennedy Space Center scene. Chain #1 was implemented using two different linear spectral unmixing
algorithms [19]: unconstrained and fully constrained; due to better accuracy and faster computation,
only results for the unconstrained case are presented. The results after applying the classification system
to the original spectral features, and to those extracted using PCA, MNF and ICA are also reported.
As shown by Table 2.1, the classification accuracy is correlated with the training set size (the larger
the training set, the higher the classification accuracy). The good generalization ability exhibited by
SVMs is demonstrated by the classification results reported for the original spectral information, even
with very limited training sets. The fact that MNF is more effective than PCA and ICA for feature
extraction purposes is also remarkable, since MNF has been more widely used in the context of spectral
unmixing than in classification. Most importantly, Table 2.1 also reveals that the use of unmixing chain #1 as a feature extraction strategy does not improve the classification results provided by PCA, MNF, ICA or the original spectral information. This is because endmember extraction is generally sensitive
to outliers and anomalies, hence a strategy for directing the endmember searching process to spatially
homogeneous areas could improve the final classification results.
4 http://www.csie.ntu.edu.tw/∼cjlin/libsvm/
AVIRIS Indian Pines

Type of feature                # features   Polynomial kernel                        Gaussian kernel
                                            5%          10%         15%              5%          10%         15%
Original spectral information  202          75.23±1.23  81.55±0.86  83.58±0.78       75.78±1.06  82.11±0.43  84.49±0.53
PCA                            18           77.07±1.46  81.66±0.88  83.11±0.52       77.12±1.29  81.68±0.61  82.96±0.58
MNF                            18           82.97±1.93  87.41±0.31  88.38±0.57       84.04±0.75  87.66±0.52  89.34±0.43
ICA                            18           76.63±1.27  81.00±0.71  82.94±0.36       76.92±0.72  81.27±0.61  82.95±0.71
Chain #1                       18           74.56±1.04  79.20±1.12  80.97±0.50       74.65±0.99  79.45±0.40  80.91±0.39
Chain #2                       18           71.93±0.96  77.58±0.92  79.31±0.33       72.31±0.98  77.36±0.72  79.17±0.28
Chain #3                       16           81.32±0.84  85.56±0.84  86.83±0.55       81.78±0.62  86.12±0.66  87.40±0.82
Chain #4                       16           82.36±1.09  86.87±0.59  87.97±0.57       82.72±1.04  87.59±0.57  88.92±0.80

AVIRIS Kennedy Space Center

Type of feature                # features   Polynomial kernel                        Gaussian kernel
                                            1%          3%          5%               1%          3%          5%
Original spectral information  176          70.97±3.32  82.53±1.63  85.71±1.40       72.26±2.42  82.91±1.38  85.50±1.35
PCA                            15           73.52±3.69  83.26±1.26  86.11±1.16       74.66±2.94  82.54±1.70  86.28±1.46
MNF                            15           77.01±3.77  86.85±2.19  89.59±1.89       77.94±3.48  87.43±2.11  90.01±1.52
ICA                            15           70.09±2.91  80.28±1.73  84.59±1.50       70.39±1.58  80.79±1.60  84.58±1.58
Chain #1                       15           69.41±2.64  78.62±1.58  82.84±1.17       69.02±5.40  79.08±1.46  83.53±1.25
Chain #2                       15           67.91±3.98  78.61±3.56  84.26±1.41       68.56±4.70  83.86±1.89  83.86±1.22
Chain #3                       12           74.28±3.23  85.37±1.30  87.88±1.57       75.02±4.13  84.92±1.97  88.47±1.38
Chain #4                       12           76.10±2.49  86.38±1.40  87.84±1.28       77.53±2.58  86.57±0.97  87.72±1.13

Table 2.1: Classification accuracies (percentage) and standard deviation obtained after applying the considered SVM classification system (with Gaussian and polynomial kernels) to three different types of features (original, reduced and unmixing-based) extracted from the AVIRIS Indian Pines and Kennedy Space Center scenes (ten randomly chosen training sets).
2.4.2.2 Experiment 2. Impact of including spatial information at the endmember extraction stage
In this experiment we apply the unmixing chain #2 for feature extraction prior to classification. As shown
by Table 2.1, spatial preprocessing prior to endmember extraction does not lead to improved classification results with respect to chain #1 and the original spectral information. This is due to the spectral
similarity of the most spatially representative classes in our considered scenes. For instance, in the
AVIRIS Indian Pines scene, the corn and soybean crops were very early in their growth cycle at the time of
data collection, which resulted in low coverage of the soil (≈ 5%) [14].
Given this low canopy ground cover, the variation in spectral response among different classes is
very low and spatial information cannot significantly increase discrimination between different classes.
In order to address this issue, a possible solution is to conduct the endmember extraction process in
supervised fashion, taking advantage of the information contained in the available labeled samples in
order to guarantee that a highly representative endmember is selected per each class.
2.4.2.3 Experiment 3. Impact of endmember purity on the final classification results
In a supervised endmember extraction framework, our first experiment is based on applying the
unmixing chain #3 to select endmembers only from the available training samples. Apart from reducing
computational complexity (which in this case involves a search for c endmembers in the pixels belonging
to the training set), Table 2.1 reveals that this strategy improves the classification results reported for the
chains #1 and #2. However, in order to make sure that only one endmember per labeled class is used
for unmixing purposes, we also apply unmixing chain #4 in which spectral averaging of the available
training pixels in each class is conducted in order to produce a final set of c spectral signatures. Although averaging of endmembers can degrade spectral purity, it can also reduce the effects of noise
and/or average out the subtle spectral variability of a given class, thus obtaining a more representative
endmember for the class as a whole. This is illustrated by the classification results for unmixing chain #4
in Table 2.1, which outperform those reported for most other tested methods except MNF. This indicates
that, in a supervised unmixing scenario, the use of spectrally pure signatures is not as important as the
choice of signatures which are representative of the available training samples.
Table 2.2 shows the statistical differences (average value of ten comparisons) between traditional
dimensionality reduction methods and the unmixing chains #3 and #4, computed using McNemar’s test
[55] for the case of the polynomial kernel. The differences are statistically significant at a confidence level
of 95% if |Z| > 1.96. For each pair of compared feature extraction chains, we also report how many
times each chain wins/ties/loses after comparing the thematic maps obtained using the same training
set. If the value of Z reported for each entry of Table 2.2 is positive and larger than 1.96, the first
compared chain wins. By convention, the comparison is always performed with the first chain in a line
of Table 2.2 and the second chain in a column of Table 2.2. It can be noticed that unmixing chains #3
and #4 always perform significantly better than PCA and ICA. MNF performs better than chain #3,
while the differences with chain #4 are in general not significant.
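For completeness, the statistic can be computed as in the following sketch, which follows the standard formulation of McNemar's test for paired classifiers [55]:

```python
import numpy as np

def mcnemar_z(y_true, pred_a, pred_b):
    """Z statistic for two classifiers evaluated on the same test set.
    f12: samples correct for A but wrong for B; f21: the opposite.
    |Z| > 1.96 indicates a significant difference at the 95% level
    (undefined when f12 + f21 = 0, i.e., identical error patterns)."""
    a_ok = np.asarray(pred_a) == np.asarray(y_true)
    b_ok = np.asarray(pred_b) == np.asarray(y_true)
    f12 = np.sum(a_ok & ~b_ok)
    f21 = np.sum(~a_ok & b_ok)
    return (f12 - f21) / np.sqrt(f12 + f21)
```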
To conclude this section, Fig. 2.4 displays the best classification results (out of 10 runs) obtained
after applying the SVM –trained with 10% of the available training samples– to each feature extraction
AVIRIS Indian Pines

               5%                                10%                               15%
       Chain #3          Chain #4         Chain #3          Chain #4         Chain #3          Chain #4
PCA    -9.52 (0/0/10)   -11.88 (0/0/10)   -9.24 (0/0/10)   -12.40 (0/0/10)   -8.86 (0/0/10)   -11.40 (0/0/10)
MNF     5.22 (8/1/1)      2.05 (6/2/2)     6.13 (10/0/0)     1.72 (5/4/1)     5.23 (10/0/0)     1.35 (3/7/0)
ICA   -10.45 (0/0/10)   -12.85 (0/0/10)  -17.37 (0/0/10)   -20.26 (0/0/10)   -9.28 (0/0/10)   -11.87 (0/0/10)

AVIRIS Kennedy Space Center

               1%                                3%                                5%
       Chain #3          Chain #4         Chain #3          Chain #4         Chain #3          Chain #4
PCA    -2.09 (1/2/7)     -5.73 (0/2/8)    -2.61 (1/2/7)     -5.14 (0/0/10)   -3.78 (0/1/9)     -2.56 (0/4/6)
MNF     3.08 (6/3/1)     -0.58 (3/4/3)     3.48 (6/2/2)      0.78 (5/1/4)     2.42 (6/2/2)      4.04 (7/3/0)
ICA    -5.32 (0/0/10)    -7.90 (0/0/10)   -7.14 (0/0/10)    -8.82 (0/0/10)   -5.08 (0/1/9)     -5.29 (0/0/10)

Table 2.2: Statistical differences evaluated using McNemar’s test (polynomial kernel).
Figure 2.4: Best classification results for AVIRIS Indian Pines (using the SVM classifier with Gaussian kernel, trained with 10% of the available samples per class): (a) ground-truth; (b) original image (84.27%); (c) PCA (83.33%); (d) MNF (89.41%); (e) chain #1 (81.29%); (f) chain #2 (79.64%); (g) chain #3 (87.99%); (h) chain #4 (89.26%).
Figure 2.5: Per-class classification accuracies for AVIRIS Indian Pines (using the SVM classifier with Gaussian kernel, trained with 10% of the available samples per class) obtained with MNF and unmixing chain #4.
strategy considered for the AVIRIS Indian Pines scene. As shown by Fig. 2.5, both MNF and chain #4 provide the best classification scores, with less confusion in heavily mixed classes such as grass/trees, and reasonable confusion between spectrally very similar classes such as corn and corn-min, or between soybeans-notill and soybeans-min.
2.5 Final observations and future directions
In this chapter, we have investigated several strategies to extract relevant features from hyperspectral
scenes prior to classification. For classification scenarios using SVMs trained with relatively small subsets
of labeled samples, our experimental results reveal that MNF greatly improves accuracies when compared
to the more well-known PCA and ICA transformations, used as an unsupervised feature reduction tool
prior to classification. Due to the reduced dimensionality, classification using both MNF and PCA
subspaces generally improved the OA when compared to using the full spectral signature of each pixel.
Results indicate that the proposed unmixing-based feature extraction chains can provide an alternative
strategy to PCA or MNF by incorporating information about the (possibly) mixed nature of the training
samples during the learning stage, with the potential advantage of improved interpretability of features
due to the physical nature of the extracted abundance maps. Although final classification accuracies are likely to be dependent on the particular data set considered, the tested chains suggest higher accuracies with respect to traditional methods, such as PCA and ICA, and accuracies comparable to those of MNF.
Further research is needed to define an optimality criterion to design unmixing chains as a feature
reduction tool for classification purposes. A starting point might be chain #4, which indicates that, in
the context of a supervised unmixing scenario, the use of spectrally pure signatures is not as important
as the choice of signatures which are highly representative of the available training samples.
Chapter 3
A Comparative Assessment of Unmixing-Based Feature Extraction Techniques
3.1 Summary
In recent years, many feature extraction techniques have been integrated into processing chains intended for hyperspectral image classification. In the context of supervised classification, it has been
shown that the good generalization capability of machine learning techniques such as the SVM can still
be enhanced by an adequate extraction of features prior to classification, thus mitigating the curse of
dimensionality introduced by the Hughes effect. Recently, a new strategy for feature extraction prior to
classification based on spectral unmixing concepts has been introduced. This strategy has shown success when the spatial resolution of the hyperspectral image is not enough to separate different spectral
constituents at a sub-pixel level. Another advantage over statistical transformations such as PCA or
MNF is that unmixing-based features are physically meaningful since they can be interpreted as the
abundance of spectral constituents. In turn, previously developed unmixing-based feature extraction
chains do not include spatial information. In this chapter, two new contributions are proposed. First, we
develop a new unmixing-based feature extraction technique which integrates the spatial and the spectral
information using a combination of unsupervised clustering and partial spectral unmixing. Second, we
conduct a quantitative and comparative assessment of unmixing-based versus traditional (supervised
and unsupervised) feature extraction techniques in the context of hyperspectral image classification.
Our study, conducted using a variety of hyperspectral scenes collected by different instruments, provides
practical observations regarding the utility and type of feature extraction techniques needed for different
classification scenarios.5
5 Part of this chapter has been published in: I. Dopido, A. Villa, A. Plaza and P. Gamba, A Quantitative and
Comparative Assessment of Unmixing-Based Feature Extraction Techniques for Hyperspectral Image Classification, IEEE
Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 5, no. 2, pp. 421-435, April 2012
[JCR(2012)=2.874].
3.2 Introduction
The rich spectral information available in remotely sensed hyperspectral images allows for the
possibility to distinguish between spectrally similar materials [44]. However, supervised classification
of hyperspectral images is a very challenging task due to the generally unfavorable ratio between the
(large) number of spectral bands and the (limited) number of training samples available a priori, which
results in the Hughes phenomenon [56]. As shown in [57], when the number of features considered for
classification is larger than a given threshold (which is strongly application-dependent), the classification
accuracy starts to decrease. The application of methods originally developed for the classification of
lower dimensional data sets (such as multispectral images) provides therefore poor results when applied
to hyperspectral images, especially in the case of small training sets [14]. On the other hand, the
collection of reliable training samples is very expensive in terms of time and finance, and the possibility
to exploit large ground truth information is not common [58]. To address this issue, a dimensionality
reduction step is often performed prior to the classification process, in order to bring the information in
the original space (which in the case of hyperspectral data is almost empty [14]) to the right subspace
which allows separating the classes by discarding information that is useless for classification purposes.
Several feature extraction techniques have been proposed to reduce the dimensionality of the data prior
to classification, thus mitigating the Hughes phenomenon. These methods can be unsupervised (if a
priori information is not available) or supervised (if available training samples are used to project the
data onto a classification-optimized subspace [59, 60]). Classic unsupervised techniques include PCA [4],
MNF [31], or ICA [26]. Supervised approaches comprise DAFE, DBFE, and NWFE, among many others
[14, 37].
In the context of supervised classification, kernel methods have been widely used due to their
insensitivity to the curse of dimensionality [16]. However, the good generalization capability of machine
learning techniques such as SVM [46] can still be enhanced by an adequate extraction of relevant features
to be used for classification purposes [47], especially if limited training sets are available a priori. Recently,
we have investigated this issue by developing a new set of feature extraction techniques based on spectral
unmixing concepts (see chapter 2 of this thesis and references [34, 48]). These techniques are intended
to take advantage of spectral unmixing models [8] in the characterization of training samples, thus
including additional information about sub-pixel composition that can be exploited at the classification
stage. Another advantage of unmixing-based techniques over statistical transformations such as PCA,
MNF or ICA is the fact that the features derived by spectral unmixing are physically meaningful since
they can be interpreted as the abundance of spectrally pure constituents. Although unmixing-based
feature extraction offers an interesting alternative to classic (supervised and unsupervised) approaches,
several important aspects deserve further attention [61]:
1. First, the unmixing-based chains discussed in chapter 2 do not include spatial information, which is
an important source of information since hyperspectral images exhibit spatial correlation between
image features.
2. Second, the study in chapter 2 suggested that partial unmixing [33, 62] could be an effective solution
to deal with the likely fact that not all pure spectral constituents in the scene (needed for spectral
unmixing purposes) are known a priori, but a more exhaustive investigation of partial unmixing
(particularly in combination with spatial information) is needed.
3. Finally, the number of features to be extracted prior to classification was set in chapter 2 to an
empirical value given by the intrinsic dimensionality of the input data. However, in the context
of supervised feature extraction the number of features to be retained is probably linked to the
characteristics of the training set rather than the full hyperspectral image. Hence, a detailed
investigation of the optimal number of features that need to be extracted prior to classification is
highly desirable as a follow-up to the experiments conducted in chapter 2 of this thesis.
In this chapter, we address the aforementioned issues by means of two highly innovative contributions.
First, a new feature extraction technique exploiting sub-pixel information is proposed. This approach
integrates spatial and spectral information using unsupervised clustering in order to define spatially
homogeneous regions prior to the partial unmixing stage. A second contribution of this chapter is a
detailed investigation on the issue of how many (and what type of) features should be extracted prior to
SVM-based classification of hyperspectral data. For this purpose, different types of (classic and unmixing-based) feature extraction strategies, both unsupervised and supervised in nature, are considered.
The remainder of the chapter is organized as follows. Section 3.3 describes a new unmixing-based
feature extraction technique which integrates the spatial and the spectral information. A supervised
and an unsupervised version of this technique are developed. Section 3.4 describes several representative
hyperspectral scenes which have been used in our experiments. This includes three scenes collected
by AVIRIS [20] system over the regions of Indian Pines, Indiana, Kennedy Space Center, Florida, and
Salinas Valley, California, and also a hyperspectral scene collected by ROSIS [38] over the city of Pavia,
Italy. Section 3.5 provides an experimental comparison of the proposed feature extraction chains with
regards to other classic and unmixing-based approaches, using the four considered hyperspectral image
scenes. Section 3.6 concludes with some remarks and hints at plausible future research lines.
3.3 A new unmixing-based feature extraction technique
This section is organized as follows. In subsection 3.3.1 we fix notation and describe some general
concepts about linear spectral unmixing, adopted as our baseline mixture model due to its simplicity
and computational tractability. Subsection 3.3.2 describes an unsupervised feature extraction strategy
based on spectral unmixing concepts. This strategy first performs k-means clustering, searching for as
many classes as the number of features that need to be retained. The centroids of each cluster are
considered as the endmembers, and then the features are obtained by applying spectral unmixing for
abundance estimation. The main objective of this chain is to solve problems highlighted by endmember
extraction based algorithms, which are sensitive to outliers and pixels with extreme values of reflectance.
By using an unsupervised clustering method, the endmembers extracted are expected to be more spatially
significant. Finally, subsection 3.5.1.2 describes a modified version of the feature extraction technique
in which the endmembers are searched in the available training set instead of the entire original image.
Here, our assumption is that training samples may better represent the available land cover classes in
the subsequent classification process.
3.3.1 Linear spectral unmixing
Since in this chapter we will include the spatial information together with the spectral information
when describing the discussed unmixing chains, a slight abuse of notation is used here (with regards
to the notations introduced in chapter 2) in order to redefine the mathematical formulation of linear
spectral unmixing using the spatial coordinates of the pixels involved. Let us denote a remotely sensed
hyperspectral scene with n bands by I, in which the pixel at the discrete spatial coordinates (i, j) of the
scene is represented by a vector X(i, j) = [x1 (i, j), x2 (i, j), · · · , xn (i, j)] ∈ ℜn , where ℜ denotes the set of
real numbers in which the pixel’s spectral response xk (i, j) at sensor channels k = 1, . . . , n is included.
Under the linear mixture model assumption, each pixel vector in the original scene can now be modeled
using the following expression:
X(i, j) =
p
∑
Φz (i, j) · Ez + n(i, j),
(3.1)
z=1
where Ez denotes the spectral response of endmember z, Φz (i, j) is a scalar value designating the
fractional abundance of the endmember z at the pixel X(i, j), p is the total number of endmembers,
and n(i, j) is a noise vector. An unconstrained solution to Eq. (3.1) is simply given by the following
expression [2]:
Φ̂UC (i, j) = (ET E)−1 ET X(i, j).
(3.2)
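A small numerical check of Eq. (3.2), using a toy endmember matrix (our own illustration; all values are synthetic):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
E = rng.random((n, p))                        # synthetic endmember matrix (n bands x p)
phi_true = np.array([0.6, 0.3, 0.1])          # true abundances for one pixel
x = E @ phi_true + 1e-3 * rng.standard_normal(n)  # mixed pixel plus small noise

phi_uc = np.linalg.solve(E.T @ E, E.T @ x)    # Eq. (3.2)
print(np.round(phi_uc, 3))                    # approximately [0.6, 0.3, 0.1]
```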
Two physical constraints are generally imposed on the model described in Eq. (3.1): the ANC, i.e., Φz(i, j) ≥ 0, and the ASC, i.e., ∑_{z=1}^{p} Φz(i, j) = 1 [19]. Imposing the ASC constraint results in the following optimization problem:

min_{Φ(i,j)∈∆} (X(i, j) − Φ(i, j) · E)^T (X(i, j) − Φ(i, j) · E),  subject to: ∆ = {Φ(i, j) | ∑_{z=1}^{p} Φz(i, j) = 1}.   (3.3)

Similarly, imposing the ANC constraint results in the following optimization problem:

min_{Φ(i,j)∈∆} (X(i, j) − Φ(i, j) · E)^T (X(i, j) − Φ(i, j) · E),  subject to: ∆ = {Φ(i, j) | Φz(i, j) ≥ 0 for all 1 ≤ z ≤ p}.   (3.4)
As indicated in [19], a fully constrained (i.e., ASC-constrained and ANC-constrained) estimate can be obtained in least-squares sense by solving the optimization problems in Eq. (3.3) and Eq. (3.4) simultaneously. However, in order for such an estimate to be meaningful, it is required that the spectral signatures of all endmembers, i.e., {Ez}_{z=1}^{p}, are available a priori, which is not always possible.
In the case where not all endmember signatures are available in advance, partial unmixing has emerged
as a suitable alternative to solve the linear spectral unmixing problem [33].
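Although the exact solver used in our experiments is not reproduced here, a common numerical device for the fully constrained problem is to enforce the ANC with non-negative least squares and to approximate the ASC with a heavily weighted sum-to-one row appended to the system, as in this sketch:

```python
import numpy as np
from scipy.optimize import nnls

def fcls(E, x, delta=1e3):
    """Fully constrained (ANC + ASC) abundances for one pixel.
    E: (n_bands, p) endmember matrix; x: (n_bands,) pixel spectrum.
    delta weights the appended sum-to-one row; larger values enforce
    the ASC more strictly at the cost of conditioning."""
    p = E.shape[1]
    A = np.vstack([E, delta * np.ones((1, p))])
    b = np.concatenate([x, [delta]])
    phi, _ = nnls(A, b)   # non-negative least squares handles the ANC
    return phi
```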
3.3.2 Unsupervised unmixing-based feature extraction
In this subsection we describe our first approach to design a new unmixing-based feature extraction
technique which integrates spatial and spectral information. It can be summarized by the flowchart in
Fig. 3.1. First, we apply the k-means algorithm [63] to the original hyperspectral image. Its goal is to
determine a set of c points, called centers, so as to minimize the mean squared distance from each pixel
vector to its nearest center. The algorithm is based on the observation that the optimal placement of
a center is at the centroid of the associated cluster. It starts with a random initial placement. At each
stage, the algorithm moves every center point to the centroid of the set of pixel vectors for which the
center is a nearest neighbor according to SA [8], and then updates the neighborhood by recomputing the
SA from each pixel vector to its nearest center. These steps are repeated until the algorithm converges
to a point that is a minimum for the distortion [63]. The output of k-means is a set of spectral clusters,
each made up of one or more spatially connected regions. In order to determine the number of clusters
(endmembers) in advance, techniques used to estimate the number of endmembers like VD [43] or HySime
[25] can be used. In our experiments we vary the number of clusters in a certain range in order to analyze
the impact of this parameter. In fact, our main motivation for using a partial unmixing technique at
this point is the fact that the estimation of the number of endmembers in the original image is a very
challenging issue. It is possible that the actual number of endmembers in the original image, p, is larger
than the number of clusters derived by k-means. In this case, in order to unmix the original image we
need to address a situation in which not all endmembers may be available a priori. It has been shown
in the previous chapter that the FCLSU technique does not provide accurate results in this scenario [34].
In turn, it is also possible that p ≤ c. In this case, partial unmixing has shown great success [33] in
abundance estimation. Following this line of reasoning, we have decided to resort to partial unmixing
techniques in this chapter.
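A minimal sketch of the clustering step follows; note that, unlike the SA-based variant described above, scikit-learn's k-means uses the Euclidean distance, so this is an approximation for illustration only:

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_endmembers(cube, c, seed=0):
    """Derive c cluster centroids to be used as endmembers.
    cube: (rows, cols, n_bands) image; returns a (c, n_bands) array."""
    X = cube.reshape(-1, cube.shape[-1])      # pixels as rows
    km = KMeans(n_clusters=c, n_init=10, random_state=seed).fit(X)
    return km.cluster_centers_
```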
A successful technique to estimate abundance fractions in such partial unmixing scenarios is MTMF
[33] –also known in the literature as CEM [2, 62]– which combines the best parts of the linear spectral
unmixing model and the statistical matched filter model while avoiding some drawbacks of each parent
method. From matched filtering, it inherits the ability to map a single known target without knowing
the other background endmember signatures, unlike the standard linear unmixing model. From spectral
mixture modeling, it inherits the leverage arising from the mixed pixel model and the constraints on
feasibility including the ASC and ANC requirements. It is essentially a target detection algorithm
designed to identify the presence (or absence) of a specified material by producing a score of 1 for pixels
wholly covered by the material of interest, while keeping the average score over an image as small as
possible. It uses just one endmember spectrum (that of the target of interest) and therefore behaves as
a partial unmixing method that suppresses background noise and estimates the sub-pixel abundance of
a single endmember material without assuming the presence of all endmembers in the scene, as it is the
case with FCLSU. If we assume that Ez is the endmember to be characterized, MTMF estimates the
abundance fraction Φ(i, j) of Ez in a specific pixel vector X(i, j) of the scene as follows:
Φ̂MTMF(i, j) = ((Ez^T R^{−1} Ez)^{−1} R^{−1} Ez)^T X(i, j),   (3.5)

where R is the matrix:

R = (1 / (s × l)) ∑_{i=1}^{s} ∑_{j=1}^{l} X(i, j) X^T(i, j),   (3.6)

with s and l respectively denoting the number of samples and the number of lines in the original hyperspectral image.

Figure 3.1: Block diagram illustrating an unsupervised clustering followed by MTMF (CMTMFunsup) technique for unmixing-based feature extraction.

As shown by Fig. 3.1, the features resulting from the proposed unmixing-based technique, referred to hereinafter as unsupervised clustering followed by MTMF (CMTMFunsup) [23], are used to train an SVM classifier with a few randomly selected labeled samples. The classifier is then tested using the remaining labeled samples.
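Eq. (3.5) and Eq. (3.6) translate almost directly into code; the following sketch (our illustration) scores every pixel of the scene against one target endmember:

```python
import numpy as np

def cem_abundance(X, e):
    """CEM/MTMF-style partial unmixing for one target endmember.
    X: (n_bands, n_pixels) image matrix; e: (n_bands,) target signature.
    Returns one abundance estimate per pixel, per Eq. (3.5)."""
    n_pixels = X.shape[1]
    R = (X @ X.T) / n_pixels          # sample correlation matrix, Eq. (3.6)
    Rinv_e = np.linalg.solve(R, e)    # assumes R is nonsingular
    w = Rinv_e / (e @ Rinv_e)         # filter with unit response on the target
    return w @ X
```

Stacking the outputs for each of the c targets yields the (n_pixels, c) feature matrix passed to the classifier.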
3.3.3 Supervised unmixing-based feature extraction
Fig. 3.2 describes a variation of the CMTMFunsup technique presented in the previous subsection in which
the endmembers are extracted from the available (labeled) training samples instead of from the original
image. This introduces two main properties with regards to CMTMFunsup : 1) the number of endmembers
to be extracted is given by the total number of different classes, c, in the labeled samples available in
the training set, and 2) the endmembers (class centers) are obtained after clustering the training set,
which reduces computational complexity significantly. The increase in computational performance comes
at the expense of introducing an additional consideration.

Figure 3.2: Block diagram illustrating a supervised clustering followed by MTMF (CMTMFsup) technique for unmixing-based feature extraction.

In this scenario, it is likely that the actual number of endmembers in the original image, p, is larger than the number of different classes comprised
by available labeled training samples, c. Therefore, in order to unmix the original image we again need
to address a partial unmixing problem. Then, as shown by Fig. 3.2, standard SVM classification is
performed on the stack of abundance fractions using randomly selected training samples. Hereinafter,
we refer to the feature extraction technique described in Fig. 3.2 as supervised clustering followed by MTMF (CMTMFsup) [23].
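In code, CMTMFsup differs from its unsupervised counterpart essentially in one respect: the clustering operates on the training pixels only. A sketch, reusing the cem_abundance function illustrated above (hypothetical names, same caveats):

```python
import numpy as np
from sklearn.cluster import KMeans

def cmtmf_sup_features(X, X_train, c, seed=0):
    """X: (n_bands, n_pixels) full scene; X_train: (n_train, n_bands)
    labeled training spectra; c: number of classes in the training set.
    Returns an (n_pixels, c) feature matrix of CEM/MTMF scores."""
    km = KMeans(n_clusters=c, n_init=10, random_state=seed).fit(X_train)
    return np.stack([cem_abundance(X, e) for e in km.cluster_centers_], axis=1)
```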
3.4 Hyperspectral data sets
In order to have a fair experimental comparison between the proposed and available feature extraction
approaches, several representative hyperspectral data sets are investigated. In this chapter, we have
considered four different images captured by two different sensors: AVIRIS and ROSIS. The images span
a wide range of land cover use, from agricultural areas of Indian Pines and Salinas, to urban zones in
the town of Pavia and mixed vegetation/urban features in Kennedy Space Center. The AVIRIS Indian
Pines scene was already described in subsection 2.4.1.1, while the AVIRIS Kennedy Space Center scene
was described in subsection 2.4.1.2. Hence, here we will only describe the remaining two scenes. The
number of ground-truth pixels per class for all the considered hyperspectral images is given in Table 3.1.
In the following, we briefly describe each of the data sets considered in our study.
AVIRIS Indian Pines: Alfalfa (54), Corn-Notill (1434), Corn-Min (834), Corn (234), Grass-Pasture (497), Grass-Trees (747), Grass-Pasture-Mowed (26), Hay-Windrowed (489), Oats (20), Soybeans-Notill (968), Soybeans-Min (2468), Soybeans-Clean (614), Wheat (212), Woods (1294), Bldg-Grass-Tree-Drives (380), Stone-Steel-Towers (95).

AVIRIS Kennedy Space Center: Scrub (761), Willow (243), Hammock (256), Oak (252), Slash Pine (161), Oak/Broadleaf (229), Hardwood Swamp (105), Graminoid Marsh (311), Spartina Marsh (520), Cattail Marsh (404), Salt Marsh (186), Mud Flats (134).

AVIRIS Salinas Valley: Broccoli Green Weeds 1 (1893), Broccoli Green Weeds 2 (3704), Fallow (1960), Fallow Rough Plow (1228), Fallow Smooth (2560), Stubble (3841), Celery (3543), Grapes Untrained (11287), Soil Vinyard Develop (6128), Corn Senesced Green Weeds (3154), Lettuce Romaine 4 weeks (984), Lettuce Romaine 5 weeks (1850), Lettuce Romaine 6 weeks (818), Lettuce Romaine 7 weeks (1003), Vinyard Untrained (7055), Vinyard Vertical Trellis (1622).

ROSIS Pavia University: Asphalt (6631), Meadows (18649), Gravel (2099), Trees (3064), Metal Sheets (1345), Bare Soil (5029), Bitumen (1330), Self-Blocking Bricks (3682), Shadow (947).

Table 3.1: Number of pixels in each ground-truth class in the four considered hyperspectral images. The number of training and test pixels used in our experiments can be derived from this table.
Figure 3.3: (a) False color composition of an AVIRIS hyperspectral image comprising several agricultural fields in Salinas Valley, California. (b) Ground-truth map containing 16 mutually exclusive land-cover classes.
3.4.1 AVIRIS Salinas Valley
This scene was collected over the Valley of Salinas in Southern California. The full scene consists of 512
lines by 217 samples with 186 spectral bands (after removal of water absorption and noisy bands) from
0.4 to 2.5 µm, nominal spectral resolution of 10 nm, and 16-bit radiometric resolution. It was taken at
low altitude with a pixel size of 3.7 meters (high spatial resolution). The data include vegetables, bare
soils and vineyard fields. Fig. 3.3(a) shows a false color composition of the scene and Fig. 3.3(b) shows
the available ground-truth regions for this scene, which cover about two thirds of the entire Salinas scene.
Finally, Fig. 3.4 shows some pictures of selected land-cover classes taken on the imaged site at the same
time as the data was being collected by the sensor. Of particular interest are the relevant differences in
the romaine lettuce classes resulting from different soil cover proportions.
3.4.2 ROSIS Pavia University
This scene was collected by the ROSIS optical sensor over the urban area of the University of Pavia,
Italy. The flight was operated by DLR in the framework of the HySens project, managed and sponsored
by the European Union. The image size in pixels is 610 × 340, with very high spatial resolution of 1.3
meters per pixel. The number of data channels in the acquired image is 115 (with spectral range from 0.43 to 0.86 µm).

Figure 3.4: Photographs taken at the site during data collection.

Figure 3.5: (a) False color composition of the ROSIS Pavia scene. (b) Ground-truth map containing 9 mutually exclusive land-cover classes. (c) Training set commonly used for the ROSIS Pavia University scene.

Fig. 3.5(a) shows a false color composite of the image, while Fig. 3.5(b) shows nine
ground-truth classes of interest, which comprise urban features, as well as soil and vegetation features.
Finally, Fig. 3.5(c) shows a commonly used training set directly derived from the ground-truth in Fig.
3.5(b).
3.5 Experimental results
In this section we conduct a quantitative and comparative analysis of different feature extraction
techniques for hyperspectral image classification, including unmixing-based and more traditional
(supervised and unsupervised) approaches. The main goal is to use spectral unmixing and classification
as complementary techniques, since the latter are more suitable for the classification of pixels dominated
by a single land cover class, while the former are devoted to the characterization of mixed pixels. Because
hyperspectral images often contain areas with both pure and mixed pixels, the combination of these two
analysis techniques provides a synergistic data processing approach that has been explored in previous
contributions [34, 51, 64, 65, 66]. Before describing the results obtained in experimental validation, we
first describe the feature extraction techniques that will be used in our comparison in subsection 3.5.1.
Then, subsection 3.5.2 describes the adopted supervised classification system and the experimental setup.
Finally, subsection 3.5.3 discusses the obtained results in comparative fashion.
3.5.1 Feature extraction techniques used in the comparison
In our classification system, relevant features are first extracted from the original image. Several types
of input features have been considered in the classification experiments conducted in this chapter. In
the following, we provide an overview of the new techniques used to extract features from the original
hyperspectral data. A detailed mathematical description of these techniques is out of the scope of this
chapter, since most of them are algorithms well known in the remote sensing literature, so only a short
description of the conceptual basics for each method is given here. The techniques are divided into
unsupervised approaches, if the algorithm is applied on the whole data cube, or supervised techniques,
if the information associated with the training set of the data is somehow exploited during the feature
extraction step.
3.5.1.1 Unsupervised Feature Extraction Techniques
We consider five unsupervised feature extraction techniques in this chapter. Three of them are classic
algorithms available in the literature (PCA, MNF and ICA described in the last chapter), and the
two remaining ones are based on the exploitation of sub-pixel information through spectral unmixing,
including the best unsupervised method in [34] and the newly proposed CMTMFunsup technique. A brief
summary of the new considered unsupervised techniques follows:
• MTMFunsup , which first performs an MNF-based dimensionality reduction and then applies the
MTMF method in order to estimate fractional abundances of spectral endmembers extracted from
the original data using the OSP algorithm [36]. In [34, 48] it is shown that MTMF outperforms
other techniques for abundance estimation such as unconstrained and FCLSU [19] since it can
provide meaningful abundance maps by means of partial unmixing in case not all endmembers are
available a priori.
• CMTMFunsup, developed in this chapter to address the problems exhibited by endmember extraction algorithms, which are sensitive to outliers and pixels with extreme values of reflectance. By using an unsupervised clustering method such as k-means, the extracted endmembers are expected to be more spatially significant.
• FCunsup is an extension of the k-means clustering method [67] which provides soft clusters, where
a particular pixel has a degree of membership in each cluster. This strategy is faster than the two
previous strategies as it does not include a spectral unmixing step; a minimal sketch follows.
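The sketch below gives one standard fuzzy c-means formulation (our own illustration of FCunsup; the soft membership matrix U is used directly as the feature stack):

```python
import numpy as np

def fuzzy_cmeans(X, c, m=2.0, iters=100, seed=0):
    """X: (n_pixels, n_bands); c clusters; m > 1 is the fuzziness exponent.
    Returns the (n_pixels, c) soft membership matrix U."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(iters):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]             # (c, n_bands)
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1) + 1e-12
        inv = d2 ** (-1.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)                 # membership update
    return U
```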
3.5.1.2 Supervised feature extraction techniques
We consider several supervised feature extraction techniques in this chapter. The first techniques
considered were DAFE and DBFE [14]. However, DBFE could not be applied in the case of very
limited training sets since it requires a number of samples (for each class) bigger than the number of
dimensions of the original data set in order to estimate the statistics used to project the data. As it will
be shown in the next sections, this requirement was not satisfied for most of the experiments carried out.
In turn, the results provided by DAFE were poor compared to the other methods for a low number of
training samples, hence we did not include them in our comparison. As a result, the supervised methods
adopted in our comparison were NWFE and three sub-pixel techniques based on estimating fractional
abundances. Two of them were already presented in [34, 48], and the third one is the CMTMFsup
technique developed in this chapter. Although a number of supervised feature extraction techniques have been available in the literature [14], according to our experiments the advantages provided by supervised techniques are not always evident, especially in the case of limited training sets [68]. A brief summary of
the considered supervised techniques follows:
• NWFE focuses on selecting samples near the eventual decision boundaries that best separate the
classes. The main ideas of the NWFE are: 1) assigning different weights to every training sample
in order to compute local means, and 2) defining non-parametric between-class and within-class
scatter matrices to perform feature extraction [14].
• MTMFsup is equivalent to MTMFunsup but assuming that the pure spectral components are
searched by the OSP endmember extraction algorithm in the training set instead of in the entire
hyperspectral image. Our assumption is that training samples may better represent the available
land cover classes in the subsequent classification process [34].
• MTMFavg is equivalent to MTMFsup but assuming that the representative spectral signatures are
obtained as the average of the signatures belonging to each class in the training set (here, the
number of components to be retained by MNF applied prior to the MTMF is varied in a given
range). In this case, the OSP algorithm is not used to extract the spectral signatures, which are
obtained in supervised fashion from the available training samples [34].
• CMTMFsup, developed in this chapter as the supervised counterpart of CMTMFunsup. It mainly differs from that technique in that the clustering process is performed on the training samples rather than on the full hyperspectral image.
3.5.2 Supervised classification system and experimental setup
In our supervised classification system, different types of input features are extracted from the original
hyperspectral image prior to classification. In addition to the unsupervised and supervised feature
extraction techniques described in the previous subsection, we also use the (full) original spectral
information available in the hyperspectral data as input to the proposed classification system. In the
latter case, the dimensionality of the input features used for classification equals n, the number of spectral
bands in the original data set. When using feature extraction techniques, the number of features was
varied empirically in our experiments and only the best results are reported. In all cases, a supervised
classification process was performed using the SVM classifier with Gaussian kernel (observed to perform
better than other tested kernels, such as polynomial or linear). Kernel parameters were optimized by a
grid search procedure, and the optimal parameters were selected using 10-fold cross-validation (selected
after testing different configurations). The LIBSVM library was used in our experiments. In order to evaluate the ability of the tested methods to perform under training sets with different numbers of samples, we
adopted the following training-test configurations:
• In our experiments with the AVIRIS Indian Pines data set in Fig. 2.3(a), we randomly selected
5% and 15% of the pixels in each ground-truth class in Table 3.1 and used them to build the
training set. The remaining pixels were used as test pixels.
• In our experiments with the AVIRIS Salinas data set in Fig. 3.3(a), whose smallest classes contain more pixels than those of the AVIRIS Indian Pines data set, we reduced the training sets further and selected only 2% and 5% of the available ground-truth
pixels in Table 3.1 for training purposes.
• In our experiments with the AVIRIS Kennedy Space Center data set, we decided to reduce the
training sets even more and selected only 1% and 5% of the available ground-truth pixels in Table
3.1 for training purposes.
• Finally, in our experiments with the ROSIS Pavia data set in Fig. 3.5(a), we used the training set
in Fig. 3.5(c) and also a different training set made up of only 50 pixels for each class in Table 3.1
for comparative purposes.
Based on the aforementioned training sets, OA and AA were computed over the remaining test
samples for each data set. This experiment was repeated ten times to guarantee statistical consistency,
and the average results after ten runs are provided. An assessment of the obtained results is reported in
the following subsection.
3.5.3 Analysis and discussion of results
Table 3.2 and Table 3.3 show the OA and AA (in percentage) obtained by the considered classification
system for different hyperspectral scenes using the original spectral information as input feature, and
also the features provided by the unsupervised and supervised feature extraction techniques described in
subsection 3.5.1. It is important to emphasize that, in the tables, we only report the best case (meaning
the one with highest OA) for each considered feature extraction technique, after testing numbers of
extracted features ranging from 5 to 50. In all cases, this range was sufficient to observe a decline in
classification OA after a certain number of features, so the numbers given in the parentheses in the tables correspond to the optimal number of features for each considered feature extraction technique (in the case
of the original spectral information, the number in the parentheses corresponds to the number of bands
of the original hyperspectral image). Finally, in order to outline the best feature extraction technique in
each considered experiment, we highlight in bold typeface the best classification result observed across
all tested feature extraction methods. In the previous chapter [34], the statistical significance of some of the processing chains considered in Table 3.2 and Table 3.3 was assessed using McNemar’s test [55], concluding that the differences between the tested methods were statistically significant. Other similar
tests are also available in the literature [69]. According to our experimental results, the same observations
regarding statistical significance apply to the new processing chains included in this chapter.
From Table 3.2 and Table 3.3, several conclusions can be drawn. First and foremost, we can observe
that the use of supervised techniques for feature extraction is not always beneficial to improve the OA
and AA, especially in case of limited training sets and statistical feature extraction approaches. For
example, NWFE exhibits better results when compared to traditional unsupervised techniques such
as PCA or ICA. However, DAFE (not included in the tables) exhibited quite poor results. The low
performances obtained by DAFE should be therefore attributed to the very small size of the training
set and to the fact that the land cover classes can be spectrally very close (as in the case of the AVIRIS
Table 3.2: OA and AA (in percentage) obtained by the considered classification system for different hyperspectral scenes (AVIRIS Indian Pines and AVIRIS Kennedy Space Center) using the original spectral information, unsupervised feature extraction techniques, and supervised feature extraction techniques. Only the best case is reported for each considered feature extraction technique (with the optimal number of features in the parentheses) and the best classification result across all methods in each experiment is highlighted in bold typeface.

Overall accuracy (OA)

Features        AVIRIS Indian Pines              AVIRIS Kennedy Space Center
                5% Training    15% Training      1% Training    5% Training
Original info   75.78% (202)   84.49% (202)      72.26% (176)   85.50% (176)
PCA             77.25% (20)    83.86% (20)       75.54% (5)     86.64% (10)
ICA             76.84% (20)    83.52% (20)       73.88% (10)    86.36% (10)
MNF             86.67% (10)    91.35% (10)       78.09% (15)    90.12% (15)
MTMFunsup       84.90% (10)    89.50% (10)       76.48% (10)    88.61% (15)
CMTMFunsup      87.18% (30)    91.61% (25)       80.34% (35)    90.20% (45)
FCunsup         74.57% (30)    79.45% (25)       64.71% (10)    77.55% (30)
NWFE            79.76% (10)    86.11% (10)       73.17% (10)    85.66% (10)
MTMFsup         85.96% (10)    90.28% (10)       77.78% (10)    89.24% (15)
MTMFavg         86.24% (10)    91.00% (10)       74.87% (10)    86.61% (20)
CMTMFsup        85.08% (10)    90.19% (20)       76.48% (10)    88.74% (15)

Average accuracy (AA)

Features        AVIRIS Indian Pines              AVIRIS Kennedy Space Center
                5% Training    15% Training      1% Training    5% Training
Original info   66.37% (202)   79.52% (202)      67.10% (176)   81.95% (176)
PCA             70.59% (15)    80.37% (15)       68.92% (5)     82.37% (10)
ICA             70.03% (15)    80.03% (15)       65.92% (10)    81.85% (10)
MNF             83.31% (10)    89.04% (10)       79.23% (10)    87.69% (15)
MTMFunsup       80.12% (10)    86.65% (10)       69.13% (10)    85.48% (15)
CMTMFunsup      82.17% (20)    89.55% (20)       74.32% (35)    87.72% (45)
FCunsup         69.49% (10)    76.10% (10)       59.49% (30)    73.01% (30)
NWFE            72.46% (5)     78.80% (5)        64.26% (5)     81.31% (10)
MTMFsup         82.57% (10)    87.76% (10)       70.37% (10)    86.79% (15)
MTMFavg         82.31% (10)    89.16% (10)       67.11% (15)    83.39% (20)
CMTMFsup        83.34% (10)    89.47% (10)       70.28% (15)    86.19% (15)
Indian Pines scene) thus making it very difficult to separate them by using spectral means and covariance
matrices. Moreover, the importance of integrating the additional information provided by the training
samples is strictly connected with the nature of the considered approach. This can be noticed when
comparing the MTMF versus the CMTMF chains. In the former case, the best results are generally
provided by the supervised approach (MTMFsup ) since the supervised strategy for extracting spectral
endmembers using the OSP approach benefits from the reduction of outliers and pixels with extreme
values of reflectance, which affect negatively this endmember extraction algorithm. In the latter case, the
best results are generally provided by the unsupervised approach (CMTMFunsup ) due to the fact that,
when trying to identify clusters in a very small training set, several problems appear, such as the bad
conditioning of matrices when computing the inverse (in the k-means clustering step) or the eventual
Table 3.3: OA and AA (in percentage) obtained by the considered classification system for different hyperspectral scenes (AVIRIS Salinas Valley and ROSIS Pavia University) using the original spectral information, unsupervised feature extraction techniques, and supervised feature extraction techniques. Only the best case is reported for each considered feature extraction technique (with the optimal number of features in the parentheses) and the best classification result across all methods in each experiment is highlighted in bold typeface.

Overall accuracy (OA)

Features        AVIRIS Salinas Valley            ROSIS Pavia University
                2% Training    5% Training       50 pixels      Standard
Original info   88.39% (186)   90.66% (186)      84.09% (103)   80.99% (103)
PCA             91.93% (10)    93.55% (10)       81.65% (15)    81.81% (10)
ICA             91.72% (20)    93.33% (20)       81.39% (15)    81.44% (10)
MNF             93.71% (15)    94.90% (15)       83.52% (5)     77.77% (10)
MTMFunsup       93.27% (15)    94.38% (15)       83.16% (5)     75.56% (10)
CMTMFunsup      92.83% (30)    94.47% (35)       86.83% (20)    84.25% (15)
FCunsup         88.87% (30)    90.29% (30)       66.57% (25)    70.07% (30)
NWFE            92.28% (10)    93.47% (10)       81.39% (15)    78.56% (15)
MTMFsup         92.67% (15)    94.49% (15)       81.41% (5)     75.21% (5)
MTMFavg         93.42% (15)    94.67% (15)       84.91% (10)    83.58% (10)
CMTMFsup        92.63% (30)    93.95% (25)       85.34% (10)    81.48% (15)

Average accuracy (AA)

Features        AVIRIS Salinas Valley            ROSIS Pavia University
                2% Training    5% Training       50 pixels      Standard
Original info   92.29% (186)   94.46% (186)      87.78% (103)   88.28% (103)
PCA             95.48% (10)    96.77% (20)       84.78% (15)    86.97% (10)
ICA             95.33% (10)    96.62% (20)       84.63% (15)    86.92% (10)
MNF             96.60% (15)    97.48% (20)       87.74% (5)     86.63% (10)
MTMFunsup       96.27% (15)    97.10% (15)       87.70% (5)     86.23% (10)
CMTMFunsup      96.18% (30)    97.23% (35)       88.14% (20)    89.98% (20)
FCunsup         88.96% (30)    91.54% (25)       72.87% (25)    76.75% (30)
NWFE            96.09% (10)    96.91% (10)       84.46% (15)    86.68% (20)
MTMFsup         95.78% (10)    97.22% (20)       86.16% (5)     86.20% (10)
MTMFavg         96.37% (15)    97.30% (15)       88.25% (5)     88.08% (10)
CMTMFsup        95.39% (30)    96.73% (25)       87.72% (10)    88.90% (15)
selection of very similar clusters, leading to redundant information in class prototyping which ultimately
affects the subsequent partial unmixing step and the obtained classification performances. In addition
to the aforementioned observations, we emphasize that the supervised version derives the endmembers
(via clustering) from a limited training set, while the unsupervised version derives the endmembers from
the whole hyperspectral image. The former approach has the advantage of lower computational complexity, as
the search for endmembers is only conducted in the small training set, but this comes at the expense of
reduced modelling accuracy as expected. Although in previous chapter we developed MTMFavg in the
hope of addressing these problems, our experimental results indicate that CMTMF techniques in general
and CMTMFunsup in particular (an unsupervised approach as opposed to MTMFavg ) performs a better
job in characterizing the sub-pixel information prior to classification of hyperspectral data. Finally, it
is also worth noting the good performance achieved in all experiments by MNF, another unsupervised
42
3.5 Experimental results
feature extraction strategy. Figs. 3.6 , 3.7, 3.8 show the results obtained in some of the experiments.
A question that arises at this point is whether there is any advantage in using unmixing chains instead of the MNF transform. Since both feature extraction methods are unsupervised, have similar computational complexity and lead to similar classification results, it is not clear from the context whether there is any advantage in using an unmixing-based technique over a well-known statistical method such as MNF. In order to address this issue, Fig. 3.9 shows the first 9 components extracted by MNF from the ROSIS Pavia University image. These components are ordered in terms of SNR, with the first component providing the maximum amount of information; noise is clearly visible in the last three components. In turn, Fig. 3.10 shows the components extracted from the same image by the CMTMFunsup technique. The components are arranged in no specific order, as spectral unmixing assigns the same priority to each endmember when deriving the associated abundance map. As shown by Fig. 3.10, the components provided by the unmixing-based technique can be interpreted in a physical manner (as the abundances of each spectral constituent in the scene) and, most importantly, these components can be related to the ground-truth classes in Fig. 3.5(b). This suggests that unmixing-based chains can provide an alternative strategy to classic feature extraction chains such as MNF, with three main differences:
1. Unmixing-based feature extraction techniques incorporate information about mixed pixels, which are the dominant type of pixel in hyperspectral images. In contrast, standard feature extraction techniques such as MNF do not incorporate the pure/mixed nature of the pixels in hyperspectral data, disregarding a source of information that could be useful for the final classification (see the sketch after this list).

2. The components provided by unmixing-based feature extraction techniques can be interpreted as the abundances of spectral constituents in the scene, while the components provided by other classic feature extraction techniques such as MNF do not necessarily have any physical meaning.

3. Unmixing-based feature extraction techniques do not penalize classes which are not relevant in terms of variance or SNR, while some classic feature extraction techniques such as MNF relegate variations of less significant size to low-order components. If such low-order components are not preserved, small classes may be affected.
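To make the contrast with MNF more concrete, the following minimal sketch (in Python, assuming NumPy and SciPy) illustrates how abundance-based features of the kind described in point 2 could be computed under the linear mixing model, given a set of already-extracted endmember signatures; the non-negative least squares solver enforces only the non-negativity constraint and is a simple stand-in for the partial unmixing actually used in our chains.

    # A minimal sketch (illustrative, not the exact MTMF-based chain used in
    # this chapter) of unmixing-based feature extraction: each pixel is
    # described by its estimated abundances with respect to a set of
    # endmember signatures, assumed to be already extracted.
    import numpy as np
    from scipy.optimize import nnls  # non-negative least squares

    def abundance_features(image, endmembers):
        """image: (n_pixels, n_bands); endmembers: (p, n_bands).
        Returns an (n_pixels, p) matrix of abundance estimates, i.e., one
        physically interpretable feature per spectral constituent."""
        E = endmembers.T                          # (n_bands, p) mixing matrix
        feats = np.empty((image.shape[0], endmembers.shape[0]))
        for i, pixel in enumerate(image):
            feats[i], _ = nnls(E, pixel)          # abundances constrained >= 0
        return feats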
An additional aspect resulting from our experiments is that unmixing-based chains allow for a natural integration of the spatial information available in the original hyperspectral image (through the clustering strategy designed for endmember extraction). Although the aforementioned aspects may offer important advantages in hyperspectral data classification, the fact is that our comparative assessment (conducted in terms of OA and AA using four representative hyperspectral images) only indicates a moderate improvement (or comparable performance) of the best unmixing-guided feature extraction method (CMTMFunsup) with regard to the best statistical feature extraction method (MNF) reported in our experiments. This leads us to believe that further improvements to the integration of the information provided by spectral unmixing into the classification process are possible. With this in mind, we anticipate significant advances in the integration of spectral unmixing and classification of hyperspectral data in future developments.
3.6 Final observations and future directions
In this chapter, we have investigated the advantages that can be gained by including information about spectral mixing at the sub-pixel level in the feature extraction stage that is usually conducted prior to hyperspectral image classification. For this purpose, we have developed a new unmixing-based feature extraction technique that combines spatial and spectral information through a combination of unsupervised clustering and partial spectral unmixing. We have compared our newly developed technique (which can be applied in both unsupervised and supervised fashion) with other classic and unmixing-based techniques for feature extraction. Our detailed quantitative and comparative assessment has been conducted using four representative hyperspectral images collected by two different instruments (AVIRIS and ROSIS) over a variety of test sites, in the framework of supervised classification scenarios dominated by the limited availability of training samples. Our experimental results indicate that the unsupervised version of our newly developed technique provides components which are physically meaningful and significant from a spatial point of view, resulting in good classification accuracies (without penalizing very small classes) when compared to the other feature extraction techniques tested in this chapter. In turn, since our analysis scenarios are dominated by very limited training sets, we have experimentally observed that, in this context, the use of supervised feature extraction techniques can lead to lower classification accuracies, as the information considered for projecting the data into a lower-dimensional space is not representative of the thematic classes of the image.
Future developments will include an investigation of additional techniques for feature extraction from a spectral unmixing point of view, in order to fully substantiate the advantages that can be gained at the feature extraction stage by including additional information about mixed pixels (which are predominant in hyperspectral images) prior to classification. Another research line deserving future attention is the development of automatic procedures to determine the optimal number of features to be extracted by each tested method. While several methods for estimating the intrinsic dimensionality of hyperspectral images exist, the number of features suitable for classification purposes depends on each particular method and, in the case of supervised feature extraction methods, on the available training samples. Although we have investigated performance over a suitable range of extracted features, the automatic determination of the optimal number of features for each method should be investigated in future work for practical reasons. Finally, future work should also consider nonlinear feature extraction methods, such as kernel PCA [70], in addition to the linear feature extraction methods considered.
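As a pointer for this last research line, the sketch below shows how nonlinear features could be obtained with kernel PCA [70] using scikit-learn (an assumed dependency); the data and parameter values are placeholders, not those of our experiments.

    # A minimal sketch of the nonlinear alternative mentioned above: kernel
    # PCA [70] applied to hyperspectral pixels.
    import numpy as np
    from sklearn.decomposition import KernelPCA

    X = np.random.rand(1000, 103)                 # placeholder: 1000 pixels, 103 bands
    kpca = KernelPCA(n_components=15, kernel="rbf", gamma=1.0 / X.shape[1])
    features = kpca.fit_transform(X)              # nonlinear features, (1000, 15)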
Figure 3.6: Classification results for the AVIRIS Indian Pines scene (obtained using an SVM classifier with Gaussian kernel, trained with 5% of the available samples). Panels: (a) Ground Truth; (b) PCA; (c) ICA; (d) MNF; (e) MTMFunsup; (f) CMTMFunsup; (g) NWFE; (h) MTMFsup; (i) MTMFavg; (j) CMTMFsup.
Figure 3.7: Classification results for the AVIRIS Salinas Valley scene (obtained using an SVM classifier with Gaussian kernel, trained with 2% of the available samples). Panels: (a) Ground Truth; (b) PCA; (c) ICA; (d) MNF; (e) MTMFunsup; (f) CMTMFunsup; (g) NWFE; (h) MTMFsup; (i) MTMFavg; (j) CMTMFsup.
Figure 3.8: Classification results for the ROSIS Pavia University scene (obtained using an SVM classifier with Gaussian kernel, trained with 50 pixels of each available ground-truth class). Panels: (a) Ground Truth; (b) PCA; (c) ICA; (d) MNF; (e) MTMFunsup; (f) CMTMFunsup; (g) NWFE; (h) MTMFsup; (i) MTMFavg; (j) CMTMFsup.
Figure 3.9: Components extracted by MNF from the ROSIS Pavia University scene (ordered from left to right in terms of amount of information).

Figure 3.10: Components extracted by the CMTMFunsup feature extraction technique from the ROSIS Pavia University scene (in no specific order).
Chapter 4

Semi-Supervised Self-Learning for Hyperspectral Image Classification
4.1 Summary
As shown in previous chapters, supervised hyperspectral image classification is a difficult task due to the imbalance between the high dimensionality of the data and the limited availability of labeled training samples in real analysis scenarios. While the collection of labeled samples is generally difficult, expensive and time-consuming, unlabeled samples can be generated in a much easier way. This observation has fostered the idea of adopting semi-supervised learning techniques in hyperspectral image classification. The main assumption of such techniques is that the new (unlabeled) training samples can be obtained from a (limited) set of available labeled samples without significant effort/cost. In this chapter, we develop a new approach for semi-supervised learning which adapts available active learning methods (in which a trained expert actively selects unlabeled samples) to a self-learning framework in which the machine learning algorithm itself selects the most useful and informative unlabeled samples for classification purposes. In this way, the labels of the selected pixels are estimated by the classifier itself, with the advantage that no extra cost is required for labeling the selected pixels using this machine-machine framework when compared with traditional machine-human active learning. The proposed approach is illustrated with two different classifiers: MLR and a probabilistic pixel-wise SVM. Our experimental results with real hyperspectral images collected by the AVIRIS and ROSIS instruments indicate that the use of self-learning represents an effective and promising strategy in the context of hyperspectral image classification.6
4.2 Introduction
Remotely sensed hyperspectral image classification [14] takes advantage of the detailed information contained in each pixel (vector) of the hyperspectral image to generate thematic maps from detailed spectral signatures. A relevant challenge for supervised classification techniques (which assume prior knowledge in the form of class labels for different spectral signatures) is the limited availability of labeled training samples, since their collection generally involves expensive ground campaigns [71]. While the collection of labeled samples is generally difficult, expensive and time-consuming, unlabeled samples can be generated in a much easier way. This observation has fostered the idea of adopting semi-supervised learning techniques in hyperspectral image classification. The main assumption of such techniques is that new (unlabeled) training samples can be obtained from a (limited) set of available labeled samples without significant effort/cost [72].

6 Part of this chapter has been published in: I. Dopido, J. Li, P. R. Marpu, A. Plaza, J. M. Bioucas-Dias and J. A. Benediktsson, "Semi-Supervised Self-Learning for Hyperspectral Image Classification," IEEE Transactions on Geoscience and Remote Sensing, vol. 51, no. 7, pp. 4032-4044, July 2013 [JCR(2012)=3.467].
The area of semi-supervised learning has experienced a significant evolution in terms of the adopted
models, which comprise complex generative models [73, 74, 75], self-learning models [76, 77], multi-view
learning models [78, 79], TSVMs [40, 41], and graph-based methods [80]. A survey of semi-supervised
learning algorithms is available in [17]. Most of these algorithms use some type of regularization which encourages "similar" features to be associated with the same class. The effect of such regularization is to push the boundaries between classes towards regions of low data density [81]; the usual strategy first associates the vertices of a graph with the complete set of samples and then builds the regularizer using variables defined on the vertices. This trend has
samples and then builds the regularizer depending on variables defined on the vertices. This trend has
been successfully adopted in several recent remote sensing image classification studies. For instance, in
[58] TSVMs are used to gradually search a reliable separating hyperplane (in the kernel space) with a
transductive process that incorporates both labeled and unlabeled samples in the training phase. In
[82], a semi-supervised method is presented that exploits the wealth of unlabeled samples in the image,
and naturally gives relative importance to the labeled ones through a graph-based methodology. In [83],
kernels combining spectral-spatial information are constructed by applying spatial smoothing over the
original hyperspectral data and then using composite kernels in graph-based classifiers. In [84], a semi-supervised SVM is presented that exploits the wealth of unlabeled samples for regularizing the training
kernel representation locally by means of cluster kernels. In [85, 86], a new semi-supervised approach
is presented that exploits unlabeled training samples (selected by means of an active selection strategy
based on the entropy of the samples). Here, unlabeled samples are used to improve the estimation of
the class distributions, and the obtained classification is refined by using a spatial multi-level logistic
prior. In [87], a novel context-sensitive semi-supervised SVM is presented that exploits the contextual
information of the pixels belonging to the neighborhood system of each training sample in the learning
phase to improve the robustness to possible mislabeled training patterns. In [88], two semi-supervised
one-class (SVM-based) approaches are presented in which the information provided by unlabeled samples
present in the scene is used to improve classification accuracy and alleviate the problem of free-parameter
selection. The first approach models data marginal distribution with the graph Laplacian built with both
labeled and unlabeled samples. The second approach is a modification of the SVM cost function that
penalizes more the errors made when classifying samples of the target class. In [89] a new method to
combine labeled and unlabeled pixels to increase classification reliability and accuracy, thus addressing
the sample selection bias problem, is presented and discussed. In [90], an SVM is trained with the linear
combination of two kernels: a base kernel working only with labeled examples is deformed by a likelihood
kernel encoding similarities between labeled and unlabeled examples, and then applied in the context of
urban hyperspectral image classification. In [91], similar concepts to those addressed before are adopted
using a neural network as the baseline classifier. In [92], a semi-automatic procedure to generate land
cover maps from remote sensing images using active queries is presented and discussed.
In contrast to supervised classification, the aforementioned semi-supervised algorithms generally assume that a limited number of labeled samples are available a priori, and then enlarge the training set using unlabeled samples, thus allowing these approaches to address ill-posed problems. However, in order for this strategy to work, several requirements need to be met. First and foremost, the new (unlabeled) samples should be generated without significant cost/effort. Second, the number of unlabeled samples required for the semi-supervised classifier to perform properly should not be too high, in order to avoid increasing the computational complexity of the classification stage. In other words, as the number of unlabeled samples increases, it may become computationally intractable for the classifier to exploit all the available training samples. Further, if the unlabeled samples are not properly selected, they may confuse the classifier, introducing significant divergence or even reducing the classification accuracy obtained with the initial set of labeled samples. In order to address these issues, it is very important that the most highly informative unlabeled samples are identified in computationally efficient fashion, so that significant improvements in classification performance can be observed without the need to use a very high number of unlabeled samples.
In this chapter, we evaluate the feasibility of adapting available active learning techniques (in which
a trained expert actively selects unlabeled samples) to a self-learning framework in which the machine
learning algorithm itself selects the most useful unlabeled samples for classification purposes, with the
ultimate goal of systematically achieving noticeable improvements in classification results with regard to those obtained with randomly selected training sets of the same size. In the literature, active learning
techniques have been mainly exploited in a supervised context, i.e., a given supervised classifier is
trained with the most representative training samples selected after a (machine-human) interaction
process in which the samples are actively selected according to some criteria based on the considered
classifier, and then the labels of those samples are assigned by a trained expert in fully supervised fashion
[32, 35, 86, 93, 94, 95]. In this supervised context, samples with high uncertainty are generally preferred
as they are usually more informative. At the same time, since the samples are labeled by a human expert,
high confidence can be expected in the class label assignments. As a result, classic (supervised) active
learning generally focuses on samples with high confidence at the human level and high uncertainty at
the machine level.
In turn, we adapt standard active learning methods into a self-learning scenario. The main idea is to
obtain new (unlabeled) samples using machine-machine interaction instead of human supervision. Our
first (machine) level –similar to the human level in classic (supervised) active learning– is used to infer
a set of candidate unlabeled samples with high confidence. In our second (machine) level –similar to the
machine level for supervised active learning– the machine learning algorithm itself automatically selects
the samples with highest uncertainty from the obtained candidate set. As a result, in our proposed
approach the classifier replaces the human expert. In other words, here we propose a novel two-step
semi-supervised self-learning approach:
1. The first step infers a candidate set using a self-learning strategy based on the available (labeled and unlabeled) training samples. Here, a spatial neighborhood criterion is used to derive new candidate samples as those which are spatially adjacent to the available (labeled) samples.

2. The second step automatically selects (and labels) new samples from the candidate pool by assuming that those pixels which are spatially adjacent to a given class can be labeled with high confidence as belonging to the same class.
As a result, our proposed strategy relies on two main assumptions. The first (global) assumption is that training samples having the same spectral structure likely belong to the same class. The second (local) assumption is that spatially neighboring pixels likely belong to the same class. In this way, our proposed approach naturally integrates the spatial and the spectral information in the semi-supervised classification process.
The remainder of the chapter is organized as follows. Section 4.3 describes the proposed approach for semi-supervised self-learning. We illustrate the proposed approach with two probabilistic classifiers, MLR and a probabilistic pixel-wise SVM, both of which are shown to achieve significant improvements in classification accuracy resulting from their combination with the proposed semi-supervised self-learning approach. Section 4.4 reports classification results using two real hyperspectral images collected by the AVIRIS [20] and ROSIS [49] imaging spectrometers. Finally, Section 4.5 concludes with some remarks and hints at plausible future research lines.
4.3 Proposed approach
First, we briefly define the notation used in this chapter. Let K ≡ {1, . . . , K} denote a set of K class labels, S ≡ {1, . . . , n} a set of integers indexing the n pixels of an image, x ≡ (x1, . . . , xn) ∈ R^{d×n} an image of d-dimensional feature vectors, y ≡ (y1, . . . , yn) an image of labels, Dl ≡ {(yl1, xl1), . . . , (yln, xln)} a set of labeled samples, ln the number of labeled training samples, Yl ≡ {yl1, . . . , yln} the set of labels in Dl, Xl ≡ {xl1, . . . , xln} the set of feature vectors in Dl, Du ≡ {Xu, Yu} a set of unlabeled samples, Xu ≡ {xu1, . . . , xun} the set of unlabeled feature vectors in Du, Yu ≡ {yu1, . . . , yun} the set of labels associated with Xu, and un the number of unlabeled samples. With this notation in mind, the proposed semi-supervised self-learning approach consists of two main ingredients, semi-supervised learning and self-learning, which are described next.
4.3.1 Semi-supervised learning
For the semi-supervised part of our approach, we use two different probabilistic classifiers [96] to model the class posterior density. The first one is the MLR, which is formally given by [30]:

$$p(y_i = k \mid x_i, \omega) = \frac{\exp\left(\omega^{(k)T} h(x_i)\right)}{\sum_{k=1}^{K} \exp\left(\omega^{(k)T} h(x_i)\right)}, \qquad (4.1)$$

where $h(x) = [h_1(x), \ldots, h_l(x)]^T$ is a vector of $l$ fixed functions of the input, often termed features; $\omega^{(k)}$ is the set of regressors for class $k$, and $\omega = [\omega^{(1)T}, \ldots, \omega^{(K)T}]^T$. Notice that the function $h$ may be linear, i.e., $h(x_i) = [1, x_{i,1}, \ldots, x_{i,d}]^T$, where $x_{i,j}$ is the $j$-th component of $x_i$; or nonlinear, i.e., $h(x_i) = [1, K_{x_i,x_1}, \ldots, K_{x_i,x_l}]^T$, where $K_{x_i,x_j} = K(x_i, x_j)$ and $K(\cdot, \cdot)$ is some symmetric kernel function. Kernels have been largely used because they tend to improve data separability in the transformed space. We use the RBF kernel $K(x_i, x_j) = \exp(-\|x_i - x_j\|^2 / 2\sigma^2)$, which is widely used in hyperspectral image classification [16]. We selected this kernel (after extensive experimentation with other kernels, including linear and polynomial kernels) because we empirically observed that it provided the best results. From now on, $d$ denotes the dimension of $h(x)$. Under the present setup, learning the class densities amounts to estimating the logistic regressors. Following the work in [29, 39], we can compute $\omega$ by obtaining the MAP estimate:

$$\widehat{\omega} = \arg\max_{\omega} \; \ell(\omega) + \log p(\omega), \qquad (4.2)$$

where $p(\omega) \propto \exp(-\lambda \|\omega\|_1)$ is a Laplacian prior that promotes sparsity, and $\lambda$ is a regularization parameter controlling the degree of sparseness of $\widehat{\omega}$ [29, 39]. In our previous work [29], it was shown that parameter $\lambda$ is rather insensitive to the use of different datasets, and that there are many suboptimal values of this parameter which lead to a very accurate estimation of $\omega$. In our experiments, we set $\lambda = 0.001$, as we have empirically found that this setting provides very good performance [97]. Finally, $\ell(\omega)$ is the log-likelihood function over the training samples $D_{l+u} \equiv D_l + D_u$, given by:

$$\ell(\omega) \equiv \sum_{i=1}^{l_n + u_n} \log p(y_i \mid x_i, \omega). \qquad (4.3)$$
As shown by Eq. (4.3), labeled and unlabeled samples are integrated to learn the regressors ω.
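For concreteness, the following minimal sketch (in Python, with NumPy as an assumed dependency) evaluates the posterior of Eq. (4.1) for a single pixel under the nonlinear (RBF) feature map; the regressors omega are assumed to be already learned, e.g., by solving Eq. (4.2), so this is an illustration of the model rather than of the training algorithm.

    # A minimal sketch of Eq. (4.1): the MLR posterior with the nonlinear
    # (RBF) feature map h. The regressors omega are assumed given, purely
    # for illustration.
    import numpy as np

    def rbf_features(x, anchors, sigma):
        """h(x) = [1, K(x, x_1), ..., K(x, x_l)]^T with the RBF kernel."""
        k = np.exp(-np.sum((anchors - x) ** 2, axis=1) / (2.0 * sigma ** 2))
        return np.concatenate(([1.0], k))

    def mlr_posterior(x, anchors, omega, sigma):
        """omega: (K, l + 1) regressors; returns p(y = k | x, omega) for all k."""
        h = rbf_features(x, anchors, sigma)
        scores = omega @ h
        scores -= scores.max()                    # for numerical stability
        e = np.exp(scores)
        return e / e.sum()                        # softmax of Eq. (4.1)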
The considered semi-supervised approach belongs to the family of self-learning approaches, where the training set $D_{l+u}$ is incremented under the following criterion. Let $D_{N(i)} \equiv \{(\hat{y}_{i_1}, x_{i_1}), \ldots, (\hat{y}_{i_n}, x_{i_n})\}$ be the set of neighboring samples of $(y_i, x_i)$ for $i \in \{l_1, \ldots, l_n, u_1, \ldots, u_n\}$, where $i_n$ is the number of samples in $D_{N(i)}$ and $\hat{y}_{i_j}$ is the MAP estimate from the MLR classifier, with $i_j \in \{i_1, \ldots, i_n\}$. If $\hat{y}_{i_j} = y_i$, we increment the unlabeled training set by adding $(\hat{y}_{i_j}, x_{i_j})$, i.e., $D_u = \{D_u, (\hat{y}_{i_j}, x_{i_j})\}$. This increment is reasonable due to the following considerations. First, from a global viewpoint, samples with the same spectral structure likely belong to the same class. Second, from a local viewpoint, it is very likely that two neighboring pixels also belong to the same class. Therefore, the newly included samples are reliable for learning the classifier. In this chapter, we run an iterative scheme to increment the training set, as this strategy refines the estimates and enlarges the neighborhood set, so that the set of potential unlabeled training samples is increased.
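The following minimal sketch illustrates one reading of this increment criterion, assuming first-order (4-connected) spatial neighbors and a precomputed map of MAP label estimates; all names and data layouts are illustrative assumptions rather than the exact implementation used in our experiments.

    # A minimal sketch of the increment criterion: a first-order (4-connected)
    # neighbor of a training pixel is added to Du when its MAP label agrees
    # with the label of that training pixel.
    import numpy as np

    def grow_unlabeled_set(map_labels, train_pixels, train_labels):
        """map_labels: (rows, cols) array of MAP estimates y_hat;
        train_pixels: list of (row, col) training positions."""
        rows, cols = map_labels.shape
        new_samples = []
        for (r, c), y in zip(train_pixels, train_labels):
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols and map_labels[rr, cc] == y:
                    new_samples.append(((rr, cc), y))   # (y_hat, x) joins Du
        return new_samples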
It is important to mention that problem (4.2), although convex, is very difficult to compute because the term ℓ(ω) is non-quadratic and the term log p(ω) is non-smooth. The SMLR algorithm presented in [39] solves this problem with O((d(K − 1))^3) complexity. However, most hyperspectral data sets are beyond the reach of this algorithm, as their analysis becomes unbearable when the number of classes increases. In order to address this issue, we take advantage of the LORSAL algorithm [28], which allows replacing a difficult non-smooth convex problem with a sequence of quadratic plus diagonal l2-l1 problems with practical complexity of O(d^2(K − 1)). Compared with the figure O((d(K − 1))^3) of the SMLR algorithm, the complexity reduction of d(K − 1)^2 is quite significant [28, 29].
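To give a feel for these figures (an illustrative calculation, not a measurement from our experiments), consider $d = 200$ kernel features and $K = 16$ classes: the SMLR cost is of the order of $(d(K-1))^3 = (200 \cdot 15)^3 \approx 2.7 \times 10^{10}$, whereas LORSAL requires of the order of $d^2(K-1) = 200^2 \cdot 15 = 6 \times 10^5$ operations, i.e., a reduction by a factor of $d(K-1)^2 = 45{,}000$.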
Finally, we have also used an alternative probabilistic classifier for the semi-supervised learning part of
our methodology. This is the probabilistic SVM in [98, 40]. Other probabilistic classifiers could be used,
but we have selected the SVM as a possible alternative to MLR since this classifier is already widely used to analyze hyperspectral data [58, 82], while the MLR has only recently emerged as a feasible technique for this purpose. It should be noted that standard SVMs do not provide probability estimates for the individual classes. In order to obtain these estimates, pairwise coupling of binary probabilistic estimates is applied [98, 99], a strategy which has also been used for hyperspectral classification [100].
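As a hedged illustration of how such estimates can be obtained in practice, the sketch below uses scikit-learn's SVC with probability=True, which internally fits binary probabilistic outputs and combines them by pairwise coupling, in the spirit of [98, 99]; the data and parameter values are placeholders, not those of our experiments.

    # A minimal sketch of obtaining class probability estimates from an SVM.
    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X_train = rng.random((160, 50))               # placeholder: 160 samples, 50 features
    y_train = rng.integers(0, 16, 160)            # placeholder labels, 16 classes
    X_candidates = rng.random((750, 50))

    svm = SVC(kernel="rbf", C=100.0, gamma=0.5, probability=True)
    svm.fit(X_train, y_train)
    P = svm.predict_proba(X_candidates)           # (750, 16) class posteriors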
4.3.2 Self-learning
The proposed semi-supervised self-learning approach is based on two steps. In the first step, a candidate
set (based on labeled and unlabeled samples) is inferred using a self-learning strategy based on spatial
information, so that high confidence can be expected in the class labels of the obtained candidate set.
This is similar to human interaction in classic (supervised) active learning, in which the class labels are
known and given by an expert. In a second step, we run standard active learning algorithms on the
previously derived candidate set, so that they are adapted to a self-learning scenario to automatically
(and intelligently) select the most informative samples from the candidate set. Here, the goal is to find the samples with the highest uncertainty.
As a result, in the proposed semi-supervised self-learning scheme our aim is to select the most
informative samples without the need for human supervision. The class labels of the newly selected
unlabeled training samples are predicted by the considered semi-supervised algorithm as mentioned in
subsection 4.3.1. Let Dc be the newly generated unlabeled training set at each iteration, which meets the
criteria of the considered semi-supervised algorithm. Notice that the self-learning step in the proposed
approach leads to high confidence in the class labels of the newly generated set Dc . Now we can run
standard active learning algorithms over Dc to find the most informative set Du , i.e., samples with high
uncertainty, such that Du ⊆ Dc . Due to the fact that we use discriminative classifiers and a self-learning
strategy for the semi-supervised algorithm, algorithms which focus on the boundaries between the classes
are preferred. In our study, we use four different techniques to evaluate the proposed approach [90]: 1)
MS, 2) BT, 3) MBT, and 4) nEQB, in addition to random selection (RS) in which the new samples are
randomly selected from the candidate set. In the following we briefly outline each method (for a more
detailed description of these approaches, we refer to [32, 86]):
• The MS technique [32] samples the candidates lying within the margin by computing their distance to the hyperplane separating the classes. In other words, the MS minimizes the distance of the sample to the optimal separating hyperplane defined for each class in a one-against-all setting for multiclass problems.

• The BT algorithm [21] relies on the smallest difference between the posterior probabilities for each sample (a sketch of this criterion, together with MBT, is given after this list). In a multi-class setting, the algorithm can be applied (independently of the number of classes available) by calculating the difference between the two highest probabilities. As a result, the algorithm finds the samples minimizing the distance between the first two most probable classes. In previous work [29], it has been shown that the BT criterion generally focuses on the boundaries comprising many samples, possibly disregarding boundaries with fewer samples.

• The MBT scheme [29] was originally proposed to include more diversity in the sampling process as compared to the BT approach. It finds the samples maximizing the probability of the largest class for each individual class. This method takes into account all the class boundaries by conducting the sampling in cyclic fashion, making sure that the MBT does not get trapped in any class, whereas BT could be trapped in a single (complex) boundary.

• The nEQB approach [35] is a form of committee-based sampling that quantifies the uncertainty of a pixel by considering a committee of learners. Each member of the committee exploits different hypotheses about the classification problem and consequently labels the pixels in the pool of candidates. The algorithm then selects the samples showing maximal disagreement between the different classification models in the committee. Specifically, the nEQB approach uses bagging to build the committee and entropy maximization as the multiclass heuristic, which provides a measure that is then normalized in order to bound it with respect to the number of classes predicted by the committee and avoid hot spots of the uncertainty value in regions where several classes overlap. The version of nEQB used in this work is the one implemented in the altoolbox7.

7 http://code.google.com/p/altoolbox
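For illustration, the following minimal sketch shows how the BT and MBT criteria could be computed from a matrix P of class posteriors (one row per candidate, one column per class); MS additionally requires distances to the SVM hyperplanes and nEQB a committee of classifiers, so both are omitted. This is an illustrative reading of the descriptions above, not the exact code used in our experiments.

    # Minimal sketches of the BT and MBT selection criteria.
    import numpy as np

    def breaking_ties(P, n):
        """BT: the n candidates with the smallest difference between the two
        largest class posteriors (the most ambiguous samples)."""
        top2 = np.sort(P, axis=1)[:, -2:]
        gap = top2[:, 1] - top2[:, 0]
        return np.argsort(gap)[:n]

    def modified_breaking_ties(P, n):
        """MBT: visit the classes cyclically and, for each class k, pick the
        still-unselected candidate with the largest posterior for k, so that
        all class boundaries are sampled."""
        ranking = np.argsort(-P, axis=0)          # per-class candidates, best first
        picked, used, k = [], set(), 0
        while len(picked) < min(n, P.shape[0]):
            for i in ranking[:, k]:
                if i not in used:
                    used.add(i)
                    picked.append(int(i))
                    break
            k = (k + 1) % P.shape[1]
        return np.array(picked)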
At this point, it is important to emphasize that the aforementioned sampling algorithms have been
used for intelligently selecting the most useful candidate samples based on the available probabilistic
information. As a result, spatial information is not directly addressed by these methods, but by the
strategy adopted to generate the pool of candidate samples. Since spatial information is the main
criterion adopted in this stage, there is a risk that the initial pool of candidate samples may smooth
out broad areas in the scene. However, we emphasize that our proposed method for generating the
pool of initial candidates is not exclusively spatial as we use the probabilistic information provided by
spectral-based classifiers (such as MLR or probabilistic SVM) in order to assess the similarity between
the previously selected samples and the new candidates. Hence, as we have experimentally observed, no
significant smoothing effects happen in broad areas and good initial candidates are generally selected.
It is also worth noting that we use two classifiers with probabilistic output that are well-suited for
the aforementioned algorithms (MLR and probabilistic SVM). However, the proposed approach can be
adapted to any other probabilistic classifiers.
For illustrative purposes, Fig. 4.1 shows how spatial information can be adopted as a reasonable criterion to select unlabeled samples and prevent labeling errors in a semi-supervised classification process using a probabilistic classifier. As Fig. 4.1 shows, we use an iterative process to achieve the final classification results. First, we use a probabilistic classifier (the MLR or the probabilistic SVM) to produce a global classification map which contains the probability of each pixel belonging to each class in the considered hyperspectral image. Based on a local similarity assumption, we identify the neighbors of the labeled training samples (using first-order spatial connectivity) and then compute the candidate set Dc by analyzing the spectral similarity of the spatial neighbors with regard to the original labeled samples. This is done by analyzing the probabilistic output associated with each neighboring sample. In this way, the candidate set Dc is obtained based on spectral and spatial information, and its samples are highly reliable. At the same time, it is expected that there may be redundant information in Dc. In other words, some of the samples in the candidate set may not be useful for training the classifier, as they
may be too similar to the original labeled samples. This could introduce difficulties from the viewpoint of computational complexity. Therefore, after Dc is obtained, we run active learning algorithms on the candidate set in order to automatically select the most informative unlabeled training samples. Since the active learning algorithms are based on the available probabilistic information, they are adapted to a self-learning scenario and used to intelligently reduce possibly existing redundancies in the candidate set, thus obtaining a highly informative pool of training samples which ultimately contains only the most relevant samples for classification purposes. The newly obtained labeled and unlabeled training samples are finally used to retrain the classifier. The procedure is repeated in iterative fashion until a convergence criterion is met, for example, until a certain number of unlabeled training samples is obtained.

Figure 4.1: A graphical example illustrating how spatial information can be used as a criterion for semi-supervised self-learning in hyperspectral image classification.
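Putting the pieces together, the sketch below outlines one possible implementation of the whole iterative procedure of Fig. 4.1, reusing the grow-and-select logic sketched earlier (breaking_ties); clf stands for any probabilistic classifier exposing fit and predict_proba, neighbors[i] lists the first-order neighbors of pixel i, and all names are illustrative assumptions.

    # A minimal sketch of the overall semi-supervised self-learning loop.
    def semi_supervised_self_learning(clf, X, labeled_idx, y_labeled,
                                      neighbors, n_per_iter=25, n_iters=30):
        train_idx, train_y = list(labeled_idx), list(y_labeled)
        for _ in range(n_iters):
            clf.fit(X[train_idx], train_y)
            P = clf.predict_proba(X)
            y_hat = P.argmax(axis=1)                       # MAP estimates
            # Step 1: spatially adjacent pixels whose MAP label agrees with
            # that of a training pixel form the high-confidence candidate set
            cand = [j for i in train_idx for j in neighbors[i]
                    if j not in train_idx and y_hat[j] == y_hat[i]]
            cand = list(dict.fromkeys(cand))               # deduplicate
            if not cand:
                break
            # Step 2: keep only the most uncertain candidates (BT criterion)
            for j in (cand[k] for k in breaking_ties(P[cand], n_per_iter)):
                train_idx.append(j)
                train_y.append(y_hat[j])                   # label estimated by clf
        return clf, train_idx, train_y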
4.4 Experimental results
In this section, two real hyperspectral images are used to evaluate the proposed approach for semi-supervised self-learning. In our experiments with the MLR and SVM classifiers, we apply the Gaussian RBF kernel to a normalized version of the considered hyperspectral data set8. We reiterate that the Gaussian RBF kernel was selected after extensive experimentation with other kernels. In all cases, the reported figures of OA, AA, kappa statistic and individual class accuracies are obtained by averaging the results of 10 independent Monte Carlo runs with respect to the labeled training set Dl, selected from the ground-truth image, where the remaining samples are used for validation purposes. Finally, the optimal parameters C (which controls the amount of penalty during the SVM optimization [40]) and σ (the spread of the Gaussian RBF kernel) were chosen by 10-fold cross-validation. These parameters are updated at each iteration.

8 The normalization is simply given by $x_i := x_i / \sqrt{\sum_{i=1}^{n} \|x_i\|^2}$, for $i = 1, \ldots, n$, where $x_i$ is a spectral vector.
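The following minimal sketch illustrates this preprocessing and parameter selection, assuming scikit-learn: the global normalization of footnote 8, followed by a 10-fold cross-validated grid search for C and the RBF width (note that σ enters scikit-learn as gamma = 1/(2σ²)); the grid values are illustrative assumptions, not those of our experiments.

    # A minimal sketch of the preprocessing and cross-validated model selection.
    import numpy as np
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    def normalize(X):
        """x_i := x_i / sqrt(sum_i ||x_i||^2), applied to all spectral vectors."""
        return X / np.sqrt((X ** 2).sum())

    param_grid = {"C": [1.0, 10.0, 100.0, 1000.0],
                  "gamma": [2.0 ** g for g in range(-6, 3)]}
    search = GridSearchCV(SVC(kernel="rbf", probability=True), param_grid, cv=10)
    # search.fit(normalize(X_train), y_train)  # refit at each self-learning iteration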
In order to illustrate the good performance of the proposed approach, we use very small labeled
training sets on purpose. As a result, the main difficulties that our proposed approach should circumvent
can be summarized as follows. First and foremost, it is very difficult for supervised algorithms to provide
good classification results as very little information is generally available about the class distribution.
Poor generalization is also a risk when estimating class boundaries in scenarios dominated by limited
training samples. Since our approach is semi-supervised, we take advantage of unlabeled samples in order
to improve classification accuracy. However, if the number of labeled samples l is very small, increasing
the number of unlabeled samples u could bias the learning process.
In order to analyze the aforementioned issues and provide a quantitative evaluation of our proposed
approach with regards to the optimal case in which true active learning methods (i.e., those relying on
the knowledge of the true labels of the selected samples) were used, we have implemented the following
validation framework. Let Dur be a set of unlabeled samples for which true labels are available. These
samples are included in the ground-truth associated with the hyperspectral image, but are not part of the set of labeled samples initially used by the classifier. In order to evaluate the effectiveness of the proposed approach, we can label these samples in Dur using their true (ground-truth) labels instead of the labels estimated by our proposed approach. Clearly, these samples will be favored over those selected by our proposed method, which makes use of estimated labels. But it is interesting to
quantify such an advantage (the lower it is, the better for our method). Following this rationale, the optimal case is that most samples in Du have true labels available, which means that Dur contains most
of the unlabeled samples in Du . In our experiments, we denote by lr the number of unlabeled samples
for which a true label is available in the ground-truth associated to the considered hyperspectral image.
If lr = 0, this means that the labels of all unlabeled samples are estimated by our proposed approach.
If lr = ur, this means that true labels are available for all the samples in Dur. Using this strategy, we can quantify the deviation of our proposed approach with regard to the optimal case in which true
labels for the selected samples are available. Typically, true labels will be only available for part of the
samples as the considered hyperspectral data sets do not contain ground-truth information for all pixels.
In this scenario, the optimal case comprises both true (whenever available) and estimated labels (the
value of lr is given in all experiments).
The remainder of this section is organized as follows. In subsection 4.4.1, we describe the experiments
conducted using the first data set: AVIRIS Indian Pines [101]. Finally, subsection 4.4.2 conducts
experiments using a second data set: ROSIS Pavia University [102]. In all cases, the results obtained by
the supervised versions of the considered classifiers are also reported for comparative purposes.
4.4.1 Experiments with AVIRIS Indian Pines data set
In the first experiment we evaluated the impact of the number of unlabeled samples on the classification
performance achieved by the two considered probabilistic classifiers using the AVIRIS Indian Pines data
set in Fig. 2.3(a), described in subsection 2.4.1.1. Fig. 4.2 shows the OA as a function of the number of unlabeled samples obtained by the MLR (top) and probabilistic SVM (bottom)
classifiers, respectively. The plots in Fig. 4.2, which were generated using estimated labels only, reveal
clear advantages of using unlabeled samples for the proposed semi-supervised self-learning approach
when compared with the supervised algorithm alone. In all cases, the proposed strategy outperforms
the corresponding supervised algorithm significantly, and the increase in performance is more relevant
as the number of unlabeled samples increases. These unlabeled samples are automatically selected by
the proposed approach, and represent no cost in terms of data collection or human supervision which
are key aspects for self-learning.
In Fig. 4.2 it can also be seen that using intelligent training sample selection algorithms such as
MS, BT, MBT or nEQB greatly improved the obtained accuracies in comparison with simple random
selection (RS). The results in Fig. 4.2 also reveal that BT outperformed other strategies in most cases,
with MBT providing lower classification accuracies than BT. This is expected, as the candidate set Dc is
more relevant when the samples are obtained from the class boundaries. Finally, it can also be observed
that the MLR always performed better than the probabilistic SVM in terms of classification accuracies.
In order to show the classification results in more detail, Tables 4.1 to 4.4 show the OA, AA, individual classification accuracies (in percentage) and the kappa statistic obtained by the supervised MLR and probabilistic SVM (trained using only 10 labeled samples per class) and by the proposed approach (based on the same classifiers) using the four considered sample selection algorithms (executed for 30 iterations), in comparison with the optimal case for the same algorithms, in which true labels are used whenever available in the ground-truth. In all cases, we report the value of lr to provide an indication of the number of true versus estimated labels used in the experiments.
Figure 4.2: OA (as a function of the number of unlabeled samples) obtained for the AVIRIS Indian Pines data set using the MLR (right) and probabilistic SVM (left) classifiers, for ln = 80, 160 and 240 labeled samples. Estimated labels were used in all the experiments, i.e., lr = 0.
Table 4.1: OA, AA, individual classification accuracies [%] and kappa statistic obtained using the MLR probabilistic classifier when applied to the AVIRIS Indian Pines hyperspectral data set, with 10 labeled samples per class (160 samples in total) and un = 750 unlabeled training samples. Two active learning techniques (MS and BT) are applied, and the supervised case is also reported. lr denotes the number of true labels available in Du (used to implement an optimal version of each sampling algorithm). The standard deviations are also reported for each test.

                                          MLR classifier
Class (samples)                  Supervised    MS (lr=0)     MS (lr=683)   BT (lr=0)     BT (lr=668)
Alfalfa (54)                     83.64±5.12    84.55±6.10    86.82±5.00    85.00±6.43    84.77±5.87
Corn-Notill (1434)               48.38±6.54    71.64±6.05    75.23±6.07    72.88±4.58    74.23±4.32
Corn-Min (834)                   47.65±7.33    66.36±12.63   72.73±12.55   64.60±12.79   72.28±11.97
Corn (234)                       70.63±9.43    85.76±8.13    85.49±5.74    87.54±5.86    88.04±4.53
Grass-Pasture (497)              75.42±7.35    85.50±4.93    87.37±7.43    85.48±5.32    88.67±5.57
Grass-Trees (747)                86.01±4.61    96.54±1.17    96.65±1.21    95.97±2.02    97.06±1.17
Grass-Pasture-Mowed (26)         88.12±6.88    93.75±6.62    87.50±5.89    93.75±5.47    86.88±8.56
Hay-Windrowed (489)              88.89±5.41    97.45±0.82    97.43±0.89    98.27±0.55    98.16±0.64
Oats (20)                        98.00±4.22    96.00±11.35   95.00±10.80   97.00±11.35   96.00±6.99
Soybeans-Notill (968)            58.68±9.18    80.87±7.17    83.39±7.99    83.36±7.39    86.03±5.47
Soybeans-Min (2468)              44.85±10.85   72.51±4.70    74.49±7.29    70.14±5.28    72.76±5.72
Soybeans-Clean (614)             52.50±9.91    80.88±10.40   85.02±7.99    82.04±9.54    86.61±6.53
Wheat (212)                      98.76±1.57    99.21±0.33    99.26±0.42    99.16±0.41    99.31±0.71
Woods (1294)                     75.63±9.38    92.40±3.41    93.23±3.76    94.21±5.14    94.07±2.80
Bldg-Grass-Tree-Drives (380)     50.84±7.65    66.70±7.56    65.62±6.12    67.38±11.11   68.86±7.84
Stone-Steel-Towers (95)          79.88±8.22    82.94±7.91    84.12±10.90   80.94±7.75    83.29±9.79
OA                               60.12±3.08    80.00±1.09    82.14±5.88    80.04±1.28    82.28±6.12
AA                               71.74±1.54    84.57±1.03    85.58±3.60    84.86±1.53    86.06±3.86
kappa                            55.43±3.20    77.31±1.26    79.74±6.50    77.39±1.45    79.93±6.79
Table 4.2: OA, AA, individual classification accuracies [%] and kappa statistic obtained using the MLR probabilistic classifier when applied to the AVIRIS Indian Pines hyperspectral data set, with 10 labeled samples per class (160 samples in total) and un = 750 unlabeled training samples. Two active learning techniques (MBT and nEQB) are applied, and the random sampling case (RS) is also reported. lr denotes the number of true labels available in Du (used to implement an optimal version of each sampling algorithm). The standard deviations are also reported for each test.

                                          MLR classifier
Class (samples)                  MBT (lr=0)    MBT (lr=646)  nEQB (lr=0)   nEQB (lr=603) RS (lr=0)     RS (lr=747)
Alfalfa (54)                     87.27±2.92    89.09±3.18    82.50±3.40    81.14±4.92    79.55±4.48    80.23±5.87
Corn-Notill (1434)               72.23±3.86    72.16±5.00    77.96±4.56    73.62±3.16    60.25±7.97    61.84±9.02
Corn-Min (834)                   63.86±10.46   68.50±8.56    64.82±11.64   69.14±10.11   53.39±8.47    53.18±6.63
Corn (234)                       92.23±2.45    90.67±6.48    86.38±6.30    80.40±13.18   66.29±16.34   71.74±12.94
Grass-Pasture (497)              87.08±6.30    89.45±5.96    79.49±8.35    83.78±7.28    81.79±5.15    83.59±6.71
Grass-Trees (747)                96.53±1.23    97.08±1.77    91.37±5.16    93.31±2.93    94.02±2.75    94.12±2.96
Grass-Pasture-Mowed (26)         89.38±7.25    90.63±5.31    90.63±4.42    88.12±9.97    85.00±6.72    86.25±5.74
Hay-Windrowed (489)              98.77±0.39    98.60±0.61    99.19±0.33    96.43±1.75    96.74±1.33    96.35±1.38
Oats (20)                        99.00±3.16    99.00±3.16    97.00±6.75    96.00±6.99    99.00±4.22    98.00±4.22
Soybeans-Notill (968)            79.84±7.40    83.25±5.37    82.00±8.82    81.86±6.29    67.47±11.43   65.50±11.99
Soybeans-Min (2468)              62.58±8.20    65.36±5.96    68.04±5.60    69.29±5.43    50.81±12.98   54.02±8.23
Soybeans-Clean (614)             85.45±8.62    85.12±9.42    83.77±10.90   87.28±6.05    61.79±12.36   65.71±11.30
Wheat (212)                      99.60±0.31    99.31±0.35    98.96±0.28    97.77±0.85    99.55±0.28    99.50±0.33
Woods (1294)                     94.81±3.74    93.78±3.95    86.45±10.15   82.32±7.40    88.86±6.18    89.55±6.78
Bldg-Grass-Tree-Drives (380)     66.89±7.02    67.51±7.20    78.30±12.87   72.73±7.75    55.38±8.20    54.16±9.98
Stone-Steel-Towers (95)          91.06±3.19    90.82±3.91    79.53±5.74    85.06±10.23   77.53±8.55    78.00±7.73
OA                               78.34±2.11    79.68±5.28    79.02±1.53    79.64±4.88    68.01±3.04    69.28±2.63
AA                               85.41±1.12    86.27±3.84    84.15±1.24    83.64±3.05    76.09±1.76    76.98±1.46
kappa                            75.59±2.29    77.08±5.85    76.31±1.66    76.85±5.40    64.01±3.30    65.39±2.86
Table 4.3: OA, AA, individual classification accuracies [%] and kappa statistic obtained using the probabilistic SVM classifier when applied to the AVIRIS Indian Pines hyperspectral data set, with 10 labeled samples per class (160 samples in total) and un = 750 unlabeled training samples. Two active learning techniques (MS and BT) are applied, and the supervised case is also reported. lr denotes the number of true labels available in Du (used to implement an optimal version of each sampling algorithm). The standard deviations are also reported for each test.

                                          Probabilistic SVM classifier
Class (samples)                  Supervised     MS (lr=0)      MS (lr=695)   BT (lr=0)      BT (lr=717)
Alfalfa (54)                     79.77±12.70    75.23±8.67     65.23±11.19   84.32±3.78     84.77±3.72
Corn-Notill (1434)               32.32±14.21    63.90±13.67    77.46±1.89    62.97±15.49    76.54±3.16
Corn-Min (834)                   37.17±19.56    56.70±25.76    80.24±3.09    58.12±24.62    76.58±4.23
Corn (234)                       68.62±10.32    87.95±3.29     89.24±1.73    82.10±13.80    86.38±3.52
Grass-Pasture (497)              77.19±7.29     87.54±7.09     91.21±3.01    89.16±6.02     93.37±1.35
Grass-Trees (747)                65.36±14.50    93.96±2.75     91.90±2.82    95.29±2.62     94.02±2.53
Grass-Pasture-Mowed (26)         90.63±6.75     90.00±7.34     93.75±2.95    92.50±4.93     95.00±3.95
Hay-Windrowed (489)              78.06±8.12     95.80±1.75     97.70±0.60    97.89±0.89     98.10±0.46
Oats (20)                        97.00±6.75     93.00±9.49     100.00        93.00±6.75     99.00±3.16
Soybeans-Notill (968)            49.42±18.23    80.96±7.68     88.68±3.02    82.03±8.88     91.39±2.14
Soybeans-Min (2468)              33.90±12.83    65.50±12.51    65.98±2.15    63.36±15.50    68.60±2.36
Soybeans-Clean (614)             43.31±12.88    77.90±10.32    90.79±2.09    81.42±11.08    91.42±1.24
Wheat (212)                      93.61±3.96     98.37±1.07     97.82±1.40    98.66±0.81     97.52±1.34
Woods (1294)                     72.39±15.02    89.24±6.07     93.90±1.92    92.94±4.58     97.34±0.40
Bldg-Grass-Tree-Drives (380)     47.84±14.90    68.11±14.08    64.95±5.97    66.81±16.28    61.97±3.04
Stone-Steel-Towers (95)          86.35±10.26    96.35±4.72     93.53±3.65    93.18±5.62     90.82±3.79
OA                               50.61±5.34     75.87±3.44     81.82±7.54    76.23±5.40     82.91±0.75
AA                               65.93±2.99     82.53±2.03     86.40±4.47    83.36±2.15     87.68±0.67
kappa                            45.14±5.35     72.76±3.76     79.49±8.26    73.18±5.81     80.71±0.83
Table 4.4: OA, AA, individual classification accuracies [%] and kappa statistic obtained using the probabilistic SVM classifier when applied to the AVIRIS Indian Pines hyperspectral data set, with 10 labeled samples per class (160 samples in total) and un = 750 unlabeled training samples. Two active learning techniques (MBT and nEQB) are applied, and the random sampling case (RS) is also reported. lr denotes the number of true labels available in Du (used to implement an optimal version of each sampling algorithm). The standard deviations are also reported for each test.

                                          Probabilistic SVM classifier
Class (samples)                  MBT (lr=0)     MBT (lr=649)  nEQB (lr=0)    nEQB (lr=701) RS (lr=0)      RS (lr=740)
Alfalfa (54)                     89.77±3.08     85.91±0.96    80.00±12.21    55.45±7.74    82.05±7.68     66.14±7.98
Corn-Notill (1434)               51.33±19.49    59.70±2.85    60.72±17.53    75.67±2.12    44.56±18.39    55.32±3.61
Corn-Min (834)                   55.98±22.21    72.34±2.15    55.42±22.33    77.97±1.64    43.28±25.34    61.77±6.22
Corn (234)                       81.03±13.28    84.06±2.72    86.38±4.02     86.34±4.26    72.50±13.19    85.49±2.64
Grass-Pasture (497)              88.17±6.40     93.24±1.23    82.40±6.03     90.60±2.99    85.73±5.77     89.45±2.47
Grass-Trees (747)                90.39±4.96     88.66±2.22    87.72±7.29     92.29±2.42    88.36±5.99     82.63±4.95
Grass-Pasture-Mowed (26)         90.00±4.37     93.75±2.95    89.38±6.62     93.13±1.98    87.50±8.33     93.13±1.98
Hay-Windrowed (489)              98.52±1.19     98.27±0.43    93.26±3.95     97.93±1.38    93.49±4.39     97.24±0.67
Oats (20)                        95.00±12.69    100.00        98.00±4.22     97.00±4.83    95.00±7.07     100.00
Soybeans-Notill (968)            72.13±24.41    87.21±2.60    71.34±27.13    85.75±2.73    65.10±18.05    84.38±3.66
Soybeans-Min (2468)              50.16±12.02    53.59±5.69    58.33±23.25    62.12±2.40    50.44±15.80    44.10±13.02
Soybeans-Clean (614)             63.00±17.91    84.39±7.02    76.71±13.10    92.04±1.71    52.91±8.92     61.94±11.52
Wheat (212)                      98.22±2.40     99.01±0.52    97.28±0.91     97.48±1.00    97.38±1.51     97.62±0.45
Woods (1294)                     92.10±6.25     97.81±0.55    77.73±10.45    90.73±2.72    89.36±6.60     96.94±0.74
Bldg-Grass-Tree-Drives (380)     65.46±8.72     58.51±4.37    72.54±12.16    64.86±5.76    42.35±13.44    40.00±7.62
Stone-Steel-Towers (95)          88.35±9.87     83.18±2.29    94.47±5.82     87.41±4.11    90.35±4.95     84.35±2.54
OA                               68.66±5.35     75.26±1.39    70.47±5.24     79.69±0.62    63.59±5.59     68.40±2.85
AA                               79.35±2.16     83.73±0.79    80.10±2.43     84.17±0.65    73.77±2.18     77.53±0.96
kappa                            64.90±5.75     72.39±1.49    66.79±5.65     77.14±0.67    59.13±5.68     64.73±2.99
Figure 4.3: Classification maps and OA (in parentheses) obtained after applying the MLR classifier to the AVIRIS Indian Pines data set using 10 labeled training samples per class and 750 unlabeled samples, i.e., ln = 160, un = 750 and lr = 0. Maps shown: Supervised (60.12%), MS (80.00%), BT (80.04%), MBT (78.34%), nEQB (79.02%) and RS (68.01%).
It is noticeable that, by including unlabeled samples, the classification results are significantly improved in all cases. Furthermore, it can be observed that the MLR classifier is more robust than the probabilistic SVM in our framework. For example, with un = 750 and BT sampling, a difference of only 2.24% in classification accuracy can be observed between the implementation using only estimated labels and the optimal case in which both true and estimated labels are considered. However, for the probabilistic SVM classifier the difference is 6.67%. Similar observations can be made for the other sampling algorithms considered in our experiments.
Figure 4.5: OA (as a function of the number of unlabeled samples) obtained for the AVIRIS Indian
Pines data set using the MLR classifier with BT sampling by using 5 labeled samples per class (in total
80 samples). Two cases are displayed: the one in which all unlabeled samples are estimated by the
proposed approach (i.e., lr = 0) and the optimal case, in which true labels are used whenever possible
(i.e., lr = ur ).
For illustrative purposes, Fig. 4.5 analyzes the convergence of our proposed approach by plotting
the obtained classification accuracies for the AVIRIS Indian Pines scene as a function of the number
of unlabeled samples, using only 5 labeled samples per class (in total 80 labeled samples) for the MLR
classifier with BT sampling approach. In the figure, we report the case in which all unlabeled samples
are estimated by the proposed approach (i.e., lr = 0) and also the optimal case in which true labels
are used whenever possible (i.e., lr = ur ). As can be seen in Fig. 4.5, the proposed approach achieved
good performance when compared with the optimal case, with a difference of about 5% in classification
accuracy when 3500 training samples were used.
Finally, Figs. 4.3 and 4.4 respectively show some of the classification maps obtained by the MLR and probabilistic SVM classifiers for the AVIRIS Indian Pines scene. These classification maps correspond to one of the 10 Monte Carlo runs that were averaged in order to generate the classification scores reported in Tables 4.1 to 4.4. The advantages obtained by adopting a semi-supervised learning approach with regard to the corresponding supervised case can be clearly appreciated in the classification maps displayed in Figs. 4.3 and 4.4, which also report the classification OAs obtained for each method in parentheses.
4.4.2 Experiments with ROSIS Pavia University data set
In this subsection we perform a set of experiments to evaluate the proposed approach using the ROSIS
Pavia University dataset in Fig. 3.5, described in subsection 3.4.2. This problem represents a very
challenging classification scenario dominated by complex urban classes and nested regions. First, Fig.
4.6 shows how the OA results increase as the number of unlabeled samples increases, indicating again
clear advantages of using unlabeled samples for the proposed semi-supervised self-learning approach in
comparison with the supervised case. In this experiment, the four considered sample selection approaches
(MS, BT, MBT and nEQB) perform similarly and slightly better than simple random selection. For
instance, when ln = 45 labeled samples were used, the performance increase observed after including
un = 700 unlabeled samples with regards to the supervised case was 13.93% (for the MS), 13.86% (for
the BT), 10.27% (for the MBT) and 9.56% (for the nEQB). These results confirm our intuition that the proposed semi-supervised self-learning approach can greatly assist in improving the results obtained by different supervised classifiers trained with limited samples.
Furthermore, Tables 4.5 to 4.8 show the OA, AA, individual classification accuracies (in percentage) and the kappa statistic using only 10 labeled samples per class (in total, ln = 90 samples) and un = 700 unlabeled samples for the semi-supervised cases, in comparison with the optimal case in which true labels are used whenever available in the ground-truth. In all cases, we provide the value of lr to give an indication of the number of true versus estimated labels used in the experiments. It can be observed from Tables 4.5 to 4.8 that the proposed approach is quite robust, as it achieved classification results which are very similar to those found by the optimal case.

For example, by using the BT sampling algorithm the proposed approach obtained an OA of 83.73%, which is almost the same as the one obtained by the optimal case, which achieved an OA of 84.07% by using true labels whenever possible. This observation is confirmed by Fig. 4.9, which plots the classification accuracy obtained (as a function of the number of unlabeled samples) for a case in which 100 labeled training samples per class were used (900 samples in total) for the MLR classifier with the BT sampling approach. In the figure, we report the case in which all unlabeled samples are estimated by the proposed approach (i.e., lr = 0) and also the optimal case in which true labels are used whenever possible (i.e., lr = ur). Although in this experiment the number of initial labeled samples is significant, it is remarkable that the results obtained by the proposed approach using only estimated labels are almost the same as those obtained with the optimal version using true labels, which means that the unlabeled training samples estimated by the proposed approach are highly reliable in this experiment.
Table 4.5: OA, AA, individual classification accuracies [%], and kappa statistic obtained using the MLR probabilistic classifier when applied to the ROSIS University of Pavia hyperspectral data set by using 10 labeled samples per class (90 samples in total) and un = 700 unlabeled training samples. Two active learning techniques (MS and BT) are applied, and the supervised case is also reported. lr denotes the number of true labels available in Du (used to implement an optimal version of each sampling algorithm). The standard deviations are also reported for each test.

                                     MLR classifier
Class (samples)               Supervised    MS (lr=0)     MS (lr=443)   BT (lr=0)     BT (lr=356)
Asphalt (6631)                64.05±7.34    74.57±7.48    75.41±6.01    72.62±4.97    74.72±6.14
Meadows (18649)               63.15±7.27    80.71±5.71    83.92±2.84    83.33±4.49    84.62±2.24
Gravel (2099)                 66.28±9.21    80.05±9.35    80.33±8.86    82.07±9.31    81.09±9.00
Trees (3064)                  84.74±11.11   84.88±9.97    85.47±8.66    88.07±8.87    83.32±9.45
Metal Sheets (1345)           98.64±0.60    99.49±0.44    98.68±1.24    99.29±0.36    99.32±0.47
Bare Soil (5029)              69.54±8.79    89.61±3.22    89.74±4.15    89.59±3.86    88.93±4.62
Bitumen (1330)                87.70±3.31    95.29±1.66    93.93±2.18    96.17±0.99    95.39±2.02
Self-Blocking Bricks (3682)   73.22±7.57    82.19±7.02    81.38±5.06    80.99±7.09    80.48±4.46
Shadow (947)                  98.44±1.91    98.90±2.56    97.88±3.33    99.12±1.79    98.60±1.86
OA                            69.25±3.75    82.63±2.55    84.08±0.98    83.73±1.86    84.07±1.52
AA                            78.42±1.75    87.30±1.28    87.41±0.76    87.92±1.13    87.39±1.25
kappa                         61.69±4.01    77.78±3.08    79.50±1.14    79.12±2.23    79.45±1.90
MLR classifier

Class (samples)             | MBT (lr = 0)  | MBT (lr = 412) | nEQB (lr = 0) | nEQB (lr = 365) | RS (lr = 0)  | RS (lr = 558)
Asphalt (6631)              | 71.43±4.75    | 71.54±4.57     | 72.91±7.37    | 72.40±7.63      | 66.40±7.55   | 68.85±6.03
Meadows (18649)             | 77.35±3.56    | 80.57±4.67     | 74.08±6.95    | 81.18±4.75      | 76.23±6.93   | 81.01±4.62
Gravel (2099)               | 79.24±9.19    | 81.55±7.65     | 81.86±7.50    | 82.28±7.59      | 73.44±7.55   | 77.21±10.77
Trees (3064)                | 94.41±3.58    | 88.45±6.82     | 91.46±4.05    | 85.64±8.83      | 82.97±9.16   | 85.04±5.01
Metal Sheets (1345)         | 99.77±0.21    | 99.70±0.29     | 98.79±0.82    | 98.85±0.82      | 99.03±0.48   | 98.87±0.54
Bare Soil (5029)            | 82.45±6.58    | 86.31±4.13     | 71.29±6.26    | 82.99±5.29      | 76.84±11.58  | 82.57±8.73
Bitumen (1330)              | 96.53±1.18    | 96.52±1.17     | 85.39±7.60    | 90.26±5.56      | 92.07±3.52   | 93.27±3.79
Self-Blocking Bricks (3682) | 82.87±6.76    | 77.83±7.48     | 79.29±9.17    | 80.16±8.40      | 76.08±7.85   | 75.74±10.34
Shadow (947)                | 98.98±1.88    | 99.30±0.49     | 99.88±0.15    | 99.55±0.72      | 98.85±2.28   | 99.52±0.32
OA                          | 80.59±1.38    | 81.72±1.96     | 77.33±3.80    | 81.55±1.54      | 76.81±3.38   | 80.30±2.54
AA                          | 87.00±0.77    | 86.86±0.73     | 83.88±2.30    | 85.92±0.96      | 82.43±1.60   | 84.68±1.39
kappa                       | 75.44±1.61    | 76.70±2.27     | 71.27±4.53    | 76.36±1.74      | 70.45±3.86   | 74.75±3.03

Table 4.6: OA, AA, individual classification accuracies [%], and kappa statistic obtained using the MLR probabilistic classifier when applied to the ROSIS University of Pavia hyperspectral data set by using 10 labeled samples per class (in total 90 samples) and un = 700 unlabeled training samples. Two active learning techniques (MBT and nEQB) are applied, and the random sampling case (RS) is also reported. lr denotes the number of true labels available in Du (used to implement an optimal version of each sampling algorithm). The standard deviations are also reported for each test.
Probabilistic SVM classifier

Class (samples)             | Supervised   | MS (lr = 0)   | MS (lr = 454) | BT (lr = 0)  | BT (lr = 382)
Asphalt (6631)              | 60.43±8.23   | 75.71±12.63   | 76.25±9.46    | 74.38±7.89   | 72.82±8.00
Meadows (18649)             | 54.36±9.43   | 68.35±7.10    | 69.95±6.72    | 79.57±8.28   | 78.96±9.21
Gravel (2099)               | 62.23±10.33  | 75.72±14.03   | 75.30±11.18   | 80.01±9.72   | 80.25±8.33
Trees (3064)                | 90.75±7.19   | 88.77±9.31    | 88.94±5.97    | 85.14±8.41   | 87.80±7.00
Metal Sheets (1345)         | 96.68±5.68   | 99.91±0.08    | 99.90±0.11    | 99.84±0.10   | 99.83±0.12
Bare Soil (5029)            | 62.74±19.59  | 87.47±4.81    | 88.08±5.22    | 88.60±3.92   | 90.26±2.66
Bitumen (1330)              | 89.90±5.14   | 92.47±4.12    | 93.21±3.22    | 94.38±3.55   | 95.67±1.97
Self-Blocking Bricks (3682) | 66.50±8.44   | 71.64±18.83   | 75.52±9.44    | 80.89±8.04   | 80.39±8.07
Shadow (947)                | 99.26±1.62   | 99.77±0.19    | 99.73±0.52    | 97.98±2.74   | 98.54±1.43
OA                          | 63.68±4.97   | 76.27±4.68    | 77.47±3.26    | 81.85±4.44   | 81.95±4.68
AA                          | 75.76±3.74   | 84.42±2.22    | 85.21±1.47    | 86.75±1.55   | 87.17±1.45
kappa                       | 55.48±5.55   | 70.40±5.26    | 71.79±3.71    | 76.89±5.19   | 76.94±5.42

Table 4.7: OA, AA, individual classification accuracies [%], and kappa statistic obtained using the probabilistic SVM classifier when applied to the ROSIS University of Pavia hyperspectral data set by using 10 labeled samples per class (in total 90 samples) and un = 700 unlabeled training samples. Two active learning techniques (MS and BT) are applied, and the supervised case is also reported. lr denotes the number of true labels available in Du (used to implement an optimal version of each sampling algorithm). The standard deviations are also reported for each test.
Probabilistic SVM classifier

Class (samples)             | MBT (lr = 0)  | MBT (lr = 324) | nEQB (lr = 0) | nEQB (lr = 337) | RS (lr = 0)  | RS (lr = 557)
Asphalt (6631)              | 72.27±3.13    | 70.68±3.72     | 70.16±8.34    | 70.01±9.05      | 61.14±7.06   | 61.52±5.37
Meadows (18649)             | 63.53±11.57   | 64.61±12.56    | 66.16±13.17   | 66.62±7.29      | 62.35±12.01  | 65.95±12.93
Gravel (2099)               | 72.58±12.90   | 75.14±9.40     | 80.82±9.35    | 80.05±9.60      | 70.97±12.39  | 70.61±10.49
Trees (3064)                | 92.31±7.43    | 92.01±5.25     | 89.52±7.01    | 89.43±7.24      | 89.90±5.20   | 85.15±7.60
Metal Sheets (1345)         | 99.55±0.33    | 99.64±0.34     | 99.86±0.10    | 99.86±0.10      | 99.71±0.12   | 99.69±0.19
Bare Soil (5029)            | 77.89±12.67   | 78.95±9.39     | 73.03±10.42   | 76.96±7.93      | 75.01±14.21  | 73.05±23.65
Bitumen (1330)              | 94.84±1.39    | 95.56±1.67     | 90.33±3.92    | 90.79±3.22      | 92.93±4.66   | 92.97±3.67
Self-Blocking Bricks (3682) | 74.95±24.57   | 81.00±7.17     | 72.01±5.71    | 71.82±7.16      | 70.04±12.33  | 72.23±13.42
Shadow (947)                | 97.56±2.68    | 99.11±2.16     | 99.90±0.01    | 99.88±0.14      | 99.31±1.47   | 99.77±0.26
OA                          | 72.90±5.42    | 73.93±4.95     | 73.61±3.89    | 77.02±5.87      | 69.63±5.25   | 70.88±5.20
AA                          | 82.83±2.43    | 84.08±1.46     | 82.42±1.98    | 82.82±2.03      | 80.15±2.92   | 80.11±3.46
kappa                       | 66.60±5.85    | 67.86±5.40     | 66.44±6.26    | 67.08±4.40      | 62.46±5.57   | 63.70±5.70

Table 4.8: OA, AA, individual classification accuracies [%], and kappa statistic obtained using the probabilistic SVM classifier when applied to the ROSIS University of Pavia hyperspectral data set by using 10 labeled samples per class (in total 90 samples) and un = 700 unlabeled training samples. Two active learning techniques (MBT and nEQB) are applied, and the random sampling case (RS) is also reported. lr denotes the number of true labels available in Du (used to implement an optimal version of each sampling algorithm). The standard deviations are also reported for each test.
Figure 4.9: OA (as a function of the number of unlabeled samples) obtained for the ROSIS Pavia University data set using the MLR classifier with BT sampling, by using 100 labeled samples per class (in total 900 samples). Two cases are displayed: the one in which all unlabeled samples are estimated by the proposed approach (i.e., lr = 0) and the optimal case, in which true labels are used whenever possible (i.e., lr = ur).
For illustrative purposes, Figs. 4.7 and 4.8 respectively show some of the classification maps obtained by the MLR and probabilistic SVM classifiers for the ROSIS Pavia University data set. These maps correspond to one of the 10 Monte Carlo runs that were averaged in order to generate the classification scores reported in Tables 4.5 to 4.8.
4.5 Summary and future directions
In this chapter, we have developed a new approach for semi-supervised classification of hyperspectral images in which unlabeled samples are intelligently selected using a self-learning approach. Specifically, we automatically select the most informative unlabeled training samples with the ultimate goal of improving the classification results obtained using randomly selected training samples. In our semi-supervised context, the labels of the selected training samples are estimated by the classifier itself, with the advantage that no extra cost is required for labeling the selected samples when compared to classic (supervised) active learning. Our experimental results, conducted using two different classifiers (MLR and probabilistic SVM), indicate that the proposed approach can greatly increase the classification accuracies obtained in the supervised case through the incorporation of unlabeled samples, which can be obtained with very little cost and effort. The obtained results have been compared to the optimal case in which true labels are used, and the differences observed when using the samples estimated by our proposed approach were always quite small. This is a good quantitative indicator of the performance achieved by our proposed approach, which has been illustrated using two hyperspectral scenes collected by different instruments. In future work, we are planning on combining the proposed approach with other probabilistic classifiers. We are also considering the use of expectation-maximization as a form of self-learning [17].
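To make the overall procedure concrete, the sketch below summarizes one possible implementation of the self-learning loop on toy data (synthetic samples, scikit-learn's logistic regression standing in for the MLR, and no spatial candidate filtering); it is an illustration of the idea rather than the exact code used in our experiments:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    # Toy stand-in for pixel spectra; not a real hyperspectral scene.
    X, y = make_classification(n_samples=2000, n_features=20, n_informative=10,
                               n_classes=4, n_clusters_per_class=1, random_state=0)
    rng = np.random.default_rng(0)
    labeled = rng.choice(len(y), size=40, replace=False)          # small initial labeled set
    unlabeled = np.setdiff1d(np.arange(len(y)), labeled)

    X_train, y_train = X[labeled], y[labeled]
    for it in range(5):                                           # a few self-learning iterations
        clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
        prob = clf.predict_proba(X[unlabeled])
        top2 = np.sort(prob, axis=1)[:, -2:]
        bt = top2[:, 1] - top2[:, 0]                              # breaking-ties score
        pick = unlabeled[np.argsort(bt)[:20]]                     # most uncertain candidates
        pseudo = clf.predict(X[pick])                             # labels estimated by the classifier itself
        X_train = np.vstack([X_train, X[pick]])
        y_train = np.concatenate([y_train, pseudo])
        unlabeled = np.setdiff1d(unlabeled, pick)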
Although in this chapter we focused our experiments on hyperspectral data, the proposed approach can also be applied to other types of remote sensing data, such as multispectral data sets. In fact, since the dimensionality of the considered hyperspectral data sets is quite high, the proposed approach could greatly benefit from the use of feature extraction/selection methods prior to classification, in order to make the proposed approach less sensitive to the Hughes effect [4] and to the possibly very limited initial availability of training samples. This research topic also deserves future attention. Another interesting future research line is to adapt our proposed sample selection strategy (which is based on the selection of individual pixels) to the selection and labeling of spatial sub-regions or boxes within the image, which could be beneficial in certain applications. Finally, another important research topic deserving future attention is the inclusion of a cost associated with the labels generated by the proposed algorithm. This may allow a better evaluation of the training samples actively selected by our proposed approach.
Figure 4.4: Classification maps and OA (in parentheses) obtained after applying the probabilistic SVM classifier to the AVIRIS Indian Pines data set by using 10 labeled training samples per class and 750 unlabeled samples, i.e., ln = 160, un = 750 and lr = 0. Panels: Supervised (50.61%), MS (75.87%), BT (76.23%), MBT (68.66%), nEQB (70.47%), RS (63.51%).
Figure 4.6: OA (as a function of the number of unlabeled samples) obtained for the ROSIS Pavia University data set using the probabilistic SVM (left) and MLR (right) classifiers, respectively. Estimated labels were used in all the experiments, i.e., lr = 0. Panels correspond to ln = 45, ln = 90 and ln = 135 for each classifier.
Figure 4.7: Classification maps and OA (in parentheses) obtained after applying the MLR classifier to the ROSIS Pavia University data set (in all cases, ln = 90 and lr = 0). Panels: Supervised (69.25%), MS (82.63%), BT (83.73%), MBT (80.59%), nEQB (77.33%), RS (76.81%).
Figure 4.8: Classification maps and OA (in parentheses) obtained after applying the probabilistic SVM classifier to the ROSIS Pavia University data set (in all cases, ln = 90 and lr = 0). Panels: Supervised (63.68%), MS (76.27%), BT (81.85%), MBT (72.90%), nEQB (77.02%), RS (69.63%).
Chapter 5
A New Hybrid Strategy Combining Semi-Supervised Classification and Spectral Unmixing
5.1 Summary
Spectral unmixing and classification have been widely used in the recent literature to analyze remotely
sensed data. However, few strategies have combined these two approaches in the analysis of hyperspectral
data. In this chapter, we propose a new hybrid strategy for semi-supervised classification of hyperspectral
data which exploits spectral unmixing and classification concepts (already discussed in previous chapters)
in synergistic fashion. During the process, active learning techniques are used in order to select the most
informative unlabeled samples in the pool of candidates, thus reducing the computational cost of the
process by including only the most informative unlabeled samples. Here, we integrate a well-established
discriminative probabilistic classifier (the MLR) with different spectral unmixing chains, thus bridging
the gap between unmixing and classification. The effectiveness of the proposed method is evaluated
using two real hyperspectral images.9
5.2 Introduction
Spectral unmixing and classification are two active areas of research in hyperspectral data interpretation.
On the one hand, spectral unmixing is a fast growing area in which many algorithms have been recently
developed to retrieve pure spectral components (endmembers) and determine their abundance fractions
in mixed pixels, which dominate hyperspectral images [7]. On the other hand, supervised hyperspectral
image classification is a difficult task due to the unbalance between the high dimensionality of the data
and the limited availability of labeled training samples in real analysis scenarios. While the collection of labeled samples is generally difficult, expensive and time-consuming, unlabeled samples can be generated in a much easier way [71]. As indicated in the previous chapter, this observation has fostered the idea of adopting semi-supervised learning techniques in hyperspectral image classification.

9 Part of this chapter has been published in: I. Dopido, J. Li, A. Plaza and P. Gamba, Semi-Supervised Classification of Urban Hyperspectral Data Using Spectral Unmixing Concepts, IEEE Urban Remote Sensing Event (JURSE 2013), Sao Paulo, Brazil, 2013; and I. Dopido, J. Li, P. Gamba and A. Plaza, Semi-Supervised Classification of Hyperspectral Data Using Spectral Unmixing Concepts, Tyrrhenian Workshop 2012 on Advances in Radar and Remote Sensing, Naples, Italy, 2012. Also, we are currently working towards the preparation of a journal contribution based on this chapter, to be submitted to the IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing.
In remote sensing image classification, it is usually difficult or expensive to get a sufficient number of training samples to develop robust classifiers. Hurdles such as the Hughes phenomenon arise as the data dimensionality increases. These difficulties have fostered the development of new algorithms. Supervised classifiers, such as the SVM or MLR, excel in using labeled information, and exhibit state-of-the-art performance when dealing with hyperspectral problems [16, 103]. However, when limited labeled information is available, the supervised case is troublesome, as the probability distribution of the image cannot be properly derived and poor generalization is also a risk. Based on these observations, a recent trend, namely semi-supervised learning, which integrates labeled and unlabeled samples to learn the classifiers, has been widely studied in the literature for the hyperspectral classification problem [82, 83, 84, 86, 104, 105]. This approach was extensively discussed in the previous chapter of the thesis.
As mentioned in the previous chapter, semi-supervised learning, first known as co-training [74, 106], has evolved into more complex generative models [73, 75], self-learning models [76, 77], multi-view learning models [78, 79], TSVMs [40, 41], and graph-based methods [80, 107]; we refer to [17] for a literature review. Most semi-supervised learning algorithms use some type of regularization which encourages "similar" features to belong to the same class. The effect of this regularization is to push the boundaries between classes towards regions of low data density [81]; a rather usual way of building such a regularizer is to associate the vertices of a graph with the complete set of samples and then build the regularizer depending on variables defined on the vertices. This trend has been
successfully adopted in several remote sensing image classification studies [44, 58, 82, 84, 86, 108]. The
aforementioned semi-supervised algorithms generally assume that a limited number of labeled samples
are available a priori, and then enlarge the training set using unlabeled samples, thus allowing these
approaches to address ill-posed problems. However, in order for this strategy to be successful, several
requirements need to be satisfied. First and foremost, the new (unlabeled) samples should be generated
without significant cost/effort. Second, the number of unlabeled samples required in order for the
semi-supervised classifier to perform properly should not be too high in order to avoid increasing the
computational complexity of the classification stage. In other words, as the number of unlabeled samples
increases, it may be unbearable for the classifier to properly exploit all the available training samples
due to computational issues. Further, if the unlabeled samples are not properly selected, these may
confuse the classifier, thus introducing significant divergence or even reducing the classification accuracy
obtained with the initial set of labeled samples. In order to address these issues, it is very important
that the most highly informative unlabeled samples are identified in computationally efficient fashion,
so that significant improvements in classification performance can be observed without the need to use
a high number of unlabeled samples.
In this chapter, we develop a new approach to perform semi-supervised classification of hyperspectral
images by exploiting the information retrieved with spectral unmixing. Our main goal is to synergize
two of the most widely used approaches to interpret hyperspectral data into a unified framework which
uses active learning techniques for automatically selecting unlabeled samples in semi-supervised fashion.
Specifically, we use active learning to select highly informative unlabeled training samples in order to
enlarge the initial (possibly very limited) set of labeled samples and perform semi-supervised classification
based on the information provided by a well-established discriminative classifier (MLR [30]) and different
spectral unmixing chains [24, 34].
5.3 Proposed approach
The proposed approach consists of three main ingredients: semi-supervised learning (already described in the previous chapter), spectral unmixing (for which we describe the specific unmixing chains considered in this section), and active learning techniques, which are described before introducing our method.
5.3.1 Considered spectral unmixing chains
Several spectral unmixing chains were described in chapters 2 and 3 of this thesis. These chains are based on the well-known linear mixture model [7] presented in subsection 3.3.1 of this document. In this section, we outline the specific spectral unmixing chains that will be used for the experiments in this chapter; some of these developments are described in more detail in [24]:

1. FCLSU-based unmixing (hereinafter, strategy 1), which first assumes that the labeled samples are made up of spectrally pure constituents (endmembers) and then calculates their abundances by means of the FCLSU method, providing a set of fractional abundance maps (one per labeled class) as shown by Fig. 5.1.

2. MTMF-based unmixing (hereinafter, strategy 2), which also assumes that the labeled samples are made up of spectrally pure constituents (endmembers) but now calculates their abundances by means of the MTMF method [33], thus providing a set of fractional abundance maps (one per labeled class) as shown by Fig. 5.2.

3. Unsupervised clustering followed by FCLSU (hereinafter, strategy 3), which is intended to alleviate the problems exhibited by endmember extraction algorithms, which are sensitive to outliers and pixels with extreme values of reflectance. By using an unsupervised clustering method such as k-means on the original image, the endmembers extracted (from class centers) are expected to be more spatially significant. Then, FCLSU is conducted using the resulting endmembers as shown by Fig. 5.3 (see also the sketch after this list).

4. Unsupervised clustering followed by MTMF (hereinafter, strategy 4), which is exactly the same as the previous strategy (strategy 3) but this time the MTMF method is conducted instead of FCLSU, using the resulting endmembers after k-means clustering. This chain is described in Fig. 3.1, subsection 3.3.2.
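As an illustration of how strategy 3 can be realized in practice, the sketch below (our minimal interpretation using NumPy, SciPy and scikit-learn; the function names and the augmentation weight delta are ours) derives endmembers from k-means cluster centers and solves FCLSU through the classical row-augmentation of non-negative least squares, which enforces the sum-to-one constraint approximately:

    import numpy as np
    from scipy.optimize import nnls
    from sklearn.cluster import KMeans

    def fclsu(E, x, delta=1e3):
        # Fully constrained abundances (a >= 0, sum(a) = 1) for one pixel x,
        # given the (bands x endmembers) matrix E, via augmented NNLS.
        d, p = E.shape
        E_aug = np.vstack([E, delta * np.ones((1, p))])
        x_aug = np.concatenate([x, [delta]])
        a, _ = nnls(E_aug, x_aug)
        return a

    def strategy3_abundances(X, n_endmembers, seed=0):
        # X: (pixels x bands) spectra. Cluster centers act as endmembers;
        # FCLSU then yields one fractional abundance map per endmember.
        km = KMeans(n_clusters=n_endmembers, n_init=10, random_state=seed).fit(X)
        E = km.cluster_centers_.T
        return np.array([fclsu(E, x) for x in X])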
Figure 5.1: Flowchart of the unmixing-based chain designated as strategy 1.
Figure 5.2: Flowchart of the unmixing-based chain designated as strategy 2.
Figure 5.3: Flowchart of the unmixing-based chain designated as strategy 3.
5.3.2 Proposed hybrid strategy
Let x ≡ (x1, . . . , xn) ∈ R^{d×n} be a set of d-dimensional feature vectors, and let y ≡ (y1, . . . , yn) be a set of labels. With this notation in mind, the proposed hybrid classification method can be simply described as follows:

p̂i(yi = k|xi) = α f1(yi = k|xi) + (1 − α) f2(yi = k|xi),    (5.1)

where p̂i(·) is the joint estimate for the k-th class, i.e., yi = k, obtained by the classification and unmixing methods given observation xi; p̂i(·) will serve as the indicator, i.e., probability, for the semi-supervised active learning. In this work, function f1(·) is the probability obtained by the classification algorithm, i.e., the MLR classifier, and function f2(·) is the abundance fraction obtained by any of the spectral unmixing chains presented in subsection 5.3.1. The balance between the classification probabilities and the abundance fractions is controlled by the parameter α, where 0 ≤ α ≤ 1. As shown in (5.1), if α = 1, only classification probabilities are considered by the proposed strategy, which leads to the semi-supervised learning strategy presented in the previous chapter. On the other hand, if α = 0, only spectral unmixing is taken into account by the proposed strategy. Therefore, by tuning α to a value between 0 and 1, we can adjust the relative impact of the classification and unmixing methods. Moreover, by introducing the parameter α, the proposed hybrid strategy takes advantage of both classification and unmixing, such that the newly selected unlabeled samples are more informative than those selected from classification or unmixing methods alone. In the following subsection, we explain in more detail how the unlabeled samples are selected using active learning concepts.
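A minimal sketch of Eq. (5.1) is given below (our own illustration; the per-pixel renormalization of the abundance fractions, so that f2 behaves like a probability, is an assumption of this sketch rather than a requirement of the equation):

    import numpy as np

    def hybrid_probability(prob_mlr, abund, alpha=0.75):
        # prob_mlr: (pixels x classes) MLR probabilities, f1 in Eq. (5.1)
        # abund:    (pixels x classes) abundance fractions, f2 in Eq. (5.1)
        f2 = abund / np.maximum(abund.sum(axis=1, keepdims=True), 1e-12)
        return alpha * prob_mlr + (1 - alpha) * f2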
5.3.3 Active learning
In order to train the hybrid classifier described in the previous subsection, active learning techniques are used to improve the selection of unlabeled samples for semi-supervised learning. In the process, the candidate set for the active learning process (based on the available labeled and unlabeled samples) is inferred using spatial information (specifically, by applying a first-order spatial neighborhood to the available samples), so that high confidence can be expected in the class labels of the obtained candidate set. This idea, which was presented in the previous chapter devoted to self-learning, is similar to human interaction in supervised active learning, where the class labels are known and given by an expert. In a second step, we run active learning to select the most informative samples from the candidate set. This is similar to the machine interaction level in supervised active learning, where in both cases the goal is to find the samples with the highest uncertainty. Due to the fact that we use a discriminative classifier (MLR) and spectral unmixing techniques, active learning algorithms which focus on the boundaries between the classes (which are often dominated by mixed pixels) are preferred. In this way, we can combine the properties of the probabilistic MLR classifier and spectral unmixing concepts to find the most suitable (complex) unlabeled samples for improving the classification results through the selected active learning strategy. It should be noted that many active learning techniques are available in the literature [90]. Several of these techniques were inter-compared in the previous chapter. In this chapter, we use only the BT method [21] to evaluate the proposed approach; this method is described in detail in the previous chapter.
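The spatial filtering of the candidate set can be sketched as follows (a minimal illustration with hypothetical helper and variable names, assuming pixels are indexed by their flattened position in an h-by-w image):

    import numpy as np

    def first_order_neighbors(idx, h, w):
        # 4-connected spatial neighbors of the flattened pixel indices idx.
        r, c = np.unravel_index(idx, (h, w))
        nb = []
        for dr, dc in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
            rr, cc = r + dr, c + dc
            ok = (rr >= 0) & (rr < h) & (cc >= 0) & (cc < w)
            nb.append(np.ravel_multi_index((rr[ok], cc[ok]), (h, w)))
        return np.unique(np.concatenate(nb))

    # Candidate pool: neighbors of the current training pixels (train_idx),
    # excluding the training pixels themselves; BT then ranks this pool.
    # candidates = np.setdiff1d(first_order_neighbors(train_idx, h, w), train_idx)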
5.4 Experimental results
In this section, we evaluate the new methodology presented in this chapter using two different hyperspectral images: the AVIRIS Indian Pines scene, described in subsection 2.4.1.1 [109], and the ROSIS Pavia University scene, described in subsection 3.4.2 [110]. In our experiments with the MLR classifier, we apply the Gaussian RBF kernel to a normalized version of the considered hyperspectral data sets. In all cases, the reported figures of OA, AA, kappa statistic and individual class accuracies are obtained by averaging the results of 10 independent Monte Carlo runs with respect to the labeled training set from the ground-truth data, where the remaining samples are used for validation purposes.
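This evaluation protocol can be sketched as follows (our illustrative code; scikit-learn's multinomial logistic regression on RBF-kernel features is used here as a stand-in for the kernelized MLR, and the kernel width gamma is an assumed free parameter):

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics.pairwise import rbf_kernel

    def monte_carlo_oa(X, y, n_per_class=10, gamma=1.0, runs=10):
        # Average OA over independent random draws of the labeled training set.
        oas = []
        for seed in range(runs):
            rng = np.random.default_rng(seed)
            train = np.concatenate([rng.choice(np.flatnonzero(y == c),
                                               n_per_class, replace=False)
                                    for c in np.unique(y)])
            test = np.setdiff1d(np.arange(len(y)), train)
            K_tr = rbf_kernel(X[train], X[train], gamma=gamma)   # train-vs-train kernel
            K_te = rbf_kernel(X[test], X[train], gamma=gamma)    # test-vs-train kernel
            clf = LogisticRegression(max_iter=1000).fit(K_tr, y[train])
            oas.append(clf.score(K_te, y[test]))
        return np.mean(oas), np.std(oas)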
In order to illustrate the good performance of the proposed approach, we use very small labeled
training sets on purpose. As a result, the main difficulties that our proposed approach should circumvent
can be summarized as follows. First and foremost, it is very difficult for supervised algorithms to
provide good classification results as very little information is available about the class distribution.
Poor generalization is also a risk when estimating class boundaries in scenarios dominated by limited
training samples. Since our approach is semi-supervised, we take advantage of unlabeled samples in order
to improve classification accuracy. However, if the number of labeled samples is very small, increasing the
number of unlabeled samples could bias the learning process. This effect is explored in the remainder
of this section, which is organized as follows. In subsection 5.4.1 we study the impact of the parameter α (which balances unmixing and classification in our proposed hybrid strategy) on the final results. Subsections 5.4.2 and 5.4.3 respectively describe the experiments conducted with the AVIRIS
Indian Pines and ROSIS Pavia University scenes. In all cases, the results obtained by the supervised
version of the considered classifier are also reported for comparative purposes.
5.4.1 Balance between classification and unmixing
In this set of experiments, we evaluate the impact of the parameter α, which controls the relative weight of classification and unmixing in the proposed hybrid classifier. Here, the semi-supervised classifier is trained with only 5 labeled samples per class, considering different spectral unmixing chains and testing the following values of the parameter α: {1, 0.75, 0.5, 0.25, 0}. In all cases, we execute 300 iterations to actively select 300 unlabeled samples.
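As a toy illustration of this sweep, the snippet below (made-up numbers for a single mixed pixel over three classes) shows how the fused scores of Eq. (5.1) move between the classifier output and the unmixing output as α varies over the tested values:

    import numpy as np

    f1 = np.array([0.50, 0.30, 0.20])   # hypothetical MLR probabilities
    f2 = np.array([0.20, 0.60, 0.20])   # hypothetical abundance fractions
    for alpha in (1.0, 0.75, 0.5, 0.25, 0.0):
        print(alpha, alpha * f1 + (1 - alpha) * f2)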
After an extensive set of experiments with the two considered hyperspectral scenes, Tables 5.1 (AVIRIS Indian Pines) and 5.2 (ROSIS Pavia University) reveal that a good compromise value for the parameter α is 0.75, which means that classification generally needs more weight than unmixing in order to obtain the best analysis results from our proposed hybrid classifier. This is expected, since the information provided by classification is indeed very important, but can be refined by also including unmixing information in the process. The results in this experiment confirm our intuition that the joint exploitation of classification and unmixing provides advantages over the use of either technique alone, at least in the framework of semi-supervised classification using limited training samples.
5.4.2 Results for AVIRIS Indian Pines
In this section, the proposed approach is evaluated using the AVIRIS Indian Pines data set described in
Fig. 2.3. We consider different numbers of labeled samples per class: 5, 10 and 15. Table 5.3 shows the
OA, AA and the kappa statistic obtained by the supervised strategy based on the MLR classifier and
by the proposed semi-supervised approach, using two different strategies for active learning: RS and BT
(both executed using 300 iterations to actively select 300 unlabeled samples). Table 5.3 also reports the
classification results obtained by the hybrid classification-unmixing approach with the MLR classifier
and four spectral unmixing chains (with α = 0.75). It should be noted that the selected classification
scenario represents a very challenging one. For instance, when 5 labeled samples are used per class,
only 80 labeled samples in total are assumed to be available as the initial condition for the considered
classifier, which is much lower than the number of spectral bands available in the scene.
As we can observe in Table 5.3, the inclusion of unlabeled samples significantly improved the classification results in all cases: compared with the supervised case, the semi-supervised techniques always achieve higher accuracies. When the proposed strategy (combining classification and spectral unmixing) was used, a significant improvement was observed over the supervised case. As shown by Table 5.3, the classification accuracies increase as the number of labeled training samples increases. This is expected, since the uncertainty of the classifier boundaries decreases as more labeled samples are used in the supervised case.
In Fig. 5.4, we evaluate the impact of the number of unlabeled samples on the classification performance achieved by the considered probabilistic classifiers. Specifically, we plot the OA obtained by the supervised MLR (trained using only 5, 10 and 15 labeled samples per class) and by the proposed approach (based on the same classifier plus spectral unmixing) using the four considered strategies for including unmixing information, as a function of the number of unlabeled samples. For the active learning part, we considered two strategies: RS and BT. Again, we can observe in Fig. 5.4 how the unlabeled samples help to improve the accuracy of the obtained results.
AVIRIS Indian Pines — number of labeled samples per class (l = 5)
Parameter α (ite.) | 1.0   | 0.75  | 0.50  | 0.25  | 0.00
Strategy 1 (300)   | 65.25 | 69.10 | 55.60 | 55.47 | 52.01
Strategy 2 (300)   | 65.25 | 68.89 | 66.45 | 68.76 | 58.90
Strategy 3 (300)   | 65.25 | 69.83 | 51.90 | 59.23 | 52.64
Strategy 4 (300)   | 65.25 | 70.45 | 66.34 | 60.50 | 60.71

Table 5.1: OA [%] obtained for different values of the parameter α in the analysis of the AVIRIS Indian Pines hyperspectral data set with 5 labeled samples per class. The four considered spectral unmixing strategies are compared. The total number of iterations is given in parentheses.

ROSIS Pavia University — number of labeled samples per class (l = 5)
Parameter α (ite.) | 1.0   | 0.75  | 0.50  | 0.25  | 0.00
Strategy 1 (300)   | 75.48 | 78.05 | 68.20 | 72.39 | 64.41
Strategy 2 (300)   | 75.48 | 79.39 | 78.80 | 77.76 | 71.42
Strategy 3 (300)   | 75.48 | 79.53 | 65.10 | 66.88 | 64.82
Strategy 4 (300)   | 75.48 | 79.53 | 73.58 | 71.08 | 69.11

Table 5.2: OA [%] obtained for different values of the parameter α in the analysis of the ROSIS Pavia University scene with 5 labeled samples per class. The four considered spectral unmixing strategies are compared. The total number of iterations is given in parentheses.
5 labeled samples per class
      | Supervised  | BT          | RS          | Strategy 1  | Strategy 2  | Strategy 3  | Strategy 4
OA    | 51.78±2.51  | 65.25±5.76  | 60.03±2.91  | 69.10±2.66  | 68.89±4.13  | 69.83±1.78  | 70.45±1.78
AA    | 63.82±2.69  | 69.98±2.91  | 66.31±2.93  | 72.73±2.24  | 70.81±2.24  | 71.37±2.49  | 72.32±2.20
kappa | 46.26±2.74  | 60.65±6.05  | 54.81±3.19  | 64.82±2.94  | 64.54±4.31  | 65.58±2.08  | 66.25±2.16

10 labeled samples per class
      | Supervised  | BT          | RS          | Strategy 1  | Strategy 2  | Strategy 3  | Strategy 4
OA    | 60.12±3.08  | 68.51±5.16  | 65.59±2.94  | 70.91±4.87  | 74.91±2.10  | 75.05±1.68  | 75.29±2.40
AA    | 71.74±1.54  | 75.66±2.33  | 73.49±1.83  | 77.13±2.70  | 78.21±2.36  | 78.44±1.96  | 79.05±2.00
kappa | 55.43±3.20  | 64.40±5.56  | 61.17±3.16  | 67.08±5.40  | 70.69±2.40  | 71.60±1.91  | 71.84±2.75

15 labeled samples per class
      | Supervised  | BT          | RS          | Strategy 1  | Strategy 2  | Strategy 3  | Strategy 4
OA    | 66.20±1.99  | 70.46±3.16  | 70.29±1.93  | 73.57±3.83  | 74.04±2.26  | 76.89±2.56  | 76.68±2.04
AA    | 77.39±1.06  | 79.47±1.81  | 78.76±1.09  | 80.86±1.85  | 79.76±1.26  | 81.90±1.46  | 81.94±1.63
kappa | 62.09±2.13  | 66.65±3.38  | 66.41±2.13  | 70.16±4.26  | 70.51±2.49  | 73.71±2.85  | 73.53±2.22

Table 5.3: OA, AA [%], and kappa statistic obtained using different classifiers when applied to the AVIRIS Indian Pines hyperspectral data set. The standard deviation is also reported in each case.
Figure 5.4: OA (as a function of the number of unlabeled samples) obtained for the AVIRIS Indian Pines data set by different classifiers: (a) 5 labeled samples, (b) 10 labeled samples, (c) 15 labeled samples. BT is the semi-supervised classifier where unlabeled samples are selected using breaking ties. RS is the semi-supervised classifier where unlabeled samples are selected using random sampling. Finally, Strategy 1 to Strategy 4 denote the semi-supervised hybrid classifier integrating classification and spectral unmixing (with α = 0.75), where unlabeled samples are selected using BT.
Figure 5.5: Classification maps and OAs (in parentheses) obtained after applying different classifiers to the AVIRIS Indian Pines data set. In all cases the number of labeled samples per class was 10, and the number of unlabeled samples (used in the semi-supervised strategies: BT, RS, Strategy 1, Strategy 2, Strategy 3 and Strategy 4) was set to 300. Panels: Ground-truth, Supervised (60.12%), BT (68.51%), RS (65.59%), Strategy 1 (70.91%), Strategy 2 (74.91%), Strategy 3 (75.05%), Strategy 4 (75.29%).
For illustrative purposes, Fig. 5.5 shows some of the classification maps obtained for the AVIRIS Indian Pines scene. These classification maps correspond to one of the 10 Monte Carlo runs that were averaged in order to generate the classification scores reported in Table 5.3. The advantages obtained by adopting a semi-supervised learning approach which combines classification and unmixing concepts can be clearly appreciated in the classification maps displayed in Fig. 5.5, which also reports the classification OAs obtained for each method in parentheses.
5.4.3 Results for ROSIS Pavia University
The second data set used in this experiment is the ROSIS Pavia University scene described in Fig. 3.5. Table 5.4 shows the OA, AA and the kappa statistic obtained by the supervised strategy (trained with 5, 10 and 15 labeled samples per class) and by the proposed approach (based on the MLR classifier and different spectral unmixing approaches), using two different strategies for active learning: RS and BT (both executed using 300 iterations to actively select 300 unlabeled samples).
Several conclusions can be obtained from the results reported in Table 5.4. The use of unlabeled samples provides advantages with respect to the supervised algorithm alone. In all cases, the proposed unmixing strategies significantly outperform the corresponding supervised algorithm, and the increase in performance becomes more relevant as the number of unlabeled samples increases. These unlabeled samples are automatically selected by the proposed approach, and represent no cost in terms of data collection or human supervision. In Fig. 5.6 we can observe how the accuracy results improve as the number of unlabeled samples increases. For instance, in the case with 10 labeled samples per class (see Table 5.4), the supervised approach obtained an OA of 69.25%. When the proposed strategy (combining classification and spectral unmixing) was used, the classification accuracy improved to an OA of 83.93% (Strategy 4), which represents a significant improvement over the supervised case.
For illustrative purposes, Fig. 5.7 shows some of the classification maps obtained in our experiments. These maps correspond to one of the 10 Monte Carlo runs that were averaged in order to generate the classification scores reported in Table 5.4. The advantages obtained by adopting a semi-supervised learning approach which combines classification and unmixing concepts can be clearly appreciated in the classification maps displayed in Fig. 5.7, which also reports the classification OAs obtained for each method in parentheses.
5.5 Summary and future directions
In this chapter, we have presented a new hybrid technique which incorporates the information provided by spectral unmixing concepts in the classification process. In the validation of the method, we have considered four different unmixing-based feature extraction chains and used a limited number of training samples. Active learning techniques are employed to select the most informative unlabeled samples. The effectiveness of the proposed approach has been illustrated using two representative hyperspectral images collected by the AVIRIS and ROSIS sensors over a variety of test sites. The experimental results obtained indicate that the combination of spectral unmixing and semi-supervised classification leads to a powerful new framework for hyperspectral data interpretation. In the future, we will explore additional strategies to generate unlabeled samples through active learning and also consider additional probabilistic classifiers that can be easily integrated in the proposed framework.
Number of labeled samples per class: l = 5
      | Supervised  | BT          | RS          | Strategy 1  | Strategy 2  | Strategy 3  | Strategy 4
OA    | 63.56±4.63  | 75.48±4.63  | 71.80±2.16  | 78.05±3.20  | 79.09±1.94  | 79.53±2.12  | 79.10±2.34
AA    | 72.93±2.08  | 79.33±2.62  | 75.45±1.86  | 79.62±2.45  | 79.68±3.63  | 79.97±2.43  | 80.29±2.67
kappa | 54.78±4.45  | 68.30±5.38  | 63.50±2.12  | 71.54±3.69  | 72.69±2.73  | 73.05±2.63  | 72.48±3.03

Number of labeled samples per class: l = 10
      | Supervised  | BT          | RS          | Strategy 1  | Strategy 2  | Strategy 3  | Strategy 4
OA    | 69.25±3.75  | 80.70±3.07  | 75.72±2.19  | 82.86±2.30  | 83.31±2.04  | 84.14±1.97  | 83.93±1.73
AA    | 78.42±1.75  | 82.81±1.55  | 80.52±1.52  | 83.48±1.40  | 83.51±1.95  | 84.48±1.04  | 84.83±1.53
kappa | 61.69±4.01  | 74.87±3.75  | 68.95±2.54  | 77.68±2.78  | 78.06±2.55  | 79.23±2.37  | 78.97±2.07

Number of labeled samples per class: l = 15
      | Supervised  | BT          | RS          | Strategy 1  | Strategy 2  | Strategy 3  | Strategy 4
OA    | 72.34±2.22  | 81.00±5.75  | 76.88±2.08  | 83.54±2.47  | 83.47±2.18  | 84.83±3.21  | 85.25±2.59
AA    | 80.01±2.29  | 83.35±2.82  | 81.61±2.46  | 83.62±2.43  | 83.60±2.34  | 85.54±2.08  | 85.48±1.22
kappa | 65.21±2.23  | 75.44±3.80  | 70.39±2.09  | 78.49±2.85  | 78.56±2.68  | 80.15±3.84  | 80.63±3.11

Table 5.4: OA, AA [%], and kappa statistic obtained using the MLR classifier when applied to the ROSIS Pavia University hyperspectral data set. The standard deviation is also reported in each case.
Figure 5.6: OA (as a function of the number of unlabeled samples) obtained for the ROSIS Pavia University data set by different classifiers: (a) 5 labeled samples, (b) 10 labeled samples, (c) 15 labeled samples. BT is the semi-supervised classifier where unlabeled samples are selected using breaking ties. RS is the semi-supervised classifier where unlabeled samples are selected using random sampling. Finally, Strategy 1 to Strategy 4 denote the semi-supervised hybrid classifier integrating classification and spectral unmixing (with α = 0.75), where unlabeled samples are selected using BT.
Figure 5.7: Classification maps and OAs (in parentheses) obtained after applying different classifiers to the ROSIS Pavia University data set. In all cases the number of labeled samples per class was 10, and the number of unlabeled samples (used in the semi-supervised strategies: BT, RS, Strategy 1, Strategy 2, Strategy 3 and Strategy 4) was set to 300. Panels: Ground-truth, Supervised (69.25%), BT (80.70%), RS (75.72%), Strategy 1 (82.86%), Strategy 2 (77.92%), Strategy 3 (84.14%), Strategy 4 (83.93%).
Chapter 6
Conclusions and Future Research Lines
Spectral unmixing and classification of hyperspectral data have been the main topics addressed in this
thesis work. These concepts have rarely been studied together, although they exhibit complementary
properties that can offer several advantages when they are applied to hyperspectral image analysis.
• On the one hand, classification is a challenging topic. It has typically been conducted using supervised and semi-supervised techniques, but this process has encountered several problems due to the structure of the hyperspectral data and the limited number of available training samples.
• On the other hand, spectral unmixing allows one to analyze the properties of each pixel including
additional information about the characterization of mixed pixels in hyperspectral data. This
information could be effective for the classification process since it provides a kind of soft
information that can properly complement the hard output generally provided by classification
techniques.
These reasons have motivated us to focus this thesis work on the development of new efficient
hyperspectral techniques able to combine concepts of spectral unmixing and classification. As a result, a
main goal of this thesis is to include detailed information about mixed pixels in the classification process.
For this purpose, we have exhaustively studied several processing techniques for joint hyperspectral
unmixing and classification.
One of the most challenging aspects addressed in this thesis is the high dimensional nature of
hyperspectral data; the first two chapters of this thesis focus on how to provide the most suitable
input features for the classification process and, particularly, on the possible role of spectral unmixing
techniques in this task. In our proposal, the input of the classifier has been replaced by the abundance
maps derived by different spectral unmixing chains in order to include additional information about
mixed pixels. At the same time, the computational cost of the classification is also significantly reduced
because the number of pure materials is usually lower than the number of spectral bands in the original
hyperspectral data. We conclude that there are several potential advantages resulting from the use
of abundance fractions as input features for classification purposes: in the first place, they supply information about mixed pixels in hyperspectral data; in the second place, each abundance map can be physically explained as the proportion of each pure material in the data; and in the third place, the use of abundance fractions as input features does not penalize very small classes. Several experiments have been conducted studying different techniques to compute the abundance maps. The effectiveness of such a strategy could be appreciated after including spectral unmixing concepts prior to classification of hyperspectral data.
Another important aspect of the thesis is how to increase the set of training samples for semi-supervised classification. This is a very important task, since the limited availability of training samples introduces many problems in supervised classification. Active learning concepts have been used to
develop new self-learning strategies in which the classifier itself selects the most useful unlabeled samples
without the need for human interaction. Our semi-supervised approach also includes spatial information
as a criterion to select the new samples. In order to retain informative samples, several active learning
approaches have been used. In this work, we have used two different probabilistic classifiers to test the
presented approach: MLR and a probabilistic SVM. The proposed framework avoids the need to have a
large number of training samples in advance, while at the same time it allows for the intelligent generation
of unlabeled samples. The effectiveness of the presented framework is illustrated using real hyperspectral
datasets. The results obtained reveal that the proposed approach can provide good classification results
with very limited labeled samples.
A final contribution of the present thesis work is the joint consideration of techniques for spectral unmixing and classification. This has been done by injecting spectral unmixing information into the semi-supervised classification process based on self-learning concepts, which we also developed as part of this thesis work. In this case, it is important to define the relative weight given to unmixing with respect to classification and vice-versa. Several experiments have been performed in order to analyze
this issue. Our conclusion is that there are several potential advantages of jointly considering spectral
unmixing and classification, which are apparent in the classification accuracies obtained by the proposed
semi-supervised hybrid framework.
Summarizing, the innovative contributions presented in this thesis work are related to the joint exploitation of classification and spectral unmixing concepts, and to the intelligent generation of unlabeled samples for semi-supervised learning (called self-learning in this thesis work). Several strategies have been developed for the combination of spectral unmixing and classification under different scenarios (unmixing prior to classification, joint unmixing and classification based on self-learning, etc.). To the best of our knowledge, this study represents one of the first efforts in the literature to synergistically exploit two analysis techniques (unmixing and classification) that have traditionally been exploited independently in hyperspectral image analysis. In this regard, the connections and possible bridges between both techniques represent another unique contribution of this thesis.
As future work, we are planning on developing computationally efficient implementations of the new
techniques developed in this thesis using high performance computing architectures, such as clusters
of computers (possibly, with specialized hardware accelerators such as graphics processing units). We
are also planning on testing the presented developments on large-scale data repositories, in order to
facilitate the processing of larger volumes of data than those reported in this work. In this case, domain
adaptation techniques will certainly be needed. Although the results presented in this thesis are focused
on a few hyperspectral scenes only (due to the reliable ground-truth and reference information available
for those scenes), the extrapolation of these techniques to larger data collections will allow a more detailed
assessment of the requirements and benefits of applying the presented approaches in practical scenarios.
Appendix A
Publications
The results of this thesis work have been published in several international journal papers, book chapters
and peer-reviewed international conference papers. The candidate is the first author of 3 JCR journal
papers, and 11 peer-reviewed conference papers directly related to this thesis work. The candidate has
been a pre-doctoral researcher in the Hyperspectral Computing Laboratory (HyperComp), Department of
Technology of Computers and Communications, University of Extremadura, Spain. Below, we provide a
description of the publications achieved by the candidate providing also a short description of the journal
or workshop where they were presented.
A.1 International journal papers.
1. I. Dopido, M. Zortea, A. Villa, A. Plaza and P. Gamba. Unmixing Prior to Supervised Classification
of Remotely Sensed Hyperspectral Images. IEEE Geoscience and Remote Sensing Letters, vol. 8,
no. 4, pp. 760-764, July 2011 [JCR(2011)=1.560].
This paper was published in the journal IEEE Geoscience and Remote Sensing Letters, which is
one of the main journals of the remote sensing category of JCR. It is also in the second quarter of
the electrical and electronic engineering category of JCR. The paper explores the use of spectral
unmixing for feature extraction prior to classification of hyperspectral data, and constitutes the
basis of the second chapter of this thesis. This paper was selected as one of the finalists for the
Best Paper Award of the IEEE Geoscience and Remote Sensing Letters in 2011.
2. I. Dopido, A. Villa, A. Plaza and P. Gamba. A Quantitative and Comparative Assessment of
Unmixing-Based Feature Extraction Techniques for Hyperspectral Image Classification. IEEE
Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 5, no. 2, pp.
421-435, April 2012 [JCR(2012)=2.874].
This paper was published in the journal IEEE Journal of Selected Topics in Applied Earth
Observations and Remote Sensing, which is a very important journal in the first quarter of
the remote sensing and electrical and electronic engineering areas of JCR. This paper provides
an experimental comparison of different unmixing chains prior to supervised classification of
hyperspectral data, and constitutes the basis for the third chapter of the thesis.
3. I. Dopido, J. Li, P. R. Marpu, A. Plaza, J. M. Bioucas-Dias and J. A. Benediktsson. Semi-Supervised Self-Learning for Hyperspectral Image Classification. IEEE Transactions on Geoscience and Remote Sensing, vol. 51, no. 7, pp. 4032-4044, July 2013 [JCR(2012)=3.467].
This paper was published in the IEEE Transactions on Geoscience and Remote Sensing, which is
a top scholarly journal in the field of remote sensing. This paper develops a new semi-supervised
self-learning framework for hyperspectral classification which constitutes the basis of the fourth
chapter presented in this thesis.
A.2 Peer-reviewed international conference papers.
1. I. Dopido, P. Gamba and A. Plaza. Spectral Unmixing-Based Post-Processing for Hyperspectral
Image Classification. This work was presented as an oral contribution in the IEEE Workshop on
Hyperspectral Image and Signal Processing (WHISPERS) held in Gainesville, Florida, in 2013.
WHISPERS is one of the most important international workshops specialized in hyperspectral
remote sensing. This paper explores the possible use of spectral unmixing as a post-processing
strategy to improve the classification results provided by supervised and semi-supervised techniques
for hyperspectral image classification.
2. I. Dopido, J. Li, A. Plaza and P. Gamba. Semi-Supervised Classification of Urban Hyperspectral
Data Using Spectral Unmixing Concepts. This work was presented as a poster in the IEEE Urban
Remote Sensing Joint Event (JURSE) held in Sao Paulo, Brazil, in 2013. JURSE is one of the
most important international workshops that specialize in urban hyperspectral remote sensing.
The paper explores the application of the semi-supervised hybrid strategy described in the fifth
chapter of the thesis to urban hyperspectral data.
3. I. Dopido, J. Li, P. Gamba and A. Plaza. Semi-Supervised Classification of Hyperspectral Data
Using Spectral Unmixing Concepts. This paper was presented in the Tyrrhenian Workshop 2012
on Advances in Radar and Remote Sensing, held in Naples, Italy, in 2012. This paper presents the
semi-supervised hybrid framework for joint unmixing and classification that constitutes the basis
of the fifth chapter of the present thesis work.
4. I. Dopido, J. Li, J. Bioucas-Dias and A. Plaza. A New Semi-Supervised Approach for Hyperspectral
Image Classification with Different Active Learning Strategies. This work was presented as an oral
contribution in the IEEE Workshop on Hyperspectral Image and Signal Processing (WHISPERS)
held in Shanghai, China, in 2012. The paper presents a preliminary version of the methodology
presented in the fourth chapter of the thesis, focusing on the role of different active learning
techniques in the selection of informative unlabeled samples for semi-supervised self-learning.
5. I. Dopido, J. Li, A. Plaza and J. Bioucas-Dias. Semi-Supervised Active Learning for Urban
Hyperspectral Image Classification. This work was presented as an oral presentation in the IEEE
International Geoscience and Remote Sensing Symposium (IGARSS) held in Munich, Germany, in
2012. This is the most important international workshop in the remote sensing field. The paper
described the concept of semi-supervised active learning explored in the fourth chapter of the thesis,
and particularly its application to urban data, as the paper was invited to a special session devoted
to this topic.
6. I. Dopido, J. Li and A. Plaza. Semi-Supervised Active Learning Approach for Hyperspectral Image
Classification: Application to Multinomial Logistic Regression and Support Vector Machines. This
work was presented as an oral contribution in the SPIE Optics and Photonics conference, which
is a very important event held yearly in San Diego, USA. The paper explores the impact of using
different probabilistic classifiers in the design of the semi-supervised self-learning method presented
in the fourth chapter of this thesis.
7. I. Dopido, A. Villa, A. Plaza and P. Gamba. A Comparative Assessment of Several Processing
Chains for Hyperspectral Image Classification: What Features to Use? This work was presented
as an oral presentation in the IEEE Workshop on Hyperspectral Image and Signal Processing:
Evolution in Remote Sensing (WHISPERS) held in Lisbon, Portugal, in 2011. The candidate was
a member of the organizing committee of this important international workshop in the Lisbon
edition. The paper explores the use of different spectral unmixing chains prior to classification of
remotely sensed data using supervised techniques, which is presented in detail in the third chapter
of the thesis.
8. I. Dopido and A. Plaza. Unmixing Prior to Supervised Classification of Urban Hyperspectral
Images. This work was presented as an oral contribution in the IEEE Urban Remote Sensing Joint
Event (JURSE) held in Munich, Germany, in 2011. The paper explores the use of unmixing prior
to supervised classification in the context of urban areas, as described in the second chapter of the
thesis.
9. I. Dopido, A. Villa and A. Plaza. Unsupervised Clustering and Spectral Unmixing for Feature
Extraction Prior to Supervised Classification of Hyperspectral Images. This work was presented
as an oral contribution in the SPIE Optics and Photonics conference, which is a very important
event held yearly in San Diego, USA. The paper describes the use of unsupervised clustering
as a mechanism to refine the development of unmixing-based chains for feature extraction prior
to supervised classification of hyperspectral data. This technique is one of the unmixing chains
compared in the third chapter of the present thesis.
10. M. Rojas, I. Dopido, A. Plaza and P. Gamba. Comparison of Support Vector Machine-Based
Processing Chains for Hyperspectral Image Classification. This work was presented as an oral
contribution in the SPIE Optics and Photonics conference, which is a very important event held
yearly in San Diego, USA. The paper explores the use of different feature extraction approaches
prior to supervised classification of hyperspectral scenes using the SVM classifier. The methods
described in this contribution are compared in the second chapter of this thesis with the newly
developed unmixing-based feature extraction methods presented in the same chapter.
11. A. Plaza, J. Plaza, I. Dopido, G. Martin, M. D. Iordache and S. Sanchez. New Hyperspectral
Unmixing Techniques in the Framework of the Earth Observation Optical Data Calibration
and Information Extraction (EODIX) Project. This paper, presented as a poster in the 3rd
International Symposium on Recent Advances in Quantitative Remote Sensing (RAQS) held in
Valencia, Spain, in 2010, summarizes some of the advances in supervised classification using spatial
and spectral information developed in the framework of the HYPERCOMP/EODIX project, which
partially supported the thesis work of the candidate.
Bibliography
[1] A. Plaza, J. A. Benediktsson, J. W. Boardman, J. Brazile, L. Bruzzone, G. Camps-Valls,
J. Chanussot, M. Fauvel, P. Gamba, A. Gualtieri, M. Marconcini, J. C. Tilton, and G. Trianni.
Recent advances in techniques for hyperspectral image processing. Remote Sensing of Environment,
113:110–122, 2009.
[Cited in pag. 1]
[2] C. I. Chang. Hyperspectral imaging: techniques for spectral detection and classification. Kluwer
Academic/Plenum Publishers: New York, 2003.
[Cited in pags. 2, 11, 30 and 31]
[3] C. I. Chang. Recent advances in hyperspectral signal and image processing. John Wiley & Sons:
New York, 2007.
[Cited in pag. 2]
[4] J. A. Richards and X. Jia. Remote sensing digital image analysis: an introduction. Springer, 2006.
[Cited in pags. 2, 3, 5, 28 and 72]
[5] G. Shaw and D. Manolakis. Signal processing for hyperspectral image exploitation. IEEE Signal
Processing Magazine, 19(1):12–16, 2002.
[Cited in pag. 2]
[6] L. Bruzzone and D. F. Prieto. Automatic analysis of the difference image for unsupervised
change detection. IEEE Transactions on Geoscience and Remote Sensing, 38(3):1171–1182, 2000.
[Cited in pag. 2]
[7] J. M. Bioucas-Dias, A. Plaza, N. Dobigeon, M. Parente, Q. Du, P. Gader, and J. Chanussot.
Hyperspectral unmixing overview: Geometrical, statistical and sparse regression-based approaches.
IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 5(2):354–379,
2012.
[Cited in pags. 2, 77 and 79]
[8] N. Keshava and J. F. Mustard. Spectral unmixing. IEEE Signal Processing Magazine, 19(1):44–57,
2002.
[Cited in pags. 2, 3, 11, 13, 28 and 31]
[9] A. Plaza, P. Martinez, R. Perez, and J. Plaza. A quantitative and comparative analysis of
endmember extraction algorithms from hyperspectral data. IEEE Transactions on Geoscience
and Remote Sensing, 42(3):650–663, 2004.
[Cited in pags. 2 and 15]
[10] Q. Du, N. Raksuntorn, N. H. Younan, and R. L. King. End-member extraction for hyperspectral
image analysis. Applied Optics, 47(28):F77–F84, 2008.
[Cited in pag. 2]
[11] N. Keshava. A survey of spectral unmixing algorithms. Lincoln Laboratory Journal, 14(1):55–78,
2003.
[Cited in pag. 2]
[12] M. Parente and A. Plaza. Survey of geometric and statistical unmixing algorithms for hyperspectral
images. In IEEE GRSS Workshop on Hyperspectral Image and Signal Processing: Evolution in
Remote Sensing (WHISPERS’10), pages 1–4, 2010.
[Cited in pag. 2]
[13] A. Plaza, Q. Du, J. M. Bioucas-Dias, X. Jia, and F. Kruse. Foreword to the special issue on
spectral unmixing of remotely sensed data. IEEE Transactions on Geoscience and Remote Sensing,
49(11):4103–4110, 2011.
[Cited in pag. 2]
[14] D. A. Landgrebe. Signal theory methods in multispectral remote sensing. John Wiley & Sons: New
York, 2003.
[Cited in pags. 3, 11, 21, 28, 38, 39 and 49]
[15] F. Melgani and L. Bruzzone. Classification of hyperspectral remote-sensing images with support
vector machines. IEEE Transactions on Geoscience and Remote Sensing, 42(8):1778–1790, 2004.
[Cited in pags. 3, 11 and 14]
[16] G. Camps-Valls and L. Bruzzone. Kernel-based methods for hyperspectral image classification.
IEEE Transactions on Geoscience and Remote Sensing, 43(6):1351–1362, 2005.
[Cited in pags. 3, 11, 28, 53 and 78]
[17] X. Zhu. Semi-supervised learning literature survey. Technical Report 1530, Computer Sciences,
University of Wisconsin-Madison, 2005.
[Cited in pags. 4, 50, 72 and 78]
[18] R. G. Congalton and K. Green. Assessing the accuracy of remotely sensed data: principles and
practices. CRC press, 2008.
[Cited in pag. 11]
[19] D. C. Heinz and C. I. Chang. Fully constrained least squares linear mixture analysis for material
quantification in hyperspectral imagery. IEEE Transactions on Geoscience and Remote Sensing,
39(3):529–545, 2001.
[Cited in pags. 11, 14, 19, 30 and 38]
[20] R. O. Green, M. L. Eastwood, C. M. Sarture, T. G. Chrien, M. Aronsson, J. A. Chippendale,
B. J. Faust, B. E. Pavri, C. J. Chovit, M. Solis, M. R. Olah, and O. Williams. Imaging
spectroscopy and the airborne visible/infrared imaging spectrometer (AVIRIS). Remote Sensing of
Environment, 65(3):227–248, 1998.
[Cited in pags. 11, 29 and 52]
[21] T. Luo, K. Kramer, D. B. Goldgof, S. Samson, A. Remsen, T. S. Hopkins, and D. Cohn. Active
learning to recognize multiple types of plankton. Journal of Machine Learning Research, 6:589–613,
2005.
[Cited in pags. 11, 54 and 82]
[22] Q. Du, H. Ren, and C. I. Chang. A comparative study for orthogonal subspace projection
and constrained energy minimization. IEEE Transactions on Geoscience and Remote Sensing,
41(6):1525–1529, 2003.
[Cited in pags. 11 and 17]
[23] I. Dopido, A. Villa, and A. Plaza. Unsupervised clustering and spectral unmixing for feature
extraction prior to supervised classification of hyperspectral images. In SPIE Optics and Photonics,
Satellite Data Compression, Communication, and Processing Conference, 2011.
[Cited in pags. 11, 32 and 33]
[24] I. Dopido, A. Villa, A. Plaza, and P. Gamba. A quantitative and comparative assessment
of unmixing-based feature extraction techniques for hyperspectral image classification. IEEE
Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 5(2):421–435, 2012.
[Cited in pags. 11 and 79]
[25] J. M. Bioucas-Dias and J. M. P. Nascimento. Hyperspectral subspace identification. IEEE
Transactions on Geoscience and Remote Sensing, 46(8):2435–2445, 2008. [Cited in pags. 11, 15 and 31]
[26] P. Comon. Independent component analysis, a new concept? Signal Processing, 36(3):287–314,
1994.
[Cited in pags. 11, 14 and 28]
[27] J. F. Cardoso. High-order contrasts for independent component analysis. Neural Computation,
11(1):157–192, 1999.
[Cited in pags. 11 and 14]
[28] J. M. Bioucas-Dias and M. Figueiredo. Logistic regression via variable splitting and augmented
Lagrangian tools. Technical report, Instituto Superior Técnico, TULisbon, 2009.
[Cited in pags. 11 and 53]
[29] J. Li, J. M. Bioucas-Dias, and A. Plaza. Hyperspectral image segmentation using a new
Bayesian approach with active learning. IEEE Transactions on Geoscience and Remote Sensing,
49(10):3947–3960, 2011.
[Cited in pags. 11, 53 and 54]
[30] D. Böhning. Multinomial logistic regression algorithm. Annals of the Institute of Statistical
Mathematics, 44(1):197–200, 1992.
[Cited in pags. 11, 52 and 79]
[31] A. A. Green, M. Berman, P. Switzer, and M. D. Craig. A transformation for ordering multispectral
data in terms of image quality with implications for noise removal. IEEE Transactions on
Geoscience and Remote Sensing, 26(1):65–74, 1988.
[Cited in pags. 11, 14 and 28]
[32] D. Tuia, M. Volpi, L. Copa, M. Kanevski, and J. Muñoz-Mari. A survey of active learning
algorithms for supervised remote sensing image classification. IEEE Journal of Selected Topics
in Signal Processing, 5(3):606–617, 2011.
[Cited in pags. 11, 51 and 54]
[33] J. Boardman. Leveraging the high dimensionality of AVIRIS data for improved subpixel target
unmixing and rejection of false positives: mixture tuned matched filtering. Proceedings of the 5th
JPL Geoscience Workshop, pages 55–56, 1998.
[Cited in pags. 11, 17, 28, 30, 31 and 79]
[34] I. Dopido, M. Zortea, A. Villa, A. Plaza, and P. Gamba. Unmixing prior to supervised classification
of remotely sensed hyperspectral images. IEEE Geoscience and Remote Sensing Letters, 8(4):760–
764, 2011.
[Cited in pags. 11, 28, 31, 37, 38, 39, 40 and 79]
[35] D. Tuia, F. Ratle, F. Pacifici, M. F. Kanevski, and W. J. Emery. Active learning methods for remote
sensing image classification. IEEE Transactions on Geoscience and Remote Sensing, 47(7):2218–
2232, 2009.
[Cited in pags. 11, 51 and 55]
[36] J. C. Harsanyi and C. I. Chang. Hyperspectral image classification and dimensionality reduction:
An orthogonal subspace projection approach. IEEE Transactions on Geoscience and Remote
Sensing, 32(4):779–785, 1994.
[Cited in pags. 11, 15 and 38]
[37] J. A. Richards. Analysis of remotely sensed data: the formative decades and the future. IEEE
Transactions on Geoscience and Remote Sensing, 43(3):422–432, 2005.
[Cited in pags. 11, 13, 14 and 28]
[38] F. Dell’Acqua, P. Gamba, A. Ferrari, J. A. Palmason, and J. A. Benediktsson. Exploiting spectral
and spatial information in hyperspectral urban data with high resolution. IEEE Geoscience and
Remote Sensing Letters, 1(4):322–326, 2004.
[Cited in pags. 11, 14 and 29]
[39] B. Krishnapuram, L. Carin, M. Figueiredo, and A. Hartemink. Sparse multinomial logistic
regression: fast algorithms and generalization bounds. IEEE Transactions on Pattern Analysis
and Machine Intelligence, 27(6):957–968, 2005.
[Cited in pags. 11 and 53]
[40] V. N. Vapnik. Statistical learning theory. John Wiley & Sons: New York, 1998.
[Cited in pags. 11, 50, 53, 57 and 78]
[41] T. Joachims. Transductive inference for text classification using support vector machines. In
Proceedings of the Sixteenth International Conference on Machine Learning, ICML'99, pages
200–209, 1999.
[Cited in pags. 11, 50 and 78]
[42] J. M. P. Nascimento and J. M. Bioucas-Dias. Vertex component analysis: a fast algorithm to
unmix hyperspectral data. IEEE Transactions on Geoscience and Remote Sensing, 43(4):898–910,
2005.
[Cited in pags. 11 and 15]
[43] Q. Du and C. I. Chang. Estimation of number of spectrally distinct signal sources in
hyperspectral imagery. IEEE Transactions on Geoscience and Remote Sensing, 42(3):608–619,
2004.
[Cited in pags. 11, 15 and 31]
[44] A. Plaza, J. A. Benediktsson, J. W. Boardman, J. Brazile, L. Bruzzone, G. Camps-Valls,
J. Chanussot, M. Fauvel, P. Gamba, A. Gualtieri, M. Marconcini, J. C. Tilton, and G. Trianni.
Recent advances in techniques for hyperspectral image processing. Remote Sensing of Environment,
113:110–122, 2009.
[Cited in pags. 13, 14, 28 and 78]
[45] J. B. Adams, M. O. Smith, and P. E. Johnson. Spectral mixture modeling: a new analysis of rock
and soil types at the Viking Lander 1 site. Journal of Geophysical Research, 91(B8):8098–8112,
1986.
[Cited in pag. 14]
[46] G. Camps-Valls, L. Gomez-Chova, J. Muñoz-Mari, J. Vila-Frances, and J. Calpe-Maravilla.
Composite kernels for hyperspectral image classification. IEEE Geoscience and Remote Sensing
Letters, 3(1):93–97, 2006.
[Cited in pags. 14 and 28]
[47] A. Plaza, P. Martinez, J. Plaza, and R. Perez. Dimensionality reduction and classification of
hyperspectral image data using sequences of extended morphological transformations. IEEE
Transactions on Geoscience and Remote Sensing, 43(3):466–479, 2005.
[Cited in pags. 14 and 28]
[48] M. Rojas, I. Dopido, A. Plaza, and P. Gamba. Comparison of support vector machine-based
processing chains for hyperspectral image classification. In SPIE Optics and Photonics, Satellite
Data Compression, Communication, and Processing Conference, 2010. [Cited in pags. 14, 28 and 38]
[49] J. A. Benediktsson, J. A. Palmason, and J. R. Sveinsson. Classification of hyperspectral data
from urban areas based on extended morphological profiles. IEEE Transactions on Geoscience and
Remote Sensing, 43(3):480–491, 2005.
[Cited in pags. 14 and 52]
[50] B. Luo and J. Chanussot. Hyperspectral image classification based on spectral and geometrical
features. Proceedings of IEEE International Workshop on Machine Learning for Signal Processing,
pages 1–6, 2009.
[Cited in pag. 14]
[51] B. Luo and J. Chanussot. Unsupervised classification of hyperspectral images by using linear
unmixing algorithm. Proceedings of IEEE International Conference on Image Processing, pages
2877–2880, 2009.
[Cited in pags. 14 and 37]
[52] M. E. Winter. N-FINDR: an algorithm for fast autonomous spectral end-member determination
in hyperspectral data. Proceedings of SPIE Imaging Spectrometry V, 3753:266–277, 1999.
[Cited in pag. 15]
[53] M. Zortea and A. Plaza. Spatial preprocessing for endmember extraction. IEEE Transactions on
Geoscience and Remote Sensing, 47(8):2679–2693, 2009.
[Cited in pag. 16]
[54] I. Dopido and A. Plaza. Unmixing prior to supervised classification of urban hyperspectral images.
In 6th IEEE GRSS/ISPRS Joint Workshop on Remote Sensing and Data Fusion over Urban Areas
(JURSE’11), pages 97–100, 2011.
[Cited in pag. 17]
[55] G. M. Foody. Thematic map comparison: Evaluating the statistical significance of differences in
classification accuracy. Photogrammetric Engineering and Remote Sensing, 70(5):627–634, 2004.
[Cited in pags. 21 and 40]
[56] G. F. Hughes. On the mean accuracy of statistical pattern recognizers. IEEE Transactions on
Information Theory, 14(1):55–63, 1968.
[Cited in pag. 28]
[57] K. Fukunaga. Introduction to statistical pattern recognition. Academic Press: San Diego, CA, 1990.
[Cited in pag. 28]
[58] L. Bruzzone, M. Chi, and M. Marconcini. A novel transductive SVM for the semisupervised
classification of remote sensing images. IEEE Transactions on Geoscience and Remote Sensing,
44(11):3363–3373, 2006.
[Cited in pags. 28, 50, 54 and 78]
[59] L. O. Jimenez and D. A. Landgrebe. Supervised classification in high dimensional space:
geometrical, statistical and asymptotical properties of multivariate data. IEEE Transactions on
Systems, Man, and Cybernetics-Part C: Applications and Reviews, 28(1):39–54, 1998.
[Cited in pag. 28]
[60] Q. Jackson and D. A. Landgrebe. An adaptive classifier design for high dimensional data
analysis with a limited training data set. IEEE Transactions on Geoscience and Remote Sensing,
39(12):2664–2679, 2001.
[Cited in pag. 28]
[61] I. Dopido, A. Villa, A. Plaza, and P. Gamba. A comparative assessment of several processing
chains for hyperspectral image classification: what features to use? Proceedings of the IEEE/GRSS
Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing
(WHISPERS'11), pages 1–4, 2011.
[Cited in pag. 28]
[62] C. I. Chang, J. M. Liu, B. C. Chieu, H. Ren, C. M. Wang, C. S. Lo, P. C. Chung, C. W. Yang,
and D. J. Ma. Generalized constrained energy minimization approach to subpixel target detection
for multispectral imagery. Optical Engineering, 39(5):1275–1281, 2000.
[Cited in pags. 28 and 31]
[63] J. A. Hartigan and M. A. Wong. Algorithm AS 136: a k-means clustering algorithm. Journal of
the Royal Statistical Society, Series C (Applied Statistics), 28(1):100–108, 1979. [Cited in pag. 31]
[64] L. Wang and X. Jia. Integration of soft and hard classifications using extended support vector
machines. IEEE Geoscience and Remote Sensing Letters, 6(3):543–547, 2009.
[Cited in pag. 37]
[65] A. Villa, J. Chanussot, J. A. Benediktsson, and C. Jutten. Spectral unmixing for the classification
of hyperspectral images at a finer spatial resolution. IEEE Journal of Selected Topics in Signal
Processing, 5(3):521–533, 2011.
[Cited in pag. 37]
[66] F. A. Mianji and Y. Zhang. SVM-based unmixing-to-classification conversion for hyperspectral
abundance quantification. IEEE Transactions on Geoscience and Remote Sensing, 49(11):4318–
4327, 2011.
[Cited in pag. 37]
[67] J. C. Bezdek. Pattern recognition with fuzzy objective function algorithms. Plenum Press, New
York, 1981.
[Cited in pag. 38]
[68] B. Mojaradi, H. Abrishami-Moghaddam, M. J. V. Zoej, and R. P. W. Duin. Dimensionality
reduction of hyperspectral data via spectral feature extraction. IEEE Transactions on Geoscience
and Remote Sensing, 47(7):2091–2105, 2009.
[Cited in pag. 39]
[69] S. Garcia, A. Fernandez, J. Luengo, and F. Herrera. Advanced nonparametric tests for
multiple comparisons in the design of experiments in computational intelligence and data mining:
experimental analysis of power. Information Sciences, 180(10):2044–2064, 2010. [Cited in pag. 40]
[70] B. Schölkopf, A. Smola, and K. R. Müller. Nonlinear component analysis as a kernel eigenvalue
problem. Neural Computation, 10(5):1299–1319, 1998.
[Cited in pag. 44]
[71] F. Bovolo, L. Bruzzone, and L. Carlin. A novel technique for subpixel image classification based
on support vector machine. IEEE Transactions on Image Processing, 19(11):2983–2999, 2010.
[Cited in pags. 50 and 78]
[72] B. M. Shahshahani and D. A. Landgrebe. The effect of unlabeled samples in reducing the small
sample size problem and mitigating the Hughes phenomenon. IEEE Transactions on Geoscience
and Remote Sensing, 32(5):1087–1095, 1994.
[Cited in pag. 50]
[73] S. Baluja. Probabilistic modeling for face orientation discrimination: learning from labeled and
unlabeled data. In Neural Information Processing Systems (NIPS'98), 1998. [Cited in pags. 50 and 78]
[74] T. Mitchell. The role of unlabeled data in supervised learning. In Proceedings of the Sixth
International Colloquium on Cognitive Science, pages 2–11, 1999.
[Cited in pags. 50 and 78]
[75] A. Fujino, N. Ueda, and K. Saito. A hybrid generative/discriminative approach to semi-supervised
classifier design. In AAAI'05 Proceedings of the 20th National Conference on Artificial Intelligence,
volume 20, page 764, 2005.
[Cited in pags. 50 and 78]
[76] D. Yarowsky. Unsupervised word sense disambiguation rivaling supervised methods. In Proceedings
of the 33rd Annual Meeting of the Association for Computational Linguistics, ACL'95, pages 189–196,
1995.
[Cited in pags. 50 and 78]
[77] C. Rosenberg, M. Hebert, and H. Schneiderman. Semi-supervised self-training of object
detection models. In Seventh IEEE Workshop on Applications of Computer Vision, 2005.
[Cited in pags. 50 and 78]
[78] V. R. De Sa. Learning classification with unlabeled data. In Advances in Neural Information
Processing Systems, 1994.
[Cited in pags. 50 and 78]
[Cited in pags. 50 and 78]
[79] U. Brefeld, T. Gärtner, T. Scheffer, and S. Wrobel. Efficient co-regularised least squares regression.
In Proceedings of the 23rd International Conference on Machine Learning, ICML'06, pages 137–144,
2006.
[Cited in pags. 50 and 78]
[80] A. Blum and S. Chawla. Learning from labeled and unlabeled data using graph mincuts. In
Proceedings of the Eighteenth International Conference on Machine Learning, ICML’01, pages 19–
26, 2001.
[Cited in pags. 50 and 78]
[81] O. Chapelle, M. Chi, and A. Zien. A continuation method for semi-supervised SVMs. In Proceedings
of the 23rd International Conference on Machine Learning, pages 185–192. ACM Press, 2006.
[Cited in pags. 50 and 78]
[82] G. Camps-Valls, T. Bandos, and D. Zhou. Semi-supervised graph-based hyperspectral image
classification. IEEE Transactions on Geoscience and Remote Sensing, 45(10):3044–3054, 2007.
[Cited in pags. 50, 54 and 78]
[83] S. Velasco-Forero and V. Manian. Improving hyperspectral image classification using spatial
preprocessing. IEEE Geoscience and Remote Sensing Letters, 6(2):297–301, 2009.
[Cited in pags. 50 and 78]
[84] D. Tuia and G. Camps-Valls. Semisupervised remote sensing image classification with cluster
kernels. IEEE Geoscience and Remote Sensing Letters, 6(2):224–228, 2009. [Cited in pags. 50 and 78]
[85] J. Li, J. M. Bioucas-Dias, and A. Plaza. Semi-supervised hyperspectral classification. In First
IEEE GRSS Workshop on Hyperspectral Image and Signal Processing, 2009.
[Cited in pag. 50]
[86] J. Li, J. M. Bioucas-Dias, and A. Plaza. Semi-supervised hyperspectral image segmentation using
multinomial logistic regression with active learning. IEEE Transactions on Geoscience and Remote
Sensing, 48(11):4085–4098, 2010.
[Cited in pags. 50, 51, 54 and 78]
[87] L. Bruzzone and C. Persello. A novel context-sensitive semisupervised SVM classifier robust to
mislabeled training samples. IEEE Transactions on Geoscience and Remote Sensing, 47(7):2142–2154,
2009.
[Cited in pag. 50]
[88] J. Muñoz-Mari, F. Bovolo, L. Gomez-Chova, L. Bruzzone, and G. Camps-Valls. Semisupervised
one-class support vector machines for classification of remote sensing data. IEEE Transactions on
Geoscience and Remote Sensing, 48(8):3188–3197, 2010.
[Cited in pag. 50]
[89] L. Gomez-Chova, G. Camps-Valls, L. Bruzzone, and J. Calpe-Maravilla. Mean map kernel methods
for semisupervised cloud classification. IEEE Transactions on Geoscience and Remote Sensing,
48(1):207–220, 2010.
[Cited in pag. 50]
[90] D. Tuia and G. Camps-Valls. Urban image classification with semisupervised multiscale cluster
kernels. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing,
4(1):65–74, 2011.
[Cited in pags. 50, 54 and 82]
[91] F. Ratle, G. Camps-Valls, and J. Weston. Semisupervised neural networks for efficient
hyperspectral image classification. IEEE Transactions on Geoscience and Remote Sensing,
48(5):2271–2282, 2010.
[Cited in pag. 50]
[92] J. Muñoz-Mari, D. Tuia, and G. Camps-Valls. Semisupervised classification of remote sensing
images with active queries. IEEE Transactions on Geoscience and Remote Sensing, 50(10):3751–3763,
2012.
[Cited in pag. 51]
[93] S. Rajan, J. Ghosh, and M. M. Crawford. An active learning approach to hyperspectral data
classification. IEEE Transactions on Geoscience and Remote Sensing, 46(4):1231–1242, 2008.
[Cited in pag. 51]
[94] W. Di and M. M. Crawford. Active learning via multi-view and local proximity co-regularization
for hyperspectral image classification. IEEE Journal of Selected Topics in Signal Processing, 5(3):618–
628, 2011.
[Cited in pag. 51]
[95] S. Patra and L. Bruzzone. A batch-mode active learning technique based on multiple uncertainty for
SVM classifier. IEEE Geoscience and Remote Sensing Letters, 9(3):497–501, 2012. [Cited in pag. 51]
[96] I. Dopido, J. Li, and A. Plaza. Semi-supervised active learning approach for hyperspectral image
classification: Application to multinomial logistic regression and support vector machines. In SPIE
Optics and Photonics, Satellite Data Compression, Communication, and Processing Conference,
2012.
[Cited in pag. 52]
[97] J. Li, J. M. Bioucas-Dias, and A. Plaza. Spectral-spatial hyperspectral image segmentation
using subspace multinomial logistic regression and Markov random fields. IEEE Transactions
on Geoscience and Remote Sensing, 50(3):809–823, 2012.
[Cited in pag. 53]
[98] J. Platt. Probabilistic outputs for support vector machines and comparisons to regularized
likelihood methods. In Advances in large margin classifiers, volume 10, pages 61–74. MIT Press,
2000.
[Cited in pags. 53 and 54]
[99] T. F. Wu, C. J. Lin, and R. C. Weng. Probability estimates for multiclass classification by pairwise
coupling. Journal of Machine Learning Research, 5:975–1005, 2004.
[Cited in pag. 54]
[100] Y. Tarabalka, M. Fauvel, J. Chanussot, and J. A. Benediktsson. SVM- and MRF-based method
for accurate classification of hyperspectral images. IEEE Geoscience and Remote Sensing Letters,
7(4):736–740, 2010.
[Cited in pag. 54]
[101] I. Dopido, J. Li, P. Gamba, and A. Plaza. A new semi-supervised approach for hyperspectral image
classification with different active learning strategies. In IEEE GRSS Workshop on Hyperspectral
Image and Signal Processing: Evolution in Remote Sensing (WHISPERS’12), 2012. [Cited in pag. 58]
[102] I. Dopido, J. Li, A. Plaza, and J. M. Bioucas-Dias. Semi-supervised active learning for
urban hyperspectral image classification. In IEEE International Geoscience and Remote Sensing
(IGARSS’12), pages 1586–1589, 2012.
[Cited in pag. 58]
[103] J. Borges, J. M. Bioucas-Dias, and A. Marçal. Evaluation of Bayesian hyperspectral imaging
segmentation with a discriminative class learning. In Proceedings of IEEE International Geoscience
and Remote Sensing Symposium, pages 3810–3813, 2007.
[Cited in pag. 78]
[104] J. Li, J. M. Bioucas-Dias, and A. Plaza. Supervised hyperspectral image segmentation using active
learning. In IEEE GRSS Workshop on Hyperspectral Image and Signal Processing: Evolution in
Remote Sensing (WHISPERS’10), pages 1–4, 2010.
[Cited in pag. 78]
[105] J. Li, J. M. Bioucas-Dias, and A. Plaza. Exploiting spatial information in semi-supervised
hyperspectral image segmentation. In IEEE GRSS Workshop on Hyperspectral Image and Signal
Processing: Evolution in Remote Sensing (WHISPERS’10), pages 1–4, 2010.
[Cited in pag. 78]
[106] A. Blum and T. Mitchell. Combining labeled and unlabeled data with co-training. In Proceedings
of the Eleventh Annual Conference on Computational Learning Theory, pages 92–100, 1998.
[Cited in pag. 78]
[107] X. Zhu, Z. Ghahramani, and T. Jaakkola. Semi-supervised learning with graphs. Technical
Report CMU-CALD-02-106, Carnegie Mellon University, 2005.
[Cited in pag. 78]
[108] Y. Zhong, L. Zhang, B. Huang, and P. Li. An unsupervised artificial immune classifier
for multi/hyperspectral remote sensing imagery. IEEE Transactions on Geoscience and Remote
Sensing, 44(2):420–431, 2006.
[Cited in pag. 78]
[109] I. Dopido, J. Li, P. Gamba, and A. Plaza. Semi-supervised classification of hyperspectral data
using spectral unmixing concepts. In Advances in Radar and Remote Sensing (TyWRRS),
pages 353–358, 2012.
[Cited in pag. 82]
[110] I. Dopido, J. Li, A. Plaza, and P. Gamba. Semi-supervised classification of urban hyperspectral
data using spectral unmixing concepts. In 8th IEEE GRSS/ISPRS Joint Workshop on Remote
Sensing and Data Fusion over Urban Areas (JURSE’13), pages 186–189, 2013.
[Cited in pag. 82]