A Classic Thesis Style

Transcription

UNIVERSITÀ DEGLI STUDI DI BARI
DIPARTIMENTO INTERATENEO DI FISICA
“Michelangelo Merlin”
PHD PROGRAM IN PHYSICS, XXVI CYCLE
Scientific Disciplinary Sector FIS/07
Quantitative MRI analysis
in Alzheimer’s disease
Supervisor: Prof. Roberto Bellotti
Coordinator: Prof. Gaetano Scamarcio
PhD candidate: Nicola Amoroso
ACADEMIC YEAR 2012-2013
CONTENTS

1 introduction
   1.1 Quantitative neuroscience frontiers
   1.2 The Physics contribution
   1.3 An overview
2 the hippocampus role in alzheimer's disease
   2.1 The Alzheimer's disease diagnosis
   2.2 Diagnosis criteria
   2.3 The need for a revision of Alzheimer's disease definition
      2.3.1 Performance issues of the NINCDS-ADRDA criteria
      2.3.2 Non-AD dementia involvement
      2.3.3 Histopathological improvements
      2.3.4 AD phenotype
   2.4 Revising the AD definition
   2.5 The need for early intervention
   2.6 The AD biomarkers
   2.7 The Hippocampus anatomy
      2.7.1 The hippocampal head
      2.7.2 The hippocampal body
      2.7.3 The hippocampal tail
      2.7.4 General features
3 magnetic resonance imaging
   3.1 The Magnetic Resonance effect
   3.2 Spin-magnetic field coupling
   3.3 The Motion equation
      3.3.1 The Resonance
   3.4 Magnetization and Relaxation
      3.4.1 Relaxation times
   3.5 The Bloch Equation
   3.6 Signal Acquisition
   3.7 Image noise and contrast
      3.7.1 Signal to noise ratio and spatial resolution
      3.7.2 Contrast: Proton density, T1 and T2 weighting
   3.8 Segmentation algorithms: state of the art
4 the hippocampus segmentation
   4.1 A combined strategy for segmentation
   4.2 Why an automated segmentation?
   4.3 Materials: database properties
   4.4 Preprocessing, automated registration
      4.4.1 Registration Methodology
      4.4.2 Rigid transformations
      4.4.3 Non rigid transformations
      4.4.4 Intensity based registration
      4.4.5 Similarity Measures
      4.4.6 Optimization algorithms for registration
   4.5 Shape Analysis
      4.5.1 SPHARM analysis
      4.5.2 SPHARM description
      4.5.3 The SPHARM average shape algorithm
   4.6 A novel FAPoD algorithm
      4.6.1 Simulated Data
      4.6.2 Shape model construction
      4.6.3 Modeling the variations
   4.7 Ensemble Classifier Segmentation
      4.7.1 Voxel-wise analysis with machine learning
      4.7.2 Feature Extraction
      4.7.3 Classification methods
      4.7.4 Random Forests
      4.7.5 RUSBoost
   4.8 Analyses on distributed infrastructures
      4.8.1 Medical imaging and distributed environments
      4.8.2 Workflow managers
      4.8.3 Workflow Implementation
      4.8.4 Distributed infrastructure employment
      4.8.5 Grid services and interface
      4.8.6 Security and Data Management
      4.8.7 Segmentation workflow deployment
      4.8.8 Workflow setup
      4.8.9 Summary
5 experimental results
   5.1 Hippocampus Segmentation
      5.1.1 Exploratory Analysis
      5.1.2 VOI extraction
      5.1.3 Correlation analysis
      5.1.4 Feature importance
      5.1.5 Random Forest classification
      5.1.6 RUSBoost classification
      5.1.7 The segmentation error
      5.1.8 Statistical agreement assessment
   5.2 Alzheimer's disease classification
      5.2.1 Stability of the results
      5.2.2 The method uncertainty
   5.3 Distributed infrastructure exploitation
6 conclusions
   6.1 Motivations and summary
   6.2 Segmentation
   6.3 Clinical Classification
   6.4 Computation
index
1
INTRODUCTION
Neuroscience is adopting quantitative methods of analysis and new computing technologies to unveil the brain's levels and functions. The contributions Physics can give in terms of data mining strategies and big data analyses range from the methodological to the computational level.
1.1
quantitative neuroscience frontiers
Neuroscience is generating exponentially growing volumes of data and knowledge on specific aspects of the healthy and diseased brain, in different species and at different ages. However, there is as yet no effective strategy to experimentally map the brain across all its levels and functions. Modern supercomputing technology offers a solution, making it possible to integrate the data into detailed computer-reconstructed models and simulations of the brain. Brain models and simulations allow researchers to predict missing data and principles and to take measurements and perform experimental manipulations that would be ethically or technically impossible in animals or humans. Combined with new knowledge from big data projects across the world, in silico neuroscience has the potential to reveal the detailed mechanisms leading from genes to cells and circuits and ultimately to cognition and behavior, the very heart of that which makes us human.
According to this new perspective, the first decade of this century has seen
the spread of several notable initiatives in the neuroscience field. In 2003 the
Allen Brain Atlas project was initiated, with a 100 million dollar donation. The
main goals of the project were to advance the research and knowledge about
neurobiological conditions and to release an open platform for both data and
findings, in order to allow researchers from other fields all over the world
to take them into account while designing their own experiments. The main
contribution to neuroscience has been the map of gene expression in the brain
allowing researchers to correlate forms and functions and to compare patterns
of healthy subjects with those affected by a disease.
A couple of years later, in 2005, another important project was funded, the Blue Brain Project. This project, led by the École Polytechnique Fédérale de Lausanne, is important not only for its scientific relevance: although primarily funded by Switzerland, it is an example of the European Future and Emerging Technologies flagship program. Briefly, the project represents the first attempt to recreate the brain synthetically. Its first goal was to build a realistic and detailed model of the neocortical column. Once this goal was achieved, further investigation aimed at larger and more accurate models took place.
In the last few years, a further major effort has been made. In 2009 the Human Connectome Project was launched, a five-year project sponsored by the US National Institutes of Health. It is part of the Blueprint Grand Challenges and its goal is to map the connections within the healthy brain. It is expected to help answer questions about how genes influence brain connectivity, and how this in turn relates to mood, personality and behavior. Another of its goals is the optimization of brain imaging techniques to see the brain's wiring in unprecedented detail. However, it is 2013 that will eventually be recorded as the neuroscience annus mirabilis: in January and April, respectively, the Human Brain Project and the BRAIN Initiative (Brain Research through Advancing Innovative Neurotechnologies) were funded by the European Commission and the US administration.
The main reason behind these efforts is that understanding the human brain
is one of the greatest challenges facing 21st-century science. The rewards of this challenge are enormous:
a. gaining profound insights into brain functionality and into what makes us human;
b. development of new treatments for brain diseases;
c. implementation of revolutionary new computing technologies.
At present, for the first time in history, modern Information and Communications Technology (ICT) seems to have brought these goals within sight.
1.2
the physics contribution
Medicine is experiencing a data explosion, primarily driven by advances in genetics and imaging technology. However, effective strategies to integrate the data and to identify the unique "biological signatures" of neurological and psychiatric diseases are still lacking. In this sense the contribution of physics methodologies, derived from the expertise gained in analyzing and interpreting big data, for example in high-energy or nuclear physics, can be strategic.
In fact, new databasing and data mining technologies now make it possible to federate and analyze the huge volumes of data accumulating in hospital archives, eventually leading researchers to identify the biological changes associated with disease and opening new possibilities for early diagnosis and personalized medicine. In the longer term, a multidisciplinary integrated approach will make it possible to modify models of the healthy brain so as to simulate disease. Disease simulation will provide researchers with a powerful new tool to probe the causal mechanisms responsible for disease and to screen putative treatments, accelerating medical research and reducing the huge suffering and costs associated with diseases of the brain.
The most urgent tasks neuroscience has to address are: to federate data of clinical, genetic and imaging provenance; to develop tools that make it possible to extract unique disease signatures or to unveil patterns; to perform multi-level data analysis in order to combine different information sources; and to define classification models based on biological features and markers for brain diseases. These tasks naturally require the development of a new class of data classification techniques, new computational strategies and algorithmic solutions. Above all, they require a comprehensive approach, able to take into account both scientific and computational concerns.
The need for dedicated informatics infrastructures to manage workflows for "big data" analyses can only be addressed in a multidisciplinary framework. Specific analytic methodologies and infrastructures that will jointly enable researchers and physicians to determine the biological signatures of brain diseases have to be carefully planned to make efficient use of the distributed computing structures modern ICT provides. In particular, two research lines are explored in this work:
a. The development, implementation and validation of pattern recognition algorithms capable of dealing with data (both numerical and textual) that are heterogeneous in nature, come from multiple sites and times, and comprise at least medical images and meta-data (concerning age, sex, etc.). The algorithms should identify reliable and robust patterns within the data and identify categories sharing common disease signatures. This is realized through the development of a fully automated segmentation workflow and the use of its outputs as clinical signatures for Alzheimer's disease.
b. The development and implementation of data workflows for the automated processing of magnetic resonance scans. The proposed solution includes state-of-the-art methods for quality control and traceability of the workflow. A further, by no means secondary, concern is the adoption of reliable security protocols.
Accordingly, the present work discusses how the typical methodologies of physics analyses apply to the construction of supervised classifications in medical imaging. Particular focus is given to hippocampal segmentation, a challenging task which involves all the major issues previously presented.
1.3
an overview
The Hippocampus is a small brain structure which plays a relevant role in a number of physiological processes. In particular, a strong connection between its atrophy and the manifestation of several neurodegenerative diseases has been widely established. In Chap. 2 the particular case of Alzheimer's disease is discussed, together with the privileged role the Hippocampus plays in the diagnosis of this disease. The chronological development of the Alzheimer's disease diagnosis criteria is examined, along with how, in recent years, several imaging biomarkers have proved to be supportive features for the diagnosis. A major focus is given to hippocampal atrophy. The hippocampal volume is correlated with neurodegenerative processes; diseased brains consequently show atrophy, and in Alzheimer's disease this is precisely what happens to the Hippocampus. It is therefore possible to detect pathological conditions by comparing healthy hippocampal volumes with those of the examined subjects.
The Hippocampus is a brain structure with a peculiar convoluted shape, surrounded by other brain structures sharing the same composition; it is therefore difficult to segment even for expert neuroradiologists. These difficulties and their relation to hippocampal anatomy are pointed out at the end of Chap. 2. Nevertheless, structural imaging has in the last decades seen hardware innovations which, if conveniently exploited, could make this task easier than before. Specifically, the introduction of high-field magnetic resonance scanners is the most noticeable technical improvement, as it increases the signal-to-noise ratio of the scans and therefore improves image quality. In Chap. 3 an overview of the physical concepts of magnetic resonance leading to the development of magnetic resonance imaging is given; besides, signal and noise are discussed with particular attention to the latest developments in high-field scanners.
In the last decades hippocampal segmentation has relied principally on manual or semi-automated protocols; the ICT improvements of recent years seem to have pushed several scientists to make greater efforts to reach fully automated procedures. In Chap. 4 a detailed description of the most recent strategies to achieve fully automated hippocampal segmentation is provided. Further insight is then given into the weaknesses of previous methodologies and, accordingly, into the novel solutions proposed in this work. The proposed methodology relies on the use of supervised learning algorithms to decide which voxels (the units of 3-D images) of a magnetic resonance scan should be labeled as belonging to the Hippocampus. It makes use of modern classifiers such as Random Forests and RUSBoost. As will be explained in Chap. 4, this choice is strictly tied to the particularly high class imbalance characterizing hippocampal images, i.e., the imbalance between the hippocampal volume and the background. Given the small hippocampal dimensions, this imbalance is always present.
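As an illustration of this voxel-wise formulation, the following is a minimal sketch of imbalance-aware voxel classification with scikit-learn; the feature matrix X and the labels y are synthetic stand-ins for the features extracted in Chap. 4, and the class-weighting option shown is one standard way to handle the imbalance, not necessarily the exact configuration adopted in this work.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-ins: per-voxel feature vectors and binary labels
# (1 = hippocampus, 0 = background) with a realistic ~1% foreground rate.
rng = np.random.default_rng(0)
X = rng.normal(size=(100_000, 20))
y = (rng.random(100_000) < 0.01).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# "balanced_subsample" reweights each bootstrap sample so the rare
# hippocampal class is not swamped by the background voxels.
clf = RandomForestClassifier(n_estimators=200,
                             class_weight="balanced_subsample",
                             n_jobs=-1, random_state=0)
clf.fit(X_tr, y_tr)

proba = clf.predict_proba(X_te)[:, 1]  # per-voxel membership score
mask = proba > 0.5                     # thresholded segmentation labels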
The automated segmentation algorithm described in Chap. 4 is designed not only to provide a fully automated computer-aided detection system, but also to provide a general scheme for medical imaging workflows. In fact, the proposed analysis is completely modular and is able to employ several kinds of distributed computing infrastructures. Thus, in Chap. 4 a detailed description of the workflow implementation is given; besides, further insight into the exploitation of distributed infrastructures and into security and data management is furnished. The implemented solutions allow the dynamic use of local computer clusters, such as the BC2S farm of the Istituto Nazionale di Fisica Nucleare in Bari, or of the geographically distributed grid. Moreover, the proposed solution not only complies with modern security standard protocols but also achieves a null failure rate thanks to automated job re-submission tools.
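The re-submission mechanism can be illustrated with a short sketch; submit_job and job_succeeded below are hypothetical placeholders for the actual grid middleware calls, and the back-off policy is purely illustrative.

import random
import time

def submit_job(job):
    """Hypothetical stand-in for the real grid submission call."""
    return {"job": job, "ok": random.random() > 0.3}  # simulate 30% failures

def job_succeeded(handle):
    """Hypothetical stand-in for polling the middleware for job status."""
    return handle["ok"]

def run_with_resubmission(job, max_retries=5, backoff_s=1.0):
    """Re-submit a failed job until it succeeds or retries run out."""
    for attempt in range(1, max_retries + 1):
        handle = submit_job(job)
        if job_succeeded(handle):
            return handle
        time.sleep(backoff_s * attempt)  # simple linear back-off
    raise RuntimeError(f"job failed after {max_retries} attempts")

print(run_with_resubmission("segment_subject_001"))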
In Chap. 5 the results are presented. Particular emphasis is given to the segmentation performance, which compares well with other state-of-the-art algorithms. It is worthwhile to note that these results are obtained without any assumption or knowledge about the clinical state of the examined subjects. The segmentation performance is then studied from another perspective using a different validation database. For this dataset a longitudinal analysis is performed and a classification probability is given for the Alzheimer's disease diagnosis. The main result in this case is that hippocampal atrophy is confirmed as a biomarker of Alzheimer's disease, and that according to the volumetric measures obtained through segmentation it is possible to distinguish between healthy subjects and those affected by mild cognitive impairment or Alzheimer's disease itself. These distinctions are statistically significant. The chapter concludes with a detailed description of the computational performance; in particular, major attention is given to the strategy employed to decrease the computational times and the job failure rate.
The presented results are finally discussed in Chap. 6, where some conclusions and suggestions are drawn and possible directions for future research are addressed.
2
THE HIPPOCAMPUS ROLE IN
ALZHEIMER’S DISEASE
Alzheimer's disease is one of the most common neurodegenerative diseases. One of the supportive features for its diagnosis is the assessment of medial temporal lobe atrophy through structural magnetic resonance imaging. Particularly relevant to this aim is the volumetric measurement of the Hippocampus.
2.1
the alzheimer’s disease diagnosis
Alzheimer's disease (AD) is the most common type of dementia [1]. "Dementia" is by definition a term describing a variety of diseases and conditions affecting the normal functions of the brain. The death or malfunction of neurons in fact causes changes in one's memory, behavior and capability of thinking clearly. The pathologic characteristics are degeneration of specific nerve cells, presence of neuritic plaques and, in some cases, deficits in the noradrenergic and somatostatinergic systems that innervate the telencephalon.
For research purposes the AD diagnosis is based on general criteria usually
defined by the Diagnostic and Statistical Manual of Mental Disorders fourth
edition (DSM-IV) [2] and specifically by the National Institute of Neurological
and Communicative Disorders and Stroke (NINCDS) [3] and by the Alzheimer’s Disease and Related Disorders Association (ADRDA) [3]. These criteria
have been extremely useful and have survived intact without modification for more than a quarter of a century. According to them, the following requirements
are needed to support a dementia diagnosis:
a. Symptoms must include decline in memory and cognitive functions as:
a) the ability to speak or understand spoken or written language;
b) the ability to recognize or identify objects;
c) the ability to perform motor activities;
d) the ability to think abstractly, make sound judgments and carry out
complex tasks.
b. The decline in cognitive abilities must be severe enough to have repercussions in everyday life.
From a physician's point of view, establishing a dementia diagnosis is therefore equivalent to determining the causes of the above-cited symptoms; in fact, some conditions can cause symptoms that mimic dementia but that can be reversed with treatment. This is statistically found in about 10% of dementia cases; common causes are depression, delirium, medication side effects, thyroid problems, vitamin deficiencies and alcohol abuse. In contrast, AD and other dementias are caused by damage to neurons that cannot be reversed with current treatments.
Different types of dementia are associated with distinct symptom patterns and brain abnormalities, as described in the following table:

Alzheimer's disease: Most common type of dementia; accounts for an estimated 60 to 80 percent of cases. Difficulty remembering names and recent events is often an early clinical symptom; apathy and depression are also often early symptoms. Later symptoms include impaired judgment, disorientation, confusion, behavior changes and difficulty speaking, swallowing and walking. Hallmark brain abnormalities are deposits of the protein fragment beta-amyloid (plaques) and twisted strands of the protein tau (tangles).

Vascular dementia: Previously known as multi-infarct or post-stroke dementia, vascular dementia is less common as a sole cause of dementia than AD. Impaired judgment or ability to make plans is more likely to be the initial symptom, as opposed to the memory loss often associated with the initial symptoms of Alzheimer's. Vascular dementia occurs because of brain injuries; the location of the brain injury determines how the individual's thinking and physical functioning are affected.

Dementia with Lewy bodies (DLB): People with DLB have some of the symptoms common in AD, but are more likely than people with Alzheimer's to have initial or early symptoms such as sleep disturbances or well-formed visual hallucinations. DLB alone can cause dementia, or it can occur with AD and/or vascular dementia; the individual is said to have "mixed dementia" when this happens.

Frontotemporal lobar degeneration (FTLD): Typical symptoms include changes in personality and behavior. Nerve cells in the front and side regions of the brain are especially affected. No distinguishing microscopic abnormality is linked to all possible cases.

Parkinson's disease: As Parkinson's disease progresses it often results in a severe dementia similar to DLB or AD. Problems with movement are a common symptom early in the disease. The incidence of Parkinson's disease is about one tenth that of AD.

Table 1: A schematic view of the different types of dementia and of their principal symptom patterns.
A diagnosis of AD is most commonly made by an individual's primary care physician. The physician obtains relevant information such as family history, including psychiatric history and history of cognitive and behavioral changes. He also conducts cognitive tests and physical and neurological examinations; in particular, he can request the individual to undergo magnetic resonance imaging (MRI) scans. MRI scans may help identify brain changes, such as the presence of a tumor or the evidence of a stroke, that could explain the individual's symptoms. With the continuous developments in the neuroimaging field, MRI is no longer helpful only to exclude other causes of the individual's symptoms: it has also become a supportive feature for early AD diagnosis [4].
2.2
diagnosis criteria
Since the early 1980s the established criteria for AD diagnosis have been based on discriminant features, supportive features and consistent features, as schematically reported in the following.
a. The criteria for the clinical diagnosis of probable Alzheimer’s disease include:
• Dementia established according to the Mini-Mental State Examination [5], the Blessed Dementia Scale [6] or a similar examination;
• Deficits in two or more areas of cognition;
• Progressive worsening of memory and other cognitive functions;
• No disturbance of consciousness;
• Onset between ages 40 and 90, most often after age 65;
• Absence of systemic disorders or other brain diseases that could
account for the progressive deficits in memory and cognition.
b. The diagnosis of probable AD is supported by:
• Progressive deterioration of specific cognitive functions such as language (aphasia), motor skills (apraxia) and perception (agnosia);
• Impaired activities of daily living and altered patterns of behavior;
• Family history of similar disorders, particularly if confirmed neuropathologically;
• Laboratory results of: normal lumbar puncture as evaluated by standard techniques; normal pattern or nonspecific changes in EEG, such as increased slow-wave activity; evidence of cerebral atrophy, with progression documented by serial observation.
c. Other clinical features consistent with the diagnosis of probable AD, after
the exclusion of causes of dementia other than AD, include:
• Plateaux in the course of progression of the illness;
• Associated symptoms of depression, insomnia, incontinence, illusions, hallucinations, catastrophic verbal, emotional or physical outbursts, sexual disorders and weight loss;
• Other neurological abnormalities in some patients, especially with
more advanced disease and including motor signs such as increased
muscle tone, myoclonus or gait disorder;
• Seizures in advanced disease;
• Computed tomography normal for age.
d. The criteria for a diagnosis of definite Alzheimer's disease are:
• The clinical criteria for probable AD;
• Histopathological evidence obtained from a biopsy or autopsy.
It is worthwhile to note that a definite diagnosis of AD is only made when there is histopathological confirmation of the clinical diagnosis. These widely adopted criteria have been extremely useful and have survived intact without modification for more than a quarter of a century. However, since the publication of the NINCDS-ADRDA criteria in 1984, the comprehension of the biological basis of AD has advanced greatly, allowing the disease process to be understood in an unprecedented way. Distinctive markers of the disease are now recognized, including structural brain changes visible on MRI, with early and extensive involvement of the medial temporal lobe (MTL); molecular neuroimaging changes seen with PET, with hypometabolism or hypoperfusion in temporoparietal areas; and changes in cerebrospinal fluid (CSF) biomarkers.
2.3
the need for a revision of alzheimer’s disease definition
By 2009, broad consensus existed throughout academia and industry that the
NINCDS-ADRDA criteria should be revised in order to incorporate scientific
advances in the field. As a consequence of the technological advancements and the improvements in the comprehension of the disease, particularly intense efforts have been made to characterize the earliest stages of AD [7].
The original NINCDS-ADRDA criteria rested on the notion that AD is a clinical-pathological entity. The criteria were designed with the expectation that in most cases subjects who met the clinical criteria would have AD pathology as the underlying etiology if they came to autopsy. It was believed that AD, like many other brain diseases, always exhibited a close correspondence between clinical symptoms and the underlying pathology, such that AD pathology and clinical symptoms were synonymous: individuals either had fully developed AD pathology and were demented, or they had neither. However, it has since become clear that this clinical-pathological correspondence is not always consistent. Extensive AD pathology, particularly diffuse amyloid plaques, can be present in the absence of any obvious symptoms [8, 9].
Additionally, AD pathophysiology can manifest itself with clinically atypical presentations, such as prominent language and visuospatial disturbances [10, 11]. The 1984 criteria accounted neither for cognitive impairment that did not reach the threshold for dementia, nor for the fact that AD develops slowly over time, with dementia representing the end stage. This is why a debate about the revision of the AD diagnostic criteria has developed in recent years.
2.3.1 Performance issues of the NINCDS-ADRDA criteria
The NINCDS-ADRDA criteria have been validated against neuropathological gold standards, with accuracy ranging from 65% to 96% [12, 13]. However, the specificity of these diagnostic criteria against other dementias is only 23% to 88% [14, 15]. The accuracy of these estimates is difficult to assess, given that the neuropathological standard is not the same in all studies. Nevertheless, the low specificity must be addressed through both revised AD criteria and accurate non-AD dementia diagnostic criteria. This is very important because it can be seen as the main motivation for the search for alternative supporting diagnosis features.
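For reference, sensitivity, specificity and accuracy are the usual confusion-matrix quantities; the following sketch uses made-up counts, not data from the cited studies.

def diagnostic_metrics(tp, fn, tn, fp):
    """Confusion-matrix summary: TP/FN refer to diseased subjects,
    TN/FP to non-diseased ones."""
    sensitivity = tp / (tp + fn)            # diseased correctly flagged
    specificity = tn / (tn + fp)            # non-diseased correctly cleared
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy

# Made-up illustrative counts, not data from the cited validation studies:
print(diagnostic_metrics(tp=80, fn=20, tn=60, fp=40))  # (0.8, 0.6, 0.7)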
2.3.2 Non-AD dementia involvement
Since the publication of the NINCDS-ADRDA criteria, the operational definition and characterization of non-AD dementia has improved. Entities for which there are diagnostic criteria include the frontotemporal lobar degenerations (frontotemporal dementia frontal variant, semantic dementia, progressive non-fluent aphasia) [16, 17, 18], corticobasal degeneration [19, 20], posterior cortical atrophy [21], dementia with Lewy bodies [22] and vascular dementia [23, 24]. Many of these disorders can fulfill the NINCDS-ADRDA criteria and it is likely that they have been included in AD research studies. Meanwhile, for each of these disorders, criteria have been developed that aim for high specificity. The development of disease-specific criteria that are applicable in some cases before dementia is fully manifested has enabled the criteria to be used without going through the two-step process of recognizing dementia (the syndrome) and then the specific disease (the aetiology).
2.3.3 Histopathological improvements
The histopathological diagnosis of the non-AD dementias has also advanced. In the example of frontotemporal lobar degeneration, the identification of ubiquitin-immunoreactive cytoplasmic and intranuclear inclusions as an important pathology has reduced the neuropathological diagnostic prevalence of dementia lacking distinctive histopathology from 40% to 10% in autopsy series [25, 26]. There is no doubt that progress in the clinical definition of non-AD dementia improves the sensitivity of the currently accepted diagnostic criteria for AD by reducing the level of uncertainty. Another important improvement is yielded by a more detailed knowledge of the AD phenotype.
2.3.4 AD phenotype
In most patients (86% to 94%) there is a progressive amnestic core that appears as an impairment of episodic memory [27, 28]. The pathological pathway of Alzheimer's-related changes has been fully described [29, 30] and involves the medial temporal structures (e.g., entorhinal cortex, hippocampal formation, parahippocampal gyrus) early in the course of the disease. Moreover, the episodic memory disorder of AD correlates well with the distribution of neurofibrillary tangles within the MTL and with MRI volumetric loss of the Hippocampus, structures known to be critical for episodic memory. The availability of neuroimaging techniques that can reliably measure the MTL has further supported this vital clinico-neuroanatomic correlation.
2.4
revising the ad definition
The previously described advances provide in vivo evidence of the disease and hence the necessity to conceptualize a novel diagnostic framework, with particular attention to atypical presentations and early stages [31], as shown in the following table:

Alzheimer's disease: This diagnostic label is restricted to the clinical disorder. The AD diagnosis is established in vivo, relying on evidence of both specific memory changes and in vivo markers.

Preclinical states of AD: There is a long asymptomatic phase between the earliest pathogenic events/brain lesions of AD and the first appearance of specific cognitive changes. Two preclinical states can be isolated at present: an asymptomatic at-risk state for AD, characterized by in vivo evidence of amyloidosis in the brain or in the CSF; and a presymptomatic AD state, which applies to individuals from families affected by rare autosomal dominant AD mutations.

Prodromal AD: This term refers to the early symptomatic phase. It is determined by clinical symptoms, not severe enough to affect daily living, together with biomarker evidence. It is possible that future AD definitions will include this phase.

AD dementia: AD dementia refers to cases in which clinical symptoms start to affect daily activities.

Typical AD: When an early, significant and progressive episodic memory deficit appears and is followed by other cognitive impairments (such as executive dysfunction or language impairment), the clinical phenotype of AD is matched and the individuals are affected by typical AD.

Atypical AD: This label refers to less common but well-characterized clinical phenotypes such as primary progressive non-fluent aphasia or posterior cortical atrophy.

Mixed AD: This term refers to patients who fully fulfill the diagnostic criteria of typical AD and are additionally affected by other comorbid disorders, such as, possibly, DLB.

Alzheimer's pathology: Alzheimer's pathology refers to the neurobiological changes that span the earliest pathogenic events, even when clinical manifestations are lacking.

Mild cognitive impairment (MCI): This term applies to individuals with measurable cognitive impairment in the absence of a significant effect on daily living. The label is applied if there is no pathology to which the impairment can be attributed; it remains a threshold category for individuals who are suspected to be affected by AD but do not fulfill the diagnostic criteria, either because they deviate from the clinical phenotype of prodromal AD or because they are biomarker negative.

Table 2: The revised lexicon of AD. Particular attention is given to recent advances in the use of reliable biomarkers of AD for the definition of early stages of the disease, such as prodromal AD, or of ambiguous situations, as for MCI.
A definition of prodromal AD has been introduced to take into account the symptomatic pre-dementia phase of AD, generally included in the mild cognitive impairment category, which is characterized by symptoms not severe enough to meet the accepted diagnostic criteria for AD [32]. It must be distinguished within the broad and heterogeneous state of cognitive functioning that falls outside normal aging. This state has been described by a wide range of nosological terms, including age-associated memory impairment, age-related cognitive decline, age-associated cognitive decline, mild neurocognitive disorder, cognitively impaired not demented, and mild cognitive impairment [33, 34, 35, 36]. Mild cognitive impairment (MCI) is the most widely used diagnostic term for the disorder in individuals who have subjective memory or cognitive symptoms, objective memory or cognitive impairment, and whose activities of daily living are generally normal. Progression to clinically diagnosable dementia occurs at a higher rate from MCI than from an unimpaired state, but is clearly not the invariable clinical outcome at follow-up. Therefore, a more refined definition of AD is still needed to reliably identify the disease at its earliest stages [37].
2.5
the need for early intervention
The rapid growth of knowledge about the potential pathogenic mechanisms of AD, including amyloidopathy and tauopathy, has spawned numerous experimental therapeutic approaches that have entered clinical trials. There is accruing evidence that, years before the onset of clinical symptoms, an AD process is evolving along a predictable pattern of progression in the brain [29, 30]. The neurobiological advantage of earlier intervention within this cascade is clear. Earlier intervention with disease-modifying therapies is likely to be more effective when there is a lower burden of amyloid and hyperphosphorylated tau, and may truncate the ill effects of secondary events due to inflammation, oxidation, excitotoxicity and apoptosis [4]. Early intervention may also be directly targeted against these events, because they may play an important role in the early phases of AD. By the time there is clear functional disability, the disease process is significantly advanced and even definitive interventions are likely to be suboptimal. Revised research criteria would allow diagnosis when symptoms first appear, before full-blown dementia, thus supporting earlier intervention at the prodromal stage. In this sense a sound definition of mild cognitive impairment is necessary; nevertheless, several issues still stand, especially with regard to randomized controlled trials.
With only small variations in the inclusion criteria for mild cognitive impairment, four trials (ADCS-MIS, InDDEx, Gal-Int 11 and Rofecoxib) have had a very wide range of annual rates of progression to AD dementia [38, 39, 40, 41]. The intention in these trials on mild cognitive impairment was to include many individuals with prodromal AD (i.e., individuals with symptoms not sufficiently severe to meet currently accepted diagnostic criteria) who later progress to meet these criteria. When the mild cognitive impairment inclusion criteria of these trials were applied to a cohort of memory clinic patients in an observational study, they had diagnostic sensitivities of 46% to 88% and specificities of 37% to 90% in identifying prodromal AD [42].
Given these numbers, these trials have clearly treated many patients who do not have AD or who were not going to progress to AD for a long time. This has diluted the potential for a significant treatment effect and may have contributed to the negative outcomes, in which none of these drugs was successful at delaying the time to diagnosis of AD. These trials have also incurred significant costs, with sample sizes of 750 to 1000 and durations of 3 to 4 years having been called for. Increasing the severity of mild cognitive impairment needed for inclusion in trials might improve sensitivity, specificity and predictive values. However, participants would then be much closer to the current dementia threshold and would have a greater pathological burden, making the clinical gain marginal and disease modification difficult.
Neuropathological findings in mild cognitive impairment have also underlined the heterogeneity of the clinical disorders subsumed under this definition. To address the recognized clinical and pathological heterogeneity, it has been proposed that subtyping of mild cognitive impairment might be useful. The term amnestic mild cognitive impairment has been proposed to include individuals with subjective memory symptoms and objective memory impairment, with other cognitive domains and activities of daily living generally assessed as being normal. However, only 70% of a selected cohort of people with amnestic mild cognitive impairment clinically identified as having progressed to dementia actually met neuropathological criteria for AD [43]. This finding indicates that applying the criteria for this subtype of mild cognitive impairment clinically, without other evidence such as neuroimaging or cerebrospinal fluid analyses, will lack specificity for predicting the future development of AD, since at least 30% of cases will have non-AD pathology.
In the planning of trials of disease-modifying treatments, special care will be needed not only to limit the exposure to potentially toxic therapies to those with prodromal AD, but also to reliably exclude those who are destined to develop non-AD dementia.
2.6
the ad biomarkers
Over the two decades since the NINCDS-ADRDA criteria were published, great progress has been made in identifying the AD-associated structural and molecular changes in the brain and their biochemical footprints. MRI enables detailed visualization of the MTL structures implicated in the core diagnostic feature of AD; PET with fluorodeoxyglucose (FDG) has been approved in the USA for diagnostic purposes and is sensitive and specific in detecting AD in its early stages; and cerebrospinal fluid biomarkers for detecting the key molecular pathological features of AD in vivo are available and can be assessed reliably [44, 45, 46].
Accordingly, novel frameworks have been proposed for the designation of probable AD [4], while retaining the old ones. In particular, the novel frameworks address the disease presentation that is typical for AD. Atypical presentations are excluded, such as those presenting focal cortical syndromes (primary progressive aphasia, visuospatial dysfunction), for which ante mortem diagnosis would at best receive the designation of possible AD from the framework itself. This may change in the future as work on diagnostic biomarkers advances and reliance on a well-characterized clinical phenotype is lessened. In the absence of completely specific biomarkers, the clinical diagnosis of AD can still only be probabilistic, even in the case of typical AD. To meet the criteria for probable AD, an affected individual must fulfill the core criterion and at least one of the following supportive features:
a. The core clinical criterion:
a) gradual and progressive change in memory function at disease onset reported by patients or informants for a period greater than 6 months.
The reporting of subjective memory complaints is a common symptom in the aging population; however, such self-reported symptoms are associated with a higher risk of future development of AD and therefore should be carefully taken into account.
b) Objective evidence of significantly impaired episodic memory on testing. A diagnosis of AD requires an objective deficit on memory testing. This generally consists of a recall deficit that does not improve significantly or does not normalize with cueing or recognition testing, after effective encoding of information has been controlled for.
c) The episodic memory impairment can be isolated or associated with other cognitive changes at the onset of AD or as AD advances. In most cases, even at the earliest stages of the disease, the memory disorder is associated with other cognitive changes. As AD advances, these changes become notable and can involve the following domains: executive function (conceptualization, with impaired abstract thinking; working memory, with decreased digit span or mental ordering; activation of mental set, with decreased verbal fluencies); language (naming difficulties and impaired comprehension); praxis (impaired imitation, production and recognition of gestures); and complex visual processing and gnosis (impaired recognition of objects or faces).
b. Supportive features
a) Atrophy of medial temporal structures on MRI. Atrophy of the MTL on MRI is common in AD (71% to 96%, depending on disease severity), frequent in mild cognitive impairment (59% to 78%), but less frequent in normal aging (29%) [47, 48]. MTL atrophy is related to the presence of AD neuropathology and to its severity, both in terms of fulfillment of AD neuropathological criteria and of Braak stages [44, 49]. MRI measurements of MTL structures include qualitative ratings of the atrophy of the hippocampal formation and quantitative techniques with tissue segmentation and digital computation of volume. Both techniques can reliably separate AD group data from normal age-matched control group data, with sensitivities greater than 85% [50, 51, 52, 53].
In studies of mild cognitive impairment the accuracy of MTL atrophy measures in identifying prodromal AD has been generally lower, possibly because individuals who did not meet currently accepted AD diagnostic criteria at study completion included some cases that would have done so at a later time. Qualitative MTL ratings can identify prodromal AD; however, the sensitivities and specificities, respectively 51% to 70% and 68% to 69%, at present limit their usefulness. The predictive usefulness of quantitative measures of hippocampal volume in identifying prodromal AD is inconsistent; measures of hippocampal subfields might be more useful than measures of the entire structure [54, 55].
In turn, there is a potential incremental value of MTL measurements. In several studies, MTL measures (quantitative and qualitative) contributed independently of memory scores to the identification of prodromal AD. The reported accuracy increased from 74% to 81% and from 88% to 96% when MTL measures were added to age and memory scores, respectively. Inclusion of MTL atrophy as a diagnostic criterion of AD, irrespective of the age at onset, mandates the exclusion of other causes of MTL structural abnormality, including bilateral ischaemia, bilateral hippocampal sclerosis, herpes simplex encephalitis and temporal lobe epilepsy [56, 57, 58, 59].
b) Abnormal cerebrospinal fluid biomarkers. In the NINCDS-ADRDA guidelines, cerebrospinal fluid examination was recommended as an exclusion procedure for non-AD dementia due to inflammatory disease, vasculitis or demyelination. Since then, there has been a great deal of research into the usefulness of AD-specific biomarkers that reflect the central pathogenic processes of amyloid-β aggregation and hyperphosphorylation of the τ protein. These markers include amyloid β1-42, total τ and phospho-τ. In AD the concentration of β1-42 in cerebrospinal fluid is low, while those of total τ and phospho-τ are high compared with healthy controls. Combinations of abnormal markers have reached sensitivities and specificities greater than 90% and 85%, respectively [60].
c) Specific metabolic pattern evidence with molecular neuroimaging methods. PET and single photon emission computed tomography (SPECT) are in vivo nuclear radioisotopic scans that can measure blood flow, glucose metabolism and, more recently, protein aggregates. Within an AD diagnostic framework their ideal role is to increase the specificity of clinical criteria.
For instance, a reduction of glucose metabolism, as seen with PET in bilateral temporoparietal regions and in the posterior cingulate, is the most commonly described diagnostic criterion for AD. There are promising techniques that provide visualization of amyloid and, potentially, of neurofibrillary tangles. These visualization techniques clearly have the potential to increase the usefulness of PET in AD within the diagnostic framework, but their diagnostic accuracy, in particular their specificity for AD, requires further investigation, as there is evidence of high AD-like uptake in some healthy people and some people with mild cognitive impairment [61, 62, 63].
Because SPECT is more widely available and cheaper than PET, it has received much attention as an alternative technique. However, at present, the technique has not yet reached sound accuracy levels.
d) Familial genetic mutations. Three autosomal dominant mutations that cause AD have been identified, on chromosomes 21, 14 and 1. The presence of a proband with genetic-testing evidence of one of these mutations can be considered strongly supportive of the diagnosis of AD for affected individuals within the immediate family who have not themselves had a genetic test for this mutation. If individuals with a positive mutation history of the described type present with the core amnestic criterion (a), they should be considered as having probable AD.
From the previous overview it clearly emerges that quantitative measurements can have a huge impact, especially as supportive features. In particular, the atrophy of medial temporal structures, and especially of the Hippocampus, can be assessed with structural imaging measures. The role of hippocampal segmentation can thus be decisive, even if the challenge is not trivial, especially because of the complex anatomy of the Hippocampus.
2.7
the hippocampus anatomy
The description of the hippocampal anatomy is a difficult task for two distinct reasons:
a. the complexity of the Hippocampus itself, which makes it one of the most mysterious structures of the central nervous system;
b. the great confusion that has affected its terminology since the first studies appeared almost a hundred years ago.
The very first description of the Hippocampus and the word itself were coined in 1587 by the Italian anatomist Julius Caesar Arantius [64], who compared the protrusion on the floor of the temporal horn to a hippocampus, or sea horse. Several terminologies are available, but the views of Lewis [65] have been adopted in this thesis. After almost a century of confusion, the terminology currently in use needs to be clarified. The name Hippocampus refers to the entire ventricular protrusion, which comprises two cortical laminae rolled up one inside the other: the cornu Ammonis and the gyrus dentatus. The subiculum is sometimes considered as a part of the main structure. The general situation of the Hippocampus in relation to the hemisphere is shown in the high-resolution Fig. 1, representing a 3-D view of the International Consortium for Brain Mapping (ICBM) ICBM152 template 1 .
Fig. 1: The figure shows a T1 high resolution brain template from the International
Consortium for Brain Mapping. The primary goal of the ICBM project is the
development of a probabilistic reference system for the human brain.
1 http://www.loni.ucla.edu/ICBM/
The Hippocampus is prolonged by the subiculum, which forms part of the parahippocampal gyrus, and borders the amygdala; both these gray matter structures surrounding the Hippocampus make its precise segmentation a decidedly awkward problem.
The Hippocampus forms an arc whose shape is enlarged at the anterior extremity and then narrows like a comma (Fig. 2).
Fig. 2: The figure shows the intraventricular aspect of the Hippocampus.
1, hippocampal body; 2, hippocampal head and its digitations; 3, hippocampal
tail; 4, fimbria; 5, crus of the fornix; 6, subiculum; 7, splenium of the corpus
callosum; 8, calcar avis; 9, collateral trigone; 10, collateral eminence; 11, uncal
recess of the temporal horn. Image scale of 1 cm is represented in the right
lower corner.
It can be divided into three parts:
a. an anterior part, the head;
b. a middle part, the body;
c. a posterior part, the tail.
The hippocampal length is about 4.5 cm, with the head being on average 1.5 to 2.0 cm wide. The mean hippocampal volume is about 3300 mm³. No particular differences exist between the right and the left Hippocampi, even if recent studies show the right ones to be slightly larger than the left, and the male size to be slightly greater than the female [66, 67].
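Volumetric measures of this kind follow directly from a labeled scan: the volume is the number of labeled voxels times the voxel volume. A minimal sketch, assuming a hypothetical binary mask file hippo_mask.nii.gz and the nibabel library:

import numpy as np
import nibabel as nib

# 'hippo_mask.nii.gz' is a hypothetical binary label image
# (1 inside the Hippocampus, 0 outside).
img = nib.load("hippo_mask.nii.gz")
mask = img.get_fdata() > 0.5

voxel_mm3 = np.prod(img.header.get_zooms()[:3])  # voxel volume in mm^3
volume_mm3 = mask.sum() * voxel_mm3              # ~3300 mm^3 for a healthy adult
print(f"hippocampal volume: {volume_mm3:.0f} mm^3")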
2.7.1 The hippocampal head
The head is the anterior part of the arc of the Hippocampus; it consists of an intraventricular and an extraventricular part. The intraventricular part features the digitationes Hippocampi on the intraventricular side; where they appear, at the junction of the body and the head, the fimbria gives way to a thick alveus that covers them. The digitations and the amygdala are often joined together, with the intraventricular surface of the amygdala overlapping almost the entire surface of the hippocampal head (Fig. 3).
Fig. 3: The figure shows a T1 sagittal view of the ICBM 152 template; the boundaries of the Hippocampus and the adjacent amygdala were manually traced.
The extraventricular part mainly consists of an inferior surface, visible only
after ablation of the parahippocampal gyrus, divided into the band of Giacomini,
the external digitations and the inferior surface of the uncal apex.
2.7.2 The hippocampal body
As for the head, two views can be considered: the intraventricular and the extraventricular descriptions. The intraventricular part of the body is an element of the floor of the lateral ventricle (temporal or inferior horn). It is a strongly convex protrusion, smooth and padded with ependyma covering the alveus. Numerous subependymal veins radiate on its surface. The body is bordered medially by the fimbria and laterally by the narrow collateral eminence. The roof of the temporal horn overhangs the intraventricular part; it is composed of the temporal stem, the tail of the caudate nucleus and the stria terminalis. The extraventricular part is visible on the medial surface of the temporal lobe; it is limited by the gyrus dentatus, the fimbria and the superficial hippocampal sulcus. The fimbria is a narrow white stripe that partially hides the margo denticulatus, which is the superficial part of the gyrus dentatus. The margo denticulatus is bordered inferiorly by the superficial hippocampal sulcus, which separates it from the adjacent subiculum.
2.7.3 The hippocampal tail
The tail is the posterior part of the Hippocampus and, like the head or the body, its structure can be divided into intraventricular and extraventricular parts. The intraventricular part resembles the head in shape but has a smaller size. Although digitations do not appear on the surface of the tail, its internal structure is similar to that of the head, as it is composed mainly of a vast layer of the cornu Ammonis. The intraventricular surface of the tail is thickly covered by the alveus and the subependymal veins. It is flanked medially by the fimbria and laterally by the collateral trigone; the flat surface of the collateral trigone and the intraventricular part of the tail together form the floor of the atrium (Fig. 2). The extraventricular part of the tail may be divided into an initial segment, which is a continuation of the body, a middle segment and a terminal one. The initial segment resembles the body; the margo denticulatus of the tail is divided into two dentes which successively decrease in size. In the middle segment some important changes appear, the most important being that concerning the fimbria, which in the initial segment hides the margo denticulatus and then separates from it, ascending to join the crus of the fornix.
The main parts of the middle segment of the hippocampal tail are: the gyrus
fasciolaris, the fasciola cinerea and the gyri of Andreas Retzius. The last segment of the tail covers the inferior splenial surface and alone merits the name
subsplenial gyrus.
2.7.4 General features
As a summary of the properties encountered so far, some points should be
outlined:
a. The Hippocampus has the same composition throughout its entire structure. The cornu Ammonis in fact has an analogous hierarchical organization in the head, the body and the tail; the same can be said for the gyrus dentatus, even if from a strictly terminological point of view this could seem untrue. In fact, the visible segment of the gyrus dentatus is known as the margo denticulatus in the body, the band of Giacomini in the uncus and the fasciola cinerea in the tail. However, it is the same structure, and the same term could be used for its whole length.
b. Because of its arched shape, the coronal view of the body and the sagittal views of the tail and of the head have a similar appearance.
c. Also because of the hippocampal curve, coronal sections are often difficult to interpret; nevertheless, they are the most commonly used to trace the hippocampal boundaries, especially since 3-D software has proven to be a valid help for the work of expert neuroradiologists.
This summary points out the shape complexity of the Hippocampus, suggesting the difficulties expert neuroradiologists must face when dealing with its segmentation. However, it should also be noted that, for automated or semi-automated segmentation tools, the primary difficulty arises from the need to discriminate neighboring structures, especially the amygdala, which shares the same tissue composition.
Finally, multiple studies have compared MRI-based volumetric measurements. Considerable variability exists with regard to the reported volumetric values of the Hippocampus; results showed that it is an asymmetrical structure, with larger right hippocampal volumes (p = 0.001), and that differences in MRI magnetic field strength and slice thickness might differentially contribute to volumetric asymmetry estimates [68]. Besides, right Hippocampi seem to show a higher variability in terms of shape, making them more difficult to segment [69]. Hippocampal volume asymmetry is associated with dementia, may be increased in mild cognitive impairment and correlates with cognitive performance [70, 71, 72]. However, remarkable systematic errors arise from the right/left visual bias, which may cause estimated volumes to depend on the orientation of the images presented to a human rater. The adoption of increasingly refined manual labeling protocols can mitigate these systematic errors, but their existence has nevertheless been confirmed [73].
3 MAGNETIC RESONANCE IMAGING

Structural magnetic resonance imaging is based on the well-known resonance effect discovered by Purcell and Bloch. The latest developments have pushed signal-to-noise ratios to ever higher levels, especially with ultra-high field scanners. Nowadays, therefore, the use of structural imaging guarantees higher accuracy and robustness for segmentation than ever before.
3.1 the magnetic resonance effect
Magnetic resonance is based upon the interaction between an applied magnetic field and a nucleus that possesses spin. Nuclear spin or, more precisely,
nuclear spin angular momentum, is one of several intrinsic properties of an
atom and its value depends on the precise atomic composition. Every element
in the Periodic Table except argon and cerium has at least one naturally occurring isotope that possesses spin. Thus, in principle, nearly every element
can be examined using MR, and the basic ideas of resonance absorption and
relaxation are common to all of these elements. The precise details will vary
from nucleus to nucleus and from system to system.
The concept of nuclear magnetic resonance had its underpinnings in the discovery of the spin nature of the proton. Building on works of the early 1920s [74] and on developments achieved since the late 1930s [75], in 1946 Bloch and Purcell extended these early quantum mechanical concepts to the measurement of an effect of the precession of the spins around a magnetic field. Not only did they succeed in measuring a precessional signal from a water sample [76] and a paraffin sample [77], respectively, but they also anticipated many of the experimental and theoretical details that we continue to draw from today. For this work they shared the Nobel prize in physics in 1952.
The basic elements of MRI can be summarized as follows [78]:
a. Fundamental interaction between the proton spin and a magnetic field:
how nuclei react to a magnetic field;
b. Equilibrium alignment of spin: how magnetization and relaxation are
coded;
c. Magnetization detection: signal acquisition and retrieval;
d. Imaging.
In the following a detailed description is given, starting from the fundamental interaction from which the resonance phenomenon arises: the spin-magnetic field coupling.

3.2 spin-magnetic field coupling
Much of MRI theory can indeed be explained through classical analogies, in particular by looking at the interaction between a current loop and a magnetic field: the force $d\vec{F}$ experienced by a current loop element $d\vec{l}$ carrying a current $I$ in a region with a magnetic field $\vec{B}$ is given by the Lorentz force:

\[ d\vec{F} = I\,d\vec{l} \wedge \vec{B} \tag{1} \]

The loop can be rotated if a torque $d\vec{\tau}$ is generated by the forces according to:

\[ d\vec{\tau} = \vec{r} \wedge d\vec{F} \tag{2} \]

which in this case yields a straightforward definition for the magnetic moment $\vec{\mu}$:

\[ \tau = r\sin\theta\, IlB = I\Sigma B\sin\theta \quad\Rightarrow\quad \vec{\mu} = I\Sigma\,\hat{u}_n \tag{3} \]

where $\Sigma$ is the loop area and $\hat{u}_n$ the unit vector perpendicular to the loop. As a consequence the torque $\vec{\tau}$ can be rewritten in terms of the magnetic moment $\vec{\mu}$ and the magnetic field $\vec{B}$:

\[ \vec{\tau} = \vec{\mu} \wedge \vec{B} \tag{4} \]
Equation (4) is exact for constant fields and also very accurate for small loops in a non-uniform field, provided that the loop principal dimension, say $D$, is much smaller than the typical distances over which the field changes (for example, $|\Delta B| \simeq |\partial B/\partial x|\,D \ll |B|$). Corrections would be needed for higher moments, such as electric quadrupole moments, but this is not the case for the proton, whose higher moments vanish.

The net effect of the torque is, on one hand, to realign the magnetic moment $\vec{\mu}$ with $\vec{B}$, as (4) states; the same conclusion can be reached by minimizing the potential energy:

\[ \tau = -\frac{dU}{d\theta} \tag{5} \]

On the other hand, there is another fundamental effect that must be taken into account. The behavior described so far can be analyzed from a quantum mechanics point of view, especially in terms of the analogy between the classical angular momentum and the spin. The direct relationship between the magnetic moment and the spin is found by observing that the angular momentum $\vec{L}$ and the torque $\vec{\tau}$ are related by:

\[ \frac{d\vec{L}}{dt} = \vec{\tau} \quad\Rightarrow\quad \vec{\mu} \propto \vec{L} \tag{6} \]

so that:

\[ \vec{\mu} = \gamma\,\vec{L} \tag{7} \]
where $\gamma$ is the gyromagnetic ratio and depends on the particle or nucleus. For the proton it is found to be

\[ \gamma = 2.675 \times 10^{8}\ \mathrm{rad\,s^{-1}\,T^{-1}} \tag{8} \]

or, more often, the reduced ratio $\bar{\gamma}$ is used:

\[ \bar{\gamma} = \frac{\gamma}{2\pi} = 42.58\ \mathrm{MHz/T} \tag{9} \]
where T is the tesla unit of magnetic field, equal to 10000 gauss. If we consider a charged particle of mass $m$ and charge $q$ circulating at a distance $r$ from the center with velocity $v$, the resulting angular momentum is $L = rmv$ (in the perpendicular situation), so that the magnetic moment is:

\[ \vec{\mu} = IA\,\hat{u}_n = \frac{qv}{2\pi r}\,\pi r^2\,\hat{u}_n = \frac{q}{2m}\,\vec{L} \tag{10} \]

and therefore:

\[ \gamma = \frac{q}{2m} \tag{11} \]
The equation still holds within the analogy between classical and quantum mechanics. In particular for an electron, with $\hbar$ the quantum unit of angular momentum, the following relation defines the Bohr magneton:

\[ \mu_B = \frac{e\hbar}{2m_e} = 9.27 \times 10^{-24}\ \mathrm{A\,m^2} \tag{12} \]

A similar relation holds for every particle, proton included, in which case:

\[ \mu_P = \frac{e\hbar}{2m_p} = 5.05 \times 10^{-27}\ \mathrm{A\,m^2} \tag{13} \]

From the comparison of equations (11), (12) and (13) one can evaluate the ratio between the electron and the proton gyromagnetic ratios:

\[ \gamma_e / \gamma_P = 658 \tag{14} \]
The difference between the observed value and the measured mass ratio (which is equal to 1836) is due to the difference in the structure of the two particles. This difference is the principal reason why protons are used in magnetic resonance imaging instead of electrons. For nuclei the situation is much the same. The intrinsic angular momentum has to be non-zero, and therefore no "even-even" nucleus can be used for magnetic resonance; for "odd-odd" nuclei only the unpaired neutron and proton contribute to the nuclear spin, but in general γ remains within an order of magnitude or so of that of the proton. In conclusion, the hydrogen nucleus remains the most useful, both for energetic considerations due to its gyromagnetic ratio and for its high concentration in the human body. The combination of

\[ \vec{\tau} = \vec{\mu} \wedge \vec{B} \tag{15} \]

and

\[ \vec{\mu} = \gamma\,\vec{L} \tag{16} \]

completely determines the motion of the magnetic moment $\vec{\mu}$.
3.3 the motion equation

The fundamental equation describing the phenomena involved in the coupling between spins and a magnetic field can be derived from equations (4) and (7). According to these:

\[ \frac{d\vec{\mu}}{dt} = \gamma\,\vec{\mu} \wedge \vec{B} \tag{17} \]

It is a simplified version of the Bloch equation, which will be discussed in the following and in which important corrections arise from the interactions between the spins and their surroundings; nevertheless it contains the theoretical core upon which the resonance phenomena are based.
Equation (17) for a static $\vec{B}$ field is readily solved. Among the several approaches that can be used, let us keep the classical analogy. There is, in fact, a direct correspondence between the equations for a magnetic moment immersed in a constant vertical magnetic field and those for a spinning top in a constant vertical gravitational field (Fig. 4).

Fig. 4: A symmetrical spinning top with angular velocity $\vec{\Omega}$ and mass m precessing in a constant gravitational field and, in correspondence, the precession of the angular momentum $\vec{J}$. The angular momentum increment $d\vec{J}$ involves a counterclockwise precession.

Let us consider a magnetic field parallel to the z axis, and let the magnetic moment $\vec{\mu}$ be parallel to the $\vec{r}_{cm}$ of Fig. 4; the differential change of the moment $d\vec{\mu}$ in time $dt$ pushes $\vec{\mu}$ into a clockwise precession, as shown in Fig. 5.
Fig. 5: Clockwise precession of a spin around the magnetic field $\vec{B}$ direction.

If $\phi$ is the angle between $\vec{\mu}$ and $\vec{B}$:

\[ |d\vec{\mu}| = \mu \sin\phi\, |d\theta| \tag{18} \]

and

\[ |d\vec{\mu}| = \gamma\,|\vec{\mu} \wedge \vec{B}|\,dt = \gamma \mu B \sin\phi\, dt \tag{19} \]

therefore, by comparing these two relations:

\[ \mu\sin\phi\,|d\theta| = \gamma \mu B \sin\phi\, dt \quad\Rightarrow\quad \omega \equiv \frac{d\theta}{dt} \tag{20} \]

the fundamental formula defining the Larmor frequency $\omega$ is obtained (even if $\omega$ is in fact an angular velocity, in MRI terminology the term frequency is preferred):

\[ \omega = \gamma B \tag{21} \]
Equation (21) defines the angular velocity of the precession of the magnetic moment $\vec{\mu}$, whose component motion is described by equation (17). For $\vec{B}$ parallel to the z-axis:

\[
\begin{cases}
\dfrac{d\mu_x}{dt} = \gamma \mu_y B_0 = \omega_0\, \mu_y \\[2ex]
\dfrac{d\mu_y}{dt} = -\gamma \mu_x B_0 = -\omega_0\, \mu_x \\[2ex]
\dfrac{d\mu_z}{dt} = 0
\end{cases} \tag{22}
\]
By taking the second derivatives, the well-known harmonic oscillator equations are retrieved:

\[
\begin{cases}
\dfrac{d^2\mu_x}{dt^2} = -\omega_0^2\,\mu_x \\[2ex]
\dfrac{d^2\mu_y}{dt^2} = -\omega_0^2\,\mu_y
\end{cases} \tag{23}
\]

The system solutions are finally:

\[
\begin{cases}
\mu_x(t) = \mu_x(0)\cos\omega_0 t + \mu_y(0)\sin\omega_0 t \\[1ex]
\mu_y(t) = \mu_y(0)\cos\omega_0 t - \mu_x(0)\sin\omega_0 t \\[1ex]
\mu_z(t) = \mu_z(0)
\end{cases} \tag{24}
\]
The fact that the motion of the spin in a constant magnetic field takes place in the plane orthogonal to the magnetic field suggests that a complex 2-dimensional representation could be useful. In order to describe the rotations in a lower-dimensional representation let us introduce:

\[ \mu_+(t) = \mu_x(t) + i\,\mu_y(t) \tag{25} \]

which yields:

\[ \frac{d\mu_+}{dt} = -i\omega_0\,\mu_+ \quad\Rightarrow\quad \mu_+(t) = \mu_+(0)\,e^{-i\omega_0 t} \tag{26} \]
While the amplitude remains constant, the phase varies over time. The static-field solution for the phase is therefore:

\[ \phi_0(t) = -\omega_0 t + \phi_0(0) \tag{27} \]

In conclusion, for a static field the interaction of a classical magnetic moment with an external magnetic field is equivalent to an instantaneous rotation of the moment about the field itself.
3.3.1 The Resonance
The act of turning on a magnetic field for some time makes the spins align with its direction; an additional field perpendicular to the first one, on the contrary, tips the spins away from that direction. Such rotations leave the magnetic moment precessing around the original magnetic field at the Larmor frequency ω.
Let us consider a reference frame precessing around the z direction with angular velocity $\vec{\Omega}$, in which the magnetic moments are at rest. In the laboratory inertial frame the total time derivative of $\vec{\mu}$ is:

\[ \frac{d\vec{\mu}}{dt} = \vec{\Omega} \wedge \vec{\mu} \tag{28} \]

Therefore the following relation between the rotating (primed) frame and the inertial one holds:

\[ \frac{d\vec{\mu}}{dt} = \left(\frac{d\vec{\mu}}{dt}\right)' + \vec{\Omega} \wedge \vec{\mu} \tag{29} \]
On the other hand, by comparison with equation (17):

\[ \gamma\,\vec{\mu} \wedge \vec{B} = \left(\frac{d\vec{\mu}}{dt}\right)' + \vec{\Omega} \wedge \vec{\mu} \quad\Rightarrow\quad \left(\frac{d\vec{\mu}}{dt}\right)' = \gamma\,\vec{\mu} \wedge \vec{B} - \vec{\Omega} \wedge \vec{\mu} = \gamma\,\vec{\mu} \wedge \vec{B}_{\mathrm{eff}} \tag{30} \]

where

\[ \vec{B}_{\mathrm{eff}} = \vec{B} + \frac{\vec{\Omega}}{\gamma} \tag{31} \]
is the effective magnetic field in the rotating frame. In the rotating frame it is straightforward to prove that if $\vec{B} = -\vec{\Omega}/\gamma$ then $\vec{\mu}\,'$ is constant. Therefore in the inertial laboratory frame $\vec{\mu}$ rotates at a fixed angle with respect to the z direction and with fixed angular velocity $\vec{\Omega}$. As a consequence, if $\vec{B}_1$ is a radiofrequency magnetic field (the radiofrequency rf pulse), added to tip the proton spin away from the z axis (with only transverse components, see equation (30)) and corresponding to a frequency $\omega_1$, and $\vec{B}_0$ is the static z-oriented magnetic field with Larmor frequency $\omega_0$:

\[ \vec{B}_{\mathrm{eff}} = \left[\,\hat{z}'(\omega_0 - \omega) + \hat{x}'\,\omega_1\,\right]/\gamma \tag{32} \]

with $\omega$ the frequency of a general rotating frame and $\hat{x}' = \hat{x}\cos\omega t - \hat{y}\sin\omega t$ an arbitrary rotating axis perpendicular to the z direction. The important result here is that in the rotating frame whose frequency matches the Larmor frequency (resonance condition) the $\vec{B}_1$ field is maximally synchronized to tip the spin around its axis:

\[ \left(\frac{d\vec{\mu}}{dt}\right)' = \omega_1\,\vec{\mu} \wedge \hat{x}' \tag{33} \]
If the $\vec{B}_1$ field, the radiofrequency pulse, is applied for a time $\Delta t$ it causes a rotation by a flip angle $\Delta\phi$:

\[ \Delta\phi = \gamma B_1 \Delta t \tag{34} \]
It is important to note that $B_1$ is the full rotating amplitude available for spin flipping. The more fundamental quantum picture allows only two spin states for any measurement along the static direction; however, the use of a classical picture of a spin precessing at some angle is still appropriate in magnetic resonance, because it can be shown that a two-state mixture of parallel and anti-parallel spins results in a continuous spectrum for the polarization angle $\phi$.
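To give a feeling for the magnitudes in equation (34), a short sketch follows; the $B_1$ amplitude is an assumed, merely plausible value, not one taken from this work:

```python
import numpy as np

gamma = 2.675e8      # proton gyromagnetic ratio (rad s^-1 T^-1)
B1 = 5.9e-6          # assumed rf field amplitude (T), a typical order of magnitude

# Duration needed for a 90-degree flip, from eq. (34): dphi = gamma * B1 * dt
dt_90 = (np.pi / 2) / (gamma * B1)
print(f"90-degree pulse duration: {dt_90 * 1e3:.2f} ms")  # ~1 ms for these values
```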
3.4 magnetization and relaxation

The interactions between the proton spin and the neighboring atoms have been neglected so far; however, they lead to important modifications in the global behavior. To take into account the behavior of a volume element (voxel), let us consider a volume V:

• containing a large number of protons;
• over which the external fields can be considered constant;
• where the set of spins defines an ensemble of spins with the same phase.
The magnetization is:

\[ \vec{M} = \frac{1}{V} \sum_{\mathrm{protons}} \vec{\mu}_i \tag{35} \]

and for non-interacting protons it satisfies an equation analogous to (17):

\[ \frac{d\vec{M}}{dt} = \gamma\,\vec{M} \wedge \vec{B} \tag{36} \]

and therefore:

\[ \frac{dM_z}{dt} = 0 \tag{37} \]

\[ \frac{d\vec{M}_\perp}{dt} = \gamma\,\vec{M}_\perp \wedge \vec{B} \tag{38} \]
These equations neglect the important fact that protons naturally tend to align with the external field through an exchange of energy (5). Let us assume that, under the effect of the radiofrequency pulse field, the magnetization is flipped away from the static z axis. On returning to its original configuration, two main effects show up:

a. the magnetization z component reaches its equilibrium value M0;

b. the transverse components must vanish.

The main effect is therefore the presence of a relaxation phenomenon, which can be measured in terms of relaxation times to quantify the interaction.
3.4.1 Relaxation times
The equation which models the behavior described in the previous section is:

\[ \frac{dM_z}{dt} = \frac{1}{T_1}\,(M_0 - M_z) \tag{39} \]

and its solution is:

\[ M_z(t) = M_z(0)\,e^{-t/T_1} + M_0\left(1 - e^{-t/T_1}\right) \tag{40} \]

The time $T_1$ is called the spin-lattice relaxation time. It ranges from tens to thousands of milliseconds for protons in human tissues (with external fields $\sim 10^{-2}$ T), as shown in table 3:
Tissue                       T1 (ms)   T2 (ms)
gray matter (GM)                950       100
white matter (WM)               600        80
muscle                          900        50
cerebrospinal fluid (CSF)      4500      2200
fat                             250        60
arterial blood                 1200       200
venous blood                   1200       100

Table 3: The relaxation times T1 and T2 for different types of human tissues.
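As a numerical illustration of the recovery law (40), a minimal sketch using the T1 values of table 3 and assuming Mz(0) = 0, i.e. the situation right after a 90° pulse:

```python
import numpy as np

# Spin-lattice relaxation times from table 3 (ms)
T1 = {"GM": 950.0, "WM": 600.0, "CSF": 4500.0}
M0 = 1.0                               # equilibrium magnetization (a.u.)
t = np.array([100.0, 500.0, 2000.0])   # times after the pulse (ms)

for tissue, t1 in T1.items():
    # Eq. (40) with Mz(0) = 0: exponential recovery with rate 1/T1
    Mz = M0 * (1.0 - np.exp(-t / t1))
    print(tissue, np.round(Mz, 3))
```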
The time $T_2$ of table 3 is the spin-spin relaxation time and takes into account the vanishing of the transverse magnetization components. Since the spins experience local field variations, they lose coherence in the xy plane, a process called dephasing, which causes a loss of transverse magnetization. The equation which models this effect is:

\[ \frac{d\vec{M}_\perp}{dt} = \gamma\,\vec{M}_\perp \wedge \vec{B} - \frac{1}{T_2}\,\vec{M}_\perp \tag{41} \]
The additional term involves an exponential decay for $\vec{M}_\perp$, which can be straightforwardly calculated in the rotating reference frame, where for the modulus:

\[ \frac{dM_\perp'}{dt} = -\frac{1}{T_2}\,M_\perp' \tag{42} \]

whose solution is:

\[ M_\perp'(t) = M_\perp'(0)\,e^{-t/T_2} \tag{43} \]
In practice, there is an additional dephasing introduced by external field inhomogeneities, which further reduces the transverse magnetization. For this reason it is usual to refer to both $T_2$ and $T_2^*$, the latter combining the two effects, the spin-spin interaction induced dephasing $T_2$ and the external field induced dephasing $T_2'$, according to $1/T_2^* = 1/T_2 + 1/T_2'$. It is important to keep in mind that while the $T_2'$ dephasing is recoverable, meaning that the initial phase relationships can be restored, the intrinsic $T_2$ decay is not, these losses in transverse magnetization being completely random.
3.5 the bloch equation

The differential equations (39) and (41) can be combined into one vector differential equation, the Bloch equation:

\[ \frac{d\vec{M}}{dt} = \gamma\,\vec{M} \wedge \vec{B} + \frac{1}{T_1}(M_0 - M_z)\,\hat{z} - \frac{1}{T_2}\,\vec{M}_\perp \tag{44} \]
If $\vec{B} = B_0\hat{z}$ the component equations become:

\[
\begin{cases}
\dfrac{dM_z}{dt} = \dfrac{M_0 - M_z}{T_1} \\[2ex]
\dfrac{dM_x}{dt} = \omega_0 M_y - \dfrac{M_x}{T_2} \\[2ex]
\dfrac{dM_y}{dt} = -\omega_0 M_x - \dfrac{M_y}{T_2}
\end{cases} \tag{45}
\]
whose solutions are:

\[
\begin{cases}
M_z(t) = M_z(0)\,e^{-t/T_1} + M_0\left(1 - e^{-t/T_1}\right) \\[1ex]
M_x(t) = e^{-t/T_2}\left(M_x(0)\cos\omega_0 t + M_y(0)\sin\omega_0 t\right) \\[1ex]
M_y(t) = e^{-t/T_2}\left(M_y(0)\cos\omega_0 t - M_x(0)\sin\omega_0 t\right)
\end{cases} \tag{46}
\]

For $t \to \infty$ all exponentials vanish and the solution is:

\[ M_z(\infty) = M_0 \tag{47} \]

\[ M_x(\infty) = M_y(\infty) = 0 \tag{48} \]
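A minimal numerical sketch (parameter values are illustrative) that integrates the component equations (45) with a simple Euler scheme and checks the steady state (47)-(48):

```python
import numpy as np

T1, T2 = 950.0, 100.0        # gray matter values from table 3 (ms)
omega0 = 2.0 * np.pi * 0.01  # Larmor frequency in arbitrary units (rad/ms)
M0 = 1.0
dt = 0.1                     # time step (ms)

M = np.array([1.0, 0.0, 0.0])            # magnetization after a 90-degree pulse
for _ in range(int(10 * T1 / dt)):       # integrate long enough to fully relax
    dM = np.array([
        omega0 * M[1] - M[0] / T2,       # dMx/dt, eq. (45)
        -omega0 * M[0] - M[1] / T2,      # dMy/dt
        (M0 - M[2]) / T1,                # dMz/dt
    ])
    M = M + dM * dt

# Steady state of eqs. (47)-(48): Mz -> M0, transverse components -> 0
assert np.allclose(M, [0.0, 0.0, M0], atol=1e-3)
```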
If the external field is (as in real cases) the sum of an external static field $B_0\hat{z}$ and a much smaller orthogonal radiofrequency field $\vec{B}_1$, the Bloch equation can be solved in a rotating frame where $\vec{B}_1$ is at rest. The effective field in this frame is:

\[ \vec{B}_{\mathrm{eff}} = \left(B_0 - \frac{\omega}{\gamma}\right)\hat{z}' + B_1\,\hat{x}' \tag{49} \]
and the component equations are:

\[
\begin{cases}
\dfrac{dM_z'}{dt} = -\omega_1 M_y' + \dfrac{M_0 - M_z'}{T_1} \\[2ex]
\dfrac{dM_x'}{dt} = \Delta\omega\, M_y' - \dfrac{M_x'}{T_2} \\[2ex]
\dfrac{dM_y'}{dt} = -\Delta\omega\, M_x' + \omega_1 M_z' - \dfrac{M_y'}{T_2}
\end{cases} \tag{50}
\]

with $\Delta\omega \equiv \omega_0 - \omega$, where $\omega_1$ is the spin precession frequency due to the rf pulse, $\omega$ is the rotating frame frequency (as usual, in general different from the Larmor frequency) and $\omega_0$ is the Larmor frequency induced by the static field. In the resonance condition $\Delta\omega$ vanishes; in fact, it is usual to keep it in the equations to take into account deviations from ideal situations, such as static field impurities or variations.

These equations model the behavior already described, according to which the on-resonance precession of the transverse components around the $\vec{B}_1$ direction is superimposed on the relaxation decay.
3.6 signal acquisition

The signal detection in magnetic resonance is based on Faraday's law of electromagnetic induction:

\[ \varepsilon = -\frac{d\Phi}{dt} \tag{51} \]

where $\Phi$ is the flux of the magnetic field through a coil and $\varepsilon$ the induced electromotive force (emf). Equation (51) generally refers to an emf generated by variations of the magnetic flux; however, it can be converted into a form more useful for MRI, where the roles are reversed.
The magnetization $\vec{M}(\vec{r},t)$ induces a density of current $\vec{J}_m(\vec{r},t)$:

\[ \vec{J}_m(\vec{r},t) = \vec{\nabla} \times \vec{M}(\vec{r},t) \tag{52} \]

which is the source of a vector potential $\vec{A}$ and therefore of a magnetic field $\vec{B}$:

\[ \vec{A}(\vec{r},t) = \frac{\mu_0}{4\pi} \int d^3r'\, \frac{\vec{J}(\vec{r}\,',t)}{|\vec{r} - \vec{r}\,'|} \tag{53} \]

\[ \vec{B} = \vec{\nabla} \times \vec{A} \tag{54} \]
and therefore, using Stokes' theorem:

\[
\begin{aligned}
\Phi &= \int_S (\vec{\nabla} \times \vec{A}) \cdot d\vec{S} = \oint_l d\vec{l} \cdot \vec{A} \\
&= \oint_l d\vec{l} \cdot \left( \frac{\mu_0}{4\pi} \int_V d^3r'\, \frac{\vec{\nabla}\,' \times \vec{M}(\vec{r}\,')}{|\vec{r} - \vec{r}\,'|} \right) \\
&= \frac{\mu_0}{4\pi} \int_V d^3r' \oint_l d\vec{l} \cdot \left( -\vec{\nabla}\,' \frac{1}{|\vec{r} - \vec{r}\,'|} \times \vec{M}(\vec{r}\,') \right) \\
&= \frac{\mu_0}{4\pi} \int_V d^3r'\, \vec{M}(\vec{r}\,') \cdot \left[ \vec{\nabla}\,' \times \left( \oint_l \frac{d\vec{l}}{|\vec{r} - \vec{r}\,'|} \right) \right] \\
&= \int_V d^3r'\, \vec{M}(\vec{r}\,') \cdot \left( \vec{\nabla}\,' \times \vec{A}(\vec{r}\,') \right)
\end{aligned}
\]

\[ \Rightarrow \quad \Phi = \int_V d^3r'\, \vec{M}(\vec{r}\,') \cdot \vec{B}_{\mathrm{coil}}(\vec{r}\,') \tag{55} \]
The latter equation shows explicitly the Principle of Reciprocity:

\[ \vec{B}_{\mathrm{coil}}(\vec{r}\,') = \frac{\vec{B}(\vec{r}\,')}{I} \tag{56} \]
As a consequence, the flux through a coil induced by a magnetization source $\vec{M}(\vec{r},t)$ is determined by the magnetic field per unit current $\vec{B}_{\mathrm{coil}}(\vec{r}\,')$ associated with the coil. Mathematically, it is worthwhile to note that the original integral over the coil surface is in the end replaced by a volumetric integral over the magnetized sample region.
The emf deduced from equation (55) is:

\[ \varepsilon = -\frac{d}{dt}\,\Phi(t) = -\frac{d}{dt} \int_V d^3r\, \vec{M}(\vec{r},t) \cdot \vec{B}_{\mathrm{coil}}(\vec{r}) \tag{57} \]
Since the signal is assumed to be detected in the presence of a static $B_0\hat{z}$ field and an rf pulse, the detected signal depends on the magnetization components and on their relaxation times. In particular, using the notation of equation (25) for the longitudinal and transverse magnetization:

\[ M_z(\vec{r},t) = e^{-t/T_1(\vec{r})}\, M_z(\vec{r},0) + \left(1 - e^{-t/T_1(\vec{r})}\right) M_0 \tag{58} \]

\[ M_+(\vec{r},t) = e^{-t/T_2(\vec{r})}\, e^{-i\omega_0 t}\, M_+(\vec{r},0) = e^{-t/T_2(\vec{r})}\, e^{-i\omega_0 t + i\phi_0(\vec{r})}\, M_\perp(\vec{r},0) \tag{59} \]
The phase $\phi_0$ and the amplitude $M_\perp(\vec{r},0)$ are determined by the initial rf pulse conditions. For protons, with a static field at the tesla level, the Larmor frequency $\omega_0$ is at least four orders of magnitude larger than the typical values of $1/T_1$ and $1/T_2$; the exponentials containing the relaxation times can therefore be neglected in the derivative. Considering the more general case in which field inhomogeneities or deviations from the ideal case must be taken into account with the replacement $T_2 \to T_2^*$, the signal $s$ is:

\[ s \sim \omega_0 \int d^3r\; e^{-t/T_2^*(\vec{r})}\, M_\perp(\vec{r},0) \left[ B^{\mathrm{coil}}_x(\vec{r})\sin(\omega_0 t - \phi_0(\vec{r})) + B^{\mathrm{coil}}_y(\vec{r})\cos(\omega_0 t - \phi_0(\vec{r})) \right] \tag{60} \]
A simplified version of equation (60) can be obtained through the introduction of:

\[ B^{\mathrm{coil}}_x \equiv B_\perp \cos\phi_B \qquad B^{\mathrm{coil}}_y \equiv B_\perp \sin\phi_B \tag{61} \]

so that the signal finally reads:

\[ s \sim \omega_0 \int d^3r\; e^{-t/T_2^*(\vec{r})}\, M_\perp(\vec{r},0)\, B_\perp(\vec{r})\, \sin\!\left(\omega_0 t + \phi_B(\vec{r}) - \phi_0(\vec{r})\right) \tag{62} \]

Equation (62) has general validity; in fact, the only correction which has been neglected depends on the presence of possible time-independent (or time-averaged) field variations along the z direction, in which case $\omega_0$ would not be constant and an $\omega(\vec{r})$ should be included.
3.7 image noise and contrast

3.7.1 Signal to noise ratio and spatial resolution

All physical measurements involve either random or systematic noise. The quantitative measure for the noise affecting a signal s is given by the signal-to-noise ratio (SNR):

\[ \mathrm{SNR} = \frac{s}{\sigma} \tag{63} \]
where σ is an estimate of the noise affecting the signal. In MRI the goal is to localize the signal as accurately as possible and to be able to discriminate among the tissue types.

As seen in equation (60), the signal is directly proportional to the magnetization and therefore (but not only) to the spin density of the sample. In a 3D image the voxel intensity summarizes this information in a gray scale, or a probability map. It is important that an adequate SNR is achieved in every voxel. The principal noise to take into account is the thermal Gaussian noise causing uncertainties in the coil and in the sample:

\[ \sigma_{\mathrm{thermal}}^2 \sim 4kT\,R\,\beta_w \tag{64} \]
where R is the effective resistance of the coil, the body and the electronics, and $\beta_w$ is the bandwidth of the noise-voltage detecting system. Of course this kind of noise can be reduced by averaging over several acquisitions:

\[ \mathrm{SNR/voxel} \;\sim\; \Delta x\, \Delta y\, \Delta z\, \sqrt{\frac{N_{\mathrm{acq}}\, N_x N_y N_z}{\beta_w}} \tag{65} \]
where the $N_i$ are the numbers of sampled points in the image space, the $\Delta_i$ the window dimensions for the signal reconstruction and $N_{\mathrm{acq}}$ the number of acquisitions; more in general, the field of view (FOV) $L_i = \Delta_i \cdot N_i$ is introduced. As a consequence, improving the resolution ($\Delta x \to \Delta x/2$) lowers the SNR. On the contrary, a lower resolution could be tolerated because it would bring several benefits:

• artifact reduction;
• overcoming field inhomogeneities;
• reducing relaxation effects (which weaken the signal).

In clinical applications patient comfort is also important; this is why images with lower resolution but higher SNR and, especially, shorter acquisition times (which depend linearly on the number of sampled points) are often preferred.
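A toy sketch of the scaling expressed by equation (65) (arbitrary units; the function name and parameter values are illustrative): halving the voxel size along one direction, at fixed field of view, lowers the per-voxel SNR:

```python
import numpy as np

def snr_per_voxel(dx, dy, dz, n_acq, nx, ny, nz, bw):
    # Proportionality of eq. (65); the prefactor is omitted (arbitrary units)
    return dx * dy * dz * np.sqrt(n_acq * nx * ny * nz / bw)

base = snr_per_voxel(1.0, 1.0, 1.0, n_acq=1, nx=256, ny=256, nz=128, bw=32e3)
fine = snr_per_voxel(0.5, 1.0, 1.0, n_acq=1, nx=512, ny=256, nz=128, bw=32e3)

# Doubling the resolution along x drops the SNR by 1/2 * sqrt(2) ~ 0.71;
# recovering it by averaging requires more acquisitions (longer scans).
print(fine / base)  # ~0.707
```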
3.7.2 Contrast: Proton density, T1 and T2 weighting
Even the highest SNR is not a guarantee that the image will be useful. For clinical purposes it is in fact important to be able to distinguish different tissues. The problem of distinguishing different signals in the presence of noise falls under the broad category of signal detection problems. To this aim a fundamental concept is the contrast-to-noise ratio (CNR). In the case of MRI, if two tissues correspond to signals $s_A$ and $s_B$, their contrast $C$ is by definition:

\[ C \equiv s_A - s_B \tag{66} \]

and therefore the CNR is:

\[ \mathrm{CNR} = C_{AB} = \mathrm{SNR}_A - \mathrm{SNR}_B \tag{67} \]
The most basic contrast mechanisms in MRI derive from three different physical parameters:
a. the spin density ρ0 ;
b. the T1 relaxation time;
c. the T2 relaxation time.
The physical law describing this behavior is:

\[ C_{AB} = \rho_{0,A}\left(1 - e^{-T_R/T_{1,A}}\right) e^{-T_E/T^*_{2,A}} - \rho_{0,B}\left(1 - e^{-T_R/T_{1,B}}\right) e^{-T_E/T^*_{2,B}} \tag{68} \]

where $T_E$ is the echo time, by definition the time the spins need to return to the initial phase after an rf pulse, and $T_R$ is the repetition time: it determines the number of acquisitions for averaging the signal and, more importantly for the present purpose, the amount of regrowth of the longitudinal magnetization.
Proton density imaging is based on the spin density. In this case $T_E$ and $T_R$ must be chosen appropriately:

\[ T_E \ll T^*_{2\,A,B} \;\Rightarrow\; e^{-T_E/T_2^*} \to 1 \tag{69} \]

\[ T_R \gg T_{1\,A,B} \;\Rightarrow\; e^{-T_R/T_1} \to 0 \tag{70} \]

so that:

\[ C_{AB} = (\rho_{0,A} - \rho_{0,B}) - \rho_{0,A}\left(e^{-T_R/T_{1,A}} + \frac{T_E}{T^*_{2,A}}\right) + O(T_E) + O(T_R) \simeq \rho_{0,A} - \rho_{0,B} \]

In this approximation the contrast does not depend on $T_R$ or $T_E$; this gives a general rule for spin density weighting: all that is needed is to keep $T_R$ much longer than $T_1$ and $T_E$ much shorter than the shortest value of $T^*_{2\,A,B}$.
Normal soft tissue $T_1$ values are quite different from one another. For this reason $T_1$-weighted images are a powerful method to distinguish different tissues. As seen with (69), a short $T_E$ minimizes the $T_2^*$ effects. As a consequence the contrast $C_{AB}$ is:

\[
\begin{aligned}
C_{AB} &= S_A(T_E) - S_B(T_E) \\
&\simeq \rho_{0,A}\left(1 - e^{-T_R/T_{1,A}}\right) - \rho_{0,B}\left(1 - e^{-T_R/T_{1,B}}\right) \\
&= (\rho_{0,A} - \rho_{0,B}) - \left(\rho_{0,A}\,e^{-T_R/T_{1,A}} - \rho_{0,B}\,e^{-T_R/T_{1,B}}\right)
\end{aligned} \tag{71}
\]

When several tissues are present it is useful to adopt two different values for $T_R$. The optimal value can be obtained graphically by plotting the expression for $C_{AB}$ as a function of $T_R$. As an example, the contrast for gray matter versus white matter and versus cerebrospinal fluid is shown in Fig. 6 and Fig. 7, respectively.
Fig. 6: The contrast as a function of the repetition time for gray matter and white matter.
Fig. 7: The contrast as a function of the repetition time for gray matter and cerebrospinal fluid; note how the contrast changes according to the examined tissues.
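The graphical procedure just described can be sketched as follows (unit spin densities are assumed purely for illustration; the T1 values are those of table 3), evaluating the T1-weighted contrast of equation (71) over a range of repetition times, as done in Fig. 6 and Fig. 7:

```python
import numpy as np

def t1_contrast(tr, t1_a, t1_b, rho_a=1.0, rho_b=1.0):
    # Eq. (71) with TE << T2*: signal = rho * (1 - exp(-TR/T1))
    return rho_a * (1 - np.exp(-tr / t1_a)) - rho_b * (1 - np.exp(-tr / t1_b))

TR = np.linspace(1.0, 4000.0, 400)                  # repetition times (ms)
c_gm_wm = t1_contrast(TR, t1_a=950.0, t1_b=600.0)   # GM vs WM, table 3

# The best TR sits at the extremum of the contrast curve
print("optimal TR ~", TR[np.argmax(np.abs(c_gm_wm))], "ms")
```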
The last contrast mechanism relies on the transverse decay. In the $T_2$-weighting case, to avoid the $T_1$ contribution, $T_R$ is chosen according to (70); therefore:

\[ C_{AB} = \rho_{0,A}\,e^{-T_E/T^*_{2,A}} - \rho_{0,B}\,e^{-T_E/T^*_{2,B}} \tag{72} \]

As in the $T_1$ case, a finer tuning of the signal acquisition can be achieved by combining two different values of $T_E$; the corresponding contrast curves are shown in Fig. 8 and Fig. 9.
in Fig. 8 and Fig. 9.
0.15
contrast
0.1
0.05
0
0
100
200
300
400
500
echo time (ms)
Fig. 8: The figure shows the contrast as a function of the echo time for gray matter
and white matter.
The general appearance of the three weighting types is shown in Fig. 10. It is worthwhile to note that proton density images are in general interpreted as a count of the proton number in the tissues; this is false for CSF, where $T_1$ is larger than $T_R$ (4.5 s vs 2.5 s). $T_1$- and $T_2$-weighted images appear to have reversed intensities; in particular, CSF is brighter in $T_2$ images because of its long transverse relaxation time.

In conclusion, let us stress how magnetic resonance scanners with high (3 T) and ultra-high (7 T) magnetic fields have recently become widespread. The introduction of such high field scanners has increased the image signal-to-noise ratio and extended the boundaries of spatial resolution and sensitivity, thus improving image processing algorithms especially dedicated to structural analyses.
Fig. 9: The contrast as a function of the echo time for gray matter and cerebrospinal fluid.
Fig. 10: The figure shows a comparison among (in clockwise order): a Proton density,
a T1 -weighted and a T2 -weighted brain scan from ICBM.
3.8 segmentation algorithms: state of the art
The accurate and robust segmentation of anatomical structures is an essential step in quantitative brain magnetic resonance imaging analysis. Many clinical applications rely on the segmentation of MRI scans, which allows one to describe how brain anatomy changes over time, through aging or disease. However, manual labeling by clinical experts is a laborious, time-consuming task and, above all, it is subject to inter- and intra-rater variability. That is why automatic techniques are desirable to enable a routine analysis of brain MRIs in clinical use, especially for hippocampal segmentation.
Despite the large number of existing techniques and their combinations, such as probabilistic and multi-atlas segmentations [79, 80, 81], graph cuts [82, 83], label fusion [84], voxel-based classifications [85, 86] and warping patch-based methods [87, 88], it still remains challenging to develop a fast and accurate fully automated segmentation method.
Among them, atlas-based methods have been shown to outperform other state-of-the-art algorithms in terms of the similarity index k (k > 0.88) and other error metrics [88, 89, 83]. These methods rely on the registration of the target image with one or several templates; the segmentation is achieved by warping these templates onto the target and then finally labeling each voxel as belonging or not to the Hippocampus according to different possible strategies, such as label fusion, voxel classification, similarity measures, etc. Nevertheless, the segmentation errors produced by atlas-based methods have been proved to be both random and systematic [81, 80].

Random errors derive mainly from image acquisition noise and subject biological variability, while systematic errors occur consistently, as the disagreement between manual labelings and automatic segmentations derives from differences in the segmentation protocols [90]. For example, a manual segmentation protocol may follow a specific anatomical criterion to assign labels to different voxels, while the automatic method may rely on a slightly different criterion, thus yielding a systematic labeling error. Moreover, multi-atlas segmentations are known to introduce a spatial bias [91], and another important issue for multi-atlas based segmentation methods is the computational burden.
4 THE HIPPOCAMPUS SEGMENTATION

The hippocampal segmentation can be achieved through a combined strategy using both shape- and voxel-based information. A priori shape constraints individuate thinner regions of interest where supervised learning techniques provide robust voxel-wise classifications. The implementation of strategies employing distributed infrastructures makes the segmentation task no longer computationally prohibitive.
4.1 a combined strategy for segmentation
The Hippocampus is primarily involved in the pathogenesis of a number of conditions, first of all the Alzheimer's disease. As described in the previous chapters, its segmentation can play a fundamental role in early diagnosis, clinical trials and protocol assessments of several neurodegenerative diseases. Nevertheless, segmentation issues deriving from acquisition noise and from the complexity of this anatomical structure itself make hippocampal segmentation a challenging task.
Until recent years the segmentation of the Hippocampus, i.e. its identification and separation from the surrounding brain structures, was performed mainly manually or with semi-automated techniques followed by manual editing. This is obviously time-consuming and subject to investigator variability, so a number of automated segmentation methods have been developed. These have so far relied mainly on image intensity, often adopting multi-atlas registration approaches in order to minimize errors due to individual anatomical variations. More recently, though, a number of methods that exploit shape information have been developed, based on preliminary work carried out in the nineties with the active shape models and the active appearance models [92, 93]. These models address the issue of identifying objects of a known shape in a digital image, when such a shape is characterized by a certain degree of variability, as in the case of anatomical structures.
Alternative methods have used strategies based on principal component analysis, deformable representations, diffeomorphic mappings and Bayesian frameworks [94, 95, 96, 97]. In recent years a number of studies have used probabilistic tree frameworks for brain segmentation, in some cases adopting specific models such as Markov random fields or graph cuts [98, 99, 100]. These studies have so far encouraged the exploration of several machine learning techniques to address hippocampal segmentation. The method presented in this work makes use of a dedicated classifier to label voxels as belonging or not to the Hippocampus.
In this work a combined strategy has been developed to tackle the main
issues involved in hippocampal segmentation:
• Shape analysis;
• Intensity-based segmentation;
• Distributed computing implementation.
4.2 why an automated segmentation?
Hippocampal segmentation is one of the most challenging issues in the scenery of medical image processing, but the challenge is no longer just a matter of image processing. The large amount of data and the technical developments have brought a substantial improvement of the standard quality (mainly) provided by magnetic resonance; however, distributed computing resources have, as far as we know, not been successfully involved in this research field, even though computational times, storage issues and computing resources have proven to be formidable obstacles to neuroimaging development in recent years. For these reasons the proper individuation of small structures that are not well defined in intensity, such as the Hippocampus, has become a less difficult task for experts; nevertheless manual segmentation is time-consuming, expensive and not completely reliable, being strongly dependent on human expertise. These features prevent larger diagnosis programs from being scheduled and, even in that case, would undermine the diagnostic certainty: on one hand because of the costs that larger programs would require, on the other hand because, even if these programs could be afforded, the results would have to face the unreliability of the manual segmentation methods.
A fully automatic segmentation procedure would allow medical institutions to obtain fast and economic diagnosis tools; besides, an automatic procedure would guarantee the standardization of the segmentation, which would no longer depend on the intrinsic variability involved in the use of human experts. Moreover, the possibility of taking advantage of distributed computing infrastructures would allow computationally intensive analyses to be performed on large scales in a fast and efficient way.
Fig. 11: Flow chart of the segmentation method, according to the following steps: 1) volume of interest extraction, 2) determination of voxel features and 3) voxel classification. The learning phase is represented in detail in the classification box, while input and output data are shown in red.

In Fig. 11 a synthetic overview of the segmentation pipeline is shown. The following sections will deal with a more detailed description of these strategies; in the first place, focus will be given to the database properties.
4.3 materials: database properties

The research activity discussed in this work is primarily based on a set of real medical images consisting of 56 T1-weighted whole brain magnetic resonance scans and the corresponding manually segmented bilateral hippocampi (masks). The data come from the Laboratory of Epidemiology and Neuroimaging, IRCCS San Giovanni di Dio - FBF in Brescia (Italy). Scans were performed on healthy subjects of different sex and age, as well as on subjects affected mainly by Alzheimer's disease, mild cognitive impairment or subjective memory complaints.
All images were acquired on a 1.0 T scanner according to the following parameters: gradient echo 3D technique, repetition time TR = 20 ms, echo time TE = 5 ms, flip angle = 30°, field of view = 220 mm, acquisition matrix of 256 × 256 and contiguous slice thickness of 1.3 mm, resulting in images with 181 × 145 × 181 overall dimensions [101].
For manual segmentation, the images were automatically resampled through
an algorithm included in the MINC package 1 and normalized to the Colin27
template 2 with a voxel size of 1.00 × 1.50 × 1.00 mm3 . When automated registration failed, manual registration was performed, based on 11 anatomical
landmarks. Manual hippocampal segmentations were performed on contiguous coronal brain sections by a single individual blind to diagnosis using the
software Display 1.3 3 , following the protocol defined by Pruessner et al. [66].
The protocol mandates that the scans be acquired with the three-dimensional gradient technique, and consequently that the non-uniformity of the scans be corrected and that they be registered into standard stereotaxic space prior to segmentation. The Hippocampus is a bilaminar and symmetrically located structure; manual tracing in the coronal plane is to be preferred to the sagittal and axial orientations, so as to preserve the convexity properties of the shape, although the other views can be used whenever they are felt to be more valuable for boundary detection. The manual segmentation protocol also prescribes a number of rules of thumb which take into account the brain morphology and the complexity of the neighboring medial temporal lobe structures. For example, the Hippocampus was defined to include the dentate gyrus, the cornu ammonis regions, the part of the fasciolar gyrus that is adjacent to the cornu ammonis regions, the alveus and the fimbria. The Andreas-Retzius gyrus, the part of the fasciolar gyrus which is adjacent to this gyrus, and the crus of the fornix were omitted from the Hippocampus. As a consequence, a consistent segmentation of the Hippocampus must arbitrarily exclude a number of voxels in the region where the hippocampal tail is adjacent to the Andreas-Retzius gyrus, since the two are indistinguishable, both appearing as gray matter in T1 scans. The same situation can be found in several other hippocampal regions, such as where the Hippocampus is attached to the trigone of the lateral ventricle or where its boundaries literally fade into the amygdala's.
1 www.bic.mni.mcgill.ca/software
2 www.bic.mni.mcgill.ca/ServicesAtlases/Colin27
3 www.bic.mni.mcgill.ca/ServicesSoftwareVisualization/Display
These data were used to train and test, in a cross-validation framework, the segmentation workflow and the distributed environment. To validate the results, however, a second database was used. This second dataset, without available manual labelings, was downloaded from the Alzheimer's Disease Neuroimaging Initiative (ADNI) 4. The data consisted of a random sample of 1.5 T scans from 456 subjects with 4 different acquisition times: screening, repeat, month 12 and month 24, where the screening and repeat scans were acquired almost simultaneously and the other scans respectively 12 and 24 months later. A further discussion of these characteristics will be provided in Chap. 5. These data are characterized by high variability, as different scanner protocols were used, in particular those from General Electric Healthcare, Philips Medical Systems and Siemens Medical Solutions. The ADNI collected data also include high field scans at 3.0 T, but with the aim of providing a clinical validation of the proposed segmentation workflow (which is trained on 1.0 T scans), only the 1.5 T scans were used.

It is important to underline that this second dataset also provided sex, age and clinical information. In particular, among the 456 downloaded scans there were 94 AD patients and 217 MCI subjects. In this way, segmented hippocampal volumes were used to provide a diagnosis, and an evaluation of the hippocampal volume as an AD biomarker was performed.

4 http://adni.loni.usc.edu
4.4 preprocessing, automated registration
A preliminary processing step is necessary to align the images so that corresponding features can easily be related; by definition, this is what is commonly intended by image registration.

In the last 25 years remarkable progress has been accomplished in this field, with huge investments funded both by universities and industry. The main reason is that image registration has evolved from being considered a minor precursor to some medical imaging applications into a significant subdiscipline in itself [102]. The reason why registration has become so important is the necessity for medical imaging to establish correspondences of spatial information between images and the equivalent body structures. In many clinical scenarios, images from several modalities may be acquired, and the diagnostician's task is to mentally combine or "fuse" this information to draw useful clinical conclusions. However, this task is naturally time-consuming, and international concerns about health-care costs therefore drive the development of automated methods to make the best possible use of medical images. In this sense automated registration is a condicio sine qua non for medical image processing.
4.4.1 Registration Methodology
As previously stated registration is a subdiscipline of image processing which
has a wide range of applications:
a. Combining information from multiple imaging modalities;
b. Monitoring changes in size, shape or image intensities over time;
c. Relating preoperative images and surgical plans;
d. Relating an individual’s anatomy to a standardized atlas.
In this work, concerned with hippocampal segmentation and its volumetric changes according to pathological conditions, primarily the Alzheimer's disease, registration is mainly used to monitor anatomical changes with respect to a standardized template.

To be effective, these tasks require the establishment of spatial correspondence. The process of image registration aims to fulfill this requirement by finding the mappings that relate the spatial information conveyed in one image to that in another image or in physical space, namely a template reference. The degrees of freedom needed to describe a transformation depend on the type of transformation itself and on the data type. The main possible transformations for structural magnetic resonances are the rigid, the affine and the deformable transformations, while the data type is mainly defined in terms of the acquisition source (MRI, PET, ...) or the dimensionality (2-D or 3-D).
4.4.2 Rigid transformations
The spatial mapping between two 3-D magnetic resonance images is in general defined as a function $\mathbf{T}$ which maps a vector $\vec{x}$, representing the spatial coordinates of a point, into new reference frame coordinates $\vec{x}\,'$:

\[ \mathbf{T} : \vec{x} \in \mathbb{R}^3 \mapsto \vec{x}\,' \tag{73} \]

In general the input data is also referred to as the moving space/image, whilst the arrival space is called the target space/image. The registration is an optimization
problem, in the sense that the process stops when the moving image and the target image reach the best possible match. Obviously, this requires the introduction of a suitable metric to measure the matching; in general, metrics based on the squared sum of errors or on correlation measures are adopted. As an optimization problem, registration can be highly computing intensive, which is why an initial guess or an initial setting is usually adopted. In a fully automated registration framework, the best initial guess is obtained through a rigid registration:

\[ \vec{x}\,' = \mathbf{T}_{\mathrm{rigid}}(\vec{x}) = R\,\vec{x} + \vec{t} \tag{74} \]

where R is a real orthogonal matrix, meaning that:

\[ R^T R = R R^T = I \tag{75} \]

with I the identity matrix and $\vec{t}$ a translation vector. Matrices such as those defined by equation (75) have determinant det(R) = ±1 and correspond to proper and improper rotations (the latter are not properly rotations, in the sense that they can be obtained only by the combination of a rotation and a reflection); however, improper rotations can be eliminated by requiring det(R) = +1. As a consequence $\mathbf{T}_{\mathrm{rigid}}$ has six degrees of freedom, which can be interpreted as the three components of the translation $\vec{t}$ and the three Euler angles α, β, γ, which uniquely define a proper rotation.
The intensity information can be used to calculate the images' centers of mass or their momentum distributions. In the first case the moving and the target images are registered by shifting the moving image so as to align the centers of mass, while in the second the first moments are aligned. In general the second procedure is used because of its generality, in particular because it does not make any assumption about the physical origins of the moving and target images.
Once the images are rigidly registered, a second registration is usually performed to achieve a better match. Several options can be explored; however, for structural data the most common choice is a non-rigid transformation.

4.4.3 Non rigid transformations

The simplest non-rigid transformation, the scaling transformation, can be obtained from a rigid one with the introduction of a scaling parameter:
\[ \vec{x}\,' = \mathbf{T}(\vec{x}) = R\,S\,\vec{x} + \vec{t} \tag{76} \]

where S is a diagonal matrix whose non-null elements $s_x$, $s_y$, $s_z$ represent the scale factors along the different coordinate axes:

\[ S = \begin{pmatrix} s_x & 0 & 0 \\ 0 & s_y & 0 \\ 0 & 0 & s_z \end{pmatrix} \]

If the scaling is isotropic then $s_x = s_y = s_z$; in general, however, anisotropic solutions are explored and give the best results. The scaling transformations are a special case of the affine transformations $\mathbf{T}_{\mathrm{affine}}$:

\[ \vec{x}\,' = \mathbf{T}_{\mathrm{affine}}(\vec{x}) = A\,\vec{x} + \vec{t} \tag{77} \]
The real matrix A, unlike the rotation matrices R, has no restrictions on its elements $a_{ij}$. The affine transformation preserves the straightness of lines and, hence, the planarity of surfaces; it preserves parallelism, but it allows angles between lines to change. Affine transformations are frequently represented in homogeneous coordinates:

\[ A = \begin{pmatrix} a_{11} & a_{12} & a_{13} & t_1 \\ a_{21} & a_{22} & a_{23} & t_2 \\ a_{31} & a_{32} & a_{33} & t_3 \\ 0 & 0 & 0 & 1 \end{pmatrix} \tag{78} \]
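As a sketch of the homogeneous-coordinate representation (78) (all numerical values are illustrative assumptions), an anisotropic scaling plus translation applied to a set of points:

```python
import numpy as np

# Affine matrix in homogeneous coordinates, eq. (78):
# anisotropic scaling (sx, sy, sz) = (1.2, 0.9, 1.0) plus a translation
A = np.array([
    [1.2, 0.0, 0.0,  5.0],
    [0.0, 0.9, 0.0, -3.0],
    [0.0, 0.0, 1.0,  2.0],
    [0.0, 0.0, 0.0,  1.0],
])

points = np.array([[0.0, 0.0, 0.0],
                   [10.0, 10.0, 10.0]])

# Append the homogeneous coordinate and apply the transformation
homogeneous = np.hstack([points, np.ones((len(points), 1))])
transformed = (A @ homogeneous.T).T[:, :3]
print(transformed)  # [[5., -3., 2.], [17., 6., 12.]]
```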
Both rigid and non-rigid transformations can be used to register a set of images; several different algorithms have been proposed so far, in particular for 3-D boundaries. The first works in the field were strictly connected with the idea that registration should be point-wise. According to this approach, the registration problem could be classified as a distance optimization problem. The main reason behind this approach was the lack of computing power and facilities to tackle the registration problem with a global approach. However, in recent years, intensity based registration methods, which rely on the whole image content, have obtained widespread use.
4.4.4 Intensity based registration
Intensity based registration involves calculating a transformation between the moving image M and the target image T using the voxel values alone. No need for landmarks and no need for a priori intervention make this approach suitable for automated algorithms.

In its purest form, intensity based registration is performed by iteratively optimizing a similarity measure calculated from all voxel intensities M(i) and T(i). The effect of an affine transformation $\mathbf{T}$ is to iteratively modify M(i) until a cost function, defined through the adopted similarity metric, reaches a minimum. It is worthwhile to note that the choice of the metric plays a fundamental role in achieving satisfactory results.
4.4.5 Similarity Measures
Let us suppose that the moving image M and the target image T, of voxel size N, are identical except for the misalignment. An intuitively obvious similarity measure would then be the sum of squared differences; in this case, perfect registration would return a null result. The sum of squared differences

\[ \frac{1}{N} \sum_i^N \left| T(i) - \mathbf{T}[M(i)] \right|^2 \qquad \forall i \in T \cap \mathbf{T}(M) \tag{79} \]

is the optimum measure when the two images M and T differ only by Gaussian noise [103].
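A tiny sketch (on synthetic volumes) of the sum of squared differences of equation (79), evaluated over the common voxels of two images:

```python
import numpy as np

def ssd(target, transformed_moving):
    # Mean of squared intensity differences over the common voxels, eq. (79)
    diff = target.astype(float) - transformed_moving.astype(float)
    return np.mean(diff ** 2)

rng = np.random.default_rng(1)
target = rng.random((32, 32, 32))
moving = target + 0.05 * rng.standard_normal(target.shape)  # Gaussian noise only

print(ssd(target, moving))   # ~0.0025 = the noise variance; 0 for a perfect match
```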
Certain image registration problems are reasonably close to this ideal case. For example, in the serial registration of structural MR images it is expected that the aligned images are identical except for small changes resulting from disease progression or inter-subject variability. Similarly, in functional imaging, for example functional MRI, only a small number of voxels is expected to change during the study, so images can be registered under the assumption of being almost identical. In the end, the sum of squared differences is likely to work well as a similarity metric in all those cases where the moving and target images are supposed to differ only in a small fraction of voxels.
However, in several cases the previous assumption can no longer be considered correct. For example, when a small number of voxels changes intensity by a large amount from the moving to the target image, this can result in a poor registration. Another important issue comes from the fact that medical images, especially magnetic resonances, have a Rician noise distribution [104], so that the Gaussian assumption does not hold. This is why for magnetic resonances a preferable choice consists in adopting the correlation coefficient r:

\[ r = \frac{\sum_i \left[T(i) - \langle T \rangle\right]\left[\mathbf{T}(M(i)) - \langle \mathbf{T}(M(i)) \rangle\right]}{\left\{ \sum_i \left[T(i) - \langle T \rangle\right]^2 \sum_i \left[\mathbf{T}(M(i)) - \langle \mathbf{T}(M(i)) \rangle\right]^2 \right\}^{1/2}} \qquad \forall i \in T \cap \mathbf{T}(M) \tag{80} \]

where the sums run over every voxel i in the overlap region of the moving and the target images.
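A matching sketch (again on synthetic data) for the correlation coefficient of equation (80), which, unlike the sum of squared differences, is insensitive to global linear intensity changes:

```python
import numpy as np

def correlation(target, transformed_moving):
    # Pearson correlation over the common voxels, eq. (80)
    a = target.ravel().astype(float) - target.mean()
    b = transformed_moving.ravel().astype(float) - transformed_moving.mean()
    return (a @ b) / np.sqrt((a @ a) * (b @ b))

rng = np.random.default_rng(2)
target = rng.random((32, 32, 32))
rescaled = 2.0 * target + 10.0       # same content, different intensity scale

print(correlation(target, rescaled))  # 1.0: r ignores linear intensity changes
```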
Nevertheless, the choice of a suitable similarity metric does not exhaust the different settings characterizing the registration task. In fact, unless a robust optimization procedure is established, no guarantee of a correct registration can be given.
4.4.6 Optimization algorithms for registration
In many optimization problems it is desirable to determine the globally optimal solution. This is not true for image registration; an example can clarify the point. Let us consider two images characterized by wide, low-intensity noisy regions, such as two heads surrounded by acquisition noise: the sum of squared differences is then minimized by aligning the noisy regions rather than the regions significant for the analysis.

The fact that the desired optimum is local rather than global does not invalidate the use of voxel similarity measures for registration; it does, though, have implications for robust implementations. The correct local optimum can be found provided that a starting estimate of $\mathbf{T}$ is given and that this initial guess is not "too far" from the desired local minimum. It is worthwhile to note that what "not too far" really means is that the initial guess must be closer to the desired minimum than to the background alignment configuration; this, however, can easily be handled by using a rigid registration as the initial guess and imposing that this registration match physical conditions, such as the alignment of the centers of mass of the images or of their momentum distributions.
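The overall procedure can be sketched as follows (an illustrative toy, not the registration pipeline used in this work): starting from a rough initial guess, a translation is refined by locally minimizing the sum of squared differences with a derivative-free optimizer:

```python
import numpy as np
from scipy.ndimage import shift as nd_shift
from scipy.optimize import minimize

zz, yy, xx = np.indices((48, 48, 48))
target = np.exp(-((zz - 24.0)**2 + (yy - 24.0)**2 + (xx - 24.0)**2) / 40.0)
moving = nd_shift(target, (2.5, -1.0, 3.0))   # misaligned copy of the target

def cost(t):
    # SSD between the target and the translated moving image, eq. (79)
    return np.mean((target - nd_shift(moving, t)) ** 2)

# Local optimization around a rough initial guess (e.g. centers of mass)
result = minimize(cost, x0=np.zeros(3), method="Powell")
print(np.round(result.x, 1))   # ~ [-2.5, 1.0, -3.0], undoing the misalignment
```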
Once the images are registered to a template or a reference atlas, they all share common features that can therefore be analyzed in a statistical framework. In particular, registration is a necessary prerequisite for a robust segmentation and a robust statistical shape analysis.
4.5 shape analysis

Segmentation algorithms are naturally based on intensity analysis; however, in the case of the Hippocampus this information is not sufficient to achieve a consistent segmentation, as discussed in the previous section. This is why a shape model for the Hippocampus has been built. We developed a fully automated point distribution model (FAPoD) [105] which individuates the shape of an object, intended as a collection of a fixed number of points. The model implementation was studied on a toy model of simulated images and then tested on the real magnetic resonance dataset; furthermore, a comparison with a standard tool for shape analysis, i.e. the spherical harmonic framework (SPHARM) [106, 107, 108], was performed.
4.5.1 SPHARM analysis
First of all, the MR brain scans were standardized in terms of intensity and spatial coordinates [109]. Hippocampal boxes, i.e. bounding boxes containing the Hippocampus and the para-hippocampal region, were extracted according to Calvini et al. [110]: the MRIs are registered to the stereotactic space (ICBM152) and a putative hippocampal region is individuated by an atlas or manually traced. This region of 30 × 70 × 30 voxels is very small compared with the whole brain scan, but it does contain the Hippocampus; it becomes the region of interest (ROI) for further analyses. The same operations are then performed on the corresponding manually traced segmentations, which are the object of the shape model construction.

The masks are represented through a mesh representation, which is projected onto a unit sphere surface (Fig. 12).
This results in a vectorial bijective mapping $\vec{F}$ between each contour voxel and a point ν with spherical coordinates θ and φ:

\[ \nu(\theta,\phi) = \left(F_x(\theta,\phi),\, F_y(\theta,\phi),\, F_z(\theta,\phi)\right) \]

The object surface can then be described by a complete set of spherical harmonic basis functions $Y_l^m$, where $Y_l^m$ denotes the spherical harmonic of degree l and order m. The expansion takes the form:

\[ \nu(\theta,\phi) = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} c_l^m\, Y_l^m(\theta,\phi) \tag{81} \]

where $c_l^m = (c_{lx}^m, c_{ly}^m, c_{lz}^m)$. These coefficients can be estimated up to a desired degree by solving a set of linear equations, and the object surface can therefore be reconstructed using these coefficients; the larger the number of coefficients used, the higher the accuracy of the model (Fig. 13).
Fig. 12: A mesh representation compared with its projection onto a unit sphere; the colors are just a figurative representation of different sub-hippocampal regions.
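How a surface is rebuilt from a truncated expansion such as (81) can be sketched as follows (scipy's spherical harmonics are used, and random coefficients stand in for the fitted ones, which in the real workflow are estimated by least squares):

```python
import numpy as np
from scipy.special import sph_harm

rng = np.random.default_rng(4)
L_MAX = 5                       # truncation degree of the expansion

# Placeholder coefficients c_l^m, one (x, y, z) triplet per harmonic
coeffs = {(l, m): rng.standard_normal(3) * 0.5 ** l
          for l in range(L_MAX + 1) for m in range(-l, l + 1)}

# Sample the parameter sphere (theta: polar angle, phi: azimuth)
theta = np.linspace(0.0, np.pi, 40)
phi = np.linspace(0.0, 2 * np.pi, 80)
theta, phi = np.meshgrid(theta, phi, indexing="ij")

# Truncated expansion of eq. (81): one real-valued sum per coordinate
surface = np.zeros(theta.shape + (3,))
for (l, m), c in coeffs.items():
    # scipy's sph_harm takes (order m, degree l, azimuth, polar angle)
    y_lm = sph_harm(m, l, phi, theta).real
    surface += y_lm[..., None] * c

print(surface.shape)  # (40, 80, 3): an (x, y, z) point per (theta, phi) sample
```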
4.5.2 SPHARM description
The main idea behind the SPHARM workflow relies on the parametrization of a connected surface onto a unit sphere. This can be seen as an optimization problem whose variables are the coordinates of the object vertexes.
There are three constraints:

a. The Euclidean norm of the coordinates must be 1 for every vertex;

b. Areas must be preserved, so that any object region must be mapped onto a proportional spherical area;

c. No spherical region is allowed to be defined by angles which are negative or exceed π.

Fig. 13: Comparison of hippocampal shapes reconstructed with a different number of coefficients $c_l^m$: 1 coefficient, 5 coefficients, 10 coefficients and 15 coefficients. As expected, the more coefficients, the more details the model is able to capture.
The goal of the optimization is to minimize the distortion of the surface net in the mapping. This is perfectly achieved when every voxel facet is mapped onto a "spherical square" [106]; however, this is possible only in a special case, i.e. a single-voxel image. The variables of the optimization are the vertex coordinates, which in spherical geometry are the latitude θ and the longitude φ. The mathematical description of the model is derived from the physical heat equation for diffusion: the north and south pole latitudes of the unit sphere are considered as temperatures:
\[ \nabla^2\theta = 0 \qquad \nabla^2\phi = 0 \tag{82} \]

\[ \theta_{\mathrm{north}} = 0 \qquad \theta_{\mathrm{south}} = \pi \tag{83} \]

\[ \phi_{\mathrm{north}} = \phi_{\mathrm{south}} = 0 \tag{84} \]
The result of this parametrization has already been shown in Fig. 12. Once the coordinates θ and φ are optimized, the surface $\vec{S}$ of the object can be explicitly defined:

\[ \vec{S}(\theta,\phi) = \begin{pmatrix} x(\theta,\phi) \\ y(\theta,\phi) \\ z(\theta,\phi) \end{pmatrix} \tag{85} \]
The 3-D shape can be expanded into a complete set of basis functions. Using the spherical harmonics $Y_l^m$,

\[ Y_l^m(\theta,\phi) = \sqrt{\frac{(2l+1)\,(l-m)!}{4\pi\,(l+m)!}}\; P_l^m(\cos\theta)\, e^{im\phi} \tag{86} \]

where the $P_l^m(\cos\theta)$ are the associated Legendre polynomials defined by:

\[ P_l^m(x) = \frac{(-1)^m}{2^l\, l!}\,(1 - x^2)^{m/2}\, \frac{d^{l+m}}{dx^{l+m}}\,(x^2 - 1)^l \tag{87} \]
the model turns out to be defined up to a desired number of complex coefficients $c_l^m$, so that:

\[ \vec{S}(\theta,\phi) = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} c_l^m\, Y_l^m(\theta,\phi) \tag{88} \]
It is worthwhile to note that this procedure defines the parameter coordinates only for the vertexes; an interpolation function must therefore be used between the sample points, and this introduces an artificial sub-voxel resolution that has no basis in the input data. However, the discretization of the process yields an affordable effort in terms of computational time, and a post-processing error minimization is performed in order to make the results as accurate as possible.

Nevertheless, the SPHARM framework has become of widespread use for several reasons:
a. It is able to deal with surfaces of arbitrarily connected shapes;

b. The parametrization allows one to take into account both local and global variations;

c. It is always possible to achieve a unique solution for a shape model (up to rotation);

d. The parametrization preserves areas and minimizes local distortions.
Besides, another main reason for the adoption of SPHARM is its capability to take shape variability into account and describe it in terms of the harmonic functions; the numerical coefficients can be interpreted as shape descriptors, so that their statistical behavior can easily be used to build statistical models. This is crucial for medical applications, where no dataset can be considered complete in a mathematical sense, and therefore every model trained on restricted data is required to possess the ability to reproduce, describe or predict the behavior of data it was not trained on. This can be achieved by considering the coefficients $c_l^m$ and their distribution: a mean value and a standard deviation are calculated, and every shape of the model can be retrieved starting from a linear combination of these values.
The main drawback is the high complexity of the model and the difficulty of giving a straightforward interpretation of the coefficient variability. By contrast, a sound interpretation would be greatly useful; for example, it could be useful to know that the hippocampal region with the highest shape variability is the head, information which in this framework is not easily inferred.
4.5.3 The SPHARM average shape algorithm
The manually labeled masks of the training set were first topologically checked in order to remove small imperfections, such as holes or protrusions; an example is shown in Fig. 14.
To keep these masks as close as possible to the original manual labellings, only protrusions or holes smaller than 1.5 voxels were fixed. In this way the manual labellings have a connected topology which allows a spherical harmonic representation. The masks were then decomposed with a fixed number c = 15 of harmonic coefficients (a value suggested for many applications). As already shown in Fig. 13, this number of coefficients is sufficient for a detailed modeling of the hippocampal shapes.
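To make the truncated expansion of equation (88) concrete, the following is a minimal Python sketch of the reconstruction step; the coefficient container and the sampling grid are illustrative assumptions of ours, while the spherical harmonics come from SciPy.

```python
import numpy as np
from scipy.special import sph_harm

def reconstruct_surface(coeffs, theta, phi, l_max=15):
    """Evaluate a truncated SPHARM expansion (eq. 88) at sample points.

    coeffs[(l, m)] is a length-3 complex vector of c_l^m, one component
    per Cartesian coordinate; theta (polar, in [0, pi]) and phi
    (azimuthal, in [0, 2*pi)) are 1-D arrays of sample coordinates.
    """
    S = np.zeros((theta.size, 3), dtype=complex)
    for l in range(l_max + 1):
        for m in range(-l, l + 1):
            c = coeffs.get((l, m))
            if c is None:
                continue
            # SciPy's sph_harm argument order is (m, l, azimuth, polar)
            Y = sph_harm(m, l, phi, theta)
            S += np.outer(Y, c)
    return S.real  # for a real surface the imaginary parts cancel
```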
Fig. 14: Comparison of hippocampal shapes with the original, unfixed (left) and fixed (right) topology. In the left figure, some isolated voxels are clearly visible.
Once the hippocampal shapes are parametrized, an alignment process takes place. This process is in general necessary to overcome the natural misalignment caused by inter-subject variability or acquisition issues; in our case, the dataset had already undergone a fine registration process, therefore only a gross first order ellipsoid (FOE) alignment is performed. This procedure is based on the registration of the FOEs (already shown in Fig. 13, case (a)), which are obtained with a first order spherical harmonic expansion. FOEs are ellipsoids, so their alignment is simply performed with respect to the principal axis (Fig. 15).
The hippocampal shapes, now all registered and described in a common framework, can be statistically analyzed. All the SPHARM coefficients are normalized and comparable across objects, so group analysis can be performed. In particular, this analysis was used to obtain the mean hippocampal shape for both the left and the right hippocampus.

Fig. 15: The figure represents two different hippocampal labellings aligned along the principal axis. It has to be noted that this procedure yields a rigid registration which does not modify the mask dimensions.

This information is valuable for describing the hippocampus; on the other hand, it is worthwhile to note that while this analysis describes variations around the mean shape of the training data well, no information can be given about unknown hippocampal shapes. In this way the analysis is strongly dependent on the completeness of the training set, a condition which is difficult to fulfill for medical images. With the goal of generalizing the hippocampal shape description, a novel fast automated algorithm based on a point distribution model (FAPoD) was developed.
4.6 a novel fapod algorithm
Segmentation is a difficult task, especially for biological shapes characterized by high variability, non-exhaustive data, and so on. Another important issue derives from the lack of a reliable gold standard, the ground truth, with which to perform quantitative evaluations or comparisons.
To tackle this problem, a simulated dataset was first collected. This choice allows perfect control of the tuning parameters and the internal variability. The simulated data were then described in a point distribution framework, thereby defining and detecting in an automated way particularly relevant points of the training shapes. Once every training shape had been described through a set of these points and their variability, an average shape and a confidence bound were determined.
4.6.1 Simulated Data
With the aim of studying a complex shape such as the Hippocampus, we built a dataset of simulated images based on the standard Moving Picture Experts Group (MPEG) database. A set of binary images representing simple geometrical shapes or ordinary objects was used; one of these is shown in Fig. 16.
Fig. 16: A watch from the standard MPEG database.
The model was intended to reproduce the manual tracing of experts; consequently, the shape model we built was based on a 2-dimensional image in order to reproduce the coronal view of an MRI scan, which is in general a convex shape and the main choice for manual tracing in several segmentation protocols [66].
The training shape underwent a noising process to simulate the presence of artifacts and the biological variability; these are characterized by randomness, which is why Gaussian noise was chosen. This noise was added both voxel-by-voxel and to specific regions of the shape contours: in the first case to reproduce imperfections in acquisition, especially due to poor signal to noise ratios, and in the second case to tackle the intrinsic shape variability.
The noisy images had to reproduce features typical of medical image databases; therefore, a comparative study was performed to establish the level of noise to be applied [105]. To this aim the sum of squared differences (SSD) similarity function is used:

\[ \mathrm{SSD}(I_i, I) = \| I_i - I \|^2 = (I_i - I)^T (I_i - I) = \| I_i \|^2 + \| I \|^2 - 2\, I_i^T I = \| I_i \|^2 + \| I \|^2 - 2\, c(I_i, I) \quad (89) \]
The SSD function can be interpreted as a measure of the correlation between the reference image I and the noisy images I_i. The noise amplitude was studied on a set of 50 simulated images, a number chosen to simulate a realistic set of medical images. This study established that for noise values above a particular threshold the correlation between the reference and the noisy images was lost. As an example, a noisy watch is shown in Fig. 17.
Fig. 17: An example of the noisy process over the template watch.
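As a rough illustration of this noise study, the following Python sketch sweeps the Gaussian noise amplitude and tracks the SSD of equation (89) against a reference image; the image size, the amplitudes and the random stand-in template are illustrative assumptions, not the values used in this work.

```python
import numpy as np

def ssd(a, b):
    """Sum of squared differences (eq. 89) between two images."""
    d = (a - b).ravel()
    return float(d @ d)

rng = np.random.default_rng(0)
reference = rng.random((128, 128))   # stand-in for a binary MPEG shape

# Sweep the noise amplitude and watch the mean SSD over 50 noisy copies grow.
for sigma in (0.05, 0.1, 0.2, 0.4, 0.8):
    noisy = [reference + rng.normal(0, sigma, reference.shape) for _ in range(50)]
    mean_ssd = np.mean([ssd(n, reference) for n in noisy])
    print(f"sigma={sigma:.2f}  mean SSD={mean_ssd:.1f}")
```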
Dealing with biological variability, deriving for example from differences in age or sex, is another important issue. According to the main causes of variability in magnetic resonance imaging, which are slight ghosting effects and translations or rotations due to small involuntary movements of the subjects, random scalings, rotations and translations were simulated both globally and locally. Fig. 18 shows an example of these effects.
Fig. 18: The figure shows a visual comparison between an image representing a disk and its counterpart obtained by randomization of scaling, translations and rotations.
The training set of noisy images was then used to develop an automated algorithm for shape modeling and shape retrieval in the point distribution model framework.
4.6.2 Shape model construction
Constructing a statistical shape model basically consists of extracting the mean shape and a number of modes of variation from a collection of training samples. In statistical approaches, the dominant role is played by the point-wise description. An essential requirement for describing shapes within a point distribution model is that reference points, the so-called landmarks [111], are located at corresponding locations on all training samples. This requirement is generally the most challenging part of shape model construction and at the same time one of the major factors influencing the model quality. Manual landmarking has become unpopular not only because it is a tedious and time-consuming procedure requiring expert work, but also because a large number of landmarks is required (especially in 3-D analyses) and the reproducibility of the results is not guaranteed.
In principle, all algorithms that automatically compute correspondences actually perform a registration between the involved shapes. Even for this choice, several possibilities can be explored. The straightforward solution to landmark creation in 3-D is mesh-to-mesh registration. It works directly with the training meshes, and the most popular algorithms applied in this case are the Iterative Closest Point algorithm [112] and Procrustes analysis [113]. One of the shortcomings of this approach is the bias induced by the choice of a reference shape. A solution to this issue is obtained by evaluating all possible combinations of matches among landmarks, at the cost of a substantial increase in computational burden. However, the largest drawback of using a standard point matching algorithm is the restriction to similarity transformations. In samples characterized by high variability, determination of corresponding points by proximity alone can lead not only to wrong correspondences but also to non-homeomorphic mappings and thus to flipped triangles in the mesh. Alternative approaches are mesh-to-volume registrations, which consist of adapting a deformable surface mesh to volume data (such as MR scans). In this approach a bias is introduced because of the need for an initial model, which therefore relies on the training sample. Instead of adopting and consequently adapting a template mesh, an alternative volume-to-volume approach is to register the training samples to a standard template. In this case no a priori model is required: landmarks are placed on the atlas and their corresponding points are obtained by propagation of the registration deformation field. Whatever approach is chosen, the goal of robust, fully automated landmarking still holds. After alignment, the next step is the retrieval of the mean shape and its variability.
As described in the literature, the first step in building a point distribution model is to capture the best shape information through a point-wise description. In this sense every shape is described by a set of landmarks.
In a general automated framework, the main issue with adopting a point-wise shape description is that neither correspondences among different shapes nor importance criteria can be adopted a priori to determine which particular contour points should be eligible to become landmarks of a particular model. For example, well known simple shapes, such as a hand, could be described by defining the fingertip extremities as landmarks, but this is not the case for complex shapes such as the hippocampus, where no privileged anatomical points can easily or automatically be detected. These are the main reasons for our choice to adopt a statistical framework to define the landmarks of our shapes and to use a preprocessing registration procedure.
Firstly, a cumulative image S was defined by summing over the contours of the training images I_i. Then a uniform spatial sampling of S was performed: the sampling was obtained through a moving window of 10 × 10 pixels, and for each sample the pixel sum n was calculated and its value assigned to the center of the window itself. A threshold t is finally applied to keep only significant information; the value of this threshold was studied with respect to the number of surviving points and the shape reconstruction accuracy. Once the mathematical landmarks are defined, shape retrieval is straightforward; the last step is therefore to measure the accuracy of the reconstruction. For our training images, results were calculated in terms of the Dice index D, a popular error metric for image processing [114] defined by:
\[ D = \frac{2\,|A \cap B|}{|A| + |B|} \quad (90) \]
where A represents the set of reference pixels, and B the set of pixels obtained by reconstruction. For simulated images, results were on average about 96.8% ± 0.4% [105].
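The two ingredients just described, the cumulative-image landmark sampling and the Dice index of equation (90), can be summarized in a minimal Python sketch; the window size and threshold below are illustrative assumptions.

```python
import numpy as np

def dice(a, b):
    """Dice index (eq. 90) for two boolean masks."""
    a, b = a.astype(bool), b.astype(bool)
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

def landmarks(contours, window=10, threshold=25):
    """Cumulative-image landmark detection.

    contours: list of binary contour images of identical shape.
    Returns the window-center coordinates whose pixel sum exceeds
    the threshold t.
    """
    S = np.sum(contours, axis=0)   # cumulative image over the training set
    points = []
    for i in range(0, S.shape[0] - window + 1, window):
        for j in range(0, S.shape[1] - window + 1, window):
            n = S[i:i + window, j:j + window].sum()
            if n > threshold:
                points.append((i + window // 2, j + window // 2))
    return points
```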
The shape description is not the only information an observer would like to acquire. Another important issue is to establish whether or not the model has predictive power, and as a consequence whether it is able to manage and model data never seen in the training phase. In this sense the main information obtainable from the proposed framework is the variability, expressed in terms of standard deviations, characterizing every landmark and thus every shape.
4.6.3 Modeling the variations
The FAPoD procedure so far has established how to detect, from a set of noisy images, the N landmarks representing every shape of the training set. Once the landmarks are detected, their homologues can be identified and labeled in every shape. Therefore each shape is represented by an N-dimensional vector ~x. To best capture the shape, for every pair of consecutive landmarks a mathematical pseudo-landmark is introduced as the point whose distance from the chord subtended by the two consecutive landmarks is maximum. An example is shown in Fig. 19.
For each landmark and pseudo-landmark a mean position and the corresponding standard deviation can be calculated. The mean shape is obviously obtained by considering the mean landmarks; however, what really matters in this case is the standard deviation, which includes shapes in the model only on the basis of statistical considerations. Thus, to each landmark and its surrounding pixels a probabilistic value is assigned according to a standard Gaussian distribution N(µ, σ) (Fig. 20):
\[ P_{landmark} = \int_{-\frac{1}{2}}^{\frac{1}{2}} N(0, \sigma)\, dx \quad (91) \]

\[ P_{neighbors} = \int_{n-\frac{1}{2}}^{n+\frac{1}{2}} N(0, \sigma)\, dx \quad (92) \]
Fig. 19: A mathematical pseudo-landmark is defined as the contour voxel whose distance from the chord subtended by two consecutive landmarks is maximum.
where µ = 0, σ is calculated with respect to the average determined for each landmark, and n is the distance of the neighbor from the average, measured in voxel units.
Fig. 20: The figure shows an example of how the probabilistic values are associated with a landmark and its neighbors.
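Equations (91) and (92) amount to integrating a zero-mean Gaussian over unit-width bins; a minimal Python sketch using SciPy's normal CDF follows (the value of σ and the neighbor range are illustrative assumptions).

```python
from scipy.stats import norm

def landmark_probabilities(sigma, n_max=3):
    """Probabilities of eqs. (91)-(92) for a landmark and its neighbors.

    sigma is the per-landmark standard deviation; n is the distance of a
    neighbor from the mean landmark position, in voxel units.
    """
    g = norm(loc=0.0, scale=sigma)
    p_landmark = g.cdf(0.5) - g.cdf(-0.5)
    p_neighbors = {n: g.cdf(n + 0.5) - g.cdf(n - 0.5) for n in range(1, n_max + 1)}
    return p_landmark, p_neighbors

p0, pn = landmark_probabilities(sigma=1.0)
print(p0, pn)   # the probability mass decays with the distance n
```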
Once probabilities are assigned to every landmark and its neighbors, shape reconstruction depends on the probability assigned to a contour, i.e. on which pixels will be used for the reconstruction. It is worthwhile to note that the probability of a pixel belonging to an edge of a shape is different from the probability of belonging to the shape itself. Besides, if the goal of the analysis is to determine a region of interest, the analysis of inner pixels can be skipped, resulting in no loss of information and in a dramatic decrease of computing requirements.
FAPoD compares well with the SPHARM description, in the sense that its average hippocampal shape is consistent with the one retrieved by SPHARM analysis; moreover, it requires a smaller number of training Hippocampi to take into account the whole dataset variability.
4.7 ensemble classifier segmentation
Shape analysis gives an enormous contribution to identifying morphological differences among subjects; besides, it allows statistical evaluations of these differences both globally (for example of the volume, the convexity, the symmetry, etc.) and locally (for example with PCA or p-value analysis of vertices, as previously shown in the SPHARM framework). Another fundamental contribution is given to segmentation. Firstly, shape analysis shrinks the region of interest to which more specific attention has to be given, which allows a dramatic decrease in computational requirements and times. Secondly, shape analysis can be used for post-processing evaluations and therefore to establish whether segmentation results are reasonable. However, segmentation cannot be performed through morphological information alone: a decisive role in segmentation is played by color or gray intensities. This is why a voxel-wise analysis has been performed too.
4.7.1 Voxel-wise analysis with machine learning
Automated segmentation techniques are gaining increasing recognition since not only do they offer the possibility of rapidly studying large databases, for example in pharmaceutical trials or genetic research, but they also afford higher test-retest reliability and the robust reproducibility needed for multi-centric studies. Several packages, such as FreeSurfer [115] and FIRST [116], are available: FreeSurfer performs a cortical and sub-cortical segmentation assigning a label to each voxel; it is based on probabilistic information automatically estimated from a large training set of expert measurements. FIRST performs segmentations using Bayesian shape and appearance models. More examples could be provided; in any case, it can be concluded that the role of statistical learning applied to heterogeneous and imbalanced data, such as those deriving from medical datasets, has become more and more important.
In recent years a number of studies have investigated and compared the performances of these tools, especially against state-of-the-art machine learning techniques. Particularly promising results were obtained with classifiers such as SVM, Adaboost, or Ada-SVM (i.e. SVM with features automatically selected by Adaboost) [85, 86, 87]. These studies showed that Ada-SVM segmentation in particular compared favorably with the manual segmentations, while FreeSurfer gave the worst results and the most visually inconsistent segmentations. Despite numerous efforts in the literature, no automatic segmentation method is currently in wide use by clinical research groups, nor adopted for large-scale quantitative studies of hippocampal anatomy.
From this point of view, the main goal of this work was to construct an accurate strategy, based on supervised learning algorithms, devoted to hippocampal segmentation. A classifier, trained on a set of previously labeled examples (a number of 3-D brain MR images in which the hippocampi had been manually segmented), will classify the voxels of a new brain MR image as belonging or not to the hippocampus. In particular, for each brain image a volume of interest (VOI), as previously described, is extracted according to the shape analysis results.
The VOI contains tens of thousands of voxels divided in two classes, hippocampal region and background. From a statistical point of view the structural complexity of the VOIs is twofold. Firstly, the two classes are dramatically imbalanced: in each VOI less than 5% of the voxels belong to the hippocampus. Secondly, the VOI dataset is very large, consisting of 112 boxes for the first dataset and reaching 4000 boxes. The classifier performance can degrade significantly as the severity of the imbalance increases; hence, generating accurate statistical solutions for automated hippocampal segmentation is not trivial. As a consequence, an important phase of the study was related to the performance assessment of a Random Forest classifier [117] and of a novel algorithm combining random data undersampling with Adaboost (RUSBoost) [118]. It is worthwhile to note that this was the first attempt to tackle the hippocampal segmentation problem with these classifiers.
4.7.2 Feature Extraction
Supervised pattern recognition systems involve taking a set of labeled examples (or features) and learning a pattern based on those examples. The features should contain information relevant to the classification task. In the analysis presented here, for each voxel a vector was obtained whose elements represent information about position, intensity, neighboring texture, and local filters. Texture information (contrast, uniformity, rugosity, regularity, etc.) was expressed using both Haar-like and Haralick features, as in [85]. This type of feature is characterized by computational simplicity. For each voxel, a value was obtained as the weighted sum of the intensities on the area covered by a template, the sum of the weights being zero [119]. Filters of size varying from 3 × 3 × 3 to 9 × 9 × 9 were used for the calculation of the Haar-like features. The Haralick features [120] were calculated from the normalized gray-level co-occurrence matrices (GLCM) created on the m × m voxel projection sub-images of the volume of interest, with m defined over overlapping sliding windows. For each voxel, values of m varying from 3 to 9 were used.
Haralick features rely on the calculation of the GLCM over the N_g gray levels, based on the assumption that the texture information of an image is contained in the spatial relationship between pairs of voxel intensities. In the co-occurrence matrix M, each element p_ij represents an estimate of the probability that two pixels with a specified polar separation (d, θ) have gray levels i and j. The coordinates d and θ are, respectively, the distance and the angle between the two considered pixels. As in [120], d = 1 and displacements at the quantized angles θ = kπ/4, with k = 0, 1, 2, 3, were considered.
As shown elsewhere [121, 122, 123], a subset of the Haralick features is sufficient to obtain a satisfactory discrimination. To establish which of the original 14 GLCM Haralick features give the best recognition rate, preliminary recognition experiments were carried out, resulting in the following configuration:
• energy:

\[ f_1 = \sum_{ij} p_{ij}^2 \quad (93) \]

• contrast:

\[ f_2 = \sum_{n=0}^{N_g-1} n^2 \sum_{i=1}^{N_g} \sum_{j=1}^{N_g} p_{ij}, \qquad |i - j| = n \quad (94) \]

• correlation:

\[ f_3 = \frac{\sum_{ij} (ij)\, p_{ij} - \mu_x \mu_y}{\sigma_x \sigma_y} \quad (95) \]

where µ_x, µ_y, σ_x and σ_y are the means and standard deviations of p_x and p_y, the partial probability density functions obtained by summing the rows or the columns of p_ij.

• inverse difference moment:

\[ f_4 = \sum_{ij} \frac{p_{ij}}{1 + (i - j)^2} \quad (96) \]
Finally, the gradients calculated in different directions and at different distances, and the relative positions x, y, and z of the voxels, were included as additional features. The best analysis configuration, expressed by the highest mean metric value, was obtained with 315 features.
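As an illustration, the four selected Haralick features of equations (93)-(96) can be computed from a normalized GLCM in a few lines of Python; this sketch assumes the matrix p has already been built and normalized.

```python
import numpy as np

def haralick_subset(p):
    """Energy, contrast, correlation and inverse difference moment
    (eqs. 93-96) from a normalized GLCM p of shape (Ng, Ng)."""
    Ng = p.shape[0]
    i, j = np.indices((Ng, Ng))
    px, py = p.sum(axis=1), p.sum(axis=0)            # marginal distributions
    mu_x, mu_y = (np.arange(Ng) * px).sum(), (np.arange(Ng) * py).sum()
    sd_x = np.sqrt(((np.arange(Ng) - mu_x) ** 2 * px).sum())
    sd_y = np.sqrt(((np.arange(Ng) - mu_y) ** 2 * py).sum())
    energy = (p ** 2).sum()
    contrast = ((i - j) ** 2 * p).sum()              # equivalent to eq. 94
    correlation = ((i * j * p).sum() - mu_x * mu_y) / (sd_x * sd_y)
    idm = (p / (1.0 + (i - j) ** 2)).sum()
    return energy, contrast, correlation, idm
```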
4.7.3 Classification methods
The goal of supervised learning algorithms is not to learn an exact representation of the training data itself, but rather to build a statistical model of the process which generates the data. In this sense, the choice to tackle the hippocampal segmentation problem within a statistical pattern recognition framework is dictated by the necessity of dealing with heterogeneous data, biological variability, pathological conditions and several other characteristics which prevent a deterministic description. This idea is summarized by the generalization error.
The generalization error is the sum of the squared bias plus the variance [124]. Bias and variance are complementary: a simple statistical model has a small variance but a large bias; on the contrary, a model with excessive flexibility with respect to the training dataset has a small bias but a large variance. The optimum balance between bias and variance depends on the values of the model parameters; in this regard there are two strategies:
a. pruning, intended as removing superfluous parameters from starting models characterized by a huge number of parameters;
b. regularization, involving the addition of a penalty term to the error function through which the model is learned.
To show the bias-variance dichotomy, it is convenient to consider the ideal case of an infinite-size dataset used to model a scalar output y, for which the sum of squared errors E is:

\[ E = \frac{1}{2} \int \{ y(\vec{x}) - \langle t|\vec{x} \rangle \}^2\, p(\vec{x})\, d\vec{x} + \frac{1}{2} \int \{ \langle t^2|\vec{x} \rangle - \langle t|\vec{x} \rangle^2 \}\, p(\vec{x})\, d\vec{x} \quad (97) \]
where p(~x) is the unconditional density of the input data and < t|~x > the conditional expectation of the target variable t, and similarly for < t²|~x >. The second term is model independent, in the sense that it does not depend upon the model prediction y, and this is why it is called the intrinsic error. Therefore a statistical model can be considered perfect when it makes the first term vanish. As a consequence, in a perfect statistical model:

\[ y(\vec{x}) = \langle t|\vec{x} \rangle \quad (98) \]
In practical situations, the whole ensemble of training sets arising from a finite-size dataset D allows only an estimation of the desired quantities. For example, the estimated error is given by:

\[ \mathcal{E}_D[\{ y(\vec{x}) - \langle t|\vec{x} \rangle \}^2] = \{ \mathcal{E}_D[y(\vec{x})] - \langle t|\vec{x} \rangle \}^2 + \mathcal{E}_D[\{ y(\vec{x}) - \mathcal{E}_D[y(\vec{x})] \}^2] \quad (99) \]

where \(\mathcal{E}_D[\cdot]\) denotes the average over the ensemble of training sets drawn from D.
The first term is by definition the squared bias and the second term the variance. The main problem in generalization is ultimately minimizing these two quantities. To assess how different classifiers can manage the generalization error with respect to the particular hippocampal segmentation problem, several tests were carried out. In particular, two different classifiers whose performances had never been tested before in this particular field of neuroimaging, i.e. Random Forest and RUSBoost, were chosen. The reasons for this choice are the well-known excellence in accuracy of Random Forests among current algorithms, while RUSBoost, as a novel algorithm especially designed to deal with imbalanced datasets, is a straightforward choice given the extremely high skewness of the data distribution for hippocampal and background voxels.
4.7.4 Random Forests
Random Forest is a combination of tree classifiers, where each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest. The generalization error of a forest converges to a limit as the number of trees in the forest becomes large. The generalization error of a forest of tree classifiers depends on the strength of the individual trees in the forest and on the correlation between them. Using a random selection of features to split each node yields error rates that compare favorably with other machine learning techniques (Adaboost), but are more robust with respect to noise. Internal estimates monitor error, strength, and correlation, and these are used to show the response to increasing the number of features used in the splitting. Internal estimates are also used to measure variable importance.
Starting from a training set, seen as a collection of examples, Random Forest experiments use a combination of bagging and random feature selection. A number of new training sets are drawn from the original one with replacement. Then a tree is grown on each new training set using random feature selection; the trees grown are not pruned. There are two reasons for using bagging. The first is that it enhances accuracy when random feature selection is adopted; the second is that bagging can be used to give ongoing estimates of the generalization error, as well as estimates of strength (the accuracy of the individual classifiers) and correlation (the dependence among individual classifiers).
Given a specific training set T with N examples of d features, k bootstrap training sets T_k are formed. For every new training set T_k a tree is grown, giving as a result a classifier h(~x, T_k), where ~x is an input feature vector. The prediction y for every ~x is obtained by majority voting only over those classifiers whose training set does not contain ~x. In this way a so-called out-of-bag estimate of the generalization error is achieved. From an operative point of view, this technique is realized by leaving out about a third of the whole training set for every bootstrap sampling k, which is then used to internally test the forest performances.
The simplest random forest with random features is grown by selecting at random a small group of features for every node to split on; the nodes are also called leaves. The methodology is that of classification and regression trees (CART). A size f for the group of features to randomly sample is fixed. According to the original algorithm [117] this value is set to f = √d, where d = 315 is the total number of features. It is worthwhile to note that the number of features to be sampled, f, has in general to be far smaller than the total; the choice f = √d is used to ensure this randomness condition. Once the features are randomly selected, a split of the test examples is performed maximizing the purity. A synthetic overview of the Random Forest algorithm is shown in Fig. 21 and in the schematic representation of Algorithm 1:
Data: Training set T: N examples, d features
Result: Random Forest classifier
initialization
for t = 1 .. k (bootstrap size) do
    draw a bootstrap sample
    split the bootstrap sample into training and test sets with a 2:1 ratio
    calculate the leaf sizes l1, l2
    while l1 != 1 and l2 != 1 do
        randomly sample f features
        split the test examples, maximizing purity, into two leaves
    end
end
Algorithm 1: Random Forest algorithm scheme.
Fig. 21: The Random Forest algorithm. A training set with N examples of d features is sampled k times with bootstrap. Each bootstrapped training set T_i is internally divided into a training and a test set with a 2:1 ratio. Then a random sample of f features is drawn, and those features are used to split the test set into two leaves. The procedure is iterated until the leaves reach a desired size, which in classification is usually set to one.
However, the only parameter to which Random Forests were proven to be somewhat sensitive is precisely f. In fact, Breiman's original work proved that both the correlation and the strength of the classifiers are influenced by, and directly proportional to, the parameter f. The problem is that as the correlation increases the error rate increases, whilst as the strength of the classifiers increases the error rate decreases.
The main tuning of Random Forests is therefore the choice of the number of features to sample; however, this is not a complex or laborious procedure thanks to the out-of-bag estimates. Results are evaluated by comparing the classifier prediction and the manual labellings through the similarity index, also called the Dice index, as defined by equation (90).
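For reference, this setup maps almost directly onto an off-the-shelf implementation. The sketch below uses scikit-learn (an assumption of ours; the pipeline in this work is written in Matlab), with f = √d random features per split and the out-of-bag estimate enabled, on synthetic stand-in data.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Stand-in data: X is (n_voxels x 315) features, y is 1 for hippocampus,
# 0 for background, with roughly 5% minority class as in the VOIs.
rng = np.random.default_rng(0)
X = rng.random((5000, 315))
y = (rng.random(5000) < 0.05).astype(int)

clf = RandomForestClassifier(
    n_estimators=500,
    max_features="sqrt",   # f = sqrt(d), as in Breiman's original setting
    oob_score=True,        # out-of-bag estimate of the generalization error
    n_jobs=-1,
    random_state=0,
)
clf.fit(X, y)
print("OOB score:", clf.oob_score_)
```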
4.7.5 RUSBoost
Class imbalance is a common problem for many applications. This is particularly true when dealing with image processing; in fact, regions of interest are likely to fill a small portion of the entire field of view. When the examples of one class in the training dataset outnumber the examples of the other class, traditional data mining methods tend to create suboptimal classification models. Several techniques have been used to mitigate the problem of class imbalance, including data sampling and boosting [125]. A hybrid approach, namely RUSBoost, was designed for imbalanced classification problems.
Data sampling balances the class distribution in the training sample either by adding examples to the minority class (oversampling) or by removing examples from the majority one (undersampling). The goal of these techniques is to reach class ratio balance, usually by bootstrap. Both undersampling and oversampling have their benefits and drawbacks: the main drawback associated with undersampling is the loss of information from the training sample, whilst for oversampling it is overfitting, deriving from the inclusion of duplicated examples [126, 127]. If computational issues are of concern, undersampling should be preferred.
Another technique usually adopted to improve classification performance is boosting. Such a technique is particularly effective at dealing with class imbalance because the minority class examples, which are the most likely to be misclassified, receive higher weights in successive iterations. The combination of sampling and boosting is the core idea behind the RUSBoost method.
Random undersampling is a technique that randomly removes examples from the majority class until the desired balance is achieved. The algorithm assigns to each example i of the training set T the weight

\[ D_1(i) = \frac{1}{N} \qquad \forall i = 1, 2, \ldots, N \quad (100) \]
Then, for each round r = 1, 2, ..., R, the learning phase described in Algorithm 2 is performed to maximize the match between the predicted label y_i and the true label y for every example. The main parameter of the learning phase is the number of rounds R for which the learning has to be performed.
Data: Training set T: N examples, d features
Result: RUSBoost classifier
initialization
for t = 1 .. k (bootstrap size) do
    create a temporary training set S_t
    while r < R (number of rounds) do
        random undersampling to achieve the desired ratio between the majority and the minority class
        call a weak learner → a hypothesis h_t(~x_i, y_i)
        compute the error ε_t = Σ_{(~x_i, y_i)} D_t(i) (1 − h_t(~x_i, y_i) + h_t(~x_i, y))
        update the weights D_t(i) → D_{t+1}(i)
        normalize the weights
    end
end
Algorithm 2: RUSBoost algorithm scheme.
Finally, as in the case of Random Forests, performances were evaluated with the similarity, or Dice, index.
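As a simplified sketch of the core idea (undersample, then boost a weak learner), the Python snippet below undersamples the majority class once and then runs AdaBoost; note that RUSBoost proper re-samples at every boosting round, and that scikit-learn (version ≥ 1.2 for the `estimator` argument) and the parameter values here are our own assumptions, not the implementation used in this work.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier

def rus_boost_sketch(X, y, ratio=1.0, rounds=50, seed=0):
    """Random undersampling of the majority class followed by boosting."""
    rng = np.random.default_rng(seed)
    minority = np.flatnonzero(y == 1)
    majority = np.flatnonzero(y == 0)
    # Keep only as many majority examples as the desired class ratio allows.
    keep = rng.choice(majority, size=int(ratio * minority.size), replace=False)
    idx = np.concatenate([minority, keep])
    clf = AdaBoostClassifier(
        estimator=DecisionTreeClassifier(max_depth=1),  # weak learner
        n_estimators=rounds,
        random_state=seed,
    )
    return clf.fit(X[idx], y[idx])
```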
4.8 analyses on distributed infrastructures
In this section the implementation of the segmentation pipeline on the local computer cluster and on the geographically distributed grid-based computing infrastructure is discussed. The computational requests coming from every scientific community grow daily, and the capability of exploiting Grid and, in general, distributed computing environments is of crucial importance. However, the most common environment for medical imaging, i.e. the LONI pipeline, does not provide any usable plugin to exploit those computational infrastructures for Torque-based systems, therefore limiting the adoption of the environment itself. In this work a solution to this problem is shown, allowing end users to access a number of resources such as the European Grid Infrastructure (EGI), a local batch farm and dedicated servers. The proposed approach can be useful for large-scale studies.
4.8.1 Medical imaging and distributed environments
The field of medical imaging has seen an enormous development in recent years. Image databases made of thousands of medical images are currently available to support clinical diagnosis; this is particularly true for brain diseases. At the same time, more and more sophisticated software and computationally intensive algorithms have been implemented to extract useful information from medical images. Many medical image processing applications would greatly benefit from grids [128, 129]: run-time reduction, sharing of data collections and platform/hardware independent configurations are just a few examples [130, 131].
Access to distributed computing is crucial to meet the needs of the growing neuroimaging community, together with the increase in the amount of available data. This is also true because screening programs are now in the development phase, and feasibility studies demonstrating the potential of Grid tools for medical applications are spreading all over the world [132, 133, 134]. Unfortunately, high computational and storage requirements, as well as the need for large reference image datasets of normal subjects, are limiting their use to advanced academic hospitals; besides, the management of complex analysis pipelines and the combination of several processes and routines hard to assemble together make the adoption of distributed computing infrastructures very challenging, at least without a change of paradigm. This paradigm seems nowadays represented by workflow technologies [135].
Workflow technologies are emerging as the dominant approach to coordinating groups of distributed services; in particular this is true for Grid computing services. The background philosophy of such an approach is that if a client (another service or an end user) makes an invocation on a remote server, it should not be concerned with the inner protocols (for example the language the server is written in) in order to take advantage of its functionalities. This is the approach pursued by workflows. The main idea is that each service is independent from the others, which offers a great degree of flexibility and scalability for evolving applications. Although the concept of service-oriented architectures is not new, this paradigm has seen widespread adoption through the Web services approach, which makes use of a suite of standards, such as XML, WSDL and SOAP, to facilitate service interoperability [136].
One of the most widely used workflow managers for medical image processing is the LONI pipeline (LP) [137], a graphical workbench developed by the Laboratory of Neuro Imaging 5 with the goal of managing and executing neuroimaging processing algorithms. The LP is a simple and efficient computing solution to problems of organization, handling and storage of intermediate data, as well as for processing data and performing modular analyses. However, several requirements must be fulfilled to run the environment on computer farms. In particular, LP requires, or at least suggests, the CentOS operating system and the Java Platform Standard Edition to be installed. In order to run the analysis, the user has to be sure that LP is able to submit requests for executing applications to the computing infrastructure available in that particular context. While the requirements about the operating system and Java can easily be met or worked around, the latter is a particularly strong constraint, because different open source resource managers can be used and are often preferred. As far as we know, no dedicated plugin for Torque or for the gLite/EMI Grid infrastructure has been released, limiting the adoption of the LP environment. To tackle these problems, different fully automated algorithms have been proposed; all of them are characterized by intensive computations and storage management issues, which naturally should involve the use of distributed computing. However, their adoption or their deployment is still challenging, and reports illustrating applications beyond the demonstrative level are lacking.
4.8.2 Workflow managers
Web services provide a solution to a complex problem: coordinating a group of services to achieve a shared task or goal. Workflows are the glue for joining together distributed services owned and maintained by different organizations. The plethora of different workflow specifications, standards, frameworks and toolkits causes scientists to prefer reinventing the wheel by implementing their own proprietary workflow language rather than investing time in existing workflow technologies. In fact, several scientific workflow systems already exist: Taverna [138], Kepler [139] and Triana [140]. In this work a web service approach has been used to overcome the main issue of making the LP available with Torque and, at the same time, to generalize LP applications to distributed infrastructures. The two aspects are different sides of the same goal: to make end users able to access distributed resources for intensive computations while keeping the workflow management easy to handle. This is particularly true for grids. In fact, although feasibility studies have demonstrated grids to be able to tackle several issues in medical imaging, grid adoption in practice is still a challenging problem [141].
5 http://pipeline.loni.ucla.edu
The success of contemporary computational neuroscience depends on large amounts of heterogeneous data, powerful computational resources and dynamic web services. To provide an extensible framework for the interoperability of these resources in neuroimaging, LP exploits a decentralized infrastructure, where data, tools and services are linked via an external inter-resource mediating layer whose backbone schema is formed by the standard eXtensible Markup Language (XML). The pipeline environment does not require an application programming interface; its graphical user interface was originally programmed as a lightweight Java 1.3 environment [142].
The use of the LONI Pipeline has spread, and several works have demonstrated the advantages that can be achieved through this powerful tool [143]. In particular, the XML resource description allows the LP infrastructure to facilitate the integration of disparate resources and provides natural and comprehensive data provenance. It also enables the broad dissemination of resource metadata descriptions via web services and the constructive utilization of multidisciplinary expertise by experts, novice users and trainees. The LP features include a distributed grid-enabled infrastructure, a virtualized execution environment, efficient integration, data provenance, validation and distribution of new computational tools, automated data format conversion, and an intuitive graphical user interface.
As already mentioned, LONI does not provide the plugin needed to submit and manage jobs on the gLite/EMI grid [144] or on a batch farm based on Torque.
4.8.3 Workflow Implementation
In order to address the issue of submitting and monitoring LP jobs with the Torque resource manager and, more generally, to provide a manageable framework for grids, we decided to use a meta-scheduler based on the Job Submission Tool (JST) [135], which is able to submit jobs to different computing architectures, exposing to the end user only a simple Web Service interface based on the REST protocol (Fig. 22).
Each application composing the workflow is executed on the infrastructure that best fits its requirements in terms of the computational time needed for the job execution and the input data size: for example, short jobs that access a large amount of data will be executed on the local batch farm, as this increases the available bandwidth for reading the input data.
Fig. 22: The simplified workflow implementation. The reddish diamonds are the input/output modules; the backend analysis modules are represented by turquoise diamonds. To better emphasize the possibility of dynamically choosing the local farm or the grid infrastructure, these backend modules are shown in yellow.
In order to exploit JST with the LP it was necessary to add a SOAP-based web service, which was not already available. Using this web service interface, job submission can be performed transparently through the Front-end, which provides a set of web service calls to submit jobs, monitor them and retrieve their output.
The JST framework is composed of the Front-end, which hides the complexity of the underlying layer, developed in Java and based on Apache Tomcat (it exploits the MySQL RDBMS), and of the Back-end, which takes care of job execution or submission to the batch system, based on Torque/PBS, or to the EGI grid infrastructures.
The workflow has been implemented as a sequence of calls to the Front-end web services. Actually, LP supports SOAP-based web services, but unfortunately it seems unable to handle properly the WSDL file of the Front-end web services. A workaround has been implemented using a SOAP client, wsdlpull v.1.24, completely free and written in C++.
4.8.4 Distributed infrastructure employment
The local farm Bari Computer Center for Science (BC2S) is a computing infrastructure built to tackle different use cases from different research teams involved in astroparticle physics, nuclear physics, medical physics, statistics, bioinformatics, theoretical physics, etc. The available computing nodes are able to provide up to 5000 CPU cores and about 1.8 PByte of storage. The operating system is Linux; the primary release is Scientific Linux (SL) 5, put together by Fermilab, CERN and various other labs and universities around the world with the main purpose of having a common install base for the various experimenters. A Debian distribution is available too. The storage system is based on the Lustre distributed file system, in order to improve performance in reading data and to simplify the users' activities through the adoption of a single distributed file system. Torque and Maui build up the batch system and manage the job queue.
The implemented solution has been designed to take advantage of the distributed computing resources also available in the Grid-based infrastructure. The applications, i.e. the executable modules in the LONI workflow, can be executed both on the computing farm already described and on the EGI [145] distributed grid infrastructure, which is composed of about 300 sites geographically distributed around the world.
In our experiments the jobs were submitted to the grid using the Workload Management System (WMS) service offered to users of the biomed Virtual Organization. This can make use of about two hundred Computing Elements, and the great abundance of resources can be an advantage thanks to the reduction of the average job response time; however, it has some drawbacks: the execution environment is not completely controlled, due to the heterogeneity of the resources (pre-installed software, supported data transfer protocols, etc.). This can have a significant impact on the job success rate. In order to manage this complication we conducted preliminary tests (using only one image) aimed at assessing the feasibility, identifying adverse events and predicting the performance. The results collected from these "small-scale" tests were used to adjust the implemented solutions in order to increase the success probability of the "full-scale" test (using the complete image dataset). The preliminary test results suggested the following improvements: the computing elements showing problems with the Java run-time environment were excluded from the available resources (this adjustment can actually be done by modifying the JDL Requirements field); fallback and recovery solutions were implemented for the storage operations. Input data reading and output data writing are very critical phases, being the major job success/failure factors. In this regard, multiple data transfer protocols (srm, gridftp, http) and multiple storage elements have been used in our implementation. It is worthwhile to note that the described tuning can be done dynamically: the system is able to take charge of these changes without stopping or restarting the job submission operations.
4.8.5 Grid services and interface
Data management is a critical issue when dealing with distributed systems [146, 147]. Usually, applications need computing power, but the requirements of data sharing, storage and transfer are also compelling. This is particularly true for medical image analysis, which requires file storage and transfers to be efficient and reliable. Besides, non-technical issues concerning privacy tend to minimize data replication and data transfer.
The data transfer and replication issues have been approached from two different perspectives. Algorithmically, once the data is loaded by the end user, all the information needed by our segmentation process, based on voxel intensities, is extracted and anonymously stored or transferred to the processing nodes; no clinical information about age, sex and so on is used, therefore the privacy policy is not so compelling. However, there is no certainty that further developments will not require managing this kind of data. This is why particular attention has been given to protecting data: security protocols have been provided against known vulnerabilities.
4.8.6 Security and Data Management
All the steps performed over the grid infrastructure, both in terms of data management and job submission, are executed only after a strong authentication based on X509 certificates is fulfilled. This provides a good level of data protection. All the jobs submitted on the local batch farm fulfill the standard authentication based on Unix rules and permissions. In order to achieve good performance and reliability in accessing the data to be analyzed during this activity, we exploited the standard data management tools offered by the gLite/EMI grid infrastructure, namely the lcg-cp tool for copying data and an SRM Storage Element based on Lustre plus StoRM for the SRM layer. The advantage in this case is that all the files available on the local farm through the Lustre storage can also be used over the grid infrastructure, reading them through an SRM interface.
4.8.7 Segmentation workflow deployment
From the computational point of view, the presented algorithm can be summarized in three steps:
• Training dataset selection: to improve the algorithm performance, for each test image a training subset was selected through the Pearson correlation coefficient between the test image itself and the training dataset;
• Feature extraction: all voxels included in the selected dataset were characterized by 315 features computed from local information such as image intensity, voxel positions, Haar-like filters and selected local Haralick features;
• Voxel classification: an ensemble classifier is used to classify voxels as belonging or not belonging to the hippocampus.
Each module of the pipeline can be considered an independent service, not necessarily tied to the neuroimaging field; the most relevant steps of the analysis and the main parameters are represented in Fig. 23. The end user uploads an MRI, which is compared with the stored images; a similarity analysis is performed by means of correlation measures and an adaptive training set is selected. The number of correlated images to use for training depends on the desired degree of accuracy but also on storage and time requirements; it is one of the main parameters which have to be tuned, as sketched below.
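A minimal sketch of this adaptive selection step, assuming in-memory images and a hypothetical parameter k for the subset size:

```python
import numpy as np

def select_training_subset(test_image, training_images, k=20):
    """Pick the k training images most correlated with the test image.

    Images are flattened and compared via the Pearson correlation
    coefficient; k trades accuracy against storage and transfer time.
    """
    t = test_image.ravel()
    scores = [np.corrcoef(t, img.ravel())[0, 1] for img in training_images]
    ranked = np.argsort(scores)[::-1]   # most correlated first
    return ranked[:k]
```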
The features have to reproduce textural behavior according to intensity information. For 3-D images, features are extracted voxel by voxel: statistical features such as mean, standard deviation, gradients, entropy, range and skewness are calculated in cubic windows of varying size (3 × 3 × 3, ..., 9 × 9 × 9) centered on the voxel under consideration. Besides, position features and textural features are considered: Haralick features (energy, correlation, contrast and homogeneity) and Haar-like features have been chosen. A feature file has a typical size of 150 MB; this is an important aspect both for the storage and for the upload on worker nodes.
The selected training set images are then used by the RF. The classifier training depends on several parameters that have to be adjusted, such as the number of trees to grow or the split criterion; besides, the performances of the classifier strongly depend on the discriminative power of the features, therefore feature selection is necessary to improve the performances. Finally, several post-processing analyses, such as thresholding or filtering, can be performed; the final output is the segmented image, downloadable by the end user.

Fig. 23: A simplified representation of the hippocampus segmentation algorithm. In this case particular emphasis is given to computational issues: parameters and critical aspects are shown in round boxes, processing steps in rectangles and files in diamonds. The distributed computing is enclosed by a dotted line while the end user interface is shown in reddish diamonds.
4.8.8 Workflow setup
The algorithm has been developed with the LONI pipeline workflow manager: Fig. 24 shows a simplified version of the corresponding implementation, where the main analysis steps are modularized and therefore able to be deployed on distributed computing infrastructures. Each module executes bash scripts that transparently handle the submission and monitoring of farm/grid jobs that, in turn, execute the Matlab 6 code compiled with MCR (with the -nojvm compile option to disable the Java virtual machine in order to speed up our non-graphical applications).

Fig. 24: The segmentation algorithm in its LP implementation. It consists of four main modules: the Adaptive Training module measures the correlation between the image to be segmented and the training set, the Feature Extraction module calculates the features on the testing image according to the volume of interest determined by the previous module, the Random Forest Training module performs the learning phase for the chosen classifier, and finally the segmentation is returned by the Random Forest Prediction module. Each module is compiled and therefore able to run on a distributed computing structure.

The first processing step, in charge of VOI training, is the least computationally intensive module, but the most data intensive, since it needs to access all the images composing the database. Therefore, it is a candidate to run on the local farm, where the input data can be read without significant latency. Moreover, this module can rely on the Matlab software pre-installed on the local farm; the availability of the same software environment cannot be guaranteed on all the Grid worker nodes where the other modules can run. Therefore, the wrapper script is in charge of downloading the MCR package from a configured repository and installing it before executing the application.
The workflow is user-friendly thanks to the LONI module grouping feature: at a high level only the data dependencies among the tasks are highlighted, whereas the details related to the static input data and the parameters of each module can be retrieved by browsing inside the module group. In fact, each box in Fig. 24 has the internal structure shown in Fig. 25, consisting of three different modules in cascade.
6 http://www.mathworks.it/products/matlab/
Fig. 25: Each module hides implementation details, in particular the InputCheck, insertJob and getStatus modules, which allow one to control the workflow transparently without being concerned with submission or monitoring issues.
Before each processing step, the existence of the input data provided by the previous module is checked by the conditional block (InputCheck), in order to avoid submitting jobs that are doomed to fail. The insertJob module is in charge of the SOAP request to the JST web service (using the wsdlpull SOAP client) for the job submission with the proper associated arguments; the SOAP response contains the job identifier to be used for later monitoring. The getStatus module implements a loop: the SOAP client periodically sends a status request to the JST web server, using the job identifier returned by the insertJob module, in order to monitor the job status until its completion, as shown in the sequence diagram in Fig. 26. Upon job completion, the URL of the produced data is retrieved by calling the JST web service again.
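The submit-then-poll pattern just described can be sketched in a few lines; the endpoint names, URL and JSON fields below are purely illustrative placeholders of ours (the actual modules use a wsdlpull SOAP client in C++ against the JST WSDL).

```python
import time
import requests  # hypothetical REST front-end; names are illustrative only

JST_URL = "https://jst.example.org/frontend"   # placeholder, not the real service

def submit_and_wait(payload, poll_seconds=60):
    """Mimics the insertJob/getStatus pattern of the LP modules."""
    job_id = requests.post(f"{JST_URL}/insertJob", json=payload).json()["jobId"]
    while True:
        status = requests.get(f"{JST_URL}/getStatus", params={"id": job_id}).json()
        if status["state"] in ("DONE", "FAILED"):
            return status          # contains the URL of the produced data
        time.sleep(poll_seconds)   # periodic monitoring until completion
```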
4.8.9 Summary
The development of neuroimaging and signal processing has made possible the visualization and measurement of pathological brain changes in vivo, producing a radical change not only in the field of scientific research but also in everyday clinical practice. Therefore, the availability of distributed computing software environments and adequate infrastructures is of fundamental importance. In this study the LONI pipeline processing environment was used, a user-friendly and efficient workflow manager for complex analysis pipelines. A study of grid deployment was also performed, with the aim of creating automated segmentation algorithms for large screening trials. Several tests were carried out both on the local computer farm BC2S and on the EGI.

Fig. 26: Web service call sequence implemented in the LP modules.

BC2S is a distributed computing infrastructure consisting of about 5000 CPU cores and up to 1.8 PB of storage, while the EGI consists of about 300 geographically distributed sites around the world. In particular, all the results presented in this study were obtained on the BC2S using the 56 MRIs at our disposal. In addition, a feasibility study on about 3000 images (replicas of the 56 original ones) was performed with success on the EGI. The proposed method tackles two different problems that distributed infrastructures have to deal with: it does not suffer strongly from performance deterioration due to overhead, and at the same time its job failure rate is reduced to zero.
In particular, the presented method allows one to use the LP with Torque and, at the same time, to use grids. With a workflow manager the end user can run already available workflows, modify them before execution, or build completely new analysis pipelines. This option is very powerful. In the future we would like to develop a simple web interface to allow users to exploit an already available workflow by changing only the configuration parameters and the input files. In this way it would be possible for users to execute some analyses without the need for specific expertise in grid management. This web interface would also provide the needed support for a strong authentication mechanism.
In our setup all the analysis modules were written in Matlab or ITK 7; besides, the flexibility of the web service methods should be further investigated.
Failure management is a challenging problem in production grids, with the consequence that jobs need continuous monitoring. Moreover, when data transfers are huge, overall times grow exponentially with job failures; therefore, data transfer still remains the most limiting factor for grid effectiveness. The number of correlated images is, in this sense, a crucial parameter. This is of course an ad hoc strategy which is difficult to generalize; a further improvement of this model should be concerned with the calculation of the smallest amount of data to transfer.
7 http://www.itk.org
5 experimental results
In this section an overview of the experimental results is given. According to the proposed combined strategy, results are presented for three different analyses: segmentation, classification and computing performances.
The work developed in this thesis followed, as previously explained, three
main guidelines concerning the image segmentation which can be summarized
as in the following Table 4:
Segmentation performances      A cross-validation analysis on magnetic resonance brain scans at 1.0T is performed. The segmentation model is applied to 1.5T and 3.0T MR brain scans from the ADNI database. Performances are evaluated by comparison of the automated segmentations with manually traced labellings; several error metrics are provided.

Classification performances    Hippocampal volumes are measured as an atrophy index. Longitudinal analyses over ADNI data are performed. The discriminative power among healthy controls, MCI and AD subjects is assessed through variance analysis.

Computing performances         Workflows for fully automated segmentation are deployed both on the local cluster BC2S and on the Grid. Failure rate and execution times are measured to evaluate the computing performances.

Table 4: A schematic overview of the analyses presented in this work
5.1 hippocampus segmentation
In this section a detailed overview of the experimental results concerning the hippocampal segmentations is provided. Firstly, let us briefly discuss the results of the registration.
Starting from the original 181 × 145 × 181 brain scans a rigid registration
was performed using the ICBM152 template. This registration was used as
the initial guess for a refined affine registration. In this way restricted regions
of interest of dimension 50 × 60 × 60 were extracted for both left and right
Hippocampi; these hippocampal boxes were used for further analyses. In the
following Table 5 a summary of the mean hippocampal volume for each hemisphere is given.
Right Hippocampus
                    Volume (voxels)    Core - 60% (voxels)
rigid transform     24569              202
affine transform    14820              2578

Left Hippocampus
                    Volume (voxels)    Core - 60% (voxels)
rigid transform     24397              9
affine transform    13256              2633

Table 5: A schematic summary of the registration volumes. The volume represents the total average hippocampal volume, while the core is the volume of the inner Hippocampus, namely the part accounting for 60% of the total volume.
As Table 5 shows, after rigid registration the overall hippocampal regions of interest are far larger than desired; in fact, from the literature the Hippocampus is known to measure on average about 3200 mm³. On the contrary, affine registration allows the region of interest to be refined; the main result in this case is given by the core value. The voxels representing the inner 60% of the Hippocampus, namely the core, overlap up to 2578 mm³ for the right hemisphere and 2633 mm³ for the left one. This result is in fact a measure of the goodness of the registration. Aside from the biological variability, the 60% cores of the hippocampal regions are indeed close to the expected volumes, which should have nominal values of about 2000 mm³.
5.1.1 Exploratory Analysis
Once the putative hippocampal regions of interest are extracted, a refined analysis of the boxes can be performed. The dataset used for this analysis consisted of 56 MRI scans and the corresponding manual hippocampal labellings, as described in Chap. 4. First of all, the box intensity distributions were explored. The gray level distributions differ significantly from one scan to another, as shown in the example of Fig. 27.
Fig. 27: The figure shows the comparison between the gray level distributions of two randomly chosen right boxes, as obtained after the registration preprocessing.
It can be noticed how registration has a different impact on each scan even by simply looking at the gray level distributions. In fact, the number of occurrences of black voxels (i. e. intensity i = 0) can be used as a measure of how many background voxels were artificially introduced by registration, and therefore as a measure of how similar the registered image and the template were. An overall view of the gray level distribution can be given by representing the cumulative image. In this case statistical fluctuations are averaged and background voxels are disregarded. As a consequence a smoother distribution is obtained; Fig. 28 represents this effect for the cumulative distribution of the right boxes.
The left tail is significantly smoothed; this can be easily interpreted: the great number of background voxels observed in Fig. 27 was a random effect introduced by registration.
It is also interesting to note that hippocampal voxels contribute continuously to the gray level distribution. It is not possible to classify a voxel as belonging to the Hippocampus or not on the basis of intensity considerations alone. In fact, even considering only the true hippocampal voxels of a box (as manually labeled by the experts), there is a huge number of voxels with an intensity almost equal to zero.
For the considered dataset, the hippocampal intensities had a median m of about 0.45 and an interquartile range IQR of about 0.25.
Fig. 28: The cumulative image represented in the figure allows one to evaluate how the noise introduced by registration, and the statistical fluctuations among the different images, affect especially the left tail.
Therefore, it is interesting to note that the outliers of the distribution are found above the 0.90 and below the 0.10 intensity levels; taking this into account, it is possible to estimate upper and lower thresholds for the hippocampal voxels to be considered, without affecting the segmentation accuracy.
Another important consideration arises from the comparison of the total number of hippocampal voxels with the total box size. As Fig. 27 shows, the hippocampal voxels are about 24 × 10³ in total, while the total number of examples is 10 × 10⁶, resulting in a very imbalanced dataset. This point will be crucial for further considerations; in particular, this imbalance is the main reason for the choice of applying classifiers such as Random Forests or RUSBoost, which are specifically designed to manage imbalanced datasets, to the hippocampal segmentation task.
Further insight into the data sample is obtained by studying the similarity among the boxes. Similarity is investigated using Pearson's correlation coefficient r, calculated over the voxel-by-voxel intensities; for two images I1 and I2, if C denotes the covariance:

r = C(I1, I2) / √( C(I1) C(I2) )
It is worthwhile to note that the correlation has to be computed only over a small region of the boxes, both for computational reasons and for interpretative concerns. In fact, since the goal is to segment the Hippocampus, what is interesting to observe is whether the hippocampal boxes can be considered similar in the region containing the Hippocampus itself. As a consequence, only the core voxels have to be used for this computation. These core or inner voxels can be determined, according to the FAPoD VOI hunter introduced in Chap. 4, as those concerning the 60% of the hippocampal mean shape. Therefore, once the boxes were extracted, a FAPoD analysis was performed on them to model the average hippocampal shape, both for left and right Hippocampi.
The FAPoD algorithm returns a probability map of the hippocampal voxel locations, and therefore allows the identification of a region containing the Hippocampus which is smaller than the box itself. Besides, the probability map can be conveniently and soundly thresholded to keep only the higher probability voxels for the correlation computation. In particular, the previously cited threshold of 60% was used to detect the inner hippocampal voxels, i. e. those whose probability of belonging to the Hippocampus exceeded the 0.6 assigned by FAPoD.
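To make this step concrete, the following is a minimal sketch of the core extraction and of the correlation computation, written in Python/NumPy purely for illustration (the actual modules of this work were written in Matlab or ITK); prob_map, box_a and box_b are hypothetical arrays holding the FAPoD probability map and two registered hippocampal boxes of equal shape:

    import numpy as np

    def core_mask(prob_map, threshold=0.6):
        # Boolean mask of the inner Hippocampus: voxels whose FAPoD
        # probability of belonging to the Hippocampus exceeds the threshold.
        return prob_map > threshold

    def core_correlation(box_a, box_b, mask):
        # Pearson correlation r between two registered hippocampal boxes,
        # restricted to the core voxels selected by the mask.
        a = box_a[mask].astype(float)
        b = box_b[mask].astype(float)
        return np.corrcoef(a, b)[0, 1]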
5.1.2 VOI extraction
The manual labellings described in Chap. 4 were used to train the FAPoD shape model in cross-validation. The main result of this analysis was the description of the hippocampal shape through a probability map.
As an example, Fig. 29 shows a hippocampal labelling contained in the FAPoD volume.
Fig. 29: The figure shows the bounding peri-hippocampal region obtained through
FAPoD analysis (green) and the labeled mask (white).
According to the Point Distribution Model theory, the mean hippocampal shape can be described by the mean and standard deviation of each landmark series. These mean landmarks and the respective standard deviations σl were combined with a pointwise linear interpolation.
Several tests were carried out to identify the optimal ratio between the number of landmarks and the mean contour length. The choice of parameters was carried out by evaluating the performance of the overall system and taking into account the computational cost of processing. First of all, the number of images needed to retrieve the FAPoD volume was evaluated. As a performance index, in this case, the volume-of-interest measure was used. The results show that the model has a good generalization capability; in fact it is able to contain the testing shape variabilities using less than half of the training set. Besides, it has to be stressed that, starting from MR scans with an overall size of about 4.7 × 10⁶ voxels and passing through boxes of 1.8 × 10⁵ voxels, the FAPoD analysis has in the end allowed the detection of a region of interest of about 2.5 × 10⁴ voxels, with a dramatic decrease of the computational cost and of the dataset imbalance, thus making the segmentation task more efficient. This result is shown in Fig. 30; in particular, the mean volume obtained by the FAPoD method plateaus using about 25 images.
Fig. 30: The volume reconstructed by FAPoD (in mm³) as a function of the number of images used to retrieve the volume, for (a) left and (b) right Hippocampi.
These tests were performed by randomly sampling the training set 100 times and varying the number of training images sampled; as a consequence, for each mean shape it was also possible to measure a standard deviation. In Fig. 30 it can be seen how the 3σl, 2σl and 1σl trends behave similarly, but the optimal choice for the putative hippocampal mean shape should be the 2σl variability, which assures a region of about 25000 voxels. For the following analyses, then, VOIs with σl = 2 are intended. The FAPoD region allows the detection of a sound peri-hippocampal region and, by thresholding, the definition of the inner hippocampal region where the similarity analysis can be performed.
5.1.3 Correlation analysis
As stressed in the previous section, Pearson's correlation coefficient can be used as a similarity measure among the hippocampal boxes. However, measuring the similarity over the whole box can yield misleading information. In fact, it has to be pointed out that the goal of this analysis is the hippocampal similarity more than the box similarity.
Accordingly, it is the hippocampal core, defined through the application of the FAPoD algorithm, which undergoes the correlation analysis. Fig. 31 shows an example of how an MRI box is correlated with the other boxes of the dataset.
Fig. 31: The figure shows the correlation coefficients computed between an image Ii of the data set and the remaining images. As can be seen, there are several images moderately correlated with Ii.
As Fig. 31 shows, there is a small number of images which are correlated with the MRI box used for this particular example; this is in fact a general result: for most hippocampal boxes, moderate correlation can be found for no more than 10-15 images. As the main goal of this analysis is to provide a supervised learning technique to segment the Hippocampus, it is important to investigate whether a smaller dataset, consisting only of the most correlated images, can be a better choice for the classifier training than using the whole dataset and therefore relying only on the generalization power of the classifier itself.
To gain an overall perspective, average correlations can be studied. The right boxes show a mean correlation coefficient r̄ = 0.3 while the left ones have r̄ = 0.2. In particular, looking at Fig. 32, which shows the average correlations for both right (case a) and left (case b) hippocampal boxes, it is possible to outline an interesting behavior: only a small fraction of images can be considered correlated or moderately correlated (r > 0.3) with the dataset, and in general right Hippocampi seem to show better correlations.
Fig. 32: The figure shows the average correlation coefficients computed for both right (case a) and left (case b) Hippocampi.
Accordingly, it is natural to investigate the segmentation performances with a varying number of correlated training images. In the following sections the experimental results about the learning phase are discussed; besides, the existence of a possible relationship between the correlation and the segmentation accuracy is examined. Before discussing these aspects, a more detailed description of the features used to train the models and of the feature extraction has to be given.
5.1.4 Feature importance
The hippocampal boxes used in this work, as previously described, have dimensions 50 × 60 × 60, for a total amount of 180000 voxels. On average, the extraction of the 315 features described in Chap. 4 requires 16 hours of CPU time and is by far the most intensive computation involved in the whole segmentation pipeline [148]. Another fundamental aspect deals with the classifiers to be adopted. A Random Forests classifier, for example, is able to perform an internal feature selection and can therefore achieve sound performances with the whole feature sample; on the contrary, other classifiers would greatly benefit from an a priori feature selection. This is why particular attention has to be given to the study of the feature importance.
Several approaches were addressed to determine an optimal sub-sample of features and different methodologies were employed. As a first test, the feature distributions were investigated to unveil whether different features could be considered indistinguishable, i. e. statistically not different. To this aim the Kolmogorov-Smirnov test was performed, to evaluate which feature distributions rejected the null hypothesis of belonging to the same population. Only statistically different features were kept for training. Another approach was Principal Component Analysis (PCA), adopted to obtain a feature importance/independence measure; in this case only the features accounting for more than 99% of the variability were selected, thus 197 features survived the PCA analysis. Finally, forward and backward selection were employed.
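As an illustration, a minimal sketch of the Kolmogorov-Smirnov filter might read as follows (Python/SciPy, not the original implementation; it assumes one plausible reading of the test, namely comparing each feature's distribution on hippocampal and background voxels, and an assumed significance level alpha = 0.05):

    import numpy as np
    from scipy.stats import ks_2samp

    def ks_feature_filter(X_signal, X_background, alpha=0.05):
        # Keep the indices of the features whose signal and background
        # distributions are statistically different under a two-sample
        # Kolmogorov-Smirnov test; alpha is an assumed significance level.
        kept = []
        for j in range(X_signal.shape[1]):
            _, p_value = ks_2samp(X_signal[:, j], X_background[:, j])
            if p_value < alpha:  # reject the common-population hypothesis
                kept.append(j)
        return kept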
The following Table 6 summarizes the obtained results; the methods employed, the number of features and the normalized performances are shown. The latter were obtained in cross-validation by measuring how well the segmentations, obtained by training the classifiers on the selected features, overlapped the manual labellings.
Methods                  # features    Performances
all features             315           1.00
Kolmogorov-Smirnov       57            0.90
PCA                      197           0.80
Forward Selection        36            0.98
Backward Selection       23            0.97
Table 6: A schematic view of the different feature importance analyses performed.
5.1.5 Random Forest classification
The features extracted and selected were used to train a Random Forest classifier. A cross-validation approach was used for the training of both right and left hippocampal boxes. Thus, for each box to be segmented, the training sample included only the remaining 55 boxes.
Several approaches were used; in particular, K-fold cross-validation was used to find the best training configuration. A number N of images, with N = 10, 20, 30, 40, 50, was randomly extracted from the training sample; the extraction was iterated 100 times, so that a hundred classifiers were trained, and the predicted labels were then used to segment the Hippocampus. In this phase of the analysis the simple threshold th = 0.5 was used to turn the continuous scores given by the classifiers into a binary output. The results in terms of the Dice index for the left Hippocampus are reported in Table 7. For this purpose, let us recall the definition of the Dice index D: D = 2|A ∩ B|/(|A| + |B|). Similar results were obtained for the right Hippocampus.
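A minimal sketch of the Dice computation, assuming two boolean voxel masks of equal shape (illustrative Python, not the original Matlab code):

    import numpy as np

    def dice_index(auto_mask, manual_mask):
        # Dice similarity D = 2|A ∩ B| / (|A| + |B|) between an automated
        # segmentation and a manual labelling, given as boolean voxel masks.
        a = np.asarray(auto_mask, dtype=bool)
        b = np.asarray(manual_mask, dtype=bool)
        return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())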
Training Images    Mean    Median    Standard Deviation
10                 0.78    0.79      0.01
20                 0.80    0.81      0.01
30                 0.81    0.81      0.01
40                 0.81    0.82      0.01
50                 0.81    0.84      0.04
Table 7: For each cross-validation iteration the Dice index distribution is calculated and then mean, median and standard deviation are averaged. The table reports these mean values and clearly shows how the performances increase with the number of training images, although at the cost of an increased spread in the distribution.
Within the machine learning literature it is widely appreciated that the leave-one-out (loo) approach is a suboptimal method for cross-validation, as it gives estimates of the prediction error that are more variable than other forms of cross-validation such as K-fold or bootstrap. In fact, even in this case it was verified that K-fold cross-validation with K > 30 was stable and gave results comparable with the loo, but with a smaller variability. However, when dealing with massive computations such as those involved in the neuroimaging field, it is not possible to perform a K-fold analysis for mere reasons of time. This is why loo was used in this work as a reference value; besides, an alternative strategy (namely, active learning), based on the correlation analysis of the data set, was investigated. This latter approach used only the N images found to be most correlated with the box to be segmented; the correlation, as previously explained, was measured in the inner region of the box, which can be considered as putatively representing the hippocampal core.
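A minimal sketch of this selection step (illustrative Python/NumPy; test_box, train_boxes and mask are hypothetical arrays, the latter being the FAPoD core mask introduced earlier):

    import numpy as np

    def most_correlated_training_set(test_box, train_boxes, mask, n=10):
        # Indices of the n training boxes whose core voxels are most
        # correlated (Pearson r) with the box to be segmented.
        target = test_box[mask].astype(float)
        r = np.array([np.corrcoef(target, box[mask].astype(float))[0, 1]
                      for box in train_boxes])
        return np.argsort(r)[::-1][:n]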
For comparison with the previous results, the segmentation performances were investigated using the subsets consisting of the N = 10, 20, 30, 40 images most correlated with the testing image. Besides, the loo segmentation was also performed. In Fig. 33 the results for the left Hippocampus are shown as a boxplot.
Fig. 33: The figure shows the segmentation performances for left Hippocampus using
the 10, 20, 30, 40 most correlated images and all the remaining 55 images (loo).
Similar results were obtained for right Hippocampi. No significant difference in performances can be found between the active learning and the loo segmentations; however, it is worthwhile to note that the performances with active learning and loo are slightly higher than the cross-validation performances, while in the loo case the computational burden is dramatically increased. As a consequence, the active learning approach is to be preferred, both for performance and for computational considerations.
5.1.6 RUSBoost classification
The same analyses were conducted using RUSBoost. As previously described, RUSBoost is a classifier especially designed to take into account the problems arising from highly imbalanced datasets. The results obtained for both left and right Hippocampi are statistically comparable.
The fact that both classifiers achieve the same performances is important, because it confirms that one of the most severe difficulties in segmenting the Hippocampus is the high imbalance between the signal and background classes, as described in Chap. 4. In the following Table 8 the results for the left hemisphere are shown, to be compared with those obtained with Random Forests (RF) presented in Table 7.
Training Images    Mean    Median    Standard Deviation
10                 0.76    0.78      0.01
20                 0.79    0.80      0.01
30                 0.79    0.81      0.01
40                 0.80    0.82      0.01
50                 0.81    0.83      0.03
Table 8: For each cross-validation iteration the Dice index distribution is calculated and then mean, median and standard deviation are averaged. The table reports these mean values and clearly shows how the performances increase with the number of training images, although at the cost of an increased spread in the distribution.
Even if the results from RF and RUSBoost are comparable, RF performs slightly better in terms of computational burden and is decisively easier to tune in the training phase. Another important aspect emerges by comparing the Random Forests and RUSBoost segmentations: while the former shows a rough balance in misclassifications, i. e. there is almost the same number of false positive and false negative predictions, the latter tends to misclassify preferentially with false negatives, thus underestimating the hippocampal volumes. Accordingly, the RF predictions are preferred, and a further analysis of the RF segmentation error is presented. A proof of the misleading behavior of RUSBoost will become evident when dealing with clinical predictions in the next sections.
5.1.7 The segmentation error
It is well known that a large fraction of the errors produced by automated segmentation algorithms are systematic, i. e. they occur consistently from subject to subject. The main reason is that the segmentation protocol used to train the algorithms can in principle differ from the one adopted for the manual labellings. This explains, for example, why the performances declared by tool developers often differ from those found by final users. The definition of a recognized "gold standard" for hippocampal segmentation is the goal of several international initiatives, as already reported; however, until a common reference is accepted, no exhaustive treatment of the systematic errors among the different protocols can be given.
On the other hand, it is interesting to investigate the internal error of the proposed algorithm. First of all, the similarity index used to evaluate the performances is already an error metric. In fact, the similarity index can be interpreted as the ratio between the number of examples correctly classified and the sum of the same value with the mean error. The mean error is half the sum of the false positives (examples misclassified as signal) and the false negatives (examples misclassified as background). As a consequence, the higher the similarity index, the smaller the number of misclassified examples.
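For illustration, the identity between this reading of the similarity index and the Dice definition recalled above can be made explicit (hypothetical helper, not part of the original pipeline):

    def dice_from_counts(true_positives, false_positives, false_negatives):
        # Similarity index written through the error counts: correctly
        # classified signal voxels over the same value plus the mean error.
        # Algebraically identical to D = 2 TP / (2 TP + FP + FN).
        mean_error = (false_positives + false_negatives) / 2.0
        return true_positives / (true_positives + mean_error)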
For a qualitative assessment, it can be seen how the errors are spatially distributed. To this aim the number and positions of the misclassified voxels were recorded. In Fig. 34, as an example, the misclassified voxels of an axial left hippocampal slice are shown.
Fig. 34 can be interpreted from a number of different perspectives. Firstly, by comparing the hippocampal shape without gray level scaling with the corresponding scaled image, it can be appreciated how, even after registration, Hippocampi are misaligned. This does not depend on the quality of the registration, which indeed, being fully automated, cannot be as good as a manual registration could be; on the contrary, it depends on the high variability of the anatomical structure. As a consequence, when looking at the scaled image it can be seen that the outer voxels are present only in a small subsample of the training set; in fact, when scaling the gray levels by frequency, the hippocampal shape is dramatically reduced.
Fig. 34: The left hippocampal average shape is shown both without (above left) and
with (above right) scaled gray levels. From the lower figures it emerges that
misclassification is uniformly distributed on the hippocampal contour.
Another important aspect can be deduced from the lower part of Fig. 34: the misclassified voxels are uniformly distributed over the hippocampal contour. It is important to investigate whether the misclassification can be imputed preferentially to false positives or to false negatives; if this were the case, a systematic error would emerge. Further insight can be gained by examining separately the distributions of false positives and false negatives, as in Fig. 35.
These results are both expected and desired. The inner hippocampal voxels, belonging to a more uniform region to be segmented, represent an easier task for the classification, and this is why almost no classification error occurs in the hippocampal core. The finding that misclassification regards the peri-hippocampal regions is a further acknowledgment that the classifier has robust performances. In this respect, let us remark that even for human raters misclassification is expected to occur on the boundaries; from this perspective, an ideal classifier should reproduce the same error rate a human rater would have. On the other hand, the fact that the error distribution follows exactly the hippocampal contour is another clue of the registration quality. In fact, misaligned hippocampal shapes would have led to a more spatially spread error distribution.
Fig. 35: The figure makes evident that no discrepancies can be found when comparing the spatial distributions of false positives (FP) and false negatives (FN).
One of the main error sources in the proposed segmentation workflow arises from the thresholding process which the classification output must undergo to obtain a labeling comparable with the manual one. The classification output is, in fact, a continuous score. In this work the optimal threshold value was studied in cross-validation segmentations. The optimal threshold value was then used to reproduce the results obtained on a second dataset, which will be introduced in section 5.2. The optimal value was found by thresholding the classification scores with t = 0.1, 0.2, ..., 0.9. Then the mean value and the standard deviation of the misclassified voxels per image were calculated. In Fig. 36 the results for the left hemisphere are presented; for the right hemisphere an analogous result was obtained.
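A minimal sketch of this threshold scan (illustrative Python/NumPy; scores is the continuous classifier output for one box and manual_mask the corresponding manual labelling):

    import numpy as np

    def threshold_sweep(scores, manual_mask):
        # Binarize the continuous classification scores at each threshold
        # t = 0.1, ..., 0.9 and count the voxels that disagree with the
        # manual labelling.
        manual = np.asarray(manual_mask, dtype=bool)
        thresholds = np.arange(0.1, 1.0, 0.1)
        errors = np.array([np.logical_xor(scores >= t, manual).sum()
                           for t in thresholds])
        return thresholds, errors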
In this way an optimal value t∗ = 0.5 was found. However, a more important conclusion is that the segmentations are stable with respect to the threshold value, and the plateau is exactly where it should be expected, i. e. in a neighborhood of 0.5, meaning that the classification does not show a preferential behavior in labeling signal or background. In fact, had the optimal threshold been found at higher or lower values, a certain difficulty of the classifier in separating the classes could have been deduced.
Fig. 36: The figure represents the average error and its standard deviation as a function of the segmentation threshold. An evident plateau for values near t = 0.5 is found.
A different perspective on the segmentation error is provided by the ROC curves; in that case, more precisely, one should strictly speak of classification error. As will be shown in the following sections, an estimate of this error can be given as a function of the area under the curve and in terms of confidence intervals. What still remains is to assess whether the manual labellings and the automated segmentations provide consistent results. To this aim a statistical measure of agreement was performed.
5.1.8 Statistical agreement assessment
In clinical measurements, the comparison of a new measurement technique with an established one is often needed, to see whether they agree sufficiently for the new to replace the old. Such investigations are often analyzed inappropriately, notably by using correlation coefficients, whose use is misleading. In fact, correlation can be used, as previously described, to measure whether two quantities have a functional relation and how strong this relation is; on the other hand, correlation is not a measure of agreement. Obviously, a measure which is always smaller than a second one could be strongly linearly dependent on it even if the two measures are not consistent.
An alternative approach, described in [152], is based on graphical techniques and on confidence interval estimation. The measure of the agreement is performed in four steps:
a. graphical examination of the agreement;
b. examination of the differences of the methods against the mean;
c. estimation of the limits of agreement;
d. precision of the agreement and its significance.
It has to be noted that in clinical measurements the true values remain unknown; thus indirect methods have to be applied to measure the agreement between an established technique and a new method. The first step is readily presented in Fig. 37, where the manual labellings and the automated volumes are plotted on the x-axis and the y-axis respectively.
Fig. 37: The figure represents the automated volumes against those obtained through
manual segmentation. The straight line of perfect agreement is also represented in blue.
The proposed method agrees sufficiently well with the established one; however, as previously stated, graphical examination is not sufficient. When dealing with calibration, this assessment consists in measuring how well the new method is able to reproduce the real values and whether it performs better than the established technique; in hippocampal segmentation, however, when comparing the manual labellings and the automated ones, neither provides an unequivocally correct measure of the volume of the Hippocampus.
The previous Fig. 37 shows in a qualitative way a certain agreement between the two volumetric measures. This is clearly expected: two methods aiming to measure the same quantity should provide an at least qualitative agreement. To gain further insight, let us look at the plot of the pointwise differences against the mean values in Fig. 38.
Fig. 38: The figure represents the volume differences against the mean volumes obtained by manual and automated segmentations. The straight lines in blue
represent the mean value of differences and the 95% confidence limits.
This representation is critical because it allows an overall comprehension of how the two methods differ and whether there are significant differences with respect to the measured values. In particular, this is important to see whether the statistical variables under examination are heteroscedastic. Heteroscedasticity occurs when the variance is not constant, so that it is possible to extract two or more samples with different variances from the original dataset. In Fig. 38 it is possible to appreciate how the variance is almost constant along the x-axis. Moreover, Fig. 38 shows that only two measured differences exceed the 95% confidence interval, which in the following will be referred to as the limits of agreement; therefore the measures are consistent.
The limits of agreement l_upper and l_lower for a normal population are straightforwardly calculated as:

l_upper = µ + 2σ    (101)
l_lower = µ − 2σ    (102)
where, in the present case, µ is the mean value of the differences and σ their standard deviation. This estimate is valid under the assumption that the population is Gaussian. This was in fact verified, at a significance level α = 0.5%, with a χ² test. The χ² distribution for ν degrees of freedom was calculated according to its definition:
to its definition:
f(x|ν) =
x(ν−2)/2 e(−x/2)
2ν/2 Γ (ν/2)
(103)
where Γ is the Euler gamma function. It is known that, for a large number of degrees of freedom, this distribution approaches a Gaussian; therefore, by calculating the experimental statistic, it is possible to test the null hypothesis of normality.
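A minimal sketch of the limits-of-agreement computation described above (illustrative Python/NumPy; the sample estimate of σ with ddof = 1 is an assumption, as the estimator is not specified here):

    import numpy as np

    def limits_of_agreement(manual_volumes, automated_volumes):
        # Limits of agreement mu ± 2*sigma for the differences between
        # automated and manual volumes (eqs. 101-102); the sample
        # estimate of sigma (ddof=1) is an assumption.
        d = np.asarray(automated_volumes, float) - np.asarray(manual_volumes, float)
        mu, sigma = d.mean(), d.std(ddof=1)
        return mu - 2.0 * sigma, mu + 2.0 * sigma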
In the case of perfect agreement, the measures should lie on the straight line of the mean difference, with the mean value of the differences being null. Heteroscedasticity can cause ordinary least squares estimates to be biased; in this case it would result in an inappropriate inference about the confidence intervals of the methods, and therefore in a possibly wrong hypothesis test for the significant comparison of the two methods. As shown above, the manual and automated segmentations are homoscedastic, and therefore they can be soundly compared. The previous picture also shows that the differences between the two methods are statistically not significant: they do not exceed the 95% confidence interval except in two cases. However, a more significant representation is given in Fig. 39, with the confidence interval estimation of the mean and of the upper and lower limits of agreement.
The confidence intervals previously represented are calculated as usual: for the mean, the standard error is √(σ²/N), where N is the sample size, while for the upper and lower limits of agreement it is √(3σ²/N).
Fig. 39: In this figure, unlike the previous one, the confidence intervals are represented too. It can now be appreciated that only one measure is significantly different and that an overall range of variability of about 1000 voxels is found, corresponding to a relative 15% variability.
Accordingly, the 95% confidence interval for the upper limit is [897, 1328], while for the lower limit it is [−1221, −790]. Therefore, considering that the mean hippocampal volume is about 6400 mm³, the manual labellings and the automated segmentations agree within 15% of the measured quantity. The analysis confirms the reliability of the automated segmentation; it is thus possible to approach the clinical issue of AD discrimination.
5.2 alzheimer's disease classification
The segmentation workflow has shown promising performances on the training set, being able to accurately reproduce the manual labellings by expert neuroradiologists. Nevertheless, another important aspect of the analysis is the measure of the informative power of the segmentations and of their clinical predictive power. To approach these measures a distinct dataset was used. In this validation phase no manual labellings were available; the data consisted of 1824 MRI brain scans acquired at 1.5T from the ADNI repository 1 ; for every subject, age, sex and clinical state (control (CTRL), mild cognitive impairment (MCI) and Alzheimer's disease (AD)) were available. The goal of this validation phase is to measure how much the implemented hippocampal segmentation is able to capture clinical information. In particular, volumetric measures were investigated and their predictive power as an AD biomarker was measured.
As a first step, 456 MRI scans were downloaded and processed with the proposed segmentation procedure. For every subject, 4 acquisitions were available:
• screening scans, representing the time t0 of acquisition;
• repeat scans, acquired with a very short time delay with respect to the screening;
• month 12 scans, acquired one year after screening;
• month 24 scans, acquired two years after screening.
According to the previous results, the segmentations were performed using only the 10 most correlated images. In this way the computational times were decreased without a significant loss of accuracy, as previously pointed out. The follow-up images are a valuable support to check whether the clinical predictions yielded by the screening-repeat analyses are robust; screening and repeat images are useful to determine the method uncertainty, as will be shown later in this section. In fact, screening and repeat images are acquired with a time delay that is not significant in terms of biological or physiological variations; as a consequence, a comparison between the segmentation volumes of screening and repeat scans can be used to estimate the method uncertainty.
First of all, the discriminative power of the right and left hippocampal volumes is separately investigated. Fig. 40 shows the segmentation volume boxplots for both the right and the left Hippocampi.
Right volumes appear slightly higher than the left ones. However, no significant difference can be found between the two distributions, nor a difference
1 https://ida.loni.usc.edu/
Fig. 40: The figure shows both right (a) and left (b) hippocampal volumes, with CTRL/MCI/AD class discrimination.
in terms of separation of the three classes CTRL, MCI and AD. The obtained segmentations allow the calculation, for each subject, of the right and left hippocampal volumes; these are used for intra-subject and inter-subject comparisons. Similar results were obtained for all the acquisitions: screening, repeat, month 12 and month 24 scans.
For every acquisition, the discriminative power was measured using the receiver operating characteristic (ROC) curves and their area under the curve (AUC). As an example, the ROC curves for CTRL and AD in the right and left hemispheres are shown for the screening scans in Fig. 41.
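As an illustration, a minimal sketch of this ROC analysis (Python, with scikit-learn assumed as a convenience library, not the original implementation; the sign flip encodes the fact that smaller, more atrophic volumes should score as AD):

    import numpy as np
    from sklearn.metrics import roc_curve, roc_auc_score

    def volume_roc(volumes, is_ad):
        # ROC analysis using the hippocampal volume as the score; the sign
        # is flipped so that smaller (more atrophic) volumes score as AD.
        # is_ad: 1 for AD subjects, 0 for CTRL subjects.
        scores = -np.asarray(volumes, dtype=float)
        fpr, tpr, _ = roc_curve(is_ad, scores)
        return fpr, tpr, roc_auc_score(is_ad, scores)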
(a) Right Hippocampus ROC: AUC = 0.82. (b) Left Hippocampus ROC: AUC = 0.84.
Fig. 41: The ROC curves for right (a) and left (b) Hippocampi. The reported AUC is
the measure of the discrimination between CTRL and AD subjects obtained
with the screening scans.
The previous figure shows how the performances increase in the left hemisphere. As already discussed in Chap. 2, this difference is not surprising.
For an overview of the different results with respect to the acquisition time, i. e. screening, repeat, month 12 and month 24, Table 9 is presented.
Right Hippocampus
Acquisition    AUC CTRL/AD    AUC CTRL/MCI
Screening      0.82 ± 0.02    0.74 ± 0.03
Repeat         0.81 ± 0.03    0.72 ± 0.03
Month 12       0.86 ± 0.02    0.75 ± 0.03
Month 24       0.85 ± 0.02    0.75 ± 0.03

Left Hippocampus
Acquisition    AUC CTRL/AD    AUC CTRL/MCI
Screening      0.85 ± 0.02    0.74 ± 0.03
Repeat         0.84 ± 0.02    0.74 ± 0.03
Month 12       0.88 ± 0.02    0.76 ± 0.03
Month 24       0.89 ± 0.02    0.78 ± 0.03
Table 9: In this table the AUC values for both right and left Hippocampi are shown, for the CTRL/AD and the CTRL/MCI discriminations. It is worthwhile to note how the CTRL and AD classes are well separated; on the contrary, CTRL and MCI cannot be separated as well as the previous ones. Let us stress that this behavior is expected, MCI being a wide class including subjects who will never develop the AD pathology in their lives.
As previously discussed, several studies demonstrated the hemispheric asymmetry of hippocampal volumes; besides, recent studies pointed out the possibility of systematic errors arising from magnetic resonance volumetry. Accordingly, a comparison of the segmentations obtained by Random Forests and RUSBoost was performed: even if, in terms of segmentation accuracy, the two classifiers obtained similar performances, this is not true in terms of AUC. This would suggest that the RF classifier is not only able to perform accurate segmentations, but also that, unlike RUSBoost, its segmentations are able to capture the clinical information in more detail. For each classification method and for each acquisition, the AUC is given with the standard error (SE) calculated according to [149]. These results show how the hippocampal volume can be used as a classification index for the AD diagnosis. However, the result cannot be exhaustive if the age and sex information, which does affect the hippocampal volumes, is ignored.
Accordingly, the analyses were repeated including sex and age in a multilinear regression model. The model was built by considering the overall age distribution; the reference time t0 (measured in years) was set to be the minimum age of the distribution. For an age t, a scaling factor k of 30 mm³ per year was established, so that if the hippocampal volume at time t is V(t), then the effective volume Veff to be considered is:

Veff = V(t) + (t − t0) · k

where k was set according to the values established in several studies [150, 151]. The results obtained with Random Forests thanks to this model are summarized in Table 10. Besides, to take sex differences into account, the segmentations were divided by sex; it is well known, in fact, that hippocampal volumes differ according to the sex of the examined subjects.
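A minimal sketch of this linear de-trending (illustrative Python; the default k = 30 mm³ per year follows the value quoted above):

    def age_corrected_volume(volume_mm3, age_years, t0_years, k=30.0):
        # Linear de-trending for physiological atrophy,
        # V_eff = V(t) + (t - t0) * k, with k in mm^3 per year.
        return volume_mm3 + (age_years - t0_years) * k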
Males

Right Hippocampus
Acquisition    AUC CTRL/AD    AUC CTRL/MCI
Screening      0.83 ± 0.02    0.75 ± 0.03
Repeat         0.84 ± 0.03    0.74 ± 0.03
Month 12       0.85 ± 0.02    0.76 ± 0.03
Month 24       0.87 ± 0.02    0.76 ± 0.03

Left Hippocampus
Acquisition    AUC CTRL/AD    AUC CTRL/MCI
Screening      0.86 ± 0.02    0.75 ± 0.03
Repeat         0.87 ± 0.02    0.77 ± 0.02
Month 12       0.91 ± 0.02    0.79 ± 0.02
Month 24       0.90 ± 0.02    0.79 ± 0.02

Females

Right Hippocampus
Acquisition    AUC CTRL/AD    AUC CTRL/MCI
Screening      0.87 ± 0.02    0.79 ± 0.02
Repeat         0.83 ± 0.02    0.74 ± 0.03
Month 12       0.90 ± 0.02    0.78 ± 0.02
Month 24       0.89 ± 0.02    0.80 ± 0.02

Left Hippocampus
Acquisition    AUC CTRL/AD    AUC CTRL/MCI
Screening      0.88 ± 0.02    0.76 ± 0.02
Repeat         0.86 ± 0.02    0.75 ± 0.02
Month 12       0.90 ± 0.02    0.80 ± 0.02
Month 24       0.90 ± 0.02    0.80 ± 0.02
Table 10: In this table the volume comparison for both right and left Hippocampi is, as before, considered, separating male and female subjects. Moreover, the effect of aging is taken into account by means of a linear de-trending.
Comparing Table 9 and Table 10, a significant improvement in the classifications is obtained, especially for the CTRL/AD classes. In this respect an important consideration has to be stressed: the AUC is not able, by its nature, to capture the clinical differences arising from the age and sex information, since it is only based on the segmented volume. As a consequence, it is important to learn another model to describe these results; for the sake of simplicity a regression approach was chosen. This is obtained by considering, for every acquisition time, the least squares linear model describing the CTRL hippocampal volumes. Moreover, an accurate model can be defined only if an estimate of the uncertainties of its points is known, otherwise a robust fit could not be provided. Accordingly, a detailed analysis of the stability of the results was performed.
5.2.1 Stability of the results
Another important aspect of the analysis consists in the study of the stability of the results. The stability of the segmentation workflow can be measured both by analyzing the relationships between the right and left labellings and by evaluating the presence of shape incongruences. The former relationships were measured by adopting the correlation between the labellings as a parameter of the stability of the segmentation itself; the latter were measured in the SPHARM framework, by investigating the presence of statistically significant differences among the different acquisitions. Firstly, the correlation study is examined: Fig. 42 and Fig. 43 show how the volumes of the different acquisitions compare, both for right and left Hippocampi.
Correlation Matrix, Right Hippocampus: the pairwise volume correlations among the four acquisitions range from 0.76 to 0.92.
Fig. 42: The figure represents the correlations among the volumes obtained for the 4 acquisitions for the right Hippocampi.
Correlation Matrix, Left Hippocampus: the pairwise volume correlations among the four acquisitions range from 0.83 to 0.91.
Fig. 43: The figure represents the correlations among the volumes obtained for the 4 acquisitions for the left Hippocampi. In general, the linear correlation is confirmed as expected; left Hippocampi show a slightly better behavior, however comparable with the right segmentations.
The left Hippocampi show in general better correlations, even if no statistically significant differences can be found with respect to the right correlations. However, it is already known that no left-right invariance can be assumed when dealing with anatomical structures, and therefore it is quite expected to find small differences.
Fig. 44: The colormap (p-values higher than 0.05 correspond to hot colors) represents the significant differences in the left hippocampal regions (a). They are especially significant in correspondence of the hippocampal head and its digitations. It is worthwhile to note how the statistical differences are more evident in the left segmentations than in the right ones (b).
Nevertheless, a time dependent decreasing correlation is found, as expected.
A further analysis was performed on the statistical shape model derived from the automated segmentations. The statistical model was built within the SPHARM framework; 80 meshes were identified to calculate the local variations in shape. This choice was driven by the concurrent needs of having local information about the segmentations and, at the same time, of dealing with hippocampal regions which could be significant from the anatomical point of view.
According to the methodology described in Chap. 4, the segmentations were first topologically fixed, registered with a FOE procedure and parametrized; finally the labels were compared and a local p-value map, with the usual significance value p = 0.05, was built. In Fig. 44 the results are shown for the CTRL/AD comparison in the right and left hemispheres. The results show a significantly greater presence of variability between the CTRL and the AD classes. It would seem, therefore, that left Hippocampi should be preferred to investigate sub-regional discriminative behaviors. Further developments could analyze whether the inter-class separation widens when taking into account localized hippocampal sub-regions.
From the stability analysis performed, it emerges that the segmentation is reproducible; otherwise, no linear correlations would hold between the different acquisition times. A more detailed study of the correlation between screening and repeat scans deserves emphasis, because it is by looking at these acquisitions that possible variabilities of the method itself can be inferred. It is in fact clear that, screening and repeat scans being acquired on a time scale where no biological variation can occur, the only source of variability in this case is the segmentation itself. With the goal of estimating the method uncertainty, a further, more detailed comparison of screening and repeat scans was then performed.
5.2.2 The method uncertainty
In order to build a robust regression model for hippocampal volumes, the residual distribution was first investigated. For each subject, the hippocampal volumes obtained from the screening and the repeat scans were compared; the differences were calculated, and the mean and standard deviation of the sample determined. In Fig. 45 the distribution and the estimated density are represented.
Fig. 45: The figure shows how the 456 differences between screening and repeat segmentation volumes are mainly concentrated around zero. In particular, the standard deviation of this distribution can be used to determine a conservative value for the method uncertainty.
The difference distribution is clearly symmetric and has a mean value which cannot be significantly distinguished from zero. In fact, the mean value µ = −1.4 mm³ and the standard deviation σ = 84.3 mm³ of the differences between
screening and repeat volumes show how stable the method is. Besides, the fact that the differences are on average compatible with zero is a nice proof that the method has no significant bias; a more detailed analysis of this aspect will be discussed in the next section.
Moreover, only the left hippocampal volume is considered here; previous results showed in fact that no significantly different information arises from the examination of the right hemisphere. The linear model of the CTRL volumes is used to determine the confidence intervals. Given the fitted slope β̂ and intercept α̂, it is possible to compute the expected volumes V̂ and therefore the errors on the model parameters β̂ and α̂.
The error ε of the slope for N = 456 volumes is estimated by:

ε = √( Σᵢ (Vᵢ − V̂ᵢ)² / (N − 2) ) / √( Σᵢ (xᵢ − x̄)² )    (104)

where the sums run over i = 1, . . . , N; it is then possible to obtain the 95% confidence interval for the slope from equation (104):

β95% = [ β̂ − tN−2 ε, β̂ + tN−2 ε ]    (105)
where tN−2 denotes the usual Student's t distribution value for N − 2 degrees of freedom. Accordingly, another fundamental result can be given in terms of confidence intervals: in this way it is also possible to furnish an estimate of the significance of the prediction. For example, if the significance value for the risk of assigning the AD pathology to a CTRL subject is desired to be small, let us say below 5%, then the β95% confidence interval previously calculated should be used.
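A minimal sketch of the slope estimate and of its confidence interval, following equations (104)-(105) (illustrative Python/NumPy/SciPy, not the original code):

    import numpy as np
    from scipy import stats

    def slope_confidence_interval(ages, volumes, level=0.95):
        # Least-squares fit of volume versus age for the CTRL population,
        # with the confidence interval of the slope (eqs. 104-105).
        x = np.asarray(ages, dtype=float)
        v = np.asarray(volumes, dtype=float)
        n = len(x)
        beta, alpha = np.polyfit(x, v, 1)  # slope and intercept
        residuals = v - (alpha + beta * x)
        eps = (np.sqrt((residuals ** 2).sum() / (n - 2))
               / np.sqrt(((x - x.mean()) ** 2).sum()))
        t = stats.t.ppf(0.5 + level / 2.0, n - 2)
        return beta, (beta - t * eps, beta + t * eps)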
The result shown in Fig. 46 allows the appreciation of the physiological atrophy rate, in terms of the negative slope of the CTRL population. As measured in several studies, the physiological atrophy rate is about 30 mm³ per year, which is comparable with the estimated slope m = −33.4 ± 3.2 mm³ per year.
As a comparison, the same procedure was applied to the month 24 acquisitions. The results are shown in Fig. 47.
As already pointed out by the ROC analyses, the classification performances improve when comparing the screening acquisitions with the follow-ups. The main reason is that neurodegenerative pathological conditions yield more severe atrophic conditions over time. Besides, anatomical differences due to
Fig. 46: The CTRL population is used to estimate a linear model with its 95% confidence interval, represented in red. The AD population is represented in blue through the segmented volumes and the related uncertainties. The figure shows how nicely the AD subjects are separated from the CTRL ones.
Fig. 47: As in the previous figure, a linear model for the CTRL population and its 95% confidence interval is shown in red. In this case the represented volumes are obtained from the month 24 follow-up. It is evident how the pathological atrophy rate makes the hippocampal volumes of AD subjects easier to distinguish after 2 years.
sex and age, even physiological ones, result in misleading volume values. Fig. 47 shows how, for the left hemisphere, AD had a significant impact on the hippocampal volumes within 24 months.
From a more quantitative point of view, it is not only the AUC indicators, as seen previously, that manifest an increasing discriminative power. Another measure of the classification performance improvement can be given in absolute terms, considering the previous Fig. 46. According to this figure, for the screening scans the correctly classified subjects represent 91% of their populations. Performances improve when considering the month 24 scans: in this case the AD subjects are correctly classified in 94% of the sample.
Considering a model linear in the subject's age improves the classification performances because it takes the physiological atrophy rate into account; in this case the number of correct classifications rises. A further improvement can be achieved by considering the male and the female populations separately, as shown in Fig. 48.
Fig. 48: The AD male hippocampal volumes (a) and the female ones (b) are shown in blue and compared respectively with the male and female CTRL populations. As expected from other studies, the female atrophy rate is clearly different from (and less severe than) the male one. As a consequence, separating the two populations helps the discrimination between CTRL and AD.
In fact, for the screening acquisition, the male subjects correctly labeled as AD are 92%, while the female subjects are 93%. The atrophy rate changes when considering male or female subjects; this result is confirmed in the literature, female subjects having been proved to have a less severe atrophy rate, resulting in a smaller slope of the regression model. Consistent results were obtained with RUSBoost too. These results compose a convincing picture: the hippocampal volume is confirmed as a supportive feature for the AD diagnosis, and the automated segmentation is able to capture this clinical information.
5.3 distributed infrastructure exploitation
The proposed segmentation workflow requires an average processing time of
about 160 minutes on our local workstation (AMD Phenom 9950 Quad-Core
Processor, 2 GB RAM) to segment both right and left Hippocampi. Thus, the
whole cross-validation dataset, consisting of 56 MRI scans, would require a
CPU time of about 7 days to be segmented. The longitudinal analysis of the
second dataset, consisting of 1824 MRI brain scans for 4 different acquisitions,
would require a CPU time of about 2 years and 2 months.
The implemented smart job queuing system allows the use of the local BC2S computer farm and of the geographically distributed grid with an efficient use of time. In particular, the segmentation workflow modules requiring few resources (for example the correlation analyses) were submitted to the local farm, while the modules requiring more resources were submitted to the grid. We used the LONI pipeline Command Line Interface (CLI) with the −batch and −parallel options enabled to execute parallel workflows. The LONI client/server application ran on a node of the local farm with 2 GB RAM. The resources required by the Java CLI process scale with the number of workflows to run in parallel; as a consequence, the larger the number of parallel workflows, the more memory is consumed.
In Fig. 49 the results collected by executing the segmentation of the whole ADNI dataset are shown. It can be noticed that no segmentation exceeded 500 minutes, which was therefore the time required to segment the whole dataset.
Fig. 49: The figure shows the overall run-time distribution using the combination of
the local farm and the grid infrastructure.
The performance benefit can be better appreciated in Fig. 50 where the
number of segmentations to execute is plotted against time.
Fig. 50: In this figure the time sequence of the executed segmentations is represented. The results show that 95% of the segmentations were executed in less than 7 hours.
The grid "inertia" due to job submission and match making operations performed by the WMS translates into an initial latency that can be further dampened by increasing the number of parallel workflow executions. After this
initial step the advantages of the grid execution are evident since we obtained
the 90% of the segmentations after less than 7 hours. The job submission tool
(JST) is able to exploit the paradigm of the pilot jobs and the capabilities of late
binding that assure a good performance in scheduling all the jobs submitted
to the grid infrastructure. Moreover, the multiple jobs pulled by the pilot on
the worker node can make use of the same run-time environment: this helps
performance by eliminating the need for duplicated installation of the run-time
software and by exploiting cached data.
The JST framework also allows the collection of detailed monitoring information about each submitted job. This is very useful for further analyses and statistical studies. For example, the effective time spent by each job on the selected worker node to execute its work (input data transfer and processing, output generation and transfer to the target storage element) can be measured. Besides, it is possible to generate and analyze the run-time distributions for each module of the workflow. The most interesting result, obtained for the Random Forest training module, is shown in Fig. 51. Unlike the other distributions (characterized by a single peak and a few outliers), this distribution clearly shows a bimodal behavior, with the two peaks spaced by about 50 minutes.
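A bimodal run-time distribution of this kind can be quantified, for example, by fitting a two-component Gaussian mixture to the measured times; a minimal sketch on synthetic data (not the actual monitoring records):

    import numpy as np
    from sklearn.mixture import GaussianMixture

    # Synthetic run-times mimicking a fast and a slow population of jobs.
    rng = np.random.default_rng(0)
    runtimes = np.concatenate([rng.normal(20, 5, 200),   # fast peak
                               rng.normal(70, 5, 60)])   # slow peak

    gmm = GaussianMixture(n_components=2, random_state=0)
    gmm.fit(runtimes.reshape(-1, 1))
    lo, hi = sorted(gmm.means_.ravel())
    print(f"peaks near {lo:.0f} and {hi:.0f}, spacing {hi - lo:.0f}")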
[Histogram: # jobs vs. run-time (sec)]
Fig. 51: RF training module execution times. A two-peak distribution is evident.
The RF training module is indeed the most data-intensive, since it also needs to download and read the feature files of the ten most correlated images: the performance penalty is therefore greatest when a grid copy fails, since the job waits for the copy timeout (200 seconds for each input file) before switching to other protocols (e.g. HTTP).
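The copy-with-fallback behavior can be sketched as follows; the command names and the HTTP fallback tool are illustrative placeholders, only the per-file timeout comes from the text:

    import subprocess

    def fetch(grid_url, http_url, dest, timeout_s=200):
        # Try the grid protocol first, with the per-file copy timeout.
        try:
            subprocess.run(["grid-copy", grid_url, dest],  # placeholder command
                           timeout=timeout_s, check=True)
        except (subprocess.TimeoutExpired, subprocess.CalledProcessError,
                FileNotFoundError):
            # The job has already paid the full timeout before falling back.
            subprocess.run(["curl", "-o", dest, http_url], check=True)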
Moreover, an important measure of the efficiency of the proposed method is given by the distribution of workflow execution times shown in Fig. 52.
[Histogram: # of segmentations vs. run-time (min)]
Fig. 52: Distribution of the workflow execution times derived from job monitoring
information.
It is worthwhile to note that the grid submission overhead does not severely degrade performance. Comparing the workflow total execution times of Fig. 52 with the overall run-time distribution of Fig. 49, the distribution peak of the latter (i.e. the average run-time) and its width are greater by factors of 2 and 3.5 respectively: this is a remarkable result, taking into account the high number of jobs and the failure rate, reduced to zero thanks to the automatic job re-submission implemented by the JST. This comparison shows that grid submission times do not severely impact computational times, and the grid is therefore an exploitable resource for distributed segmentations.
6 CONCLUSIONS
This thesis explored the hippocampal segmentation task from different perspectives: the segmentation accuracy of the proposed analysis framework was discussed; the clinical information contained within the hippocampal volume, seen as a supportive feature for the AD diagnosis, was investigated; and the computational burden required to analyze the data, and eventually to allow large clinical trials, was assessed.
6.1 motivations and summary
In the last few years, the development of signal processing techniques has enabled the visualization and measurement of pathological brain changes in vivo, producing a radical change not only in the field of scientific research, but also in everyday clinical practice. As discussed in Chapter 2, neuroimaging techniques have become of paramount importance in the early diagnosis of a number of pathologies. The current need for new biological signatures in the clinical field matches the "big data" era, whose maturity is confirmed by several aspects. The goal of achieving sound computer-aided diagnosis is particularly urgent for those scientific fields where data analysis is seen as a time-consuming burden more than as a resource. This is, in fact, true for neuroimaging, and Alzheimer's disease is an example: one of the signatures of this pathology is the atrophy of the medial temporal lobe, and a relevant role is played by the measurement of the volume of the Hippocampus, a brain structure extensively described in Chapter 2.
Brain MR scans are a huge source of information. They provide detailed insight where other imaging techniques simply cannot, as discussed in Chapter 3. The advent of new technologies, especially high and ultra-high field scanners, has improved the signal-to-noise ratio; however, the examination of MR scans remains a time-consuming activity. Moreover, inter-rater variability sometimes makes visual inspection of this kind of data an art more than a quantitative and objective task. Inter-rater variability is, for example, the main cause of the lack of a confirmed consensus over hippocampal segmentation. Besides inter-rater variability, the protocols defined to segment the Hippocampus add another variability source, so that there is no certainty about the "true" hippocampal shape, nor about the best method to achieve it manually.
In this scenario the need for automated segmentation methods arose. They can in fact tackle both of the main issues of manual labeling. The main source of variability, i.e. the inter-rater one, is overcome by adopting a deterministic segmentation framework, while the variability arising from differences in training and test procedures can always be dealt with by statistical analysis. Moreover, the presence of a unique segmentation protocol would eliminate the systematic errors that the use of different protocols necessarily involves. It has been widely proven that different segmentation protocols yield different segmentations; the main drawback, however, is that the lack of a gold standard prevents this bias from being measured. An estimate of the relative discrepancies among the different methods can be useful to understand where, or whether, methods differ; this is why several worldwide initiatives are facing the challenge of developing a fully automated segmentation protocol.
Accordingly, a fully automated segmentation workflow is presented in Chapter 4. It is trained and tested in a cross-validation framework on MRI scans whose Hippocampi were manually labeled. Moreover, it is well known that hippocampal volume is a supportive feature for AD diagnosis. To validate the workflow, another set of MRI scans from the ADNI database was used; unlike the training set, this database lacks manual labellings, but clinical labels and information are provided, so that, for example, the age, sex and clinical status of the examined subjects are known. Finally, a feasibility study is performed. A non-secondary aspect of segmentation workflows is, in fact, the computational burden they require. This is important not only for allowing large clinical trials, but also from a smart health-care perspective, which could greatly benefit from automated tools able to perform real-time or quasi-real-time analyses.
The results of this work are presented in Chapter 5. First of all, segmentation performances are evaluated from different perspectives; the main result in this case is given in terms of the similarity index, a popular error metric in medical imaging. The acquired performances match state-of-the-art algorithms. Besides, an evaluation of the clinical information is performed, and even in this case the results are promising: the metrics adopted to compare the hippocampal volumes obtained through the presented segmentation workflow with other state-of-the-art techniques show good agreement. Finally, computational performances were presented; it was shown how even a thousand MR scans can be segmented in a fully automated framework in reasonable times.
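For completeness, the similarity (Dice) index used throughout this evaluation is 2|A∩B|/(|A|+|B|) for two binary masks A and B; a minimal sketch:

    import numpy as np

    def dice(a, b):
        # Dice similarity index between two binary masks.
        a, b = a.astype(bool), b.astype(bool)
        return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

    # toy example with 1-D masks
    auto   = np.array([0, 1, 1, 1, 0])
    manual = np.array([0, 0, 1, 1, 1])
    print(dice(auto, manual))  # 0.666...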
6.2 segmentation
This work proposes an automated tool for the segmentation of the Hippocampus, a structure of great importance in numerous brain diseases. It is an innovative approach based on the use of discriminating features and on their classification by a RF classifier within a VOI delimited with the novel FAPoD method. This method, based on shape evaluation only, was able to reflect the database heterogeneity within a 2σl variation. Further improvement could be achieved by combining shape-based and intensity-based information, perhaps using morphing methods before FAPoD.
A number of studies using atlas-based approaches reported similarity index coefficients in the range 0.75-0.88 [153, 154, 155, 156, 157, 158]. Values as high as 0.86 for healthy subjects [159] and 0.88 for mixed cohorts [160] have been achieved with graph-cuts, while a similarity coefficient of 0.85 is reported using embedded learning [161]. Automated region-growing methods with automatic detection and correction of atlas mismatch obtained similarity indexes of 0.87 ± 0.03 for healthy subjects and 0.84 ± 0.05 for a mixed cohort (healthy controls and patients with temporal lobe epilepsy) [162]. Other studies [163] using FMASH obtained average Dice coefficients of 0.82 ± 0.01 and 0.80 ± 0.01 respectively on two mixed-cohort databases. By far, the most promising results in the literature have been acquired through patch-based multi-atlas segmentations [89, 88, 83].
To the best of my knowledge, this is the first application to hippocampal segmentation of a RF classifier combined with expert priors on shape. The number of features required for a robust classification was much reduced compared with other work [164], which did however show comparable performance (similarity index = 0.85). The average similarity index of 0.84 ± 0.04 obtained in this study is in keeping with existing results for mixed cohorts, i.e. cohorts composed of healthy controls and diseased subjects.
It is worthwhile to note that the study was conducted on 1.0 T images, which suffer from a lower signal-to-noise ratio compared with high-field datasets. Nevertheless, the acquired performances compare well with those published for high-field data and homogeneous cohorts. In addition, it is important to keep in mind that segmentation results are also influenced by the segmentation protocol used for the manual labeling [90]; as a consequence, segmentation performances should always be compared with particular attention. A more significant result from a scientific point of view concerns the confirmation of the hippocampal volume as a supportive feature of AD. This aspect is particularly emphasized in the next section.
Differences in image quality, manual segmentation protocol, clinical status and demographics have been described as possible causes of discrepancy in results [160]. It should also be noted that the currently available protocols for manual segmentation include information that is not entered in the training of automated algorithms, generating an additional source of variability. The inclusion or exclusion of white matter, the use of arbitrary lines, and the exclusion of parts of the hippocampal tail or of vestigial hippocampal matter [165, 166, 167] lead to non-systematic differences in hippocampal segmentations across subjects, since these portions have different sizes depending on individual morphology. A critical advantage of the use of machine learning algorithms is the possibility of using very large training datasets, shared by the scientific community. This is exemplified by the efforts of the EADC-ADNI working group to develop a standard harmonized protocol for manual segmentation ([168, 169], www.hippocampal-protocol.net).
Moreover, the proposed method has shown good performances in terms of stability and segmentation uncertainty. It has been proven that the method agrees with manual labellings by expert neuroradiologists and that no statistically significant bias can be found within a 95% confidence interval, a necessary feature for a robust and accurate clinical classification.
Finally, an analysis of the stability of the segmentation workflow was performed. According to correlation measures over the segmented volumes, a fundamentally linear relationship was shown to exist between the volumes acquired at different times, and this was confirmed for both the left and the right Hippocampi. This is the best assurance that the method is stable, because it implies a linear correspondence among the volumes segmented for the same subject at different times; the segmentation is therefore indeed unveiling substantial information. Correlations are higher for the left Hippocampi (r ≈ 0.90, to be compared with r ≈ 0.80 for the right ones); however, this can be explained with the same arguments concerning the higher variability of the right hemisphere.
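The stability check described here amounts to a Pearson correlation between the volumes of the same subjects at different acquisitions; a minimal sketch on synthetic volumes (not the ADNI data):

    import numpy as np
    from scipy.stats import pearsonr

    rng = np.random.default_rng(1)
    baseline = rng.normal(3000, 400, 100)                  # mm^3, synthetic
    followup = 0.97 * baseline + rng.normal(0, 120, 100)   # near-linear relation

    r, p = pearsonr(baseline, followup)
    print(f"r = {r:.2f} (p = {p:.1e})")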
6.3 clinical classification
The segmentation workflow has shown promising performances on the training set, accurately reproducing manual labellings by expert neuroradiologists. Nevertheless, another important aspect of the analysis is the measure of the informative power of the segmentations and of their clinical predictive power. The segmentation performances deal with the capability of the analysis to learn from manually labeled examples and to reproduce the rater expertise. Once segmentations are acquired, a clinical evaluation of the results has to be made. This evaluation is clinical in the sense that its goal is to retrieve, through a statistical data analysis, the clinical label of the subject, and then to discriminate controls from AD patients.
This evaluation was performed in terms of ROC curves and AUC measures. The results have demonstrated that the method is able to capture the clinical information; in fact, it is possible to discriminate CTRL from AD with AUC measures > 0.82 for right Hippocampi and > 0.84 for left Hippocampi. Another important aspect is the effect of physiological and non-physiological atrophy. Accordingly, it was verified how performances depend on the scan acquisition time: the previously cited results, when dealing with scans acquired 24 months later, give overall AUC measures > 0.85, even if this result obviously does not have the same relevance as the discrimination obtained with respect to the baseline. An interesting aspect of this analysis was the acknowledgment of the asymmetric behavior of right and left Hippocampi. It was confirmed, for example, that due to the higher variability of right Hippocampi, performances regarding the right hemisphere were slightly worse than those concerning the left one.
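This kind of evaluation reduces to scoring subjects by (the negative of) their hippocampal volume and computing the area under the ROC curve; a minimal sketch on synthetic volumes, with AD volumes drawn smaller to mimic atrophy:

    import numpy as np
    from sklearn.metrics import roc_auc_score

    rng = np.random.default_rng(2)
    vol_ctrl = rng.normal(3200, 350, 120)   # mm^3, synthetic
    vol_ad   = rng.normal(2700, 350, 120)

    y      = np.r_[np.zeros(120), np.ones(120)]   # 1 = AD
    scores = -np.r_[vol_ctrl, vol_ad]             # smaller volume -> higher score
    print(f"AUC = {roc_auc_score(y, scores):.2f}")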
Besides, another shape analysis was performed in the SPHARM framework to analyze inter-class differences. This analysis showed that, in agreement with the previous results, the left Hippocampus was more useful for detecting differences between CTRL and AD; this can be expected, because the higher physiological variability of the right hemisphere could prevent the observation of anomalous behavior, except in extraordinary situations.
Stability allows an estimate of the method uncertainty to be furnished by comparing screening and repeat segmented volumes, whose difference should always be considered a consequence of method variability alone. The residual distribution obtained by comparing screening and repeat scans is clearly symmetric and has a mean value which cannot be significantly distinguished from zero. In fact, the mean value µ = −1.4 mm³ and the standard deviation σ = 84.3 mm³ show how stable the method is. Besides, the fact that the differences are on average compatible with zero is a nice proof that the method has no significant bias.
Thanks to the uncertainty estimation, a robust regression model depending on hippocampal volumes, age and sex was built. The analysis confirmed that male and female subjects have different behaviors, deriving from significant anatomical differences. Accordingly, it was found that the discrimination power increases when the male and the female populations are separated. These performances can be further improved if age information is kept: in fact, state-of-the-art AUC values of 0.92 for male and 0.93 for female subjects are obtained. To achieve these results, the adoption of a multilinear model taking into account volumes, age and sex was fundamental.
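As an illustration of such a multilinear model, the sketch below fits a logistic classifier on volume, age and sex; all data are synthetic, and the thesis analysis additionally separates the two sexes and de-trends for age:

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score

    rng = np.random.default_rng(4)
    n = 300
    age = rng.uniform(60, 90, n)
    sex = rng.integers(0, 2, n)        # 0 = female, 1 = male
    y = rng.integers(0, 2, n)          # 1 = AD (synthetic labels)
    vol = 3400 - 12 * (age - 60) - 350 * y + 100 * sex + rng.normal(0, 150, n)

    X = np.c_[vol, age, sex]
    X = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize features
    model = LogisticRegression().fit(X, y)
    probs = model.predict_proba(X)[:, 1]
    print(f"AUC = {roc_auc_score(y, probs):.2f}")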
These results compose a convincing picture: hippocampal volume is confirmed as a supportive feature for the AD diagnosis, and the automated segmentation is able to capture this clinical information.
6.4 computation
In the case of medical images, the creation of large databases is very time-consuming and methodologically challenging [170, 171, 172, 164]. Active learning in the classification step is therefore useful for selecting reduced training datasets without a significant loss in performance. The approach could play a substantial role in the use of distributed computing infrastructures by reducing the training set size and, therefore, overcoming upload/download problems and reducing the training phase time.
In this work, a study of grid deployment was performed with the aim of creating automated segmentation algorithms for large screening trials. Several tests were carried out both on the local computer farm BC2S and on the EGI. BC2S is a distributed computing infrastructure consisting of about 5000 CPUs and providing up to 1.8 PB of storage, while the EGI consists of about 300 geographically distributed sites around the world. In particular, all the results presented in this study were obtained on the BC2S using the 56 MRIs at our disposal and 1824 ADNI images. In addition, a feasibility study on all these images was successfully performed on the EGI. The proposed method tackles two different problems that distributed infrastructures have to deal with: it does not strongly suffer from overhead performance deterioration and, at the same time, its job failure rate is reduced to zero.
In particular, the presented method allows the use of the LP with Torque and, at the same time, the use of grids. With a workflow manager the end user can run already available workflows, modify them before execution, or build completely new analysis pipelines. Future work should aim at developing a simple web interface allowing users to exploit an already available workflow by changing only the configuration parameters and the input files. In this way it would be possible for users to execute analyses without specific expertise in grid management. This web interface would also provide the needed support for a strong authentication mechanism.
Failure management is a challenging problem in production grids, with the consequence that jobs need continuous monitoring. Moreover, when data transfers are huge, overall times grow exponentially with job failures. Therefore, data transfer still remains the most limiting factor for grid effectiveness. The adoption of active learning is, in this sense, crucial. This is of course an ad hoc strategy which is difficult to generalize; a further improvement of this method should concern the calculation of the smallest amount of data to transfer.
LIST OF FIGURES
Figure 1: The figure shows a T1 high resolution brain template from the International Consortium for Brain Mapping. The primary goal of the ICBM project is the development of a probabilistic reference system for the human brain.
Figure 2: The figure shows the intraventricular aspect of the Hippocampus. 1, hippocampal body; 2, hippocampal head and its digitations; 3, hippocampal tail; 4, fimbria; 5, crus of the fornix; 6, subiculum; 7, splenium of the corpus callosum; 8, calcar avis; 9, collateral trigone; 10, collateral eminence; 11, uncal recess of the temporal horn. An image scale of 1 cm is represented in the right lower corner.
Figure 3: The figure shows a T1 sagittal view of the ICBM 152 template; the Hippocampus and the adjacent amygdala boundaries were manually pointed out.
Figure 4: A symmetrical spinning top with angular velocity Ω and mass m precessing in a constant gravitational field and, in correspondence, the precession of the angular momentum J. The angular momentum increment dJ involves a counterclockwise precession.
Figure 5: Clockwise precession of a spin around the magnetic field B direction.
Figure 6: The figure shows the contrast as a function of the relaxation time for gray matter and white matter.
Figure 7: This figure shows the contrast as a function of the relaxation time for gray matter and cerebrospinal fluid; it is important how contrast changes according to the examined tissues.
Figure 8: The figure shows the contrast as a function of the echo time for gray matter and white matter.
Figure 9: This figure shows the contrast as a function of the echo time for gray matter and cerebrospinal fluid.
Figure 10: The figure shows a comparison among (in clockwise order): a proton density, a T1-weighted and a T2-weighted brain scan from ICBM.
Figure 11: Flow chart of the segmentation method, according to the following steps: 1) volume of interest extraction, 2) determination of voxel features and 3) voxel classification. The learning phase is represented in detail in the classification box, while input and output data are shown in red.
Figure 12: The figure shows a mesh representation compared with its projection onto a unit sphere; the colors are just a figurative representation of different sub-hippocampal regions.
Figure 13: Comparison of hippocampal shapes reconstructed with a different number of coefficients c_l^m: 1 coefficient, 5 coefficients, 10 coefficients and 15 coefficients. As expected, the greater the number of coefficients, the more details the model is able to capture.
Figure 14: Comparison of hippocampal shapes with the original non-fixed (left) and fixed (right) topology. In the left figure, some isolated voxels are clearly visible.
Figure 15: The figure represents two different hippocampal labellings aligned along the principal dimension. It has to be noted that this procedure yields a rigid registration which does not modify the mask dimensions.
Figure 16: A watch from the standard MPEG database.
Figure 17: An example of the noisy process over the template watch.
Figure 18: The figure shows a visual comparison between an image representing a disk and its counterpart obtained by randomization of scaling, translations and rotations.
Figure 19: A mathematical pseudo-landmark is defined as the contour voxel whose distance from the chord subtended by two consecutive landmarks is maximum.
Figure 20: The figure shows an example of how the probabilistic values are associated to a landmark and its neighbors.
Figure 21: The Random Forest algorithm. A training set with N examples of d features is sampled k times with bootstrap. Each bootstrapped training set Ti is internally divided into a training and a test set with a 2:1 ratio. Then a random sample of f features is performed, and those features are used to split the test set in two leaves. The procedure is iterated until the leaf reaches a desired size, which in classification is usually set to one.
Figure 22: The simplified workflow implementation. The input/output modules are shown in reddish diamonds, while the backend analysis modules are represented by turquoise diamonds. To better emphasize the possibility of dynamically choosing the local farm or the grid infrastructures, these backend modules are shown in yellow.
Figure 23: A simplified representation of the hippocampus segmentation algorithm. In this case particular emphasis is given to computational issues: parameters and critical aspects are shown in round boxes, processing steps in rectangles and files in diamonds. The distributed computing is enclosed by a dotted line while the end user interface is shown in reddish diamonds.
Figure 24: The segmentation algorithm in its LP implementation. It consists of four main modules: the Adaptive Training module measures the correlation between the image to be segmented and the training set, the Feature Extraction module calculates the features on the testing image according to the volume of interest determined by the previous module, the Random Forest Training module performs the learning phase for the chosen classifier, and finally the segmentation is returned by the Random Forest Prediction module. Each module is compiled and is therefore able to run on a distributed computing structure.
Figure 25: Each module hides implementation details, in particular the checkinput, insertJob and getStatus modules, which allow the workflow to be controlled in a transparent way without any concern for submission or monitoring issues.
Figure 26: Web service call sequence implemented in the LP modules.
Figure 27: The figure shows the comparison between the gray level distributions of two randomly chosen right boxes, as obtained after registration preprocessing.
Figure 28: The cumulative image represented in the figure allows the evaluation of how noise introduced by registration or statistical fluctuations among the different images affects especially the left tail.
Figure 29: The figure shows the bounding peri-hippocampal region obtained through FAPoD analysis (green) and the labeled mask (white).
Figure 30: The volume reconstructed by FAPoD (in mm³), varying the number of images used to retrieve the volume, for (a) left and (b) right Hippocampus.
Figure 31: The figure shows the correlation coefficients computed for an image Ii of the data set and the remaining images. As can be seen, there are several images moderately correlated with Ii.
Figure 32: The figure shows the average correlation coefficients computed for both right (case a) and left (case b) Hippocampi.
Figure 33: The figure shows the segmentation performances for the left Hippocampus using the 10, 20, 30, 40 most correlated images and all the remaining 55 images (loo).
Figure 34: The left hippocampal average shape is shown both without (above left) and with (above right) scaled gray levels. From the lower figures it emerges that misclassification is uniformly distributed on the hippocampal contour.
Figure 35: The figure makes evident that no discrepancies can be found when comparing the spatial distributions of false positives (FP) and false negatives (FN).
Figure 36: The figure represents the average error and its standard deviation as a function of the segmentation threshold. An evident plateau for values near t = 0.5 is found.
Figure 37: The figure represents the automated volumes against those obtained through manual segmentation. The straight line of perfect agreement is also represented in blue.
Figure 38: The figure represents the volume differences against the mean volumes obtained by manual and automated segmentations. The straight lines in blue represent the mean value of the differences and the 95% confidence limits.
Figure 39: In this figure, unlike the previous one, the confidence intervals are represented too. It can now be appreciated that only one measure is significantly different and that an overall range of variability of about 1000 voxels is found, corresponding to a relative 15% variability.
Figure 40: The figure shows both right (a) and left (b) hippocampal volumes, with CTRL/MCI/AD class discrimination.
Figure 41: The ROC curves for right (a) and left (b) Hippocampi. The reported AUC is the measure of the discrimination between CTRL and AD subjects obtained with the screening scans.
Figure 42: The figures represent the correlation among the volumes obtained for the 4 acquisitions for the right Hippocampi.
Figure 43: The figures represent the correlation among the volumes obtained for the 4 acquisitions for the left Hippocampi. In general, linear correlation is confirmed as expected; left Hippocampi show a slightly better behavior, however comparable with the right segmentations.
Figure 44: The colormap figure (p-values higher than 0.05 correspond to hot colors) allows the representation of significant differences in the left hippocampal regions (a). They are especially significant in correspondence of the hippocampal head and its digitations. It is worthwhile to note how statistical differences are more evident in the left segmentations than in the right ones (b).
Figure 45: The figure shows how the 456 differences between screening and repeat segmentation volumes have mainly null values. In particular, the standard deviation of this distribution can be used to determine a conservative value for the method uncertainty.
Figure 46: The CTRL population is used to estimate a linear model with its 95% confidence interval, represented in red. The AD population is represented through the segmented volume and the related uncertainty in blue. The figure shows how nicely AD are separated from CTRL subjects.
Figure 47: As in the previous figure, a linear model for the CTRL population and its 95% confidence interval is shown in red. In this case the represented volumes are obtained from the 24 month follow up. It is evident how the pathological atrophy rate makes AD subject hippocampal volumes easier to distinguish after 2 years.
Figure 48: The AD male hippocampal volumes in (a) and female ones in (b) are shown in blue and compared respectively with the male and female CTRL populations. As expected from other studies, the female atrophy rate is clearly different from (and less severe than) the male one. As a consequence, separating the two populations helps the discrimination between CTRL and AD.
Figure 49: The figure shows the overall run-time distribution using the combination of the local farm and the grid infrastructure.
Figure 50: The time sequence of the executed segmentations. The results show that 95% of the segmentations were executed in less than 7 hours.
Figure 51: RF training module execution times. A two-peak distribution is evident.
Figure 52: Distribution of the workflow execution times derived from job monitoring information.
LIST OF TABLES

Table 1: A schematic view of the different types of dementia and of their principal symptom patterns.
Table 2: The revised lexicon of AD. Particular attention is given to recent advances in the use of reliable biomarkers of AD for the definition of early stages of the disease, such as prodromal AD, or of ambiguous situations, as for MCI.
Table 3: The relaxation times T1 and T2 for different types of human tissues.
Table 4: A schematic overview of the analyses presented in this work.
Table 5: A schematic summary of the registration volumes. The volume represents the total average hippocampal volume, while the core is the volume of the inner Hippocampus, namely the core accounting for 60% of the total volume.
Table 6: A schematic view of the different feature importance analyses performed.
Table 7: For each cross-validation iteration the Dice index distribution is calculated and then mean, median and standard deviation are averaged. The table shows these mean values and clearly shows how performances increase with the number of training images, however at the cost of an increased spread in the distribution.
Table 8: For each cross-validation iteration the Dice index distribution is calculated and then mean, median and standard deviation are averaged. The table shows these mean values and clearly shows how performances increase with the number of training images, however at the cost of an increased spread in the distribution.
Table 9: In this table the volume comparison for both right and left Hippocampi is shown, with the RF results in the left column and the RUSBoost ones in the right. It is worthwhile to note how the CTRL and AD classes are well separated; on the contrary, CTRL and MCI cannot be separated as well as the previous ones. Let us stress that this behavior is expected, MCI being a wide class including subjects who will never develop the AD pathology in their lives.
Table 10: In this table the volume comparison for both right and left Hippocampi, as before, is considered by separating male and female subjects. Moreover, the effect of aging is considered by means of a linear de-trending.
BIBLIOGRAPHY
[1] Alzheimer's Disease International. World Alzheimer Report 2013: Overcoming the stigma of dementia, 2013. URL www.alzheimer.it/Report2013.pdf.
[2] American Psychiatric Association. Diagnostic and Statistical Manual of
Mental Disorders. Washington, DC, American Psychiatric Association,
2000.
[3] G. McKhann, D. Drachman, F. Marshall, R. Katzman, D. Price, and
E. M. Stadlan. Clinical Diagnosis of Alzheimer’s disease: Report of
the NINCDS-ADRDA Work Group under the auspices of Department
of Health and Human Services Task Force on Alzheimer’s Disease. Neurology, 34(7):939 – 944, 1984.
[4] B. Dubois, H. H. Feldman, C. Jacova, S. T. DeKosky, P. Barberger-Gateau,
J. Cummings, A. Delacourte, D. Galasko, S. Gauthier, G. Jicha, K. Meguro, J. O’Brien, F. Pasquier, P. Robert, M. Rossor, S. Salloway, Y. Stern,
P. J. Visser, and P. Scheltens. Research criteria for the diagnosis of Alzheimer's disease: revising the NINCDS-ADRDA criteria. Lancet Neurology,
6(8):734 – 746, 2007.
[5] M. F. Folstein, S. E. Folstein, and P. R. McHugh. Mini-Mental State: a
practical method for grading the cognitive state of patients for the clinician. Journal of Psychiatric Research, 12(3):189 – 198, 1975.
[6] G. Blessed, B. E. Tomlinson, and M. Roth. The association between quantitative measures of dementia and of senile change in the cerebral grey
matter of elderly subjects. The British Journal of Psychiatry, 114(512):797 –
811, 1968.
[7] C. R. Jack Jr, M. S. Albert, D. S. Knopman, G. M. McKhann, R. A. Sperling, M. C. Carrillo, B. Thies, and C. H. Phelps. Introduction to the
recommendations from the National Institute on Aging-Alzheimer’s Association workgroups on diagnostic guidelines for Alzheimer’s disease.
Alzheimer’s & Dementia, 7(3):257 – 262, 2011.
[8] D. G. Davis, F. A. Schmitt, D. R. Wekstein, and W. R. Markesbery. Alzheimer neuropathologic alterations in aged cognitively normal subjects.
Journal of Neuropathology & Experimental Neurology, 58(4):376 – 388, 1999.
[9] J. L. Price and J. C. Morris. Tangles and plaques in nondemented aging
and "preclinical" Alzheimer’s disease. Annals of neurology, 45(3):358 – 368,
1999.
[10] S. Alladi, J. Xuereb, T. Bak, P. Nestor, J. Knibb, K. Patterson, and J. R.
Hodges. Focal cortical presentations of Alzheimer’s disease. Brain, 130
(10):2636 – 2645, 2007.
[11] G. D. Rabinovici, W. J. Jagust, A. J. Furst, J. M. Ogar, C. A. Racine, E. C.
Mormino, J. P. O’Neil, R. A. Lal, N. F. Dronkers, B. L. Miller, and M. L.
Gorno-Tempini. Aβ amyloid and glucose metabolism in three variants
of primary progressive aphasia. Annals of neurology, 64(4):388 – 401, 2008.
[12] A. Lim, D. Tsuang, W. Kukull, D. Nochlin, J. Leverenz, W. McCormick,
J. Bowen, L. Teri, J. Thompson, E. R. Peskind, M. Raskind, and E. B.
Larson. Clinico-neuropathological correlation of Alzheimer’s disease in
a community-based case series. Journal of the American Geriatrics Society,
47(5):564 – 569, 1999.
[13] H. Petrovitch, L. R. White, G. W. Ross, S. C. Steinhorn, C. Li, K. H.
Masaki, D. G. Davis, J. Nelson, J. Hardman, J. D. Curb, P. L. Blanchette,
L. J. Launer, K. Yano, and M. D. Markesbery. Accuracy of clinical criteria for AD in the Honolulu-Asia Aging Study, a population-based study.
Neurology, 57(2):226 – 234, 2001.
[14] A. R. Varma, J. S. Snowden, J. J. Lloyd, P. R. Talbot, D. M. A. Mann, and
D. Neary. Evaluation of the NINCDS-ADRDA criteria in the differentiation of Alzheimer’s disease and frontotemporal dementia. Journal of
Neurology, Neurosurgery & Psychiatry, 66(2):184 – 188, 1999.
[15] A. M. Kazee, T. A. Eskin, L. W. Lapham, K. R. Gabriel, K. D. McDaniel,
and R. W. Hamill. Clinicopathologic correlates in Alzheimer disease: assessment of clinical and pathologic diagnostic criteria. Alzheimer Disease
& Associated Disorders, 7(3):152 – 164, 1993.
[16] D. Neary, J. Snowden, and D. Mann. Frontotemporal dementia. The
Lancet Neurology, 4(11):771 – 780, 2005.
[17] J. R. Hodges, K. Patterson, S. Oxbury, and E. Funnell. Semantic dementia
progressive fluent aphasia with temporal lobe atrophy. Brain, 115(6):1783
– 1806, 1992.
[18] M. Mesulam. Slowly progressive aphasia without generalized dementia.
Annals of neurology, 11(6):592 – 598, 1982.
[19] J. J. Rebeiz, E. H. Kolodny, and E. P. Richardson. Corticodentatonigral
degeneration with neuronal achromasia. Archives of Neurology, 18(1):20 –
33, 1968.
[20] W. R. G. Gibb, P. J. Luthert, and C. D. Marsden. Corticobasal degeneration. Brain, 112(5):1171 – 1192, 1989.
[21] D. F. Benson, R. J. Davis, and B. D. Snyder. Posterior cortical atrophy.
Archives of Neurology, 45(7):789, 1988.
[22] I. G. McKeith, D. Galasko, K. Kosaka, E. K. Perry, D. W. Dickson, L. A.
Hansen, D. P. Salmon, J. Lowe, S. S. Mirra, E. J. Byrne, G. Lennox, N. P.
Quinn, J. A. Edwardson, P. G. Ince, C. Bergeron, A. Burns, B. L. Miller,
S. Lovestone, D. Collerton, E. N. H. Jansen, C. Ballard, R. A. I. de Vos,
G. K. Wilcock, K. A. Jellinger, and R. H. Perry. Consensus guidelines
for the clinical and pathologic diagnosis of dementia with Lewy bodies
(DLB) Report of the consortium on DLB international workshop. Neurology, 47(5):1113 – 1124, 1996.
[23] G. C. Román, T. K. Tatemichi, T. Erkinjuntti, J. L. Cummings, J. C. Masdeu, J. H. Garcia, L. Amaducci, J. M. Orgogozo, A. Brun, A. Hofman,
A. Hofman, D. M. Moody, M. D. O’Brien, T. Yamaguchi, J. Grafman,
B. P. Drayer, D. A. Bennet, M. Fisher, J. Ogata, E. Kokmen, F. Bermejo,
P. A. Wolf, P. B. Gorelick, K. L. Bick, A. K. Pajeau, M. A. Bell, C. DeCarli, A. Culebras, A. D. Korczyn, J. Bogousslavsky, A. Hartmann, and
P. Scheinberg. Vascular dementia Diagnostic criteria for research studies:
Report of the NINDS-AIREN International Workshop*. Neurology, 43(2):
250 – 250, 1993.
[24] H. C. Chui, J. I. Victoroff, D. Margolin, W. Jagust, R. Shankle, and R. Katzman. Criteria for the diagnosis of ischemic vascular dementia proposed
by the State of California Alzheimer’s Disease Diagnostic and Treatment
Centers. Neurology, 42(3):473 – 473, 1992.
[25] M. S. Forman, J. Farmer, J. K. Johnson, C. M. Clark, S. E. Arnold,
H. Coslett, A. Chatterjee, H. I. Hurtig, J. H. Karlawish, H. J. Rosen, et al.
Frontotemporal dementia: clinicopathological correlations. Annals of neurology, 59(6):952 – 962, 2006.
[26] K. A. Josephs, J. L. Holton, M. N. Rossor, A. K. Godbolt, T. Ozawa,
K. Strand, N. Khan, S. Al-Sarraj, and T. Revesz. Frontotemporal lobar
degeneration and ubiquitin immunohistochemistry. Neuropathology and
applied neurobiology, 30(4):369 – 373, 2004.
[27] J. C. Morris, M. Storandt, J. P. Miller, D. W. McKeel, J. L. Price, E. H.
Rubin, and L. Berg. Mild cognitive impairment represents early-stage
Alzheimer disease. Archives of neurology, 58(3):397, 2001.
[28] O. L. Lopez, J. T. Becker, W. Klunk, J. Saxton, R. L. Hamilton, D. I. Kaufer, R. A. Sweet, C. C. Meltzer, S. Wisniewski, M. I. Kamboh,
and S. T. DeKosky. Research evaluation and diagnosis of probable Alzheimer’s disease over the last two decades: I. Neurology, 55(12):1854 –
1862, 2000.
[29] H. Braak and E. Braak. Neuropathological stageing of Alzheimer-related
changes. Acta neuropathologica, 82(4):239 – 259, 1991.
[30] A. Delacourte, J. P. David, N. Sergeant, L. Buee, A. Wattez, P. Vermersch,
F. Ghozali, C. Fallet-Bianco, F. Pasquier, F. Lebert, et al. The biochemical pathway of neurofibrillary degeneration in aging and alzheimer’s
disease. Neurology, 52(6):1158 – 1158, 1999.
[31] B. Dubois, H. H. Feldman, C. Jacova, J. L. Cummings, S. T. DeKosky,
P. Barberger-Gateau, A. Delacourte, G. B. Frisoni, N. C. Fox, D. Galasko,
S. Gauthier, H. Hampel, G. A. Jicha, K. Meguro, J. O’Brien, F. Pasquier,
P. Robert, M. Rossor, S. Salloway, M. Sarazin, L. C. de Souza, Y. Stern, P. J.
Visser, and P. Scheltens. Revising the definition of Alzheimer’s disease:
a new lexicon. The Lancet Neurology, 9(11):1118 – 1127, 2010.
[32] B. Dubois and M. L. Albert. Amnestic MCI or prodromal Alzheimer’s
disease? Lancet Neurology, 3(4):246 – 248, 2004.
[33] T. H. Crook and S. H. Ferris. Age associated memory impairment. BMJ:
British Medical Journal, 304(6828):714, 1992.
[34] R. Levy. Aging-associated cognitive decline. International Psychogeriatrics,
6(1):63 – 68, 1994.
[35] World Health Organization. ICD-10: International statistical classification
of diseases and related health problems. World Health Organization, 2004.
[36] C. Flicker, S. H. Ferris, and B. Reisberg. Mild cognitive impairment in
the elderly predictors of dementia. Neurology, 41(7):1006 – 1006, 1991.
[37] S. Larrieu, L. Letenneur, J. M. Orgogozo, C. Fabrigoule, H. Amieva,
N. Le Carret, P. Barberger-Gateau, and J. F. Dartigues. Incidence and
outcome of mild cognitive impairment in a population-based prospective cohort. Neurology, 59(10):1594 – 1599, 2002.
[38] V. Jelic, M. Kivipelto, and B. Winblad. Clinical trials in mild cognitive
impairment: lessons for the future. Journal of Neurology, Neurosurgery &
Psychiatry, 77(4):429 – 438, 2006.
[39] M. Grundman, R. C. Petersen, D. A. Bennett, H. H. Feldman, S. Salloway,
P. J. Visser, L. J. Thal, D. Schenk, Z. Khachaturian, and W. Thies. Alzheimer’s Association Research Roundtable Meeting on Mild Cognitive
Impairment: What have we learned? Alzheimer’s & Dementia, 2(3):220 –
233, 2006.
[40] R. C. Petersen, R. G. Thomas, M. Grundman, D. Bennett, R. Doody, S. Ferris, D. Galasko, S. Jin, J. Kaye, A. Levey, E. Pfeiffer, M. Sano, C. H. van
Dyck, and L. J. Thal. Vitamin E and donepezil for the treatment of mild
cognitive impairment. New England Journal of Medicine, 352(23):2379 –
2388, 2005.
[41] L. J. Thal, S. H. Ferris, L. Kirby, G. A. Block, C. R. Lines, E. Yuen, C. Assaid, M. L. Nessly, B. A. Norman, C. C. Baranak, and S. A. Reines. A
randomized, double-blind, study of rofecoxib in patients with mild cognitive impairment. Neuropsychopharmacology, 30(6):1204 – 1215, 2005.
[42] P. J. Visser, P. Scheltens, and F. R. J. Verhey. Do MCI criteria in drug
trials accurately identify subjects with predementia Alzheimer’s disease?
Journal of Neurology, Neurosurgery & Psychiatry, 76(10):1348 – 1354, 2005.
[43] G. A. Jicha, J. E. Parisi, D. W. Dickson, K. Johnson, R. Cha, R. J. Ivnik,
E. G. Tangalos, B. F. Boeve, D. S. Knopman, H. Braak, and R. C. Petersen.
Neuropathologic outcome of mild cognitive impairment following progression to clinical dementia. Archives of neurology, 63(5):674 – 681, 2006.
[44] C. R. Jack, D. W. Dickson, J. E. Parisi, Y. C. Xu, R. H. Cha, P. C. O’Brien,
S. D. Edland, G. E. Smith, B. F. Boeve, E. G. Tangalos, E. Kokmen, and
R. C. Petersen. Antemortem MRI findings correlate with hippocampal
neuropathology in typical aging and dementia. Neurology, 58(5):750 –
757, 2002.
[45] D. H. S. Silverman, S. S. Gambhir, H. W. C. Huang, J. Schwimmer, S. Kim,
G. W. Small, J. Chodosh, J. Czernin, and M. E. Phelps. Evaluating early
dementia with and without assessment of regional cerebral metabolism
by PET: a comparison of predicted costs and benefits. Journal of Nuclear
Medicine, 43(2):253 – 266, 2002.
[46] F. Hulstaert, K. Blennow, A. Ivanoiu, H. C. Schoonderwaldt, M. Riemenschneider, P. P. De Deyn, C. Bancher, P. Cras, J. Wiltfang, P. D. Mehta,
K. Iqbal, H. Pottel, E. Vanmechelen, and H. Vanderstichele. Improved
discrimination of AD patients using β-amyloid (1-42) and tau levels in
CSF. Neurology, 52(8):1555 – 1555, 1999.
[47] P. J. Visser, P. Scheltens, E. Pelgrim, and F. R. J. Verhey. Medial temporal
lobe atrophy and APOE genotype do not predict cognitive improvement
upon treatment with rivastigmine in Alzheimer’s disease patients. Dementia and geriatric cognitive disorders, 19(2-3):126 – 133, 2005.
[48] M. J. De Leon, A. E. George, J. Golomb, C. Tarshish, A. Convit, A. Kluger,
S. De Santi, T. Mc Rae, S. H. Ferris, B. Reisberg, C. Ince, H. Rusinek,
M. Bobinski, B. Quinn, D. C. Miller, and H. M. Wisniewski. Frequency
of hippocampal formation atrophy in normal aging and Alzheimer’s disease. Neurobiology of aging, 18(1):1 – 11, 1997.
[49] K. M. Gosche, J. A. Mortimer, C. D. Smith, W. R. Markesbery, and D. A.
Snowdon. Hippocampal volume as an index of Alzheimer neuropathology Findings from the Nun Study. Neurology, 58(10):1476 – 1482, 2002.
[50] C. Bottino, C. C. Castro, R. L. E. Gomes, C. A. Buchpiguel, R. L. Marchetti,
and M. R. L. Neto. Volumetric MRI measurements can differentiate Alzheimer’s disease, mild cognitive impairment, and normal aging. International Psychogeriatrics, 14(01):59 – 72, 2002.
[51] M. P. Laakso, H. Soininen, K. Partanen, M. Lehtovirta, M. Hallikainen,
T. Hänninen, E. L. Helkala, P. Vainio, and P. J. Riekkinen Sr. MRI of the
hippocampus in Alzheimer’s disease: sensitivity, specificity, and analysis
of the incorrectly classified subjects. Neurobiology of aging, 19(1):23 – 31,
1998.
[52] P. Scheltens, N. Fox, F. Barkhof, and C. De Carli. Structural magnetic
resonance imaging in the practical assessment of dementia: beyond exclusion. The Lancet Neurology, 1(1):13 – 21, 2002.
[53] A. Chincarini, P. Bosco, G. Gemme, S. Morbelli, D. Arnaldi, F. Sensi,
I. Solano, N. Amoroso, S. Tangaro, R. Longo, S . Squarcia, and F. Nobili.
Alzheimer’s disease markers from structural MRI and FDG-PET brain
images. EPJ Plus, 127(11):135, 2012.
[54] J. G. Csernansky, L. Wang, J. Swank, J. P. Miller, M. Gado, D. McKeel,
M. I. Miller, and J. C. Morris. Preclinical detection of Alzheimer’s disease:
hippocampal shape and volume predict dementia onset in the elderly.
Neuroimage, 25(3):783 – 792, 2005.
[55] L. G. Apostolova, R. A. Dutton, I. D. Dinov, K. M. Hayashi, A. W. Toga,
J. L. Cummings, and P. M. Thompson. Conversion of mild cognitive impairment to Alzheimer disease predicted by hippocampal atrophy maps.
Archives of Neurology, 63(5):693 – 699, 2006.
[56] E. S. C. Korf, L. O. Wahlund, P. J. Visser, and P. Scheltens. Medial temporal lobe atrophy on MRI predicts dementia in patients with mild cognitive impairment. Neurology, 63(1):94 – 100, 2004.
[57] C. R. Jack, R. C. Petersen, Y. C. Xu, P. C. O’Brien, G. E. Smith, R. J. Ivnik,
B. F. Boeve, S. C. Waring, E. G. Tangalos, and E. Kokmen. Prediction of
AD with MRI-based hippocampal volume in mild cognitive impairment.
Neurology, 52(7):1397 – 1397, 1999.
[58] P. J. Visser, P. Scheltens, F. R. J. Verhey, B. Schmand, L. J. Launer, J. Jolles,
and C. Jonker. Medial temporal lobe atrophy and memory dysfunction
as predictors for dementia in subjects with mild cognitive impairment.
Journal of Neurology, 246(6):477 – 485, 1999.
[59] P. J. Visser, F. R. J. Verhey, P. A. M. Hofman, P. Scheltens, and J. Jolles.
Medial temporal lobe atrophy predicts Alzheimer’s disease in patients
with minor cognitive impairment. Journal of Neurology, Neurosurgery &
Psychiatry, 72(4):491 – 497, 2002.
[60] O. Hansson, H. Zetterberg, P. Buchhave, E. Londos, K. Blennow, and
L. Minthon. Association between CSF biomarkers and incipient Alzheimer’s disease in patients with mild cognitive impairment: a follow-up
study. The Lancet Neurology, 5(3):228 – 234, 2006.
[61] A. M. Fagan, M. A. Mintun, R. H. Mach, S. Y. Lee, C. S. Dence, A. R. Shah,
G. N. Larossa, M. L. Spinner, W. E. Klunk, C. A. Mathis, et al. Inverse
relation between in vivo amyloid imaging load and cerebrospinal fluid
Aβ42 in humans. Annals of neurology, 59(3):512 – 519, 2006.
[62] M. A. Mintun, G. N. Larossa, Y. I. Sheline, C. S. Dence, S. Y. Lee, R. H. Mach, W. E. Klunk, C. A. Mathis, S. T. DeKosky, and J. C. Morris.
[11C] PIB in a nondemented population Potential antecedent marker of
Alzheimer disease. Neurology, 67(3):446 – 452, 2006.
[63] J. C. Price, W. E. Klunk, B. J. Lopresti, X. Lu, J. A. Hoge, S. K. Ziolko,
D. P. Holt, C. C. Meltzer, S. T. DeKosky, and C. A. Mathis. Kinetic modeling of amyloid binding in humans using PET imaging and Pittsburgh
Compound-B. Journal of Cerebral Blood Flow & Metabolism, 25(11):1528 –
1547, 2005.
[64] T. Bartsch. The Clinical Neurobiology of the Hippocampus: An integrative
view. OUP Oxford, 2012.
[65] F. T. Lewis. The significance of the term hippocampus. The Journal of
Comparative Neurology, 35(3):213 – 230, 1923.
[66] J. C. Pruessner, L. M. Li, W. Serles, M. Pruessner, D. L. Collins, N. Kabani, S. Lupien, and A. C. Evans. Volumetry of Hippocampus and
Amygdala with High-resolution MRI and Three-dimensional Analysis
Software: Minimizing the Discrepancies between Laboratories. Cerebral
Cortex, (10):433 – 442, 2000.
[67] D. Shen, S. Moffat, S. M. Resnick, and C. Davatzikos. Measuring Size and
Shape of the Hippocampus in MR Images Using a Deformable Shape
Model. NeuroImage, 15(2):422 – 434, 2002.
[68] O. Pedraza, D. Bowers, and R. Gilmore. Asymmetry of the hippocampus
and amygdala in mri volumetric measurements of normal adults. Journal
of the International Neuropsychological Society, 10(5):664 – 678, 2004.
[69] L. Bergouignan, M. Chupin, Y. Czechowska, S. Kinkingnéhun,
C. Lemogne, G. Le Bastard, M. Lepage, L. Garnero, O. Colliot, and
P. Fossati. Can voxel based morphometry, manual segmentation and automated segmentation equally detect hippocampal volume differences in
acute depression? Neuroimage, 45(1):29 – 37, 2009.
[70] H. Wolf, M. Grunwald, F. Kruggel, S. G. Riedel-Heller, S. Angerhöfer,
A. Hojjatoleslami, A. Hensel, T. Arendt, and H. J. Gertz. Hippocampal
volume discriminates between normal cognition; questionable and mild
dementia in the elderly. Neurobiology of aging, 22(2):177 – 186, 2001.
[71] F. Shi, B. Liu, Y. Zhou, C. Yu, and T. Jiang. Hippocampal volume and
asymmetry in mild cognitive impairment and alzheimer’s disease: Metaanalyses of mri studies. Hippocampus, 19(11):1055 – 1064, 2009.
[72] A. A. Woolard and S. Heckers. Anatomical and functional correlates of
human hippocampal volume asymmetry. Psychiatry Research: Neuroimaging, 201(1):48 – 53, 2012.
[73] B. P. Rogers, J. M. Sheffield, A. S. Luksik, and S. Heckers. Systematic
error in hippocampal volume asymmetry measurement is minimal with
a manual segmentation protocol. Frontiers in neuroscience, 6:179, 2012.
[74] W. Gerlach and O. Stern. Der experimentelle Nachweis der Richtungsquantelung im Magnetfeld. Zeitschrift für Physik A Hadrons and Nuclei, 9(1):349 – 352, 1922.
[75] I. I. Rabi, J. R. Zacharias, S. Millman, and P. Kusch. A New Method of
Measuring Nuclear Magnetic Moment. Phys. Rev., 53(4):318 – 318, 1938.
[76] F. Bloch. Nuclear induction. Physical Review, 70(7-8):460 – 474, 1946.
[77] E. M. Purcell, H. C. Torrey, and R. V. Pound. Resonance Absorption by
Nuclear Magnetic Moments in a Solid. Physical Review, 69(1-2):37 – 38,
1946.
[78] E.M. Haacke, R.W. Brown, M.R. Thompson, and R. Venkatesan. Magnetic
Resonance Imaging: Physical Principles and Sequence Design. Wiley, 1999.
[79] M. Chupin, A. Hammers, E. Bardinet, O. Colliot, R. S. N. Liu, J. S. Duncan, L. Garnero, and L. Lemieux. Fully automatic segmentation of the
hippocampus and the amygdala from mri using hybrid prior knowledge.
In Medical Image Computing and Computer-Assisted Intervention–MICCAI
2007, pages 875 – 882. Springer, 2007.
[80] P. Aljabar, R. A. Heckemann, A. Hammers, J. V. Hajnal, and D. Rueckert.
Multi-atlas based segmentation of brain images: atlas selection and its
effect on accuracy. Neuroimage, 46(3):726 – 738, 2009.
[81] H. Wang, J. W. Suh, S. Das, J. Pluta, M. Altinay, and P. Yushkevich.
Regression-based label fusion for multi-atlas segmentation. In Computer
Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on, 2011.
[82] F. van der Lijn, T. den Heijer, M. Breteler, and W. J. Niessen. Hippocampus segmentation in mr images using atlas registration, voxel classification, and graph cuts. Neuroimage, 43(4):708 – 720, 2008.
[83] K. Kwak, U. Yoon, D. Lee, G. H. Kim, S. W. Seo, D. L. Na, H. Shim, and
J. Lee. Fully-automated approach to hippocampus segmentation using a
graph-cuts algorithm combined with atlas-based segmentation and morphological opening. Magnetic resonance imaging, 31(7):1090 – 1096, 2013.
[84] D. L. Collins and J. C. Pruessner. Towards accurate, automatic segmentation of the hippocampus and amygdala from mri by augmenting animal
with a template library and label fusion. NeuroImage, 52(4):1355 – 1366,
2010.
[85] J. H. Morra, Z. Tu, L. G. Apostolova, A. E. Green, C. Avedissian, S. K.
Madsen, N. Parikshak, X. Hua, A. W. Toga, C. R. Jack Jr, M. W. Weiner,
and P. M. Thompson. Validation of a fully automated 3d hippocampal
segmentation method using subjects with alzheimer’s disease mild cognitive impairment, and elderly controls. Neuroimage, 43(1):59 – 68, 2008.
[86] J. H. Morra, Z. Tu, L. G. Apostolova, A. E. Green, A. W. Toga, and P. M.
Thompson. Comparison of adaboost and support vector machines for
detecting alzheimer’s disease through automated hippocampal segmentation. Medical Imaging, IEEE Transactions on, 29(1):30 – 43, 2010.
[87] C. A. Bishop, M. Jenkinson, J. Andersson, J. Declerck, and D. Merhof.
Novel fast marching for automated segmentation of the hippocampus
(fmash): method and validation on clinical data. NeuroImage, 55(3):1009
– 1019, 2011.
[88] P. Coupé, J. V. Manjón, V. Fonov, J. Pruessner, M. Robles, and D. L.
Collins. Patch-based segmentation using expert priors: Application to
hippocampus and ventricle segmentation. NeuroImage, 54(2):940 – 954,
2011.
[89] M. J. Cardoso, K. Leung, M. Modat, S. Keihaninejad, D. Cash, J. Barnes,
N. C. Fox, and S. Ourselin. Steps: Similarity and truth estimation for
propagated segmentations and its application to hippocampal segmentation and brain parcelation. Medical image analysis, 17(6):671 – 684, 2013.
[90] S. M. Nestor, E. Gibson, F. Gao, A. Kiss, and S. E. Black. A direct morphometric comparison of five labeling protocols for multi-atlas driven automatic segmentation of the hippocampus in Alzheimer's disease. Neuroimage, 66(1):50 – 70, 2012.
[91] H. Wang and P. A. Yushkevich. Spatial bias in multi-atlas based segmentation. In Computer Vision and Pattern Recognition (CVPR), Conference on.
IEEE, 2012.
[92] T. F. Cootes, C. J. Taylor, D. H. Cooper, and J. Graham. Active Shape
Models-Their Training and Application. Computer Vision and Image Understunding, 61(1):38 – 59, 1995.
[93] T. F. Cootes, G. J. Edwards, and C. J. Taylor. Active appearance models.
IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(6):681 –
685, 2001.
[94] C. Davatzikos, S. M. Resnick, X. Wu, P. Parmpi, and C. M. Clark. Individual patient diagnosis of AD and FTD via high-dimensional pattern
classification of MRI. NeuroImage, 41(4):1220 – 1227, 2008.
[95] S. M. Pizer, P. T. Fletcher, S. Joshi, A. Thall, J. Z. Chen, Y. Fridman, D. S.
Fritsch, A. G. Gash, J. M. Glotzer, M. R. Jiroutek, C. Lu, K. E. Muller,
G. Tracton, P. Yushkevich, and E. L. Chaney. Deformable M-Reps for 3D
Medical Image Segmentation. International Journal of Computer Vision, 55(2):85 – 106, 2003.
[96] L. Wang, F. Beg, T. Ratnanather, C. Ceritoglu, L. Younes, J. C. Morris,
J. G. Csernansky, and M. I. Miller. Large deformation diffeomorphism
and momentum based hippocampal shape discrimination in dementia of
the Alzheimer type. IEEE Trans. Med. Imaging, 26(4):462 – 470, 2007.
[97] S. C. Zhu and A. Yuille. Region Competition: Unifying Snakes, Region
Growing, and Bayes/MDL for Multi-band Image Segmentation. IEEE
Transactions on Pattern Analysis and Machine Intelligence, 18(6):884 – 900,
1996.
[98] Z. Tu, K. L. Narr, P. Dollár, I. Dinov, P. M. Thompson, and A. W. Toga.
Brain anatomical structure segmentation by hybrid discriminative/generative models. IEEE Trans. Med. Imaging, 27(4):495 – 508, 2008.
[99] Y. Zhang, M. Brady, and S. Smith. Segmentation of brain MR images
through a hidden Markov random field model and the expectationmaximization algorithm. IEEE Trans. Med. Imaging, 20(1):45 – 57, 2001.
[100] Z. Song, N. Tustison, B. Avants, and J. C. Gee. Integrated Graph Cuts for
Brain MRI Segmentation. Proc. Med Image Comput Comput Assist Interv, 9
(2):831 – 838, 2006.
[101] F. Sabattoli, M. Boccardi, S. Galluzzi, A. Treves, P. M. Thompson, and
G. B. Frisoni. Hippocampal shape differences in dementia with Lewy
bodies. Neuroimage, 41(3):699–705, 2008.
[102] J. V. Hajnal and D. L. G. Hill. Medical image registration. CRC, 2010.
[103] P. Viola and W. M. Wells III. Alignment by maximization of mutual
information. International journal of computer vision, 24(2):137 – 154, 1997.
[104] H. Gudbjartsson and S. Patz. The Rician distribution of noisy MRI data.
Magnetic Resonance in Medicine, 34(6):910 – 914, 1995.
[105] N. Amoroso, R. Bellotti, S. Bruno, A. Chincarini, G. Logroscino, S. Tangaro, and A. Tateo. Automated Shape Analysis landmark detection for
medical image processing. Proceedings of the International Symposium, CompIMAGE, 2012.
[106] C. Brechbühler, G. Gerig, and O. Kübler. Parametrization of closed surfaces for 3-D shape description. Computer Vision and Image Understanding,
61(2):154 – 170, 1995.
[107] L. Shen and F. Makedon. Spherical mapping for processing of 3D closed
surfaces. Image and Vision Computing, 24(7):743 – 761, 2006.
[108] L. Shen, H. A. Firpi, A. J. Saykin, and J. D. West. Parametric surface
modeling and registration for comparison of manual and automated segmentation of the hippocampus. Hippocampus, 19(6):588 – 595, 2009.
[109] A. Chincarini, P. Bosco, P. Calvini, G. Gemme, M. Esposito, C. Olivieri,
L. Rei, S. Squarcia, G. Rodriguez, R. Bellotti, P. Cerello, I. De Mitri,
A. Retico, F. Nobili, and The Alzheimer’s Disease Neuroimaging Initiative. Local MRI analysis approach in the diagnosis of early and prodromal Alzheimer’s disease. NeuroImage, 58(2):469 – 480, 2011.
[110] P. Calvini, A. Chincarini, G. Gemme, M. A. Penco, S. Squarcia, F. Nobili, G. Rodriguez, R. Bellotti, E. Catanzariti, P. Cerello, I. De Mitri, M. E.
Fantacci, MAGIC-5 Collaboration, and The Alzheimer’s Disease Neuroimaging Initiative. Automatic analysis of medial temporal lobe atrophy
from structural MRIs for the early assessment of Alzheimer disease. Medical Physics, 36(8):3737 – 3747, 2009.
[111] T. F. Cootes, D. Cooper, C. J. Taylor, and J. Graham. A Trainable Method
of Parametric Shape Description. In Proceedings of the British Machine
Vision Conference, Springer-Verlag, 10(5):54 – 61, 1991.
[112] P. J. Besl and N. D. McKay. Method for registration of 3-D shapes. In
Robotics-DL tentative, 1992.
[113] A. Rangarajan, H. Chui, and J. S. Duncan. Rigid point feature registration
using mutual information. Medical Image Analysis, 3(4):425 – 440, 1999.
[114] L. R. Dice. Measures of the amount of ecologic association between
species. Ecology, 26(3):297 – 302, 1945.
[115] B. Fischl, D. H. Salat, E. Busa, M. Albert, M. Dieterich, C. Haselgrove,
A. van der Kouwe, R. Killiany, D. Kennedy, S. Klaveness, A. Montillo,
N. Makris, B. Rosen, and A. M. Dale. Whole brain segmentation: automated labeling of neuroanatomical structures in the human brain. Neuron, 33(3):341 – 355, 2002.
[116] B. Patenaude, S. M. Smith, D. N. Kennedy, and M. Jenkinson. A Bayesian
model of shape and appearance for subcortical brain segmentation. NeuroImage, 56(3):907 – 922, 2011.
[117] L. Breiman. Random Forests. Machine Learning, 45(1):5 – 32, 2001.
[118] C. Seiffert, T. M. Khoshgoftaar, J. Van Hulse, and A. Napolitano. RUSBoost: A hybrid approach to alleviating class imbalance. IEEE Transactions
on Systems, Man and Cybernetics, Part A: Systems and Humans, 40(1):
185 – 197, 2010.
[119] P. Viola and M. J. Jones. Robust Real-Time Face Detection. International
Journal of Computer Vision, 57(2):137 – 154, 2004.
[120] R. M. Haralick, K. Shanmugam, and I. Dinstein. Textural Features for
Image Classification. IEEE Transactions on Systems, Man and Cybernetics, 3
(6):610 – 621, 1973.
[121] R. W. Conners and C. A. Harlow. A Theoretical Comparison of Texture
Algorithms. IEEE Transactions on Pattern Analysis and Machine Intelligence,
2(3):204 – 222, 1980.
[122] R. Bellotti, F. De Carlo, G. Gargano, G. Maggipinto, S. Tangaro, M. Castellano, R. Massafra, D. Cascio, F. Fauci, R. Magro, G. Raso, A. Lauria,
G. Forni, S. Bagnasco, P. Cerello, S. C. Cheran, E. Lopez Torres, U. Bottigli,
G. L. Masala, P. Oliva, A. Retico, M. E. Fantacci, R. Cataldo, I. De Mitri,
and G. De Nunzio. A completely automated CAD system for mass detection in a large mammographic database. Medical Physics, 33(8):3066 –
3075, 2006.
[123] S. Tangaro, F. De Carlo, G. Gargano, R. Bellotti, U. Bottigli, G. L. Masala,
P. Cerello, S. Cheran, and R. Cataldo. Mass lesion detection in mammographic images using Haralick textural features. Proceedings of the International Symposium CompIMAGE 2006 - Computational Modelling of Objects
Represented in Images: Fundamentals, Methods and Applications, 2007.
[124] C. M. Bishop. Neural networks for pattern recognition. Oxford University
Press, 1995.
[125] G. M. Weiss. Mining with rarity: a unifying framework. ACM SIGKDD
Explorations Newsletter, 6(1):7 – 19, 2004.
[126] G. E. Batista, R. C. Prati, and M. C. Monard. A study of the behavior
of several methods for balancing machine learning training data. ACM
SIGKDD Explorations Newsletter, 6(1):20 – 29, 2004.
[127] C. Drummond, R. C. Holte, et al. C4.5, class imbalance, and cost sensitivity: why under-sampling beats over-sampling. In Workshop on Learning
from Imbalanced Datasets II, 2003.
[128] J. Montagnat, F. Bellet, H. Benoit-Cattin, V. Breton, L. Brunie, H. Duque,
Y. Legré, I. E. Magnin, L. Maigne, S. Miguet, J. M. Pierson, L. Seitz,
and T. Tweed. Medical images simulation, storage, and processing on
the european datagrid testbed. Journal of Grid Computing, 2(4):387 – 400,
2004.
[129] R. Bellotti and S. Pascazio. Editorial: Advanced physical methods in
brain research. EPJ Plus, 127(11):145, 2012.
[130] D. Krefting, M. Vossberg, A. Hoheisel, and T. Tolxdorff. Simplified implementation of medical image processing algorithms into a grid using
a workflow management system. Future Generation Computer Systems, 26(4):
681 – 684, 2010.
[131] M. W. A. Caan, S. Shahand, F. M. Vos, A. H. C. van Kampen, and S. D.
Olabarriaga. Evolution of grid-based services for Diffusion Tensor Image
analysis. Future Generation Computer Systems, 28(8), 2012.
[132] R. Bellotti, P. Cerello, S. Tangaro, V. Bevilacqua, M. Castellano, G. Mastronardi, F. De Carlo, S. Bagnasco, U. Bottigli, R. Cataldo, E. Catanzariti,
S.C. Cheran, P. Delogu, I. De Mitri, G. De Nunzio, M E. Fantacci, F. Fauci,
G. Gargano, B. Golosio, P.L. Indovina, A. Lauria, E. Lopez Torres, R. Magro, G L. Masala, R. Massafra, P. Oliva, A. Preite Martinez, M. Quarta,
G. Raso, A. Retico, M. Sitta, S. Stumbo, A. Tata, S. Squarcia, A. Schenone,
E. Molinari, and B. Canesi. Distributed medical images analysis on a
Grid infrastructure. Future Generation Computer Systems, 23(3):475 – 484,
2007.
[133] P. Cerello, S. Bagnasco, U. Bottigli, S. C. Cheran, P. Delogu, M. E. Fantacci, F. Fauci, G. Forni, A. Lauria, E. Lopez Torres, R. Magro, G. L.
Masala, P. Oliva, R. Palmiero, L. Ramello, G. Raso, A. Retico, M. Sitta,
S. Stumbo, S. Tangaro, and E. Zanon. GPCALMA: a grid-based tool for
mammographic screening. Methods of Information in Medicine, 44(2):244 –
248, 2005.
[134] R. Bellotti, S. Bagnasco, U. Bottigli, M. Castellano, R. Cataldo, E. Catanzariti, P. Cerello, S. C. Cheran, F. De Carlo, P. Delogu, I. De Mitri,
G. De Nunzio, M. E. Fantacci, F. Fauci, G. Forni, G. Gargano, B. Golosio,
P. L. Indovina, A. Lauria, E. Lopez Torres, R. Magro, D. Martello, G. L.
Masala, R. Massafra, P. Oliva, R. Palmiero, A. Preite Martinez, R. Prevete, M. Quarta, L. Ramello, G. Raso, A. Retico, M. Santoro, M. Sitta,
S. Stumbo, S. Tangaro, A. Tata, and E. Zanon. The MAGIC-5 Project: Medical Applications on a Grid Infrastructure Connection. IEEE Nuclear
Science Symposium Conference Record, 3:1902 – 1906, 2004.
[135] S. Vicario, B. Balech, G. Donvito, P. Notarangelo, and G. Pesole. The
BioVeL project: Robust phylogenetic workflows running on the grid. EMBnet.journal, 18(B):77, 2012.
[136] A. Barker and J. van Hemert. Scientific Workflow: A Survey and Research
Directions. In Parallel Processing and Applied Mathematics, 2008.
[137] D. E. Rex, J. Q. Ma, and A. W. Toga. The LONI Pipeline Processing
Environment. NeuroImage, 19(3):1033 – 1048, 2003.
[138] T. Oinn, M. Addis, J. Ferris, D. Marvin, M. Senger, M. Greenwood,
T. Carver, K. Glover, M. R. Pocock, A. Wipat, and P. Li. Taverna: A tool
for the composition and enactment of bioinformatics workflows. Bioinformatics, 20:3045 – 3054, 2004.
[139] B. Ludäscher, I. Altintas, C. Berkley, D. Higgins, E. Jaeger, M. Jones, E. A.
Lee, J. Tao, and Y. Zhao. Scientific workflow management and the Kepler
system. Concurrency and Computation: Practice and Experience, 18(10):1039 – 1065, 2006.
[140] I. Taylor, M. Shields, and I. Wang. Distributed P2P computing within Triana: a galaxy visualization test case. In Parallel and Distributed Processing
Symposium, 2003. Proceedings. International, 2003.
[141] T. Glatard, R. S. Soleman, D. J. Veltman, A. J. Nederveen, and S. D.
Olabarriaga. Large-scale functional MRI study on a production grid. Future Generation Computer Systems, 26(4):685 – 692, 2010.
[142] I. Dinov, K. Lozev, P. Petrosyan, Z. Liu, P. Eggert, J. Pierce, A. Zamayan,
S. Chakrapani, and J. Van Horn. Neuroimaging Study Designs, Computational Analyses and Data Provenance Using the LONI Pipeline. PLoS ONE,
5(9), 2010.
[143] A. J. MacKenzie-Graham, A. Payan, I. Dinov, J. D. Horn, and A. W. Toga.
Neuroimaging Data Provenance Using the LONI Pipeline Workflow Environment. In Provenance and Annotation of Data and Processes, 2008.
[144] E. Laure, F. Hemmer, and F. Prelz. Middleware for the next generation
grid infrastructure. Computing in High Energy Physics and Nuclear Physics,
2004.
[145] European Grid Infrastructure (EGI), Accessed Jan 2013. URL http://www.
egi.eu.
[146] J. Ma, W. Liu, and T. Glatard. A classification of file placement and
replication methods on grids. Future Generation Computer Systems, 29(6):
1395–1406, 2013.
[147] R. Pordes, D. Petravick, B. Kramer, D. Olson, M. Livny, A. Roy, P. Avery,
K. Blackburn, T. Wenaus, F. Würthwein, I. Foster, R. Gardner, M. Wilde,
A. Blatecky, J. McGee, and R. Quick. The open-science grid. Journal of
Physics: Conference Series, 78(1):012057, 2007.
[148] A. Tateo. Distributed analysis for feature selection in medical image
processing. submitted, 2013.
[149] J. A. Hanley and B. J. McNeil. The meaning and use of the area under a
receiver operating characteristic (ROC) curve. Radiology, 143(1):29 – 36, 1982.
[150] C. R. Jack, R. C. Petersen, Y. C. Xu, S. C. Waring, P. C. O’Brien, E. G. Tangalos, G. E. Smith, R. J. Ivnik, and E. Kokmen. Medial temporal atrophy
on MRI in normal aging and very mild Alzheimer’s disease. Neurology, 49
(3):786 – 794, 1997.
[151] K. I. Erickson, D. L. Miller, and K. A. Roecklein. The aging hippocampus:
interactions between exercise, depression, and BDNF. The Neuroscientist,
18(1):82 – 97, 2012.
[152] J. M. Bland and D. G. Altman. Statistical methods for assessing agreement between two methods of clinical measurement. The Lancet, 327(8476):307 – 310, 1986.
[153] O. Carmichael, H. Aizenstein, S. Davis, J. Becker, P. Thompson,
C. Meltzer, and Y. Liu. Atlas-based Hippocampus Segmentation in Alzheimer’s Disease and Mild Cognitive Impairment. NeuroImage, 27(4):
979 – 990, 2005.
[154] R. A. Heckemann, J. V. Hajnal, P. Aljabar, D. Rueckert, and A. Hammers.
Automatic anatomical brain MRI segmentation combining label propagation and decision fusion. NeuroImage, 33(1):115 – 126, 2006.
[155] P. Aljabar, R. Heckemann, A. Hammers, J. V. Hajnal, and D. Rueckert.
Classifier Selection Strategies for Label Fusion Using Large Atlas Databases.
In Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2007.
[156] P. Aljabar, R. Heckemann, A. Hammers, J. V. Hajnal, and D. Rueckert.
Multi-atlas based segmentation of brain images: Atlas selection and its
effect on accuracy. NeuroImage, 46(3):726 – 738, 2009.
[157] A. Hammers, R. Heckemann, M. J. Koepp, J. S. Duncan, J. V. Hajnal,
D. Rueckert, and P. Aljabar. Automatic detection and quantification
of hippocampal atrophy on MRI in temporal lobe epilepsy: a proof-of-principle study. NeuroImage, 36(1):38 – 47, 2007.
[158] B. B. Avants, P. Yushkevich, J. Pluta, D. Minkoff, M. Korczykowski, J. Detre, and J. C. Gee. The optimal template effect in hippocampus studies
of diseased populations. NeuroImage, 49(3):2457 – 2466, 2010.
[159] F. van der Lijn, T. den Heijer, M. M. B. Breteler, and W. J. Niessen. Hippocampus segmentation in MR images using atlas registration, voxel
classification, and graph cuts. NeuroImage, 43(4):708 – 720, 2008.
[160] J. M. P. Lötjönen, R. Wolz, J. R. Koikkalainen, L. Thurfjell, G. Waldemar,
H. Soininen, and D. Rueckert. Fast and robust multi-atlas segmentation
of brain magnetic resonance images. NeuroImage, 49(3):2352 – 2365, 2010.
[161] R. Wolz, P. Aljabar, J. V. Hajnal, A. Hammers, D. Rueckert, and The Alzheimer’s Disease Neuroimaging Initiative. LEAP: learning embeddings
for atlas propagation. NeuroImage, 49(2):1316 – 1325, 2010.
[162] M. Chupin, E. Gérardin, R. Cuingnet, C. Boutet, L. Lemieux, S. Lehéricy,
H. Benali, L. Garnero, O. Colliot, and The Alzheimer’s Disease Neuroimaging Initiative. Fully automatic hippocampus segmentation and classification in Alzheimer’s disease and mild cognitive impairment applied
on data from ADNI. Hippocampus, 19(6):579 – 587, 2009.
[163] C. A. Bishop, M. Jenkinson, J. Andersson, J. Declerck, and D. Merhof.
Novel Fast Marching for Automated Segmentation of the Hippocampus
(FMASH): Method and validation on clinical data. NeuroImage, 55(3):1009
– 1019, 2011.
[164] J. H. Morra, Z. Tu, L. G. Apostolova, A. E. Green, C. Avedissian, S. K.
Madsen, N. Parikshak, X. Hua, A. W. Toga, C. R. Jack Jr, M. W. Weiner,
P. M. Thompson, and The Alzheimer’s Disease Neuroimaging Initiative.
Validation of a fully automated 3D hippocampal segmentation method
using subjects with Alzheimer’s disease, mild cognitive impairment, and
elderly controls. NeuroImage, 43(1):59 – 68, 2008.
[165] E. Geuze, E. Vermetten, and J. D. Bremner. MR-based in vivo hippocampal volumetrics: 2. Findings in neuropsychiatric disorders. Molecular
Psychiatry, 10(2):147 – 149, 2005.
[166] C. Konrad, T. Ukas, C. Nebel, V. Arolt, A. W. Toga, and K. L. Narr. Defining the human hippocampus in cerebral magnetic resonance images: an overview of current segmentation protocols. NeuroImage, 47(4):1185 –
1195, 2009.
[167] M. Boccardi, R. Ganzola, M. Bocchetta, M. Pievani, A. Redolfi, G. Bartzokis, R. Camicioli, J. Csernansky, M. J. de Leon, L. deToledo-Morrell,
R. J. Killiany, S. Lehericy, J. Pantel, J. C. Pruessner, H. Soininen, C. Watson, S. Duchesne, C. R. Jack Jr, and G. B. Frisoni. Survey of
Protocols for the Manual Segmentation of the Hippocampus: Preparatory Steps Towards a Joint EADC-ADNI Harmonized Protocol. Journal of
Alzheimer’s Disease, 26(Suppl. 3):61 – 75, 2011.
[168] G. B. Frisoni and C. R. Jack. Harmonization of magnetic resonance-based
manual hippocampal segmentation: a mandatory step for wide clinical
use. Alzheimer’s & Dementia, 7(2):171 – 174, 2011.
[169] M. Boccardi, M. Bocchetta, L. Apostolova, J. Barnes, G. Bartzokis, G. Corbetta, C. DeCarli, L. DeToledo-Morrell, M. Firbank, R. Ganzola, L. Gerritsen, W. Henneman, R. Killiany, N. Malykhin, P. Pasqualetti, J. Pruessner,
A. Redolfi, N. Robitaille, H. Soininen, D. Tolomeo, L. Wang, H. Watson,
S. Duchesne, C. Jack, and G. B. Frisoni. Delphi Consensus on
Landmarks for the Manual Segmentation of the Hippocampus on MRI:
Preliminary Results from the EADC-ADNI Harmonized Protocol Working Group. Neurology, 78(Suppl. 1):171 – 174, 2012.
[170] S. Tangaro, R. Bellotti, F. De Carlo, G. Gargano, E. Lattanzio, P. Monno,
R. Massafra, P. Delogu, M. E. Fantacci, A. Retico, M. Bazzocchi, S. Bagnasco, P. Cerello, S. C. Cheran, E. Lopez Torres, E. Zanon, A. Lauria,
A. Sodano, D. Cascio, F. Fauci, R. Magro, G. Raso, R. Ienzi, U. Bottigli,
G. L. Masala, P. Oliva, G. Meloni, A. P. Caricato, and R. Cataldo. MAGIC-5:
An Italian mammographic database of digitised images for research.
La Radiologia Medica, 113(4):477 – 485, 2008.
[171] A. Lauria, M. E. Fantacci, U. Bottigli, P. Delogu, F. Fauci, B. Golosio, P. L.
Indovina, G. L. Masala, P. Oliva, R. Palmiero, G. Raso, S. Stumbo, and
S. Tangaro. Diagnostic performance of radiologists with and without different CAD systems for mammography. International Society for Optics
and Photonics, 2003.
[172] B. van Ginneken, S. G. Armato, B. de Hoop, S. van Amelsvoort-van de
Vorst, T. Duindam, M. Niemeijer, K. Murphy, A. Schilham, A. Retico,
M. E. Fantacci, N. Camarlinghi, F. Bagagli, I. Gori, T. Hara, H. Fujita,
G. Gargano, R. Bellotti, S. Tangaro, L. Bolaños, F. De Carlo, P. Cerello, S. C.
Cheran, E. Lopez Torres, and M. Prokop. Comparing and combining
algorithms for computer-aided detection of pulmonary nodules in computed tomography scans: The ANODE09 study. Medical Image Analysis,
14(6):707–722, 2010.
ACKNOWLEDGMENTS
[...] I never said thank you.
[...] And you’ll never have to.
I have always thought that with people who really care about you there is no
need to say thanks. This time, though, is different: to all of you,
wherever you are, let me say it just once. Thanks.