UNIVERSITÀ DEGLI STUDI DI BARI
DIPARTIMENTO INTERATENEO DI FISICA "Michelangelo Merlin"
DOTTORATO DI RICERCA IN FISICA, XXVI CICLO
Settore Scientifico Disciplinare FIS/07

Quantitative MRI analysis in Alzheimer's disease

Tutore: Prof. Roberto Bellotti
Coordinatore: Prof. Gaetano Scamarcio
Dottorando: Nicola Amoroso
ANNO ACCADEMICO 2012-2013

CONTENTS

1 introduction
  1.1 Quantitative neuroscience frontiers
  1.2 The Physics contribution
  1.3 An overview

2 the hippocampus role in alzheimer's disease
  2.1 The Alzheimer's disease diagnosis
  2.2 Diagnosis criteria
  2.3 The need for a revision of Alzheimer's disease definition
    2.3.1 Performance issues of the NINCDS-ADRDA criteria
    2.3.2 Non-AD dementia involvement
    2.3.3 Histopathological improvements
    2.3.4 AD phenotype
  2.4 Revising the AD definition
  2.5 The need for early intervention
  2.6 The AD biomarkers
  2.7 The Hippocampus anatomy
    2.7.1 The hippocampal head
    2.7.2 The hippocampal body
    2.7.3 The hippocampal tail
    2.7.4 General features

3 magnetic resonance imaging
  3.1 The Magnetic Resonance effect
  3.2 Spin-magnetic field coupling
  3.3 The Motion equation
    3.3.1 The Resonance
  3.4 Magnetization and Relaxation
    3.4.1 Relaxation times
  3.5 The Bloch Equation
  3.6 Signal Acquisition
  3.7 Image noise and contrast
    3.7.1 Signal to noise ratio and spatial resolution
    3.7.2 Contrast: Proton density, T1 and T2 weighting
  3.8 Segmentation algorithms: state of the art

4 the hippocampus segmentation
  4.1 A combined strategy for segmentation
  4.2 Why an automated segmentation?
  4.3 Materials: database properties
  4.4 Preprocessing, automated registration
    4.4.1 Registration Methodology
    4.4.2 Rigid transformations
    4.4.3 Non rigid transformations
    4.4.4 Intensity based registration
    4.4.5 Similarity Measures
    4.4.6 Optimization algorithms for registration
  4.5 Shape Analysis
    4.5.1 SPHARM analysis
    4.5.2 SPHARM description
    4.5.3 The SPHARM average shape algorithm
  4.6 A novel FAPoD algorithm
    4.6.1 Simulated Data
    4.6.2 Shape model construction
    4.6.3 Modeling the variations
  4.7 Ensemble Classifier Segmentation
    4.7.1 Voxel-wise analysis with machine learning
    4.7.2 Feature Extraction
    4.7.3 Classification methods
    4.7.4 Random Forests
    4.7.5 RUSBoost
  4.8 Analyses on distributed infrastructures
    4.8.1 Medical imaging and distributed environments
    4.8.2 Workflow managers
    4.8.3 Workflow Implementation
    4.8.4 Distributed infrastructure employment
    4.8.5 Grid services and interface
    4.8.6 Security and Data Management
    4.8.7 Segmentation workflow deployment
    4.8.8 Workflow setup
    4.8.9 Summary

5 experimental results
  5.1 Hippocampus Segmentation
    5.1.1 Exploratory Analysis
    5.1.2 VOI extraction
    5.1.3 Correlation analysis
    5.1.4 Feature importance
    5.1.5 Random Forest classification
    5.1.6 RUSBoost classification
    5.1.7 The segmentation error
    5.1.8 Statistical agreement assessment
  5.2 Alzheimer's disease classification
    5.2.1 Stability of the results
    5.2.2 The method uncertainty
  5.3 Distributed infrastructure exploitation

6 conclusions
  6.1 Motivations and summary
  6.2 Segmentation
  6.3 Clinical Classification
  6.4 Computation

index

1 INTRODUCTION

Neuroscience is approaching quantitative methods of analysis and new computing technologies to unveil the brain's levels and functions. The contribution Physics can give in terms of data mining strategies and big data analyses ranges from the methodological to the computational level.
1.1 quantitative neuroscience frontiers

Neuroscience is generating exponentially growing volumes of data and knowledge on specific aspects of the healthy and diseased brain, in different species and at different ages. However, there is as yet no effective strategy to experimentally map the brain across all its levels and functions. Modern supercomputing technology offers a solution, making it possible to integrate the data in detailed computer-reconstructed models and simulations of the brain. Brain models and simulations allow researchers to predict missing data and principles, and to take measurements and perform experimental manipulations that would be ethically or technically impossible in animals or humans. Combined with new knowledge from big data projects across the world, in silico neuroscience has the potential to reveal the detailed mechanisms leading from genes to cells and circuits, and ultimately to cognition and behavior, the very heart of that which makes us human.

According to this new perspective, the first decade of this century has seen the spread of several notable initiatives in the neuroscience field. In 2003 the Allen Brain Atlas project was initiated, with a 100 million dollar donation. The main goals of the project were to advance research and knowledge about neurobiological conditions and to release an open platform for both data and findings, in order to allow researchers from other fields all over the world to take them into account while designing their own experiments. Its main contribution to neuroscience has been the map of gene expression in the brain, allowing researchers to correlate forms and functions and to compare patterns of healthy subjects with those of subjects affected by a disease.

A couple of years later, in 2005, another important project was funded, the Blue Brain Project. This project, led by the École Polytechnique Fédérale de Lausanne, is important not only for its scientific relevance: even if primarily funded by Switzerland, it is an example of the European Future and Emerging Technologies flagships. Briefly, this project represents the first attempt to re-create the brain in a synthetic way. Its first goal was to build a realistic and detailed model of the neocortical column. Once this goal was achieved, further investigation aimed at larger and more accurate models took place.

In the last few years, a major effort has been made. In 2009 the Human Connectome Project was launched, a five-year project sponsored by the US National Institutes of Health. It is part of the Blueprint Grand Challenges and its goal is to map the connections within the healthy brain. It is expected to help answer questions about how genes influence brain connectivity, and how this in turn relates to mood, personality and behavior. Another of its goals is the optimization of brain imaging techniques to see the brain's wiring in unprecedented detail.

However, it is 2013 that will eventually be recorded as the neuroscience annus mirabilis. In January and April respectively, the two projects Human Brain Project and BRAIN Initiative (BRAIN stands for Brain Research through Advancing Innovative Neurotechnologies) were funded by the European Commission and the US administration. The main reason behind these efforts is that understanding the human brain is one of the greatest challenges facing 21st century science. The challenge rewards are enormous:

a. gaining profound insights into what makes us human and into brain functionalities;
b. development of new treatments for brain diseases;

c. implementation of revolutionary new computing technologies.

At present, for the first time in history, modern Information and Communications Technology seems to have brought these goals within sight.

1.2 the physics contribution

Medicine is experiencing a data explosion, primarily driven by advances in genetics and imaging technology. However, effective strategies to integrate the data and to identify the unique "biological signatures" of neurological and psychiatric diseases are still lacking. In this sense the contribution of physical methodologies, deriving from the expertise gained in analyzing and interpreting big data, for example in high energy or nuclear physics, can be strategic. In fact, new databasing and data mining technologies now make it possible to federate and analyze the huge volumes of data accumulating in hospital archives, eventually leading researchers to identify the biological changes associated with disease and opening new possibilities for early diagnosis and personalized medicine. In the longer term, a multidisciplinary integrated approach will make it possible to modify models of the healthy brain to simulate disease. Disease simulation will provide researchers with a powerful new tool to probe the causal mechanisms responsible for disease and to screen putative treatments, accelerating medical research and reducing the huge suffering and costs associated with diseases of the brain.

The most urgent tasks neuroscience has to take into account are: to federate data of clinical, genetic and imaging provenance; to develop tools making it possible to extract unique disease signatures or to unveil patterns; to access multi-level data analysis in order to combine different information and sources; to define classification models based on biological features and markers for brain diseases. These tasks naturally require the development of a new class of data classification techniques, new computational strategies and algorithmic solutions. Above all, these tasks require a comprehensive approach, able to take into account both scientific and computational concerns. The need for dedicated informatics infrastructures to manage workflows for "big data" analyses can only be faced in a multidisciplinary framework. Specific analytic methodologies and infrastructures that will jointly enable researchers and physicians to determine the biological signatures of brain diseases have to be carefully planned to make efficient use of the distributed computing structures modern ICT provides nowadays.

In particular, two research lines are explored in this work:

a. The development, implementation and validation of pattern recognition algorithms capable of dealing with data (both numerical and textual) that are of heterogeneous nature, from multiple sites and times, comprising at least medical images and meta-data (concerning age, sex, etc.). The algorithms should identify reliable and robust patterns within the data and identify categories sharing common disease signatures. This is realized through the development of a fully automated segmentation workflow, using its outputs as clinical signatures of Alzheimer's disease.

b. The development and implementation of data workflows for the automated processing of magnetic resonance scans. The proposed solution includes state of the art methods for quality control and traceability of the workflow.
Another important aspect to be addressed is the adoption of reliable security protocols.

Accordingly, the present work discusses how the typical methodologies of physical analyses apply to the construction of supervised classifications in medical imaging. Particular focus is given to hippocampal segmentation, a challenging task which involves all the major issues previously presented.

1.3 an overview

The Hippocampus is a small brain structure which plays a relevant role in a number of physiological processes. In particular, a strong connection between its atrophy and the manifestation of several neurodegenerative diseases has been widely established. In Chap. 2 the particular case of Alzheimer's disease is discussed, together with the privileged role the Hippocampus plays in the diagnosis of this disease. The chronological development of the Alzheimer's disease diagnosis criteria will be examined, along with how, in recent years, several imaging biomarkers have proved to be supportive features for the diagnosis. A major focus will be given to hippocampal atrophy. The hippocampal volume correlates with neurodegenerative processes, and as a consequence diseased brains show atrophy; in Alzheimer's disease this is precisely what happens to the Hippocampus, and it is therefore possible to detect pathological conditions by comparing healthy hippocampal volumes with those of the examined subjects.

The Hippocampus is a brain structure with a peculiar convoluted shape, surrounded by other brain structures sharing the same composition; it is therefore difficult to segment even for expert neuroradiologists. These difficulties and their correlation with the hippocampal anatomy will be pointed out at the end of Chap. 2. Nevertheless, structural imaging in the last decades has seen hardware innovations which could make this task easier than before if conveniently exploited. Specifically, the introduction of high field magnetic resonance scanners is the most noticeable technical improvement, as it increases the signal to noise ratio of the scans and therefore improves the image quality. In Chap. 3 an overview of the physical concepts of magnetic resonance leading to the development of magnetic resonance imaging is given; besides, signal and noise are discussed with particular attention to the latest developments for high field scanners.

In the last decades hippocampal segmentation has relied principally on manual or semi-automated protocols; the ICT improvements of recent years seem to have pushed several scientists to make greater efforts to reach fully automated procedures. In Chap. 4 a detailed description of the most recent strategies to achieve fully automated hippocampal segmentation is provided. Further insight is then given into the weaknesses of the previous methodologies and therefore into the novel solutions proposed in this work. The proposed methodology relies on the use of supervised learning algorithms to decide which voxels (the units of 3-D images) of a magnetic resonance scan should be labeled as belonging or not to the Hippocampus. The proposed methodology makes use of modern classifiers such as Random Forests and RUSBoost. As will be explained in Chap. 4, this choice is strictly tied to the particularly high imbalance characterizing hippocampal images, i.e. the imbalance between the hippocampal volume and the background. Given the small hippocampal dimensions, this imbalance is always present.
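To make the imbalance concrete: with 1 mm isotropic voxels, a Hippocampus of roughly 3300 mm³ (see Chap. 2) occupies a few thousand voxels out of the millions in a whole scan, so positive examples are rare. The following is a minimal illustrative sketch, not the thesis pipeline: it uses synthetic stand-in features and scikit-learn's Random Forest with balanced class weights (the actual work also employs RUSBoost-style undersampling, described in Chap. 4). All feature names and numbers here are assumptions for illustration only.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in for voxel-wise data: one row per voxel, columns are
# hypothetical local intensity/texture/position features. Roughly 1 voxel
# in 500 is hippocampal, mimicking the severe class imbalance.
n_voxels, n_features = 100_000, 10
X = rng.normal(size=(n_voxels, n_features))
y = (rng.random(n_voxels) < 0.002).astype(int)  # 1 = hippocampus voxel
X[y == 1] += 1.0  # give hippocampal voxels a slight feature shift

# class_weight="balanced" reweights the rare positive class so the
# forest is not dominated by the overwhelming background.
clf = RandomForestClassifier(n_estimators=100, class_weight="balanced",
                             n_jobs=-1, random_state=0)
clf.fit(X, y)
predicted = clf.predict(X)  # voxel-wise hippocampus/background labels
```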
The automated segmentation algorithm described in Chap. 4 is designed to provide not only a fully automated computer-aided detection system, but also a general scheme for medical imaging workflows. In fact, the proposed analysis is completely modular and is able to employ several kinds of distributed computing infrastructures. Thus, in Chap. 4 a detailed description of the workflow implementation is given; besides, further insight is provided on the distributed infrastructure exploitation and on security and data management. The implemented solutions allow the dynamic use of local computer clusters, such as the BC2S farm of the Istituto Nazionale di Fisica Nucleare in Bari, or of the geographically distributed grid. Moreover, the proposed solution not only guarantees modern standard security protocols but also a null failure rate, thanks to automated job re-submission tools.

In Chap. 5 the results are presented. Particular emphasis is given to the segmentation performances, which compare well with other state of the art algorithms. It is worthwhile to note that these results are obtained without any assumption or knowledge about the clinical state of the examined subjects. The segmentation performances are then studied from another perspective using a different validation database. For this dataset a longitudinal analysis is performed and a classification probability is given for the Alzheimer's disease diagnosis. The main result in this case is that hippocampal atrophy is confirmed as a biomarker of Alzheimer's disease, and that according to the volumetric measures obtained through segmentation it is possible to distinguish between healthy subjects and those affected by mild cognitive impairment or by Alzheimer's disease itself. These distinctions are statistically significant. The chapter concludes with a detailed description of the computational performances; in particular, major attention is given to the strategy employed to decrease the computational times and the job failure rate.

The presented results are finally discussed in Chap. 6, where some conclusions and suggestions are drawn and possible directions for future research are addressed.

2 THE HIPPOCAMPUS ROLE IN ALZHEIMER'S DISEASE

Alzheimer's disease is one of the most common neurodegenerative diseases. One of the supportive features for its diagnosis is the atrophy assessment of the medial temporal lobe through structural magnetic resonance imaging. Particularly relevant to this aim is the volumetric measure of the Hippocampus.

2.1 the alzheimer's disease diagnosis

Alzheimer's disease (AD) is the most common type of dementia [1]. "Dementia" is by definition a term describing a variety of diseases and conditions affecting the normal functions of the brain. The death or malfunction of neurons in fact causes changes in one's memory, behavior and capability of thinking clearly. The pathologic characteristics are degeneration of specific nerve cells, presence of neuritic plaques and, in some cases, involvement of the noradrenergic and somatostatinergic systems that innervate the telencephalon. For research purposes the AD diagnosis is based on general criteria usually defined by the Diagnostic and Statistical Manual of Mental Disorders, fourth edition (DSM-IV) [2], and specifically by the National Institute of Neurological and Communicative Disorders and Stroke (NINCDS) [3] and by the Alzheimer's Disease and Related Disorders Association (ADRDA) [3].
These criteria have been extremely useful and have survived intact without modification for more than a quarter of a century. According to them, the following requirements are needed to support a dementia diagnosis:

a. Symptoms must include decline in memory and in cognitive functions such as:
   a) the ability to speak or understand spoken or written language;
   b) the ability to recognize or identify objects;
   c) the ability to perform motor activities;
   d) the ability to think abstractly, make sound judgments and carry out complex tasks.

b. The decline in cognitive abilities must be severe enough to have repercussions on everyday life.

From a physician's point of view, establishing a dementia diagnosis is therefore equivalent to determining the causes of the above cited symptoms; in fact some conditions cause symptoms that mimic dementia but can be reversed with treatment. This is statistically found in about 10% of dementia cases; common causes are depression, delirium, medication side effects, thyroid problems, vitamin deficiencies and alcohol abuse. In contrast, AD and other dementias are caused by damage to neurons that cannot be reversed with current treatments. Different types of dementia are associated with distinct symptom patterns and brain abnormalities, as described in the following table:

Alzheimer's disease: Most common type of dementia; accounts for an estimated 60 to 80 percent of cases. Difficulty remembering names and recent events is often an early clinical symptom; apathy and depression are also often early symptoms. Later symptoms include impaired judgment, disorientation, confusion, behavior changes and difficulty speaking, swallowing and walking. Hallmark brain abnormalities are deposits of the protein fragment beta-amyloid (plaques) and twisted strands of the protein tau (tangles).

Vascular dementia: Previously known as multi-infarct or post-stroke dementia, vascular dementia is less common as a sole cause of dementia than AD. Impaired judgment or ability to make plans is more likely to be the initial symptom, as opposed to the memory loss often associated with the initial symptoms of Alzheimer's. Vascular dementia occurs because of brain injuries; the location of the brain injury determines how the individual's thinking and physical functioning are affected.

Dementia with Lewy bodies (DLB): People with DLB have some of the symptoms common in AD, but are more likely than people with Alzheimer's to have initial or early symptoms such as sleep disturbances or well-formed visual hallucinations. DLB alone can cause dementia, or it can occur with AD and/or vascular dementia; the individual is said to have "mixed dementia" when this happens.

Frontotemporal lobar degeneration (FTLD): Typical symptoms include changes in personality and behavior. Nerve cells in the front and side regions of the brain are especially affected. No distinguishing microscopic abnormality is linked to all possible cases.

Parkinson's disease: As Parkinson's disease progresses it often results in a severe dementia similar to DLB or AD. Problems with movement are a common early symptom of the disease. The incidence of Parkinson's disease is about one tenth that of AD.

Table 1: A schematic view of the different types of dementia and of their principal symptom patterns.

A diagnosis of AD is most commonly made by an individual's primary care physician. The physician obtains relevant information such as family history, including psychiatric history and history of cognitive and behavioral changes.
The physician also conducts cognitive tests and physical and neurological examinations; in particular, the individual may be requested to undergo magnetic resonance imaging (MRI) scans. MRI scans may help identify brain changes, such as the presence of a tumor or the evidence of a stroke, that could explain the individual's symptoms. With the continuous developments in the neuroimaging field, MRI is no longer helpful only to exclude other causes of the individual's symptoms: it has also become a supportive feature for early AD diagnosis [4].

2.2 diagnosis criteria

Since the early 1980s the established criteria for AD diagnosis have been based on discriminant features, supportive features and consistent features, as schematically reported in the following.

a. The criteria for the clinical diagnosis of probable Alzheimer's disease include:
• Dementia established according to the Mini-Mental test [5], the Blessed Dementia Scale [6] or a similar examination;
• Deficits in two or more areas of cognition;
• Progressive worsening of memory and other cognitive functions;
• No disturbance of consciousness;
• Onset between ages 40 and 90, most often after age 65;
• Absence of systemic disorders or other brain diseases that could account for the progressive deficits in memory and cognition.

b. The diagnosis of probable AD is supported by:
• Progressive deterioration of specific cognitive functions such as language (aphasia), motor skills (apraxia) and perception (agnosia);
• Impaired activities of daily living and altered patterns of behavior;
• Family history of similar disorders, particularly if confirmed neuropathologically;
• Laboratory results of: normal lumbar puncture as evaluated by standard techniques; normal pattern or nonspecific changes in EEG, such as increased slow-wave activity; evidence of cerebral atrophy with progression documented by serial observation.

c. Other clinical features consistent with the diagnosis of probable AD, after the exclusion of causes of dementia other than AD, include:
• Plateaux in the course of progression of the illness;
• Associated symptoms of depression, insomnia, incontinence, illusions, hallucinations, catastrophic verbal, emotional or physical outbursts, sexual disorders and weight loss;
• Other neurological abnormalities in some patients, especially with more advanced disease, including motor signs such as increased muscle tone, myoclonus or gait disorder;
• Seizures in advanced disease;
• Computed tomography normal for age.

d. The criteria for the diagnosis of definite Alzheimer's disease are:
• The clinical criteria for probable AD;
• Histopathological evidence obtained from a biopsy or autopsy.

It is worthwhile to note that a definite diagnosis of AD is only made when there is histopathological confirmation of the clinical diagnosis. These widely adopted criteria have been extremely useful and have survived intact without modification for more than a quarter of a century. However, since the publication of the NINCDS-ADRDA criteria in 1984, the comprehension of the biological basis of AD has advanced greatly, allowing the disease process to be understood in an unprecedented way. Distinctive markers of the disease are now recognized, including structural brain changes visible on MRI, with early and extensive involvement of the medial temporal lobe (MTL); molecular neuroimaging changes seen with PET, with hypometabolism or hypoperfusion in temporoparietal areas; and changes in cerebrospinal fluid (CSF) biomarkers.
2.3 the need for a revision of alzheimer's disease definition

By 2009, broad consensus existed throughout academia and industry that the NINCDS-ADRDA criteria should be revised in order to incorporate scientific advances in the field. As a consequence of the technological advancements and the improvements in the comprehension of the disease, particularly intense efforts have been made to characterize the earliest stages of AD [7].

The original NINCDS-ADRDA criteria rested on the notion that AD is a clinical-pathological entity. The criteria were designed with the expectation that, in most cases, subjects who met the clinical criteria would have AD pathology as the underlying etiology if they were presented for autopsy. It was believed that AD, like many other brain diseases, always exhibited a close correspondence between clinical symptoms and the underlying pathology, such that AD pathology and clinical symptoms were synonymous: individuals either had fully developed AD pathology and were demented, or they had neither. However, it has since become clear that this clinical-pathological correspondence is not always consistent. Extensive AD pathology, particularly diffuse amyloid plaques, can be present in the absence of any obvious symptoms [8, 9].

Additionally, AD pathophysiology can manifest itself with clinically atypical presentations and prominent language and visuospatial disturbances [10, 11]. The 1984 criteria accounted neither for cognitive impairment that did not reach the threshold for dementia, nor for the fact that AD develops slowly over time, with dementia representing the end stage. This is why in the last few years a debate about the revision of the diagnosis criteria for AD has developed.

2.3.1 Performance issues of the NINCDS-ADRDA criteria

The NINCDS-ADRDA criteria have been validated against neuropathological gold standards with accuracy ranging from 65% to 96% [12, 13]. However, the specificity of these diagnostic criteria against other dementias is only 23% to 88% [14, 15]. The accuracy of these estimates is difficult to assess, given that the neuropathological standard is not the same in all studies. Nevertheless, the low specificity must be addressed through both revised AD criteria and accurate non-AD dementia diagnostic criteria. This is very important because it can be seen as the main motivation for the research into alternative supportive diagnosis features.

2.3.2 Non-AD dementia involvement

Since the publication of the NINCDS-ADRDA criteria, the operational definition and characterization of non-AD dementia have improved. Entities for which there are diagnostic criteria include the frontotemporal lobar degenerations (frontotemporal dementia frontal variant, semantic dementia, progressive nonfluent aphasia) [16, 17, 18], corticobasal degeneration [19, 20], posterior cortical atrophy [21], dementia with Lewy bodies [22] and vascular dementia [23, 24]. Many of these disorders can fulfill the NINCDS-ADRDA criteria and it is likely that they have been included in AD research studies. Meanwhile, for each of these disorders, criteria have been developed that aim for high specificity. The development of disease-specific criteria that are applicable, in some cases, before dementia is fully manifested has enabled the criteria to be used without going through the two-step process of recognizing dementia (the syndrome) and then the specific disease (the aetiology).
2.3.3 Histopathological improvements

The histopathological diagnosis of the non-AD dementias has also advanced. In the example of frontotemporal lobar degeneration, the identification of ubiquitin-immunoreactive cytoplasmic and intranuclear inclusions as an important pathology has reduced the neuropathological diagnostic prevalence of dementia lacking distinctive histopathology from 40% to 10% in autopsy series [25, 26]. There is no doubt that progress in the clinical definition of non-AD dementia improves the sensitivity of the currently accepted diagnostic criteria for AD by reducing the level of uncertainty. Another important improvement is yielded by a more detailed knowledge of the AD phenotype.

2.3.4 AD phenotype

In most patients (86% to 94%) there is a progressive amnestic core that appears as an impairment of episodic memory [27, 28]. The pathological pathway of Alzheimer's related changes has been fully described [29, 30] and involves the medial temporal structures (e.g., entorhinal cortex, hippocampal formation, parahippocampal gyrus) early in the course of the disease. Moreover, the episodic memory disorder of AD correlates well with the distribution of neurofibrillary tangles within the MTL and with MRI volumetric loss of the Hippocampus, structures known to be critical for episodic memory. The availability of neuroimaging techniques that can reliably measure the MTL has further supported this vital clinico-neuroanatomic correlation.

2.4 revising the ad definition

The previously described advances provide in vivo evidence of the disease and hence the necessity to conceptualize a novel diagnosis framework, with particular attention to atypical presentations and early stages [31], as shown in the following table:

Alzheimer's disease: This diagnostic label is restricted to the clinical disorder. AD diagnosis is established in vivo relying on evidence of both specific memory changes and in vivo markers.

Preclinical states of AD: There is a long asymptomatic phase between the earliest pathogenic events/brain lesions of AD and the first appearance of specific cognitive changes. Two preclinical states can be isolated at present:
• Asymptomatic at-risk state for AD, characterized by in vivo evidence of amyloidosis in the brain or in the CSF;
• Presymptomatic AD state, which applies to individuals from families affected by rare autosomal AD mutations.

Prodromal AD: This term refers to the early symptomatic phase. It is determined by clinical symptoms, not severe enough to affect daily living, and by biomarker evidence. It is possible that future AD definitions will include this phase.

AD dementia: AD dementia refers to cases in which clinical symptoms start to affect daily activities.

Typical AD: When an early, significant and progressive episodic memory deficit appears and is followed by other cognitive impairments (such as executive dysfunction or language impairments), the clinical phenotype of AD is matched and the individuals are affected by typical AD.

Atypical AD: This term refers to less common but well characterized clinical phenotypes such as primary progressive non-fluent aphasia or posterior cortical atrophy.

Mixed AD: This term refers to patients who fully fulfill the diagnostic criteria of typical AD and are additionally affected by other comorbid disorders such as, possibly, DLB.

Alzheimer's pathology: Alzheimer's pathology refers to neurobiological changes that span the earliest pathogenic events, even when lacking clinical manifestation.
Mild cognitive impairment (MCI): This term applies to individuals with measurable cognitive impairment in the absence of a significant effect on daily living. This label is applied if there is no pathology to which the impairment can be attributed; it remains a threshold category for individuals who are suspected to be affected by AD but do not fulfill the diagnosis criteria, either because they deviate from the clinical phenotype of prodromal AD or because they are biomarker negative.

Table 2: The revised lexicon of AD. Particular attention is given to recent advances in the use of reliable biomarkers of AD for the definition of early stages of the disease, such as prodromal AD, or of ambiguous situations, as for MCI.

A definition of prodromal AD has been introduced to take into account the symptomatic predementia phase of AD, generally included in the mild cognitive impairment category, which is characterized by symptoms not severe enough to meet the accepted diagnostic criteria for AD [32]. It must be distinguished within the broad and heterogeneous state of cognitive functioning that falls outside normal aging. This state has been described by a wide range of nosological terms including age-associated memory impairment, age-related cognitive decline, age-associated cognitive decline, mild neurocognitive disorder, cognitively impaired not demented, and mild cognitive impairment [33, 34, 35, 36]. Mild cognitive impairment (MCI) is the most widely used diagnostic term for the disorder in individuals who have subjective memory or cognitive symptoms, objective memory or cognitive impairment, and whose activities of daily living are generally normal. Progression to clinically diagnosable dementia occurs at a higher rate from MCI than from an unimpaired state, but is clearly not the invariable clinical outcome at follow-up. Therefore a more refined definition of AD is still needed to reliably identify the disease at its earliest stages [37].

2.5 the need for early intervention

The rapid growth of knowledge about the potential pathogenic mechanisms of AD, including amyloidopathy and tauopathy, has spawned numerous experimental therapeutic approaches that have entered clinical trials. There is accruing evidence that, years before the onset of clinical symptoms, an AD process evolves along a predictable pattern of progression in the brain [29, 30]. The neurobiological advantage of earlier intervention within this cascade is clear. Earlier intervention with disease-modifying therapies is likely to be more effective when there is a lower burden of amyloid and hyperphosphorylated tau, and may truncate the ill effects of secondary events due to inflammation, oxidation, excitotoxicity and apoptosis [4]. Early intervention may also be directly targeted against these events because they may play an important role in the early phases of AD. By the time there is clear functional disability, the disease process is significantly advanced and even definitive interventions are likely to be suboptimal. Revised research criteria would allow diagnosis when symptoms first appear, before full-blown dementia, thus supporting earlier intervention at the prodromal stage. In this sense a sound definition of mild cognitive impairment is necessary; nevertheless several issues still stand, especially with regard to randomized controlled trials. With only small variations in the inclusion criteria for mild cognitive impairment, four trials (ADCS-MIS, InDDEx, Gal-Int 11 and Rofecoxib) have had a very wide range of annual rates of progression to AD dementia [38, 39, 40, 41].
The intention in these trials on mild cognitive impairment was to include many individuals with prodromal AD (i.e., individuals with symptoms not sufficiently severe to meet currently accepted diagnostic criteria) who would later progress to meet these criteria. When the mild cognitive impairment inclusion criteria of these trials were applied to a cohort of memory clinic patients in an observational study, they had diagnostic sensitivities of 46% to 88% and specificities of 37% to 90% in identifying prodromal AD [42]. Given these numbers, these trials have clearly treated many patients who do not have AD or are not going to progress to AD for a long time. This has diluted the potential for a significant treatment effect and may have contributed to the negative outcomes, where none of these drugs was successful at delaying the time to diagnosis of AD. These trials have also incurred significant costs, with sample sizes of 750 to 1000 being called for and durations of 3 to 4 years.

Increasing the severity of mild cognitive impairment needed for inclusion in trials might improve sensitivity, specificity and predictive values. However, participants would then be much closer to the current dementia threshold and would have a greater pathological burden, making the clinical gain marginal and disease modification difficult. Neuropathological findings in mild cognitive impairment have also reinforced the heterogeneity of the clinical disorders subsumed under the definition of mild cognitive impairment. To address the recognized clinical and pathological heterogeneity, it has been proposed that subtyping of mild cognitive impairment might be useful. The term amnestic mild cognitive impairment has been proposed to include individuals with subjective memory symptoms, objective memory impairment, and other cognitive domains and activities of daily living generally assessed as being normal. However, only 70% of a selected cohort of people with amnestic cognitive impairment clinically identified to have progressed to dementia actually met neuropathological criteria for AD [43]. This finding indicates that applying the criteria for this subtype of mild cognitive impairment clinically, without other evidence such as neuroimaging or results of cerebrospinal fluid analyses, will lack specificity for predicting the future development of AD, since at least 30% of cases will have non-AD pathology. In the planning of trials of disease-modifying treatments, special care will be needed not only to limit the exposure to potentially toxic therapies to those with prodromal AD, but also to reliably exclude those who are destined to develop non-AD dementia.

2.6 the ad biomarkers

Over the two decades since the NINCDS-ADRDA criteria were published, great progress has been made in identifying the AD-associated structural and molecular changes in the brain and their biochemical footprints. MRI enables detailed visualization of the MTL structures implicated in the core diagnostic feature of AD; PET with fluorodeoxyglucose (FDG) has been approved in the USA for diagnostic purposes and is sensitive and specific in detecting AD in early stages; cerebrospinal fluid biomarkers for detecting the key molecular pathological features of AD in vivo are available and can be assessed reliably [44, 45, 46]. Accordingly, novel frameworks have been proposed for the designation of probable AD [4], even while retaining the old ones. In particular, the novel frameworks address the disease presentation that is typical for AD.
Atypical presentations are excluded, such as those presenting focal cortical syndromes (primary progressive aphasia, visuospatial dysfunction), where an ante mortem diagnosis would at best receive the designation of possible AD from the framework itself. This may change in the future as work on diagnostic biomarkers advances and reliance on a well characterized clinical phenotype is lessened. In the absence of completely specific biomarkers, the clinical diagnosis of AD can still be only probabilistic, even in the case of typical AD. To meet the criteria for probable AD, an affected individual must fulfill the core criterion and at least one of the following supportive features:

a. The core clinical criterion:

a) Gradual and progressive change in memory function at disease onset, reported by patients or informants for a period greater than 6 months.
The reporting of subjective memory complaints is a common symptom in the aging population; however, such self-reported symptoms are associated with a high risk of future development of AD and should therefore be carefully taken into account.

b) Objective evidence of significantly impaired episodic memory on testing.
A diagnosis of AD requires an objective deficit on memory testing. This generally consists of a recall deficit that does not improve significantly, or does not normalize, with cueing or recognition testing, after effective encoding of the information has been previously controlled.

c) The episodic memory impairment can be isolated or associated with other cognitive changes at the onset of AD or as AD advances.
In most cases, even at the earliest stages of the disease, the memory disorder is associated with other cognitive changes. As AD advances, these changes become notable and can involve the following domains: executive function (conceptualization with impaired abstract thinking; working memory with decreased digit span or mental ordering; activation of mental set with decreased verbal fluencies); language (naming difficulties and impaired comprehension); praxis (impaired imitation, production and recognition of gestures); complex visual processing and gnosis (impaired recognition of objects or faces).

b. Supportive features:

a) Atrophy of medial temporal structures on MRI.
Atrophy of the MTL on MRI seems to be common in AD (71% to 96%, depending on disease severity), frequent in mild cognitive impairment (59% to 78%), but less frequent in normal aging (29%) [47, 48]. MTL atrophy is related to the presence of AD neuropathology and to its severity, both in terms of fulfillment of AD neuropathological criteria and of Braak stages [44, 49]. MRI measurements of MTL structures include qualitative ratings of the atrophy in the hippocampal formation and quantitative techniques with tissue segmentation and digital computation of volume. Both techniques can reliably separate AD group data from normal age-matched control group data, with sensitivities greater than 85% [50, 51, 52, 53]. In studies of mild cognitive impairment the accuracy of MTL atrophy measures in identifying prodromal AD has generally been lower, possibly because individuals who did not meet currently accepted AD diagnostic criteria at study completion included some cases that would have done so at a later time. Qualitative MTL ratings can identify prodromal AD; however, sensitivities of 51% to 70% and specificities of 68% to 69% at present limit their usefulness.
The predictive usefulness of quantitative measures of hippocampal volume in identifying prodromal AD is inconsistent; measures of hippocampal subfields might be more useful than measures of the entire structure [54, 55]. In turn, there is a potential incremental value in MTL measurements: in several studies, MTL measures (quantitative and qualitative) contributed independently of memory scores to the identification of prodromal AD. The reported accuracy increased from 74% to 81% and from 88% to 96% when MTL measures were added to age and to memory scores, respectively. Inclusion of MTL atrophy as a diagnostic criterion of AD, irrespective of the age at onset, mandates the exclusion of other causes of MTL structural abnormality, including bilateral ischaemia, bilateral hippocampal sclerosis, herpes simplex encephalitis and temporal lobe epilepsy [56, 57, 58, 59].

b) Abnormal cerebrospinal fluid biomarkers.
In the NINCDS-ADRDA guidelines, cerebrospinal fluid examination was recommended as an exclusion procedure for non-AD dementia due to inflammatory disease, vasculitis or demyelination. Since then, there has been a lot of research into the usefulness of AD-specific biomarkers that reflect the central pathogenic processes of amyloid-β aggregation and hyperphosphorylation of the τ protein. These markers have included amyloid β1−42, total τ and phospho-τ. In AD the concentration of β1−42 in cerebrospinal fluid is low, and those of total τ and phospho-τ are high, compared with healthy controls. Combinations of abnormal markers have reached sensitivities and specificities greater than 90% and 85%, respectively [60].

c) Specific metabolic pattern evidence with molecular neuroimaging methods.
PET and single photon emission computed tomography (SPECT) are in vivo nuclear radioisotopic scans that can measure blood flow, glucose metabolism and, more recently, protein aggregates. Within an AD diagnostic framework their ideal role is to increase the specificity of the clinical criteria. For instance, a reduction of glucose metabolism, as seen in PET in bilateral temporoparietal regions and in the posterior cingulate, is the most commonly described diagnostic criterion for AD. There are promising techniques that provide visualization of amyloid and, potentially, of neurofibrillary tangles. These visualization techniques clearly have the potential to increase the usefulness of PET in AD within the diagnostic framework, but their diagnostic accuracy, and in particular their specificity for AD, requires further investigation, as there is evidence of high AD-like signal in some healthy people and in some people with mild cognitive impairment [61, 62, 63]. Because SPECT is more widely available and cheaper than PET, it has received much attention as an alternative technique. However, at present, the technique has not yet reached sound accuracy levels.

d) Familial genetic mutations.
Three autosomal dominant mutations that cause AD have been identified, on chromosomes 21, 14 and 1. The presence of a proband with genetic-testing evidence of one of these mutations can be considered strongly supportive of the diagnosis of AD for affected individuals within the immediate family who did not themselves have a genetic test for the mutation. If individuals with a positive mutation history of the described type present with the core amnestic criterion (a), they should be considered for probable AD.
From the previous overview it is clear that quantitative measurements can have a huge impact, especially as supportive features. In particular, the atrophy of medial temporal structures, and especially of the Hippocampus, can be assessed with structural imaging measures. In this respect the role of hippocampal segmentation can be decisive, even if the challenge is not trivial, especially because of the complex anatomy involved.

2.7 the hippocampus anatomy

The description of the hippocampal anatomy is a difficult task for two distinct reasons:

a. the complexity of the Hippocampus itself, which makes it one of the most mysterious structures of the central nervous system;

b. the great confusion that has affected its terminology since the first studies appeared almost a hundred years ago.

The very first description of the Hippocampus, and the word itself, date back to 1587, when the Italian anatomist Julius Caesar Arantius [64] compared the protrusion on the floor of the temporal horn to a hippocampus, or sea horse. Several terminologies are available, but the views of Lewis [65] have been adopted in this thesis. After almost a century of confusion, the terminology that is currently in use needs to be clarified. The name Hippocampus refers to the entire ventricular protrusion, which comprises two cortical laminae rolled up one inside the other: the cornu Ammonis and the gyrus dentatus. The subiculum is sometimes considered as a part of the main structure. The general situation of the Hippocampus in relation to the hemisphere is shown in the high resolution Fig. 1, representing a 3-D view of the International Consortium for Brain Mapping (ICBM) ICBM152 template (http://www.loni.ucla.edu/ICBM/).

Fig. 1: The figure shows a T1 high resolution brain template from the International Consortium for Brain Mapping. The primary goal of the ICBM project is the development of a probabilistic reference system for the human brain.

The Hippocampus is prolonged by the subiculum, which forms part of the parahippocampal gyrus, and borders on the amygdala; both these gray matter structures surrounding the Hippocampus definitely make precise segmentation of the Hippocampus itself an awkward problem. The Hippocampus forms an arc whose shape is enlarged at the anterior extremity and then narrows like a comma (Fig. 2).

Fig. 2: The figure shows the intraventricular aspect of the Hippocampus. 1, hippocampal body; 2, hippocampal head and its digitations; 3, hippocampal tail; 4, fimbria; 5, crus of the fornix; 6, subiculum; 7, splenium of the corpus callosum; 8, calcar avis; 9, collateral trigone; 10, collateral eminence; 11, uncal recess of the temporal horn. An image scale of 1 cm is represented in the lower right corner.

It can be divided into three parts:

a. an anterior part, the head;

b. a middle part, the body;

c. a posterior part, the tail.

The hippocampal length is about 4.5 cm, with the head being on average 1.5 to 2.0 cm wide. The mean hippocampal volume is about 3300 mm³; no particular differences exist between the right and the left Hippocampi, even if recent studies show the right ones to be slightly larger than the left, and male Hippocampi slightly larger than female ones [66, 67].

2.7.1 The hippocampal head

The head is the anterior part of the arc of the Hippocampus; it consists of an intraventricular and an extraventricular part.
On the intraventricular side, the head features the digitationes Hippocampi; where these appear, at the junction of the body and the head, the fimbria gives way to a thick alveus that covers them. The digitations and the amygdala are often joined together, with the intraventricular surface of the amygdala overlapping almost the entire surface of the hippocampal head (Fig. 3).

Fig. 3: The figure shows a T1 sagittal view of the ICBM 152 template; the boundaries of the Hippocampus and the adjacent amygdala were manually pointed out.

The extraventricular part mainly consists of an inferior surface, visible only after ablation of the parahippocampal gyrus, divided into the band of Giacomini, the external digitations and the inferior surface of the uncal apex.

2.7.2 The hippocampal body

As for the head, two views can be considered, the intraventricular and the extraventricular descriptions. The intraventricular part of the body is an element of the floor of the lateral ventricle (temporal or inferior horn). It is a strongly convex protrusion, smooth and padded with ependyma covering the alveus. Numerous subependymal veins radiate on its surface. The body is surrounded medially by the fimbria and laterally by the narrow collateral eminence. The roof of the temporal horn overhangs the intraventricular part; it is composed of the temporal stem, the tail of the caudate nucleus and the stria terminalis.

The extraventricular part is visible on the medial surface of the temporal lobe; it is limited by the gyrus dentatus, the fimbria and the superficial hippocampal sulcus. The fimbria is a narrow white stripe that partially hides the margo denticulatus, which is the superficial part of the gyrus dentatus. The margo denticulatus is bordered inferiorly by the superficial hippocampal sulcus, which separates it from the adjacent subiculum.

2.7.3 The hippocampal tail

The tail is the posterior part of the Hippocampus and, as for the head and the body, its structure can be divided into intraventricular and extraventricular parts. The intraventricular part resembles the head in shape but has a smaller size. Although digitations do not appear on the surface of the tail, its internal structure is similar to that of the head, as it is composed mainly of a vast layer of the cornu Ammonis. The intraventricular surface of the tail is thickly covered by the alveus and the subependymal veins. It is medially flanked by the fimbria and laterally by the collateral trigone; the flat surface of the collateral trigone and the intraventricular part of the tail together form the floor of the atrium (Fig. 2).

The extraventricular part of the tail may be divided into an initial segment, which is a continuation of the body, a middle segment and a terminal one. The initial segment resembles the body; here the margo denticulatus of the tail is divided into two dentes which successively decrease in size. In the middle segment some important changes appear, the most important concerning the fimbria, which in the initial segment hides the margo denticulatus and then separates from it, ascending to join the crus of the fornix. The main parts of the middle segment of the hippocampal tail are the gyrus fasciolaris, the fasciola cinerea and the gyri of Andreas Retzius. The last segment of the tail covers the inferior splenial surface and alone merits the name of subsplenial gyrus.

2.7.4 General features

As a summary of the properties encountered so far, some points should be outlined:
a. The Hippocampus has the same composition throughout its entire structure. The cornu Ammonis has in fact an analogous hierarchical organization in the head, the body and the tail; the same can be said for the gyrus dentatus, even if from a strictly terminological point of view this could seem untrue. In fact, the visible segment of the gyrus dentatus is known as the margo denticulatus in the body, the band of Giacomini in the uncus and the fasciola cinerea in the tail. However, it is the same structure and the same term could be used for its whole length.

b. Because of its arched shape, the coronal view of the body and the sagittal views of the tail and of the head have a similar appearance.

c. Also because of the hippocampal curve, coronal sections are often difficult to interpret; nevertheless they are the most used to trace the hippocampal boundaries, especially since 3-D software has proven to be a valid help for the work of expert neuroradiologists.

This summary points out the shape complexity of the Hippocampus, suggesting the difficulties expert neuroradiologists must face when dealing with its segmentation. However, it should also be noted that, for automated or semi-automated segmentation tools, the primary difficulty arises from the need to discriminate neighboring structures, especially the amygdala, which shares the same material composition. Finally, multiple studies have explored MRI-based volumetric measurements of the Hippocampus. Considerable variability exists with regard to the reported volumetric values; results showed that the Hippocampus is an asymmetrical structure, with larger right hippocampal volumes (p = 0.001), and that differences in MRI magnet field strength and slice thickness might differentially contribute to volumetric asymmetry estimates [68]. Besides, right Hippocampi seem to show a higher variability in terms of shape, making them more difficult to segment [69]. Hippocampal volume asymmetry is associated with dementia, may be increased in mild cognitive impairment and correlates with cognitive performances [70, 71, 72]. However, remarkable systematic errors arise from the right/left visual bias, which may cause estimated volumes to depend on the orientation of the images presented to a human rater. The adoption of more and more refined manual labeling protocols can mitigate these systematic errors, but their existence is still confirmed [73].

3 MAGNETIC RESONANCE IMAGING

Structural magnetic resonance imaging is based on the well known resonance effect discovered by Purcell and Bloch. The latest developments have further increased signal to noise ratios, especially with ultra high field scanners. Therefore, nowadays the use of structural imaging guarantees higher accuracy and robustness for segmentation than ever before.

3.1 the magnetic resonance effect

Magnetic resonance is based upon the interaction between an applied magnetic field and a nucleus that possesses spin. Nuclear spin or, more precisely, nuclear spin angular momentum, is one of several intrinsic properties of an atom, and its value depends on the precise atomic composition. Every element in the Periodic Table except argon and cerium has at least one naturally occurring isotope that possesses spin. Thus, in principle, nearly every element can be examined using MR, and the basic ideas of resonance absorption and relaxation are common to all of these elements. The precise details will vary from nucleus to nucleus and from system to system.
The concept of nuclear magnetic resonance had its underpinnings in the discovery of the spin nature of the proton. Building on the works of the early 1920s [74] and the developments of the late 1930s [75], in 1946 Bloch and Purcell extended these early quantum mechanical concepts to the measurement of an effect of the precession of the spins around a magnetic field. They succeeded in measuring a precessional signal from a water sample [76] and a paraffin sample [77], respectively, and they explained early on many of the experimental and theoretical details that we continue to draw from today. For this work they shared the Nobel prize in physics in 1952.

The basic elements of MRI can be summarized as follows [78]:

a. Fundamental interaction between the proton spin and a magnetic field: how nuclei react to a magnetic field;
b. Equilibrium alignment of spin: how magnetization and relaxation are coded;
c. Magnetization detection: signal acquisition and retrieval;
d. Imaging.

In the following a detailed description is given, starting from the fundamental interaction from which the resonance phenomenon arises: the spin-magnetic field coupling.

3.2 spin-magnetic field coupling

Much of MRI theory can indeed be explained through classical analogies, in particular by looking at the interactions between a current loop and a magnetic field. The force $d\vec{F}$ experienced by a current loop element $d\vec{l}$ carrying a current $I$ in a region with a magnetic field $\vec{B}$ is given by the Lorentz force:

$d\vec{F} = I\, d\vec{l} \wedge \vec{B}$   (1)

The loop can be rotated if a torque $d\vec{\tau}$ is generated by these forces according to:

$d\vec{\tau} = \vec{r} \wedge d\vec{F}$   (2)

which in this case yields a straightforward definition of the magnetic moment $\vec{\mu}$:

$\tau = I \Sigma B \sin\theta \;\Rightarrow\; \vec{\mu} = I \Sigma\, \hat{u}_n$   (3)

where $\Sigma$ is the loop area and $\hat{u}_n$ the unit vector perpendicular to the loop. As a consequence the torque $\vec{\tau}$ can be rewritten in terms of the magnetic moment $\vec{\mu}$ and the magnetic field $\vec{B}$:

$\vec{\tau} = \vec{\mu} \wedge \vec{B}$   (4)

Equation (4) is exact for constant fields and also very accurate for small loops in a non-uniform field, provided the loop principal dimension, say D, is much less than the typical distance over which the field changes (for example, $|\Delta B| \simeq |\partial B / \partial x|\, D \ll |B|$). Corrections would be needed for higher moments, such as electric quadrupole moments, but this is not the case for the proton, whose higher moments vanish.

The net effect of the torque is on one hand to realign the magnetic moment $\vec{\mu}$ to $\vec{B}$, as (4) states; the same conclusion can be reached by minimizing the potential energy:

$\tau = -\dfrac{dU}{d\theta}$   (5)

On the other hand, there is another fundamental effect that must be taken into account. The behavior described so far can be analyzed from a quantum mechanical point of view, especially in terms of the analogy between the classical angular momentum and the spin. The direct relationship between the magnetic moment and the spin is found by observing that the angular momentum $\vec{L}$ and the torque $\vec{\tau}$ are related by:

$\dfrac{d\vec{L}}{dt} = \vec{\tau} \;\Rightarrow\; \vec{\mu} \propto \vec{L}$   (6)

so that:

$\vec{\mu} = \gamma\, \vec{L}$   (7)

where $\gamma$ is the gyromagnetic ratio and depends on the particle or nucleus. For the proton it is found to be

$\gamma = 2.675 \times 10^{8}\ \mathrm{rad\ s^{-1}\,T^{-1}}$   (8)

or, as is often used, $\bar{\gamma}$:

$\bar{\gamma} = \dfrac{\gamma}{2\pi} = 42.58\ \mathrm{MHz/T}$   (9)

where T is the tesla unit of magnetic field, equal to 10000 gauss.
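As a quick numerical check of equation (9) — the following snippet is an editorial illustration, not part of the original thesis — the proton precession frequency at common clinical field strengths follows directly from ν = γ̄ B0; a minimal sketch in Python:

import numpy as np

GAMMA_BAR_MHZ_PER_T = 42.58  # gamma / (2*pi) for the proton, equation (9)

for b0 in (1.0, 1.5, 3.0, 7.0):        # field strengths in tesla
    f_mhz = GAMMA_BAR_MHZ_PER_T * b0   # nu = gamma-bar * B0
    print(f"B0 = {b0:.1f} T  ->  f = {f_mhz:6.2f} MHz")

At 1.0 T, the field of the scanner used for the training data of Chap. 4, protons precess at about 42.6 MHz.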
If we consider a charged particle circulating around a center with velocity v, mass m and charge q at a distance r, the resulting angular momentum is L = rmv (in the perpendicular configuration), so that the magnetic moment is:

$\vec{\mu} = I\vec{A} = \dfrac{qv}{2\pi r}\, \pi r^2\, \hat{u}_n = \dfrac{q}{2m}\, \vec{L}$   (10)

and therefore:

$\gamma = \dfrac{q}{2m}$   (11)

The equation still holds in the analogy between classical and quantum mechanics. In particular, for an electron, with $\hbar$ the quantum unit of angular momentum, the following relation defines the Bohr magneton:

$\mu_B = \dfrac{e\hbar}{2m_e} = 9.27 \times 10^{-24}\ \mathrm{A\,m^2}$   (12)

A similar relation holds for every particle, the proton included, in which case:

$\mu_P = \dfrac{e\hbar}{2m_p} = 5.05 \times 10^{-27}\ \mathrm{A\,m^2}$   (13)

From the comparison of equations (11), (12) and (13) one can evaluate the ratio between the electron and the proton gyromagnetic ratios:

$\gamma_e / \gamma_P = 658$   (14)

The difference between this observed value and the mass ratio (which is equal to 1836) is due to the difference in the internal structure of the two particles. This difference is the principal reason why protons, rather than electrons, are used in magnetic resonance imaging.

For nuclei the situation is much the same. The intrinsic angular momentum has to be non-zero, and therefore no "even-even" nucleus can be used for magnetic resonance; for "odd-odd" nuclei only the unpaired neutron and proton contribute to the nuclear spin, but in general $\gamma$ remains within an order of magnitude or so of that of the proton. In conclusion, the hydrogen nucleus remains the most useful, both for energetic considerations due to its gyromagnetic ratio and for its high concentration in the human body.

The combination of

$\vec{\tau} = \vec{\mu} \wedge \vec{B}$   (15)

and

$\vec{\mu} = \gamma\, \vec{L}$   (16)

completely determines the motion of the magnetic moment $\vec{\mu}$.

3.3 the motion equation

The fundamental equation describing the phenomena involved in the coupling between spins and a magnetic field can be derived from equations (4) and (7). According to these:

$\dfrac{d\vec{\mu}}{dt} = \gamma\, \vec{\mu} \wedge \vec{B}$   (17)

This is a simple version of the Bloch equation, which will be discussed in the following; important corrections will arise from the interactions among spins and their surroundings, but it contains the theoretical core upon which the resonance phenomena are based.

Equation (17) is readily solved for a static $\vec{B}$ field. Among the several approaches that can be used, let us keep the classical analogy. There is, in fact, a direct correspondence between the equations for a magnetic moment immersed in a constant vertical magnetic field and those for a spinning top in a constant vertical gravitational field (Fig. 4).

Fig. 4: A symmetrical spinning top with angular velocity $\vec{\Omega}$ and mass m precessing in a constant gravitational field and, in correspondence, the precession of the angular momentum $\vec{J}$. The angular momentum increment $d\vec{J}$ involves a counterclockwise precession.

Let us consider a magnetic field parallel to the z axis, and let the magnetic moment $\vec{\mu}$ be parallel to the $\vec{r}_{cm}$ of Fig. 4; the differential change $d\vec{\mu}$ of the moment in a time dt pushes $\vec{\mu}$ into a clockwise precession, as shown in Fig. 5.

Fig. 5: Clockwise precession of a spin around the magnetic field $\vec{B}$ direction.
If $\phi$ is the angle between $\vec{\mu}$ and $\vec{B}$:

$|d\vec{\mu}| = \mu \sin\phi\, |d\theta|$   (18)

and

$|d\vec{\mu}| = \gamma\, |\vec{\mu} \wedge \vec{B}|\, dt = \gamma \mu B \sin\phi\, dt$   (19)

Therefore, by comparing these two relations:

$\mu \sin\phi\, |d\theta| = \gamma \mu B \sin\phi\, dt \;\Rightarrow\; \omega \equiv \left|\dfrac{d\theta}{dt}\right|$   (20)

the fundamental formula defining the Larmor frequency $\omega$ is obtained (even if $\omega$ is in fact an angular velocity, in MRI terminology the term frequency is preferred):

$\omega = \gamma B$   (21)

Equation (21) defines the angular velocity of the precession of the magnetic moment $\vec{\mu}$, whose component motion is described by equation (17). For $\vec{B}$ parallel to the z axis:

$\dfrac{d\mu_x}{dt} = \gamma \mu_y B_0 = \omega_0 \mu_y$
$\dfrac{d\mu_y}{dt} = -\gamma \mu_x B_0 = -\omega_0 \mu_x$
$\dfrac{d\mu_z}{dt} = 0$   (22)

By taking the second derivatives, the well-known harmonic oscillator equations are retrieved:

$\dfrac{d^2\mu_x}{dt^2} = -\omega_0^2\, \mu_x$
$\dfrac{d^2\mu_y}{dt^2} = -\omega_0^2\, \mu_y$   (23)

The system solutions are finally:

$\mu_x(t) = \mu_x(0)\cos\omega_0 t + \mu_y(0)\sin\omega_0 t$
$\mu_y(t) = \mu_y(0)\cos\omega_0 t - \mu_x(0)\sin\omega_0 t$
$\mu_z(t) = \mu_z(0)$   (24)

The fact that the motion of the spin in a constant magnetic field takes place in the plane orthogonal to the field suggests that a complex 2-dimensional representation could be useful. In order to describe the rotations in a lower dimensional representation let us introduce:

$\mu_+(t) = \mu_x(t) + i\mu_y(t)$   (25)

which yields:

$\dfrac{d\mu_+}{dt} = -i\omega_0\, \mu_+ \;\Rightarrow\; \mu_+(t) = \mu_+(0)\, e^{-i\omega_0 t}$   (26)

While the amplitude remains constant, the phase varies over time. The static-field solution for the phase is therefore:

$\phi(t) = -\omega_0 t + \phi(0)$   (27)

In conclusion, for a static field the interaction of a classical magnetic moment with an external magnetic field is equivalent to an instantaneous rotation of the moment about the field itself.

3.3.1 The Resonance

Turning on a magnetic field for some time makes the spins align to its direction; an additional field perpendicular to the first one, on the contrary, tips the spins away from that direction. Such rotations leave the magnetic moment precessing around the original magnetic field at the Larmor frequency $\omega$. Let us consider a reference frame rotating around the z direction with angular velocity $\vec{\Omega}$, in which the magnetic moments are at rest. In the laboratory inertial frame the time derivative of $\vec{\mu}$ is:

$\dfrac{d\vec{\mu}}{dt} = \vec{\Omega} \wedge \vec{\mu}$   (28)

Therefore the following relation between the rotating (primed) frame and the inertial one holds:

$\dfrac{d\vec{\mu}}{dt} = \left(\dfrac{d\vec{\mu}}{dt}\right)' + \vec{\Omega} \wedge \vec{\mu}$   (29)

On the other hand, by comparison with equation (17):

$\left(\dfrac{d\vec{\mu}}{dt}\right)' = \gamma\,\vec{\mu} \wedge \vec{B} - \vec{\Omega} \wedge \vec{\mu} = \gamma\,\vec{\mu} \wedge \vec{B}_{eff}$   (30)

where

$\vec{B}_{eff} = \vec{B} + \dfrac{\vec{\Omega}}{\gamma}$   (31)

is the effective magnetic field in the rotating frame. In the rotating frame it is straightforward to prove that if $\vec{B} = -\vec{\Omega}/\gamma$ then $\vec{\mu}\,'$ is constant. Therefore, in the inertial laboratory frame, $\vec{\mu}$ rotates at a fixed angle with respect to the z direction and with fixed angular velocity $\vec{\Omega}$. As a consequence, if $\vec{B}_1$ is a radiofrequency magnetic field (the radiofrequency, rf, pulse) added to tip the proton spin away from the z axis (with only transverse components, see equation (30)), corresponding to a frequency $\omega_1$, and $\vec{B}_0$ is the static z-oriented magnetic field with Larmor frequency $\omega_0$:

$\vec{B}_{eff} = [\hat{z}'(\omega_0 - \omega) + \hat{x}'\,\omega_1]/\gamma$   (32)

with $\omega$ the frequency of a general rotating frame and $\hat{x}' = \hat{x}\cos\omega t - \hat{y}\sin\omega t$ an arbitrary rotating axis perpendicular to the z direction.
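Equation (32) already contains the resonance mechanism, and it can be checked numerically. The sketch below is an editorial illustration (all parameter values are arbitrary choices for the demonstration): in the rotating frame the effective field is constant, so the spin, initially along z, is simply rotated about the direction of $\gamma\vec{B}_{eff}$ by the accumulated angle; on resonance a pulse of duration $\pi/(\gamma B_1)$ inverts the spin, while far off resonance the spin barely moves — the flip-angle relation appears as equation (34) below.

import numpy as np

def mu_z_after_pulse(delta_omega, omega1, t):
    """Longitudinal spin component after an rf pulse, computed exactly in
    the rotating frame: mu (initially along z) is rotated about the
    effective field direction of eq. (32) by the angle |gamma B_eff| * t."""
    omega_eff = np.array([omega1, 0.0, delta_omega])  # gamma * B_eff
    theta = np.linalg.norm(omega_eff) * t             # total rotation angle
    n = omega_eff / np.linalg.norm(omega_eff)         # rotation axis
    mu0 = np.array([0.0, 0.0, 1.0])
    # Rodrigues rotation formula (sense of rotation is irrelevant for mu_z)
    mu = (mu0 * np.cos(theta)
          + np.cross(n, mu0) * np.sin(theta)
          + n * np.dot(n, mu0) * (1.0 - np.cos(theta)))
    return mu[2]

gamma = 2.675e8            # rad/(s T), proton, equation (8)
b1 = 1e-4                  # rf amplitude in tesla (illustrative)
omega1 = gamma * b1
t_pi = np.pi / omega1      # duration of a 180-degree pulse

print("on  resonance:", mu_z_after_pulse(0.0, omega1, t_pi))           # -> -1
print("off resonance:", mu_z_after_pulse(50 * omega1, omega1, t_pi))   # stays near +1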
The important result here is that in the rotating frame whose frequency matches the Larmor frequency (the resonance condition) the $\vec{B}_1$ field is maximally synchronized to tip the spin around its axis:

$\left(\dfrac{d\vec{\mu}}{dt}\right)' = \omega_1\, \vec{\mu} \wedge \hat{x}'$   (33)

If the $\vec{B}_1$ field, the radiofrequency pulse, is applied for a time $\Delta t$, it causes a rotation by the flip angle $\Delta\phi$:

$\Delta\phi = \gamma B_1 \Delta t$   (34)

It is important to note that $B_1$ is the full rotating amplitude available for spin flipping. The more fundamental quantum picture allows only two spin states for any measurement along the static field direction; however, the classical picture of a spin precessing at some angle is still appropriate in magnetic resonance, because it can be shown that a two-state mixture of parallel and anti-parallel spins results in a continuous spectrum for the polarization angle $\phi$.

3.4 magnetization and relaxation

The interactions between the proton spin and the neighboring atoms have been neglected so far; however, they lead to important modifications of the global behavior. To take into account the behavior of a volume element (voxel), let us consider a volume V:

• containing a large number of protons;
• over which the external fields can be considered constant;
• where the set of spins defines an ensemble of spins with the same phase.

The magnetization is:

$\vec{M} = \dfrac{1}{V} \sum_{protons} \vec{\mu}_i$   (35)

and for non-interacting protons it satisfies an equation analogous to (17):

$\dfrac{d\vec{M}}{dt} = \gamma\, \vec{M} \wedge \vec{B}$   (36)

and therefore:

$\dfrac{dM_z}{dt} = 0$   (37)

$\dfrac{d\vec{M}_\perp}{dt} = \gamma\, \vec{M}_\perp \wedge \vec{B}$   (38)

These equations neglect the important fact that protons naturally tend to align with the external field through an exchange of energy (5). Let us assume that, through the effect of a radiofrequency pulse, the magnetization is flipped away from the static z axis. In returning to its original configuration two main effects show up:

a. the magnetization z component reaches its equilibrium value $M_0$;
b. the transverse components must vanish.

The main effect is therefore the presence of a relaxation phenomenon, which can be measured in terms of relaxation times quantifying the interaction.

3.4.1 Relaxation times

The equation which models the longitudinal behavior described in the previous section is:

$\dfrac{dM_z}{dt} = \dfrac{1}{T_1}(M_0 - M_z)$   (39)

and its solution is:

$M_z(t) = M_z(0)\, e^{-t/T_1} + M_0\,(1 - e^{-t/T_1})$   (40)

The time $T_1$ is called the spin-lattice relaxation time. It ranges from tens to thousands of milliseconds for protons in human tissues (with external fields of the order of 1 T), as shown in Table 3:

Tissue                     | T1 (ms) | T2 (ms)
gray matter (GM)           |   950   |   100
white matter (WM)          |   600   |    80
muscle                     |   900   |    50
cerebrospinal fluid (CSF)  |  4500   |  2200
fat                        |   250   |    60
arterial blood             |  1200   |   200
venous blood               |  1200   |   100

Table 3: The relaxation times T1 and T2 for different types of human tissue.

The time $T_2$ of Table 3 is the spin-spin relaxation time and accounts for the vanishing of the transverse magnetization components. Since spins experience local field variations, they lose coherence in the xy plane, a process called dephasing, which causes a loss of transverse magnetization. The equation which models this effect is:

$\dfrac{d\vec{M}_\perp}{dt} = \gamma\, \vec{M}_\perp \wedge \vec{B} - \dfrac{1}{T_2}\,\vec{M}_\perp$   (41)

The additional term involves an exponential decay for $\vec{M}_\perp$, which can be straightforwardly calculated in the rotating reference frame, where for the modulus:

$\dfrac{dM'_\perp}{dt} = -\dfrac{M'_\perp}{T_2}$   (42)

whose solution is:

$M'_\perp(t) = M'_\perp(0)\, e^{-t/T_2}$   (43)

In practice, there is an additional dephasing introduced by external field inhomogeneities, which further reduces the transverse magnetization. For this reason it is usual to refer to both $T_2$ and $T_2^*$.
The latter combines the two effects, the spin-spin interaction induced dephasing ($T_2$) and the external field induced dephasing ($T_2'$), the rates adding up as $1/T_2^* = 1/T_2 + 1/T_2'$. It is important to keep in mind that the $T_2'$ losses are recoverable, since the underlying static field variations preserve the phase relationships (which can therefore be refocused), whereas the intrinsic $T_2$ losses are not, these losses in transverse magnetization being completely random.

3.5 the bloch equation

The differential equations (39) and (41) can be combined into one vector differential equation, the Bloch equation:

$\dfrac{d\vec{M}}{dt} = \gamma\, \vec{M} \wedge \vec{B} + \dfrac{1}{T_1}(M_0 - M_z)\hat{z} - \dfrac{1}{T_2}\vec{M}_\perp$   (44)

If $\vec{B} = B_0\hat{z}$ the component equations become:

$\dfrac{dM_z}{dt} = \dfrac{M_0 - M_z}{T_1}$
$\dfrac{dM_x}{dt} = \omega_0 M_y - \dfrac{M_x}{T_2}$
$\dfrac{dM_y}{dt} = -\omega_0 M_x - \dfrac{M_y}{T_2}$   (45)

whose solutions are:

$M_z(t) = M_z(0)\,e^{-t/T_1} + M_0\,(1 - e^{-t/T_1})$
$M_x(t) = e^{-t/T_2}\,(M_x(0)\cos\omega_0 t + M_y(0)\sin\omega_0 t)$
$M_y(t) = e^{-t/T_2}\,(M_y(0)\cos\omega_0 t - M_x(0)\sin\omega_0 t)$   (46)

For $t \to \infty$ all the exponentials vanish and the solution is:

$M_z(\infty) = M_0$   (47)

$M_x(\infty) = M_y(\infty) = 0$   (48)

If the external field is (as it is in real cases) the sum of an external static field $B_0\hat{z}$ and a much smaller orthogonal radiofrequency field $\vec{B}_1$, the Bloch equation can be solved in a rotating frame where $\vec{B}_1$ is at rest. The effective field in this frame is:

$\vec{B}_{eff} = \left(B_0 - \dfrac{\omega}{\gamma}\right)\hat{z} + B_1\hat{x}'$   (49)

and the component equations are:

$\dfrac{dM_z'}{dt} = -\omega_1 M_y' + \dfrac{M_0 - M_z'}{T_1}$
$\dfrac{dM_x'}{dt} = \Delta\omega\, M_y' - \dfrac{M_x'}{T_2}$
$\dfrac{dM_y'}{dt} = -\Delta\omega\, M_x' + \omega_1 M_z' - \dfrac{M_y'}{T_2}$   (50)

with $\Delta\omega \equiv \omega_0 - \omega$, where $\omega_1$ is the spin precession frequency due to the rf pulse, $\omega$ is the rotating frame frequency (as usual, in general different from the Larmor frequency) and $\omega_0$ is the Larmor frequency induced by the static field. In the resonance condition $\Delta\omega$ vanishes; in practice it is kept in the equations to take into account deviations from the ideal situation, such as static field impurities or variations. These equations model the behavior already described, whereby the on-resonance precession of the transverse components around the $\vec{B}_1$ direction is superimposed on the relaxation decay.

3.6 signal acquisition

Signal detection in magnetic resonance is based on Faraday's law of electromagnetic induction:

$\varepsilon = -\dfrac{d\Phi}{dt}$   (51)

where $\Phi$ is the flux of the magnetic field through a coil and $\varepsilon$ the induced electromotive force (emf). Equation (51) generally refers to an emf generated by variations of the magnetic flux; however, it can be converted into a form more useful for MRI, where the roles are reversed.

The magnetization $\vec{M}(\vec{r}, t)$ induces a current density $\vec{J}_m(\vec{r}, t)$:

$\vec{J}_m(\vec{r}, t) = \vec{\nabla} \times \vec{M}(\vec{r}, t)$   (52)

which is the source of a vector potential $\vec{A}$ and therefore of a magnetic field $\vec{B}$:

$\vec{A}(\vec{r}, t) = \dfrac{\mu_0}{4\pi} \int d^3r'\, \dfrac{\vec{J}(\vec{r}\,', t)}{|\vec{r} - \vec{r}\,'|}$   (53)

$\vec{B} = \vec{\nabla} \times \vec{A}$   (54)

Therefore, using Stokes' theorem:

$\Phi = \int_S (\vec{\nabla} \times \vec{A}) \cdot d\vec{S} = \oint_l \vec{A} \cdot d\vec{l} = \dfrac{\mu_0}{4\pi} \oint_l d\vec{l} \cdot \int_V d^3r'\, \dfrac{\vec{\nabla}' \times \vec{M}(\vec{r}\,')}{|\vec{r} - \vec{r}\,'|} = \int_V d^3r'\, \vec{M}(\vec{r}\,') \cdot \left[ \vec{\nabla}' \times \left( \dfrac{\mu_0}{4\pi} \oint_l \dfrac{d\vec{l}}{|\vec{r} - \vec{r}\,'|} \right) \right]$

$\Rightarrow\; \Phi = \int_V d^3r'\, \vec{M}(\vec{r}\,') \cdot \vec{B}^{coil}(\vec{r}\,')$   (55)

The latter equation shows explicitly the Principle of Reciprocity:

$\vec{B}^{coil}(\vec{r}\,') = \dfrac{\vec{B}(\vec{r}\,')}{I}$   (56)

As a consequence, the flux through a coil induced by a magnetization source $\vec{M}(\vec{r}, t)$ is the same as that which would be produced by the magnetic field per unit
current $\vec{B}^{coil}(\vec{r}\,')$ associated with the coil. Mathematically, it is worthwhile to note that the original integral over the coil surface is in the end replaced by a volume integral over the magnetized sample region. The emf deduced from equation (55) is:

$\varepsilon = -\dfrac{d}{dt}\Phi(t) = -\dfrac{d}{dt}\int_V d^3r\; \vec{M}(\vec{r}) \cdot \vec{B}^{coil}(\vec{r})$   (57)

Since the signal is assumed to be detected in the presence of a static $B_0\hat{z}$ field and an rf pulse, the detected signal depends on the magnetization components and on their relaxation times. In particular, using the notation of equation (25) for the longitudinal and transverse magnetization:

$M_z(\vec{r}, t) = e^{-t/T_1(\vec{r})}\, M_z(\vec{r}, 0) + (1 - e^{-t/T_1(\vec{r})})\, M_0$   (58)

$M_+(\vec{r}, t) = e^{-t/T_2(\vec{r})}\, e^{-i\omega_0 t}\, M_+(\vec{r}, 0) = e^{-t/T_2(\vec{r})}\, e^{-i\omega_0 t + i\phi_0(\vec{r})}\, M_\perp(\vec{r}, 0)$   (59)

The phase $\phi_0$ and the amplitude $M_\perp(\vec{r}, 0)$ are determined by the initial rf pulse conditions. With a static field at the tesla level, the Larmor frequency $\omega_0$ for protons is at least four orders of magnitude larger than typical values of $1/T_1$ and $1/T_2$; therefore the exponentials containing the relaxation times can be neglected in the derivative. Considering the more general case, in which the presence of field inhomogeneities or deviations from the ideal case is taken into account through the replacement $T_2 \to T_2^*$, the signal s is:

$s \sim \omega_0 \int d^3r\; e^{-t/T_2^*(\vec{r})}\, M_\perp(\vec{r}, 0)\, \left[ B_x^{coil}(\vec{r})\sin(\omega_0 t - \phi_0(\vec{r})) + B_y^{coil}(\vec{r})\cos(\omega_0 t - \phi_0(\vec{r})) \right]$   (60)

A simplified version of equation (60) can be obtained through the introduction of:

$B_x^{coil} \equiv B_\perp \cos\theta_B \qquad B_y^{coil} \equiv B_\perp \sin\theta_B$   (61)

so that finally the signal is:

$s \sim \omega_0 \int d^3r\; e^{-t/T_2^*(\vec{r})}\, M_\perp(\vec{r}, 0)\, B_\perp(\vec{r})\, \sin(\omega_0 t + \theta_B(\vec{r}) - \phi_0(\vec{r}))$   (62)

Equation (62) has general validity; in fact the only correction which has been neglected depends on the presence of possible time-independent (or time-averaged) field variations along the z direction, in which case $\omega_0$ would not be constant and an $\omega(\vec{r})$ should be included.

3.7 image noise and contrast

3.7.1 Signal to noise ratio and spatial resolution

All physical measurements are affected by random or systematic noise. The quantitative measure of the noise affecting a signal s is given by the signal-to-noise ratio (SNR):

$SNR = \dfrac{s}{\sigma}$   (63)

where $\sigma$ is an estimate of the noise affecting the signal. In MRI the goal is to localize the signal as accurately as possible and to be able to discriminate among tissue types. As seen in equation (60), the signal is directly proportional to the magnetization and therefore (but not only) to the spin density of the sample. In a 3D image the voxel intensity summarizes this information in a gray scale, or in a probability map. It is important that an adequate SNR is achieved in every voxel. The principal noise to take into account is the thermal Gaussian noise causing fluctuations in the coil and in the sample:

$\sigma^2_{thermal} \sim 4kT\,R\,\beta_w$   (64)

where R is the effective resistance of the coil, the body and the electronics, and $\beta_w$ is the bandwidth of the noise-voltage detecting system. This kind of noise can of course be reduced by averaging over several acquisitions:

$SNR/voxel \sim \Delta x\, \Delta y\, \Delta z\; \sqrt{\dfrac{N_{acq}\, N_x\, N_y\, N_z}{\beta_w}}$   (65)

where $N_i$ is the number of sampled points along direction i in image space, $\Delta i$ the window dimensions for signal reconstruction and $N_{acq}$ the number of acquisitions; more generally the field of view (FOV) $L_i = \Delta i \cdot N_i$ is introduced. As a consequence, improving the resolution ($\Delta x \to \Delta x / 2$) lowers the SNR. On the contrary, lowering the resolution can be tolerated because it results in several benefits:

• artifact reduction;
• overcoming field inhomogeneities;
• reduction of the relaxation effects (which weaken the signal).
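The scaling in equation (65) is easy to make concrete. The snippet below is an editorial illustration with arbitrary reference values: it shows that halving the voxel size in one direction at fixed field of view (so that the number of samples doubles) costs a factor 1/√2 in per-voxel SNR, which averaging over more acquisitions can buy back at the price of scan time.

import math

def snr_per_voxel(dx, dy, dz, n_acq, nx, ny, nz, bw):
    """Relative per-voxel SNR from equation (65) (proportionality only)."""
    return dx * dy * dz * math.sqrt(n_acq * nx * ny * nz / bw)

# Reference protocol (illustrative values only).
ref = snr_per_voxel(1.0, 1.0, 1.0, n_acq=1, nx=256, ny=256, nz=128, bw=32e3)

# Halve the voxel size along x at fixed FOV: dx -> dx/2, nx -> 2*nx.
fine = snr_per_voxel(0.5, 1.0, 1.0, n_acq=1, nx=512, ny=256, nz=128, bw=32e3)

print(f"SNR ratio fine/ref = {fine / ref:.3f}")   # 1/sqrt(2) ~ 0.707
# Doubling N_acq would recover a factor sqrt(2), at doubled scan time.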
In clinical applications patient comfort is important; this is why it is often preferred to lower the window sizes and consequently to acquire images with lower resolution but higher SNR and, especially, lower acquisition time (which depends linearly on the window sizes).

3.7.2 Contrast: Proton density, T1 and T2 weighting

Even the highest SNR is no guarantee that the image is useful. For clinical purposes it is in fact important to be able to distinguish different tissues. The problem of distinguishing different signals in the presence of noise falls under the broad category of signal detection problems. To this aim a fundamental concept is the contrast-to-noise ratio (CNR). In the case of MRI, if two tissues correspond to signals $s_A$ and $s_B$, their contrast C is by definition:

$C \equiv s_A - s_B$   (66)

and therefore the CNR is:

$CNR_{AB} = \dfrac{C_{AB}}{\sigma} = SNR_A - SNR_B$   (67)

The most basic contrast mechanisms in MRI derive from three different physical parameters:

a. the spin density $\rho_0$;
b. the $T_1$ relaxation time;
c. the $T_2$ relaxation time.

The physical law describing this behavior is:

$C_{AB} = \rho_{0,A}\,(1 - e^{-T_R/T_{1A}})\, e^{-T_E/T_{2A}^*} - \rho_{0,B}\,(1 - e^{-T_R/T_{1B}})\, e^{-T_E/T_{2B}^*}$   (68)

where $T_E$ is the echo time, by definition the time the spins need to return to the initial phase after an rf pulse, and $T_R$ is the repetition time: it determines the number of acquisitions for averaging the signal and, more importantly for the present purpose, the amount of regrowth of the longitudinal magnetization.

Proton density imaging is based on the spin density. In this case appropriate choices for $T_E$ and $T_R$ are:

$T_E \ll T_{2A,B}^* \;\Rightarrow\; e^{-T_E/T_2^*} \to 1$   (69)

$T_R \gg T_{1A,B} \;\Rightarrow\; e^{-T_R/T_1} \to 0$   (70)

$\Rightarrow\; C_{AB} = (\rho_{0,A} - \rho_{0,B}) + O(e^{-T_R/T_1}) + O(T_E/T_2^*) \simeq \rho_{0,A} - \rho_{0,B}$

In this approximation the contrast does not depend on $T_R$ or $T_E$; this gives a general rule for spin density weighting: all that is needed is to keep $T_R$ much longer than $T_1$ and $T_E$ shorter than the shortest value of $T_{2A,B}^*$.

Normal soft tissue $T_1$ values are quite different from one another. For this reason $T_1$-weighted images are a powerful method to distinguish different tissues. As seen with (69), a short $T_E$ minimizes the $T_2^*$ effects. As a consequence the contrast $C_{AB}$ is:

$C_{AB} = S_A(T_E) - S_B(T_E) \simeq \rho_{0,A}(1 - e^{-T_R/T_{1,A}}) - \rho_{0,B}(1 - e^{-T_R/T_{1,B}}) = (\rho_{0,A} - \rho_{0,B}) - (\rho_{0,A}\,e^{-T_R/T_{1,A}} - \rho_{0,B}\,e^{-T_R/T_{1,B}})$   (71)

When several tissues are present it is useful to compare different values of $T_R$. The optimal value can be obtained graphically by plotting the expression for $C_{AB}$ as a function of $T_R$. As an example, the contrast for gray matter versus white matter and versus cerebrospinal fluid is shown in Fig. 6 and Fig. 7, respectively.

Fig. 6: The contrast as a function of the repetition time for gray matter and white matter.

Fig. 7: The contrast as a function of the repetition time for gray matter and cerebrospinal fluid; note how the contrast behavior changes according to the examined tissues.
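The graphical optimization just described is straightforward to reproduce. The following sketch is an editorial illustration: it uses the Table 3 relaxation times and assumes, purely for the example, relative spin densities of 1.0 and 0.9 (the thesis does not quote the densities behind Figs. 6 and 7); it evaluates the T1-weighted contrast of equation (71) on a grid of TR values and reports the TR maximizing its magnitude.

import numpy as np

# T1 values from Table 3 (ms); spin densities are illustrative assumptions.
T1_GM, T1_WM = 950.0, 600.0
RHO_GM, RHO_WM = 1.0, 0.9

tr = np.linspace(1.0, 4000.0, 4000)  # repetition times to test (ms)

# T1-weighted contrast, equation (71), with TE << T2* so the echo decay ~ 1.
contrast = (RHO_GM * (1.0 - np.exp(-tr / T1_GM))
            - RHO_WM * (1.0 - np.exp(-tr / T1_WM)))

best = tr[np.argmax(np.abs(contrast))]
print(f"|contrast| is maximal near TR = {best:.0f} ms")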
The last contrast mechanism relies on the transverse decay. In the $T_2$-weighting case, to avoid the $T_1$ contribution, $T_R$ is chosen according to (70); therefore:

$C_{AB} = \rho_{0,A}\, e^{-T_E/T_{2,A}^*} - \rho_{0,B}\, e^{-T_E/T_{2,B}^*}$   (72)

As in the $T_1$ case, a finer tuning of the signal acquisition can be achieved by comparing different values of $T_E$; the corresponding contrast behaviors are shown in Fig. 8 and Fig. 9.

Fig. 8: The contrast as a function of the echo time for gray matter and white matter.

Fig. 9: The contrast as a function of the echo time for gray matter and cerebrospinal fluid.

The general appearance of the three weighting types is shown in Fig. 10.

Fig. 10: Comparison among (in clockwise order) a proton density, a $T_1$-weighted and a $T_2$-weighted brain scan from ICBM.

It is worthwhile to note that proton density images are in general interpreted as a count of the proton number in tissues; this is not true for CSF, where $T_1$ is larger than $T_R$ (4.5 s vs 2.5 s). $T_1$- and $T_2$-weighted images appear to have reversed intensities; in particular, CSF is brighter in $T_2$ images because of its long transverse relaxation time.

In conclusion, let us stress how magnetic resonance scanners with high (3 T) and ultra-high (7 T) magnetic fields have recently become widespread. The introduction of such high field scanners has increased the image signal-to-noise ratio and extended the boundaries of spatial resolution and sensitivity, thus improving image processing algorithms, especially those dedicated to structural analyses.

3.8 segmentation algorithms: state of the art

The accurate and robust segmentation of anatomical structures is an essential step in quantitative brain magnetic resonance imaging analysis. Many clinical applications rely on the segmentation of MRI scans, which allows one to describe how brain anatomy changes over time, through aging or disease. However, manual labeling by clinical experts is a laborious, time-consuming task and, above all, it is subject to inter- and intra-rater variability. That is why automatic techniques are desirable to enable routine analysis of brain MRIs in clinical use, especially for hippocampal segmentation.

Despite the large number of existing techniques and their combinations, such as probabilistic and multi-atlas segmentations [79, 80, 81], graph cuts [82, 83], label fusion [84], voxel-based classifications [85, 86] and warping patch-based methods [87, 88], it remains a challenging task to develop a fast and accurate fully automated segmentation method. Among them, atlas-based methods have been shown to outperform other state-of-the-art algorithms in terms of the similarity index k (k > 0.88) and other error metrics [88, 89, 83]. These methods rely on the registration of the target image with one or several templates; the segmentation is achieved by warping these templates onto the target and finally labeling each voxel as belonging or not to the Hippocampus according to different possible strategies, such as label fusion, voxel classification, similarity measures, etc. Nevertheless, the segmentation errors produced by atlas-based methods have been proved to be both random and systematic [81, 80].
Random errors derive mainly from image acquisition noise and from the biological variability of the subjects, while systematic errors occur consistently, since the disagreement between manual labelings and automatic segmentations derives from differences in the segmentation protocols [90]. For example, a manual segmentation protocol may follow a specific anatomical criterion to assign labels to voxels, while the automatic method may rely on a slightly different criterion, thus yielding a systematic labeling error. Moreover, multi-atlas segmentations are known to introduce a spatial bias [91], and another important issue for multi-atlas based segmentation methods is the computational burden.

4 THE HIPPOCAMPUS SEGMENTATION

Hippocampal segmentation can be achieved through a combined strategy using both shape and voxel-based information. A priori shape constraints individuate thinner regions of interest where supervised learning techniques provide robust voxelwise classifications. The implementation of strategies employing distributed infrastructures makes the segmentation task no longer computationally prohibitive.

4.1 a combined strategy for segmentation

The Hippocampus is primarily involved in the pathogenesis of a number of conditions, first of all Alzheimer's disease. As described in the previous chapters, its segmentation can play a fundamental role in early diagnosis, clinical trials and protocol assessments of several neurodegenerative diseases. Nevertheless, issues deriving from acquisition noise and from the intrinsic complexity of this anatomical structure make hippocampal segmentation a challenging task. Until recent years the segmentation of the Hippocampus, i.e. its identification and separation from the surrounding brain structures, was performed mainly manually or with semi-automated techniques followed by manual editing. This is obviously time-consuming and subject to investigator variability, so a number of automated segmentation methods have been developed. These have so far relied mainly on image intensity, often adopting multi-atlas registration approaches in order to minimize errors due to individual anatomical variations. More recently, though, a number of methods that exploit shape information have been developed, based on preliminary work carried out in the nineties with the active shape models and the active appearance models [92, 93]. These models address the issue of identifying objects of a known shape in a digital image, when such a shape is characterized by a certain degree of variability, as in the case of anatomical structures. Alternative methods have used strategies based on principal component analysis, deformable representations, diffeomorphic mappings and Bayesian frameworks [94, 95, 96, 97]. In recent years a number of studies have used probabilistic frameworks for brain segmentation, in some cases adopting specific models such as Markov random fields or graph cuts [98, 99, 100]. These studies have encouraged the exploration of several machine learning techniques to address hippocampal segmentation. The method presented in this work makes use of a dedicated classifier to label voxels as belonging or not to the Hippocampus. In this work a combined strategy has been developed to tackle the main issues involved in hippocampal segmentation:

• Shape analysis;
• Intensity-based segmentation;
• Distributed computing implementation.
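Throughout this and the following chapters, segmentation quality is ultimately judged by overlap with manual labels, as with the similarity index k quoted in Sec. 3.8. As an editorial illustration (the thesis does not spell out its metric implementation), a Dice-type overlap between two binary masks can be computed in a few lines:

import numpy as np

def dice_index(mask_a, mask_b):
    """Dice similarity between two binary segmentation masks:
    2|A intersect B| / (|A| + |B|); equal to 1 for a perfect overlap."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0  # two empty masks trivially agree
    return 2.0 * np.logical_and(a, b).sum() / denom

# Toy example: two overlapping boxes in a volume sized like the
# hippocampal boxes of Sec. 4.5.1 (30 x 70 x 30 voxels).
auto = np.zeros((30, 70, 30), dtype=bool)
manual = np.zeros_like(auto)
auto[5:20, 10:50, 5:20] = True
manual[7:22, 12:52, 6:21] = True
print(f"Dice = {dice_index(auto, manual):.3f}")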
4.2 why an automated segmentation?

Hippocampal segmentation is one of the most challenging issues in the landscape of medical image processing, but this challenge is no longer just a matter of image processing. The large amount of data and the technical developments have brought a substantial improvement of the standard quality provided by magnetic resonance scanners; however, to our knowledge, distributed computing resources have not yet been successfully involved in this research field, even though computational times, storage issues and computer resources have proved to be formidable obstacles to neuroimaging development in recent years. For these reasons the proper individuation of small structures that are not well defined in intensity, such as the Hippocampus, has become a less difficult task for experts; nevertheless, manual segmentation remains time-consuming, expensive and not completely reliable, being strongly dependent on human expertise. These features prevent larger diagnosis programs from being scheduled and would in any case undermine diagnostic confidence: on one hand because of the costs that larger programs would require, on the other because, even if such programs could be afforded, the results would still face the unreliability of manual segmentation methods.

A fully automatic segmentation procedure would allow medical institutions to obtain fast and economic diagnosis tools; besides, an automatic procedure would guarantee the standardization of the segmentation, which would no longer depend on the intrinsic variability involved in the use of human experts. Moreover, the possibility to take advantage of distributed computing infrastructures would allow computationally intensive analyses to be performed on large scales in a fast and efficient way.

Fig. 11: Flow chart of the segmentation method, according to the following steps: 1) volume of interest extraction, 2) determination of voxel features and 3) voxel classification. The learning phase is represented in detail in the classification box, while input and output data are shown in red.

In Fig. 11 a synthetic overview of the segmentation pipeline is shown. The following sections give a more detailed description of these strategies; in the first place, focus is given to the database properties.

4.3 materials: database properties

The research activity discussed in this work is primarily based on a set of real medical images consisting of 56 T1-weighted whole brain magnetic resonance scans and the corresponding manually segmented bilateral hippocampi (masks). The data come from the Laboratory of Epidemiology and Neuroimaging, IRCCS San Giovanni di Dio - FBF in Brescia (Italy). Scans were performed on healthy subjects of different sex and age, as well as on subjects affected mainly by Alzheimer's disease, mild cognitive impairment or subjective memory complaints. All images were acquired on a 1.0 T scanner according to the following parameters: gradient echo 3D technique, repetition time TR = 20 ms, echo time TE = 5 ms, flip angle = 30°, field of view = 220 mm, acquisition matrix of 256 × 256 and contiguous slice thickness of 1.3 mm, resulting in images with overall dimensions of 181 × 145 × 181 voxels [101]. For manual segmentation, the images were automatically resampled through an algorithm included in the MINC package 1 and normalized to the Colin27 template 2 with a voxel size of 1.00 × 1.50 × 1.00 mm³. When automated registration failed, manual registration was performed, based on 11 anatomical landmarks.
Manual hippocampal segmentations were performed on contiguous coronal brain sections by a single individual, blind to diagnosis, using the software Display 1.3 3, following the protocol defined by Pruessner et al. [66]. The protocol mandates that the scans be acquired with the three-dimensional gradient technique and, consequently, that the non-uniformity of the scans be corrected and that they be registered into standard stereotaxic space prior to segmentation. The Hippocampus is a bilaminar, symmetrically located structure; manual tracing in the coronal plane is to be preferred to the sagittal and axial orientations in order to preserve the convexity properties of the shape, although the other views can be used whenever they are felt to be more valuable for boundary detection. The hippocampal manual segmentation protocol also prescribes a number of rules of thumb which take into account the brain morphology and the complexity of the neighboring medial temporal lobe structures. For example, the Hippocampus was defined to include the dentate gyrus, the cornu ammonis regions, the part of the fasciolar gyrus adjacent to the cornu ammonis regions, the alveus and the fimbria. The Andreas-Retzius gyrus, the part of the fasciolar gyrus adjacent to this gyrus, and the crus of the fornix were omitted from the Hippocampus. As a consequence, a consistent segmentation of the Hippocampus must somewhat arbitrarily exclude a number of voxels in the region where the hippocampal tail is adjacent to the Andreas-Retzius gyrus, since the two are indistinguishable, both appearing as gray matter in T1 scans. The same situation is found in several other hippocampal regions, such as where the Hippocampus is attached to the trigone of the lateral ventricle or where its boundaries literally fade into those of the amygdala.

1 www.bic.mni.mcgill.ca/software
2 www.bic.mni.mcgill.ca/ServicesAtlases/Colin27
3 www.bic.mni.mcgill.ca/ServicesSoftwareVisualization/Display

These data were used to train and test, in a cross-validation framework, the segmentation workflow and the distributed environment. To validate the results, however, a second database was used. This second dataset, without available manual labelings, was downloaded from the Alzheimer's Disease Neuroimaging Initiative (ADNI) 4. It consisted of a random sample of 1.5 T scans from 456 subjects with 4 different acquisition times: screening, repeat, month 12 and month 24, where the screening and repeat scans were acquired almost simultaneously and the other scans 12 and 24 months later, respectively. A further discussion of these characteristics will be provided in Chap. 5. These data are characterized by high variability, as different scanner protocols are used, in particular those from General Electric Healthcare, Philips Medical Systems and Siemens Medical Solutions. The ADNI collected data also include high field scans at 3.0 T but, with the aim of providing a clinical validation of the proposed segmentation workflow (which is trained on 1.0 T scans), only the 1.5 T scans were used. It is important to underline that this second dataset also provided sex, age and clinical information. In particular, among the 456 downloaded scans there were 94 AD patients and 217 MCI subjects. In this way, the segmented hippocampal volumes were used to provide a diagnosis, and an evaluation of the hippocampal volume as an AD biomarker was performed.

4 http://adni.loni.usc.edu
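Given a binary hippocampal mask in this standardized space, the volume used as a biomarker is simply the voxel count times the voxel volume (1.00 × 1.50 × 1.00 mm³ after resampling, as described above). A minimal editorial sketch, assuming the mask is already loaded as a numpy boolean array:

import numpy as np

VOXEL_VOLUME_MM3 = 1.00 * 1.50 * 1.00  # voxel size after resampling (Sec. 4.3)

def hippocampal_volume_mm3(mask):
    """Volume of a binary hippocampal mask in cubic millimeters."""
    return float(np.count_nonzero(mask)) * VOXEL_VOLUME_MM3

# Toy example: a synthetic mask standing in for a real manual labeling.
mask = np.zeros((181, 145, 181), dtype=bool)
mask[80:100, 60:90, 70:85] = True
print(f"volume = {hippocampal_volume_mm3(mask):.0f} mm^3")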
4.4 preprocessing, automated registration

A preliminary processing step is necessary to align images so that corresponding features can easily be related; by definition, this is what is commonly intended by image registration. In the last 25 years remarkable progress in this field has been accomplished, with huge investments funded both by universities and by industry. The main reason is that image registration has evolved from being considered a minor precursor to some medical imaging applications to a significant subdiscipline in itself [102]. The reason registration has become so important is the necessity for medical imaging to establish correspondences between the spatial information in images and the equivalent body structures. In many clinical scenarios, images from several modalities may be acquired, and the diagnostician's task is to mentally combine, or "fuse", this information to draw useful clinical conclusions. However, this task is naturally time-consuming, and international concerns about health-care costs therefore drive the development of automated methods to make the best possible use of medical images. In this sense automated registration is a condicio sine qua non for medical image processing.

4.4.1 Registration Methodology

As previously stated, registration is a subdiscipline of image processing with a wide range of applications:

a. combining information from multiple imaging modalities;
b. monitoring changes in size, shape or image intensity over time;
c. relating preoperative images and surgical plans;
d. relating an individual's anatomy to a standardized atlas.

In this work, related to hippocampal segmentation and its volumetric changes under pathological conditions, primarily Alzheimer's disease, registration is mainly used to monitor anatomical changes with respect to a standardized template. To be effective, these tasks require the establishment of spatial correspondence. The process of image registration aims to fulfill this requirement by finding the mappings that relate the spatial information conveyed in one image to that in another image or in physical space, namely a reference template. The degrees of freedom needed to describe a transformation depend on the type of transformation itself and on the data type. The main possible transformations for structural magnetic resonances are the rigid, the affine and the deformable transformations, while the data type is mainly defined in terms of the acquisition source (MRI, PET, ...) or the dimensionality (2-D or 3-D).

4.4.2 Rigid transformations

The spatial mapping between two 3-D magnetic resonance images is in general defined as a function T which maps a vector $\vec{x}$, representing the spatial coordinates of a point, into new reference frame coordinates $\vec{x}\,'$:

$T : \vec{x} \in \mathbb{R}^3 \mapsto \vec{x}\,'$   (73)

In general the input data is also referred to as the moving space/image, whilst the arrival space is called the target space/image. Registration is an optimization problem, in the sense that the process stops when the moving image and the target image reach the best possible match. Obviously, this requires the introduction of a suitable metric to measure the matching; in general, metrics based on the squared sum of errors or on correlation measures are adopted. As an optimization problem, registration can be highly computing intensive; this is why an initial guess or an initial setting is usually adopted.
In a fully automated registration framework, the best initial guess is obtained through a rigid registration:

$\vec{x}\,' = T_{rigid}(\vec{x}) = R\,\vec{x} + \vec{t}$   (74)

where R is a real orthogonal matrix, meaning that:

$R^T R = R R^T = I$   (75)

with I the identity matrix, and $\vec{t}$ is a translation vector. Matrices defined by equation (75) have determinant det(R) = ±1 and correspond to proper and improper rotations (the latter are not properly rotations, in the sense that they can only be obtained by the combination of a rotation and a reflection); improper rotations can be eliminated by requiring det(R) = +1. As a consequence $T_{rigid}$ has six degrees of freedom, which can be interpreted as the three components of the translation $\vec{t}$ and the three Euler angles α, β, γ which uniquely define a proper rotation.

The intensity information can be used to calculate the image centers of mass or the momentum distributions. In the first case the moving and target images are registered by shifting the moving image to align the centers of mass, while in the second the first moments are aligned. In general the second procedure is used because of its generality, in particular because it makes no assumption about the physical origins of the moving and target images. Once the images are rigidly registered, a second registration is usually performed to achieve a better match. Several options can be explored, but for structural data the most common choice is a non-rigid transformation.

4.4.3 Non rigid transformations

The simplest non-rigid transformation, the scaling transformation, can be obtained from a rigid one with the introduction of a scaling parameter:

$\vec{x}\,' = T(\vec{x}) = R S\,\vec{x} + \vec{t}$   (76)

where S is a diagonal matrix whose non-null elements $s_x$, $s_y$, $s_z$ represent the scale factors along the coordinate axes:

$S = \begin{pmatrix} s_x & 0 & 0 \\ 0 & s_y & 0 \\ 0 & 0 & s_z \end{pmatrix}$

If the scaling is isotropic then $s_x = s_y = s_z$; in general, however, anisotropic solutions are explored and give the best results. The scaling transformations are a special case of the affine transformations $T_{affine}$:

$\vec{x}\,' = T_{affine}(\vec{x}) = A\,\vec{x} + \vec{t}$   (77)

The real matrix A, in contrast with the rotations R, has no restrictions on its elements $a_{ij}$. The affine transformation preserves the straightness of lines and hence the planarity of surfaces; it preserves parallelism, but it allows angles between lines to change. Affine transformations are frequently represented in homogeneous coordinates:

$A = \begin{pmatrix} a_{11} & a_{12} & a_{13} & t_1 \\ a_{21} & a_{22} & a_{23} & t_2 \\ a_{31} & a_{32} & a_{33} & t_3 \\ 0 & 0 & 0 & 1 \end{pmatrix}$   (78)

Both rigid and non-rigid transformations can be used to register a set of images, and several different algorithms have been proposed so far, in particular for 3-D boundaries. The first works in the field were strictly connected with the idea that registration should be point-wise. According to this approach, the registration problem could be classified as a distance optimization problem. The main reason behind this approach was the lack of computing power and facilities to tackle the registration problem with a global approach. In recent years, however, intensity based registration methods, which rely on the whole image content, have obtained widespread use.
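As an editorial sketch of equations (77)-(78): applying an affine map to voxel coordinates in homogeneous form reduces to a single matrix product. All numerical values below are arbitrary illustrative parameters.

import numpy as np

# Homogeneous affine matrix (eq. 78): a rotation about z, anisotropic
# scaling and a translation, composed into a single 4x4 operator.
theta = np.deg2rad(10.0)
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])
S = np.diag([1.1, 0.9, 1.0])          # scale factors s_x, s_y, s_z
t = np.array([2.0, -1.5, 0.5])        # translation vector

A = np.eye(4)
A[:3, :3] = R @ S
A[:3, 3] = t

# Transform a batch of voxel coordinates (one homogeneous column each).
x = np.array([[10.0, 20.0, 30.0],
              [0.0,  0.0,  0.0]])
xh = np.hstack([x, np.ones((x.shape[0], 1))])   # append the homogeneous 1
x_prime = (A @ xh.T).T[:, :3]
print(x_prime)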
4.4.4 Intensity based registration

Intensity based registration involves calculating a transformation between the moving image M and the target image T using the voxel values alone. The need for neither landmarks nor a priori intervention makes this approach suitable for automated algorithms. In its purest form, intensity based registration is carried out by iteratively optimizing a similarity measure calculated from all voxel intensities M(i) and T(i). The effect of an affine transformation $\mathbf{T}$ is to iteratively modify M(i) until a cost function, defined through the adopted similarity metric, reaches a minimum. It is worthwhile to note that the choice of the metric plays a fundamental role in achieving satisfactory results.

4.4.5 Similarity Measures

Let us suppose that the moving image M and the target image T, of voxel size N, are identical except for the misalignment. An intuitively obvious similarity measure would then be the sum of squared differences, for which a perfect registration would return a null result. The sum of squared differences

$\dfrac{1}{N} \sum_i |T(i) - \mathbf{T}[M(i)]|^2 \qquad \forall i \in T \cap \mathbf{T}(M)$   (79)

is the optimum measure if the two images M and T differ only by Gaussian noise [103]. Certain image registration problems are reasonably close to this ideal case. For example, in the serial registration of structural MR images, the aligned images are expected to be identical except for small changes resulting from disease progression or inter-subject variability. Similarly, in functional imaging, for example functional MRI, only a small number of voxels is expected to change during the study, so the images can be registered under the assumption that they are largely identical. In the end, the sum of squared differences is likely to work well as a similarity metric in all those cases where the moving and target images are supposed to differ only in a small fraction of voxels.

However, in several cases the previous assumption no longer holds. For example, when a small number of voxels change intensity by a large amount in passing from the moving to the target image, the result can be a poor registration. Another important issue comes from the fact that medical images, especially magnetic resonances, have a Rician noise distribution [104], so that the Gaussian assumption does not hold. This is why a preferable choice for magnetic resonances consists in adopting the correlation coefficient r:

$r = \dfrac{\sum_i \left[T(i) - \langle T \rangle\right]\left[\mathbf{T}(M(i)) - \langle \mathbf{T}(M) \rangle\right]}{\left\{ \sum_i \left[T(i) - \langle T \rangle\right]^2 \sum_i \left[\mathbf{T}(M(i)) - \langle \mathbf{T}(M) \rangle\right]^2 \right\}^{1/2}} \qquad \forall i \in T \cap \mathbf{T}(M)$   (80)

where the sums run over every voxel i in the overlap region of the moving and target images. Nevertheless, the choice of a suitable similarity metric does not exhaust the settings characterizing the registration task: unless a robust optimization procedure is established, no guarantee of a correct registration can be given.
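Both measures are one-liners once the images live on a common voxel grid; the sketch below is an editorial illustration of equations (79) and (80) on toy random volumes (no actual warping is performed):

import numpy as np

def ssd(target, moved):
    """Sum of squared differences per voxel, equation (79)."""
    return float(np.mean((target - moved) ** 2))

def correlation(target, moved):
    """Correlation coefficient over the overlap region, equation (80)."""
    t = target - target.mean()
    m = moved - moved.mean()
    return float((t * m).sum() / np.sqrt((t ** 2).sum() * (m ** 2).sum()))

rng = np.random.default_rng(0)
target = rng.normal(size=(64, 64, 64))
aligned = target + 0.1 * rng.normal(size=target.shape)    # nearly identical
shuffled = rng.permutation(target.ravel()).reshape(target.shape)

print("aligned : ssd=%.3f  r=%.3f" % (ssd(target, aligned), correlation(target, aligned)))
print("shuffled: ssd=%.3f  r=%.3f" % (ssd(target, shuffled), correlation(target, shuffled)))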
4.4.6 Optimization algorithms for registration

In many optimization problems it is desirable to determine the globally optimal solution. This is not true for image registration, and an example can explain why. Consider two images characterized by wide, low-intensity noisy regions, such as two heads surrounded by acquisition noise: the sum of squared differences is then minimized by aligning the noisy regions rather than the regions significant for the analysis. The fact that the desired optimum is local rather than global does not invalidate the use of voxel similarity measures for registration; it has, though, implications for robust implementations. The correct local optimum can be found provided that a starting estimate $\mathbf{T}$ is given and that this initial guess is not "so far" from the desired local minimum. It is worthwhile to note that what "not so far" really means is that the initial guess must be closer to the desired minimum than to the background alignment configuration; this can easily be handled by using a rigid registration as the initial guess and imposing that this registration match physical conditions such as the alignment of the centers of mass of the images or of the momentum distributions.

Once the images are registered to a template or to a reference atlas, they all share common features that can therefore be analyzed in a statistical framework. In particular, registration is a necessary prerequisite for a robust segmentation and for a robust statistical shape analysis.

4.5 shape analysis

Segmentation algorithms are naturally based on intensity analysis; however, in the case of the Hippocampus this information is not sufficient to achieve a consistent segmentation, as discussed in the previous section. This is why a shape model for the Hippocampus has been built. We developed a fully automated point distribution model (FAPoD) [105] which individuates the shape of an object intended as a collection of a fixed number of points. The model implementation was studied on a toy model of simulated images and then tested on the real magnetic resonance dataset; a comparison with a standard tool for shape analysis, i.e. the spherical harmonic framework (SPHARM) [106, 107, 108], was also performed.

4.5.1 SPHARM analysis

First of all, the MR brain scans were standardized in terms of intensity and spatial coordinates [109]. Hippocampal boxes, i.e. bounding boxes containing the Hippocampus and the para-hippocampal region, were extracted according to Calvini et al. [110]: the MRIs are registered to the stereotactic space (ICBM152) and a putative hippocampal region is individuated by an atlas or manually traced. This region of 30 × 70 × 30 voxels is very small compared with the whole brain scan, but it does contain the Hippocampus. It becomes the region of interest (ROI) for further analyses. The same operations are then performed on the corresponding manually traced segmentations, which are the object of the shape model construction.

The masks are represented through a mesh which is projected onto a unit sphere surface (Fig. 12). This results in a bijective vectorial mapping $\vec{F}$ between each contour voxel and a point ν with spherical coordinates θ and φ:

$\nu(\theta, \phi) = (F_x(\theta, \phi),\, F_y(\theta, \phi),\, F_z(\theta, \phi))$

The object surface can then be described by a complete set of spherical harmonic basis functions $Y_l^m$, where $Y_l^m$ denotes the spherical harmonic of degree l and order m. The expansion takes the form:

$\nu(\theta, \phi) = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} c_l^m\, Y_l^m(\theta, \phi)$   (81)

where $c_l^m = (c_{lx}^m, c_{ly}^m, c_{lz}^m)$. These coefficients can be estimated up to a desired degree by solving a set of linear equations, and the object surface can therefore be reconstructed from them; the more coefficients used, the more accurate the model (Fig. 13).

Fig. 12: A mesh representation compared with its projection onto a unit sphere; the colors are just a figurative representation of different sub-hippocampal regions.

Fig. 13: Comparison of hippocampal shapes reconstructed with a different number of coefficients $c_l^m$: 1 coefficient, 5 coefficients, 10 coefficients and 15 coefficients. As expected, the more coefficients, the more details the model is able to capture.

4.5.2 SPHARM description

The main idea behind the SPHARM workflow relies on the parametrization of a connected surface onto a unit sphere. This can be seen as an optimization problem whose variables are the coordinates of the object vertexes. There are three constraints:

a. the Euclidean norm of the coordinates must be 1 for every vertex;
b. area must be preserved, so that any object region must be mapped onto a proportional spherical area;

c. no spherical region is allowed to be defined by angles which are negative or exceed π.

The goal of the optimization is to minimize the distortion of the surface net in the mapping. This is perfectly achieved when every voxel facet is mapped onto a "spherical square" [106]; however, this can happen only in a special case, i.e. for a single-voxel image. The variables of the optimization are the vertex coordinates, which in a spherical geometry are the latitude θ and the longitude φ. The mathematical description is derived from the physical heat equation for diffusion: the north and south pole latitudes of the unit sphere are treated as fixed temperatures:

$\nabla^2 \theta = 0 \qquad \nabla^2 \phi = 0$   (82)

$\theta_{north} = 0 \qquad \theta_{south} = \pi$   (83)

$\phi_{north} = \phi_{south} = 0$   (84)

The result of this parametrization has already been shown in Fig. 12. Once the coordinates θ and φ are optimized, the surface $\vec{S}$ of the object can be explicitly defined:

$\vec{S}(\theta, \phi) = \begin{pmatrix} x(\theta, \phi) \\ y(\theta, \phi) \\ z(\theta, \phi) \end{pmatrix}$   (85)

The 3-D shape can be expanded into a complete set of basis functions. Using the spherical harmonics $Y_l^m$,

$Y_l^m(\theta, \phi) = \sqrt{\dfrac{(2l+1)\,(l-m)!}{4\pi\,(l+m)!}}\; P_l^m(\cos\theta)\, e^{im\phi}$   (86)

where $P_l^m(\cos\theta)$ are the associated Legendre polynomials defined by:

$P_l^m(x) = \dfrac{(-1)^m}{2^l\, l!}\,(1 - x^2)^{m/2}\, \dfrac{d^{l+m}}{dx^{l+m}}(x^2 - 1)^l$   (87)

the model is defined up to a desired number of complex coefficients $c_l^m$, so that:

$\vec{S}(\theta, \phi) = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} c_l^m\, Y_l^m(\theta, \phi)$   (88)

It is worthwhile to note that this procedure defines the parameter coordinates only for the vertexes; therefore an interpolation function must be used between the sample points. This introduces an artificial sub-voxel resolution that has no basis in the input data. However, the discretization of the process keeps the computational cost affordable, and a post-processing error minimization is performed in order to make the results as accurate as possible. The SPHARM framework has nevertheless become of widespread use for several reasons:

a. it is able to deal with surfaces of arbitrary connected shapes;
b. the parametrization allows one to take into account both local and global variations;
c. it is always possible to achieve a unique solution for a shape model (up to rotation);
d. the parametrization preserves areas and minimizes local distortions.

Besides, another main reason for the adoption of SPHARM is its capability to take shape variability into account and describe it in terms of the harmonic functions; the numerical coefficients can be interpreted as shape descriptors, so that their statistical behavior can easily be used to create statistical models. This is crucial for medical applications, where no dataset can be considered complete in a mathematical sense, and therefore every model trained on restricted data is required to possess the ability to reproduce, describe or predict the behavior of data the SPHARM algorithm has not been trained on. This can be achieved by considering the coefficients $c_l^m$ and their distribution: a mean value and a standard deviation are calculated, and every shape of the model can be retrieved as a linear combination of these values.
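Equation (88) is straightforward to evaluate numerically. The sketch below is an editorial illustration — random coefficients stand in for the fitted descriptors just described, and a scalar radius is expanded instead of the full vector surface — using scipy's spherical harmonics:

import numpy as np
from scipy.special import sph_harm

rng = np.random.default_rng(42)
L_MAX = 15  # truncation degree, as used for the hippocampal masks (Sec. 4.5.3)

# Random complex coefficients c_l^m standing in for fitted shape descriptors.
coeffs = {(l, m): rng.normal() + 1j * rng.normal()
          for l in range(L_MAX + 1) for m in range(-l, l + 1)}

def surface_radius(theta, phi):
    """Truncated expansion (88) for a scalar radius r(theta, phi).
    Note scipy's argument order: sph_harm(m, l, azimuth, polar)."""
    r = np.zeros_like(theta, dtype=complex)
    for (l, m), c in coeffs.items():
        r += c * sph_harm(m, l, phi, theta)
    return r.real

theta = np.linspace(0.0, np.pi, 50)       # polar angle
phi = np.linspace(0.0, 2 * np.pi, 100)    # azimuth
tt, pp = np.meshgrid(theta, phi)
print(surface_radius(tt, pp).shape)       # (100, 50) grid of radii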
The main drawback is the high complexity of the model and the difficulty of a straightforward interpretation of the coefficient variability. On the contrary, a sound interpretation would be greatly useful: for example, it could be useful to know that the hippocampal region with the highest shape variability is the head, a piece of information which is not easily inferred in this framework.

4.5.3 The SPHARM average shape algorithm

The manually labeled masks of the training set were first topologically checked in order to remove small imperfections, such as holes or protrusions; an example is shown in Fig. 14. To keep these masks as close as possible to the original manual labelings, only protrusions or holes of size smaller than 1.5 voxels were fixed. In this way the manual labelings acquire a connected topology which allows a spherical harmonic representation. The masks were then decomposed with a fixed number c = 15 of harmonic coefficients (a value suggested for many applications). As already shown in Fig. 13, this number of coefficients is sufficient for a detailed modeling of the hippocampal shapes.

Fig. 14: Comparison of hippocampal shapes with the original, unfixed (left) and the fixed (right) topology. In the left figure, some isolated voxels are clearly visible.

Once the hippocampal shapes are parametrized, an alignment process takes place. This process is in general necessary to overcome the natural misalignment caused by inter-subject variability or acquisition issues; in our case the dataset had already undergone a fine registration process, therefore only a gross first order ellipsoid (FOE) alignment is performed. This procedure is based on the registration of the FOEs (already shown in Fig. 13, case (a)), which are obtained with a first order spherical harmonic expansion. FOEs are ellipsoids, whose alignment is therefore simply performed with respect to the principal dimension (Fig. 15).

Fig. 15: Two different hippocampal labelings aligned along the principal dimension. It has to be noted that this procedure yields a rigid registration which does not modify the mask dimensions.

The hippocampal shapes, now all registered and described in a common framework, can be statistically analyzed. All the SPHARM coefficients are normalized and comparable across objects, so group analyses can be performed. In particular, this analysis was used to obtain the mean hippocampal shape for both the left and the right Hippocampus. This information is valuable for describing the Hippocampus; on the other hand, it is worthwhile to note that while this analysis describes variations around the mean shape of the training data well, no information can be given about unknown hippocampal shapes. In this sense the analysis is strongly dependent on the completeness of the training set, a condition which is difficult to fulfill for medical images. With the goal of generalizing the hippocampal shape description, a novel fast automated algorithm based on a point distribution model (FAPoD) was developed.

4.6 a novel fapod algorithm

Segmentation is a difficult task, especially for biological shapes characterized by high variability, non-exhaustive data, and so on. Another important issue descends from the lack of a reliable gold standard, the ground truth, against which to perform quantitative evaluations or comparisons. To tackle this problem, a simulated dataset was first collected.
This choice allows perfect control of the tuning parameters and of the internal variability. The simulated data were then described in a point distribution framework, defining and detecting in an automated way particularly relevant points of the training shapes. Once every training shape had been described through a set of these points and their variability, an average shape and a confidence bound were determined.

4.6.1 Simulated Data

With the aim of studying a complex shape such as the Hippocampus, we built a dataset of simulated images based on the standard Moving Picture Experts Group (MPEG) database: a set of binary images representing simple geometrical shapes or ordinary objects. One of these is shown in Fig. 16.

Fig. 16: A watch from the standard MPEG database.

The model was intended to reproduce the manual tracing of experts; consequently, the shape model we built was based on a 2-dimensional image, in order to reproduce the coronal view of an MRI scan, which is in general a convex shape and the main choice for manual tracing in several segmentation protocols [66]. The training shape underwent a noise process to simulate the presence of artifacts and the biological variability; since these have to be characterized by randomness, Gaussian noise was chosen. This noise was added both voxel-by-voxel and to specific regions of the shape contours: in the first case to reproduce acquisition imperfections, especially those due to poor signal-to-noise ratios, and in the second case to mimic the intrinsic shape variability. The noisy images had to reproduce features typical of medical image databases, therefore a comparative study was performed to establish the level of noise to be applied [105]. To this aim, the sum of squared differences (SSD) similarity function is used:

\mathrm{SSD}(I_i, I) = \| I_i - I \|^2 = (I_i - I)^T (I_i - I) = \| I_i \|^2 + \| I \|^2 - 2 I_i^T I = \| I_i \|^2 + \| I \|^2 - 2 c(I_i, I)    (89)

The SSD function can be interpreted as a measure of the correlation between the reference image I and the noisy images I_i. The noise amplitude was studied on a set of 50 simulated images, a number chosen to simulate a realistic set of medical images. This study established that, for noise values above a particular threshold, the correlation between the reference and the noisy images was lost. As an example, a noisy watch is shown in Fig. 17.

Fig. 17: An example of the noise process applied to the template watch.

Dealing with biological variability, deriving for example from differences in age or sex, is another important issue. According to the main causes of variability in magnetic resonance imaging, which are slight ghosting effects and translations or rotations due to small involuntary movements of the subjects, random scaling, rotation and translation effects were simulated both globally and locally. Fig. 18 shows an example of these effects.

Fig. 18: Visual comparison between an image representing a disk and its counterpart obtained by randomization of scaling, translations and rotations.

The training set of noisy images was then used to develop an automated algorithm for shape modeling and shape retrieval in the point distribution model framework.

4.6.2 Shape model construction

Constructing a statistical shape model basically consists of extracting the mean shape and a number of modes of variation from a collection of training samples. In statistical approaches, the dominant role is played by the pointwise description.
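Before turning to the landmark correspondence problem, a minimal sketch of what such a model computes may be useful: given landmark sets already in correspondence and aligned, the mean shape and the principal modes of variation follow from a PCA of the landmark vectors. The function below is illustrative only, under those assumptions; it is not the thesis implementation.

import numpy as np

def point_distribution_model(shapes, n_modes=5):
    """Mean shape and principal modes from aligned landmark sets.

    shapes: array (n_samples, 2 * n_landmarks), landmarks already
    brought into correspondence and aligned across samples.
    """
    mean_shape = shapes.mean(axis=0)
    centered = shapes - mean_shape
    cov = np.cov(centered, rowvar=False)
    eigval, eigvec = np.linalg.eigh(cov)
    order = np.argsort(eigval)[::-1][:n_modes]
    # new shape instances: mean + sum_k b_k * eigvec_k, with the
    # weights b_k usually bounded by +/- 3 * sqrt(eigval_k)
    return mean_shape, eigval[order], eigvec[:, order]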
An essential requirement for describing shapes within a point distribution model is that reference points, the so-called landmarks [111], are located at corresponding locations on all training samples. This requirement is generally the most challenging part of shape model construction and, at the same time, one of the major factors influencing the model quality. Manual landmarking has become unpopular not only because it is a tedious and time-consuming procedure requiring expert work, but also because a large number of landmarks is required (especially in 3-D analyses) and the reproducibility of the results is not guaranteed.

In principle, all algorithms that automatically compute correspondences actually perform a registration between the involved shapes. Even here, several possibilities can be explored. The most straightforward solution to landmark creation in 3-D is mesh-to-mesh registration. It works directly with the training meshes, and the most popular algorithms applied in this case are the Iterative Closest Point algorithm [112] and Procrustes analysis [113]. One of the shortcomings of this approach is the bias induced by the choice of a reference shape. A solution to this issue is obtained by requiring that all possible combinations of matches among landmarks be considered, at the cost of a substantial increase of the computational burden. However, the largest drawback of using a standard point matching algorithm is the restriction to similarity transformations. In samples characterized by high variability, determining corresponding points by proximity alone can lead not only to wrong correspondences but also to non-homeomorphic mappings, and thus to flipping triangles in the mesh.

Alternative approaches are mesh-to-volume registrations, which consist in adapting a deformable surface mesh to volume data (such as MR scans). In this approach a bias is introduced because of the need for an initial model, which therefore relies on the training sample. Instead of adopting and consequently adapting a template mesh, an alternative volume-to-volume approach is to register the training samples to a standard template. In this case no a priori model is required: landmarks are placed on the atlas and their corresponding points are obtained by propagating the registration deformation field. Whichever approach is chosen, the goal of robust, fully automated landmarking still holds.

After alignment, the next step is the retrieval of the mean shape and its variability. As described in the literature, the first step in building a point distribution model is to capture the best shape information through a pointwise description: every shape is described by a set of landmarks. In a general automated framework, the main issue in adopting a pointwise shape description is that neither correspondences among different shapes nor importance criteria can be adopted a priori to determine which particular contour points should be eligible to become landmarks of a particular model. For example, well known simple shapes, such as a hand, could be described by defining the fingertip extremities as landmarks; but this is not the case for complex shapes such as the hippocampal one, where no privileged anatomical points can easily or automatically be detected. These are the main reasons for our choice to adopt a statistical framework to define the landmarks of our shapes and to use a preprocessing registration procedure. Firstly, a cumulative image S was defined by summing over the contours of the training images Ii.
Then a uniform spatial sampling of S was performed: the sampling was obtained through a moving window of 10 × 10 pixels, and for each sample the pixel sum n was calculated and its value assigned to the center of the window itself. A threshold t is finally applied to keep only significant information; the value of this threshold was studied with respect to the number of surviving points and the shape reconstruction accuracy. Once the mathematical landmarks are defined, shape retrieval is straightforward; the last step is therefore to measure the accuracy of the reconstruction. For our training images, results were calculated in terms of the Dice index D, a popular error metric in image processing [114], defined by:

D = \frac{2 |A \cap B|}{|A| + |B|}    (90)

where A represents the set of reference pixels and B the set of pixels obtained by reconstruction. For simulated images the results were on average about 96.8% ± 0.4% [105].

The shape description is not the only information an observer would like to acquire. Another important issue is to establish whether or not the model has predictive power and, as a consequence, whether it is able to manage and model data never seen in the training phase. In this sense, the main information obtainable from the proposed framework is the variability, expressed in terms of standard deviations, characterizing every landmark and thus every shape.

4.6.3 Modeling the variations

The FAPoD procedure so far has established how to detect, from a set of noisy images, the N landmarks representing every shape of the training set. Once the landmarks are detected, their homologues can be identified and labeled in every shape; each shape is then represented by an N-dimensional vector \vec{x}. To best capture the shape, for every pair of consecutive landmarks a mathematical pseudo-landmark is introduced, defined as the point whose distance from the chord subtended by the two consecutive landmarks is maximum; an example is shown in Fig. 19.

Fig. 19: A mathematical pseudo-landmark is defined as the contour voxel whose distance from the chord subtended by two consecutive landmarks is maximum.

For each landmark and pseudo-landmark, a mean position and the relative standard deviation can be calculated. The mean shape is obviously obtained by considering the mean landmarks; however, what really matters in this case is the standard deviation, which admits shapes into the model only on the basis of statistical considerations. Thus, each landmark and its surrounding pixels are assigned probabilistic values according to a standard Gaussian distribution N(µ, σ), as shown in Fig. 20:

P_{landmark} = \int_{-1/2}^{1/2} N(0, \sigma) \, dx    (91)

P_{neighbors} = \int_{n - 1/2}^{n + 1/2} N(0, \sigma) \, dx    (92)

where µ = 0, σ is calculated with respect to the average determined for each landmark, and n is the distance of the neighbor from the average, measured in voxel units.

Fig. 20: An example of how the probabilistic values are associated to a landmark and its neighbors.

Once probabilities are assigned to every landmark and its neighbors, the shape reconstruction depends on the probability assigned to a contour, which in turn depends on which pixels are used for the reconstruction. It is worth noting that the probability of a pixel belonging to an edge of a shape is different from the probability of belonging to the shape itself. Besides, if the goal of the analysis is to determine a region of interest, the inner pixel analysis can be skipped, with no loss of information and a dramatic decrease of the computing requirements.
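Since the Dice index of Eq. (90) is used repeatedly in the following sections, a minimal sketch of its computation for binary masks is given below; it is illustrative, not the thesis code.

import numpy as np

def dice_index(a, b):
    """Dice overlap of Eq. (90) for two binary masks of equal shape."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

The index is 1 for a perfect overlap and 0 for disjoint masks, which makes it a convenient single-number accuracy score for segmentations.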
FAPoD compares well with the SPHARM description, in the sense that its average hippocampal shape is consistent with the one retrieved by the SPHARM analysis; besides, it requires a lower number of training Hippocampi to take into account the whole dataset variability.

4.7 ensemble classifier segmentation

Shape analysis makes an enormous contribution to identifying morphological differences among different subjects; besides, it allows statistical evaluations of these differences both globally (for example volume, convexity, symmetry, etc.) and locally (for example with PCA or p-value analysis of vertexes, as previously shown in the SPHARM framework). Another fundamental contribution is given to segmentation: firstly, because shape analysis shrinks the region of interest deserving more specific attention, which dramatically decreases computational requirements and times; secondly, because shape analysis can be used for post-processing evaluations, i.e. to establish whether the segmentation results are reasonable. However, segmentation cannot be performed through morphological information alone. A decisive role in segmentation is played by color or gray intensities; this is why a voxel-wise analysis was performed too.

4.7.1 Voxel-wise analysis with machine learning

Automated segmentation techniques are gaining increasing recognition since they not only offer the possibility of rapidly studying large databases, for example in pharmaceutical trials or genetic research, but also afford higher test-retest reliability and the robust reproducibility needed for multi-centric studies. Several packages, such as FreeSurfer [115] and FIRST [116], are available: FreeSurfer performs a cortical and sub-cortical segmentation assigning a label to each voxel, based on probabilistic information automatically estimated from a large training set of expert measurements; FIRST performs segmentations using Bayesian shape and appearance models. More examples could be provided; in any case, it can be concluded that the role of statistical learning applied to heterogeneous and imbalanced data, such as those deriving from medical datasets, has become more and more important.

In recent years a number of studies have investigated and compared the performances of these tools, especially against state-of-the-art machine learning techniques. Particularly promising results were obtained with classifiers such as SVM, Adaboost, or Ada-SVM (i.e. SVM with features automatically selected by Adaboost) [85, 86, 87]. These studies showed that Ada-SVM segmentation in particular compared favorably with the manual segmentations, while FreeSurfer gave the worst results and the most visually inconsistent segmentations. Despite numerous efforts in the literature, no automatic segmentation method is currently in wide use by clinical research groups, nor adopted for large-scale quantitative studies of hippocampal anatomy. From this point of view, the main goal of this work was to construct an accurate strategy based on supervised learning algorithms devoted to hippocampal segmentation. A classifier, trained on a set of previously labeled examples (a number of 3-D brain MR images in which the hippocampi had been previously manually segmented), classifies the voxels of a new brain MR image as belonging or not to the hippocampus. In particular, for each brain image a volume of interest (VOI), as previously described, is extracted according to the shape analysis results.
The VOI contains tens of thousands of voxels divided in two classes, hippocampal region and background. From a statistical point of view, the structural complexity of the VOIs is twofold. Firstly, the two classes are dramatically imbalanced: in each VOI, less than 5% of the voxels belong to the hippocampus. Secondly, the VOI data set is very large, consisting of 112 boxes for the first dataset and reaching 4000 boxes. Classifier performance can degrade significantly as the severity of the imbalance increases; hence, generating accurate statistical solutions for automated hippocampal segmentation is not trivial. As a consequence, an important phase of the study was the performance assessment of a Random Forest classifier [117] and of a novel algorithm combining random data undersampling with Adaboost (RUSBoost) [118]. It is worth noting that this was the first attempt to tackle the hippocampal segmentation problem with these classifiers.

4.7.2 Feature Extraction

Supervised pattern recognition systems involve taking a set of labeled examples (or features) and learning a pattern based on those examples. The features should contain information relevant to the classification task. In the analysis presented here, for each voxel a vector was obtained whose elements represent information about position, intensity, neighboring texture, and local filters. Texture information (contrast, uniformity, rugosity, regularity, etc.) was expressed using both Haar-like and Haralick features, as in [85]. This type of feature is characterized by computational simplicity. For each voxel, a value was obtained as the weighted sum of the intensities over the area covered by a template, the sum of the weights being zero [119]. Filters of size varying from 3 × 3 × 3 to 9 × 9 × 9 were used for the calculation of the Haar-like features.

The Haralick features [120] were calculated from the normalized gray-level co-occurrence matrices (GLCM) created on the m × m voxel projection sub-images of the volume of interest, with m defined for overlapping sliding windows. For each voxel, values of m varying from 3 to 9 were used. Haralick features rely on the calculation of the GLCM over the Ng gray levels, based on the assumption that the texture information of an image is contained in the spatial relationship between pairs of voxel intensities. In the co-occurrence matrix M, each element p_{ij} represents an estimate of the probability that two pixels with a specified polar separation (d, θ) have gray levels i and j; the coordinates d and θ are, respectively, the distance and the angle between the two considered pixels. As in [120], d = 1 and displacements at quantized angles θ = kπ/4, with k = 0, 1, 2, 3, were considered. As shown elsewhere [121, 122, 123], a subset of the Haralick features is sufficient to obtain a satisfactory discrimination. To establish which of the original 14 GLCM Haralick features give the best recognition rate, preliminary recognition experiments were carried out, resulting in the following configuration:

• energy:

f_1 = \sum_{ij} p_{ij}^2    (93)

• contrast:

f_2 = \sum_{n=0}^{N_g - 1} n^2 \sum_{i=1}^{N_g} \sum_{j=1}^{N_g} p_{ij}, \quad |i - j| = n    (94)

• correlation:

f_3 = \frac{\sum_{ij} (ij) \, p_{ij} - \mu_x \mu_y}{\sigma_x \sigma_y}    (95)

where µx, µy, σx and σy are the means and standard deviations of px and py, the partial probability density functions obtained by summing the rows or the columns of p_{ij};
• inverse difference moment:

f_4 = \sum_{ij} \frac{p_{ij}}{1 + (i - j)^2}    (96)

Finally, the gradients calculated along different directions and at different distances, and the relative positions x, y and z of the voxels, were included as additional features. The best analysis configuration, expressed by the highest mean value of the metric, was obtained with 315 features.

4.7.3 Classification methods

The goal of supervised learning algorithms is not to learn an exact representation of the training data itself, but rather to build a statistical model of the process which generates the data. In this sense, the choice to tackle the hippocampal segmentation problem within a statistical pattern recognition framework is dictated by the necessity of dealing with heterogeneous data, biological variability, pathological conditions and several other characteristics which prevent a deterministic description. This idea is summarized by the generalization error, which is the sum of the squared bias plus the variance [124]. Bias and variance are complementary: a simple statistical model has a small variance but a large bias; on the contrary, a model with excessive flexibility with respect to the training dataset has a small bias but a large variance. The optimum balance between bias and variance depends on the values of the model parameters; in this regard there are two strategies:

a. pruning, intended as removing superfluous parameters from starting models characterized by a huge number of them;

b. regularization, involving the addition of a penalty term to the error function through which the model is learned.

To show the bias-variance dichotomy, it is convenient to consider the ideal case of an infinite-size dataset used to model a scalar output y, for which the sum of squared errors E is:

E = \frac{1}{2} \int \{ y(\vec{x}) - \langle t | \vec{x} \rangle \}^2 p(\vec{x}) \, d\vec{x} + \frac{1}{2} \int \{ \langle t^2 | \vec{x} \rangle - \langle t | \vec{x} \rangle^2 \} p(\vec{x}) \, d\vec{x}    (97)

where p(\vec{x}) is the unconditional density of the input data and \langle t | \vec{x} \rangle the conditional mean of the target variable t (similarly for \langle t^2 | \vec{x} \rangle). The second term is model independent, in the sense that it does not depend on the model prediction y; this is why it is called the intrinsic error. Therefore, a statistical model can be considered perfect when it makes the first term vanish. As a consequence, in a perfect statistical model:

y(\vec{x}) = \langle t | \vec{x} \rangle    (98)

In practical situations, the ensemble of training sets arising from a finite-size dataset D allows only an estimation of the desired quantities. For example, the estimated error is given by:

\mathbb{E}_D[\{ y(\vec{x}) - \langle t | \vec{x} \rangle \}^2] = \{ \mathbb{E}_D[y(\vec{x})] - \langle t | \vec{x} \rangle \}^2 + \mathbb{E}_D[\{ y(\vec{x}) - \mathbb{E}_D[y(\vec{x})] \}^2]    (99)

The first term is by definition the squared bias and the second term the variance; the main problem in generalization is, in the end, minimizing these two quantities. To assess how different classifiers manage the generalization error in the particular case of hippocampal segmentation, several tests were carried out. In particular, two classifiers whose performances had never been tested before in this particular field of neuroimaging, i.e. Random Forest and RUSBoost, were chosen. The reasons for this choice are the well known excellence in accuracy of Random Forest among current algorithms, while RUSBoost, as a novel algorithm especially designed to deal with imbalanced datasets, is a straightforward choice given the extremely high skewness of the data distribution between hippocampal and background voxels.
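As a purely illustrative aside, the decomposition of Eq. (99) can be demonstrated numerically: fitting polynomial models of increasing flexibility to many noisy replicas of the same dataset shows the squared bias shrinking while the variance grows. The toy problem below (a sine target, with hypothetical noise level and sample sizes) is a sketch, not part of the thesis experiments.

import numpy as np

rng = np.random.default_rng(0)

def true_f(x):
    return np.sin(2 * np.pi * x)

x_eval = np.linspace(0, 1, 50)
for degree in (1, 3, 9):          # simple -> flexible models
    preds = []
    for _ in range(200):          # ensemble of finite-size datasets D
        x = rng.uniform(0, 1, 20)
        t = true_f(x) + rng.normal(0, 0.3, x.size)
        coef = np.polyfit(x, t, degree)
        preds.append(np.polyval(coef, x_eval))
    preds = np.array(preds)
    # squared bias: deviation of the average prediction from the truth
    bias2 = np.mean((preds.mean(axis=0) - true_f(x_eval)) ** 2)
    # variance: spread of the predictions around their own average
    var = preds.var(axis=0).mean()
    print(f"degree {degree}: bias^2 = {bias2:.3f}, variance = {var:.3f}")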
4.7.4 Random Forests

A random forest is a combination of tree classifiers, where each tree depends on the values of a vector sampled randomly and independently for all trees in the forest. The generalization error of a forest converges to a limit as the number of trees in the forest becomes large, and depends on the strength of the individual trees in the forest and on the correlation between them. Using a random selection of features to split each node yields error rates that compare favorably to other machine learning techniques (such as Adaboost) but are more robust with respect to noise. Internal estimates monitor error, strength and correlation, and these are used to show the response to increasing the number of features used in the splitting; internal estimates are also used to measure variable importance.

Starting from a training set, seen as a collection of examples, Random Forest experiments use a combination of bagging and random feature selection. A number of new training sets are drawn from the original one, with replacement; a tree is then grown on each new training set using random feature selection. The grown trees are not pruned. There are two reasons for using bagging: the first is that it enhances accuracy when random feature selection is adopted; the second is that bagging can be used to give ongoing estimates of the generalization error, as well as estimates of the strength (the accuracy of the individual classifiers) and of the correlation (the dependence among individual classifiers).

Given a specific training set T with N examples of d features, k bootstrap training sets Tk are formed. For every new training set Tk a tree is grown, giving as a result a classifier h(\vec{x}, Tk), where \vec{x} is an input feature vector. The prediction y for every \vec{x} is obtained by majority voting only over those classifiers whose training set does not contain \vec{x}; in this way a so-called out-of-bag estimate of the generalization error is achieved. From an operative point of view, this technique is realized by leaving out about a third of the whole training set at every bootstrap sampling, which is then used to internally test the forest performance.

The simplest random forest with random features is grown by selecting at random a small group of features for every node to split on; the terminal nodes are also called leaves. The methodology is that of classification and regression trees (CART). A size f for the group of features to randomly sample is fixed; according to the original algorithm [117], this value is set to f = \sqrt{d}, where d = 315 is the total number of features. It is worth noting that the number of sampled features f has in general to be far smaller than the total: the choice f = \sqrt{d} ensures this randomness condition. Once the features are randomly selected, a split of the examples is performed maximizing the purity. A synthetic overview of the Random Forest algorithm is shown in Fig. 21, according to the following schematic representation:

Data: training set T (N examples, d features)
Result: Random Forest classifier
initialization
for t = 1 .. k (bootstrap size) do
    bootstrap: split the bootstrap samples into training and test with a 2:1 ratio
    calculate the leaf sizes l1, l2
    while l1 != 1 and l2 != 1 do
        randomly sample f features
        split the examples into two leaves, maximizing the purity
    end
end
Algorithm 1: Random Forest algorithm scheme.

Fig. 21: The Random Forest algorithm:
a training set with N examples of d features is sampled k times with bootstrap. Each bootstrapped training set Ti is internally divided into a training and a test set with a 2:1 ratio. A random sample of f features is then drawn, and those features are used to split the set into two leaves. The procedure is iterated until the leaves reach a desired size; in classification this is usually set to one.

However, the only parameter to which Random Forests were proven to be somewhat sensitive is precisely f. In fact, Breiman's original work proved that both the correlation and the strength of the classifiers are influenced by, and directly proportional to, the parameter f. The problem is that increasing the correlation increases the error rate, whilst increasing the strength of the classifiers decreases it. The main tuning of Random Forests is therefore the choice of the number of features to sample; thanks to the out-of-bag estimates, however, this is not a complex or laborious procedure. Results are evaluated by comparing the classifier prediction and the manual labellings through the similarity index, also called Dice index, as defined by equation (90).

4.7.5 RUSBoost

Class imbalance is a common problem in many applications. This is particularly true in image processing, where the regions of interest are likely to fill only a small fraction of the entire field of view. When the examples of one class in the training dataset outnumber the examples of the other class, traditional data mining methods tend to create suboptimal classification models. Several techniques have been used to mitigate the problem of class imbalance, including data sampling and boosting [125]. A hybrid approach, namely RUSBoost, was designed for imbalanced classification problems.

Data sampling balances the class distribution in the training sample either by adding examples to the minority class (oversampling) or by removing examples from the majority class (undersampling). The goal of these techniques is to reach a balanced class ratio, usually by bootstrap. Both undersampling and oversampling have their benefits and drawbacks: the main drawback associated with undersampling is the loss of information from the training sample, whilst for oversampling it is the overfitting deriving from the inclusion of duplicated examples [126, 127]. If computational issues are of concern, undersampling should be preferred. Another technique usually adopted to improve classification performance is boosting. Such a technique is particularly effective at dealing with class imbalance because the minority class examples, which are the most likely to be misclassified, receive higher weights in successive iterations. The combination of sampling and boosting is the core idea behind the RUSBoost method. Random undersampling is a technique that randomly removes examples from the majority class until the desired balance is achieved. The algorithm assigns to each example i of the training set T the weight

D_1(i) = \frac{1}{N} \quad \forall i = 1, 2, \ldots, N    (100)

Then, for each round r = 1, 2, ..., R, the learning phase described in Algorithm 2 is performed, to maximize the match between the predicted label and the true label y_i for every example. The main parameter of the learning phase is the number of rounds R over which the learning is performed.

Data: training set T (N examples, d features)
Result: RUSBoost classifier
initialization
for t = 1 .. k (bootstrap size) do
    create a temporary training set St
    while r < R (number of rounds) do
        random undersampling, to achieve the desired ratio between the majority and the minority class
        call a weak learner → a hypothesis h_t(\vec{x}_i, y_i)
        compute the error function \epsilon_t = \sum_{(\vec{x}_i, y_i)} D_t(i) (1 - h_t(\vec{x}_i, y_i) + h_t(\vec{x}_i, y))
        update the weights D_t(i) → D_{t+1}(i)
        normalize the weights
    end
end
Algorithm 2: RUSBoost algorithm scheme.
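The random undersampling step at the heart of Algorithm 2 can be sketched as follows; the function below is a minimal illustration with hypothetical parameter names, not the implementation used in this work.

import numpy as np

def random_undersample(X, y, majority=0, ratio=1.0, rng=None):
    """Randomly drop majority-class examples until a desired
    minority:majority ratio is reached (the RUS step of RUSBoost)."""
    if rng is None:
        rng = np.random.default_rng()
    maj = np.flatnonzero(y == majority)
    mino = np.flatnonzero(y != majority)
    # keep ratio * n_minority majority examples (at most all of them)
    n_keep = min(int(ratio * mino.size), maj.size)
    keep = rng.choice(maj, size=n_keep, replace=False)
    idx = np.concatenate([keep, mino])
    return X[idx], y[idx]

Applied to the hippocampal VOIs, where less than 5% of the voxels belong to the minority class, this step discards most background voxels of the temporary training set at each boosting round.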
Finally, as in the case of Random Forests, performances were evaluated with the similarity or Dice index.

4.8 analyses on distributed infrastructures

In this section the implementation of the segmentation pipeline on the local computer cluster and on the geographically distributed grid-based computing infrastructure is discussed. The computational requests coming from every scientific community grow daily, and the capability of exploiting the Grid, and distributed computing environments in general, is of crucial importance. However, the most common environment for medical imaging, i.e. the LONI pipeline, does not provide any usable plugin to exploit those computational infrastructures for Torque-based systems, thereby limiting the adoption of the environment itself. In this work a solution to this problem is shown, allowing end users to access a number of resources such as the European Grid Infrastructure (EGI), a local batch farm and dedicated servers. The proposed approach can be useful for large-scale studies.

4.8.1 Medical imaging and distributed environments

The field of medical imaging has seen an enormous development in recent years. Image databases made of thousands of medical images are currently available to support clinical diagnosis; this is particularly true for brain diseases. At the same time, more and more sophisticated software and computationally intensive algorithms have been implemented to extract useful information from medical images. Many medical image processing applications would greatly benefit from grids [128] [129]: run-time reduction, sharing of data collections and platform/hardware independent configurations are just a few examples [130, 131]. Access to distributed computing is crucial to meet the needs of the growing neuroimaging community, together with the increase in the amount of available data. This is also true because screening programs are now in the development phase, and feasibility studies demonstrating the potential of Grid tools for medical applications are spreading all over the world [132] [133] [134]. Unfortunately, high computational and storage requirements, as well as the need for large reference image datasets of normal subjects, are limiting their use to advanced academic hospitals; besides, the management of complex analysis pipelines, combining several processes and routines hard to assemble together, makes the adoption of distributed computing infrastructures very challenging, at least without a change of paradigm. This paradigm nowadays seems to be represented by workflow technologies [135].

Workflow technologies are emerging as the dominant approach to coordinating groups of distributed services; in particular, this is true for Grid computing services. The background philosophy of such an approach is that if a client (another service or an end user) makes an invocation on a remote server, it should not concern itself with the inner protocols (for example, the language the service is written in) in order to take advantage of its functionalities. This is the approach pursued by workflows.
The main idea is that each service is independent from the others; this offers a great degree of flexibility and scalability for evolving applications. Although the concept of service-oriented architectures is not new, this paradigm has seen widespread adoption through the Web services approach, which makes use of a suite of standards such as XML, WSDL and SOAP to facilitate service interoperability [136].

One of the most used workflow managers for medical image processing is the LONI pipeline (LP) [137], a graphical workbench developed by the Laboratory of Neuro Imaging (http://pipeline.loni.ucla.edu) with the goal of managing and executing neuroimaging processing algorithms. The LP is a simple and efficient computing solution to problems of organization, handling and storage of intermediate data, as well as for processing data and performing modular analyses. However, several requirements must be fulfilled to run the environment on computer farms. In particular, LP requires, or at least suggests, the CentOS operating system and the Java Platform Standard Edition to be installed. In order to run the analysis, the user has to be sure that LP is able to submit execution requests to the computing infrastructure available in that particular context. While the requirements about the operating system and Java can easily be met or worked around, the submission requirement is a particularly strong constraint, because different open source resource managers can be used and are often preferred. As far as we know, no dedicated plugin for Torque or for the gLite/EMI Grid infrastructure has been released, limiting the adoption of the LP environment. To tackle these problems, different fully automated algorithms have been proposed; all of them are characterized by intensive computations and storage management issues, which naturally call for the use of distributed computing. However, their adoption and deployment are still challenging, and reports illustrating applications beyond the demonstrative level are lacking.

4.8.2 Workflow managers

Web services provide a solution to a complex problem: coordinating a group of services to achieve a shared task or goal. Workflows are the glue for joining together distributed services which are owned and maintained by different organizations. The plethora of different workflow specifications, standards, frameworks and toolkits causes scientists to prefer reinventing the wheel, implementing their own proprietary workflow language, rather than investing time in existing workflow technologies. In fact, several scientific workflow managers already exist: Taverna [138], Kepler [139] and Triana [140]. In this work a web service approach has been used to overcome the main issue of making the LP available with Torque and, at the same time, to generalize LP applications to distributed infrastructures. The two aspects are different sides of the same goal: to make end users able to access distributed resources for intensive computations while keeping the workflow management easy to handle. This is particularly true for grids: although feasibility studies have demonstrated that grids are able to tackle several issues in medical imaging, grid adoption in practice is still a challenging problem [141]. The success of contemporary computational neuroscience depends on large amounts of heterogeneous data, powerful computational resources and dynamic web services.
To provide an extensible framework for the interoperability of these resources in neuroimaging, LP exploits a decentralized infrastructure, where data, tools and services are linked via an external inter-resource mediating layer whose backbone schema is formed by the standard eXtensible Markup Language (XML). The pipeline environment does not require an application programming interface; its graphical user interface was originally programmed as a lightweight Java 1.3 environment [142]. The use of the LONI Pipeline has spread, and several works have demonstrated the advantages that can be achieved through this powerful tool [143]. In particular, the XML resource description allows the LP infrastructure to facilitate the integration of disparate resources and provides a natural and comprehensive data provenance. It also enables the broad dissemination of resource metadata descriptions via web services and the constructive utilization of multidisciplinary expertise by experts, novice users and trainees. The LP features include a distributed grid-enabled infrastructure, a virtualized execution environment, efficient integration, data provenance, validation and distribution of new computational tools, automated data format conversion, and an intuitive graphical user interface. As already mentioned, however, LONI does not provide the plugin needed to submit and manage jobs on the gLite/EMI grid [144] and on a batch farm based on Torque.

4.8.3 Workflow Implementation

In order to fill the existing gap in submitting and monitoring LP jobs with the Torque resource manager and, more generally, to provide a manageable framework for grids, we decided to use a meta-scheduler based on the Job Submission Tool (JST) [135], which is able to submit jobs to different computing architectures, exposing to the end user only a simple Web Service interface based on the REST protocol (Fig. 22). Each application composing the workflow is executed on the infrastructure that best fits its requirements in terms of the computational time needed for the job execution and the input data size: for example, short jobs that access a large amount of data will be executed on the local batch farm, as this increases the available bandwidth for reading the input data.

Fig. 22: The simplified workflow implementation. The reddish diamonds are the input/output modules; the backend analysis modules are represented by turquoise diamonds. To emphasize the possibility of dynamically choosing the local farm or the grid infrastructure, these backend modules are shown in yellow.

In order to exploit JST with the LP, a SOAP-based web service, not already available, had to be added. Using this web service interface, the job submission can be performed transparently through the Front-end, which provides a set of web service calls to submit jobs, monitor them and retrieve their output. The JST framework is composed of the Front-end, which hides the complexity of the underlying layer, developed in Java and based on Apache Tomcat (it exploits the MySQL RDBMS), and of the Back-end, which takes care of job execution or submission to the batch system, based on Torque/PBS, or to the EGI grid infrastructures. The workflow has been implemented as a sequence of calls to the Front-end web services. LP actually supports SOAP-based web services, but unfortunately it seems unable to properly handle the WSDL file of the Front-end web services; a workaround has been implemented using a SOAP client, wsdlpull v.1.24, completely free and written in C++.
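For illustration only, the submit-and-poll pattern implemented by the insertJob and getStatus calls might look as follows over a plain REST interface. The endpoint names, payloads and base URL below are hypothetical placeholders, not the actual JST API.

import time
import requests  # third-party HTTP client

BASE = "https://jst-frontend.example.org/api"  # placeholder URL

def run_job(executable, args, poll_seconds=60):
    """Submit a job, poll its status until completion, return the
    output URL. Endpoints and JSON fields are illustrative only."""
    resp = requests.post(f"{BASE}/submit",
                         json={"exe": executable, "args": args})
    job_id = resp.json()["id"]
    while True:
        state = requests.get(f"{BASE}/status/{job_id}").json()["state"]
        if state in ("DONE", "FAILED"):
            break
        time.sleep(poll_seconds)  # periodic monitoring, as in getStatus
    # upon completion, retrieve the URL of the produced data
    return requests.get(f"{BASE}/output/{job_id}").json()["url"]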
4.8.4 Distributed infrastructure employment

The local farm Bari Computer Center for Science (BC2S) is a computing infrastructure built to tackle different use cases from different research teams involved in astroparticle physics, nuclear physics, medical physics, statistics, bioinformatics, theoretical physics, etc. The available computing nodes provide up to 5000 CPU cores and about 1.8 PByte of storage. The operating system is Linux; the primary release is Scientific Linux (SL) 5, put together by Fermilab, CERN and various other labs and universities around the world with the main purpose of having a common install base for the various experimenters. A Debian distribution is available too. The storage system is based on the Lustre distributed file system, in order to improve data-reading performance and to simplify the users' activities through the adoption of a single distributed file system. Torque and Maui build up the batch system and manage the job queue.

The implemented solution has been designed to take advantage of the distributed computing resources available in the Grid-based infrastructure as well. The applications, i.e. the executable modules in the LONI workflow, can be executed both on the computing farm described above and on the EGI [145] distributed grid infrastructure, which is composed of about 300 sites geographically distributed around the world. In our experiments the jobs were submitted to the grid using the Workload Manager System (WMS) service offered to users of the biomed Virtual Organization. This gives access to about two hundred Computing Elements, and the great abundance of resources can be an advantage thanks to the reduction of the average job response time; however, it has some drawbacks: the execution environment is not completely controlled, due to the heterogeneity of the resources (pre-installed software, supported data transfer protocols, etc.). This can have a significant impact on the job success rate. In order to manage this complication, we conducted preliminary tests (using only one image) aimed at assessing the feasibility, identifying adverse events and predicting the performance. The results collected from these "small-scale" tests were used to adjust the implemented solutions in order to increase the success probability of the "full-scale" test (using the complete image dataset). The preliminary test results suggested the following improvements: the computing elements showing problems with the Java run-time environment were excluded from the available resources (an adjustment that can be done by modifying the JDL Requirements field), and fallback and recovery solutions were implemented for the storage operations. Input data reading and output data writing are very critical phases, being the major job success/failure factors. In this regard, multiple data transfer protocols (srm, gridftp, http) and multiple storage elements have been used in our implementation. It is worth noting that the described tuning can be done dynamically: the system is able to take charge of these changes without stopping or restarting the job submission operations.

4.8.5 Grid services and interface

Data management is a critical issue when dealing with distributed systems [146] [147]. Applications usually need computing power, but the requirements of data sharing, storage and transfer are compelling as well. This is particularly true for medical image analyses, which require file storage and transfers to be efficient and reliable.
Besides, non-technical issues concerning privacy tend to minimize data replication and data transfer. The data transfer and replication issues have been approached from two different perspectives. Algorithmically, once the data are loaded by the end user, all the information needed by our segmentation process, based on voxel intensities, is extracted and anonymously stored or transferred to the processing nodes; no clinical information about age, sex and so on is used, therefore the privacy policy is not so compelling. However, there is no certainty that further developments will not require managing this kind of data. This is why particular attention has been given to protecting data information, and security protocols have been provided against known vulnerabilities.

4.8.6 Security and Data Management

All the steps performed on the grid infrastructure, both in terms of data management and job submission, are executed only after a strong authentication based on X509 certificates is fulfilled. This provides a good level of data protection. All the jobs submitted on the local batch farm fulfill the standard authentication based on Unix rules and permissions. In order to achieve good performance and reliability in accessing the data analyzed during this activity, we exploited the standard Data Management tools offered by the gLite/EMI grid infrastructure, namely the lcg-cp tool for copying data and an SRM Storage Element based on Lustre plus StoRM for the SRM layer. The advantage in this case is that all the files available on the local farm through the Lustre storage element can also be used over the grid infrastructure, reading them through an SRM interface.

4.8.7 Segmentation workflow deployment

From the computational point of view, the presented algorithm can be summarized in three steps:

• Training dataset selection: to improve the algorithm performance, for each test image a training subset was selected through the Pearson correlation coefficient between the test image itself and the training dataset;

• Feature extraction: all voxels included in the selected dataset were characterized by 315 features computed from local information such as image intensity, voxel positions, Haar-like filters and selected local Haralick features;

• Voxel classification: an ensemble classifier is used to classify voxels as belonging or not belonging to the hippocampus.

Each module of the pipeline can be considered as an independent service, not necessarily restricted to the neuroimaging field; the most relevant steps of the analysis and the main parameters are represented in Fig. 23. The end user uploads an MRI, which is compared with the stored images; a similarity analysis is performed by means of correlation measures, and an adaptive training set is selected. The number of correlated images to use for training depends on the desired degree of accuracy, but also on storage and time requirements; it is one of the main parameters to be tuned. The features have to reproduce the textural behavior according to the intensity information. For 3-D images, features are extracted voxel by voxel: statistical features such as mean, standard deviation, gradients, entropy, range and skewness are calculated in cubic windows of varying size (3 × 3 × 3, ..., 9 × 9 × 9) centered on the voxel under consideration (a sketch of this windowed computation is given below). Besides, position features and textural features are considered: Haralick features (energy, correlation, contrast and homogeneity) and Haar-like features have been chosen.
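A minimal sketch of the windowed statistical features (mean, standard deviation and range over cubic neighborhoods of increasing size) is given below; the function is illustrative only and not the Matlab code of the pipeline.

import numpy as np
from scipy import ndimage

def window_features(volume, sizes=(3, 5, 7, 9)):
    """Per-voxel statistics in cubic windows of varying size:
    a sketch of the mean / std / range feature family."""
    volume = np.asarray(volume, dtype=float)
    feats = []
    for s in sizes:
        # local mean and local mean of squares via box filters
        mean = ndimage.uniform_filter(volume, size=s)
        sq_mean = ndimage.uniform_filter(volume ** 2, size=s)
        std = np.sqrt(np.maximum(sq_mean - mean ** 2, 0.0))
        rng = (ndimage.maximum_filter(volume, size=s)
               - ndimage.minimum_filter(volume, size=s))
        feats.extend([mean, std, rng])
    return np.stack(feats, axis=-1)  # shape (x, y, z, n_features)

Separable box filters keep the cost linear in the volume size, which matters when hundreds of such feature maps must be computed per scan.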
A feature file has a typical size of 150 MB; this is an important aspect both for storage and for the upload on the worker nodes. The selected training set images are then used by the Random Forest classifier.

Fig. 23: A simplified representation of the hippocampus segmentation algorithm. Particular emphasis is given to computational issues: parameters and critical aspects are shown in round boxes, processing steps in rectangles and files in diamonds. The distributed computing part is enclosed by a dotted line, while the end user interface is shown in reddish diamonds.

The classifier training depends on several parameters that have to be adjusted, such as the number of trees to grow or the split criterion; besides, the performance of the classifier strongly depends on the discriminative power of the features, therefore feature selection is necessary to improve the performance. Finally, several post-processing analyses can be performed, such as thresholding or filtering; the final output is the segmented image, downloadable by the end user.

4.8.8 Workflow setup

The algorithm has been developed with the LONI pipeline workflow manager: Fig. 24 shows a simplified version of the corresponding implementation, where the main analysis steps are modularized and therefore able to be deployed on distributed computing infrastructures.

Fig. 24: The segmentation algorithm in its LP implementation. It consists of four main modules: the Adaptive Training module measures the correlation between the image to be segmented and the training set; the Feature Extraction module calculates the features on the testing image according to the volume of interest determined by the previous module; the Random Forest Training module performs the learning phase for the chosen classifier; finally, the segmentation is returned by the Random Forest Prediction module. Each module is compiled and therefore able to run on a distributed computing structure.

Each module executes bash scripts that transparently handle the submission and monitoring of farm/grid jobs which, in turn, execute the Matlab (http://www.mathworks.it/products/matlab/) code compiled with MCR (with the -nojvm compile option to disable the Java virtual machine, in order to speed up our non-graphical applications). The first processing step, in charge of the VOI training, is the least computationally intensive module, but the most data intensive, since it needs access to all the images composing the database. It is therefore a candidate to run on the local farm, where the input data can be read without significant latency. Moreover, this module can rely on the Matlab software pre-installed on the local farm; the availability of the same software environment cannot be guaranteed on all the Grid worker nodes where the other modules can run. For those nodes, the wrapper script is in charge of downloading the MCR package from a configured repository and installing it before executing the application. The workflow is user-friendly thanks to the LONI module grouping feature: at high level, only the data dependencies among the tasks are highlighted, whereas the details related to the static input data and parameters of each module can be retrieved by browsing inside the module group. In fact, each box in Fig. 24 has the internal structure shown in Fig. 25, consisting of three different modules in cascade.
Fig. 25: Each module hides the implementation details; in particular, the InputCheck, insertJob and getStatus modules allow the workflow to be controlled transparently, without the need to be concerned with submission or monitoring issues.

Before each processing step, the existence of the input data provided by the previous module is checked by the conditional block (InputCheck), in order to avoid submitting jobs that are doomed to fail. The insertJob module is in charge of the SOAP request to the JST web service (using the wsdlpull SOAP client) for the job submission with the proper associated arguments; the SOAP response contains the job identifier to be used for later monitoring. The getStatus module implements a loop: the SOAP client periodically sends a status request to the JST web server, using the job identifier returned by the insertJob module, in order to monitor the job status until its completion, as shown in the sequence diagram in Fig. 26. Upon job completion, the URL of the produced data is retrieved by calling the JST web service again.

4.8.9 Summary

The development of neuroimaging and signal processing has made the visualization and measurement of pathological brain changes in vivo possible, producing a radical change not only in the field of scientific research, but also in everyday clinical practice. The availability of distributed computing software environments and adequate infrastructures is therefore of fundamental importance. In this study the LONI pipeline processing environment was used, a user-friendly and efficient workflow manager for complex analysis pipelines. A study of grid deployment was also performed, with the aim of creating automated segmentation algorithms for large screening trials. Several tests were carried out, both on the local computer farm BC2S and on the EGI.

Fig. 26: Web service call sequence implemented in the LP modules.

BC2S is a distributed computing infrastructure consisting of about 5000 CPUs and up to 1.8 PB of storage, while the EGI consists of about 300 geographically distributed sites around the world. In particular, all the results presented in this study were obtained on the BC2S using the 56 MRIs at our disposal. In addition, a feasibility study on about 3000 images (replicas of the 56 original ones) was successfully performed on the EGI. The proposed method tackles two different problems that distributed infrastructures have to deal with: it does not strongly suffer from performance deterioration due to overhead and, at the same time, its job failure rate is reduced to zero. In particular, the method presented allows the use of the LP with Torque and, at the same time, with grids. With the use of a workflow manager, the end user can run already available workflows, modify them before execution, or build completely new analysis pipelines. This option is very powerful. In the future we would like to develop a simple web interface to allow users to exploit an already available workflow by changing only the configuration parameters and the input files. In this way it would be possible for users to execute analyses without specific expertise in grid management. This web interface would also provide the needed support for a strong authentication mechanism. In our setup all the analysis modules were written in Matlab or ITK (http://www.itk.org); besides, the flexibility of the web service methods should be further investigated.
Failure management is a challenging problem in production grids, with the consequence that jobs need continuous monitoring. Moreover, when data transfers are huge, the overall times grow steeply with job failures. Data transfer therefore still remains the most limiting factor for grid effectiveness. The number of correlated images is, in this sense, a crucial parameter. This is of course an ad hoc strategy which is difficult to generalize; a further improvement of this model should be concerned with the calculation of the smallest amount of data to transfer.

5 experimental results

In this chapter an overview of the experimental results is given. According to the proposed combined strategy, results are presented for three different analyses: segmentation, classification and computing performances. The work developed in this thesis followed, as previously explained, three main guidelines concerning the image segmentation, summarized in Table 4:

Segmentation performances | A cross-validation analysis on magnetic resonance brain scans at 1.0 T is performed. The segmentation model is applied to 1.5 T and 3.0 T MR brain scans from the ADNI database. Performances are evaluated by comparison of the automated segmentations with manually traced labellings; several error metrics are provided.

Classification performances | Hippocampal volumes are measured as an atrophy index. Longitudinal analyses over ADNI data are performed. The discriminative power among healthy controls, MCI and AD subjects is assessed through variance analysis.

Computing performances | Workflows for fully automated segmentation are deployed both on the local cluster BC2S and on the Grid. Failure rate and execution times are measured to evaluate the computing performances.

Table 4: A schematic overview of the analyses presented in this work.

5.1 hippocampus segmentation

In this section a detailed overview of the experimental results concerning the hippocampal segmentations is provided. Let us first briefly discuss the results of the registration. Starting from the original 181 × 145 × 181 brain scans, a rigid registration was performed using the ICBM152 template. This registration was used as the initial guess for a refined affine registration. In this way, restricted regions of interest of dimension 50 × 60 × 60 were extracted for both the left and the right Hippocampi; these hippocampal boxes were used for the further analyses. Table 5 summarizes the mean hippocampal volume for each hemisphere.

                               Volume (voxels)    Core - 60% (voxels)
Right Hippocampus
  rigid transform              24569              202
  affine transform             14820              2578
Left Hippocampus
  rigid transform              24397              9
  affine transform             13256              2633

Table 5: A schematic summary of the registration volumes. The volume represents the total average hippocampal volume, while the core is the volume of the inner Hippocampus, namely the part accounting for 60% of the total volume.

As Table 5 shows, after the rigid registrations the overall hippocampal regions of interest are far larger than desired; in fact, from the literature the Hippocampus is known to measure on average about 3200 mm3. On the contrary, the affine registration allows the region of interest to be refined; the main result in this case is given by the core value. The voxels representing the inner 60% of the Hippocampus, namely the core, overlap up to 2578 mm3 for the right hemisphere and 2633 mm3 for the left one.
This result is in fact a measure of the goodness of the registration. Aside from the biological variability, the 60% cores of the hippocampal regions are indeed close to the expected volumes, which should have nominal values of about 2000 mm3.

5.1.1 Exploratory Analysis

Once the putative hippocampal regions of interest are extracted, a refined analysis of the boxes can be performed. The dataset used for this analysis consisted of 56 MRI scans and the corresponding manual hippocampal labellings, as described in Chap. 4. First of all, the box intensity distributions were explored. The gray level distributions differ significantly from one scan to another, as shown in the example of Fig. 27.

Fig. 27: Comparison between the gray level distributions of two randomly chosen right boxes, as obtained after the registration preprocessing.

It can be noticed how the registration processing has different impacts even by looking at the gray level distributions. In fact, the number of occurrences of black voxels (i.e. intensity i = 0) can be used as a measure of how many background voxels were artificially introduced by registration, and therefore as a measure of how similar the registered image and the template were. An overall view of the gray level distribution can be given by the cumulative image: in this case statistical fluctuations are averaged and background voxels are disregarded, so that a smoother distribution is obtained. Fig. 28 shows this effect for the cumulative distribution of the right boxes. The left tail is significantly smoothed, which can easily be interpreted: the great number of background voxels observed in Fig. 27 was a random effect introduced by registration.

Fig. 28: The cumulative image allows an evaluation of how the noise introduced by registration, and the statistical fluctuations among the different images, affect especially the left tail of the gray level distribution.

It is also interesting to note that hippocampal voxels contribute continuously to the gray level distribution: it is not possible to classify a voxel as belonging or not to the Hippocampus on the basis of intensity considerations alone. In fact, even considering only the true hippocampal voxels of a box (as manually labeled by the experts), there is a huge number of voxels with an intensity almost equal to zero. For the considered dataset, the hippocampal intensities had a median m of about 0.45 and an interquartile range IQR of about 0.25. It is therefore interesting to note that the outliers of the distribution are found above the 0.90 and below the 0.10 intensity levels; taking this into account, it is possible to estimate an upper and a lower threshold for the hippocampal voxels to be considered without affecting the segmentation accuracy.

Another important consideration arises from the comparison of the total number of hippocampal voxels and the total box size. As Fig. 27 shows, the total number of hippocampal voxels is about 24 × 10^3, while the total number of examples is 10 × 10^6, resulting in a very imbalanced dataset. This point will be crucial for further considerations; in particular, this imbalance is the main reason for the choice of applying classifiers such as Random Forests or RUSBoost, which are specifically designed to manage imbalanced datasets, to the hippocampal segmentation task. Further insight about the data sample is obtained by studying the similarity among the boxes.
Further insight about the data sample is obtained by studying the similarity among the boxes. The similarity is investigated using the Pearson's correlation coefficient r, calculated over voxel-by-voxel intensities; for two images I1 and I2, if C denotes the covariance:

r = \frac{C(I_1, I_2)}{\sqrt{C(I_1, I_1)\, C(I_2, I_2)}}

It is worthwhile to note that the correlation has to be computed only over a small region of the boxes, both for computational issues and for interpretative concerns. In fact, since the goal is to segment the Hippocampus, what is interesting to observe is whether the hippocampal boxes can be considered similar in the region containing the Hippocampus itself. As a consequence, only the core voxels have to be used for this computation. These core or inner voxels can be determined, according to the FAPoD VOI hunter introduced in Chap. 4, as those covering the 60% of the hippocampal mean shape. Therefore, once the boxes were extracted, a FAPoD analysis was performed on them to model the average hippocampal shape, both for the left and the right Hippocampi. The FAPoD algorithm gives as a result a probability map for the hippocampal voxel locations, and therefore it allows the identification of a region containing the Hippocampus which is smaller than the box itself. Besides, the probability map can be conveniently and soundly thresholded to keep only the higher probability voxels for the correlation computation. In particular, the previously cited threshold of 60% was used to detect the inner hippocampal voxels, i.e. those whose probability of belonging to the Hippocampus exceeded the 0.6 assigned by FAPoD.

5.1.2 VOI extraction

The manual labellings described in Chap. 4 were used to train the FAPoD shape model in cross-validation. The main result of this analysis was the description of the hippocampal shape through a probability map. As an example, Fig. 29 shows a hippocampal labelling contained in the FAPoD volume.

Fig. 29: The figure shows the bounding peri-hippocampal region obtained through FAPoD analysis (green) and the labeled mask (white).

According to the Point Distribution Model theory, the mean hippocampal shape can be described by the mean and standard deviation of each landmark series. These mean landmarks and the respective standard deviations σl were combined with a pointwise linear interpolation. Several tests were carried out to identify the optimal ratio between the number of landmarks and the mean contour length. The choice of the parameters was carried out by evaluating the performance of the overall system and taking into account the computational cost of the processing. First of all, the number of images needed to retrieve the FAPoD volume was evaluated; the volume of interest measure was used as a performance index in this case. This result shows that the model has a good generalization capability; in fact, it is able to contain the testing shape variability using less than half of the training set. Besides, it has to be stressed that, starting from MR scans with an overall size of about 4.7 × 10^6 voxels and passing through boxes of 1.8 × 10^5 voxels, the FAPoD analysis has in the end allowed the detection of a region of interest of about 2.5 × 10^4 voxels, with a dramatic decrease of the computational cost and of the dataset imbalance, thus making the segmentation task more efficient. In Fig. 30 this result is shown; in particular, the mean volume obtained by the FAPoD method plateaus using about 25 images.

Fig. 30: The volume reconstructed by FAPoD (in mm3, for 1σ, 2σ and 3σ landmark variability) varying the number of images used to retrieve the volume, for (a) left and (b) right Hippocampus.
These tests were performed by randomly sampling the training set 100 times and varying the number of training images sampled; as a consequence, for each mean shape it was also possible to determine a standard deviation. In Fig. 30 it can be seen how the 3σl, 2σl and 1σl trends behave similarly, but the optimal choice for the putative hippocampal mean shape should be the 2σl variability, which assures a region of about 25000 voxels. In the following analyses, therefore, VOIs with σl = 2 are intended. The FAPoD region allows the detection of a sound peri-hippocampal region and, by thresholding, the definition of the inner hippocampal region where the similarity analysis can be performed.

5.1.3 Correlation analysis

As stressed in the previous section, the Pearson's correlation coefficient can be used as a similarity measure among the hippocampal boxes. However, measuring the similarity over the whole box can yield misleading information. In fact, it has to be pointed out that the goal of this analysis is the hippocampal similarity more than the box similarity. Accordingly, it is the hippocampal core defined through the FAPoD algorithm which undergoes the correlation analysis. Fig. 31 shows an example of how an MRI box is correlated with the other boxes of the dataset.

Fig. 31: The figure shows the correlation coefficients computed for an image Ii of the data set and the remaining images. As can be seen, there are several images moderately correlated with Ii.

As Fig. 31 shows, there is a small number of images which are correlated with the MRI box used for this particular example. This is in fact a general result: for most hippocampal boxes, moderate correlation is never found for more than 10 − 15 images. As the main goal of this analysis is to provide a supervised learning technique to segment the Hippocampus, it is important to investigate whether a smaller dataset, consisting only of the most correlated images, can be a better choice for the classifier training than using the whole dataset and therefore relying only on the generalization power of the classifier itself. To gain an overall perspective, the average correlations can be studied. The right boxes show a mean correlation coefficient r̄ = 0.3, while the left ones have r̄ = 0.2. In particular, looking at Fig. 32, which shows the average correlations for both right (case a) and left (case b) hippocampal boxes, it is possible to outline an interesting behavior: only a small fraction of images can be considered correlated or moderately correlated (r > 0.3) with the dataset, and in general the right Hippocampi seem to show better correlations.

Fig. 32: The figure shows the average correlation coefficients computed for both right (case a) and left (case b) Hippocampi.

According to this, it is natural to investigate the segmentation performances with a varying number of correlated training images.
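As an illustration, the correlation-based ranking just described can be sketched as follows: the Pearson coefficient is computed over the FAPoD core voxels only, and the training boxes are then ranked. The array names and the probability threshold are assumptions consistent with the text, not the original code.

    import numpy as np

    def core_correlation(box_a, box_b, prob_map, threshold=0.6):
        """Pearson r between two boxes, restricted to the FAPoD core voxels."""
        core = prob_map > threshold          # inner ~60% hippocampal region
        return np.corrcoef(box_a[core], box_b[core])[0, 1]

    def most_correlated(test_box, train_boxes, prob_map, n=10):
        """Indices of the n training boxes most correlated with the test box."""
        r = np.array([core_correlation(test_box, t, prob_map) for t in train_boxes])
        return np.argsort(r)[::-1][:n]

For example, with n = 10 this reproduces the training-set size adopted later for the ADNI validation.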
In the following sections the experimental results about the learning phase are discussed; besides, the existence of a possible relationship between the correlation and the segmentation accuracy is investigated. Before discussing these aspects, a more detailed description of the features used to train the models and of the feature extraction has to be given.

5.1.4 Feature importance

The hippocampal boxes used in this work, as previously described, have dimensions 50 × 60 × 60, for a total amount of 180000 voxels. On average, the extraction of the 315 features described in Chap. 4 requires 16 hours of CPU time, and it is by far the most intensive computation involved in the whole segmentation pipeline [148]. Another fundamental aspect concerns the classifiers to be adopted. A Random Forests classifier, for example, is able to perform an internal feature selection and therefore achieves sound performances with the whole feature sample; on the contrary, other classifiers would greatly benefit from an a priori feature selection. This is why particular attention has to be paid to the study of the feature importance. Several approaches were used to determine an optimal sub-sample of features, and different methodologies were employed. As a first test, the feature distributions were investigated, to unveil whether different features could be considered indistinguishable, i.e. statistically not different. To this aim the Kolmogorov-Smirnov test was performed, to evaluate which feature distributions rejected the null hypothesis of belonging to the same population; only statistically different features were kept for training. Another approach was Principal Component Analysis (PCA), adopted to obtain a feature importance/independence measure; in this case only the features accounting for more than 99% of the variability were selected, so that 197 features survived the PCA analysis. Finally, forward and backward selection were employed. The following Table 6 summarizes the obtained results: the methods employed, the number of features and the normalized performances are shown. The latter were obtained in cross-validation, by measuring how much the segmentations obtained by training the classifiers on the selected features overlapped the manual labellings.

Methods                # features    Performances
all features           315           1.00
Kolmogorov-Smirnov     57            0.90
PCA                    197           0.80
Forward Selection      36            0.98
Backward Selection     23            0.97

Table 6: A schematic view of the different feature importance analyses performed.

5.1.5 Random Forest classification

The features extracted and selected were used to train a Random Forest classifier. A cross-validation approach was used for the training of both right and left hippocampal boxes; thus, for each box to be segmented, the training sample included only the remaining 55 boxes. Several approaches were used; in particular, K-fold cross-validation was used to find the best training configuration. A number of images N, with N = 10, 20, 30, 40, 50, were randomly extracted from the training sample; the extraction was iterated 100 times, so that a hundred classifiers were trained, and the predicted labels were then used to segment the Hippocampus. In this phase of the analysis the simple threshold th = 0.5 was used to turn the continuous scores given by the classifiers into a binary output. The results in terms of the Dice index for the left Hippocampus are reported in Table 7; similar results were obtained for the right Hippocampus. For this purpose, let us recall the definition of the Dice index D for two binary masks A and B:

D = \frac{2\,|A \cap B|}{|A| + |B|}
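As a minimal illustration of this error metric (assuming binary numpy masks for the automated and manual labellings):

    import numpy as np

    def dice(a, b):
        """Dice overlap D = 2|A n B| / (|A| + |B|) between two binary masks."""
        a, b = a.astype(bool), b.astype(bool)
        return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())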
Training Images    Mean    Median    Standard Deviation
10                 0.78    0.79      0.01
20                 0.80    0.81      0.01
30                 0.81    0.81      0.01
40                 0.81    0.82      0.01
50                 0.81    0.84      0.04

Table 7: For each cross-validation iteration the Dice index distribution is calculated, and then mean, median and standard deviation are averaged. The table shows these mean values; it clearly shows how the performances increase with the number of training images, although at the cost of an increased spread of the distribution.

Within the machine learning literature it is widely appreciated that the leave-one-out (loo) approach is a suboptimal method for cross-validation, as it gives estimates of the prediction error that are more variable than other forms of cross-validation, such as K-fold or bootstrap. In fact, also in this case it was verified that K-fold cross-validation with K > 30 was stable and gave results comparable with the loo, but with a smaller variability. However, when dealing with massive computations such as those involved in the neuroimaging field, it is not possible to perform a K-fold analysis for mere questions of time. This is why loo was used in this work as a reference value; besides, an alternative strategy (namely, active learning) was investigated, based on the correlation analysis of the data set. This latter approach used only the N images most correlated with the box to be segmented; the correlation, as previously explained, was measured in the inner region of the box, which can be considered as putatively representing the hippocampal core. As a comparison with the previous results, the segmentation performances were investigated using the subsets consisting of the N = 10, 20, 30, 40 images most correlated with the testing image; besides, the loo segmentation was also performed. In Fig. 33 the results for the left Hippocampus are shown as a boxplot.

Fig. 33: The figure shows the segmentation performances for the left Hippocampus using the 10, 20, 30, 40 most correlated images and all the remaining 55 images (loo). Similar results were obtained for right Hippocampi.

It can be noticed that no significant difference in performance is found between the active learning and the loo segmentations; what is worthwhile to note, however, is that the performances with active learning and loo are slightly higher than the cross-validation performances, whereas in this latter case the computational burden is dramatically increased. As a consequence, the active learning approach is to be preferred both for performance and for computational considerations.

5.1.6 RUSBoost classification

The same analyses were conducted using RUSBoost. As previously described, RUSBoost is a classifier especially designed to take into account the problems arising from highly imbalanced datasets. The results obtained for both left and right Hippocampi are statistically comparable. The fact that both classifiers have the same performances is important, because it confirms how one of the most severe difficulties in segmenting the Hippocampus is the high imbalance between the signal/noise classes, as described in Chap. 4. In the following Table 8 the results for the left hemisphere are shown, to be compared with those obtained with Random Forests (RF) presented in Table 7.

Training Images    Mean    Median    Standard Deviation
10                 0.76    0.78      0.01
20                 0.79    0.80      0.01
30                 0.79    0.81      0.01
40                 0.80    0.82      0.01
50                 0.81    0.83      0.03

Table 8: For each cross-validation iteration the Dice index distribution is calculated, and then mean, median and standard deviation are averaged. The table shows these mean values; it clearly shows how the performances increase with the number of training images, although at the cost of an increased spread of the distribution.
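For concreteness, the two classifiers can be set up as follows; this is a sketch using scikit-learn and the imbalanced-learn implementation of RUSBoost, where the feature matrix X, the voxel labels y and the hyperparameters are illustrative stand-ins, not those tuned in this work.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from imblearn.ensemble import RUSBoostClassifier

    # X: (n_voxels, n_features) feature matrix, y: binary hippocampus labels
    rng = np.random.default_rng(0)
    X = rng.random((5000, 23))                  # e.g. the 23 backward-selected features
    y = (rng.random(5000) < 0.02).astype(int)   # ~2% signal: strongly imbalanced

    rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
    rus = RUSBoostClassifier(n_estimators=200, random_state=0).fit(X, y)

    # continuous scores, later binarized with the threshold th = 0.5
    scores_rf = rf.predict_proba(X)[:, 1]
    scores_rus = rus.predict_proba(X)[:, 1]
    pred_rf = (scores_rf > 0.5).astype(int)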
Even if the results from RF and RUSBoost are comparable, RF performs slightly better in terms of computational burden and is decisively easier to tune in the training phase. Another important aspect emerges by comparing the Random Forests and RUSBoost segmentations: while the former show a certain balance in the misclassifications, i.e. there is almost the same number of false positive and false negative predictions, the latter tends to misclassify preferentially with false negatives, thus underestimating the hippocampal volumes. Accordingly, the RF predictions are preferred, and a further analysis of the RF segmentation error is presented; a proof of the misleading behavior of RUSBoost will become evident when dealing with the clinical predictions in the next sections.

5.1.7 The segmentation error

It is well known that a large fraction of the errors produced by automated segmentation algorithms are systematic, i.e. they occur consistently from subject to subject. The main reason is that the segmentation protocol used to train the algorithms can in principle differ from the one adopted for the manual labellings. This explains, for example, why the performances declared by tool developers are often found to differ from those found by final users. Establishing a recognized "gold standard" for hippocampal segmentation is the goal of several international initiatives, as already reported; however, until a common reference is accepted, no exhaustive treatment of the systematic errors among the different protocols can be given. On the other hand, it is interesting to investigate the internal error of the proposed algorithm. First of all, the similarity index used to evaluate the performances is itself an error metric. In fact, the similarity index can be interpreted as the ratio between the number of examples correctly classified and the sum of the same value with the mean error, where the mean error is half the sum of the false positives (examples misclassified as signal) and the false negatives (examples misclassified as background). As a consequence, the higher the similarity index, the smaller the number of misclassified examples. For a qualitative evaluation, it can be seen how the errors are spatially distributed; to this aim, the number and the positions of the misclassified voxels were recorded. In Fig. 34, as an example, the misclassified voxels of an axial left hippocampal slice are shown. The figure can be interpreted from a number of different perspectives. Firstly, by comparing the hippocampal shape without gray level scaling with the corresponding scaled image, it can be appreciated how, even after registration, Hippocampi are misaligned. This does not depend on the quality of the registration, which indeed, being fully automated, cannot be as good as a manual registration could be; on the contrary, it depends on the high variability of the anatomical structure. As a consequence, when looking at the scaled image it can be seen that the outer voxels are present in only a small subsample of the training set; in fact, when scaling the gray levels by the frequency, the hippocampal shape is dramatically reduced.

Fig. 34: The left hippocampal average shape is shown both without (above left) and with (above right) scaled gray levels, together with the misclassified voxels and the superimposition of average shape and error. From the lower figures it emerges that the misclassification is uniformly distributed on the hippocampal contour.
Another important aspect can be deduced from the lower part of Fig. 34: the misclassified voxels are uniformly distributed over the hippocampal contour. It is important to investigate whether the misclassification can be imputed preferentially to false positives or to false negatives; if this were the case, a systematic error would emerge. Further insight on this point can be gained by examining separately the distributions of false positives and false negatives, as in Fig. 35.

Fig. 35: The figure makes evident that no discrepancies can be found when comparing the spatial distributions of false positives (FP) and false negatives (FN), shown both alone and superimposed on the average shape.

These results are as expected and as desired. The inner hippocampal voxels, belonging to a more uniform region to be segmented, represent an easier task for the classification, and this is why almost no classification error occurs in the hippocampal core. The fact that the misclassification regards the peri-hippocampal regions is a further acknowledgment that the classifier has robust performances. In this respect, let us remark that misclassification is expected to occur on the boundaries even for human raters; from this perspective, an ideal classifier should reproduce the same error rate a human rater could have. On the other hand, the fact that the error distribution follows exactly the hippocampal contour is another clue of the registration quality: misaligned hippocampal shapes would have led to a more spatially spread error distribution.

One of the main error causes in the proposed segmentation workflow arises from the thresholding process that the classification output must undergo to obtain a labeling comparable with the manual one; the classification output is, in fact, a continuous score. In this work the optimal threshold value was studied on the cross-validation segmentations; the optimal value was then used to reproduce the results obtained on a second dataset, which will be introduced in section 5.2. The optimal value was found by thresholding the classification scores with t = 0.1, 0.2, ..., 0.9; then the mean value and the standard deviation of the number of misclassified voxels per image were calculated. In Fig. 36 the results for the left hemisphere are presented; for the right hemisphere an analogous result was obtained. In this way an optimal value t∗ = 0.5 was found. A more important conclusion, however, is that the segmentations are stable with respect to the threshold value, and the plateau is exactly where it should be expected, i.e. in a neighborhood of 0.5, meaning that the classification does not show a preferential behavior in labeling signal or background. In fact, had the optimal threshold been found at higher or lower values, a certain difficulty of the classifier in separating the classes could have been deduced.

Fig. 36: The figure represents the average error and its standard deviation as a function of the segmentation threshold. An evident plateau for values near t = 0.5 is found.
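The threshold sweep just described can be sketched as follows; the score map and the placeholder "truth" are hypothetical inputs, standing in for one cross-validation segmentation rather than the original analysis code.

    import numpy as np

    def misclassified(scores, mask, t):
        """Number of misclassified voxels when binarizing the scores at threshold t."""
        return np.logical_xor(scores > t, mask.astype(bool)).sum()

    rng = np.random.default_rng(0)
    scores = rng.random((50, 60, 60))      # continuous classifier scores
    mask = scores > 0.5                    # placeholder manual labelling

    thresholds = np.arange(0.1, 1.0, 0.1)
    errors = [misclassified(scores, mask, t) for t in thresholds]
    t_star = thresholds[int(np.argmin(errors))]   # plateaus near 0.5 in this work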
A different perspective on the segmentation error is provided by the ROC curves; in that case, strictly speaking, one should talk of classification error. As will be shown in the following sections, an estimate of this error can be given as a function of the area under the curve and in terms of confidence intervals. What still remains is to assess whether the manual labellings and the automated segmentations provide consistent results; to this aim a statistical assessment of the agreement was performed.

5.1.8 Statistical agreement assessment

In clinical practice, the comparison of a new measurement technique with an established one is often needed, to see whether they agree sufficiently for the new to replace the old. Such investigations are often analyzed inappropriately, notably by using correlation coefficients. The use of correlation is misleading: it can be used, as previously described, to measure whether two quantities have a functional relation and how strong this relation is, but it is not a measure of agreement. Obviously, a measure which is always smaller than a second measure could be strongly linearly dependent on it even if the two measures are not consistent. An alternative approach, described in [152], is based on graphical techniques and on confidence interval estimation. The measure of the agreement is performed in four steps:

a. Graphical examination of the agreement;
b. Examination of the differences of the methods against the mean;
c. Limits of agreement estimation;
d. Precision of the agreement and its significance.

It has to be noted that in clinical measurements the true values remain unknown; thus indirect methods have to be applied to measure the agreement between an established technique and a new method. The first step is readily presented in Fig. 37, where the manual labellings and the automated volumes are plotted on the x-axis and the y-axis respectively.

Fig. 37: The figure represents the automated volumes against those obtained through manual segmentation. The straight line of perfect agreement is also represented in blue.

The proposed method agrees sufficiently well with the old one; however, as previously stated, graphical examination is not sufficient. When dealing with calibration, this assessment consists in measuring how much the new method is able to reproduce the real values and whether it performs better than the established technique; in hippocampal segmentation, however, when comparing the manual labellings with the automated ones, neither provides an unequivocally correct measure of the volume of the Hippocampus. The previous Fig. 37 shows in a qualitative way a certain agreement between the two volumetric measures. This is clearly expected: two methods aiming to measure the same quantity should in any case provide an at least qualitative agreement. To gain further insight, let us have a look at the plot of the pointwise differences against the mean values, shown in Fig. 38.

Fig. 38: The figure represents the volume differences against the mean volumes obtained by manual and automated segmentations. The straight lines in blue represent the mean value of the differences and the 95% confidence limits.
This representation is critical because it allows an overall comprehension of how the two methods differ and of whether there are significant differences with respect to the measured values. In particular, this is important to see whether the statistical variables under examination are heteroscedastic. Heteroscedasticity occurs when the variance is not constant, so that it is possible to extract two or more samples with different variances from the original dataset. In Fig. 38 it is possible to appreciate how the variance is almost constant along the x-axis. Moreover, Fig. 38 shows that only two measured differences exceed the 95% confidence interval, which in the following will be referred to as the limits of agreement; the measures are therefore consistent. The limits of agreement lupper and llower for a normal population are straightforwardly calculated as:

l_{upper} = \mu + 2\sigma   (101)

l_{lower} = \mu - 2\sigma   (102)

where, in the present case, µ is the mean value of the differences and σ their standard deviation. This estimate is valid under the assumption that the population is Gaussian; this was in fact verified with a χ2 test at a significance level α = 0.5%. The χ2 distribution for ν degrees of freedom was calculated according to its definition:

f(x|\nu) = \frac{x^{(\nu-2)/2}\, e^{-x/2}}{2^{\nu/2}\, \Gamma(\nu/2)}   (103)

where Γ is the Euler gamma function. It is known that this distribution behaves asymptotically as a Gaussian, and therefore, by calculating the experimental statistics, it is possible to test the null hypothesis of normality. In the case of perfect agreement the measures should lie on the straight line of the mean difference, with the mean value of the differences being null. Heteroscedasticity can cause ordinary least squares estimates to be biased; in this case it would result in an inappropriate inference about the confidence intervals of the methods, and therefore in a possibly wrong hypothesis test for the comparison of the two methods. According to the previous observations, the manual and automated segmentations are homoscedastic and therefore they can be soundly compared. The previous picture also shows how the differences between the two methods are statistically not significant: they do not exceed the 95% confidence interval except in two cases. A more significant representation, however, is given in Fig. 39, with the confidence interval estimation of the mean and of the upper and lower limits of agreement. These confidence intervals are calculated as usual: for the mean, the standard error is \sqrt{\sigma^2/N}, where N is the sample size, while the standard error for the upper and lower limits of agreement is \sqrt{3\sigma^2/N}. According to this, the 95% confidence interval for the upper limit is [897, 1328] mm3, while the 95% confidence interval for the lower limit is [−1221, −790] mm3. Therefore, considering that the mean hippocampal value is about 6400 mm3, the manual labellings and the automated segmentations agree within 15% of the measured quantity.

Fig. 39: In this figure, as a difference with the previous one, the confidence intervals are represented too. It can now be appreciated that only one measure is significantly different and that an overall range of variability of about 1000 mm3 is found, corresponding to a relative 15% variability.

The analysis confirms the reliability of the automated segmentation; it is thus possible to approach the clinical issue of the AD discrimination.
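The agreement analysis of this section follows the Bland-Altman construction, and can be reproduced in a few lines; this is a sketch in which vol_manual and vol_auto are assumed arrays of paired volumes.

    import numpy as np
    from scipy import stats

    def limits_of_agreement(vol_manual, vol_auto, alpha=0.05):
        d = vol_auto - vol_manual                      # paired differences
        n, mu, sigma = len(d), d.mean(), d.std(ddof=1)
        lower, upper = mu - 2 * sigma, mu + 2 * sigma  # Eqs. (101)-(102)
        t = stats.t.ppf(1 - alpha / 2, n - 2)
        se = np.sqrt(3 * sigma ** 2 / n)               # standard error of the limits
        return (lower - t * se, lower + t * se), (upper - t * se, upper + t * se)

Applied to the paired manual and automated volumes, this construction yields confidence intervals of the kind quoted above for the upper and lower limits.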
5.2 alzheimer's disease classification

The segmentation workflow has shown promising performances on the training set, being able to accurately reproduce the manual labellings by expert neuroradiologists. Nevertheless, another important aspect of the analysis is the measure of the informative power of the segmentations and of their clinical predictive power. To approach these measures a distinct dataset was used. In this validation phase no manual labellings were available; the data consisted of 1824 MRI brain scans acquired at 1.5T from the ADNI repository (https://ida.loni.usc.edu/); for every subject age, sex and clinical state (control (CTRL), mild cognitive impairment (MCI) and Alzheimer's disease (AD)) were available. The goal of this validation phase is to measure how much the implemented hippocampal segmentation is able to capture clinical information. In particular, the volumetric measures were investigated and their predictive power as an AD biomarker was measured. As a first step, 456 MRI scans were downloaded and processed with the proposed segmentation workflow. For every subject 4 acquisitions were available:

• screening scans, representing the time t0 of acquisition;
• repeat scans, acquired with a very short time delay with respect to the screening;
• month 12 scans, acquired one year after screening;
• month 24 scans, acquired two years after screening.

According to the previous results, the segmentations were performed using only the 10 most correlated images; in this way the computational times were decreased without a significant loss of accuracy, as previously pointed out. Follow-up images are a valuable support to check whether the clinical predictions yielded by the screening-repeat analyses are robust; screening and repeat images are useful to determine the method uncertainty, as will be shown later in this section. In fact, screening and repeat images are acquired with a time delay that is not significant in terms of biological or physiological variations; as a consequence, a comparison between the segmentation volumes of screening and repeat scans can be used to estimate the method uncertainty. First of all, the discriminative power of the right and left hippocampal volumes is investigated separately. Fig. 40 shows the segmentation volume boxplots for both the right and the left Hippocampi.

Fig. 40: The figure shows both right (a) and left (b) hippocampal volumes, with CTRL/MCI/AD class discrimination.

Right volumes appear slightly larger than left ones; however, no significant difference can be found between the two distributions, nor a difference in terms of separation of the three classes CTRL, MCI and AD. The obtained segmentations allow the calculation, for each subject, of the right and left hippocampal volumes; these are used for an intra-subject and an inter-subject comparison. Similar results were obtained for all the acquisitions: screening, repeat, month 12 and month 24 scans. For every acquisition the discriminative power was measured using the receiver operating characteristic (ROC) curves and their area under the curve (AUC). As an example, the ROC curves for CTRL and AD discrimination in the right and left hemispheres, obtained with the screening scans, are shown in Fig. 41.

Fig. 41: The ROC curves for right (a, AUC = 0.82) and left (b, AUC = 0.84) Hippocampi. The reported AUC is the measure of the discrimination between CTRL and AD subjects obtained with the screening scans.
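The ROC analysis on volumes can be sketched as follows with scikit-learn; the volumes and labels are hypothetical stand-ins for the ADNI measurements, and the sign flip encodes the fact that smaller volumes indicate AD.

    import numpy as np
    from sklearn.metrics import roc_curve, roc_auc_score

    # volumes: hippocampal volumes in mm3; labels: 1 for AD, 0 for CTRL
    rng = np.random.default_rng(0)
    volumes = np.concatenate([rng.normal(3200, 400, 100),    # CTRL (hypothetical)
                              rng.normal(2600, 400, 100)])   # AD (hypothetical)
    labels = np.concatenate([np.zeros(100), np.ones(100)])

    # atrophy means AD subjects have smaller volumes, so the score is -volume
    fpr, tpr, _ = roc_curve(labels, -volumes)
    print(f"AUC = {roc_auc_score(labels, -volumes):.2f}")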
The previous figure shows how the performances increase on the left hemisphere; as already discussed in Chap. 2, this difference is not surprising. For an overview of the different results with respect to the acquisition time, i.e. screening, repeat, month 12 and month 24, Table 9 is presented.

Acquisition    Right Hippocampus    Right Hippocampus
               AUC CTRL/AD          AUC CTRL/MCI
Screening      0.82 ± 0.02          0.74 ± 0.03
Repeat         0.81 ± 0.03          0.72 ± 0.03
Month 12       0.86 ± 0.02          0.75 ± 0.03
Month 24       0.85 ± 0.02          0.75 ± 0.03

Acquisition    Left Hippocampus     Left Hippocampus
               AUC CTRL/AD          AUC CTRL/MCI
Screening      0.85 ± 0.02          0.74 ± 0.03
Repeat         0.84 ± 0.02          0.74 ± 0.03
Month 12       0.88 ± 0.02          0.76 ± 0.03
Month 24       0.89 ± 0.02          0.78 ± 0.03

Table 9: AUC values for both right and left Hippocampi at each acquisition; the left column reports the CTRL/AD discrimination, the right one the CTRL/MCI discrimination.

It is worthwhile to note how well the CTRL and AD classes are separated; on the contrary, CTRL and MCI cannot be separated as well as the previous ones. Let us stress that this behavior is expected, MCI being a broad class which includes subjects who will never develop the AD pathology in their lives. As previously discussed, several studies demonstrated the hemispheric asymmetry of hippocampal volumes; besides, recent studies pointed out the possibility of systematic errors arising from magnetic resonance volumetry. Accordingly, a comparison of the segmentations obtained by Random Forests and RUSBoost was performed: even though the two classifiers obtained similar performances in terms of segmentation accuracy, this is not true in terms of AUC. This suggests that the RF classifier is not only able to perform accurate segmentations, but also that, differently from RUSBoost, its segmentations are able to capture the clinical information in more detail. For each classification method and for each acquisition the AUC is given with the standard error (SE) calculated according to [149]. These results show how the hippocampal volume can be used as a classification index for the AD diagnosis. However, the result cannot be exhaustive if the age and sex information, which does affect the hippocampal volumes, is ignored. According to this, the analyses were repeated including sex and age in a multilinear regression model. The model was built by considering the overall age distribution; the reference time (measured in years) t0 was set to be the minimum age of the distribution. For an age t a scaling factor k of 30 mm3 per year was established, so that if the hippocampal volume at time t is V(t), then the effective volume Veff to be considered is:

V_{eff} = V(t) + k\,(t - t_0)

where k was set according to the values established in several studies [150, 151].
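A minimal sketch of this age de-trending (the function name and the example inputs are illustrative):

    def effective_volume(volume_mm3, age_years, t0_years, k=30.0):
        """Age-corrected hippocampal volume, Veff = V(t) + k (t - t0).

        k is the physiological atrophy rate (~30 mm3 per year, after [150, 151]);
        t0 is the minimum age of the cohort."""
        return volume_mm3 + k * (age_years - t0_years)

    # e.g. a 75-year-old subject with V = 2800 mm3 in a cohort starting at 55 years
    print(effective_volume(2800.0, 75.0, 55.0))   # -> 3400.0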
The results obtained with Random Forests thanks to this model are summarized in Table 10. Besides, to take into account sex differences, the segmentations are divided by sex; it is well known, in fact, that hippocampal volumes differ according to the sex of the examined subjects.

Males
Acquisition    Right Hippocampus    Right Hippocampus
               AUC CTRL/AD          AUC CTRL/MCI
Screening      0.83 ± 0.02          0.75 ± 0.03
Repeat         0.84 ± 0.03          0.74 ± 0.03
Month 12       0.85 ± 0.02          0.76 ± 0.03
Month 24       0.87 ± 0.02          0.76 ± 0.03

Acquisition    Left Hippocampus     Left Hippocampus
               AUC CTRL/AD          AUC CTRL/MCI
Screening      0.86 ± 0.02          0.75 ± 0.03
Repeat         0.87 ± 0.02          0.77 ± 0.02
Month 12       0.91 ± 0.02          0.79 ± 0.02
Month 24       0.90 ± 0.02          0.79 ± 0.02

Females
Acquisition    Right Hippocampus    Right Hippocampus
               AUC CTRL/AD          AUC CTRL/MCI
Screening      0.87 ± 0.02          0.79 ± 0.02
Repeat         0.83 ± 0.02          0.74 ± 0.03
Month 12       0.90 ± 0.02          0.78 ± 0.02
Month 24       0.89 ± 0.02          0.80 ± 0.02

Acquisition    Left Hippocampus     Left Hippocampus
               AUC CTRL/AD          AUC CTRL/MCI
Screening      0.88 ± 0.02          0.76 ± 0.02
Repeat         0.86 ± 0.02          0.75 ± 0.02
Month 12       0.90 ± 0.02          0.80 ± 0.02
Month 24       0.90 ± 0.02          0.80 ± 0.02

Table 10: In this table the volume comparison for both right and left Hippocampi is considered, as before, separating male and female subjects. Moreover, the effect of aging is taken into account by means of a linear de-trending.

Looking at Table 9 and Table 10, a significant improvement in the classifications is obtained, especially for the CTRL/AD classes. On this point an important consideration has to be stressed: the AUC is not able, by its nature, to capture clinical differences arising from age and sex information, since it is only based on the segmented volume. As a consequence, it is important to learn another model to describe these results; for the sake of simplicity a regression approach was chosen. This is obtained by considering, for every acquisition time, the least squares linear model describing the CTRL hippocampal volumes. Moreover, an accurate model can be defined only if an estimate of the uncertainty of its points is known, otherwise a robust fit could not be provided. To this aim a detailed analysis of the stability of the results was performed.

5.2.1 Stability of the results

Another important aspect of the analysis consists in the study of the stability of the results; in particular, it is of paramount importance that the segmentations be reproducible. The stability of the segmentation workflow can be measured both by analyzing the relationships between the right and left labellings and by evaluating the presence of shape incongruences. The former relationships were measured by adopting the correlation between the labellings as a parameter of the stability of the segmentation itself; the latter was measured in the SPHARM framework, by investigating the presence of statistically significant differences among the acquisitions. Firstly, the correlation study is examined. Fig. 42 and Fig. 43 show how the volumes of the different acquisitions compare, both for right and left Hippocampi.

Fig. 42: The figure represents the correlations among the volumes obtained for the 4 acquisitions (screening, repeat, month 12, month 24) for the right Hippocampi.

Fig. 43: The figure represents the correlations among the volumes obtained for the 4 acquisitions for the left Hippocampi. In general, linear correlation is confirmed as expected; left Hippocampi show a slightly better behavior, however comparable with that of the right segmentations.
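The correlation matrices of Figs. 42-43 amount to correlating the four volume series; a sketch follows, in which the volume arrays are hypothetical stand-ins aligned by subject.

    import numpy as np

    rng = np.random.default_rng(0)
    base = rng.normal(3000, 400, 456)                # hypothetical "true" volumes
    v_screening = base + rng.normal(0, 85, 456)      # method uncertainty ~ 84 mm3
    v_repeat = base + rng.normal(0, 85, 456)
    v_m12 = base - 30 + rng.normal(0, 150, 456)      # illustrative atrophy + noise
    v_m24 = base - 60 + rng.normal(0, 200, 456)

    # rows: screening, repeat, month 12, month 24 volumes for the same subjects
    corr = np.corrcoef(np.vstack([v_screening, v_repeat, v_m12, v_m24]))
    print(np.round(corr, 2))                         # 4 x 4 matrix, as in Fig. 42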
The left Hippocampi show in general better correlations, even if no statistically significant differences can be found with respect to the right ones. However, it is already known that no left-right invariance can be assumed when dealing with anatomical structures, and therefore small differences are quite expected. Nevertheless, a time-dependent decrease of the correlation is found, as expected. A further analysis was performed on the statistical shape model derived from the automated segmentations. The statistical model was built within the SPHARM framework; 80 meshes were identified to calculate the local variations in shape. This choice was driven by the concurrent needs of having local information about the segmentations and, at the same time, of dealing with hippocampal regions which could have significance from the anatomical point of view. According to the methodology described in Chap. 4, the segmentations were first topologically fixed, registered with a FOE procedure and parametrized; finally, the labels were compared and a local p-value map, with the usual significance value p = 0.05, was built. In Fig. 44 the results are shown for the CTRL/AD comparison in the right and left hemispheres.

Fig. 44: The colormap figure (p-values higher than 0.05 correspond to hot colors) represents significant differences in the left hippocampal regions (a). They are especially significant in correspondence of the hippocampal head and its digitations. It is worthwhile to note how the statistical differences are more evident in the left segmentations than in the right ones (b).

The results show a significantly greater presence of variability between the CTRL and the AD classes. It would seem, therefore, that the left Hippocampi should be preferred to investigate sub-regional discriminative behaviors; further developments could indeed analyze whether the inter-class separation widens when taking into account localized hippocampal sub-regions. From the stability analysis performed it emerges that the segmentation is reproducible; otherwise, no linear correlation would hold between different acquisition times. A more detailed study of the correlation between screening and repeat scans deserves particular emphasis, because it is by looking at these acquisitions that the possible variability of the method itself can be inferred. It is in fact clear that, since screening and repeat scans are acquired on a time scale where no biological variation can occur, the only source of variability in this case is the segmentation itself. With the goal of estimating the method uncertainty, a further and more detailed comparison of screening and repeat scans was then performed.

5.2.2 The method uncertainty

In order to build a robust regression model for the hippocampal volumes, the residual distribution was first investigated. For each subject the hippocampal volumes obtained from the screening and the repeat scan were compared; the difference was calculated, and the mean and the standard deviation of the sample were determined. In Fig. 45 the distribution and the estimated density are represented.

Fig. 45: The figure shows how the 456 differences between screening and repeat segmentation volumes have mainly null values.
In particular, the standard deviation of this distribution can be used to determine a conservative value for the method uncertainty. The difference distribution is clearly symmetric and has a mean value which cannot be significantly distinguished from zero. In fact, the mean value µ = −1.4 mm3 and the standard deviation σ = 84.3 mm3 of the differences between screening and repeat volumes show how stable the method is. Besides, the fact that the differences are on average compatible with zero is a nice proof that the method has no significant bias; a more detailed analysis of this aspect will be discussed in the next section. In the following, only the left hippocampal volume is considered; in fact, the previous results showed that no significantly different information arises from the examination of the right hemisphere. The linear model of the CTRL volumes is used to determine the confidence intervals. Given the expected slope β̂ and intercept α̂, it is possible to find the expected volumes V̂ and therefore the error ε on the parameters of the model. The error of the slope for N = 456 volumes is estimated by:

\epsilon = \frac{\sqrt{\sum_{i=1}^{N} (V_i - \hat{V}_i)^2 / (N-2)}}{\sqrt{\sum_{i=1}^{N} (x_i - \bar{x})^2}}   (104)

and then it is possible to obtain the 95% confidence interval for the slope from equation (104):

\beta_{95\%} = [\hat{\beta} - t_{N-2}\,\epsilon, \;\hat{\beta} + t_{N-2}\,\epsilon]   (105)

where tN−2 denotes the usual Student's t distribution value for N − 2 degrees of freedom. According to this, another fundamental result can be given in terms of confidence intervals; in this way it is also possible to furnish an estimate of the significance of the prediction. For example, if the significance value for the risk of assigning the AD pathology to a CTRL subject is desired to be small, let us say below 5%, then the β95% confidence interval previously calculated should be used. The result shown in Fig. 46 allows the appreciation of the physiological atrophy rate, in terms of the negative slope of the CTRL population. As measured in several studies, the physiological atrophy rate is about 30 mm3 per year, which is comparable with the estimated negative slope m = −33.4 ± 3.2 mm3 per year.

Fig. 46: The CTRL population is used to estimate a linear model with its 95% confidence interval, represented in red. The AD population is represented through the segmented volumes and the related uncertainties in blue. The figure shows how nicely AD subjects are separated from CTRL subjects.

As a comparison, the same procedure was applied to the month 24 acquisition; the results are shown in Fig. 47. As already pointed out by the ROC analyses, the classification performances improve when comparing the screening acquisitions with the follow-ups. The main reason is that neurodegenerative pathological conditions yield more severe atrophy over time. Besides, anatomical differences due to sex and age, even physiological ones, result in misleading volume values. Fig. 47 shows how, for the left hemisphere, AD had a significant impact on the hippocampal volumes over 24 months.

Fig. 47: As in the previous figure, a linear model for the CTRL population and its 95% confidence interval are shown in red. In this case the represented volumes are obtained from the 24 month follow-up. It is evident how the pathological atrophy rate makes the hippocampal volumes of AD subjects easier to distinguish after 2 years.
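Equations (104)-(105) can be checked numerically; in the sketch below the volumes and ages are hypothetical stand-ins for the CTRL sample.

    import numpy as np
    from scipy import stats

    def slope_ci(x, v, alpha=0.05):
        """Least squares slope with its (1 - alpha) confidence interval, Eqs. (104)-(105)."""
        n = len(x)
        beta, alpha_hat = np.polyfit(x, v, 1)      # slope and intercept
        v_hat = beta * x + alpha_hat
        eps = (np.sqrt(np.sum((v - v_hat) ** 2) / (n - 2))
               / np.sqrt(np.sum((x - x.mean()) ** 2)))
        t = stats.t.ppf(1 - alpha / 2, n - 2)
        return beta, (beta - t * eps, beta + t * eps)

    # hypothetical CTRL sample: ~30 mm3/year physiological atrophy plus noise
    rng = np.random.default_rng(0)
    age = rng.uniform(55, 90, 456)
    vol = 5000 - 30 * (age - 55) + rng.normal(0, 300, 456)
    print(slope_ci(age, vol))                      # slope close to -30 mm3 per year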
From a more quantitative point of view, it is not only the AUC indicators, as seen previously, that manifest an increasing discriminative power. Another measure of the classification performance improvement can be given in absolute terms by considering the previous Fig. 46. According to this figure, it is possible to observe that for the screening scans the correctly classified subjects represent 91% of their populations. Performances improve when considering the month 24 scans: in this case the AD subjects are correctly classified in 94% of the sample. Considering a model linear in the subject age improves the classification performances because it takes into account the physiological atrophy rate; in this case the number of correct classifications rises. A further improvement can be achieved by considering separately the male and the female populations, as shown in Fig. 48.

Fig. 48: The AD male hippocampal volumes in (a) and the female ones in (b) are shown in blue and compared respectively with the male and female CTRL populations. As expected from other studies, the female atrophy rate is clearly different (and less severe) than the male one. As a consequence, separating the two populations helps the discrimination between CTRL and AD.

In fact, for the screening acquisition the male subjects correctly labeled as AD are 92%, while the female subjects are 93%. The atrophy rate changes when considering male or female subjects; this result is confirmed in the literature, female subjects being proved to have a less severe atrophy rate, resulting in a smaller slope of the regression model. Consistent results were obtained with RUSBoost too. These results compose a convincing picture: the hippocampal volume is confirmed as a supportive feature for the AD diagnosis, and the automated segmentation is able to capture this clinical information.

5.3 distributed infrastructure exploitation

The proposed segmentation workflow requires an average processing time of about 160 minutes on our local workstation (AMD Phenom 9950 Quad-Core Processor, 2 GB RAM) to segment both right and left Hippocampi. Thus, the whole cross-validation dataset, consisting of 56 MRI scans, would require a CPU time of about 7 days to be segmented, and the longitudinal analysis of the second dataset, consisting of 1824 MRI brain scans for 4 different acquisitions, would require a CPU time of about 2 years and 2 months. The implemented smart system of job queuing allows the local BC2S computer farm and the geographically distributed grid to be used with an efficient use of time. In particular, the segmentation workflow modules requiring few resources (for example the correlation analyses) were submitted to the local farm, while the modules requiring more resources were submitted to the grid. We used the LONI pipeline Command Line Interface (CLI) with the −batch and −parallel options enabled to execute parallel workflows. The LONI client/server application ran on a node of the local farm with 2 GB RAM. The resources required by the Java CLI process scale with the number of workflows to run in parallel; as a consequence, the larger the number of parallel workflows, the larger the memory consumption. In Fig. 49 the results collected by executing the segmentation of the whole ADNI dataset are shown.
It can be noticed that no segmentation exceeded 500 minutes, which was therefore the time required to segment the whole dataset.

Fig. 49: The figure shows the overall run-time distribution using the combination of the local farm and the grid infrastructure.

The performance benefit can be better appreciated in Fig. 50, where the number of segmentations still to be executed is plotted against time.

Fig. 50: In this figure the time sequence of the executed segmentations is represented. The results show how in fact 95% of the segmentations were executed in less than 7 hours.

The grid "inertia" due to job submission and match-making operations performed by the WMS translates into an initial latency that can be further dampened by increasing the number of parallel workflow executions. After this initial step the advantages of the grid execution are evident, since 90% of the segmentations were obtained after less than 7 hours. The job submission tool (JST) is able to exploit the paradigm of pilot jobs and the capabilities of late binding, which assure a good performance in scheduling all the jobs submitted to the grid infrastructure. Moreover, the multiple jobs pulled by the pilot on the worker node can make use of the same run-time environment: this helps performance by eliminating the need for duplicated installations of the run-time software and by exploiting cached data. The JST framework also allows the collection of detailed monitoring information about each submitted job. This is very useful for further analyses and statistical studies: for example, the effective time spent by each job on the selected worker node to execute its work (input data transfer and processing, output generation and transfer to the target storage element) can be measured. Besides, it is possible to generate and analyze the run-time distributions for each module of the workflow. The most interesting result, obtained for the Random Forest training module, is shown in Fig. 51. Unlike the other distributions (characterized by a single peak and a few outliers), this distribution clearly shows a bimodal behavior, with the two peaks spaced by about 50 minutes.

Fig. 51: In this figure the RF training module execution times are represented. A two peak distribution is evident.

The RF training module is indeed the most data-intensive, since it also needs to download and read the feature files of the ten most correlated images: therefore, the performance penalty is maximum when the grid copy fails, since the job waits for the copy timeout (200 seconds for each input file) before switching to other protocols (e.g. http). Moreover, an important measure of the efficiency of the proposed method is given by the overall run-time distribution shown in Fig. 52.

Fig. 52: Distribution of the workflow execution times derived from job monitoring information.

It is worthwhile to note that the grid submission overhead does not involve a strong degradation of the performances. From the comparison between the workflow total execution times of Fig. 52 and the overall run-time distribution of Fig. 49, it can be underlined how the distribution peak of the latter (i.e.
the average run-time) and its width are greater by a factor of 2 and 3.5 respectively: this is a remarkable result, taking into account the high number of jobs and the failure rate, reduced to zero thanks to the automatic job re-submission implemented by the JST. In fact, this comparison shows that the grid submission times do not have a severe impact in terms of computational times, and therefore the grid is an exploitable resource for distributed segmentations.

6 CONCLUSIONS

This thesis explored the hippocampal segmentation task from different perspectives. On one hand the segmentation accuracy of the proposed analysis framework was discussed; besides, an investigation of the clinical information contained within the hippocampal volume, seen as a supportive feature of the AD diagnosis, and an analysis of the computational burden involved in analyzing the data, eventually allowing large clinical trials, were performed.

6.1 motivations and summary

In the last few years the development of signal processing techniques has enabled the visualization and measurement of pathological brain changes in vivo, producing a radical change not only in the field of scientific research, but also in everyday clinical practice. As discussed in Chap. 2, among the several fields, neuroimaging techniques have become of paramount importance in the early diagnosis of a number of pathologies. The current need for new biological signatures in the clinical field matches the "big data" era, whose maturity is confirmed by several aspects. The goal of achieving sound computer aided diagnosis is particularly urgent for those scientific fields where data analysis is seen as a time-consuming burden more than as a resource; this is, in fact, true for neuroimaging. An example is Alzheimer's disease: one of the signatures of this pathology is the atrophy of the medial temporal lobe, and in particular a relevant role is played by the measure of the volume of the Hippocampus, a brain structure extensively described in Chap. 2. Brain MR scans are a huge source of information; they provide detailed insight where other imaging techniques just cannot, as discussed in Chap. 3. The advent of new technologies, especially of high and ultra-high field scanners, has improved the signal to noise ratio; however, the examination of MR scans remains a time-consuming activity. Moreover, inter-rater variability sometimes makes the visual inspection of this kind of data an art more than a quantitative and objective task. Intra-rater variability is, for example, the main cause of the lack of a confirmed consensus on hippocampal segmentation. Besides the inter-rater variability, the protocols defined to segment the Hippocampus add another variability source, so that there is no certainty about the "true" hippocampal shape, nor about the best method to manually achieve it. In this scenario the need for automated segmentation methods arose. They can in fact tackle both main issues of manual labeling: the main source of variability, i.e. the inter-rater one, is overcome when adopting a deterministic segmentation framework, while the variability arising from differences in training and test procedures can always be dealt with by statistical analysis. Moreover, the presence of a unique segmentation protocol would eliminate the systematic errors that the use of different protocols necessarily involves.
It has been widely proven that different segmentation protocols obviously yield different segmentations; the main drawback, however, is that the lack of a gold standard prevents the possibility of measuring this bias. An estimate of the relative discrepancies among the different methods can be useful to understand where, or whether, methods differ; this is why several worldwide initiatives are facing the challenge of developing a fully automated segmentation protocol. Accordingly, a fully automated segmentation workflow is presented in Chap. 4. It is trained and tested in a cross-validation framework on MRI scans whose Hippocampi were manually labeled. Moreover, it is well known that the hippocampal volume is a supportive feature for the AD diagnosis. To validate the workflow, another set of MRI scans from the ADNI database was used; differently from the training set, this database lacks manual labellings, but clinical labels and information are provided, so that for example the age, sex and clinical status of the examined subjects are known. Finally, a feasibility study is performed. A non-secondary aspect of segmentation workflows is, in fact, the computational burden they require. This is important not only with the aim of allowing large clinical trials, but also in a smart health-care perspective, which could greatly benefit from automated tools able to perform real-time or quasi real-time analyses. The results of this work are presented in Chap. 5. First of all the segmentation performances are evaluated from different perspectives; the main result in this case is given in terms of the similarity index, a popular error metric for medical imaging. The performances achieved in fact match state of the art algorithms. Besides, an evaluation of the clinical information is performed, and even in this case the results are promising: the metrics adopted to compare the hippocampal volumes obtained through the presented segmentation workflow with other state of the art techniques show a good agreement. Finally, the computational performances were presented, and it was shown how even a thousand MR scans can be segmented in a fully automated framework in reasonable times.

6.2 segmentation

This work proposes an automated tool for the segmentation of the Hippocampus, a structure of great importance in numerous brain diseases. This is an innovative approach, based on the use of discriminating features and on their classification using a RF classifier within a VOI delimited with the novel FAPoD method. This method, based on shape evaluation only, was able to reflect the database heterogeneity within a 2σl variation. Future improvements could be achieved by combining shape-based and intensity-based information, perhaps using morphing methods before FAPoD. A number of studies using atlas-based approaches reported similarity index coefficients in the range 0.75-0.88 [153, 154, 155, 156, 157, 158]. Values as high as 0.86 for healthy subjects [159] and 0.88 for mixed cohorts [160] have been achieved with graph-cuts, while a similarity coefficient of 0.85 is reported using embedded learning [161]. Automated region-growing methods with automatic detection and correction of atlas mismatch obtained similarity indexes of 0.87 ± 0.03 for healthy subjects and 0.84 ± 0.05 for a mixed cohort (healthy controls and patients with temporal lobe epilepsy) [162]. Other studies [163], using FMASH, obtained average Dice coefficients of 0.82 ± 0.01 and 0.80 ± 0.01 respectively on two mixed-cohort databases.
By far, the most promising results in the literature have been obtained through patch-based multi-atlas segmentations [89, 88, 83]. To the best of my knowledge, this is the first application to hippocampal segmentation of a RF classifier combined with expert priors on shape. The number of features required for a robust classification was much smaller than in other work [164], which did however show comparable performance (similarity index = 0.85). The average similarity index of 0.84 ± 0.04 obtained in this study is in keeping with existing results for mixed cohorts, i.e. cohorts composed of healthy controls and diseased subjects. It is worthwhile to note that the study was conducted on 1.0 T images, which suffer from a lower signal-to-noise ratio compared with high-field datasets. Nevertheless, the performances obtained compare well with those published for high-field data and homogeneous cohorts. In addition, it is important to keep in mind that segmentation results are also influenced by the segmentation protocol used for the manual labeling [90]; as a consequence, segmentation performances should always be compared with particular attention. A more significant result from a scientific point of view is the confirmation of the hippocampal volume as a supportive feature of AD; this aspect is emphasized in the next section.

Differences in image quality, manual segmentation protocol, clinical status and demographics have been described as possible causes of discrepancy in results [160]. It should also be noted that the currently available protocols for manual segmentation include information that is not entered in the training of automated algorithms, generating an additional source of variability. The inclusion or exclusion of white matter, the use of arbitrary lines, and the exclusion of parts of the hippocampal tail or of vestigial hippocampal matter [165, 166, 167] lead to non-systematic differences in hippocampal segmentations across subjects, since these portions have different sizes depending on individual morphology. A critical advantage of the use of machine learning algorithms is the possibility of using very large training datasets shared by the scientific community. This is exemplified by the efforts of the EADC-ADNI working group to develop a standard harmonized protocol for manual segmentation ([168, 169], www.hippocampal-protocol.net).

Moreover, the proposed method has shown good performances in terms of stability and segmentation uncertainty. It has been proven that the method agrees with manual labellings by expert neuroradiologists and that no statistically significant bias can be detected within a 95% confidence interval, a necessary feature for a robust and accurate clinical classification. Finally, an analysis of the stability of the segmentation workflow was performed. Correlation measures over the segmented volumes showed that a fundamentally linear relationship exists between the volumes acquired at different times, and this was confirmed for both the left and the right Hippocampi. This is the best assurance that the method is stable, because it implies a linear correspondence among the volumes segmented for the same subject at different times; the segmentation is therefore indeed unveiling substantial information. Correlations are higher for the left Hippocampi (r ≈ 0.90) than for the right ones (r ≈ 0.80); however, this can be explained by the same argument concerning the higher variability of the right hemisphere.
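As an illustration of the stability check just described (hypothetical variable names; the actual analysis pipeline of this work is not reproduced here), given the volumes segmented for the same subjects at two acquisitions, the correlation and the residual statistics can be obtained as follows:

    import numpy as np

    def stability_summary(v_screening, v_repeat):
        # Pearson correlation between volumes segmented at two time points:
        # values close to 1 indicate the linear correspondence discussed above.
        r = np.corrcoef(v_screening, v_repeat)[0, 1]
        # Residuals between paired acquisitions: a mean compatible with zero
        # indicates no systematic bias, while the standard deviation provides
        # a conservative uncertainty for a single segmented volume.
        residuals = np.asarray(v_repeat, float) - np.asarray(v_screening, float)
        return r, residuals.mean(), residuals.std(ddof=1)

The µ and σ values quoted in the next section come from exactly this kind of residual analysis.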
6.3 clinical classification

The segmentation workflow has shown promising performances on the training set, being able to accurately reproduce manual labellings by expert neuroradiologists. Nevertheless, another important aspect of the analysis is the measurement of the informative power of the segmentations and of their clinical predictive power. The segmentation performances concern the capability of the analysis to learn from manually labeled examples and to reproduce the rater's expertise; once segmentations are acquired, a clinical evaluation of the results has to be made. This evaluation is clinical in the sense that its goal is to retrieve, through statistical data analysis, the clinical label of the subject, and thus to discriminate controls from AD patients. The evaluation was performed in terms of ROC curves and AUC measures. The results demonstrated that the method is able to capture the clinical information: in fact, it is possible to discriminate CTRL from AD subjects with AUC measures > 0.82 for right Hippocampi and > 0.84 for left Hippocampi.

Another important aspect is the effect of physiological and non-physiological atrophy. It was therefore verified how performances depend on the acquisition time of the scans: the previously cited results, when dealing with scans acquired 24 months later, give overall AUC measures > 0.85, even if this result obviously does not have the same relevance as the discrimination obtained at baseline. An interesting aspect of this analysis was the acknowledgment of the asymmetric behavior of right and left Hippocampi. It was confirmed, for example, that due to the higher variability of right Hippocampi, performances regarding the right hemisphere were slightly worse than those concerning the left one. Besides, another shape analysis was performed in the SPHARM framework to analyze inter-class differences. This analysis showed that, consistently with the previous results, the left Hippocampus was more useful for detecting differences between CTRL and AD subjects; this is to be expected, because the higher physiological variability of the right hemisphere can mask anomalous behavior, except in extreme situations.

Stability also allows an estimate of the method uncertainty to be provided, obtained by comparing screening and repeat segmented volumes, whose difference should always be ascribed to method variability alone. The residual distribution obtained by comparing screening and repeat scans is clearly symmetric and has a mean value that cannot be significantly distinguished from zero: the mean value µ = −1.4 mm³ and the standard deviation σ = 84.3 mm³ show how stable the method is. Besides, the fact that the differences are on average compatible with zero is good evidence that the method has no significant bias.

Thanks to the uncertainty estimation, a robust regression model depending on hippocampal volume, age and sex was built. The analysis confirmed that male and female subjects have different behaviors, deriving from significant anatomical differences; accordingly, it was found that the discrimination power increases when the male and female populations are separated. These performances can be further improved if age information is kept: in fact, state-of-the-art AUC values of 0.92 for male and 0.93 for female subjects are obtained. To achieve these results, the adoption of a multilinear model taking into account volume, age and sex was fundamental.
These results compose a convincing picture: hippocampal volume is confirmed as a supportive feature for the AD diagnosis, and the automated segmentation is able to capture this clinical information.

6.4 computation

In the case of medical images, the creation of large databases is very time-consuming and methodologically challenging [170, 171, 172, 164]. Active learning in the classification step is therefore useful for selecting reduced training datasets without a significant loss in performance. This approach could play a substantial role in the use of distributed computing infrastructures by reducing the training set size and, therefore, overcoming upload/download problems and shortening the training phase.

In this work a study of grid deployment was performed, with the aim of making automated segmentation algorithms available for large screening trials. Several tests were carried out both on the local computer farm BC2S and on the EGI. BC2S is a distributed computing infrastructure consisting of about 5000 CPUs and providing up to 1.8 PB of storage, while the EGI consists of about 300 geographically distributed sites around the world. In particular, all the results presented in this study were obtained on BC2S using the 56 MRIs at our disposal and 1824 ADNI images. In addition, a feasibility study on all these images was successfully performed on the EGI. The proposed method tackles two different problems distributed infrastructures have to deal with: it does not strongly suffer from overhead-related performance deterioration and, at the same time, its job failure rate is reduced to zero. In particular, the presented method allows the LP to be used with Torque and, at the same time, with grids.

With the use of a workflow manager, the end user can run already available workflows, modify them before execution, or build completely new analysis pipelines. Future work should aim at developing a simple web interface allowing users to exploit an already available workflow by changing only the configuration parameters and the input files. In this way users could execute analyses without specific expertise in grid management. Such a web interface would also provide the needed support for strong authentication mechanisms.

Failure management is a challenging problem in production grids, with the consequence that jobs need continuous monitoring. Moreover, when data transfers are huge, overall times grow exponentially with job failures; data transfer therefore remains the most limiting factor for grid effectiveness, and the adoption of active learning is, in this sense, crucial. This is of course an ad hoc strategy which is difficult to generalize; a further improvement of this method should address the calculation of the smallest amount of data to transfer.
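As a final, purely illustrative sketch (hypothetical submit and get_status wrappers; neither the JST nor the LP interface is reproduced here), the monitor-and-resubmit logic that drives the job failure rate to zero can be summarized as follows:

    import time

    def run_with_resubmission(jobs, submit, get_status,
                              max_retries=3, poll_seconds=60):
        # submit(job) -> job_id and get_status(job_id) -> one of
        # "running", "done", "failed" are assumed middleware wrappers;
        # max_retries is an assumed bound, the real policy may differ.
        active = {submit(job): (job, 0) for job in jobs}
        while active:
            for job_id, (job, retries) in list(active.items()):
                status = get_status(job_id)
                if status == "done":
                    del active[job_id]
                elif status == "failed":
                    del active[job_id]
                    if retries < max_retries:
                        # Automatic re-submission of the failed job.
                        active[submit(job)] = (job, retries + 1)
            if active:
                time.sleep(poll_seconds)  # poll instead of busy-waiting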
LIST OF FIGURES

Figure 1: A T1 high-resolution brain template from the International Consortium for Brain Mapping (ICBM). The primary goal of the ICBM project is the development of a probabilistic reference system for the human brain. (p. 27)

Figure 2: The intraventricular aspect of the Hippocampus. 1, hippocampal body; 2, hippocampal head and its digitations; 3, hippocampal tail; 4, fimbria; 5, crus of the fornix; 6, subiculum; 7, splenium of the corpus callosum; 8, calcar avis; 9, collateral trigone; 10, collateral eminence; 11, uncal recess of the temporal horn. An image scale of 1 cm is shown in the lower right corner. (p. 28)

Figure 3: A T1 sagittal view of the ICBM 152 template; the Hippocampus and the adjacent amygdala boundaries are manually pointed out. (p. 29)

Figure 4: A symmetrical spinning top with angular velocity Ω and mass m precessing in a constant gravitational field and, in correspondence, the precession of the angular momentum J. The angular momentum increment dJ involves a counterclockwise precession. (p. 37)

Figure 5: Clockwise precession of a spin around the magnetic field B direction. (p. 38)

Figure 6: The contrast as a function of the relaxation time for gray matter and white matter. (p. 51)

Figure 7: The contrast as a function of the relaxation time for gray matter and cerebrospinal fluid; note how the contrast changes according to the examined tissues. (p. 51)

Figure 8: The contrast as a function of the echo time for gray matter and white matter. (p. 52)

Figure 9: The contrast as a function of the echo time for gray matter and cerebrospinal fluid. (p. 53)

Figure 10: A comparison among (in clockwise order) a proton density, a T1-weighted and a T2-weighted brain scan from ICBM. (p. 53)

Figure 11: Flow chart of the segmentation method, according to the following steps: 1) volume of interest extraction, 2) determination of voxel features and 3) voxel classification. The learning phase is represented in detail in the classification box, while input and output data are shown in red. (p. 57)

Figure 12: A mesh representation compared with its projection onto a unit sphere; the colors are just a figurative representation of different sub-hippocampal regions. (p. 66)

Figure 13: Comparison of hippocampal shapes reconstructed with a different number of coefficients c_l^m: 1 coefficient, 5 coefficients, 10 coefficients and 15 coefficients. As expected, the more coefficients, the more details the model is able to capture. (p. 67)

Figure 14: Comparison of hippocampal shapes with the original non-fixed (left) and fixed (right) topology. In the left figure, some isolated voxels are clearly visible. (p. 70)

Figure 15: Two different hippocampal labellings aligned along the principal dimension. It has to be noted that this procedure yields a rigid registration which does not modify the mask dimensions. (p. 71)

Figure 16: A watch from the standard MPEG database. (p. 72)

Figure 17: An example of the noise process applied to the template watch. (p. 73)

Figure 18: A visual comparison between an image representing a disk and its counterpart obtained by randomization of scaling, translations and rotations. (p. 74)

Figure 19: A mathematical pseudo-landmark is defined as the contour voxel whose distance from the chord subtended by two consecutive landmarks is maximum. (p. 77)
Figure 20: An example of how the probabilistic values are associated to a landmark and its neighbors. (p. 77)

Figure 21: The Random Forest algorithm. A training set with N examples of d features is sampled k times with bootstrap. Each bootstrapped training set T_i is internally divided into a training and a test set with a 2:1 ratio. Then a random sample of f features is drawn, and those features are used to split the test set into two leaves. The procedure is iterated until the leaves reach a desired size, which in classification is usually set to one. (p. 84)

Figure 22: The simplified workflow implementation. The reddish diamonds are the input/output modules, while the back-end analysis modules are represented by turquoise diamonds. To better emphasize the possibility of dynamically choosing the local farm or the grid infrastructures, these back-end modules are shown in yellow. (p. 90)

Figure 23: A simplified representation of the hippocampus segmentation algorithm. In this case particular emphasis is given to computational issues: parameters and critical aspects are shown in round boxes, processing steps in rectangles and files in diamonds. The distributed computing is enclosed by a dotted line, while the end user interface is shown in reddish diamonds. (p. 94)

Figure 24: The segmentation algorithm in its LP implementation. It consists of four main modules: the Adaptive Training module measures the correlation between the image to be segmented and the training set, the Feature Extraction module calculates the features on the testing image according to the volume of interest determined by the previous module, the Random Forest Training module performs the learning phase for the chosen classifier, and finally the segmentation is returned by the Random Forest Prediction module. Each module is compiled and therefore able to run on a distributed computing structure. (p. 95)

Figure 25: Each module hides implementation details, in particular the checkinput, insertJob and getStatus modules, which allow the workflow to be controlled in a transparent way, without the need to be concerned with submission or monitoring issues. (p. 96)

Figure 26: Web service call sequence implemented in the LP modules. (p. 97)

Figure 27: The comparison between the gray level distributions of two randomly chosen right boxes, as obtained after registration preprocessing. (p. 101)

Figure 28: The cumulative image represented in the figure allows one to evaluate how noise introduced by registration, or statistical fluctuations among the different images, affects especially the left tail. (p. 102)

Figure 29: The bounding peri-hippocampal region obtained through FAPoD analysis (green) and the labeled mask (white). (p. 103)

Figure 30: The volume reconstructed by FAPoD (in mm³), varying the number of images used to retrieve the volume, for the (a) left and (b) right Hippocampus. (p. 104)

Figure 31: The correlation coefficients computed for an image I_i of the data set and the remaining images. As can be seen, there are several images moderately correlated with I_i. (p. 105)
Figure 32: The average correlation coefficients computed for both right (case a) and left (case b) Hippocampi. (p. 106)

Figure 33: The segmentation performances for the left Hippocampus using the 10, 20, 30, 40 most correlated images and all the remaining 55 images (loo). (p. 109)

Figure 34: The left hippocampal average shape, shown both without (above left) and with (above right) scaled gray levels. From the lower figures it emerges that misclassification is uniformly distributed on the hippocampal contour. (p. 112)

Figure 35: The figure makes evident that no discrepancies can be found when comparing the spatial distributions of false positives (FP) and false negatives (FN). (p. 113)

Figure 36: The average error and its standard deviation as a function of the segmentation threshold. An evident plateau for values near t = 0.5 is found. (p. 114)

Figure 37: The automated volumes against those obtained through manual segmentation. The straight line of perfect agreement is also represented in blue. (p. 115)

Figure 38: The volume differences against the mean volumes obtained by manual and automated segmentations. The straight lines in blue represent the mean value of the differences and the 95% confidence limits. (p. 116)

Figure 39: In this figure, unlike the previous one, the confidence intervals are represented too. It can now be appreciated that only one measure is significantly different and that an overall range of variability of about 1000 voxels is found, corresponding to a relative 15% variability. (p. 118)

Figure 40: Both right (a) and left (b) hippocampal volumes, with CTRL/MCI/AD class discrimination. (p. 120)

Figure 41: The ROC curves for right (a) and left (b) Hippocampi. The reported AUC is the measure of the discrimination between CTRL and AD subjects obtained with the screening scans. (p. 121)

Figure 42: The correlation among the volumes obtained for the 4 acquisitions for the right Hippocampi. (p. 125)

Figure 43: The correlation among the volumes obtained for the 4 acquisitions for the left Hippocampi. In general, linear correlation is confirmed as expected; left Hippocampi show a slightly better behavior, however comparable with the right segmentations. (p. 125)

Figure 44: The colormap (p-values higher than 0.05 correspond to hot colors) represents significant differences in the left hippocampal regions (a). They are especially significant in correspondence of the hippocampal head and its digitations. It is worthwhile to note how statistical differences are more evident in the left segmentations than in the right ones (b). (p. 126)

Figure 45: The 456 differences between screening and repeat segmentation volumes have mainly null values. In particular, the standard deviation of this distribution can be used to determine a conservative value for the method uncertainty. (p. 127)

Figure 46: The CTRL population is used to estimate a linear model with its 95% confidence interval, represented in red. The AD population is represented through the segmented volumes and the related uncertainties in blue. The figure shows how well AD subjects are separated from CTRL subjects. (p. 129)
Figure 47: As in the previous figure, a linear model for the CTRL population and its 95% confidence interval is shown in red. In this case the represented volumes are obtained from the 24 month follow-up. It is evident how the pathological atrophy rate makes AD hippocampal volumes easier to distinguish after 2 years. (p. 129)

Figure 48: The AD male hippocampal volumes in (a) and the female ones in (b) are shown in blue and compared respectively with the male and female CTRL populations. As expected from other studies, the female atrophy rate is clearly different from (and less severe than) the male one. As a consequence, separating the two populations helps the discrimination between CTRL and AD. (p. 130)

Figure 49: The overall run-time distribution using the combination of the local farm and the grid infrastructure. (p. 131)

Figure 50: The time sequence of the executed segmentations. The results show that 95% of the segmentations were executed in less than 7 hours. (p. 132)

Figure 51: RF execution times; a two-peak distribution is evident. (p. 133)

Figure 52: Distribution of the workflow execution times derived from job monitoring information. (p. 133)

LIST OF TABLES

Table 1: A schematic view of the different types of dementia and of their principal symptom patterns. (p. 15)

Table 2: The revised lexicon of AD. Particular attention is given to recent advances in the use of reliable biomarkers of AD for the definition of early stages of the disease, such as prodromal AD, or of ambiguous situations, as for MCI. (p. 21)

Table 3: The relaxation times T1 and T2 for different types of human tissues. (p. 43)

Table 4: A schematic overview of the analyses presented in this work. (p. 99)

Table 5: A schematic summary of the registration volumes. The volume represents the total average hippocampal volume, while the core is the volume of the inner Hippocampus, accounting for 60% of the total volume. (p. 100)

Table 6: A schematic view of the different feature importance analyses performed. (p. 108)

Table 7: For each cross-validation iteration the Dice index distribution is calculated, and then mean, median and standard deviation are averaged. The table shows these mean values and clearly shows how performances increase with the number of training images, however at the cost of an increased spread in the distribution. (p. 108)

Table 8: For each cross-validation iteration the Dice index distribution is calculated, and then mean, median and standard deviation are averaged. The table shows these mean values and clearly shows how performances increase with the number of training images, however at the cost of an increased spread in the distribution. (p. 110)
Table 9: The volume comparison for both right and left Hippocampi; the left column reports the RF results, the right one the RUSBoost results. It is worthwhile to note how the CTRL and AD classes are well separated; on the contrary, CTRL and MCI cannot be separated as well. This behavior is expected, MCI being a broad class which includes subjects who will never develop the AD pathology. (p. 122)

Table 10: The volume comparison for both right and left Hippocampi, as before, is considered by separating male and female subjects. Moreover, the effect of aging is considered by means of a linear de-trending. (p. 124)

BIBLIOGRAPHY

[1] Alzheimer's Disease International. World Alzheimer Report 2013: Overcoming the stigma of dementia, 2013. URL www.alzheimer.it/Report2013.pdf.

[2] American Psychiatric Association. Diagnostic and Statistical Manual of Mental Disorders. Washington, DC, American Psychiatric Association, 2000.

[3] G. McKhann, D. Drachman, M. Folstein, R. Katzman, D. Price, and E. M. Stadlan. Clinical Diagnosis of Alzheimer's disease: Report of the NINCDS-ADRDA Work Group under the auspices of Department of Health and Human Services Task Force on Alzheimer's Disease. Neurology, 34(7):939 – 944, 1984.

[4] B. Dubois, H. H. Feldman, C. Jacova, S. T. DeKosky, P. Barberger-Gateau, J. Cummings, A. Delacourte, D. Galasko, S. Gauthier, G. Jicha, K. Meguro, J. O'Brien, F. Pasquier, P. Robert, M. Rossor, S. Salloway, Y. Stern, P. J. Visser, and P. Scheltens. Research criteria for the diagnosis of Alzheimer's disease: revising the NINCDS-ADRDA criteria. Lancet Neurology, 6(8):734 – 746, 2007.

[5] M. F. Folstein, S. E. Folstein, and P. R. McHugh. Mini-Mental State: a practical method for grading the cognitive state of patients for the clinician. Journal of Psychiatric Research, 12(3):189 – 198, 1975.

[6] G. Blessed, B. E. Tomlinson, and M. Roth. The association between quantitative measures of dementia and of senile change in the cerebral grey matter of elderly subjects. The British Journal of Psychiatry, 114(512):797 – 811, 1968.

[7] C. R. Jack Jr, M. S. Albert, D. S. Knopman, G. M. McKhann, R. A. Sperling, M. C. Carrillo, B. Thies, and C. H. Phelps. Introduction to the recommendations from the National Institute on Aging-Alzheimer's Association workgroups on diagnostic guidelines for Alzheimer's disease. Alzheimer's & Dementia, 7(3):257 – 262, 2011.

[8] D. G. Davis, F. A. Schmitt, D. R. Wekstein, and W. R. Markesbery. Alzheimer neuropathologic alterations in aged cognitively normal subjects. Journal of Neuropathology & Experimental Neurology, 58(4):376 – 388, 1999.

[9] J. L. Price and J. C. Morris. Tangles and plaques in nondemented aging and "preclinical" Alzheimer's disease. Annals of neurology, 45(3):358 – 368, 1999.

[10] S. Alladi, J. Xuereb, T. Bak, P. Nestor, J. Knibb, K. Patterson, and J. R. Hodges. Focal cortical presentations of Alzheimer's disease. Brain, 130(10):2636 – 2645, 2007.

[11] G. D. Rabinovici, W. J. Jagust, A. J. Furst, J. M. Ogar, C. A. Racine, E. C. Mormino, J. P. O'Neil, R. A. Lal, N. F. Dronkers, B. L. Miller, and M. L. Gorno-Tempini. Aβ amyloid and glucose metabolism in three variants of primary progressive aphasia. Annals of neurology, 64(4):388 – 401, 2008.

[12] A. Lim, D. Tsuang, W. Kukull, D. Nochlin, J. Leverenz, W. McCormick, J. Bowen, L. Teri, J. Thompson, E. R. Peskind, M. Raskind, and E. B. Larson. Clinico-neuropathological correlation of Alzheimer's disease in a community-based case series. Journal of the American Geriatrics Society, 47(5):564 – 569, 1999.

[13] H. Petrovitch, L. R. White, G. W. Ross, S. C. Steinhorn, C. Li, K. H. Masaki, D. G. Davis, J. Nelson, J. Hardman, J. D. Curb, P. L. Blanchette, L. J. Launer, K. Yano, and W. R.
Markesbery. Accuracy of clinical criteria for AD in the Honolulu-Asia Aging Study, a population-based study. Neurology, 57(2):226 – 234, 2001.

[14] A. R. Varma, J. S. Snowden, J. J. Lloyd, P. R. Talbot, D. M. A. Mann, and D. Neary. Evaluation of the NINCDS-ADRDA criteria in the differentiation of Alzheimer's disease and frontotemporal dementia. Journal of Neurology, Neurosurgery & Psychiatry, 66(2):184 – 188, 1999.

[15] A. M. Kazee, T. A. Eskin, L. W. Lapham, K. R. Gabriel, K. D. McDaniel, and R. W. Hamill. Clinicopathologic correlates in Alzheimer disease: assessment of clinical and pathologic diagnostic criteria. Alzheimer Disease & Associated Disorders, 7(3):152 – 164, 1993.

[16] D. Neary, J. Snowden, and D. Mann. Frontotemporal dementia. The Lancet Neurology, 4(11):771 – 780, 2005.

[17] J. R. Hodges, K. Patterson, S. Oxbury, and E. Funnell. Semantic dementia: progressive fluent aphasia with temporal lobe atrophy. Brain, 115(6):1783 – 1806, 1992.

[18] M. Mesulam. Slowly progressive aphasia without generalized dementia. Annals of neurology, 11(6):592 – 598, 1982.

[19] J. J. Rebeiz, E. H. Kolodny, and E. P. Richardson. Corticodentatonigral degeneration with neuronal achromasia. Archives of Neurology, 18(1):20 – 33, 1968.

[20] W. R. G. Gibb, P. J. Luthert, and C. D. Marsden. Corticobasal degeneration. Brain, 112(5):1171 – 1192, 1989.

[21] D. F. Benson, R. J. Davis, and B. D. Snyder. Posterior cortical atrophy. Archives of Neurology, 45(7):789, 1988.

[22] I. G. McKeith, D. Galasko, K. Kosaka, E. K. Perry, D. W. Dickson, L. A. Hansen, D. P. Salmon, J. Lowe, S. S. Mirra, E. J. Byrne, G. Lennox, N. P. Quinn, J. A. Edwardson, P. G. Ince, C. Bergeron, A. Burns, B. L. Miller, S. Lovestone, D. Collerton, E. N. H. Jansen, C. Ballard, R. A. I. de Vos, G. K. Wilcock, K. A. Jellinger, and R. H. Perry. Consensus guidelines for the clinical and pathologic diagnosis of dementia with Lewy bodies (DLB): Report of the consortium on DLB international workshop. Neurology, 47(5):1113 – 1124, 1996.

[23] G. C. Román, T. K. Tatemichi, T. Erkinjuntti, J. L. Cummings, J. C. Masdeu, J. H. Garcia, L. Amaducci, J. M. Orgogozo, A. Brun, A. Hofman, D. M. Moody, M. D. O'Brien, T. Yamaguchi, J. Grafman, B. P. Drayer, D. A. Bennet, M. Fisher, J. Ogata, E. Kokmen, F. Bermejo, P. A. Wolf, P. B. Gorelick, K. L. Bick, A. K. Pajeau, M. A. Bell, C. DeCarli, A. Culebras, A. D. Korczyn, J. Bogousslavsky, A. Hartmann, and P. Scheinberg. Vascular dementia: Diagnostic criteria for research studies. Report of the NINDS-AIREN International Workshop. Neurology, 43(2):250 – 250, 1993.

[24] H. C. Chui, J. I. Victoroff, D. Margolin, W. Jagust, R. Shankle, and R. Katzman. Criteria for the diagnosis of ischemic vascular dementia proposed by the State of California Alzheimer's Disease Diagnostic and Treatment Centers. Neurology, 42(3):473 – 473, 1992.

[25] M. S. Forman, J. Farmer, J. K. Johnson, C. M. Clark, S. E. Arnold, H. Coslett, A. Chatterjee, H. I. Hurtig, J. H. Karlawish, H. J. Rosen, et al. Frontotemporal dementia: clinicopathological correlations. Annals of neurology, 59(6):952 – 962, 2006.

[26] K. A. Josephs, J. L. Holton, M. N. Rossor, A. K. Godbolt, T. Ozawa, K. Strand, N. Khan, S. Al-Sarraj, and T. Revesz. Frontotemporal lobar degeneration and ubiquitin immunohistochemistry. Neuropathology and applied neurobiology, 30(4):369 – 373, 2004.

[27] J. C. Morris, M. Storandt, J. P. Miller, D. W. McKeel, J. L. Price, E. H. Rubin, and L. Berg. Mild cognitive impairment represents early-stage Alzheimer disease.
Archives of neurology, 58(3):397, 2001.

[28] O. L. Lopez, J. T. Becker, W. Klunk, J. Saxton, R. L. Hamilton, D. I. Kaufer, R. A. Sweet, C. C. Meltzer, S. Wisniewski, M. I. Kamboh, and S. T. DeKosky. Research evaluation and diagnosis of probable Alzheimer's disease over the last two decades: I. Neurology, 55(12):1854 – 1862, 2000.

[29] H. Braak and E. Braak. Neuropathological stageing of Alzheimer-related changes. Acta neuropathologica, 82(4):239 – 259, 1991.

[30] A. Delacourte, J. P. David, N. Sergeant, L. Buee, A. Wattez, P. Vermersch, F. Ghozali, C. Fallet-Bianco, F. Pasquier, F. Lebert, et al. The biochemical pathway of neurofibrillary degeneration in aging and Alzheimer's disease. Neurology, 52(6):1158 – 1158, 1999.

[31] B. Dubois, H. H. Feldman, C. Jacova, J. L. Cummings, S. T. DeKosky, P. Barberger-Gateau, A. Delacourte, G. B. Frisoni, N. C. Fox, D. Galasko, S. Gauthier, H. Hampel, G. A. Jicha, K. Meguro, J. O'Brien, F. Pasquier, P. Robert, M. Rossor, S. Salloway, M. Sarazin, L. C. de Souza, Y. Stern, P. J. Visser, and P. Scheltens. Revising the definition of Alzheimer's disease: a new lexicon. The Lancet Neurology, 9(11):1118 – 1127, 2010.

[32] B. Dubois and M. L. Albert. Amnestic MCI or prodromal Alzheimer's disease? Lancet Neurology, 3(4):246 – 248, 2004.

[33] T. H. Crook and S. H. Ferris. Age associated memory impairment. BMJ: British Medical Journal, 304(6828):714, 1992.

[34] R. Levy. Aging-associated cognitive decline. International Psychogeriatrics, 6(1):63 – 68, 1994.

[35] World Health Organization. ICD-10: International statistical classification of diseases and related health problems. World Health Organization, 2004.

[36] C. Flicker, S. H. Ferris, and B. Reisberg. Mild cognitive impairment in the elderly: predictors of dementia. Neurology, 41(7):1006 – 1006, 1991.

[37] S. Larrieu, L. Letenneur, J. M. Orgogozo, C. Fabrigoule, H. Amieva, N. Le Carret, P. Barberger-Gateau, and J. F. Dartigues. Incidence and outcome of mild cognitive impairment in a population-based prospective cohort. Neurology, 59(10):1594 – 1599, 2002.

[38] V. Jelic, M. Kivipelto, and B. Winblad. Clinical trials in mild cognitive impairment: lessons for the future. Journal of Neurology, Neurosurgery & Psychiatry, 77(4):429 – 438, 2006.

[39] M. Grundman, R. C. Petersen, D. A. Bennett, H. H. Feldman, S. Salloway, P. J. Visser, L. J. Thal, D. Schenk, Z. Khachaturian, and W. Thies. Alzheimer's Association Research Roundtable Meeting on Mild Cognitive Impairment: What have we learned? Alzheimer's & Dementia, 2(3):220 – 233, 2006.

[40] R. C. Petersen, R. G. Thomas, M. Grundman, D. Bennett, R. Doody, S. Ferris, D. Galasko, S. Jin, J. Kaye, A. Levey, E. Pfeiffer, M. Sano, C. H. van Dyck, and L. J. Thal. Vitamin E and donepezil for the treatment of mild cognitive impairment. New England Journal of Medicine, 352(23):2379 – 2388, 2005.

[41] L. J. Thal, S. H. Ferris, L. Kirby, G. A. Block, C. R. Lines, E. Yuen, C. Assaid, M. L. Nessly, B. A. Norman, C. C. Baranak, and S. A. Reines. A randomized, double-blind study of rofecoxib in patients with mild cognitive impairment. Neuropsychopharmacology, 30(6):1204 – 1215, 2005.

[42] P. J. Visser, P. Scheltens, and F. R. J. Verhey. Do MCI criteria in drug trials accurately identify subjects with predementia Alzheimer's disease? Journal of Neurology, Neurosurgery & Psychiatry, 76(10):1348 – 1354, 2005.

[43] G. A. Jicha, J. E. Parisi, D. W. Dickson, K. Johnson, R. Cha, R. J. Ivnik, E. G. Tangalos, B. F. Boeve, D. S. Knopman, H. Braak, and R. C. Petersen.
Neuropathologic outcome of mild cognitive impairment following progression to clinical dementia. Archives of neurology, 63(5):674 – 681, 2006. [44] C. R. Jack, D. W. Dickson, J. E. Parisi, Y. C. Xu, R. H. Cha, P. C. O’Brien, S. D. Edland, G. E. Smith, B. F. Boeve, E. G. Tangalos, E. Kokmen, and R. C. Petersen. Antemortem MRI findings correlate with hippocampal neuropathology in typical aging and dementia. Neurology, 58(5):750 – 757, 2002. [45] D. H. S. Silverman, S. S. Gambhir, H. W. C. Huang, J. Schwimmer, S. Kim, G. W. Small, J. Chodosh, J. Czernin, and M. E. Phelps. Evaluating early dementia with and without assessment of regional cerebral metabolism by PET: a comparison of predicted costs and benefits. Journal of Nuclear Medicine, 43(2):253 – 266, 2002. [46] F. Hulstaert, K. Blennow, A. Ivanoiu, H. C. Schoonderwaldt, M. Riemenschneider, P. P. De Deyn, C. Bancher, P. Cras, J. Wiltfang, P. D. Mehta, K. Iqbal, H. Pottel, E. Vanmechelen, and H. Vanderstichele. Improved discrimination of AD patients using β-amyloid (1-42) and tau levels in CSF. Neurology, 52(8):1555 – 1555, 1999. [47] P. J. Visser, P. Scheltens, E. Pelgrim, and F. R. J. Verhey. Medial temporal lobe atrophy and APOE genotype do not predict cognitive improvement upon treatment with rivastigmine in Alzheimer’s disease patients. Dementia and geriatric cognitive disorders, 19(2-3):126 – 133, 2005. [48] M. J. De Leon, A. E. George, J. Golomb, C. Tarshish, A. Convit, A. Kluger, S. De Santi, T. Mc Rae, S. H. Ferris, B. Reisberg, C. Ince, H. Rusinek, M. Bobinski, B. Quinn, D. C. Miller, and H. M. Wisniewski. Frequency of hippocampal formation atrophy in normal aging and Alzheimer’s disease. Neurobiology of aging, 18(1):1 – 11, 1997. [49] K. M. Gosche, J. A. Mortimer, C. D. Smith, W. R. Markesbery, and D. A. Snowdon. Hippocampal volume as an index of Alzheimer neuropathology Findings from the Nun Study. Neurology, 58(10):1476 – 1482, 2002. [50] C. Bottino, C. C. Castro, R. L. E. Gomes, C. A. Buchpiguel, R. L. Marchetti, and M. R. L. Neto. Volumetric MRI measurements can differentiate Alzheimer’s disease, mild cognitive impairment, and normal aging. International Psychogeriatrics, 14(01):59 – 72, 2002. [51] M. P. Laakso, H. Soininen, K. Partanen, M. Lehtovirta, M. Hallikainen, T. Hänninen, E. L. Helkala, P. Vainio, and P. J. Riekkinen Sr. MRI of the hippocampus in Alzheimer’s disease: sensitivity, specificity, and analysis of the incorrectly classified subjects. Neurobiology of aging, 19(1):23 – 31, 1998. [52] P. Scheltens, N. Fox, F. Barkhof, and C. De Carli. Structural magnetic resonance imaging in the practical assessment of dementia: beyond exclusion. The Lancet Neurology, 1(1):13 – 21, 2002. [53] A. Chincarini, P. Bosco, G. Gemme, S. Morbelli, D. Arnaldi, F. Sensi, I. Solano, N. Amoroso, S. Tangaro, R. Longo, S . Squarcia, and F. Nobili. Alzheimer’s disease markers from structural MRI and FDG-PET brain images. EPJ Plus 127, (11):135, 2012. [54] J. G. Csernansky, L. Wang, J. Swank, J. P. Miller, M. Gado, D. McKeel, M. I. Miller, and J. C. Morris. Preclinical detection of Alzheimer’s disease: hippocampal shape and volume predict dementia onset in the elderly. Neuroimage, 25(3):783 – 792, 2005. [55] L. G. Apostolova, R. A. Dutton, I. D. Dinov, K. M. Hayashi, A. W. Toga, J. L. Cummings, and P. M. Thompson. Conversion of mild cognitive impairment to Alzheimer disease predicted by hippocampal atrophy maps. Archives of Neurology, 63(5):693 – 699, 2006. [56] E. S. C. Korf, L. O. Wahlund, P. J. Visser, and P. Scheltens. 
Medial temporal lobe atrophy on MRI predicts dementia in patients with mild cognitive impairment. Neurology, 63(1):94 – 100, 2004.

[57] C. R. Jack, R. C. Petersen, Y. C. Xu, P. C. O'Brien, G. E. Smith, R. J. Ivnik, B. F. Boeve, S. C. Waring, E. G. Tangalos, and E. Kokmen. Prediction of AD with MRI-based hippocampal volume in mild cognitive impairment. Neurology, 52(7):1397 – 1397, 1999.

[58] P. J. Visser, P. Scheltens, F. R. J. Verhey, B. Schmand, L. J. Launer, J. Jolles, and C. Jonker. Medial temporal lobe atrophy and memory dysfunction as predictors for dementia in subjects with mild cognitive impairment. Journal of Neurology, 246(6):477 – 485, 1999.

[59] P. J. Visser, F. R. J. Verhey, P. A. M. Hofman, P. Scheltens, and J. Jolles. Medial temporal lobe atrophy predicts Alzheimer's disease in patients with minor cognitive impairment. Journal of Neurology, Neurosurgery & Psychiatry, 72(4):491 – 497, 2002.

[60] O. Hansson, H. Zetterberg, P. Buchhave, E. Londos, K. Blennow, and L. Minthon. Association between CSF biomarkers and incipient Alzheimer's disease in patients with mild cognitive impairment: a follow-up study. The Lancet Neurology, 5(3):228 – 234, 2006.

[61] A. M. Fagan, M. A. Mintun, R. H. Mach, S. Y. Lee, C. S. Dence, A. R. Shah, G. N. Larossa, M. L. Spinner, W. E. Klunk, C. A. Mathis, et al. Inverse relation between in vivo amyloid imaging load and cerebrospinal fluid Aβ42 in humans. Annals of neurology, 59(3):512 – 519, 2006.

[62] M. A. Mintun, G. N. Larossa, Y. I. Sheline, C. S. Dence, S. Y. Lee, R. H. Mach, W. E. Klunk, C. A. Mathis, S. T. DeKosky, and J. C. Morris. [11C] PIB in a nondemented population: Potential antecedent marker of Alzheimer disease. Neurology, 67(3):446 – 452, 2006.

[63] J. C. Price, W. E. Klunk, B. J. Lopresti, X. Lu, J. A. Hoge, S. K. Ziolko, D. P. Holt, C. C. Meltzer, S. T. DeKosky, and C. A. Mathis. Kinetic modeling of amyloid binding in humans using PET imaging and Pittsburgh Compound-B. Journal of Cerebral Blood Flow & Metabolism, 25(11):1528 – 1547, 2005.

[64] T. Bartsch. The Clinical Neurobiology of the Hippocampus: An integrative view. OUP Oxford, 2012.

[65] F. T. Lewis. The significance of the term hippocampus. The Journal of Comparative Neurology, 35(3):213 – 230, 1923.

[66] J. C. Pruessner, L. M. Li, W. Serles, M. Pruessner, D. L. Collins, N. Kabani, S. Lupien, and A. C. Evans. Volumetry of Hippocampus and Amygdala with High-resolution MRI and Three-dimensional Analysis Software: Minimizing the Discrepancies between Laboratories. Cerebral Cortex, (10):433 – 442, 2000.

[67] D. Shen, S. Moffat, S. M. Resnick, and C. Davatzikos. Measuring Size and Shape of the Hippocampus in MR Images Using a Deformable Shape Model. NeuroImage, 15(2):422 – 434, 2002.

[68] O. Pedraza, D. Bowers, and R. Gilmore. Asymmetry of the hippocampus and amygdala in MRI volumetric measurements of normal adults. Journal of the International Neuropsychological Society, 10(5):664 – 678, 2004.

[69] L. Bergouignan, M. Chupin, Y. Czechowska, S. Kinkingnéhun, C. Lemogne, G. Le Bastard, M. Lepage, L. Garnero, O. Colliot, and P. Fossati. Can voxel based morphometry, manual segmentation and automated segmentation equally detect hippocampal volume differences in acute depression? Neuroimage, 45(1):29 – 37, 2009.

[70] H. Wolf, M. Grunwald, F. Kruggel, S. G. Riedel-Heller, S. Angerhöfer, A. Hojjatoleslami, A. Hensel, T. Arendt, and H. J. Gertz. Hippocampal volume discriminates between normal cognition, questionable and mild dementia in the elderly.
Neurobiology of aging, 22(2):177 – 186, 2001. [71] F. Shi, B. Liu, Y. Zhou, C. Yu, and T. Jiang. Hippocampal volume and asymmetry in mild cognitive impairment and alzheimer’s disease: Metaanalyses of mri studies. Hippocampus, 19(11):1055 – 1064, 2009. [72] A. A. Woolard and S. Heckers. Anatomical and functional correlates of human hippocampal volume asymmetry. Psychiatry Research: Neuroimaging, 201(1):48 – 53, 2012. [73] B. P. Rogers, J. M. Sheffield, A. S. Luksik, and S. Heckers. Systematic error in hippocampal volume asymmetry measurement is minimal with a manual segmentation protocol. Frontiers in neuroscience, 6:179, 2012. [74] W. Gerlach and O. Stern. Der experimentelle nachweis der richtungsquantelung im magnetfeld. Zeitschrift für Physik A Hadrons and Nuclei, 9(1):349 – 352, 1922. [75] I. I. Rabi, J. R. Zacharias, S. Millman, and P. Kusch. A New Method of Measuring Nuclear Magnetic Moment. Phys. Rev., 53(4):318 – 318, 1938. [76] F. Bloch. Nuclear induction. Physical Review, 70(7-8):460 – 474, 1946. [77] E. M. Purcell, H. C. Torrey, and R. V. Pound. Resonance Absorption by Nuclear Magnetic Moments in a Solid. Physical Review, 69(1-2):37 – 38, 1946. [78] E.M. Haacke, R.W. Brown, M.R. Thompson, and R. Venkatesan. Magnetic Resonance Imaging: Physical Principles and Sequence Design. Wiley, 1999. [79] M. Chupin, A. Hammers, E. Bardinet, O. Colliot, R. S. N. Liu, J. S. Duncan, L. Garnero, and L. Lemieux. Fully automatic segmentation of the hippocampus and the amygdala from mri using hybrid prior knowledge. In Medical Image Computing and Computer-Assisted Intervention–MICCAI 2007, pages 875 – 882. Springer, 2007. [80] P. Aljabar, R. A. Heckemann, A. Hammers, J. V. Hajnal, and D. Rueckert. Multi-atlas based segmentation of brain images: atlas selection and its effect on accuracy. Neuroimage, 46(3):726 – 738, 2009. [81] H. Wang, J. W. Suh, S. Das, J. Pluta, M. Altinay, and P. Yushkevich. Regression-based label fusion for multi-atlas segmentation. In Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on, 2011. [82] F. van der Lijn, T. den Heijer, M. Breteler, and W. J. Niessen. Hippocampus segmentation in mr images using atlas registration, voxel classification, and graph cuts. Neuroimage, 43(4):708 – 720, 2008. [83] K. Kwak, U. Yoon, D. Lee, G. H. Kim, S. W. Seo, D. L. Na, H. Shim, and J. Lee. Fully-automated approach to hippocampus segmentation using a graph-cuts algorithm combined with atlas-based segmentation and morphological opening. Magnetic resonance imaging, 31(7):1090 – 1096, 2013. [84] D. L. Collins and J. C. Pruessner. Towards accurate, automatic segmentation of the hippocampus and amygdala from mri by augmenting animal with a template library and label fusion. NeuroImage, 52(4):1355 – 1366, 2010. [85] J. H. Morra, Z. Tu, L. G. Apostolova, A. E. Green, C. Avedissian, S. K. Madsen, N. Parikshak, X. Hua, A. W. Toga, C. R. Jack Jr, M. W. Weiner, and P. M. Thompson. Validation of a fully automated 3d hippocampal segmentation method using subjects with alzheimer’s disease mild cognitive impairment, and elderly controls. Neuroimage, 43(1):59 – 68, 2008. [86] J. H. Morra, Z. Tu, L. G. Apostolova, A. E. Green, A. W. Toga, and P. M. Thompson. Comparison of adaboost and support vector machines for detecting alzheimer’s disease through automated hippocampal segmentation. Medical Imaging, IEEE Transactions on, 29(1):30 – 43, 2010. [87] C. A. Bishop, M. Jenkinson, J. Andersson, J. Declerck, and D. Merhof. 
Novel fast marching for automated segmentation of the hippocampus (FMASH): method and validation on clinical data. NeuroImage, 55(3):1009 – 1019, 2011.

[88] P. Coupé, J. V. Manjón, V. Fonov, J. Pruessner, M. Robles, and D. L. Collins. Patch-based segmentation using expert priors: Application to hippocampus and ventricle segmentation. NeuroImage, 54(2):940 – 954, 2011.

[89] M. J. Cardoso, K. Leung, M. Modat, S. Keihaninejad, D. Cash, J. Barnes, N. C. Fox, and S. Ourselin. STEPS: Similarity and truth estimation for propagated segmentations and its application to hippocampal segmentation and brain parcelation. Medical image analysis, 17(6):671 – 684, 2013.

[90] S. M. Nestor, E. Gibson, F. Gao, A. Kiss, and S. E. Black. A direct morphometric comparison of five labeling protocols for multi-atlas driven automatic segmentation of the hippocampus in Alzheimer's disease. Neuroimage, 66(1):50 – 70, 2012.

[91] H. Wang and P. A. Yushkevich. Spatial bias in multi-atlas based segmentation. In Computer Vision and Pattern Recognition (CVPR), Conference on. IEEE, 2012.

[92] T. F. Cootes, C. J. Taylor, D. H. Cooper, and J. Graham. Active Shape Models - Their Training and Application. Computer Vision and Image Understanding, 61(1):38 – 59, 1995.

[93] T. F. Cootes, G. J. Edwards, and C. J. Taylor. Active appearance models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(6):681 – 685, 2001.

[94] C. Davatzikos, S. M. Resnick, X. Wu, P. Parmpi, and C. M. Clark. Individual patient diagnosis of AD and FTD via high-dimensional pattern classification of MRI. NeuroImage, 41(4):1220 – 1227, 2008.

[95] S. M. Pizer, P. T. Fletcher, S. Joshi, A. Thall, J. Z. Chen, Y. Fridman, D. S. Fritsch, A. G. Gash, J. M. Glotzer, M. R. Jiroutek, C. Lu, K. E. Muller, G. Tracton, P. Yushkevich, and E. L. Chaney. Deformable M-Reps for 3D Medical Image Segmentation. International Journal of Computer Vision, 55(2):85 – 106, 2003.

[96] L. Wang, F. Beg, T. Ratnanather, C. Ceritoglu, L. Younes, J. C. Morris, J. G. Csernansky, and M. I. Miller. Large deformation diffeomorphism and momentum based hippocampal shape discrimination in dementia of the Alzheimer type. IEEE Trans. Med. Imaging, 26(4):462 – 470, 2007.

[97] S. C. Zhu and A. Yuille. Region Competition: Unifying Snakes, Region Growing, and Bayes/MDL for Multi-band Image Segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(6):884 – 900, 1996.

[98] Z. Tu, K. L. Narr, P. Dollár, I. Dinov, P. M. Thompson, and A. W. Toga. Brain anatomical structure segmentation by hybrid discriminative/generative models. IEEE Trans. Med. Imaging, 27(4):495 – 508, 2008.

[99] Y. Zhang, M. Brady, and S. Smith. Segmentation of brain MR images through a hidden Markov random field model and the expectation-maximization algorithm. IEEE Trans. Med. Imaging, 20(1):45 – 57, 2001.

[100] Z. Song, N. Tustison, B. Avants, and J. C. Gee. Integrated Graph Cuts for Brain MRI Segmentation. Proc. Med Image Comput Comput Assist Interv, 9(2):831 – 838, 2006.

[101] F. Sabattoli, M. Boccardi, S. Galluzzi, A. Treves, P. M. Thompson, and G. B. Frisoni. Hippocampal shape differences in dementia with Lewy bodies. Neuroimage, 41(3):699 – 705, 2008.

[102] J. V. Hajnal and D. L. G. Hill. Medical image registration. CRC, 2010.

[103] P. Viola and W. M. Wells III. Alignment by maximization of mutual information. International journal of computer vision, 24(2):137 – 154, 1997.

[104] H. Gudbjartsson and S. Patz. The Rician distribution of noisy MRI data.
Magnetic Resonance in Medicine, 34(6):910 – 914, 1995. [105] N. Amoroso, R. Bellotti, S. Bruno, A. Chincarini, G. Logroscino, S. Tangaro, and A. Tateo. Automated Shape Analysis landmark detection for medical image processing. Proceedings of the International Symposium, CompIMAGE, 2012. [106] C. Brechbühler, G. Gerig, and O. Kübler. Parametrization of closed surfaces for 3-d shape description. Computer vision and image understanding, 61(2):154 – 170, 1995. [107] L. Shen and F. Makedon. Spherical mapping for processing of 3d closed surfaces. Image and vision computing, 24(7):743 – 761, 2006. [108] L. Shen, H. A. Firpi, A. J. Saykin, and J. D. West. Parametric surface modeling and registration for comparison of manual and automated segmentation of the hippocampus. Hippocampus, 19(6):588 – 595, 2009. [109] A. Chincarini, P. Bosco, P. Calvini, G. Gemme, M. Esposito, C. Olivieri, L. Rei, S. Squarcia, G. Rodriguez, R. Bellotti, P. Cerello, I. De Mitri, A. Retico, F. Nobili, and The Alzheimer’s Disease Neuroimaging Initiative. Local MRI analysis approach in the diagnosis of early and prodromal Alzheimer’s disease. NeuroImage, 58(2):469 – 480, 2011. [110] P. Calvini, A. Chincarini, G. Gemme, M. A. Penco, S. Squarcia, F. Nobili, G. Rodriguez, R. Bellotti, E. Catanzariti, P. Cerello, I. De Mitri, M. E. Fantacci, MAGIC-5 Collaboration, and The Alzheimer’s Disease Neuroimaging Initiative. Automatic analysis of medial temporal lobe atrophy from structural MRIs for the early assessment of Alzheimer disease. Medical Physics, 36(8):3737 – 3747, 2009. [111] T. F. Cootes, D. Cooper, C. J. Taylor, and J. Graham. A Trainable Method of Parametric Shape Description. Proc. British Machine Vision Conference pub. Spring-Verlag, 10(5):54 – 61, 1991. [112] P. J. Besl and N. D. McKay. Method for registration of 3-d shapes. In Robotics-DL tentative, 1992. [113] A. Rangarajan, H. Chui, and J. S. Duncan. Rigid point feature registration using mutual information. Medical Image Analysis, 3(4):425 – 440, 1999. [114] L. R. Dice. Measures of the amount of ecologic association between species. Ecology, 26(3):297 – 302, 1945. [115] B. Fischl, D. H. Salat, E. Busa, M. Albert, M. Dieterich, C. Haselgrove, A. van der Kouwe, R. Killiany, D. Kennedy, S. Klaveness, A. Montillo, N. Makris, B. Rosen, and A. M. Dale. Whole brain segmentation: automated labeling of neuroanatomical structures in the human brain. Neurotechnique, 33(3):341 – 355, 2002. [116] B. Patenaude, S. M. Smith, D. N. Kennedy, and M. Jenkinson. A bayesian model of shape and appearance for subcortical brain segmentation. Neuroimage, 56(3):907 – 922, 2011. [117] L. Breiman. Random Forests. Machine Learning 45, (1):5 – 32, 2001. [118] C. Seiffert, T. M. Khoshgoftaar, J. Van Hulse, and A. Napolitano. Rusboost: A hybrid approach to alleviating class imbalance. Systems, Man and Cybernetics, Part A: Systems and Humans, IEEE Transactions on, 40(1): 185 – 197, 2010. [119] P. Viola and M. J. Jones. Robust Real-Time Face Detection. International Journal of Computer Vision, 57(2):137 – 154, 2004. [120] R. M. Haralick, K. Shanmugam, and I. Dinstein. Textural Features for Image Classification. IEEE Transactions on Systems, Man and Cybernetics, 3 (6):610 – 621, 1973. [121] R. W. Conners and C. A. Harlow. A Theoretical Comparison of Texture Algorithm. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2(3):204 – 222, 1980. [122] R. Bellotti, F. De Carlo, G. Gargano, G. Maggipinto, S. Tangaro, M. Castellano, R. Massafra, D. Cascio, F. Fauci, R. Magro, G. Raso, A. 
Lauria, G. Forni, S. Bagnasco, P. Cerello, S. C. Cheran, E. Lopez Torres, U. Bottigli, G. L. Masala, P. Oliva, A. Retico, M. E. Fantacci, R. Cataldo, I. De Mitri, and G. De Nunzio. A completely automated CAD system for mass detection in a large mammographic database. Medical Physics, 33(8):3066 – 3075, 2006. [123] S. Tangaro, F. De Carlo, G. Gargano, R. Bellotti, U. Bottigli, G. L. Masala, P. Cerello, S. Cheran, and R. Cataldo. Mass lesion detection in mammographic images using Haralik textural features. Proceedings of the International Symposium CompIMAGE 2006 - Computational Modelling of Objects Represented in Images: Fundamentals, Methods and Applications, 2007. [124] C. M. Bishop. Neural networks for pattern recognition. Oxford university press, 1995. [125] G. M. Weiss. Mining with rarity: a unifying framework. ACM SIGKDD Explorations Newsletter, 6(1):7 – 19, 2004. [126] G. E. Batista, R. C. Prati, and M. C. Monard. A study of the behavior of several methods for balancing machine learning training data. ACM SIGKDD Explorations Newsletter, 6(1):20 – 29, 2004. [127] C. Drummond, R. C. Holte, et al. C4. 5, class imbalance, and cost sensitivity: why under-sampling beats over-sampling. In Workshop on Learning from Imbalanced Datasets II, 2003. [128] J. Montagnat, F. Bellet, H. Benoit-Cattin, V. Breton, L. Brunie, H. Duque, Y. Legré, I. E. Magnin, L.a Maigne, S. Miguet, J. M. Pierson, L. Seitz, and T. Tweed. Medical images simulation, storage, and processing on the european datagrid testbed. Journal of Grid Computing, 2(4):387 – 400, 2004. [129] R. Bellotti and S. Pascazio. Editorial: Advanced physical methods in brain research. EPJ Plus, 127(11):145, 2012. [130] D. Krefting, M. Vossberg, A. Hoheisel, and T. Tolxdorff. Simplified implementation of medical image processing algorithms into a grid using workflow management system. Future Generation Computer System, 26(4): 681 – 684, 2010. [131] M. W. A. Caan, S. Shahand, F. M. Vos, A. H. C. van Kampen, and S. D. Olabarriaga. Evolution of grid-based services for Diffusion Tensor Image analysis. Future Generation Computer Systems, 28(8), 2012. [132] R. Bellotti, P. Cerello, S. Tangaro, V. Bevilacqua, M. Castellano, G. Mastronardi, F. De Carlo, S. Bagnasco, U. Bottigli, R. Cataldo, E. Catanzariti, S.C. Cheran, P. Delogu, I. De Mitri, G. De Nunzio, M E. Fantacci, F. Fauci, G. Gargano, B. Golosio, P.L. Indovina, A. Lauria, E. Lopez Torres, R. Magro, G L. Masala, R. Massafra, P. Oliva, A. Preite Martinez, M. Quarta, G. Raso, A. Retico, M. Sitta, S. Stumbo, A. Tata, S. Squarcia, A. Schenone, E. Molinari, and B. Canesi. Distributed medical images analysis on a Grid infrastructure. Future Generation Computer Systems, 23(3):475 – 484, 2007. [133] P. Cerello, S. Bagnasco, U. Bottigli, S. C. Cheran, P. Delogu, M. E. Fantacci, F. Fauci, G. Forni, A. Lauria, E. Lopez Torres, R. Magro, G. L. Masala, P. Oliva, R. Palmiero, L. Ramello, G. Raso, A. Retico, M. Sitta, S. Stumbo, S. Tangaro, and E. Zanon. GPCALMA: a grid-based tool for mammographic screening. Methods of Information in Medicine, 44(2):244 – 248, 2005. [134] R. Bellotti, S. Bagnasco, U. Bottigli, M. Castellano, R. Cataldo, E. Catanzariti, P. Cerello, S. C. Cheran, F. De Carlo, P. Delogu, I. De Mitri, G. De Nunzio, M. E. Fantacci, F. Fauci, G. Forni, G. Gargano, B. Golosio, P. L. Indovina, A. Lauria, E. Lopez Torres, R. Magro, D. Martello, G. L. Masala, R. Massafra, P. Oliva, R. Palmiero, A. Preite Martinez, R. Prevete, M. Quarta, L. Ramello, G. Raso, A. Retico, M. Santoro, M. Sitta, S. Stumbo, S. 
Tangaro, A. Tata, and E. Zanon. The Magic-5 Project: Medical Applications on a Grid Infrastructure Connection. IEEE Nuclear Science Symposium Conference Record, 3:1902 – 1906, 2004.

[135] S. Vicario, B. Balech, G. Donvito, P. Notarangelo, and G. Pesole. The BioVeL project: Robust phylogenetic workflows running on the grid. EMBnet.journal, 18(B):77, 2012.

[136] A. Barker and J. Hemert. Scientific Workflow: A Survey and Research Directions. In Parallel Processing and Applied Mathematics, 2008.

[137] D. E. Rex, J. Q. Ma, and A. W. Toga. The LONI Pipeline Processing Environment. NeuroImage, 19(3):1033 – 1048, 2003.

[138] T. Oinn, M. Addis, J. Ferris, D. Marvin, M. Senger, M. Greenwood, T. Carver, K. Glover, M. R. Pocock, A. Wipat, and P. Li. Taverna: A tool for the composition and enactment of bioinformatics workflows. Bioinformatics, 20:3045 – 3054, 2004.

[139] B. Ludäscher, I. Altintas, C. Berkley, D. Higgins, E. Jaeger, M. Jones, E. A. Lee, J. Tao, and Y. Zhao. Scientific workflow management and the Kepler system. Concurrency and Computation: Practice and Experience, 18(10):1039 – 1065, 2006.

[140] I. Taylor, M. Shields, and I. Wang. Distributed P2P computing within Triana: a galaxy visualization test case. In Parallel and Distributed Processing Symposium, 2003. Proceedings. International, 2003.

[141] T. Glatard, R. S. Soleman, D. J. Veltman, A. J. Nederveen, and S. D. Olabarriaga. Large-scale functional MRI study on a production grid. Future Generation Computer Systems, 26(4):685 – 692, 2010.

[142] I. Dinov, K. Lozev, P. Petrosyan, Z. Liu, P. Eggert, J. Pierce, A. Zamayan, S. Chakrapani, and J. Van Horn. Neuroimaging Study Designs, Computational Analyses and Data Provenance Using the LONI Pipeline. PLoS ONE, 5(9), 2010.

[143] A. J. MacKenzie-Graham, A. Payan, I. Dinov, J. D. Horn, and A. W. Toga. Neuroimaging Data Provenance Using the LONI Pipeline Workflow Environment. In Provenance and Annotation of Data and Processes, 2008.

[144] E. Laure, F. Hemmer, and F. Prelz. Middleware for the next generation grid infrastructure. Computing in High Energy Physics and Nuclear Physics, 2004.

[145] European Grid Infrastructure (EGI), Accessed Jan 2013. URL http://www.egi.eu.

[146] J. Ma, W. Liu, and T. Glatard. A classification of file placement and replication methods on grids. Future Generation Computer Systems, 29(6):1395 – 1406, 2013.

[147] R. Pordes, D. Petravick, B. Kramer, D. Olson, M. Livny, A. Roy, P. Avery, K. Blackburn, T. Wenaus, F. Würthwein, I. Foster, R. Gardner, M. Wilde, A. Blatecky, J. McGee, and R. Quick. The open-science grid. Journal of Physics: Conference Series, 78(1):012057, 2007.

[148] A. Tateo. Distributed analysis for feature selection in medical image processing. Submitted, 2013.

[149] J. A. Hanley and B. J. McNeil. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology, 143(1):29 – 36, 1982.

[150] C. R. Jack, R. C. Petersen, Y. C. Xu, S. C. Waring, P. C. O'Brien, E. G. Tangalos, G. E. Smith, R. J. Ivnik, and E. Kokmen. Medial temporal atrophy on MRI in normal aging and very mild Alzheimer's disease. Neurology, 49(3):786 – 794, 1997.

[151] K. I. Erickson, D. L. Miller, and K. A. Roecklein. The aging hippocampus: interactions between exercise, depression, and BDNF. The Neuroscientist, 18(1):82 – 97, 2012.

[152] J. M. Bland and D. G. Altman. Statistical methods for assessing agreement between two methods of clinical measurement. The Lancet, 327(8476):307 – 310, 1986.

[153] O. Carmichael, H. Aizenstein, S. Davis, J. Becker, P. Thompson, C. Meltzer, and Y. Liu. Atlas-based Hippocampus Segmentation in Alzheimer's Disease and Mild Cognitive Impairment.
NeuroImage, 27(4):979 – 990, 2005.
[154] R. A. Heckemann, J. V. Hajnal, P. Aljabar, D. Rueckert, and A. Hammers. Automatic anatomical brain MRI segmentation combining label propagation and decision fusion. NeuroImage, 33(1):115 – 126, 2006.
[155] P. Aljabar, R. Heckemann, A. Hammers, J. V. Hajnal, and D. Rueckert. Classifier Selection Strategies for Label Fusion Using Large Atlas Databases. 2007.
[156] P. Aljabar, R. Heckemann, A. Hammers, J. V. Hajnal, and D. Rueckert. Multi-atlas based segmentation of brain images: Atlas selection and its effect on accuracy. NeuroImage, 46(3):726 – 738, 2009.
[157] A. Hammers, R. Heckemann, M. J. Koepp, J. S. Duncan, J. V. Hajnal, D. Rueckert, and P. Aljabar. Automatic detection and quantification of hippocampal atrophy on MRI in temporal lobe epilepsy: a proof-of-principle study. NeuroImage, 36(1):38 – 47, 2007.
[158] B. B. Avants, P. Yushkevich, J. Pluta, D. Minkoff, M. Korczykowski, J. Detre, and J. C. Gee. The optimal template effect in hippocampus studies of diseased populations. NeuroImage, 49(3):2457 – 2466, 2010.
[159] F. van der Lijn, T. den Heijer, M. M. B. Breteler, and W. J. Niessen. Hippocampus segmentation in MR images using atlas registration, voxel classification, and graph cuts. NeuroImage, 43(4):708 – 720, 2008.
[160] J. M. P. Lötjönen, R. Wolz, J. R. Koikkalainen, L. Thurfjell, G. Waldemar, H. Soininen, and D. Rueckert. Fast and robust multi-atlas segmentation of brain magnetic resonance images. NeuroImage, 49(3):2352 – 2365, 2010.
[161] R. Wolz, P. Aljabar, J. V. Hajnal, A. Hammers, D. Rueckert, and The Alzheimer’s Disease Neuroimaging Initiative. LEAP: learning embeddings for atlas propagation. NeuroImage, 49(2):1316 – 1325, 2010.
[162] M. Chupin, E. Gérardin, R. Cuingnet, C. Boutet, L. Lemieux, S. Lehéricy, H. Benali, L. Garnero, O. Colliot, and The Alzheimer’s Disease Neuroimaging Initiative. Fully automatic hippocampus segmentation and classification in Alzheimer’s disease and mild cognitive impairment applied on data from ADNI. Hippocampus, 19(6):579 – 587, 2009.
[163] C. A. Bishop, M. Jenkinson, J. Andersson, J. Declerck, and D. Merhof. Novel Fast Marching for Automated Segmentation of the Hippocampus (FMASH): Method and validation on clinical data. NeuroImage, 55(3):1009 – 1019, 2011.
[164] J. H. Morra, Z. Tu, L. G. Apostolova, A. E. Green, C. Avedissian, S. K. Madsen, N. Parikshak, X. Hua, A. W. Toga, C. R. Jack Jr, M. W. Weiner, P. M. Thompson, and The Alzheimer’s Disease Neuroimaging Initiative. Validation of a fully automated 3D hippocampal segmentation method using subjects with Alzheimer’s disease, mild cognitive impairment, and elderly controls. NeuroImage, 43(1):59 – 68, 2008.
[165] E. Geuze, E. Vermetten, and J. D. Bremner. MR-based in vivo hippocampal volumetrics: 2. Findings in neuropsychiatric disorders. Molecular Psychiatry, 10(2):147 – 149, 2005.
[166] C. Konrad, T. Ukas, C. Nebel, V. Arolt, A. W. Toga, and K. L. Narr. Defining the human hippocampus in cerebral magnetic resonance images: an overview of current segmentation protocols. NeuroImage, 47(4):1185 – 1195, 2009.
[167] M. Boccardi, R. Ganzola, M. Bocchetta, M. Pievani, A. Redolfi, G. Bartzokis, R. Camicioli, J. Csernansky, M. J. de Leon, L. deToledo-Morrell, R. J. Killiany, S. Lehéricy, J. Pantel, J. C. Pruessner, H. Soininen, C. Watson, S. Duchesne, C. R. Jack Jr, and G. B. Frisoni. Survey of Protocols for the Manual Segmentation of the Hippocampus: Preparatory Steps Towards a Joint EADC-ADNI Harmonized Protocol.
Journal of Alzheimer’s Disease, 26(Suppl. 3):61 – 75, 2011.
[168] G. B. Frisoni and C. R. Jack. Harmonization of magnetic resonance-based manual hippocampal segmentation: a mandatory step for wide clinical use. Alzheimer’s & Dementia, 7(2):171 – 174, 2011.
[169] M. Boccardi, M. Bocchetta, L. Apostolova, J. Barnes, G. Bartzokis, G. Corbetta, C. DeCarli, L. DeToledo-Morrell, M. Firbank, R. Ganzola, L. Gerritsen, W. Henneman, R. Killiany, N. Malykhin, P. Pasqualetti, J. Pruessner, A. Redolfi, N. Robitaille, H. Soininen, D. Tolomeo, L. Wang, H. Watson, S. Duchesne, C. Jack, and G. B. Frisoni. Delphi Consensus on Landmarks for the Manual Segmentation of the Hippocampus on MRI: Preliminary Results from the EADC-ADNI Harmonized Protocol Working Group. Neurology, 78(Suppl. 1):171 – 174, 2012.
[170] S. Tangaro, R. Bellotti, F. De Carlo, G. Gargano, E. Lattanzio, P. Monno, R. Massafra, P. Delogu, M. E. Fantacci, A. Retico, M. Bazzocchi, S. Bagnasco, P. Cerello, S. C. Cheran, E. Lopez Torres, E. Zanon, A. Lauria, A. Sodano, D. Cascio, F. Fauci, R. Magro, G. Raso, R. Ienzi, U. Bottigli, G. L. Masala, P. Oliva, G. Meloni, A. P. Caricato, and R. Cataldo. MAGIC-5: An Italian mammographic database of digitised images for research. La Radiologia Medica, 113(4):477 – 485, 2008.
[171] A. Lauria, M. E. Fantacci, U. Bottigli, P. Delogu, F. Fauci, B. Golosio, P. L. Indovina, G. L. Masala, P. Oliva, R. Palmiero, G. Raso, S. Stumbo, and S. Tangaro. Diagnostic performance of radiologists with and without different CAD systems for mammography. International Society for Optics and Photonics, 2003.
[172] B. van Ginneken, S. G. Armato, B. de Hoop, S. van Amelsvoort-van de Vorst, T. Duindam, M. Niemeijer, K. Murphy, A. Schilham, A. Retico, M. E. Fantacci, N. Camarlinghi, F. Bagagli, I. Gori, T. Hara, H. Fujita, G. Gargano, R. Bellotti, S. Tangaro, L. Bolaños, F. De Carlo, P. Cerello, S. C. Cheran, E. Lopez Torres, and M. Prokop. Comparing and combining algorithms for computer-aided detection of pulmonary nodules in computed tomography scans: The ANODE09 study. Medical Image Analysis, 14(6):707 – 722, 2010.

ACKNOWLEDGMENTS

[...] I never said thank you. [...] And you’ll never have to.

I have always thought that there is no need to say thanks to the people who really care about you. This time, though, is somehow different: for all of you, wherever you are, let me say it just once. Thanks.