Semantic Big Data – from Dingoes to Drysdale

e-Research Lab
School of ITEE, UQ
Jane Hunter <[email protected]>
The University of Queensland
Research Projects

Eco-informatics
◦ Automatic Analysis of Animal Accelerometry Data
◦ OzTrack
◦ Springbrook Wireless Sensor Network Analysis
◦ 3D Coal Seam Gas Water Quality Atlas
◦ Automated Online Reef Report Cards

Biomedical-informatics
◦ Skeletome

Digital Humanities
◦ 20th Century Paint – art conservation
◦ 3D Semantic Annotations – museum artefact classification
◦ Post-war Qld Architecture – oral history archive

E-Social Sciences
◦ Indigenous Housing
Semantic Annotation of Animal Accelerometry Data
• Animal-attached accelerometers
• monitor animal movement and behavior
• Tri-axial data streams
• Large volumes of complex data
• Lack of visualization, analysis tools and share-ability
• Lack of analysis & pattern identification services
• Free-ranging wild animal behavior
◦ Lack of ground truth data
• …
Endangered Species, Feral Pests, Production livestock
User Driven Requirements
Step 1 – Upload data
Step 2 – Activity recognition
(labelled segments: running, walking, resting, walking, running)
Step 3 – Analysis and Visualization
• Walking, Running, Resting, Sleeping, Feeding
• Animal Health
• Energy Consumption
• Food/Water Requirements
(labelled segments: feeding, walking)
Objectives
Web-based semantic annotation and activity recognition system to enable biologists to:
• Share tri-axial accelerometer data
• Visualize and analyze tri-axial accelerometer data
• Share expert knowledge
• Help scientists understand the movement and behavior of animals
• Use surrogates/domestic animals (& video) to train classifier -> automatically tag rare, wild, feral animals
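As an illustration of the train-then-tag idea above, here is a minimal sketch in Python. The slides do not say which features or classifier the system uses, so the windowed magnitude features and the nearest-centroid rule below are assumptions, and all function names are hypothetical.

```python
import math
from statistics import mean, pstdev


def window_features(samples, window_size):
    """Split (x, y, z) samples into non-overlapping windows and summarise each
    window by the mean and standard deviation of the acceleration magnitude."""
    features = []
    for start in range(0, len(samples) - window_size + 1, window_size):
        window = samples[start:start + window_size]
        magnitudes = [math.sqrt(x * x + y * y + z * z) for x, y, z in window]
        features.append((mean(magnitudes), pstdev(magnitudes)))
    return features


def train_centroids(labelled_features):
    """Average the feature vectors of each expert-tagged activity label."""
    grouped = {}
    for label, feature in labelled_features:
        grouped.setdefault(label, []).append(feature)
    return {
        label: tuple(mean(column) for column in zip(*rows))
        for label, rows in grouped.items()
    }


def nearest_centroid(feature, centroids):
    """Tag a window with the activity whose centroid is closest."""
    return min(centroids, key=lambda label: math.dist(feature, centroids[label]))
```

In this sketch, windows tagged by experts on a surrogate animal would be passed to `train_centroids`, and untagged windows from a wild animal would then be passed to `nearest_centroid`.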
User Interface - Tagging
Screenshot of SAAR Plot-Video interface and the annotation interface
User Interface – Automated Results
Screenshot of the SAAR Interface with human activity identification results
Evaluation
• Tested on range of species (different sizes and gaits)
◦ Australian dingo (Canis lupus dingo)
◦ Eurasian badger (Meles meles)
◦ Bengal tiger (Panthera tigris tigris)
◦ African cheetah (Acinonyx jubatus)
◦ American alligator (Alligator mississippiensis)
◦ Hairy-nosed wombat (Lasiorhinus krefftii)
◦ Eastern Grey kangaroo (Macropus giganteus)
◦ Short-beaked echidna (Tachyglossus aculeatus)
• Running, Walking, Standing, Sitting, Lying (Sternal recumbency)
• Test dog classification module on range of species

Results
High scores (80-90%) if SL:SH is between 2 and 3 (SL = spine length, SH = spine height)
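The body-proportion condition can be written as a small check. This is only a sketch: the function names are hypothetical, the bounds are assumed to be inclusive, and any measurements passed in would be the user's own, not values from the study.

```python
def spine_ratio(spine_length, spine_height):
    """Return SL:SH for two measurements given in the same unit."""
    if spine_height <= 0:
        raise ValueError("spine_height must be positive")
    return spine_length / spine_height


def in_reported_range(spine_length, spine_height, low=2.0, high=3.0):
    """True if SL:SH falls in the 2-3 band where high scores were reported."""
    return low <= spine_ratio(spine_length, spine_height) <= high
```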
Benefits
• Subscribers log in to online service
• Leverage community expertise to tag training sets
• Develop libraries of classifiers for different species
• Apply domestic species classifiers to wild species
◦ Dogs -> dingoes, foxes; birds -> bats; horses -> camels
• Classifiers improve over time as more data is uploaded
Socio-economic and health benefits:
◦ livestock productivity – assess health, energy/food needs
◦ reduce spread of feral pests & viruses
◦ management & conservation of threatened species
OzTrack

Overlay of Camel Tracks on Vegetation
Semantic Sensor Networks
Lianli Gao (UQ)
Michael Bruenig (CSIRO)
125 sensor nodes:
• Air temperature
• Humidity
• Wind Speed
• Leaf Wetness
CSIRO Sensor Network
Semantic Fire Weather Index
Calculate FWI from: wind speed, relative humidity, temperature
Limitations:
- Widely distributed sensors (tens of km apart)
- Updated once per day
- Urgent need for data with higher spatiotemporal resolution

System Architecture
Combine SPARQL inference rules with Inverse Distance Weighting (IDW) interpolation to calculate spatial distributions at higher resolution
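The interpolation step can be sketched as follows. This shows standard inverse distance weighting only; the SPARQL rules are not shown, and the planar coordinates, the power of 2 and the function name `idw` are assumptions made for illustration.

```python
import math


def idw(stations, target, power=2.0):
    """Estimate a value at `target` from nearby sensor readings.

    stations: list of ((x, y), value) pairs in planar coordinates.
    target:   (x, y) location to estimate.
    Each reading is weighted by 1 / distance**power, so closer sensors
    contribute more to the estimate.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for (x, y), value in stations:
        distance = math.hypot(x - target[0], y - target[1])
        if distance == 0.0:
            # The target coincides with a sensor: use its reading directly.
            return value
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total
```

For example, a point midway between two sensors receives the average of their readings, while a point closer to one sensor is pulled towards that sensor's value.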
Comparison with BoM FWIs
Skeletome
A community-driven knowledge curation platform for Skeletal Dysplasias
◦ Rare diseases
◦ Affect the development of the human skeleton
◦ Complex medical issues
◦ Caused by genetic abnormalities
Capture, integrate, correlate and analyse clinical, radiographic, phenotypic and genetic data
Verne Troyer – Cartilage-Hair Hypoplasia – RMRP
Peter Dinklage – Achondroplasia (most common disorder) – FGFR3
Danny DeVito – Multiple Epiphyseal Dysplasia (MED) – COL9A2, COL9A3, COMP, MATN3
Challenges

• Hundreds of different types
◦ 440 types in 40 groups
• Difficult to diagnose, treat
• Few medical publications
• Doctors rely on:
◦ Existing patient data
◦ Expert knowledge
Requirements
• Common terminology
• Data Integration
• Data Quality Control
• Knowledge Extraction and Transfer
• Privacy
• Expertise sharing
The Platform
Patient Archive
Knowledge Base
Bone Dysplasia Ontology
Reasoning
Knowledge Base of Disorders
Written Abstracts
Linked Genes
ISDS Grouping
X-Rays
Phenotypes
Inline Editing
Patient Sharing
Sharing patients with multiple doctors
- X-rays
- Clinical Summaries
- Genetic reports
Anonymizes patient data
Patients
Discussing a Patient
Inline commenting
Text posts with PubMed Integration
Community Diagnoses
Entity Term Extraction
Phenotype Extraction
Diagnosis Extraction
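One simple way to picture term extraction from free-text posts is a dictionary lookup, sketched below. The slides do not describe the extraction method actually used, so the lexicon-matching approach, the function name `extract_terms` and the example lexicon categories are all assumptions.

```python
import re


def extract_terms(text, lexicon):
    """Find lexicon terms in free text.

    lexicon: dict mapping a category (e.g. "diagnosis", "gene") to a list of
    terms. Matching is case-insensitive and restricted to whole words.
    Returns (category, matched_text) pairs in order of appearance.
    """
    found = []
    for category, terms in lexicon.items():
        for term in terms:
            pattern = r"\b" + re.escape(term) + r"\b"
            for match in re.finditer(pattern, text, flags=re.IGNORECASE):
                found.append((match.start(), category, match.group(0)))
    return [(category, matched) for _, category, matched in sorted(found)]
```

In practice the lexicon could be populated from ontology labels, which is one reason a common terminology matters for this kind of extraction.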
Reasoning across Knowledge Base
• Analyze Diagnoses, Phenotypes, Genotypes
- Across Patients and Publications
• Extract/infer new relationships
- Disease <-> phenotypes <-> genotypes
- Capture provenance, certainty, temporality, severity, polarity
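The idea of inferring disorder-phenotype relationships across patients, while keeping provenance and a certainty value, might look like the sketch below. It is a simple co-occurrence count rather than the platform's actual reasoning; the record layout, the `min_support` threshold and the certainty measure are assumptions.

```python
from collections import defaultdict


def infer_associations(patients, min_support=2):
    """Infer disorder -> phenotype links from patient records.

    patients: list of dicts with keys "id", "diagnosis" and "phenotypes".
    A link is kept when at least `min_support` patients share it. Each link
    records its provenance (supporting patient ids) and a certainty value
    (share of patients with that diagnosis who show the phenotype).
    """
    support = defaultdict(set)
    diagnosed = defaultdict(set)
    for patient in patients:
        diagnosed[patient["diagnosis"]].add(patient["id"])
        for phenotype in patient["phenotypes"]:
            support[(patient["diagnosis"], phenotype)].add(patient["id"])

    inferred = []
    for (disorder, phenotype), ids in sorted(support.items()):
        if len(ids) >= min_support:
            inferred.append({
                "disorder": disorder,
                "phenotype": phenotype,
                "provenance": sorted(ids),
                "certainty": len(ids) / len(diagnosed[disorder]),
            })
    return inferred
```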
Aboriginal Housing Crisis
Aboriginal communities:
• Inferior housing
• Inferior neighbourhoods
• Low home ownership
• More live in public housing
• Greater overcrowding, homelessness
• Move house more frequently
-> Adverse impact on health, well-being and education
[Dockery A.M., Ong R., Colquhoun S., Li J., Kendall G. (2013), “Housing and children’s development and well-being: evidence from Australian data”, AHURI Final Report No. 201, March 2013]
Remote: Central Desert
Regional: Dubbo, Mt Isa
Metropolitan: Woodridge, Redfern
Plan
Housing Policies/Strategies
Implement
Adapt
Housing Programs/Investments/Actions
Compare against Targets
- What works?
- What doesn’t?
Regional & cultural factors;
Crowding & homelessness;
Quality of Life Indicators;
Socio-economic Indicators;
Targets
Monitor/measure
- Regional needs analysis
- Housing programs tailored to local context
Regional/Cultural Factors
Australian Census 2011
Challenges
• Inaccurate data
• Anonymized data – post-code level of geography
• What are the optimum data sources/indicators for successful housing programs?
• For a given region, what are the most significant factors that need to be considered to satisfy the housing needs of the local Indigenous community?
• What are the optimum governance structures that combine
◦ economies of scale
◦ localized approaches informed by Aboriginal Community Councils?
Data Sources

Quantitative data:
◦ ABS data on Aboriginal health and welfare, population and housing;
◦ AURIN - “Social and Economic Indicators for Indigenous Communities”;
◦ IRSEO - Index of Relative Indigenous Socioeconomic Outcomes;
◦ Data from the State/Territory Housing Departments, ICHOs and Community Councils;

Qualitative data:
◦ LSIC – Longitudinal Study of Indigenous Children;
◦ 2002 and 2008 National Aboriginal and Torres Strait Islander Social Survey (NATSISS);
◦ HILDA (Household, Income and Labour Dynamics in Australia) Survey;

Publications:
◦ AHURI and FaHCSIA reports;
◦ Australian Policies Online;

Map sources:
◦ past ATSIC boundaries data (wards and regions) (Geoscience Australia);
◦ current FaHCSIA regions for Indigenous Coordination Centres;
◦ AIATSIS Aboriginal Australia Map.
Indigenous Housing Ontology
• Housing Policies
• Housing Programs
• Housing Types
• Property Management
• Tenancies
• Regional Demographics
• Quality of Life indicators
◦ Quantitative – ABS data
◦ Qualitative Data – surveys/interviews

Targets
◦ Reduce overcrowding by 50% by 2020
◦ Reduce homelessness by 50% by 2020
◦ Improve QoL indicators by 30%
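Comparing a measured indicator against a reduction target such as "by 50%" is simple arithmetic, sketched below. The function name and the idea of expressing progress as a fraction of the target are assumptions, and any numbers supplied would be the analyst's own.

```python
def progress_towards_target(baseline, current, target_reduction):
    """Return the fraction of a reduction target achieved so far.

    baseline:         indicator value at the start (e.g. overcrowded households)
    current:          latest measured value of the same indicator
    target_reduction: targeted fractional reduction, e.g. 0.5 for "by 50%"
    A result of 1.0 means the target has been met; values below 1.0 mean
    the program is short of the target.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    achieved_reduction = (baseline - current) / baseline
    return achieved_reduction / target_reduction
```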
Mapping Interface + R/Matlab Services
- Choose datasets, region and start/end times
- Understand the impact of regional, cultural and socio-economic factors on Aboriginal Housing programs - link housing data to quality-of-life indicators
Commonalities
Unstructured Data
Ontology
Registries
Data Quality
Machine Learning
Statistical Analysis
Inferencing Rules
Experts
Structured Data
Marked-up Data
Scalable RDF Triple Stores/RDF Graphs
- Curated Knowledge
- Training Corpuses
- Case Studies
- Annotations
KnowledgeBase
Multi-variate
3D/4D
Spatio-temporal
Dynamic
Streaming
Textual
Integrated Data
Diagnosis
Classification
Decision Support
Modelling
Application Services
Contact
Jane Hunter <[email protected]>
• eResearch Lab at the University of Queensland
• http://www.itee.uq.edu.au/~eresearch
