Challenges in Geospatial Data Handling and Modeling


K S Rajan
Lab for Spatial Informatics,
International Institute of Information Technology
Hyderabad
[email protected]
Garuda-NKN Partners Meet, July 25-26, 2013, Bangalore

IIIT-H
• Established in 1998
– a new star in the Indian Technical Education scenario
• Ranked 7th among Tech-Schools in India
(DataQuest, 2008; among Tech Universities in South Asia, 2009)
• A research university
– ICT (CSE, ECE)
– Application Domains
• Research Centers and NO Departments

• GeoSpatial Technologies?
– OpenStreetMap, Bing, Google Earth
– Car Navigation
• What does location mean to you?
– Just a point in space?
– A clue to what is happening in its neighbourhood
– A key to discovering larger spatio-temporal phenomena, such as climate patterns or health/disease surveillance

Computation – as a way of life!!
• Early stages of Computer Science
– Data storage and handling
– Mathematical processes
• Mathematical theory – a mainstay of computing theory
• Applications – large data handling
• Sector-based approaches: Science, BFSI, Census
• Age of Internet – information handling

Geospatial Technologies
– Location: is it a variable or a constant?
– Geospatial Information Systems – modifying Computer Science
• Simple map visualization to web-based map mash-ups
• DB to Geo/Spatial DB; Spatial Data Mining
• Analysis
– From statistical tools to the evolving field of spatio-statistical tools
– BI to Geo-BI
• Modelling and Simulation
– E.g., complex integrated climate-social-economic modelling
– Remote Sensing: again, turning volumes of data into information is still a struggle

Computing and Domains
• Computing paradigm – in all domains
• Interactions between computing and the domains
– A one-way street?
– Or bi-directional?
• Disciplinary and multi-disciplinary interests – opening up new paradigms of computing
• The Indian context
Need to move from Data to Information

Computer Science – GeoICT
• Waking up to the exciting world of
– Algorithm development – Graph theory, etc.
– Multi-dimensional Data
• Data structures and databases
• Data Mining – Spatio-Temporal
– Graphics and Visualization
• 3D – GRID-TIN from either/or to Hybrid
– Parallel Computing
– Information Extraction and Retrieval
– Software Engineering
GIS: From Another Application Area to a Frontier in CS/IT

Main functions of GIS
• Mapping & Visualization: CG, visual data manipulation, presentation, automated mapping
• Spatial Analysis: rule- and relation-based analysis, simulations, agent-based modelling
• Spatial Data Base: data collection & generation, retrieval, editing, updating, building spatio-temporal data relations (data quality), inventory

Challenges
• Technology (IT) – an enabler?
– More data (single info layers to mash-ups)
– More accessibility
– Evolution of Intelligent Decision Support Systems
• Rapid changes in IT infra
• Data & More Data
– Are our data models good enough?
– How can location-aware technologies seamlessly talk to, say, a Spatial Data Infrastructure?
• World of sensing, data collation, and separating the wheat from the chaff
Figure: GI Systems (primary focus on tools); GI Services; GI Science ???
Source: Longley, Goodchild, Maguire, Rhind [2001]

Domain-related Challenges
• Use of tools in GIS application areas
– Convenience of use
– BUT, limitations to scientific discovery
– NEED more and better parameterization
– Cadastral Mapping
• Mismatch between satellite-derived data and land records

VRGeo: Collaborative Mapping Platform (1)
• Crowd-sourcing of Spatial Data
– GPS-based inputs
– Satellite images / raster-based
• Use any WMS data in the background
• Attributes based on needs
– Local sourcing – SMS2Map feature
– Structuring unstructured data
• Centralised Geo-DB
• SMS-based input
Developed by: IIIT-H; Hosted by GARUDA @ CDAC
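
Using "any WMS data in the background" comes down to issuing standard OGC WMS GetMap requests. The sketch below builds such a request URL for WMS 1.3.0; the server URL and the layer name "satellite" are hypothetical placeholders, not VRGeo's actual configuration. Note that WMS 1.3.0 uses latitude/longitude axis order for EPSG:4326, so the bounding box is given as (min_lat, min_lon, max_lat, max_lon).

```python
from urllib.parse import urlencode


def wms_getmap_url(base_url, layers, bbox, width, height,
                   crs="EPSG:4326", fmt="image/png", version="1.3.0"):
    """Build an OGC WMS GetMap request URL for a background layer.

    bbox is (min_lat, min_lon, max_lat, max_lon): WMS 1.3.0 uses
    latitude/longitude axis order for EPSG:4326.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": version,
        "REQUEST": "GetMap",
        "LAYERS": ",".join(layers),
        "STYLES": "",  # empty value = server default style for each layer
        "CRS": crs,
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": fmt,
    }
    return f"{base_url}?{urlencode(params)}"


# Hypothetical server and layer name, for illustration only.
url = wms_getmap_url("https://wms.example.org/service", ["satellite"],
                     (17.2, 78.2, 17.6, 78.7), 512, 512)
print(url)
```

Because the request is just a URL, a collaborative mapping client can swap background sources by changing the base URL and layer list without touching its own data model.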
VRGeo @ Davanagere
VRGeo – for Slum Mapping in Pune

Challenges in the Indian Context
• Has to provide for Incremental Design
– Can't get agencies to share data till they see value in it
• Language Localization
– From data collection to manipulation to GeoDB management
• Data Interoperability
– Application-driven formats and parameters – unification of the data model will take time
• Data Ownership + Security
– Though largely government-held, data is spread across distributed authorities
– Map Policy & RS Policy of India

VRGeo – SMS Interface
• An SMS-based Mapping Initiative
• Working with local Organisations
– WASSAN – for livestock diseases, agri pests, groundwater studies
– Hyderabad Urban Labs – for water bodies
• A Generic Framework for ANY Theme
• SMS content: 509932vit.ls.co.fmd.500.5.2 (27 chars)
• Re-designing for Human disease surveillance (IIPH)
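
The example SMS packs an observation into dot-separated tokens, and a "generic framework for ANY theme" suggests that each theme supplies its own field list. The sketch below is one hedged way to structure such messages; the slides do not say what each token means, so the schema names used here are hypothetical labels, not the actual VRGeo encoding. The 160-character limit is the single-part SMS limit for the GSM 7-bit alphabet.

```python
SMS_MAX_CHARS = 160  # single-part SMS limit with the GSM 7-bit alphabet


def parse_sms(text, schema, sep="."):
    """Split a compact delimited SMS into a record keyed by a theme's schema."""
    text = text.strip()
    if len(text) > SMS_MAX_CHARS:
        raise ValueError("message longer than one SMS")
    tokens = text.split(sep)
    if len(tokens) != len(schema):
        raise ValueError(f"expected {len(schema)} fields, got {len(tokens)}")
    # Positional mapping: token i becomes the value of schema field i.
    return dict(zip(schema, tokens))


# Hypothetical schema: illustrative field names only.
theme_schema = ["code", "theme", "subtheme", "observation",
                "value_1", "value_2", "value_3"]
record = parse_sms("509932vit.ls.co.fmd.500.5.2", theme_schema)
print(record)
```

Keeping the schema outside the parser is what lets the same SMS2Map pipeline serve livestock disease, pest or water-body themes: only the field list changes.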

Kriti4SOUL: Citizen Initiative in Integrating Geo-Intelligence
Kriti4SOUL – Lake monitoring
• Mobile-based Data Capture – near-real time
– Geo-tagged images
– Observations recorded by text / voice
• Centralised Geospatial Data System
– Visualization and assimilation
– Analytical report generation
• Interactive Model of g-Governance
Developed by a StartUp KAIINOS.com for SOUL
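
Geo-tagged images usually carry their position in the EXIF GPSLatitude/GPSLongitude tags as degrees, minutes and seconds, with a separate hemisphere reference (GPSLatitudeRef/GPSLongitudeRef). Assuming the capture app follows that convention (the slides do not describe the app's internals), a central system would convert those values to decimal degrees before mapping, roughly as follows.

```python
def dms_to_decimal(degrees, minutes, seconds, ref):
    """Convert EXIF-style degrees/minutes/seconds plus a hemisphere
    reference ("N", "S", "E" or "W") to signed decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    # Southern latitudes and western longitudes are negative.
    return -value if ref in ("S", "W") else value


# Illustrative coordinates near Hyderabad (not real survey data).
lat = dms_to_decimal(17, 23, 6.0, "N")
lon = dms_to_decimal(78, 29, 12.0, "E")
print(round(lat, 4), round(lon, 4))
```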

VRGeo: Collaborative Mapping Platform (2)
• Further plans
– GPS device detection and upload (GPSBabel hacked)
– Village-level data generation, correction and update
– Case-study-based semantic standardization: cultural / language aspects, attribute specifications

Can Urban Floods be modeled as a 3D Dynamic Phenomenon?
• 3D Visualization of a Phenomenon
– Natural terrain – water flow
– Discrete object space?
• Flood spread in an Urban Environment
• Challenges
– Data representation – TIN / GRID / Lattice
– Large near-real-time data handling, using GPU-based processing tools
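
To make the GRID representation of flood spread concrete, the sketch below computes a simple static "bathtub" inundation extent: starting from a seed cell, it floods every 4-connected cell whose elevation lies below a given water level. This is a swapped-in textbook approximation for illustration, not the hydrological model behind the slides; it ignores flow dynamics, buildings and drainage, and TIN or lattice representations would need a different neighbour structure.

```python
from collections import deque


def flood_extent(dem, seed, water_level):
    """Return the set of (row, col) cells reachable from seed through
    4-connected neighbours whose elevation is below water_level."""
    rows, cols = len(dem), len(dem[0])
    r0, c0 = seed
    if dem[r0][c0] >= water_level:
        return set()  # the seed itself stays dry
    flooded = {seed}
    queue = deque([seed])
    while queue:  # breadth-first search over wet cells
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and (nr, nc) not in flooded
                    and dem[nr][nc] < water_level):
                flooded.add((nr, nc))
                queue.append((nr, nc))
    return flooded


# Toy elevation grid (illustrative values only).
dem = [
    [5, 5, 5, 5],
    [5, 1, 2, 5],
    [5, 2, 9, 3],
    [5, 5, 5, 5],
]
cells = flood_extent(dem, (1, 1), 4)
print(sorted(cells))
```

Cell (2, 3) is also below the water level but is walled off by higher cells, so connectivity, not elevation alone, decides the extent. Each time step of a near-real-time run repeats work like this over large grids, which is where GPU-based processing becomes attractive.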

Flood sequence modelled
Hydrological Modeling, Embarrassingly Parallel Computing, Near-real-time Analytics, Computer Vision

Current GIS mobility landscape
• Framework
– Mobile nodes
– Communication infrastructure
– Computational hubs
• Reliability and topological stability questionable

GIS computation timeline
• Data collection on field
• Data processing off field
• Centralized computing

Future of GIS and computing
• Future applications – resource- and computation-hungry
• Relevance of GIS expanding
• Community-based computing challenges current paradigms
• Mobility gaining importance
• Computation amalgamating with mobility

Open Source and GIS
• GRASS GIS: favoured open source distribution
• Well-documented sequential code
• Large body of applications
• Stable performance across sequential platforms
• Starting point for our work

What Parallelization Achieved
GRASS GIS Applications – Flood Modeling
• Mapcalc
– A fundamental set of applications
– Building block for multiple applications
– Embarrassingly parallel application
– Speedup of 5-6X over a sequential implementation
• Terracost
– Direct application of the algorithm by Hazel, T., Toma, L., Vahrenhold, J., and Wickremesinghe, R. Terracost: Computing least-cost-path surfaces for massive grid terrains. J. Exp. Algorithmics 12 (Jun. 2008), 1-31.
– Speedup of 6X over the sequential implementation
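
Map algebra of the kind GRASS GIS exposes through r.mapcalc is embarrassingly parallel because each output cell depends only on the input cells at the same position. The sketch below illustrates that decomposition with a hypothetical helper, not the parallel implementation reported on the slide: the rasters are split into horizontal strips, each strip is evaluated independently, and the results are stitched back in order. It uses a thread pool only to stay self-contained; in CPython, pure-Python per-cell work will not actually speed up on threads because of the GIL, so a real implementation would use processes, MPI or GPU kernels with the same strip decomposition.

```python
from concurrent.futures import ThreadPoolExecutor


def mapcalc_rows(rows_a, rows_b, expr):
    """Apply a per-cell expression to matching rows of two rasters."""
    return [[expr(x, y) for x, y in zip(ra, rb)] for ra, rb in zip(rows_a, rows_b)]


def parallel_mapcalc(a, b, expr, workers=4):
    """Evaluate expr cell by cell, one horizontal strip per task.

    No strip needs data from another strip, so the tasks never communicate.
    """
    n = len(a)
    step = max(1, -(-n // workers))  # ceiling division: rows per strip
    bounds = [(i, min(i + step, n)) for i in range(0, n, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(mapcalc_rows, a[s:e], b[s:e], expr)
                   for s, e in bounds]
        result = []
        for f in futures:  # collect in submission order to preserve row order
            result.extend(f.result())
    return result


# Toy 5x3 rasters; the expression "2*A + B" is illustrative.
a = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12], [13, 14, 15]]
b = [[1, 1, 1] for _ in range(5)]
out = parallel_mapcalc(a, b, lambda x, y: 2 * x + y, workers=2)
print(out)
```

Terracost is a different case: least-cost-path surfaces propagate costs across the whole grid, so tiles do interact, which is why it relies on a dedicated algorithm rather than this simple strip split.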

Advantages of amalgamating HPC and mobility
• Real-time computation
• On-field analysis of data
• Reduction of response time
• Suitable for disaster prevention and management
– Communication not a bottleneck
– Efficient and fast response
– Real-time updates possible
• Can nurture an expanded role for GIS

Analysis

Mining Spatio-Temporal Invariant Core (MiSTIC)
Study of Rainfall Patterns in Monsoonal India
• A reference set of focal points determines the number of cores that should be identified. Analysis has been done for 7 reference focal points.
• For each of the 56 years, the set of valid focal points is detected and a zone is created for each of them.
• For the analysis in this study, only a subset of the data, covering the contiguous landmass of mainland India with non-extreme climatic behavior (Central and Peninsular India), is considered.
Figure 7: Twenty-five zones created for entire India in 1991, one for each of the twenty-five focal points (highlighted in dark brown), each marked by a different color. The color bar gives the corresponding zone ID.
• For this analysis, a point is considered frequent in a core if it has occurred at the same place within that core for more than three years (i.e. min_sup = 5%) out of 56 years.
• The following table has the conditions to classify cores as CHD/CLD/CND (T = 56 years):

Core Type | min_freq                  | max_pruneTS
CHD       | >=60% (~34)               | <=10% (~6)
CLD       | >=25% (~14) & <60% (~33)  | >10% (~7) & <=33% (~19)
CND       | <25% (<=13)               | >33% (>=20)

Ref Focal Point | Core Type | Core Size | Max Freq (%) | #NF Years | #NS Years | Classification (CHD/CLD/CND)
P1              | CC        | 5         | ~32%         | 4         | 13        | CLD
P1              | CR        | 5         | ~32%         | 7         | 5         | CLD
P2              | CC        | 3         | 50%          | 1         | 2         | CHD
P2              | CR        | 3         | 50%          | 0         | 0         | CHD
P3              | CC        | 1         | ~28.5%       | 6         | 34        | CND
P3              | CR        | 6         | ~32%         | 9         | 9         | CLD
P4              | CC        | 3         | ~52%         | 1         | 7         | CLD
P4              | CR        | 3         | ~59%         | 0         | 0         | CHD
P5              | CC        | 2         | 62.5%        | 5         | 10        | CLD
P5              | CR        | 3         | ~64%         | 12        | 0         | CLD
P6              | CC        | 4         | ~39%         | 3         | 9         | CLD
P6              | CR        | 5         | ~39%         | 3         | 1         | CHD
P7              | CC        | 2         | ~8%          | 17        | 31        | CND
P7              | CR        | 5         | ~34%         | 8         | 6         | CLD
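
The CHD/CLD/CND conditions can be encoded directly as thresholds. The sketch below takes min_freq and max_pruneTS as fractions of T = 56 years that have already been computed (the slides name these measures but do not define how they are derived). The table only lists combinations where both criteria agree, so the rule used here for mixed cases, keeping the weaker of the two classes, is an assumption made for illustration.

```python
T = 56  # years in the rainfall record


def classify_core(min_freq, max_prune_ts):
    """Classify a core as CHD, CLD or CND from two fractions in [0, 1]."""
    # Vote from min_freq: >=60% -> CHD level, >=25% -> CLD level, else CND.
    if min_freq >= 0.60:
        by_freq = 2
    elif min_freq >= 0.25:
        by_freq = 1
    else:
        by_freq = 0
    # Vote from max_pruneTS: <=10% -> CHD level, <=33% -> CLD level, else CND.
    if max_prune_ts <= 0.10:
        by_prune = 2
    elif max_prune_ts <= 0.33:
        by_prune = 1
    else:
        by_prune = 0
    # Assumption: when the two criteria disagree, keep the weaker class.
    return ("CND", "CLD", "CHD")[min(by_freq, by_prune)]


print(classify_core(34 / T, 5 / T))   # both criteria at CHD level
print(classify_core(40 / T, 15 / T))  # mixed: CHD-level freq, CLD-level pruning
```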

MiSTIC - Summary
• The detection of these core regions, especially the CHD cores, can help detect phenomena that exhibit highly localized occurrences over time.
• Changes in climatic pattern over long periods may be discovered by observing whether a given region has changed from, say, CHD to CLD or to CND, if analyzed over decadal time periods.
• For the monsoonal rainfall phenomena, it is observed in this work that CR is a better indicator of the core regions. This could be attributed to the dynamic nature of the monsoonal rainfall in India.

Disease Occurrence Patterns using MiSTIC: Study of Salmonellosis in Florida
Salmonellosis Disease in Florida
• New York [1994-2010]
• Map shows disease "hot-spots"
• Valuable insight into disease prevalence
• 12 out of 19 counties are rural (Non-Metropolitan Statistical Areas)
• Sanitation-related factors
– 18 out of 21 cores in either rural (12) or urban-rural fringe (6) zone.
Figure panels: CC Focal Polygons; New York Metropolitan Statistical Areas

Zoning-based Analysis: An alternative spatial correlation model?
Spatio-Temporal Data Mining Results from MiSTIC
Figure: Overlay of MSAs & Cores

Results - I
• Zoning - Boundaries leading to disease area delineation
Figure panels: Waterway Networks - FL; Road Networks - FL; Overlay - FL

Prediction Results – I (Bay County, FL)

Results - II
• Zoning - Boundaries leading to disease area delineation
Figure panels: Waterway Networks - NY; Road Networks - NY; Overlay - NY
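
An overlay of MSAs and cores, as used to count how many cores fall in rural versus metropolitan areas, can be reduced to a point-in-polygon test on each core's centroid. The sketch below uses the standard ray-casting test in planar (projected) coordinates; the MSA polygon and core centroids are hypothetical toy values, and a production overlay would use full polygon-polygon intersection in a GIS rather than centroids alone.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside polygon [(x0, y0), ...]."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray going right from (x, y) cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def tag_cores(core_centroids, msa_polygons):
    """Label each core with the MSA containing its centroid, or None
    when it falls outside every MSA (i.e. non-metropolitan)."""
    labels = {}
    for core_id, (x, y) in core_centroids.items():
        labels[core_id] = next(
            (name for name, poly in msa_polygons.items()
             if point_in_polygon(x, y, poly)),
            None,
        )
    return labels


# Hypothetical MSA boundary and core centroids (projected units).
msas = {"MSA_1": [(0, 0), (4, 0), (4, 4), (0, 4)]}
cores = {"core_a": (1, 1), "core_b": (6, 2)}
labels = tag_cores(cores, msas)
rural = sum(1 for v in labels.values() if v is None)
print(labels, rural)
```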

Lab for Spatial Informatics, IIIT Hyderabad
• Theory
• Geo-Spatial Information Systems
• Remote Sensing
• Applications
• Environment / Policy / System Building
Topics: Auto Geo-registration; Classification techniques (time-series, Spatial Data Mining); Spatial Statistics & Spatial Data Mining; Moving Objects – Data & Analysis; Constrained Networks; Spatial Information Extraction, e.g. Roads; Parallel Computing & Mobility; OBIA, Fusion, Change Detection; GML and Geo-Web; Drought Monitoring; Irrigation Mapping; Cropping Season; Land Use Modeling; Locating News & Geo-context; Algorithms for Airline Industry; Tessellations and Mobile Network Planning; CANSAT Competition; VRGeo – Collab Mapping; FOSS4G; LSIViewer
45/5; 40/10
For more, see the THESIS on "A Data-driven Framework for Extraction of Spatio-Temporal Manifestations of Dynamic Processes"
Thank You !!