Data Bases: Population and Maintenance Geog 176B Lecture 8

Transcription

Data Bases: Population and Maintenance Geog 176B Lecture 8
Data Bases: Population and
Maintenance
Geog 176B Lecture 8
Chapters 9 and 10, Longley et al.
Data Collection
One of most expensive GIS activities
Many diverse sources (source integration,
data fusion, interoperability)
Two broad types of collection
Data capture (direct collection)
Data transfer
Two broad capture methods
Primary (direct measurement)
Secondary (indirect derivation)
Stages in Data Collection Projects
Planning
Evaluation
Editing / Improvement
Preparation
Digitizing / Transfer
Data Collection Techniques
Raster
Primary
Secondary
Vector
Digital remote
sensing images
GPS
measurements
Digital aerial
photographs
Survey
measurements
Scanned maps
Topographic
surveys
DEMs from maps
Toponymy data
sets from atlases
Primary Data Capture
Capture specifically for GIS use
Raster – remote sensing
e.g. SPOT and IKONOS satellites and aerial
photography
Passive and active sensors
Resolution is key consideration
Spatial
Spectral
Temporal
www.spot.ucsb.edu
Imagery for GIS
Vector Primary Data Capture
Surveying
Locations of objects determines by angle and
distance measurements from known locations
Uses expensive field equipment and crews
Most accurate method for large scale, small areas
GPS
Collection of satellites used to fix locations on
Earth’s surface
Differential GPS used to improve accuracy
Total Station
Pen/Portable PC and GPS
Secondary Geographic Data Capture
Data collected for other purposes can
be converted for use in GIS
Raster conversion
Scanning of maps, aerial photographs,
documents, etc
Important scanning parameters are spatial
and spectral (bit depth) resolution
Scanner
Raster to vector conversion
Vector Secondary Data Capture
Collection of vector objects from maps,
photographs, plans, etc.
Digitizing
Manual (table)
Heads-up and vectorization
Photogrammetry – the science and
technology of making measurements from
photographs, etc.
Digitizer
Data Transfer
Buy vs. build is an important question
Many widely distributed sources of GI
Includes geocoding
Key catalogs include
Geodata.gov
Geography Network
Access technologies
Translation
Direct read
Managing Data Capture Projects
Key principles
Clear plan, adequate resources, appropriate
funding, and sufficient time
Fundamental tradeoff among
Quality, accuracy, speed and price
Two strategies
Incremental
‘Blitzkrieg’
Alternative resource options
In house
Specialist external agency
A useful rule of thumb is that positions measured from maps are
accurate to about 0.5 mm on the map. Multiplying this by the
scale of the map gives the corresponding distance on the
ground.
Map scale
Ground distance corresponding
to 0.5 mm map distance
1:1250
62.5 cm
1:2500
1.25 m
1:5000
2.5 m
1:10,000
5m
1:24,000
12 m
1:50,000
25 m
1:100,000
50 m
1:250,000
125 m
1:1,000,000
500 m
1:10,000,000
5 km
Positional Accuracy (cont.)
within a database a typical UTM coordinate
pair might be:
Easting 579124.349 m
Northing 5194732.247 m
If the database was digitized from a 1:24,000
map sheet, the last four digits in each
coordinate (units, tenths, hundredths,
thousandths) would be questionable
Testing Positional Accuracy
Use an independent source of higher accuracy:
find a larger scale map
use precision GPS
Use internal evidence:
digitized polygons that are unclosed, lines
that overshoot or undershoot nodes, etc. are
indications of error
sizes of gaps, overshoots, etc. may be a
measure of positional accuracy
Testing Accuracy (cont.)
Compute accuracy from knowledge of the
errors introduced by different sources
e.g., 1 mm in source document
0.5 mm in map registration for digitizing
0.2 mm in digitizing
if sources combine independently, we can get
an estimate of overall accuracy...
(12 + 0.52 + 0.22) 0.5 = 1.14 mm
Definitions
Database – an integrated set of data
(attributes) on a particular subject
Geographic (=spatial) database database containing geographic data of
a particular subject for a particular area
Database Management System (DBMS)
– software to create, maintain and
access databases
A GIS links attribute and
spatial data
Attribute Data
• Flat File
• Relations
Map Data
• Point File
• Line File
• Area File
• Topology
• Theme
Advantages of Databases over Files
Avoids redundancy and duplication
Reduces data maintenance costs
Faster for large datasets
Applications are separated from the data
Applications persist over time
Support multiple concurrent applications
Better data sharing
Security and standards can be defined and
enforced
Disadvantages of Databases over Files
Expense
Complexity
Performance – especially complex data
types
Integration with other systems can be
difficult
Types of DBMS Model
Hierarchical
Network
Relational - RDBMS
Object-oriented - OODBMS
Object-relational - ORDBMS
Relational Databases rule now
Characteristics of DBMS (1)
Data model support for multiple data
types
e.g MS Access: Text, Memo, Number,
Date/Time, Currency, AutoNumber, Yes/No,
OLE Object (MS Object linking and
embedding), Hyperlink, Lookup Wizard
Load data from files, databases and
other applications
Index for rapid retrieval
Characteristics of DBMS (2)
Query language – SQL
Security – controlled access to data
Multi-level groups (e.g. census, NGA)
Controlled update using a transaction
manager
Versioning
Backup and recovery
Characteristics of DBMS (3)
Applications
Forms builder
Reportwriter
Internet Application Server
CASE tools
Programmable API (Applications
program interface)
Role of DBMS
System
Task
Geographic
Information
System
•
•
•
•
•
Data load
Editing
Visualization
Mapping
Analysis
Database
Management
System
•
•
•
•
Storage
Indexing
Security
Query
Data
Relational DBMS (1)
Data stored as tuples (tup-el),
conceptualized as tables
Table – data about a class of objects
Two-dimensional list (array)
Rows = objects
Columns = object states (properties,
attributes)
Table
Row = object
Vector feature
Column = attribute
Relational DBMS (2)
Most popular type of DBMS
Over 95% of data in DBMS is in RDBMS
Commercial systems
IBM DB2
Informix
Microsoft Access
Microsoft SQL Server
Oracle
Sybase
SQL
Structured (Standard) Query Language –
(pronounced SEQUEL)
Developed by IBM in 1970s
Now de facto and de jure standard for
accessing relational databases
Three types of usage
Stand alone queries
High level programming
Embedded in other applications
Types of SQL Statements
Data Definition Language (DDL)
Create, alter and delete data
CREATE TABLE, CREATE INDEX
Data Manipulation Language (DML)
Retrieve and manipulate data
SELECT, UPDATE, DELETE, INSERT
Data Control Languages (DCL)
Control security of data
GRANT, CREATE USER, DROP USER
Relational Join
Fundamental query operation
Occurs because
Data created/maintained by different users, but
integration needed for queries
Table joins use common keys (column values)
Table (attribute) join concept has been
extended to geographic case
Join
Record
ID
Address
#cars
1241
1242
1243
1244
123 State
St.
3
1
2
1
1801 Main
St.
2106 Elm
St.
7262 Pine
Drive
1241
Ford
2003
1241
Subaru
2000
1241
Honda
1999
1241
123 State St.
Ford
1241
123 State St.
Subaru
1241
123 State St.
Honda
1242
1801 Elm St.
Kia
Spatial indexing
Many maps tiled
B-tree (Balanced)
Grid indexing
Quad tree: Points/regions
R-tree (Based on MBR)
New global/spatial grids: QTM
Go2 Grids
38:53:22.08N 077:02:06.86W
US.DC.WAS.54.18.28.83.11
US.CA.SBA.UCSB.UCEN
Spatial Search:
Gateway to Spatial Analysis
Overlay is a spatial retrieval operation
that is equivalent to an attribute join.
Buffering is a spatial retrieval around
points, lines, or areas based on
distance.