Grid Approach - Copernicus Land Monitoring Services

Service Contract 56003 based on the restricted procedure No
EEA/MDI/14/015 following a call for expression of interest
EEA/SES/13/005-CEI
Task 4.2 Final report
Assistance to the EEA in the production of
the new CORINE Land Cover (CLC) inventory,
including the support to the harmonisation of
national monitoring for integration at pan-European level – Geometric test case and grid
approach
Task 4.2 - Develop and assess the grid
approach
Prepared by:
Geir-Harald Strand (ed.)
Roger Milego, Suvi Hatunen, Markus Törmä, Gerard Hazeu (Contributors)
Geoff Smith, Gergely Maucha (Reviewers)
21. September 2015
Version 1.0
Legal notice
The contents of this publication were created by space4environment under EEA service
contract No. 3436/B2014/R0-GIO/EEA.56003 and do not necessarily reflect the official
opinions of the European Commission or other institutions of the European Union. Neither
the European Environment Agency nor any person or company acting on behalf of the
Agency is responsible for the use that may be made of the information contained in this
report.
Copyright notice
© EEA, Copenhagen, 2015
Reproduction is authorised, provided the source is acknowledged, save where otherwise stated.
SUMMARY
This report is part of a study commissioned by the European Environment Agency (EEA)
under the restricted procedure No. SC 56003 following a call for expression of interest
EEA/SES/13/005-CEI. The objective of this part of the overall study was to “… assess the
advantages and disadvantages of the “grid approach”, with the aim to find out how feasible it is
to store land cover information in a regular grid, taking into account different levels of details. An
aspect to be considered here is the connection to statistical analysis and time lines”.
Grid-based approaches are already used by many institutions. Considerable experience with
them has been gained through projects carried out by National Statistical Agencies in
cooperation with Eurostat, providing valuable knowledge and guidelines that can be transferred
to land monitoring.
The grid model allows land monitoring to utilize and merge existing data sources by using a
fixed, shared common geometry as an interface. These sources can be national high spatial
resolution maps and registers as well as high spatial resolution data retrieved from other
sources, e.g. the Copernicus Land Services. The grid approach does limit the generalization
options to simplification of the geometry, without any similar generalization of the attribute data.
Statistics based on the grid approach will therefore be unbiased (with respect to the input data).
A European standard for grids is already established by the INSPIRE process, and it is thus
easy to ensure that a grid approach in land monitoring is INSPIRE compatible. The same
standard is adopted by the NSAs, implying a broad basis for harmonization between the land
monitoring system and the geospatial statistical system in Europe.
The methods used to “populate” the grids are available as standard tools within off-the-shelf and
open source GIS software. The challenges linked to national projections and the coverage of
border areas between countries (or regions) can be solved in a standardized way. Statistics and
analysis as well as modelling exercises are fairly easy to implement in grids due to the regular
shape and size of the grid units. The grid approach is well suited for visualization at small
cartographic scales and useful for making detailed and complex queries. The grid approach may
also improve access to information across sectors, because data to which the land monitoring
community has restricted access in many countries could become more accessible if they are
aggregated to grid level.
On the negative side, the geometry of the grid model is unfamiliar for many users and does not
correspond to any land cover or land use “objects” in the real world. With respect to the current
CORINE Land Cover system, mixed classes will have to be decomposed into their constituents
and registered as such. The semantic work being developed in parallel within EAGLE will
contribute to this work. Change data is only available as net changes in a grid approach, unless
specific care is taken to represent gross changes as specific measurements. Finally, grids
are in many cases populated using national high spatial resolution data. The actual content,
definition and quality of these data may vary from one country to another. Harmonization and
standardization are required to ensure compatibility across the countries participating in the land
monitoring effort.
The two most important open issues are grid resolution and choice of attributes. Future
implementation of a grid approach is probably going to be a compromise involving a balance
between several concerns about grid resolution and attribute availability.
High grid resolution (small grid cells) is attractive due to the level of detail, while a coarser grid
resolution is beneficial because:
• It may be sufficient to cover the needs for pan-European monitoring.
• Data with restricted access may become available (as detailed location is not traceable).
• Smaller amounts of data are easier to manage.
Table of contents
SUMMARY
0 INTRODUCTION
1 THE GRID APPROACH
1.1 GENERAL AND OVERALL EXPLANATION OF THE GRID APPROACH
1.2 LITERATURE REVIEW
1.3 THE GRID APPROACH IN EUROPEAN LAND MONITORING
1.4 POPULATING GRIDS – DIFFERENT METHODS
1.5 GRIDS AND CLC
1.6 SPECIAL ISSUES
2 THE GRID APPROACH USED IN EUROSTAT/ EFGS
2.1 INTRODUCTION AND BACKGROUND
2.2 GEOSTAT 1A
2.3 GEOSTAT 1B
2.4 GEOSTAT RESULTS AND THE GRID APPROACH IN LAND MONITORING
3 CASE STUDY: STATISTICAL ANALYSIS
3.1 SOURCE DATASETS AND THE GRID POPULATION PROCESS
3.2 COMPARISON OF PAN-EUROPEAN CLC AND NATIONAL HR CLC
3.3 COMPARISON OF GRID RESOLUTIONS
3.4 LAND COVER AND POPULATION
4 CASE STUDY: CHANGE DETECTION
4.1 CHANGE DETECTION USING GRID APPROACH
4.2 GROSS AND NET CHANGES
5 CASE STUDY: VISUALIZATION
5.1 CARTOGRAPHY AND VISUALISATION UNDER THE GRID APPROACH
5.2 VISUAL ANALYSIS
6 CONCLUSION
7 REFERENCES
ANNEX 1: DESCRIPTION OF GEOSTAT 1B APPENDICES CONTENT
ANNEX 2: COMPARISON OF PAN-EUROPEAN CLC AND NATIONAL HR CLC USING 1 KM GRID
ANNEX 3: COMPARISON OF PAN-EUROPEAN CLC AND NATIONAL HR CLC USING 100M GRID
0 INTRODUCTION
This report is part of a study commissioned by the European Environment Agency (EEA)
under the restricted procedure No. SC 56003 following a call for expression of interest
EEA/SES/13/005-CEI. The objective of this part of the overall study was to “… assess the
advantages and disadvantages of the “grid approach”, with the aim to find out how feasible it is
to store land cover information in a regular grid, taking into account different scales of level of
details. An aspect to be considered here is the connection to statistical analysis and time lines”.
The study consisted of three main parts, organized into five chapters. The first part (chapter 1)
is a description and discussion of the grid approach. The chapter sets out to explain the
approach in general, followed by more detailed discussions about particular issues relevant for
pan-European land monitoring linked to the existing CORINE Land Cover (CLC) initiative.
The second part (Chapter 2) is a review of the first two GEOSTAT projects (1A and 1B) carried
out by the European Forum for Geography and Statistics (EFGS; a voluntary cooperation
between national statistical agencies) in close cooperation with Eurostat. These projects have
established the methodology implemented by the NSAs and used for a grid-based approach to
the handling of European population data.
The third part is a set of case studies using CLC data from the province of Kymenlaakso in
south-eastern Finland. Finnish national High spatial and thematic Resolution CORINE Land
Cover (HR CLC) is used as an example of a detailed national data source, while pan-European
CLC data for the same area is used for comparison. The case studies are carried out with a 100
m and a 1 km grid to allow an appraisal of the differences in grid resolution.
The first case study (Chapter 3) applies a grid approach for statistical purposes. The first theme
chosen for the study is simply to compare the pan-European CLC and the Finnish national HR
CLC, showing how the grid approach can be used to avoid the statistical bias which, due to the
effect of cartographic generalization, is implied in a vector approach. A second theme is a
descriptive comparison of grid resolution with 100 m and 1 km grids used as examples. A third
theme is the relationship between land cover and population data.
The second case study (Chapter 4) is a study of the impact of the grid approach on the
documentation of land cover changes. Statistics based on multi-temporal comparison of grids
document net changes, while gross changes may be hidden. The changes in forest area are
used as an example.
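The distinction between net and gross change inside a single grid cell can be made concrete with a small numeric sketch. The function name and the figures (a cell with 40 ha of forest that loses 5 ha and gains 5 ha elsewhere in the same cell) are invented for illustration and are not results from the case study.

```python
def forest_change(area_t1, gained, lost):
    """Return the forest area at the second date plus net and gross change.

    All values are hectares within one grid cell (hypothetical figures).
    """
    area_t2 = area_t1 + gained - lost
    return {
        "area_t2": area_t2,
        # Net change is all that a comparison of the two cell totals reveals.
        "net_change": area_t2 - area_t1,
        # Gross change needs the gains and losses to be recorded separately.
        "gross_change": gained + lost,
    }

result = forest_change(area_t1=40.0, gained=5.0, lost=5.0)
print(result)
```

In this sketch the cell totals are identical at both dates, so the net change is zero even though 10 ha of forest changed status.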
The third case study (Chapter 5) addresses cartographic visualization. The challenge is not so
much the visual appearance of the regular grid. This format can be unfamiliar to the user, but
less so when the grid cell is small due to the cartographic scale. The fact that the grid is
characterized by many attributes and not simply classified does open new opportunities for
cartographic visualization, and the study aims to demonstrate some of these techniques.
Finally, the conclusions and recommendations from the study are found in Chapter 6.
The report also includes three annexes. One provides additional information about the
GEOSTAT projects and the other two contain results that were prepared for, but not used in, the
case study of a grid approach applied for statistical purposes (Case study 1).
1 THE GRID APPROACH
1.1 GENERAL AND OVERALL EXPLANATION OF THE GRID APPROACH
The textbooks usually differentiate between “raster GIS” and “vector GIS” (Couclelis 1992,
Congalton 1997). The “vector GIS” does, like the paper map, attempt to represent individual
objects that can be identified, measured, classified and digitized. The “raster GIS”, on the other
hand, builds closely on the structure of remote sensing imagery and has, as a methodology,
acted as a bridge between GIS and remote sensing. The “raster GIS” attempts to represent
continuous spatial fields approximated by a partition of the land into small spatial units with
uniform size and shape. The “raster GIS” therefore has structural properties that make it
particularly suitable for use in modelling exercises. The interpretation and information content of
the elementary unit of the raster model – the pixel – is, however, problematic, as discussed by
Fisher (1997).
A grid is a spatial model falling somewhere between the vector model and the raster model. An
informal explanation is that the grid model is a vector model with the visual appearance of a
raster. The grid is a spatial partition of the Earth’s surface into non-overlapping and non-empty
regular parts. Similar to the raster model, the grid consists of square spatial units (“cells”) of
uniform size and shape. Unlike the raster-model, the spatial units in the grid are internally
heterogeneous, with the internal composition of the grid cell represented as an attribute vector.
The grid looks like a raster. The difference is the information attached to the grid cell. A raster
cell (pixel) is classified to a particular land cover class. The grid cell, on the other hand, is
characterized by how much it contains of each land cover class. The grid is, as a result, a more
information-rich data model than the raster.
Figure 1.1: CLC with a 1 km raster/grid superimposed (top) illustrating the difference between encoding a
particular unit as raster pixel (centre) or a grid cell (bottom). “daa” is a Norwegian unit: 10 daa = 1 hectare.
The difference between the raster and the grid is illustrated in Figure 1.1, using an example
based on CLC. CLC is originally a vector data set but can be converted into a new data set with
uniform spatial units of e.g. 1 × 1 km. The result can be a “raster” where a single
CLC class is assigned to each pixel (representing, for example, the dominant land cover class
inside the pixel, or the CLC class found at the centre of the pixel unit). Alternatively, the result is
a “grid” where an attribute vector representing the complete composition of land cover classes
inside the unit is assigned to each grid cell. Spatial information on a more detailed scale than
the size of the spatial unit is in both cases lost, but the overall loss of information is less in the
grid than in the raster.
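The difference between the two encodings can be sketched in a few lines of code. This is a minimal illustration only: the class codes are CLC level 3 codes, but the areas, the function names and the list of classes are assumptions made for the example.

```python
# Hypothetical areas (in hectares) of CLC classes inside one 1 km x 1 km unit.
# 1 km2 = 100 ha, so the values sum to 100.
cell_composition_ha = {
    "211": 45.0,  # non-irrigated arable land
    "312": 35.0,  # coniferous forest
    "512": 20.0,  # water bodies
}

def encode_as_raster(composition):
    """Raster encoding: keep only the dominant class of the unit."""
    return max(composition, key=composition.get)

def encode_as_grid(composition, classes):
    """Grid encoding: keep the area of every class as an attribute vector."""
    return [composition.get(code, 0.0) for code in classes]

# Fixed class order so that every grid cell gets a comparable vector.
all_classes = ["112", "211", "312", "512"]

raster_value = encode_as_raster(cell_composition_ha)
grid_vector = encode_as_grid(cell_composition_ha, all_classes)
print(raster_value)
print(grid_vector)
```

The raster value records that arable land dominates, while the grid vector also retains the 35 ha of forest and 20 ha of water that the raster encoding discards.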
A grid data model is not the result of mapping in any traditional sense. The grid is “populated”
with information from (one or more) existing sources. This information could include, but is not
restricted to, information about land use and land cover (LU/LC). One method frequently used to
populate grids with LU/LC information is to calculate the spatial coverage of each LU/LC class
based on a geometric overlay between the grid and detailed polygon datasets. The assignment
of attributes can furthermore involve counting or statistical processing of register data for
observations made inside each grid cell (for instance, not only the area of a class, but also the
number of objects it represents). Ancillary databases provide important and relevant
information about the content and context of the grid cells, for instance the number of
buildings, the length of roads or the number of people found within each grid cell.
1.2 LITERATURE REVIEW
Gridded LU/LC data are particularly suited for use in modelling exercises due to the
combination of a simple and regular geometry with a detailed description of the land
composition within each grid unit. The approach is particularly useful for global or other wide-area
studies using large (30” or larger) grid units (Wen and Tateishi 2001). Traditional mapping
at this scale is difficult due to the extent of the area, while a pure raster approach, with each
pixel assigned to a single class, is in these situations inadequate for representing the variation
of the land surface (Pierce and Running 1995).
Meteorologists use statistical representation of multiple land surfaces within a grid cell to
parameterize the effects of sub-grid heterogeneity for land-atmosphere energy exchange
(Bonan et al. 1993). Schlünzen and Katzfey (2003) aggregated CLC data for north-western
Europe into a 30” grid to integrate land use and meteorological data to examine sub-grid-scale
land use effects on meteorological and climate models and thereby produced a more accurate
simulation. Equally, soil scientists have used a grid approach to model the global soil
vulnerability to water erosion (Batjes 1996) and hydrologists have used the approach for water
resource management (Smith et al. 2010).
Among biological studies, Thuiller et al. (2004) created a 10’ grid for Europe populated with land
cover data from a 1 km resolution map to assess the influence of land cover and climate on
species distributions across Europe. Grid models populated with land cover composition as well
as a host of other data were used by Mehr et al. (2011) in a study of species richness for bats.
Similarly Chiatante et al. (2014) used gridded data in their study of threatened bird species in
Mediterranean farmland landscapes.
The landscape itself has also been modelled as a populated grid, e.g. by Strand (2011) who
used such a model to assess uncertainty in the delineation of landscape types.
Modelling and simulation are an integral part of many climate change studies. It is therefore no
surprise that the data-rich grid model is often employed in such studies (e.g. Gibbard et al.
2005; van Asselen and Verburg 2012; Letourneau et al. 2012).
1.3 THE GRID APPROACH IN EUROPEAN LAND MONITORING
The grid approach in LU/LC mapping was examined by the FP7 project Harmonised European
Land Monitoring (HELM; Ben-Asher 2013). The project compiled a non-exhaustive overview of
the grid approach in European land monitoring. The main findings are summarized below.
EEA periodically converts the CLC dataset into a raster to carry out analysis.
Iceland uses grids for various monitoring activities like forest (woodland and plantations), soil,
plants, birds and LULUCF but there is not a common grid and the different initiatives usually
cover only part of the country.
The Netherlands has a gridded land cover database. The origin was pixels (25 m x 25 m), due
to the use of satellite imagery, but the spatial units/cells are now fixed. Strictly speaking, these
cells are raster cells, as the information attached to each cell is just one LU/LC class
value. The cells are populated with data from several different sources, but a single land
cover class is attributed to each cell on the basis of a query. There is currently no database
connected to the «raster/grid IDs».
Norway has developed a standardized national grid (Strand & Bloch 2009) with flexible spatial
resolution. Statistics Norway is publishing population data for the 1 km grid. NIBIO (previously
the Norwegian Forest and Landscape Institute) provides land cover and land resource data for
the 5 km and 1 km grids.
Austria has started research on grid mapping with various types of data, using grid cells of 200 –
2000 m.
Finland (Finnish Environment Institute) produces its national CLC data as a 25 m raster
classification. There are also other information systems like the urban structure monitoring
system (YKR) which utilizes a grid approach, at least partially. YKR is an information system for
land use experts, making it possible to monitor long-term changes of urban areas as well as to
carry out analysis and research on urban areas. YKR contains information about people,
housing including summer cottages, employment and land cover and land use in a 250 m grid.
The former Finnish Forest Research Institute (METLA) – now part of the newly established
LUKE institute - provides information concerning forests in a grid format. The National Land
Survey has produced SLICES land use data using 25 and 10 m grid size for the years 1999,
2005 and 2010.
Switzerland uses a 25 m grid (due to satellite images) for forest mixture classes. Starting in
2012 there will also be a 5 m raster representing tree type (coniferous/deciduous) with updating
foreseen every three years, according to the availability of new aerial images. The national area
statistics uses a 100 m raster measured at the centre point, with 72 LU/LC classes. Population
(census) is available on a 100 m grid if more than a minimum number of persons are living
there.
EFGS (European Forum for GeoStatistics) is a voluntary cooperation between several National
Statistical Institutions (NSIs) across Europe (and recently spreading to other parts of the world
as well) that have started using grids for integration and harmonization of spatial data. Eurostat
is informally involved in this cooperation. The steering committee consists of the national
contact points from the NSIs that participate in the European Statistical System (ESS) project
GEOSTAT (see chapter 2 for a comprehensive summary of the GEOSTAT projects).
The System of Environmental-Economic Accounting is an internationally agreed standard for
harmonized statistics on the environment and its relationship with the economy, organized by
the United Nations Statistics Division (UNStats). With respect to environmental-economic
accounting, the grid cell reportedly turns out to be a fairly common statistical unit for spatially
referenced environmental data, including land cover and land use, used in the national
accounting systems (Vardon et al. 2011).
The European Spatial Planning Observation Network (ESPON) is an EU program aiming to
support the “reinforcement of regional policy with studies, data and observation of development
trends” (European Commission 2007). Potential and actual use of grids in a spatial planning
context is described in an ESPON report addressing the modifiable areal unit problem
(Anonymous, no date) and gridded data structures are given renewed attention in recent results
from projects organized under ESPON (e.g. Milego and Ramos, 2011).
There is furthermore a movement towards increased use of grids in the landscape research
community, linked to the concept of landscape character assessment and the development of a
European landscape classification system. The European Landscape Character Assessment
Initiative (ELCAI) running in the period 2003 – 2005 documented a widespread use of grids in
national landscape descriptions throughout Europe (Wascher 2005) and led to the development
of a proposed pan-European landscape description based on a grid approach (Mücher et al.
2010).
1.4 POPULATING GRIDS – DIFFERENT METHODS
As described in section 1.1, the grid approach is dealing with a spatial model somewhere in
between the vector model and the raster model. The grid as a spatial model consists of square
spatial units of uniform size and shape that can have an (internally) heterogeneous composition,
i.e. the grids are “geometrically uniform polygons”. Furthermore, each class in a raster model
corresponds to an attribute in a grid model. A grid cell is thus capable of carrying more information
than a raster cell. Each grid cell has a unique ID that is related to a database containing the attribute
data.
Grids can be populated with spatial data in numerous ways. However, the different methods to
populate grids depend on 1) the data (source) type; and 2) the purpose of the model. As
possible spatial data sources we make a distinction between vector data (point, line or polygon)
and raster data, both of which are georeferenced (so-called spatially explicit data). An overlay between the
grid model and different data sources makes it possible to populate the grids with a wide variety
of information. Each class in a raster/vector data model corresponds to an attribute in a grid
model. On the basis of the information that the populated grids contain, new spatial
classifications and representations can be made.
VECTOR DATA
Vector data give a representation of the world using points, lines, and polygons. Vector models
are useful for storing data that has discrete boundaries, such as country borders, land parcels,
and streets (ESRI dictionary1). Examples of vector data (polygons) in the context of Copernicus
are the original CLC and Urban Atlas (http://land.copernicus.eu/) datasets. For lines and point
data Copernicus does not have any specific examples. Roads and railways are examples of the
vector line data type. Sample data such as census returns or soil cores are examples of point
data.
The population of grids with vector data follows the same methodology for point, line and
polygon data. After an overlay operation, a frequency is calculated and the resulting table is
joined with the grid data set. Points can be assigned directly to grid polygons, but the overlay
operation needs to split line and polygon objects into parts that can each be assigned to a unique
grid polygon. The geometry of the grid needs to be maintained throughout the operation.
Each point, line or polygon part is attributed with the ID of the grid cell with which it intersects, and a
frequency is calculated per grid cell: the number of points, the
length of lines or the area of polygons for each class in each grid cell is presented as a table. The
table is joined with the grid dataset (a “spatial join” operation). Additionally, various statistical
operations can be performed on the newly joined table (%, majority, min, max, etc.).
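The overlay, frequency and join steps described above can be sketched as follows. To keep the example free of GIS libraries, the polygons are simplified to axis-aligned rectangles; the cell size, the coordinates, the class names and the (column, row) cell IDs are all assumptions made for the illustration. A real workflow would use the overlay tools of a GIS package.

```python
from collections import defaultdict
import math

CELL = 1000.0  # grid cell size in metres (assumed)

# Hypothetical land cover polygons, simplified to rectangles:
# (class, xmin, ymin, xmax, ymax) in metres.
polygons = [
    ("forest", 0.0, 0.0, 1600.0, 1000.0),
    ("water", 1600.0, 0.0, 2000.0, 1000.0),
]

def overlay_area(polygons, cell=CELL):
    """Split each rectangle along the grid lines and sum area per (cell ID, class)."""
    table = defaultdict(float)
    for cls, xmin, ymin, xmax, ymax in polygons:
        for col in range(int(xmin // cell), math.ceil(xmax / cell)):
            for row in range(int(ymin // cell), math.ceil(ymax / cell)):
                # Width and height of the part of the polygon inside this cell.
                w = min(xmax, (col + 1) * cell) - max(xmin, col * cell)
                h = min(ymax, (row + 1) * cell) - max(ymin, row * cell)
                if w > 0 and h > 0:
                    table[((col, row), cls)] += w * h
    return dict(table)

def join_to_grid(frequency, cell=CELL):
    """Join the frequency table to the grid: one attribute (percent) per class."""
    grid = defaultdict(dict)
    for (cell_id, cls), area in frequency.items():
        grid[cell_id][cls] = 100.0 * area / (cell * cell)
    return dict(grid)

frequency = overlay_area(polygons)
grid = join_to_grid(frequency)
# A further statistic on the joined table: the majority class per cell.
majority = {cell_id: max(shares, key=shares.get) for cell_id, shares in grid.items()}
print(grid)
print(majority)
```

The grid geometry is never modified; only the source polygons are split, and each part contributes to exactly one cell.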
RASTER DATA
Raster data make a representation of the world as a surface divided into a regular grid of cells.
Raster models are useful for storing data that varies continuously, as in an aerial photograph, a
satellite image, a surface of chemical concentrations, or an elevation surface (ESRI dictionary2).
Examples of raster data in the context of Copernicus are different HRLs, EU-DEM and IMAGE
2000, 2006, 2009 and 2012 (http://land.copernicus.eu/) datasets.
The population of grids with information from raster data sets can follow two alternative routes.
The first alternative is used when the difference between the grid cell size and the raster pixel
size is small. In this case, the grid can be treated like a raster and populated using raster
resampling methodology. The second alternative is used when the raster pixel size is
considerably smaller than the grid cell size. In this case, the raster is treated like a point data set
and summaries (or other relevant statistics) based on the pixels falling inside each grid cell are
used to populate the grid cell.
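The second alternative can be illustrated with a minimal sketch in which every pixel is treated as a point and counted in the grid cell it falls into. The pixel and cell sizes, the class codes and the small raster are assumptions chosen to keep the example readable; in practice the grid cells would be far larger than two pixels across.

```python
from collections import Counter

PIXEL = 25  # raster pixel size in metres (assumed)
CELL = 50   # grid cell size in metres (assumed, kept small for readability)

# Hypothetical classified raster: F = forest, W = water, U = urban.
raster = [
    ["F", "F", "W", "W"],
    ["F", "U", "W", "W"],
    ["F", "F", "F", "U"],
    ["F", "F", "U", "U"],
]

def populate_from_raster(raster, pixel=PIXEL, cell=CELL):
    """Count the pixels of each class per grid cell and convert counts to m2."""
    n = cell // pixel  # number of pixels along one side of a grid cell
    counts = {}
    for r, line in enumerate(raster):
        for c, cls in enumerate(line):
            # Integer division assigns the pixel to its (row, column) grid cell.
            counts.setdefault((r // n, c // n), Counter())[cls] += 1
    return {
        cell_id: {cls: k * pixel * pixel for cls, k in counter.items()}
        for cell_id, counter in counts.items()
    }

grid = populate_from_raster(raster)
print(grid)
```

Each grid cell ends up with the area of every class found inside it, rather than a single resampled class value.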
Population of grids also depends on the purpose of the grid model: is the grid cell
represented as a sample point (usually the centre point of the grid cell) or as a polygon? Grids
represented as sample points can be used to assign data from existing, discrete polygon maps
to a grid. The method is relatively easy and fast to implement and does not need much
computer power. A disadvantage is that a grid cell can extend over several different polygons
and the method only assigns data from the polygon that intersects with the sample point chosen
to represent the grid cell (Strand & Bloch, 2009). Taking the grid cell as a polygon and intersecting it
with another vector (polygon map) data source overcomes this disadvantage. Area
measurements of the different polygons within each grid cell are stored in this approach.
However, the disadvantages of this method are the complexity of implementation and the amount of
computer power needed.
1 http://support.esri.com/en/knowledgebase/GISDictionary/term/vector%20data%20model
2 http://support.esri.com/en/knowledgebase/GISDictionary/term/raster%20data%20model
Another special grid model is the Dutch VIsueel Ruimtelijk Informatie Systeem (VIRIS). The
VIRIS dataset is a geodatabase with 54 different raster datasets with a spatial resolution of 25
m. The raster datasets are produced by rasterizing the Dutch Topographical dataset on attribute
information (TDN code). The datasets have three attributes for each fixed grid cell: pixel value,
OBJECT ID and COUNT (Clement et al, in prep). The pixel value represents the surface area
(polygon), length (line) or number (point) of a class in the topographical database of the
Netherlands. Each class is represented by one raster dataset. The raster datasets are not
integrated into one grid with a corresponding attribute table. Each raster dataset has the same
size and location, and its raster cells have corresponding IDs.
Figure 1.2: VIRIS 2.0 dataset for the year 2010. Forest surface area is represented in a white-black colour
gradation (black = no forest; white is 625 m2 forest). Red – green colour indicates the urban surface area
of each grid.
Figure 1.2 shows an example of the VIRIS data set. Two out of the 54 VIRIS raster datasets
(urban development and deciduous forest) with same size (25 m spatial resolution, i.e. 625 m2)
and location are presented as an overlay. The surface area of urban development is presented in
colours (red = 625 m2 and green = 1 m2) and the deciduous forest surface area is presented in
black (0 m2) and white (625 m2). The indicated raster cell has 65 m2 of urban development and
396 m2 of deciduous forest. The remaining surface area is occupied by other classes not shown
in the figure.
The raster datasets are kept separate for specific user requirements; however they could be
easily integrated in one grid model as the raster cells of each raster dataset have common IDs.
The VIRIS 2.0 geodatabase contains raster datasets for the years 1996, 2000, 2004, 2008 and
2010 making it possible to monitor changes between years for specific classes.
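How separate per-class rasters with common cell IDs could be integrated into one grid attribute table can be sketched as below. This is an illustration only, not the VIRIS implementation: the cell IDs, layer names and the second cell are hypothetical, while cell 101 reuses the 65 m2 and 396 m2 values quoted for the cell indicated in Figure 1.2.

```python
CELL_AREA_M2 = 25 * 25  # 625 m2 per 25 m raster cell

# One dictionary per class raster, keyed by a shared cell ID (IDs are hypothetical).
urban_m2 = {101: 65, 102: 625}
deciduous_m2 = {101: 396, 102: 0}

def integrate(layers, cell_area=CELL_AREA_M2):
    """Merge per-class layers into one attribute table keyed by cell ID."""
    table = {}
    for name, layer in layers.items():
        for cell_id, area in layer.items():
            table.setdefault(cell_id, {})[name] = area
    for attrs in table.values():
        # Area not covered by the layers supplied belongs to other classes.
        attrs["other"] = cell_area - sum(attrs.values())
    return table

attribute_table = integrate({"urban": urban_m2, "deciduous": deciduous_m2})
print(attribute_table[101])
```

For cell 101 the remaining surface area is 625 − 65 − 396 = 164 m2, which would be distributed over the other class rasters not included in this sketch.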
EXAMPLES
Examples of how to populate a grid model with vector data:
1. Simplistic models like (centre) point sampling
2. Counting
3. Area measurement
4. Area measurement raster datasets
Ad 1: The grid model is handled as points and data is assigned to these points from an existing,
discrete polygon map. The sampling points representing the grid cells are often the centre
points of the grid cells, but any point could be used. The point data set is intersected with a
polygon map (CORINE Land Cover, soil map, etc.). The land cover class or soil type at each
specific sample point is transferred to the attribute table of the point data set through a spatial
join (see above for details on the advantages and disadvantages of the methodology). The 100 m
CLC raster used for analytical purposes by EEA can be thought of as a simplistic grid model
populated from the vector CLC by this approach.
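Centre point sampling can be sketched with a plain ray-casting point-in-polygon test standing in for the spatial join of a GIS package. The polygon map, the cell size and the (column, row) cell IDs are hypothetical and chosen so that the second cell straddles two polygons.

```python
def point_in_polygon(x, y, ring):
    """Ray-casting test; ring is a list of (x, y) vertices of a simple polygon."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge cross the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# Hypothetical polygon map: (class, ring) with coordinates in metres.
polygon_map = [
    ("forest", [(0, 0), (1600, 0), (1600, 1000), (0, 1000)]),
    ("water", [(1600, 0), (2000, 0), (2000, 1000), (1600, 1000)]),
]

def sample_centres(n_cols, n_rows, cell, polygon_map):
    """Assign to each grid cell the class found at its centre point."""
    result = {}
    for col in range(n_cols):
        for row in range(n_rows):
            cx, cy = (col + 0.5) * cell, (row + 0.5) * cell
            result[(col, row)] = next(
                (cls for cls, ring in polygon_map if point_in_polygon(cx, cy, ring)),
                None,  # no polygon at the centre point
            )
    return result

classes = sample_centres(n_cols=2, n_rows=1, cell=1000, polygon_map=polygon_map)
print(classes)
```

In this sketch the cell (1, 0) is labelled forest because its centre at x = 1500 m lies in the forest polygon, although 40% of the cell is water. This is the loss of information that the sample point method accepts in exchange for simplicity.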
Figure 1.3 Norwegian point data on barns and cowsheds superimposed on the Norwegian SSB5KM grid
database together with an excerpt of the building table for the Eidskog municipality. The lower table shows
the results of a spatial join where each building (one row in the table) has been assigned to a grid-cell
(attribute: SSBID). Aggregating the table on SSBID will give a summary (number of barns) for each grid
cell3.
Ad 2: The counting methodology often deals with spatially explicit point data, though polygon
data represented as (or converted into) point data can also be used to populate
grid cells. Counting the number of polygons or classes within a grid cell can be used as an
indication of the diversity of the grid cell. Counting raster cells of a specific class falling
within a grid cell is also a possible way to populate the grid model, for example the number of raster
cells with a tree cover density value of 100% (from the HRL Tree Cover Density) per grid cell. An
example of the counting methodology is the combination of the Norwegian point data set (here
restricted to barns and cowsheds of the Norwegian population register) with the grid model. In
Figure 1.3 the spatially explicit point data is combined with the grid model SSB5KM. The spatial
join of the two data sets produces a table containing the information from the point dataset
linked with the grid cell where the point is located (Figure 1.3 bottom)3.
3 Figures 1.3, 1.4 and 1.5 are taken in slightly adapted form from Strand & Bloch (2009).
Figure 1.4. Visual representation of the mean floor area of barns and cowsheds in each grid cell together
with a table containing the mean floor area and the number of barns and cowsheds per SSB5KM grid cell
for the Eidskog municipality (darker = higher average)3.
By counting the number of barns and cowsheds within each grid cell, a new attribute can be
attached to the grid model SSB5KM. This attribute can be used for further (statistical) analysis of
the attributes of the cases counted (e.g. mean floor area of barns and cowsheds per grid cell, based
on the column Areal_f in Figure 1.3) and for visualization (Figure 1.4). The attribute table belonging to
the grid data model is extended with an attribute after each analysis.
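The aggregation step described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used by the authors: the column names SSBID and Areal_f follow Figure 1.3, while the cell IDs and floor areas are invented for the example.

```python
from collections import defaultdict

# Hypothetical result of the spatial join: one row per building with the
# grid cell ID (SSBID) and the floor area (Areal_f, in m2).
# IDs and values are invented for illustration.
buildings = [
    {"SSBID": "cell_1", "Areal_f": 120.0},
    {"SSBID": "cell_1", "Areal_f": 180.0},
    {"SSBID": "cell_2", "Areal_f": 90.0},
]


def summarise_by_cell(rows):
    """Return {SSBID: (number of buildings, mean floor area)}."""
    areas = defaultdict(list)
    for row in rows:
        areas[row["SSBID"]].append(row["Areal_f"])
    return {cell: (len(v), sum(v) / len(v)) for cell, v in areas.items()}


summary = summarise_by_cell(buildings)
for cell_id, (count, mean_area) in sorted(summary.items()):
    print(cell_id, count, mean_area)
```

Each entry of `summary` corresponds to one new pair of attributes (count and mean floor area) that could be attached to the grid cell with the same ID.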
Ad 3. The area measurement example deals with assigning data from an existing, discrete
polygon map to a grid. It measures the complete area extent of each polygon within each grid
cell. In the example one single SSB5KM grid cell is superimposed on the Norwegian land
resource map of Eidskog. The thematic polygons are clipped along the grid cell and each land
resource polygon is attributed with a grid cell ID (ArcGIS Identity (or Union) operation). The
result is a new polygon dataset with (small) polygons having a unique reference to a single grid
cell and a single land resource polygon. A frequency is calculated and presented as a table in
which for each grid cell (rows) the area per land resource class (a column per land resource
class) is presented (Figure 1.5)3.
Ad 4. Two examples are presented on how to populate a grid model with raster data. The first
example is the HRL Imperviousness raster layer (20 x 20 m) combined with a standard grid of 1
x 1 km (Figure 1.6). The HRL Imperviousness shows per raster cell the percentage (%) of
imperviousness. In the second example the Dutch national land cover dataset LGN7 (Landelijk
Grondgebruik Nederland) is combined with the same standard grid. LGN7 is a raster dataset
containing 39 LU/LC classes at a spatial resolution of 25 m (reference year 2012).
The standard grid of 1 x 1 km is rasterized on the grid ID attribute into 20 m raster cells. After a
tabulate area operation, the area of each HRL Imperviousness class within each 1 x 1 km grid cell is
presented in a table. The table is joined with the original grid. Figure 1.6 shows the standard grid of
1 x 1 km superimposed on the HRL Imperviousness together with an excerpt of a table containing the
areas of each imperviousness class per 1 x 1 km grid cell.
Figure 1.7 shows the standard 1 x 1 km grid superimposed on the LGN7 dataset. The LGN7
dataset is resampled to 20 x 20 m raster cells. The corresponding table is the result of the
spatial join of the rasterized grid and the LGN7 dataset. The table shows the area of each LGN7
class for the 36 grid cells of 1 km2 each.
Both tables of Figure 1.6 and Figure 1.7 can be combined to show the surface area of the
imperviousness classes (%) and the surface area of LGN classes per grid cell in one table.
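The tabulate area step can be illustrated with a small, self-contained sketch. It assumes two aligned rasters of equal shape, one holding the grid cell ID of each 20 m pixel and one holding the class of each pixel; the function name, the IDs and the class labels are hypothetical, and a GIS package would normally do this work.

```python
from collections import Counter, defaultdict

PIXEL_AREA_M2 = 20 * 20  # each raster cell covers 20 m x 20 m


def tabulate_area(zone_raster, class_raster, pixel_area=PIXEL_AREA_M2):
    """Sum the area (m2) of each class inside each zone (grid cell ID).

    Both rasters are lists of rows and must be aligned pixel by pixel.
    """
    table = defaultdict(Counter)
    for zone_row, class_row in zip(zone_raster, class_raster):
        for zone_id, cls in zip(zone_row, class_row):
            table[zone_id][cls] += pixel_area
    return table


# Toy example: two grid cells (IDs 1 and 2), four pixels each.
zones = [
    [1, 1, 2, 2],
    [1, 1, 2, 2],
]
classes = [
    ["forest", "forest", "water", "urban"],
    ["forest", "urban", "water", "water"],
]

areas = tabulate_area(zones, classes)
for zone_id in sorted(areas):
    print(zone_id, dict(areas[zone_id]))
```

The resulting table has one row per grid cell ID and one entry per class, so it can be joined back to the grid on the ID, and tables from different source rasters can be combined in the same way.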
Figure 1.5. Close-up of SSB5KM grid superimposed on land resource map of Eidskog with a table
containing surface areas for the different land resource classes per grid cell3. Each row in the table
corresponds to one grid cell in the map. SSBID is the unique ID of the grid cell. A10 through A80
correspond to land cover categories (A10 is built-up land, A20 agricultural land etc.) and the attribute is
the area of the land cover category in square metres. P10 through P80 represent the land cover
coverage as percent.
Figure 1.6. The 1 x 1 km standard grid superimposed on the HRL Imperviousness for an area within the
province of Utrecht, the Netherlands (36 grid cells of 1 km2 each). The table shows an excerpt of the result of
the area calculation per class (percentage of Imperviousness) for each of the 36 grid cells.
Figure 1.7 The 1 x 1 km grid superimposed on the LGN7 dataset with an excerpt of the table resulting from
joining the 1 x 1 km grid rasterized into 20 x 20 m raster cells with the LGN7 raster dataset.
1.5 GRIDS AND CLC
CLC is the de facto standard for pan-European land monitoring. Any grid approach to land
monitoring therefore has to be discussed in the context of CLC. Important issues are
information content compared to CLC, backward compatibility and the analytical opportunities
with respect to the indicators required by EEA.
SCALE AND RESOLUTION
It is hardly possible to imagine a spatial data model independent of scale and resolution. CLC is
a product closely linked to a particular cartographic scale in terms of minimal mapping unit
(MMU). The typical cartographic scale is 1:100 000 and smaller and the smallest mapping unit
size is 25 ha. The MMU corresponds to a 500 x 500 meter grid cell. Data from a study area in
Finland (see also chapters 3 - 5 of this report) showed that less than 25 % of the 1 x 1 km grid
cells were completely embedded inside a single CLC polygon (i.e. contained a single CLC
class). The typical (most frequent) number of CLC classes in a 1 x 1 km grid cell (when CLC
data was fetched from the pan European CLC map) was three. When national HR CLC data (20
meter pixel size) was used to populate the grid, the typical number of classes in each 1 x 1 km
grid cell was eight.
Figure 1.8: Distribution of 1 x 1 km grid cells with respect to a) number of CLC classes from the pan-European map [left]; and b) number of CLC classes from the national HR map [right].
Figure 1.8 illustrates the generalizing effect of the current CLC approach. The complex LULC
composition of each 1 km2 grid cell, shown when the grid cells are populated directly from HR
CLC, is lost when the grid cells are populated from the pan-European CLC. The figure also
shows that the 1 km2 grid and the pan-European CLC are fairly similar in terms of geometrical
resolution when the size of the features is considered. When the resolution of the polygons
used to populate the grid is more detailed than the grid, the graph appears symmetric, like the
right-hand graph in Figure 1.8. On the other hand, the graph will appear right-skewed (large
bars close to zero and a right-hand tail of short bars) when the resolution of the polygons used
to populate the grid is less detailed than the grid. When the resolution is similar, most grid cells
will cut across two or three polygon boundaries (due to the systematic partition) and each grid
cell ends up being populated with two to four classes.
A 1 km2 grid populated with data from the pan-European CLC does not make sense. It provides
the same statistical information as the vector data set framed in a less precise geometry,
although with a spatial resolution not too different from the pan-European CLC. The question is
therefore whether a 1 km2 grid populated from HR data sources represents an improvement
over the current CLC product. The statistical information under such an approach is more
detailed and less biased, and the spatial resolution is sufficiently close to the pan-European CLC
to allow meaningful visualisation at small cartographic scales (large regions, country level and
European maps).
It is also possible to apply more detailed grids: 500, 250 or 100 m are all relevant options. In this
study, we have concentrated on 100 m grid resolution, since this is the resolution used in the
raster datasets produced from the pan-European CLC and HRLs by EEA for analytical
purposes.
The 100 m resolution grid does, as expected, contain fewer classes per grid cell (Figure 1.9)
than the 1 km2 grid since the finer spatial resolution allows fewer pixels inside each grid cell. Most
(around 75%) of the 100 m grid cells fall entirely inside a single pan-European CLC object
(Figure 1.9 left). Also more than 40% of the grid cells in our Finnish study area contained a
single CLC class when populated from the national HR CLC, while another 40 % in this case
contained 2-3 classes.
Figure 1.9: Distribution of 100 m grid cells with respect to a) number of CLC classes from the pan-European map [left]; and b) number of CLC classes from the national HR map [right].
As the pan-European CLC data sets are normally converted to a 100 m raster for analysis, a
100 m grid resolution may seem most appropriate for a possible grid-based approach. It is,
however, important to take into account the basic difference between a raster and a grid
approach. The raster approach is limited to a single class per pixel. This is usually the CLC
class found at the centre (or another point location) of the pixel. The grid is, on the other hand,
encoded with the complete and exact statistical distribution of land cover classes inside each
grid cell. A 100 m grid is therefore much more information-rich than a 100 m raster.
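The difference between the two encodings can be shown with a toy example. The sketch below assumes a 100 m cell covered by 5 x 5 high-resolution pixels of 20 m; the class codes and their arrangement are invented, and the function names are hypothetical.

```python
from collections import Counter


def raster_value(pixels):
    """Raster encoding: the single class found at the centre pixel."""
    n = len(pixels)
    return pixels[n // 2][n // 2]


def grid_profile(pixels):
    """Grid encoding: the share of every class inside the cell."""
    counts = Counter(cls for row in pixels for cls in row)
    total = sum(counts.values())
    return {cls: k / total for cls, k in counts.items()}


# One 100 m cell seen through 5 x 5 pixels of 20 m (invented values).
cell = [
    ["312", "312", "312", "312", "312"],
    ["312", "312", "312", "312", "312"],
    ["312", "211", "211", "211", "312"],
    ["211", "211", "211", "211", "211"],
    ["312", "312", "312", "312", "312"],
]

print(raster_value(cell))   # class at the centre of the cell
print(grid_profile(cell))   # full distribution of classes in the cell
```

In this constructed case the centre pixel belongs to the minority class, so the raster value alone would misrepresent the cell, while the grid profile retains both classes with their shares.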
An analysis of the Norwegian CLC2012 dataset revealed that the mean polygon size (when
ocean was excluded) was 3.17 km2. This figure is, even when ocean is excluded, influenced by
a skewed distribution due to a few large lakes and glaciers, and a number of large and
homogeneous forest and mountain areas. 70 % of the polygons in the vector data set were less
than 1 km2 in area. The average size of these smaller polygons was 0.48 km2. A 1 km grid does
therefore represent a certain degree of simplification with respect to geometrical resolution
when compared to vector CLC. The 100 m grid, on the other hand, has a finer spatial resolution
than the vector CLC but does still represent a simplification with respect to geometrical shape.
However, a 1 km grid resolution may be sufficient to maintain a degree of detail which is at least
compatible with the vector CLC for pan-European, regional and national studies since the loss
in geometrical detail is compensated by the gain in details and accuracy with respect to the
characterization of the spatial units.
CLASSIFICATION SYSTEM
CLC is using a three level hierarchical classification system with 44 classes at the most detailed
level. Many of the classes are both scale- and source-dependent, i.e. they are defined for
manual interpretation of satellite imagery at a certain cartographic scale. One consequence of
this approach is that some classes may be defined in terms of a multitude of elements. Airports,
for example, typically consist of a mixture of buildings, paved runways, roads, parking lots,
storage areas and lawns. The common feature for the area is the land use, not the land cover.
CLC as a classification system is also scale dependent. It entails “mixed classes”. Mixed
classes include, in particular, the CLC classes 242 “Complex cultivation patterns” and 243
“Land principally occupied by agriculture, with significant areas of natural vegetation”. These
classes are artefacts of the cartographic scale, and not meaningful in a grid approach where
data are available on a more detailed level. A grid approach would typically populate and
characterize the grid cells with information about the exact amount of both cultivated and natural
land. The fact that the basic types are mixed inside a grid cell can be retrieved by analysis but is
not itself “a class”.
Less obvious examples of mixed classes are the built-up classes 111 Continuous urban fabric
and 112 Discontinuous urban fabric. Both will, at the micro-scale, be a mixture of buildings,
roads, other sealed surfaces and unsealed areas. The classification as 111 or 112 depends on
the composition of these factors within larger areas. The solution for a CLC compatible grid
approach can be to use a detailed land cover map recoded to CLC classes and populate the
grid by overlay (as described in chapter 1.4 above). This is the approach used in the case
studies below, where the Finnish national HR CLC is used to populate the grids. An alternative
approach is to populate the grid with counts and measurements of more elementary features
(sealed area, number of buildings, size of buildings etc.) and use data models to derive
information compatible with the CLC class definitions for analytical purposes.
The main difference between the grid approach and other (raster and vector) approaches is that
the need for a classification system is relaxed since “characterization” for many purposes can
replace “classification”. The explanation is that there are, fundamentally, two different
approaches to land cover and land use mapping: Classification and characterization.
Classification is the approach where spatial units are assigned to categories in a pre-defined
classification system. The method is well-known from CLC and many national land monitoring
systems. Characterization, on the other hand, is the approach where the spatial units are
described using a set of independent diagnostic criteria (Di Gregorio and Jansen 2000,
Gomarasca 2009).
Characterization entails that a spatial unit (the grid cell in our case), instead of being classified,
is described by a number of characteristics. These characteristics may, of course, themselves
be derived from a classification system (as in the case studies in chapters 3 – 5 below where
the grid cells are characterized by a mixed composition of CLC classes), but can also consist of
counts (number of buildings, meters of road, number of inhabitants) or other measurements.
The grid approach allows flexible classification opportunities after data has been collected and
stored against the grid cell.
HARMONIZED DATA
CLC is a harmonized approach to land monitoring in the sense that it applies a standardized
classification system throughout Europe. A grid approach will collect information from existing
maps and registers. The standardization of information retrieved from these registers is an
essential prerequisite for the grid approach. The issue has not been explored further in the
current study, but may possibly be approached using the EAGLE semantic model.
ANALYTICAL OPPORTUNITIES WITH RESPECT TO CLC
The regular CORINE land cover (CLC) inventories have proved to be an essential source for
monitoring land cover (LC) and land use (LU) change in Europe. Over the past 30 years,
national teams from 39 European countries that are members of the European Environmental
Information and Observation Network (EIONET) have worked closely together with the EEA, and
in particular with the CLC technical unit, to monitor changes within the European landscape.
CLC is recognised as a well-established, reliable information source to support environment
policies and beyond. Its use is indispensable in several policies (e.g. no land take by 2050 as
targeted in the 7th Environment Action Programme of the European Union). Within the EEA, CLC
is used in many different EU-wide applications:
• Analysis of different land cover types over Europe
• Monitoring and trend analysis of land cover changes in Europe in 1990-2012
• Landscape fragmentation in Europe
• Ecosystem mapping and assessment
• Ecosystem service mapping
• High nature value farmland and the Common Agricultural Policy
• Land and ecosystem natural capital accounts
• Land use and scenario modelling for Integrated Sustainability Assessment
• Nutrient accounts Soil functionalities
However, more spatially and thematically detailed monitoring of land cover and its changes is
needed for EEA analysis (Copernicus activities next to CLC). The requirement can be to have
more thematic and spatial detail (for specific geographic areas such as urban areas (Urban Atlas),
riparian zones and Natura2000 sites) or higher temporal frequencies and/or more thematic
detail (for thematic areas such as imperviousness, forest, grassland, wetlands and water).
EXAMPLE
The Land and Ecosystem Accounts (LEAC4, see Figure 1.10) is a practical example of the grid
approach. The calculation of many CLC-based indicators, such as the land take indicator
(CSI 014/LSI 001)5, is based on the LEAC method as described in the EEA technical report (No 11/2006).
Figure 1.10: Overview of the LEAC method
The core data of the LEAC project have been structured in a relational database model to allow
quick and easy analyses. These databases have been made publicly accessible through the
Internet. From the LEAC database, various geographical layers have been derived such as land
cover flows, Corilis, the green potential background layer and the dominant land-cover types.
The LEAC method uses a 1 km resolution grid (called LEAC cubes) as an interface between
land cover data and reporting units. Reporting units (e.g. country areas) are rasterized
according to the 1 km resolution EEA reference grid6, land cover flows are calculated based on
100 m resolution raster CLC change data, summarized first for the 1 km grid units, and
summarized in a second step for the reporting units.
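The two-step summation can be sketched as follows. This is a simplified illustration under stated assumptions, not the actual LEAC implementation: change pixels are given as lower-left coordinates in metres, the flow names and reporting units are hypothetical, and each 1 km cell is assumed to belong to exactly one reporting unit.

```python
from collections import Counter

HA_PER_PIXEL = 1  # a 100 m x 100 m raster pixel covers 1 hectare


def km_cell(x_m, y_m):
    """Index of the 1 km grid cell containing a point (metres)."""
    return (x_m // 1000, y_m // 1000)


# Hypothetical 100 m change pixels: (x, y, land cover flow).
change_pixels = [
    (4695000, 2599000, "urban sprawl"),
    (4695100, 2599000, "urban sprawl"),
    (4695900, 2599900, "forest creation"),
    (4696000, 2599000, "urban sprawl"),
]

# Hypothetical rasterized reporting units: 1 km cell -> unit.
cell_to_unit = {(4695, 2599): "Unit A", (4696, 2599): "Unit B"}


def summarise(pixels, units):
    # Step 1: summarize 100 m change pixels per 1 km grid cell.
    per_cell = Counter()
    for x, y, flow in pixels:
        per_cell[(km_cell(x, y), flow)] += HA_PER_PIXEL
    # Step 2: summarize the 1 km cells per reporting unit.
    per_unit = Counter()
    for (cell, flow), hectares in per_cell.items():
        per_unit[(units[cell], flow)] += hectares
    return per_cell, per_unit


per_cell, per_unit = summarise(change_pixels, cell_to_unit)
print(dict(per_unit))
```

Because the intermediate table is keyed on the 1 km cell, the same cell-level result can be re-aggregated to any other set of reporting units without going back to the 100 m raster.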
1.6 SPECIAL ISSUES
In this subchapter, we discuss a range of special issues connected to the possible use of a grid-based approach in European land monitoring. These include the handling of national vs a
common European cartographic projection; border areas between countries; representation of
land cover or land use change; visualization; and the amount of data in the pan-European
database.
PROJECTIONS
Current grid projects across Europe are mostly carried out with data in national projections.
Pan-European monitoring following a grid approach will have to use a common projection
across the entire continent. The solution is most likely to follow the INSPIRE Specification on
Geographical Grid Systems (TWG-RS 2009). Key elements of the INSPIRE specification
include that it is:
4 http://www.eea.europa.eu/themes/landuse/interactive/land-and-ecosystem-accounting-leac
5 http://www.eea.europa.eu/data-and-maps/indicators/land-take-2
6 http://www.eionet.europa.eu/gis/docs/eea_reference_grid_v1.pdf
• Based on the ETRS89 Lambert Azimuthal Equal Area coordinate reference system with the
centre of the projection at the point 52° N, 10° E and setting a false northing: Y0 = 3210000
m and false easting: X0 = 4321000 m.
• Identified in INSPIRE as ETRS89-LAEA.
• Designated as Grid_ETRS89-LAEA. For identification of an individual resolution level the
cell size in meters is appended to the name, such that the grid at a resolution level of 100
km is designated as Grid_ETRS89-LAEA_100K.
• Hierarchical in metric coordinates to the power of 10. The resolution of the grid is 1 m, 10 m,
100 m, 1 000 m, 10 000 m, 100 000 m. The cell size is denoted in metres (“m”) for cell sizes
up to 100 m or kilometres (“km”) for cell sizes from 1 000 m and above.
• Using the lower left corner of the grid cell as reference point.
• Grid cells are identified by a cell code composed of the size of the cell and the coordinates
of the lower left cell corner in ETRS89-LAEA. Values for northing and easting shall be
divided by 10^n, where n is the number of trailing zeros in the cell size value. Example: The
cell code “1kmN2599E4695” identifies the 1 km grid cell with coordinates of the lower left
corner: Y=2599000m, X=4695000m.
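The cell code rule in the last bullet can be expressed as a short function. This is a sketch based only on the rules quoted above; the function name is hypothetical, and it assumes non-negative integer ETRS89-LAEA coordinates in metres and a cell size that is a power of 10.

```python
def inspire_cell_code(easting_m: int, northing_m: int, cell_size_m: int) -> str:
    """Cell code of the grid cell that contains the given point."""
    size_digits = str(cell_size_m)
    # Accept only 1, 10, 100, 1000, ... (a '1' followed by zeros).
    if cell_size_m < 1 or size_digits.strip("0") != "1":
        raise ValueError("cell size must be a power of 10 (metres)")
    if easting_m < 0 or northing_m < 0:
        raise ValueError("coordinates must be non-negative")

    # Cell size label: metres up to 100 m, kilometres from 1 000 m.
    if cell_size_m >= 1000:
        label = f"{cell_size_m // 1000}km"
    else:
        label = f"{cell_size_m}m"

    # The cell size equals 10^n, so floor division both snaps the point to
    # the lower left corner and divides that corner by 10^n.
    east = easting_m // cell_size_m
    north = northing_m // cell_size_m
    return f"{label}N{north}E{east}"


print(inspire_cell_code(4695000, 2599000, 1000))
print(inspire_cell_code(4695730, 2599410, 1000))
```

Both calls print the code from the example in the text, since the second point lies inside the same 1 km cell.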
Figure 1.11: ETRS89-LAEA 1 km grid projected in the Norwegian national projection UTM33. The
originally square grid units appear here as rhombi.
An ETRS89-LAEA grid will not appear as a regular grid when it is drawn on a map in a national
projection (unless ETRS89-LAEA is the national projection). The size of the units will also be
changed slightly and depart from the originally uniform size and shape. This is illustrated in
Figure 1.11 where an ETRS89-LAEA 1 km grid is superimposed on a map in the Norwegian
national projection ETRS89 UTM-33. The grid cells here appear as rhombi. This is not a
problem as long as the grid cells are represented as polygons.
A possible and feasible solution to the challenges related to map projections is to establish the
grid with regular units as polygon data in ETRS89-LAEA and attach a unique identifier (label) to
each grid cell (as described in the INSPIRE specification TWG-RS 2009). The subset of the grid
that intersects a particular country (or study area) is selected and converted to a polygon data
set in the national projection. This polygon data set can be populated with data from national
maps and registers using methods described in chapter 1.4 above. The result is a polygon data
set with an attached attribute table in the form of an ordinary data table where the grid cells are
represented as rows and attributes as columns. The attribute table should include the grid cell
identifier.
Due to the different size of the grid cell under different projections, the attribute table will have to
be post-processed for data that are sensitive to these differences. The solution is probably to
handle all areal figures as fractions of the cell size instead of as absolute figures. Frequency
data (e.g. buildings or population) are not affected by the projection and can simply be
summarized. Linear data (length of roads, coastline, creeks etc.) may have to be scaled, using
a scale factor based on the total area of the grid cell calculated in the two projection systems
involved.
The delivery of national data in a system as described here simply amounts to uploading the
data table – not the grid map – to a central processing unit. The data table contains all the data
compiled for each grid cell, and the unique grid cell identifier ensures that the data table can be
linked directly to the original, untransformed ETRS89-LAEA grid.
BORDER AREAS
Border areas between two countries or two study regions constitute a particular challenge when
the grid approach is used for pan-European monitoring (Figure 1.12). Border areas will be
covered by grid cells populated independently by several national data providers. The merging
of these data into a central repository must be undertaken by a central, coordinating institution.
To facilitate the merging, each national data provider must also include grid cells partially
overlapping their own territory. Areal data can be encoded as relative coverage (percent of the
whole grid cell) to minimize the possible effect of different projections used in the countries
involved.
Figure 1.12: Border areas. The grid cell surrounded by a thick borderline includes three countries: Norway
(here with topographic background map), Finland (most of the white and violet areas) and Sweden (a
small part of the violet area in the lower left corner of the grid cell).
LAND COVER CHANGES
Changes and change data constitute a particular challenge under the grid approach. The
employment of the grid cell as the basic unit of observation results in the loss of information
about gross changes. This requires an explanation of the difference between gross changes
and net changes. Net change is the calculated difference in area between two specific points in
time while gross change is the total area modified during this period. One problem linked to the
grid approach is that information is restricted to the net changes. The net changes may actually
hide more complex (and interesting) patterns of gross changes.
The difference between gross and net changes is exemplified in Figure 1.13. The example is a
single grid cell composed of 25 hectare Built-up land, 25 hectare Agriculture and 50 hectare
Forest (left). After some years the content has changed to 37.5 hectare Built-up land, 25
hectare agriculture and 37.5 hectare forest. The net change is that the coverage of Built-up land
has increased 12.5 hectare and the coverage of Forest has decreased 12.5 hectare. The gross
change is not known at the grid cell level. One possible scenario is that 12.5 hectare Forest land
was developed into Built-up land (center). Another possibility is that 12.5 hectare Agricultural
land was developed into Built-up land, while 12.5 hectare Forest land was cleared and
developed as new Agricultural land (right).
Figure 1.13: Gross and net changes. The example is a grid cell composed of Built-up land, Agriculture and
Forest (left). Two different development scenarios (center and right) result in the same net changes,
although the gross changes are clearly different.
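The example of Figure 1.13 can be verified numerically. The sketch below encodes the two scenarios as flows between classes (in hectares) and derives the net and gross change from each; the data structure and function names are chosen for illustration only.

```python
CLASSES = ("Built-up", "Agriculture", "Forest")

# Flows are {(from_class, to_class): hectares}, taken from the two
# scenarios described in the text.
scenario_centre = {("Forest", "Built-up"): 12.5}
scenario_right = {
    ("Agriculture", "Built-up"): 12.5,
    ("Forest", "Agriculture"): 12.5,
}


def net_change(flows):
    """Change in area per class: gains minus losses."""
    net = {cls: 0.0 for cls in CLASSES}
    for (source, target), hectares in flows.items():
        net[source] -= hectares
        net[target] += hectares
    return net


def gross_change(flows):
    """Total area that changed class during the period."""
    return sum(flows.values())


for name, flows in (("centre", scenario_centre), ("right", scenario_right)):
    print(name, net_change(flows), gross_change(flows))
```

Both scenarios give the same net change (Built-up +12.5 ha, Agriculture 0 ha, Forest -12.5 ha), while the gross change is 12.5 ha in the first and 25 ha in the second. A grid cell that stores only the status at the two dates cannot tell the scenarios apart.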
The solution, unless the limitation regarding calculation of change information from grid data is
accepted, is to populate the grid with change data in addition to status data. The changes are in
this case identified using the more detailed data source and each particular change class (e.g.
change from status A into status B) is handled as a separate attribute in the grid model.
Figure 1.14 shows an example from the Norwegian land resource map where all changes are
logged. Each change is stored as a polygon with LU/LC class before and after the change (in
addition to area and date). Based on this data source, it is possible to calculate the area
affected by each (or selected) change type(s) within the grid cells.
Figure 1.14: Detailed land resource map (Norway) where changes are logged when the map is updated.
Each colour represents a particular kind of change. It is possible under a grid approach to calculate the
total area of each kind of change for each grid cell.
LAND COVER TIME-SERIES
A drawback of the vector based CLC-change mapping method is that changes are delineated
with respect to two status years. As a consequence, unnecessary or contradictory changes may
occur when two CLC-change datasets are compared. As an example, two European CORINE Land
Cover Change (CLC-Change) datasets were produced during the CLC2000 and CLC2006
campaigns:
• CLC-Change(1990-2000);
• CLC-Change(2000-2006)
As the second CLC change mapping (in CLC2006) was not harmonized with the results of the
first CLC change mapping (in CLC2000), geometric and thematic contradictions might occur
between the two CLC change maps.
Extra efforts have been made to harmonize the above two CLC-change datasets7. The
methodology is based on a 100 m raster because, due to the 5 ha MMU and generalization
rules, it was impossible to follow change delineations over more than two reference dates, while
it was easy to decide to which CLC class a 100 m cell belongs in a given reference year.
In conclusion, CLC changes between two dates may be interpreted in irregularly shaped
polygon format, but CLC time-series are better created and interpreted in a raster / grid
approach.
DATA SIZE
The amount of data generated by a grid approach depends on two factors: the grid resolution
and the number of attributes used for characterization. For Norway (with a land area
of 323 758 km2), a 5 km grid consists of 20 747 grid cells when the grid is limited to the
land area (including inland water, but not extending into the ocean - see example in Figure 1.15
below). A higher resolution 1 km grid results in approximately 25 times as many grid cells
(533 918 in the Norwegian dataset) while a 100 m grid will have around 50 million grid cells in
Norway alone.
7 Development of methodology to eliminate contradictions between CLC-Change1990-2000 and CLC-Change2000-2006. ETC-LUSI report. 2011.
Figure 1.15: Agricultural area (%) in a 5 x 5 km grid of Norway. The map exemplifies visualization of a
single attribute in a grid model by using a graduated colour scale.
The challenge with respect to a complete European coverage depends on the area included.
EEA (2010) is reporting 5 424 171 km2 as the total area covered by the analysis in that report.
This figure would imply approximately 220 000 grid cells when a 5 km model is used; 5.5 million
grid cells when a 1 km model is used; and more than 500 million grid cells when a 100 m model
is used.
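These orders of magnitude follow from dividing the covered area by the area of one grid cell. The sketch below reproduces the arithmetic; the function name is hypothetical, and the result is only a lower bound because cells partly covering the area of interest (for instance along coastlines) are not accounted for.

```python
EUROPE_AREA_KM2 = 5_424_171  # total area reported by EEA (2010)


def approx_cell_count(area_km2, cell_size_m):
    """Area divided by cell area; ignores partially covered cells."""
    area_m2 = area_km2 * 1_000_000
    return area_m2 / (cell_size_m ** 2)


for size_m in (5000, 1000, 100):
    print(size_m, round(approx_cell_count(EUROPE_AREA_KM2, size_m)))
```

The Norwegian figures quoted earlier show how much partially covered cells can matter: 533 918 cells of 1 km2 are needed for a land area of 323 758 km2.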
VISUALIZATION
Visualization of grid data can be a challenge because the grid model is data-rich. In land
monitoring based on polygons (e.g. CLC), each polygon is assigned to a single land cover class
and the corresponding visualization method is the traditional and well-known choropleth map.
Grid models obviously provide excellent opportunities to visualize each class separately, since
the amount of each class in each grid cell is known. An example is the accompanying map of
agricultural area in Norway, shown in Figure 1.15 using a graduated colour scale. The grid cell
size in the example is 5 x 5 km.
The graduated colour method is clearly insufficient for visualization of a grid model more closely
related to CLC. Although each land cover class is represented as a separate attribute in the grid
model, there is still a need for a composite map showing all the land cover classes together in a
single map. SYKE (Finland) has developed a method for this purpose (Figure 1.16).
The SYKE approach is designed for the Finnish HR CLC aggregated to a 1 km grid. Each
original CLC class is a separate attribute in the 1 km grid model. Each grid cell is thus attached
to an attribute vector representing the statistical distribution of CLC classes inside the grid cell.
The visualization involves two steps. The first step is to determine which CLC Level 1 class
dominates the grid cell. The second step is to find the dominating CLC Level 3 class within the
previously identified Level 1 class. The result is the class assigned to the grid cell for
visualization.
Example:
A particular grid cell has a statistical profile where the combination of CLC classes is
10% 1.1.1 Continuous urban fabric
30% 1.1.2 Discontinuous urban fabric
20% 1.2.1 Industrial and commercial land
40% 3.1.2 Coniferous forest
The visualization according to the Finnish method will show this grid cell as 1.1.2
(Discontinuous urban fabric) because it is the dominating Level 3 class within the dominating
level 1 class (Built-up land).
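The two-step rule can be written as a small function. This is a sketch of the rule as described above, not SYKE's implementation: the function name is hypothetical, class codes are written in the dotted form used in the example (with the CLC code 3.1.2 for coniferous forest), and the handling of ties is an assumption since the text does not specify it.

```python
from collections import defaultdict


def dominant_class(profile):
    """Pick the display class for a grid cell.

    profile maps dotted CLC Level 3 codes (e.g. "1.1.2") to their share
    of the grid cell. Ties are resolved in favour of the first code
    encountered, which is an assumption of this sketch.
    """
    # Step 1: find the dominating Level 1 class.
    level1_share = defaultdict(float)
    for code, share in profile.items():
        level1_share[code.split(".")[0]] += share
    top_level1 = max(level1_share, key=level1_share.get)

    # Step 2: find the dominating Level 3 class within that Level 1 class.
    candidates = {
        code: share
        for code, share in profile.items()
        if code.split(".")[0] == top_level1
    }
    return max(candidates, key=candidates.get)


profile = {"1.1.1": 10, "1.1.2": 30, "1.2.1": 20, "3.1.2": 40}
print(dominant_class(profile))
```

Note that the forest class is the largest single Level 3 class (40%), but the cell is displayed as 1.1.2 because the three built-up classes together cover 60% of the cell.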
Figure 1.16: Left image is the Finnish land cover data (vector format, 25 ha MMU). Right image is the
data aggregated to 1 km grid visualized using the Finnish method for visualization of grid data.
WORKING WITH GRIDS
The storage system for a grid based data structure can either be ID based or raster based. An
ID based structure consists of a geometry representing the locations of grid cells (either as
points, polygons or as a raster) with a unique ID attached to each grid cell. The
geometrical storage structure does not include any attribute data. The attributes are instead
stored in separate attribute data tables with the unique ID taking the position as a unique
reference that can be used to link the attribute table and the geometry.
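A minimal sketch of the ID based structure is given below, using an in-memory SQLite database from the Python standard library. The table names, column names and values are hypothetical; the point is only that attribute tables from different sources can be linked through the grid cell ID without involving the geometry.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    -- Two independent attribute tables keyed on the grid cell ID.
    CREATE TABLE land_cover (cell_id TEXT PRIMARY KEY, forest_pct REAL);
    CREATE TABLE population (cell_id TEXT PRIMARY KEY, inhabitants INTEGER);

    INSERT INTO land_cover VALUES
        ('1kmN2599E4695', 62.5),
        ('1kmN2599E4696', 10.0);
    INSERT INTO population VALUES
        ('1kmN2599E4695', 12),
        ('1kmN2599E4696', 340);
    """
)

# Link the attribute tables on the unique ID; no geometry is required.
rows = con.execute(
    """
    SELECT lc.cell_id, lc.forest_pct, p.inhabitants
    FROM land_cover AS lc
    JOIN population AS p ON p.cell_id = lc.cell_id
    ORDER BY lc.cell_id
    """
).fetchall()

for row in rows:
    print(row)
con.close()
```

The geometry of the grid is needed only when the result is to be mapped, at which point the same ID links the table back to the grid cells.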
A raster based storage structure, on the other hand, organizes the attribute data in a tabular
structure where the position of an attribute value inside the table reflects the location of the
observation. Each raster table corresponds to one attribute column in an attribute data table. A
large number of raster tables are required in order to uphold the information in a grid structure,
but it is possible to use this approach to “simulate” the grid structure using raster data tools.
The advantage of the raster based solution is that raster processing is usually (much) faster
than vector data processing. Raster data may also be compressed and then require less storage
(at the cost of less efficient processing). The pan-European 100 m resolution raster data (CLC,
HRLs) are therefore stored and processed as raster data (Maucha 2013). The disadvantages of
raster based solutions are the need for special software for analysis, the number of raster layers
involved, and the separation of the storage and processing environment from the data capturing
environment (which in any case has to be polygon based).
The advantage of an ID based solution is that data are stored and processed in database tables
and standard database processing tools and statistical software can be applied for data analysis
(e.g. Strand 1997). IDs refer to fixed locations even without a “real” GIS layer in the background.
Many related tables may be stored in the same database and data transferred between users
simply as data tables. No dedicated GIS software is needed for powerful ID based statistical
analysis (e.g. LEAC cubes: see http://www.eea.europa.eu/themes/landuse/interactive/land-andecosystem-accounting-leac).
The anticipated limitation of the ID based solution is that a vector based grid is very slow to
process for pan-European layers when the current (proprietary) GIS solutions are used.
The unique IDs, at least when the INSPIRE recommendation is followed, may also
be too long to be stored in existing commercial raster formats. Open source spatial databases
and open source spatial data analysis software are likely to change this picture,
but the land cover monitoring community seems so far to have limited experience with these
tools.
ACCESS TO INFORMATION
The HELM project (Ben-Asher 2013) reported that access to detailed national data for use in
pan-European monitoring can be restricted. It can even be difficult to access national data
across sectors, although the situation is improving with the introduction of national spatial data
infrastructures in many countries. The grid approach can potentially improve this situation. The
access to detailed vector data may be restricted due to concern about data privacy, social
safety or commercial considerations. It may be easier to release these data in a grid format.
National statistical agencies do, as an example, have a restrictive policy regarding the
distribution of detailed population data. A 1 km grid resolution with particular handling of grid
cells with very few inhabitants has, however, been found acceptable as a way to distribute these
data (see more details in chapter 2 below).
A grid approach may also open other national data sources for use in land monitoring. There is
so far no systematic study of this possibility, but it is likely that agencies now reluctant to give
access to LPIS, cadastres, road data, building registers etc. will be more forthcoming towards
requests for data delivered on a sufficiently coarse grid format.
2 THE GRID APPROACH USED IN EUROSTAT/EFGS
2.1 INTRODUCTION AND BACKGROUND
This chapter reviews the contributions to the grid approach made by the
European Forum for Geography and Statistics (EFGS) and Eurostat through the GEOSTAT
project. The current report collects and summarises the main outcomes of the projects
GEOSTAT 1A and GEOSTAT 1B and discusses potential synergies of such grid approaches in
relation to land monitoring in Europe.
The European Forum for Geography and Statistics (http://www.efgs.info/), formerly the European
Forum for Geostatistics, started as a voluntary cooperation between National Statistical
Institutions (NSIs) in the Nordic countries in 1998, on the use of geographic information systems
(GIS) and statistics. Today, it gathers people from more than 40 states and territories and holds
annual conferences and meetings. At the time of writing, the next EFGS Conference will be held
in Vienna in November 2015
(http://www.statistik.at/web_en/about_us/events/efgs2015/index.html).
The activities of the EFGS are focused on the development of best practices in the production
of geostatistics in Europe, putting a special emphasis on the grid-based statistics and the spatial
disaggregation of statistical data.
At the beginning of 2010, Eurostat, in cooperation with the EFGS, launched the GEOSTAT
project, aimed at promoting grid-based statistics and, more generally, working towards the
integration of statistical and geospatial information in a common information infrastructure for
the EU. Its main objective is to develop common guidelines for the collection and production of
spatial- and grid-statistics within the European Statistical System (ESS). The first concrete
action carried out has been the GEOSTAT 1 project, aimed at representing various population
characteristics in 1 km² grid datasets. GEOSTAT 1 was divided into three phases:
GEOSTAT 1A produced a population prototype dataset for 2006 and developed a methodology
to be followed by others for generating European grid statistics.
GEOSTAT 1B produced a first version of a grid dataset for the Census year 2011. The first
version currently holds data from 18 of the EU and EFTA countries, with the remaining 14
countries filled with modelled data. It has also created a detailed manual for creating population
grids from statistical information using aggregation or disaggregation techniques. In a case
study on accessibility to health care, the project has demonstrated how grid statistics can be
used for spatial analysis. The GEOSTAT 2011 V1 dataset is available for download, together
with that for 2006 (http://ec.europa.eu/eurostat/web/gisco/geodata/reference-data/populationdistribution-demography).
The aim of GEOSTAT 1C (currently on-going) is to add further national data to the GEOSTAT
2011 data. A second version of the GEOSTAT dataset can be expected this year (2015).
This report assesses the outcomes of GEOSTAT 1A and 1B. A list of the
documents is provided in Annex 1.
2.2 GEOSTAT 1A
GEOSTAT 1A was the first step in the long-term framework of integrating spatial information
and statistics, promoted by Eurostat and the ESS. It had the aim of developing a vision and the
methodological foundations for a population grid dataset.
The project stated that “The GEOSTAT vision is to have a combination of grid-based statistical
systems which meets needs at different geographical levels, from the national and continental
level to the global level”, whereas “the GEOSTAT action is about developing guidelines for
datasets and methods to link 2011 Census statistics to a common harmonised grid”.
The project wanted to take advantage of the fact that many NSIs are able to produce
statistics for very small areas, especially regarding population variables. It built on
previous ideas of displaying population concentrations in a geographical grid, in particular
the population grid that JRC produced in 2004 using disaggregation techniques, whose
limitations the project highlighted.
OBJECTIVES OF GEOSTAT 1A
The objectives defined for GEOSTAT 1A were:
• Develop a set of methods, tools and guidelines to create harmonised data sets.
• Create a population grid using existing population data sources.
• Prepare a vision document for a spatial data infrastructure for geostatistics.
• Contribute to the integration of GEOSTAT within all major European information
systems (GMES, INSPIRE, SEIS etc.).
• Disseminate and share the results from the GEOSTAT project among NSIs.
USER AND PRODUCER NEEDS AND REQUIREMENTS
To gain better knowledge of user needs and best practices, GEOSTAT 1A carried out two
different surveys.
Regarding data users, the majority of respondents used grid-based statistics for the purposes of
spatial analyses. They mainly needed population, housing and economic variables on the basis
of grids. Moreover, there was a clear correlation between area of interest and grid size. The
result of preferred grid sizes is shown in Figure 2.1.
Figure 2.1: The preferred grid sizes reported from the GEOSTAT 1A survey.
As for data producers, 19 organizations answered the survey. It was concluded that most
countries publish their data in national grid systems, although they also consider a single grid
system and Europe-wide harmonization to be important, and most agreed that there should be a
grid dataset available covering the whole of Europe. Producers also rated grid sizes up to 1 km
as the most relevant for meeting users' requirements. For reasons of data confidentiality and
national data protection acts, almost all data producers face certain restrictions in their
dissemination of grid statistics. Business models differ from one country to another: some data
are freely distributed, while in other cases a fee is charged for delivering data to pay for the
production of the statistics.
RECOMMENDATIONS ON DATA SPECIFICATIONS
From the surveys some recommendations are given with regard to grid-based statistics and
their publication at European level.
Map projections
Recommendation 1: Grid data for the European GEOSTAT dataset should be
referenced to the European Grid Grid_ETRS89-LAEA_1K.
If no direct aggregation from national point data sources into the INSPIRE grid is
possible, the GEOSTAT project proposed to convert national grids into the INSPIRE
1 km² grid for the European GEOSTAT dataset.
Grid sizes
Recommendation 2: The GEOSTAT grid dataset should have a grid cell size of 1 km².
Coding system
Recommendation 3: For unambiguous referencing and identification of a grid cell, the
cell code should be composed of the size of the cell and the coordinates of the lower
left cell corner in ETRS89-LAEA. The cell size should be denoted in
kilometres (‘km’) for cell size 1 km. Values for northing and easting should be divided by
1000. The cell code ‘1kmN2599E4695’ identifies the 1 km grid cell with coordinates of
the lower left corner: Y=2599000m, X=4695000m.
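As an illustration of Recommendation 3, the following Python sketch builds and parses 1 km cell codes. The function names `cell_code_1km` and `parse_cell_code_1km` are hypothetical, and the sketch assumes non-negative ETRS89-LAEA coordinates in metres.

```python
import re

def cell_code_1km(easting_m, northing_m):
    """Return the 1 km cell code for the cell containing the given point.

    The point is snapped to the lower left corner of its cell and the
    corner coordinates are divided by 1000, as in '1kmN2599E4695'.
    """
    n_km = int(northing_m // 1000)
    e_km = int(easting_m // 1000)
    return f"1kmN{n_km}E{e_km}"

def parse_cell_code_1km(code):
    """Return the lower left corner (easting_m, northing_m) of a 1 km cell code."""
    match = re.fullmatch(r"1kmN(\d+)E(\d+)", code)
    if match is None:
        raise ValueError(f"not a 1 km cell code: {code!r}")
    n_km, e_km = (int(value) for value in match.groups())
    return e_km * 1000, n_km * 1000

print(cell_code_1km(4695000, 2599000))       # 1kmN2599E4695
print(parse_cell_code_1km("1kmN2599E4695"))  # (4695000, 2599000)
```

Because the code is derived from the coordinates alone, the ID refers to a fixed location even when no grid geometry is available.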
Confidentiality
Recommendation 4: For the European GEOSTAT dataset at 1 km² the total population
should be disclosed without restrictions for any grid cell. The GEOSTAT project
recommends that data protection measures should depend on the sensitive nature of
the variables. Absolute counts of the statistical units (such as number of inhabitants,
households, buildings, workplaces) should be disseminated without any restrictions,
even for the smallest grid sizes.
Data dissemination
Recommendation 5: The GEOSTAT dataset may be downloaded free of charge and
without access restrictions in the form of a .csv file with the statistical data, a grid net
shape file and an INSPIRE metadata file in ISO19139 encoding. The dataset may be
downloaded from the EFGS or Eurostat websites.
Data update frequency
Recommendation 6: The GEOSTAT dataset version 1 should have the reference year
2006. The next version of the European GEOSTAT dataset should have the reference
year 2011.
Data quality measures
Recommendation 7: The data quality of the GEOSTAT dataset version 1 should be in
the form of INSPIRE metadata encoded in ISO19139 .xml files. The production process
should be documented per grid data source and country.
THE GEOSTAT 1A DATASET
To achieve the final goal of GEOSTAT described above, it was agreed that it
was important to have an initial example of a dataset which already highlighted the main
concepts of the final data deliverable and also helped to identify obstacles and challenges. It was
decided to fix 2006 as the reference year for the first GEOSTAT population grid, building on the
idea of the former EFGS map (ESS Eurogrid Population Map 2009) of the European population
on a 1 km² grid, where some nationally produced population grids were integrated into an
estimated European population grid (JRC 2001).
In this phase of the project, GEOSTAT collected, on the one hand, some national experiences
of working with grid data. On the other hand, based on that experience and the derived guidelines,
the NSIs were invited to produce their own country’s harmonised population grid data for 2006,
which would be part of the final European dataset. In parallel, a Europe-wide estimated
population grid was updated and improved.
National data sources
Grid-based data from national sources were produced by quite diverse methods and from a
wide variety of data sources. Moreover, documentation was sometimes very poor. Therefore it
was not easy to assess the quality and limitations of a grid product. In GEOSTAT 1A six
countries shared their experiences on that topic (Austria, Finland, Norway, Poland, Portugal and
France).
National production methods
In this section, GEOSTAT 1A assessed different production methods based on the countries' experience.
Grid data can be produced by aggregation or disaggregation methods, with the former being the
preferred one in terms of accuracy and quality.
Conversion to ETRS89_LAEA
Some tests were done converting national gridded data to the European grid and the results
were assessed. The main conclusions were that the way in which the conversion is undertaken
is very important. More detail in the source grid data improved the standard of the derived grid
data. The best results were obtained when converting the source data to the European
Reference and transferring that information to the grid, although there was the drawback of
doubling the source information.
Country reports
Other methods, used by some countries to produce the GEOSTAT 2006 dataset, consisted of
aggregating population data from point sources (e.g. building registers). Depending on the
resolution of population source data, aggregation or disaggregation methods should be applied.
Some countries had to, partially or completely, disaggregate the population data, but others
were able to produce the new dataset by aggregating population figures to the 1 km² grid cells.
Each national approach was discussed within the project. The best option would be to use
building centroids with population data when these are available.
European grid production initiatives
In parallel to national initiatives, an improved European population grid was produced by means
of disaggregation techniques. Taking the 2001 JRC grid as a starting point, a 2006 population
grid was produced by the Austrian Institute of Technology (AIT), using the soil sealing HRL as
ancillary data to improve the disaggregation of population figures. In addition to the European
disaggregated grid, AIT also produced national grid datasets for those countries that share a
border with a GEOSTAT country. This simplifies the integration of the two datasets along the
borders and makes it possible to deal more accurately with population in shared border cells.
Description of the GEOSTAT 1A dataset
The GEOSTAT Population grid was made from the national grids, complemented with the
European population grid by AIT wherever national grids were not available. This section of the
GEOSTAT 1A Final Report describes some of the specifications of the product, such as field
names, metadata, etc.
Statistics of the GEOSTAT 1A grid dataset
The reference grid Grid_ETRS89_LAEA_Europe_1K, intersected with the EU27 + EFTA
landmass delineated by a coastline/boundary dataset at scale 1:100 000, has a total of
4 884 516 grid cells, whereas the GEOSTAT_2006_POP grid contains 1 945 315 inhabited grid
cells (cells with at least one inhabitant). In total the GEOSTAT national dataset represents 43% of the
505 million inhabitants of the EU27 + EFTA in 2006 and covers 2 446 199 km², equal to
~50% of the total land surface of 4 825 275 km², while the disaggregated population dataset
covers 57% of the population.
A method for generating population grid statistics
As noted above, the production of the GEOSTAT 2006 population grid was the “testing
phase” for the ultimate goal of producing a high quality population grid of the 2011 census.
Although the product was of good quality, it still had some limitations, such as different national
production processes and the fact that it only accounts for total population at the place of
residence. Under this section, the report highlights different aspects to be taken into account for
future developments.
General method to generate grids
The production of statistical grids usually consists of two elements: a grid net and a statistical
table. A grid net is not always necessary, because georeferenced data can be processed just
like any territorial statistics or like spatial statistics by point georeferences. Grid-based statistics
are territorial statistics which use direct x and y coordinates as a location code. The first
precondition for generating grid-based statistics is that thematic data can be georeferenced.
Aggregation method: Figure 2.2 shows a schema of the proposed workflow for the
aggregation method. The project highlighted the importance of using the same reference system
throughout; any transformations needed should be carried out on the source layers.
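The core of the aggregation step can be sketched as follows. This is a simplified illustration and not the workflow of Figure 2.2: the function name `aggregate_to_grid` and the sample records are hypothetical, and the point coordinates are assumed to have been transformed to ETRS89-LAEA beforehand.

```python
from collections import defaultdict

def aggregate_to_grid(points, cell_size_m=1000):
    """Sum the population of georeferenced point records per grid cell.

    points: iterable of (easting_m, northing_m, population) tuples,
            e.g. building centroids with registered inhabitants.
    Returns a dict keyed by the lower left cell corner (easting_m, northing_m).
    """
    totals = defaultdict(int)
    for easting, northing, population in points:
        corner = (int(easting // cell_size_m) * cell_size_m,
                  int(northing // cell_size_m) * cell_size_m)
        totals[corner] += population
    return dict(totals)

# Hypothetical building records: (easting, northing, inhabitants)
buildings = [
    (4695120.0, 2599340.0, 3),
    (4695870.0, 2599910.0, 5),
    (4696010.0, 2599050.0, 2),
]
print(aggregate_to_grid(buildings))
# {(4695000, 2599000): 8, (4696000, 2599000): 2}
```

The first two buildings fall in the same 1 km cell and are summed; no grid geometry is needed for the calculation itself.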
Disaggregation method: The overall workflow of the disaggregation approach is described in
detail in the literature (see Batista e Silva et al. 2011, Gallego 2010 for further references). The
disaggregation method is particularly useful when large territories are covered in a multinational
to global context. GEOSTAT was not meant to improve such European grids, but the
disaggregated grid is very useful to cover gaps when country grids are not available.
A disaggregation model is needed to estimate, for each grid cell or part of grid cell, different
density categories defined by ancillary data. The calculation of density categories and weights
depends on the ancillary data available. Ancillary data can be LU/LC, night time lights, soil
sealing, road network, buildings etc.
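A minimal sketch of such a model is given below. It distributes the population of one source zone over grid cells in proportion to the weighted area of each density category. This is a generic proportional weighting chosen for illustration, not the specific method used by AIT or JRC; the function name, category names, weights and areas are all hypothetical.

```python
def disaggregate(zone_population, cells, weights):
    """Distribute a zone's population over grid cells using ancillary data.

    cells:   dict cell ID -> dict of density category -> area (ha) of the
             zone inside that cell
    weights: dict of density category -> relative population density weight
    Returns a dict cell ID -> estimated population.
    """
    scores = {
        cell_id: sum(weights.get(category, 0.0) * area
                     for category, area in categories.items())
        for cell_id, categories in cells.items()
    }
    total_score = sum(scores.values())
    if total_score == 0:
        raise ValueError("no weighted area available to receive population")
    return {cell_id: zone_population * score / total_score
            for cell_id, score in scores.items()}

# Hypothetical zone with 1000 inhabitants covering two cells.
cells = {
    "A": {"urban": 30, "forest": 70},
    "B": {"urban": 10, "forest": 90},
}
weights = {"urban": 1.0, "forest": 0.0}
print(disaggregate(1000, cells, weights))  # {'A': 750.0, 'B': 250.0}
```

The estimates always sum to the zone total, so the source statistics are preserved while the ancillary data control the spatial distribution.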
Figure 2.2: The proposed workflow for the aggregation method.
Confidentiality management
The GEOSTAT project recommends that data protection measures should depend on the
sensitivity of the variables. Absolute counts of the statistical units (such as number of
inhabitants, households, buildings, workplaces) should be disseminated without any restriction,
even for the smallest grid sizes. However, if the grid cells are not sufficiently populated, further
characteristics of the statistical units should be protected (e.g. a person's age).
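The principle in this paragraph can be sketched as a simple suppression rule. The threshold of 10 inhabitants, the field names and the function name `protect_cell` are assumptions made for illustration; the source does not specify a threshold.

```python
def protect_cell(record, min_population=10, open_fields=("cell_id", "total")):
    """Apply a simple disclosure rule to one grid cell record.

    Absolute counts (here: "total") are always kept. Further characteristics
    are blanked when the cell has fewer than min_population inhabitants.
    """
    if record["total"] >= min_population:
        return dict(record)
    return {field: (value if field in open_fields else None)
            for field, value in record.items()}

# Hypothetical records
sparse = {"cell_id": "1kmN2599E4695", "total": 4, "age_0_14": 1, "age_65_": 2}
dense = {"cell_id": "1kmN2599E4696", "total": 250, "age_0_14": 40, "age_65_": 55}
print(protect_cell(sparse))
print(protect_cell(dense))
```

In the sparse cell the total remains visible while the age breakdown is withheld; the dense cell is released unchanged.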
One way to control the dissemination and use of grid-based statistics is to release them on
condition that a specific license is granted. An agreement on the use of similar confidentiality
handling methods is needed when harmonized cross-border datasets are disseminated.
GEOSTAT stated that there was a need for further technical developments of disclosure control
methods which take the spatial characteristics of grid data into account.
Data Quality
GEOSTAT was convinced that, generally speaking, the quality of population grid data produced
by National Statistical Institutes is higher than disaggregated European-wide population grid
data. National data sources are much more accurate than European population data sources.
There are two aspects to the notion of quality in the context of grid-based statistics:
• quality of the georeferenced source data
• quality of the production process.
Different metadata properties give an idea of the quality of the gridded datasets.
Grid sizes
Two grid systems are proposed by GEOSTAT: a primary one, with cells of 10^a (where 6≥a≥0)
metres in size (i.e. 100 km, 10 km, 1 km, 100 m, 10 m and 1 m), and a secondary one, defined for a
more extensive series of grids: 10m, 25m, 50m, 100m, 250m, 500m, 1km, 2.5km, 5km, 10km,
25km, 50km. The secondary system has the advantage that it can always be aggregated up to
the primary one.
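The aggregation property can be checked with a short Python sketch. It uses the primary sizes listed in the text (1 m to 100 km) and looks up, for each secondary size, the smallest primary size that is a whole multiple of it. The function name `primary_parent` is hypothetical.

```python
# Primary sizes as listed in the text: 1 m, 10 m, 100 m, 1 km, 10 km, 100 km.
PRIMARY_M = [10 ** a for a in range(6)]
# Secondary sizes in metres.
SECONDARY_M = [10, 25, 50, 100, 250, 500, 1000, 2500, 5000, 10000, 25000, 50000]

def primary_parent(size_m):
    """Return the smallest primary cell size that is a whole multiple of size_m."""
    for primary in PRIMARY_M:
        if primary >= size_m and primary % size_m == 0:
            return primary
    raise ValueError(f"no primary grid size nests {size_m} m cells")

for size in SECONDARY_M:
    print(f"{size} m -> {primary_parent(size)} m")
```

For example, 25 m cells nest into the 100 m primary grid (16 cells per primary cell) and 2.5 km cells nest into the 10 km primary grid.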
A CONCEPT FOR A GEOSTAT DATA INFRASTRUCTURE
This chapter of the GEOSTAT 1A report discusses the GEOSTAT dataset as a distributed service, where all
NSIs which deliver data to the system share the benefits of its use and also contribute to the
definition of the terms or business model under which the data may be used.
The publication of grid statistics falls under the INSPIRE directive and has to fulfil its
requirements. However, being a statistical and not just a spatial dataset, there are also certain
restrictions and conditions as regards compliance of the GEOSTAT dataset with INSPIRE.
A concept for a future GEOSTAT data infrastructure is defined and detailed, in terms of data
types, data services, rights management, metadata and web services.
CONCLUSIONS AND FURTHER WORK
Some conclusions are given, on the basis of surveys, discussions and results gained from the
project. The different actions carried out by the project have improved the overall scope
and quality of the existing ‘ESSgrid’ effort to map the European population on 1 km² grids.
Knowledge sharing
The knowledge and results gained have been spread to the ESS and through the EFGS, putting
emphasis on the fact that non-participating countries also benefit from the developments and
are encouraged to implement the same approach.
Sustainability
The ESS partnership builds upon the existing EFGS network and website, which consisted of
national contacts from 32 countries. Support to ESS was requested to solve some issues and
motivate more countries to participate. One of the aims is to be able to produce grids more
often than the census is undertaken (every 10 years).
Grid production
The project has concentrated on total population. Two systems of grids were proposed: a
primary and a secondary one. Conversions of grid data were tested with different sizes of
national grids. Support for the development and refinement of disaggregation methods in
cooperation with research institutes will remain high on GEOSTAT's agenda.
One package — one provider
The goal of the EFGS is to act as a hub for users of grid statistics, a single point of contact.
It was stated that users mainly need population, housing and economic variables on the basis of
grids.
Confidentiality
It was concluded that data providers differentiate between more and less sensitive data and
define accordingly the thresholds for variables and corresponding grid cell sizes. As the 1km²
grid can be considered to be a fairly rough grid, no restrictions should be applied to the
publication of total numbers such as the total population figure, the total number of buildings
and the total number of dwellings.
Quality
The GEOSTAT project examined a common process of quality assessment. In the area of grid-based statistics there are two major perspectives: one is the quality of its georeferenced source
data and the other is the quality of the production process. The quality of grid data should be
described both from the spatial perspective (INSPIRE) and the statistical one (SDMX).
Dissemination
The project disseminated population grid data at 1 km² resolution free of charge via EFGS.
Tools
It was identified that more support for tools would be needed for future iterations, and tools
should be non-proprietary and open source.
2.3 GEOSTAT 1B
The GEOSTAT 1B action was about developing guidelines for datasets and methods to link
population and housing census 2011 statistics to a common harmonized grid. This action was
launched at the beginning of 2011 and lasted for 24 months. GEOSTAT 1B was built on the
work undertaken by the EFGS and the project GEOSTAT 1A.
The GEOSTAT 1B final report was made following the structure of a statistical production chain.
POPULATION GRID STATISTICS AS PART OF THE STATISTICAL PRODUCTION CHAIN
In this project the focus was put on increasing the use of GIS in the work of NSIs. It was
decided to follow the statistical production chain to analyze the different possibilities. This
production chain is also known as the Generic Statistical Business Process Model (GSBPM).
On this basis, the report describes the following aspects/phases of the process chain:
• Specific needs
• Design
• Build, collect and process
• Analyse
• Disseminate
Specific needs
Two overall user needs were identified:
• The need for integrating geography and statistics
• The need for spatially referenced statistics in a hierarchical system of stable and neutral
grids.
The need to integrate geography and statistics was stressed, in order to better describe the world and
analyse different phenomena and, for example, to support the EU2020 strategy on
sustainable and inclusive growth.
Although GEOSTAT 1B focused on population and housing information, other discussion
forums (e.g. the UN Expert Group on the Integration of Statistical and Geospatial Information)
identified further topics that would benefit from this type of integration, such as agriculture and economic
censuses and environmental-economic accounting.
As already identified in GEOSTAT 1A, the need for detailed population data by grid is
widespread. These needs can be described from two perspectives:
• The need from users for grid data as a tool for analysis and information.
• The need from producers of statistics to produce grid data or to support the production
of statistics using grid data, e.g. in the sample design.
In GEOSTAT 1B the overall objective was to produce population data on grids including the
following variables:
• Total population, population divided by age and sex.
As in GEOSTAT 1A, grid data were produced by means of aggregation of micro-data when
possible and, for those countries not providing such information, by disaggregation. Both
methods are fully described in the appendices of the GEOSTAT 1B report.
Design
Under this section, the report describes the grid to be used by GEOSTAT 1B and the
characteristics of the statistical variables. This is a summary:
The grid is the LAEA European Reference Grid at 1 km² as in GEOSTAT 1A. The naming
convention is GEOSTAT_grid_POP_1K_CC_YYYY (CC:country code, YYYY: ref. year, e.g.
GEOSTAT_grid_POP_1K_SE_2011). There is also a convention for field names.
The variables to be integrated are:
• TOT_P [Number of inhabitants, total]
• TOT_M [Number of inhabitants, male]
• TOT_F [Number of inhabitants, female]
• M_00_14 [Number of inhabitants, male age 0-14]
• M_15_64 [Number of inhabitants, male age 15-64]
• M_65_ [Number of inhabitants, male age 65-]
• F_00_14 [Number of inhabitants, female age 0-14]
• F_15_64 [Number of inhabitants, female age 15-64]
• F_65_ [Number of inhabitants, female age 65-]
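The variable definitions imply simple arithmetic relations that a data producer could check before delivery. The sketch below is an illustration only: the function name `check_record` and the sample values are hypothetical, and it assumes that no values have been suppressed for confidentiality, in which case the sums would not necessarily hold.

```python
AGE_FIELDS = {
    "TOT_M": ["M_00_14", "M_15_64", "M_65_"],
    "TOT_F": ["F_00_14", "F_15_64", "F_65_"],
}

def check_record(record):
    """Return a list of consistency problems found in one grid cell record."""
    problems = []
    if record["TOT_P"] != record["TOT_M"] + record["TOT_F"]:
        problems.append("TOT_P != TOT_M + TOT_F")
    for total_field, age_fields in AGE_FIELDS.items():
        if record[total_field] != sum(record[field] for field in age_fields):
            problems.append(f"{total_field} != sum of its age groups")
    return problems

# Hypothetical record for one grid cell
record = {
    "TOT_P": 120, "TOT_M": 58, "TOT_F": 62,
    "M_00_14": 10, "M_15_64": 38, "M_65_": 10,
    "F_00_14": 9, "F_15_64": 40, "F_65_": 13,
}
print(check_record(record))  # []
```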
Some parameters were defined to make a quality assessment of the primary data production
and also the grid data production. Data source quality was also discussed in the appendices of
the report.
Build, collect and process
Based on the design, the various partners carried out their work with building, collecting and
processing their data. Depending on the data availability, there were two different approaches to
produce the population grids: “aggregated” and “hybrid”.
Following Finnish developments, the “aggregated” approach is fully described in the
appendices, and a webtool was also produced to support producers carrying out this method.
The NSI of Bulgaria followed the so-called “hybrid” approach and described it in detail (also in the appendices).
Analyse
As a response to the data request from the project, the following countries contributed to the
GEOSTAT 2011 grid: Belgium, Bulgaria, Czech Republic, Estonia, Finland, France, Ireland,
Netherlands, Norway, Poland, Portugal, Slovenia, Sweden and Switzerland.
To test the population grid, the GEOSTAT 1B partners agreed to work together on a case study
calculating the population within 30 minutes from emergency hospitals. GIS was used and the
aim was to explore:
• the consequences of using grids as statistical units instead of other statistical units such as
municipality areas.
• the consequences of data confidentiality policies applied by various NSIs when
publishing population grids.
The results of this exercise are fully described in one appendix of GEOSTAT 1B.
Disseminate
Four workshops and two conferences were organised during the
GEOSTAT 1B project to disseminate the methods and results. The EFGS published the data and
the report and remains one of the main channels for dissemination.
OVERARCHING PROCESSES – QUALITY MANAGEMENT
Some aspects regarding the quality improvement of the production system and the data itself
were introduced by some project partners. Particularly, Statistics Portugal introduced a new
sampling frame for household survey purposes, on the basis of the European Grid. For each
building, the corresponding grid cell is known. For the sampling process of the household
surveys, cells are selected within strata (NUTS3 areas), and a spatial sorting algorithm assures
the contiguity/proximity of each cell. An appendix details this new sampling infrastructure. In
parallel, the Czech Statistical Office tested the disaggregation method to allocate people that
cannot be linked to the exact place of usual residence, which is almost 1% of total population in
that country. That approach is also detailed in one of the appendices to the GEOSTAT 1B final
report.
GEOSTAT 1B APPENDICES
The GEOSTAT 1B Final Report contains several appendices which complement the information
of the report with technical details, case studies or posters with methodological schemas. The
list of the appendices included has been provided in section 1. Annex 1 includes a short
description of the contents of each Appendix.
2.4 GEOSTAT RESULTS AND THE GRID APPROACH IN LAND MONITORING
The projects GEOSTAT 1A and 1B (and the ongoing 1C) are a proof of concept of a grid approach
in the framework of statistical data or, more specifically, population statistics. The efforts made
so far by Eurostat and the EFGS to develop methods and produce population grid
data highlight the importance that the grid can have at European level as a data storage and
analysis system. Besides providing detailed and homogeneous information for large areas and
overcoming the problem of heterogeneous and arbitrary administrative boundaries, the grid
approach has proven to be useful for different types of analysis (e.g. accessibility to emergency
hospitals, vegetation change detection…).
If we consider the interaction between GEOSTAT and the grid approach in land monitoring described by
the EAGLE team, we see different aspects where benefits and synergies can be established,
focused on two topics:
• The methodology and standardisation of working processes
• Combination of statistical and land cover / land use information through the Grid.
THE GEOSTAT METHODS AND THE GRID APPROACH IN LAND MONITORING
The GEOSTAT experience is very valuable to any kind of grid approach, as it has provided
information regarding:
• User and producer needs (e.g. preferred grid size).
• Methods of aggregation / disaggregation and hybrid to feed a grid with data depending
on the source information.
• Standardisation aspects, such as projections, coding, field names, etc.
• Dissemination and confidentiality issues.
• Quality assessment.
These different topics have been summarised in this report, which can be used for reference
when implementing a grid approach for land monitoring. Although GEOSTAT is working
basically on population statistics, accounting for number of people within a grid cell or
accounting for the number of hectares of a specific land cover class is not so different.
Therefore all the issues about populating grids discussed by GEOSTAT are useful for our
purposes. Moreover, some case studies on the use of gridded information have been
undertaken, providing also valuable information.
COMBINATION OF STATISTICAL DATA AND LAND COVER DATA THROUGH THE GRID
The grid approach has great potential in terms of data integration. The European Reference
Grid can become a common integrator layer for different types and formats of data. The
challenge of integrating data which are thematically, spatially and/or temporally heterogeneous
has already been faced, for instance, by the ESPON Programme. A single grid cell can contain
at the same time information on e.g. population, land cover and administrative units. In this way
all these data can be queried at once and provide integrated assessments, for instance through
OLAP [8] databases. The UAB has been working in this direction with the EEA and under the
ESPON Programme. For more information:
http://www.espon.eu/export/sites/default/Documents/Publications/ScientificReports/ThirdDecember2014/202425_ESPON_SCIENTIFIC_REPORT_3.pdf
8 OnLine Analytical Processing.
3 CASE STUDY: STATISTICAL ANALYSIS
The main objective of the next three chapters is to analyze
and demonstrate various aspects of the grid approach
using real data. When using the grid approach it is possible
to maintain (or preserve) all detailed quantitative
information on land cover classes provided by diverse data
sources and to use this information for querying, data
integration and statistical purposes. However, the cell size
of the grid has an effect on the information regarding the
spatial distribution of the land cover classes. These effects
will also be demonstrated.
The practical exercise was divided into three actions. The
first was the population of the grids, which formed the basis
for the other two: the quantitative and the
qualitative analysis. The quantitative analysis focused on
comparing the coverage of national High Resolution CLC
classes and European CLC classes in each grid cell. The
qualitative analysis was done by visual comparisons of
maps made from populated grids and original EU CLC
vector maps.
The province of Kymenlaakso in south-eastern Finland
(Figure 3.1) was selected as the test area for the study. The
main reason for selecting a Finnish test case was that the
Finnish Environment Institute has produced the European
CLC 25 ha data using a bottom-up approach based on a
more spatially detailed national CLC data set and thus has
comparable national and European data for the years
2000, 2006 and 2012.
Figure 3.1: Kymenlaakso province in dark green
3.1 SOURCE DATASETS AND THE GRID POPULATION PROCESS
The two main source datasets used in the study were the pan-European Corine Land Cover 25
ha vector for 2012 (EU CLC) and the National High Resolution Corine Land Cover 20 m raster
for 2012 (HR CLC) from Finland (Figure 3.2). Additionally, Finnish High Resolution Corine Land
Cover Changes 2006-2012 (MMU 1ha, 20m raster) and High Resolution Corine Land Cover
25m raster for 2006 were used in the change detection part of the quantitative analysis.
Population data from Statistics Finland was used to demonstrate the use of additional datasets
together with LC grid data.
HR CLC and EU CLC used the same (standard European) LU/LC classification. However, there
were two classes that are present in HR CLC but absent in EU CLC, due to the coarser scale
and generalization procedures. These are 111 (Continuous Urban Fabric) and 244 (Agroforestry Areas). The former is generalized into 112 (Discontinuous Urban Fabric) and the latter
into 231 (Pastures). There are also two classes that are generated by the generalization
process of EU CLC and thus have no equivalent in HR CLC. Those classes are 141 (Green
Urban areas) and 242 (Complex Cultivation).
For this exercise, two grids were produced: 1 km x 1 km and 100 m x 100 m. Both grid datasets were
populated with the same set of attributes: the amount of each CLC class from both HR CLC
data and EU CLC data. Both sets of attributes were created in percentages and in ha / m2.
Additionally the majority class for each grid cell was calculated from both HR and EU data. The
grid population process was run as Python scripts in ESRI ArcMap, with the Tabulate Area (for
raster dataset) and Tabulate Intersection (for vector dataset) tools forming the base of the
process.
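The population step can be illustrated outside ArcMap. The following is a minimal pure-Python sketch, not the scripts used in the study. It assumes a tiny hypothetical classified raster (a list of rows of CLC codes), a grid cell size expressed in pixels and a pixel area in hectares, and it tabulates the area, the percentage and the majority class per grid cell, which is roughly the kind of output a tabulate-area operation provides for a raster source. The function name and the toy values are illustrative assumptions.

```python
from collections import Counter

def populate_grid(raster, cell_px, pixel_area_ha):
    """Tabulate class areas per grid cell for a classified raster.

    raster: list of rows, each a list of CLC class codes (hypothetical input).
    cell_px: grid cell size in pixels (e.g. 50 for a 1 km cell on 20 m pixels).
    pixel_area_ha: area of one pixel in hectares (0.04 for a 20 m pixel).
    """
    cells = {}
    for r, row in enumerate(raster):
        for c, code in enumerate(row):
            key = (r // cell_px, c // cell_px)  # index of the grid cell
            cells.setdefault(key, Counter())[code] += 1
    result = {}
    for key, counts in cells.items():
        n = sum(counts.values())
        result[key] = {
            "area_ha": {k: v * pixel_area_ha for k, v in counts.items()},
            "percent": {k: 100.0 * v / n for k, v in counts.items()},
            "majority": counts.most_common(1)[0][0],
        }
    return result

# Toy 4 x 4 raster split into 2 x 2 pixel grid cells (illustrative values only).
toy = [
    [312, 312, 211, 211],
    [312, 313, 211, 512],
    [324, 324, 512, 512],
    [324, 312, 512, 512],
]
grid = populate_grid(toy, cell_px=2, pixel_area_ha=0.04)
```

Each grid cell keeps the full class composition, so no information on quantities is lost when the cell is later queried or joined with other data.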
Figure 3.2. City of Kouvola shown in HR CLC (left) and EU CLC (right)
The change detection analysis was only carried out for forest changes using the 1 km grid.
Forest changes are the prominent change type in Finland and thus deemed suitable for this
exercise.
3.2 COMPARISON OF PAN-EUROPEAN CLC AND NATIONAL HR CLC
This example of a statistical analysis is a comparison of (the pan-European) EU CLC and (the
Finnish national) HR CLC using a 1 km grid as a basis for the comparison. A global analysis
provides an overview of the area and share of all the LC classes according to the two different
data sources. Further comparison on a cell by cell basis shows in more detail how the amount
of each CLC class is reported differently in the two data sources.
It should be noted that this study did not compare the grid approach to the vector-based CLC
approach. The study is primarily a study demonstrating how the grid approach can be used to
analyze and compare data. We could have chosen other datasets for this purpose. Using CLC
data from two different sources does, however, give the added value that the results can be
used to demonstrate some of the aspects of a gridded approach to CLC compatible land
monitoring based on more detailed data from maps and registers.
DESCRIPTIVE STATISTICS
A general overview of all the land cover classes in the study area (areas, percentages and
absolute and relative differences, between the HR CLC and the EU CLC) is found in Table 3.1.
Note that there was a slight difference of total area between HR CLC and EU CLC (11 ha). This
was probably due to rounding errors and the presence of “no-data” pixels in EU CLC data.
There are four classes (242, 321, 322 and 333) which appear in the final table, but have no area
at all, so they are not considered in the analysis (highlighted in light red in Table 3.1).
At first sight, we notice some important differences in terms of total area and share for some of
the classes, depending on the source of the data. To better analyse those differences, we have
distinguished between majority classes and minority classes, using a coverage threshold of 5%.
In the “Diff” columns, areas and percentages in red indicate underestimation by EU CLC,
whereas black colours indicate overestimation. One notable result refers to class 243 (Land
principally occupied by agriculture...), as it is almost non-existent in HR CLC (0.46%) but
represents a share of 5.2% in EU CLC. This class seems to be seriously affected by a
combination of the different MMU and the fact that it is a mixed class (something similar can be
observed regarding the “mixed forest” class). These issues are discussed in the analysis class
by class below, and in the concluding chapter.
Table 3.1. Total figures (area, percentage and differences) for all classes according to the HR CLC and the EU CLC within the 1 km populated grid
Class | Description | Area_N | Area_E | %_N | %_E | Diff. E-N | % Diff E-N
111 | Continuous urban fabric | 546 | 0 | 0.07% | 0.00% | -546 | -100.00%
112 | Discontinuous urban fabric | 11611 | 14218 | 1.51% | 1.85% | 2607 | 22.45%
121 | Industrial or commercial units | 5598 | 2887 | 0.73% | 0.38% | -2711 | -48.43%
122 | Road and rail networks and associated land | 6648 | 43 | 0.87% | 0.01% | -6605 | -99.35%
123 | Port areas | 1003 | 1357 | 0.13% | 0.18% | 354 | 35.29%
124 | Airports | 329 | 424 | 0.04% | 0.06% | 95 | 28.88%
131 | Mineral extraction sites | 1849 | 796 | 0.24% | 0.10% | -1053 | -56.95%
132 | Dump sites | 265 | 201 | 0.03% | 0.03% | -64 | -24.15%
133 | Construction sites | 386 | 305 | 0.05% | 0.04% | -81 | -20.98%
141 | Green urban areas | 0 | 397 | 0.00% | 0.05% | 397 | 100.00%
142 | Sport and leisure facilities | 5656 | 366 | 0.74% | 0.05% | -5290 | -93.53%
211 | Non-irrigated arable land | 88285 | 70174 | 11.49% | 9.13% | -18111 | -20.51%
222 | Fruit trees and berry plantations | 209 | 36 | 0.03% | 0.00% | -173 | -82.78%
231 | Pastures | 212 | 0 | 0.03% | 0.00% | -212 | -100.00%
242 | Complex cultivation patterns | 0 | 0 | 0.00% | 0.00% | 0 |
243 | Land principally occupied by agriculture, with significant areas of natural vegetation | 3555 | 40049 | 0.46% | 5.21% | 36494 | 1026.55%
244 | Agro-forestry areas | 10 | 0 | 0.00% | 0.00% | -10 | -100.00%
311 | Broad-leaved forest | 17709 | 3757 | 2.30% | 0.49% | -13952 | -78.78%
312 | Coniferous forest | 241067 | 251617 | 31.37% | 32.74% | 10550 | 4.38%
313 | Mixed forest | 76005 | 106230 | 9.89% | 13.82% | 30225 | 39.77%
321 | Natural grasslands | 0 | 0 | 0.00% | 0.00% | 0 |
322 | Moors and heathland | 0 | 0 | 0.00% | 0.00% | 0 |
324 | Transitional woodland-shrub | 63594 | 35485 | 8.28% | 4.62% | -28109 | -44.20%
331 | Beaches, dunes, sands | 22 | 0 | 0.00% | 0.00% | -22 | -100.00%
332 | Bare rocks | 615 | 0 | 0.08% | 0.00% | -615 | -100.00%
333 | Sparsely vegetated areas | 0 | 0 | 0.00% | 0.00% | 0 |
411 | Inland marshes | 2924 | 810 | 0.38% | 0.11% | -2114 | -72.30%
412 | Peat bogs | 6355 | 4663 | 0.83% | 0.61% | -1692 | -26.62%
421 | Salt marshes | 2289 | 1272 | 0.30% | 0.17% | -1017 | -44.43%
511 | Water courses | 2819 | 2173 | 0.37% | 0.28% | -646 | -22.92%
512 | Water bodies | 46606 | 46473 | 6.06% | 6.05% | -133 | -0.29%
523 | Sea and ocean | 182323 | 184746 | 23.72% | 24.04% | 2423 | 1.33%
TOTAL HA | | 768490 | 768479 | 100.00% | 100.00% | -11 | 0.00%
Legend:
Area_N: Total area (ha) in HR Finnish CLC
Area_E: Total area (ha) in EU CLC
%_N: Percentage of class over total area in HR Finnish CLC
%_E: Percentage of class over total area in EU CLC
Diff. E-N: Difference in area (ha) between EU CLC and HR Finnish CLC
% Diff E-N: Difference in percentage between EU CLC and HR Finnish CLC
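The two difference columns can be reproduced with a few lines of Python. This is only a sketch: it assumes that the percentage difference is expressed relative to the HR CLC area (Area_N), which matches the tabulated values for classes such as 112 and 243. The function name is hypothetical. Where Area_N is zero the ratio is undefined and the sketch returns None, so the 100.00% reported for class 141 in Table 3.1 follows a convention that is not reproduced here.

```python
def compare_areas(area_n, area_e):
    """Return (Diff. E-N in ha, % Diff E-N) for one class.

    The percentage is taken relative to the HR CLC area (Area_N).
    It is undefined when Area_N is zero, so None is returned in that case.
    """
    diff = area_e - area_n
    pct = None if area_n == 0 else round(100.0 * diff / area_n, 2)
    return diff, pct

# Areas in hectares taken from Table 3.1.
diff_112, pct_112 = compare_areas(11611, 14218)  # Discontinuous urban fabric
diff_243, pct_243 = compare_areas(3555, 40049)   # Class 243
diff_242, pct_242 = compare_areas(0, 0)          # Complex cultivation patterns
```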
MAJORITY CLASSES ANALYSIS OF 1 KM GRID DATA
We will now present a majority classes analysis of 1 km grid data populated with HR Finnish
CLC 2012 and EU CLC 2012. We consider as majority classes those having a share above 5%
in the HR CLC. We find six classes over this threshold in the 1 km grid, as shown in Table 3.2.
Table 3.2. Total figures (area, percentage and differences) for majority classes according to the HR CLC and the EU CLC within the 1 km populated grid
Class | Description | Area_N | Area_E | %_N | %_E | Diff. E-N | % Diff E-N
312 | Coniferous forest | 241067 | 251617 | 31.37% | 32.74% | 10550 | 4.38%
523 | Sea and ocean | 182323 | 184746 | 23.72% | 24.04% | 2423 | 1.33%
211 | Non-irrigated arable land | 88285 | 70174 | 11.49% | 9.13% | -18111 | -20.51%
313 | Mixed forest | 76005 | 106230 | 9.89% | 13.82% | 30225 | 39.77%
324 | Transitional woodland-shrub | 63594 | 35485 | 8.28% | 4.62% | -28109 | -44.20%
512 | Water bodies | 46606 | 46473 | 6.06% | 6.05% | -133 | -0.29%
Out of the six majority classes in the 1 km grid, one is clearly overestimated by EU CLC9,
whereas two of them are clearly underestimated. Out of these six classes Coniferous forest,
Mixed forest and Transitional woodland and shrub will be discussed in more detail below.
Scatterplots for the other classes can be found in Annex 2.
Figure 3.3. Statistical distribution of the area of the majority CLC classes (classes covering more than 5 %
of the area according to the national HR CLC).
Coniferous forest (312) is the class with the highest share in the selected NUTS region,
representing more than 31% of the area. Although EU CLC accounts for 10 550 hectares more
than HR CLC, this difference represents only 4.4%, so the class is only slightly overestimated.
The second largest class, in terms of share (23.7%), is Sea and ocean (523), with very similar
figures in EU and HR data. The EU CLC is again slightly overestimating this class, but only by
1.3%.
The third class is Non-irrigated arable land (211), accounting for 11.5% of HR CLC. This class is
clearly underestimated by EU CLC, which reports about one fifth (18 111 hectares) less area of this class
than the HR CLC. It is likely that the difference in MMU
between the two datasets is the main cause of the underestimation, together with the effect of
the mixed class 243, which can include 211 patches in EU CLC.
The fourth class, with a share of 9.9% in HR CLC, is Mixed forest (313). This class is
overestimated in EU CLC by almost 40% (30 225 more ha). Again, the larger MMU and the fact
that it is a mixed class can be the causes for this overrepresentation.
The fifth class is Transitional woodland and shrub (324), which accounts for 8.3% in HR CLC,
but only 4.6% in EU CLC. The European dataset clearly underestimates this class, probably in
favour of the mixed forest class. The MMU may again be a contributing reason for this
difference.
9 In comparison to the HR CLC, which has a much higher level of detail and is therefore closer to ground truth.
Finally, the sixth majority class is Water bodies (512). In terms of figures there are no
significant differences between HR CLC and EU CLC. In both cases, the class accounts for about 6%
of the area and there is only a tiny underestimation of 0.3% by EU CLC.
Figure 3.4. The positions of the majority classes in a dispersion chart (1 km grid)
The red line in Figure 3.4 represents the “expected” or “ideal” situation, where the EU CLC
would contain exactly the same information as the HR CLC in the 1 km grid. The black line is the
fitted regression line for the majority class shares.
Despite the differences highlighted amongst the majority classes for the two data sources, we
can see that the fitted regression line would explain almost 95% of the variance. The regression
line shows that, in general terms, EU CLC slightly overestimates classes which have the highest
share amongst the majority classes, whereas it underestimates to some extent classes with
lower shares (with the clear exception of mixed forest).
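The kind of fit behind such a dispersion chart can be sketched as follows. This is an illustration rather than the procedure used for Figure 3.4: it assumes that the chart plots the percentage shares of the six majority classes from Table 3.2, with HR CLC on the horizontal axis and EU CLC on the vertical axis, and it uses an ordinary least squares fit. The function name is hypothetical.

```python
def linear_fit(x, y):
    """Ordinary least squares fit y = slope * x + intercept, plus R squared."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r2 = sxy ** 2 / (sxx * syy)  # squared Pearson correlation
    return slope, intercept, r2

# Shares (%) of the six majority classes from Table 3.2,
# in the order 312, 523, 211, 313, 324, 512.
share_hr = [31.37, 23.72, 11.49, 9.89, 8.28, 6.06]
share_eu = [32.74, 24.04, 9.13, 13.82, 4.62, 6.05]
slope, intercept, r2 = linear_fit(share_hr, share_eu)
```

A slope above 1 combined with a negative intercept corresponds to the pattern described above: classes with large shares tend to be overestimated by EU CLC and classes with smaller shares underestimated.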
We will now proceed to take a closer look at some of the majority classes in the 1 km grid on a
cell-by-cell basis, to examine patterns of behaviour in relation to the area estimates reported by
each dataset. The dots in the scatterplots represent grid cells in the 1 km grid.
Figure 3.5. Scatterplot of Coniferous forest (312) in 1 km grid
The Coniferous forest plot (Figure 3.5) shows a clear pattern of underestimation by EU CLC
when the area of the class in HR CLC in the 1 km grid cell is below approximately 30 ha, and a clear
tendency towards overestimation when areas are above 30 ha. Nevertheless, the variance is
rather large and the r-squared indicates that the fitted model explains about 82% of the
variance. It is interesting to see that in EU CLC there are cells with a 100% share of this class (100
ha), whereas data from the HR CLC never fill an entire cell with this class. On the
other hand, there are cells where coniferous forest covers up to
50 ha according to HR CLC, but which contain no coniferous forest according to the EU CLC. In general, there is
overestimation of class 312 Coniferous forest by EU CLC, because the class in the
generalized product contains small patches of Broad-leaved forest (311) or Transitional
woodland and shrub (324), which as a consequence are underestimated. This reciprocity will
also be demonstrated below.
Figure 3.6. Scatterplot of Mixed forest (313) in 1 km grid
Mixed forest (313) shows a serious problem of overestimation by EU CLC (Figure 3.6). This
problem is directly linked to the MMU and the fact that it is itself a mixed class. EU CLC is not
able to distinguish broad-leaved and coniferous forests within a patch smaller than 25 ha, while
the HR CLC is able to do so. EU CLC is therefore unable to separate between truly mixed forest
(coniferous and deciduous trees together) and cartographically mixed forest (patches of
coniferous and deciduous forest inside the mapping unit). That is why Mixed forest is
overestimated by EU CLC, while Broad-leaved forest (311) and Transitional woodland and shrub
(324) are at the same time underestimated.
Figure 3.7. Scatterplot of Transitional woodland and shrub (324) in 1 km grid
For Transitional woodland and shrub (324) there is a general trend of underestimation (Figure
3.7), but also a considerable number of cases where this class is overestimated in EU CLC.
(This is hard to see in the figure due to considerable overlap of points in the scatter plot). The
regression line explains only 47% of the variance, indicating that the relation between EU CLC
and HR CLC is weak. Again, the different MMU and the fact that it is a kind of mixed class may
cause those deviations.
MINORITY CLASSES ANALYSIS OF 1 KM GRID DATA
Minority classes are those having a share of the land between 0.1 and 5% in the HR CLC. We
find 12 classes within this range; they are listed in Table 3.3.
Table 3.3. Total figures (area, percentage and differences) for minority classes according to the HR CLC
Class | Description | Area_N | Area_E | %_N | %_E | Diff. E-N | % Diff E-N
311 | Broad-leaved forest | 17709 | 3757 | 2.30% | 0.49% | -13952 | -78.78%
112 | Discontinuous urban fabric | 11611 | 14218 | 1.51% | 1.85% | 2607 | 22.45%
122 | Road and rail networks and associated land | 6648 | 43 | 0.87% | 0.01% | -6605 | -99.35%
412 | Peat bogs | 6355 | 4663 | 0.83% | 0.61% | -1692 | -26.62%
142 | Sport and leisure facilities | 5656 | 366 | 0.74% | 0.05% | -5290 | -93.53%
121 | Industrial or commercial units | 5598 | 2887 | 0.73% | 0.38% | -2711 | -48.43%
243 | Land principally occupied by agriculture, with significant areas of natural vegetation | 3555 | 40049 | 0.46% | 5.21% | 36494 | 1026.55%
411 | Inland marshes | 2924 | 810 | 0.38% | 0.11% | -2114 | -72.30%
511 | Water courses | 2819 | 2173 | 0.37% | 0.28% | -646 | -22.92%
421 | Salt marshes | 2289 | 1272 | 0.30% | 0.17% | -1017 | -44.43%
131 | Mineral extraction sites | 1849 | 796 | 0.24% | 0.10% | -1053 | -56.95%
123 | Port areas | 1003 | 1357 | 0.13% | 0.18% | 354 | 35.29%
In general terms, we can see that most of the minority classes are underestimated by the EU
CLC, by between 22% and 99%. Two classes are overestimated (Port areas (123) and
Discontinuous urban fabric (112)), and one class (Land principally occupied by agriculture
(243)) surprisingly shows an exceptional overestimation by EU CLC (+1026%). The causes
of such differences are discussed in this chapter.
Figure 3.8. Percentages of minority classes in 1 km grid
In Figure 3.8 we can see that most of the minority classes are underestimated by EU CLC,
except for 112 and 123. Class 243 is a clear outlier in the series. This can also be
seen in the next scatterplot, where most of the classes lie below the red line.
Nevertheless, as there are a few classes that are overestimated (including the large
overestimation of class 243), no linear pattern can be established for the minority classes
(correlation is non-existent), as we can see in Figure 3.9.
To provide readable information about the classes representing less than 1% of the area, we
show a zoomed scatterplot in Figure 3.10.
In this chapter some minority classes are introduced, on a cell-by-cell basis in the 1 km grid, to
look for patterns of behaviour in relation to the areas reported by each dataset. The interested
reader will find scatterplots for the remaining CLC classes in Annex 2.
Figure 3.9. The positions of the minority classes in a dispersion chart (1 km grid)
Figure 3.10 The positions of the minority classes in a dispersion chart (1 km grid), classes representing
less than 1 % of the cell area
Figure 3.11. Scatterplot of Broad-leaved forest (311) in 1 km grid
Broad-leaved forest (311) follows two separate but interfering patterns (Figure 3.11). First, we
can see that many areas from HR CLC (all of them below 20 hectares) are not present in
EU CLC. The patches present in both databases are generally overestimated by the European
dataset. The MMU issue is, as in many other cases, a possible explanation, since it can cause
patches of broad-leaved forest to be included within mixed forest in EU CLC. A second
pattern is the increasing overestimation of the class by EU CLC in areas where the class is well
represented. The value of R2 tells us that the regression line explains about 85% of the
variation in the response variable - here EU CLC.
Figure 3.12. Scatterplot of Discontinuous urban fabric (112) in 1 km grid
Discontinuous urban fabric (112) is overestimated in EU CLC, as is clearly shown by the
regression line in Figure 3.12, which explains 85% of the variance. The larger the area is, the
higher the overestimation. The fact that Continuous urban fabric (111) in HR CLC is generalised
into 112 in EU CLC largely explains this overestimation. The smaller MMU in HR CLC might also
have caused a more detailed delimitation of discontinuous urban fabric, excluding some
complementary elements (infrastructures, green urban areas...) that in EU CLC are part of 112.
Figure 3.13. Scatterplot of Land principally occupied by agriculture, with significant areas of natural
vegetation (243) in 1 km grid
The overestimation of class 243 by EU CLC is very large: the class represents 5.2% of
the total area in EU CLC, whereas in HR CLC it is only 0.46% (Figure 3.13). This is an
effect of the fact that 243 is a scale-dependent class and depends on the context. In the detailed
HR CLC, most locations will either be agriculture or not agriculture. When a 20 meter pixel (the
unit in the HR CLC) is classified as 243, it is either based on a micro-scale analysis, or an
analysis of the context around the pixel. This is an example of the classification and
measurement challenges described in Chapter 1.
3.3 COMPARISON OF GRID RESOLUTIONS
The grid resolution has no impact on global statistics. Populating the 100 m grid and the 1 km grid
from the same data source, e.g. HR CLC, will yield the same total figures for the study area.
These are identical to the summary statistics calculated from the input data directly, with no bias
introduced. This is because the grid simply is a partition of the dataset. No spatial generalization
is introduced.
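The partition argument can be checked with a small experiment. The sketch below is an illustration under simple assumptions: a hypothetical 4 x 4 classified raster is aggregated into grid cells of two different sizes, and the per-class totals summed over all cells are compared with the totals counted directly from the raster. The function name and values are illustrative only.

```python
from collections import Counter

def class_totals_via_grid(raster, cell_px):
    """Populate a grid with per-cell pixel counts, then sum over all cells."""
    cells = {}
    for r, row in enumerate(raster):
        for c, code in enumerate(row):
            cells.setdefault((r // cell_px, c // cell_px), Counter())[code] += 1
    totals = Counter()
    for counts in cells.values():
        totals.update(counts)
    return totals

# Hypothetical 4 x 4 classified raster (CLC codes chosen for illustration).
toy = [
    [311, 311, 312, 312],
    [311, 312, 312, 312],
    [312, 312, 311, 512],
    [312, 312, 512, 512],
]
direct = Counter(code for row in toy for code in row)  # totals from the input
fine = class_totals_via_grid(toy, cell_px=1)           # fine grid
coarse = class_totals_via_grid(toy, cell_px=2)         # coarse grid
```

Because every pixel falls in exactly one grid cell at either resolution, the three sets of totals coincide.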
The two grid resolutions were examined using the two forest classes Broad-leaved forest (311)
and Coniferous forest (312) as examples. Table 3.4 shows the proportion of grid cells that are
either “completely filled or empty” or “partially filled” with land cover class 311 and 312 using a
100 m and a 1 km grid. When the 100 m grid is used to represent class 311, 85.2 % of the grid
cells are either left empty or completely covered by this class. Only 14.8 % of the grid cells have
a coverage of class 311 somewhere between the extremes 0 % and 100 %. The proportion of
grid cells that are left empty or completely covered is reduced to 24.9 % when the 1 km grid is
used. With this coarser grid resolution, 75.1 % of the grid cells have a coverage of class 311
somewhere between the extremes 0 % and 100 %.
Table 3.4. Proportion of grid cells that are either “completely filled or empty” or “partially filled” with land cover class 311 and 312 using a 100m and a 1km grid
Grid | % Completely filled or empty grid cells (311) | % Completely filled or empty grid cells (312) | % Partially filled grid cells (311) | % Partially filled grid cells (312)
100 m grid | 85.2 | 50.9 | 14.8 | 49.1
1 km grid | 24.9 | 17.3 | 75.1 | 82.7
The trend is similar, although less extreme, for the more common class 312. When a 100 m grid
is used, approximately 50 % of the grid cells have coverage somewhere between the extremes
0 % and 100 %. This is increased to 82.7 % when a 1 km grid is used. This shows the obvious
fact that as the size of the grid cell becomes smaller, the grid becomes more similar to a raster
with a single class per grid cell.
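The effect of the cell size on the share of partially filled cells can be illustrated with a toy example. The following sketch assumes a hypothetical 4 x 4 raster in which 1 marks the class of interest, and counts the grid cells whose coverage lies strictly between 0 % and 100 % for two cell sizes. It is not the calculation behind Table 3.4, and the function name is an assumption.

```python
def share_partially_filled(raster, cell_px, target):
    """Percentage of grid cells whose coverage of `target` is strictly
    between 0 % and 100 % (all other cells are completely filled or empty)."""
    covered = {}
    size = {}
    for r, row in enumerate(raster):
        for c, code in enumerate(row):
            key = (r // cell_px, c // cell_px)
            size[key] = size.get(key, 0) + 1
            covered[key] = covered.get(key, 0) + (1 if code == target else 0)
    partial = sum(1 for k in size if 0 < covered[k] < size[k])
    return 100.0 * partial / len(size)

# Hypothetical 4 x 4 raster; 1 marks the class of interest, 0 anything else.
toy = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 1],
]
fine_share = share_partially_filled(toy, cell_px=1, target=1)
coarse_share = share_partially_filled(toy, cell_px=2, target=1)
```

When the grid cell equals the pixel, every cell is either filled or empty, which is the limiting case described above. With the coarser cells, part of the grid becomes partially filled.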
The frequency distributions for the grid cells where the proportion of class 311 is larger than 0 %
and less than 100 % for the two grid resolutions are shown in Figure 3.14. The shape is
identical, with most of the grid cells having proportions close to 0 %. This is because 311 is a
comparatively rare class in the study area, covering only 2.3 % of the area (see Table 3.1). The
difference between the grids is the magnitude of the occurrences. We rarely find more than 20
% broad-leaved forest inside a 1 km grid cell, while up to 60 % coverage is common in 100 m
grid cells when the class is present.
Figure 3.14. Frequency distribution of grid cells where the proportion of class 311 is larger than 0 % and
less than 100 % in the 100m grid (left) and 1 km grid (right)
The difference between the grids is the spatial resolution. Both grids convey basically the same
information about the presence and overall distribution of the land cover class, but the more
detailed 100 m grid does convey more detailed and accurate information about the locations
where the class is found. The difference is important with respect to local applications and
detailed cartography, but probably not for wide-area monitoring purposes.
3.4 LAND COVER AND POPULATION
To provide an example of analysis of land cover data combined with other data, we attached
population data released for the 1 km grid by Statistics Finland and examined the likely
relationship between the population size and the % Discontinuous urban fabric (112). The
correlation between the two attributes is 0.69. The positive relationship is shown in Figure 3.15.
This illustration also shows that the ten grid cells with the highest population (> 2000 people) in
most cases have less than 30 % land cover 112 (scattered dots in the upper half of the graph).
Class 112 is still the most common class in a third of these grid cells, but the coverage is
comparatively low because much of the space is left to Green urban areas (141) and
Coniferous forest (312). The remaining two thirds of these high population grid cells are
dominated by Industrial and commercial units (121) and Road and railway networks (122). The
simple analysis points towards two kinds of urban dwellings in this part of Finland: One (less
frequent) dominated by green areas and proximity to forests; and another dominated by
commercial and industrial buildings and infrastructure.
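Once population and land cover share the same grid cells, a correlation of this kind takes only a few lines to compute. The sketch below uses the standard Pearson formula on hypothetical values for five grid cells. The numbers are invented for illustration and are not the Statistics Finland data.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sqrt(sxx * syy)

# Hypothetical 1 km grid cells: inhabitants and % Discontinuous urban fabric (112).
population = [0, 12, 150, 800, 2500]
pct_112 = [0.0, 1.5, 10.0, 45.0, 28.0]
r = pearson(population, pct_112)
```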
There are 250 grid cells (1 km grid) in the study area with 100 or more inhabitants. The same
grid cells contain 2.3 km2 of Non-irrigated agricultural land. This is a small amount of the
agricultural land in the study area, but probably under pressure from development and therefore
vulnerable to land take. The frequency distribution (Figure 3.16) shows that most of this land is
small in extent.
Finally, we calculated the male/female ratio for the study area. The ratio was 0.96, implying slightly
more women than men in this region. We then selected the 1 km grid cells where Non-irrigated
agricultural land (211) constitutes 25 % or more of the land and recalculated the ratio, which
turned out to be 1.05. This is as expected, since women – at least in the Nordic countries – tend
to migrate to the cities while the men stay on in the rural areas. This is of course far from
ground-breaking science, but it demonstrates how easy it is to link land monitoring data to other
data sources for analytical purposes under the grid approach.
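The selection described here amounts to a simple attribute filter on the grid cells. A minimal sketch follows, assuming each grid cell is a record with hypothetical fields for the number of men, the number of women and the percentage of class 211. The values are invented for illustration.

```python
def sex_ratio(cells):
    """Male/female ratio over a collection of grid cells."""
    men = sum(c["men"] for c in cells)
    women = sum(c["women"] for c in cells)
    return men / women

# Hypothetical 1 km grid cells (field names and values are assumptions).
cells = [
    {"men": 480, "women": 520, "pct_211": 2.0},
    {"men": 30, "women": 28, "pct_211": 40.0},
    {"men": 21, "women": 20, "pct_211": 55.0},
    {"men": 95, "women": 100, "pct_211": 10.0},
]
overall = sex_ratio(cells)
# Keep only the cells where class 211 covers 25 % or more of the land.
rural = sex_ratio([c for c in cells if c["pct_211"] >= 25.0])
```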
Figure 3.15. Scattergram of population vs % Discontinuous urban fabric (112) in 1 km grid cells.
Figure 3.16. Histogram of Non-irrigated agricultural land in 1 km grid cells with 100 or more inhabitants.
4 CASE STUDY: CHANGE DETECTION
Land cover changes constitute a particular challenge when a grid approach is used. The
challenge is linked to the difference between net and gross changes. This difference is
explained in chapter 1.6 under the sub-heading “Land cover changes”. An illustration is also
found in Figure 1.13. Net change is the calculated difference in area between two specific points
in time while gross change is the total area modified during this period. Gross changes can be
larger than the reported net changes if changes are reciprocal. A simplistic but illustrative
example is a 1 km2 grid cell composed of forest and clear cuttings. During a monitoring period,
10 ha forest is cleared and 10 ha of former clear cutting is planted and reforested. The net
change is nil, but the gross change is 20 ha.
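The arithmetic of this example can be written out explicitly. The short sketch below takes net change as gains minus losses and gross change as gains plus losses within one grid cell. The function name is hypothetical.

```python
def net_and_gross(gains_ha, losses_ha):
    """Net change is gains minus losses; gross change is gains plus losses."""
    return gains_ha - losses_ha, gains_ha + losses_ha

# The example from the text: 10 ha reforested and 10 ha cleared in one cell.
net, gross = net_and_gross(gains_ha=10, losses_ha=10)
```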
This chapter is a case study of how a grid approach can be used for change detection when
information about the gross changes is unavailable. The data used here is the same as the data
used for the case study of statistical analysis above. It consists of the HR CLC for Finland for
2006 and 2012. The spatial resolution of the source data was 25 m raster for 2006 and 20 m
raster for 2012. The source datasets and the grid population process are accounted for in
chapter 3.1 above. As in chapter 3, the province of Kymenlaakso in south-eastern Finland was
used as test area for the study (Figure 4.1).
A more general discussion concerning net and gross changes is included at the end of the
chapter.
Figure 4.1. Finnish HR CLC2006 with 1 km grid of part of Kymenlaakso test area. City of Kouvola is at
lower middle part of image.
4.1 CHANGE DETECTION USING GRID APPROACH
The topic of this subchapter is to examine how changes are represented when no information
about the gross changes is available. This is the situation when the grids have been populated
using HR data, but the data available to the analyst is restricted to the summary figures for the
grid cells. In this case, increase in a land cover class in one part of a grid cell can cancel out a
decrease in the same land cover class in another part of the cell. To the analyst, it looks like no
change has taken place. A possible, but not necessarily reliable, way to handle this situation is to
examine several land cover classes together, in context – exemplified by the reciprocal relationship
between forest growth and clear cuts.
Changes of forests (coded as CLC31x, comprising CLC classes 311, 312 and 313) and
transitional woodlands (CLC class 324) were determined as follows:
• The proportion of forests (the sum of classes 311, 312 and 313) was computed for each 1 km grid cell for years 2006 and 2012.
• The proportion of transitional woodlands (class 324) was computed for each 1 km grid cell for years 2006 and 2012.
• The net forest change was computed per grid cell as the proportion of forest for year 2012 minus the proportion of forest for year 2006.
• The net transitional woodland change was computed as the proportion of transitional woodland for year 2012 minus the proportion of transitional woodland for year 2006.
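The steps above can be sketched for a single grid cell as follows. This is an illustration under stated assumptions, not the study's code. Each cell is represented by a hypothetical dictionary of class shares in percent, and the change is taken with the sign chosen so that growth is positive (2012 minus 2006), following the sign convention of the Change row in Table 4.1.

```python
FOREST = (311, 312, 313)   # CLC31x
TRANSITIONAL = (324,)      # CLC324

def proportion(cell, classes):
    """Sum the percentage shares of the given classes in one grid cell."""
    return sum(cell.get(code, 0.0) for code in classes)

def net_change(cell_2006, cell_2012, classes):
    """Net change in percentage points, positive when the class group grows.

    Assumption: change is taken as 2012 minus 2006.
    """
    return proportion(cell_2012, classes) - proportion(cell_2006, classes)

# Hypothetical shares (%) for one 1 km grid cell in the two years.
cell_2006 = {311: 5.0, 312: 50.0, 313: 15.0, 324: 10.0, 211: 20.0}
cell_2012 = {311: 5.0, 312: 42.0, 313: 15.0, 324: 18.0, 211: 20.0}
forest_change = net_change(cell_2006, cell_2012, FOREST)
woodland_change = net_change(cell_2006, cell_2012, TRANSITIONAL)
```

In this toy cell the decrease in forest is mirrored by an equal increase in transitional woodland, the reciprocal pattern that the comparison of the two classes is meant to reveal.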
Figure 4.2 shows the net change of forests (CLC classes 31x) between 2006 and 2012. Figure
4.3 is a similar illustration of the net change of transitional woodland (CLC class 324) between
2006 and 2012. When Figures 4.2 and 4.3 are compared, it can be noticed that large change
(decrease) in the proportion of forests generally corresponds to large change (increase) in the
proportion of transitional woodlands. On the other hand, there is also a general increase of the
proportion of forests in several locations that does not correspond with the decrease of the
proportion of transitional woodlands. Statistical figures based on the Finnish HR CLC2006 and
2012 from the cities Iitti and Kouvola (Table 4.1) reveal that both forests and transitional
woodlands have increased and areas of artificial surfaces and agricultural areas have
decreased10. These are again net figures and do not reveal the gross changes.
The reference data for comparison was computed using the Finnish HR CLC changes 2006 –
2012 with 1 ha minimum mapping unit. The following information was computed as proportions
for each 1 km grid cell:
• Growth of forest (CLC31x) classes 2006 – 2012.
• Growth of transitional woodland (CLC324) classes 2006 – 2012 (i.e. recent forest clearcuts).
• Forest change as growth of forest – growth of transitional woodland.
Table 4.1. Areas in km2 of selected CLC classes within the cities Kouvola and Iitti, estimated using Finnish HR CLC2006 and CLC2012.
 | Artificial surfaces (CLC11x, 12x) | Agricultural areas (CLC21x, 22x, 23x) | Forests (CLC31x) | Transitional woodland (CLC324)
CLC 2006 | 193 | 662 | 1868 | 329
CLC2012 | 141 | 605 | 1950 | 354
Change | -52 | -57 | 82 | 25
Change (combined) | -109 (artificial surfaces and agricultural areas) | | 107 (forests and transitional woodland) |
The following comparisons were made:
A. Forest growth: This is a study of locations where forest area increased according to the
reference data. The net change in forest area (CLC31x) reported in the grid was modelled
as a function of the change observed in the reference data.
B. Transitional woodland growth: This is a study of locations where transitional woodland area
increased according to the reference data. The net change in transitional woodland area
(CLC324) reported in the grid was modelled as a function of the change observed in the
reference data.
10 The explanation is linked to the change in spatial resolution in the Finnish HR CLC – from 25 m pixels in
2006 down to 20 m pixels in 2012. This explanation is, however, of no concern here since the objective of
the study is to see how well the grid approach can handle changes irrespective of the cause of these
changes.
Figure 4.2. The change of the proportions of forests in 1 km2 grid cells represented as color-coded change
classes.
Figure 4.3. The change of the proportions of transitional woodlands in 1 km2 grid cells represented as
color-coded change classes.
The statistics of the results are found in Table 4.2, where:
• n: number of samples in analysis.
• RMSE: Root Mean Squared Error between changes provided by grid approach and reference
• CC: Pearson correlation coefficient
• R2: Coefficient of determination
• Sl: Slope of linear regression equation
• Int: Intercept of linear regression equation
Table 4.2. The statistics of the comparisons of forest and transitional woodland changes according to the
grid approach and reference data.

Case   n      RMSE   CC     R2     Sl      Int
A      2977   5.86   0.41   0.17   0.904   1.186
B      3814   4.43   0.69   0.48   0.944   -2.550
The results are considered as better if the correlation coefficient (CC) and coefficient of
determination (R2) are closer to 1, RMSE closer to 0, and slope closer to 1 and intercept closer
to 0. According to the statistics, the correspondence between the reference data and the grid
data for transitional woodland growth (CLC324, Case B) is slightly better than the
correspondence between the reference data and the grid data for forest growth (CLC31x, Case
A).
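The statistics listed for Table 4.2 can be computed from paired per-cell change values. The following is a minimal sketch, not the procedure used in the study: the function name and the sample values are hypothetical, and R2 is taken as the square of CC, which holds for a simple linear regression and agrees with the rounded values in the table.

```python
import math

def comparison_stats(reference, grid):
    """Return n, RMSE, CC, R2, slope and intercept for paired change values.

    reference: change per cell according to the reference data (x axis)
    grid:      net change per cell reported by the grid approach (y axis)
    """
    n = len(reference)
    mean_x = sum(reference) / n
    mean_y = sum(grid) / n
    sxx = sum((x - mean_x) ** 2 for x in reference)
    syy = sum((y - mean_y) ** 2 for y in grid)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(reference, grid))
    # RMSE measures the direct disagreement between grid and reference values.
    rmse = math.sqrt(sum((y - x) ** 2 for x, y in zip(reference, grid)) / n)
    cc = sxy / math.sqrt(sxx * syy)
    slope = sxy / sxx                      # least-squares fit of grid on reference
    intercept = mean_y - slope * mean_x
    return {"n": n, "RMSE": rmse, "CC": cc, "R2": cc ** 2,
            "Sl": slope, "Int": intercept}

# Hypothetical change values (percent of cell area), for illustration only.
reference = [2.0, 5.0, 10.0, 20.0, 30.0]
grid = [1.0, 6.0, 8.0, 19.0, 29.0]
for name, value in comparison_stats(reference, grid).items():
    print(name, round(value, 3))
```

A slope below 1 in such a fit corresponds to the situation discussed for Case A, where part of the increase is cancelled out inside the grid cells.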
Figures 4.4 and 4.5 show scatterplots, linear regression lines and residual plots illustrating Case
A and Case B. Figure 4.4 shows increase in forest area according to the reference data along
the horizontal axis. Change in forest area according to the grid data is shown along the vertical
axis. The slope of the regression line is 0.9, indicating that there are grid cells where some of the
increase in area is cancelled out, so that the net change underestimates the actual increase in forest
area.
Figure 4.4. The scatterplot, linear regression line and residual plot between reference CLC31x growth and
forest change of grid approach (Case A).
Figure 4.5 illustrates the same relationship for transitional woodland. The effect, although
clearly present, is small and mostly found in grid cells where the absolute
change is small. Case A (Figure 4.4) and Case B (Figure 4.5) examine how well the forest or
transitional woodland change estimated using the grid approach corresponds to the reference data.
The residuals of the estimation error are in both cases large in grid cells where the change
(according to the reference data) is small. The residuals decrease as the change (according to
the reference data) increases. Visual inspection of the residual plots indicates that forest changes
will be interpreted quite reliably when the change is larger than 15 - 17 %. The difference between
the two cases is not large, and the two figures together indicate a threshold between 15 and 20 %.
This interpretation also implies that changes smaller than 15 - 17 % in the reference data can result
in large deviations between net and gross changes at the 1 km grid level.
Figure 4.5. The scatterplot, linear regression line and residual plot between reference CLC324 growth and
transitional woodland change of grid approach (Case B).
The relationship between gross and net changes is examined in Figure
4.6. Gross change is shown on the horizontal axis of the graphs. Gross decrease is
represented in the leftmost graph, while gross increase is represented in the rightmost graph.
The vertical axis in both graphs represents the resulting net change reported in the grid
approach.
Figure 4.6. Scatterplots based on the Finnish data used above. Dots represent 1 km grid cells. The vertical
axis is the net change reported as grid cell total. The horizontal axis represents gross decrease (left) and
gross increase (right) in forest area.
In both cases, very high (positive or negative) gross change is quite correctly reflected in the net
changes. Small (0 – 15 %) gross decrease can be “hidden” inside grid cells where the net
change is smaller or positive due to a reciprocal forest increase. This is seen in the leftmost
graph. The rightmost graph does (as explained above) show a good relationship between high
gross forest increase and the reported net changes. Small (0 – 15 %) gross increase can
(similar to small gross decrease) be “hidden” inside grid cells where the net change is small or
negative. Again, this effect is due to reciprocity inside the grid cells.
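The reciprocity effect can be illustrated with a small sketch that derives gross and net forest change for a single grid cell from two dates of pixel data. This is an illustration under stated assumptions rather than the method of the study: the function name is hypothetical, and each pixel is reduced to a boolean "is forest" flag.

```python
def forest_change_in_cell(forest_t0, forest_t1):
    """Gross and net forest change in one grid cell, in percent of cell area.

    forest_t0, forest_t1: equally long sequences of booleans, one per pixel
    in the cell, True where the pixel is forest at the first / second date.
    Returns (gross_increase, gross_decrease, net_change).
    """
    if len(forest_t0) != len(forest_t1) or not forest_t0:
        raise ValueError("both dates must cover the same, non-empty set of pixels")
    n = len(forest_t0)
    gained = sum(1 for a, b in zip(forest_t0, forest_t1) if not a and b)
    lost = sum(1 for a, b in zip(forest_t0, forest_t1) if a and not b)
    gross_increase = 100.0 * gained / n
    gross_decrease = 100.0 * lost / n
    return gross_increase, gross_decrease, gross_increase - gross_decrease

# Ten pixels: one clear-cut and one regrown pixel cancel each other out.
t0 = [True, True, True, True, False, False, False, False, False, False]
t1 = [False, True, True, True, True, False, False, False, False, False]
print(forest_change_in_cell(t0, t1))  # (10.0, 10.0, 0.0)
```

In this example a gross decrease of 10 % and a gross increase of 10 % are both "hidden" behind a net change of zero, which is the situation described for small gross changes.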
Figure 4.7. Relationship between net changes and gross changes in forest (CLC31x) area (%). See text for
explanation.
The relationship between net and gross change in forest (CLC31x) at the 1 km grid level is
explored in Figure 4.7. The grid cells were grouped into 5 % classes according to the net
change in forest area. These classes are represented along the horizontal axis of the graph.
Examples: “-40” is the class containing all 1 km grid cells where the net reduction of forest area
is in the range 40-45 % and “5” is the class containing all 1 km grid cells where the net increase
in forest area is in the range 5-10 %. “0” is the class containing all 1 km grid cells where the net
change in forest area is between -5 and 5 %.
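The class labels described above are consistent with truncating the net change toward zero to a multiple of 5. The sketch below assumes exactly that rule. The function name is hypothetical, and the treatment of values that fall exactly on a class boundary (for example exactly 5 %) is an assumption, since the text does not specify it.

```python
import math
from collections import Counter

def net_change_class(net_change_pct, width=5):
    """Label of the class for a net change given in percent of the cell area.

    Truncation toward zero reproduces the examples in the text:
    -42 falls in class -40, 7 falls in class 5, and any value strictly
    between -5 and 5 falls in class 0.
    """
    return math.trunc(net_change_pct / width) * width

# Hypothetical net changes for six grid cells.
net_changes = [-42.0, -3.0, 2.5, 7.0, 9.9, 44.9]
print(Counter(net_change_class(v) for v in net_changes))
```

Note that under this rule the class "0" is twice as wide as the other classes, as in the description above.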
Each class is represented with two bars. The blue bar at the top shows the distribution of gross
increase in forest area in grid cells in the class, while the green bar at the bottom shows the
distribution of gross decrease in forest area in grid cells in the class (see footnote 11). The graph
shows that grid cells with net forest change around zero usually have small gross changes. A net change
around zero can be an effect of reciprocity, but the area involved is not large. Grid cells where
the net change is negative (left-hand part of the graph) usually have small amounts of forest
increase and large amounts of forest decrease. Reciprocity does lead to an “underreporting” of
forest decrease, but the general picture is correct. Grid cells where the net change is positive
(right-hand part of the graph) usually have small amounts of forest decrease and larger amounts
of forest increase. Again, the dominant trend (forest increase in this case) is “underreported”
due to reciprocity, but the general picture is correct.
It is self-evident that forest decrease is the dominant gross change in areas with large net
decrease, and conversely that forest increase is the dominant gross change in areas with large
net increase. Figure 4.7 is, however, reassuring in the respect that it shows that net change
reflects dominant gross change quite well, and that the effect of reciprocity is limited. Gross
changes in both directions are fairly small in grid cells where the reported net change is close to
zero, and counteracting gross changes are small in grid cells where the net change is large in
either direction.
4.2 GROSS AND NET CHANGES
Documentation of change is clearly a challenge with respect to the grid approach. There is
always the risk that larger gross changes are hidden behind the net changes reported at the
grid cell level. Furthermore, the problem grows as the size of the grid cells
increases. The magnitude of the problem is, however, uncertain.
First, it must be pointed out that change detection is also an issue with respect to CLC and
similar polygon-based monitoring systems. A polygon-based system requires specification
of an MMU size, and land cover changes below the MMU size will go undetected. Many
small clear cuttings can take place inside a forest, and can later also be subject to regrowth of
forest; they will all go undetected if they are smaller than the MMU. This is to a certain extent
handled in CLC through the CLC-Changes approach. Still, the CLC-Changes specification is
also polygon based and involves an MMU size of 5 ha. Changes with an area below that size will
go unreported. The grid approach does not introduce a change detection issue that is unheard
of in the context of CLC, but the challenge does become more visible.
11 The bars are not perfectly aligned vertically due to an artifact in the SPSS© software. It should still be
quite clear which two bars belong to which class.
Second, the change issue is not equally distributed among the land cover classes. Forest, used
in the example in this chapter, is probably the land cover class where the challenge is most
serious. Clear cutting and forest regrowth form a succession process in an ongoing biological
cycle. Areas where commercial forestry leads to clear cutting are the same areas where new
forest is re-established on the very same locations, and the process is quite fast. The example in
the first part of this chapter demonstrates that even in forest areas, change is reported quite
reliably when it exceeds 15 % of a 1 km grid cell (15 ha). This figure is smaller than the 25 ha MMU
used in CLC, but larger than the 5 ha MMU used in CLC-Changes.
Changes in forest areas are spatially reciprocal, with increase and decrease taking place in the
same region. This reciprocity is rare for other land cover types involved in land cover change.
Decrease in agricultural area is either a result of land take or abandonment which takes place in
regions different from the opening of new agricultural land. The increase in built-up land is rarely
accompanied by conversion of built-up land to other land cover categories. Glaciers may grow
or shrink as a result of long-term climatic trends, but the change is uniform over large areas. All
things considered, the challenge with respect to change documentation under the grid approach
should probably not be exaggerated.
Third, the change issue can be solved provided that change is registered in the databases used
as input to the grid system. The requirement is that change is recorded and saved in the
database and not discarded when the register is updated. Such a system can provide the grid
database with measurements of gross changes as separate variables attached to the grid cell.
An example is the Norwegian land resource database which provides information on the current
land cover and can be used to calculate land cover coverage for grid cells. In addition, the
database records and stores every incident of land cover change. These recordings can be
used to calculate the magnitude of any specific land cover change (from one specific class to
another) within a certain time window and attach the information to a grid system. A grid-based
monitoring system can thus provide information about previous and current land cover
distribution and net changes, but also about gross changes taking place inside the grid cells. This is
probably not possible across Europe today, due to the lack of harmonized data, but will be
increasingly feasible in the future as national land cover and land use recording systems are
developed.
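The idea of attaching gross changes to grid cells as separate variables can be sketched as a simple aggregation over stored change records. The record layout, the cell identifiers and the function name below are hypothetical and are not taken from the Norwegian database. The sketch only shows that, once each change incident is stored with its grid cell, year and class transition, a gross change variable for a chosen time window is a plain sum.

```python
from collections import defaultdict

def gross_changes_per_cell(change_records, from_class, to_class, start_year, end_year):
    """Sum the area (ha) converted from one class to another, per grid cell.

    change_records: iterable of (cell_id, year, from_class, to_class, area_ha)
    tuples (hypothetical layout). Only records inside the inclusive time
    window and matching the requested transition are counted.
    """
    totals = defaultdict(float)
    for cell_id, year, src, dst, area_ha in change_records:
        if src == from_class and dst == to_class and start_year <= year <= end_year:
            totals[cell_id] += area_ha
    return dict(totals)

# Hypothetical change incidents for two grid cells.
records = [
    ("cell_A", 2007, "312", "324", 2.5),   # clear-cut
    ("cell_A", 2010, "312", "324", 1.5),   # clear-cut
    ("cell_A", 2013, "312", "324", 9.0),   # outside the time window
    ("cell_B", 2009, "324", "312", 4.0),   # regrowth
]
print(gross_changes_per_cell(records, "312", "324", 2006, 2012))
print(gross_changes_per_cell(records, "324", "312", 2006, 2012))
```

Each call yields one gross change variable per cell, which can be stored alongside the net change derived from the land cover proportions.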
5 CASE STUDY: VISUALIZATION
This chapter will address the visualization aspect of the grid approach. Visually, a grid appears
to be similar to a raster, at least for maps covering large areas with many grid cells. We will
show examples of cartography that can be used to present grid data, followed by a section on
visual analysis under the grid approach.
5.1 CARTOGRAPHY AND VISUALISATION UNDER THE GRID APPROACH
Both grids, 1 km and 100 m, were used for visualizations and visual analysis. With the grid
approach, all sorts of ancillary data can be attached to each cell to help with the necessary
analysis, visualisations and queries. In this test case we used the population data of Finland as
ancillary data. With these attributes, a wide selection of phenomena can be visualised using a
large toolbox consisting of several methods.
The easiest way to visualise grid data is to select a relevant attribute (e.g. amount of class 211)
and make a choropleth map (with the grid cell as the cartographic unit) based on that attribute
(Figure 5.1). These maps can be created for each CLC class. A generalized land cover grid can
also be produced based on the majority fields (Figure 5.2). Note that the larger the
grid cell size, the more generalized (and possibly misleading) the majority map can be. More
complex visualisations can be achieved when two different attributes are shown in the same
map. This kind of two-class map can show either the same phenomenon from different
perspectives, e.g. amount of class 112 from both HR data and EU data (Figure 5.3), or different
phenomena to present a correlation or other relationship between them, e.g. amount of class
122 and the population (Figure 5.4).
Figures 5.1 (left) and 5.2 (right): Choropleth map showing amount of Non-irrigated arable land in 1 km grid
(left) and Majority map in 1 km grid (right).
Figure 5.3. Two-class map showing Discontinuous Urban Fabric for both HR CLC and EU CLC in 1 km
grid.
Using these visualisations, it is easy to make comparisons. A difference between HR and EU
CLC class percentages can show the over- or underestimation of EU data for certain classes.
Comparing choropleth maps with the EU vector map shows the amount of detailed information
stored in grid versus the generalized overview of the vector map. These comparisons are
covered in more detail in the next section.
Figure 5.4. Maps comparing Discontinuous Urban Fabric with Population data in 1 km grid
5.2 VISUAL ANALYSIS
Visual analysis has been carried out for a selection of maps. The most interesting aspect of the
grid datasets is the ability to compare the amount of a class calculated from HR CLC data and
EU CLC data. This kind of visual analysis and comparison can be an important addition to the
statistical analysis.
As an example, we calculated the difference between EU and HR data (in ha) for classes 312 and
313 (Figure 5.5). Positive values represent overestimation by EU CLC 25 ha data
compared to HR CLC 20 m data. Conversely, negative values represent underestimation by EU
CLC 25 ha data. From this pair of maps, it is possible to see that in many places the
overestimation of mixed forest (313) leads to underestimation of coniferous forest (312) and
vice versa. This happens because generalization methods tend to emphasize
the most common classes at the expense of the rare classes.
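The difference maps are a cell-by-cell subtraction of two attribute tables. The sketch below assumes that each dataset is available as a mapping from a grid cell identifier to the area (ha) of one class in that cell. The function name, identifiers and values are hypothetical.

```python
def difference_map(eu_ha, hr_ha):
    """EU minus HR area (ha) per grid cell for one class.

    Positive values mean overestimation by the EU data, negative values
    underestimation. A cell missing from one dataset counts as 0 ha there.
    """
    cells = set(eu_ha) | set(hr_ha)
    return {cell: eu_ha.get(cell, 0.0) - hr_ha.get(cell, 0.0) for cell in cells}

# Hypothetical areas of class 313 (mixed forest) in three 1 km cells.
eu_313 = {"c1": 60.0, "c2": 10.0}
hr_313 = {"c1": 45.0, "c3": 5.0}
print(sorted(difference_map(eu_313, hr_313).items()))
```

Running the same subtraction for class 312 and comparing the two results cell by cell would show where an overestimate of one class coincides with an underestimate of the other.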
Quite similar phenomena can be seen, only presented in a different way, with one map showing
the amount of class 211 (in percentages) calculated from EU CLC 25 ha data and another
showing the same class calculated from HR CLC 20m data (Figure 5.6). The generalized EU
data shows high percentages for most of the areas where class 211 is present, and lower
percentages on the edges only. The HR data has a lot more variety in the percentages and
smaller fields that have been completely left out from the EU data with 25 ha MMU can be seen
all over the area.
An interesting comparison in this exercise is also the one between the 1 km grid and the 100 m
grid. In visual terms, the difference is quite stunning. With two maps showing the amount of
class 324 in different grid sizes (Figure 5.7), it is easy to notice that most of the 1 km cells have
the same, quite low level of the class 324, whereas the 100 m grid has quite large areas with no
class 324 present and some smaller areas with quite high amount of class 324. So although the
grid cell size does not affect the general statistics of the classes, it does affect the information
about the spatial distribution. Larger cell size implies less detailed knowledge of the spatial
distribution of the classes. With majority maps, the loss of information content is even more
obvious (Figure 5.8). With smaller grid cells it is possible to maintain the diversity of the classes
in a better way than with the bigger grid cells. In Finland where coniferous forest is common,
increasing the grid cell size will make it more likely to find coniferous forest as the majority class.
The presence of other classes will therefore be lost in cartography based on the “majority class
method” as the grid cell size increases.
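The two effects described above, unchanged class statistics but reduced spatial and thematic detail in majority maps, can be reproduced with a toy aggregation. This is a sketch under simplifying assumptions: cells are addressed by (row, col) indices, the fine grid nests exactly inside the coarse grid, and all names and values are hypothetical.

```python
from collections import Counter

def aggregate(fine_cells, factor):
    """Merge fine grid cells into coarser cells by summing class areas.

    fine_cells: dict mapping (row, col) to {class_code: area_ha}
    factor: number of fine cells per coarse cell along each axis
    """
    coarse = {}
    for (row, col), areas in fine_cells.items():
        key = (row // factor, col // factor)
        coarse.setdefault(key, Counter()).update(areas)
    return coarse

def total_area(cells):
    """Total area per class over all cells."""
    total = Counter()
    for areas in cells.values():
        total.update(areas)
    return total

def majority_classes(cells):
    """Class with the largest area in each cell."""
    return {key: max(areas, key=areas.get) for key, areas in cells.items()}

# Four hypothetical fine cells of 1 ha each, merged into one coarse cell.
fine = {
    (0, 0): {"312": 1.0},
    (0, 1): {"312": 0.75, "324": 0.25},
    (1, 0): {"324": 1.0},
    (1, 1): {"312": 1.0},
}
coarse = aggregate(fine, 2)
print(total_area(fine) == total_area(coarse))   # class totals are preserved
print(set(majority_classes(fine).values()))     # two classes appear as majority
print(set(majority_classes(coarse).values()))   # only the dominant class remains
```

The class totals are identical at both cell sizes, while the majority map of the coarse grid no longer shows class 324 at all.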
The last comparison in this study shows 1 km grid data with the amount of HR classes 112
and 324, respectively, next to EU 25 ha vector data with the grid overlaid (Figure
5.9). It is easy to see that each grid cell contains much more information than the generalized vector
data. This is especially visible in the case of class 112, which is quite rare in the
EU 25 ha vector map in this region but occurs frequently in the grid map. A similar
phenomenon can be observed in the comparison of 100 m grid data and EU CLC 25 ha vector
data with the grid overlaid (shown for class 121 in Figure 5.10).
In this exercise the majority class in each grid cell was simply the class with largest area
compared to other classes. In very heterogeneous areas, this can be quite misleading, since
there might be many classes, each covering only a small portion of the cell area. The majority
class may not be that much bigger than the others. To minimize bias in these situations, it is
possible to make more complex majority calculations. Figure 5.12 shows the simple majority
(the class with the largest area) next to a slightly more sophisticated majority map (here called the
two-step majority). Both are calculated from the HR CLC for the 1 km grid. The two-step
majority is done in two steps using the main level CLC classes to find a limited set of classes,
from where the actual 3rd level CLC majority class can be calculated (Figure 5.11). From this
pair of maps (Figure 5.12) it is possible to see that using the two-step majority the forest classes
take over areas that have been classified as lakes or agricultural areas with the simple majority
method. In some cells the two-step method seems to emphasize the artificial classes. It is also
possible to adjust the majority method to form heterogeneous classes like mixed forest or
weight the method to lessen the impact of the most common classes in favour of the rarer
classes.
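The two majority methods can be sketched as follows. The sketch assumes that the areas in a cell are given per three-digit CLC code and that the main level class is the first digit of the code. The function names and the example cell are hypothetical, and ties are resolved arbitrarily (the first class encountered wins), which the text does not address.

```python
from collections import defaultdict

def simple_majority(areas):
    """Third-level CLC class with the largest area in the cell."""
    return max(areas, key=areas.get)

def two_step_majority(areas):
    """Largest third-level class within the largest main-level class.

    areas: dict mapping three-digit CLC codes (as strings) to the area of
    that class in the cell. The main level is the first digit of the code.
    """
    main_totals = defaultdict(float)
    for code, area in areas.items():
        main_totals[code[0]] += area
    top_main = max(main_totals, key=main_totals.get)
    candidates = {c: a for c, a in areas.items() if c[0] == top_main}
    return max(candidates, key=candidates.get)

# Hypothetical 1 km cell (areas in ha): a lake next to three forest classes.
cell = {"512": 40.0, "311": 20.0, "312": 25.0, "313": 15.0}
print(simple_majority(cell))    # 512: the single largest class
print(two_step_majority(cell))  # 312: forests together cover 60 ha
```

The example reproduces the behaviour noted for Figure 5.12: a cell labelled as water by the simple majority is assigned to a forest class by the two-step method.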
It should be pointed out that the present study was carried out with only a part of Finland as the
study area. For visualization of the entire country, a 100 m grid is too detailed; a 1 km grid, or an even
larger cell size such as 5 km, may be more appropriate. Similar considerations of course also apply
to visualization of the entire EEA32 region.
Figure 5.5: Difference map showing the differences between EU CLC and HR CLC for Coniferous Forest
(class 312) and Mixed Forest (class 313) in 1km grid
Figure 5.6: Non-irrigated Arable Land (class 211) calculated from EU CLC and HR CLC in 100m grid.
Figure 5.7: Comparison of 1 km grid and 100m grid showing Transitional Woodland and Shrub (class 324)
Figure 5.8: Comparison of 100m grid and 1 km grid showing majority class.
Figure 5.9: 1 km grid maps showing Discontinuous Urban Fabric (class 112) and Transitional Woodland
and Shrub (class 324) compared to EU CLC 25ha map.
Figure 5.10: 100m grid map showing Industrial or Commercial Units (class 121) compared to EU CLC
25ha map.
Figure 5.11. Two-step majority method: First the areas of main level classes are calculated and from the
largest main level class a third level class with largest area is selected as majority.
Figure 5.12. Simple majority map and two-step majority map.
6 CONCLUSION
The technical and methodological basis for a grid approach to land monitoring is available.
Considerable experience with this approach has been gained through projects carried out by
National Statistical Agencies in cooperation with Eurostat, providing valuable knowledge and
guidelines that can be transferred to land monitoring. Existing pilots and projects, including this
study, demonstrate the feasibility of the grid approach.
The grid approach is an applicable method for storing statistical land cover information. From a
statistical point of view, the size of the grid is irrelevant. The same statistical information can be
derived from grids with 100 m and 1 km cell sizes. The grid structure just transfers the statistics
of the source data with no generalization applied. When the cell size increases, the size of the
dataset decreases, which makes processing and use of the data easier. The larger
cell size does, however, affect the information on the spatial distribution of the classes. The same
effect can also be seen in the maps and visualizations of the grid data. The grid cell size
has the biggest effect on the majority maps, because the most common classes will dominate
the map, especially with the larger cell sizes. With the grid approach it is also possible to perform
change detection if the changed area is larger than 15-20 % of the cell area; at least this was
the case for the tested forest classes.
The spatial distribution of classes within each grid cell is lost. Grids are therefore not suitable for
detailed studies, planning and assessment at the local level. However, the maintenance of the
statistical precision of the input data probably outweighs the loss of spatial detail on the subgrid-cell level for macro-studies at the regional, national and continental level. This is possible
because working with grids allows us to store information coming from datasets with higher
spatial resolution than the resolution of the grid itself. Moreover, using standard, widely used
reference grids makes it possible to combine the LC data with other types of data from other
sources (e.g. population, socioeconomic indicators, biodiversity, etc.).
ADVANTAGES
The grid model allows land monitoring to utilize and merge existing data sources by using a
fixed, shared common geometry as an interface. These sources can be national high spatial
resolution maps and registers as well as high spatial resolution data retrieved from other
sources, e.g. the Copernicus land services. The grid approach does limit the generalization to
simplification of the geometry, without any similar generalization of the attribute data. Statistics
based on the grid approach will therefore be unbiased (with respect to the input data), thus
avoiding differences in statistics between the pan-European land monitoring and various other
monitoring systems.
Vector LC/LU maps are usually generalized based on a priori defined Minimum Mapping Units
or similar parameters. It is easy to identify LC/LU change objects between two reference dates,
but nearly impossible to follow changes over more than two reference dates. This is due to the
generalization rules. The fixed geometry of a grid model allows creation of easily interpretable
time series of LC/LU data.
A European standard for grids is already established by the INSPIRE process, and it is thus
easy to ensure that a grid approach in land monitoring is INSPIRE compatible. The same
standard is adopted by the NSAs, implying a broad basis for harmonization between the land
monitoring system and the geospatial statistical system in Europe.
The methods used to “populate” the grids are available as standard tools in off-the-shelf GIS
software. Much of the data processing is tabular and can also be carried out using standard
database software and spreadsheets.
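The tabular nature of the processing can be illustrated with a short sketch that populates a grid from classified pixels using only grouping and counting. It is not the workflow used in the study. The function name, the (column, row) cell key and the pixel layout are hypothetical, and proportions are computed relative to the pixels found in each cell rather than to the nominal cell area.

```python
from collections import Counter, defaultdict

def populate_grid(pixels, cell_size_m=1000):
    """Tabulate class proportions per grid cell from classified pixels.

    pixels: iterable of (x_m, y_m, class_code), with pixel centre
    coordinates in metres in the projected coordinate system of the grid.
    Returns {(cell_col, cell_row): {class_code: proportion}}.
    """
    counts = defaultdict(Counter)
    for x, y, code in pixels:
        cell = (int(x // cell_size_m), int(y // cell_size_m))
        counts[cell][code] += 1
    return {
        cell: {code: n / sum(c.values()) for code, n in c.items()}
        for cell, c in counts.items()
    }

# Five hypothetical 20 m pixels falling into two 1 km cells.
pixels = [
    (10, 10, "312"), (30, 10, "312"), (10, 30, "312"), (30, 30, "324"),
    (1010, 10, "211"),
]
print(populate_grid(pixels))
```

The same grouping could equally be expressed as a GROUP BY query in a database or as a pivot table in a spreadsheet.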
The challenge linked to national projections and the coverage of border areas between
countries (or regions) can be solved in a standardized way. This allows data capture to take
place in national systems for later entry into a uniform pan-European database. Data captured
by centralized systems (e.g. Copernicus land high resolution layers) can be entered directly into
the central database.
Statistics and analysis as well as modelling exercises are fairly easy to implement due to the
regular shape and size of the grid units. Statistical bias is avoided, as mentioned above.
Visualization and cartography do become more complicated than under the familiar vector
approach, but this is due to an increased potential for visualization.
From the visual point of view, the grid approach is very well suited for the visualization of
amounts of certain attributes in each grid cell. Visualization based on the grid approach will
show all the cells where that attribute is present, even the ones with just a little bit of it. The grid
approach is also suitable for presenting quite detailed data in larger units without losing the
actual precision of the original data. The visualization of majority class can lead to misleading
and confusing outcomes with larger grid cell sizes, but the effect can be lessened with weighted
majority methods. The grid approach is also useful for making all kinds of detailed and complex
queries.
The grid approach may improve the access to information across sectors. The work done by the
NSAs demonstrates how population data, where access at the individual level is restricted due to
confidentiality issues, can be released when they are compiled for a grid structure. It is quite
possible that data from other sectors (transport, agriculture, hydrology, cadasters, building
registers etc.) with restricted access for the land monitoring community in many countries, could
be more accessible if the data is aggregated to a grid level.
CHALLENGES
The geometry of the grid model is unfamiliar for many users. The spatial units are artificial and
do not correspond to any land cover or land use “objects” in the real world. This is in particular
an issue when maps are drawn at large scale, where the regular grid cell structure is visible.
There will be issues of compatibility between a grid approach and the current CLC. These
compatibility issues are caused by the scale-dependent classification system used in CLC.
Mixed classes will have to be decomposed into their constituents and registered as such.
Mixtures can be derived at the grid cell level if sufficiently detailed information is available. This
is done as an analysis of the land cover composition of the grid cell. More work is needed to
analyze the composition of the CLC classes and prescribe attributes replacing the
classifications in these cases. The semantic work known as the “EAGLE matrix” may contribute
to this work. The backward compatibility of such an approach with previous CLC data will also
have to be examined in detail.
Change data is only represented as net changes in the grid approach, unless specific care is
taken to represent gross changes as separate measurements. This is possible, but depends
entirely on the available high spatial resolution data used to populate the grid.
Grids are in many cases populated using national high spatial resolution data. The actual
content, definition and quality of these data may vary from one country to another.
Harmonization and standardization are required to ensure compatibility across the countries
participating in the land monitoring effort.
OPEN ISSUES
The two most important issues remaining open are grid resolution / size and choice of
attributes.
In theory grids can be established with any resolution. The choice partly depends on the
required spatial resolution, partly on the available data, and partly on the feasibility of storing /
processing data. As an example, for pan-European applications the currently available
technology sets the limits:
• The pan-European Imperviousness database is available as a 20 m resolution raster,
containing 325 000 (columns) x 230 000 (rows) cells (see footnote 12). Stored in a compressed format,
the size of the dataset is around 1.5 Gbyte (with pyramid layers). Current technology allows a
quick visualization of the data at any scale with desktop tools, but the analysis and
combination of layers at this size require more powerful hardware and software tools.
• The pan-European 100 m resolution databases (e.g. CLC / aggregated Imperviousness)
contain 65 000 x 46 000 raster cells. Stored in a compressed format, the size of the dataset
is around 100 Mbyte (with pyramid layers). Current desktop technology allows analyses and
combination of layers. Typical processing time is some minutes. 100 m resolution is
comparable with the required delineation accuracy and Minimum Mapping Width of linear
elements in European CLC.
• Pan-European statistical grid approaches like Land and Ecosystem Accounting (LEAC) are
based on a 1 km grid. The 1 km resolution is a technical compromise as well: unique IDs
can be created and processed easily with common desktop tools for pan-European 1 km
datasets, which contain about 7 716 000 cells (rows in the attribute table) excluding sea and
ocean areas. A pan-European 100 m resolution grid would require a much larger attribute
table and longer unique IDs, which is beyond what desktop tools support.
12 Including Azores, Canarias and Turkey as well.
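The cell counts behind these three examples follow from simple arithmetic. The sketch below assumes that all three resolutions cover the same rectangular extent as the 20 m raster, which is consistent with the 65 000 x 46 000 figure quoted for 100 m. The function and constant names are hypothetical.

```python
def cells_in_extent(width_m, height_m, cell_m):
    """Columns, rows and total cells of a regular grid covering an extent."""
    columns = -(-width_m // cell_m)   # ceiling division for integer metres
    rows = -(-height_m // cell_m)
    return columns, rows, columns * rows

# Extent implied by the 20 m raster: 325 000 columns x 230 000 rows.
WIDTH_M = 325_000 * 20
HEIGHT_M = 230_000 * 20

for cell_m in (20, 100, 1000):
    print(cell_m, cells_in_extent(WIDTH_M, HEIGHT_M, cell_m))

# A 100 m grid holds 100 cells per 1 km cell, so the attribute table for the
# land area alone grows from about 7.7 million to about 772 million rows.
LAND_CELLS_1KM = 7_716_000
land_cells_100m = LAND_CELLS_1KM * (1000 // 100) ** 2
print(land_cells_100m)
```

The bounding rectangle holds 29.9 million 1 km cells, so the 7 716 000 land cells quoted above make up roughly a quarter of it.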
Implementation of a grid approach is probably going to be a compromise involving a balance
between several concerns about grid resolution and attribute availability.
High grid resolution (small grid cells) is attractive due to the level of detail, while a coarser grid
resolution is beneficial because:
• It may be sufficient for the needs of pan-European monitoring
• Data with restricted access may become available (as detailed location is not traceable)
• Smaller amounts of data are easier to manage
7 REFERENCES
van Asselen, S. and Verburg, P. H. 2012. A Land System representation for global
assessments and land-use modeling. Global Change Biology, 18: 3125–3148. doi:
10.1111/j.1365-2486.2012.02759.x
Batjes, N.H.1996. Global assessment of land vulnerability to water erosion on a 1/2
grid, Land Degradation & Development, 7: 353-365
by 1/2
Ben-Asher, Z. (ed.) (2013). HELM – Harmonised European Land Monitoring: Findings and
recommendations of the HELM Project. Tel-Aviv, Israel: The HELM Project.
Bonan, G.B., Pollard, D. and Thompson, S.L. 1993. Influence of Subgrid-Scale Heterogeneity in
Leaf Area Index, Stomatal Resistance, and Soil Moisture on Grid-Scale Land–Atmosphere
Interactions, Journal of Climate, 6: 1882–1897
Chiatante, G., Brambilla, M., and Bogliani, G. 2014. Spatially explicit conservation issues for
threatened bird species in Mediterranean farmland landscapes, Journal for Nature
Conservation, 22: 103–112
Clement, J., van Randen, Y. and Storm, M., in prep. Handleiding en metadata VIRIS2.0 VIsual
Ruimtelijk Information SystemTOP10 Monitoring Dataset 1991-2011. Werkdocument XX,
Wettelijke Onderzoekstaken Natuur & Milieu. Wageningen, Juli 20??
Couclelis, H. 1992. People manipulate objects (but cultivate fields): Beyond the raster-vector
debate in GIS, in Frank, A.U., Campari, I. and Formentini, U. (eds) Theories and Methods of
Spatio-Temporal Reasoning in Geographic Space, Lecture Notes in Computer Science 639: 65
– 77, Springer Berlin Heidelberg
Congalton, R.G. 1997. Exploring and evaluating the consequences of vector-to-raster and
raster-to-vector conversion. Photogrammetric Engineering and Remote Sensing, 63: 425-434.
Di Gregorio, A., and Jansen, L.J.M. (2000) Land Cover Classification System (LCCS):
Classification Concepts and User Manual.. Environment and Natural Resources Service,
GCP/RAF/287/ITA Africover - East Africa Project and Soil Resources, Management and
Conservation Service. FAO, Rome
EEA 2010. The European Environment. State and outlook 2010. Land Use. European
Environment Agency, Copenhagen
European Commission (2007) ESPON 2013 Programme: European observation network on
territorial development and cohesion, European Commission Decision C(2007) 5313 of 7
November 2007
Fisher, P. 1997. The pixel: a snare and a delusion. International Journal of Remote Sensing, 18:
679-685. doi:10.1080/014311697219015
Gibbard, S., K. Caldeira, G. Bala, T. J. Phillips, and Wickett, M. 2005. Climate effects of global
land cover change, Geophysical Research Letters, 32: L23705, doi:10.1029/2005GL024550.
Gomarasca, M.A. (2009) Basics of Geomatics. Springer Netherlands
Letourneau, A., Verburg, P.H. and Stehfest, E. 2012. A land-use systems approach to represent
land-use dynamics at continental and global scales Environmental Modelling & Software,33:
61–79, doi:10.1016/j.envsoft.2012.01.007
Maucha, G. 2013. Recommendation for an indicator of imperviousness and imperviousness
change, including documentation of technical specifications of suggested indicator(s) and its
application context. ETC-LUSI report 2013.
Mehr, M., Brandl, R., Hothorn, T., Dziock, F., Förster, B. and Müller, J. 2011. Land use is more
important than climate for species richness and composition of bat assemblages on a regional
scale, Mammalian Biology - Zeitschrift für Säugetierkunde, 76: 451–460,
http://dx.doi.org/10.1016/j.mambio.2010.09.004
Milego, R. and Ramos, M.J. (2011) Disaggregation of socioeconomic data into a regular grid
and combination with other types of data, ESPON Technical report
http://www.espon.eu/export/sites/default/Documents/ToolsandMaps/ESPON2013Database/2.2_TR_grids.pdf
accessed 20.11.2012
Mücher CA., Klijn JA., Wascher DM., Schaminée JHJ. (2010) A new European Landscape
Classification (LANMAP): A transparent, flexible and user-oriented methodology to distinguish
landscapes, Ecological Indicators 10: 87–103
Pierce, L.L. and Running, S.W. 1995. The effects of aggregating sub-grid land surface variation
on large-scale estimates of net primary production, Landscape Ecology, 10: 239-253
Schlünzen, K. H. and Katzfey, J. J. 2003. Relevance of sub-grid-scale land-use effects for
mesoscale models, Tellus A, 55: 232–246
Smith, M. L., Zhou, W., Cadenasso, M., Grove, M. and Band, L. E. 2010. Evaluation of the
National Land Cover Database for Hydrologic Applications in Urban and Suburban Baltimore,
Maryland. JAWRA Journal of the American Water Resources Association, 46: 429–442. doi:
10.1111/j.1752-1688.2009.00412.x
Strand, G-H. 1997. Cartographic modelling with relational algebra, Norsk Geografisk Tidsskrift
(Norwegian Journal of Geography) 51: 61 - 69
Strand, GH. 2011. Uncertainty in classification and delineation of landscapes: A probabilistic
approach to landscape modelling, Environmental Modelling & Software 26: 1150 - 1157
Strand, G.H. and Bloch, V.V.H. (2009) Statistical grids for Norway. Documentation of national
grids for analysis and visualisation of spatial data in Norway. Document 2009/9. Statistics
Norway, Oslo
Thuiller, W., Araújo, M.B. and Lavorel, S. 2004. Do we need land-cover data to model species
distributions in Europe?, Journal of Biogeography, 31: 353–361, DOI: 10.1046/j.0305-0270.2003.00991.x
TWG-RS. 2009. D2.8.I.2 INSPIRE Data Specification on Geographical grid systems Guidelines v3.0,
http://inspire.ec.europa.eu/documents/Data_Specifications/INSPIRE_Specification_GGS_v3.0.pdf
Vardon, M., Eigenraam, M., McDonald, J, Mount, R. and Cadogan-Cowper, A. (2011) Towards
an integrated structure for SEEA ecosystem stock and flow accounts. Issue paper for the
meeting on SEEA experimental ecosystems accounts, London, 5 – 7 December 2011
Wascher, D.M. (Ed.) (2005) European Landscape Character Areas – Typology, Cartography
and Indicators for the Assessment of Sustainable Landscapes. Final ELCAI Project Report,
Landscape Europe, pp. 5-31
Wen, C.G. and Tateishi, R. 2001. 30-second degree grid land cover classification of Asia,
International Journal of Remote Sensing, 22: 3845-3854
ANNEX 1: DESCRIPTION OF GEOSTAT 1B
APPENDICES CONTENT
This annex contains a list of the GEOSTAT 1A and GEOSTAT 1B documents that provide the
basis for the description of the projects in this report, and a short description of fifteen of the
appendices to the GEOSTAT 1B report.
THE GEOSTAT 1A AND GEOSTAT 1B DOCUMENTS USED WERE:
• GEOSTAT 1A – Representing Census data in a European population grid. Final Report.
• GEOSTAT 1B – Final report
• GEOSTAT 1B – Appendix 1 – Production procedures for a harmonised European
population grid
• GEOSTAT 1B – Appendix 2 – EFGS Standard for official grid statistics, population
variables v.1.0.
• GEOSTAT 1B – Appendix 3 – Testing and quality assessment of pan-European
population grids
• GEOSTAT 1B – Appendix 4 – Production process for a harmonised European
population grid in Finland
• GEOSTAT 1B – Appendix 5 – Contribute to the GEOSTAT population grid 2011
• GEOSTAT 1B – Appendix 6 – Poster production process hybrid
• GEOSTAT 1B – Appendix 8 – Prototype of Bulgarian population grid 2011. Hybrid
solution in producing grid statistics.
• GEOSTAT 1B – Appendix 9 – Bulgarian Population grid 2011. Technical report.
• GEOSTAT 1B – Appendix 10 – Using the European Grid “ETRS89/LAEA_PT_1K”
as the foundation for the new Portuguese Sampling Structure.
• GEOSTAT 1B – Appendix 11 – Access to emergency hospitals – An operative case
study for testing gridded population data.
• GEOSTAT 1B – Appendix 12 – EFGS 2012 web survey
• GEOSTAT 1B – Appendix 13 – EFGS Open data license template
• GEOSTAT 1B – Appendix 14 – Vegetation change detection for statistical
institutes. A case study for combining building register data with satellite imagery
• GEOSTAT 1B – Appendix 15 – ESSnet GEOSTAT 1B Flyer
• GEOSTAT 1B – Appendix 16 – Disaggregation methods for georeferencing
inhabitants with unknown place of residence: the case study of population census
2011 in the Czech Republic.
• GEOSTAT 1B – Appendix 17 – EFGS modified quality assessment parameters.
GEOSTAT 1B – APPENDIX 1 – PRODUCTION PROCEDURES FOR A HARMONISED
EUROPEAN POPULATION GRID
This appendix contains guidelines for helping National Statistical Institutes to produce
harmonised population grid data compatible with INSPIRE specifications. Many elements in
these guidelines can also be relevant for the production of other detailed geo-referenced data.
These procedures focus on the bottom-up approach or aggregation method. They describe how
to convert national data to European grid data in different ways, depending on the source data:
• Conversion and aggregation of point-based data (micro-data) into European Grid data
• Conversion of existing national grid data into European Grid data (using both centroids
and intersections)
Each of the procedures is well documented and described step by step. The GIS software
used was ArcGIS and gvSIG.
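The first procedure, aggregation of point-based data into grid cells, can be sketched as follows. This is a minimal illustration and not the procedure documented in the appendix. It assumes that point coordinates are already given in metres in the projected reference system of the European grid. The function names, the sample records and the cell label pattern (1kmN...E..., built from the lower-left corner of the cell in kilometres) are assumptions made for the example.

```python
from collections import Counter

CELL_SIZE_M = 1000  # side length of a 1 km grid cell, in metres


def cell_label(easting_m, northing_m):
    """Label the 1 km cell containing a point by its lower-left corner (in km).

    The label pattern is an assumption made for this sketch.
    """
    e_km = int(easting_m // CELL_SIZE_M)
    n_km = int(northing_m // CELL_SIZE_M)
    return f"1kmN{n_km}E{e_km}"


def aggregate_points(records):
    """Sum a value (for example persons) per grid cell.

    records: iterable of (easting_m, northing_m, value) tuples.
    """
    totals = Counter()
    for easting_m, northing_m, value in records:
        totals[cell_label(easting_m, northing_m)] += value
    return dict(totals)


# Hypothetical micro-data: three address points with a number of residents.
sample = [
    (4695300.0, 2599800.0, 3),
    (4695900.0, 2599100.0, 2),
    (4696000.0, 2599100.0, 4),
]
grid_population = aggregate_points(sample)
```

The first two points fall in the same cell and are summed, while the third lies exactly on a cell boundary and is assigned to the cell to the east. Assigning boundary points to the cell whose lower-left corner they touch is a design choice of this sketch.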
GEOSTAT 1B – APPENDIX 2 – EFGS STANDARD FOR OFFICIAL GRID STATISTICS,
POPULATION VARIABLES V.1.0.
This appendix contains the standard names for population variables to be used for official grid
statistics (described previously under the “Design” section).
GEOSTAT 1B – APPENDIX 3 – TESTING AND QUALITY ASSESSMENT OF PAN-EUROPEAN
POPULATION GRIDS
This appendix describes a quality assessment carried out on the European population grid
made by disaggregation by the AIT, which used the following data as source information:
• EEA Fast Track Service Precursor on Land Monitoring - Degree of soil sealing 2006
• Population per LAU2 (LAU1) 2006
• LAU2 borderlines
• Corine Land Cover 2006
• Open Street Map data
The quality check is performed for each of these source datasets in comparison to Norwegian
data. Additionally, a quality check is undertaken for the disaggregated European grid in
comparison to the aggregated national one, on a km2 basis.
The main conclusion extracted from the appendix is that the disaggregation method can be
improved with the use of national data, e.g. to create a correct infrastructure mask or to
distinguish residential areas from rocky/stony areas.
GEOSTAT 1B – APPENDIX 4 – PRODUCTION PROCESS FOR A HARMONISED EUROPEAN
POPULATION GRID IN FINLAND
A poster with the production process used in Finland is shown. It corresponds to a bottom-up
methodology.
GEOSTAT 1B – APPENDIX 5 – CONTRIBUTE TO THE GEOSTAT POPULATION GRID 2011
This appendix shows a poster made to promote the NSIs' contribution to the creation of the
GEOSTAT population grid.
GEOSTAT 1B – APPENDIX 6 – POSTER PRODUCTION PROCESS HYBRID
This is a poster with the schema that summarises the hybrid approach.
GEOSTAT 1B – APPENDIX 8 – PROTOTYPE OF BULGARIAN POPULATION GRID 2011.
HYBRID SOLUTION IN PRODUCING GRID STATISTICS.
This appendix encloses a presentation by the NSI of Bulgaria explaining a prototype population
grid 2011 for Bulgaria, based on a hybrid solution.
GEOSTAT 1B – APPENDIX 9 – BULGARIAN POPULATION GRID 2011. TECHNICAL
REPORT.
This report details the creation of the Bulgarian population grid, based on a hybrid
approach, i.e. the combination of aggregation and disaggregation techniques. The authors give
the following reasons for using the dual solution:
• Census 2011 is not geocoded;
• There is no national register of addresses and buildings with coordinates;
• Aggregation cannot be applied to the whole territory of the country;
• The hybrid approach improves the accuracy of the population density distribution,
giving a higher quality than disaggregation alone.
The different steps, including quality assessment, are detailed in the appendix.
GEOSTAT 1B – APPENDIX 10 – USING THE EUROPEAN GRID
“ETRS89/LAEA_PT_1K” AS THE FOUNDATION FOR THE NEW PORTUGUESE SAMPLING
STRUCTURE.
This appendix describes how the existence of the European population grid has influenced the
creation of a new Portuguese Sampling Infrastructure. The use of the European dataset as one of
the geographical components of the sampling infrastructure was a major development, since the
former sampling process was dependent on the administrative division.
GEOSTAT 1B – APPENDIX 11 – ACCESS TO EMERGENCY HOSPITALS – AN OPERATIVE
CASE STUDY FOR TESTING GRIDDED POPULATION DATA.
In order to show the possibilities and test the usage of gridded population data, a case study
was agreed and carried out by the GEOSTAT 1B project partners. The case study consisted of
calculating the population within 30 minutes of emergency hospitals, and it is summarised in
this appendix.
Figure A1.1: Example from appendix 11 of the GEOSTAT 1B report: case study from Estonia.
The results of the different case studies clearly demonstrate the advantages of grid-based
statistics over administrative boundaries for carrying out spatial analyses. Nevertheless,
introducing confidentiality thresholds, and consequently suppressing grid cells below these
thresholds, may cause the advantages of grid statistics to be lost.
Grid statistics proved to be very useful in this case study for identifying the population’s
driving distance to emergency hospitals. The case study was carried out for the Czech
Republic, Estonia, Finland, Norway and Bulgaria. A sample map for Estonia is shown in
Figure A1.1.
The methodology and results are very well detailed in this appendix of the GEOSTAT 1B report.
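The interaction between the accessibility calculation and confidentiality suppression can be illustrated with a small sketch. It is not the method used in the case study: the cell values, the travel times and the suppression threshold below are hypothetical, and the function name is an assumption made for the example.

```python
def population_within(cells, max_minutes=30, min_cell_population=0):
    """Sum the population of grid cells reachable within max_minutes.

    cells: iterable of (population, travel_minutes) tuples, one per grid cell.
    Cells with fewer than min_cell_population persons are treated as
    suppressed and contribute nothing to the total.
    """
    total = 0
    for population, travel_minutes in cells:
        if travel_minutes <= max_minutes and population >= min_cell_population:
            total += population
    return total


# Hypothetical grid cells: (population, driving time to nearest hospital in minutes).
cells = [(120, 10), (2, 25), (1, 28), (45, 40), (3, 15)]

unsuppressed = population_within(cells)
suppressed = population_within(cells, min_cell_population=3)
```

With these made-up numbers the unsuppressed total is 126 persons and the suppressed total is 123. The difference comes entirely from sparsely populated cells, which is where suppression removes information.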
GEOSTAT 1B – APPENDIX 12 – EFGS 2012 WEB SURVEY
This appendix includes the results of a web survey carried out amongst NSIs to find out the
availability of data to feed the grid, including aspects of pricing, confidentiality, etc.
GEOSTAT 1B – APPENDIX 13 – EFGS OPEN DATA LICENSE TEMPLATE
In this appendix, a license template for NSIs is provided, so that they can grant user rights to
their data under the terms specified in the license document.
GEOSTAT 1B – APPENDIX 14 – VEGETATION CHANGE DETECTION FOR STATISTICAL
INSTITUTES. A CASE STUDY FOR COMBINING BUILDING REGISTER DATA WITH SATELLITE
IMAGERY
A case study was developed by Statistics Norway within the “hybrid” approach. The objective of
the case study was to combine building register data, land use data and satellite imagery in
order to follow changes in vegetation caused by construction and land use change. The results
were promising and even with low resolution it was possible to follow changes in vegetation in
various land use categories. The methodology proved to be operable for combining satellite
imagery with register data. Transforming the data into grids also made the data more
manageable for statistical institutes, which are more used to working with statistics software
than with remote sensing software.
GEOSTAT 1B – APPENDIX 15 – ESSNET GEOSTAT 1B FLYER
This appendix contains a flyer that was produced to explain and promote the GEOSTAT 1B
project, together with the EFGS.
GEOSTAT 1B – APPENDIX 16 – DISAGGREGATION METHODS FOR GEOREFERENCING
INHABITANTS WITH UNKNOWN PLACE OF RESIDENCE: THE CASE STUDY OF POPULATION
CENSUS 2011 IN THE CZECH REPUBLIC.
This short report explains the methodology used in the Czech Republic to disaggregate the
small share of the population (about 0.9%) that cannot be directly linked to an exact place of
residence. The paper discusses different approaches and their advantages and disadvantages.
In some cases, population is fictitiously allocated to the building points, whereas in other
cases it is directly linked to the grid cells.
GEOSTAT 1B – APPENDIX 17 – EFGS MODIFIED QUALITY ASSESSMENT PARAMETERS.
This is a collection of quality assessment sheets, with parameters and their values, referring
both to the production of primary data and the production of grid data.
ANNEX 2: COMPARISON OF PAN-EUROPEAN CLC
AND NATIONAL HR CLC USING 1 KM GRID
This annex contains further documentation of results from the grid approach analysis for
Kymenlaakso province in south-eastern Finland. The analysis is presented in chapter 3 of this
report, where the methodology is also explained. The analysis was carried out for a large
number of CLC classes using a 1 km grid. Part of the output was used as examples in chapter
3; the rest is presented here.
These plots can be read as a study of the statistical bias in the current, vector-based
pan-European CLC. The Finnish national HR CLC is a detailed dataset with 20-metre pixels, giving a
fairly accurate account of the statistical composition of land cover classes. The grid is used here
as a spatial unit in which the two data sources can be compared. The grid cells appear as “dots”
in the scatterplots. The dots lie on the diagonal when the pan-European CLC
and national HR CLC are equal. A dot lies above the diagonal when a class is
over-represented in the pan-European CLC. Conversely, a dot lies below the
diagonal when a class is under-represented in the pan-European CLC.
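The reading of the scatterplots described above can be sketched in code. The example below fits an ordinary least-squares line through per-cell values and reports r-squared, with the HR CLC area on the horizontal axis and the pan-European CLC area on the vertical axis. It is an illustration only: the function names and the sample values are assumptions, and it is not the code used to produce the plots in this report.

```python
def fit_line(hr_values, eu_values):
    """Ordinary least-squares fit of eu = intercept + slope * hr.

    Returns (slope, intercept, r_squared). Both inputs need some variation,
    otherwise the divisions below fail.
    """
    n = len(hr_values)
    mean_x = sum(hr_values) / n
    mean_y = sum(eu_values) / n
    sxx = sum((x - mean_x) ** 2 for x in hr_values)
    syy = sum((y - mean_y) ** 2 for y in eu_values)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hr_values, eu_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


def position(hr_value, eu_value):
    """Classify one grid cell relative to the diagonal of the scatterplot."""
    if eu_value > hr_value:
        return "above diagonal: over-represented in pan-European CLC"
    if eu_value < hr_value:
        return "below diagonal: under-represented in pan-European CLC"
    return "on diagonal: equal in both datasets"


# Hypothetical per-cell areas (ha) of one class in six 1 km grid cells.
hr = [0, 20, 40, 60, 80, 100]
eu = [0, 10, 35, 55, 80, 100]
slope, intercept, r_squared = fit_line(hr, eu)
```

A negative intercept with a slope near or above one indicates that small areas tend to be lost while large areas are preserved, which is the pattern described for several classes in this annex.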
Class 211 Non-irrigated arable land is underestimated by EU CLC. The regression line,
which explains about 85% of the variance, shows that the level of underestimation is
independent of the surface area of the HR CLC class within the cell. Nevertheless, the effect of
the MMU can be seen for smaller areas, where the class sometimes disappears in EU CLC. In a
few other cases, larger areas are overestimated by EU CLC, which fills the cell completely with
that class, whereas the original figures are between 80 and 95 ha.
As for Sea and ocean, there is an almost perfect correlation between the two datasets for
Water bodies (Class 512). The regression line explains 98.4% of the variance and shows a
very weak overestimation by EU CLC when the area of the class within the 1 km grid cell is
high.
Class 523 Sea and ocean in the 1 km² grid shows a very close relationship of cell values
between HR CLC and EU CLC. The regression line exhibits an almost perfect correlation with
an r-squared very close to 100%. The MMU effect is not noticed here, due to the combination of
big share and homogeneity of this class (most of the cells are 100% covered by “sea and
ocean”).
MINORITY CLASS SCATTERPLOTS, 1 KM GRID
A CLC class covering less than 5 % of the study area, according to the more accurate
national HR CLC, is considered as a “minority class”.
For class 121 (Industrial or commercial units), we can see a double trend: on the one hand,
some areas smaller than 20 ha disappear in EU CLC; on the other hand, most of the areas
represented in EU CLC are rather overestimated. The regression line shows this trend, but it
explains less than 30% of the variance. That is why, in the end, the absolute figures show an
underestimation of this class in the European land cover dataset.
Roads and railways are underestimated in EU CLC. As we can see, most areas in HR CLC
simply disappear in EU CLC due to the effect of generalisation and the minimum width of the
European CLC (100m).
Class 123 “Port areas” is the smallest of the minority classes, but it shows a
different trend from the others. It is overestimated in EU CLC, as shown by
the regression line, which explains up to 96% of the variance. The bigger the area of port
areas in HR CLC, the higher the overestimation by EU CLC.
Class 131 (Mineral extraction sites) shows a trend similar to that seen for other minority
classes. Smaller sites disappear in EU CLC. Nevertheless, some of the sites represented in EU
CLC are slightly overestimated. Correlation is weak because some areas drop
to zero in EU CLC.
Most of the sport and leisure facilities reported by the HR CLC are lost in EU CLC, due to the
different MMU. Linear correlation between figures for this class is very weak. When the share of
sport areas is bigger than 15 ha, there is a slight trend of overestimation in EU CLC.
In the case of inland marshes, again due to the MMU, this class is underestimated by EU CLC
in comparison to HR CLC. Figures are fairly correct when 411 is present in the EU CLC, but
many areas where 411 constitutes less than 20 % of the cell (20 ha per km²) are lost due to
generalization.
Peat bogs are rather well represented in EU CLC in comparison to HR CLC. Correlation is very
good and the regression line explains 94% of the variance. Nevertheless, there are some
smaller units (less than 20 ha) which disappear in EU CLC, due to the MMU effect, causing a bit
of underestimation of this class in the European dataset.
The case of salt marshes is very similar to that of water courses: they are slightly
underestimated. Due to the MMU effect, some of the areas disappear in EU CLC.
The regression line shows that water courses are slightly underestimated in EU CLC (r-squared
of 87.6%). For grid cells with less than 15 ha of water courses, that share often vanishes
in EU CLC. Due to the different MMU, some smaller water courses are lost in the European
dataset.
ANNEX 3: COMPARISON OF PAN-EUROPEAN CLC
AND NATIONAL HR CLC USING 100M GRID
This annex contains the results from an analysis for Kymenlaakso province in south-eastern
Finland where the pan-European CLC was compared to the Finnish national HR CLC using a
100m grid as the basis for the comparison. The analysis is analogous to the analysis found in
Chapter 3 and in Annex 2, and the results are identical with respect to the main topic. The results
are included partly as documentation for the interested reader, partly because they demonstrate
similarities and differences between the two grid resolutions in terms of statistical and visual
results.
A general analysis of the results (areas, percentages, and absolute and relative
differences between the HR CLC and the EU CLC) shows that the global figures are
exactly the same as those obtained when using the 1 km grid in Chapter 3. Numerically,
and with respect to the aggregated figures, there is no difference at all between using a 100m
grid and a 1 km grid. This is because, regardless of the size of the grid cell, all the information
from the original data sources is accounted for and kept linked to each individual grid cell. The
only difference is the number of cells. In the case of the 100m grid, we have 769 118 cells with
surface areas and percentages for each LC class for both the HR CLC and EU CLC source data,
whereas in the case of the 1 km grid we have 7 763 cells.
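The reason why the aggregated figures do not depend on the grid resolution can be shown with a short sketch. It assumes that the 100 m cells nest exactly inside the 1 km cells (ten by ten), and it uses hypothetical cell indices and areas; the function name is an assumption made for the example.

```python
from collections import defaultdict


def aggregate_to_coarser(fine_cells, factor=10):
    """Sum per-cell areas of a fine grid into a nested coarser grid.

    fine_cells: dict mapping (column, row) indices of the fine grid to the
    area of one land cover class in that cell (m2).
    factor: number of fine cells along one side of a coarse cell.
    """
    coarse = defaultdict(int)
    for (col, row), area in fine_cells.items():
        coarse[(col // factor, row // factor)] += area
    return dict(coarse)


# Hypothetical areas (m2) of one class in four 100 m cells.
cells_100m = {(0, 0): 400, (9, 9): 800, (10, 0): 1200, (25, 31): 10000}
cells_1km = aggregate_to_coarser(cells_100m)

total_100m = sum(cells_100m.values())
total_1km = sum(cells_1km.values())
```

Every square metre assigned to a 100 m cell is carried into exactly one 1 km cell, so the two totals are equal. Only the number of cells, and therefore the per-cell distribution seen in the scatterplots, changes.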
As the figures are the same as those in the analysis carried out using the 1 km grid, the
majority classes are also the same. Table A3.1 is reproduced for reference only; the analysis
and charts comparing majority classes are not repeated here.
Table A3.1. Total figures (area, percentage and differences) for majority classes according to the HR CLC
Class  Description                  Area_N  Area_E  %_N     %_E     Diff. E-N  % Diff E-N
312    Coniferous forest            241067  251617  31.37%  32.74%      10550       4.38%
523    Sea and ocean                182323  184746  23.72%  24.04%       2423       1.33%
211    Non-irrigated arable land     88285   70174  11.49%   9.13%     -18111     -20.51%
313    Mixed forest                  76005  106230   9.89%  13.82%      30225      39.77%
324    Transitional woodland-shrub   63594   35485   8.28%   4.62%     -28109     -44.20%
512    Water bodies                  46606   46473   6.06%   6.05%       -133      -0.29%
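The two difference columns of Table A3.1 follow from the area columns. The sketch below recomputes them, assuming that "Diff. E-N" is the EU CLC area minus the national HR CLC area and that "% Diff E-N" expresses that difference as a percentage of the national area. This reading is inferred from the column names and reproduces the published values; the function name is an assumption made for the example.

```python
def differences(area_n, area_e):
    """Return (absolute difference, relative difference in %) of EU vs national area."""
    diff = area_e - area_n
    pct_diff = round(100.0 * diff / area_n, 2)
    return diff, pct_diff


# Area_N and Area_E per class, taken from Table A3.1.
areas = {
    312: (241067, 251617),  # Coniferous forest
    523: (182323, 184746),  # Sea and ocean
    211: (88285, 70174),    # Non-irrigated arable land
    313: (76005, 106230),   # Mixed forest
    324: (63594, 35485),    # Transitional woodland-shrub
    512: (46606, 46473),    # Water bodies
}
table = {clc: differences(n, e) for clc, (n, e) in areas.items()}
```

For class 312, for example, this gives a difference of 10550 and a relative difference of 4.38%, matching the table.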
Some general remarks on the analysis of the scatterplots: We should take into account
that we are now analysing cells of 100m x 100m (1 ha). EU CLC has an MMU of 25 ha (i.e.
polygons smaller than 25 ha are not mapped in EU CLC), which means that populating a grid
of 1 ha cells with EU CLC data results in many adjacent grid cells with the same CLC class
(25 ha = 25 grid cells of 1 ha).
The HR CLC raster that has been used to populate the grid has a resolution of 20m x 20m. This
means that we can have up to 25 raster cells within each 1 ha grid cell and that the minimum
area for a CLC class is 400 m². As a result, the HR CLC values fall on 25 discrete levels at
intervals of 400 m², which appear as strips along the HR CLC axis of the scatterplots. Because of
the large number of observations (grid cells) for each CLC class, the individual observations
cannot be distinguished within these strips.
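The origin of the strips can be checked with simple arithmetic. The sketch below lists the area values that one class can take within a single 1 ha grid cell, assuming that the 20 m raster pixels align exactly with the 100 m grid cells. The alignment and the names used are assumptions made for the example.

```python
PIXEL_SIDE_M = 20   # HR CLC raster resolution
CELL_SIDE_M = 100   # grid cell side length

pixel_area_m2 = PIXEL_SIDE_M ** 2                       # 400 m2 per pixel
pixels_per_cell = (CELL_SIDE_M // PIXEL_SIDE_M) ** 2    # 5 x 5 = 25 pixels


def possible_class_areas():
    """All areas (m2) a single class can occupy in one grid cell, including zero."""
    return [k * pixel_area_m2 for k in range(pixels_per_cell + 1)]


levels = possible_class_areas()
```

The list runs from 0 to 10 000 m² in steps of 400 m², giving 25 non-zero levels. Each level corresponds to one strip on the HR CLC axis.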
The “border effect” of EU CLC polygons against HR CLC raster pixels is also noticeable: when
a grid cell overlays a generalised EU CLC polygon border, it can include a high share of
a class which is not reported by the HR CLC raster dataset at 20m x 20m.
Finally, we should mention that the scatterplots were produced with the R software,
as MS Excel could not handle the large number of data points. The red line in each chart
is the fitted regression line. The value of r-squared is shown next to each chart. Units on both
axes are square metres (m²).
A CLC class covering 5 % or more of the study area, according to the more accurate national
HR CLC, is considered as a “majority class”.
r2= 0.678
Figure A3.1. Scatterplot of Discontinuous urban fabric (112) in 100 m grid
Discontinuous urban fabric (112) is overestimated in EU CLC (Figure 3.18), although the
linear correlation is not very strong. This overestimation is largely explained by the fact that
class Continuous urban fabric (111) in HR CLC is generalised to 112 in EU CLC. The smaller
MMU in HR CLC might also have led to a more detailed delimitation of discontinuous urban
fabric, excluding some complementary elements (infrastructures, green urban...) that in EU CLC
are part of 112.
A CLC class covering less than 5 % of the study area, according to the more accurate
national HR CLC, is considered as a “minority class”.
r2= 0.538
Figure A3.2. Scatterplot of Industrial or commercial units (121) in 100 m grid
Although it might seem that there is a lot of overestimation by EU CLC, the fact is that lots of
industrial areas are lost and the global trend is of underestimation of this class by the European
layer. That is why the r-squared for the fitted regression line is rather low.
r2= 0.118
Figure A3.3. Scatterplot of Roads and rail network (122) in 100 m grid
Roads and railways are underestimated in EU CLC. As we can see, most areas in HR CLC
simply disappear in EU CLC due to the effect of generalisation and the minimum width of the
European CLC (100m).
r2=0.910
Figure A3.4. Scatterplot of Port areas (123) in 100 m grid
For port areas, we see the same trend as for the previous grid analysis, which shows a large
overestimation by EU CLC. The border effect of CLC polygons is also noticed in some cells
where HR CLC has no ports at all and EU CLC accounts for some area of the class.
r2=0.567
Figure A3.5. Scatterplot of Mineral extraction sites (131) in 100 m grid
Class 131 (Mineral extraction sites) shows a trend similar to that seen for other minority
classes. Smaller sites disappear in EU CLC. Nevertheless, some of the sites represented in EU
CLC are to some extent overestimated.
r2=0.304
Figure A3.6. Scatterplot of Sports and leisure facilities (142) in 100 m grid
This class shows similar trends to the previous analysis. Most of the sport and leisure facilities
reported by the HR CLC are not present in the 1 ha grid cells populated with EU CLC, due to the
different MMU.
r2= 0.803
Figure A3.7. Scatterplot of Non-irrigated arable land (211) in 100 m grid
Class 211 is underestimated by EU CLC. The regression line, which explains about 80% of the
variance, shows that the underestimation is higher for higher shares in the HR CLC. There are
many cells where HR CLC shares of this class are lost in EU CLC, although some exceptions are
found in the other direction.
r2= 0.180
Figure A3.8. Scatterplot of Land principally occupied by agriculture (243) in 100 m grid
According to the global figures in table 1, class 243 is strongly overestimated by the EU
CLC, where it accounts for 5.2% of the total share, whereas in HR CLC it accounts for only 0.46%.
Class 243 (1 hectare cells) shows an increase in overestimation by the EU CLC with a decrease
in surface area in the grid cells populated with HR CLC (Figure 3.19). Because this is a mixed
class, and because the MMU differs between the two source layers, EU CLC groups some
pure classes (e.g. 211 and 311) within this mixed one.
r2=0.244
Figure A3.9. Scatterplot of Broad-leaved forest (311) in 100m grid
As in the analysis with the 1 km grid, Broad-leaved forest (311) does not follow a linear pattern
in a cell-by-cell analysis (Figure 3.17). Many patches are lost in EU CLC, but the patches that
are present in both datasets are generally overestimated by the European dataset. The MMU
can cause patches of broad-leaved forest to be included within mixed forest in EU CLC.
r2= 0.754
Figure A3.10. Scatterplot of Coniferous forest (312) in 100 m grid
Coniferous forest (312) is the most common class and is present in a large number of grid cells.
Due to the strips in the chart (Figure 3.14), it is a bit difficult to make a visual analysis, but the
regression line suggests that the EU CLC slightly overestimates this class for low values,
whereas it to some extent underestimates the highest values. This linear regression explains up
to 75% of the variance, which is a bit lower than in the case of the 1 km grid. There are lots of
cells with some share of CLC class 313 in EU CLC and no share in HR CLC, resulting in a slight
overestimation of this class by the EU CLC.
r2= 0.512
Figure A3.11. Scatterplot of Mixed forest (313) in 100 m grid
The mixed forest class (313) is overestimated by the EU CLC in most of the cells, although
there is not a clear linear regression that explains most of the variance (Figure 3.15). In general
terms, the overestimation is seen for lower values in HR CLC. The effect of a mixed class
together with the MMU of the European layer is an explanation for that.
r2= 0.487
Figure A3.12. Scatterplot of Transitional woodland-shrub (324) in 100 m grid
For class 324, there is a general trend of underestimation, but as can be seen in Figure
3.16 and from the low value of r-squared, there is high variation between
cells: sometimes EU CLC overestimates this class, whereas in other cells all of the HR CLC
share is lost. Again, the different MMU and the fact that it is a kind of mixed class may cause
those deviations.
r2= 0.556
Figure A3.13. Scatterplot of Inland marsh (411) in 100 m grid
In the case of inland marshes, again due to the MMU, a large share of this class is lost in EU
CLC in comparison to HR CLC.
r2=0.869
Figure A3.14. Scatterplot of Peat bogs (412) in 100 m grid
In general terms, peat bogs are well represented by EU CLC, although in some cases small
patches are lost, causing a bit of underestimation by the European layer.
r2= 0.770
Figure A3.15. Scatterplot of Salt marshes (421) in 100 m grid
The case of salt marshes is very similar to the water courses. Due to the MMU effect, some of
the areas disappear in EU CLC or are a bit underestimated.
r2= 0.873
Figure A3.16. Scatterplot of Water courses (511) in 100 m grid
Water courses are slightly underestimated in EU CLC, basically because some small water
courses are lost, due to the different MMU and CLC width thresholds (100m).
r2=0.953
Figure A3.17. Scatterplot of Water bodies (512) in 100 m grid
As for the “Water bodies”, there is a very good correlation between figures from both datasets.
The regression line explains 95.3% of the variance. In general there is very little
underestimation by EU CLC for higher shares in HR CLC, whereas there is very slight
overestimation for lower shares.
r2= 0.993
Figure A3.18. Scatterplot of Sea and ocean (523) in 100 m grid
As in the other grid analysis, “Sea and ocean” is a class that shows a strong
correlation of cell values between HR CLC and EU CLC. The regression line shows the almost
perfect correlation of figures, with an r-squared very close to 100%. This is largely due to
the big share and homogeneity of this class (most of the cells are 100% covered by “sea and
ocean”). Nevertheless, the effect of the polygon borders of the EU CLC is noticeable, because
some cells have some share of this class according to the European layer and a value of 0
according to the HR CLC.