Grid Approach - Copernicus Land Monitoring Services
Transcription
Grid Approach - Copernicus Land Monitoring Services
Service Contract 56003 based on the restricted procedure No EEA/MDI/14/015 following a call for expression of interest EEA/SES/13/005-CEI Task 4.2 Final report Assistance to the EEA in the production of the new CORINE Land Cover (CLC) inventory, including the support to the harmonisation of national monitoring for integration at panEuropean level – Geometric test case and grid approach Task 4.2 - Develop and assess the grid approach Prepared by: Geir-Harald Strand (ed.) Roger Milego, Suvi Hatunen, Markus Törmä, Gerard Hazeu (Contributors) Geoff Smith, Gergely Maucha (Reviewers) 21. September 2015 Version 1.0 Service Contract 56003 following a call for expression of interest EEA/SES/13/005-CEI Legal notice The contents of this publication were created by space4environment under EEA service contract No. 3436/B2014/R0-GIO/EEA.56003 and do not necessarily reflect the official opinions of the European Commission or other institutions of the European Union. Neither the European Environment Agency nor any person or company acting on behalf of the Agency is responsible for the use that may be made of the information contained in this report. Copyright notice © EEA, Copenhagen, 2015 Reproduction is authorised, provided the source is acknowledged, save where otherwise stated. SUMMARY This report is part of a study commissioned by the European Environmental Agency (EEA) under the restricted procedure No. SC 56003 following a call for expression of interest EEA/SES/13/005-CEI. The objective of this part of the overall study was to “… assess the advantages and disadvantages of the “grid approach”, with the aim to find out how feasible it is to store land cover information in a regular grid, taking into account different levels of details. An aspect to be considered here is the connection to statistical analysis and time lines”. Grid-based approaches are already used by many institutions. Considerable experience with them has been gained through projects carried out by National Statistical Agencies in cooperation with Eurostat, providing valuable knowledge and guidelines that can be transferred to land monitoring. The grid model allows land monitoring to utilize and merge existing data sources by using a fixed, shared common geometry as an interface. These sources can be national high spatial resolution maps and registers as well as high spatial resolution data retrieved from other sources, e.g. the Copernicus Land Services. The grid approach does limit the generalization options to simplification of the geometry, without any similar generalization of the attribute data. Statistics based on the grid approach will therefore be unbiased (with respect to the input data). A European standard for grids is already established by the INSPIRE process, and it is thus easy to ensure that a grid approach in land monitoring is INSPIRE compatible. The same standard is adopted by the NSAs, implying a broad basis for harmonization between the land monitoring system and the geospatial statistical system in Europe. The methods used to “populate” the grids are available as standard tools within off-the-shelf and open source GIS software. The challenges linked to national projections and the coverage of border areas between countries (or regions) can be solved in a standardized way. Statistics and analysis as well as modelling exercises are fairly easy to implement in grids due to the regular shape and size of the grid units. The grid approach is well suited for the visualization in small cartographic scale and useful for making detailed and complex queries. The grid approach may also improve the access to information across sectors because restricted data access for the land monitoring community in many countries, could be more accessible if the data is aggregated to a grid level. On the negative side, the geometry of the grid model is unfamiliar for many users and does not correspond to any land cover or land use “objects” in the real world. With respect to the current CORINE Land Cover system, mixed classes will have to be decomposed into their constituents and registered as such. The semantic work being developed in parallel within EAGLE will contribute to this work. Change data is only available as net changes in a grid approach, unless specific caution is taken to represent gross changes as specific measurements. Finally, grids are populated, in many cases using national high spatial resolution data. The actual content, definition and quality of these data may vary from one country to another. Harmonization and standardization is required to ensure compatibility across the countries participating in the land monitoring effort. The two most important open issues are grid resolution and choice of attributes. Future implementation of a grid approach is probably going to be a compromise involving a balance between several concerns about grid resolution and attribute availability. High grid resolution (small grid cells) is attractive due to the level of detail, while a coarser grid resolution is beneficial because: • • • It may be sufficient to cover the needs for pan-European monitoring. Data with restricted access may become available (as detailed location is not traceable). Smaller amounts of data are easier to manage. Task 4.2 – Develop and assess the grid approach Page 4 of 91 Table of content SUMMARY ....................................................................................................................................... 1 0 INTRODUCTION........................................................................................................................ 5 1 THE GRID APPROACH ............................................................................................................... 6 2 3 4 5 1.1 GENERAL AND OVERALL EXPLANATION OF THE GRID APPROACH ....................................... 6 1.2 LITERATURE REVIEW ....................................................................................................... 7 1.3 THE GRID APPROACH IN EUROPEAN LAND MONITORING ................................................... 8 1.4 POPULATING GRIDS – DIFFERENT METHODS .................................................................... 9 1.5 GRIDS AND CLC ........................................................................................................... 16 1.6 SPECIAL ISSUES ........................................................................................................... 19 THE GRID APPROACH USED IN EUROSTAT/ EFGS .................................................................. 26 2.1 INTRODUCTION AND BACKGROUND ............................................................................... 26 2.2 GEOSTAT 1A ............................................................................................................. 26 2.3 GEOSTAT 1B ............................................................................................................. 32 2.4 GEOSTAT RESULTS AND THE GRID APPROACH IN LAND MONITORING........................... 33 CASE STUDY: STATISTICAL ANALYSIS .................................................................................... 35 3.1 SOURCE DATASETS AND THE GRID POPULATION PROCESS ............................................ 35 3.2 COMPARISON OF PAN-EUROPEAN CLC AND NATIONAL HR CLC .................................... 36 3.3 COMPARISON OF GRID RESOLUTIONS ............................................................................ 45 3.4 LAND COVER AND POPULATION...................................................................................... 46 CASE STUDY: CHANGE DETECTION ........................................................................................ 48 4.1 CHANGE DETECTION USING GRID APPROACH ................................................................. 48 4.2 GROSS AND NET CHANGES ............................................................................................ 53 CASE STUDY: VISUALIZATION ............................................................................................... 55 5.1 CARTOGRAPHY AND VISUALISATION UNDER THE GRID APPROACH .................................. 55 5.2 VISUAL ANALYSIS ......................................................................................................... 57 6 CONCLUSION ......................................................................................................................... 66 7 REFERENCES ......................................................................................................................... 69 ANNEX 1: DESCRIPTION OF GEOSTAT 1B APPENDICES CONTENT ............................................... 71 ANNEX 2: COMPARISON OF PAN-EUROPEAN CLC AND NATIONAL HR CLC USING 1 KM GRID ....... 75 ANNEX 3: COMPARISON OF PAN-EUROPEAN CLC AND NATIONAL HR CLC USING 100M GRID ..... 82 Task 4.2 – Develop and assess the grid approach Page 5 of 91 0 INTRODUCTION This report is part of a study commissioned by the European Environmental Agency (EEA) under the restricted procedure no SC 56003 following a call for expression of interest EEA/SES/13/005-CEI. The objective of this part of the overall study was to “… assess the advantages and disadvantages of the “grid approach”, with the aim to find out how feasible it is to store land cover information in a regular grid, taking into account different scales of level of details. An aspect to be considered here is the connection to statistical analysis and time lines”. The study consisted of three main parts, organized into five chapters. The first part (chapter 1) is a description and discussion of the grid approach. The chapter sets out to explain the approach in general, followed by more detailed discussions about particular issues relevant for pan-European land monitoring linked to the existing CORINE Land Cover (CLC) initiative. The second part (Chapter 2) is a review of the first two GISDATA projects (1A and 1B) carried out by the European Forum for Geography and Statistics (EFGS; a voluntary cooperation between national statistical agencies) in close cooperation with Eurostat. These projects have established the methodology implemented by the NSAs and used for a grid-based approach to the handling of European population data. The third part is a set of case studies using CLC data from the province of Kymenlaakso in south-eastern Finland. Finnish national High spatial and thematic Resolution CORINE Land Cover (HR CLC) is used as an example of a detailed national data source, while pan-European CLC data for the same area is used for comparison. The case studies are carried out with a 100 m and a 1 km grid to allow an appraisal of the differences in grid resolution. The first case study (Chapter 3) applies a grid approach for statistical purposes. The first theme chosen for the study is simply to compare the pan-European CLC and the Finnish national HR CLC, showing how the grid approach can be used to avoid the statistical bias which, due to the effect of cartographic generalization, is implied in a vector approach. A second theme is a descriptive comparison of grid resolution with 100 m and 1 km grids used as examples. A third theme is the relationship between land cover and population data. The second case study (Chapter 4) is a study of the impact of the grid approach on the documentation of land cover changes. Statistics based on multi-temporal comparison of grids document gross changes, while net changes may be hidden. The changes in forest area are used as an example. The third case study (Chapter 5) addresses cartographic visualization. The challenge is not so much the visual appearance of the regular grid. This format can be unfamiliar to the user, but less so when the grid cell is small due to the cartographic scale. The fact that the grid is characterized by many attributes and not simply classified does open new opportunities for cartographic visualization and the study is aiming to demonstrate some of these techniques. Finally the conclusions and recommendations from the study are found in chapter 6. The report also includes three annexes. One provides additional information about the GISDATA projects and the other two contain results that was prepared for, but not used in, the case study of a grid approach applied for statistical purposes (Case study 1). Task 4.2 – Develop and assess the grid approach Page 6 of 91 1 THE GRID APPROACH 1.1 GENERAL AND OVERALL EXPLANATION OF THE GRID APPROACH The textbooks usually differentiate between “raster GIS” and “vector GIS” (Couclelis 1992, Congalton 1997). The “vector GIS” does, like the paper map, attempt to represent individual objects that can be identified, measured, classified and digitized. The “raster GIS”, on the other hand, build closely on the structure of remote sensing imagery and has, as a methodology, acted as a bridge between GIS and remote sensing. The “raster GIS” attempts to represent continuous spatial fields approximated by a partition of the land into small spatial units with uniform size and shape. The “raster GIS” therefore have structural properties that make them particularly suitable for use in modelling exercises. The interpretation and information content of the elementary unit of the raster model – the pixel – is, however, problematic, as discussed by Fisher (1997). A grid is a spatial model falling somewhere between the vector model and the raster model. An informal explanation is that the grid model is a vector model with the visual appearance of a raster. The grid is a spatial partition of the Earth’s surface into non-overlapping and non-empty regular parts. Similar to the raster model, the grid consists of square spatial units (“cells”) of uniform size and shape. Unlike the raster-model, the spatial units in the grid are internally heterogeneous, with the internal composition of the grid cell represented as an attribute vector. The grid looks like a raster. The difference is the information attached to the grid cell. A raster cell (pixel) is classified to a particular land cover class. The grid cell, on the other hand, is characterized by how much it contains of each land cover class The grid is, as a result, a more information-rich data model than the raster. Figure 1.1: CLC with a 1 km raster/grid superimposed (top) illustrating the difference between encoding a particular unit as raster pixel (centre) or a grid cell (bottom). “daa” is a Norwegian unit: 10 daa = 1 hectare. The difference between the raster and the grid is illustrated in Figure 1.1, using an example based on CLC. CLC is originally a vector data set but can be converted into a new data set with uniform spatial units of e.g. 1 X 1 kilometres. The result can either be a “raster” where a single CLC class is assigned to each pixel (representing for example the dominant land cover class Task 4.2 – Develop and assess the grid approach Page 7 of 91 inside the pixel; or the CLC class found at the centre of the pixel unit). Alternatively, the result is a “grid” where an attribute vector representing the complete composition of land cover classes inside the unit is assigned to each grid cell. Spatial information on a more detailed scale than the size of the spatial unit is in both cases lost, but the overall loss of information is less in the grid than in the raster. A grid data model is not the result of mapping in any traditional sense. The grid is “populated” with information from (one or more) existing sources. This information could include, but is not restricted to information about land use and land cover (LU/LC). One method frequently used to populate grids with LU/LC information is to calculate spatial coverage of each LC/LU class based on a geometric overlay between the grid and detailed polygon datasets. The assignment of attributes can furthermore involve counting or statistical processing of register data for observations made inside each grid cell (for instance, not only the area of a class, but also the number of objects it represented). Ancillary databases provide important and relevant information about the content and context of the grid cells. For instance, the number of buildings, length of roads or number of people found within the grid cells. 1.2 LITERATURE REVIEW Gridded LU/LC data are particularly suited for use in modelling exercises due to the combination of a simple and regular geometry with a detailed description of the land composition within each grid unit. The approach is in particular useful for global or other widearea studies, using large (30” or larger) grid units (Wen and Tateishi 2001). Traditional mapping at this scale is difficult due to the extent of the area, while a pure raster approach with each pixel assigned to a single class, in these situations is inadequate for representing the variation of the land surface (Pierce and Running 1995). Meteorologists use statistical representation of multiple land surfaces within a grid cell to parameterize the effects of sub-grid heterogeneity for land-atmosphere energy exchange (Bonan et al. 1993). Schlünzen and Katzfey (2003) aggregated CLC data for north western Europe into a 30” grid to integrate land use and meteorological data to examine sub-grid-scale land use effects on meteorological and climate models and thereby produced a more accurate simulation. Equally, soil scientists have used a grid approach to model the global soil vulnerability to water erosion (Batjes 1996) and hydrologists have used the approach for water resource management (Smith et al. 2010). Among biological studies, Thuiller et al. (2004) created a 10’ grid for Europe populated with land cover data from a 1 km resolution map to assess the influence of land cover and climate on species distributions across Europe. Grid models populated with land cover composition as well as a host of other data were used by Mehr et al. (2011) in a study of species richness for bats. Similarly Chiatante et al. (2014) used gridded data in their study of threatened bird species in Mediterranean farmland landscapes. The landscape itself has also been modelled as a populated grid, e.g. by Strand (2011) who used such a model to asses uncertainty in delineation of landscape types. Modelling and simulation is an integral part of many climate change studies. It is therefore no surprise that the data rich grid model is often employed in such studies (e.g. Gibbard et al. 2005; van Asselen and Verburg 2012; Letourneau et al. 2012). Task 4.2 – Develop and assess the grid approach 1.3 Page 8 of 91 THE GRID APPROACH IN EUROPEAN LAND MONITORING The grid approach in LU/LC mapping was examined by the FP7 project Harmonised European Land Monitoring (HELM; Ben-Asher 2013). The project compiled a non-exhaustive overview of the grid approach in European land monitoring. The main findings are summarized below. EEA periodically converts the CLC dataset into a raster to carry out analysis. Iceland uses grids for various monitoring activities like forest (woodland and plantations), soil, plants, birds and LULUCF but there is not a common grid and the different initiatives usually cover only part of the country. The Netherlands have a gridded land cover data base. The origin was pixels (25 m x 25 m), due to the use of satellite imagery, but the spatial units/cells are now fixed. Strictly speaking, these cells are raster cells as the information attached to each raster cell is just one LU/LC class value. Although they are populated with data from all different data sources but on basis of a kind of query land cover classes are attributed to each cell. There is currently no database connected to the «raster/grid IDs». Norway has developed a standardized national grid (Strand & Bloch 2009) with flexible spatial resolution. Statistics Norway is publishing population data for the 1 km grid. NIBIO (previously the Norwegian Forest and Landscape Institute) provides land cover and land resource data for the 5 km and 1 km grids. Austria: Have started research on grid mapping with various types of data, grid cells 200 – 2000 m. Finland (Finnish Environment Institute) produces its national CLC data as a 25 m raster classification. There are also other information systems like the urban structure monitoring system (YKR) which utilizes a grid approach, at least partially. YKR is an information system for land use experts, making it possible to monitor long-term changes of urban areas as well as to carry out analysis and research on urban areas. YKR contains information about people, housing including summer cottages, employment and land cover and land use in a 250 m grid. The former Finnish Forest Research Institute (METLA) – now part of the newly established LUKE institute - provides information concerning forests in a grid format. The National Land Survey has produced SLICES land use data using 25 and 10 m grid size for the years 1999, 2005 and 2010. Switzerland uses a 25 m grid (due to satellite images) for forest mixture classes. Starting in 2012 there will also be a 5 m raster representing tree type (coniferous/deciduous) with updating foreseen every three years, according to the availability of new aerial images. The national area statistics uses a 100 m raster measured at the centre point, with 72 LU/LC classes. Population (census) is available on a 100 m grid if more than a minimum number of persons are living there. EFGS (European Forum for GeoStatistics) is a voluntary cooperation between several National Statistical Institutions (NSIs) across Europe (and recently spreading to other parts of the world as well) that have started using grids for integration and harmonization of spatial data. Eurostat is informally involved in this cooperation. The steering committee consists of the national contact points from the NSIs that participate in the European Statistical System (ESS) project GEOSTAT (see chapter 2 for a comprehensive summary of the GEOSTAT projects). The System of Environmental-Economic Accounting is an internationally agreed standard for harmonized statistics on the environment and its relationship with the economy, organized by the United Nations Statistics Division (UNStats). With respect to environmental-economic accounting, the grid cell reportedly turns out to be a fairly common statistical unit for spatially referenced environmental data, including land cover and land use, used in the national accounting systems (Vardon et al. 2011). The European Spatial Planning Observation Network (ESPON) is an EU program aiming to support the “reinforcement of regional policy with studies, data and observation of development trends” (European Commission 2007). Potential and actual use of grids in a spatial planning context is described in an ESPON report addressing the modifiable areal unit problem (Anonymous, no date) and gridded data structures are given renewed attention in recent results from projects organized under ESPON (e.g. Milego and Ramos, 2011). There is furthermore a movement towards increased use of grids in the landscape research community, linked to the concepts landscape character assessment and the development of a European landscape classification system. The European Landscape Character Assessment Task 4.2 – Develop and assess the grid approach Page 9 of 91 Initiative (ELCAI) running in the period 2003 – 2005 documented a widespread use of grids in national landscape descriptions throughout Europe (Wascher 2005) and led to the development of a proposed pan-European landscape description based on a grid approach (Mücher et al. 2010). 1.4 POPULATING GRIDS – DIFFERENT METHODS As described in section 1.1, the grid approach is dealing with a spatial model somewhere in between the vector model and the raster model. The grid as a spatial model consists of square spatial units of uniform size and shape that can have an (internally) heterogeneous composition, i.e. the grids are “geometrically uniform polygons”. Furthermore, each class in a raster model corresponds to an attribute in a grid model. A grid cell is thus capable to carry more information than a raster cell. Each grid has a unique ID that is related to a database containing the attribute data. Grids can be populated with spatial data in numerous ways. However, the different methods to populate grids depend on 1) the data (source) type; and 2) the purpose of the model. As possible spatial data sources we make a distinction between vector data (point, line or polygon) and raster data that are both georeferenced or so called spatial explicit. An overlay between the grid model and different data sources makes it possible to populate the grids with a wide variety of information. Each class in a raster/vector data model corresponds to an attribute in a grid model. On the basis of the information that the populated grids contain a new spatial classifications / representations can be made. VECTOR DATA Vector data give a representation of the world using points, lines, and polygons. Vector models are useful for storing data that has discrete boundaries, such as country borders, land parcels, and streets (ESRI dictionary1). Examples of vector data (polygons) in the context of Copernicus are the original CLC and Urban Atlas (http://land.copernicus.eu/) datasets. For lines and point data Copernicus does not have any specific examples. Road and railways are examples of the vector line data type. Sample data such as census returns or soil cores are examples of point data. The population of grids with vector data follows the same methodology for point, line and polygon data. After an overlay operation, a frequency is calculated and the resulting table is joined with the grid data set. Points can be assigned directly to grid-polygons, but the overlay operation needs to split up line and polygon objects into parts that can be assigned to unique grid-polygons. The geometry of the the grids need to be maintained throughout the operation. Each point, line or polygon is attributed with the ID of the grid cell with which it intersects and a frequency is calculated per grid cell, which means that the per grid cell the number of points, the length of lines or the area of polygons for each class per grid cell is presented as a table. The table is joined with the grid dataset (a “spatial join” operation). Additionally various statistical operations on the newly joined table can be performed (%, majority, min, max, etc.). RASTER DATA Raster data make a representation of the world as a surface divided into a regular grid of cells. Raster models are useful for storing data that varies continuously, as in an aerial photograph, a satellite image, a surface of chemical concentrations, or an elevation surface (ESRI dictionary2). Examples of raster data in the context of Copernicus are different HRLs, EU-DEM and IMAGE 2000, 2006, 2009 and 2012 (http://land.copernicus.eu/) datasets. The population of grids with information from raster data sets can follow to alternative routes. The first alternative is used when the difference between the grid cell size and the raster pixel size is small. In this case, the grid can be treated like a raster and populated using raster resampling methodology. The second alternative is used when the raster pixel size is considerably smaller than the grid cell size. In this case, the raster is treated like a point data set and summaries (or other relevant statistics) based on the pixels falling inside each grid cell is used to populate the grid cell. Population of grids also depends on the purpose of the grid model. Is the grid model represented as a sample point (usually the centre point of grid cell) or as a polygon? Grids represented as a sample point can be used to assign data from existing, discrete polygon maps to a grid. The method is relatively easy and fast to implement and does not need much 1 http://support.esri.com/en/knowledgebase/GISDictionary/term/vector%20data%20model 2 http://support.esri.com/en/knowledgebase/GISDictionary/term/raster%20data%20model Task 4.2 – Develop and assess the grid approach Page 10 of 91 computer power. A disadvantage is that a grid cell can extend over several different polygons and the method only assigns data from the polygon that intersects with the sample point chosen to represent the grid (Strand & Bloch, 2009). Taking the grid as a polygon and intersecting it with another vector (polygon map) data source overcomes this disadvantage. Area measurements of the different polygons within each grid cell are stored in this approach. However, the disadvantages of this method are the complexity to implement and the amount of computer power needed. Another special grid model is the Dutch VIsueel Ruimtelijk Informatie Systeem (VIRIS). The VIRIS dataset is a geodatabase with 54 different raster datasets with a spatial resolution of 25 m. The raster datasets are produced by rasterizing the Dutch Topographical dataset on attribute information (TDN code). The datasets have three attributes for each fixed grid: pixel value, OBJECT ID and COUNT (Clement et al, in prep). The pixel value represents the surface area (polygon), length (line) and number (point) of classes of the topographical data base of the Netherlands. Each class is represented by a one raster dataset. The raster datasets are not integrated into one grid with corresponding attribute table. Each raster dataset has the same size, location and corresponding IDs of its raster cells. Figure 1.2: VIRIS 2.0 dataset for the year 2010. Forest surface area is represented in a white-black colour gradation (black = no forest; white is 625 m2 forest). Red – green colour indicates the urban surface area of each grid. Figure 1.2 shows an example of the VIRIS data set. Two out of the 54 VIRIS raster datasets (urban development and deciduous forest) with same size (25 m spatial resolution, i.e. 625 m2) and location are presented as an overlay. The surface area urban development is presented in colours (red = 625 m2 and green = 1 m2) and the deciduous forest surface area is presented in black (0 m2) and white (625 m2). The indicated raster cell has 65 m2 of urban development and 396 m2 of deciduous forest. The remaining surface area is occupied by other classes not shown in the figure The raster datasets are kept separate for specific user requirements; however they could be easily integrated in one grid model as the raster cells of each raster dataset have common IDs. The VIRIS 2.0 geodatabase contains raster datasets for the years 1996, 2000, 2004, 2008 and 2010 making it possible to monitor changes between years for specific classes. EXAMPLES Examples on how to populate a grid model with vector data: Task 4.2 – Develop and assess the grid approach 1. 2. 3. 4. Page 11 of 91 Simplistic models like (centre) point sampling Counting Area measurement Area measurement raster datasets Ad 1: The grid model is handled as points and data is assigned to these points from an existing, discrete polygon map. The sampling points representing the grid cells are often the centre points of the grid cells, but any point could be used. The point data set is intersected with a polygon map (CORINE Land Cover, soil map, etc.). The land cover class or soil type at each specific sample point is transferred to the attribute table of the point data set through a spatial join (see above for details on advantage and disadvantages of methodology). The 100 meter CLC raster used for analytical purposes by EEA can be thought of as a simplistic grid model populated from the vector CLC by this approach. Figure 1.3 Norwegian point data on barns and cowsheds superimposed on the Norwegian SSB5KM grid database together with an excerpt of the building table for the Eidskog municipality. The lower table shows the results of a spatial join where each building (one row in the table) has been assigned to a grid-cell (attribute: SSBID). Aggregating the table on SSBID will give a summary (number of barns) for each grid cell3. Ad 2: The counting methodology often deals with spatial explicit point data. Though polygon data represented as (or converted into) point data can also be a data source used to populate grid cells. Counting the number of polygons or classes within a grid can be used as an indication of the diversity of the grid cell. Even counting raster cells with a specific class falling within a grid cell is a possibility to populate the grid model. For example, the number of raster cells with a 100% tree cover density (from the HRL Tree Cover Density) value per grid cell. An example of the counting methodology is the combination of the Norwegian point data set (here restricted to barns and cowsheds of the Norwegian population register) with the grid model. In Figure 1.3 the spatial explicit point data is combined with the grid model SSB5KM. The spatial join of the two data sets produces a table containing the information from the point dataset linked with the grid cell where the point is located (Figure 1.3 bottom)3. 3 Figures 1.3, 1.4 and 1.5 are taken in slightly adapted form from Strand & Bloch (2009) Task 4.2 – Develop and assess the grid approach Page 12 of 91 Figure 1.4. Visual representation of the mean floor area of barns and cowsheds in each grid cell together with a table containing the mean floor area and the number of barns and cowsheds per SSB5KM grid cell for the Eidskog municipality (darker = higher average)3. By counting the number of barns and cowsheds within each grid cell a new attribute can be attached to the grid model SSB5KM that can be used for further (statistical) analysis on the attributes of the cases counted (e.g. mean floor area barns and cowsheds per grid cell on basis of column Areal_f (in Figure 1.3) and visualization (Figure 1.4). The attribute table belonging to the grid data model is extended with an attribute after each analysis. Ad 3. The area measurement example deals with assigning data from an existing, discrete polygon map to a grid. It measures the complete area extent of each polygon within each grid cell. In the example one single SSB5KM grid cell is superimposed on the Norwegian land resource map of Eidskog. The thematic polygons are clipped along the grid cell and each land resource polygon is attributed with a grid cell ID (ArcGIS Identity (or Union) operation). The result is a new polygon dataset with (small) polygons having a unique reference to a single grid cell and a single land resource polygon. A frequency is calculated and presented as a table in which for each grid cell (rows) the area per land resource class (a column per land resource class) is presented (Figure 1.5)3. Ad 4. Two examples are presented on how to populate a grid model with raster data. The first example is the HRL Imperviousness raster layer (20 x 20 m) combined with a standard grid of 1 x 1 km (Figure 1.6). The HRL Imperviousness shows per raster cell the percentage (%) of imperviousness. In the second example the Dutch national land cover dataset LGN7 (Landelijk Grondgebruik Nederland) is combined with the same standard grid. LGN7 is a raster dataset containing 39 LU/LC classes at a spatial resolution of 25 m (reference year 2012). The standard grid of 1 x 1 km is rasterized on the grid ID attribute into 20 m. After a tabulate area, for each 1 x 1 km grid the area of each HRL Imperviousness class is presented in a table. The table is joined with the original grid. Figure 1.6 shows the standard grid of 1 x 1 km superimposed on the HRL Imperviousness together with an excerpt of a table containing the areas of each imperviousness class per 1 x 1 km grid. Figure 1.7 shows the standard 1 x 1 km grid superimposed on the LGN7 dataset. The LGN7 dataset is resampled to 20 x 20 m raster cells. The corresponding table is the result of the spatial join of the rasterized grid and the LGN7 dataset. The table shows the area of each LGN7 class for the 36 grid cells á 1 km2. Both tables of Figure 1.6 and Figure 1.7 can be combined to show the surface area of the imperviousness classes (%) and the surface area of LGN classes per grid cell in one table. Task 4.2 – Develop and assess the grid approach Page 13 of 91 Figure 1.5. Close-up of SSB5KM grid superimposed on land resource map of Eidskog with a table containing surface areas for the different land resource classes per grid cell3.Each row in the table corresponds to one grid cell in the map. SSBID is the unique id of the grid cell. A10 through A80 corresponds to land cover categories (A10 is built-up land, A20 agricultural land etc.) and the attribute is the acreage of the land cover category in square meter. P10 through P80 represent the land cover coverage as percent. Task 4.2 – Develop and assess the grid approach Page 14 of 91 Figure 1.6. The 1 x 1 km standard grid superimposed on the HRL Imperviousness for an area within the province of Utrecht, the Netherlands (36 grid cells á 1 km2). The table is showing an excerpt of the result of the area calculation per class (percentage of Imperviousness) for each of the 36 grid cells. Task 4.2 – Develop and assess the grid approach Page 15 of 91 Figure 1.7 The 1 x 1 km grid superimposed on the LGN7 dataset with an excerpt of the table resulting from joining the 1 x 1 km grid rasterized into 20 x 20 m raster cells with the LGN7 raster dataset. Task 4.2 – Develop and assess the grid approach 1.5 Page 16 of 91 GRIDS AND CLC CLC is the de facto standard for pan-European land monitoring. Any grid approach to land monitoring therefore has to be discussed in the context of CLC. Important issues are information content compared to CLC, backward compatibility and the analytical opportunities with respect to the indicators required by EEA. SCALE AND RESOLUTION It is hardly possible to imagine a spatial data model independent of scale and resolution. CLC is a product closely linked to a particular cartographic scale in terms of minimal mapping unit (MMU). The typical cartographic scale is 1:100 000 and smaller and the smallest mapping unit size is 25 ha. The MMU corresponds to a 500 x 500 meter grid cell. Data from a study area in Finland (see also chapters 3 - 5 of this report) showed that less than 25 % of the 1 x 1 km grid cells were completely embedded inside a single CLC polygon (i.e. contained a single CLC class). The typical (most frequent) number of CLC classes in a 1 x 1 km grid cell (when CLC data was fetched from the pan European CLC map) was three. When national HR CLC data (20 meter pixel size) was used to populate the grid, the typical number of classes in each 1 x 1 km grid cell was eight. Figure 1.8: Distribution of 1 x 1 km grid cells with respect to a) number of CLC classes from the panEuropean map [left]; and b) number of CLC classes from the national HR map [right]. Figure 1.8 illustrates the generalizing effect of the current CLC approach. The complex LULC composition of each 1 km2 grid cell, shown when the grid cells are populated directly from HR CLC, is lost when the grid cells are populated from the pan-European CLC. The figure also shows that the 1 km2 grid and the pan-European CLC are fairly similar in terms of geometrical resolution when the size of the features is considered. When the resolution of the polygons used to populate the grid is more detailed than grid, the graph appear as symmetric, like the right hand graph in Figure 1.8. On the other hand, the graph will appear as right-skewed (large bars close to zero and a right-hand tail of short bars) when the resolution of the polygons used to populate the grid is less detailed than the grid. When the resolution is similar, most grid cells will cut across two or three polygon boundaries (due to the systematic partition) and each grid cell ends up being populated with two to four classes. A 1 km2 grid populated with data from the pan-European CLC does not make sense. It provides the same statistical information as the vector data set framed in a less precise geometry, although with a spatial resolution not too different from the pan-European CLC. The question is therefore whether a 1 km2 grid populated from HR data sources represents an improvement over the current CLC product. The statistical information under such an approach is more detailed and less biased, and the spatial resolutions sufficiently close to the pan-European CLC to allow meaningful visualisation at small cartographic scales (large regions, country level and European maps). It is also possible to apply more detailed grids: 500, 250 or 100 m are all relevant options. In this study, we have concentrated on 100 m grid resolution, since this is the resolution used in the raster datasets produced from the pan-European CLC and HRLs by EEA for analytical purposes. Task 4.2 – Develop and assess the grid approach Page 17 of 91 The 100 m resolution grid does, as expected, contain fewer classes per grid cell (Figure 1.9) than the 1 km2 grid since the finer spatial resolution allows less pixels inside each grid cell. Most (around 75%) of the 100 m grid cells fall entirely inside a single pan-European CLC object (Figure 1.9 left). Also more than 40% of the grid cells in our Finnish study area contained a single CLC class when populated from the national HR CLC, while another 40 % in this case contained 2-3 classes. Figure 1.9: Distribution of 100 m grid cells with respect to a) number of CLC classes from the panEuropean map [left]; and b) number of CLC classes from the national HR map [right]. As the pan-European CLC data sets are normally converted to a 100 m raster for analysis, a 100 m grid resolution may seem most appropriate for a possible grid-based approach. It is, however, important to take into account the basic difference between a raster and a grid approach. The raster approach is limited to a single class per pixel. This is usually the CLC class found at the centre (or another point location) of the pixel. The grid is, on the other hand, encoded with the complete and exact statistical distribution of land cover classes inside each grid cell. A 100 m grid is therefore much more information-rich than a 100 m raster. An analysis of the Norwegian CLC2012 dataset revealed that the mean polygon size (when ocean was excluded) was 3.17 km2. This figure is, even when ocean is excluded, influenced by a skewed distribution due to a few large lakes and glaciers, and a number of large and homogeneous forest and mountain areas. 70 % of the polygons in the vector data set were less than 1 km2 in area. The average size of these smaller polygons was 0.48 km2. A 1 km grid does therefore represent a certain degree of simplification with respect to geometrical resolution when compared to vector CLC. The 100 m grid, on the other hand, has a finer spatial resolution than the vector CLC but does still represent a simplification with respect to geometrical shape. However, a 1 km grid resolution may be sufficient to maintain a degree of detail which is at least compatible with the vector CLC for pan-European, regional and national studies since the loss in geometrical detail is compensated by the gain in details and accuracy with respect to the characterization of the spatial units. CLASSIFICATION SYSTEM CLC is using a three level hierarchical classification system with 44 classes at the most detailed level. Many of the classes are both scale- and source-dependent, i.e. they are defined for manual interpretation of satellite imagery at a certain cartographic scale. One consequence of this approach is that some classes may be defined in terms of a multitude of elements. Airports, for example, typically consist of a mixture of buildings, paved runways, roads, parking lots, storage areas and lawns. The common feature for the area is the land use, not the land cover. CLC as a classification system is also scale dependent. It entails “mixed classes”. Mixed classes include, in particular, the CLC classes 242 “Complex cultivation patterns” and 243 “Land principally occupied by agriculture, with significant areas of natural vegetation”. These classes are artefacts of the cartographic scale, and not meaningful in a grid approach where data are available on a more detailed level. A grid approach would typically populate and characterize the grid cells with information about the exact amount of both cultivated and natural land. The fact that the basic types are mixed inside a grid cell can be retrieved by analysis but is not itself “a class”. Less obvious examples of mixed classes are the built-up classes 111 Continuous urban fabric and 112 Discontinuous urban fabric. Both will, at the micro-scale, be a mixture of buildings, Task 4.2 – Develop and assess the grid approach Page 18 of 91 roads, other sealed surfaces and unsealed areas. The classification as 111 or 112 depends on the composition of these factors within larger areas. The solution for a CLC compatible grid approach can be to use a detailed land cover map recoded to CLC classes and populate the grid by overlay (as described in chapter 1.4 above). This is the approach used in the case studies below, where the Finnish national HR CLC is used to populate the grids. An alternative approach is to populate the grid with counts and measurements of more elementary features (sealed area, number of buildings, size of buildings etc.) and use data models to derive information compatible with the CLC class definitions for analytical purposes. The main difference between the grid approach and other (raster and vector approaches) is that the need for a classification system is relaxed since “characterization” for many purposes can replace “classification”. The explanation is that there are, fundamentally, two different approaches to land cover and land use mapping: Classification and characterization. Classification is the approach where spatial units are assigned to categories in a pre-defined classification system. The method is well-known from CLC and many national land monitoring systems. Characterization, on the other hand, is the approach where the spatial units are described using a set of independent diagnostic criteria (Di Gregorio and Jansen 2000, Gomarasca 2009). Characterization entails that a spatial unit (the grid cell in our case), instead of being classified, is described by a number of characteristics. These characteristics may, of course, themselves be derived from a classification system (as in the case studies in chapters 3 – 5 below where the grid cells are characterized by a mixed composition of CLC classes), but can also consist of counts (number of buildings, meters of road, number of inhabitants) or other measurements. The grid approach allows flexible classification opportunities after data has been collected and stored against the grid cell. HARMONIZED DATA CLC is a harmonized approach to land monitoring in the sense that it applies a standardized classification system throughout Europe. A grid approach will collect information from existing maps and registers. The standardization of information retrieved from these registers is an essential prerequisite for the grid approach. The issue has not been explored further in the current study, but may possibly be approached using the EAGLE semantic model. ANALYTICAL OPPORTUNITIES WITH RESPECT TO CLC The regular CORINE land cover (CLC) inventories proved to be an essential source for monitoring land cover (LC) and land use (LU) change in Europe. Over the past 30 years, national teams from 39 European countries who are member of the European Environmental Information and Observation Network (EIONET) are working closely together with the EEA and in particular with the CLC technical unit to monitor changes within the European landscape. CLC is recognised as a well-established, reliable information source to support environment policies and beyond. Its use is indispensable in several policies (e.g. no land take by 2050 as targeted in the 7th Environment Action Programme of the European Union).Within the EEA CLC is used in many different EU wide applications: • • • • • • • • • Analysis of different land cover types over Europe Monitoring and trend analysis of land cover changes in Europe in 1990-2012 Landscape fragmentation in Europe Ecosystem mapping and assessment Ecosystem service mapping High nature value farmland and the Common Agricultural Policy Land and ecosystem natural capital accounts Land use and scenario modelling for Integrated Sustainability Assessment Nutrient accounts Soil functionalities However, more spatially and thematically detailed monitoring of land cover and its changes is needed for EEA analysis (Copernicus activities next to CLC). The requirement can be to have more thematic and spatial detail (for specific geographic area like urban areas (Urban Atlas), riparian zones, and Natura2000 sites) or higher temporal frequencies and/or more thematic detail (for thematic areas like imperviousness, forest, grassland, wetlands and water). EXAMPLE Task 4.2 – Develop and assess the grid approach Page 19 of 91 The Land and Ecosystem Accounts (LEAC4 – see Figure 1.10) is a practical example of the grid approach. Many CLC based indicators, like land take indicator (CSI 014/LSI 001)5 calculation is based on LEAC method as described in the EEA technical report (No 11/2006). Figure 1.10: Overview of the LEAC method The core data of the LEAC project have been structured in a relational database model to allow quick and easy analyses. These databases have been made publicly accessible through the Internet. From the LEAC database, various geographical layers have been derived such as land cover flows, Corilis, the green potential background layer and the dominant land-cover types. The LEAC method uses a 1 km resolution grid (called LEAC cubes) as an interface between land cover data and reporting units. Reporting units (e.g. country areas) are rasterized according to the 1 km resolution EEA reference grid6, land cover flows are calculated based on 100 m resolution raster CLC change data, summarized first for the 1 km grid units, and summarized in a second step for the reporting units. 1.6 SPECIAL ISSUES In this subchapter, we discuss a range of special issues connected to the possible use of a gridbased approach in European land monitoring. These include the handling of national vs a common European cartographic projection; border areas between countries; representation of land cover or land use change; visualization; and the amount of data in the pan-European database. PROJECTIONS Current grid projects across Europe are mostly carried out with data in national projections. Pan-European monitoring following a grid approach will have to use a common projection across the entire continent. The solution is most likely to follow the INSPIRE Specification on Geographical Grid Systems (TWG-RS 2009). Key elements of the INSPIRE specification include that it is: 4 http://www.eea.europa.eu/themes/landuse/interactive/land-and-ecosystem-accounting-leac http://www.eea.europa.eu/data-and-maps/indicators/land-take-2 6 http://www.eionet.europa.eu/gis/docs/eea_reference_grid_v1.pdf 5 Task 4.2 – Develop and assess the grid approach Page 20 of 91 • Based on the ETRS89 Lambert Azimuthal Equal Area coordinate reference system with the centre of the projection at the point 52° N, 10° E and setting a false northing: Y0 = 3210000 m and false easting: X0 = 4321000 m. • Identified in INSPIRE as ETRS89-LAEA. • Designated as Grid_ETRS89-LAEA. For identification of an individual resolution level the cell size in meters is appended to the name, such that the grid at a resolution level of 100 km is designated as Grid_ETRS89-LAEA_100K. • Hierarchical in metric coordinates to the power of 10. The resolution of the grid is 1 m, 10 m, 100 m, 1 000 m, 10 000 m, 100 000 m. The cell size is denoted in metres (“m”) for cell sizes up to 100 m or kilometres (“km”) for cell sizes from 1 000 m and above. • Using the lower left corner of the grid cell as reference point • Grid cells are identified by a cell code composed of the size of the cell and the coordinates of the lower left cell corner in ETRS89-LAEA. Values for northing and easting shall be divided by 10n, where n is the number of trailing zeros in the cell size value. Example: The cell code “1kmN2599E4695“ identifies the 1km grid cell with coordinates of the lower left corner: Y=2599000m, X=4695000m. Figure 1.11: ETRS89-LAEA 1Km grid projected in the Norwegian national projection UTM33. The originally square grid units appear here as a rhombi. An ETRS89-LAEA grid will not appear as a regular grid when it is drawn on a map in a national projection (unless ETRS89-LAEA is the national projection). The size of the units will also be changed slightly and depart from the originally uniform size and shape. This is illustrated in Figure 1.11 where an ETRS89-LAEA 1 km grid is superimposed on a map in the Norwegian national projection ETRS89 UTM-33. The grid cells here appear as rhombi. This is not a problem as long as the grid cells are represented as polygons. A possible and feasible solution to the challenges related to map projections is to establish the grid with regular units as polygon data in ETRS89-LAEA and attach a unique identifier (label) to each grid cell (as described in the INSPIRE specification TWG-RS 2009). The subset of the grid that intersects a particular country (or study area) is selected and converted to a polygon data set in the national projection. This polygon data set can be populated with data from national maps and registers using methods described in chapter 1.4 above. The result is a polygon data set with an attached attribute table in the form of an ordinary data table where the grid cells are represented as rows and attributes as columns. The attribute table should include the grid cell identifier. Due to the different size of the grid cell under different projections, the attribute table will have to be post-processed for data that are sensitive to these differences. The solution is probably to handle all areal figures as fractions of the cell size instead of as absolute figures. Frequency data (e.g. buildings or population) are not affected by the projection and can simply be summarized. Linear data (length of roads, coastline, creeks etc.) may have to be scaled, using a scale factor based on the total area of the grid cell calculated in the two projection systems involved. The delivery of national data in a system as described here simply amounts to uploading the data table – not the grid map – to a central processing unit. The data table contains all the data compiled for each grid cell, and the unique grid cell identifier ensures that the data table can be linked directly to the original, untransformed ERTS89-LAEA grid. BORDER AREAS Task 4.2 – Develop and assess the grid approach Page 21 of 91 Border areas between two countries or two study regions constitute a particular challenge when the grid approach is used for pan-European monitoring (Figure 1.12). Border areas will be covered by grid cells populated independently by several national data providers. The merging of these data into a central repository must be undertaken by a central, coordinating institution. To facilitate the merging, each national data provider must also include grid cells partially overlapping their own territory. Areal data can be encoded as relative coverage (percent of the whole grid cell) to minimize the possible effect of different projections used in the countries involved. Figure 1.12: Border areas. The grid cell surrounded by a thick borderline includes three countries: Norway (here with topographic background map), Finland (most of the white and violet areas) and Sweden (a small part of the violet area in the lower left corner of the grid cell. LAND COVER CHANGES Changes and change data constitute a particular challenge under the grid approach. The employment of the grid cell as the basic unit of observation results in the loss of information about gross changes. This requires an explanation of the difference between gross changes and net changes. Net change is the calculated difference in area between two specific points in time while gross change is the total area modified during this period. One problem linked to the grid approach is that information is restricted to the net changes. The net changes may actually hide more complex (and interesting) patterns of gross changes. The difference between gross and net changes is exemplified in Figure 1.13. The example is a single grid cell composed of 25 hectare Built-up land, 25 hectare Agriculture and 50 hectare Forest (left). After some years the content has changed to 37.5 hectare Built-up land, 25 hectare agriculture and 37.5 hectare forest. The net change is that the coverage of Built-up land has increased 12.5 hectare and the coverage of Forest has decreased 12.5 hectare. The gross change is not known at the grid cell level. One possible scenario is that 12.5 hectare Forest land was developed into Built-up land (center). Another possibility is that 12.5 hectare Agricultural land was developed into Built-up land, while 12.5 hectare Forest land was cleared and developed as new Agricultural land (right). Figure 1.13: Gross and net changes. The example is a grid cell composed of Built-up land, Agriculture and Forest (left). Two different development scenarios (center and right) result in the same net changes, although the gross changes are clearly different. Task 4.2 – Develop and assess the grid approach Page 22 of 91 The solution, unless the limitation regarding calculation of change information from grid data is accepted, is to populate the grid with change data in addition to status data. The changes are in this case identified using the more detailed data source and each particular change class (e.g. change from status A into status B) is handled as a separate attribute in the grid model. Figure 1.14 shows an example from the Norwegian land resource map where all changes are logged. Each change is stored as a polygon with LU/LC class before and after the change (in addition to area and date). Based on this data source, it is possible to calculate the area affected by each (or selected) change type(s) within the grid cells. Figure 1.14: Detailed land resource map (Norway) where changes are logged when the map is updated. Each colour represents a particular kind of change. It is possible under a grid approach to calculate the total area of each kind of change for each grid cell. LAND COVER TIME-SERIES A drawback of the vector based CLC-change mapping method is, that changes are delineated with respect to two status years. As a consequence unnecessary or contradictory changes may occur if comparing two CLC-change datasets. As an example, two European CORINE Land Cover Change (CLC-Change) datasets were produced during the CLC2000 and CLC2006 campaigns: • • CLC-Change(1990-2000); CLC-Change(2000-2006) As the second CLC change mapping (in CLC2006) was not harmonized with the results of the first CLC change mapping (in CLC2000), geometric and thematic contradictions might occur between the two CLC change maps. Extra efforts have been taken to harmonize the above two CLC-change datasets7. The methodology is clearly based on a 100 m raster, because due the 5 ha MMU and generalization rules it was impossible to follow change delineations over more than two reference dates, while it was easy to decide to which CLC class a 100 m cell does belong in a certain reference year. As a conclusion, CLC changes between two dates may be interpreted in irregular shape polygon format, but a CLC time-series are better to be created and interpreted in a raster / grid approach. DATA SIZE The amount of data generated by a grid approach depends on two factors: the grid resolution and the number of attributes used for characterization. Key figures for Norway (with a land area of 323 758 km2) is that a 5 km grid consists of 20 747 grid cells when the grid is limited to the land area (including inland water, but not extending into the ocean - see example in Figure 1.15 below). A higher resolution 1 km grid results in approximately 25 times as many grid cells (533 918 in the Norwegian dataset) while a 100 m grid will have around 50 million grid cells in Norway alone. 7 Development of methodology to eliminate contradictions between CLC-Change1990-2000 and CLCChange2000-2006. ETC-LUSI report. 2011. Task 4.2 – Develop and assess the grid approach Page 23 of 91 Figure 1.15: Agricultural area (%) in a 5 x 5 km grid of Norway. The map exemplifies visualization of a single attribute in a grid model by using a graduate colour scale. The challenge with respect to a complete European coverage depends on the area included. EEA (2010) is reporting 5 424 171 km2 as the total area covered by the analysis in that report. This figure would imply approximately 220 000 grid cells when a 5 km model is used; 5.5 million grid cells when a 1 km model is used; and more than 500 million grid cells when a 100 m model is used. VISUALIZATION Visualization of grid data can be a challenge because the grid model is data-rich. In land monitoring based on polygons (e.g. CLC), each polygon is assigned to a single land cover class and the corresponding visualization method is the traditional and well-known choropleth map. Grid models obviously provide excellent opportunities to visualize each class separately, since the amount of each class in each grid cell is known. An example is the accompanying map of agricultural area in Norway, shown in Figure 1.15 using a graduate colour scale. The grid cell size in the example is 5 x 5 km. The graduate colour method is clearly insufficient for visualization of a grid model more closely related to CLC. Although each land cover class is represented as a separate attribute in the grid model, there is still a need for a composite map showing all the land cover classes together in a single map. SYKE (Finland) has developed a method for this purpose (Figure 1.16). The SYKE approach is designed for the Finnish HR CLC aggregated to a 1 km grid. Each original CLC class is a separate attribute in the 1 km grid model. Each grid cell is thus attached to an attribute vector representing the statistical distribution of CLC classes inside the grid cell. The visualization involves two steps. The first step is to determine which CLC Level 1 class that dominates the grid cell. The second step is to find the dominating CLC Level 3 class within the Task 4.2 – Develop and assess the grid approach Page 24 of 91 previously identified Level 1 class. The result is the class assigned to the grid cell for visualization. Example: A particular grid cell has a statistical profile where the combination of CLC classes is 10% 1.1.1 Continuous urban fabric 30% 1.1.2 Discontinuous urban fabric 20% 1.2.1 Industrial and commercial land 40% 3.2.1 Conifer forest The visualization according to the Finnish method will show this grid cell as 1.1.2 (Discontinuous urban fabric) because it is the dominating Level 3 class within the dominating level 1 class (Built-up land). Figure 1.16: Left image is the Finnish land cover data (vector format, 25 ha MMU). Right image is the data aggregated to 1 km grid visualized using the Finnish method for visualization of grid data. WORKING WITH GRIDS The storage system for a grid based data structure can either be ID based or raster based. An ID based structure consists of a geometry representing the locations of grid cells (either as points, polygons or as a raster) and with a unique ID attached to each raster grid cell. The geometrical storage structure does not include any attribute data. The attributes are instead stored in separate attribute data tables with the unique ID taking the position as a unique reference that can be used to link the attribute table and the geometry. A raster based storage structure, on the other hand, organizes the attribute data in a tabular structure where the position of an attribute value inside the table reflects the location of the observation. Each raster table corresponds to one attribute column in an attribute data table. A large number of raster tables are required in order to uphold the information in a grid structure, but it is possible to use this approach to “simulate” the grid structure using raster data tools. The advantages of the raster based solution are that raster processing usually is (much) faster than vector data processing. Raster data may be also be compressed and require less storage (at the cost of less efficient processing). The pan-European 100m resolution raster data (CLC, HRLs) are therefore stored and processed as raster data (Maucha 2013). The disadvantage of raster based solutions is the need for special software for analysis; the number of raster layers involved and the separation of the storage and processing environment from the data capturing environment (that anyhow has to be polygon based). The advantage of an ID based solution is that data are stored and processed in database tables and standard database processing tools and statistical software can be applied for data analysis (e.g. Strand 1997). IDs refer to fixed locations even without a “real” GIS layer in the background. Many related tables may be stored in the same database and data transferred between users simply as data tables. No dedicated GIS software is needed for ID based powerful statistical Task 4.2 – Develop and assess the grid approach Page 25 of 91 analysis (e.g. LEAC cubes: see http://www.eea.europa.eu/themes/landuse/interactive/land-andecosystem-accounting-leac) The anticipated limitation of ID based solution is that working with pan-European layers the vector based grid solution is very slow when the current (proprietary) GIS solutions are used. The length of unique IDs – at least when the INSPIRE recommendation is followed - may also be too long to be stored in existing commercial raster formats. Open source spatial database and software and open source spatial data analytical software is likely to change this picture, but the land cover monitoring community seems so far to have limited experience with these tools. ACCESS TO INFORMATION The HELM project (Ben-Asher 2013) reported that access to detailed national data for use in pan-European monitoring can be restricted. It can even be difficult to access national data across sectors, although the situation is improving with the introduction of national spatial data infrastructures in many countries. The grid approach can potentially improve this situation. The access to detailed vector data may be restricted due to concern about data privacy, social safety or commercial considerations. It may be easier to release these data in a grid format. National statistical agencies do, as an example, have a restrictive policy regarding the distribution of detailed population data. A 1 km grid resolution with particular handling of grid cells with very few inhabitants has, however, been found acceptable as a way to distribute these data (see more details in chapter 2 below). A grid approach may also open other national data sources for use in land monitoring. There is so far no systematic study of this possibility, but it is likely that agencies now reluctant to give access to LPIS, cadastres, road data, building registers etc. will be more forthcoming towards requests for data delivered on a sufficiently coarse grid format. Task 4.2 – Develop and assess the grid approach Page 26 of 91 2 THE GRID APPROACH USED IN EUROSTAT/ EFGS 2.1 INTRODUCTION AND BACKGROUND This chapter is aimed at reviewing the contributions to the grid approach carried out by the European Forum for Geography and Statistics (EFGS) and Eurostat through the GEOSTAT project. The current report collects and summarises the main outcomes of the projects GEOSTAT 1A and GEOSTAT 1B and discusses potential synergies of such grid approaches in relation to land monitoring in Europe. The European Forum for Geography and Statistics (http://www.efgs.info/) – former European Forum for Geostatistics – started as a voluntary cooperation between National Statistical Institutions (NSIs) in the Nordic countries in 1998, on the use of geographic information systems (GIS) and statistics. Today, it gathers people from more than 40 states and territories and holds annual conferences and meetings. At the time of writing, the next EFGS Conference will be held in Vienna in November 2015 (http://www.statistik.at/web_en/about_us/events/efgs2015/index.html). The activities of the EFGS are focused on the development of best practices in the production of geostatistics in Europe, putting a special emphasis on the grid-based statistics and the spatial disaggregation of statistical data. At the beginning of 2010, Eurostat, in cooperation with the EFGS, launched the GEOSTAT project, aimed at promoting grid-based statistics and, more generally, working towards the integration of statistical and geospatial information in a common information infrastructure for the EU. Its main objective is to develop common guidelines for the collection and production of spatial- and grid-statistics within the European Statistical System (ESS). The first concrete action carried out has been the GEOSTAT 1 project, aimed at representing various population characteristics in 1 km² grid datasets. GEOSTAT 1 was divided in three phases: GEOSTAT 1A produced a population prototype dataset for 2006 and developed a methodology to be followed by others for generating European grid statistics. GEOSTAT 1B produced a first version of a grid dataset for the Census year 2011. The first version currently holds data from 18 of the EU and EFTA countries, with the remaining 14 countries filled with modelled data. It has also created a detailed manual for creating population grids from statistical information using aggregation or disaggregation techniques. In a case study on accessibility to health care, the project has demonstrated how grid statistics can be used for spatial analysis. The GEOSTAT 2011 V1 dataset is available for download, together with that for 2006 (http://ec.europa.eu/eurostat/web/gisco/geodata/reference-data/populationdistribution-demography). The aim of GEOSTAT 1C (currently on-going) is to add further national data to the GEOSTAT 2011 data. A second version of the GEOSTAT dataset can be expected this year (2015). This report makes an assessment of the outcomes of the GEOSTAT 1A and 1B. A list of the documents is provided in Annex 1 2.2 GEOSTAT 1A GEOSTAT 1A was the first step in the long-term framework of integrating spatial information and statistics, promoted by Eurostat and the ESS. It had the aim of developing a vision and the methodological foundations for a population grid dataset. The project claimed “The GEOSTAT vision is to have a combination of grid-based statistical systems which meets needs at different geographical levels, from the national and continental level to the global level”, whereas “the GEOSTAT action is about developing guidelines for datasets and methods to link 2011 Census statistics to a common harmonised grid”. The project wanted to take advantage of the fact that many NSI have the ability to produce statistics for very small areas, especially regarding population variables. They took on board previous ideas of displaying population concentrations in a geographical grid. They especially mention the population grid that JRC produced in 2004, using disaggregation techniques, and they highlight its limitations. Task 4.2 – Develop and assess the grid approach Page 27 of 91 OBJECTIVES OF GEOSTAT 1A The objectives defined for GEOSTAT 1A were: • • • • • Develop a set of methods, tools and guidelines to create harmonised data sets. Create a population grid using existing population data sources. Prepare a vision document for a spatial data infrastructure for geostatistics. Contribute to the integration of GEOSTAT within all major European information systems (GMES, INSPIRE, SEIS etc.). Disseminate and share the results from the GEOSTAT project among NSI. USER AND PRODUCER NEEDS AND REQUIREMENTS To have a better knowledge on user needs and best practices, GEOSTAT 1A carried out 2 different surveys. Regarding data users, the majority of respondents used grid-based statistics for the purposes of spatial analyses. They mainly needed population, housing and economic variables on the basis of grids. Moreover, there was a clear correlation between area of interest and grid size. The result of preferred grid sizes is shown in Figure 2.1. Figure 2.1: The preferred grid sizes reported from the GEOSTAT 1A survey. As for data producers, 19 organizations answered the survey and it was concluded that most countries publish the data in national grid systems, although they also consider the one-grid system and Europe-wide harmonization to be important and most agreed that there should be a grid dataset available covering the whole of Europe. Producers also rated grid sizes up to 1 km as the most relevant for meeting users requirements. For reasons of data confidentiality and national data protection acts, almost all data producers face certain restrictions in their dissemination of grid statistics. Business models differ from one country to another. Some data are freely distributed; in other cases a fee is requested for delivering data to pay for the production of the statistics. RECOMMENDATIONS ON DATA SPECIFICATIONS From the surveys some recommendations are given with regard to grid-based statistics and their publication at European level. Map projections Task 4.2 – Develop and assess the grid approach Page 28 of 91 Recommendation 1: Grid data for the European GEOSTAT dataset should be referenced to the European Grid Grid_ETRS89-LAEA_1K. If no direct aggregation from national point data sources into the INSPIRE grid is possible, the GEOSTAT project proposed to convert national grids into the INSPIRE 1 km² grid for the European GEOSTAT dataset. Grid sizes Recommendation 2: The GEOSTAT grid dataset should have a grid cell size of 1 km². Coding system Recommendation 3: For unambiguous referencing and identification of a grid cell, the cell code should be composed of the size of the cell (1 and the coordinates of the lower left cell corner in ETRS89-LAEA should be used). The cell size should be denoted in kilometres (‘km’) for cell size 1 km. Values for northing and easting should be divided by 1000. The cell code ‘1kmN2599E4695’ identifies the 1 km grid cell with coordinates of the lower left corner: Y=2599000m, X=4695000m. Confidentiality Recommendation 4: For the European GEOSTAT dataset at 1 km² the total population should be disclosed without restrictions for any grid cell. The GEOSTAT project recommends that data protection measures should depend on the sensitive nature of the variables. Absolute counts of the statistical units (such as number of inhabitants, households, buildings, workplaces) should be disseminated without any restrictions, even for the smallest grid sizes. Data dissemination Recommendation 5: The GEOSTAT dataset may be downloaded free of charge and without access restrictions in the form of a .csv file with the statistical data, a grid net shape file and an INSPIRE metadata file in ISO19139 encoding. The dataset may be downloaded from the EFGS or Eurostat websites. Data update frequency Recommendation 6: The GEOSTAT dataset version 1 should have the reference year 2006. The next version of the European GEOSTAT dataset should have the reference year 2011. Data quality measures Recommendation 7: The data quality of the GEOSTAT dataset version 1 should be in the form of INSPIRE metadata encoded in ISO19139 .xml files. The production process should be documented per grid data source and country. THE GEOSTAT 1A DATASET To achieve the final goal of GEOSTAT, which has been explained before, it was agreed that it was important to have an initial example of a dataset which already highlighted the main concepts of the final data deliverable and also helps to identify obstacles and challenges. It was decided to fix 2006 as the reference year for the first GEOSTAT population grid, building on the idea of the former EFGS map (ESS Eurogrid Population Map 2009) of the European population on 1 km² grid, where some nationally produced population grids were integrated into an estimated European population grid (JRC 2001). In this phase of the project, GEOSTAT collected, on the one hand, some national experiences working with grid data. On the second hand, based on that experience and derived guidelines, the NSI were invited to produce their own country’s harmonised population grid data for 2006, which would be part of the final European dataset. In parallel, an European-wide estimated population grid was updated and improved. National data sources Grid-based data from national sources were produced by quite diverse methods and from a wide variety of data sources. Moreover, documentation was sometimes very poor. Therefore it was not easy to assess the quality and limitations of a grid product. In GEOSTAT 1A six countries shared their experiences on that topic (Austria, Finland, Norway, Poland, Portugal and France). National production methods Task 4.2 – Develop and assess the grid approach Page 29 of 91 In this section, GEOSTAT 1A assessed different production methods from countries experience. Grid data can be produced by aggregation or disaggregation methods, with the former being the preferred one in terms of accuracy and quality. Conversion to ETRS89_LAEA Some tests were done converting national gridded data to the European grid and the results were assessed. The main conclusions were that the way in which the conversion is undertaken is very important. More details in the source grid data improved the standard of the derived grid data. The best results were obtained when converting the source data to the European Reference and transferring that information to the grid, although there was the drawback of doubling the source information. Country reports Other methods, used by some countries to produce the GEOSTAT 2006 dataset, consisted of aggregating population data from point sources (e.g. building registers). Depending on the resolution of population source data, aggregation or disaggregation methods should be applied. Some countries had to, partially or completely, disaggregate the population data, but others were able to produce the new dataset by aggregating population figures to the 1 km2 grid cells. Each national approach was discussed within the project. The best option would be to use building centroids with population data when these are available. European grid production initiatives In parallel to national initiatives, an improved European population grid was produced by means of disaggregation techniques. Taking the 2001 JRC grid as a starting point, a 2006 population grid was produced by the Austrian Institute of Technology (AIT), using the soil sealing HRL as ancillary data to improve the disaggregation of population figures. In addition to the European disaggregated grid, AIT also produced national grid datasets for those countries that share a border with a GEOSTAT country. This simplifies the integration of the two datasets along the borders and to deal more accurately with population in shared border cells. Description of the GEOSTAT 1A dataset The GEOSTAT Population grid was made from the national grids complemented with the European population grid by AIT, whenever national grids are not available. This section of the GEOSTAT 1A Final Report describes some of the specifications of the product, such as field names, metadata, etc. Statistics of the GEOSTAT 1A grid dataset The reference grid Grid_ETRS89_LAEA_Europe_1K, intersecting with the EU27 + EFTA landmass delineated by a coastline/boundary dataset at scale 1:100 000, has a total of 4 884 516 grid cells whereas the GEOSTAT_2006_POP grid contains 1 945 315 inhabited grid cells with at least one inhabitant. In total the GEOSTAT national dataset represents 43% of the total 505 Mio EU 27+ EFTA citizens population in 2006 and covers 2 446 199 km² equal to ~50% of the total land surface of 4 825 275km² while the disaggregated population dataset covers 57% of the population. A method for generating population grid statistics As it has been said, the production of the GEOSTAT 2006 population grid was the “testing phase” of the ultimate goal of producing a high quality population grid of the 2011 census. Although the product was really good, it still had some limitations, such as different national production processes, or the fact that it only accounts for total population at the place of residence. Under this section, the report highlights different aspects to be taken into account for future developments. General method to generate grids The production of statistical grids usually consists of two elements: a grid net and a statistical table. A grid net is not always necessary, because georeferenced data can be processed just like any territorial statistics or like spatial statistics by point georeferences. Grid-based statistics are territorial statistics which use direct x and y coordinates as a location code. The first precondition for generating grid-based statistics is that thematic data can be georeferenced. Aggregation method: Figure 2.2 shows a schema of the proposed workflow for the aggregation method. They highlight the importance of using the same reference system and, in case of needed transformations; they should be carried out with the source layers. Task 4.2 – Develop and assess the grid approach Page 30 of 91 Disaggregation method: The overall workflow of the disaggregation approach is described in detail in the literature (see Batista e Silva et. al 2011, Gallego 2010 for further references). The disaggregation method is particularly useful when large territories are covered in a multinational to global context. GEOSTAT was not meant to improve such European grids, but the disaggregated grid is very useful to cover gaps when country grids are not available. A disaggregation model is needed to estimate, for each grid cell or part of grid cell, different density categories defined by ancillary data. The calculation of density categories and weights depends on the ancillary data available. Ancillary data can be LU/LC, night time lights, soil sealing, road network, buildings etc. Figure 2.2: The proposed workflow for the aggregation method. Confidentiality management The GEOSTAT project recommends that data protection measures should depend on the sensitivity of the variables. Absolute counts of the statistical units (such as number of inhabitants, households, buildings, workplaces) should be disseminated without any restriction, even for the smallest grid sizes. However, if the grid cells are not sufficiently populated, further characteristics of the statistical units should be protected (e.g. a person's age). One way to control the dissemination and use of grid-based statistics is to release them on condition that a specific license is granted. An agreement on the use of similar confidentiality handling methods is needed when harmonized cross-border datasets are disseminated. GEOSTAT stated that there was a need for further technical developments of disclosure control methods which take the spatial characteristics of grid data into account. Data Quality GEOSTAT was convinced that, generally speaking, the quality of population grid data produced by National Statistical Institutes is higher than disaggregated European-wide population grid data. National data sources are much more accurate than European population data sources. There are two aspects to the notion of quality in the context of grid-based statistics: • • quality of the georeferenced source data quality of the production process. Different metadata properties give an idea of the quality of the gridded datasets. Grid sizes Two grid systems are proposed by GEOSTAT: a primary one, with cells with 10a (where 6≥a≥ 0) size in meters (i.e. 100 km, 10 km, 1 km, 100 m, 10 m and 1m); a secondary one, defined for a Task 4.2 – Develop and assess the grid approach Page 31 of 91 more extensive series of grids: 10m, 25m, 50m, 100m, 250m, 500m, 1km. 2.5km, 5km, 10km, 25km, 50km. The secondary system has the advantage that can always be aggregated up to the primary one. A CONCEPT FOR A GEOSTAT DATA INFRASTRUCTURE Under this chapter, it is discussed the GEOSTAT dataset as a distributed service, where all NSIs which deliver data to the system share the benefits of its use and also contribute to the definition of the terms or business model under which the data may be used. The publication of grid statistics falls under the INSPIRE directive and has to fulfil its requirements. However, being a statistical and not just a spatial dataset, there are also certain restrictions and conditions as regards compliance of the GEOSTAT dataset with INSPIRE. A concept for a future GEOSTAT data infrastructure is defined and detailed, in terms of data types, data services, rights management, metadata and web services. CONCLUSIONS AND FURTHER WORK Some conclusions are given, on the basis of surveys, discussions and results gained from the project. The different actions carried out by the project have led to improve the overall scope and quality of the existing ‘ESSgrid’ effort to map the European population on 1 km² grids. Knowledge sharing The knowledge and results gained have been spread to the ESS and through the EFGS, putting emphasis on the fact that non-participating countries also benefit from the developments and are encouraged to implement the same approach. Sustainability ESS partnership builds upon the existing EFGS network and website, which consisted of national contacts from 32 countries. Support to ESS was requested to solve some issues and motivate more countries to participate. One of the aims is being able to produce grids more often than census is undertaken (every 10 years). Grid production The project has concentrated on total population. Two systems of grids was proposed: a primary and a secondary one. Conversions of grid data were tested with different sizes of national grids. Support for the development and refinement of disaggregation methods in cooperation with research institutes will remain high on GEOSTAT's agenda. One package — one provider The goal of the EFGS is to act as a hub for users of grid statistics, a single point of contact. It was stated that users mainly need population, housing and economic variables on the basis of grids. Confidentiality It was concluded that data providers differentiate between more and less sensitive data and define accordingly the thresholds for variables and corresponding grid cell sizes. As the 1km² grid can be considered to be a fairly rough grid, no restrictions should be applied to the publication of total numbers such as the total population figure, the total number of buildings and the total number of dwellings. Quality The GEOSTAT project examined a common process of quality assessment. In the area of gridbased statistics there are two major perspectives: one is the quality of its georeferenced source data and the other is the quality of the production process. The quality of grid data should be described both from the spatial perspective (INSPIRE) and the statistical one (SDMX). Dissemination The project disseminated population grid data by 1 km2 free of charge via EFGS. Tools It was identified that more support for tools would be needed for future iterations, and tools should be non-proprietary and open source. Task 4.2 – Develop and assess the grid approach 2.3 Page 32 of 91 GEOSTAT 1B The GEOSTAT 1B action was about developing guidelines for datasets and methods to link population and housing census 2011 statistics to a common harmonized grid. This action was launched at the beginning of 2011 and lasted for 24 months. GEOSTAT 1B was built on the work undertaken by the EFGS and the project GEOSTAT 1A. The GEOSTAT 1B final report was made following the structure of a statistical production chain. POPULATION GRID STATISTICS AS PART OF THE STATISTICAL PRODUCTION CHAIN In this project the focus was put on increasing the use of GIS in the work of NSIs. It was decided to follow the statistical production chain to analyze the different possibilities. This production chain is also known as the Generic Statistical Business Process Model (GSBPM). On this basis, the report describes the following aspects/phases of the process chain: • • • • • Specific needs Design Build, collect and process Analyse Disseminate Specific needs Two overall user needs were identified: • • The need for integrating geography and statistics The need for spatially referenced statistics in a hierarchical system of stable and neutral grids. It was stressed the need of integrating geography and statistics to better describe the world and analyse different phenomena, and e.g. contributing to support the EU2020 strategy on sustainable and inclusive growth. Although GEOSTAT 1B focused on population and housing information, other discussion forums (e.g. UN Expert Group on the Integration of Statistical and Geospatial Information) identified other topics to benefit from such type of integration, like agriculture and economic censuses and environmental-economic accounting. As already identified in GEOSTAT 1A, the need for detailed population data by grid is widespread. These needs can be described from two perspectives: • • The need from users for grid data as a tool for analysis and information. The need from producers of statistics to produce grid data or to support the production of statistics using grid data, e.g. in the sample design. In GEOSTAT 1B the overall objective was to produce population data on grids including the following variables: • - Total population, population divided by age and sex. Like in GEOSTAT 1A, grid data were produced by means of aggregation of micro-data when possible and, for those countries not providing such information, by disaggregation. Both methods are fully described in the appendices of the GEOSTAT 1B report. Design Under this section, the report describes the grid to be used by GEOSTAT 1B and the characteristics of the statistical variables. This is a summary: The grid is the LAEA European Reference Grid at 1 km2 as in GEOSTAT 1A. The naming convention is GEOSTAT_grid_POP_1K_CC_YYYY (CC:country code, YYYY: ref. year, e.g. GEOSTAT_grid_POP_1K_SE_2011). There is also a convention for field names. The variables to be integrated are: • • • • TOT_P [Number of inhabitants, total] TOT_M [Number of inhabitants, male] TOT_F [Number of inhabitants, female] M_00_14 [Number of inhabitants, male age 0-14] Task 4.2 – Develop and assess the grid approach • • • • • Page 33 of 91 M_15_64 [Number of inhabitants, male age 15-64] M_65_ [Number of inhabitants, male age 65-] F_00_14 [Number of inhabitants, female age 0-14] F_15_64 [Number of inhabitants, female age 15-64] F_65_ [Number of inhabitants, female age 65-] Some parameters were defined to make a quality assessment of the primary data production and also the grid data production. Data source quality was also discussed in the appendices of the report. Build, collect and process Based on the design, the various partners carried out their work with building, collecting and processing their data. Depending on the data availability, there were two different approaches to produce the population grids: “aggregated” and “hybrid”. Following Finnish developments, the “aggregated” approach is fully described in the appendices, and a webtool was also produced to support producers carrying out this method. The NSI of Bulgaria followed and described in a detailed way (also in the appendices) the socalled “hybrid” approach. Analyse As a response to the data request from the project, the following countries contributed to the GEOSTAT 2011 grid: Belgium, Bulgaria, Czech Republic, Estonia, Finland, France, Ireland, Netherlands, Norway, Poland, Portugal, Slovenia, Sweden and Switzerland. To test the population grid, the GEOSTAT 1B partners agreed to work together on a case study calculating the population within 30 minutes from emergency hospitals. GIS was used and the aim was to explore: • • the consequences of using grids as statistical units instead of other statistical units as e.g. municipality areas. the consequences of data confidentiality policies applied by various NSIs when publishing population grids. The results of this exercise are fully described in one appendix of GEOSTAT 1B. Disseminate Four workshops and two Conferences were organised during the development of the GEOSTAT 1B project to disseminate the methods and results. The EFGS published data and report and remains as one of the main channels for dissemination. OVERARCHING PROCESSES – QUALITY MANAGEMENT Some aspects regarding the quality improvement of the production system and the data itself were introduced by some project partners. Particularly, Statistics Portugal introduced a new sampling frame for household survey purposes, on the basis of the European Grid. For each building their corresponding GRID Cell is known. For the sampling process of the household surveys, cells are selected within strata (NUTS3 areas), and a spatial sorting algorithm assures the contiguity/proximity of each cell. An appendix details this new sampling infrastructure. In parallel, the Czech Statistical Office tested the disaggregation method to allocate people that cannot be linked to the exact place of usual residence, which is almost 1% of total population in that country. That approach is also detailed in one of the appendices to the GEOSTAT 1B final report. GEOSTAT 1B APPENDICES The GEOSTAT 1B Final Report contains several appendices which complement the information of the report with technical details, case studies or posters with methodological schemas. The list of the appendices included has been provided in section 1. Annex 1 includes a short description of the contents of each Appendix. 2.4 GEOSTAT RESULTS AND THE GRID APPROACH IN LAND MONITORING The projects GEOSTAT 1A and 1B (and ongoing 1C) are a proof of concept of a grid approach in the framework of statistical data, or, more specifically, population statistics. The efforts carried out so far by Eurostat and the EFGS to develop some methods and produce population grid data, highlight the big importance that the grid can have at European level as data storage and analysis system. Besides providing detailed and homogeneous information for large areas and Task 4.2 – Develop and assess the grid approach Page 34 of 91 overcoming the problem of heterogeneous and arbitrary administrative boundaries, the grid approach has proven to be useful for different types of analysis (e.g. accessibility to emergency hospitals, vegetation change detection…). If we think of the interaction of GEOSTAT and the grid approach in land monitoring described by the EAGLE team, we see different aspects where benefits and synergies can be established, basically focused in two topics: • • The methodology and standardisation of working processes Combination of statistical and land cover / land use information through the Grid. THE GEOSTAT METHODS AND THE GRID APPROACH IN LAND MONITORING The GEOSTAT experience is very valuable to any kind of grid approach, as it has provided information regarding: • • • • • User and producer needs (e.g. preferred grid size). Methods of aggregation / disaggregation and hybrid to feed a grid with data depending on the source information. Standardisation aspects, such as projections, coding, field names, etc. Dissemination and confidentiality issues. Quality assessment. These different topics have been summarised in this report, which can be used for reference when implementing a grid approach for land monitoring. Although GEOSTAT is working basically on population statistics, accounting for number of people within a grid cell or accounting for the number of hectares of a specific land cover class is not so different. Therefore all the issues about populating grids discussed by GEOSTAT are useful for our purposes. Moreover, some case studies on the use of gridded information have been undertaken, providing also valuable information. COMBINATION OF STATISTICAL DATA AND LAND COVER DATA THROUGH THE GRID The grid approach has a great potential in terms of data integration. The European Reference Grid can become a common integrator layer for different types and formats of data. The challenge of integrating data which are thematically, spatially and/or temporary heterogeneous has already been faced, for instance, by the ESPON Programme. A single grid cell can contain at the same time information on e.g. population, land cover and administrative units. In this way all this data can be queried at once and provide integrated assessments, for instance through OLAP8 databases. The UAB has been working in this direction with the EEA and under the ESPON Programme. For more info: http://www.espon.eu/export/sites/default/Documents/Publications/ScientificReports/ThirdDecem ber2014/202425_ESPON_SCIENTIFIC_REPORT_3.pdf 8 OnLine Analytical Processing. Task 4.2 – Develop and assess the grid approach Page 35 of 91 3 CASE STUDY: STATISTICAL ANALYSIS The main objective of the next three chapters is to analyze and demonstrate various aspects of the grid approach using real data. When using the grid approach it is possible to maintain (or preserve) all detailed quantitative information on land cover classes provided by diverse data sources and to use this information for querying, data integration and statistical purposes. However, the cell size of the grid has an effect on the information regarding the spatial distribution of the land cover classes. These effects will also be demonstrated. The practical exercise was divided into three actions: The first was the population of the grids, which formed a base for the next actions, that were the quantitative and qualitative analysis. The quantitative analysis focused on comparing the coverage of national High Resolution CLC classes and European CLC classes in each grid cell. The qualitative analysis was done by visual comparisons of maps made from populated grids and original EU CLC vector maps. The province of Kymenlaakso in south-eastern Finland (Figure 3.1) was selected as a test area for the study. The main reason to select a Finnish test case was the fact that Finnish Environment Institute has produced the European CLC 25 ha data using a bottom up approach based on a more spatially detailed national CLC data set and thus has comparable national and European data from the years 000, 2006 and 2012. 3.1 Figure 3.1: Kymenlaakso province in dark green SOURCE DATASETS AND THE GRID POPULATION PROCESS The two main source datasets used in the study were the pan-European Corine Land Cover 25 ha vector for 2012 (EU CLC) and the National High Resolution Corine Land Cover 20 m raster for 2012 (HR CLC) from Finland (Figure 3.2). Additionally, Finnish High Resolution Corine Land Cover Changes 2006-2012 (MMU 1ha, 20m raster) and High Resolution Corine Land Cover 25m raster for 2006 were used in the change detection part of the quantitative analysis. Population data from Statistics Finland was used to demonstrate the use of additional datasets together with LC grid data. HR CLC and EU CLC used the same (standard European) LU/LC classification. However, there were two classes that are present in HR CLC but absent in EU CLC, due to the coarser scale and generalization procedures. These are 111 (Continuous Urban Fabric) and 244 (Agroforestry Areas). The former is generalized into 112 (Discontinuous Urban Fabric) and the latter into 231 (Pastures). There are also two classes that are generated by the generalization process of EU CLC and thus have no equivalent in HR CLC. Those classes are 141 (Green Urban areas) and 242 (Complex Cultivation). For this exercise, two grids were produced; 1 km and100m x 100m. Both grid datasets were populated with the same set of attributes: the amount of each CLC class from both HR CLC data and EU CLC data. Both sets of attributes were created in percentages and in ha / m2. Additionally the majority class for each grid cell was calculated from both HR and EU data. The grid population process was run as Python scripts in ESRI ArcMap, with the Tabulate Area (for raster dataset) and Tabulate Intersection (for vector dataset) tools forming the base of the process. Task 4.2 – Develop and assess the grid approach Page 36 of 91 Figure 3.2. City of Kouvola shown in HR CLC (left) and EU CLC (right) The change detection analysis was only carried out for forest changes using the 1 km grid. Forest changes are the prominent change type in Finland and thus deemed suitable for this exercise. 3.2 COMPARISON OF PAN-EUROPEAN CLC AND NATIONAL HR CLC This example of a statistical analysis is a comparison of (the pan-European) EU CLC and (the Finnish national) HR CLC using a 1 km grid as a basis for the comparison. A global analysis provides an overview of the area and share of all the LC classes according to the two different data sources. Further comparison on a cell by cell basis shows in more detail how the amount of each CLC class is reported differently in the two data sources. It should be noted that this study did not compare the grid approach to the vector-based CLC approach. The study is primarily a study demonstrating how the grid approach can be used to analyze and compare data. We could have chosen other datasets for this purpose. Using CLC data from two different sources does, however, give the added value that the results can be used to demonstrate some of the aspects of a gridded approach to CLC compatible land monitoring based on more detailed data from maps and registers. DESCRIPTIVE STATISTICS A general overview of all the land cover classes in the study area (areas, percentages and absolute and relative differences, between the HR CLC and the EU CLC) is found in Table 3.1. Note that there was a slight difference of total area between HR CLC and EU CLC (11 ha). This was probably due to rounding errors and the presence of “no-data” pixels in EU CLC data. There are four classes (242, 321, 322 and 333) which appear in the final table, but have no area at all, so they are not considered in the analysis (highlighted in light red in Table 3.1). At first sight, we notice some important differences in terms of total area and share for some of the classes, depending on the source of the data. To better analyse those differences, we have distinguished between majority classes and minority classes, using a coverage threshold of 5%. In the “Diff” columns, areas and percentages in red indicate underestimation by EU CLC, whereas black colours indicate overestimation. One notable result refers to class 243 (Land principally occupied by agriculture...), as it is almost non-existent in HR CLC (0.4%) but represents a share of 5.2% in EU CLC. This class seems to be seriously affected by a combination of the different MMU and the fact that it is a mixed class (something similar can be Task 4.2 – Develop and assess the grid approach Page 37 of 91 observed regarding the “mixed forest” class). These issues are discussed in the analysis class by class below, and in the concluding chapter. Table 3.1. Total figures (area, percentage and differences) for all classes according to the HR CLC and the EU CLC within the 1 km populated grid Class 111 112 121 122 123 124 131 132 133 141 142 211 222 231 242 243 244 311 312 313 321 322 324 331 332 333 411 412 421 511 512 523 Description Area_N Area_E %_N %_E Diff. E-N % Diff E-N Continuous urban fabric 546 0 0.07% 0.00% -546 -100.00% Discontinuous urban fabric 11611 14218 1.51% 1.85% 2607 22.45% Industrial or commercial units 5598 2887 0.73% 0.38% -2711 -48.43% Road and rail networks and associated6648 land 43 0.87% 0.01% -6605 -99.35% Port areas 1003 1357 0.13% 0.18% 354 35.29% Airports 329 424 0.04% 0.06% 95 28.88% Mineral extraction sites 1849 796 0.24% 0.10% -1053 -56.95% Dump sites 265 201 0.03% 0.03% -64 -24.15% Construction sites 386 305 0.05% 0.04% -81 -20.98% Green urban areas 0 397 0.00% 0.05% 397 100.00% Sport and leisure facilities 5656 366 0.74% 0.05% -5290 -93.53% Non-irrigated arable land 88285 70174 11.49% 9.13% -18111 -20.51% Fruit trees and berry plantations 209 36 0.03% 0.00% -173 -82.78% Pastures 212 0 0.03% 0.00% -212 -100.00% Complex cultivation patterns 0 0 0.00% 0.00% 0 Land principally occupied by agriculture, 3555with significant 40049 areas0.46% of natural vegetation 5.21% 36494 1026.55% Agro-forestry areas 10 0 0.00% 0.00% -10 -100.00% Broad-leaved forest 17709 3757 2.30% 0.49% -13952 -78.78% Coniferous forest 241067 251617 31.37% 32.74% 10550 4.38% Mixed forest 76005 106230 9.89% 13.82% 30225 39.77% Natural grasslands 0 0 0.00% 0.00% 0 Moors and heathland 0 0 0.00% 0.00% 0 Transitional woodland-shrub 63594 35485 8.28% 4.62% -28109 -44.20% Beaches, dunes, sands 22 0 0.00% 0.00% -22 -100.00% Bare rocks 615 0 0.08% 0.00% -615 -100.00% Sparsely vegetated areas 0 0 0.00% 0.00% 0 -2114 -72.30% Inland marshes 2924 810 0.38% 0.11% Peat bogs 6355 4663 0.83% 0.61% -1692 -26.62% Salt marshes 2289 1272 0.30% 0.17% -1017 -44.43% Water courses 2819 2173 0.37% 0.28% -646 -22.92% Water bodies 46606 46473 6.06% 6.05% -133 -0.29% Sea and ocean 182323 184746 23.72% 24.04% 2423 1.33% TOTAL HA 768490 768479 100.00% 100.00% -11 0.00% Legend: Area_N Area_E %_N %_E Diff. E-N % Diff E-N Total area (ha) in HR Finnish CLC Total area (ha) in EU CLC Percentage of class over total area in HR Finnish CLC Percentage of class over total area in EU CLC Difference in area (ha) between EU CLC and HR Finnish CLC Difference in percentage between EU CLC and HR Finnish CLC MAJORITY CLASSES ANALYSIS OF 1 KM GRID DATA We will now present a majority classes analysis of 1 km grid data populated with HR Finnish CLC 2012 and EU CLC 2012. We consider as majority classes those having a share above 5% in the HR CLC. We find six classes over this threshold in 1 km grid, as shown in Table 3.2. Table 3.2. Total figures (area, percentage and differences) for majority classes according to the HR CLC and the EU CLC within the 1 km populated grid Task 4.2 – Develop and assess the grid approach Class 312 523 211 313 324 512 Page 38 of 91 Description Area_N Area_E %_N %_E Diff. E-N % Diff E-N Coniferous forest 241067 251617 31,37% 32,74% 10550 4,38% Sea and ocean 182323 184746 23,72% 24,04% 2423 1,33% Non-irrigated arable land 88285 70174 11,49% 9,13% -18111 -20,51% Mixed forest 76005 106230 9,89% 13,82% 30225 39,77% Transitional woodland-shrub 63594 35485 8,28% 4,62% -28109 -44,20% Water bodies 46606 46473 6,06% 6,05% -133 -0,29% Out of the six majority classes in the 1 km grid, one is clearly overestimated by EU CLC9, whereas two of them are clearly underestimated. Out of these six classes Coniferous forest, Mixed forest and Transitional woodland and shrub will be discussed in more detail below. Scatterplots for the other classes can be found in Annex 2. Figure 3.3. Statistical distribution of the area of the majority CLC classes (classes covering more than 5 % of the area according to the national HR CLC). Coniferous forest (312) is the class with the highest share in the selected NUTS region, representing more than 31% of the area. Although EU CLC accounts for 10 550 hectares more than HR CLC, this difference represents only 4.4%, so the class is only slightly overestimated. The second largest class, in terms of share (23.7%), is Sea and ocean (523), with very similar figures in EU and HR data. The EU CLC is again slightly overestimating this class, but only by 1.3%. The third class is Non-irrigated arable land (211), accounting for 11.5% of HR CLC. This class is clearly underestimated by EU CLC. There is one fifth less area of this class in EU CLC compared to the HR CLC, that is equal to 18 111 hectares. It is likely that the different MMU between the two datasets is the main cause of the underestimation, together with the effect of the mixed class 243, which can include 211 patches in EU CLC. The fourth class, with a share of 9.9% in HR CLC, is Mixed forest (313). This class is overestimated in EU CLC by almost 40% (30 225 more ha). Again, the larger MMU and the fact that it is a mixed class can be the causes for this overrepresentation. The fifth class is Transitional woodland and shrub (324), which accounts for 8.3% in HR CLC, but only 4.6% in EU CLC. The European dataset clearly underestimates this class, probably in favour of the mixed forest class. MMU can again be a complementary reason of such difference. 9 In comparison to the HR CLC, which has a much higher level of detail and, therefore is closer to ground truth Task 4.2 – Develop and assess the grid approach Page 39 of 91 Finally, the sixth majority class is water bodies. In this case, in terms of figures there are no significant differences between HR CLC and EU CLC. In both cases, the class account for 6% of the share and there is only a tiny underestimation of 0.3% by EU CLC. Figure 3.4. The positions of the majority classes in a dispersion chart (1 km grid) The red line in Figure 3.4 represents the “expected” or “ideal” situation, where the EU CLC would contain exactly the same information as the HR CLC in the1 km grid. The black line is the fitted regression line for the majority class shares. Despite the differences highlighted amongst the majority classes for the two data sources, we can see that the fitted regression line would explain almost 95% of the variance. The regression line shows that, in general terms, EU CLC slightly overestimates classes which have the highest share amongst the majority classes, whereas it underestimates to some extent classes with lower shares (with the clear exception of mixed forest). We will now proceed to take a closer look at some of the majority classes in the 1 km grid on a cell-by-cell basis, to examine patterns of behaviour in relation to the area estimates reported by each dataset. The dots in the scatterplots represent grid cells in the 1 km grid. Task 4.2 – Develop and assess the grid approach Page 40 of 91 Figure 3.5. Scatterplot of Coniferous forest (312) in 1 km grid The Coniferous forest plot (Figure 3.5) shows a clear pattern of underestimation by EU CLC when the area of the class in HR CLC in the 1 km grid cell is approx. below 30 ha, and a clear tendency towards overestimation when areas are above 30 ha. Nevertheless, the variance is rather large and the r-squared indicates that the fitted model would explain up to 82% of the cases. It is interesting to see that in EU CLC there are cells with a 100% share of this class (100 ha), whereas data from the HR CLC never fills an entire cell completely with this class. On the other hand, there are cells where coniferous forest coverage, according to HR CLC covers up to 50 ha, but with no coniferous forest according to the EU CLC. In general, there is overestimation of class 312 Coniferous forest by EU CLC, due to the fact that the class in the generalized product contains small patches of Broad-leaved forest (311) or Transitional woodland and shrub (324), which as a consequence are underestimated. This reciprocity will also be demonstrated below. Figure 3.6. Scatterplot of Mixed forest (313) in 1 km grid Mixed forest (313) shows a serious problem of overestimation by EU CLC (Figure 3.6). This problem is directly linked to the MMU and the fact that it is itself a mixed class. EU CLC is not able to distinguish broad-leaved and coniferous forests within a patch smaller than 25 ha, while the HR CLC is able to do so. EU CLC is therefore unable to separate between truly mixed forest (coniferous and deciduous trees together) and cartographically mixed forest (patches of coniferous and deciduous forest inside the mapping unit). That is why Mixed forest is Task 4.2 – Develop and assess the grid approach Page 41 of 91 overestimated by EU CLC and Broad-leaved forest (311) or Transitional woodland and shrub (324) at the same time underestimated. Figure 3.7. Scatterplot of Transitional woodland and shrub (324) in 1 km grid For Transitional woodland and shrub (324) there is a general trend of underestimation (Figure 3.7), but also a considerable number of cases where this class is overestimated in EU CLC. (This is hard to see in the figure due to considerable overlap of points in the scatter plot). The regression line explains only 47% of the variance, indicating that the relation between EU CLC and HR CLC is weak. Again, the different MMU and the fact that it is a kind of mixed class may cause those deviations. MINORITY CLASSES ANALYSIS OF 1 KM GRID DATA Minority classes are those having a share of the land between 0.1 and 5% in the HR CLC. We find 12 classes between this range and they are listed in Table 3.3. Table 3.3. Total figures (area, percentage and differences) for minority classes according to the HR CLC and the EU CLC within the 1 km populated grid Class 311 112 122 412 142 121 243 411 511 421 131 123 Description Area_N Area_E %_N %_E Diff. E-N % Diff E-N Broad-leaved forest 17709,00 3757,00 2,30% 0,49% -13952 -78,78% Discontinuous urban fabric 11611 14218 1,51% 1,85% 2607 22,45% Road and rail networks and associated6648 land 43 0,87% 0,01% -6605 -99,35% Peat bogs 6355,00 4663,00 0,83% 0,61% -1692 -26,62% Sport and leisure facilities 5656,00 366,00 0,74% 0,05% -5290 -93,53% Industrial or commercial units 5598 2887 0,73% 0,38% -2711 -48,43% Land principally occupied by agriculture, 3555,00with significant 40049,00 areas0,46% of natural vegetation 5,21% 36494 1026,55% Inland marshes 2924,00 810,00 0,38% 0,11% -2114 -72,30% Water courses 2819,00 2173,00 0,37% 0,28% -646 -22,92% Salt marshes 2289,00 1272,00 0,30% 0,17% -1017 -44,43% Mineral extraction sites 1849 796 0,24% 0,10% -1053 -56,95% Port areas 1003 1357 0,13% 0,18% 354 35,29% In general terms, we can see that most of the minority classes are underestimated by the EU CLC by a range of -22% up to -99%. Two classes are overestimated (Port areas (123) and Discontinuous urban fabric (112)) and one class (Land principally occupied by agriculture (243)), surprisingly, showing an exceptional overestimation by EU CLC (+1026%). The causes of such differences are discussed in this chapter. Task 4.2 – Develop and assess the grid approach Page 42 of 91 Figure 3.8. Percentages of minority classes in 1 km grid In Figure 3.8 we can see that most of the minority classes are underestimated by EU CLC, except for 112 and 123. Class 243 is a clear outlier in the series. This fact can be also appreciated in the next scatterplot, where we can see most of the classes below the red line. Nevertheless, as there are a few classes that are overestimated (including the big overestimation of class 243), no linear pattern can be established for the minority classes (correlation is non-existent), as we can see in Figure 3.9. To provide readable information about the classes representing less than 1% of the area, we show a zoomed scatterplot in figure 3.10. In this chapter some minority classes are introduced, on a cell-by-cell basis in the 1 km grid, to look for patterns of behaviour in relation to the areas reported by each dataset. The interested reader will find scatterplots for the remaining CLC classes in Annex 2. Task 4.2 – Develop and assess the grid approach Page 43 of 91 Figure 3.9. The positions of the minority classes in a dispersion chart (1 km grid) Figure 3.10 The positions of the minority classes in a dispersion chart (1 km grid), classes representing less than 1 % of the cell area Figure 3.11. Scatterplot of Broad-leaved forest (311) in 1 km grid Broad-leaved forest (311) is following two separate but interfering patterns (Figure 3.11). We can see first is that many areas from HR CLC (all of them below 20 hectares) are not present in EU CLC. The patches present in both databases are generally overestimated by the European dataset. The MMU issue is, as in many other cases, a possible explanation since it can cause that patches of broad-leaved forest are included within mixed forest in EU CLC. A second pattern is the increasing overestimation of the class by EU CLC in areas where the class is well represented. The value of R2 tells us that the regression line explains well up to 85% of the variation in the response variable - here EU CLC. Task 4.2 – Develop and assess the grid approach Page 44 of 91 Figure 3.12. Scatterplot of Discontinuous urban fabric (112) in 1 km grid Discontinuous urban fabric (112) is overestimated in EU CLC, as it is clearly shown by the regression line in Figure 3.12, which explains 85% of the variance. The larger the area is, the higher the overestimation. The fact that Continuous urban fabric (111) in HR CLC is generalised into 112 in EU CLC explains basically this overestimation. The smaller MMU in HR CLC might have caused a more detailed delimitation of discontinuous urban fabric, excluding some complementary elements (infrastructures, green urban...) that in EU CLC are part of 112. Figure 3.13. Scatterplot of Land principally occupied by agriculture, with significant areas of natural vegetation (243) in 1 km grid The overestimation of the class 243 class by EU CLC is very important, representing 5.2% of the total share within the dataset, whereas in HR CLC is only 0.46% (Figure 3.13). this is an effect of the fact that 243 is a scale-dependent class and depend on the context. In the detailed HR CLC, most locations will either be agriculture or not agriculture. When a 20 meter pixel (the unit in the HR CLC) is classified as 243, it is either based on a micro-scale analysis, or an analysis of the context around the pixel. This is an example of the classification and measurement challenges described in Chapter 1. Task 4.2 – Develop and assess the grid approach 3.3 Page 45 of 91 COMPARISON OF GRID RESOLUTIONS The grid resolution has no impact on global statistics. Populating the 100m grid and the 1km from the same data source, e.g. HR CLC, will yield the same total figures for the study area. These are identical to the summary statistics calculated from the input data directly, with no bias introduced. This is because the grid simply is a partition of the dataset. No spatial generalization is introduced. The two grid resolutions were examined using the two forest classes Broadleaved forest (311) and Coniferous forest (312) as examples. Table 3.4 shows the proportion of grid cells that are either “completely filled or empty” or “partially filled” with land cover class 311 and 312 using a 100 m and a 1 km grid. When the 100 m grid is used to represent class 311, 85,2 % of the grid cells are either left empty or completely covered by this class. Only 14,8 % of the grid cells have a coverage of class 311 somewhere between the extremes 0 % and 100 %. The proportion of grid cells that are left empty or completely covered is reduced to 24,9 % when the 1 km grid is used. With this coarser grid resolution, 75,1 % of the grid cells have a coverage of class 311 somewhere between the extremes 0 % and 100 %. Table 3.4. Proportion of grid cells that are either “completely filled or empty” or “partially filled” with land cover class 311 and 312 using a 100m and a 1km grid % Completely filled or empty grid cells 311 312 % Partially filled grid cells 311 312 100 m grid 85,2 50,9 14,8 49,1 1 km grid 24,9 17,3 75,1 82,7 The trend is similar, although less extreme, for the more common class 312. When a 100 m grid is used, approximately 50 % of the grid cells have coverage somewhere between the extremes 0 % and 100 %. This is increased to 82,7 % when a 1 km grid is used. This shows the obvious fact that as the size of the grid cell becomes smaller; the grid becomes more similar to a raster with a single class per grid cell. The frequency distribution for the grid cells where the proportion of class 311 is larger than 0 % and less than 100 % for the two grid resolutions are shown in Figure 3.14. The shape is identical, with most of the grid cells having proportions close to 0 %. This is because 311 is a comparatively rare class in the study area, covering only 2,3 % of the area (ref Table 3.1). The difference between the grids is the magnitude of the occurrences. We rarely find more than 20 % broad leaved forest inside a 1 km grid cell, while up to 60 % coverage is common in 100 m grid cells when the class is present. Figure 3.14. Frequency distribution of grid cells where the proportion of class 311 is larger than 0 % and less than 100 % in the 100m grid (left) and 1 km grid (right) The difference between the grids is the spatial resolution. Both grids convey basically the same information about the presence and overall distribution of the land cover class, but the more Task 4.2 – Develop and assess the grid approach Page 46 of 91 detailed 100 m grid does convey more detailed and accurate information about the locations where the class is found. The difference is important with respect to local applications and detailed cartography, but probably not for wide-area monitoring purposes. 3.4 LAND COVER AND POPULATION To provide an example of analysis of land cover data combined with other data, we attached population data released for the 1 km grid by Statistics Finland and examined the likely relationship between the population size and the % Discontinuous urban fabric (112). The correlation between the two attributes is 0,69. The positive relationship is shown in Figure 3.15. This illustration also shows that the ten grid cells with the highest population (> 2000 people) in most cases have less than 30 % land cover 112 (scattered dots in the upper half of the graph). Class 112 is still the most common class in a third of these grid cells, but the coverage is comparatively low because much of the space is left to Green urban areas (141) and Coniferous forest (312). The remaining two thirds of these high population grid cells are dominated by Industrial and commercial units (121) and Road and railway networks (122). The simple analysis points towards two kinds of urban dwellings in this part of Finland: One (less frequent) dominated by green areas and proximity to forests; and another dominated by commercial and industrial buildings and infrastructure. There are 250 grid cells (1 km grid) in the study area with 100 or more inhabitants. The same grid cells contain 2.3 km2 of Non-irrigated agricultural land. This is a small amount of the agricultural land in the study area, but probably under pressure from development and therefore vulnerable to land take. The frequency distribution (Figure 3.16) shows that most of this land is small in extent. Finally, we calculated the male/female ratio for study area. The ratio was 0.96, implying slightly more women than men in this region. We then selected the 1km grid cells where Non-irrigated agricultural land (211) constitute 25 % or more of the land and recalculated the ratio which turned out to be 1.05. This is as expected, since women – at least in the Nordic countries – tend to migrate to the cities while the men stay on in the rural areas. This is of course far from ground-breaking science, but demonstrates how easy it is to link land monitoring data to other data sources for analytical purposes under the grid approach. Figure 3.15. Scattergram of population vs % Discontinuous urban fabric (112) in 1 km grid cells. Task 4.2 – Develop and assess the grid approach Page 47 of 91 Figure 3.16. Histogram of Non-irrigated agricultural land in 1 km grid cells with 100 or more inhabitants. Task 4.2 – Develop and assess the grid approach Page 48 of 91 4 CASE STUDY: CHANGE DETECTION Land cover changes constitute a particular challenge when a grid approach is used. The challenge is linked to the difference between net and gross changes. This difference is explained in chapter 1.6 under the sub-heading “Land cover changes”. An illustration is also found in Figure 1.13. Net change is the calculated difference in area between two specific points in time while gross change is the total area modified during this period. Gross changes can be larger than the reported net changes if changes are reciprocal. A simplistic but illustrative example is a 1 km2 grid cell composed of forest and clear cuttings. During a monitoring period, 10 ha forest is cleared and 10 ha of former clear cutting is planted and reforested. The net change is nil, but the gross change is 20 ha. This chapter is a case study of how a grid approach can be used for change detection when information about the gross changes is unavailable. The data used here is the same as the data used for the case study of statistical analysis above. It consists of the HR CLC for Finland for 2006 and 2012. The spatial resolution of the source data was 25 m raster for 2006 and 20 m raster for 2012. The source datasets and the grid population process are accounted for in chapter 3.1 above. As in chapter 3, the province of Kymenlaakso in south-eastern Finland was used as test area for the study (Figure 4.1). A more general discussion concerning net and gross changes is included at the end of the chapter. Figure 4.1. Finnish HR CLC2006 with 1 km grid of part of Kymenlaakso test area. City of Kouvola is at lower middle part of image. 4.1 CHANGE DETECTION USING GRID APPROACH The topic of this subchapter is to examine how changes are represented when no information about the gross changes is available. This is the situation when the grids have been populated using HR data, but the data available to the analyst is restricted to the summary figures for the grid cells. In this case, increase in a land cover class in one part of a grid cell can cancel out a decrease in the same land cover class in another part of the cell. To the analyst, it looks like no change has taken place. A possible, but not necessarily secure way to handle this situation is to examine several land cover classes in its context – exemplified by the reciprocal relationship between forest growth and clear cuts. Changes of forests (coded as CLC31x (including CLC classes 311, 312 and 313) and transitional woodlands (CLC class 324) were determined as follows: • • The proportion of forests (the sum of classes 311, 312 and 313) was computed for each 1 km grid cell for years 2006 and 2012. The proportion of transitional woodlands (class 324) was computed for each 1 km grid cell for years 2006 and 2012. Task 4.2 – Develop and assess the grid approach • • Page 49 of 91 The net forest change was computed per grid cell by subtracting the proportion of forest for year 2012 from the proportion of forest for year 2006. The net transitional woodland change was computed by subtracting the proportion of transitional woodland for year 2012 from the proportion of transitional woodland for year 2006. Figures 4.2 show the net change of forests (CLC classes 31x) between 2006 and 2012. Figure 4.3 is a similar illustration of the net change of transitional woodland (CLC class 324) between 2006 and 2012. When Figures 4.2 and 4.3 are compared, it can be noticed that large change (decrease) in the proportion of forests generally corresponds to large change (increase) in the proportion of transitional woodlands. On the other hand, there is also a general increase of the proportion of forests in several locations that does not correspond with the decrease of the proportion of transitional woodlands. Statistical figures based on the Finnish HR CLC2006 and 2012 from the cities Iitti and Kouvola (Table 4.1) reveal that both forests and transitional woodlands have increased and areas of artificial surfaces and agricultural areas have decreased10. This is again net figures and does not reveal the gross changes. The reference data for comparison was computed using the Finnish HR CLC changes 2006 – 2012 with 1 ha minimum mapping unit. The following information was computed as proportions for each 1 km grid cell: • • • Growth of forest (CLC31x) classes 2006 – 2012. Growth of transitional woodland (CLC324) classes 2006 – 2012 (i.e. recent forest clearcuts). Forest change as growth of forest – growth of transitional woodland. Table 4.1. Areas in km2 of selected CLC classes within the cities Kouvola and Iitti, estimated using Finnish HR CLC2006 and CLC2012. Artificial surfaces Agricultural areas Forests Transitional woodland (CLC324) (CLC11x, 12x) (CLC21x, 22x, 23x) (CLC31x) CLC 2006 193 662 1868 329 CLC2012 141 605 1950 354 -52 -57 82 25 Change -109 107 The following comparisons were made A. Forest growth: This is a study of locations where forest area increased according to the reference data. The net change in forest area (CLC31x) reported in the grid was modelled as a function of the change observed in the reference data. B. Transitional woodland growth: This is a study of locations where transitional woodland area increased according to the reference data. The net change in transitional woodland area (CLC324) reported in the grid was modelled as a function of the change observed in the reference data. 10 The explanation is linked to the change in spatial resolution in the Finnish HR CLC – from 25 m pixels in 2006 down to 20 m pixels in 20102. This explanation is, however, of no concern here since the objective of the study is to see how well the grid approach can handle changes irrespective of the cause of these changes. Task 4.2 – Develop and assess the grid approach Page 50 of 91 Figure 4.2. The change of the proportions of forests in 1 km2 grid cells represented as color-coded change classes. Figure 4.3. The change of the proportions of transitional woodlands in 1 km2 grid cells represented as color-coded change classes. Task 4.2 – Develop and assess the grid approach Page 51 of 91 The statistics of the results are found in Table 4.2 where • • • • • • n: number of samples in analysis. RMSE: Root Mean Squared Error between changes provided by grid approach and reference CC: Pearson correlation coefficient R2: Coefficient of determination Sl: Slope of linear regression equation Int: Intercept of linear regression equation Table 4.2. The statistics of the comparisons of forest and transitional woodland changes according to the grid approach and reference data. Case n RMSE CC R2 Sl Int A 2977 5.86 0.41 0.17 0.904 1.186 B 3814 4.43 0.69 0.48 0.944 -2.550 The results are considered as better if the correlation coefficient (CC) and coefficient of determination (R2) are closer to 1, RMSE closer to 0, and slope closer to 1 and intercept closer to 0. According to the statistics, the correspondence between the reference data and the grid data for transitional woodland growth (CLC324, Case B) is slightly better than the correspondence between the reference data and the grid data for forest growth (CLC31x, Case A). Figures 4.4 and 4.5 show scatterplots, linear regression lines and residual plots illustrating Case A and Case B. Figure 4.4 shows increase in forest area according to the reference data along the horizontal axis. Change in forest area according to the grid data is shown along the vertical axis. The slope of the regression line is 0,9 indicating that there are grid cells where some of the increase in area is cancelled out and net change underestimates the actual increase in forest area. Figure 4.4. The scatterplot, linear regression line and residual plot between reference CLC31x growth and forest change of grid approach (Case A). Figure 4.5 illustrates the same relationship for transitional forest. It can be noticed that although clearly present, the effect is small and mostly found in grid cells where absolute change is small. Cases A (Figure 4.4) and Case B (Figure 4.5) examine how well the forest or transitional woodland change estimated using the grid approach correspond to reference data. Task 4.2 – Develop and assess the grid approach Page 52 of 91 The residuals of estimation error are in both cases large in grid cells where the change (according to reference data) is small. The residuals decrease as the change (according to reference data) increases. Visual inspection of the residual plots indicates that forest changes will be interpreted quite reliably when the change is larger than 15 - 17%. The difference is not large, and the two figures together indicate a threshold between 15 to 20 %. This interpretation also implies that less than 15-17% of changes in the reference data will result in large deviations between net and gross changes at 1 km grid level Figure 4.5. The scatterplot, linear regression line and residual plot between reference CLC324 growth and transitional woodland change of grid approach (Case B). An attempt to examine the relationship between gross and net changes is found in Figure 4.6.Gross changes is here shown on the horizontal axis of the graphs. Gross decrease is represented in the leftmost graph, while gross increase is represented in the rightmost graph. The vertical axis in both graphs represents the resulting net change reported in the grid approach. Figure 4.6. Scatterplots based on the Finnish data used above. Dots represent 1 km grid cells. The vertical axis is the net change reported as grid cell total. The horizontal axis represents gross decrease (left) and gross increase (right) in forest area. In both cases, very high (positive or negative) gross change is quite correctly reflected in the net changes. Small (0 – 15 %) gross decrease can be “hidden” inside grid cells where the net change is smaller or positive due to a reciprocal forest increase. This is seen in the leftmost graph. The rightmost graph does (as explained above) show a good relationship between high gross forest increase and the reported net changes. Small (0 – 15 %) gross increase can (similar to small gross decrease) be “hidden” inside grid cells where the net change is small or negative. Again, this effect is due to reciprocity inside the grid cells. Task 4.2 – Develop and assess the grid approach Page 53 of 91 Figure 4.7 Relationship between net changes and gross changes in forest (CLC31x) area (%). See text for explanation. The relationship between net and gross change in forest (CLC31x) at the 1 km grid level are explored in Figure 4.7. The grid cells were grouped into 5 % classes according to the net change in forest area. These classes are represented along the horizontal axis of the graph. Examples: “-40” is the class containing all 1 km grid cells where the net reduction of forest area is in the range 40-45 % and “5” is the class containing all 1 km grid cells where the net increase in forest area is in the range 5-10 %. “0” is the class containing all 1 km grid cells where the net change in forest area is between -5 and 5 %. Each class is represented with two bars. The blue bar at the top shows the distribution of gross increase in forest area in grid cells in the class, while the green bar at the bottom shows the distribution of gross decrease in forest area in grid cells in the class11. The graph shows that grid cells with net forest change around zero usually has small gross changes. A net change around zero can be an effect of reciprocity, but the area involved is not large. Grid cells where the net change is negative (left hand part of the graph) usually has small amounts of forest increase and large amounts of forest decrease. Reciprocity does lead to an “underreporting” of forest decrease, but the general picture is correct. Grid cells where the net change is positive (right hand part of the graph) usually has small amounts of forest decrease and larger amounts of forest increase. Again, the dominant trend (forest increase in this case) is “underreported” due to reciprocity but the general picture is correct. It is self-evident that forest decrease is the dominant gross change in areas with large net decrease, and conversely that forest increase is the dominant gross change in areas with large net increase. Figure 4.7 is, however, reassuring in the respect that it shows that net change reflects dominant gross change quite well, and that the effect of reciprocity is limited. Gross changes in both directions are fairly small in grid cells where the reported net change is close to zero, and counteracting gross changes are small in grid cells where the net change is large in either direction. 4.2 GROSS AND NET CHANGES Documentation of change is clearly a challenge with respect to the grid approach. There is always the risk that larger gross changes are hidden behind the net changes reported at the grid cell level. Furthermore, the problem will increase when the size of the grid cells is increasing. The magnitude of the problem is, however, uncertain. First, it must be pointed out that change detection also is an issue with respect to CLC and similar polygon-based monitoring systems. A polygon-based system does require specification of a MMU size and land cover change incidents below the MMU size will go undetected. Many small clear cuttings can take place inside a forest, and can be later also subject to regrowth of forest – they will be all undetected if they are smaller than the MMU. This is to a certain extent handled in CLC through the CLC-Changes approach. Still, the CLC-Changes specification is 11 The bars are not perfectly aligned vertically due to an artifact in the SPSS© software. It should still be quite clear which two bars belong to which class. Task 4.2 – Develop and assess the grid approach Page 54 of 91 also polygon based and involves a MMU size of 5 ha. Changes with an area below that size will go unreported. The grid approach does not introduce a change detection issue that is unheard of in the context of CLC, but the challenge does become more visible. Second, the change issue is not equally distributed among the land cover classes. Forest, used in the example in this chapter, is probably the land cover class where the challenge is most serious. Clear cutting and forest regrowth is a succession process in an ongoing biological circle. Areas where commercial forestry leads to clear cutting are the same areas where new forest is reestablished on the very same locations and the process is quite fast. The example in the first part of this chapter demonstrate that even in forest area, change is reported quite reliably when it exceeds 15 % of a 1 km grid cell (15 ha). This figure is smaller than the MMU used in CLC, but larger than the MMU used in CLC-Changes. Changes in forest areas are spatially reciprocal, with increase and decrease taking place in the same region. This reciprocity is rare for other land cover types involved in land cover change. Decrease in agricultural area is either a result of land take or abandonment which takes place in regions different from the opening of new agricultural land. The increase in built-up land is rarely accompanied by conversion of built-up land to other land cover categories. Glaciers may grow or shrink as a result of long-term climatic trends, but the change is uniform over large areas. All things considered, the challenge with respect to change documentation under the grid approach should probably not be exaggerate Third, the change issue can be solved provided that change is registered in the databases used as input to the grid system. The requirement is that change is recorded and saved in the database and not discarded when the register is updated. Such a system can provide the grid database with measurements of gross changes as separate variables attached to the grid cell. An example is the Norwegian land resource database which provides information on the current land cover and can be used to calculate land cover coverage for grid cells. In addition, the database records and stores every incident of land cover change. These recordings can be used to calculate the magnitude of any specific land cover change (from one specific class to another) within a certain time window and attach the information to a grid system. A grid based monitoring system can thus provide information about previous and current land cover distribution and net changes, but also gross changes taking place inside the grid cells. It is probably not possible today, due to lack of harmonized data across Europe, but will be increasingly feasible in the future as national land cover and land use recording systems are developed. Task 4.2 – Develop and assess the grid approach Page 55 of 91 5 CASE STUDY: VISUALIZATION This chapter will address the visualization aspect of the grid approach. Visually, a grid appears to be similar to a raster, at least for maps covering large areas with many grid cells. We will show examples of cartography that can be used to present grid data, followed by a section on visual analysis under the grid approach. 5.1 CARTOGRAPHY AND VISUALISATION UNDER THE GRID APPROACH Both grids, 1 km and 100 m, were used for visualizations and visual analysis. With the grid approach, all sorts of ancillary data can be attached to each cell to help with the necessary analysis, visualisations and queries. In this test case we used the population data of Finland as ancillary data. With these attributes, a wide selection of phenomena can be visualised using a large toolbox consisting of several methods. The easiest way to visualise grid data is to select a relevant attribute, (e.g. amount of class 211) and make a choropleth map (with the grid cell as the cartographic unit) based on that attribute (Figure 5.1). These maps can be created for each CLC class. A generalized land cover grid can also be produced based on the majority fields (Figure 5.2). It has to be noticed that the bigger grid cell size, the more generalized (and possibly misleading) the majority map can be. More complex visualisations can be achieved when two different attributes are shown in the same map. This kind of two-class map can show either the same phenomenon from different perspectives, e.g. amount of class 112 from both HR data and EU data (Figure 5.3), or different phenomena to present a correlation or other relationship between them, e.g. amount of class 122 and the population (Figure 5.4). Figures 5.1 (left) and 5.2 (right): Choropleth map showing amount of Non-irrigated arable land in 1 km grid (left) and Majority map in 1 km grid (right). Task 4.2 – Develop and assess the grid approach Page 56 of 91 Figures 5.3. Two-class map showing Discontinuous Urban Fabric for both HR CLC and EU CLC in 1 km grid. Using these visualisations, it is easy to make comparisons. A difference between HR and EU CLC class percentages can show the over- or underestimation of EU data for certain classes. Comparing choropleth maps with the EU vector map shows the amount of detailed information stored in grid versus the generalized overview of the vector map. These comparisons are covered in more detail in the next section. Task 4.2 – Develop and assess the grid approach Page 57 of 91 Figure 5.4. Maps comparing Discontinuous Urban Fabric with Population data in 1 km grid 5.2 VISUAL ANALYSIS Visual analysis has been carried out for a selection of maps. The most interesting aspect of the grid datasets is the ability to compare the amount of a class calculated from HR CLC data and EU CLC data. This kind of visual analysis and comparison can be an important addition to the statistical analysis. As an example, we calculated the difference between EU and HR data (in ha) for class 312 and 313 (Figure 5.5). The positive values represent overestimation by EU CLC 25 ha data Task 4.2 – Develop and assess the grid approach Page 58 of 91 compared to HR CLC 20m data. Conversely, negative values represent underestimation by EU CLC 25 ha data. From this pair of maps, it is possible to see that in many places the overestimation of mixed forest (313) leads to underestimation of coniferous forest (312) and vice versa. This is an effect of the fact that generalization methods have the tendency of putting emphasis on the most common classes at the expense of the rare classes. Quite similar phenomena can be seen, only presented in a different way, with one map showing the amount of class 211 (in percentages) calculated from EU CLC 25 ha data and another showing the same class calculated from HR CLC 20m data (Figure 5.6). The generalized EU data shows high percentages for most of the areas where class 211 is present, and lower percentages on the edges only. The HR data has a lot more variety in the percentages and smaller fields that have been completely left out from the EU data with 25 ha MMU can be seen all over the area. An interesting comparison in this exercise is also the one between the1 km grid and the 100 m grid. In visual terms, the difference is quite stunning. With two maps showing the amount of class 324 in different grid sizes (Figure 5.7), it is easy to notice that most of the 1 km cells have the same, quite low level of the class 324, whereas the 100 m grid has quite large areas with no class 324 present and some smaller areas with quite high amount of class 324. So although the grid cell size does not affect the general statistics of the classes, it does affect the information about the spatial distribution. Larger cell size implies less detailed knowledge of the spatial distribution of the classes. With majority maps, the loss of information content is even more obvious (Figure 5.8). With smaller grid cells it is possible to maintain the diversity of the classes in a better way than with the bigger grid cells. In Finland where coniferous forest is common, increasing the grid cell size will make it more likely to find coniferous forest as the majority class. The presence of other classes will therefore be lost in cartography based on the “majority class method” as the grid cell size is increasing. The last comparison exemplified in this study is showing 1 km grid data with the amount of 112 and 324 HR classes respectively and EU 25 ha vector data that has the grid on the top (Figure 5.9). The fact that each grid cell contains a lot more information than a generalized vector data can easily be seen. This is especially visible in the case of class 112 which is quite rare in the EU 25 ha vector map in this region, but has frequent occurrence in the grid map. Similar phenomenon can be observed in the comparison of 100 m grid data and EU CLC 25 ha vector data with the grid on top of it (shown for class 121 in Figure 5.10). In this exercise the majority class in each grid cell was simply the class with largest area compared to other classes. In very heterogeneous areas, this can be quite misleading, since there might be many classes, each covering only a small portion of the cell area. The majority class may not be that much bigger than the others. To minimize bias in these situations, it is possible to make more complex majority calculations. Figure 5.12 shows the simple majority (the class with largest area) next to a slightly more sophisticated majority map (in this called the two-step majority). Both are calculated from the HR CLC for the 1 km grid. The two-step majority is done in two steps using the main level CLC classes to find a limited set of classes, from where the actual 3rd level CLC majority class can be calculated (Figure 5.11). From this pair of maps (Figure 5.12) it is possible to see that using the two-step majority the forest classes take over areas that have been classified as lakes or agricultural areas with the simple majority method. In some cells the two-step method seems to emphasize the artificial classes. It is also possible to adjust the majority method to form heterogeneous classes like mixed forest or weight the method to lessen the impact of the most common classes in favour of the rarer classes. It should be pointed out that the present study is carried out with only a part of Finland as the study area. For visualization of the entire country, a 100m grid is too detailed but 1km or even larger cell size like 5 km may be more appropriate. Similar considerations of course also apply to visualization of the entire EEA32 region. Task 4.2 – Develop and assess the grid approach Page 59 of 91 Figures 5.5: Difference map showing the differences between EU CLC and HR CLC for Coniferous Forest (class 312) and Mixed Forest (class 313) in 1km grid Task 4.2 – Develop and assess the grid approach Page 60 of 91 Figure 5.6: Non-irrigated Arable Land (class 211) calculated from EU CLC and HR CLC in 100m grid. Task 4.2 – Develop and assess the grid approach Page 61 of 91 Figure 5.7: Comparison of 1 km grid and 100m grid showing Transitional Woodland and Shrub (class 324) Task 4.2 – Develop and assess the grid approach Figures 5.8: Comparison of 100m grid and 1 km grid showing majority class. Page 62 of 91 Task 4.2 – Develop and assess the grid approach Page 63 of 91 Figure 5.9: 1 km grid maps showing Discontinuous Urban Fabric (class 112) and Transitional Woodland and Shrub (class 324) compared to EU CLC 25ha map. Task 4.2 – Develop and assess the grid approach Page 64 of 91 Figure 5.10: 100m grid map showing Industrial or Commercial Units (class 121) compared to EU CLC 25ha map. Task 4.2 – Develop and assess the grid approach Page 65 of 91 Figure 5.11. Two-step majority method: First the areas of main level classes are calculated and from the largest main level class a third level class with largest area is selected as majority. Figure 5.12. Simple majority map and two-step majority map. Task 4.2 – Develop and assess the grid approach Page 66 of 91 6 CONCLUSION The technical and methodological basis for grid approach to land monitoring is available. Considerable experience with this approach has been gained through projects carried out by National Statistical Agencies in cooperation with Eurostat, providing valuable knowledge and guidelines that can be transferred to land monitoring. Existing pilots and projects, including this study, demonstrate the feasibility of the grid approach. The grid approach is an applicable method for storing statistical land cover information. From a statistical point of view, the size of the grid is irrelevant. The same statistical information can be derived from grids with 100 m and 1 km cell sizes. The grid structure just transfers the statistics of the source data with no generalization applied. When the cell size increases the size of dataset decreases and thus makes data processing and usage of the data easier. The larger cell size does however affect the information on the spatial distribution of the classes. The same phenomena can also be seen in the maps and visualizations of the grid data. The grid cell size has the biggest effect on the majority maps, because the most common classes will dominate in the map especially with the larger cell sizes. With the grid approach it is also possible to perform change detection if the changed area is larger than 15-20 % of the cell area, at least this was the case for the tested forest classes. The spatial distribution of classes within each grid cell is lost. Grids are therefore not suitable for detailed studies, planning and assessment at the local level. However, the maintenance of the statistical precision of the input data probably outweighs the loss of spatial detail on the subgrid-cell level for macro-studies at the regional, national and continental level. This is possible because working with grids allows us to store information coming from datasets with higher spatial resolution than the resolution of the grid itself. Moreover, using standard, widely spread, reference grids gives the chance to combine the LC data with other types of data from other sources (e.g. population, socioeconomic indicators, biodiversity, etc.). ADVANTAGES The grid model allows land monitoring to utilize and merge existing data sources by using a fixed, shared common geometry as an interface. These sources can be national high spatial resolution maps and registers as well as high spatial resolution data retrieved from other sources, e.g. the Copernicus land services. The grid approach does limit the generalization to simplification of the geometry, without any similar generalization of the attribute data. Statistics based on the grid approach will therefore be unbiased (with respect to the input data), thus avoiding differences in statistics between the pan-European land monitoring and various other monitoring systems. Vector LC/LU maps are usually generalized based on a priori defined Minimum Mapping Units or similar parameters. It is easy to identify LC/LU change objects between two reference dates, but nearly impossible to follow changes over more than two reference dates. This is due to the generalization rules. The fixed geometry of a grid model allows creation of easily interpretable time series of LC/LU data. A European standard for grids is already established by the INSPIRE process, and it is thus easy to ensure that a grid approach in land monitoring is INSPIRE compatible. The same standard is adopted by the NSAs, implying a broad basis for harmonization between the land monitoring system and the geospatial statistical system in Europe. The methods used to “populate” the grids are available as standard tools by off-the-shelf GIS software. Much of the data processing is tabular and can also be carried out using standard database software and spreadsheets. The challenge linked to national projections and the coverage of border areas between countries (or regions) can be solved in a standardized way. This allows data capture to take place in national systems for later entry into a uniform pan-European database. Data captured by centralized systems (e.g. Copernicus land high resolution layers) can be entered directly into the central database. Statistics and analysis as well as modelling exercises are fairly easy to implement due to the regular shape and size of the grid units. Statistical bias is avoided, as mentioned above. Visualization and cartography does become more complicated than under the familiar vector approach, but this is due to an increased potential for visualization. Task 4.2 – Develop and assess the grid approach Page 67 of 91 From the visual point of view, the grid approach is very well suited for the visualization of amounts of certain attributes in each grid cell. Visualization based on the grid approach will show all the cells where that attribute is present, even the ones with just a little bit of it. The grid approach is also suitable for presenting quite detailed data in larger units without losing the actual precision of the original data. The visualization of majority class can lead to misleading and confusing outcomes with larger grid cell sizes, but the effect can be lessened with weighted majority methods. The grid approach is also useful for making all kinds of detailed and complex queries. The grid approach may improve the access to information across sectors. The work done by the NSAs demonstrate how population data, where access at the individual level is restricted due to confidentiality issues, can be released when they are compiled for a grid structure. It is quite possible that data from other sectors (transport, agriculture, hydrology, cadasters, building registers etc.) with restricted access for the land monitoring community in many countries, could be more accessible if the data is aggregated to a grid level. CHALLENGES The geometry of the grid model is unfamiliar for many users. The spatial units are artificial and do not correspond to any land cover or land use “objects” in the real world. This is in particular an issue when maps are drawn in large scale, where the regular grid cell structure is visible. There will be issues of compatibility between a grid approach and the current CLC. These compatibility issues are caused by the scale-dependent classification system used in CLC. Mixed classes will have to be decomposed into their constituents and registered as such. Mixtures can be derived at the grid cell level if sufficiently detailed information is available. This is done as an analysis of the land cover composition of the grid cell. More work is needed to analyze the composition of the CLC classes and prescribe attributes replacing the classifications in these cases. The semantic work known as the “EAGLE matrix” may contribute to this work. The backward compatibility of such an approach, with previous CLC data will also have to be examined in detail. Change data is only represented as net changes in grid approach, unless specific caution is taken to represent gross changes as specific measurements. This is possible, but depends entirely on the available high spatial resolution data used to populate the grid. Grids are populated, in many cases using national high spatial resolution data. The actual content, definition and quality of these data may vary from one country to another. Harmonization and standardization is required to ensure compatibility across the countries participating in the land monitoring effort. OPEN ISSUES The two most important issues remaining open are grid resolution / size and choice of attributes. In theory grids can be established with any resolution. The choice partly depends on the required spatial resolution, partly on the available data, and partly on the feasibility of storing / processing data. As an example, for pan-European applications the currently available technology sets the limits: • The pan-European Imperviousness database is available as a 20 m resolution raster, containing 325 000 (columns) x 230 000 (rows) cells12. Stored in a compressed format, the size of the dataset is around 1.5 Gbyte (with pyramid layers). Current technology allows a quick visualization of the data in any scale with desktop tools, but the analysis and combination of layers at this size require more powerful hardware and software tools • The pan-European 100 m resolution databases (e.g. CLC / aggregated Imperviousness) contain 65 000 x 46 000 raster cell. Stored in a compressed format, the size of the dataset is around 100 Mbyte (with pyramid layers). Current desktop technology allows analyses and combination of layers. Typical processing time is some minutes. 100 m resolution is comparable with the required delineation accuracy and Minimum Mapping Width of linear elements in European CLC. • Pan-European statistical grid approaches like Land and Ecosystem Accounting (LEAC) is based on a 1 km grid. The 1 km resolution is a technical compromise as well, unique IDs may be created and processed easily with common desktop tools for pan-European 1 km datasets, this contains about 7 716 000 cells (rows in the attribute table) excluding sea and 12 Including Azores, Canarias and Turkey as well. Task 4.2 – Develop and assess the grid approach Page 68 of 91 ocean areas. A pan-European 100 m resolution grid would require much larger attribute table and longer unique IDs, which is not anymore supported by desktop tools. Implementation of a grid approach is probably going to be a compromise involving a balance between several concerns about grid resolution and attribute availability. High grid resolution (small grid cells) is attractive due to the level of detail, while a coarser grid resolution is beneficial because: • • • It may be sufficient for the needs of pan-European monitoring Data with restricted access may become available (as detailed location is not traceable) Smaller amounts of data are easier to manage Task 4.2 – Develop and assess the grid approach Page 69 of 91 7 REFERENCES van Asselen, S. and Verburg, P. H. 2012. A Land System representation for global assessments and land-use modeling. Global Change Biology, 18: 3125–3148. doi: 10.1111/j.1365-2486.2012.02759.x Batjes, N.H.1996. Global assessment of land vulnerability to water erosion on a 1/2 grid, Land Degradation & Development, 7: 353-365 by 1/2 Ben-Asher, Z. (ed.) (2013). HELM – Harmonised European Land Monitoring: Findings and recommendations of the HELM Project. Tel-Aviv, Israel: The HELM Project. Bonan, G.B., Pollard, D. and Thompson, S.L. 1993. Influence of Subgrid-Scale Heterogeneity in Leaf Area Index, Stomatal Resistance, and Soil Moisture on Grid-Scale Land–Atmosphere Interactions, Journal of Climate, 6: 1882–1897 Chiatante, G., Brambilla, M., and Bogliani, G. 2014. Spatially explicit conservation issues for threatened bird species in Mediterranean farmland landscapes, Journal for Nature Conservation, 22: 103–112 Clement, J., van Randen, Y. and Storm, M., in prep. Handleiding en metadata VIRIS2.0 VIsual Ruimtelijk Information SystemTOP10 Monitoring Dataset 1991-2011. Werkdocument XX, Wettelijke Onderzoekstaken Natuur & Milieu. Wageningen, Juli 20?? Couclelis, H. 1992. People manipulate objects (but cultivate fields): Beyond the raster-vector debate in GIS, in Frank, A.U., Campari, I. and Formentini, U. (eds) Theories and Methods of Spatio-Temporal Reasoning in Geographic Space, Lecture Notes in Computer Science 639: 65 – 77, Springer Berlin Heidelberg Congalton, R.G. 1997. Exploring and evaluating the consequences of vector-to-raster and raster-to-vector conversion. Photogrammetric Engineering and Remote Sensing, 63: 425-434. Di Gregorio, A., and Jansen, L.J.M. (2000) Land Cover Classification System (LCCS): Classification Concepts and User Manual.. Environment and Natural Resources Service, GCP/RAF/287/ITA Africover - East Africa Project and Soil Resources, Management and Conservation Service. FAO, Rome EEA 2010. The European Environment. State and outlook 2010. Land Use. European Environment Agency, Copenhagen European Commission (2007) ESPON 2013 Programme: European observation network on territorial development and cohesion, European Commission Decision C(2007) 5313 of 7 November 2007 Fisher, P. 1997. The pixel: a snare and a delusion. International Journal of Remote Sensing, 18: 679-685. doi:10.1080/014311697219015 Gibbard, S., K. Caldeira, G. Bala, T. J. Phillips, and Wickett, M. 2005. Climate effects of global land cover change, Geophysical Research Letters, 32: L23705, doi:10.1029/2005GL024550. Gomarasca, M.A. (2009) Basics of Geomatics. Springer Netherlands Letourneau, A., Verburg, P.H. and Stehfest, E. 2012. A land-use systems approach to represent land-use dynamics at continental and global scales Environmental Modelling & Software,33: 61–79, doi:10.1016/j.envsoft.2012.01.007 Maucha, G. 2013. Recommendation for an indicator of imperviousness and imperviousness change, including documentation of technical specifications of suggested indicator(s) and its application context. ETC-LUSI report 2013. Mehr, M., Brandl, R., Hothorn, T., Dziock, F., Förster, B. and Müller, J. 2011. Land use is more important than climate for species richness and composition of bat assemblages on a regional scale, Mammalian Biology Zeitschrift für Säugetierkunde, 76: 451–460, http://dx.doi.org/10.1016/j.mambio.2010.09.004 Milego, R. and Ramos, M.J. (2011) Disaggregation of socioeconomic data into a regular grid and combination with other types of data, ESPON Technical report http://www.espon.eu/export/sites/default/Documents/ToolsandMaps/ESPON2013Database/2.2_ TR_grids.pdf accessed 20.11.2012 Task 4.2 – Develop and assess the grid approach Page 70 of 91 Mücher CA., Klijn JA., Wascher DM., Schaminée JHJ. (2010) A new European Landscape Classification (LANMAP): A transparent, flexible and user-oriented methodology to distinguish landscapes, Ecological Indicators 10: 87–103 Pierce, L.L. and Running, S.W. 1995. The effects of aggregating sub-grid land surface variation on large-scale estimates of net primary production, Landscape Ecology, 10: 239-253 Schlünzen, K. H. and Katzfey, J. J. 2003. Relevance of sub-grid-scale land-use effects for mesoscale models, Tellus A, 55: 232–246 Smith, M. L., Zhou, W., Cadenasso, M., Grove, M. and Band, L. E. 2010. Evaluation of the National Land Cover Database for Hydrologic Applications in Urban and Suburban Baltimore, Maryland. JAWRA Journal of the American Water Resources Association, 46: 429–442. doi: 10.1111/j.1752-1688.2009.00412.x Strand, G-H. 1997. Cartographic modelling with relational algebra, Norsk Geografisk Tidsskrift (Norwegian Journal of Geography) 51: 61 - 69 Strand, GH. 2011. Uncertainty in classification and delineation of landscapes: A probabilistic approach to landscape modelling, Environmental Modelling & Software 26: 1150 - 1157 Strand, G.H. and Bloch, V.V.H. (2009) Statistical grids for Norway. Documentation of national grids for analysis and visualisation of spatial data in Norway. Document 2009/9. Statistics Norway, Oslo Thuiller, W., Araújo,M.B. and Lavorel, S. 2004. Do we need land-cover data to model species distributions in Europe?, Journal of Biogeography, 31: 353–361, DOI: 10.1046/j.03050270.2003.00991.x TWG-RS. 2009. D2.8.I.2 INSPIRE Data Specification on Geographical grid systems Guidelines v3.0, http://inspire.ec.europa.eu/documents/Data_Specifications/INSPIRE_Specification_GGS_v3.0.p df Vardon, M., Eigenraam, M., McDonald, J, Mount, R. and Cadogan-Cowper, A. (2011) Towards an integrated structure for SEEA ecosystem stock and flow accounts. Issue paper for the meeting on SEEA experimental ecosystems accounts, London, 5 – 7 December 2011 Wascher, D.M. (Ed.) (2005) European Landscape Character Areas – Typology, Cartography and Indicators for the Assessment of Sustainable Landscapes. Final ELCAI Project Report, Landscape Europe, pp. 5-31 Wen, C.G. and Tateishi, R. 2001. 30-second degree grid land cover classification of Asia, International Journal of Remote Sensing, 22: 3845-3854 Task 4.2 – Develop and assess the grid approach Page 71 of 91 ANNEX 1: DESCRIPTION OF GEOSTAT 1B APPENDICES CONTENT This annex contains a list of the GISDATA 1A and GISDATA 1B documents that provides the basis for the description of the projects in this report, and a short description of fifteen of the appendices to the GEOSTAT 1B report. THE GISDATA 1A AND GISDATA 1B DOCUMENTS USED WERE: • • GEOSTAT 1A – Representing Census data in a European population grid. Final Report. GEOSTAT 1B – Final report GEOSTAT 1B – Appendix 1 – Production procedures for a harmonised European population grid GEOSTAT 1B – Appendix 2 – EFGS Standard for official grid statistics, population variables v.1.0. GEOSTAT 1B – Appendix 3 – Testing and quality assessment of pan-European population grids GEOSTAT 1B – Appendix 4 – Production process for a harmonised European population grid in Finland GEOSTAT 1B – Appendix 5 – Contribute to the GEOSTAT population grid 2011 GEOSTAT 1B – Appendix 6 – Poster production process hybrid GEOSTAT 1B – Appendix 8 – Prototype of Bulgarian population grid 2011. Hybrid solution in producing grid statistics. GEOSTAT 1B – Appendix 9 – Bulgarian Population grid 2011. Technical report. GEOSTAT 1B – Appendix 10 – Using the European Grid “ETRS89/LAEA_PT_1K” as the foundation for the new Portuguese Sampling Structure. GEOSTAT 1B – Appendix 11 – Access to emergency hospitals – An operative case study for testing gridded population data. GEOSTAT 1B – Appendix 12 – EFGS 2012 web survey GEOSTAT 1B – Appendix 13 – EFGS Open data license template GEOSTAT 1B – Appendix 14 – Vegetation change detection for statistical institutes. A case study for combining building register data with satellite imagery GEOSTAT 1B – Appendix 15 – ESSnet GEOSTAT 1B Flyer GEOSTAT 1B – Appendix 16 – Disaggregation methods for georeferencing inhabitants with unknown place of residence: the case study of population census 2011 in the Czech Republic. GEOSTAT 1B – Appendix 17 – EFGS modified quality assessment parameters. GEOSTAT 1B – APPENDIX 1 – PRODUCTION PROCEDURES FOR A HARMONISED EUROPEAN POPULATION GRID This appendix contains guidelines for helping National Statistical Institutes to produce harmonised population grid data compatible with INSPIRE specifications. Many elements in these guidelines can also be relevant for the production of other detailed geo-referenced data. These procedures focus on the bottom-up approach or aggregation method. They describe how to convert national data to European grid data in different ways, depending on the source data: • • Conversion and aggregation of point-based data (micro-data) into European Grid data Conversion of existing national grid data into European Grid data (using both centroids and intersections) Each of the procedures are very well documented and described step by step. The GIS software that was used is ArcGIS and gvSIG. GEOSTAT 1B – APPENDIX 2 – EFGS STANDARD FOR OFFICIAL GRID STATISTICS, POPULATION VARIABLES V.1.0. This appendix contains the standard names for population variables to be used for official grid statistics (described previously under the “Design” section). GEOSTAT 1B – APPENDIX 3 – TESTING AND QUALITY ASSESSMENT OF PAN-EUROPEAN POPULATION GRIDS This appendix describes a quality assessment carried out on the European population grid made by disaggregation by the AIT, which used the following data as source information: Task 4.2 – Develop and assess the grid approach • • • • • Page 72 of 91 EEA Fast Track Service Precursor on Land Monitoring - Degree of soil sealing 20061 Population per LAU2 (LAU1) 2006 LAU2 borderlines Corine Land Cover 2006 Open Street Map data The quality check is performed for each of these source datasets in comparison to Norwegian data. Additionally, a quality check is undertaken for the disaggregated European grid in comparison to the aggregated national one, on a km2 basis. The main conclusion extracted from the appendix is that the disaggregation method can be improved with the use of national data, e.g. to create a correct infrastructure mask or to distinguish residential areas from rocky/stony areas. GEOSTAT 1B – APPENDIX 4 – PRODUCTION PROCESS FOR A HARMONISED EUROPEAN FINLAND POPULATION GRID IN A poster with the production process used in Finland is shown. It corresponds to a bottom-up methodology. GEOSTAT 1B – APPENDIX 5 – CONTRIBUTE TO THE GEOSTAT POPULATION GRID 2011 This appendix shows a poster made to promote the NSIs contribution to the creation of the GEOSTAT population grid. GEOSTAT 1B – APPENDIX 6 – POSTER PRODUCTION PROCESS HYBRID This is a poster with the schema that summarises the hybrid approach. GEOSTAT 1B – APPENDIX 8 – PROTOTYPE OF BULGARIAN POPULATION GRID 2011. HYBRID SOLUTION IN PRODUCING GRID STATISTICS. This appendix encloses a presentation by the NSI of Bulgaria explaining a prototype population grid 2011 for Bulgaria, based on an hybrid solution. GEOSTAT 1B – APPENDIX 9 – BULGARIAN POPULATION GRID 2011. TECHNICAL REPORT. This report details the process of creation of the Bulgarian population grid, based on a hybrid approach, i.e. the combination of aggregation and disaggregation techniques. They point out these reasons in order to use the dual solution: • • • • Census 2011 is not geocoded; The lack of national register of addresses and buildings with coordinates; Aggregation cannot be applied to the whole territory of the country; To improve the accuracy of population density distribution by getting a higher quality than just disaggregation. The different steps, including quality assessment, are detailed in the appendix. GEOSTAT 1B – APPENDIX 10 – USING THE EUROPEAN GRID “ETRS89/LAEA_PT_1K” AS THE FOUNDATION FOR THE NEW PORTUGUESE SAMPLING STRUCTURE. It is described how the existance of the European population grid has influenced in the creation of a new Portuguese Sampling Infrastructure. The use of the European dataset as one of the geographical components of the sampling infrastructure was a major development, since the former sampling process was dependent on the administrative division. Task 4.2 – Develop and assess the grid approach Page 73 of 91 GEOSTAT 1B – APPENDIX 11 – ACCESS TO EMERGENCY HOSPITALS – AN OPERATIVE CASE STUDY FOR TESTING GRIDDED POPULATION DATA. In order to show the possibilities and test the usage of gridded population data, a case study was agreed and carried out by the GEOSTAT 1B project partners. The case study consisted on calculating the population within 30 minutes from emergency hospitals, and it is summarised in this appendix. Figure A1.1: Example from appendix 11 of the GEOSTAT 1B report:.Case study from Estonia, The results of the different case studies demonstrates clearly the advantages of grid based statistics compared to administrative boundaries for carrying out spatial analyses. Nevertheless, introducing of confidentiality thresholds and consequently suppression of grid cells below these thresholds may result in that the advantages of grid statistics are lost. Grid statistics in this case study proved to be very useful when identifying the population’s driving distance to emergency hospitals. The case study was carried out for the Czech Republic, Estonia, Finland, Norway and Bulgaria. One sample map for Estonia is, as an example, shown in Figure A1.1.. The methodology and results are very well detailed in this appendix of the GEOSTAT 1B report. GEOSTAT 1B – APPENDIX 12 – EFGS 2012 WEB SURVEY This appendix includes the results of a web survey carried out amongst NSIs to find out the availability of data to feed the grid, including aspects of pricing, confidentiality, etc. GEOSTAT 1B – APPENDIX 13 – EFGS OPEN DATA LICENSE TEMPLATE In this appendix, a license template for NSIs is provided, so that they can provide user rights for their data under some specific terms specified in the license document. GEOSTAT 1B – APPENDIX 14 – VEGETATION CHANGE DETECTION FOR STATISTICAL INSTITUTES. A CASE STUDY FOR COMBINING BUILDING REGISTER DATA WITH SATELLITE IMAGERY A case study was developed by Statistics Norway within the “hybrid” approach. The objective of the case study was to combine building register data, land use data and satellite imagery in order to follow changes in vegetation caused by construction and land use change. The results were promising and even with low resolution it was possible to follow changes in vegetation in various land use categories. The methodology proved to be operable for combining satellite Task 4.2 – Develop and assess the grid approach Page 74 of 91 imagery with register data. Transforming the data into grids made also the data more manageable for statistical institutes who are more used to work with statistics software than remote sensing software. GEOSTAT 1B – APPENDIX 15 – ESSNET GEOSTAT 1B FLYER This appendix contains a flier that was produced to explain the project GEOSTAT 1B and promote it, together with the EFGS. GEOSTAT 1B – APPENDIX 16 – DISAGGREGATION METHODS FOR GEOREFERENCING INHABITANTS WITH UNKNOWN PLACE OF RESIDENCE: THE CASE STUDY OF POPULATION CENSUS 2011 IN THE CZECH REPUBLIC. This short report under appendix 16 explains the methodology undertaken in the Czech Republic in order to disaggregate a small piece of population (about 0.9%) which cannot be directly linked to the exact place of residence. Actually, the paper discusses different approaches and their advantages and disadvantages. In some cases, population is fictitiously allocated to the building points, whereas in other cases it is directly linked to the grid cells. GEOSTAT 1B – APPENDIX 17 – EFGS MODIFIED QUALITY ASSESSMENT PARAMETERS. This is a collection of quality assessment sheets, with parameters and their values, referring both to the production of primary data and the production of grid data. Task 4.2 – Develop and assess the grid approach Page 75 of 91 ANNEX 2: COMPARISON OF PAN-EUROPEAN CLC AND NATIONAL HR CLC USING 1 KM GRID This annex contains further documentation of results from the grid approach analysis for Kymenlaakso province in south-eastern Finland The analysis is presented in chapter 3 of this report, where the methodology also is explained. The analysis was carried out for a large number of CLC classes using a 1 km grid. Part of the output was used as examples in chapter 3, but the rest is presented here. These plots can be read as a study of the statistical bias in the current, vector based panEuropean CLC. The Finnish national HR CLC is a detailed dataset with 20 meter pixels, giving a fairly accurate account of the statistical composition of land cover classes. The grid is here used as a spatial unit where the two data sources can be compared. The grid cells are seen as “dots” in the scatterplots. These dots will be placed along the diagonal when the pan-European CLC and national HR CLC are equal. The dot will be located above the diagonal when a class is over-represented in the pan-European CLC. Conversely, the dots will be located below the diagonal when a class is under-represented in the pan-European CLC. Class 211 Non-irrigated arable land is underestimated by EU CLC. The regression line, explanatory for 85% of cases, shows that the level of underestimation is independent from the surface area of the HR CLC class within the cell. Nevertheless, the effect of MMU can be seen for smaller areas, where sometimes the class disappears in EU CLC. In a few other cases, larger areas are overestimated by EU CLC, filling the cell completely with that class only, whereas the original figures are between 80 and 95 ha. Task 4.2 – Develop and assess the grid approach Page 76 of 91 Just like for the Sea and ocean, there is almost a perfect correlation between both datasets in Water bodies also (Class 512). The regression line explains 98.4% of the variance and shows a very weak overestimation from EU CLC when the area of the class within the 1 km grid cell is high. Class 523 Sea and ocean in the 1 km2 grid shows a very close relationship of cell values between HR CLC and EU CLC. The regression line exhibits an almost perfect correlation with an r-squared very close to 100%. The MMU effect is not noticed here, due to the combination of big share and homogeneity of this class (most of the cells are 100% covered by “sea and ocean”). Task 4.2 – Develop and assess the grid approach Page 77 of 91 MINORITY CLASS SCATTERPLOTS, 1 KM GRID A CLC class covering less than 5 % or more of the study area, according to the more accurate national HR CLC, is considered as a “majority class”. For class 121 (Industrial or commercial units), we can see a double trend: on the one hand, some areas smaller than 20 ha disappear in EU CLC; on the other hand, most of the areas represented in EU CLC are rather overestimated. The regression line shows this trend, but it does not explain 30% of the variance. That is why, in the end, the absolute figures show an underestimation of such phenomena in the European land cover dataset. Roads and railways are underestimated in EU CLC. As we can see, most areas in HR CLC simply disappear in EU CLC due to the effect of generalisation and the minimum width of the European CLC (100m). Task 4.2 – Develop and assess the grid approach Page 78 of 91 Class 123 “Port areas” is the smallest class included in the minority classes, but it shows a different trend compared to other minority ones. It is overestimated in EU CLC, as it is shown by the regression line which explains up to 96% of the variance. The bigger the area of the port areas in HR CLC, the higher is the overestimation by EU CLC. Class 131 (Mineral extraction sites) show a similar trend that we have seen for other minority classes. Smaller sites disappear in EU CLC. Nevertheless, some of the sites represented in EU CLC are a little bit overestimated. Correlation is weak due to the fact that some areas drop down to zero in EU CLC. Task 4.2 – Develop and assess the grid approach Page 79 of 91 Most of the sport and leisure facilities reported by the HR CLC are lost in EU CLC, due to the different MMU. Linear correlation between figures for this class is very weak. When the share of sport areas is bigger than 15 ha, there is a slight trend of overestimation in EU CLC. In the case of inland marshes, due to the MMU again, this class is underestimated by EU CLC in comparison to HR CLC. Figures are fairly correct when 411 is present in the EU CLC, but many areas where 411 constitute below 20 % of the area (20 ha per km2) is lost due to generalization. Task 4.2 – Develop and assess the grid approach Page 80 of 91 Peat bogs are rather well represented in EU CLC in comparison to HR CLC. Correlation is very good and the regression line explains 94% of the variance. Nevertheless, there are some smaller units (less than 20 ha) which disappear in EU CLC, due to the MMU effect, causing a bit of underestimation of this class in the European dataset. The case of salt marshes is very similar to the water courses, they are a bit underestimated. Due to the MMU effect, some of the areas disappear in EU CLC. Task 4.2 – Develop and assess the grid approach Page 81 of 91 The regression line shows that water courses are slightly underestimated in EU CLC (r-squared of 87.6%). For grid cells with less than 15 ha of water courses, a lot of times that share vanishes in EU CLC. Due to the different MMU, some smaller water courses are lost in the European dataset. Task 4.2 – Develop and assess the grid approach Page 82 of 91 ANNEX 3: COMPARISON OF PAN-EUROPEAN CLC AND NATIONAL HR CLC USING 100M GRID This annex contains the results from an analysis for Kymenlaakso province in south-eastern Finland where the pan-European CLC was compared to the Finnish national HR CLC using a 100m grid as the basis for the comparison. The analysis is analogous to the analysis found in Chapter 3 and in Annex 2 and the results is identical with respect to the main topic. The results are included partly as documentation for the interested reader, partly because they demonstrate similarities and differences between the two grid resolutions in terms of statistical and visual results. Carrying out a general analysis of the results (areas, percentages and absolute and relative differences, between the HR CLC and the EU CLC) we will find that the global figures are exactly the same as the figures obtained when using the 1 km grid in Chapter 3. Numerically, and with respect to the aggregated figures, there is no difference at all using a 100m grid or a 1 km grid. This is due to the fact that, despite the size of the grid cell, all the information from the original data sources are accounted for and kept linked to each individual grid cell. The only difference is the number of cells. In the case of the 100m grid, we have 769 118 cells with surface areas and % for each LC class for both the HR CLC and EU CLC source data, whereas in the case of the 1 km grid we have 7 763 cells. As the figures are the same as the ones in the analysis carried out using the 1 km grid, the majority classes are also the same. We only reproduce A3.1 for reference, but we are not displaying again the same analysis and charts to compare majority classes. Table A3.1. Total figures (area, percentage and differences) for majority classes according to the HR CLC and the EU CLC within the 1 km populated grid Class 312 523 211 313 324 512 Description Area_N Area_E %_N %_E Diff. E-N % Diff E-N Coniferous forest 241067 251617 31,37% 32,74% 10550 4,38% Sea and ocean 182323 184746 23,72% 24,04% 2423 1,33% Non-irrigated arable land 88285 70174 11,49% 9,13% -18111 -20,51% Mixed forest 76005 106230 9,89% 13,82% 30225 39,77% Transitional woodland-shrub 63594 35485 8,28% 4,62% -28109 -44,20% Water bodies 46606 46473 6,06% 6,05% -133 -0,29% Some general statements about the analysis of the scatterplots: We should take into account that we are now analysing cells of 100m x 100m (1 ha). EU CLC has a MMU of 25ha (i.e. polygons smaller than 25 ha are not mapped in EU CLC), which means that populating a grid with 1ha cells with EU CLC data results in many grids with the same CLC class (25 ha = 25 grid cells of 1ha). The HR CLC raster that has been used to populate the grid has a resolution of 20m x 20m. This means that we can have up to 25 raster cells within each 1ha grid cell and the minimum area for a CLC class is 400 m2. The result of this configuration is that you have strips of data at intervals of 400 m2 for the HR CLC axis in the scatterplot. The large number of observations (grid cells) for each CLC together with the 400 m2 interval results in 25 strips of data at intervals of 400 m2 for the HR CLC axis in the graphs without seeing the individual observations per grid back in the graph The “border effect” of EU CLC polygons against HR CLC raster pixels is also noticed, meaning that when grid cells overlays a generalised EU CLC polygon border it can include high shares of a class which is not reported by the HR CLC raster dataset at 20m x 20m. Finally, we should mention that scatterplots have been produced by means of the R software, as MS Excel was not able to respond to the huge number of figures. The red line in each chart is the fitted regression line. The value for r-squared is shown next to each chart. Units in both axis are square metres (m2). A CLC class covering 5 % or more of the study area, according to the more accurate national HR CLC, is considered as a “majority class”. Task 4.2 – Develop and assess the grid approach Page 83 of 91 r2= 0.678 Figure A3.1. Scatterplot of Discontinuous urban fabric (112) in 100 m grid Discontinuous urban fabric (112) is overestimated in EU CLC (Figure 3.18), although there is not a very strong linear correlation. The fact that class Continuous urban fabric (111) in HR CLC is generalised to 112 in EU CLC explains basically this overestimation. The smaller MMU in HR CLC might have caused a more detailed delimitation of discontinuous urban fabric, excluding some complementary elements (infrastructures, green urban...) that in EU CLC are part of 112. A CLC class covering less than 5 % or more of the study area, according to the more accurate national HR CLC, is considered as a “majority class”. r2= 0.538 Figure A3.2. Scatterplot of Industrial or commercial units (121) in 100 m grid Although it might seem that there is a lot of overestimation by EU CLC, the fact is that lots of industrial areas are lost and the global trend is of underestimation of this class by the European layer. That is why the r-squared for the fitted regression line is rather low. Task 4.2 – Develop and assess the grid approach Page 84 of 91 r2= 0.118 Figure A3.3. Scatterplot of Roads and rail network (122) in 100 m grid Roads and railways are underestimated in EU CLC. As we can see, most areas in HR CLC simply disappear in EU CLC due to the effect of generalisation and the minimum width of the European CLC (100m). r2=0.910 Figure A3.4. Scatterplot of Port areas (123) in 100 m grid For port areas, we see the same trend as for the previous grid analysis, which shows a large overestimation by EU CLC. The border effect of CLC polygons is also noticed in some cells where HR CLC has no ports at all and EU CLC accounts for some area of the class. Task 4.2 – Develop and assess the grid approach Page 85 of 91 r2=0.567 Figure A3.5. Scatterplot of Mineral extraction sites (131) in 100 m grid Class 131 (Mineral extraction sites) show a similar trend that we have seen for other minority classes. Smaller sites disappear in EU CLC. Nevertheless, some of the sites represented in EU CLC are to some extent overestimated. r2=0.304 Figure A3.6. Scatterplot of Sports and leisure facilities (142) in 100 m grid This class shows similar trends to the previous analysis. Most of the sport and leisure facilities reported by the HR CLC are not present in the 1ha grids populated with EU CLC, due to the different MMU. Task 4.2 – Develop and assess the grid approach Page 86 of 91 r2= 0.803 Figure A3.7. Scatterplot of Non-irrigated arable land (211) in 100 m grid Class 211 is underestimated by EU CLC. The regression line, explanatory for 80% of cases, shows that the underestimation is higher for higher shares of the HR CLC. There are lots of cells where HR CLC shares of this class are lost in EU CLC, although some exceptions are found in the other direction. r2= 0.180 Figure A3.8. Scatterplot of Land principally occupied by agriculture (243) in 100 m grid Taking a look at the global figures in table 1, the class 243 is strongly overestimated by the EU CLC, accounting for 5.2% of the total share, whereas in HR CLC accounts for only 0.46%. Class 243 (1 hectare cells) shows an increase in overestimation of the EU CLC with a decrease in surface area of the grids populated with HR CLC (Figure 3.19). The fact that this is a mixed class, together with the different MMU for both source layers, causes that EU CLC groups some pure classes (e.g. 211 and 311) within this mixed one. Task 4.2 – Develop and assess the grid approach Page 87 of 91 r2=0.244 Figure A3.9. Scatterplot of Broad-leaved forest (311) in 100m grid Like in the analysis by the 1 km grid, Broad-leaved forest (311) is not following a linear pattern on a cell by cell analysis (Figure 3.17). Lots of patches are lost in EU CLC, but the patches that are present in both datasets, are generally overestimated by the European dataset. The MMU can cause patches of broad-leaved forest to be included within mixed forest in EU CLC. r2= 0.754 Figure A3.10. Scatterplot of Coniferous forest (312) in 100 m grid Coniferous forest (312) is the most common class and is present in a large number of grid cells. Due to the strips in the chart (Figure 3.14), it is a bit difficult to make a visual analysis, but the regression line suggests that the EU CLC slightly overestimates this class for low values, whereas it to some extent underestimates the highest values. This linear regression explains up to 75% of the variance, which is a bit lower than in the case of the 1 km grid. There are lots of cells with some share of CLC class 313 in EU CLC and no share in HR CLC, resulting in a slight overestimation of this class by the EU CLC. Task 4.2 – Develop and assess the grid approach Page 88 of 91 r2= 0.512 Figure A3.11. Scatterplot of Mixed forest (313) in 100 m grid The mixed forest class (313) is overestimated by the EU CLC in most of the cells, although there is not a clear linear regression that explains most of the variance (Figure 3.15). In general terms, the overestimation is seen for lower values in HR CLC. The effect of a mixed class together with the MMU of the European layer is an explanation for that. r2= 0.487 Figure A3.12. Scatterplot of Transitional woodland-shrub (324) in 100 m grid In the class 324, there is a general trend of underestimation, but as it can be seen in the Figure 3.16 and also looking at the low value of the r-squared, there is high variation in the different cases, and sometimes EU CLC overestimates this class, whereas in other cells all HR CLC share is lost. Again, the different MMU and the fact that it is a kind of mixed class may cause those deviations. Task 4.2 – Develop and assess the grid approach Page 89 of 91 r2= 0.556 Figure A3.13. Scatterplot of Inland marsh (411) in 100 m grid In the case of inland marshes, due to the MMU again, many share of this class is lost by EU CLC in comparison to HR CLC. r2=0.869 Figure A3.14. Scatterplot of Peat bogs (412) in 100 m grid In general terms, peat bogs are well represented by EU CLC, although in some cases small patches are lost, causing a bit of underestimation by the European layer. Task 4.2 – Develop and assess the grid approach Page 90 of 91 r2= 0.770 Figure A3.15. Scatterplot of Salt marshes (421) in 100 m grid The case of salt marshes is very similar to the water courses. Due to the MMU effect, some of the areas disappear in EU CLC or are a bit underestimated. r2= 0.873 Figure A3.16. Scatterplot of Water courses (511) in 100 m grid Water courses are slightly underestimated in EU CLC, basically because some small water courses are lost, due to the different MMU and CLC width thresholds (100m). Task 4.2 – Develop and assess the grid approach Page 91 of 91 r2=0.953 Figure A3.17. Scatterplot of Water bodies (512) in 100 m grid As for the “Water bodies”, there is a very good correlation between figures from both datasets. The regression line explains 95.3% of the variance. In general there is very little underestimation by EU CLC for higher shares in HR CLC, whereas there is very slight overestimation for lower shares. r2= 0.993 Figure A3.18. Scatterplot of Sea and ocean (523) in 100 m grid In a similar way to the other grid analysis, “Sea and ocean” is a class that shows a big correlation of cell values between HR CLC and EU CLC. The regression line shows the almost perfect correlation of figures with an r-squared very close to 100%. This is basically caused by the big share and homogeneity of this class (most of the cells are 100% covered by “sea and ocean”). Nevertheless the effect of the polygon borders of the EU CLC is noticed because some cells have some share of this class according to the European layer and a value of 0 according to the HR CLC.