EnMAP-Box Application Tutorial: Regression Techniques

Date: 16.03.2015
Authors: Matthias Held, Sebastian van der Linden, Benjamin Jakimow, Andreas Rabe and Patrick Hostert
Abstract: Application and performance evaluation of different regression techniques
Copyright © Humboldt-Universität zu Berlin, Geomatics Lab, 2015, www.hu-geomatics.de

Citation
Please cite this tutorial as: Held, M., van der Linden, S., Jakimow, B., Rabe, A., Hostert, P. (2015). EnMAP-Box Application Tutorial: Regression Techniques, Humboldt-Universität zu Berlin, Germany.

Disclaimer
The authors of this tutorial accept no responsibility for errors or omissions in this work and shall not be liable for any damage caused by these errors or omissions.

Contents
1 Introduction
2 Data Preparation
3 ImageRF
4 ImageSVM
5 autoPLSR

1 Introduction

The goal of this tutorial is to make you familiar with some important regression approaches implemented in the EnMAP-Box: Random Forests (imageRF), Support Vector Machines (imageSVM) and Partial Least Squares Regression (autoPLSR).

2 Data Preparation

Select File > Open > EnMAP-Box Test Images. To get an idea of the distribution of land cover types in the test images, take a look at the Image Statistics: select Tools > Image Statistics, choose the classification file 'AF_LC' as Input Image and Accept. The least represented class is 'soils & manmade' (1); the three other classes are 'water' (2), 'forest & natural vegetation' (3) and 'agriculture' (4).

In the next step you create a stratified random sample from the test image containing Leaf Area Index (LAI) values, covering all of these classes. Select Tools > Random Sampling. Choose the Input Image named 'AF_LAI' and check Stratification. Your Stratification Image has to be 'AF_LC', then Accept. In the new dialog select Equalized Sampling and type in 50, creating a total sample of 200 pixels (50 per class). Define the output path of your random sample, name it 'sample_LAI' and Accept. The sample will appear in the Image List. These 200 points might represent Leaf Area Index values measured in the field.

For a later evaluation of the performance of the models, divide the sample into a training (70%) and a validation (30%) data set. Select Tools > Random Sampling. Choose the Input Image named 'sample_LAI' and Accept. Now select Relative Sampling and type in 70 (%). Define the output path of your random sample and name the training data set 'TrainSample1', then check Complement, define the output path of your validation data set and name it 'ValidSample1'. Two files are created: the first contains 70% randomly chosen pixels of the 'sample_LAI' file, the second the remaining 30%.
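For readers who want to see the logic of this step outside the GUI, the equalized sampling and the 70/30 split amount to a few lines of array work. The following Python sketch is purely illustrative and is not EnMAP-Box code; the synthetic 'lai' and 'landcover' arrays, the seed and all variable names are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Synthetic stand-ins for the flattened AF_LAI and AF_LC rasters
# (in the tutorial these values come from the test images, not from random numbers).
n_pixels = 10_000
landcover = rng.integers(1, 5, size=n_pixels)   # class codes 1..4 as in AF_LC
lai = rng.random(n_pixels) * 6.0                # fake LAI values

def equalized_sample(classes, n_per_class=50):
    """Draw n_per_class random pixel indices from every class (Equalized Sampling)."""
    picks = [rng.choice(np.flatnonzero(classes == c), size=n_per_class, replace=False)
             for c in np.unique(classes)]
    return np.concatenate(picks)

sample_idx = equalized_sample(landcover)         # 4 classes x 50 = 200 pixels ('sample_LAI')

# Relative Sampling with 70 % plus Complement: a random 70/30 split of the sample
shuffled = rng.permutation(sample_idx)
n_train = int(0.7 * len(shuffled))               # 140 training pixels
train_idx = shuffled[:n_train]                   # 'TrainSample1'
valid_idx = shuffled[n_train:]                   # 'ValidSample1'
train_lai, valid_lai = lai[train_idx], lai[valid_idx]
```

In the EnMAP-Box itself all of this is handled by the Random Sampling tool; the sketch only mirrors the idea behind the Equalized and Relative Sampling options.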
3 ImageRF

1) Parameterization
Select Applications > Regression > imageRF Regression > Parameterize RF Regression (RFR). The Input Image has to be 'AF_Image' and the Reference Areas 'TrainSample1'. Some parameters, e.g. the Number of Trees, are already pre-defined and do not have to be changed. Simply define where to save the Output RFR Model, name it 'rfrModel1_1' and Accept.

2) Application
After completion of the parameterization you are asked whether you want to apply the model to an image; answer 'yes'. In the next dialog the last RFR Model and the Image are already selected. Now define where the regression estimation is to be saved and Accept. After completion you can visualize the rfrEstimation in an Image View (drag-and-drop the file onto the view manager). The grey values represent the estimated LAI values.

3) Accuracy Assessment
Select Applications > Regression > imageRF Regression > Fast Accuracy Assessment. The last RFR Model is again already selected, as well as the Image. As Reference Areas choose 'ValidSample1'. Several accuracy measures will show up in your HTML browser. Leave the browser open for a later comparison of results.

number of samples (n): 60 (masked: 115551, total: 115611)
mean absolute error (MAE): 0.551599
mean squared error (MSE): 1.724486
root mean squared error (RMSE): 1.313197
pearson correlation (r): 0.89
squared pearson correlation (r^2): 0.78
nash-sutcliffe efficiency (NSE): 0.72

In the next step you follow the same procedure again, in order to check for possible deviations in the parameterization of the model when using the same training data. Start again with step 1 (Parameterization) and name the new model 'rfrModel1_2', then apply it to the image and run the accuracy assessment again. A second tab with the new result report should open in your HTML browser.

In the last step your task is to run the model once again, this time with a different allocation of training and validation pixels. Select Tools > Random Sampling. Again choose the Input Image named 'sample_LAI', then Accept. Select Relative Sampling and type in 70 (%). Define the output path of your random sample and name it 'TrainSample2' this time. Check Complement again, define the path, name it 'ValidSample2', then Accept. Two new files should appear in the File List. Finally, carry out the three steps again, namely

1. Parameterization (using 'TrainSample2', naming the model 'rfrModel1_3')
2. Application
3. Accuracy Assessment (with 'ValidSample2')

By comparing the three accuracy reports in your HTML browser you will notice slightly different results between the three models. Your results may look similar to those in the following example.

       Model 1.1    Model 1.2    Model 1.3
MAE    0.551599     0.550949     0.510775
MSE    1.724486     1.647037     1.002154
RMSE   1.313197     1.283369     1.001076
r      0.89         0.89         0.92
r²     0.78         0.79         0.84
NSE    0.72         0.73         0.79
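The three imageRF runs above all follow the same pattern: fit a random forest on the training sample, predict LAI for new pixels, and compare the predictions with the validation sample. As a rough illustration of that pattern, here is a scikit-learn sketch; it is not the EnMAP-Box implementation, and the random spectra, the band count and the LAI values are placeholders.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(seed=0)

# Synthetic stand-ins for the AF_Image spectra at the sampled pixel positions
# (140 training and 60 validation pixels; the band count is chosen arbitrarily).
X_train, y_train = rng.random((140, 200)), rng.random(140) * 6.0
X_valid, y_valid = rng.random((60, 200)), rng.random(60) * 6.0

# 1) Parameterization: fit a random forest regressor with a fixed number of trees
rfr = RandomForestRegressor(n_estimators=100, random_state=0)
rfr.fit(X_train, y_train)

# 2) Application: estimate LAI for pixels the model has not seen
y_pred = rfr.predict(X_valid)

# 3) Accuracy assessment: the measures reported by the Fast Accuracy Assessment
res  = y_valid - y_pred
mae  = np.mean(np.abs(res))
mse  = np.mean(res ** 2)
rmse = np.sqrt(mse)
r    = np.corrcoef(y_valid, y_pred)[0, 1]
nse  = 1.0 - mse / np.mean((y_valid - y_valid.mean()) ** 2)
print(f"MAE={mae:.3f} MSE={mse:.3f} RMSE={rmse:.3f} r={r:.2f} r^2={r**2:.2f} NSE={nse:.2f}")
```

Because the random forest itself is randomized, repeating the fit with a different random state (as in the 'rfrModel1_2' run) already yields slightly different accuracy values, even with identical training data.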
4 ImageSVM

Select Applications > Regression > imageSVM Regression > Parameterize SV Regression (SVR). Choose the Image 'AF_Image' as Training Data and 'TrainSample1' from section 2 as Reference Areas, then Accept. Now choose where to save the SVR file, name it 'AF_Image_scaled1.svr', then Accept.

When the parameterization is completed, select Applications > Regression > imageSVM Regression > Apply SVR to Image. The previous SVR file is already selected, so choose 'AF_Image' as Image, define a path for the regression result and name the file 'AF_Image_SVR_1', then Accept.

After completion, do an Accuracy Assessment. Select Applications > Regression > imageSVM Regression > Fast Accuracy Assessment. In the first dialog the last SVR file is already selected, simply Accept. As Validation Data select the Image 'AF_Image' and as Reference Areas 'ValidSample1'. The Accuracy Assessment yields accuracy measures, a scatterplot with histograms and a residuals plot.

If you follow the steps (Parameterization, Application, Accuracy Assessment) one more time, you might again notice slightly different results. Finally, as in section 3, repeat the three steps using 'TrainSample2' for the Parameterization and 'ValidSample2' for the Accuracy Assessment.

       Model 2.1   Model 2.2   Model 2.3
MAE    0.3693      0.5256      0.4391
RMSE   0.8269      1.4170      1.001
r      0.9526      0.8900      0.9098
r²     0.9074      0.7921      0.8277

5 autoPLSR

Select Applications > Regression > autoPLSR > Calibrate Model. Under the first bullet point, choose 'AF_Image' as Input Image and 'TrainSample1' as target image. For the Output, define where to save the autoPLSR Model and uncheck Show Report as well as Save Report, then Accept.

After completion, select Applications > Regression > autoPLSR > Apply Model. Choose the Input Model 'modelPLSR.plsr', the Input Image 'AF_Image' and define the name and path for the output image, then Accept.

To perform an accuracy assessment, the image 'autoPLSR_Estimation' has to have the file type EnMAP-Box Regression and a data ignore value, which it currently does not. Therefore, right-click on the file 'autoPLSR_Estimation' and choose Edit Header File. Change the line 'file type = envi standard' to 'file type = EnMAP-Box Regression' and add the line 'data ignore value = -1'. In the same window click File > Save and close the window.

Now select Applications > Accuracy Assessment > Regression. Choose 'autoPLSR_Estimation' as Estimation and 'ValidSample1' as Reference, then Accept. As in the case of imageRF, some accuracy measures will show up in your HTML browser.

Following the same procedure as before, do the three steps (Calibration, Application, Accuracy Assessment) again. Again the results will differ slightly. Finally, do the three steps once more using 'TrainSample2' for the Calibration and 'ValidSample2' for the Accuracy Assessment. You should now have accuracy measures for three different models.

       Model 3.1    Model 3.2    Model 3.3
MAE    0.687782     0.655503     0.697173
MSE    1.815288     1.668609     1.518849
RMSE   1.347326     1.291747     1.232416
r      0.86         0.87         0.87
r²     0.74         0.76         0.76
NSE    0.71         0.73         0.75

Summary of accuracy measures of the three approaches (ranges over the three model runs):

       imageRF      imageSVM     autoPLSR
MAE    0.51-0.55    0.37-0.53    0.66-0.70
RMSE   1.00-1.31    0.82-1.41    1.23-1.34
r      0.89-0.92    0.89-0.95    0.86-0.87
r²     0.78-0.84    0.79-0.91    0.74-0.76
NSE    0.72-0.79    -            0.71-0.75

Conclusions:
- Even with the same training data and approach the results differ markedly; the same holds for a different allocation of training and validation pixels.
- The choice of training data with respect to amount and representativeness is important even for the more robust approaches.
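As a closing illustration, the imageSVM and autoPLSR workflows of sections 4 and 5 can be mimicked with generic regressors in the same fit/predict/assess pattern as the imageRF sketch above. The following scikit-learn sketch is not the EnMAP-Box implementation; the kernel, parameter values, number of components and the synthetic arrays are all assumptions made for the example.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(seed=0)
X_train, y_train = rng.random((140, 200)), rng.random(140) * 6.0
X_valid, y_valid = rng.random((60, 200)), rng.random(60) * 6.0

# Support vector regression: imageSVM works on scaled data
# (cf. the file name 'AF_Image_scaled1.svr'); StandardScaler stands in for that here.
svr = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.1))

# Partial least squares regression: the number of latent components is the
# parameter that autoPLSR tunes automatically; here it is simply fixed.
plsr = PLSRegression(n_components=10)

for name, model in (("SVR", svr), ("PLSR", plsr)):
    model.fit(X_train, y_train)
    pred = np.ravel(model.predict(X_valid))      # PLSRegression returns a 2-D array
    rmse = np.sqrt(np.mean((y_valid - pred) ** 2))
    r = np.corrcoef(y_valid, pred)[0, 1]
    print(f"{name}: RMSE={rmse:.3f}  r^2={r**2:.3f}")
```

Varying the split between training and validation pixels, as done with 'TrainSample2' and 'ValidSample2', changes the reported measures for these regressors in the same way as observed for imageRF.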