Munich Personal RePEc Archive (MPRA)

Rainfall Drought Simulating Using Stochastic SARIMA Models for Gadaref Region, Sudan

Hisham Mohamed Hassan Ali (a) and Tariq Mahgoub Mohamed (b)

(a) Department of Econometrics and Social Statistics, University of Khartoum. E-mail: [email protected]
(b) Khartoum Technical College. E-mail: [email protected]

2014

Online at http://mpra.ub.uni-muenchen.de/61153/
MPRA Paper No. 61153, posted 8. January 2015 13:38 UTC

ABSTRACT

Sudan is one of the countries whose economy depends on rain-fed agriculture and which faces recurring cycles of natural drought. For many decades, recurrent drought, with intermittent severe droughts, has been a normal phenomenon in Sudan. This paper presents linear stochastic models, known as multiplicative seasonal autoregressive integrated moving average (SARIMA) models, used to simulate monthly rainfall at Gadaref station, Sudan. Monthly rainfall data for the years 1971–2010 were used for the analysis. The seasonality observed in the ACF and PACF plots of the monthly rainfall data was removed using first-order seasonal differencing prior to the development of the SARIMA model. The SARIMA(0,0,5)×(1,0,1)_12 model developed here was found to be the most suitable for simulating monthly rainfall at Gadaref station. This model is considered appropriate for forecasting monthly rainfall to help decision makers establish priorities for water demand, storage and distribution.

I. INTRODUCTION

The Gadaref¹ region lies in the east-central part of Sudan, on the border with Ethiopia. The mean annual rainfall in this region over the last four decades is 617 mm.
The annual number of rainy days (rainfall > 1 mm) is 111, and the mean annual reference potential evapotranspiration (ETo) for the region, computed with the Penman-Monteith method, is about 2283 mm (Le Houérou, 2009). The region experiences very hot summers, with temperatures reaching up to 45 °C in May. Dry periods are generally accompanied by high temperatures, which lead to higher evaporation and affect the natural vegetation and agriculture of the region as well as the wider water resources sector. Annual potential evapotranspiration exceeds annual precipitation in this region; rainfall exceeds evapotranspiration only in August and September (Etuk and Mohamed, 2014). The climate in Gadaref is semi-arid, with a mean annual temperature near 30 °C (Elagib and Mansell, 2000).

¹ The word Gedarif (Gadaref, El Gadarif, Qadarif) is derived from the Arabic phrase All Gada-Ye-rif, meaning: he who has finished selling or buying should leave.

Researchers who have recently modelled drought with SARIMA methods, using the Standardized Precipitation Index (SPI) as the drought indicator, include Mishra and Desai (2005) and Durdu (2010). Mishra and Desai (2005) fitted a SARIMA(1,0,0)×(1,1,1)_6 model to simulate and forecast SPI-6 in the Kansabati river basin, India. Durdu (2010) modelled SPI-6 for the Buyuk Menderes river basin in western Turkey and fitted a SARIMA(1,0,0)×(2,0,1)_6 model to it.

Time series model development consists of three stages: identification, estimation, and diagnostic checking (Box and Jenkins, 1970). The identification stage involves transforming the data, if necessary, to improve the normality and stationarity of the time series, and determining the general form of the model to be estimated. During the estimation stage the model parameters are calculated. Finally, diagnostic tests are performed to reveal possible model inadequacies and to assist in selecting the best model.

II. METHODOLOGY

For more than half a century, Box-Jenkins ARIMA linear models have dominated many areas of time series forecasting. Autoregressive (AR) models can be effectively coupled with moving average (MA) models to form a general and useful class of time series models called autoregressive moving average (ARMA) models. In an ARMA model the current value of the time series is expressed as a linear aggregate of p previous values and a weighted sum of q previous deviations (original value minus fitted value of previous data), plus a random term. ARMA models can, however, be used only when the data are stationary (Mishra and Desai, 2005). A time series is said to be stationary if it has constant mean and variance. This class of models can be extended to non-stationary series by differencing the data series; the resulting models are called autoregressive integrated moving average (ARIMA) models.

Time series often possess a seasonal component that repeats every s observations. For monthly observations s = 12 (12 in 1 year); for quarterly observations s = 4 (4 in 1 year). To deal with seasonality, ARIMA processes have been generalized to the seasonal ARIMA (SARIMA) model, which adds seasonal AR, MA and differencing terms. Whereas the regular ARIMA model includes only the non-seasonal AR and MA polynomials, the SARIMA model incorporates both non-seasonal and seasonal factors in a multiplicative model. One shorthand notation for the model is ARIMA(p, d, q) × (P, D, Q)_S, with

p = non-seasonal AR order,
d = non-seasonal differencing,
q = non-seasonal MA order,
P = seasonal AR order,
D = seasonal differencing,
Q = seasonal MA order,
S = time span of the repeating seasonal pattern.

Without differencing operations, the model can be written more formally as

$$\Phi(B^S)\,\varphi(B)\,(x_t - \mu) = \Theta(B^S)\,\theta(B)\,w_t \tag{1}$$

The non-seasonal components are:

AR: $\varphi(B) = 1 - \varphi_1 B - \dots - \varphi_p B^p$
MA: $\theta(B) = 1 + \theta_1 B + \dots + \theta_q B^q$

The seasonal components are:

Seasonal AR: $\Phi(B^S) = 1 - \Phi_1 B^S - \dots - \Phi_P B^{PS}$
Seasonal MA: $\Theta(B^S) = 1 + \Theta_1 B^S + \dots + \Theta_Q B^{QS}$

Note that on the left side of equation (1) the seasonal and non-seasonal AR components multiply each other, and on the right side the seasonal and non-seasonal MA components multiply each other.

As an example, consider the ARIMA(1,0,0)×(1,0,0)_12 model. It includes a non-seasonal AR(1) term, a seasonal AR(1) term, no differencing and no MA terms, and the seasonal period is S = 12. The non-seasonal AR(1) polynomial is $\varphi(B) = 1 - \varphi_1 B$ and the seasonal AR(1) polynomial is $\Phi(B^{12}) = 1 - \Phi_1 B^{12}$. The model is

$$(1 - \Phi_1 B^{12})(1 - \varphi_1 B)(x_t - \mu) = w_t \tag{2}$$

If we let $z_t = x_t - \mu$ for simplicity, multiply the two AR components and move everything except $z_t$ to the right side, we get

$$z_t = \varphi_1 z_{t-1} + \Phi_1 z_{t-12} + (-\Phi_1 \varphi_1)\, z_{t-13} + w_t \tag{3}$$

This is an AR model with predictors at lags 1, 12, and 13. R can be used to compute and plot the PACF (partial autocorrelation function) of this model with $\varphi_1 = 0.6$ and $\Phi_1 = 0.5$.

In this study, SARIMA models were used to simulate droughts following the model development procedure described above. The models were applied to the SPI series for the Gadaref region (SPI-6 = SPI for 6 months). After the models are identified, efficient estimates of the parameters must be obtained. These parameters should satisfy two conditions, namely stationarity for the autoregressive part and invertibility for the moving average part. The parameters should also be tested for statistical significance; the parameter values are reported with their standard errors and the related t-values.

The drought events were calculated using the SPI. The data series from 1971 to 2010 was used for model development for the SPI-6 series. Two software packages were used for the time series analysis: SPSS 19 and EViews 6.

IV. RESULTS AND DISCUSSION

A time series plot of the raw SPI-6 series for Gadaref was examined to assess its stability. The plot, shown in Fig. 1, indicates that the series is stationary.
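The expansion in equation (3) of the Methodology can be checked numerically. The sketch below is only an illustration, not part of the original analysis: it multiplies the two AR polynomials of equation (2) as plain coefficient lists, using the values φ1 = 0.6 and Φ1 = 0.5 quoted in the text. The helper name `poly_mul` and the variable names are hypothetical.

```python
def poly_mul(a, b):
    """Multiply two polynomials in the backshift operator B.

    a[i] and b[i] are the coefficients of B**i.
    """
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


phi1, Phi1, S = 0.6, 0.5, 12  # values quoted in the text; S = 12 for monthly data

nonseasonal_ar = [1.0, -phi1]                    # 1 - phi1 * B
seasonal_ar = [1.0] + [0.0] * (S - 1) + [-Phi1]  # 1 - Phi1 * B**12

# Left-hand side operator of equation (2)
product = poly_mul(seasonal_ar, nonseasonal_ar)

# Moving every term except z_t to the right-hand side flips the signs,
# which gives the AR weights of equation (3).
ar_weights = {lag: -c for lag, c in enumerate(product) if lag > 0 and c != 0.0}
print(ar_weights)
```

The only non-zero weights fall at lags 1, 12 and 13, with values φ1, Φ1 and -Φ1φ1 (here 0.6, 0.5 and -0.3), in agreement with equation (3).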
FIG. 1: SPI 6 OF GADAREF STATION TIME SERIES 1971-2010
[Plot of SPI6 against year; vertical axis from -4 to 4, horizontal axis from 1975 to 2010.]

Stationarity is also confirmed by the Augmented Dickey-Fuller unit root test (ADF test), which was conducted on the entire data series. Table 2 shows the ADF test results. The ADF test statistic, -7.29548, is less than the critical values of -3.9778, -3.4194 and -3.1323 at the 1%, 5% and 10% levels respectively. This indicates that the series is stationary.

TABLE 2: ADF UNIT ROOT TEST (SPI6-GADAREF)

Station: Gadaref    Variable: SPI_6    ADF test statistic: -7.29548

Significance level    Critical value    Probability
1%                    -3.9778           0.0000
5%                    -3.4194           0.0000
10%                   -3.1323           0.0000

Result: stationary

In the identification step, a model that represents the behaviour of the series is sought by means of the autocorrelation function (ACF) and partial autocorrelation function (PACF), for further investigation and parameter estimation. The behaviour of the ACF and PACF also indicates whether the series is stationary. The ACF and PACF values were examined for evidence of autoregressive and moving average terms, and an appropriate model for estimating the SPI_6 values for the station was finally found.

Figure 2 shows the ACF and PACF estimated for the SPI-6 series of Gadaref station. Guided by the ACF and PACF of the data, many models were examined for Gadaref station to determine the best one. The model that gives the minimum Akaike Information Criterion (AIC) and Schwarz Criterion (SC) is selected as the best-fit model, as shown in Table 3.

FIG. 2: ACF AND PACF PLOT FOR GADAREF STATION (SPI_6) SERIES

Lag    AC        PAC       Q-Stat    Prob
1      0.710     0.710     240.58    0.000
2      0.515     0.021     367.28    0.000
3      0.381     0.016     436.84    0.000
4      0.264     -0.036    470.20    0.000
5      0.110     -0.147    476.05    0.000
6      -0.087    -0.235    479.71    0.000
7      -0.095    0.168     484.06    0.000
8      -0.074    0.065     486.67    0.000
9      -0.074    -0.001    489.31    0.000
10     -0.055    0.042     490.78    0.000
11     -0.021    -0.004    490.99    0.000
12     0.032     -0.028    491.49    0.000
13     0.071     0.066     493.93    0.000
14     0.046     -0.063    494.98    0.000
15     0.030     -0.030    495.42    0.000
16     0.027     0.027     495.77    0.000
17     0.032     0.033     496.29    0.000
18     0.016     -0.011    496.42    0.000
19     -0.014    0.002     496.51    0.000
20     -0.049    -0.093    497.72    0.000
21     -0.089    -0.088    501.65    0.000
22     -0.112    -0.006    507.92    0.000
23     -0.144    -0.035    518.29    0.000
24     -0.154    0.000     530.14    0.000

TABLE 3: COMPARISON OF AIC FOR SELECTED MODELS, (SPI6_GADAREF)

Station: Gadaref    Variable: SPI_6

Model                    AIC
ARIMA(1,0,0)             2.104
ARIMA(1,0,1)             2.108
ARIMA(2,0,1)             2.113
ARIMA(2,0,2)             2.102
ARIMA(0,0,1)             2.335
ARIMA(0,0,5)             2.010
SARIMA(0,0,5)(1,0,0)     2.036
SARIMA(0,0,5)(1,0,1)     1.999
SARIMA(0,0,5)(0,0,1)     2.104
SARIMA(1,0,0)(1,0,1)     2.092

After careful analysis of the ACF and PACF correlograms (Fig. 2) and of the coefficients, the model chosen is SARIMA(0,0,5)×(1,0,1)_12, which has the lowest AIC in Table 3. After the model was identified using the AIC and SC criteria, its parameters were estimated. The parameter values are shown in Table 4. All parameters are significant, since their p-values are smaller than 0.05, and should be retained in the model.

TABLE 4: SUMMARY OF PARAMETER ESTIMATES AND SELECTION CRITERIA (AIC), (SPI6_GADAREF)

Variable    Coefficient    Std. Error    t-Statistic    Prob.
AR(12)      -0.875450      0.025327      -34.56547      0.0000
MA(1)       0.697188       0.043122      16.16768       0.0000
MA(2)       0.514251       0.050715      10.14007       0.0000
MA(3)       0.408934       0.052794      7.745840       0.0000
MA(4)       0.402012       0.050717      7.926640       0.0000
MA(5)       0.394849       0.043373      9.103536       0.0000
SMA(12)     0.951476       0.013937      68.27071       0.0000

R-squared             0.573186     Mean dependent var       0.024048
Adjusted R-squared    0.567557     S.D. dependent var       0.992582
S.E. of regression    0.652726     Akaike info criterion    1.999716
Sum squared resid     193.8532     Schwarz criterion        2.062376
Log likelihood        -454.9345    Hannan-Quinn criter.     2.024386
Durbin-Watson stat    2.000692

Inverted AR roots: .96±.26i, .70±.70i, .26±.96i, -.26±.96i, -.70±.70i, -.96±.26i
Inverted MA roots: .96±.26i, .70±.70i, .47±.68i, .26±.96i, -.26±.96i, -.41±.73i, -.70±.70i, -.82, -.96±.26i

As Table 4 shows, the SARIMA(0,0,5)×(1,0,1)_12 model has been selected as the one with the minimum AIC. With the model identified and its parameters estimated, model verification checks the residuals for any systematic pattern that could still be removed to improve the chosen model. All validation tests are carried out on the residual series and are summarized briefly below.

For a good model, the residuals left over after fitting should be white noise. This is assessed by examining the autocorrelations and partial autocorrelations of the residuals at various orders; for this purpose, correlations up to 24 lags have been computed. The ACF and PACF of the model residuals are shown in Figure 3. Most values of the residual ACF and residual PACF lie within the confidence limits; only a very few individual correlations appear large compared with the limits. The figure indicates no significant correlation between residuals.

FIG. 3: THE ACF AND PACF OF RESIDUALS FOR SPI-6 FOR GADAREF STATION MODEL

Lag    AC        PAC       Q-Stat    Prob
1      -0.001    -0.001    0.0001
2      -0.005    -0.005    0.0110
3      0.025     0.025     0.2931
4      -0.009    -0.009    0.3302
5      -0.012    -0.012    0.3967
6      -0.022    -0.023    0.6310
7      -0.039    -0.039    1.3437
8      0.018     0.018     1.4908    0.222
9      -0.076    -0.076    4.2551    0.119
10     -0.015    -0.014    4.3686    0.224
11     -0.022    -0.026    4.6047    0.330
12     -0.003    -0.001    4.6100    0.465
13     0.098     0.096     9.1690    0.164
14     0.017     0.015     9.3048    0.232
15     -0.046    -0.048    10.337    0.242
16     -0.013    -0.025    10.415    0.318
17     -0.005    -0.005    10.427    0.404
18     0.028     0.027     10.815    0.459
19     0.027     0.031     11.167    0.515
20     -0.039    -0.037    11.899    0.536
21     0.002     -0.006    11.900    0.614
22     -0.016    -0.009    12.022    0.677
23     -0.052    -0.044    13.317    0.649
24     0.044     0.045     14.266    0.648

The Ljung-Box Q-statistic is employed to check the independence of the residuals. From Figure 3, one can observe that the p-value is greater than 0.05 at every lag for which it is reported, which implies that the white noise hypothesis is not rejected. The Breusch-Godfrey serial correlation LM test does not reject the hypothesis of no serial correlation in the residuals, as shown in Table 5. The Durbin-Watson statistic (DW = 1.999764) also indicates that there is no serial correlation in the residuals.

TABLE 5: THE BREUSCH-GODFREY SERIAL CORRELATION LM TEST (SPI6_GADAREF)

Lags    F-statistic    Prob. F                Obs*R-squared    Prob. Chi-Square
2       0.030213       0.9702 (F(2,453))      0.001942         0.9990 (Chi-Square(2))
12      0.760940       0.6909 (F(12,443))     9.272109         0.6795 (Chi-Square(12))

The Q-statistic and the LM test both indicate that the residuals are uncorrelated and that the model can be used. Since the residual ACF and PACF coefficients lie within the confidence limits, the fit is good; the error measures obtained with this model are given in Table 6. The observed and fitted values are plotted in Figure 4.

TABLE 6: ERROR MEASURES OBTAINED FOR THE MODEL ARIMA (0,0,5), (SPI6_GADAREF)

Error measure    Value
RMSE             0.662
MAE              0.487
R squared        0.557

Figure 4 shows a very close agreement between the fitted model and the actual data. The histogram of residuals for SPI_6 is shown in Figure 5. The histogram is close to symmetric (skewness 0.02) and suggests that the residuals are approximately normally distributed, although the Jarque-Bera statistic reported with it (12.61, probability 0.0018) points to somewhat heavier tails than a normal distribution.

FIG. 4: ACTUAL AND FITTED VALUES, SARIMA (0,0,5)(1,0,1), (GADAREF, SPI_6)
[Plot of the residual, actual and fitted series, 1975 to 2010.]

The Q-Q plot of the residuals looks fairly linear, which supports the normality assumption for the residuals, as shown in Fig. 6.

FIG. 5: HISTOGRAM OF RESIDUALS FOR SPI_6, GADAREF STATION
[Histogram of residuals. Series: Residuals; sample 1972M06 to 2010M11; observations 462.]

Mean           0.007362
Median         0.004298
Maximum        2.599692
Minimum        -2.052816
Std. Dev.      0.648422
Skewness       0.021672
Kurtosis       3.808087
Jarque-Bera    12.60649
Probability    0.001830

FIG. 6: (Q-Q) PLOT OF RESIDUALS FOR SPI_6 FOR GADAREF STATION
[Q-Q plot; quantiles of the normal distribution against quantiles of RESID, both axes from -3 to 3.]

The Kolmogorov-Smirnov (K-S) test is used to test the normality of the residuals. Dcal is less than Dtable at the 5% significance level, as shown in Table 7 (p-value = 0.102 > 0.05). The test therefore does not reject the hypothesis that the residuals are normally distributed.

TABLE 7: K-S TEST CALCULATION OF RESIDUALS FOR SPI_6 SERIES, (GADAREF)

Kolmogorov-Smirnov test
Most extreme difference (Dcal)    0.057
Dtable                            0.063

All the model coefficients are statistically significant, each being more than twice its standard error. The regression is very highly significant, with a p-value of 0.0000, and as much as 55.7% of the variation in the data is accounted for by the fitted model. Figure 3 shows that the residuals are uncorrelated, and Figure 4 shows a very close agreement between the fitted model and the data. Therefore the fitted model is adequate.
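The Ljung-Box Q-statistics listed with Fig. 3 can be reproduced, to rounding, from the residual autocorrelations in the same figure. The sketch below is a consistency check rather than the authors' procedure: it assumes the standard Ljung-Box formula Q(m) = n(n+2) Σ r_k²/(n-k) and takes n = 462 from the observation count reported with the residual histogram (Fig. 5). The function name `ljung_box` is hypothetical, and because the autocorrelations are printed to only three decimals, small differences from the tabulated values are expected.

```python
# Residual autocorrelations at lags 1..24, copied from the AC column of Fig. 3
residual_ac = [
    -0.001, -0.005, 0.025, -0.009, -0.012, -0.022, -0.039, 0.018,
    -0.076, -0.015, -0.022, -0.003, 0.098, 0.017, -0.046, -0.013,
    -0.005, 0.028, 0.027, -0.039, 0.002, -0.016, -0.052, 0.044,
]
n_obs = 462  # assumption: residual observations reported with Fig. 5


def ljung_box(acs, n):
    """Return the cumulative Ljung-Box Q statistic for lags 1..len(acs)."""
    q_values, running_sum = [], 0.0
    for lag, r in enumerate(acs, start=1):
        running_sum += r * r / (n - lag)
        q_values.append(n * (n + 2) * running_sum)
    return q_values


q_stats = ljung_box(residual_ac, n_obs)
for lag in (8, 12, 24):
    print(lag, round(q_stats[lag - 1], 3))
```

At lag 24 the recomputed statistic agrees with the tabulated 14.266 to within rounding of the inputs. The probabilities in Fig. 3 start at lag 8, which is consistent with the degrees of freedom being reduced by the seven estimated ARMA coefficients.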
The model fitted to the SPI_6 series for Gadaref station is SARIMA(0,0,5)×(1,0,1)_12, and the diagnostic checks above show it to be adequate.

V. CONCLUSION

In this paper a linear stochastic model, known as the multiplicative seasonal autoregressive integrated moving average (SARIMA) model, was used to simulate droughts in the Gadaref region, Sudan. The model was applied to the Standardized Precipitation Index (SPI) series. The results show that the SARIMA(0,0,5)×(1,0,1)_12 model is an adequate fit to the SPI_6 series for Gadaref station.

VI. REFERENCES

Abramowitz, M., Stegun, I.A. (eds.), (1965). Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. Dover Publications, Inc., New York, USA.

Box, G.E., Jenkins, G.M., (1970). Time Series Analysis: Forecasting and Control. Holden-Day series in time series analysis, San Francisco.

Durdu, O.F., (2010). Application of Linear Stochastic Models for Drought Forecasting in the Buyuk Menderes River Basin, Western Turkey. Stochastic Environmental Research and Risk Assessment, doi:10.1007/s00477-010-0366-3.

Elagib, N.A., Mansell, M.G., (2000). Recent Trends and Anomalies in Mean Seasonal and Annual Temperature over Sudan. Journal of Arid Environments 45(3): 263-288.

Etuk, E.H., Mohamed, T.M., (2014). Time Series Analysis of Monthly Rainfall Data for the Gadaref Rainfall Station, Sudan, by SARIMA Methods. International Journal of Scientific Research in Knowledge 2(7): 320-327.

Le Houérou, H.N., (2009). Bioclimatology and Biogeography of Africa. Springer-Verlag, Berlin Heidelberg, Germany.

Mishra, A.K., Desai, V.R., (2005). Drought Forecasting Using Stochastic Models. Stochastic Environmental Research and Risk Assessment 19(5): 326-339.

Edwards, D.C., McKee, T.B., (1997). Characteristics of 20th Century Drought in the United States at Multiple Scales. Atmospheric Science Paper 634.

Cacciamani, C., Morgillo, A., Marchesi, S., Pavan, V., (2007). Monitoring and Forecasting Drought on a Regional Scale: Emilia-Romagna Region. Water Science and Technology Library 62(1): 29-48.

Thom, H.C.S., (1958). A Note on the Gamma Distribution. Monthly Weather Review 86: 117-122.

Press, W.H., Flannery, B.P., Teukolsky, S.A., Vetterling, W.T., (1986). Numerical Recipes. Cambridge University Press, Cambridge.

Angelidis, P., Maris, F., Kotsovinos, N., Hrissanthou, V., (2012). Computation of Drought Index SPI with Alternative Distribution Functions. Water Resources Management 26: 2453-2473.

McKee, T.B., Doesken, N.J., Kleist, J., (1993). The Relation of Drought Frequency and Duration to Time Scales. Proceedings of the Eighth Conference on Applied Climatology, pp. 179-184. American Meteorological Society, Boston.

Tsakiris, G., Vangelis, H., (2004). Towards a Drought Watch System Based on Spatial SPI. Water Resources Management 18(1): 1-12.