Finite sample performance of small versus large scale dynamic factor models
Rocio Alvarez (Universidad de Alicante, [email protected]), Maximo Camacho (Universidad de Murcia, [email protected]), Gabriel Perez-Quiros (Banco de España and CEPR, [email protected])

Abstract

We examine the finite-sample performance of small versus large scale dynamic factor models. Our Monte Carlo analysis reveals that small scale factor models outperform large scale models in factor estimation and forecasting when there is a high level of cross-correlation across the idiosyncratic errors of series from the same category, when some categories are oversampled, and especially when there is high persistence in either the common factor series or the idiosyncratic errors. In these situations, pre-selecting the series to be used in a small scale dynamic factor model delivers superior accuracy, even for arbitrary selections of the original set of indicators.

Keywords: Business Cycles, Output Growth, Time Series.
JEL Classification: E32, C22, E27.

Maximo Camacho thanks Fundacion Ramon Areces for financial support. The views in this paper are those of the authors and do not represent the views of the Bank of Spain or the Eurosystem. Corresponding author: Universidad de Murcia, Facultad de Economia y Empresa, Departamento de Metodos Cuantitativos para la Economia y la Empresa, 30100, Murcia, Spain. E-mail: [email protected].

1 Introduction

Two versions of dynamic factor models have received growing attention in the recent forecasting literature. On the one hand, forecasts have been computed from different enlargements of the Stock and Watson (1991) single-index small scale dynamic factor model (SSDFM). Examples are Mariano and Murasawa (2003), Nunes (2005), Aruoba, Diebold and Scotti (2009), and Camacho and Perez Quiros (2010), whose strict factor models are estimated by maximum likelihood using the Kalman filter under the assumption of non cross-correlated idiosyncratic errors. On the other hand, forecasts have been computed from different sophistications of the principal components estimator in the seminal work of Stock and Watson (2002a), which combines the information of many predictors. Examples of forecasts from these so-called large scale dynamic factor models (LSDFM) are Forni, Hallin, Lippi and Reichlin (2005), Giannone, Reichlin and Small (2008), and Angelini et al. (2008). These approximate factor models lead to asymptotically consistent estimates when the number of variables and observations tends to infinity, under the assumptions that the cross-correlation of the idiosyncratic components is weak and that the variability of the common component is not too small.

Much theoretical attention has been devoted to large scale factor models, stressing that strict factor models rely on the tight assumption that the idiosyncratic components are cross-sectionally orthogonal. However, including additional time series in empirical applications frequently involves non negligible costs as well. According to Boivin and Ng (2006), the large data sets used by LSDFM are typically drawn in practice from a small number of broad categories (such as industrial production, or monetary and price indicators). Since the idiosyncratic errors of time series belonging to a particular category are expected to be highly correlated, the assumption of weak correlation among the idiosyncratic components is more likely to fail as the number of time series of this category increases.
In addition, the good asymptotic properties suggested by the theory may not hold in many empirical applications when the number of variables and observations is relatively small.¹

¹ Recently, Boivin and Ng (2006) for the US and Banbura and Runstler (2007) for the Euro area show that the predictive content of empirical large scale factor models is contained in the factors extracted from as few as about 40 series.

The impact of this potential confrontation between asymptotic theory and empirical applications has rarely been addressed. Among the exceptions, Stock and Watson (2002b) find a deterioration in the performance of (static) large scale factor models when the degree of serial correlation and (to a lesser extent) heteroskedasticity among idiosyncratic errors is large and when the serial correlation of the factors is high. Boivin and Ng (2006) use (static) large scale factor models to show that including series that are highly correlated with those of the same category does not necessarily outperform models that exclude these series. Boivin and Ng (2006) for the US and Caggiano, Kapetanios, and Labhard (2009) for some Euro area countries estimate large scale (static) factor models of different dimensions to show that factors extracted from pre-screened series often yield satisfactory or even better results than using larger sets of series. Their preferred data sets sometimes include one fifth of the original set of indicators. Bai and Ng (2008) find improvements over a baseline large scale (static) factor model by estimating the factors using fewer but informative predictors. Finally, Banbura and Runstler (2007) use a (dynamic) large scale model to show that forecast weights are concentrated among a relatively small set of Euro area indicators.

Of all these previous works, the one closest to our approach is Boivin and Ng (2006). However, even though we think that their analysis is very complete, we share the view of Aruoba, Diebold and Scotti (2009), who conclude that comparative assessments of the small sample properties of factor estimation and forecasting from "small data" versus "big data" dynamic factor models are a good place to develop further empirical analyses. In that sense, we depart from Boivin and Ng (2006), first, because our purpose is not to determine the optimal number of variables in a large dataset but to shed some light on the dilemma of which is the optimal strategy when dealing with a forecasting problem: to start from a simple small model and enlarge it if necessary, or to go directly to a large scale model and try to eliminate the redundant information.² Second, Boivin and Ng (2006) consider static models, while we compare dynamic specifications. In particular, we consider as our framework for the large scale model the dynamic model of Giannone, Reichlin and Small (2008), while they consider the static model of Stock and Watson (2002a). This is an important feature of our analysis because we will address the question of how persistence in the factors or in the idiosyncratic shocks affects how appropriate the different specifications are.

² The estimation strategy also determines the techniques to be used in the estimation. Most large scale model techniques require a sufficiently large number of series to have good properties. Therefore, a small scale model cannot be estimated as a particular case of the large scale specification but requires a different estimation strategy.
Third, even though they mention the word "categories" in the motivation of their work, referring to the different sectors or types of data in the economy (prices, production, etc.), they classify the data in general according to their correlation or their heteroskedastic behavior. We concentrate in detail on considering and simulating the effects of having different sectors in the economy, allowing for cross-correlation across sectors, within each sector, and both jointly, and its effects on the estimation of the factors.

Specifically, in this paper we develop simulations in which we try to mimic different empirical forecasting scenarios. The first scenario is the case in which an analyst uses a SSDFM to estimate the factors and to compute the forecasts from a small number of pre-screened series, which are the main (least noisy) indicators of the different categories of data. In the second scenario, a SSDFM is again used for factor extraction and forecast computation, but from a less accurate pre-screening which includes a small number of noisier indicators: the medium series of each category, sorted by increasing variance. In the final scenario, a large scale data set is generated by including additional series in each category under the assumption that the additional series are finer disaggregations of the main indicator with which they are correlated. From this large set of indicators, a LSDFM is used to compute the forecasts.

Using averaged squared errors, we evaluate the accuracy of these three forecasting proposals to estimate the factors and to compute out-of-sample forecasts of a target variable. We find that adding data that bear little information about the factor components does not necessarily lead large scale dynamic factor models to improve upon the forecasts of small scale dynamic factor models. In fact, we show that when the additional data are too correlated with data from categories which are already included in factor estimation, forecasting with many predictors performs worse than forecasting from a reasonably pre-screened dataset, especially when the categories are not highly correlated. We also address the role of the persistence of the factor in determining the best forecasting method.

This paper proceeds as follows. Section 2 describes both small and large scale dynamic factor models. Section 3 presents the design details of the simulation exercise, i.e., how to generate the main series of each category and the finer disaggregations. Section 4 shows the main findings in the comparison between SSDFM and LSDFM for different parameter values. Section 5 concludes.

2 Dynamic factor models

Large and small scale factor models can be represented in a similar general framework. Let $y_t$ be a scalar time series variable to be forecasted and let $X_t = (X_{1t}, \ldots, X_{Nt})'$, with $t = 1, \ldots, T$, be the observed stationary time series which are candidate predictors of $y_t$. If we are interested in one-step-ahead predictions, the baseline model can be stated as

$$y_{t+1} = \beta_0 + \beta' X_t + \sum_{j=1}^{p} \gamma_j y_{t-j+1} + \varepsilon^y_{t+1}, \qquad (1)$$

where $\beta = (\beta_1, \ldots, \beta_N)'$ and $\varepsilon^y_{t+1}$ is a zero mean white noise.
Since estimating this expression becomes impractical as the number of predictors increases, it is standard to assume that each predictor $X_{it}$ has zero mean and admits a factor structure:

$$X_{it} = \lambda_i' F_t + \epsilon_{it}, \qquad (2)$$

for the $i$th cross-section unit at time $t$, $i = 1, \ldots, N$ and $t = 1, \ldots, T$. In this framework the $r \times 1$ vector $F_t$ contains the $r$ common factors, $\lambda_i = (\lambda_{i1}, \ldots, \lambda_{ir})'$ the $r$ factor loadings, $\lambda_i' F_t$ the common components, and $\epsilon_{it}$ the idiosyncratic errors. In vector notation the model can be written as

$$X_t = \Lambda F_t + \epsilon_t, \qquad (3)$$

where $\Lambda = (\lambda_{ij})$ is the $N \times r$ matrix of factor loadings and $\epsilon_t$ is the vector of $N$ idiosyncratic shocks. In the related literature, it is standard to assume that the vectors $F_t$ and $\epsilon_t$ are serially and cross-sectionally uncorrelated unobserved stationary processes.³ In contrast to static factor models, the dynamics of the common factors are supposed to follow a $VAR(1)$ process

$$F_t = A F_{t-1} + u_t, \qquad (4)$$

where $A$ is the $r \times r$ matrix of coefficients, with $E[u_t] = 0$ and $E[u_t u_t'] = \Sigma_u$. In addition, $\epsilon_t$ is assumed to follow a stationary $VAR(1)$ process with mean zero:

$$\epsilon_t = C \epsilon_{t-1} + v_t, \qquad (5)$$

where $v_t$ is independent, with $E[v_t] = 0$ and $E[v_t v_t'] = \Sigma_v$.⁴ Then, the objective variable $y_t$ can be forecasted through the common factors by using the expression

$$y_{t+1} = \beta_0 + \beta' F_t + \sum_{j=1}^{p} \gamma_j y_{t-j+1} + e^y_{t+1}. \qquad (6)$$

Finally, let us call the model a small scale dynamic factor model (SSDFM) when $N$ is fixed and small and $T$ is large, and a large scale dynamic factor model (LSDFM) when both $N$ and $T$ are large.

³ In this framework the common factor is supposed to generate most of the cross-correlation between the series of the data set $\{X_{it}\}_{i=1}^N$.
⁴ Although assuming $VAR(p)$ dynamics for the factors and the idiosyncratic components is straightforward, it would complicate notation.

2.1 Small scale dynamic factor models

The baseline model is the single-index dynamic factor model of Stock and Watson (1991), which can be written in state-space form. Accordingly, the autoregressive parameter $A$, the vector of the $N$ factor loadings $\Lambda$, and the $(N \times N)$ covariance matrix of the idiosyncratic shocks $\Sigma_v$ can be estimated by maximum likelihood via the Kalman filter.⁵ Let $h_t$ be the $(N+1)$ vector $h_t = (F_t, \epsilon_t')'$, $I_j$ the identity matrix of dimension $j$, and $0_j$ the vector of $j$ zeroes. Hence, the measurement equation can be defined as

$$X_t = H h_t + e_t, \qquad (7)$$

where

$$H = (\Lambda, I_N), \qquad (8)$$

and $e_t$ is a vector of $N$ zeroes. In addition, the transition equation can be stated as

$$h_{t+1} = \mathbb{F} h_t + w_t, \qquad (9)$$

where the $(N+1) \times (N+1)$ matrix $\mathbb{F}$ is

$$\mathbb{F} = \begin{pmatrix} A & 0_N' \\ 0_N & C \end{pmatrix}, \qquad (10)$$

and $w_t = (u_t, v_t')'$ has zero mean and covariance matrix

$$Q = \begin{pmatrix} \Sigma_u & 0 \\ 0 & \Sigma_v \end{pmatrix}. \qquad (11)$$

⁵ As usual, $\Sigma_u$ is assumed to be one for identification purposes.

In the standard way, the Kalman filter also produces filtered and smoothed inferences of the common factor: $\{F_{t|t}\}_{t=1}^T$ and $\{F_{t|T}\}_{t=1}^T$. These inferences can be used in the prediction equation (6) to compute OLS forecasts of the variable $y_{t+1}$.
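To fix ideas, the following sketch builds the state-space matrices of equations (7)-(11) for the single-factor case and runs the Kalman filter to obtain the filtered factor $\{F_{t|t}\}$. It is a minimal illustration in Python/NumPy under our notation (the function name and initialization are ours), assuming a diagonal $C$ and $\Sigma_v$ and known parameters; in practice the parameters are chosen to maximize the prediction-error log-likelihood that the filter returns.

```python
import numpy as np

def kalman_filter_ssdfm(X, lam, A, c, sigma_v):
    """Kalman filter for the state-space form (7)-(11) with one factor.

    X       : (T, N) data matrix
    lam     : (N,) factor loadings (Lambda)
    A       : scalar autoregressive parameter of the factor
    c       : (N,) diagonal of the AR matrix C of idiosyncratic errors
    sigma_v : (N,) diagonal of Sigma_v
    Returns the filtered factor {F_{t|t}} and the log-likelihood.
    """
    T, N = X.shape
    # Measurement: X_t = H h_t, with h_t = (F_t, eps_t')' and no extra noise
    H = np.hstack([lam.reshape(N, 1), np.eye(N)])
    # Transition: h_{t+1} = F h_t + w_t, with Q = diag(1, sigma_v) (Sigma_u = 1)
    F = np.diag(np.concatenate(([A], c)))
    Q = np.diag(np.concatenate(([1.0], sigma_v)))

    h = np.zeros(N + 1)            # h_{1|0}
    P = np.eye(N + 1)              # rough initialization of P_{1|0}
    loglik, F_filt = 0.0, np.zeros(T)
    for t in range(T):
        eta = X[t] - H @ h         # prediction error
        S = H @ P @ H.T            # its variance (e_t = 0 in eq. (7))
        S_inv = np.linalg.pinv(S)  # pinv as a numerical guard
        K = P @ H.T @ S_inv        # Kalman gain; update step
        h = h + K @ eta
        P = P - K @ H @ P
        F_filt[t] = h[0]
        loglik += -0.5 * (np.linalg.slogdet(S)[1] + eta @ S_inv @ eta)
        h = F @ h                  # prediction step
        P = F @ P @ F.T + Q
    return F_filt, loglik
```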
2.2 Large scale dynamic factor models

To estimate the factors in the large scale framework, we use the quasi-maximum likelihood approach suggested by Doz, Giannone and Reichlin (2007). In this method, the estimates of the parameters are obtained by maximizing the likelihood via the EM algorithm, which consists of an iterative two-step estimator. In the first step, the algorithm computes an estimate of the parameters given an estimate of the common factor. In the second step, the algorithm uses the estimated parameters to approximate the common factor by the Kalman smoother. At each iteration, the algorithm is guaranteed to obtain higher values of the log-likelihood of the estimated common factor, so the process is assumed to converge when the change between two consecutive log-likelihood values is lower than a threshold.⁶

⁶ In practice, we consider a threshold of $10^{-4}$.

Using an initial set of time series $\{X_{it}\}_{i=1}^N$, the $(i+1)$-th iteration of the algorithm is defined as follows. Let us assume that $\Lambda^i$, $A^i$ and $\Sigma_x^i$ are known, and let $F_t^i$ be the common factor which is the output of the Kalman filter from the $i$-th iteration. The updated estimates of $\Lambda$, $A$, and $\Sigma_x$ can be obtained from

$$\Lambda^{i+1} = \widehat{E[X_t F_t^{i\prime}]} \left( \widehat{E[F_t^i F_t^{i\prime}]} \right)^{-1}, \qquad (12)$$

$$A^{i+1} = \widehat{E[F_t^i F_{t-1}^{i\prime}]} \left( \widehat{E[F_{t-1}^i F_{t-1}^{i\prime}]} \right)^{-1}, \qquad (13)$$

$$\Sigma_x^{i+1} = \widehat{E[\epsilon_t^i \epsilon_t^{i\prime}]}. \qquad (14)$$

The estimates of the expectations can be obtained from

$$\widehat{E[X_t F_t^{i\prime}]} = \frac{1}{T} \sum_{t=1}^{T} X_t F_t^{i\prime}, \qquad (15)$$

where the series $\{F_t^i\}_{t=1}^T$ is the one estimated at iteration $i$. In addition, since $E[F_t F_t'] = E[F_t^i F_t^{i\prime}] + E[\{F_t - F_t^i\}\{F_t - F_t^i\}']$, and $E[\{F_t - F_t^i\}\{F_t - F_t^i\}']$ is the variance of the estimated common factor, then, denoting these variances by $\{V_t\}_{t=1}^T$, the expectation $E[F_t F_t']$ can be estimated by

$$\widehat{E[F_t^i F_t^{i\prime}]} = \frac{1}{T} \sum_{t=1}^{T} \left( F_t^i F_t^{i\prime} + V_t \right). \qquad (16)$$

Following a similar reasoning, $E[F_t F_{t-1}'] = E[F_t^i F_{t-1}^{i\prime}] + E[\{F_t - F_t^i\}\{F_{t-1} - F_{t-1}^i\}']$, and the last expectation, which we denote by $\{C_t\}_{t=2}^T$, can be estimated by the Kalman filter. Then, the expectation $E[F_t F_{t-1}']$ can be estimated by

$$\widehat{E[F_t^i F_{t-1}^{i\prime}]} = \frac{1}{T} \sum_{t=2}^{T} \left( F_t^i F_{t-1}^{i\prime} + C_t \right). \qquad (17)$$

The matrix $\Sigma_x$ is estimated as the diagonal matrix whose principal diagonal is that of the matrix

$$\hat{\Sigma}_x = \mathrm{diag}\left( \frac{1}{T} \sum_{t=1}^{T} X_t (X_t - \Lambda^i F_t^i)' \right). \qquad (18)$$

These estimates can be used again in the Kalman filter to compute the factors $F_t^{i+1}$. The algorithm, which starts with the static principal components estimates of the common factors $F_t^0$ and their factor loadings $\Lambda^0$, is repeated until the quasi-maximum likelihood estimates of the parameters are obtained. These can easily be used to compute the estimates of the common factor $\{F_{t|T}\}_{t=1}^T$ using the Kalman smoother, treating the idiosyncratic errors as uncorrelated both in time and in the cross section.⁷ Finally, as in the case of the SSDFM, the forecasts of $y_{t+1}$ are estimated by OLS regressions on (6).

⁷ The algorithm requires a small number of iterations to converge. In our simulations, only 3 or 4 iterations were required.
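As an illustration, the sketch below implements one EM update following equations (12)-(18). It is a schematic Python/NumPy rendering under our notation, not the authors' code; it assumes the smoothed factors `F` (T×r), their variances `V` (T×r×r), and the lag-one covariances `C1` (T×r×r, with `C1[t]` the covariance between periods t and t-1) have already been produced by a Kalman smoother.

```python
import numpy as np

def em_update(X, F, V, C1):
    """One EM iteration of equations (12)-(18).

    X  : (T, N) data;  F : (T, r) smoothed factors
    V  : (T, r, r) smoothed factor variances
    C1 : (T, r, r) lag-one covariances (valid for t >= 1)
    Returns updated Lambda, A and diagonal Sigma_x.
    """
    T, N = X.shape
    # Sufficient statistics, eqs (15)-(17)
    EXF   = X.T @ F / T                                    # E[X_t F_t']
    EFF   = (F.T @ F + V.sum(axis=0)) / T                  # eq (16)
    EFF1  = (F[1:].T @ F[:-1] + C1[1:].sum(axis=0)) / T    # eq (17)
    EF1F1 = (F[:-1].T @ F[:-1] + V[:-1].sum(axis=0)) / T
    # Parameter updates, eqs (12)-(14) and (18)
    Lam = EXF @ np.linalg.inv(EFF)                         # eq (12)
    A   = EFF1 @ np.linalg.inv(EF1F1)                      # eq (13)
    resid = X - F @ Lam.T
    Sigma_x = np.diag(np.diag(X.T @ resid / T))            # eq (18)
    return Lam, A, Sigma_x
```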
3 Designing the simulation study

Given the way the dynamic factor models described in the previous section are estimated, it is reasonable to expect that the empirical applications using these factor models will perform worse than expected when facing data problems that invalidate the assumptions warranted by the theory. In the case of the SSDFM, the larger the covariance among idiosyncratic errors, the less accurate the estimates are expected to be. With respect to the empirical performance of the LSDFM, Boivin and Ng (2006) stressed that it can be worse when the average size of the common component falls, when the number of observations is not large in either the cross-section or the time dimension, and when the possibility of correlated errors increases as more series are included in the model. This situation is very common in practice since the data are usually drawn from a small number of broad categories (such as industrial production, money indicators or prices). In this case, if the series within each category are ordered by the importance of their common component, expanding the dataset with series from each category will frequently lead to larger cross-correlations than assumed by the theory. In that sense, any analysis which bases the asymptotic properties of large scale models on the law of large numbers, as if all the series were the same or had the same properties, is fundamentally flawed.⁸

⁸ To our knowledge, only some recent papers seriously analyze the different characteristics of the series when the information set is increased. These are the so-called dynamic hierarchical factor models of Moench, Ng and Potter (2009). The comparison between these types of models and the traditional large and small scale models is left for further research.

In this section, we perform Monte Carlo simulations to assess the extent to which the violation of the theoretical assumptions behind the SSDFM and the LSDFM affects both the consistency of factor estimation and the accuracy of forecasts. To analyze under which circumstances it is worth reducing the influence of noisy predictors, the simulations are designed to replicate two competing forecasting schemes. The first scheme mimics the case of forecasters who develop a reasonable pre-screening of the set of indicators and apply a SSDFM to obtain predictions from a reduced number of indicators. In this case, the analyst searches for the representative indicators of each economic category by screening out those time series with high correlation with the main indicators. However, also in this case, we contemplate the possibility of choosing just one indicator from each category without previous pre-screening. The second scheme mimics the case of forecasters who include a large number of indicators of each category and apply a LSDFM to compute predictions. In this case, the additional indicators are assumed to be correlated with the representative indicators of each category. Finally, the goodness of fit in estimating the factors and the forecast accuracy of these methods are examined by means of their Mean Squared Error (MSE).

3.1 Generating small data sets

The small data set, $\{X_{it}^s\}_{i,t=1}^{N,T}$, with $N = 10$, is generated from one common factor only. First, given the parameters $A$ and $\Sigma_u$, we generate the series of the common factor $\{F_t\}_{t=1}^T$ by using the expression

$$F_t = A F_{t-1} + u_t. \qquad (19)$$

In this case, $\{u_t\}_{t=1}^T$ are random numbers drawn from a normal distribution with zero mean and variance $\Sigma_u = 1$. To examine the dependence of the results on the persistence of the factor, we allow for different values of the parameter $A = 0.1$, $0.5$, and $0.75$.

Second, we assume that the idiosyncratic errors follow autoregressive processes. For particular values of the coefficient matrix $C$ and of $\Sigma_v$, we generate the series $\epsilon_t = (\epsilon_{1t}, \ldots, \epsilon_{Nt})$ from

$$\epsilon_t = C \epsilon_{t-1} + v_t. \qquad (20)$$

In this case, $v_t = (v_{1t}, \ldots, v_{Nt})$, and $\{v_{it}\}_{i,t=1}^{N,T}$ are random numbers drawn from a normal distribution with zero mean and variance-covariance matrix $\Sigma_v$. To simplify the simulations, the autoregressive coefficient matrix $C$ will be diagonal with two possible values, $c = 0.1$ and $c = 0.75$, on the main diagonal. In addition, to examine the effects of the errors' cross-correlation, the covariance matrix will take different values across the simulations. In particular, let us consider a given value of the parameter $\rho_s$ and generate the vector $\omega_s = (1, \rho_s, \rho_s^2, \ldots, \rho_s^9)$. Then, the matrix $\Sigma_v$ can be viewed as the Toeplitz matrix constructed from the vector $\omega_s$:

$$\Sigma_v = \begin{pmatrix} 1 & \rho_s & \rho_s^2 & \cdots & \rho_s^9 \\ \rho_s & 1 & \rho_s & \cdots & \rho_s^8 \\ \rho_s^2 & \rho_s & 1 & \cdots & \rho_s^7 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \rho_s^9 & \rho_s^8 & \rho_s^7 & \cdots & 1 \end{pmatrix}. \qquad (21)$$

As can be deduced from this expression, the parameter $\rho_s$ represents the maximum correlation between the error terms of two series and controls the correlation across categories of data. In the simulations, the values of this parameter will be $\rho_s = 0$, $0.1$, $0.5$, and $0.75$. Finally, $\Lambda$ will be a column vector of $N$ ones. Then, $\{F_t\}_{t=1}^T$ and $\{\epsilon_t\}_{t=1}^T$ are used in

$$X_t^s = \Lambda F_t + \epsilon_t \qquad (22)$$

to obtain simulations of $X_t^s$, with $X_t^s = \{X_{it}^s\}_{t=1}^T$ for $i = 1, \ldots, 10$. Therefore, the 10 series $\{X_{it}^s\}_{t=1}^T$ can be intuitively interpreted as 10 economic sectors that depend on the same business cycle $F_t$, which has different levels of persistence measured by $A = 0.1$, $0.5$, and $0.75$, and on 10 sectorial shocks $\epsilon_t = (\epsilon_{1t}, \ldots, \epsilon_{Nt})$, which also have different levels of persistence, $c = 0.1$ and $c = 0.75$, and cross-correlation $\rho_s = 0$, $0.1$, $0.5$, and $0.75$.⁹

⁹ For simplicity and clarity of exposition we assume that only one factor exists, because we think that the number of cases we contemplate is already large enough. Considering more than one factor is trivial, but the computation time for the Monte Carlo simulations increases dramatically and the results are of the same nature. We address the possibility of estimating more than one factor, even though the data are generated by one factor, in the next section.

3.2 Generating large data sets

As mentioned above, for the large data set $\{X_{jt}^l\}_{j,t=1}^{M,T}$, with $M = 100$, we assume that the ten series generated in the previous section, $X_t^s$, represent the main indicators of each of the ten different categories of data. Accordingly, we add an error term representing the idiosyncratic error of the specific series to each of the ten time series $\{X_{it}^s\}_{i,t=1}^{N,T}$ for $N = 10$. The new errors are called $\{w_{ikt}\}_{i,k,t=1}^{10,10,T}$, where $i$ represents the sector and $k$ represents the series within the sector, and they are assumed to be serially correlated and cross-correlated with all the series existing within their respective category. Hence, the large data set is generated by using

$$X_{ikt}^l = X_{it}^s + w_{ikt}, \qquad (23)$$

where $i = 1, \ldots, 10$, $k = 1, \ldots, 10$, and $w_{it} = (w_{i1t}, \ldots, w_{i10t})'$ is the vector of idiosyncratic errors which is generated by

$$w_{it} = C w_{it-1} + e_{it}^l. \qquad (24)$$

In this expression, $\{e_{ikt}^l\}_{i,k,t=1}^{10,10,T}$ are random numbers drawn from a normal distribution with zero mean and covariance matrix given by the Toeplitz matrix constructed from the vector $\omega_l$ as in (21), where $\rho_l = 0$, $0.1$, $0.5$, and $0.75$. Therefore, the parameter $\rho_l$ controls the correlation within each of the categories of data. Again, the autoregressive coefficient matrix $C$ is diagonal with constant values of $c = 0.1$ and $c = 0.75$ on the main diagonal.

According to expressions (22), (23), and (24), each series of the large data set can be decomposed as follows:

$$X_{ikt}^l = \Lambda_i F_t + \epsilon_{ikt}^l, \qquad (25)$$

where $\epsilon_{ikt}^l = \epsilon_{it} + w_{ikt}$. Then, the idiosyncratic components $\epsilon_{ikt}^l$ are composed of a common error inside the categories, $\epsilon_{it}$, which may be cross-correlated among different categories, and a specific error term, $w_{ikt}$, which may be correlated with series from the same category. Finally, putting together the series across all the categories, we have the large data set

$$X_t^l = \left( X_{1,1,t}^l, X_{1,2,t}^l, \ldots, X_{1,10,t}^l, X_{2,1,t}^l, X_{2,2,t}^l, \ldots, X_{2,10,t}^l, \ldots, X_{10,1,t}^l, X_{10,2,t}^l, \ldots, X_{10,10,t}^l \right)'. \qquad (26)$$

As in the previous case, the intuition behind the data generating process is the same as before, but adding the fact that the series-specific shocks can also be autocorrelated, with $c = 0.1$ and $c = 0.75$, and cross-correlated, with $\rho_l = 0$, $0.1$, $0.5$, and $0.75$.
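The full data generating process of Sections 3.1-3.2 can be summarized in a few lines of code. The sketch below is our reading of equations (19)-(24) in Python/NumPy, with the parameter values from the text; the function names are ours and are purely illustrative.

```python
import numpy as np
from scipy.linalg import toeplitz, cholesky

def simulate_var1_errors(T, C_diag, rho, rng):
    """Diagonal-VAR(1) errors with Toeplitz cross-correlation, eqs (20)-(21)."""
    n = len(C_diag)
    Sigma = toeplitz(rho ** np.arange(n))       # eq (21)
    L = cholesky(Sigma, lower=True)
    eps = np.zeros((T, n))
    for t in range(1, T):
        eps[t] = C_diag * eps[t - 1] + L @ rng.standard_normal(n)
    return eps

def simulate_data(T=50, A=0.1, c=0.1, rho_s=0.5, rho_l=0.5, seed=0):
    rng = np.random.default_rng(seed)
    # Common factor, eq (19), with Sigma_u = 1
    F = np.zeros(T)
    for t in range(1, T):
        F[t] = A * F[t - 1] + rng.standard_normal()
    # Small data set: 10 category mains, eqs (20)-(22); Lambda is a vector of ones
    eps = simulate_var1_errors(T, np.full(10, c), rho_s, rng)
    X_small = F[:, None] + eps
    # Large data set: 10 disaggregations per category, eqs (23)-(24)
    blocks = []
    for i in range(10):
        w = simulate_var1_errors(T, np.full(10, c), rho_l, rng)
        blocks.append(X_small[:, [i]] + w)
    X_large = np.hstack(blocks)                 # (T, 100), ordered as in (26)
    return F, X_small, X_large
```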
3.3 Generating the target series

Finally, we generate the series to be predicted in a simple scenario. To simplify the simulations, we consider that forecasting with the factor and one lagged value of the time series is dynamically complete. Hence, the series $y_t$ is generated from

$$y_{t+1} = \beta F_t + \gamma y_t + e_t^y, \qquad (27)$$

where $\beta$ is one, $e_t^y$ is a white noise process with $\sigma_{e^y} = 1$, and $\gamma$ takes on the values 0, 0.3, 0.5 and 0.8.

4 Simulation results

In each replication $j$, we estimate the small and large scale factor models and compute the accuracy of these models in inferring the factor by using the Mean Squared Error over the $J = 1000$ replications,

$$MSE^i = \frac{1}{J} \sum_{j=1}^{J} \frac{1}{T} \sum_{t=1}^{T} \left( F_{jt} - Q F_{jt|T}^i \right)^2, \qquad (28)$$

for $i = s$ in the case of the small data set and $i = l$ in the case of the large data set. In this expression, $Q$ is the projection matrix of the true common factor on the estimated common factor.¹⁰

¹⁰ We need the projection matrix since the common factors are estimated only up to a scale transformation.

In addition, we compare the out-of-sample forecasting accuracy of the SSDFM and the LSDFM by computing the errors in forecasting the generated target series one step ahead. Let $\hat{\beta}$ and $\hat{\gamma}$ be the OLS estimates of the parameters of equation (27) using the common factor series and the past values of $y$ up to period $T-1$. Then we construct the one-step-ahead forecast of $y_{jT+1}$ by using the relation $\hat{y}_{jT+1}^i = \hat{\beta} F_{jT|T}^i + \hat{\gamma} y_{jT}$. In this way, one can define the Mean Squared one-step-ahead Forecast Error of model $i$ as

$$MSFE^i = \frac{1}{J} \sum_{j=1}^{J} \left( y_{jT+1} - \hat{y}_{jT+1}^i \right)^2. \qquad (29)$$

However, this experiment could lead to unrealistically favorable results for the SSDFM, since we would implicitly be assuming that, in the pre-screening of the indicators, the researcher always finds the main indicator in each category of data. To overcome this potential bias in favor of small scale factor models, we consider in the simulation exercises an additional case in which the researcher estimates a SSDFM but arbitrarily uses the fifth noisiest series from each category. Accordingly, we call $MSE_r^s$, $MSE_n^s$, $MSE^l$, $MSFE_r^s$, $MSFE_n^s$, and $MSFE^l$ the means across replications of the MSE and MSFE computed from the SSDFM with the 10 representative series of each category (superscript $s$, subscript $r$), from the SSDFM with the 10 noisier series of each category (superscript $s$, subscript $n$), and from the LSDFM applied to the 100 time series of the large scale simulation exercise (superscript $l$).
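A minimal sketch of how (28) can be computed for one replication, assuming `F_true` is the simulated factor and `F_hat` the smoothed estimate; in the single-factor case the projection "matrix" $Q$ reduces to the OLS slope of the true factor on the estimated one.

```python
import numpy as np

def mse_factor(F_true, F_hat):
    """MSE of the factor estimate, eq (28), for one replication.
    Q projects the true factor on the estimated factor (OLS slope)."""
    Q = (F_hat @ F_true) / (F_hat @ F_hat)
    return np.mean((F_true - Q * F_hat) ** 2)
```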
4.1 Factor estimates

Let us start the analysis of the simulations by comparing the accuracy of the models in inferring the factors (using MSEs). To facilitate understanding, let us describe how the results are presented in the tables. First, the results in Tables 1 to 3 are classified according to different values of the autoregressive coefficient of the common factor series (coefficient $A$): this coefficient takes the value 0.1 (low correlation) in Table 1, 0.5 (medium correlation) in Table 2, and 0.75 (high correlation) in Table 3. Second, each of these tables shows the accuracy of the models for different values of the cross-correlation within and across categories. The first block of results refers to the case where the only cross-correlation present in the idiosyncratic components is due to series that belong to the same category, $\rho_s = 0$, while the following blocks examine the effects of progressively increasing the correlation across categories to 0.1, 0.5 and 0.75. Within each of these blocks, the tables report the models' accuracy in inferring the common factor when the correlation within categories, measured by $\rho_l$, increases from 0 to 0.1, 0.5 and 0.9. Third, the first three columns of the tables refer to MSEs from the reasonably pre-screened SSDFM, the arbitrarily chosen (fifth noisiest) SSDFM, and the LSDFM, respectively. Fourth, a common problem in large scale models is that there is not always the same number of series in each category: some categories might be over-represented. We address the effects of oversampling in the last two columns of these tables. For this purpose, we simulate ten categories of data but include 20 series instead of 10 in the first category, and 5 series instead of 10 in the second and third categories; the other 7 categories are represented by 10 series each.¹¹ Fifth, in Tables 1 to 3, the idiosyncratic errors are assumed to have low serial correlation ($c = 0.1$), the sample is small ($T = 50$), and we assume that there is only one common factor in the estimation. The robustness of the results to higher serial correlations, to larger samples, and to letting the LSDFM select the number of common factors as in Bai and Ng (2002) is analyzed in Tables A1 to A4 in the appendix.

¹¹ The accuracy of the SSDFM from reasonably pre-screened series does not depend on the number of series included in each category, because we just take the representative series of each category. Hence, in this case we only show $MSE_n^s$ and $MSE^l$ in the tables, where $MSE_n^s$ represents the fifth noisiest series in each category.

A short summary of the main results is the following. It can be seen in all the tables that the reasonably pre-screened SSDFM presents a smaller MSE than all the other specifications ($MSE_r^s < MSE_n^s$ and $MSE_r^s < MSE^l$). This is an important point. From this result we learn that a good pre-selection within the categories can make the model impossible to beat, even if we add a lot of information. This result holds for all the possible assumptions about the dynamics of the shocks, the dynamics of the factors, and the cross-correlations. However, even when the econometrician is not extremely careful with the selection of the variables, there can still be some gain from estimating a SSDFM. In the comparison of the SSDFM with an arbitrary selection (the fifth series of each category) and the LSDFM, the relative performance of these two models depends on the autocorrelation of the factor (as can be seen by comparing Table 1 and Table 3) and, obviously, on the cross-correlations within and across categories, $\rho_l$ and $\rho_s$.

In general, our main results are in concordance with those obtained by Boivin and Ng (2006) from large scale static factor models using sets of different numbers of indicators,¹² although, as we said in the introduction, they do not address the topic of small versus large estimation; they concentrate on choosing the optimal number of variables in a large scale model. We are also in line with the findings of Stock and Watson (2002b). Using static large scale factor models, they find some deterioration in the quality of the factor estimates, and this deterioration occurs when the degree of serial correlation in the factor and in the idiosyncratic errors is high, even when the number of variables and observations is large, exactly as we show in Tables 2 and 3.

¹² They suggest that the large scale factor estimates are adversely affected by cross-correlation in the errors and by oversampling.

Going into the detail of our results, we observe that increasing the inertia of the simulated common factor, with $A$ ranging from 0.1 (almost no serial correlation) in Table 1 to 0.5 (moderate correlation) in Table 2 and 0.75 (high correlation) in Table 3, confirms the deterioration in factor estimation for all the factor models, although the relative losses are not uniformly distributed across the models. When the serial correlation of the factor increases, the relative gains of reasonable over arbitrary pre-screening of the series in the SSDFM still hold at similar rates, except for the case of very large correlation across categories, where the relative gains attenuate. Notably, the MSEs also highlight the significant losses in the relative accuracy of the LSDFM with respect to the SSDFM as the inertia of the common factor increases. In fact, when $A = 0.75$, the SSDFM from arbitrarily chosen series outperforms the LSDFM in all scenarios.

It is also important to point out the results displayed in columns 4 and 5 of Tables 1, 2 and 3, which refer to the effects of oversampling some of the categories. All the other columns are calculated assuming that the user of large scale models includes the same number of series in each category. However, practitioners usually work with an unbalanced number of time series in each category; see, for example, Angelini et al. (2008) or Giannone et al. (2008).¹³ To examine the effect of using oversampled categories in factor analysis, the last two columns of Tables 1, 2 and 3 report the MSEs of the arbitrarily chosen noisy-series SSDFM and of a LSDFM which uses 10 unbalanced categories chosen with the procedure explained before. Overall, the large scale factor model with unbalanced categories performs worse than in the case of balanced categories, especially when the correlation across categories is small. Obviously, the relatively better accuracy of the noisy SSDFM with respect to the oversampled LSDFM is then more evident, and it becomes critical when low correlation across categories is combined with high correlation within categories.

¹³ For example, the number of series of disaggregated industrial production indicators is typically much higher than the number of time series included in other categories.

The tables that address the robustness of our results to different assumptions are included in the appendix. Tables A1 and A2 examine the effects of increasing the serial correlation of the idiosyncratic components on the factor models' performance. In particular, the serial correlation is assumed to grow from $c = 0.1$ to $c = 0.75$ when the serial correlation of the factor is low ($A = 0.1$ in Table A1) and when it is high ($A = 0.75$ in Table A2), which leads to the following results. First, serial correlation in the errors deteriorates the overall performance of the models even more than serial correlation in the factor.
For example, when $\rho_l = 0$, $\rho_s = 0.75$, and $A = c = 0.1$, the $MSE_r^s$ is 0.35, and it increases to 0.50 when $c = 0.75$ but only to 0.40 when $A = 0.75$. Second, the advantage of the reasonably pre-screened SSDFM over both the arbitrarily chosen SSDFM and the LSDFM is larger when there is serial correlation in the idiosyncratic components; in that sense, the model most negatively affected by serial correlation is the LSDFM. Third, these results are magnified in the case of oversampled categories.

In Tables A3 and A4, we examine the role of the number of observations in the performance of factor models under different values of $A$. According to the theory, the larger the number of time series and observations, and in the absence of the typical data problems accounted for by our simulations, the better the performance of the LSDFM with respect to the SSDFM. This is documented in Table A4, where the reported MSEs show that, under low serial correlation of the factor and low correlation of the idiosyncratic errors, the advantage of the SSDFM from reasonably pre-screened indicators over the LSDFM diminishes, and the LSDFM outperforms the SSDFM from arbitrarily selected indicators.¹⁴ However, the SSDFM, in both scenarios about the selection of variables, still clearly outperforms the LSDFM. In addition, the table shows that the relative losses in accuracy due to oversampling in the LSDFM are still large and mitigate the expected asymptotic benefits of large scale factor models.

¹⁴ Note that the SSDFM applied to arbitrarily selected indicators is contaminated by data problems just as the LSDFM is.

As a last remark, it is worth noting that the number of factors has been restricted to one, in accordance with the data generating process. However, generating time series in categories with high within-category and across-category correlation may render this assumption too restrictive.¹⁵ To evaluate the effect of this potential restriction on the accuracy of the LSDFM in factor estimation, we let the large scale model select the number of factors according to the procedure described in Bai and Ng (2002), where the maximum number of factors is 11. Tables A5 and A6 report the $MSE^l$ and the average number of estimated factors across the 1000 replications. The main results of this exercise are the following. First, there is no gain for the LSDFM from estimating more than one factor relative to the model where we specify just one factor. Second, the higher the correlation within categories, the larger the number of estimated factors, since the high correlation within each category is interpreted as if the series belonging to that category shared a common factor; in this case, the performance of the LSDFM improves significantly, although it still does not improve on the results of the SSDFM.

¹⁵ Datasets generated from one factor but in ten categories of highly correlated indicators could need one factor in each category.

4.2 Forecasting accuracy

The ability of factor models in one-step-ahead out-of-sample forecasting is examined in Table 4.¹⁶ As in the case of factor estimates, we perform the analysis under different situations. The tables allow for different values of the autoregressive coefficient of the common factor series, namely 0.1, 0.5 and 0.75,¹⁷ and different degrees of cross-correlation across ($\rho_s$ from 0 to 0.5) and within ($\rho_l$ from 0 to 0.9) categories. In addition, they show the extent to which the forecasting performance of dynamic factor models depends on the inertia of the series to be forecasted. For this purpose, the forecasted series are simulated with values of $\gamma$ ranging from 0 (no inertia) to 0.8 (high degree of time series dependence).

¹⁶ In-sample forecasting analyses were also developed, with similar results. These results, which have been omitted to save space, are available from the authors upon request.
¹⁷ To save space, we only present the results for A = 0.1 (the equivalent of Table 1). The other tables are also available.
As in the common factor estimates, the persistence of the common factor series has no effect on $MSFE_r^s$ and $MSFE_n^s$, even in the case of one oversampled category. However, it has an important effect on $MSFE^l$, as it does in the common factor estimates. As expected, the inclusion of past values of the main series is not relevant for the relative one-step-ahead out-of-sample forecast performance of the different models. The models are then ranked in the same way as when measured according to factor estimates. Therefore, there is no case in which $MSFE^l$ is lower than $MSFE_r^s$. Overall, the strategy of reasonably pre-selecting the predictors and using them in a SSDFM almost unambiguously outperforms the LSDFM and the SSDFM from arbitrarily chosen series, as we showed in the factor estimation section. Comparing $MSFE^l$ with $MSFE_n^s$, the former is lower than the latter when $A = 0.1$ and the cross-correlations of the idiosyncratic errors across and within categories are low.¹⁸

¹⁸ All the tables, with the same structure as in the factor estimation section, including those in the appendix, are available for the forecasting exercise. In order to save space, we have not included them in the text, but they are available from the authors upon request.

5 Empirical exercise

As an empirical exercise we consider the one presented in Stock and Watson (2002b), where the data set is used to forecast the Industrial Production index growth rate 12 months ahead. That is, our objective series is $y_{t+12} = \ln(IP_{t+12}/IP_t)$, where $IP_t$ is the index of industrial production at date $t$. The data set is composed of the same series as in Stock and Watson (2002b), although the time period is from 1997:01 to 2010:05. The 12-month-ahead forecasts are constructed starting at 2005:01, where, using the data available at that date, we compute a forecast of the objective series at 2006:01. The last forecast is at 2010:05, using data up to 2009:05.

The first forecast, at 2006:01, is computed as follows. The common factor series and the unknown parameters are estimated from the data up to 2005:01. Once we have an estimate of the common factor series, we run a regression of the objective variable up to 2005:01 on the common factor series, $y_{t+12} = \beta \hat{F}_t + u_t$, and obtain an estimate of the parameter $\beta$. Finally, the 12-month forecast at 2006:01 is computed as $\hat{y}_{2005:01+12} = \hat{\beta} \hat{F}_{2005:01}$. Proceeding similarly through the rest of the period, we get the 12-month forecasts of $y_{t+12}$.

We repeat the real-time forecasting exercise under different specifications for the number of factors using Doz, Giannone and Reichlin (2007). We use the common factor to forecast $y_{t+12}$, including a large number of series (111 series), assuming 1 or 2 factors and also estimating the optimal number of factors. The data set considered is composed of 111 of the 149 series used by Stock and Watson (2002b); they are the freely accessible series available on the internet.
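For concreteness, the following is a minimal sketch of the direct 12-month-ahead forecast described above, assuming `F_hat` holds the estimated common factor up to the forecast origin; the alignment of regressor and regressand follows the regression $y_{t+12} = \beta \hat{F}_t + u_t$ in the text (illustrative code, not the authors' implementation).

```python
import numpy as np

def direct_12m_forecast(log_ip, F_hat):
    """Direct 12-month-ahead forecast of y_{t+12} = ln(IP_{t+12}/IP_t).

    log_ip : (T,) log industrial production up to the forecast origin
    F_hat  : (T,) estimated common factor over the same sample
    """
    y = log_ip[12:] - log_ip[:-12]   # y_{t+12} for t = 0, ..., T-13
    x = F_hat[:-12]                  # regressor F_t aligned with y_{t+12}
    beta = (x @ y) / (x @ x)         # OLS without constant, as in the text
    return beta * F_hat[-1]          # forecast 12 months beyond the origin
```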
The results of the MSFE for these three specifications are displayed in the first three lines of Table 5. We also estimate the factor including a small number of series. We use the series proposed by Stock and Watson (1991) and estimate the model by maximum likelihood. In particular, we use industrial production, real personal income, employment in the non-agricultural sector, and manufacturing and trade sales. The results are displayed in the fourth line of Table 5. Finally, for comparison purposes, the table shows in the fifth line a simple autoregressive model for the annual growth rates of IP.

As can be seen in the table, the lowest MSFE is achieved when the 12-month forecast is performed using the common factor series from the 4 main indicators. In that sense, the empirical exercise confirms the results obtained by the simulation study conducted throughout the paper. A well specified small scale model is difficult to beat, even when large amounts of information are added to the specification.

6 Conclusions

In this paper, we address the research question proposed by Aruoba, Diebold and Scotti (2009) about the performance of large versus small scale factor models for the estimation of common factors and the forecasting of a set of goal variables. We propose simulations which mimic different scenarios of empirical forecasting, where the list of series is fixed (rather than tending to infinity) and where cross-correlation and serial correlation among the idiosyncratic components may be greater than warranted by the theory. The Monte Carlo analysis allows for indicators which belong to different categories of data and whose idiosyncratic components show cross-correlation within and across categories in addition to serial correlation. We also allow for categories which are oversampled. Finally, the simulations examine the accuracy of small versus large data sets under different degrees of serial correlation in the factor.

We find that adding data that bear little information about the factor components does not necessarily lead large scale dynamic factor models to improve upon the forecasts of small scale dynamic factor models. In fact, we show that when the additional data are too correlated with data from categories which are already included in factor estimation, forecasting with many predictors performs worse than forecasting from a reasonably pre-screened dataset, especially when the categories are not highly correlated. This result is stronger in the case of high persistence of the common factor, high serial correlation of the idiosyncratic components, noisy series, and oversampled categories. In these cases, even arbitrarily selecting one time series from each category and using the resulting dataset in a small scale dynamic factor model outperforms the forecasts from large scale dynamic factor models. In these situations, we can be better off throwing away some redundant data, even if they are available.

References

[1] Angelini, E., Camba-Mendez, G., Giannone, D., Reichlin, L., and Rünstler, G. 2008. Short-term forecasts of Euro area GDP growth. CEPR Working Paper 6746.
[2] Aruoba, B., Diebold, F., and Scotti, C. 2009. Real-time measurement of business conditions. Journal of Business and Economic Statistics 27: 417-427.
[3] Bai, J., and Ng, S. 2002. Determining the number of factors in approximate factor models. Econometrica 70: 191-221.
[4] Bai, J., and Ng, S. 2006.
Evaluating latent and observed factors in macroeconomics and finance. Journal of Econometrics 131: 507-537.
[5] Banbura, M., and Runstler, G. 2007. A look into the factor model black box: publication lags and the role of hard and soft data in forecasting GDP. ECB Working Paper 751.
[6] Boivin, J., and Ng, S. 2006. Are more data always better for factor analysis? Journal of Econometrics 132: 169-194.
[7] Caggiano, G., Kapetanios, G., and Labhard, V. 2009. Are more data always better for factor analysis? Results for the Euro area, the six largest Euro area countries and the UK. European Central Bank Working Paper 1051.
[8] Camacho, M., and Perez Quiros, G. 2010. Introducing the Euro-STING: Short Term INdicator of Euro Area Growth. Journal of Applied Econometrics, forthcoming.
[9] Doz, C., Giannone, D., and Reichlin, L. 2007. A quasi-maximum likelihood approach for large approximate dynamic factor models. ECB Working Paper 674.
[10] Forni, M., Hallin, M., Lippi, M., and Reichlin, L. 2005. The generalized dynamic factor model: one-sided estimation and forecasting. Journal of the American Statistical Association 100: 830-840.
[11] Giannone, D., Reichlin, L., and Small, D. 2008. Nowcasting: the real-time informational content of macroeconomic data. Journal of Monetary Economics 55: 665-676.
[12] Mariano, R., and Murasawa, Y. 2003. A new coincident index of business cycles based on monthly and quarterly series. Journal of Applied Econometrics 18: 427-443.
[13] Moench, E., Ng, S., and Potter, S. 2009. Dynamic hierarchical factor models. Federal Reserve Bank of New York Staff Reports 412, December.
[14] Nunes, L. 2005. Nowcasting quarterly GDP growth in a monthly coincident indicator model. Journal of Forecasting 24: 575-592.
[15] Stock, J., and Watson, M. 1991. A probability model of the coincident economic indicators. In Leading Economic Indicators: New Approaches and Forecasting Records, edited by K. Lahiri and G. Moore. Cambridge University Press.
[16] Stock, J., and Watson, M. 2002a. Macroeconomic forecasting using diffusion indexes. Journal of Business and Economic Statistics 20: 147-162.
[17] Stock, J., and Watson, M. 2002b. Forecasting using principal components from a large number of predictors. Journal of the American Statistical Association 97: 1167-1179.

Table 1. Simulations of the common factor estimator (T=50, c=0.1, A=0.1)

                        Same number of series        Oversampling one
                        in each category             category
Correlation within
categories ρl           MSEr^s   MSEn^s   MSE^l      MSEn^s   MSE^l

Correlation across categories ρs = 0
0                       0.101    0.195    0.124      0.191    0.149
0.1                     0.101    0.192    0.125      0.191    0.151
0.5                     0.101    0.196    0.139      0.190    0.166
0.9                     0.101    0.195    0.185      0.192    0.320

Correlation across categories ρs = 0.1
0                       0.116    0.207    0.139      0.204    0.159
0.1                     0.116    0.205    0.141      0.204    0.162
0.5                     0.116    0.205    0.152      0.203    0.175
0.9                     0.116    0.206    0.197      0.202    0.310

Correlation across categories ρs = 0.5
0                       0.223    0.289    0.236      0.285    0.235
0.1                     0.223    0.286    0.239      0.284    0.234
0.5                     0.223    0.286    0.246      0.284    0.243
0.9                     0.223    0.287    0.281      0.284    0.300

Correlation across categories ρs = 0.75
0                       0.350    0.383    0.350      0.382    0.346
0.1                     0.350    0.380    0.349      0.383    0.344
0.5                     0.350    0.381    0.359      0.376    0.346
0.9                     0.350    0.377    0.376      0.378    0.376

Notes: The values of ρs determine the cross-correlation of the idiosyncratic shocks between series from different categories, and the values of ρl determine the cross-correlation of the idiosyncratic shocks between series from the same category. T is the sample size. Parameters A and c measure the serial correlation of the factor and the idiosyncratic shocks, respectively. MSEr^s refers to the Mean Squared Error of the estimation with the 10 representative series of each category, MSEn^s to the model with 10 arbitrarily chosen series, and MSE^l to the model with 100 series.
Table 2. Simulations of the common factor estimator (T=50, c=0.1, A=0.5)

                        Same number of series        Oversampling one
                        in each category             category
Correlation within
categories ρl           MSEr^s   MSEn^s   MSE^l      MSEn^s   MSE^l

Correlation across categories ρs = 0
0                       0.100    0.191    0.175      0.190    0.202
0.1                     0.100    0.190    0.175      0.188    0.200
0.5                     0.100    0.192    0.190      0.188    0.217
0.9                     0.100    0.191    0.236      0.187    0.350

Correlation across categories ρs = 0.1
0                       0.115    0.204    0.191      0.201    0.207
0.1                     0.115    0.203    0.191      0.201    0.208
0.5                     0.115    0.204    0.206      0.200    0.229
0.9                     0.115    0.203    0.250      0.199    0.340

Correlation across categories ρs = 0.5
0                       0.227    0.293    0.294      0.290    0.290
0.1                     0.227    0.291    0.297      0.288    0.289
0.5                     0.227    0.291    0.305      0.290    0.304
0.9                     0.227    0.291    0.343      0.288    0.368

Correlation across categories ρs = 0.75
0                       0.372    0.399    0.414      0.403    0.409
0.1                     0.372    0.400    0.415      0.405    0.415
0.5                     0.372    0.407    0.430      0.402    0.422
0.9                     0.372    0.400    0.450      0.402    0.448

Notes: See notes to Table 1.

Table 3. Simulations of the common factor estimator (T=50, c=0.1, A=0.75)

                        Same number of series        Oversampling one
                        in each category             category
Correlation within
categories ρl           MSEr^s   MSEn^s   MSE^l      MSEn^s   MSE^l

Correlation across categories ρs = 0
0                       0.097    0.182    0.382      0.180    0.395
0.1                     0.097    0.182    0.384      0.180    0.427
0.5                     0.097    0.183    0.397      0.181    0.429
0.9                     0.097    0.182    0.444      0.180    0.525

Correlation across categories ρs = 0.1
0                       0.112    0.195    0.398      0.192    0.417
0.1                     0.112    0.195    0.400      0.193    0.421
0.5                     0.112    0.196    0.413      0.194    0.428
0.9                     0.112    0.195    0.459      0.191    0.559

Correlation across categories ρs = 0.5
0                       0.230    0.290    0.510      0.289    0.515
0.1                     0.230    0.291    0.512      0.286    0.506
0.5                     0.230    0.291    0.524      0.288    0.524
0.9                     0.232    0.289    0.565      0.286    0.574

Correlation across categories ρs = 0.75
0                       0.406    0.425    0.644      0.432    0.650
0.1                     0.406    0.425    0.646      0.430    0.652
0.5                     0.406    0.425    0.655      0.426    0.680
0.9                     0.406    0.425    0.688      0.427    0.711

Notes: See notes to Table 1.

Table 4. Forecasting accuracy (T=50, c=0.1, A=0.1)

Correlation      Persistency of      Same number of series          Oversampling one
within           the target          in each category               category
categories ρl    series γ            MSFEr^s  MSFEn^s  MSFE^l      MSFEn^s  MSFE^l

Correlation across categories ρs = 0
0                0                   1.107    1.215    1.14        1.171    1.161
                 0.3                 1.101    1.202    1.161       1.218    1.199
                 0.8                 1.086    1.172    1.099       1.178    1.173
0.9              0                   1.107    1.354    1.341       1.378    1.558
                 0.3                 1.101    1.146    1.129       1.166    1.286
                 0.8                 1.086    1.273    1.235       1.318    1.459

Correlation across categories ρs = 0.5
0                0                   1.197    1.280    1.198       1.371    1.342
                 0.3                 1.200    1.248    1.237       1.324    1.321
                 0.8                 1.154    1.222    1.156       1.288    1.238
0.9              0                   1.197    1.324    1.314       1.248    1.239
                 0.3                 1.200    1.320    1.300       1.441    1.425
                 0.8                 1.154    1.320    1.319       1.394    1.407

Notes: The estimated model is $y_{t+1} = \beta F_t + \gamma y_t + e^y_{t+1}$.

Table 5. Simulated out-of-sample forecasting results
Industrial Production, 12-month horizon. Sample period 1997:01 to 2010:05; out-of-sample forecast period 2006:01 to 2010:05.

Forecast method                              MSFE
LSDFM, r=1                                   0.0038
LSDFM, r=2                                   0.0047
LSDFM, r*                                    0.0100
SSDFM, LI, r=1                               0.0033
AR, annual growth                            0.0037
VAR, annual growth and LSDFM, r*             0.0084
VAR, annual growth and SSDFM, LI, r=1        0.0043

Notes: The parameter r determines the number of common factor series estimated. MSFEr^s refers to the Mean Squared Forecast Error of the estimation with the 10 representative series of each category, MSFEn^s to the model with 10 arbitrarily chosen series, and MSFE^l to the model with 100 series.

APPENDIX
Table A1. Simulations from estimating the common factor (T=50, c=0.75, A=0.1)

                        Same number of series        Oversampling one
                        in each category             category
Correlation within
categories ρl           MSEr^s   MSEn^s   MSE^l      MSEn^s   MSE^l

Correlation across categories ρs = 0
0                       0.169    0.265    0.317      0.272    0.382
0.9                     0.172    0.265    0.518      0.263    0.604

Correlation across categories ρs = 0.75
0                       0.503    0.506    0.538      0.508    0.534
0.9                     0.503    0.505    0.591      0.510    0.596

Notes: See notes to Table 1.

Table A2. Simulations from estimating the common factor (T=50, c=0.75, A=0.75)

                        Same number of series        Oversampling one
                        in each category             category
Correlation within
categories ρl           MSEr^s   MSEn^s   MSE^l      MSEn^s   MSE^l

Correlation across categories ρs = 0
0                       0.251    0.470    0.507      0.482    0.558
0.9                     0.251    0.461    0.696      0.496    0.833

Correlation across categories ρs = 0.75
0                       0.751    0.820    0.885      0.843    0.874
0.9                     0.749    0.819    0.964      0.845    0.961

Notes: See notes to Table 1.

Table A3. Simulations from estimating the common factor (T=150, c=0.1, A=0.1)

                        Same number of series        Oversampling one
                        in each category             category
Correlation within
categories ρl           MSEr^s   MSEn^s   MSE^l      MSEn^s   MSE^l

Correlation across categories ρs = 0
0                       0.095    0.175    0.108      0.176    0.134
0.9                     0.094    0.176    0.161      0.175    0.333

Correlation across categories ρs = 0.75
0                       0.350    0.376    0.340      0.375    0.333
0.9                     0.350    0.377    0.370      0.375    0.364

Notes: See notes to Table 1.

Table A4. Simulations from estimating the common factor (T=150, c=0.1, A=0.75)

                        Same number of series        Oversampling one
                        in each category             category
Correlation within
categories ρl           MSEr^s   MSEn^s   MSE^l      MSEn^s   MSE^l

Correlation across categories ρs = 0
0                       0.092    0.168    0.195      0.168    0.218
0.9                     0.092    0.169    0.252      0.169    0.314

Correlation across categories ρs = 0.75
0                       0.409    0.427    0.487      0.427    0.477
0.9                     0.409    0.428    0.531      0.429    0.523

Notes: See notes to Table 1.

Table A5. Simulations from estimating the common factor (T=50, c=0.1, A=0.1). The number of common factors is selected as in Bai and Ng (2002).

                        Same number of series        Oversampling one
                        in each category             category
Correlation within
categories ρl           r̂        MSE^l              r̂        MSE^l

Correlation across categories ρs = 0
0                       2.31     0.121              1         0.147
0.9                     10.89    0.140              1.84      0.196

Correlation across categories ρs = 0.75
0                       2.60     0.326              1.20      0.350
0.9                     10.89    0.288              2.04      0.363

Notes: The values of r̂ are the average number of estimated factors across replications. See notes to Table 1.

Table A6. Simulations from estimating the common factor (T=50, c=0.1, A=0.75). The number of common factors is selected as in Bai and Ng (2002).

                        Same number of series        Oversampling one
                        in each category             category
Correlation within
categories ρl           r̂        MSE^l              r̂        MSE^l

Correlation across categories ρs = 0
0                       2.39     0.380              1         0.404
0.9                     10.88    0.403              1.89      0.455

Correlation across categories ρs = 0.75
0                       2.58     0.621              1.24      0.643
0.9                     10.86    0.587              2.06      0.667

Notes: The values of r̂ are the average number of estimated factors across replications. See notes to Table 1.