BAGUETTE
Utilizing predictive regression modeling to forecast equity returns
FINANCE 663: International Finance
Cam Harvey
27 February 2014
Bryce Caswell, Matthew Heitz, Peter Maher

Introduction

Predicting the stock market is one of academia’s age-old ventures. The literature on the subject is both broad and deep, as countless methods and variables have been deployed in an effort to prove or disprove the notion that equity markets can be forecast accurately. The traditional research framework is backward looking, as key variables are tested over the lifespan of the entire market. This paper departs from pre-existing academic research by approaching the problem from the viewpoint of a trader or asset manager, evaluating a strategy over a shorter time horizon and focusing on profitability as well as statistical accuracy. We seek to use statistical techniques to develop a trading signal that can be easily implemented in an attempt to increase a trader’s ability to time the market. We also attempt to combine market prices with macro indicators related to the economy in an effort to distil the signal into an actionable investment strategy.

Section I details the General Methodology of the paper; Section II describes the rationale and results for the model applied to the US and the S&P 500; Section III details the application of the model to international markets; Section IV closes the paper with details on potential improvements to the model.

I. General Methodology

Our trading signal is produced from a one-period out-of-sample forecast that is calculated from a rolling regression of the forward month’s log stock index excess return (above the 3-month Treasury) on various potential leading indicators. If the forecasted excess return of the index is positive, we go long and realize the return achieved in the given month.
If the forecasted excess return is negative, we can either short the index or opt to earn an excess return of 0 by investing in short-term Treasuries.[1] In determining the efficiency and accuracy of the forecast, we have simulated the performance of the trading signal over a 20-year period where the data is available. Each period’s forecast of index returns is calculated as if the trader were applying the model at the beginning of the month to create a trading decision for that particular month. We believe that this use of out-of-sample estimation creates a more realistic and robust statistical result and avoids a few of the pitfalls of ex-post estimation bias as well as issues with over-fitting a historical regression line.

[1] We found that the short signal rarely provided value-added information. As such, the paper and models assume a risk-free investment in the case of a negative signal.

As an illustrative example, our forecast of December 2013 S&P 500 returns is determined by applying selected variables available on December 1st to a regression line fit to data from November 2013 back through the regression window (in this case, the previous 7 years of monthly data). One month later, after the close on December 31, 2013, we observe the December returns and recalibrate our regression to this latest historical prediction and observation (while dropping the oldest observation from the previous regression) in an effort to predict January 2014 returns.

We recognize that there are issues with minor slippage, as the entry point in the model is considered to be at the month-end close price. However, the model would still require some time to process, and the actual entry price may be slightly different from the month-end close. This impact should be immaterial if the difference is random.
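The rolling, one-step-ahead procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `rolling_forecasts` and `long_or_cash` are hypothetical, plain ordinary least squares is assumed, and the 84-month default simply mirrors the 7-year window mentioned in the example.

```python
import numpy as np

def rolling_forecasts(X, y, window=84):
    """One-step-ahead out-of-sample forecasts from a rolling OLS regression.

    X has shape (n_months, n_indicators); X[t] holds the indicator values
    known at the start of month t, and y[t] is the excess return realized
    during month t. The forecast for month t is fitted only on the `window`
    months before t, so no future data enters the fit.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    forecasts = np.full(n, np.nan)  # no forecast until a full window exists
    for t in range(window, n):
        # Design matrix with an intercept, using months t-window .. t-1.
        A = np.column_stack([np.ones(window), X[t - window:t]])
        beta, *_ = np.linalg.lstsq(A, y[t - window:t], rcond=None)
        forecasts[t] = beta[0] + X[t] @ beta[1:]
    return forecasts

def long_or_cash(forecasts, y):
    """Excess return of the long-only rule (an assumption matching footnote 1):
    hold the index when the forecast is positive, otherwise earn zero excess
    return in Treasuries. Months without a forecast are treated as cash."""
    return np.where(np.asarray(forecasts) > 0, np.asarray(y, dtype=float), 0.0)
```

Each pass through the loop drops the oldest observation and adds the newest one, which corresponds to the monthly recalibration described in the December 2013 example.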
We take particular care to ensure that the variables utilized in the forecast are publicly available at the time when the forecast is made, as recent research, such as Ghysels (2012), has chastised models that ignore the impact of revisions and delays in release. For example, the ISM PMI is typically produced as one of the first indicators of the previous month’s economic activity. However, we cannot practically use this measurement to predict equity market returns in the month that it was released. In our case, it would be impossible to use the January 2014 data point, released on February 3, 2014 to predict the full February 2014 returns, although using this data point to predict March 2014 returns would be feasible if the data is not revised further. In the presence of a term structure of interest rates with a breadth of maturities (such as the US), we created separate forecasts using several slopes of the term structure. The final forecast was determined based upon a weighted average of the individual forecasts, with weightings determined by the R-square of the underlying regression. The goal of this step was to create an estimate that was potentially more responsive to changing interest rate regimes while also incorporating a more diverse signal set without burdening the regression with copious correlated variables. In addition to the slope of the term structure, we tested several other leading indicators that are relevant to economic performance. In determining our ideal model, we attempt to maximize the R-Square of our out-of-sample forecast for each period with the market’s actual return. We also consider the risk-adjusted return achieved over the sampling period relative to the market index as another valuable metric. However, taken in isolation, high risk-adjusted returns can be produced from particularly weak models. 
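The R-square-weighted averaging of the individual slope forecasts described above can be illustrated with a short sketch. The function name `r2_weighted_forecast` is hypothetical, and the weights are assumed to be the raw R-square values normalized to sum to one; the paper does not spell out the exact normalization.

```python
import numpy as np

def r2_weighted_forecast(forecasts, r_squares):
    """Combine per-slope forecasts into one forecast.

    forecasts[i] is the forecast from the regression using slope i, and
    r_squares[i] is the R-square of that underlying regression. Weights are
    assumed proportional to R-square (an illustrative assumption).
    """
    forecasts = np.asarray(forecasts, dtype=float)
    weights = np.asarray(r_squares, dtype=float)
    total = weights.sum()
    if total <= 0:
        raise ValueError("at least one regression must have a positive R-square")
    return float(np.sum(weights * forecasts) / total)
```

For example, two forecasts of 1% and 3% with R-squares of 2% and 6% would combine to 2.5%, since the better-fitting regression receives three times the weight.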
Additionally, we found that forecasts with a strong R-square were also more likely to produce stronger risk-adjusted returns, so our testing efforts established an initial benchmark of achieving an R-squared of at least 5%. The reason for creating such a benchmark is to ensure that the T-statistic is significant enough in the presence of data mining. We tested a total of 11 unique variables (Appendix B) with approximately 8 data transformations on each variable. If we conservatively ignore the correlation between the variables (and the higher correlations among the transformations) and assume 100 total tests, a more robust T-statistic would be approximately 3.33. Given our sample size of 240 monthly observations, this is roughly a 5% R-square (for a single-regressor regression, t²/(t² + n − 2) ≈ 4.5%).

A glaring deficiency of this approach is the fact that variables have been selected ex-post to create the best forecasts for the sample period. While the t-statistic of our actual vs. forecasted returns exceeds a “trial-and-error adjusted” threshold, we still recognize the point in time at which this selection takes place. We address this issue further in the last section of the paper. There are several variables and inputs that are determined via modeler discretion, and we detail the rationale for these selections in Appendix A.

In terms of actual implementation, we view index futures as the most likely vehicle to pursue this strategy. We estimate that the transaction cost related to the strategy, in a liquid market like the E-mini S&P 500 (ES), should amount to a few basis points, depending on whether the cost of rolling over quarter-end months is worth the added liquidity in the front quarter contract.[2] Additionally, given the frequent turnover in the strategy, futures offer more favorable tax treatment than ETFs by blending both the long-term and short-term capital gains taxes for returns.
Differing tax treatments and transaction fees between buy-and-hold and our strategy have not been incorporated into this analysis but warrant further attention, as these effects can become significant via compounding.

II. The United States and the S&P 500

Summary of Variables

Our regression is composed of 4 independent variables: the slope of the nominal term structure, the change in the Baa corporate bond spread over the 10-year Treasury, the change in the ISM PMI reading, and the month-over-month change in non-farm payrolls.

We include the term structure of interest rates primarily as a recession indicator. As many papers note, the term structure is an excellent method for capturing investor preferences as viewed through an inter-temporal lens. Investor demand for consumption smoothing will lead to the purchase of insurance if the future is expected to be worse than the present (Harvey 1991). This preference and the underlying transaction are captured in the term structure. The spread of Baa corporate bond yields was selected as a medium-term measure of corporate financial health, as well as a proxy for the desire for riskier assets. The change in the PMI and non-farm payrolls were both selected as short-term indicators for US business growth. Overall, we attempt to capture changes along various segments of the business cycle in an effort to approximate the market’s sentiment for general economic growth.

Both the Default Spread and the Term Spread are popular regressors in previous research on forecasting stock market returns, such as Fama (1989). Additionally, the PMI and the slope of the term structure have been identified as leading indicators for the purposes of estimating GDP, as the OECD monthly economic indicator uses these two terms (in addition to others) in its calculations.[3] While formal research on the predictability of non-farm payrolls is unavailable, its importance to the current market climate warrants consideration in the model.
We make extensive use of normalized variables in place of the actual data points, as we find that measures relative to a historical sample are more predictive than the original statistic itself. The goal of this methodology is to best capture the data relative to market expectations, given that markets will typically move based upon changes in forward assumptions rather than absolute levels of growth, as the absolute level is already embedded to a large degree in existing prices. While this approach is simplistic, such as assuming that investors view the world through a “mean model”, we find that T-statistics can create a sense of context and outperform the underlying indicator as-is. Ideally, we would compare the data point relative to the survey of economist expectations (if available), but this data history only began at the turn of the century.

[2] http://www.cmegroup.com/trading/equity-index/us-index/e-mini-sandp500_contract_specifications.html
[3] http://stats.oecd.org/mei/default.asp?lang=e&subject=5&country=USA

When selecting variables, there are several factors that are important within the model framework. Since we are dealing with a sample of over 20 years and measuring monthly, a long historical record is required, as well as more frequent observations. Additionally, the timeliness of the data is critical. For this reason, data that are derived from financial markets, such as the slope of the term structure and the corporate bond spread, are desirable, as the lag between observation and application is at a minimum. Both the ISM PMI index and the NFP data are subject to monthly and annual revisions. Following the advice of Koenig (2003), we are sensitive to the “vintage” of the data, and ensure that the data used in this model is derived from the initial release rather than the revised figures.
For non-farm payrolls, we utilized a separate dataset from the BLS that includes several vintages of payroll data: first, second and final revisions. Furthermore, year-end census adjustments were not included in these figures. With regards to the PMI, we selected first-vintage figures from FRED. Unfortunately, the detail on vintages was not tracked prior to 1997, so there is some degree of “cheating” in the model up to this point. However, we find that the PMI model actually begins to pick up in explanatory power once the first-vintage figures are included (Exhibit 8), although this could also coincide with the timing of the signal.

The selected variables underwent several transformations in an effort to tease out a better signal. These include the use of moving averages rather than single points, as well as converting the measurement into a T-statistic based upon a particular sample size. This process began as ad-hoc trial-and-error in an effort to achieve a relevant R-square from known leading indicators, but was later formalized into a repeatable process. Exhibits 10-13 contain the results of the transformation process for the variables selected for inclusion. Typically, the highest R-square transformation is selected, although this is not always the case. In the instance of non-farm payrolls, while the 12-month moving average achieved the highest R-square, the signal produced from this measure overlapped with that of the term structure. Therefore, we attempt to incorporate a diversity of signals by using varying moving averages and sampling periods.
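The two transformations described above, a moving average followed by conversion into a T-statistic over a trailing sample, might look like the sketch below. The paper does not give the exact formula for its T-statistic, so the definition here is an assumption: the latest value minus the trailing-sample mean, divided by the trailing-sample standard deviation. The function names are hypothetical.

```python
import numpy as np

def moving_average(x, k):
    """Trailing k-period average; the first k-1 entries are NaN."""
    x = np.asarray(x, dtype=float)
    out = np.full(len(x), np.nan)
    for t in range(k - 1, len(x)):
        out[t] = x[t - k + 1:t + 1].mean()
    return out

def rolling_t_stat(x, sample=36):
    """Normalize each point against its own trailing sample.

    Assumed definition (not confirmed by the paper):
        (x[t] - mean of last `sample` points) / (std of last `sample` points)
    Only data up to and including t is used, so the result is usable ex-ante.
    """
    x = np.asarray(x, dtype=float)
    out = np.full(len(x), np.nan)
    for t in range(sample - 1, len(x)):
        w = x[t - sample + 1:t + 1]
        sd = w.std(ddof=1)
        if sd > 0:  # also skips windows that still contain NaN
            out[t] = (x[t] - w.mean()) / sd
    return out
```

A 3-month average converted over a 36-month sample would then be `rolling_t_stat(moving_average(series, 3), sample=36)`, with the leading entries left as NaN until enough history is available.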
Summary of Regression

Our forecasted forward-month S&P 500 return is based upon 4 variables:
1) A 3-year sample T-statistic measuring the nominal slope of the term structure based upon a 6-month average of the long and short rates (taken to the 3rd power[4]);
2) A 3-year sample T-statistic measuring the change in the 3-month average[5] of the Baa corporate bond yield relative to the 10-year Treasury rate;
3) A 3-year sample T-statistic measuring the change in the ISM PMI from the previous month;
4) A 5-year sample T-statistic measuring an average of the most recent and prior month finalized change in non-farm payrolls.

[4] The use of the 3rd power in this case is to amplify the small differences in the term structure, as interest rates have been rounded to the nearest basis point.
[5] The data point from the last day of monthly trading has been excluded in an effort to enhance the realism of the model.

Data for the first 3 measurements were acquired from the St. Louis Fed FRED database, while the NFP data were collected from the BLS historical dataset. The foundational rates used to calculate the slope of the term structure, as well as the Baa spreads, were calculated as the average daily observation over the month. The regression formula is summarized as:

$$ r_{t+1} = \alpha + \beta_1 \, T_{36}(\text{Slope}_t)^3 + \beta_2 \, T_{36}(\Delta \text{Baa}_t) + \beta_3 \, T_{36}(\Delta \text{PMI}_t) + \beta_4 \, T_{60}(\text{NFP}_t) + \varepsilon_{t+1} $$

where $r_{t+1}$ is the forward month’s log excess return on the S&P 500, $T_n(\cdot)$ denotes the T-statistic computed over an $n$-month sample, and the four regressors are the transformed variables listed above.

Performance

We calculated the performance of a long-only model from January 1994 through December 2013. The model delivered a 7.87% compounded, annualized excess return[6], with an annualized standard deviation of 10.37%, yielding a Sharpe ratio of 0.76. This compares to the S&P’s excess return of 4.08% with a risk of 15.2%, for a risk-adjusted return of 0.27. The following chart details the month-end balance of $100 invested on January 1, 1994.
(Note: In the case of a short signal, the return was assumed to be the 3-month Treasury yield for the period divided by 12.)

[6] The 3-month US Treasury yield was used as the risk-free rate; return prior to fees and taxes.

Additionally, we regressed actual S&P returns on our out-of-sample estimated returns from the regression, which produced the following statistical results:

Regression Statistics
Multiple R            25.74%
R Square               6.63%
Adjusted R Square      6.23%
Standard Error         4.25%
Observations             240

                     Coefficients   Standard Error   t Stat   P-value
Intercept                   0.004            0.003    1.295   0.197
Forecasted Return           0.679            0.165    4.109   0.000055

Analysis of Results

This model appears to perform its duty as a macro-timing indicator, as the majority of excess returns are generated by largely avoiding the market during the two major downturns over the past 2 decades (Exhibit 1). Between March 2000 and September 2002, the S&P (volatility scaled) shed 33.93% of its value, while the drawdown based on our implementation of the indicator was only 13.30%. However, the model failed to detect the turnaround in the market in 2003. This may have been the result of investors capturing low valuations after overshooting the reversion to the mean post tech bubble. We found the PMI and NFP data still painting the picture of a slowing economy during this period. A valuation indicator may have been helpful in this instance; we attempted to incorporate a valuation signal via the Fed Model and the equity risk premium, but found that it had little explanatory power over the entirety of the sample.

Similarly, the market lost 39.23% of its value from October 2007 through February 2009, while the trading strategy managed a loss of only 11.03%, largely by exiting the market in February 2008 and returning in July of 2009. The biggest drawdown from the model is attributed to the Asian Financial Crisis in August 1998.
The fact that the model failed to react to this crisis is intuitive, since the primary focus is on US macro indicators, as opposed to measurements of stress in financial markets. Below are additional strategy measurements related to the returns, with the benchmark having been volatility scaled to the model.

                 Model      Benchmark
Hit Rate         81.7%      62.9%
Win/Loss         0.58       0.87
Max Drawdown     -15.57%    -39.23%
Skew             (0.47)     (0.70)
Kurtosis         3.59       1.11

There are several interesting factors about the underlying regression model producing the indicator. The following chart details the highest R-square per period (selected from all the slope forecasts) of the fitted line regression. Noticeably, the low explanatory power in 2006 to 2007 produced a rather choppy trading signal, with a correlation between the forecasted return and actual return of only 1.72%. As such, the R-square of the underlying regression should be useful for determining the confidence of the indicator at a particular period in time.

The adaptability of the regression is noticeable. In particular, the beta of the credit spread changes sign over time (Exhibit 2), from a significant positive beta in the late 90s to a significant negative beta. This likely highlights the evolving correlations between the equity and corporate bond markets (Exhibit 3). While the positive beta may indicate a degree of asset allocation and rotating from bonds to stocks (an increase in the spread of corporate bonds over treasuries was then followed by an increase in the stock market), the negative beta may be associated with a commingling of risky assets into a singular risk-on / risk-off mindset. From a statistical standpoint, the changing sign of the beta likely reduces confidence in the significance of the variable itself. However, we believe this evolution best captures the dynamic nature of markets, and this natural flexibility is a welcome feature of the model.
Additionally, the beta for the slope of the term structure is negative throughout the entire period. This can be explained by two potential factors. First, the slope of the term structure has been known to precede real growth in GDP by over a year, in which case the stock market horizon may not be as far out. Thus negative slopes, predicting recession, coincide with the end of the bull period in stocks, while strong positive slopes coincide with equity returns amid the recession. Second, this may provide evidence that the slope of the term structure incorporates the term structure of volatility as well. Short-term bursts in volatility, creating a downward-sloping volatility term structure, will lead to a positive-sloping term structure, as the volatility term reduces real rates. This relationship has evolved over time (Exhibit 4), potentially as volatility continues to become a more mainstream and studied variable.

In addition to looking at the explanatory power of the model as a whole, we can attempt to isolate each variable of the model in an effort to tease out its individual contribution and how the impact changes over time.[7] The following table lists the R-square of the forecasted return vs. actual return models for each isolated variable.

Variable                     R-square
Slope of Term Structure      1.55%
Baa spread                   3.00%
PMI Change                   0.57%
NFP change                   0.51%
Total                        5.63%
Combined Forecast Model      6.63%

The Baa spread is likely accounting for a large portion of the R-square of the model, as it performed particularly well under stress from the Asian Currency Crisis, the tech bubble, and the financial crisis (Exhibit 7). Some concern may be warranted for the adjustments we’ve made to the slope of the term structure. Given that its value comes largely during the financial crisis (Exhibit 6), we may be overfitting a particular historical incident.
However, the axis scale may be to blame in this particular instance, as the R-square of 5% during the tech bubble is a considerable contribution in its own right.

Also of note is the strong performance of the finalized NFP figure during the 2000s, but the drop in value-add since the financial crisis (Exhibit 9). This may reflect a shifting perspective on non-farm payroll figures, as first releases are gaining popularity among the press, despite potentially significant revisions in future months. This bears some evidence when we place first-release non-farm payrolls into the model (Exhibit 5), as the R-square of this regression has picked up post-crisis. It could also be due to investors placing more importance on Federal Reserve monetary policy in the aftermath of the financial crisis. Through multiple rounds of quantitative easing, there was an increased correlation between equity returns and Treasury markets: a poor NFP reading would lead to a decrease in yields as investors assumed the Fed would continue loose policy. Concurrently, equity markets performed well as investors reached for yield in equity risk.

[7] We acknowledge that by isolating the variables, we are ignoring the interaction effects between multiple variables in a regression, and thus the explanatory power of the individual variable may expand beyond its singular relationship with the regressand.

III. Exporting the model – International Evidence

We next present the results of our methodology on several international indices, incorporating U.S. macro and market variables as well as indicators relevant to the particular market.

Canada and the S&P TSX

We find that the US model, as-is, performs quite well when simply swapping in Canadian returns. This may not be surprising given the 78.2% correlation between the monthly returns of the two markets between 1994 and 2013.
In an effort to add a country-specific variable to the regression, we modify our term structure variable to reflect the difference between the slope of the Canadian term structure and the slope of the U.S. term structure. The Canadian term structure contains information relevant to Canada-specific growth [Harvey (1997)]. Specifically, we take long-term and short-term Canadian interest rates from the OECD. We take an average of the term structure over 3 months and convert that data point into a t-statistic based upon a 36-month sample. We then cube this t-statistic and subtract the cube of the US t-statistic to arrive at our final regressor. The regression formula is summarized as (change from US Model highlighted in bold):

$$ r^{TSX}_{t+1} = \alpha + \beta_1 \, \mathbf{\left[ T_{36}(\text{Slope}^{CA}_t)^3 - T_{36}(\text{Slope}^{US}_t)^3 \right]} + \beta_2 \, T_{36}(\Delta \text{Baa}_t) + \beta_3 \, T_{36}(\Delta \text{PMI}_t) + \beta_4 \, T_{60}(\text{NFP}_t) + \varepsilon_{t+1} $$

where $T_n(\cdot)$ denotes the t-statistic computed over an $n$-month sample, the bold term is the difference between the cubed Canadian and US term-structure t-statistics, and the remaining regressors are those of the US model.

The model managed to return an annualized excess return of 8.11%, with annualized risk of 10.06%, for a Sharpe ratio of 0.81 (Exhibits 14, 15). This compares favorably to the Sharpe ratio of holding the TSX over the period, which was only 0.19 (2.89% return with 15.3% risk). Again, we find the model deriving its performance by avoiding recessions, as the maximum drawdown during the period was 18.76% as opposed to the 32.02% drawdown observed in the volatility-adjusted TSX. The R-square of the actual and forecasted returns was equal to 8.15% (Exhibit 16).

Interestingly, the beta for the slope of the term structure is consistently positive in this model, rather than consistently negative. One possible explanation for this switch is the fact that we’ve used a relative metric as opposed to an absolute figure. We may have effectively reduced the growth and volatility expected in the US term structure, while retaining the Canadian growth component.
As such, another candidate variable could be a normalized orthogonal measure of Canadian term structure on US term structure, in an effort to isolate idiosyncratic Canadian deviations.[8]

[8] This measurement is a bit beyond the scope of this paper, as a proper ex-ante value would require a residual from a rolling regression, rather than a measure from a fit with all the available data.

Germany and the DAX

Due to a lack of historical data depth and two major changes to Germany over the past 30 years (the fall of the Berlin Wall and the creation of the Euro), we elected to shorten the timeframe on the forecast. We tested the model from January 2004 to December 2013. The as-is model directly exported from the US performed rather poorly, with an R-square of only 1.65%. This is somewhat puzzling given the 84.27% correlation between the S&P and the DAX over the time period. Regardless, we opted to recreate the model from the ground up, identifying German-specific macro variables as inputs into the model.

As Germany is largely regarded as an economy heavily reliant on manufacturing, we selected the IFO business climate survey as an indicator likely to reflect manufacturing output and sentiment. We constructed a t-statistic based upon a 10-year sample of the change in the 2-month average of the IFO survey. We also construct a relative term structure variable similar to the Canadian methodology, where the T-stat of the German term structure is reduced by the T-stat of the US term structure. Additionally, we expanded the time frame of the regression window to 10 years (from 7 years in the original model), as we found that a larger window increased the accuracy of the predictions. Again, this may reflect the nature of the German economy, particularly if its heavily unionized labor system was able to reduce the volatility (or dynamic nature) of the economy.
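The relative term-structure regressors described for Canada and Germany can be sketched in a few lines. This is an illustration only: the function name `relative_slope_regressor` is hypothetical, the inputs are assumed to be already-computed term-structure t-statistics, and using `power=1` for Germany is an assumption, since the text mentions cubing only for the Canadian variable.

```python
import numpy as np

def relative_slope_regressor(t_local, t_us, power=1):
    """Local term-structure t-statistic relative to the US one.

    power=3 reproduces the Canadian construction described in the text
    (cube each t-statistic, then subtract). power=1 is a plain difference,
    assumed here for the German variable. Odd powers preserve the sign of
    each t-statistic, so the direction of each slope signal is retained.
    """
    t_local = np.asarray(t_local, dtype=float)
    t_us = np.asarray(t_us, dtype=float)
    return t_local ** power - t_us ** power
```

With `power=3`, a Canadian t-statistic of 2.0 against a US t-statistic of 1.0 gives 8 − 1 = 7, showing how cubing amplifies large relative deviations.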
The regression formula is summarized as (change from US Model highlighted in bold): [formula not recoverable from the source; per the text above, the regressors include the IFO t-statistic and the relative German-US term structure t-statistic]

Overall, the trading strategy was able to earn an annualized excess return of 5.72% with risk of 13.85%, for a Sharpe ratio of 0.41 (Exhibits 17, 18). The underlying index earned an annualized excess return of 3.68% with risk of 18.2%, for a Sharpe ratio of 0.20. While these results are less impressive than those of the previous two models, they do represent an improvement over the classical buy-and-hold strategy. It should also be noted that German markets contain fatter tails than the US (although roughly in line with Canada). The model was able to predict the direction of 6 of those 7 tails (which we define as monthly returns of +/- 10%), with August 2011 being the lone exception (a loss of 19.19%). Given that this event can be linked to the greater European Sovereign Crisis, a macro variable capturing overall Euro health would likely be a relevant indicator worth exploring. Finally, the R-square of the predicted returns with actual returns was 13.18% (Exhibit 19). Given the smaller sample (120 periods vs 240), this higher explanatory power is required for ample significance.

Japan and the NIKKEI 225

Japan presents an interesting test case for pointing out the challenges and difficulties with this model. To begin, we tested whether any of the four factors in the U.S. model were predictive of Nikkei returns, for which we had data starting in February 1990. Overall, the model only produced an R-square of 1.29%. When each factor was regressed individually, they all had negative explanatory power. However, testing just the non-farm payrolls and Consumer Sentiment, we found an R-square of 2.52%. While still well below our threshold of 5%, it was interesting to see which variables actually were somewhat correlated. To continue the test, we then looked at what variables might be predictive for Japan.
Our initial reaction was to test the term structure and calculate the spread between long and short-term interest rates. When doing this we had data available from April 2002 and tested the model from 2004 forward in order to give us 10 years of data. On its own, the R-squared was negligible. However, when regressed with the U.S. VIX, non-farm payrolls, and Consumer Sentiment data, we achieved a model R-squared of 4.12% (Exhibit 22), still short, but significantly closer to hitting our benchmark. What we found to be very counterintuitive was that having a faster signal for Japan actually produced a much better result than a slower signal did in the U.S., Canada, and Germany. Noticeably, the betas for the variables swung from positive to negative more than once over the sample, further compounding the issue with the model. The T-statistic for the actual vs. forecasted return regression did reach 2.25, but again, the threshold should likely far exceed 2.0 given the number of times we ran the model to achieve this. As such, we would not implement this model in its current state, despite the profitability of the strategy. The return produced was only about 3.28% with a volatility of 11.19% (Exhibits 20, 21), compared to the underlying index, which returned 1.33% with a volatility of nearly 20%.

Japan aptly points out the difficulty in discerning what variables can and will be meaningful, especially given the need to couple them with other variables. Additionally, given the availability of data, running the model for such a short time span is also problematic and further adds to our concern that we may not be selecting the correct data points in a completely objective manner.

IV. Improving the Process

Reducing model bias

A consistent concern for any regression approach is that of model (or modeler) bias. In this context, the question lingers whether we have chosen the correct independent variables, such that future forecasts will remain as predictive as those in the past.
One area that we believe this paper can be expanded upon is the combination of multiple macro-timing models in an effort to reduce model bias. We gather this insight from the work done in the field of Machine Learning. For example, the team that won the million-dollar Netflix Prize for improving upon the company’s recommendation algorithm blended over 800 models to arrive at their solution [Feuerverger (2012)]. We believe that there is opportunity to apply the same principles to forecasting returns.

As a simple example, we can create a faster model (with a shorter regression sample period of 24 months), and combine those results with our existing model in an effort to tease out even more signal without compromising the fit of the original regression. We constructed a faster model based upon the VIX, the first NFP for a given month, and the University of Michigan consumer sentiment survey. While these variables were insignificant fits within our previous framework, by applying a shorter sampling period, we were able to tease out a new signal, with an R-square of 4.94% between the forecasted return and actual return from 1994 through 2013.

We then created a weighted average of the two model forecasts based upon the R-square of the underlying regression at observation divided by the standard deviation of the sample error for each underlying model through time (observed up to the evaluation date). The effect of this weighting is to penalize the faster model to a certain degree, given that its smaller sample period created large fluctuations in the fit throughout time. As such, we trust its forecast less than the strength of the underlying R-squared would suggest. When combining the predictions from both models, we found that the new R-square of the forecast was 7.45%, compared to 6.63% for the original model (with a similar Sharpe ratio). Additionally, we believe that the combined model offers additional protection against model bias by incorporating a more diverse signal set.
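The blending rule described above, with weights equal to each model's R-square divided by the standard deviation of its past errors, can be sketched as follows. The function name `blend_forecasts` is hypothetical, the weights are assumed to be normalized to sum to one, and "sample error" is interpreted here as the model's forecast errors observed up to the evaluation date; the paper does not spell out these details.

```python
import numpy as np

def blend_forecasts(forecasts, r_squares, past_errors):
    """Weighted average of model forecasts for one month.

    forecasts[i]   : model i's forecast for the coming month
    r_squares[i]   : R-square of model i's underlying regression today
    past_errors[i] : model i's errors observed up to the evaluation date
                     (an assumed reading of "sample error")
    Weight_i = R-square_i / std(past_errors_i), normalized to sum to one,
    so a model with an unstable track record is trusted less.
    """
    weights = np.array([
        r2 / np.std(errs, ddof=1)
        for r2, errs in zip(r_squares, past_errors)
    ])
    return float(np.dot(weights, np.asarray(forecasts, dtype=float)) / weights.sum())
```

If two models have the same R-square but the faster model's errors are twice as volatile, the slower model receives twice the weight, which is the penalty on the faster model described in the text.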
Future Opportunities

Despite our attempts to reduce model bias, there is still a degree of variable selection that is performed external to the model and after the fact. We have attempted to minimize this ex-post selection bias when implementing the term structure of interest rates, and the same framework of model inclusion / exclusion can be applied to other macro variables as well, similar to the methodology that Bai (2010) proposes. In an ideal system, a basket of macro indicators is selected at each period. Univariate regressions are run with next-month stock returns on each indicator, with selected data transformations also incorporated. Variables that exceed a predetermined threshold (either based upon an absolute rule or relative to the period) are selected for inclusion in a final multivariate regression, with adjustments for collinearity if two variables overlap in signal. This adaptive approach to modeling likely increases the flexibility of the process, while turning variable selection into an endogenous decision and further reducing modeling bias.

Additionally, models can be built for various developed and emerging markets. While there may be issues with data availability and reliability in EMs, we are approaching a long enough timeframe to begin cracking the shell open with regression modeling. With a collection of market forecasts as output from the models, one could construct a monthly tactical portfolio using Black-Litterman to optimize holdings.

Conclusion

Regression modeling, while presenting a potential black box solution to the issue of market timing, has proven to be both flexible and robust relative to a static trading rule. In this paper, we have shown the ability to produce a methodology that provides a fairly realistic back test, the flexibility to deal with evolving market dynamics, and a proxy for the confidence of the indicator.
The downsides of modeling bias and data mining can (hopefully) be minimized with the incorporation of multiple models as well as a sensible narrative around the input decisions. There is also great opportunity with each passing reporting period, as the historical dataset of data vintages covers an ever longer period of analysis. Today, there is a deluge of data released every month, and we expect the volume to continue to grow. Our goal should be to harvest the entire picture rather than selectively focus on a few variables, and regression modeling is one potential tool when placed in the proper hands.
Appendix A – Modeling Inputs
Despite the extensive use of out-of-sample forecasting, there are still several inputs to the model that are determined through modeler discretion (and are therefore subject to potential data mining). In this section, we highlight each input and our rationale for selecting it.
1) Variable transformation: Variables were transformed, typically by taking moving averages and then converting these figures into t-statistics. We tested 1, 2, 3, 4, 6 and 12 month averages, as these represent typical partitions within a year. We chose 36 months of sampling for calculating the Baa spread, the slope of the term structure and the change in the PMI, as this sampling period captures changes over a short term business cycle, while also highlighting deviations within a smaller time frame. With regard to non-farm payrolls, we found that a 5 year sample provided the strongest signal when placed amongst the other variables. We also tested sample sizes of 2 years, but found the 3 year sample to be typically superior.
2) Regression window: We created our rolling regression line using the latest 84 months (7 years) of data, as this coincided with both a longer business cycle and the findings related to Shiller's CAPE ratios. We also tested sampling periods of 3, 5 and 10 years.
3) Model selection: When multiple models were created based upon possible combinations of the slope of the US term structure, certain models were penalized for producing particularly poor forecasts in a prior period. A model was excluded from the sample if it produced an underlying R-square that was less than half the average of the group. The goal of this exclusion rule was merely to restrict outlier models, as the average number of models included per period was 12.91 out of a possible 13.
4) Short rate slope inclusion: Our model was originally designed to create forecasts for all possible slope combinations, from [6 month] – [3 month] to [10 year] – [3 month] to [10 year] – [7 year]. We opted to reduce the model to only those slopes with short rates of 3 and 6 months, as slopes from further along the yield curve produced markedly poorer forecasts and provided little in the way of valuable signal to the model.
Appendix B – Other variables tested
We tested several available measurements that have been utilized in other leading indicators or are relevant monthly data releases.
These include:
S&P 500
  University of Michigan Consumer Sentiment Survey
  AAA yield spread to 10 year treasury
  Chicago Fed National Financial Conditions Leverage Subindex
  OECD Business Tendency and Confidence surveys
  New Orders of Durable Goods
  Housing Starts
  Equity Risk Premium
TSX
  Brent & WTI crude prices and monthly volatility of daily returns
  Year-over-Year changes in CPI
  Month-over-Month changes in retail sales
  Month-over-Month changes in building permits
  Changes in unemployment (note: This variable proved significant, but due to the long possibility of revisions, it was removed for fairness reasons)
DAX
  ZEW Current Survey
  ZEW Expectations Survey
  Changes in unemployment
  Changes in Industrial Production Month-over-Month
  Changes in Factory Orders Month-over-Month
  IFO Expectations Survey

Exhibit 1: Comparison of Drawdowns between the trading strategy and the S&P
Exhibit 2: Change in Baa Spread Beta: [3 Year] – [3 month] model
Exhibit 3: Correlation between S&P returns and the change in the Baa spread
Exhibit 4: Correlation between VIX and [5 Year] – [3 Month] term structure
Exhibit 5: The increasing importance of first-release NFP figures (Model based upon a 2 year sample T-stat using first release NFP figures)
Exhibit 6: Maximum R-square of Isolated Slope of the Term Structure regression
Exhibit 7: R-square of Isolated Baa Spread regression
Exhibit 8: R-square of Isolated PMI change regression
Exhibit 9: R-square of Isolated NFP change regression

Exhibit 10: [3 Yr] – [3 Mo] transformation Correlations
Avg--Tsample   Average / Average T-Stat   Change in Avg / Change T-Stat
1              -5.01%                     -5.86%
2              -4.03%                     -9.60%
3              -3.70%                     -10.18%
4              -4.42%                     -3.94%
6              -7.55%                     -0.27%
12             -8.95%                     6.86%
12--24         17.17%                     -7.01%
12--36         1.10%                      -2.18%
1--24          -1.97%                     -4.00%
1--36          4.50%                      -3.14%
2--24          3.92%                      -4.71%
2--36          9.21%                      -6.35%
3--24          8.07%                      -10.78%
3--36          10.85%                     -8.26%
4--24          8.59%                      -4.49%
4--36          13.56%                     -5.86%
6--24          12.88%                     3.65%
6--36          16.09%                     4.45%

Exhibit 11: Baa Spread transformation Correlations
Avg--Tsample   Average / Average T-Stat   Change in Avg / Change T-Stat
1              3.22%                      2.01%
2              3.95%                      6.99%
3              4.19%                      15.02%
4              4.54%                      14.95%
6              5.55%                      7.66%
12             8.09%                      8.47%
12--24         4.73%                      6.87%
12--36         5.02%                      8.57%
1--24          2.82%                      6.68%
1--36          7.20%                      5.93%
2--24          4.44%                      9.50%
2--36          7.98%                      10.65%
3--24          5.38%                      15.31%
3--36          7.82%                      17.32%
4--24          4.85%                      15.39%
4--36          6.95%                      17.18%
6--24          4.55%                      10.80%
6--36          5.83%                      11.72%

Exhibit 12: PMI transformation Correlations
Avg--Tsample   Average / Average T-Stat   Change in Avg / Change T-Stat
1              -3.56%                     6.91%
2              -5.17%                     2.43%
3              -5.27%                     3.97%
4              -4.99%                     -1.99%
6              -4.72%                     -7.89%
12             -3.91%                     -14.87%
12--24         -0.33%                     -13.42%
12--36         0.86%                      -13.27%
1--24          -2.98%                     6.64%
1--36          1.43%                      7.54%
2--24          -4.81%                     3.32%
2--36          0.35%                      4.21%
3--24          -4.91%                     5.07%
3--36          -0.29%                     4.51%
4--24          -5.45%                     0.31%
4--36          -1.20%                     0.09%
6--24          -6.56%                     -4.57%
6--36          -2.53%                     -5.58%

Exhibit 13: Final Release NFP transformation Correlations
Avg--Tsample   Average / Average T-Stat   Change in Avg / Change T-Stat
1              -2.77%                     -6.48%
2              1.27%                      -3.77%
3              -0.23%                     -9.98%
4              -1.10%                     -11.90%
6              0.83%                      -10.38%
12             5.55%                      -0.36%
12--24         12.61%                     -3.21%
12--36         12.74%                     -2.14%
1--24          1.42%                      -6.93%
1--36          3.36%                      -5.66%
2--24          4.94%                      -0.75%
2--36          7.16%                      -1.88%
3--24          1.90%                      -10.48%
3--36          4.95%                      -10.92%
4--24          3.41%                      -14.58%
4--36          5.36%                      -14.17%
6--24          7.81%                      -10.35%
6--36          8.79%                      -10.15%
12--60         8.08%                      1.12%
1--60          3.36%                      -5.78%
2--60          7.88%                      -2.13%
3--60          3.37%                      -10.33%
4--60          1.21%                      -14.05%
6--60          2.47%                      -9.64%

Exhibit 14: Canadian Portfolio Return

Exhibit 15: Canadian strategy portfolio statistics
            Hit Rate   Win/Loss   Max Drawdown   Skew     Kurtosis
Model       79.6%      0.69       -18.76%        0.10     2.04
Benchmark   60.8%      0.91       -32.02%        (0.97)   2.85

Exhibit 16: Canadian Model Statistical detail
Regression Statistics
Multiple R          28.56%
R Square            8.15%
Adjusted R Square   7.77%
Standard Error      4.24%
Observations        240

                    Coefficients   Standard Error   t Stat   P-value
Intercept           0.001          0.003            0.531    0.596
Forecasted Return   0.006          0.001            4.597    0.000007

Exhibit 17: German Model Portfolio Return

Exhibit 18: German Model strategy portfolio statistics
            Hit Rate   Win/Loss   Max Drawdown   Skew     Kurtosis
Model       75.0%      0.80       -29.07%        (0.51)   6.52
Benchmark   62.5%      0.93       -42.58%        (0.81)   2.45

Exhibit 19: German Model statistical detail
Regression Statistics
Multiple R          36.31%
R Square            13.18%
Adjusted R Square   12.45%
Standard Error      4.90%
Observations        120

                    Coefficients   Standard Error   t Stat   P-value
Intercept           0.006          0.004            1.447    0.150
Forecasted Return   0.805          0.190            4.233    0.000046

Exhibit 20: Japanese Model Portfolio Return

Exhibit 21: Japanese Model strategy portfolio statistics
            Hit Rate   Win/Loss   Max Drawdown   Skew     Kurtosis
Model       79.2%      0.61       -15.40%        0.95     2.78
Benchmark   55.8%      1.01       -37.40%        (0.69)   1.92

Exhibit 22: Japanese Model statistical detail
Regression Statistics
Multiple R          20.30%
R Square            4.12%
Adjusted R Square   3.31%
Standard Error      5.69%
Observations        120

                    Coefficients   Standard Error   t Stat   P-value
Intercept           0.006          0.005            1.058    0.292
Forecasted Return   0.301          0.134            2.252    0.026

References
Bai, Jennie. Equity premium predictions with adaptive macro indexes. No. 475. Staff Report, Federal Reserve Bank of New York, 2010.
Beber, Alessandro, Michael W. Brandt, and Maurizio Luisi. Distilling the macroeconomic news flow. No. w19650. National Bureau of Economic Research, 2013.
Fama, Eugene F., and Kenneth R. French. "Common risk factors in the returns on stocks and bonds." Journal of Financial Economics 33.1 (1993): 3-56.
Feuerverger, Andrey, Yu He, and Shashi Khatri. "Statistical significance of the Netflix challenge." Statistical Science 27.2 (2012): 202-231.
Ghysels, Eric, Casidhe Horan, and Emanuel Moench. Forecasting through the rear-view mirror: data revisions and bond return predictability. No. 581. Federal Reserve Bank of New York, 2012.
Harvey, Campbell R., and Yan Liu. "Backtesting." Available at SSRN (2013).
Harvey, Campbell R. "The relation between the term structure of interest rates and Canadian economic growth." Canadian Journal of Economics 30.1 (1997): 169-93.
Harvey, Campbell R. "The term structure and world economic growth." The Journal of Fixed Income 1.1 (1991): 7-19.
Koenig, Evan F., Sheila Dolmas, and Jeremy Piger. "The use and abuse of real-time data in economic forecasting." Review of Economics and Statistics 85.3 (2003): 618-628.
Varian, Hal R. "Big Data: New Tricks for Econometrics." (2013).