 BAGUETTE
Utilizing predictive regression modeling to forecast equity
returns
FINANCE 663: International Finance
Cam Harvey
27 February 2014
Bryce Caswell
Matthew Heitz
Peter Maher
Introduction
Predicting the stock market is one of academia’s age-old ventures. The literature on the
subject is both broad and deep, as countless methods and variables have been deployed in
an effort to prove or disprove the notion that equity markets can be forecasted accurately.
The traditional research framework is backward looking, as key variables are tested over
the lifespan of the entire market. This paper departs from pre-existing academic research
by approaching the problem from the viewpoint of a trader or asset manager, evaluating a
strategy over a shorter time horizon, but also focusing on profitability as well as statistical
accuracy. We seek to use statistical techniques to develop a trading signal that can be easily
implemented in an attempt to increase a trader’s ability to time the market. We also
attempt to combine market prices with macro indicators related to the economy in an effort
to distil the signal into an actionable investment strategy.
Section I details the General Methodology of the paper; Section II describes the rationale
and results for the model applied to the US and the S&P 500; Section III details the
application of the model to international markets; Section IV closes the paper with details
on potential improvements to the model.
I.
General Methodology
Our trading signal is produced from a 1-period out-of-sample forecast that is calculated
from a rolling regression of the forward month’s log stock index excess return (above the
3-month Treasury) on various potential leading indicators. If the forecasted excess return of
the index is positive, we go long and realize the return achieved in the given month. If the
forecasted excess return is negative, we can either short the index or opt to earn an excess
return of 0 by investing in short term treasuries.1
In determining the efficiency and accuracy of the forecast, we have simulated the
performance of the trading signal over a 20-year period when the data is available. Each
period’s forecast of index returns is calculated as if the trader was applying the model from
the beginning of the month to create a trading decision for that particular month. We
believe that this use of out-of-sample estimation creates a more realistic and robust
statistical result and avoids a few of the pitfalls of ex-post estimation bias as well as issues
with over-fitting a historical regression line.
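As a minimal sketch of the rolling out-of-sample procedure described above: the code below uses a single indicator for brevity (the paper's model uses four regressors), and the names `fit_line` and `rolling_signal`, the 84-month default window, and the data layout are illustrative assumptions rather than the authors' implementation.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def rolling_signal(indicator, excess_ret, window=84):
    """indicator[t] is assumed known at the start of month t;
    excess_ret[t] is the excess return realized over month t.
    Returns a list of (forecast, strategy_excess_return) for t >= window."""
    out = []
    for t in range(window, len(excess_ret)):
        # Fit only on months strictly before t, so nothing from month t leaks in.
        a, b = fit_line(indicator[t - window:t], excess_ret[t - window:t])
        forecast = a + b * indicator[t]
        # Long the index on a positive forecast; otherwise hold short-term
        # Treasuries, i.e. an excess return of zero.
        realized = excess_ret[t] if forecast > 0 else 0.0
        out.append((forecast, realized))
    return out
```

Each pass drops the oldest observation and adds the newest one, which mirrors the monthly recalibration in the text.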
As an illustrative example, our forecast of December 2013 S&P 500 returns is determined
by applying selected variables available on December 1st to a regression line fit to data from
November 2013 through the regression window (in this case, the previous 7 years of
monthly data). One month later, after close on December 31, 2013 we observe the December
returns and recalibrate our regression to this latest historical prediction and observation
(while dropping the oldest observation from the previous regression) in an effort to predict
January 2014 returns. We recognize that there are issues with minor slippage, as the entry
point in the model is considered to be at the month-end close price. However, the model
would still require some time to process, and the actual entry price may be slightly different
from the month-end close. This impact should be immaterial if the difference is random.
1 We found that the short signal rarely provided value-added information. As such, the paper and models assume a risk-free investment in the case of a negative signal.
We take particular care to ensure that the variables utilized in the forecast are publicly
available at the time when the forecast is made, as recent research, such as Ghysels (2012),
has chastised models that ignore the impact of revisions and delays in release. For example,
the ISM PMI is typically produced as one of the first indicators of the previous month’s
economic activity. However, we cannot practically use this measurement to predict equity
market returns in the month that it was released. In our case, it would be impossible to use
the January 2014 data point, released on February 3, 2014 to predict the full February
2014 returns, although using this data point to predict March 2014 returns would be
feasible if the data is not revised further.
In the presence of a term structure of interest rates with a breadth of maturities (such as
the US), we created separate forecasts using several slopes of the term structure. The final
forecast was determined based upon a weighted average of the individual forecasts, with
weightings determined by the R-square of the underlying regression. The goal of this step
was to create an estimate that was potentially more responsive to changing interest rate
regimes while also incorporating a more diverse signal set without burdening the
regression with copious correlated variables.
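The weighting step above can be sketched as follows. This assumes the weights are simply proportional to each regression's R-square; the function name and the equal-weight fallback when every R-square is zero are hypothetical additions, not details given in the paper.

```python
def r2_weighted_forecast(forecasts, r_squares):
    """Combine per-slope forecasts using weights proportional to R-square."""
    total = sum(r_squares)
    if total == 0:
        # Assumed fallback: equal weights when no regression explains anything.
        return sum(forecasts) / len(forecasts)
    return sum(f * r for f, r in zip(forecasts, r_squares)) / total

# Two slope regressions: the second fits three times better, so its
# forecast receives three quarters of the weight.
combined = r2_weighted_forecast([0.01, 0.03], [0.02, 0.06])
```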
In addition to the slope of the term structure, we tested several other leading indicators
that are relevant to economic performance. In determining our ideal model, we attempt to
maximize the R-Square of our out-of-sample forecast for each period with the market’s
actual return. We also consider the risk-adjusted return achieved over the sampling period
relative to the market index as another valuable metric. However, taken in isolation, high
risk-adjusted returns can be produced from particularly weak models. Additionally, we
found that forecasts with a strong R-square were also more likely to produce stronger
risk-adjusted returns, so our testing efforts established an initial benchmark of achieving an
R-squared of at least 5%.
The reason for creating such a benchmark is to ensure that the T-statistic is significant
enough in the presence of data mining. We tested a total of 11 unique variables (Appendix
B) with approximately 8 data transformations on each variable. If we conservatively ignore
the correlation between the variables (and the higher correlations among the
transformations) and assume 100 total tests, a more robust T-statistic would be
approximately 3.33. Given our sample size, this is roughly a 5% R-square.
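The mapping from a T-statistic to an R-square can be checked with the standard single-regressor identity R² = t² / (t² + n − 2). The sketch below assumes that identity applies to the actual-versus-forecast regression with n = 240 monthly observations; the function name is illustrative.

```python
def r_square_from_t(t_stat, n_obs):
    """R-square implied by the slope t-statistic in a one-regressor OLS."""
    return t_stat ** 2 / (t_stat ** 2 + n_obs - 2)

threshold_r2 = r_square_from_t(3.33, 240)  # about 0.0445, i.e. roughly 5%
```

Under the same identity, a T-statistic of 4.109 on 240 observations implies an R-square of about 6.6%.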
A glaring deficiency of this approach is the fact that variables have been selected ex-post to
create the best forecasts for the sample period. While the t-statistic of our actual vs.
forecasted returns exceeds a “trial-and-error adjusted” threshold, we still recognize the
point in time that this selection takes place. We address this issue further in the last
section of the paper. There are several variables and inputs that are determined via
modeler discretion and we detail the rationale for these selections in Appendix A.
In terms of actual implementation, we view index futures as the most likely vehicle to
pursue this strategy. We estimate that the transaction cost related to the strategy, in a
liquid market like the ES-Mini, should amount to a few basis points, depending on
whether the cost of rolling over quarter-end months is worth the added liquidity in the front
quarter contract2. Additionally, given the frequent turnover in the strategy, futures offer
more favorable tax treatment than ETFs by blending both the long-term and short-term
capital gains taxes for returns. Differing tax treatments and transaction fees between
buy-and-hold and our strategy have not been incorporated into this analysis but warrant
further attention, as these effects can become significant via compounding.
II.
The United States and the S&P 500
Summary of Variables
Our regression is composed of 4 independent variables: the slope of the nominal term
structure, the change in the Baa corporate bond spread over 10-year treasury, the change
in the ISM PMI reading, and the month-over-month change in non-farm payrolls.
We include the term structure of interest rates primarily as a recession indicator. As many
papers note, the term structure is an excellent method for capturing investor preferences as
viewed through an inter-temporal lens. Investor demand for consumption smoothing will
lead to the purchase of insurance if the future is expected to be worse than the present
(Harvey 1991). This preference and the underlying transaction are captured in the term
structure.
The spread of Baa corporate bond yields was selected as a medium term measure of
corporate financial health, as well as a proxy for the desire for riskier assets. The change in
the PMI and non-farm payrolls were both selected as a short term indicator for US business
growth. Overall, we attempt to capture changes along various segments of the business
cycle in an effort to approximate the market’s sentiment for general economic growth.
Both the Default Spread and the Term Spread are popular regressors in previous research
on forecasting stock market returns, such as Fama (1989). Additionally, the PMI and the
slope of the term structure have been identified as leading indicators for the purposes of
estimating GDP, as the OECD monthly economic indicator uses these two terms (in
addition to others) in its calculations3. While formal research on the predictability of
non-farm payrolls is unavailable, its importance to the current market climate warrants
consideration in the model.
We make extensive use of normalized variables in place of the actual data points, as we find
that measures relative to a historical sample are more predictive than the original statistic
itself. The goal of this methodology is to best capture the data as relative to market
expectations, given that markets will typically move based upon changes in forward
assumptions rather than absolute levels of growth, as the absolute level is already
embedded to a large degree in existing prices. While this approach is simplistic, such as
assuming that investors view the world through a “mean model”, we find that T-statistics
can create a sense of context and outperform the underlying indicator as-is. Ideally, we
would compare the data point relative to the survey of economist expectations (if available),
but this data history only began at the turn of the century.
2 http://www.cmegroup.com/trading/equity-index/us-index/e-mini-sandp500_contract_specifications.html
3 http://stats.oecd.org/mei/default.asp?lang=e&subject=5&country=USA
When selecting variables, there are several factors that are important within the model
framework. Since we are dealing with a sample of over 20 years and measuring monthly, a
long historical record is required, as well as more frequent observations. Additionally, the
timeliness of the data is critical as well. For this reason, data that are derived from
financial markets, such as the slope of the term structure and the corporate bond spread,
are desirable, as the lag between observation and application is at a minimum.
Both the ISM PMI index and the NFP payroll data are subject to monthly and annual
revisions. Following the advice of Koenig (2003), we are sensitive to the “vintage” of the
data, and ensure that the data used in this model is derived from the initial release rather
than the revised figures. For non-farm payrolls, we utilized a separate dataset from the
BLS that includes several vintages of payroll data: first, second and final revisions.
Furthermore, year-end census adjustments were not included in these figures. With
regard to the PMI, we selected first-vintage figures from FRED. Unfortunately, the detail
on vintages was not tracked prior to 1997, so there is some degree of “cheating” in the model
up to this point. However, we find that the PMI model actually begins to pick up in
explanatory power once the first vintage figures are included (Exhibit 8), although this
could also coincide with the timing of the signal.
The selected variables underwent several transformations in an effort to tease out a better
signal. These include the use of moving averages rather than single points, as well as
converting the measurement into a T-statistic based upon a particular sample size.
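The two transformations can be sketched as below. The paper does not give its exact formula, so this assumes the "T-statistic" is the latest observation minus the trailing sample mean, divided by the trailing sample standard deviation; both function names are hypothetical.

```python
from statistics import mean, stdev

def moving_average(series, n):
    """Trailing n-period average; the first n-1 entries are None."""
    return [None if i + 1 < n else mean(series[i + 1 - n:i + 1])
            for i in range(len(series))]

def trailing_t_stat(series, sample):
    """Assumed definition: (x_t - mean) / stdev over the trailing
    `sample` observations ending at t. Returns None until a full
    window of non-missing values is available."""
    out = []
    for i in range(len(series)):
        win = series[max(0, i + 1 - sample):i + 1]
        if len(win) < sample or None in win:
            out.append(None)
            continue
        s = stdev(win)
        out.append((win[-1] - mean(win)) / s if s > 0 else 0.0)
    return out

# Example: a 36-month T-statistic of a 6-month moving average would be
# trailing_t_stat(moving_average(slope_series, 6), 36)
```

Because each value uses only observations up to its own date, the normalized series can feed an out-of-sample forecast without look-ahead.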
This process began as ad-hoc trial-and-error in an effort to achieve a relevant R-square
from known leading indicators, but was later formalized into a repeatable process. Exhibits
10-13 contain the results of the transformation process for the variables selected for
inclusion.
Typically, the highest R-square transformation is selected, although this is not always the
case. In the instance of Non-farm payrolls, while the 12 month moving average achieved the
highest R-square, the signal produced from this measure overlapped with that of the term
structure. Therefore, we attempt to incorporate a diversity of signals by using varying
moving averages and sampling periods.
Summary of Regression
Our forecasted forward month S&P 500 return is based upon 4 variables: a 3-year sample
T-statistic measuring the nominal slope of the term structure based upon a 6-month
average of the long and short rates (taken to the 3rd power4); a 3-year sample T-statistic
measuring the change in the 3-month average5 of the Baa corporate bond yield relative to
the 10-year treasury rate; a 3-year sample T-statistic measuring the change in the ISM
PMI from the previous month; and a 5-year sample T-statistic measuring an average of the most
recent and prior month finalized change in non-farm payrolls.
4 The use of the 3rd power in this case is to amplify the small differences in the term structure, as interest rates have been rounded to the nearest basis point.
5 The data point from the last day of monthly trading has been excluded in an effort to enhance the realism of the model.
Data for the first 3 measurements were acquired from the St. Louis Fed FRED database,
while the NFP data were collected from the BLS historical dataset. The
foundational rates used to calculate the slope of the term structure, as well as the Baa
spreads, were calculated as the average daily observation over the month.
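The regression formula is summarized as:
$$ r_{t+1} = \alpha + \beta_1 \, T_{36}(\text{Slope}_t)^3 + \beta_2 \, T_{36}(\Delta \text{Baa}_t) + \beta_3 \, T_{36}(\Delta \text{PMI}_t) + \beta_4 \, T_{60}(\text{NFP}_t) + \varepsilon_{t+1} $$
where $r_{t+1}$ is the forward month log excess return of the S&P 500 and $T_n(\cdot)$ denotes the $n$-month sample T-statistic of the transformed indicator described above.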
Performance
We calculated the performance of a long-only model from January 1994 through December
2013. The model delivered a 7.87% compounded, annualized excess return6, with an
annualized standard deviation of 10.37%, yielding a Sharpe ratio of 0.76. This compares to
the S&P’s excess return of 4.08% with a risk of 15.2%, for a risk-adjusted return of 0.27.
The following chart details the month end balance of $100 invested on January 1, 1994.
(Note: In the case of a short signal, the return was assumed to be the 3 month Treasury Yield for the period
divided by 12)
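Headline figures of this kind can be computed from a monthly return series along the following lines. This is a sketch under stated assumptions: monthly excess returns are compounded geometrically and volatility is annualized by the square root of 12. The paper does not spell out its exact convention, and the function name is illustrative.

```python
from math import sqrt
from statistics import stdev

def annualized_stats(monthly_excess):
    """Return (annualized compounded excess return, annualized volatility,
    Sharpe ratio) from a list of monthly excess returns."""
    years = len(monthly_excess) / 12
    growth = 1.0
    for r in monthly_excess:
        growth *= 1 + r                      # compound month by month
    ann_return = growth ** (1 / years) - 1
    ann_vol = stdev(monthly_excess) * sqrt(12)
    return ann_return, ann_vol, ann_return / ann_vol
```

As a check on the reported numbers, 7.87% divided by 10.37% is about 0.76, and 4.08% divided by 15.2% is about 0.27.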
6 The 3-month US Treasury yield was used as the risk-free rate; returns are stated prior to fees and taxes.
Additionally, we regressed actual S&P returns on our out-of-sample estimated returns from
the regression, which produced the following statistical results:
Regression Statistics
Multiple R            25.74%
R Square               6.63%
Adjusted R Square      6.23%
Standard Error         4.25%
Observations             240

                     Coefficients   Standard Error   t Stat   P-value
Intercept                   0.004            0.003    1.295   0.197
Forecasted Return           0.679            0.165    4.109   0.000055
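A regression of actual returns on forecasted returns, as summarized in the table above, can be sketched with a plain one-regressor OLS. The function name and return layout are illustrative assumptions; the formulas are the textbook ones for slope, slope standard error and R-square.

```python
from math import sqrt
from statistics import mean

def forecast_regression(forecast, actual):
    """OLS of actual on forecast; returns (slope, slope t-stat, R-square)."""
    n = len(actual)
    mf, ma = mean(forecast), mean(actual)
    sxx = sum((f - mf) ** 2 for f in forecast)
    sxy = sum((f - mf) * (a - ma) for f, a in zip(forecast, actual))
    syy = sum((a - ma) ** 2 for a in actual)
    slope = sxy / sxx
    intercept = ma - slope * mf
    # Sum of squared residuals around the fitted line.
    sse = sum((a - intercept - slope * f) ** 2
              for f, a in zip(forecast, actual))
    r2 = 1 - sse / syy
    se_slope = sqrt(sse / (n - 2) / sxx)
    return slope, slope / se_slope, r2
```

A slope below 1, such as the 0.679 reported here, means that realized returns moved by less than one-for-one with the forecast.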
Analysis of Results
This model appears to perform its duty as a macro-timing indicator, as the majority of
excess returns are generated by largely avoiding the market during the two major
downturns over the past 2 decades (Exhibit 1). Between March 2000 and September 2002,
the S&P (volatility scaled) shed 33.93% of its value, while the drawdown based on our
implementation of the indicator was only 13.30%. However, the model failed to detect the
turnaround in the market in 2003. This may have been the result of investors capturing low
valuations after overshooting the reversion to the mean post tech bubble. We found the PMI
and NFP data still painting the picture of a slowing economy during this period. A
valuation indicator may have been helpful in this instance; we attempted to incorporate a
valuation signal via the Fed Model and the equity risk premium, but found that it had little
explanatory power over the entirety of the sample.
Similarly, the market lost 39.23% of its value from October 2007 through February 2009,
while the trading strategy managed a loss of only 11.03%, largely by exiting the market in
February 2008 and returning in July of 2009.
The biggest drawdown from the model is attributed to the Asian Financial Crisis in August
1998. The fact that the model failed to react to this crisis is intuitive, since the primary
focus is on US macro indicators, as opposed to measurements of stress in financial markets.
Below are additional strategy measurements related to the returns, with the benchmark
having been volatility scaled to the model.
                 Model      Benchmark
Hit Rate         81.7%      62.9%
Win/Loss         0.58       0.87
Max Drawdown     -15.57%    -39.23%
Skew             (0.47)     (0.70)
Kurtosis         3.59       1.11
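Two of the measurements above can be sketched as follows. The paper does not describe how the benchmark was volatility scaled, so this assumes benchmark returns are multiplied by the ratio of model to benchmark standard deviation; the function names are hypothetical.

```python
from statistics import stdev

def max_drawdown(returns):
    """Largest peak-to-trough loss of the compounded value path,
    reported as a negative number (0.0 if the path never falls)."""
    value = peak = 1.0
    worst = 0.0
    for r in returns:
        value *= 1 + r
        peak = max(peak, value)
        worst = min(worst, value / peak - 1)
    return worst

def scale_to_vol(benchmark, model):
    """Assumed scaling: rescale benchmark returns so that their standard
    deviation matches that of the model returns."""
    k = stdev(model) / stdev(benchmark)
    return [k * r for r in benchmark]
```

Scaling the benchmark first keeps the drawdown comparison from being driven purely by the benchmark's higher volatility.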
There are several interesting factors about the underlying regression model producing the
indicator. The following chart details the highest R-square per period (selected from all the
slope forecasts) of the fitted line regression.
Noticeably, the low explanatory power in 2006 to 2007 produced a rather choppy trading
signal, with a correlation between the forecasted return and actual return of only 1.72%. As
such, the R-square of the underlying regression should be useful for determining the
confidence of the indicator at a particular period in time.
The adaptability of the regression is noticeable. In particular, the beta of the credit spread
changes sign over time (Exhibit 2), from a significant positive beta in the late 90s to a
significant negative beta. This likely highlights the evolving correlations between the
equity and corporate bond markets (Exhibit 3). While the positive beta may indicate a
degree of asset allocation and rotating from bonds to stocks (an increase in the spread of
corporate bonds over treasuries was then followed by an increase in the stock market), the
negative beta may be associated with a comingling of risky assets into a singular risk-on /
risk-off mindset. From a statistical standpoint, the changing sign of the beta likely reduces
confidence in the significance of the variable itself. However, we believe this evolution best
captures the dynamic nature of markets, and this natural flexibility is a welcomed feature
of the model.
Additionally, the beta for the slope of the term structure is negative throughout the entire
period. This can be explained by two potential factors. First, the slope of the term structure
has been known to precede real growth in GDP by over a year, in which case, the stock
market horizon may not be as far out. Thus negative slopes, predicting recession, coincide
with the end of the bull period in stocks, while strong positive slopes coincide with equity
returns amid the recession.
Second, this may provide evidence that the slope of the term structure incorporates the
term structure of volatility as well. Short term bursts in volatility, creating a downward
sloping volatility term structure, will lead to a positive sloping term structure, as the
volatility term reduces real rates. This relationship has evolved over time (Exhibit 4),
potentially as volatility continues to become a more mainstream and studied variable.
In addition to looking at the explanatory power of the model as a whole, we can attempt to
isolate each variable of the model in an effort to tease out its individual contribution and
how the impact changes over time7.
The following table lists the R-Square of the forecasted return vs. actual return models for
each isolated variable.
Variable                   R-square
Slope of Term Structure    1.55%
Baa spread                 3.00%
PMI Change                 0.57%
NFP change                 0.51%
Total                      5.63%
Combined Forecast Model    6.63%
The Baa spread is likely accounting for a large portion of the R-square of the model, as it
performed particularly well under stress from the Asian Currency Crisis, the tech bubble,
and the financial crisis (Exhibit 7). Some concern may be warranted for the adjustments
we’ve made to the slope of the term structure. Given that its value comes largely during the
financial crisis (Exhibit 6), we may be overfitting a particular historical incident. However,
the axis scale may be to blame in this particular instance, as the R-square of 5% during
the tech bubble is a considerable contribution in its own right.
Also of note is the strong performance of the finalized NFP payroll figure during the 2000s,
but the drop in value-add since the financial crisis (Exhibit 9). This may reflect a shifting
perspective in non-farm payroll figures, as first releases are gaining popularity among the
press, despite potentially significant revisions in future months. This bears some evidence
when we place first release non-farm payrolls into the model (Exhibit 5), as the R-square of
this regression has picked up post-crisis. It could also be due to investors placing more
importance on Federal Reserve monetary policy in the aftermath of the financial crisis.
Through multiple rounds of quantitative easing, there was an increased correlation
between equity returns and Treasury markets: a poor NFP reading would lead to a decrease
in yields as investors assumed the Fed would continue loose policy. Concurrently, equity
markets performed well as investors reached for yield in equity risk.
7 We acknowledge that by isolating the variables, we are ignoring the interaction effects between multiple variables in a regression, and thus the explanatory power of the individual variable may expand beyond its singular relationship with the regressand.
III.
Exporting the model – International Evidence
We next present the results of our methodology on several international indices,
incorporating U.S. macro and market variables as well as indicators relevant to the
particular market.
Canada and the S&P TSX
We find that the US model, as-is, performs quite well when simply swapping in Canadian
returns. This may not be surprising given the 78.2% correlation between the monthly
returns of the two markets between 1994 and 2013. In an effort to add a country-specific
variable to the regression, we modify our term structure variable to reflect the difference
between the slope of the Canadian term structure and the slope of the U.S. term structure.
The Canadian term structure contains information relevant to Canada-specific growth
[Harvey (1997)].
Specifically, we take long term and short term Canadian interest rates from the OECD. We
take an average of the term structure over 3 months and convert that data point into a
t-statistic based upon a 36 month sample. We then cube this t-statistic and subtract the cube
of the US t-statistic to arrive at our final regressor.
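The regression formula is summarized as (the change from the US model is the bracketed first regressor):
$$ r^{TSX}_{t+1} = \alpha + \beta_1 \left[ T_{36}(\text{Slope}^{CA}_t)^3 - T_{36}(\text{Slope}^{US}_t)^3 \right] + \beta_2 \, T_{36}(\Delta \text{Baa}_t) + \beta_3 \, T_{36}(\Delta \text{PMI}_t) + \beta_4 \, T_{60}(\text{NFP}_t) + \varepsilon_{t+1} $$
where $r^{TSX}_{t+1}$ is the forward month log excess return of the S&P TSX and $T_n(\cdot)$ denotes the $n$-month sample T-statistic of the transformed indicator.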
The model managed to return an annualized excess return of 8.11%, with annualized risk of
10.06% for a Sharpe Ratio of 0.81 (Exhibit 14, 15). This compares favorably to the Sharpe ratio
of holding the TSX over the period, which was only 0.19 (2.89% return with 15.3% risk).
Again, we find the model deriving its performance by avoiding recessions, as the maximum
drawdown during the period was 18.76% as opposed to the 32.02% drawdown observed in the
volatility-adjusted TSX. The R-square of the actual and forecasted returns was equal to
8.15% (Exhibit 16).
Interestingly, the beta for the slope of the term structure is consistently positive in this
model, rather than consistently negative. One possible explanation for this switch is the
fact that we’ve used a relative metric as opposed to an absolute figure. We may have
effectively reduced the growth and volatility expected in the US term structure, while
retaining the Canadian growth component. As such, another candidate variable could be a
normalized orthogonal measure of Canadian term structure on US term structure, in an
effort to isolate idiosyncratic Canadian deviations8.
8 This measurement is a bit beyond the scope of this paper, as a proper ex-ante value would require a residual from a rolling regression, rather than a measure from a fit with all the available data.
Germany and the DAX
Due to a lack of historical data depth and two major changes to Germany over the past 30
years (the fall of the Berlin Wall and the creation of the Euro), we elected to shorten the
timeframe on the forecast. We tested the model from January 2004 to December 2013.
The as-is model directly exported from the US performed rather poorly, with an R-square of
only 1.65%. This is somewhat puzzling given the 84.27% correlation between the S&P and
the DAX over the time period. Regardless, we opted to recreate the model from the ground
up, identifying German specific macro variables as inputs into the model.
As Germany is largely regarded as an economy heavily reliant on manufacturing, we
selected the IFO business climate survey as an indicator likely to reflect the manufacturing
output and sentiment. We constructed a t-statistic based upon a 10 year sample of the
change in the 2 month average of the IFO survey.
We also construct a relative term structure variable similar to the Canadian methodology,
where the T-stat of the German term structure is reduced by the T-stat of the US term
structure.
Additionally, we expanded the time frame of the regression window to 10 years (from 7 years
in the original model), as we found that a larger window increased the accuracy of the
predictions. Again, this may reflect the nature of the German economy, particularly if its
heavily unionized labor system was able to reduce the volatility (or dynamic nature) of the
economy.
The regression formula is summarized as (change from US Model highlighted in bold):
∆
Overall, the trading strategy was able to earn an annualized excess return of 5.72% with
risk of 13.85%, for a Sharpe Ratio of 0.41 (Exhibit 17, 18). The underlying index earned an
annualized excess return of 3.68% with risk of 18.2%, for a Sharpe Ratio of 0.20. While
these results are less impressive than the previous two models, it does represent an
improvement over the classical buy-and-hold strategy.
It should also be noted that German markets contain fatter tails than the US (although
roughly in line with Canada). The model was able to predict the direction of 6 of the 7
tail events (which we define as monthly returns beyond +/- 10%), with August 2011 being the lone
exception (a loss of 19.19%). Given that this event can be linked to the greater European
Sovereign Crisis, a macro variable capturing overall Euro health would likely be a relevant
indicator worth exploring.
Finally, the R-square of the predicted returns with actual returns was 13.18% (Exhibit 19).
Given the smaller sample (120 periods vs 240), this higher explanatory power is required
for ample significance.
Japan and the NIKKEI 225
Japan presents an interesting test case that points out the challenges and difficulties with
this model. To begin with, we tested whether any of the four factors used in the U.S. model were
predictive of Nikkei returns, for which we had data starting in February 1990.
Overall, the model only produced an R-square of 1.29%. When each factor was regressed
individually, they all had negative explanatory power. However, testing just the non-farm
payrolls and Consumer Sentiment, we found an R-square of 2.52%. While still well below our
threshold of 5%, it was interesting to see which variables were actually somewhat correlated.
To continue the test we then looked at what variables might be predictive for Japan. Our
initial reaction was to test the term structure and calculate the spread between long and
short-term interest rates. When doing this we had data available from April 2002 and
tested the model from 2004 forward in order to give us 10 years of data. On its own, the
R-squared was negligible. However, when regressed with the U.S. VIX, non-farm payrolls,
and Consumer Sentiment data, we achieved a model R-squared of 4.12% (Exhibit 22),
still short, but significantly closer to hitting our benchmark. What we found to be very
counterintuitive was that having a faster signal for Japan actually produced a much better
result than a slower signal did in the U.S., Canada, and Germany. Noticeably, the Betas
for the variables swung from positive to negative more than once over the sample, further
compounding the issue with the model.
The T-Statistic for the actual vs. forecasted return regression did reach 2.25, but again, the
threshold should likely far exceed 2.0 given the number of times we ran the model to
achieve this. As such, we would not implement this model in its current state, despite the
profitability of the strategy. The return produced was only about 3.28% with a volatility of
11.19% (Exhibit 20, 21), compared to the underlying index, which returned 1.33% with a
volatility of nearly 20%.
Japan aptly points out the difficulty in discerning which variables can and will be
meaningful, especially given the need to couple them with other variables. Additionally, given
the limited availability of data, running the model over such a short time span is problematic
and further supports our concern that we may not be selecting the correct data points
entirely objectively.
IV.
Improving the Process
Reducing model bias
A consistent concern for any regression approach is that of model (or modeler) bias. In this
context, the question lingers whether we have chosen the correct independent variables,
such that future forecasts will remain as predictive as those in the past. One area that we
believe this paper can be expanded upon is the combination of multiple macro-timing
models in an effort to reduce model bias.
We gather this insight from the work done in the field of Machine Learning. For example,
the team that won the million dollar prize for improving upon Netflix’s recommendation
algorithm blended over 800 models to arrive at their solution [Feuerverger (2012)]. We
believe that there is opportunity to apply the same principles to forecasting returns.
As a simple example, we can create a faster model (with a shorter regression sample period
of 24 months), and combine those results with our existing model in an effort to tease out
even more signal without compromising the fit of the original regression.
We constructed a faster model based upon the VIX, the first NFP for a given month, and
the University of Michigan consumer sentiment survey. While these variables were
insignificant fits within our previous framework, by applying a shorter sampling period, we
were able to tease out a new signal, with an R-square of 4.94% between the forecasted
return and actual return from 1994 through 2013.
We then created a weighted average of the two model forecasts based upon the R-square of
the underlying regression at observation divided by the standard deviation of the sample
error for each underlying model through time (observed up to the evaluation date). The
effect of this weighting is to penalize the faster model to a certain degree, given that its
smaller sample period created large fluctuations in the fit throughout time. As such, we
trust its forecast less than the strength of the underlying R-squared.
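The blending rule can be sketched as below. It assumes "the standard deviation of the sample error" means the standard deviation of each model's forecast errors observed up to the evaluation date, and that the resulting scores are normalized to sum to one; the function name and argument layout are hypothetical.

```python
from statistics import stdev

def blend_forecasts(forecasts, r_squares, past_errors):
    """forecasts[i], r_squares[i]: model i's current forecast and the
    R-square of its underlying regression at this observation.
    past_errors[i]: model i's forecast errors seen so far (assumed input)."""
    # Score each model by fit quality, penalized by how noisy its
    # errors have been through time.
    scores = [r2 / stdev(err) for r2, err in zip(r_squares, past_errors)]
    total = sum(scores)
    return sum(s * f for s, f in zip(scores, forecasts)) / total
```

With equal R-squares, a model whose past errors are twice as volatile receives half the weight of the steadier model.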
When combining the predictions from both models, we found that the new R-square of the
forecast was 7.45%, compared to 6.63% for the original model (with a similar Sharpe ratio).
Additionally, we believe that the combined model offers additional protection against model
bias by incorporating a more diverse signal set.
Future Opportunities
Despite our attempts to reduce model bias, there is still a degree of variable selection that
is performed external to the model and after the fact. We have attempted to minimize this
ex-post selection bias when implementing the term structure of interest rates, and the same
framework of model inclusion / exclusion can be incorporated to other macro variables as
well, similar to the methodology that Bai (2010) proposes.
In an ideal system, a basket of macro indicators is selected at each period. Univariate
regressions of next-month stock returns are run on each indicator, with selected data
transformations also incorporated. Variables that exceed a predetermined threshold (either
based upon an absolute rule or relative to the period) are selected for inclusion in a final
multivariate regression, with adjustments for collinearity if two variables overlap in signal.
This adaptive approach to modeling likely increases the flexibility of the process, while
turning variable selection into an endogenous decision and further reducing modeling bias.
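The screening step of such a system might look like the sketch below. It relies on the fact that the R-square of a univariate regression equals the squared correlation between the indicator and the return. The threshold values, the pairwise-correlation rule used as the collinearity adjustment, and the names pearson and select_indicators are assumptions made for illustration; the final multivariate regression is left out.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def select_indicators(indicators, next_returns,
                      min_r_square=0.02, max_pair_corr=0.8):
    """Screen indicators by univariate R-square, then drop overlapping ones.

    indicators: dict mapping a name to values aligned with next_returns.
    Both thresholds are hypothetical absolute rules.
    """
    scored = {name: pearson(values, next_returns) ** 2
              for name, values in indicators.items()}
    ranked = sorted((n for n in scored if scored[n] >= min_r_square),
                    key=scored.get, reverse=True)
    selected = []
    for name in ranked:
        # Keep an indicator only if it does not largely duplicate a stronger one.
        if all(abs(pearson(indicators[name], indicators[kept])) < max_pair_corr
               for kept in selected):
            selected.append(name)
    return selected, scored


if __name__ == "__main__":
    # Toy data in percent, for illustration only.
    returns = [1.0, -2.0, 3.0, 0.0, -1.0, 2.0, 1.0, -3.0]
    indicators = {
        "a": [2.0 * r for r in returns],
        "b": [1.1, -1.9, 3.0, 0.1, -1.0, 2.1, 0.9, -3.1],
        "c": [1.0, 1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0],
    }
    print(select_indicators(indicators, returns))
```

The surviving indicators would then feed the final multivariate regression for that period.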
Additionally, models can be built for various developed and emerging markets. While there
may be issues with data availability and reliability in EMs, we are approaching a long
enough timeframe to begin cracking the shell open with regression modeling. With a
collection of market forecasts as output from the models, one could construct a monthly
tactical portfolio using Black-Litterman to optimize holdings.
Conclusion
Regression modeling, while presenting a potential black box solution to the problem of
market timing, has proven to be both flexible and robust relative to a static trading rule. In
this paper, we have shown the ability to produce a methodology that provides a fairly
realistic back test, the flexibility to deal with evolving market dynamics, and a proxy for the
confidence of the indicator. The downsides of modeling bias and data mining can
(hopefully) be minimized with the incorporation of multiple models as well as a sensible
narrative around the input decisions.
There is also great opportunity with each passing reporting period, as the historical
dataset, with regard to vintages, encompasses greater periods of analysis. Today, there is a
deluge of data released every month, and we expect the volume to continue to grow. Our
goal should be to harvest the entire picture rather than selectively focus on a few variables,
and regression modeling is one potential tool when placed in the proper hands.
Appendix A – Modeling Inputs
Despite the extensive use of out-of-sample forecasting, there are still several inputs to the
model that are determined through modeler discretion (and are therefore subject to
potential data mining). In this section, we highlight each variable and our rationale for
selecting it.
1) Variable transformation: Variables were transformed, typically by taking moving
averages and then converting these figures into t-statistics. We tested 1, 2, 3, 4, 6 and
12 month averages, as these represent typical partitions within a year. We chose 36
months of sampling for calculating the Baa spread, the slope of the term structure
and the change in the PMI, as this sampling period captures changes over a
short-term business cycle, while also highlighting deviations within a smaller time frame.
With regard to non-farm payrolls, we found that a 5 year sample provided the
strongest signal when placed amongst the other variables. We also tested sample
sizes of 2 years, but found the 3 year to typically be superior.
2) Regression window: We created our rolling regression line using the latest 84
months (7 years) of data, as this coincided with both a longer business cycle and
the findings related to Shiller’s CAPE ratios. We also tested sampling periods
of 3, 5 and 10 years.
3) Model selection: When multiple models were created based upon possible
combinations of the slope of the US term structure, certain models were penalized
for producing particularly poor forecasts in a prior period. A model was excluded
from the sample if it produced an underlying R-square that was less than half the
average of the group. The goal of this exclusion rule was merely to restrict outlier
models, as the average number of models included per period was 12.91 out of a possible 13.
4) Short rate slope inclusion: Our model was originally designed to create forecasts for
all possible slope combinations, from [6 month] – [3 month] to [10 year] – [3 month]
to [10 year] – [7 year]. We opted to reduce the model to only those slopes with short
rates of 3 and 6 months, as slopes built from further portions of the yield curve produced
markedly poor forecasts and provided little in the way of valuable signal to the model.
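Two of the inputs above, the variable transformation in (1) and the exclusion rule in (3), can be sketched in a few lines. This is an illustration under assumptions rather than the exact procedure: the t-statistic is taken here to be the latest moving average minus its mean over the trailing sample, divided by the standard deviation over that sample, and the exclusion rule drops any model whose R-square falls below half the group average. The function names and example values are hypothetical.

```python
from statistics import mean, stdev


def moving_average(series, k):
    """k-month moving averages; the first value uses months 0 .. k - 1."""
    return [mean(series[i - k + 1:i + 1]) for i in range(k - 1, len(series))]


def t_stat_transform(series, avg_months, sample_months):
    """Standardize the latest moving average over a trailing sample window.

    Assumed formula: (latest average - sample mean) / sample std. dev.,
    so (avg_months, sample_months) = (3, 36) is a 3 month average scored
    against the last 36 months of such averages.
    """
    averages = moving_average(series, avg_months)
    if len(averages) < sample_months:
        raise ValueError("not enough history for the requested sample")
    window = averages[-sample_months:]
    return (window[-1] - mean(window)) / stdev(window)


def keep_models(r_squares):
    """Drop models whose R-square is below half the group average."""
    cutoff = 0.5 * mean(r_squares.values())
    return {name: r2 for name, r2 in r_squares.items() if r2 >= cutoff}


if __name__ == "__main__":
    # Illustrative inputs only.
    series = [float(i) for i in range(1, 41)]
    print("t-stat:", t_stat_transform(series, avg_months=3, sample_months=36))
    print(keep_models({"10y-3m": 0.06, "5y-3m": 0.05, "1y-6m": 0.01}))
```

With the sample R-squares shown, only the weakest slope model is excluded, consistent with a rule intended to restrict outliers rather than to prune aggressively.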
Appendix B – Other variables tested
We tested several available measurements that have been utilized in other leading
indicators or are relevant monthly data releases. These include:
S&P 500
• University of Michigan Consumer Sentiment Survey
• AAA yield spread to 10 year treasury
• Chicago Fed National Financial Conditions Leverage Subindex
• OECD Business Tendency and Confidence surveys
• New Orders of Durable Goods
• Housing Starts
• Equity Risk Premium
TSX
• Brent & WTI crude prices and monthly volatility of daily returns
• Year-over-Year changes in CPI
• Month-over-Month changes in retail sales
• Month-over-Month changes in building permits
• Changes in unemployment (note: This variable proved significant, but due to the
  long possibility of revisions, it was removed for fairness reasons)
DAX
• ZEW Current Survey
• ZEW Expectations Survey
• Changes in unemployment
• Changes in Industrial Production Month-over-Month
• Changes in Factory Orders Month-over-Month
• IFO Expectations Survey
Exhibit 1: Comparison of Drawdowns between the trading strategy and the S&P
Exhibit 2: Change in Baa Spread Beta: [3 Year] – [3 month] model
Exhibit 3: Correlation between S&P returns and the change in the Baa spread
Exhibit 4: Correlation between VIX and [5 Year] – [3 Month] term structure
Exhibit 5: The increasing importance of first-release NFP figures
Model based upon a 2 year sample T-stat using first release NFP figures
Exhibit 6: Maximum R-square of Isolated Slope of the Term Structure regression
Exhibit 7: R-square of Isolated Baa Spread regression
Exhibit 8: R-square of Isolated PMI change regression
Exhibit 9: R-square of Isolated NFP change regression
Exhibit 10: [3 Yr] – [3 Mo] transformation Correlations
Avg--Tsample   Average   Average T-Stat   Change in Avg   Change T-Stat
1              -5.01%                     -5.86%
2              -4.03%                     -9.60%
3              -3.70%                     -10.18%
4              -4.42%                     -3.94%
6              -7.55%                     -0.27%
12             -8.95%                     6.86%
12--24                   17.17%                           -7.01%
12--36                   1.10%                            -2.18%
1--24                    -1.97%                           -4.00%
1--36                    4.50%                            -3.14%
2--24                    3.92%                            -4.71%
2--36                    9.21%                            -6.35%
3--24                    8.07%                            -10.78%
3--36                    10.85%                           -8.26%
4--24                    8.59%                            -4.49%
4--36                    13.56%                           -5.86%
6--24                    12.88%                           3.65%
6--36                    16.09%                           4.45%
Exhibit 11: Baa Spread transformation Correlations
Avg--Tsample   Average   Average T-Stat   Change in Avg   Change T-Stat
1              3.22%                      2.01%
2              3.95%                      6.99%
3              4.19%                      15.02%
4              4.54%                      14.95%
6              5.55%                      7.66%
12             8.09%                      8.47%
12--24                   4.73%                            6.87%
12--36                   5.02%                            8.57%
1--24                    2.82%                            6.68%
1--36                    7.20%                            5.93%
2--24                    4.44%                            9.50%
2--36                    7.98%                            10.65%
3--24                    5.38%                            15.31%
3--36                    7.82%                            17.32%
4--24                    4.85%                            15.39%
4--36                    6.95%                            17.18%
6--24                    4.55%                            10.80%
6--36                    5.83%                            11.72%
Exhibit 12: PMI transformation Correlations
Avg--Tsample   Average   Average T-Stat   Change in Avg   Change T-Stat
1              -3.56%                     6.91%
2              -5.17%                     2.43%
3              -5.27%                     3.97%
4              -4.99%                     -1.99%
6              -4.72%                     -7.89%
12             -3.91%                     -14.87%
12--24                   -0.33%                           -13.42%
12--36                   0.86%                            -13.27%
1--24                    -2.98%                           6.64%
1--36                    1.43%                            7.54%
2--24                    -4.81%                           3.32%
2--36                    0.35%                            4.21%
3--24                    -4.91%                           5.07%
3--36                    -0.29%                           4.51%
4--24                    -5.45%                           0.31%
4--36                    -1.20%                           0.09%
6--24                    -6.56%                           -4.57%
6--36                    -2.53%                           -5.58%
Exhibit 13: Final Release NFP transformation Correlations
Avg--Tsample   Average   Average T-Stat   Change in Avg   Change T-Stat
1              -2.77%                     -6.48%
2              1.27%                      -3.77%
3              -0.23%                     -9.98%
4              -1.10%                     -11.90%
6              0.83%                      -10.38%
12             5.55%                      -0.36%
12--24                   12.61%                           -3.21%
12--36                   12.74%                           -2.14%
1--24                    1.42%                            -6.93%
1--36                    3.36%                            -5.66%
2--24                    4.94%                            -0.75%
2--36                    7.16%                            -1.88%
3--24                    1.90%                            -10.48%
3--36                    4.95%                            -10.92%
4--24                    3.41%                            -14.58%
4--36                    5.36%                            -14.17%
6--24                    7.81%                            -10.35%
6--36                    8.79%                            -10.15%
12--60                   8.08%                            1.12%
1--60                    3.36%                            -5.78%
2--60                    7.88%                            -2.13%
3--60                    3.37%                            -10.33%
4--60                    1.21%                            -14.05%
6--60                    2.47%                            -9.64%
Exhibit 14: Canadian Portfolio Return
Exhibit 15: Canadian strategy portfolio statistics
               Model     Benchmark
Hit Rate       79.6%     60.8%
Win/Loss       0.69      0.91
Max Drawdown   -18.76%   -32.02%
Skew           0.10      (0.97)
Kurtosis       2.04      2.85
Exhibit 16: Canadian Model statistical detail
Regression Statistics
Multiple R          28.56%
R Square            8.15%
Adjusted R Square   7.77%
Standard Error      4.24%
Observations        240

                    Coefficients   Standard Error   t Stat   P-value
Intercept           0.001          0.003            0.531    0.596
Forecasted Return   0.006          0.001            4.597    0.000007
Exhibit 17: German Model Portfolio Return
Exhibit 18: German Model strategy portfolio statistics
               Model     Benchmark
Hit Rate       75.0%     62.5%
Win/Loss       0.80      0.93
Max Drawdown   -29.07%   -42.58%
Skew           (0.51)    (0.81)
Kurtosis       6.52      2.45
Exhibit 19: German Model statistical detail
Regression Statistics
Multiple R          36.31%
R Square            13.18%
Adjusted R Square   12.45%
Standard Error      4.90%
Observations        120

                    Coefficients   Standard Error   t Stat   P-value
Intercept           0.006          0.004            1.447    0.150
Forecasted Return   0.805          0.190            4.233    0.000046
Exhibit 20: Japanese Model Portfolio Return
Exhibit 21: Japanese Model strategy portfolio statistics
               Model     Benchmark
Hit Rate       79.2%     55.8%
Win/Loss       0.61      1.01
Max Drawdown   -15.40%   -37.40%
Skew           0.95      (0.69)
Kurtosis       2.78      1.92
Exhibit 22: Japanese Model statistical detail
Regression Statistics
Multiple R          20.30%
R Square            4.12%
Adjusted R Square   3.31%
Standard Error      5.69%
Observations        120

                    Coefficients   Standard Error   t Stat   P-value
Intercept           0.006          0.005            1.058    0.292
Forecasted Return   0.301          0.134            2.252    0.026
References
Bai, Jennie. Equity premium predictions with adaptive macro indexes. No. 475. Staff
Report, Federal Reserve Bank of New York, 2010.
Beber, Alessandro, Michael W. Brandt, and Maurizio Luisi. Distilling the macroeconomic
news flow. No. w19650. National Bureau of Economic Research, 2013.
Fama, Eugene F., and Kenneth R. French. "Common risk factors in the returns on stocks
and bonds." Journal of financial economics 33.1 (1993): 3-56.
Feuerverger, Andrey, Yu He, and Shashi Khatri. "Statistical significance of the Netflix
challenge." Statistical Science 27.2 (2012): 202-231.
Ghysels, Eric, Casidhe Horan, and Emanuel Moench. Forecasting through the rear-view
mirror: data revisions and bond return predictability. No. 581. Federal Reserve
Bank of New York, 2012.
Harvey, Campbell R., and Yan Liu. "Backtesting." Available at SSRN (2013).
Harvey, Campbell R. "The relation between the term structure of interest rates and
Canadian economic growth." Canadian Journal of Economics 30.1 (1997): 169-93.
Harvey, Campbell R. "The term structure and world economic growth." The Journal of
Fixed Income 1.1 (1991): 7-19.
Koenig, Evan F., Sheila Dolmas, and Jeremy Piger. "The use and abuse of real-time data in
economic forecasting." Review of Economics and Statistics 85.3 (2003): 618-628.
Varian, Hal R. "Big Data: New Tricks for Econometrics." (2013).