A Linked Linear Regression - Data Envelopment Analysis Model for

Transcription

A Linked Linear Regression - Data Envelopment Analysis Model for
Case Study:
A Linked Linear Regression Data Envelopment Analysis Model
for the Long-Term Prediction of
Mutual Fund Performance
Cindy Chao, Greg Higgins
Michael McGee
Department of Computer Science and Engineering
SOUTHERN METHODIST UNIVERSITY
Dallas, Texas
May 14, 1994
LINKED LINEAR RECRESSION - DEA MODEL
Contents
Abstract.............................................................................................................................3
Introduction......................................................................................................................3
Whatis a Mutual Fund? .................................................................................................4
DEAFactors......................................................................................................................4
Model Development and Validation.............................................................................6
Examinationof Resiilts ...................................................................................................9
InvestmentPerformance............................................................................................... 10
Conclusion...................................................................................................................... 11
References....................................................................................................................... 12
ProfessionalReferences................................................................................................. 13
Appendix........................................................................................................................ 14
DEAFormulation .............................................................................................. 15
DEACode ........................................................................................................... 16
ForcastingCode ................................................................................................. 19
2
Case Study:
A Linked Linear Regression - Data Envelopment Analysis Model
for the Long-Term Prediction of Mutual Fund Performance
Michael McGee
Greg Higgins
Cindy Chao
Department of Computer Science and Engineering
Southern Methodist University
Dallas, Texas 75275
Data Envelopment Analysis (DEA) can be used to evaluate
managerial efficiency of mutual funds. By extending the DEA
methodology with linear regression, a new prediction tool for
future efficiency is developed. The linked model was tested
on a sample of fifty growth mutual funds over a three year
time horizon - resulting in accurate predictions of future
efficiency with 99% confidence.
he current mutual fund market
current assessment of a fund's
T contains nearly $1.4 trillion
management. DEA is a linear
dollars; this rapid growth of mutual
programming methodology that
funds as a long—term investment
computes relative efficiency scores
instrument has led to an increased
among a group of homogeneous
demand for evaluation and forecasting decision—making units (DMUs).
techniques. The performance of a Combined with linear regression, DEA
mutual fund is directly linked to the
becomes a prediction tool for longperformance of the management staff.
term forecasting. The linked model not
Data Envelopment Analysis (DEA) can
only evaluates return on investment
be used to evaluate managerial
but also the quality of the fund's
efficiency; however, this is only, a
managerial team on a long—term basis.
3
LINKED LINEAR REGRESSION - DEA MODEL
What is a Mutual Fund?
goal of the linked model is forecasting
of long-term performance, growth
mutual funds were selected because
their objectives emphasize the
accumulation of capital gains, and they
have the greatest long-term growth
potential.
A mutual fund is an investment
company that pools the money of
many institutional and individual
investors into one professionally
managed investment account. The
fund manager creates a portfolio for
the shareholders by buying a variety of
"true" financial instruments, such as
corporate stocks and bonds, and
government and municipal securities,
in accordance with the fund's
particular investment objectives. Fund
managers are professional investors
who know the financial market and
follow it daily with extensive research
capabilities to effectively investigate all
investment opportunities. The fund is
always monitored, with individual
securities constantly being bought,
sold, and traded in order to optimize
the investment objectives. The beauty
of investing in sometimes twenty
different securities lies in the protection
of the investment through portfolio
diversification.
Mutual funds are divided into six
major categories according to their
investment objectives, which are
outlined in the fund's prospectus, such
as goals, level of risk, and the type and
frequency of return desired. Since the
DEA Factors
The inputs and outputs of the
envelopment analysis were made up of
the main characteristics of a mutual
fund. They were selected because they
are important factors to investors and
are directly linked to the performance
of the fund, manager. The inputs
selected were beta - a measure of risk,
expense ratio, and loads & other fees.
The outputs used were net asset value
(NAV) per share, dividends, and
capital gains.
Inputs
Beta is a measure of the fund
portfolio's average volatility,or risk,
relative to the S&P 500. The beta value
of the market is a constant equal to
1.00. A fund with a beta value greater
than 1.00 is more volatile than the
market. It is expected to outperform
the "average fund" in a bull (rising)
market and bring in greater returns,
while it is expected to fall more than
4
LINKED LINEAR REGRESSION
the average fund in times of a bear
(declining) market. A fund with a beta
value less than 1.00 Possesses the
characteristics of the converse
si tuation. For example, a portfolio with
a beta value of 0.75 is expected to be
25% less responsive to the general
economic trend than the average fund
and return 25% more conservative than
the average fund.
DEA MODEL
brokerage costs, and transaction fees.
This value is a Percentage of the
Purchase price taken off either before
the money is invested (front-end load)
or after investment when the
shareholder redeems the shares (backend load). The amount of the load is
established by the managerial company,
but is paid to the sales and transaction
team. Some funds have a load of 0.00%,
Expense ratio is a measure which
meaning that they have the same
relates expenses to the average net
benefits as load funds, except they are
assets for a period. It is calculated by
sold directly by the fund company
taking the annual expenses paid by the
rather than through a sales broker so
fund (including management and
tha t no loads or other fees are incurred.
i nvestment advisory fees, custodial
With no-load funds, the investor must
and transfer agency charges, legal fees,
seek out and contact the investment
and dis tribution costs) divided by the
firm directly. There is a distinct
average shares outstanding for the
a dvantage to investing in no-load
period. The greater the expense ratio,
funds. For example, if we had invested
the larger the percentage of the
1000 in a particular portfolio with a
shareholder's investment gets paid out
7.50% front-end load, $75 would be
directly to the management company. i :aken off the
top, and we would be left
The expense ratio does not include
vith only $925 worth of shares in the
loads or commissions paid for sales or f
und. Had this been a no-load fund, the
tra nsaction fees.
ftill $1000 would have been utilized.
Loads and other fees represent all
C )utputs
other charges made to the shareholder
Net asset value per share represents
which are not a ccounted for in the h DW much each share is worth. It is
expense ratio. A load is a portion of the c
Ll Culated by taking the total market
offering price of the fund that pays for
v due of a fund's shares (securities plus
costs such as sales commissions,
sh plus accrued earnings minus
5
LINKED LINEAR REGRESSION -
DEA MODEL
liabilities) divided by the number of
in different ways. Shareholders can
shares outstanding. No profits or losses reinvest dividends and capital
are realized through NAV until the
distributions if they elect to do so.
fund is sold. Interest lies in the amount
of increase or decrease in NAV from
Model Development and Validation
time of purchase to time of sale.
The d ecision—making units (DMUs)
Dividends are payments declared
in this analysis were the 50 growth
by a corporation's board of directors
mutual funds. A prerequisite of DEA is
and made to shareholders, usually on a homogeneous DMUs. By selecting the
quarterly basis. A fund's dividends
funds from the same sector (growth
come directly from the dividends paid
funds), this requirement is met.
out by the individual securities within
The DEA model uses the dual
the portfolio. They are paid out equally formulation of the linear program as
among the shareholders on a
described by Charnes, Cooper and
scheduled time frame.
Rhodes (1978). The objective function
Capital gains on a mutual fund are
of this form is to minimize the
the profits made from the sale of a
objective value of the current DM10. If
capital asset (such as a security, stock,
a DM10 is efficient the optimal value is
or bond) within the portfolio. The
1, while an inefficient DM10 is driven
number used for capital gains is
towards zero. The first step in the
adjusted for losses and stock splits. The
development of the linked model is the
sale of an asset occurs when the
generation of DEA scores for the
manager decides that a security needs
mutual funds being studied. The data
to be sold for the benefit of the
collected for the mutual funds is used
portfolio. Capital gains are distributed
as the inputs and outputs for the DEA
to shareholders when they are realized
code which is documented in the
(when a security is sold and a profit is
appendix. The DEA code is written
made) and are paid out as capital
using the SAS/OR package allowing
distributions.
:he DEA scores to be fed directly into
Dividends and capital gains are t he regression phase of the model.
treated as separate outputs because
DEA scores are generated for the
they are incurred, paid out, and taxed
fifty mutual funds in the study using
LINKED LINEAR REGRESSION -
DEA MODEL
IflI &A A*Iy.I.
•_.
the 1990 data collected from
Morningstar Mutual Funds. At this point
DEA scores are also generated for 1993
and stored for use in validating the
model's forecasts. The 1990 DEA scores
are then regressed against all of the six
DEA factors previously described.
This regression model is unique
from previous research in that the goal
of the regression is not to create an
efficiency score, but to forecast an
efficiency score for the future of a
particular DIv[LT. Previous work by
Thanassoulis (1993) compared the DEA
methodology with linear regression as a
means of creating efficiency scores. Our
model links the two methods into one
prediction tool. The full factor model
that resulted appeared on the surface to
be a good predictor of efficiency (R2 =
0.8239). The full factor model included
three non-significant factors at the 5%
level of significance. Beta (p-value =
0.68), load (p-value = 0.65), and
dividend (p-value = 0.50) were all
eliminated from the model leaving only
expense ratio, net asset value, and
capital gains as significant factors. The
revised model remained significant
with an R2 = 0.8213.
As stated before, this model appears
to be a good predictor; however, linear
I
S..
•.I
I
5.7
5.5
•.4
.,a,.t.. V.,.. - *fl
II
•.5
1.2
P5(5
Figure 1: Residual vs. Predicted DEA Score
regression models must be validated
through diagnostic tests. One
assumption in regression analysis
underpins the entire methodology errors are a randomly distributed
variable following a normal distribution
with mean equal to zero, N(0,a2).
Regression analysis in SAS is based on a
linear relationship between the
dependent and independent variables.
These assumptions can be verified
through the examination of the
residuals, or errors, of the regression
model. A plot of the residuals versus the
predicted values, Figure 1, reveals a
distinct pattern indicating a non-linear
relationship between the independent
and dependent variables. This requires
the examination of univariate statistics
7
LINKED LINEAR REGRESSION — DEA MODEL
for each of the variables in the
regression model. A standard normal
quantile plot for each of the variables
allows the determination of the
distribution of each variable.
From the univariate procedures,
some of the variables demonstrated an
exponential distribution requiring a
log—linear transformation in order to
validate the regression model. Net
asset value, capital gains, dividends
and the actual DEA score follow
exponential distributions while beta,
load, and expense ratio are all normally
distributed. A second data set of the
inputs and outputs was created by
taking the natural log of each of the
exponentially distributed variables.
Once normality was confirmed
through univarjate procedures, a new
full factor linear regression model was
fitted. The newly formed model
resulted in an R 2 = 0.8163 and was
validated through the residual plot
shown in Figure 2. Using the SAS
regression report, the regression
equation was created using the
coefficients of the significant factors:
beta, expense ratio, net asset value, and
dividend.
The next step is the creation of
predicted efficiency scores. The
8
•I
I,:
I
•
,
PfldItt*O V.,..
•
.i —
It LOLA
•.S
P.O
AS
POLO
Figure 2: Residual vs. Predicted DEA Score—after linear traa
common method for forecast model
verification is the prediction of the
past. The linked model is verified by
using the input and output data for
1993 in the regression equation created
from the 1990 data. The predicted
values are then compared to the actual
1993 DEA scores generated earlier by
the DEA code. The forecast errors are
evaluated using a significance test on
the mean difference. The test is
conducted at the 1% level of
significance. H0 : mean difference = 0;
Ha: mean difference # 0. The resulting
test statistic: z = -0.5164 and critical
value, z = 2.57 leads to the conclusion
that the prediction values are not
significantly different from the actual
DEA scores with 99% confidence.
LINKED LINEAR REGRESSION - DEA MODEL
Fund DEA 1990
1.00000
FTRNX
Predicted 1993
1.00000
DEA 1993
0.62320
Error
-0.37680
PRWAX
1.00000
0.78983
0.29415
-0.49568
G1NLX
1.00000
0.33207
0.23691
-0.09516
ACRNX
1.00000
1.00000
1.00000
0.00000
FDETX
0.41944
0.98900
1.00000
0.01100
CFIMX
0.73415
0.98032
1.00000
0.01968
LOMCX
0.32965
0.68927
1.00000
0.31073
Table 1: Efficient MUtUI t-unas
Examination of Results
In analyzing the results of the
sample studied, it is important to
remember that a mutual fund is mainly
used as a long-term investment,
providing stable low-risk growth. A
potential shareholder wants to invest
in a fund that will be efficient in the
future, regardless of its current
efficiency. Therefore we focus our
examination on the set of funds with
efficient scores in either 1990 or 1993.
Table 1 shows the 1990 DEA, Predicted
1993, and actual 1993 DEA scores along
with the errors for each prediction
(Error = 1993 DEA - Predicted 1993).
Of the first four funds listed (all
receiving efficient ratings in 1990), the
future efficiency of three funds was
correctly predicted. Our linked model
predicted future behavior of currently
efficient funds with 75% accuracy.
Although the error for T. Rowe Price
New America Fund, PRWAX, is high,
we are not concerned with the
magnitude of inefficiency, only the
state of its efficiency.
Of the last four funds in the table
(all receiving efficient ratings in 1993),
three funds were predicted accurately.
The fifth and sixth funds, Fidelity
Destiny Portfolios: Destiny II, FDETX,
and Clipper Fund, CFIMX, both
received a predicted score of
approximately 0.98 for 1993. Based on
the previous significance test, these
values are not significantly different
from 1.00. Thus our model predicts the
funds that will be efficient in the future
with 75% accuracy.
LINKED LINEAR REGRESSION - DEA MODEL
Table 2: Investment Simulation Results
Investment Performance
Since investors are concerned with
the long-term growth and appreciation
of their investments, DEA by itself is
not an effective tool to aid in the
development of investment strategies.
However, combined with linear
regression, DEA gives us the ability to
estimate the long-term efficiency of a
fund. To illustrate the effectiveness of
the linked model relative to DEA alone,
we simulated an investment in the
efficient funds in 1990 and the funds
predicted to be efficient by our linked
model. Since mutual funds allow the
purchase of fractional shares, we
invested $1000 in each of the mutual
funds. The results are given in Table 2.
The percent of appreciation we
received in the portfolio of 1990
efficient mutual funds was 46.92%,
while the appreciation was
significantly higher for the portfolio of
funds recommended by the linked
model at 60.41%.
10
LINKED LINEAR REGRESSION - DEA MODEL
Conclusion
The linked model allows us to
broaden the range of applications for
DEA. The linked model enables the
prediction of future efficiency, which is
of great benefit to investors because it
provides more insight into the longterm viability of a mutual fund. The
model also requires little historical data
for each DM0 unlike other forecasting
methods. Another benefit is the relative
compactness of the model; very few
predictors are required to explain a
high degree of variability and achieve
accurate predictions. Data
envelopment analysis by itself is not an
accurate predictor of future efficiency
of mutual funds. Only one fund that
was efficient in 1990 remained in the
efficient set in 1993. Linear regression
and DEA combine to give us an
accurate prediction for the future
behavior of DMUs.
An immediate extension to this
research is the definition of an "ideal
type" mutual fund. An optimization
algorithm will be used to generate
values for the regression factors and
record the combinations which yield
and efficient fund prediction. Once this
benchmark is defined a growth mutual
fund prospectus can simply be visually
inspected to determine managerial
efficiency. The next phase of this
research would be to compare the
accuracy of the linked model with that
of other forecasting methods, such as
time series decomposition. Another
step in the continued development of
this area of research is to improve the
accuracy of the predictions through
further modifications to the regression
model using a weighted regression
model or the inclusion of adaptive
error control.
11
LINKED LINEAR REGRESSION - DEA MODEL References
Barr, R. S., L. M. Seiford, and T. F. Siems (1991), "An Envelopment Analysis
Approach to Measuring the Management Quality of Banks," The Annals of
Operations Research. J.C. Baltzer AG, Science Publishers.
Dilorio, F. C. (1991), SAS Applications Programming: a Gentle Introduction. Boston:
PWS-KENT Publishing Company.
Flatt, L. G. (1994), "FUNDamentals: An Envelopment-Analysis Approach to
Measuring the Managerial Quality of Mutual Funds," technical report.
Dallas, Texas: Southern Methodist University, Department of Computer
Science and Engineering.
Morningstar Mutual Funds User's Guide (1992), Chicago: Morningstar, Inc.
Morningstar Mutual Funds (1993), Chicago: Morningstar, Inc.
Norman, M. and B. Stoker (1991), Data Envelopment Analysis: The Assessment of
Performance. New York: John Wiley & Sons.
SAS Institute Inc. (1989), SAS/OR User's Guide, Version 6, First Edition. Cary, NC:
SAS Institute Inc.
Sexton, T. R., S. Sleeper, and R. E. Taggart, Jr. (1994), "Improving Pupil
Transportation in North Carolina," Interfaces, January-February 1994.
Providence, RI: The Institute of Management Sciences.
Thanassoulis, E. (1993), "A Comparison of Regression Analysis and Data
Envelopment Analysis as Alternative Methods for Performance
Assessments," O.R. Journal of the Operations Research Society. Coventry:
Operational Research Society Ltd.
12
LINKED LINEAR REGRESSION -
DEA MODEL
Professional References
Barr, Richard S., Ph.D. Associate Professor, School of Engineering and Applied
Sciences. Southern Methodist University, Dallas, Texas.
Dula, Jose, Ph.D. Assistant Professor, School of Engineering and Applied
Sciences. Southern Methodist University, Dallas, Texas.
Guerra, Rudy, Ph.D. Assistant Professor, Department of Statistical Science.
Southern Methodist University, Dallas, Texas.
Welch, David. Senior Systems Analyst II, Bradfield Computer Center. Southern
Methodist University, Dallas, Texas.
13
LINKED LINEAR REGRESSION - DEA MODEL
Appendix
DEA Formulation
Mat hematical formuation of the DEA model used in the Linked Model.
DEA Code
The DEA code is developed using Release 6.0 of the SAS System on an IBM
3090 mainframe. The code presented requires only two input files . One file
contains the input and output factor data while the second file contains a
listing of the mutualfrnd ticker symbols. The program is construct ed from
a series of SAS Macros which fully automate the formulation and solution
of the multiple linear programs required in Data Envelopment Analysis.
Forecasting Code
Developed as an extension to the DEA code, the forecasting model uses the
same computing platform and software. The program reads the same data
files as the DEA code as well as the DEA scores generated by the DEA
study, The program generates predicted efficiency scores and all necessary
statistical procedures for validating the model.
14
LINKED LINEAR REGRESSION -
DEA
MODEL
DEA Formulation
A
Inputs
ou
+
tputs
Slacks
=
e
-
10
[X
A.
Mi n E)
A-A:!^O
A7
e^!o
Mat heinaticalformulation of DEA requires that the inputs and out puts for each DMU
are formated in a matrix, A The matrix is multiplied by a vector, A,to construct the linear
programming constraints.
Minimize 0
Subject to:
Inputs Beta: X,DMUl+XDMU2+... +xODMU5O-x51(0i'l):5OiO
Load: xIDMUI+xDMU2+... +xo'DMU50-X5(0i1)!90O
E_Pafio: xiDMUi +x2DMU2+... +x5oDMUso-x51(0i l):50 O
Output$
NAV: X1DMUI+XDMU2+... +X5ODMU5O-X51(00) 011
Cap_Gain: xrDMUr +x2 DMU2+ ... +x5ODMU5Ox5(0rO) 0i • 1
Dividend: xrDMUr
Mere
+)(2
DMU2 + ... + X50 DMUso - X51 (0 0) All Variables 2: 0
a Is current ciecislon-maldng unit (DMU)
Linear program solved by SAS for each DM11.
15
1
LINKED LINEAR REGRESSION —DEA MODEL
DEA Code
Part 1: Initialization
This section reads the data files into
a SAS dataset. Using the SAS Data
Management Language datasets are
created for the right hand side vectors,
as well as the theta vectors. The two
datasets are perfectly matched - each
right hand side has exactly one
corresponding theta vector. There is one
right hand side and one theta vector for
each DMU in the study. These vectors are
merged with the inputs and outputs to
formulate the linear program. The final
step in the initialization is the renaming
of variables to the proper ticker symbols
in order to aid in report interpretation.
MICHAEL MCGEE
SENIOR DESIGN GROUP '94
DEA ANALYZER
FILENAME NDT MENAME CTAA';
FILENAME DAflAF9O DATAA';
OPTIONS UNESIZE.72;
%LET SYMBOLS =
ACRNX LOMCX HRTVX STCSX BEONX GINIX PARNX VCAGX DEVLXAGBBX
FPPTX FDE1X FDESI( FCN1( FBGRX FDVLX SRFCX FRGRX BRW1X SAMYX
FDFFX FAGOX MLSVXALTFX ANEFX GPAFX PRWAX CGRWX OVOPX WTMGX
NYVTX SEOGX BVALX BARAXAFGPX FTRNXAMRGX DNLDX HCRPX NBSSX
PRGM( INNSI( TWHIX DWM( LMITX JHNX SRSPX CFIMX cWBX BOSSX
DATA TICK;
INFILENDT;
ARRAY TICKR(50) $5;
INPUT TICKRr);
r MUTUAL FUND TICKER SYMBOLS
DATA ONE;
INFILE DAT;
ARRAY FUND (50)
INPUT JD_$
FUND()
_TYPE_ $ _RHS.;
r GENERATE THETA COLUMNS 1
DATA NEW-COL;
SET ONE;
ARRAY THETA (SO);
ARRAY FUND(50);
DO I ITO W.
IF N_ = I THEN THETA(I) I;
ELSE
IF_N== 2 &_N_ <--4 THEN THETA(I .FUNO(I);
ELSETHETA(I) =0;
END
DROP I;
P GENERATE RIGHT HAND SIDE VECTORS
DATA RANGE;
ARRAY RH(50);
ARRAY FUND(50);
1
SET ONE;
DO I = ITO SE;
IF _N_ = I THEN RH(I) MISSING;
ELSE
IF _N_ 4 THEN RH(I)= FUND([),
ELSE RH(I) 0;
END;
DROP MISSING I FUNDI.FUND50;
DATA PARAM;
MERGE NEW_COL RANGE;
P RENAMES THE FUNDSTO THE PROPER NASDAQTICKER SYMBOL!
PROC DATASETS;
MODIFY ONE;
RENAME FUNDI =ACRNX
FUND2=LOMCX
FUND3= HRTVX
FUND4 STCSX
FUNDS=BEONX
FUNDO=GINLX
FUNDT=PARNX
FUND8=VCAGA
FUND9=DEVLC
FUNDIO=AGBBX
FUNDII= FPPIX
FUNDI2= FDEIX
16
LINKED LINEAR REGRESSION - DEA MODEL
FUNDI3 FDESX
FUND14 FCNTX
FUNDI5= FBGFO(
FUND16 FDVLX
FUND17 SRFCX
FUNDI8 FRGRX
FUNDI9 BRW1X
FUND20SAMVX
FUND21 FDFFX
FUND22 FAGOX
FUND23=MLSVX
FUND24.ALTFX
FUND26ANEFX
EUND26 GPAFX
FUND27. PRWAX
FUND28=CGRWX
FUND29. QVOPX
FUND30.WTMGX
FUND3I NWTX
FUND SEX3X
FUND33 BVALX
FUND34BARAX
FUND35=AFOPX
FUND36= FTRNX
FUND37AMRGX
FUND38= DNLDX
FUND39 HCRPX
FUND40=NBSSX
FUND41 PRGY(
FUND42= INNDX
FUND43 TWHIX
FUND44=DWIVX
FUND45=LMVTX
FUND46=JHNGX
FUND47 SRSPX
FUND48CFX
FUND49 CLMBX
FUND5O= BOSSX;
Part 2: DEA Macro
This macro accepts one parameter:
the index of the DMIJ to evaluate. The
macro merges the input/output dataset
with the appropriate theta and right
hand side vectors. This dataset now
contains the LP formulation for SAS to
analyze. The procedure generates
detailed printed reports for each of the
NAME DEA
DEC: AUTOMATES THE CREATION OF PROBLEM DATASET.
MERGES THE CURRENT THETAAND RHS INTO THE
COEFFICIENT MATRIX AND EXECUTES THE 1?,
PAI*A: NDMU -THE INDEX NUMBER OF THE DMU TO BE ANALYZED
%MACRO DEA(NOMU);
DATA —NULL—;
ARRAY TICKR(50) $S;
SEE TICK:
CALL S'PUT (FUND)(',PUT(TICKR{&NDMU}$S.));
RUN;
DATA CUR—U;
ARRAY THETA(50);
ARRAY RH{5O};
IMIDMU;
SET PARAM;
CTHETA THETA(I);
RHS_ RH{I};
KEEP CTHETA_RHS_;
DATA PROBLEM;
MERGE ONE CUR_DMU;
TIELEI DATA ENVELOPMENT ANALYSIS';
TITLE2 CURREN1 E(AU= &FUNDX';
P EXECUTE THE LP' /
PROC LP DATAPROBLEM POUT a &FUNDX;
%MEND DEA
17
LINKED LINEAR REGRESSION - DEA MODEL
Part 3: Loop Macro
' In order to iterate the execution of a
SAS procedure without repetitive
statements a macro was written to loop
around the calls to the DEA macro. The
loop macro accepts one parameter: the
number of DMUs in study.
Part 4: Extract Macro
In addition to the written reports
generated by the DEA macro, datasets
were created in memory containing the
results of the LPs. This macro combines
the reports for each of the DMUs and
extracts the DEA scores into a dataset
which is then fed to the forecasting
model.
NAME: LOOP
DEM. ITERATIVELY CALLS THE DEA MACRO
PA4: LIMIT - NUMBER OF OMUS TO BE PROCESSED
%MACRO LOOP(LIMIT)
%DO INDEX I %TO &LIIrr;
%DEA(&INDEX);
%ENE;
%MEND LOOE;
NAME: EXTRACT
DEW. EXTRACTS THE OBJECT WE VALUES FROM THE
INDIVIDUAL FUND REPORTS AND CREATES
DATASET CONTAINING THE IN FORMATION
%MACRO EXTRACT;
P MERGE LP OUTPUT DATASETS AND REMOVE UNIMPORtANT VARIABLES V
DATA REEl;
SEE &SYMBOLS;
KEEP _VAR_ _VALUE.
P EXTRACT OBJECTIVE FUNCTION VALUES i
DATA RES2;
SEE RESI;
IF _VAft. OBJECr THEN SCORE = -VALUE-;
ELSE DELETE;
KEEP SCORE
P GENERATE LISTING OF MUTUAL FUNDS
DATA RESE;
SEE TICK;
ARRAY TIOKR(50) $5;
DO I ITO 50 PCHANGE THIS TO 50 IN PRODUCTION RUN 1
NAME TICKR(I);
OUTPUT;
END;
KEEP NAME;
P MERGE CNTASEI' INTO ANAL FORM!
DATA RESULTS;
MERGE RESE RESE;
PROC PRINT DATA=RESULTS;
WEND EXTRACT;
r
EXECUTABLE CODE
%LOOP(50);
%EXTRACT;
18
LINKED LINEAR REGRESSION - DEA MODEL
Forcasting Code
P Mutual Fund Data Files 1
filename dati 'mfs93 data a';
filename dat2 'mfname data a';
filename dat5 'mfs90 data a';
proc rag;
tle2 'Minimum Spec Model;
model dea93 = E_Ratio NAV CAP_GAIN;
P DEA Score Data Files 1
filename dat3 'score93 data a';
filename dat4 'score90 data a';
/*
data info9o;
infile dat5;
array val(50};
input name $ val{};
option linesize 72 nonumber nodate;
P Read 1993 Mutual Fund Data */
data one;
mule dati;
array val(50);
input name $ val{};
proc transpose; P Transpose Matrix from IODMU to DMUIO
run;
proc transpose; P Transpose Matrix: from lO*DMU to DMUIO 1
run;
proc datasets; P Assign proper names to lOs
modify data 1;
rename colt =Beta
co12 = Load
co13 = E_Rabe
col4=NAV
col5 = Cap-Gain
co16 Dividend
_Name_= Fund;
1990 Data Analysis */
1
data scr9o;
infile dat4;
input dea90;
data funds; P Create List of Fund Names /
infile dat2;
input Fund $;
data scr93; r Read 1993 DEA scores from file
infile dat3;
input dea93;
proc datasets; P Assign proper names to lOs
modify data2;
rename coil = Beta
col2 Load
co13 E_Rabo
col4=NAV
col5 = Cap-Gain
coI6 Dividend
_Name_= Fund;
data comp90;
merge data2 funds scr90;
1
tie1 '1993 DEA Analysis';
data comp93;
set datal;
merge datal funds scr93;
tiflel 1990 DEA Analysis';
proc reg data=comp9o;
tille2 'All Possible Regressions';
model dea9o= Beta Load E_Ratio NAV Cap-Gain Dividend
/selection rsquare;
proc reg data=cornp9o;
tifle2 'Full Model';
model dea90= Beta Load E_Ratio NAV Cap- Gain Dividend;
proc rag data=comp9o;
tide2 'Minimum Spec';
model dea90= E_Ratio NAV CAP-GAIN;
plot residual.(predicted. E_Ratio NAV Cap-gain);
Run All possible regression models 1
proc reg;
tle2 'All Possible Regression Models';
model dea93=Beta Load E_Ratio NAV Cap- Gain Dividend/
selecben=Rsquare;
/
/* Full Fit Model */
proc rag;
title2'Full Model';
model dea93=Beta Load E_Ratio NAV Cap-Gain Dividend;
Forecasting Validation Model 1
P Create predicted values using forecast model
compare to actual 1993 values.
/*
19
1
LINKED LINEAR REGRESSION - DEA MODEL
model Ldea = Beta Load E_Ratio LNAV LCG LDiv
plot residuat(predicted.);
data predict;
set datal;
Exp_dea=-0.177220E_Rao + 0.016477NAV +
0.152344Cap_Gain+0.279313;
if Exp_dea> 1 then Exp_dea = 1; proc reg data = linear;
model Idea = Beta E_Rao LNAV LDIV;
data forecast;
merge predict funds scr93 scr9o;
Err = dea93 - exp_dea;
keep fund dea9O exp_dea dea93 err;
/ Linear Regression Equation /
data predict2;
set trans93;
PDEA 0.6874598eta - 0.525658E_Ratio + 0.753563LNAV
+ O.229900LD1V -2.930810;
Exp_dea = exp(PDEA);
if Exp_dea> I then Exp_dea = 1;
tine 'Regression Predictions';
tiUe2'Non-Linear Model';
proc pnnt data=forecast;
var fund dea90 dec93 exp_dea err;
/ Compute Forecast Errors
data frcs;
merge predict2 funds scr93 scr90;
Err = dea93 - exp_dea;
keep fund dea90 exp_dea dea93 err;
proc plot;
plot exp_dea * dea93;
/ added 4/18/94'1
P Plot 93 Prediction vs 90 Actual
data fcas
merge scr90 forecast;
keep exp_dea dea90 dea93 fund;
title 'Regression Predictions';
tille2'Linear Model';
proc print data=frcst2;
var fund dea90 dea93 exp_dea err;
tillel 'Predicted 93 vs. Actual 1990';
tiUe2 'Non-Linear Model';
proc plot data=fcas
plot exp_dea * dea90;
P Linear transform to improve predictions
data linear;
set comp90;
Ldea = log(dea90);
LNAV = log(NAV);
it Cap—Gain = 0 then LCG = 0;
else LCG log(Cap_Gain);
LEA log(E_Ratio);
if Dividend = 0 then LDIV = 0;
else LDIV = LOG(Dividend);
P Transform forecast inputs to linear model
/
data trans93;
set comp93;
Ldea log(dea93);
LNAV = log(NAV);
if Cap—Gain = 0 then LCG = 0;
else LCG = log(Cap_Gain);
LER = log(E_Ratio);
if Dividend = 0 then LDIV = 0;
else LDIV = LOG(Dividend);
Title 'Linear Forecast Model';
title2 'Model Verification';
proc rag data = linear;
20
Case Study:
A Linked Linear Regression Data Envelopment Analysis Model
for the Long-Term Prediction of
Mutual Fund Performance
Cindy Chao, Greg Higgins
Michael McGee
Department of Computer Science and Engineering SOUTHERN METHODIST UNIVERSITY
Dallas, Texas
May 14, 1994
C
LINKED LINEAR REGRESSION -
DEA MODEL
Contents
Abstract.............................................................................................................................3
Introduction...................................................................................................................... 3
Whatis a Mutual Fund? .................................................................................................4
DEAFactors......................................................................................................................4
Model Development and Validation............................................................................. 6
Examinationof Results ...................................................................................................9
InvestmentPerformance............................................................................................... 10
Conclusion...................................................................................................................... 11
References....................................................................................................................... 12
ProfessionalReferences................................................................................................. 13
Appendix........................................................................................................................ 14
DEAFormulation .............................................................................................. 15
DEACode ........................................................................................................... 16
ForcastingCode ................................................................................................. 19
2
Case Study:
A Linked Linear Regression - Data Envelopment Analysis Model
for the Long-Term Prediction of Mutual Fund Performance
Michael McGee
Greg Higgins
Cindy Chao
Department of Computer Science and Engineering
Southern Methodist University
Dallas, Texas 75275
Data Envelopment Analysis (DEA) can be used to evaluate
managerial efficiency of mutual funds. By extending the DEA
methodology with linear regression, a new prediction tool for
future efficiency is developed. The linked model was tested
on a sample of fifty growth mutual funds over a three year
time horizon - resulting in accurate predictions of future
efficiency with 99% confidence.
he current mutual fund market
T contains nearly $1.4 trillion
dollars; this rapid growth of mutual
funds as a long-term investment
instrument has led to an increased
demand for evaluation and forecasting
techniques. The performance of a
mutual fund is directly linked to the
performance of the management staff.
Data Envelopment Analysis (DEA) can
be used to evaluate managerial
efficiency; however, this is only a
current assessment of a fund's
management. DEA is a linear
programming methodology that
computes relative efficiency scores
among a group of homogeneous
decision-making units (DMU5).
Combined with linear regression, DEA
becomes a prediction tool for longterm forecasting. The linked model not
only evaluates return on investment
but also the quality of the fund's
managerial team on a long-term basis.
'3
LINKED LINEAR REGRESSION - DEA MODEL
What is a Mutual Fund?
goal of the linked model is forecasting
A mutual fund is an investment
of long-term performance, growth
company that pools the money of
mutual funds were selected because
many institutional and individual
their objectives emphasize the
accumulation of capital gains, and they
investors into one professionally
have the greatest long-term growth
managed investment account. The
fund manager creates a portfolio for
potential.
the shareholders by buying a variety of
"true" financial instruments, such as
DEA Factors
The inputs and outputs of the
corporate stocks and bonds, and
government and municipal securities,
envelopment analysis were made up of
in accordance with the fund's
the main characteristics of a mutual
particular investment objectives. Fund
fund. They were selected because they
managers are professional investors
are important factors to investors and
are directly linked to the performance
who know the financial market and
follow it daily with extensive research
of the fund manager. The inputs
capabilities to effectively investigate all
selected were beta - a measure of risk,
investment opportunities. The fund is
expense ratio, and loads & other fees.
always monitored, with individual
The outputs used were net asset value
securities constantly being bought,
(NAV) per share, dividends, and
sold, and traded in order to optimize
capital gains.
the investment objectives. The beauty
Inputs
of investing in sometimes twenty
Beta is a measure of the fund
different securities lies in the protection portfolio's average volatility, or risk,
of the investment through portfolio
relative to the S&P 500. The beta value
diversification.
of the market is a constant equal to
Mutual funds are divided into six
1.00. A fund with a beta value greater
major categories according to their
than 1.00 is more volatile than the
market. It is expected to outperform
investment objectives, which are
outlined in the fund's prospectus, such
the "average fund" in a bull (rising)
as goals, level of risk, and the type and
market and bring in greater returns,
frequency of return desired. Since the
while it is expected to fall more than
4
LINKED LINEAR REGRESSION - DEA MODEL
Ll
brokerage costs, and transaction fees.
the average fund in times of a bear
(declining) market. A fund with a beta This value is a percentage of the
purchase price taken off either before
value less than 1.00 possesses the
characteristics of the converse the money is invested (front-end load)
situation. For example, a portfolio with or after investment when the
a beta value of 0.75 is expected to be
shareholder redeems the shares (backend load). The amount of the load is
25% less responsive to the general
economic trend than the average fund established by the managerial company,
and return 25% more conservative than but is paid to the sales and transaction
the average fund.
team. Some funds have a load of 0.00%,
Expense ratio is a measure which
meaning that they have the same
relates expenses to the average net
benefits as load funds, except they are
assets for a period. It is calculated by
sold directly by the fund company
rather than through a sales broker so
taking the annual expenses paid by the
that no loads or other fees are incurred.
fund (including management and
With no-load funds, the investor must
investment advisory fees, custodial
and transfer agency charges, legal fees,
seek out and contact the investment
and distribution costs) divided by the
firm directly. There is a distinct
advantage to investing in no-load
average shares outstanding for the
period. The greater the expense ratio,
funds. For example, if we had invested
the larger the percentage of the
$1000 in a particular portfolio with a
shareholder's investment gets paid out 7.50% front-end load, $75 would be
taken off the top, and we would be left
directly to the management company.
The expense ratio does not include
with only $925 worth of shares in the
loads or commissions paid for sales or
fund. Had this been a no-load fund, the
transaction fees.
full $1000 would have been utilized.
Loads and other fees represent all
Outputs
Net asset value per share represents
other charges made to the shareholder
how much each share is worth. It is
which are not accounted for in the
calculated by taking the total market
expense ratio. A load is a portion of the
value of a fund's shares (securities plus
offering price of the fund that pays for
cash plus accrued earnings minus
costs such as sales commissions,
LINKED LINEAR REGRESSION -
DEA MODEL
in different ways. Shareholders can
liabilities) divided by the number of
shares outstanding. No profits or losses reinvest dividends and capital
distributions if they elect to do so.
are realized through NAV until the
fund is sold. Interest lies in the amount
Model Development and Validation
of increase or decrease in NAV from
The decision—making units (DMUs)
time of purchase to time of sale.
in this analysis were the 50 growth
Dividends are payments declared
mutual funds. A prerequisite of DEA is
by a corporation's board of directors
and made to shareholders, usually on a homogeneous DMUs. By selecting the
funds from the same sector (growth
quarterly basis. A fund's dividends
funds), this requirement is met.
come directly from the dividends paid
The DEA model uses the dual
out by the individual securities within
the portfolio. They are paid out equally formulation of the linear program as
described by Charnes, Cooper and
among the shareholders on a
Rhodes (1978). The objective function
scheduled time frame.
of this form is to minimize the
Capital gains on a mutual fund are
objective value of the current DMU. If
the profits made from the sale of a
a DMU is efficient the optimal value is
capital asset (such as a security, stock,
1, while an inefficient DMU is driven
or bond) within the portfolio. The
towards zero. The first step in the
number used for capital gains is
adjusted for losses and stock splits. The development of the linked model is the
generation of DEA scores for the
sale of an asset occurs when the
mutual funds being studied. The data
manager decides that a security needs
collected for the mutual funds is used
to be sold for the benefit of the
portfolio. Capital gains are distributed as the inputs and outputs for the DEA
to shareholders when they are realized code which is documented in the
appendix. The DEA code is written
(when a security is sold and a profit is
using the SAS/OR package allowing
made) and are paid out as capital
the DEA scores to be fed directly into
distributions.
the regression phase of the model.
Dividends and capital gains are
DEA scores are generated for the
treated as separate outputs because
fifty mutual funds in the study using
they are incurred, paid out, and taxed
n
LINKED LINEAR REGRESSION -
.
S
S
S
the 1990 data collected from
Morningstar Mutual Funds. At this point
DEA scores are also generated for 1993
and stored for use in validating the
model's forecasts. The 1990 DEA scores
are then regressed against all of the six
DEA factors previously described.
This regression model is unique
from previous research in that the goal
of the regression is not to create an
efficiency score, but to forecast an
efficiency score for the future of a
particular DMtJ. Previous work by
Thanassoulis (1993) compared the DEA
methodology with linear regression as a
means of creating efficiency scores. Our
model links the two methods into one
prediction tool. The full factor model
that resulted appeared on the surface to
be a good predictor of efficiency (R2 =
0.8239). The full factor model included
three non-significant factors at the 5%
level of significance. Beta (p-value =
0.68), load (p-value = 0.65), and
dividend (p-value = 0.50) were all
eliminated from the model leaving only
expense ratio, net asset value, and
capital gains as significant factors. The
revised model remained significant
with an R2 = 0.8213.
As stated before, this model appears
to be a good predictor; however, linear
fl
fLAW
flED
Figure 1: Residual vs. Predicted DEA Score
regression models must be validated
through diagnostic tests. One
assumption in regression analysis
underpins the entire methodology errors are a randomly distributed
variable following a normal distribution
with mean equal to zero, N(0,(Y).
Regression analysis in SAS is based on a
linear relationship between the
dependent and independent variables.
These assumptions can be verified
through the examination of the
residuals, or errors, of the regression
model. A plot of the residuals versus the
predicted values, Figure 1, reveals a
distinct pattern indicating a non-linear
relationship between the independent
and dependent variables. This requires
the examination of univariate statistics
S
7
0
DEA MODEL
LINKED LINEAR REGRESSION - DEA MODEL
S
.
S
S
S
S
for each of the variables in the
regression model. A standard normal
quantile plot for each of the variables
allows the determination of the
distribution of each variable.
From the univariate procedures,
some of the variables demonstrated an
exponential distribution requiring a
log—linear transformation in order to
validate the regression model. Net
asset value, capital gains, dividends
and the actual DEA score follow
exponential distributions while beta,
load, and expense ratio are all normally
distributed. A second data set of the
inputs and outputs was created by
taking the natural log of each of the
exponentially distributed variables.
Once normality was confirmed
through univariate procedures, a new
full factor linear regression model was
fitted. The newly formed model
resulted in an R2 = 0.8163 and was
validated through the residual plot
shown in Figure 2. Using the SAS
regression report, the regression
equation was created using the
coefficients of the significant factors:
beta, expense ratio, net asset value, and
dividend.
The next step is the creation of
predicted efficiency scores. The
Figure 2: Residual vs. Predicted DEA Score—after linear transform
common method for forecast model
verification is the prediction of the
past. The linked model is verified by
using the input and output data for
1993 in the regression equation created
from the 1990 data. The predicted
values are then compared to the actual
1993 DEA scores generated earlier by
the DEA code. The forecast errors are
evaluated using a significance test on
the mean difference. The test is
conducted at the 1% level of
significance. H0: mean difference = 0;
Ha: mean difference # 0. The resulting
test statistic: z = -0.5164 and critical
value, z = 2.57 leads to the conclusion
that the prediction values are not
significantly different from the actual
DEA scores with 99% confidence.
8
0
LINKED LINEAR REGRESSION -
flEA 1990
1.00000
FTRNX
Fund
DEA MODEL
Predicted 1993
1.00000
flEA 1993
Error
0.62320
-0.37680
PRWAX
GINLX
1.00000
1.00000
0.78983
0.33207
0.29415
0.23691
-0.49568
-0.09516
ACRNX
1.00000
1.00000
1.00000
0.00000
FDETX
CFIMX
0.41944
0.73415
1.00000
1.00000
0.01100
0.01968
LOMCX
0.32965
0.98900
0.98032
0.68927
1.00000
10.31073
Table 1: Efficient Mutual Funds
Examination of Results
In analyzing the results of the
sample studied, it is important to
remember that a mutual fund is mainly
used as a long-term investment,
providing stable low-risk growth. A
potential shareholder wants to invest
in a fund that will be efficient in the
future, regardless of its current
efficiency. Therefore we focus our
examination on the set of funds with
efficient scores in either 1990 or 1993.
Table 1 shows the 1990 DEA, Predicted
1993, and actual 1993 DEA scores along
with the errors for each prediction
(Error = 1993 DEA - Predicted 1993).
Of the first four funds listed (all
receiving efficient ratings in 1990), the
future efficiency of three funds was
correctly predicted. Our linked model
predicted future behavior of currently
efficient funds with 75% accuracy.
Although the error for T. Rowe Price
New America Fund, PRWAX, is high,
we are not concerned with the
magnitude of inefficiency, only the
state of its efficiency.
Of the last four funds in the table
(all receiving efficient ratings in 1993),
three funds were predicted accurately.
The fifth and sixth funds, Fidelity
Destiny Portfolios: Destiny II, FDETX,
and Clipper Fund, CFIMX, both
received' a predicted score of
approximately 0.98 for 1993. Based on
the previous significance test, these
values are not significantly different
from 1.00. Thus our model predicts the
funds that will be efficient in the future
with 75% accuracy.
LINKED LINEAR REGRESSION - DEA MODEL
Table 2: Investment Simulation Results
Investment Performance
Since investors are concerned with
the long-term growth and appreciation
of their investments, DEA by itself is
not an effective tool to aid in the
development of investment strategies.
However, combined with linear
regression, DEA gives us the ability to
estimate the long-term efficiency of a
fund. To illustrate the effectiveness of
the linked model relative to DEA alone,
we simulated an investment in the
efficient funds in 1990 and the funds
predicted to be efficient by our linked
model. Since mutual funds allow the
purchase of fractional shares, we
invested $1000 in each of the mutual
funds. The results are given in Table 2.
The percent of appreciation we
received in the portfolio of 1990
efficient mutual funds was 46.92%,
while the appreciation was
significantly higher for the portfolio of
funds recommended by the linked
model at 60.41%.
10
LINKED LINEAR REGRESSION -
Q
Conclusion
The linked model allows us to
broaden the range of applications for
DEA. The linked model enables the
prediction of future efficiency, which is
of great benefit to investors because it
provides more insight into the longterm viability of a mutual fund. The
model also requires little historical data
for each DMIJ unlike other forecasting
methods. Another benefit is the relative
compactness of the model; very few
predictors are required to explain a
high degree of variability and achieve
accurate predictions. Data
envelopment analysis by itself is not an
accurate predictor of future efficiency
of mutual funds. Only one fund that
was efficient in 1990 remained in the
efficient set in 1993. Linear regression
and DEA combine to give us an
accurate prediction for the future
behavior of DMUs.
DEA MODEL
An immediate extension to this
research is the definition of an "ideal
type" mutual fund. An optimization
algorithm will be used to generate
values for the regression factors and
record the combinations which yield
and efficient fund prediction. Once this
benchmark is defined a growth mutual
fund prospectus can simply be visually
inspected to determine managerial
efficiency. The next phase of this
research would be to compare the
accuracy of the linked model with that
of other forecasting methods, such as
time series decomposition. Another
step in the continued development of
this area of research is to improve the
accuracy of the predictions through
further modifications to the regression
model using a weighted regression
model or the inclusion of adaptive
error control.
11
LINKED LINEAR REGRESSION - DEA MODEL
References
Barr, R. S., L. M. Seiford, and T. F. Siems (1991), "An Envelopment Analysis
Approach to Measuring the Management Quality of Banks," The Annals of
Operations Research. J.C. Baltzer AG, Science Publishers.
Dilorio, F. C. (1991), SAS Applications Programming: a Gentle Introduction. Boston:
PWS-KENT Publishing Company.
Flatt, L. G. (1994), "FUNDamentals: An Envelopment-Analysis Approach to
Measuring the Managerial Quality of Mutual Funds," technical report.
Dallas, Texas: Southern Methodist University, Department of Computer
Science and Engineering.
Morningstar Mutual Funds User's Guide (1992), Chicago: Morningstar, Inc.
Morningstar Mutual Funds (1993), Chicago: Morningstar, Inc.
Norman, M. and B. Stoker (1991), Data Envelopment Analysis: The Assessment of
Performance. New York: John Wiley & Sons.
SAS Institute Inc. (1989), SAS/OR User's Guide, Version 6, First Edition. Cary, NC:
SAS Institute Inc.
Sexton, T. R., S. Sleeper, and R. E. Taggart, Jr. (1994), "Improving Pupil
Transportation in North Carolina," Interfaces, January-February 1994.
Providence, RI: The Institute of Management Sciences.
Thanassoulis, E. (1993), "A Comparison of Regression Analysis and Data
Envelopment Analysis as Alternative Methods for Performance
Assessments," O.R. Journal of the Operations Research Society. Coventry:
Operational Research Society Ltd.
12
LINKED LINEAR REGRESSION -
[ii
DEA MODEL
Professional References
Barr, Richard S., Ph.D. Associate Professor, School of Engineering and Applied
Sciences. Southern Methodist University, Dallas, Texas.
Dula, Jose, Ph.D. Assistant Professor, School of Engineering and Applied
Sciences. Southern Methodist University, Dallas, Texas.
Guerra, Rudy, Ph.D. Assistant Professor, Department of Statistical Science.
Southern Methodist University, Dallas, Texas.
Welch, David. Senior Systems Analyst II, Bradfield Computer Center. Southern
Methodist University, Dallas, Texas.
13
LINKED LINEAR REGRESSION - DEA MODEL
Appendix
DEA Formulation
Mat hematicalformuation of the DEA model used in the Linked Model.
DEA Code
The DEA code is developed using Release 6.0 of the SAS System on an IBM
3090 mainframe. The code presented requires only two input files. One file
contains the input and output factor data while the second file contains a
listing of the mutualfund ticker symbols. The program is construct ed from
a series of SAS Macros which fully automate the formulation and solution
of the multiple linear programs required in Data Envelopment Analysis.
Forecasting Code
Developed as an extension to the DEA code, the forecasting model uses the
same computing platform and software. The program reads the same data
files as the DEA code as well as the DEA scores generated by the DEA
study, The program generates predicted efficiency scores and all necessary
statistical procedures for validating the model.
14
F]
LINKED LINEAR REGRESSION -
DEA MODEL
DEA Formulation
A
0
Inputs
=
-------- ] S
Ex
+
Slacks
=
=
outputs
A.
Mine
A7-A!^O
A7
e^!o
Mat heinatical formulation of DEA requires that the inputs and out puts for each DMU
are formated in a matrix, A The matrix is multiplied by a vector, 2,to construct the linear
programming constraints.
Minimize 0
Subject to:
Inputs Beta: xi'DMUi +xDMU2+... +xsoDMUso - xs(E 1)<-E 0
Load: xIDMUI+x'DMU2+...+x5oDMU50-x5I(&1):50O E_Patio: XIDMUI + X2 'DMU2 +... + x5a • DMU50 - Xsi (& 1)--^0 '0
Outputs
NAV: xDMU +X'DMU2+... +XDMU50-x51(00) 01
Cap_Gain: xDMUi + X2 • DMU2 + + Xo DMU5O - Xti (0 0) 0 * 1
Dividend: xiDMUi + X2 • DMU2 +... + x50 DMU5O X51 (e 0) 0 1
All Variables ^! 0
where 01$ current declslon-inaldng unit (OMU)
Linear program solved by SAS for each DM11.
15
fl
LINKED LINEAR REGRESSION -
S
DEA Code
Part 1: Initialization
MICHAEL MCGEE
SENIOR DESIGN GROUP '94
DEA ANALYZER
This section reads the data files into S
DEA MODEL
a SAS dataset. Using the SAS Data
Management Language datasets are
created for the right hand side vectors,
as well as the theta vectors. The two
datasets are perfectly matched - each
right hand side has exactly one
corresponding theta vector. There is one .
right hand side and one theta vector for
each DMtJ in the study. These vectors are
merged with the inputs and outputs to
formulate the linear program. The final
step in the initialization is the renaming
of variables to the proper ticker symbols
in order to aid in report interpretation.
FILENAME NOT MFNAME DATAA';
FILENAME DAT F9O DATA A';
OPTIONS LINESIZE72;
%LEI SYMBOLS
ACRNX LOMCX HRTVX STCSX BEONX GINIX PARNX VCAGX DEVLXAGBBX
FPPTX FDETX FDESX FCND( FBGRX FDVLX SRFCX FRGRX BIRWIX SAMVX
FDFFX FAGOX MLSVX ALTFX ANEFX GPAFX PRWAX CGRWI( OVOPX WTMGX
NWTX SEOCX BVALX BARANAFGPX FTRNXAMRGX DNLDX HCRPX NBSSX
PRGWI( INNDX TWHIX DW1X LMVTX JHNGX SRSPX CFIMX CLMBX BOSSX
DATA TICK;
WILE NOT;
ARMY TICKR(so)ss;
INPUT TICKR?);
rU
MUTUAL FUND TICKER SYMBOLS
•1
DATA ONE;
INFILE DAT;
ARRAY FUND (50)
INPUT _ID_ $
FUND (}
_TYPE_ $ _RHS_;
P GENERATE THETACOLUMNS
DATA NEW-COL;
SET ONE;
ARRAY THETA (SO);
ARRAY FUND(5O);
DOIITO5
IF _N_ = I THEN THETA(I) = I;
ELSE
IF _N_ >=2 & _N_ <=4 THEN THETA{I}= -FUND(I);
ELSE THETA(I) 0;
END
DROP I;
P GENERATE RIGHT HAND SIDE VECTORS
DATA RANGE;
ARRAY RH(50);
ARRAY FUND(50);
SET ONE;
DO I = ITO 5
IF _N_ I THEN RH(I} = MISSING;
ELSE
IF _N_>4 THEN RH(I)= FUND{I};
ELSE RH(I) 0;
END;
DROP MISSING I FUNDI-FUND5O;
DATA PARAM;
MERGE NEW-COL RANGE;
P RENAMES THE FUNDS TO THE PROPER NASDAQ TICKER SYMBOL /
PROC DATASETS;
MODIFY ONE;
RENAME FUNDI ACRNX
FUNO2 LOMCX
FUNDS HRT\
FUND4 = STCSX
FUNDS = BEONX
FUND6 = GINLX
FUND7= PARNX
FUND8 = VCAGA
FUNDS = DEVLC
FUNDIO= AGBBX
FUNDII= FPPT)(
FUNDI2= FDETX
S
iri
LINKED LINEAR REGRESSION -
DEA MODEL
FUNDI3= FDESX
FUNDl4 FCN1X
FUNDIS= FBGRX
FUNDI6= FDVLX
FUNDI7= SRFCX
FUND118 FRGRX
FUNDI9= BRWIX
FUND20= SAM'fX
FUND2I FDFFX
FUND22 FAGOX
RJND23= MLSVX
FUND24ALTFX
FUND25=ANEFX
FUND26= GPAFX
FUND27= PRWAX
FUND28 CGRY(
FUND29 QVOPX
FUND30= WTMGX
FUND3I= NYVTX
FUND32 SEcGX
FUND33 BVALX
FUND34= BARAX
FUND35=AFGPX
FUND36 FTRNX
FUND37 AMRGX
FUND38DNLDX
FUND39= HCRPX
FUND40=NBSSX
FUND4I= PRGWX
FUND42 INNDX
FUND431WHD(
FUND44 DWIvX
FUND45= LMVD(
FUND46= JHNGX
FUND47= SRSPX
FUND48= CFX
FUND49= CLMB)(
FUNDSO. BOSSX;
Part 2: DEA Macro
'NAME: DEA
DESC: AUTOMATES THE CREATION OF PROBLEM DATASET.
MERGES THE CURRENT THETAAND RHS INTOTHE
COEFFICIENT MATRIXAND EXECUTESTHE LP.
PAPM:NDMU-THEINDEXNUMBEROFTHEDMUTOBEANALYZED
This macro accepts one parameter:
the index of the DM1.1 to evaluate. The
macro merges the input/output dataset
with the appropriate theta and right
hand side vectors. This dataset now
contains the LP formulation for SAS to
analyze. The procedure generates
detailed printed reports for each of the
Dtv[(Js.
•1
%MACRODEA(NDMU);
DATA _NULL_;
ARMY TICKR(So)ss;
SErTK;
CALL SYMPUT ('FUNDX',PUr(TICKR(&NDMU},SS.));
RUN;
DATA CURU;
ARRAY THETA{50};
ARRAYRH(5O);
h&NDMU;
SET PARM;
CTHEIA=A TH ErA{I};
_RHS_ = RH{I};
KEEP CTHETA_RHS_;
DATA PROBLEM;
MERGE ONE CU R_DMU;
TITLEI 'DATA ENVELOPMENT ANALYSIS';
TITLE2 CURRENT OMU= &FUNDX';
/ EXECUTETHE LP '/
PROC LP DATA=PROBLEM POUT &FUNDX;
%MEND DEA;
17
LINKED LINEAR REGRESSION - DEA MODEL
Part 3: Loop Macro
In order to iterate the execution of a
SAS procedure without repetitive
statements a macro was written to loop
around the calls to the DEA macro. The
loop macro accepts one parameter: the
number of DMUs in study.
-
P
NAME LOOP
DESC: ITERATIVELY CALLS THE DEA MACRO
PAPM: LIMIT • NUMBER OFDMU5TOBEPROCESSED
%WCROLOOLIM
%DO INDEX I%TO8LAIIT;
%DEA(AJNDEX);
%EN
%MENDLOO
Part 4: Extract Macro
In addition to the written reports
generated by the DEA macro, datasets
were created in memory containing the
results of the LPs. This macro combines
the reports for each of the DMUs and
extracts the DEA scores into a dataset
which is then fed to the forecasting
-1 1
model.
"NAME EXTRACT
DESC: EXTRACTS THE OBJECTIVE VALUES FROM THE
INDIVIDUAL FUND REPORTSANDCREATESA
DATASET CONTAINING THE INFORMATION
-/.MACRO EXTRACT;
P MERGE LP OUTPUT DATASETSAND REMOVE UNIMPORTANT VARIABLES 'I
DATA RESI;
SET &SYMBOLS;
KEEP VARVALUE;
P EXTRACT OBJECTFIE FUNCTION VALUES /
DATARES2;
SET RESI;
IF -VAR­ OBJECT'THEN SCORE =_VALUE_;
ELSE DELETE;
KEEP SCORE;
P GENERATE LISTING OF MUTUAL FUNDS/
DATA RES3;
SET TICK;
ARRAY TICKR{SO} $5;
ITO 501, PCHANGE THIS TO 50 IN PRODUCTION RUN 'I
DO
NAME TICKR{l);
OUTPUT;
END;
KEEP NAME;
P MERGE DATASET INTO FINAL FORM •/
DATA RESULTS;
MERGE RES3 RES2;
PROC PRINT DATMRESULTS;
%MEND EXTRACT;
P
EXECUTABLE CODE
%LOOP(50);
%EXTRACT;
18
LINKED LINEAR REGRESSION - DEA MODEL
Forcasting Code
1 Mutual Fund Data Files 'I
filename datl 'm1s93 data a';
filename daf2 'mfname data a';
filename dat5 'mfs go data a';
proc reg;
tle2 'Minimum Spec Model';
model dea93 = E_Ratio NAV CAP_GAIN;
1 DEA Score Data Files 1
filename dat3 'score93 data a';
filename dat4 'score90 data a';
1* 1990 Data Analysis /
option linesize = 72 nonumber nodate;
data info9o;
intile dat5;
array val(50);
input name $ val{};
/* Read 1993 Mutual Fund Data •/
data one;
del
I;
infile
array val(50);
input name $val{};
proc transpose;
run;
proc transpose; / Transpose Matrix: from IODMU to DMU*IO *1
run;
J*
Transpose Matrix: from IODMU to DMUIO
proc datasets; /* Assign proper names to lOs 1
modify data2;
rename coil = Beta
co12 = Load
co13 = E_Ratio
col4=NAV
col5 Cap—Gain
co16 = Dividend
_Name— = Fund;
proc datasets; P Assign proper names to lOs 1
modify data 1;
rename coil = Beta
co12 = Load
co13 = E_Ratio
col4=NAV
co15 = Cap—Gain
co16 = Dividend
_Name— = Fund;
data scr90;
infile dat4;
input dea9o;
data funds;
P Create List of Fund Names 1
infile dat2;
input Fund $;
data comp9o;
merge data2 funds scr9o;
data scr93; / Read 1993 DEA scores from file 1
infile dat3;
input dea93;
title 1 '1990 DEA Analysis';
proc reg dala=comp9o;
tifle2 'All Possible Regressions';
model dea90= Beta Load E_Ratio NAV Cap —Gain Dividend
/selection = rsquare;
titlel '1993 DEAAnalysis';
data comp93;
set datal;
merge data 1 funds scr93;
proc reg data=comp90;
tie2 'Full Model';
model dea9o= Beta Load E_Ratio NAV Cap —Gain Dividend;
1 Run All possible regression models 1
proc reg data=comp90;
title2 'Minimum Spec';
model dea9o= E_Ratio NAV CAP_GAIN;
plot residual.(predicted. E_Ratio NAV Cap—gain);
proc reg;
title2 'All Possible Regression Models';
model dea93=Beta Load E_Ratio NAV Cap_Gain Dividend/
selection=Rsquare;
1 Full Fit Model 1
proc reg;
title2'Full Model';
model dea93=Beta Load E_Ratio NAV Cap—Gain Dividend;
P Forecasting Validation Model /
P Create predicted values using forecast model
compare to actual 1993 values.
19
I
LINKED LINEAR REGRESSION - DEA MODEL
data predict;
set data 11;
Exp_dea=-0.1 77220E_Rao + 0.016477NAV +
0. 152344*Cap_Gain+0.279313;
model Ldea = Beta Load E_Ratio LNAV LCG LDiv;
plot residual.(predicted.);
proc reg data = linear;
model Idea = Beta E_Rao LNAV LDIV;
if Exp_dea> 1 then Exp_dea = 1;
data forecast;
merge predict funds scr93 scr9o;
Err = dea93 - exp_dea;
keep fund dea90 exp_dea dea93 err;
S
I' Linear Regression Equation 1
data predict2;
set trans93;
PDEA = 0.687459Beta - 0.525658E —Ratio + 0.753563LNAV
+ 0.229900LD1V -2.930810;
Exp_dea = exp(PDEA);
if Exp_dea> 1 then Exp_dea = 1;
title 'Regression Predictions';
title2'Non-Linear Model';
proc print data=forecast;
var fund dea90 dea93 exp_dea err;
1 Compute Forecast Errors 1
data frcs;
merge predict2 funds scr93 scr9o;
Err dea93 - exp_dea;
keep fund dea90 exp_dea dea93 err;
proc plot;
plot exp_dea dea93;
1 added 4/18/94/
1 Plot 93 Prediction vs 90 Actual 1
data fcast
merge scr90 forecast;
keep exp_dea dea90 dea93 fund;
title Regression Predictions';
title2'Linear Model';
proc print data=frcst2;
var fund dea90 dea93 exp_dea err;
title 1 'Predicted 93 vs. Actual 1990';
tifle2 'Non-Linear Model';
proc plot data=fcast
plot exp_dea * dea90;
/* Linear transform to improve predictions 1
data linear;
set comp90;
Ldea = log(deago);
LNAV = log(NAV);
if Cap—Gain = 0 then LCG = 0;
else LCG = log(Cap_Gain);
LEA = log(E_Ratio);
if Dividend = 0 then LDIV = 0;
else LDIV = LOG(Dividend);
/ Transform forecast inputs to linear model */
data trans93;
set comp93;
Ldea = log(dea93);
LNAV log(NAV);
if Cap—Gain = 0 then LCG = 0;
else LCG = Iog(Cap_Gain);
LEA = log(E_Aatio);
if Dividend = 0 then LDIV = 0;
else LDIV LOG(Dividend);
Title 'Linear Forecast Model';
title2 'Model Verification';
proc reg data = linear;
20